No, I don't have a very accurate way of determining how many work units to run simultaneously, other than watching GPU usage and trying to max it out, then comparing how many work units I'm completing per unit of time. For example, I did a quick PrimeGrid test comparing 1 task per GPU against 2 tasks per GPU to see what the difference is (times are min:sec per WU):
TITAN V
1 task - 1:50 per WU
2 tasks - 2:30 per WU
1080 Ti
1 task - 3:20 per WU
2 tasks - 5:00 per WU
So on the TITAN V, running two tasks in parallel works out to an effective 1 WU every 75 seconds (2:30 is 150 seconds, split across 2 tasks), whereas running one task at a time gives 1 WU every 110 seconds. The 1080 Ti shows the same pattern: 1 WU every 150 seconds with two tasks versus every 200 seconds with one. So yes, running multiple tasks does slow down each individual task, but overall you get more output, provided GPU usage wasn't already maxed out.
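If you want to run the same comparison for other projects or task counts, the arithmetic above can be sketched in a few lines of Python. This is just a rough helper, not anything from BOINC or PrimeGrid. The function names are made up for illustration, and it assumes all concurrent tasks start and finish at roughly the same time, so the effective time per WU is simply the wall-clock time divided by the number of tasks.

```python
# Rough helper for comparing effective throughput when running N work units (WUs)
# concurrently on one GPU. Hypothetical names, not part of BOINC or PrimeGrid.
# Assumption: all concurrent tasks start and finish at about the same time.

def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' time string such as '1:50' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def effective_seconds_per_wu(mmss: str, tasks: int) -> float:
    """Wall-clock time per WU divided by the number of WUs finishing together."""
    return to_seconds(mmss) / tasks

# Timings from the PrimeGrid test above: (GPU, concurrent tasks, time per WU)
results = [
    ("TITAN V", 1, "1:50"),
    ("TITAN V", 2, "2:30"),
    ("1080 Ti", 1, "3:20"),
    ("1080 Ti", 2, "5:00"),
]

for gpu, tasks, mmss in results:
    eff = effective_seconds_per_wu(mmss, tasks)
    print(f"{gpu}, {tasks} task(s): 1 WU per {eff:.0f} s, {3600 / eff:.1f} WU/hour")
```

For the TITAN V this reproduces the 75 s versus 110 s figures, and for the 1080 Ti 150 s versus 200 s. Whichever setting gives the lowest effective time per WU (or the highest WU/hour) is the one to keep.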