Do Companies Purposely Mislead?

in math •  7 years ago  (edited)



I am sure you’ve heard or seen it before…the random post spewing some percent, statistic, fraction, ratio, data, etc. to make a point. Maybe it's from a credible source with evidence and research to support it. But most likely it's some old, outdated information. Or maybe it's not actual data at all, but rather made-up numbers picked out of the air. What I've come across many times these past few months on social media are things like...

  • "The unemployment rate has dropped x % since..."
  • "Crime has risen y % with [fill-in-the-blank] in charge."
  • "The debt has risen/dropped z % since last year and it will be x quadrillion dollars in 2050."

Perhaps relying on social media for factual information is the problem. I know it’s not a new trend or phenomenon, and I could go on and on with examples. All I'd need to do is open up Facebook at this very moment and find any political discussion for/against any political party or stance, and in less than 10 seconds I could quote a "statistic" from a post. But really that's not my intention. What stands out to me in these claims is that they are not actually followed up with relevant, credible evidence, facts, sources, or data (or, if a source is present, it is at the very least a biased, one-sided view that doesn’t present all the information…just the bits that fit the argument), and yet they are presented as "fact".

Living in Chicago, there is certainly a split between North-side baseball fans (Cubs) and the South-siders (White Sox). The Cubs dominate 72% of the city’s fan-base compared to 28% support for the White Sox. Where did I get these figures from? Well…I made them up! It’s purely based on my perspective and perception. No data. No evidence. No facts. No sources. No studies. But is it wrong?


If you want to see some not-made-up numbers of the Chicago Baseball fan-bases, check out https://www.nytimes.com/interactive/2014/04/23/upshot/24-upshot-baseball.html?_r=0

I admittedly am guilty as charged of doing this. However, since beginning my Master's in Data Science 6 months ago, I’ve gained a huge appreciation for not arbitrarily throwing around made-up numbers to make a point. I’ve also learned to take someone’s claim with a grain of salt, and if it’s a topic that is really of interest to me, then I’ll investigate it myself.


dehydration.png
So many different studies. Who do we believe?

Why do we do this? I think there are a few reasons (and please share your thoughts as well in the comments).

  1. It’s certainly easier and less time consuming to just take someone’s word for it than to actually look it up, do the math, etc.
  2. We can justify a number with our own experiences (as I did above with what I feel is a reasonable split between the Cubs and White Sox fan-bases in Chicago. Certainly, since I have lived IN Chicago for 30+ years, I can feel like a somewhat credible source to “calculate” those percentages).
  3. Sometimes there are so many conflicting reports that we just don’t know which one is true or who to believe. So we try to determine or pick the results that fit our point of view, what we want to believe, and throw away the others.
  4. We just don’t understand the fundamentals of the math involved or the question being asked.


math-jokes-math-humor.jpg
I displayed this in my classroom and I made sure my students understood this joke

Are We Taken Advantage Of?
It’s one thing to casually throw stats on a wall to see what sticks in a conversation, whether it’s on Facebook, Steemit, etc. I understand those are public forums and a means to have discussions (hopefully respectful). But when companies, media, or governments do it, aren't they slipping into an area of being unethical, deceptive, and misleading?

Knowledge is power, and that is one point I always tried to get across to my math students. If someone has an agenda, whether it is to sway your beliefs or to get you to buy their product, you need to think for yourself and not take their word for it.

The Reason for Today's Post
So with all that said, here's what inspired my post. My professor recently planned a trip to France and purchased off Amazon the breathalyzers that the French police like you to carry.

He then received an email from the seller asking him to give feedback. The email linked to the seller’s website, which included the chart below. He then posed the question to us..."Those figures look pretty compelling, but are they correct?"


probab.png
The table above can be found at http://alcosense.co.uk/alcosense/alcosense-trutouch.html
The company makes a pretty compelling argument that you would need to buy their product, but is it necessary?

Did I Hear Someone Say “Let’s Do a Monte Carlo Simulation”?
http://www.palisade.com/risk/monte_carlo_simulation.asp gives a nice explanation of a MCS.

“Monte Carlo simulation performs risk analysis by building models of possible results by substituting a range of values—a probability distribution—for any factor that has inherent uncertainty. It then calculates results over and over, each time using a different set of random values from the probability functions."

I set up a Monte Carlo Simulation in R to test the scenario described in their chart. I won’t go into great detail, as this post isn’t really about the R code behind an MCS, but rather about sharing my results.

The Quick Explanation of How It Worked
So, imagine a set of numbers 1 to 260. Each number represents a working day.
We will have 2 sets of numbers. Let’s call them:

d = the set of days employee is drunk
t = the set of days employer tests

The simulation randomly takes 12 numbers, without replacement, out of the 260 working days (set d: the days the employee comes to work drunk).

The simulation randomly takes 52 numbers (simulating a weekly testing frequency), without replacement, out of 260 (set t: the days tested).

To “catch” a drunk employee after 1 year of testing, all the simulation needs to do is see if there’s one single match (when d = t) between my set of days the employee shows up drunk with the set of days the employer tests. If there’s a match (doesn’t matter if it’s the first number in the set of testing days or the last number in the set of testing days), we can say that the employee was caught within the year...which is the calculation the chart claims.
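For anyone who wants to try the steps above, here is a minimal sketch of that kind of simulation in base R. It is not my original script (that one is only shown as a screenshot below); the function name `simulate_caught`, its argument names, and the seed are all assumptions made up for this illustration.

```r
# Sketch only: simulate_caught and its arguments are illustrative names,
# not taken from the original script.
simulate_caught <- function(n_sims = 15000, working_days = 260,
                            drunk_days = 12, test_days = 52) {
  caught <- replicate(n_sims, {
    d <- sample(working_days, drunk_days)  # set d: days the employee is drunk
    t <- sample(working_days, test_days)   # set t: days the employer tests
    length(intersect(d, t)) > 0            # TRUE if at least one day is in both sets
  })
  mean(caught)  # fraction of simulated years with at least one match
}

set.seed(1)  # arbitrary seed, used only so the run can be repeated
estimate <- simulate_caught()
print(estimate)
```

`sample()` draws without replacement by default, which matches the description of both sets. The estimate will wobble a little from run to run, so treat any single number as approximate.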

The wonderful thing about Monte Carlo Simulations is we can run this simulation thousands (or millions) of times to get a more accurate percentage of how many times an employee is caught within a year of testing. And the results of testing weekly are….


MCS.JPG
TRUE means there was at least 1 match between the two sets, meaning the employee was "caught" within a year of testing. I ran the simulation 15,000 times resulting in over 93% chance of being caught! I must have done something wrong!

No, I do not believe the simulation is wrong. The numbers this company presented to us are misleading. If a company were to test weekly and an employee was under the influence on average 12 times in a year, that employee would have over a 93% chance of being caught within a year! Quite a difference compared to their 0.9% chance.

So Where Did They Get The 0.9% From?
Their probabilities are not necessarily made up. When calculating the probability that 2 independent events will both occur, you multiply those events' probabilities together. It appears they multiplied the probability of the employee being drunk on a given working day (12/260) by the probability of testing on a given working day (x/260, where x is the number of days tested in the year). This is purely my assumption as to how they calculated the probabilities in the chart. But I feel confident in my assumption, in that these values do match their table when rounded (with the exception of testing daily).


their probability.JPG
Note: This is my assumption of how they calculated the probabilities. I could not find their actual source/work.

These figures are correct if we want the probability that 1 randomly picked working day was both a day the employee is drunk AND a day the employer tested.

For example, looking at "Test Frequency - Weekly":

P(drunk) = 12/260
P(tested) = 52/260

P(drunk AND tested) = 624/67600
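The arithmetic above is easy to check in R. This is just a sketch of my assumed version of their calculation; the variable names are mine.

```r
# Assumed reconstruction of the chart's weekly figure: the chance that one
# randomly picked working day is both a drunk day and a test day.
working_days <- 260
p_drunk  <- 12 / working_days   # P(drunk) on a given day
p_tested <- 52 / working_days   # P(tested) on a given day

p_both <- p_drunk * p_tested    # 624/67600
print(p_both)
print(round(100 * p_both, 1))   # as a percent, rounds to 0.9
```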

BUT, that's not what they claim in the narrative. Their exact words are, “The probability of the employee being caught after one year of testing”. It is NOT the probability of randomly selecting 1 day out of the year that the employee was caught.

However, they conveniently did not make that mistake in the probability calculation when testing daily.
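As a cross-check on the simulation, the "caught at least once in a year" probability can also be worked out exactly. The sketch below uses base R's `choose()` rather than the factorial script at the end of this post, and the function name `prob_caught` is just an illustrative assumption. The idea: the employee escapes only if all the test days land on sober days, so P(caught) = 1 - choose(sober days, test days) / choose(working days, test days).

```r
# Sketch only: prob_caught is an illustrative name, not from the original work.
prob_caught <- function(working_days, drunk_days, test_days) {
  sober_days <- working_days - drunk_days
  # choose() returns 0 when test_days > sober_days, so the result is then 1
  1 - choose(sober_days, test_days) / choose(working_days, test_days)
}

weekly <- prob_caught(260, 12, 52)   # 52 test days
daily  <- prob_caught(260, 12, 260)  # a test on every working day
print(weekly)
print(daily)
```

With weekly testing this lands in the same "over 93%" range as the simulation, and with daily testing it is exactly 1.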

So now the question is, are they purposely misleading the consumer, did they just choose the wrong wording, or do they not understand the fundamental difference in what they are claiming/asking and what they are calculating?

That is for you to decide. But I will supply my evidence and work. And as I stated from the start…do not take my word for it, check my work or try it out on your own.

Thanks for reading!

Here's My Work

IMG_1500.JPG
IMG_1501.JPG
Calculating a formula for R

library(Rmpfr)

# readline() returns each answer as a character string
working = readline(prompt="Enter number of working days within a year: ")
drunk = readline(prompt="Enter number of days drunk at work within a year: ")
test = readline(prompt="Enter the number of days testing within a year: ")

# convert character into integer
w = as.integer(working)
d = as.integer(drunk)
t = as.integer(test)

#Testing with Given Values
#w = 260
#d = 12
#t = 52

sober = (w - d)   # days the employee is sober
y = (w - t)       # days the employer does not test
z = (sober - t)   # sober days left over if every test lands on a sober day

# n! is gamma(n + 1), not gamma(n); mpfr keeps the huge factorials from overflowing
fact = function(n) gamma(mpfr(n + 1, precBits = 128))

if (z < 0) {
  # more test days than sober days, so at least one test must hit a drunk day
  prob = 1
} else {
  # P(never caught) = (sober! * y!) / (w! * z!)
  prob = 1 - ((fact(sober)/fact(w))) * ((fact(y)/fact(z)))
}
prob_caught = as.numeric(prob)

print(paste("You have a", prob_caught, "chance of being caught."))

Code in R to Calculate Different Scenarios


Capture.JPG
My Calculations vs Their Calculations
graph.jpg


The world is too random for events to be predicted by statistics.

Whether it's Monte Carlo or anything that we humans come up with there is no way we can predict events in the future.

When the guy who wrote "The Signal and the noise" was completely wrong and so many others who try to predict the future with historical data are "fooled by randomness" (check this book too), we can stop trying to think that statistical significance can predict human behavior.

Whether they do it on purpose or not, just expect that those company reports have no idea what they're talking about.

I'll have to read those two. I agree with you 100%. Obviously the desire to know or predict the future could be beneficial in society on so many levels (and in some cases detrimental, depending on who would use it for selfish reasons and how that knowledge would be used).

You bring up a good point. But by just changing one word in how I view statistics, I believe, makes it fundamentally different to me. For me, statistics is more about calculating the risk or finding how likely something could occur, not how likely something will occur.


Good post

This post is very insightful. Thank you for sharing it with us! You did a fantastic job with exploring this topic and relaying important information to us so that we can understand how serious this issue is. I think it is important that we often take a step back and look at how we handle life and circumstances. I believe that this post will help many of us do just that. Thank you!

Very interesting.
Is there a chance that 0.9% is meant to be a 0.9 correlation between people being drunk at work, and being caught drunk?

I think they are pretty clear that they are presenting it as the probability of being caught and not the correlation. It appears they're trying to show the dramatic increase between testing weekly (52 times) vs daily (260 times), and why it would be cost effective for you (the employer) to test daily.

Gotcha. Great piece.

Many thanks for sharing this article!

Nice post.

Is anything ever the complete truth? Everything is manipulated by the person presenting the information. Just look at the different news channels. Do you hear what you want to hear? So statistics really matter? Just look at who the President is and how he got there.

Why did I feel so stupid reading this???? I need better math in my life!!!

@flashbid, don't feel stupid. We're all here to learn and share our passions with each other. So please ask any questions or for clarification...no matter how basic or "stupid" you think it is...no one will judge (P.S. if it makes you feel more comfortable, my wife stopped reading it when she didn't understand the whole "exclamation points" in my shown work).

Thank you for sharing this helpful information

Love how you showed your work with your math. You are right about social media and statistics. I hate statistics because they are never truly accurate. They are only accurate in the small population they were taken in, and not indicative of what is truly going on in the masses. Though I am too guilty of using stats when trying to prove a point, though I do try to find the most recent stats

Thanks tecnosgirl. Showing my work is just my "teacher side" at work...practice what you preach, right?

I always hated showing my work, because my teachers hated the shortcuts my uncle and dad taught me using 9s when I was small and had trouble with math. I would spend hours showing the work my teachers wanted, which wasn't the way I was getting the answers, and even if I had the right answer I would get marked partly wrong if I didn't do it the way they wanted. Though I think nowadays it would be different. I would like to think teaching has evolved since I was little.

I think nowadays it depends on the teacher. There are multiple ways to get a solution and different ways students learn and understand.

For example, there are 3 ways to solve a quadratic.

1). Graphing and finding the intersections
2). Quadratic formula
3). Factoring and solving

I was never concerned with which method a student chose. The process and understanding is more important than the solution in my opinion (at least when learning the content in school).

And personally for my classes, I LOVE when students have shortcuts. It shows that they're thinking and learning outside of the classroom. I'd make them share the shortcuts in class too because you never know if that one way will click better for another student (provided the shortcut is mathematically correct).

But the reason I wanted to see work was to GIVE credit, not take away credit. I'm assessing what they know, NOT what they don't know (well, to an extent...yes, I want to know what they don't know so that I can re-teach).

But what I mean is, if you just put down a solution, and nothing else, and the solution was wrong, I couldn't tell where the mistake was made or see the thought process.

Versus when they show their work, I could see "Ah ha...you just forgot a decimal", or "you had the correct answer but just plugged it in wrong on a calculator", or "you simply forgot a negative sign". To me, it shows you knew what you were doing, and I'm more interested in how the student worked out the problem, not what their final answer is.

I do see your point, and it is a little bit of a balancing act to allow students some freedom to work it their own way, while still in some way showing how they worked through it.

Does that make sense?

Makes complete sense. My dad used to tell me there are multiple ways for you to get to school, and they are all the correct way because you get there. The same goes for problems in math and in life: just because someone does it a different way doesn't mean it is wrong.

I hate misleading claims in advertising.

One of my pet hates is 'up to'. That covers a multitude of sins. Head & Shoulders dandruff shampoo says you'll be up to 100% flake free. So anything is acceptable, then?

Even worse is when they give you 'up to' a range, I remember an advert for driving instructors, I'm afraid I can't cite a source as I forget which school it was, but the claim was that you could earn up to £25,000 - £30,000. What the heck is that supposed to mean? I assume it's meant to give you the false impression that £25,000 is the minimum you'll earn, when actually no minimum is stated.

BT's was a great one when they launched Infinity, which claimed to be up to 5 times faster than the UK's average standard broadband. For a start it's the dreaded 'up to', which means nothing. Then I would like clarification of what the difference is between '5 times faster' and '5 times as fast'. Then who actually knows what the UK's average speed is?

Anyway, suffice it to say I agree with you.

There's a book I was reading a year ago called "The Grapes of Math". I wasn't very far into it, about 2 chapters in. But there was a part about how companies use numbers and words to invoke a certain positive psychological feeling. For example, Levi's 501 jeans (why is it 501 and not just 500?), Heinz 57, 7up, etc. There's a method to it from a marketing standpoint, to make their product feel superior to the consumer.

When you mentioned the 'up to' range that the adverts will claim, it made me think of that book. I've put it to the side for a while now, but I'll have to pick it back up.

Right, not everything we read online is true. Misleading posts are abundant on the web.