Lies and Statistics: Statistics for Dummies

Not to imply that you are a dummy. But it occurs to me that rather than explain concepts over and over again in this blog I would like to have a quick reference for these definitions. So this post will be my brief tutorial of statistical terms and concepts so that you, dear reader, can navigate the labyrinth that is statistics with less difficulty. If the word probability puts you in a tizzy of confusion then you probably want to read this before proceeding.

Experiment- This is a study done in a controlled environment designed to minimize biases and lurking variables. An experiment takes a group, gives them a treatment (like a drug, or watching a video, or something else) and then examines how the treatment affects them.

Observational Study- An observational study merely gathers data on the world as it is. It observes the state of the world and reports on it, but does not act on it. Observational studies cannot prove causal effects because they are affected by uncontrollable factors. A survey is a type of observational study.

Sample- A sample is a smaller group picked out of a larger population, ideally at random. Assume that there are 1000 brunettes in a room who are all over 5'6" tall and you want to do an observational study of which eye color occurs most frequently in these tall brunettes. Rather than asking all 1000 brunettes, you could randomly select 100 of them, and as long as the selection is random you should get an approximation of the eye colors of the whole population (the 1000 brunettes in a room; I hope it's a big room). If these 100 people are not selected randomly, this can lead to Sampling Bias, which can cause the results of a survey to be wrong.
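For the curious, here is a minimal sketch of what randomly selecting 100 of the 1000 brunettes could look like in Python. The 510/490 split of eye colors in the room is invented purely for illustration, and the function name sample_eye_colors is my own.

```python
import random

def sample_eye_colors(population, sample_size, seed=None):
    """Draw a simple random sample (without replacement) and return
    the share of each eye color found in that sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    counts = {}
    for color in sample:
        counts[color] = counts.get(color, 0) + 1
    return {color: n / sample_size for color, n in counts.items()}

# Hypothetical room of 1000 tall brunettes (made-up numbers).
population = ["blue"] * 510 + ["brown"] * 490

# Ask only 100 of them, chosen at random.
estimate = sample_eye_colors(population, 100, seed=1)
print(estimate)
```

Run it with a few different seeds and the shares bounce around the true 51/49 split, which is exactly why a sample only gives an approximation.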

Margin of Error- Every survey and experiment that works from a sample has a margin of error, which is a number based on the sample size and methods. This number gives a range around the result. So let's say you found that 51% of brunettes had blue eyes and 49% had brown eyes. If your margin of error is 2.5 percentage points (which is pretty normal for a large poll), that means the actual percentage of brunettes with blue eyes is 51% plus or minus the margin of error. That is, it could be anywhere between 48.5% and 53.5%. (Since that range includes 50%, your results are inconclusive, so don't go running around shouting about how all brunettes have blue eyes, will ya?)
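If you want to see where a margin of error comes from, here is a rough sketch using the common textbook approximation for a proportion: z times the square root of p(1 - p)/n. It assumes a simple random sample from a large population and a 95% confidence level (z = 1.96). Those are my assumptions, not something every survey uses.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p
    measured on n people (z=1.96 corresponds to 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.51  # 51% of the sample had blue eyes
for n in (100, 1500):
    moe = margin_of_error(p, n)
    print(f"n={n}: {p - moe:.3f} to {p + moe:.3f}")
```

Under these assumptions a sample of only 100 people gives a margin of nearly 10 points, and it takes something like 1,500 people to get down to about 2.5. Bigger samples mean tighter ranges.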

Significance- For a result to be statistically significant means that the probability of getting a result at least this extreme by chance alone is very low (usually below .05). So let's say that your survey of brunettes yielded 25% for blue and 75% for brown, but that most other surveys of this nature have yielded the 50-50 results above. You can calculate the probability that you would get results this different from the normal results purely by chance. If this probability is something like .01, the results are statistically significant. So there is something about your population of 1000 brunettes that makes them special. It must be because they are all over 5'6" tall!
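Here is one way that probability could be calculated, as a sketch rather than the one true method. It assumes a sample of 100 brunettes, assumes the "normal" result is an even 50-50 split, and adds up the exact binomial chances of landing at least as far from 50 as we did. The function name two_sided_p_value is made up for this example.

```python
from math import comb

def two_sided_p_value(successes, n, p_null=0.5):
    """Probability of a count at least as far from the expected count
    as the one observed, if the true share really were p_null."""
    expected = n * p_null
    observed_gap = abs(successes - expected)
    total = 0.0
    for k in range(n + 1):
        if abs(k - expected) >= observed_gap:
            total += comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
    return total

# 25 blue-eyed brunettes out of a sample of 100 (assumed sample size).
p_value = two_sided_p_value(25, 100)
print(p_value < 0.05)
```

With these numbers the probability comes out far below .05, so a 25/75 split would count as statistically significant. A 50/50 sample, on the other hand, gives a probability of 1, since every possible result is at least that far from the expected count.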

Correlation- A correlation is a measure of how linearly related two variables are. Take weight and height: as height increases, weight tends to increase as well, and therefore they are positively correlated. HOWEVER this does not mean that one causes the other. Correlation does not imply causation, EVER!
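The usual number for this is the Pearson correlation coefficient, which runs from -1 (perfect downhill line) through 0 (no linear relationship) to 1 (perfect uphill line). Below is a small sketch that computes it by hand. The heights and weights are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up heights (inches) and weights (pounds) for six people.
heights = [60, 62, 65, 68, 70, 72]
weights = [115, 120, 140, 155, 165, 180]

r = pearson_r(heights, weights)
print(round(r, 2))
```

A value close to 1 here says the points sit near an upward-sloping line. It says nothing about whether growing taller makes you heavier or the other way around.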

Lurking Variable- This is a variable that affects the results of a study but may not have been taken into consideration by the experimenters or surveyors. In our study of our 1000 brunettes, for example, we assumed that the results we got were different from the average population of brunettes because of their height. But we failed to take into account that this room happens to be in an Italian part of town, which means that most of the brunettes in the room have Italian ancestry and therefore brown eyes. So while we may have attributed their brown eyes to their height, there was a lurking variable: Italian ancestry. This means our results are not valid for the entire population of brunettes, but only for the brunettes in our big room.
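To make the lurking variable visible, here is a toy sketch with completely invented counts. It compares the share of brown eyes among tall and not-so-tall brunettes, first overall and then separately within each ancestry group. The names GROUPS and brown_share are just for this illustration.

```python
# Each row: (tall, italian_ancestry, brown_eyed_count, blue_eyed_count).
# All counts are invented for illustration.
GROUPS = [
    (True, True, 72, 8),
    (True, False, 10, 10),
    (False, True, 18, 2),
    (False, False, 40, 40),
]

def brown_share(groups, tall=None, italian=None):
    """Share of brown-eyed people, optionally filtered by height
    and/or ancestry (None means 'do not filter on this')."""
    brown = total = 0
    for g_tall, g_italian, n_brown, n_blue in groups:
        if tall is not None and g_tall != tall:
            continue
        if italian is not None and g_italian != italian:
            continue
        brown += n_brown
        total += n_brown + n_blue
    return brown / total

# Ignoring ancestry, height looks like it matters.
print(brown_share(GROUPS, tall=True), brown_share(GROUPS, tall=False))

# Within each ancestry group, height makes no difference at all.
print(brown_share(GROUPS, tall=True, italian=True),
      brown_share(GROUPS, tall=False, italian=True))
print(brown_share(GROUPS, tall=True, italian=False),
      brown_share(GROUPS, tall=False, italian=False))
```

In this made-up data the tall group only looks browner-eyed because it contains more people with Italian ancestry. Once you compare like with like, the height effect vanishes.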

That is all I've got for now. I will add terms to this post as I get tired of defining them.

Monday, February 07, 2005
