Thursday, January 16, 2014

Picking Nits

There's a very useful page for journalists that explains various terms found in academic studies, especially statistical or research methodology terms. That said, it's time for some nitpicking. Lemme be clear: I point my students to this page, so I'm just being persnickety. My criticisms below are too long, and I'd tighten them up if I were rewriting the site. Most of what's there is written with clarity and explains nicely, if simply, what these terms mean so a journalist can translate murky, dense academic research. So, to pick a few nits in no particular order:
  • "There are two primary types of population samples: random and stratified." Well, yes, to a degree. There are also convenience samples and snowball samples, the kinds a journalist really wants to pay attention to because they suggest a weak study. Really, sampling comes down to randomness: does everyone theoretically have a chance to be included in the study or survey? If not, that raises serious questions.
  • Margin of error is not well explained, but perhaps it's well enough explained for most journalists. It also gets mixed up with the confidence level but, and this is important, the confidence level does not change as sample size increases. You set your level in advance, typically 95 percent (the flip side of the p<.05 threshold, if you consider inferential tests). In other words, a survey result of 44 percent with a 3 percent margin of error means the real number, if we could question everyone, would likely fall between 41 and 47 percent. The "likely" is the confidence level: 95 times out of 100, we'd expect an interval built this way to capture the real number. If you choose, you can tighten this to .01 (a 99 percent confidence level), but that widens your margin of error. A bigger sample is what shrinks it.
  • The cause and effect section needs to say that basically three things must be present to truly infer cause: time order (one precedes the other), covariation (as one changes, the other changes) and no third variables, meaning nothing else better explains the relationship. In fairness, this last point comes up nicely just after, as confounding variables.
  • The difference between mean and median and their journalistic usefulness is touched on well. I'd point out that the mean is sensitive to outliers and use a better example. So, if you're looking at the salaries in a department (or home values in a neighborhood, etc.) and there's one really high salary and the rest are middling, the mean will be pulled upward by that single high salary. A median helps correct for that, which is why we tend to use the median for salaries, home values and similar skewed distributions. I preach this to my j-students again and again: it's the median, stupid.
  • Most research is NOT about the relationship between two variables. Often it includes a set of independent variables thought to predict a single dependent variable. Nerdy, I know. Regression is tricky to explain, but basically it estimates how well each of several variables predicts a single variable while statistically controlling for the others. Take income as the thing you're trying to predict. Education is a good independent variable, and so is age. But if you put both in a regression model, it's likely that education accounts for the variance age seemed to explain, and age is no longer a significant predictor. Yes, age and income correlate, but in a regression, if you control for education, age may no longer be a factor. I'd tie correlation and regression together with a single example and show how the correlation can disappear in a regression, which also ties into cause and effect.
  • As some of this has to do with covering polls and surveys, I'd add more on that or link to a couple of well-known sites designed to guide journalists on how to evaluate a poll. A good one is here: 20 questions journalists should ask about polls. Highly recommended.
    The page I'm quoting: http://journalistsresource.org/skills/research/statistics-for-journalists
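
To make the margin of error point concrete, here's a rough sketch in Python. It uses the textbook normal approximation for a proportion from a simple random sample, which is my assumption and not anything the page itself shows. The function name, the 1,000-person sample and the 44 percent result are all made up for illustration.

```python
import math

# Standard normal z-scores for common confidence levels.
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def margin_of_error(p, n, confidence=0.95):
    """Approximate margin of error for a proportion p from a simple
    random sample of n people (normal approximation, an assumption)."""
    z = Z_SCORES[confidence]
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 44 percent support among 1,000 respondents.
p, n = 0.44, 1000

# Same sample, two confidence levels: the level is chosen in advance,
# and a stricter level gives a wider interval.
for level in (0.95, 0.99):
    moe = margin_of_error(p, n, level)
    print(f"{level:.0%} confidence: {p - moe:.1%} to {p + moe:.1%}")

# Same confidence level, bigger sample: the margin shrinks,
# but the confidence level itself stays at 95 percent.
for size in (1000, 4000):
    print(f"n={size}: margin of {margin_of_error(p, size):.1%}")
```

Run it and the 99 percent interval comes out wider than the 95 percent one, while quadrupling the sample cuts the margin in half. The confidence level is something you choose; the margin of error is what follows from that choice and your sample size.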

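Here's one way to sketch the income, education and age example, again in Python. A caveat up front: this isn't a full regression. It's a partial correlation, a close cousin, and with only two predictors the regression coefficient for age is zero exactly when the partial correlation is zero. The eight people below are invented toy numbers, rigged so the point is easy to see, and the function names are mine.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def partial_corr(x, y, control):
    """Correlation between x and y after removing what each shares
    with the control variable (first-order partial correlation)."""
    r_xy = pearson(x, y)
    r_xc = pearson(x, control)
    r_yc = pearson(y, control)
    return (r_xy - r_xc * r_yc) / math.sqrt((1 - r_xc ** 2) * (1 - r_yc ** 2))


# Invented toy data: older people here tend to have more schooling,
# but within the same schooling level the older person earns no more.
education = [12, 12, 14, 14, 16, 16, 18, 18]   # years of schooling
age = [30, 34, 36, 40, 42, 46, 48, 52]         # years
income = [62, 62, 68, 68, 78, 78, 92, 92]      # thousands of dollars

raw = pearson(age, income)
controlled = partial_corr(age, income, education)

print(f"age vs. income, raw correlation: {raw:.2f}")
print(f"age vs. income, controlling for education: {abs(controlled):.2f}")
```

In this toy data, age and income correlate strongly on their own, and the relationship all but vanishes once education is held constant. That's the pattern a journalist should have in mind when a study reports that a variable "drops out" of a regression.
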