Assessing Normality

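A common assumption in many inferential statistical methods for numeric variables (including t-tests, ANOVA, and linear regression) is that the data are normally distributed. For linear regression, the assumption applies to the residuals (errors) rather than to the raw outcome values.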

How can I check if my variable is normally distributed?

Typically, a visual check is sufficient for determining normality. You can do this by making a histogram of your variable and looking for asymmetry (skewness) or outlying values. If you are comparing multiple groups for a numeric outcome variable (two-sample independent t-test or ANOVA), be sure to look at the distribution of the outcome variable for each group separately.
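As a rough illustration of the per-group check described above, the following sketch prints a text histogram and a moment-based skewness value for each group. The group names, the simulated data, and the helper functions text_histogram and skewness are all hypothetical and exist only for this example. In practice you would use your own data and most likely a plotting library.

```python
import random
from collections import Counter

def text_histogram(values, bin_width):
    """Return one line per bin: the left bin edge, then one '*' per observation."""
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        lines.append(f"{b * bin_width:6.1f} | " + "*" * counts.get(b, 0))
    return lines

def skewness(values):
    """Moment-based sample skewness: near 0 if symmetric, above 0 if right-skewed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated data (an assumption for illustration): one symmetric group, one right-skewed group.
random.seed(0)
groups = {
    "control": [random.gauss(50, 10) for _ in range(200)],
    "treatment": [random.expovariate(1 / 20) for _ in range(200)],
}

# Inspect each group separately rather than pooling them.
for name, values in groups.items():
    print(f"{name}: skewness = {skewness(values):.2f}")
    print("\n".join(text_histogram(values, bin_width=10)))
```

A long tail of bins on one side, or a skewness value far from zero, suggests the group departs from normality.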

What if my variable isn’t normally distributed?

If your data do not appear to be normally distributed, that doesn’t mean you can’t run your analysis. First, you can try transforming your variable (see below). If that doesn’t work, there are often alternatives called non-parametric tests (such as the sign test and the Mann-Whitney U test). These tests do not assume a normal distribution, but they generally have less power than their parametric equivalents when the parametric assumptions are met. For that reason, non-parametric tests are typically used only as a last resort.
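To make the idea of a non-parametric test concrete, here is a minimal sketch of a two-sided exact sign test, which uses only the signs of the values relative to a hypothesized median. The function name sign_test and the paired differences are made up for this example. A statistics package would normally be used instead.

```python
from math import comb

def sign_test(values, hypothesized_median=0.0):
    """Two-sided exact sign test; returns (n_above, n_below, p_value)."""
    above = sum(v > hypothesized_median for v in values)
    below = sum(v < hypothesized_median for v in values)
    n = above + below  # values equal to the hypothesized median are dropped
    if n == 0:
        return above, below, 1.0
    k = min(above, below)
    # Under the null hypothesis, each sign is a fair coin flip: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)

# Hypothetical paired differences (after minus before); not real data.
differences = [1.2, 0.8, 2.5, -0.3, 1.9, 0.4, 3.1, 0.7, 1.1, 2.2]
above, below, p = sign_test(differences)
print(above, below, round(p, 4))
```

Because only the signs are used, the size of each difference is ignored, which is one reason such tests tend to have less power than a t-test when the data really are normal.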

Another thing to keep in mind is that if you have a very large sample size, many parametric tests (t-tests, ANOVA, linear regression) are robust to violations of normality, because the sampling distribution of a mean becomes closer to normal as the sample size grows. Unfortunately, there is no consensus on how big a sample must be for this, as it depends on the severity of the skewness and other factors.
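The effect of sample size can be explored with a small simulation. The sketch below repeatedly draws samples from a right-skewed (exponential) distribution and measures how skewed the resulting sample means are. The distribution, the sample sizes, and the number of repetitions are arbitrary choices for illustration, not recommendations.

```python
import random

def skewness(values):
    """Moment-based sample skewness: near 0 if symmetric, above 0 if right-skewed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skew_of_sample_means(n, reps=2000):
    """Draw `reps` samples of size n from an exponential distribution
    and return the skewness of their means."""
    means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(means)

random.seed(1)
results = {n: skew_of_sample_means(n) for n in (2, 10, 100)}
for n, s in results.items():
    print(f"n = {n:3d}: skewness of sample means = {s:.2f}")
```

The skewness of the sample means should shrink toward zero as n increases, even though the individual observations stay strongly skewed.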

How do I apply a transformation?

A transformation is when you apply a function to all the values of a variable. There are many different types of transformations that can improve normality, and which one you use depends on the shape of the distribution you have. The most common transformations are the log and square-root functions, which are used for right-skewed data. Note that the log is defined only for positive values and the square root only for non-negative values.

After applying a transformation, make a histogram of the transformed variable. If the transformed variable looks normally distributed, use it in place of the original variable in your analysis. Remember to interpret the results of your analysis in terms of the transformed variable and its transformed units.
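As a minimal sketch of this workflow, the example below applies square-root and log transformations to simulated right-skewed data and compares the skewness before and after. The data are generated from a log-normal distribution purely for illustration, and the skewness helper is a hypothetical stand-in for inspecting a histogram.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: near 0 if symmetric, above 0 if right-skewed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Simulated, strictly positive, right-skewed data (an assumption for this example).
random.seed(2)
raw = [random.lognormvariate(3.0, 0.8) for _ in range(500)]

transformed = {
    "raw": raw,
    "sqrt": [math.sqrt(v) for v in raw],
    "log": [math.log(v) for v in raw],  # requires every value to be > 0
}
skews = {name: skewness(vals) for name, vals in transformed.items()}
for name, s in skews.items():
    print(f"{name:>4}: skewness = {s:.2f}")

# Results on the log scale are in log units. Exponentiating the mean of the
# logs gives the geometric mean of the original values, not the arithmetic mean.
log_mean = sum(transformed["log"]) / len(raw)
print(f"mean of log values = {log_mean:.2f}, back-transformed = {math.exp(log_mean):.2f}")
```

Here the log transformation should bring the skewness closest to zero, which is expected because the data were simulated as log-normal. With real data, the better choice depends on the shape of the distribution.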

What if my sample size is too small to assess normality?

Very small sample sizes (typically n < 15) make it hard to even assess normality, since a histogram of a handful of cases won’t tell you much about the underlying population distribution. In these cases, a non-parametric test is usually the safest option.