A scatterplot is a tool for visualizing the association between two numeric variables. In a scatterplot, the predictor, or independent variable, is plotted on the horizontal axis, and the outcome, or dependent variable, is plotted on the vertical axis. The outcome variable is what we want to explain or predict from the value of the predictor variable. Three example scatterplots are shown below:
Describing Direction
By looking at a scatterplot, we can quickly determine the direction of the relationship between the two variables. If increases in the predictor variable tend to be associated with increases in the outcome variable (the points flow from the lower left to the upper right, like in the first plot), then the two variables are positively related. If increases in the predictor variable tend to be associated with decreases in the outcome variable (like in the third plot), then the variables are negatively related. The middle plot has no clear direction, so there is likely no actual relationship between the length of someone’s first name and how many hours they slept last night.
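This minimal Python sketch shows how the direction you see by eye can also be checked numerically. Python is our choice here; the examples on this page use Excel and R. The sketch assumes the data are two equal-length lists of numbers. It classifies direction by the sign of the sum of cross-products about the means, which always has the same sign as the slope of the least-squares trendline. The helper name `direction` and the sample values are hypothetical, made up for illustration.

```python
from statistics import mean


def direction(x, y):
    """Hypothetical helper: classify direction as 'positive', 'negative',
    or 'none' from the sign of the sum of cross-products about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two points")
    mx, my = mean(x), mean(y)
    # A product is positive when a point is above average (or below average)
    # on both variables, which is the lower-left to upper-right pattern.
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy > 0:
        return "positive"
    if sxy < 0:
        return "negative"
    return "none"


# Made-up data: predictor x, outcome y.
print(direction([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]))  # positive
print(direction([1, 2, 3, 4, 5], [6, 4, 5, 4, 2]))  # negative
```

The sign alone says nothing about how strong the trend is. A patternless cloud like the middle plot will usually still give a small nonzero sum and be labeled "positive" or "negative." Read direction together with strength.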
Describing Strength and Linearity
A scatterplot also shows whether two variables have a linear relationship and, if so, how strong it is: the more tightly the points cluster around the trendline, the stronger the linear relationship. Of the three plots above, the first shows the strongest linear relationship. The second plot shows no clear relationship, while the third shows a weaker linear relationship than the first (and in the opposite direction). Evidence of a linear relationship is important to confirm before conducting analyses such as correlation and regression.
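How tightly points cluster around a straight line is commonly summarized by Pearson's correlation coefficient, r = Sxy / sqrt(Sxx * Syy). Here Sxy is the sum of cross-products about the means, and Sxx and Syy are the sums of squared deviations. The value of r ranges from -1 to 1, and values nearer either end indicate a tighter linear pattern. The minimal sketch below computes r directly. The function name `pearson_r` and the sample data are illustrative assumptions, not the tools used in this page's Excel or R examples.

```python
import math


def pearson_r(x, y):
    """Illustrative Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least two points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # spread of x
    syy = sum((b - my) ** 2 for b in y)                   # spread of y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))           # 1.0 (exactly on a rising line)
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 2))  # 0.85 (rising, with scatter)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))          # 0.0 (curved, not linear)
```

The last call shows why linearity should be checked by eye first. Those points lie exactly on a U-shaped curve, so the variables are perfectly related. Yet r is 0, because r measures only linear association.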
Outliers
Finally, a scatterplot can reveal outliers: points that do not follow the same general trend as the rest of the data. In the first plot above, the point at (26, 46) appears to be an outlier because it sits visibly apart from the main swath of data. An outlier may have an outcome value very different from those of other points with similar predictor values, or it may lie much farther out along the horizontal axis than the rest of the data. There are no hard and fast rules for identifying outliers, so the key is to identify points that could substantially alter your results.
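Since there are no hard and fast rules, any numeric screen is only a rough aid to looking at the plot. This minimal Python sketch mirrors the two kinds of outliers described above, and its thresholds are hypothetical choices, not a standard:

- A point is flagged as an outcome outlier if its residual from the least-squares line exceeds k residual standard errors.
- A point is flagged as a predictor outlier if its x value lies more than k standard deviations from the mean of x.

The function name `flag_outliers`, the default k = 2, and the data are all made up for illustration.

```python
from statistics import mean, stdev


def flag_outliers(x, y, k=2.0):
    """Hypothetical screen: return sorted indices of points whose residual
    exceeds k residual standard errors, or whose x value lies more than k
    standard deviations from mean(x). Thresholds are illustrative only."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    # Least-squares trendline.
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Residual standard error: n - 2 degrees of freedom for two fitted parameters.
    se = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    sx = stdev(x)
    flagged = set()
    for i, (a, r) in enumerate(zip(x, residuals)):
        if se > 0 and abs(r) > k * se:   # unusual outcome for its predictor value
            flagged.add(i)
        if abs(a - mx) > k * sx:         # far out on the horizontal axis
            flagged.add(i)
    return sorted(flagged)


# Made-up data: the fifth point (index 4) has an unusually high outcome.
print(flag_outliers(list(range(1, 11)), [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]))  # [4]
# Made-up data: the last point (index 9) lies far out on the horizontal axis.
print(flag_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], [3, 3, 7, 7, 11, 11, 15, 15, 19, 59]))  # [9]
```

The second example shows why both checks are included. The distant point pulls the trendline toward itself, so its residual is tiny and only the horizontal-axis check catches it. In small samples an extreme point also inflates the residual standard error, so this screen can miss outliers that are obvious on the plot.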
Example 1: Creating scatterplots in Excel 2016 on
In this example, you will learn how to make a scatterplot of respondents' ratings of how happy and how funny they are.
Dataset used in video
PDF directions corresponding to video
Sample conclusion:
The scatterplot of the relationship between how funny male students are rated and how happy they are shows a moderately strong, positive linear relationship between the variables.
Example 2: Creating scatterplots in R
This scatterplot displays the relationship between BMI and blood pressure.
Dataset used in videos
R script file used in video