A common assumption of inferential tests is that the observations in your sample are independent of each other, meaning that the measurements for each subject are in no way influenced by or related to the measurements of other subjects.
Below are a few examples of violations of this assumption, and suggestions on how to address them:
1. You want to test if training students on a new study technique improves their test performance, so you randomly assign 10 classes at a high school to either receive the training or be in a control group. You then measure the student scores on a test at the end of the semester. In this scenario, the measurements of students within the same class are related to each other because they have the same teacher and other classroom-level characteristics in common.
Possible solutions: You could aggregate the test scores by classroom, creating a single average score for each class and comparing those that received the training to the control group. Another option would be to run a more advanced statistical analysis, such as a mixed model or multi-level model, which can account for class-level variation.
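The aggregation option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, class labels, and the helper name `class_means` are hypothetical, and the scores are made up. The point is that after aggregation each class contributes exactly one value, so the unit of analysis becomes the class rather than the student.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (class_id, group, score). All values are invented
# for illustration; they do not come from a real study.
records = [
    ("A", "training", 82), ("A", "training", 78), ("A", "training", 80),
    ("B", "training", 90), ("B", "training", 86),
    ("C", "control", 70), ("C", "control", 74),
    ("D", "control", 81), ("D", "control", 79), ("D", "control", 80),
]

def class_means(records):
    """Collapse student-level scores to one mean score per class."""
    scores = defaultdict(list)
    group_of = {}
    for class_id, group, score in records:
        scores[class_id].append(score)
        group_of[class_id] = group  # each class belongs to exactly one group
    return {c: (group_of[c], mean(s)) for c, s in scores.items()}

per_class = class_means(records)

# Gather the class-level means by group; these are the values to compare.
by_group = defaultdict(list)
for group, class_mean in per_class.values():
    by_group[group].append(class_mean)

for group, means in by_group.items():
    print(group, means)
```

Any comparison between the training and control groups would then be run on these class-level means. The trade-off is a much smaller sample (one value per class), which is one reason a mixed or multi-level model is sometimes preferred.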
2. For a group of your friends, you want to know if height is related to arm span. However, two of your friends are identical twins. Because one twin’s measurements will be nearly the same as the other’s, these two sample records are not independent.
Possible solution: Randomly select one twin to keep in your sample, and do not measure the other twin.
3. You launched an online survey and, to increase participation, promised respondents a gift card if they provided their email address. After looking at your data, you notice that several participants filled out the survey multiple times (probably hoping to get multiple gift cards), which means their survey responses are repeated and therefore not independent.
Possible solution: As long as you can identify the duplicate records, you can randomly select one to keep in your sample and remove the rest.
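As a rough sketch of that deduplication step, the Python snippet below groups responses by email address and randomly keeps one per respondent. The field names, the sample rows, and the function name `keep_one_per_respondent` are hypothetical, and treating addresses that differ only in letter case or surrounding spaces as the same person is an assumption you may not want to make for your own data.

```python
import random
from collections import defaultdict

def keep_one_per_respondent(responses, key="email", seed=0):
    """Randomly keep one response per respondent and drop the rest."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_respondent = defaultdict(list)
    for row in responses:
        # Assumption: case and surrounding whitespace do not distinguish people.
        respondent_id = row[key].strip().lower()
        by_respondent[respondent_id].append(row)
    return [rng.choice(rows) for rows in by_respondent.values()]

# Hypothetical survey rows, invented for illustration.
responses = [
    {"email": "ana@example.com", "answer": 4},
    {"email": "Ana@example.com ", "answer": 5},
    {"email": "ben@example.com", "answer": 3},
    {"email": "ana@example.com", "answer": 2},
]

deduped = keep_one_per_respondent(responses)
print(len(deduped))  # one row per distinct respondent
```

Choosing the kept record at random, rather than always keeping the first or last submission, avoids systematically favoring early or late responses.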