TEACHER: So let's talk about correlation. Here is a scatter plot. It may look like a very organized view of the night sky, but it's actually a treasure trove of information. With a graph like this, you can figure out the correlation between two variables, be it positive, negative, or none, and you can make predictions from the data.

First, if your graph looks like this, then you have a positive correlation: as x increases, y also increases. The regression line, which summarizes the relationship between x and y, looks like this. From the data in this graph, we can predict that the further along a student is in the term, the more stressed they are. If your graph looks like this, then you have a negative correlation: as x increases, y decreases. The regression line looks like this. From this graph, we might predict that someone who spent several hours on social media would get no work done, unless they work in public relations, of course.

On that note, both of these made-up graphs have their limitations, since they only apply to specific groups. The social media graph doesn't apply to people who work in public relations, and the stress graph would look quite different if we included all the people who dropped out halfway through the term. So if you use a correlational study to predict something, you should be aware of where the data was sampled from and of the subjects' gender, class, race, ethnicity, education, and so on before you make any sweeping generalizations.

There are a few more graph shapes you should be aware of. A graph like this shows no correlation: there is no clear positive or negative regression line. In this case, there is no correlation between a dog's size and whether it's a good dog, so we can't use dog size to predict dog goodness. Note that a graph like this also shows no correlation. As you can see, regardless of size, dogs are always very cute.
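To make the three shapes concrete, here is a minimal Python sketch that computes a correlation coefficient and a regression line for small datasets. The lecture doesn't name a specific statistic, so using Pearson's r and an ordinary least-squares line is an assumption (they are the usual choices for straight-line relationships). All variable names and data values are hypothetical, made up to mimic the stress, social media, and dog goodness graphs.

```python
# Hypothetical, made-up data for the three graph shapes; not real measurements.
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation: near +1 positive, near -1 negative, near 0 none."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def regression_line(x, y):
    """Least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = cov / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Positive: weeks into the term vs. stress level.
weeks = [1, 2, 3, 4, 5]
stress = [2, 3, 5, 6, 8]

# Negative: hours on social media vs. tasks finished.
hours = [0, 1, 2, 3, 4, 5]
tasks = [9, 8, 6, 5, 2, 1]

# None: dog size vs. a "goodness" score.
dog_size = [1, 2, 3, 4, 5]
goodness = [2, 5, 3, 1, 4]

datasets = [
    ("positive", weeks, stress),
    ("negative", hours, tasks),
    ("none", dog_size, goodness),
]
for name, x, y in datasets:
    r = pearson_r(x, y)
    slope, intercept = regression_line(x, y)
    print(f"{name:8s} r = {r:+.2f}  line: y = {slope:.2f}x + {intercept:.2f}")
```

A positive r goes with an upward-sloping line, a negative r with a downward one, and an r near 0 with a flat line that gives no useful prediction. Pearson's r only measures straight-line relationships, so it is worth looking at the scatter plot itself as well as the number.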
Since dog size has no correlation with how cute dogs are, we can't use dog size to predict dog cuteness.

But be careful. Sometimes a graph can look like it shows a correlation when it doesn't. This is called an artificial correlation. For example, this graph seems to show a negative correlation between a dog's size and its fluffiness, so we might predict that the larger the dog, the less fluffy it is. But it turns out that there are actually two different subgroups in this graph: puppies and adults. Within each subgroup, there is no correlation between our x and y variables. So the correlation in this graph is artificial: it comes from lumping the two groups together. Dogs are fluffy at all sizes, but puppies, which are also smaller, are usually fluffier than adults. So again, to avoid mistakes, make sure you know exactly what a graph shows before you make judgments about its data.

Finally, correlation does not imply causation. Let's say we have a graph like this, which shows a positive correlation between the number of ice cream trucks and the incidence of drowning. Since more people drown as the number of ice cream trucks increases, we might be tempted to conclude that in this unfortunate town, ice cream trucks are pushing people into the water. Or we might claim that when someone drowns, it summons an ice cream truck. But the most reasonable explanation is that a third variable, such as temperature, causes both of these things to increase: as the weather gets warmer, more people swim, and more people make a profit off of sweaty children. To take a more serious example, if you found a correlation between low self-esteem and depression, this could mean that low self-esteem causes depression. Or it could mean that depression causes low self-esteem.
Or it could mean that some third factor, such as a biological predisposition, causes both low self-esteem and depression, and that although the two variables are correlated, neither one causes the other. Just by looking at a graph, you can't tell which of these explanations is right. So again, correlation does not imply causation.
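The puppy-and-adult example above can be checked numerically by computing the correlation inside each subgroup and then for the pooled data. This is a minimal Python sketch, assuming Pearson's r as the correlation measure; the sizes and fluffiness scores are hypothetical numbers chosen so that each subgroup has no correlation on its own.

```python
# Hypothetical size and fluffiness scores for puppies and adult dogs (made up).
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Within each age group, size tells you nothing about fluffiness (r = 0)...
puppy_size, puppy_fluff = [1, 2, 3, 4, 5], [7, 10, 8, 6, 9]
adult_size, adult_fluff = [11, 12, 13, 14, 15], [2, 5, 3, 1, 4]

# ...but pooling the groups produces a clearly negative correlation,
# because age (the hidden third variable) drives both size and fluffiness.
all_size = puppy_size + adult_size
all_fluff = puppy_fluff + adult_fluff

print(f"puppies only: r = {pearson_r(puppy_size, puppy_fluff):+.2f}")
print(f"adults only:  r = {pearson_r(adult_size, adult_fluff):+.2f}")
print(f"pooled:       r = {pearson_r(all_size, all_fluff):+.2f}")
```

Splitting the data by a suspected third variable, as done here with age, is one way to see whether a correlation survives within groups. It can only test variables you thought to record, though, so on its own it still doesn't establish what causes what.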