We follow the empirical cycle to come up with hypotheses and to test and evaluate them against observations. But once the results are in, a confirmation doesn't mean a hypothesis has been proven. And a disconfirmation does not automatically mean we reject it. So how do we decide whether we find a study convincing?

Well, there are two main criteria for evaluation: reliability and validity. Reliability is very closely related to replicability. A study is replicable if independent researchers are, in principle, able to repeat it. A research finding is reliable if we actually repeat the study and then find consistent results.
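To make the idea of consistent results concrete, here is a minimal sketch in Python. It simulates repeating a before-and-after study several times and checks whether every repetition points in the same direction. Everything in it is an assumption for illustration only: the function name `run_study`, the score scale, the size of the drop, and the noise range are all made up and do not come from any real data.

```python
import random


def run_study(seed, n_residents=30, true_drop=3.0):
    """Simulate one run of a hypothetical study; return the mean change in score.

    All numbers are illustrative assumptions: a baseline score of 10,
    a true drop of 3 points, and noise of at most 1 point per measurement.
    """
    rng = random.Random(seed)  # a separate seed stands in for a separate replication
    changes = []
    for _ in range(n_residents):
        before = 10.0 + rng.uniform(-1.0, 1.0)
        after = 10.0 - true_drop + rng.uniform(-1.0, 1.0)
        changes.append(after - before)
    return sum(changes) / len(changes)


# Repeat the study five times and check that every run shows a decrease.
replications = [run_study(seed) for seed in range(5)]
consistent = all(change < 0 for change in replications)
print(replications, consistent)
```

Because the assumed drop is larger than the assumed noise, every simulated replication shows a decrease, which is what a reliable finding looks like. With a smaller drop or more noise, the replications could start to disagree.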

Validity is more complicated. A study is valid if the conclusion about the hypothesized relation between properties accurately reflects reality. In short, a study is valid if the conclusion based on the results is true. 

Suppose I hypothesize that loneliness causes feelings of depression. I deduce that if I decrease loneliness in elderly people by giving them a cat to take care of, their feelings of depression should also decrease. Now, suppose I perform this study in a retirement home and find that depression actually decreases after residents take care of a cat. 

Is this study valid? Do the results support the conclusion that loneliness causes depression? Well, because this is still a pretty general question, we'll consider three more specific types of validity: construct, internal, and external validity.

Construct validity is an important prerequisite for internal and external validity. A study has high construct validity if the properties, or constructs, that appear in the hypothesis are measured and manipulated accurately. In other words, our methods have high construct validity if they actually measure and manipulate the properties that we intended them to. 

Suppose I accidentally measured an entirely different construct with, for example, my depression questionnaire. What if it measures feelings of social exclusion instead of depression? Or suppose that taking care of the cat didn't affect loneliness at all but instead increased feelings of responsibility and self-worth. What if loneliness remained the same? Well, then the results only seem to support the hypothesis that loneliness causes depression when, in reality, we've manipulated a different cause and measured a different effect.

Developing accurate measurement and manipulation methods is one of the biggest challenges in the social and behavioral sciences. I'll discuss this in more detail when we look at operationalization. But for now, I'll move on to internal validity. 

Internal validity is relevant when our hypothesis describes a causal relationship. A study is internally valid if the observed effect is actually due to the hypothesized cause. Let's assume for a second that our measurement and manipulation methods are valid. Can we conclude depression went down because the elderly felt less lonely?

Well, maybe something else caused the decrease in depression. For example, if the study started in the winter and ended in the spring, then maybe the change in season lowered depression. Or maybe it wasn't the cat's company but the increased physical exercise from cleaning the litter box and feeding bowl. 

Alternative explanations like these threaten internal validity. If there is a plausible alternative explanation, internal validity is low. Now, there are many different types of threats to internal validity that I will discuss in much more detail in later videos. 
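Here is a minimal sketch in Python of why an alternative explanation like the change in season is a problem. It simulates the before-and-after scores twice: once where only the cat lowers depression, and once where only the season does. The function name `mean_change`, the score scale, and the effect sizes are all hypothetical and chosen purely for illustration.

```python
import random


def mean_change(cat_effect, season_effect, seed=0, n_residents=30):
    """Return the mean before-to-after change in a simulated depression score.

    Illustrative assumptions: a baseline score of 10 and noise of at most
    half a point per measurement. Both effects lower the 'after' score.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_residents):
        before = 10.0 + rng.uniform(-0.5, 0.5)
        after = 10.0 - cat_effect - season_effect + rng.uniform(-0.5, 0.5)
        total += after - before
    return total / n_residents


# Scenario 1: the cat works and the season does nothing.
cat_only = mean_change(cat_effect=2.0, season_effect=0.0)
# Scenario 2: the cat does nothing and the season lowers depression.
season_only = mean_change(cat_effect=0.0, season_effect=2.0)
print(cat_only, season_only)
```

Under these assumptions the two scenarios produce the same observed decrease, so the before-and-after scores alone cannot tell us which cause was at work.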

OK, let's look at external validity. A study is externally valid if the hypothesized relationship supported by our findings also holds in other settings and other groups. In other words, the study is externally valid if the results generalize to different people, groups, environments, and times. Let's return to our example.

Will taking care of a cat decrease depression in teenagers and middle-aged people, too? Will the effect be the same for men and women? What about people from different cultures? Will a dog be as effective as a cat? 

Of course, this is all hard to say based on results from a study of only elderly people and cats. If we had included younger people and people from different cultural backgrounds, and had used other animals, we might have been more confident about the study's external validity. I'll come back to external validity and how it can be threatened when we come to the subject of sampling.

So to summarize, construct validity relates to whether our methods actually reflect the properties we intended to manipulate and measure. Internal validity relates to whether our hypothesized cause is the actual cause of the observed effect. Internal validity is threatened by alternative explanations. External validity, or generalizability, relates to whether the hypothesized relation holds in other settings and other groups.