Confounders

In the last session we briefly defined confounders. Confounders are variables that are associated with both the treatment and the outcome. What would be an example of a confounder?

Revisiting the job training example, where the treatment is participation in a job training program and the outcome is future earnings, any variable that affects both participation and future earnings can be considered a confounder. For instance, if gender is a determinant of future earnings and also a determinant of who enrolls in the program (given that enrollment in the program isn’t randomized), then gender is a confounder. By the same token, living in an urban area, age, and race are also potential confounders.

We saw that confounders are important because they’re tied to the ignorability assumption, which we discussed in the previous lesson. If we can identify and control for the confounders, it’s as if the assignment to treatment and control groups was randomized.

Therefore, the ignorability assumption tells us that if we control for the confounders, then within each subgroup of the population determined by the confounders (e.g., 40-49-year-old white individuals), it’s as if the assignment to treatment and control groups was randomized.

What does controlling for mean?

We have used the terms controlling for and conditioning on before. But in all honesty, we haven’t fully discussed what we mean by them. When we talk about conditioning on or controlling for a variable, we generally mean we are taking that variable into account (or consideration) in our analysis. This can happen in various ways.

If we’re using regression analysis to find the causal effects, we include the confounders we want to control for on the right-hand side of the regression.
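As a minimal sketch of this idea (with simulated data and made-up numbers, not results from a real study), here is how including a confounder on the right-hand side of a regression recovers the causal effect, while a naive comparison of group means does not:

```python
import numpy as np

# Hypothetical simulation: the true effect of job training on earnings
# is 2.0 (in some unit), and gender confounds the comparison because it
# affects both who enrolls and how much people earn.
rng = np.random.default_rng(0)
n = 10_000
gender = rng.integers(0, 2, n)                              # 0 or 1
treated = (rng.random(n) < 0.3 + 0.4 * gender).astype(float)  # gender drives enrollment
earnings = 2.0 * treated + 5.0 * gender + rng.normal(0, 1, n)  # gender drives earnings too

# Naive comparison ignores the confounder and is biased upward,
# because the treated group contains more high-earning individuals.
naive = earnings[treated == 1].mean() - earnings[treated == 0].mean()

# Regression with gender on the right-hand side "controls for" it;
# the coefficient on treatment is close to the true effect of 2.0.
X = np.column_stack([np.ones(n), treated, gender])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)
adjusted = coef[1]
```

Here `naive` lands around 4.0 while `adjusted` lands near the true 2.0, illustrating why we condition on the confounder.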

If we’re using matching, inverse-probability weighting, or stratification, we include the confounders to match on, to weight by, or to stratify by. We’ll learn about matching and weighting in a future module.

If we do stratification analysis, we restrict our analysis to a specific subgroup. For instance, we limit the data to only college-educated women and find the causal effects among them. We’ll briefly discuss stratification analysis at the end of this module.
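To make stratification concrete, here is a small sketch on simulated data (the numbers are invented for illustration): within a stratum of the confounder, a simple difference in means recovers the causal effect for that subgroup.

```python
import numpy as np

# Hypothetical simulation: the true effect of training on earnings is 2.0,
# and gender confounds the comparison (it drives both enrollment and pay).
rng = np.random.default_rng(1)
n = 20_000
gender = rng.integers(0, 2, n)
treated = (rng.random(n) < 0.3 + 0.4 * gender).astype(int)
earnings = 2.0 * treated + 5.0 * gender + rng.normal(0, 1, n)

# Stratify: restrict the analysis to a single subgroup (gender == 0).
# Within the stratum the confounder is constant, so the difference in
# means estimates the causal effect for that subgroup.
stratum = gender == 0
effect = (earnings[stratum & (treated == 1)].mean()
          - earnings[stratum & (treated == 0)].mean())
```

`effect` lands close to the true 2.0, even though the pooled naive comparison would not.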

All of these emphasize the importance of the ignorability assumption. Once we identify the confounders and the ignorability assumption is satisfied, we can find causal effects by conditioning on those confounders. The analysis then becomes straightforward.

If a coin flip determines who will receive the treatment in a causal study and we have a variable indicating the value of the coin flip for every subject in the study, is this variable a potential confounder?
Huh?
It’s definitely a confounder
It could be a confounder because the coin flip determines the treatment
It’s not a confounder because the coin flip has nothing to do with the outcome if it is fully random

How do we find confounders?

We usually use our intuition (and expertise) to find confounders, and in the upcoming modules we’ll see how causal graphs can help us identify them.

In some cases, we’ll be lucky enough to have the confounders in our data; in others, there may be unmeasured confounders for various reasons, such as difficulty measuring a variable (for example, can you measure motivation?) or not foreseeing the need to measure a variable in the data collection phase.

In a causal inference study with a given set of variables, you might wonder why we don’t just add every variable in the data set to the analysis as a potential confounder. You may argue that if there are unmeasured confounders, then, well, we can’t do much about them, but surely it would be better to include any measured variable suspected of being a confounder, wouldn’t it?

Nope! 👎

This is almost always a bad idea. As we’ll see later, adding variables that are not confounders may in fact violate the ignorability assumption and introduce confounding.

Moreover, if we’re using regression analysis to answer our causal question, we may also be at risk of multicollinearity. Multicollinearity occurs when some of the variables included in a regression model are (highly) correlated with one another. Because a key goal of the regression model is to isolate each variable’s relationship with the outcome, correlation among the included variables makes that isolation harder.
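A quick sketch of the problem (again with simulated, made-up data): when a regression includes a variable that is nearly a copy of another, the standard error of the coefficient we care about blows up, even though the true relationship hasn’t changed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly a duplicate of x1
y = 1.0 * x1 + rng.normal(size=n)         # only x1 truly affects y

def ols_se(X, y):
    """Standard errors of OLS coefficients via sigma^2 * (X'X)^-1."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Standard error on x1's coefficient, with and without the near-copy.
se_alone = ols_se(np.column_stack([np.ones(n), x1]), y)[1]
se_both = ols_se(np.column_stack([np.ones(n), x1, x2]), y)[1]
# se_both is many times larger than se_alone: the correlated column
# makes it hard to isolate x1's own relationship with y.
```

This is one reason throwing every available variable into the regression can hurt rather than help.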

Next Lesson

Treatment effects

In this lesson, we will learn the distinction between different treatment effects and how to interpret them.