COMING SOON
To recap, we can’t estimate the causal effect for each individual because of the fundamental problem of causal inference. But randomized experiments provide a solution: we can estimate the average treatment effect by comparing the treatment group as a whole to the control group as a whole, because the two groups are, on average (that is, distributionally), similar.
However, as we saw, randomized experiments are not always feasible. Imagine you’re interested in the effects of visible tattoos on employment outcomes. Would you, as the researcher, get a tattoo gun and randomly stamp tattoos on some of your subjects whether they like it or not? 🖌 👩🏼🎨
If you come across a situation where the causal question can’t be answered with a randomized experiment, the good news is that we might still be able to estimate the causal effect with an observational study. In fact, the rest of this entire course teaches you how to estimate causal effects with observational data.
Our first step, though, is to learn under what assumptions we can estimate causal effects even if the data are observational.
These assumptions are called causal assumptions and are:
1. The stable unit treatment value assumption (SUTVA)
2. The consistency assumption
3. The ignorability assumption
4. The positivity assumption
Let’s see what each of these assumptions is.
OK, the stable unit treatment value assumption (SUTVA, read as “sut-vuh”). It sounds an awful lot like an IKEA product, but really, it’s a very simple yet important assumption in causal inference.
SUTVA broadly refers to what is called no-interference. For SUTVA to hold, one subject’s treatment must not affect any other subject’s outcome, and the treatment itself must be well defined, with only one version of it.
Let’s take a look at an example to see where it is reasonable and unreasonable to assume SUTVA.
Imagine a study looking at the effect of using face masks on contracting the SARS-CoV-2 virus. Because we didn’t run an experiment, individuals choose whether to wear masks; so some people wear masks and some don’t.
The SUTVA (no-interference) assumption is probably violated here because when the treated group in our study wears face masks, it likely affects the outcome (contracting the virus) for everybody else in the study by reducing the spread of the virus. In other words, subject i’s treatment (wearing a mask) likely does affect subject j’s outcome (their likelihood of contracting the virus).
The example above is a case of a spillover effect. In general, SUTVA is also violated in the presence of peer effects, when subjects’ choices are influenced by their interpersonal relationships.
SUTVA is also violated in the presence of general-equilibrium effects. To understand general-equilibrium effects, think of distributing basic income as the treatment to some residents of a city. In general equilibrium and in the long-term, this basic income means the average income in the city is higher which then affects employment decisions and inflation for everybody in that city.
If SUTVA is violated and the units therefore interfere with each other, then each subject might have many potential outcomes, because their outcome depends not only on their own treatment but also on the treatments the other subjects receive.
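To make the interference concrete, here’s a tiny Python sketch of the mask example with completely made-up numbers (the function and its constants are hypothetical illustrations, not estimates from any real study):

```python
def infection_prob(wears_mask: bool, others_mask_rate: float) -> float:
    """Hypothetical infection risk for one subject.

    The risk depends on the subject's own mask AND on the fraction of the
    OTHER subjects wearing masks; that second dependence is exactly what
    SUTVA forbids.
    """
    community_spread = 0.30 * (1.0 - others_mask_rate)  # less virus circulating as coverage rises
    own_protection = 0.5 if wears_mask else 1.0         # a mask halves personal risk here
    return community_spread * own_protection

# Hold subject i's own treatment fixed (no mask) and change ONLY what the
# other subjects do:
risk_low_coverage = infection_prob(False, others_mask_rate=0.1)
risk_high_coverage = infection_prob(False, others_mask_rate=0.9)

# Under SUTVA these two numbers would be identical; here subject i's outcome
# moves with everyone else's treatment, so no-interference fails.
assert risk_high_coverage < risk_low_coverage
```

Subject i’s “potential outcome under no mask” isn’t a single number anymore: it changes with everyone else’s treatment, which is exactly why interference multiplies the potential outcomes.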
The consistency assumption implies that a subject’s potential outcome under its observed treatment status is the outcome that will actually be observed for that subject. Therefore, this assumption requires that the potential outcome under treatment, Y(1), is equal to the observed outcome, Y, if the individual receives the treatment (T = 1). Or, the potential outcome under no treatment, Y(0), is equal to the observed outcome, Y, if the individual didn’t receive the treatment (T = 0).
This may seem trivial. Isn’t this how we defined observed outcomes?
This assumption is very modest. Judea Pearl, a computer scientist who has devoted his life to causal inference (you’ll hear much more about him in later lessons), believes that this assumption can basically be considered an axiom to define counterfactuals and doesn’t necessarily need to be considered an assumption. An axiom is self-evident and requires no proof whereas an assumption is something we assume without any proof.
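In code, consistency just says the observed outcome is whichever potential outcome matches the observed treatment. A minimal sketch with invented numbers:

```python
# Toy data where we (impossibly) know both potential outcomes per subject.
# In real data only one of the two is ever observed; consistency ties the
# observed outcome to the right potential outcome.
subjects = [
    {"t": 1, "y1": 7.0, "y0": 3.0},
    {"t": 0, "y1": 5.0, "y0": 4.0},
    {"t": 1, "y1": 6.0, "y0": 6.0},
]

for s in subjects:
    # Consistency: Y = Y(1) if T = 1, and Y = Y(0) if T = 0.
    s["y_obs"] = s["y1"] if s["t"] == 1 else s["y0"]

observed = [s["y_obs"] for s in subjects]  # picks 7.0, 4.0, 6.0
```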
To learn about the ignorability assumption, we need to know what confounders are. You’ve seen what confounders are, and we’ll learn more about them in the next lesson, but for now, here’s a refresher. Confounders are often defined as variables that:
1. are associated with (or cause) the treatment, and
2. are directly associated with (or cause) the outcome.
Combined, let’s call these criteria for confounders the common-cause criteria. Here’s an example of a confounder:
In the job training example, imagine that people in urban areas are more likely to know about the training program and, therefore, to receive the treatment. The variable that indicates whether a person lives in an urban area or not is then a confounder. Why?
As we just said, people in urban areas are more likely to know about the program and participate in it; therefore, this variable is associated with the treatment variable. Also, people in urban areas usually have higher earnings; therefore, the covariate is also directly associated with the outcome variable.
Let’s say we think hard about this job training program and decide that a variable indicating whether a person lives in an urban area or not (0 or 1) is the only confounder (you know this isn’t right, since many variables such as age and gender could be potential confounders, but let’s just assume urban is the only one).
The ignorability assumption implies that, given the confounders (here, only the urban variable), the treatment assignment is independent of the potential outcomes, and therefore the treatment assignment is as good as randomized. In other words, if we divide the entire population of the study into subpopulations based on the confounders, we can assume that the treatment was randomly assigned within each subpopulation.
In our study, if urban is the only confounder and we look only at people who live in urban areas, or only at those who live in rural areas, then we can assume that within those groups the treatment assignment is independent of the potential outcomes.
If we group all covariates as X, notationally we write:
(Y(0), Y(1)) ⊥ T | X
Read ⊥ as the notation for being independent and | as the notation for conditional on, or given. This means the potential outcomes are independent of the treatment assignment given the values of the covariates.
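To see what this independence buys us, here’s a small simulation (all numbers invented) where urban is the only confounder. Unconditionally, the treated and control groups differ in their potential outcome Y(0); within the urban stratum, the difference essentially disappears:

```python
import random

random.seed(1)

rows = []
for _ in range(100_000):
    urban = random.random() < 0.5                        # X: the only confounder
    treated = random.random() < (0.8 if urban else 0.2)  # urbanites enroll more often
    y0 = 30 + (10 if urban else 0) + random.gauss(0, 1)  # potential outcome without treatment
    rows.append((urban, treated, y0))

def mean(values):
    return sum(values) / len(values)

# Unconditionally, Y(0) is NOT independent of T: the treated group is
# disproportionately urban, so its mean Y(0) sits higher.
overall_gap = mean([y for _, t, y in rows if t]) - mean([y for _, t, y in rows if not t])

# Conditional on X (looking only within the urban stratum), the gap in
# mean Y(0) between treated and control essentially vanishes.
urban_gap = mean([y for u, t, y in rows if u and t]) - mean([y for u, t, y in rows if u and not t])

# overall_gap is around 6 (the urban earnings gap leaking in); urban_gap is near 0.
```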
This assumption is considered the most important assumption in causal inference. It’s also referred to as the unconfoundedness or the exogeneity assumption.
Is this assumption usually satisfied? No 😐
Can we objectively check this assumption? Also No 😑
Note that in randomized experiments, where the treatment assignment is randomized, the treatment assignment is determined by chance and has nothing to do with the covariates defining the subjects. Therefore, in a randomized experiment, the ignorability assumption is automatically satisfied.
All we can do is use our intuition and expert knowledge to decide whether the assumption is violated or not. For this, we need to know a little about confounders; we’ll cover them in detail in the next lesson.
So when is this assumption violated?
We said that if we look at subpopulations based on the confounders, rather than at the entire population, then within each subpopulation the treatment assignment is as-if random. But let’s assume we know urban is a potential confounder, yet our data don’t contain a variable that represents it. We then call urban an unobserved confounder. As a result, we can’t condition on it (control for it in our analysis).
If we don’t condition on the confounder urban, then, because the treatment assignment is associated with it, we can no longer assume the treatment is random: it depends on whether a person lives in an urban area or not. In this case, the ignorability assumption is violated.
As a result, Y(0) and Y(1) are not conditionally independent of T.
However, if we control for the variable urban among other covariates, within categories of all the covariates we can assume that the treatment is randomly assigned.
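A quick simulation (all numbers invented) shows the difference. The true effect of the program is set to 5; comparing raw group means while pretending urban was never recorded bakes the urban earnings gap into the estimate, while conditioning on urban removes it:

```python
import random

random.seed(0)

TRUE_EFFECT = 5.0
rows = []
for _ in range(100_000):
    urban = random.random() < 0.5
    treated = random.random() < (0.8 if urban else 0.2)  # urbanites enroll more often
    earnings = 30 + (10 if urban else 0) + (TRUE_EFFECT if treated else 0) + random.gauss(0, 1)
    rows.append((urban, treated, earnings))

def mean(values):
    return sum(values) / len(values)

# Naive contrast, as if urban were never recorded (an unobserved confounder):
naive = mean([y for _, t, y in rows if t]) - mean([y for _, t, y in rows if not t])

# Conditioning on urban: take the treated-vs-control gap within each stratum,
# then average the gaps (the strata are equally sized here).
gaps = []
for stratum in (True, False):
    treated_y = [y for u, t, y in rows if u == stratum and t]
    control_y = [y for u, t, y in rows if u == stratum and not t]
    gaps.append(mean(treated_y) - mean(control_y))
adjusted = sum(gaps) / len(gaps)

# naive is badly biased upward (around 11 here); adjusted lands close to 5.
```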
If you’re not 100 percent clear on what this assumption means, the good news is that we’ll be coming back to this assumption like 67 more times. So you’ll have lots of time to absorb it.
The positivity assumption ensures that every subject in the study has some chance of receiving the treatment, so that the treatment assignment is not deterministic.
The role of this assumption is to guarantee that counterfactuals exist. To see what this statement means, let’s consider the training program and imagine some of the potential confounders we need to consider in the study are urban and gender. The idea behind the positivity assumption is that every type of person (such as female urbanites) in our study has some chance of receiving the treatment.
Because you asked for the mathematical notation, here it is:
0 < P(T = 1 | X = x) < 1, for every value x of the covariates
Now, consider a violation of the assumption: imagine every female subject in urban areas receives the treatment (participates in the program) and everybody who is not female and/or lives in rural areas doesn’t receive the treatment. Then there is no variability in the treatment assignment within categories of our confounders, X.
In general, if this assumption is violated, there will be no observed value of Y for one of the treatment groups (treatment or control) within specific subgroups of the population (categorized by the Xs). For instance, if everybody in a category receives the treatment, there is no observed Y(0) for subjects in that category.
This assumption is common sense. We can’t compare red apples 🍎 (those who received the treatment) to green apples 🍏 (those who didn’t) if all we have is 🍎 in a subpopulation of our study.
Selection bias and the ignorability assumption are tightly related.
If the ignorability assumption holds, there shouldn’t be any differences between the treatment and the control groups within each subgroup of the confounders, and therefore, there shouldn’t be any selection bias conditional on the covariates. If there is no selection bias, the naive causal effect and the ATT will be the same.
And this is the magic of the ignorability assumption. If the ignorability assumption holds ➡️ there is no selection bias ➡️ we can simply compare the observed outcomes of the treated and the control groups and call it the treatment effect (on the treated).
Therefore, under the ignorability assumption (and if all the other causal assumptions hold), the fundamental problem of causal inference no longer keeps us from estimating causal effects.
The main distinction between statistical inference and causal inference lies in the causal assumptions. Causal inference is basically statistical inference informed by the causal assumptions we discussed above.
In the real world, it’s very unlikely that we will have a causal question in which all the assumptions are met (or at least met without any tricks). Usually, the ignorability assumption is the troublemaker. If all the assumptions are met, our job is super easy!
If all assumptions hold, we can assume that within each strata of the confounders, the treatment assignment is random and we can estimate causal effects using various methods including regression, matching, or stratification.
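As a small taste of the stratification approach, here’s an illustrative estimator of the ATT (names and numbers are made up for the sketch): compute the treated-versus-control gap within each stratum of the confounders, then average the gaps weighted by each stratum’s share of treated subjects:

```python
def stratified_att(rows):
    """Stratification estimate of the ATT.

    rows: list of (stratum, treated, observed_outcome) tuples.
    Within each stratum the treated-vs-control gap is computed, then the
    gaps are averaged with weights proportional to the number of TREATED
    subjects in the stratum (which targets the effect on the treated).
    """
    strata = {}
    for x, t, y in rows:
        strata.setdefault(x, {"t": [], "c": []})["t" if t else "c"].append(y)

    total_treated = sum(len(g["t"]) for g in strata.values())
    att = 0.0
    for g in strata.values():
        if not g["t"] or not g["c"]:
            continue  # positivity violated in this stratum: no comparison possible
        gap = sum(g["t"]) / len(g["t"]) - sum(g["c"]) / len(g["c"])
        att += gap * len(g["t"]) / total_treated
    return att

# Made-up earnings data: urban gap is 5 (weight 2/3), rural gap is 3 (weight 1/3).
rows = [
    ("urban", 1, 45.0), ("urban", 1, 43.0), ("urban", 0, 40.0), ("urban", 0, 38.0),
    ("rural", 1, 34.0), ("rural", 0, 30.0), ("rural", 0, 32.0),
]
print(stratified_att(rows))  # 5 * 2/3 + 3 * 1/3 ≈ 4.33
```

Regression and matching pursue the same goal, adjusting for the confounders, through different machinery.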
If the ignorability assumption doesn’t hold, we’ll soon see that our causal effect estimate suffers from a bias called selection bias.
The downside of some of these assumptions is that they are generally untestable, meaning we can’t objectively check whether they’re violated or not. The only way to evaluate them is to use our prior knowledge and intuition.