Backdoor criterion

You’ve already learned how to block a path. If a path does not contain a collider, controlling for any node (variable) in the middle of the path will shut it. If the path contains a collider, the path is already shut, and controlling for the collider node only opens it up and introduces a spurious association between the treatment and the outcome (which is not what we want).
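The blocking rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the graph, the node names (`A`, `M`, `B`, `C`, `E`), and the helper functions `descendants` and `is_blocked` are all hypothetical and chosen only for this example. A DAG is represented as a set of `(parent, child)` edges.

```python
# Hypothetical DAG: a chain A -> M -> B, a collider A -> C <- B,
# and a descendant E of the collider (C -> E).
EDGES = {("A", "M"), ("M", "B"), ("A", "C"), ("B", "C"), ("C", "E")}


def descendants(edges, node):
    """Return every node reachable from `node` by following the arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_blocked(edges, path, controlled):
    """Return True if `path` (a list of nodes) is blocked given `controlled`."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:
            # Collider: blocks the path unless it, or one of its
            # descendants, is controlled for.
            if not (({mid} | descendants(edges, mid)) & controlled):
                return True
        elif mid in controlled:
            # Middle node of a chain or fork: controlling for it blocks the path.
            return True
    return False


print(is_blocked(EDGES, ["A", "M", "B"], set()))   # False: open chain
print(is_blocked(EDGES, ["A", "M", "B"], {"M"}))   # True: chain blocked by M
print(is_blocked(EDGES, ["A", "C", "B"], set()))   # True: collider blocks by default
print(is_blocked(EDGES, ["A", "C", "B"], {"C"}))   # False: controlling for C opens it
```

Controlling for `E`, a descendant of the collider `C`, opens the collider path in the same way as controlling for `C` itself.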

So given a DAG, we first identify all backdoor paths and then identify a set of covariates (nodes) that, when controlled for, blocks all backdoor paths. There’s a systematic way of identifying these nodes, and it relies on a concept called d-separation.

D-separation

Two nodes $D$ and $Y$ are d-separated by a set of nodes (let’s call this set $M$) if conditioning on the nodes in $M$ blocks every path from the treatment $D$ to the outcome $Y$. In mathematical terms:

$$Y \perp D \mid M$$

The symbol $\perp$ denotes independence: when $D$ and $Y$ are d-separated by $M$ in the graph, they are independent conditional on $M$. If you recall, this notation looks a lot like what we used for the ignorability assumption in the lesson on causal assumptions. Remember, the ignorability assumption dictates that the treatment variable is independent of the outcomes (potential outcomes) conditional on a set of variables $X$:

$$Y \perp D \mid X$$

Therefore, controlling for the set of variables in $M$ satisfies the ignorability assumption.

Backdoor criterion

So how do we find the set of variables $M$ that would block every backdoor path from the treatment to the outcome and satisfy the ignorability assumption? We use the backdoor criterion. The set of variables $M$ must satisfy the following three criteria:

  • $M$ should not include any variables that are descendants of the treatment $D$
  • If a backdoor path between the treatment and the outcome contains a chain or a fork, the path is blocked by including at least one of the middle nodes of the chain or fork
  • If a backdoor path between the treatment and the outcome contains a collider, the collider node and its descendants are not included, to prevent opening the path (collider paths are automatically blocked)

The first criterion ensures that we do not control for any descendants of the treatment $D$, since doing so would erase some of the causal effect of the treatment on the outcome.
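The three criteria can be turned into a small checker. The sketch below is an illustration under stated assumptions rather than a definitive implementation: the helper names (`descendants`, `is_blocked`, `all_paths`, `backdoor_paths`, `satisfies_backdoor`) and the example DAG with a confounder `Z` and a mediator `W` are hypothetical. It enumerates every path explicitly, which is only practical for small graphs, and it applies the collider rule in its general form, so a set that includes a collider still passes if another controlled node blocks the path.

```python
def descendants(edges, node):
    """Return every node reachable from `node` by following the arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_blocked(edges, path, controlled):
    """Return True if `path` is blocked given the set `controlled`."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:  # collider
            if not (({mid} | descendants(edges, mid)) & controlled):
                return True
        elif mid in controlled:  # middle of a chain or fork
            return True
    return False


def all_paths(edges, start, end):
    """Yield every simple path from start to end, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])


def backdoor_paths(edges, treatment, outcome):
    """Paths whose first edge points into the treatment."""
    return [p for p in all_paths(edges, treatment, outcome)
            if (p[1], treatment) in edges]


def satisfies_backdoor(edges, treatment, outcome, controlled):
    controlled = set(controlled)
    # Criterion 1: no descendants of the treatment.
    if controlled & descendants(edges, treatment):
        return False
    # Criteria 2 and 3: every backdoor path must be blocked.
    return all(is_blocked(edges, p, controlled)
               for p in backdoor_paths(edges, treatment, outcome))


# Hypothetical DAG: Z confounds D and Y; W mediates the effect of D on Y.
EDGES = {("Z", "D"), ("Z", "Y"), ("D", "W"), ("W", "Y")}
print(satisfies_backdoor(EDGES, "D", "Y", {"Z"}))       # True
print(satisfies_backdoor(EDGES, "D", "Y", set()))       # False: D <- Z -> Y is open
print(satisfies_backdoor(EDGES, "D", "Y", {"Z", "W"}))  # False: W descends from D
```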

It’s important to note that the set of variables that satisfies the backdoor criterion isn’t unique (in other words, there can be multiple sets $M$ that will satisfy the condition). Let’s see this through a simple DAG shown below:

To find sets $M$, we start by identifying all backdoor paths. In this case, there is only one backdoor path, which is $D \longleftarrow X_2 \longrightarrow X_1 \longrightarrow Y$. Because there’s no collider in the path, it’s not automatically blocked. To block the path we can control for $X_1$, $X_2$, or both. Therefore, it is sufficient to control for any one of the sets below:

  • $\{X_1\}$
  • $\{X_2\}$
  • $\{X_1, X_2\}$

Choosing which set to control for is up to you as the researcher. If you’re designing a causal study before collecting data, you can plan to measure whichever set is most practical; otherwise, you may prefer the set that contains variables for which data is readily available.

Consider the following DAG. Which node is a collider on the backdoor path?

  • $X_1$
  • $X_2$
  • $X_3$

In the DAG above, the backdoor path is simply $D \longleftarrow X_1 \longrightarrow X_2 \longleftarrow X_3 \longrightarrow Y$. This path contains a collider, which means the path is already blocked and there is no confounding. So we can choose not to control for any of the variables and still be able to find the causal relationship between $D$ and $Y$. If for any reason (say accidentally) we control for the collider $X_2$ and open the path, we will have to make sure to control for additional variables to block the path again. For instance, we can control for both $X_2$ and $X_3$ to make sure the backdoor path remains blocked.

We can choose any of the sets $M$ below to control for if we want to find the causal effect of $D$ on $Y$:

  • $\{\}$, which is an empty set and means we don’t have to control for anything
  • $\{X_1\}$
  • $\{X_3\}$
  • $\{X_1, X_3\}$
  • $\{X_1, X_2\}$
  • $\{X_2, X_3\}$
  • $\{X_1, X_2, X_3\}$
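The list above can be checked by brute force. The sketch below tries every subset of the covariates and keeps those that leave the backdoor path blocked. The edge list is reconstructed from the path given in the text, and the direct edge from `D` to `Y` is an assumption; the helper names are hypothetical and not part of any library.

```python
from itertools import combinations


def descendants(edges, node):
    """Return every node reachable from `node` by following the arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_blocked(edges, path, controlled):
    """Return True if `path` is blocked given the set `controlled`."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:  # collider
            if not (({mid} | descendants(edges, mid)) & controlled):
                return True
        elif mid in controlled:  # middle of a chain or fork
            return True
    return False


# D <- X1 -> X2 <- X3 -> Y, plus an assumed direct effect D -> Y.
EDGES = {("X1", "D"), ("X1", "X2"), ("X3", "X2"), ("X3", "Y"), ("D", "Y")}
BACKDOOR_PATH = ["D", "X1", "X2", "X3", "Y"]
COVARIATES = ["X1", "X2", "X3"]  # none of these descends from D

valid = [
    subset
    for size in range(len(COVARIATES) + 1)
    for subset in combinations(COVARIATES, size)
    if is_blocked(EDGES, BACKDOOR_PATH, set(subset))
]
for subset in valid:
    print(subset)
```

Under these assumptions the only subset that fails is `("X2",)`: controlling for the collider alone opens the path, and nothing else in the set blocks it again.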

And because you asked for it, here is a more complicated DAG:

There are four backdoor paths in this DAG. Without looking at the answer below, can you tell what they are?

The following are the backdoor paths and the nodes that would block each of them:

  • $D \longleftarrow X_2 \longleftarrow X_1 \longrightarrow Y$, which is blocked by controlling for $X_1$ or $X_2$
  • $D \longleftarrow X_2 \longrightarrow X_3 \longrightarrow X_4 \longrightarrow Y$, which is blocked by controlling for $X_2$, $X_3$, or $X_4$
  • $D \longleftarrow X_3 \longleftarrow X_2 \longleftarrow X_1 \longrightarrow Y$, which is blocked by controlling for $X_1$, $X_2$, or $X_3$
  • $D \longleftarrow X_3 \longrightarrow X_4 \longrightarrow Y$, which is blocked by controlling for $X_3$ or $X_4$
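Finding backdoor paths by eye gets error-prone as the graph grows, so here is a rough sketch of enumerating them programmatically. The edge list is inferred from the paths listed above, and the direct edge from `D` to `Y` is an assumption; `all_paths`, `backdoor_paths`, and `render` are hypothetical helper names.

```python
# Edges inferred from the listed paths; D -> Y is an assumed direct effect.
EDGES = {
    ("X1", "X2"), ("X1", "Y"),
    ("X2", "D"), ("X2", "X3"),
    ("X3", "D"), ("X3", "X4"),
    ("X4", "Y"), ("D", "Y"),
}


def all_paths(edges, start, end):
    """Yield every simple path from start to end, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])


def backdoor_paths(edges, treatment, outcome):
    """Keep only the paths whose first edge points into the treatment."""
    return sorted(p for p in all_paths(edges, treatment, outcome)
                  if (p[1], treatment) in edges)


def render(edges, path):
    """Format a path with arrows showing the direction of each edge."""
    parts = [path[0]]
    for a, b in zip(path, path[1:]):
        parts.append("->" if (a, b) in edges else "<-")
        parts.append(b)
    return " ".join(parts)


paths = backdoor_paths(EDGES, "D", "Y")
for p in paths:
    print(render(EDGES, p))
# D <- X2 <- X1 -> Y
# D <- X2 -> X3 -> X4 -> Y
# D <- X3 <- X2 <- X1 -> Y
# D <- X3 -> X4 -> Y
```

The direct path from `D` to `Y` is excluded because its first edge points out of the treatment, so it is a causal path rather than a backdoor path.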

Remember that we need to block all backdoor paths. What set then would block all of them? Here’s an option: if we control for $X_2$ and $X_3$, all backdoor paths are blocked. This is because controlling for $X_2$ blocks the first, second, and third backdoor paths, and controlling for $X_3$ blocks the fourth one. So a possible set would be $\{X_2, X_3\}$. It is not the only one. Each of the following (minimal) sets blocks all backdoor paths:

  • $\{X_2, X_3\}$
  • $\{X_1, X_3\}$
  • $\{X_2, X_4\}$
  • $\{X_1, X_4\}$

We call these sets minimal because removing any variable from one of them would leave at least one backdoor path open. Larger sets work too: we can control for $\{X_1, X_2, X_3\}$ to block all paths, or even control for all variables $\{X_1, X_2, X_3, X_4\}$.
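The minimal sets can also be recovered by exhaustive search: test every subset of the covariates against the backdoor criterion, then keep the valid sets that contain no smaller valid set. This is a sketch under assumptions, not a definitive implementation. The edge list is inferred from the backdoor paths in the text, the `D -> Y` edge is assumed, and all helper names are hypothetical.

```python
from itertools import combinations


def descendants(edges, node):
    """Return every node reachable from `node` by following the arrows."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_blocked(edges, path, controlled):
    """Return True if `path` is blocked given the set `controlled`."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if (prev, mid) in edges and (nxt, mid) in edges:  # collider
            if not (({mid} | descendants(edges, mid)) & controlled):
                return True
        elif mid in controlled:  # middle of a chain or fork
            return True
    return False


def all_paths(edges, start, end):
    """Yield every simple path from start to end, ignoring arrow direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            yield path
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])


def satisfies_backdoor(edges, treatment, outcome, controlled):
    """Check the backdoor criterion for the candidate set `controlled`."""
    controlled = set(controlled)
    if controlled & descendants(edges, treatment):
        return False
    backdoor = [p for p in all_paths(edges, treatment, outcome)
                if (p[1], treatment) in edges]
    return all(is_blocked(edges, p, controlled) for p in backdoor)


# Edges inferred from the listed paths; D -> Y is an assumed direct effect.
EDGES = {("X1", "X2"), ("X1", "Y"), ("X2", "D"), ("X2", "X3"),
         ("X3", "D"), ("X3", "X4"), ("X4", "Y"), ("D", "Y")}
COVARIATES = ["X1", "X2", "X3", "X4"]

valid = [set(s)
         for size in range(len(COVARIATES) + 1)
         for s in combinations(COVARIATES, size)
         if satisfies_backdoor(EDGES, "D", "Y", s)]
# A valid set is minimal if no other valid set is a proper subset of it.
minimal = [s for s in valid if not any(t < s for t in valid)]
print([sorted(s) for s in minimal])
# [['X1', 'X3'], ['X1', 'X4'], ['X2', 'X3'], ['X2', 'X4']]
```

The search is exponential in the number of covariates, which is fine for a teaching example of this size but would not scale to large graphs.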

Consider the DAG below. Which single variable, if controlled for, blocks all backdoor paths between the treatment, $D$, and the outcome, $Y$?

  • $X_1$
  • $X_2$
  • $X_3$
  • None
