You’ve already learned how to block a path. If a path does not contain a collider, controlling for any node (variable) in the middle of the path blocks it. If the path contains a collider, the path is already blocked, and controlling for the collider only opens it up and introduces a spurious association between the treatment and the outcome (which is not what we want).
So given a DAG, we first identify all backdoor paths and then identify a set of covariates (nodes) that, if controlled for, blocks all of them. There’s a systematic way of identifying these nodes, built on a concept called d-separation.
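The blocking rules above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names, the edge-list representation of the DAG, and the node names `T`, `C`, `M`, `Y` are all assumptions made for the example.

```python
def descendants(edges, node):
    """Return every node reachable from `node` by following arrows."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def path_is_blocked(edges, path, controlled):
    """Check whether one path is blocked by the set `controlled`.

    `edges` is a list of (parent, child) pairs; `path` is a list of nodes.
    """
    edge_set = set(edges)
    controlled = set(controlled)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if is_collider:
            # A collider blocks the path unless it (or one of its
            # descendants) is controlled for.
            if not ({mid} | descendants(edges, mid)) & controlled:
                return True
        elif mid in controlled:
            # A controlled non-collider blocks the path.
            return True
    return False


# Hypothetical examples: a fork T <- C -> Y and a collider T -> M <- Y.
fork = [("C", "T"), ("C", "Y")]
collider = [("T", "M"), ("Y", "M")]

print(path_is_blocked(fork, ["T", "C", "Y"], set()))       # False: path is open
print(path_is_blocked(fork, ["T", "C", "Y"], {"C"}))       # True
print(path_is_blocked(collider, ["T", "M", "Y"], set()))   # True: already blocked
print(path_is_blocked(collider, ["T", "M", "Y"], {"M"}))   # False: opened
```

The last line shows the case the text warns about: controlling for the collider opens a path that was blocked to begin with.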
Two nodes $T$ and $Y$ are d-separated by a set of nodes (let’s call this set $X$) if conditioning on the nodes in $X$ blocks every path from the treatment $T$ to the outcome $Y$. In mathematical terms:

$$T \perp Y \mid X$$
The symbol $\perp$ here is what denotes the d-separation. If you recall, this notation looks a lot like what we used for the ignorability assumption in the lesson on causal assumptions. Remember, the ignorability assumption dictates that the treatment variable $T$ is independent of the potential outcomes $Y^0, Y^1$ conditional on a set of variables $X$:

$$Y^0, Y^1 \perp T \mid X$$

Therefore, controlling for the set of variables in $X$ satisfies the ignorability assumption.
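To make the definition concrete, here is a minimal sketch of a d-separation test in plain Python. It uses the moralized ancestral graph method, which is one standard way to test d-separation and not necessarily the one this lesson has in mind. The function names and the example node names are assumptions.

```python
def ancestors_of(edges, nodes):
    """Return `nodes` together with all of their ancestors."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(edges, a, b, given):
    """True if `given` d-separates `a` and `b` in the DAG `edges`."""
    given = set(given)
    # 1. Keep only a, b, the conditioning set, and their ancestors.
    keep = ancestors_of(edges, {a, b} | given)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    # 2. Moralize: drop arrow directions and link parents sharing a child.
    nbrs = {n: set() for n in keep}
    parents = {}
    for p, c in sub:
        nbrs[p].add(c)
        nbrs[c].add(p)
        parents.setdefault(c, []).append(p)
    for group in parents.values():
        for i in group:
            for j in group:
                if i != j:
                    nbrs[i].add(j)
    # 3. Remove the conditioning set and test whether b is still reachable.
    seen, stack = {a}, [a]
    while stack:
        for n in nbrs[stack.pop()]:
            if n not in seen and n not in given:
                seen.add(n)
                stack.append(n)
    return b not in seen


fork = [("C", "T"), ("C", "Y")]        # hypothetical: T <- C -> Y
collider = [("T", "M"), ("Y", "M")]    # hypothetical: T -> M <- Y

print(d_separated(fork, "T", "Y", set()))       # False
print(d_separated(fork, "T", "Y", {"C"}))       # True
print(d_separated(collider, "T", "Y", set()))   # True
print(d_separated(collider, "T", "Y", {"M"}))   # False
```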
So how do we find the set of variables that would block every path from the treatment to the outcome and satisfy the ignorability assumption? We use the backdoor path criterion. The set of variables $X$ must satisfy the following three criteria:
The first criterion ensures that we do not control for any descendants of the treatment, since doing so would erase some of the causal effect of the treatment on the outcome.
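A check of this kind can be sketched in code. The version below assumes the common two-part statement of the backdoor criterion: the set contains no descendant of the treatment, and it blocks every backdoor path. The second part is tested by deleting the arrows that leave the treatment and then testing d-separation. All function and node names are hypothetical.

```python
def reach(edges, start, reverse=False):
    """Nodes reachable from `start` along arrows (against them if reverse)."""
    step = {}
    for p, c in edges:
        src, dst = (c, p) if reverse else (p, c)
        step.setdefault(src, set()).add(dst)
    seen, stack = set(start), list(start)
    while stack:
        for n in step.get(stack.pop(), ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def d_separated(edges, a, b, given):
    """d-separation test via the moralized ancestral graph."""
    keep = reach(edges, {a, b} | given, reverse=True)
    sub = [(p, c) for p, c in edges if p in keep and c in keep]
    links = set()
    for p, c in sub:
        links.add(frozenset((p, c)))
        for q, c2 in sub:
            if c2 == c and q != p:
                links.add(frozenset((p, q)))   # link parents of a shared child
    undirected = [(u, v) for link in links for u in link for v in link
                  if u != v and u not in given and v not in given]
    return b not in reach(undirected, {a})


def satisfies_backdoor(edges, t, y, x):
    """True if the set `x` meets the two-part backdoor criterion."""
    x = set(x)
    if x & (reach(edges, {t}) - {t}):
        return False                      # x contains a descendant of t
    no_out = [(p, c) for p, c in edges if p != t]   # keep backdoor paths only
    return d_separated(no_out, t, y, x)


# Hypothetical DAG: confounder C, mediator M on the causal path T -> M -> Y.
dag = [("C", "T"), ("C", "Y"), ("T", "M"), ("M", "Y")]

print(satisfies_backdoor(dag, "T", "Y", set()))        # False: C confounds
print(satisfies_backdoor(dag, "T", "Y", {"C"}))        # True
print(satisfies_backdoor(dag, "T", "Y", {"C", "M"}))   # False: M descends from T
```

The last call illustrates why descendants are excluded: `M` carries part of the effect of `T` on `Y`, so controlling for it would remove some of the effect we want to measure.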
It’s important to note that the set of variables that satisfies the backdoor criterion isn’t unique (in other words, there can be multiple sets $X$ that satisfy the condition). Let’s see this through a simple DAG shown below:
To find the sets $X$, we start by identifying all backdoor paths. In this case, there is only one backdoor path. Because the path contains no collider, it is not automatically blocked. To block it, we can control for either of the two intermediate nodes on the path, or for both. Therefore, it is sufficient to control for any one of the sets below:
Which set to control for is up to you as the researcher. If you’re designing a causal study before collecting data, you are free to pick any of them; otherwise, choose the set whose variables have data readily available.
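The first step, listing the backdoor paths, can also be automated. The sketch below enumerates every path from the treatment to the outcome that starts with an arrow pointing into the treatment. The DAG and its node names (`T`, `A`, `B`, `Y`) are hypothetical and only chosen to resemble the situation described above: one backdoor path with two intermediate nodes and no collider.

```python
def backdoor_paths(edges, treatment, outcome):
    """List all simple paths from treatment to outcome that begin with an
    arrow pointing INTO the treatment (that is, the backdoor paths)."""
    edge_set = set(edges)
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    paths = []

    def walk(path):
        last = path[-1]
        if last == outcome:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(last, ())):
            if nxt in path:
                continue            # keep paths simple (no repeated nodes)
            if len(path) == 1 and (nxt, treatment) not in edge_set:
                continue            # first edge must point into the treatment
            walk(path + [nxt])

    walk([treatment])
    return paths


# Hypothetical DAG: T <- A -> B -> Y, plus the causal arrow T -> Y.
dag = [("A", "T"), ("A", "B"), ("B", "Y"), ("T", "Y")]

print(backdoor_paths(dag, "T", "Y"))   # [['T', 'A', 'B', 'Y']]
```

In this hypothetical DAG, controlling for `A`, for `B`, or for both blocks the single backdoor path, while the direct path `T -> Y` is not a backdoor path and is left alone.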
In the DAG above, there is a single backdoor path. This path contains a collider, which means the path is already blocked and there is no confounding. So we can choose not to control for any of the variables and still be able to find the causal relationship between the treatment and the outcome. If for any reason (say, accidentally) we control for the collider and open the path, we have to control for additional variables to block it again. For instance, we can control for the collider together with a non-collider node on the path to make sure the backdoor path remains blocked.
We can choose any of the sets below to control for if we want to find the causal effect of the treatment on the outcome:
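As a rough illustration of how such a list can be produced, the sketch below enumerates every subset of the intermediate nodes on a collider path and keeps the ones that leave the path blocked. The path `T <- V -> M <- W -> Y` and its node names are hypothetical, and the code assumes the collider has no descendants.

```python
from itertools import combinations

# Hypothetical M-shaped backdoor path: T <- V -> M <- W -> Y
path = ["T", "V", "M", "W", "Y"]
colliders = {"M"}   # M has two arrows pointing into it


def blocks(path, colliders, controlled):
    """True if `controlled` leaves the path blocked.

    Assumes colliders have no descendants, so only the collider itself
    can open the path.
    """
    for node in path[1:-1]:
        if node in colliders and node not in controlled:
            return True     # an uncontrolled collider blocks the path
        if node not in colliders and node in controlled:
            return True     # a controlled non-collider blocks the path
    return False


middle = path[1:-1]
valid = [set(s)
         for size in range(len(middle) + 1)
         for s in combinations(middle, size)
         if blocks(path, colliders, set(s))]

for s in valid:
    print(sorted(s))
```

Seven of the eight subsets keep the path blocked, including the empty set. The only one that fails is the collider on its own, which matches the warning in the text.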
And because you asked for it, here is a more complicated DAG:
There are three backdoor paths in this DAG. Without looking at the answer below, can you tell what they are?
The following are the backdoor paths and the sets of nodes that would block them:
Remember that we need to block all backdoor paths. What set, then, would block all of them? Here’s an option: control for two nodes, one that blocks both the first and the second backdoor paths and another that blocks the third. Together, these two nodes form one possible set. However, there are other (minimal) sets that also block all backdoor paths when controlled for.
We call these sets minimal because we could obviously also block all paths with a larger set, or even control for all of the variables.
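The idea of a minimal set can be written down directly: a valid set is minimal if no proper subset of it is also valid. The helper below is a small sketch of that filter. The function name and the example sets (with node names `A`, `B`, `C`) are made up for illustration and do not correspond to the DAG above.

```python
def minimal_sets(valid_sets):
    """Keep only the sets that have no proper subset among `valid_sets`."""
    sets = [frozenset(s) for s in valid_sets]
    return [set(s) for s in sets
            if not any(other < s for other in sets)]   # `<` is proper subset


# Hypothetical list of sets that each block all backdoor paths.
valid = [{"A"}, {"B", "C"}, {"A", "B"}, {"A", "B", "C"}]

print(minimal_sets(valid))   # the two minimal sets: {'A'} and {'B', 'C'}
```

Here `{"A", "B"}` and `{"A", "B", "C"}` are valid but not minimal, because `{"A"}` alone already does the job.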