Let’s take a step back. Causal inference is all about the effect of a treatment on an outcome, say the effect of education on earnings. But the treatment and the outcome can also be linked through other variables. For instance, education and earnings can be linked through a person’s ability (say, intelligence).
In the language of DAGs, there is a direct path between the treatment node and the outcome node. But there may also be other “indirect” paths between them. The goal in causal inference is to tease out the effect of the treatment on the outcome by having all other paths “blocked”. In this lesson, we’ll learn about what blocked paths are.
We know about chains, forks, and colliders, but why are they useful in the context of blocked paths? We need to go back to the idea of conditional independence.
Remember that two variables are independent if information about one doesn’t tell you anything about the other. In a DAG, two nodes (or variables) are independent if every path between them is blocked. But what does being blocked mean?
There are two ways a path can be blocked.
First, a path (that doesn’t include a collider) can be blocked only if we condition on a node in the middle of the path. Look at the DAG below that consists of a simple chain. It shows the relationship between temperature, physical activity, and health. It suggests that when the weather is warmer, people are more physically active and, therefore, healthier. Although no direct causal relationship exists between temperature and health, the two variables are linked together through the mediator node, physical activity.
If we condition on physical activity, however, the path between temperature and health will be blocked. In other words, temperature and health will be independent if we condition on physical activity. Holding physical activity constant, temperature should not affect health.
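To make this concrete, here is a minimal simulation sketch of the chain. Everything numeric in it is an assumption made for illustration: physical activity is simplified to a binary variable, the noise is Gaussian, and the effect sizes are made up. The helper names `corr` and `simulate_chain` are hypothetical.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_chain(n=20000, seed=0):
    """Simulate temperature -> physical activity -> health (made-up numbers)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temperature = rng.gauss(0, 1)
        # Warmer weather makes a person more likely to be active (1) than not (0).
        active = 1 if temperature + rng.gauss(0, 1) > 0 else 0
        # Health depends only on activity, never directly on temperature.
        health = 2 * active + rng.gauss(0, 1)
        rows.append((temperature, active, health))
    return rows

rows = simulate_chain()

# Without conditioning: temperature and health are associated through activity.
overall = corr([t for t, _, _ in rows], [h for _, _, h in rows])

# Conditioning on activity: look at each activity group separately.
within_active = corr([t for t, a, _ in rows if a == 1],
                     [h for _, a, h in rows if a == 1])
within_inactive = corr([t for t, a, _ in rows if a == 0],
                       [h for _, a, h in rows if a == 0])

print(overall, within_active, within_inactive)
```

Under these assumptions, the overall correlation should come out clearly positive, while the correlation inside each activity group should be close to zero: holding physical activity constant blocks the path.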
Consider a second example. Here we have a fork where age affects both baldness and the risk of contracting COVID-19.
The two branches of the fork (baldness and COVID-19 risk) are not independent; they are associated through age. We can treat these two nodes as independent, though, so long as we condition on age. Conditioning on age blocks the link between them.
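The fork can also be checked with exact probabilities rather than a simulation. The sketch below assumes all three variables are binary and uses made-up probabilities (`P_OLD`, `P_BALD`, `P_RISK`); none of these numbers come from real data, and the function names are hypothetical.

```python
# Hypothetical probabilities, chosen only for illustration.
P_OLD = 0.5                   # P(old = 1)
P_BALD = {0: 0.1, 1: 0.6}     # P(bald = 1 | old)
P_RISK = {0: 0.2, 1: 0.7}     # P(high risk = 1 | old)

def joint(old, bald, risk):
    """Joint probability implied by the fork: bald <- old -> risk."""
    p_old = P_OLD if old else 1 - P_OLD
    p_bald = P_BALD[old] if bald else 1 - P_BALD[old]
    p_risk = P_RISK[old] if risk else 1 - P_RISK[old]
    return p_old * p_bald * p_risk

def p_risk_given(bald, old=None):
    """P(risk = 1 | bald), or P(risk = 1 | bald, old) when `old` is given."""
    ages = (0, 1) if old is None else (old,)
    num = sum(joint(o, bald, 1) for o in ages)
    den = sum(joint(o, bald, r) for o in ages for r in (0, 1))
    return num / den

# Not conditioning on age: baldness carries information about risk.
print(p_risk_given(bald=1), p_risk_given(bald=0))

# Conditioning on age: within an age group, baldness tells us nothing new.
print(p_risk_given(bald=1, old=1), p_risk_given(bald=0, old=1))
```

With these assumed numbers, the first pair of probabilities differs (roughly 0.63 versus 0.35), while the second pair is the same (0.7, up to floating-point rounding), which is what a blocked path looks like in terms of probabilities.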
Colliders are special. A path that includes a collider is automatically blocked. As we discussed in the previous lesson, the two ends of a collider are already independent of each other so there is no need for conditioning on any node.
In the DAG above, the path between the two end nodes is blocked because of the collider that sits between them.
Consider an educational grant that is both merit-based and need-based, i.e., to qualify for the grant, the student has to be both in financial need and in good academic standing. The DAG below shows the causal graph:
This is a clear example of a collider telling you that financial need and academic standing are independent. The DAG assumes no direct relationship between financial need and academic standing. We can make this example very simple by assuming all the variables are binary:
To qualify for the grant (receiving the grant = 1), the values of academic standing and financial need both have to be 1. If either of the two nodes is 0, the student does not qualify for the grant, and, therefore, the value of receiving the grant will be 0.
We know from the previous lesson that the two roots in a collider path are automatically independent of each other; therefore, academic standing and financial need are independent.
If we don’t condition on the collider (receiving the grant), information about academic standing does not tell us anything about financial need or vice versa.
Now, let’s assume we condition on receiving the grant for whatever reason. By doing that, we hold receiving the grant constant and, therefore, we know its value. Let’s say we know the student didn’t receive the grant (receiving the grant = 0). Let’s see how conditioning on this variable makes financial need and academic standing dependent.
We know the value of the variable receiving the grant is 0. If we also know the value of financial need is 1, then we know for sure the value of academic standing has to be 0. This is because if both were 1, the student would automatically qualify for the grant. Therefore, by conditioning on the collider, information about one of the root nodes gives us information about the other.
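The same reasoning can be checked by enumerating the four possible students. This is a minimal sketch that assumes financial need and academic standing are each 1 with probability 1/2 and are independent of each other; those probabilities and the helper names `worlds` and `prob` are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed marginal probabilities (made up for illustration).
P_NEED = Fraction(1, 2)
P_STANDING = Fraction(1, 2)

def worlds():
    """Yield every (need, standing, grant) combination with its probability."""
    for need, standing in product((0, 1), repeat=2):
        p = (P_NEED if need else 1 - P_NEED) * \
            (P_STANDING if standing else 1 - P_STANDING)
        # The grant is awarded only when both conditions hold (the collider).
        yield {"need": need, "standing": standing, "grant": need & standing}, p

def prob(event, given=lambda w: True):
    """P(event | given), computed by summing over the enumerated worlds."""
    num = sum(p for w, p in worlds() if given(w) and event(w))
    den = sum(p for w, p in worlds() if given(w))
    return num / den

good_standing = lambda w: w["standing"] == 1

# Collider not conditioned on: need says nothing about standing.
print(prob(good_standing))
print(prob(good_standing, given=lambda w: w["need"] == 1))

# Conditioning on grant = 0: now need changes what we believe about standing.
print(prob(good_standing, given=lambda w: w["grant"] == 0 and w["need"] == 0))
print(prob(good_standing, given=lambda w: w["grant"] == 0 and w["need"] == 1))
```

Under these assumptions, the first two probabilities are identical, while the last two differ, and the final one is exactly 0: a student in financial need who did not receive the grant cannot be in good academic standing.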
This is why, if we condition on the collider node, the path between the two root nodes is no longer blocked. We open a path that should never have been opened 👻
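The blocking rules from this lesson can be collected into one small function. This is a sketch under stated assumptions, not a full d-separation implementation: the function name `is_blocked`, the edge representation, and the node names are all hypothetical, and the sketch deliberately ignores the additional rule that conditioning on a descendant of a collider can also open a path.

```python
def is_blocked(path, edges, conditioned=frozenset()):
    """Return True if `path` is blocked given the `conditioned` nodes.

    path        -- list of node names, in order along the path
    edges       -- set of (parent, child) pairs describing the DAG's arrows
    conditioned -- set of nodes we condition on
    Simplification: descendants of colliders are not considered.
    """
    for left, mid, right in zip(path, path[1:], path[2:]):
        # A middle node is a collider when both neighbours point into it.
        is_collider = (left, mid) in edges and (right, mid) in edges
        if is_collider and mid not in conditioned:
            return True   # an unconditioned collider blocks the path
        if not is_collider and mid in conditioned:
            return True   # a conditioned chain or fork node blocks the path
    return False

# The three examples from this lesson, written as edge sets.
chain = {("temperature", "activity"), ("activity", "health")}
fork = {("age", "baldness"), ("age", "covid_risk")}
collider = {("need", "grant"), ("standing", "grant")}

print(is_blocked(["temperature", "activity", "health"], chain))                # open
print(is_blocked(["temperature", "activity", "health"], chain, {"activity"}))  # blocked
print(is_blocked(["baldness", "age", "covid_risk"], fork, {"age"}))            # blocked
print(is_blocked(["need", "grant", "standing"], collider))                     # blocked
print(is_blocked(["need", "grant", "standing"], collider, {"grant"}))          # open
```

The function walks along the path three nodes at a time, so it also works for longer paths in which chains, forks, and colliders are mixed: the path is blocked as soon as any one middle node blocks it.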