
What do DAGs tell us?

As we saw in the last lesson, DAGs can be used to map variables and make our assumptions about causal relationships more explicit. In the rest of this module, we will see how this mapping helps us identify confounders and decide which variables need to be controlled for.

Let’s begin by learning how DAGs show dependence and conditional dependence between variables. For this, you’ll need to know what conditional probability and joint distributions are. If you can’t remember what these terms mean, we suggest you open your favorite basic probability book or just do a Wiki search to refresh your memory. Do that now, and we’ll meet you right back here.

Using DAGs to show conditional dependence

Alright, we know from probability that two variables $X_1$ and $X_2$ are independent if information about one doesn’t tell you anything about the other.

But let’s say a variable $X_1$ affects variable $Y$ through a third variable $X_2$. This is easy to show with a DAG: $X_2$ lies on the causal path from $X_1$ to $Y$.

In statistics, conditioning on a variable (or controlling for it) means that this variable is treated as being known to us (e.g. if the variable is binary, we know whether it’s 0 or 1). Because $X_2$ is the only direct cause of $Y$ in this DAG, the conditional distribution of $Y$ given $X_1$ and $X_2$ is equal to the conditional distribution of $Y$ given only $X_2$.

Therefore, in the example above, as long as we condition on $X_2$, we don’t need $X_1$ to get the conditional distribution of $Y$. In other words:

$$\Pr(Y|X_1, X_2) = \Pr(Y|X_2)$$

An interpretation of the mathematical statement above is that if we condition on $X_2$, then $Y$ and $X_1$ become independent.

Based on this DAG, we know:

  • $Y$ is directly affected by $X_2$ and only indirectly affected by $X_1$.
  • $\Pr(Y|X_1, X_2) = \Pr(Y|X_2)$
  • Also $\Pr(Y|X_1) \neq \Pr(Y)$, which means $Y$ and $X_1$ are marginally (unconditionally) dependent because they’re indirectly dependent on each other. They become independent if we condition on (control for) $X_2$.
  • Similarly, $\Pr(X_1|X_2, Y) = \Pr(X_1|X_2)$
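To make this conditional independence concrete, here is a small simulation of the chain $X_1 \to X_2 \to Y$ with binary variables. All the probabilities in the sketch are made-up numbers chosen purely for illustration:

```python
import random

random.seed(0)

# Chain X1 -> X2 -> Y with binary variables; all probabilities are made up.
def sample():
    x1 = random.random() < 0.5                   # Pr(X1 = 1) = 0.5
    x2 = random.random() < (0.8 if x1 else 0.2)  # Pr(X2 = 1 | X1)
    y = random.random() < (0.9 if x2 else 0.1)   # Pr(Y = 1 | X2): Y depends only on X2
    return x1, x2, y

draws = [sample() for _ in range(200_000)]

def pr_y(keep):
    """Estimate Pr(Y = 1) among the draws satisfying the condition `keep`."""
    ys = [y for x1, x2, y in draws if keep(x1, x2)]
    return sum(ys) / len(ys)

# Marginally, X1 is informative about Y:
marginal_gap = abs(pr_y(lambda x1, x2: x1) - pr_y(lambda x1, x2: not x1))
# But once we condition on X2 = 1, X1 adds nothing:
conditional_gap = abs(pr_y(lambda x1, x2: x2 and x1) - pr_y(lambda x1, x2: x2 and not x1))

print(f"ignoring X2: {marginal_gap:.3f}")   # large (0.48 in expectation)
print(f"given X2=1:  {conditional_gap:.3f}")  # close to 0
```

The gap while ignoring $X_2$ is large, and the gap after conditioning on $X_2$ shrinks to sampling noise, which is exactly what $\Pr(Y|X_1, X_2) = \Pr(Y|X_2)$ predicts.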

Using DAGs to decompose a joint distribution

Now consider the following DAG that is similar to the one we just saw but with an additional variable:

From probability we know that, in general, $\Pr(X_1, X_2, X_3, Y)$ represents the joint distribution of $X_1$, $X_2$, $X_3$, and $Y$. This DAG is telling us that:

  • $\Pr(X_3|X_1, X_2, Y) = \Pr(X_3)$ because $X_3$ is independent of all other variables.
  • $\Pr(Y|X_1, X_2, X_3) = \Pr(Y|X_2)$
  • Similarly, $\Pr(X_1|X_2, X_3, Y) = \Pr(X_1|X_2)$.

We can use DAGs to help us decompose a joint distribution (write a joint distribution as a product of conditional distributions). For this, we start with the roots (remember, a root is a node without any parents). Then we move from the roots down to their descendants, conditioning each node on its parents. We’ll use another DAG to see how to do this. This one is a bit more complicated:

First, this DAG tells us:

  • $\Pr(X_1|X_2, X_3, Y) = \Pr(X_1|X_2)$
  • $\Pr(X_2|X_1, X_3, Y) = \Pr(X_2|X_1, X_3)$
  • $\Pr(Y|X_1, X_2, X_3) = \Pr(Y|X_3)$

We can also find the joint distribution:

$$\Pr(X_1, X_2, X_3, Y) = \Pr(X_2) \Pr(X_1|X_2) \Pr(X_3|X_2) \Pr(Y|X_3)$$

This means we can decompose the joint distribution by starting from the root $X_2$.
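We can check this factorization numerically. The sketch below fills hypothetical conditional probability tables (all numbers invented for illustration) into the factorization for binary variables, confirms that it defines a valid distribution, and confirms that $\Pr(Y|X_1, X_2, X_3)$ depends only on $X_3$:

```python
from itertools import product

def bern(p, v):
    """Pr(V = v) for a binary V with Pr(V = 1) = p."""
    return p if v else 1 - p

# Hypothetical CPTs for the DAG with edges X2 -> X1, X2 -> X3, X3 -> Y.
def joint(x1, x2, x3, y):
    return (bern(0.6, x2)                   # Pr(X2), the root
            * bern(0.7 if x2 else 0.2, x1)  # Pr(X1 | X2)
            * bern(0.5 if x2 else 0.9, x3)  # Pr(X3 | X2)
            * bern(0.8 if x3 else 0.1, y))  # Pr(Y | X3)

# The factorization defines a valid distribution: it sums to 1.
total = sum(joint(*v) for v in product((0, 1), repeat=4))

def pr_y_given(x1, x2, x3):
    """Pr(Y = 1 | X1 = x1, X2 = x2, X3 = x3), read off the joint."""
    num = joint(x1, x2, x3, 1)
    return num / (num + joint(x1, x2, x3, 0))

# Varying x1 and x2 while holding x3 = 1 fixed never changes the answer:
same_x3 = {round(pr_y_given(x1, x2, 1), 12) for x1, x2 in product((0, 1), repeat=2)}
print(round(total, 12))  # 1.0
print(same_x3)           # a single value: {0.8}
```

However the invented numbers are chosen, the full conditional of $Y$ collapses to $\Pr(Y|X_3)$ because $X_3$ is $Y$’s only parent in this DAG.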

Practice makes perfect, so let’s look at another one. This one’s a bit more involved…

Mathematically, the DAG tells us:

  • $\Pr(Y|X_1, X_2, X_3, X_4) = \Pr(Y|X_4)$
  • $\Pr(X_1|X_2, X_3, X_4, Y) = \Pr(X_1|X_2, X_3)$
  • $\Pr(X_2|X_1, X_3, X_4, Y) = \Pr(X_2|X_1, X_3, X_4)$ (conditioning on the shared child $X_4$ makes $X_2$ and $X_3$ dependent, so $X_3$ cannot be dropped)
  • Similarly, $\Pr(X_3|X_1, X_2, X_4, Y) = \Pr(X_3|X_1, X_2, X_4)$
  • $\Pr(X_4|X_1, X_2, X_3, Y) = \Pr(X_4|X_2, X_3, Y)$

The joint distribution of this DAG is:

$$\Pr(X_1, X_2, X_3, X_4, Y) = \Pr(X_1) \Pr(X_2|X_1) \Pr(X_3|X_1) \Pr(X_4|X_2, X_3) \Pr(Y|X_4)$$
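The last bullet above is worth checking numerically: the full conditional of $X_4$ depends on its parents $X_2$ and $X_3$ and on its child $Y$, but not on $X_1$. The sketch below plugs hypothetical (made-up) probabilities into the factorization and confirms that flipping $x_1$ leaves $\Pr(X_4|\cdot)$ untouched:

```python
def bern(p, v):
    """Pr(V = v) for a binary V with Pr(V = 1) = p."""
    return p if v else 1 - p

# Hypothetical CPTs for Pr(X1) Pr(X2|X1) Pr(X3|X1) Pr(X4|X2,X3) Pr(Y|X4);
# all numbers are made up for illustration.
def joint(x1, x2, x3, x4, y):
    return (bern(0.5, x1)
            * bern(0.7 if x1 else 0.2, x2)
            * bern(0.6 if x1 else 0.3, x3)
            * bern(0.9 if (x2 and x3) else 0.4, x4)
            * bern(0.8 if x4 else 0.1, y))

def pr_x4_given(x1, x2, x3, y):
    """Pr(X4 = 1 | X1 = x1, X2 = x2, X3 = x3, Y = y), read off the joint."""
    num = joint(x1, x2, x3, 1, y)
    return num / (num + joint(x1, x2, x3, 0, y))

# Flipping x1 with everything else held fixed changes nothing,
# because the Pr(X1), Pr(X2|X1), Pr(X3|X1) factors cancel in the ratio:
a = pr_x4_given(0, 1, 0, 1)
b = pr_x4_given(1, 1, 0, 1)
print(round(a, 12), round(b, 12))  # equal
```

This is exactly what it means for $\{X_2, X_3, Y\}$ to screen $X_4$ off from the rest of the graph.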

Finally, let’s forget about all those symbolic variables and talk about a more realistic DAG. Consider the DAG below:

From this DAG we know that heart disease is directly caused by obesity and smoking and indirectly caused by (not) having a college degree.

Heart disease and college degree are marginally dependent. However, if we condition on smoking, heart disease and college degree become independent. What does this statement really mean?

The statement really means that if we control for smoking, we no longer need to control for college education. In other words, among individuals who smoke the same number of cigarettes a day, education no longer determines their chances of having heart disease.

The DAG above also tells us that obesity directly affects only heart disease, and that college degree directly affects only smoking.

Finally, the joint distribution derived from the DAG above is:

$$\Pr(\text{Obesity}, \text{College degree}, \text{Smoking}, \text{Heart disease}) = \Pr(\text{College degree}) \Pr(\text{Obesity}) \Pr(\text{Smoking}|\text{College degree}) \Pr(\text{Heart disease}|\text{Smoking}, \text{Obesity})$$

To decompose the joint distribution, we start from the two root nodes (college degree and obesity). Then we move from those two roots down to their descendants while conditioning on parents.
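Here is that decomposition in miniature. The probabilities below are invented purely for illustration; the point is that once the joint is built from the DAG’s factorization, the chance of heart disease given smoking status does not change with college degree:

```python
def bern(p, v):
    """Pr(V = v) for a binary V with Pr(V = 1) = p."""
    return p if v else 1 - p

# Joint built root-down from the DAG; all probabilities are made up.
def joint(college, obese, smokes, heart):
    p_heart = {(0, 0): 0.05, (0, 1): 0.15, (1, 0): 0.12, (1, 1): 0.30}[(obese, smokes)]
    return (bern(0.35, college)                       # Pr(College degree)
            * bern(0.30, obese)                       # Pr(Obesity)
            * bern(0.15 if college else 0.35, smokes)  # Pr(Smoking | College degree)
            * bern(p_heart, heart))                   # Pr(Heart disease | Smoking, Obesity)

def pr_heart_given(college, smokes):
    """Pr(Heart disease = 1 | College degree, Smoking), with obesity summed out."""
    num = sum(joint(college, ob, smokes, 1) for ob in (0, 1))
    den = sum(joint(college, ob, smokes, h) for ob in (0, 1) for h in (0, 1))
    return num / den

# Among smokers, a college degree carries no extra information about heart disease:
with_degree = round(pr_heart_given(1, 1), 12)
without_degree = round(pr_heart_given(0, 1), 12)
print(with_degree, without_degree)  # equal
```

Conditioning on smoking blocks the only path from college degree to heart disease, so the two conditional probabilities coincide no matter what numbers we plug in.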

To recap, a simple DAG tells us a lot about the relationships between variables and the directions of those relationships. Every DAG implies one and only one factorization of the joint distribution (although, in general, more than one DAG can be consistent with the same joint distribution).

Next Lesson

Chains, forks, and colliders
