Basics of causal graphs

The most important step in estimating a causal effect in an observational study is to satisfy the ignorability assumption by identifying relevant confounders.

We had a long discussion about confounders and how to define them. But we also said that the common-cause criteria are not the most accurate way to identify confounders. In this module, we will see how causal graphs can help us with that.

What are causal graphs?

So far, we’ve used mathematical notation and words to communicate causal relationships among variables. For example, we have said $X_1$ can affect $Y$ and $X_2$ can affect both $X_1$ and $Y$.

This simple sentence specifies the causal relationships among $X_1$, $X_2$, and $Y$. But when the number of variables is large, and especially when the relationships among them are complex, describing them in words quickly becomes unwieldy.

As an alternative, we use causal graphs, or what we usually call directed acyclic graphs (DAGs). DAGs provide a more intuitive language for communicating the dynamics of causality. The idea is very simple.

When we think a variable $X_1$ is a cause of the variable $Y$, we simply draw an arrow going from $X_1$ to $Y$.

These diagrams are called directed graphs because of the arrows. The arrows indicate the presence and the direction of a causal relationship.

They are called acyclic because there are no closed paths. We’ll see what this means in a minute, but first, we need to introduce a few more terms.

In the graph above, $X_1$ and $Y$ are called nodes. You can think of the nodes as representing the random variables we’ve been working with so far. The arrow that connects the nodes is called an edge or sometimes a link. Nodes that are connected by a single edge are called adjacent nodes. In the DAG above, $X_1$ and $Y$ are adjacent nodes.
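To make these terms concrete, here is a minimal Python sketch that stores the example from the start of this lesson ($X_2$ affects $X_1$ and $Y$, and $X_1$ affects $Y$) as a set of directed edges. The `(parent, child)` tuple encoding and the `adjacent` helper are our own illustrative choices, not a standard library API. Note that adjacency ignores the direction of the arrow.

```python
# A minimal sketch: a DAG stored as a set of directed edges (parent, child).
# This encoding is an illustrative choice, not a library API.
edges = {
    ("X2", "X1"),  # X2 can affect X1
    ("X2", "Y"),   # X2 can affect Y
    ("X1", "Y"),   # X1 can affect Y
}

# The nodes are every name that appears at either end of an edge.
nodes = {node for edge in edges for node in edge}

def adjacent(a, b, edges):
    """Hypothetical helper: True if a single edge joins a and b, in either direction."""
    return (a, b) in edges or (b, a) in edges

print(sorted(nodes))               # ['X1', 'X2', 'Y']
print(adjacent("X1", "Y", edges))  # True
print(adjacent("Y", "X1", edges))  # True: adjacency ignores the arrow's direction
```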

Most DAGs consist of more than two nodes and one edge. In such cases, we use the term path to describe a sequence of edges in which each edge shares a node with the next one in the sequence. The arrows along a path do not all have to point the same way.

We can further differentiate between direct and indirect paths. Consider the causal effect of video games on obesity. Playing video games does not directly lead to obesity (there is no medical evidence of a direct link). However, playing a lot of video games can lead to less exercise, which may then lead to obesity. In this scenario, the path between video games and obesity is an indirect path because it passes through a third node, “exercise.”

The path between the video game node and the exercise node is a direct path. There are no intermediate nodes between them.
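The distinction between direct and indirect paths can be sketched in a few lines of Python. We encode the video game example as directed edges (our own hypothetical encoding) and write a hypothetical `classify_path` helper that first checks that every consecutive pair of nodes is joined by an edge, then calls a one-edge path direct and a longer one indirect.

```python
# Hypothetical encoding of the video game example as (cause, effect) edges.
edges = {
    ("video games", "exercise"),  # more gaming -> less exercise
    ("exercise", "obesity"),      # less exercise -> obesity
}

def classify_path(path, edges):
    """Hypothetical helper: return 'direct' or 'indirect' for a list of nodes."""
    # Every consecutive pair must be joined by an edge (in either direction).
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges and (b, a) not in edges:
            raise ValueError(f"{a!r} and {b!r} are not joined by an edge")
    # One edge means no intermediate node: a direct path.
    return "direct" if len(path) == 2 else "indirect"

print(classify_path(["video games", "exercise"], edges))             # direct
print(classify_path(["video games", "exercise", "obesity"], edges))  # indirect
```

Asking for `["video games", "obesity"]` raises a `ValueError`, which mirrors the point above: there is no direct edge between those two nodes.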

Let’s take a look at another DAG.

To go from $X_1$ to $Y$, there are two paths. The two paths are:

  • $X_1 \longrightarrow X_2 \longrightarrow Y$
  • $X_1 \longrightarrow X_2 \longrightarrow X_3 \longleftarrow Y$

Note that the direction of edges can change along a path as is the case in the second path above.
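Because a path may move against an arrow, listing paths means treating the edges as undirected while searching and only remembering their direction for display. The Python sketch below assumes the DAG contains exactly the edges that appear in the two paths listed above (the figure itself is not reproduced here). The helpers `neighbours`, `all_paths`, and `render` are hypothetical names used for illustration. The search never revisits a node, so each result is a genuine path.

```python
# Assumption: the DAG has exactly the edges that appear in the two paths above.
edges = {("X1", "X2"), ("X2", "Y"), ("X2", "X3"), ("Y", "X3")}

def neighbours(node, edges):
    """Nodes joined to `node` by one edge, ignoring the arrow's direction."""
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}

def all_paths(start, end, edges, visited=None):
    """Depth-first enumeration of paths from start to end without repeated nodes."""
    visited = [start] if visited is None else visited
    if start == end:
        return [visited]
    found = []
    for nxt in sorted(neighbours(start, edges)):
        if nxt not in visited:
            found.extend(all_paths(nxt, end, edges, visited + [nxt]))
    return found

def render(path, edges):
    """Write a path with arrows showing each edge's actual direction."""
    out = path[0]
    for a, b in zip(path, path[1:]):
        out += f" -> {b}" if (a, b) in edges else f" <- {b}"
    return out

for p in all_paths("X1", "Y", edges):
    print(render(p, edges))
# X1 -> X2 -> X3 <- Y
# X1 -> X2 -> Y
```

The same function can be used to check your answer to the question that follows.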

In the example above, how many paths are there between $X_1$ and $X_3$?
0
1
2
3

We can now come back to that word, acyclic (the A in DAG). Look at this example. Can you find a path that follows the arrows from one node and ends at that very same node? The short answer is no! None of the paths in the DAG above that follow the direction of the arrows start and end at the same node.

Now, consider the diagram below.

Is this a DAG?

It’s not. Notice that as you follow the arrows starting from $X_1$, you eventually return right back to $X_1$: $X_1 \longrightarrow X_2 \longrightarrow Y \longrightarrow X_1$. A path that follows the arrows and starts and ends at the same node is called a closed path. A diagram that contains a closed path is not acyclic, and therefore it is not a DAG.
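Checking for closed paths by eye gets hard in larger diagrams. One standard way to automate it is Kahn's algorithm: repeatedly remove nodes that have no incoming arrows; if every node can be removed, the graph has no closed path. The sketch below is our own Python implementation (`is_dag` is a hypothetical name). For the cyclic example we assume the diagram contains only the three edges of the closed path given above, and for the acyclic example we assume the two-path DAG from earlier has exactly the edges in its listed paths.

```python
from collections import deque

def is_dag(edges):
    """Kahn's algorithm: True if the directed graph has no closed path."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}      # number of incoming arrows per node
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Start from nodes with no incoming arrows.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        # Removing a node deletes its outgoing arrows.
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Nodes on a closed path never reach indegree 0, so they are never removed.
    return removed == len(nodes)

cyclic = {("X1", "X2"), ("X2", "Y"), ("Y", "X1")}  # X1 -> X2 -> Y -> X1
acyclic = {("X1", "X2"), ("X2", "Y"), ("X2", "X3"), ("Y", "X3")}
print(is_dag(cyclic))   # False
print(is_dag(acyclic))  # True
```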

Now, one last set of terms for the road…

Consider the DAG we saw previously:

In this DAG:

  • $X_1$ is said to be $X_2$’s parent and $X_2$ is said to be the child of $X_1$. By the same token, $X_3$ is a child of $X_2$, and $X_2$ is the parent of $X_3$. A node can have more than one child and multiple parents. For instance, $Y$ has two parents: $X_2$ and $X_3$.
  • $Y$ is a descendant of $X_1$ and $X_1$ is an ancestor of $Y$.
  • $X_1$ is a root. Roots are nodes with no parents.
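These family-tree relationships translate directly into simple graph queries. The Python sketch below assumes the DAG contains exactly the edges named in the bullets above ($X_1 \to X_2$, $X_2 \to X_3$, $X_2 \to Y$, $X_3 \to Y$). The helper names are hypothetical. Descendants are found by following arrows forward, and ancestors by following them backward.

```python
# Assumption: exactly the edges named in the bullets above, as (parent, child).
edges = {("X1", "X2"), ("X2", "X3"), ("X2", "Y"), ("X3", "Y")}

def parents(node, edges):
    return {p for p, c in edges if c == node}

def children(node, edges):
    return {c for p, c in edges if p == node}

def descendants(node, edges):
    """Every node reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        for child in children(stack.pop(), edges):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def ancestors(node, edges):
    """Every node from which `node` can be reached by following arrows."""
    found, stack = set(), [node]
    while stack:
        for parent in parents(stack.pop(), edges):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def roots(edges):
    """Nodes with no parents."""
    nodes = {n for edge in edges for n in edge}
    return {n for n in nodes if not parents(n, edges)}

print(sorted(parents("Y", edges)))       # ['X2', 'X3']
print(sorted(descendants("X1", edges)))  # ['X2', 'X3', 'Y']
print(sorted(ancestors("Y", edges)))     # ['X1', 'X2', 'X3']
print(roots(edges))                      # {'X1'}
```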

DAGs are a bit like family trees 🌳
