The most important step in estimating a causal effect in an observational study is to satisfy the ignorability assumption by identifying relevant confounders.
We had a long discussion about confounders and how to define them, but we also noted that the common-cause criterion is not the most accurate way to identify them. In this module, we will see how causal graphs can help us with that.
So far, we’ve used mathematical notation and words to communicate causal relationships among variables. For example, we have said can affect and can affect both and .
This simple sentence describes the causal relationships among , , and . When there are many variables, and especially when the relationships among them are complex, describing those relationships in words quickly becomes hard to follow.
As an alternative, we use causal graphs, or what we usually call directed acyclic graphs (DAGs). DAGs provide a more intuitive language for communicating the dynamics of causality. The idea is very simple.
When we think a variable is the cause of the variable , we simply draw an arrow going from to .
These diagrams are called directed graphs because of the arrows. The arrows indicate the presence and the direction of a causal relationship.
They are called acyclic because there are no closed paths. We’ll see what this means in a minute, but first, we need to introduce a few more terms.
In the graph above, and are called nodes. You can think of the nodes as representing the random variables we've been working with so far. The arrow that connects the nodes is called an edge, or sometimes a link. Nodes that are connected by a single edge are called adjacent nodes. In the DAG above, and are adjacent nodes.
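To make these terms concrete, here is a minimal sketch that stores a DAG as a Python dictionary mapping each node to the nodes its arrows point to. The dictionary representation and the node names "A" and "B" are our own illustrative assumptions, not part of the module; the helpers simply list the nodes and edges and check whether two nodes are adjacent.

```python
# Hypothetical two-node DAG: one edge (arrow) from A to B.
# Each key is a node; its list holds the nodes its outgoing edges point to.
dag = {
    "A": ["B"],  # arrow A -> B
    "B": [],     # B has no outgoing edges
}

def nodes(graph):
    """Return every node in the graph, sorted for stable output."""
    found = set(graph)
    for children in graph.values():
        found.update(children)
    return sorted(found)

def edges(graph):
    """Return every edge as a (tail, head) pair, meaning tail -> head."""
    return [(u, v) for u, children in graph.items() for v in children]

def adjacent(graph, u, v):
    """Two nodes are adjacent if a single edge connects them, in either direction."""
    return v in graph.get(u, []) or u in graph.get(v, [])

print(nodes(dag))               # ['A', 'B']
print(edges(dag))               # [('A', 'B')]
print(adjacent(dag, "A", "B"))  # True
```

Adjacency ignores the direction of the arrow: "B" is adjacent to "A" just as "A" is adjacent to "B".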
Most DAGs consist of more than two nodes and one edge. In cases such as these, we use the term path to describe a sequence of edges, where a node that ends one edge begins the next edge in the sequence.
We can further differentiate between direct and indirect paths. Consider the causal effect of video games on obesity. Playing video games does not directly lead to obesity (there's no medical evidence of a direct link). However, playing a lot of video games can lead to less exercise, which may in turn lead to obesity. In this scenario, the path between video games and obesity is an indirect path: it passes through a third node, “exercise.”
The path between the video game node and the exercise node is a direct path. There are no intermediate nodes between them.
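The video-game example can be sketched in code. Assuming the same dictionary representation as a simple convention of our own (the node labels are illustrative), the function below finds every path that follows the arrows and labels it direct when it has no intermediate nodes and indirect otherwise.

```python
# The video-game example as a DAG. An arrow means "affects":
# playing video games affects exercise, and exercise affects obesity.
dag = {
    "video games": ["exercise"],
    "exercise": ["obesity"],
    "obesity": [],
}

def directed_paths(graph, start, end, path=None):
    """Yield every path from start to end that follows the arrows."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for child in graph.get(start, []):
        if child not in path:  # never revisit a node
            yield from directed_paths(graph, child, end, path)

def describe(path):
    """A path with no intermediate nodes is direct; otherwise it is indirect."""
    kind = "direct" if len(path) == 2 else "indirect"
    return kind + ": " + " -> ".join(path)

for p in directed_paths(dag, "video games", "obesity"):
    print(describe(p))  # indirect: video games -> exercise -> obesity

for p in directed_paths(dag, "video games", "exercise"):
    print(describe(p))  # direct: video games -> exercise
```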
Let’s take a look at another DAG.
To go from to , there are two paths:
Note that the direction of the edges can change along a path, as it does in the second path above.
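Because the DAG in the figure is not reproduced here, the following sketch uses a hypothetical three-node DAG (A -> B, C -> A, C -> B) to show how a path can travel against the direction of an edge. The traversal ignores direction when moving between adjacent nodes, then renders each step with the arrow as it actually points; the graph and helper names are assumptions for illustration only.

```python
# Hypothetical DAG (not the one in the figure): A -> B, C -> A, C -> B.
dag = {"A": ["B"], "B": [], "C": ["A", "B"]}

def neighbours(graph, node):
    """Yield every node adjacent to node, ignoring edge direction."""
    yield from graph.get(node, [])  # nodes this node points to
    for other, children in graph.items():
        if node in children:        # nodes that point to this node
            yield other

def all_paths(graph, start, end, path=None):
    """Yield every path from start to end that never repeats a node.
    Edges may be traversed against their direction."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in neighbours(graph, start):
        if nxt not in path:
            yield from all_paths(graph, nxt, end, path)

def render(graph, path):
    """Show each step with the arrow pointing the way the edge actually points."""
    parts = [path[0]]
    for u, v in zip(path, path[1:]):
        arrow = "->" if v in graph.get(u, []) else "<-"
        parts.append(f" {arrow} {v}")
    return "".join(parts)

for p in all_paths(dag, "A", "B"):
    print(render(dag, p))
# A -> B
# A <- C -> B
```

In the second path, the first edge is traversed backward (C points to A), which is exactly the change of direction described above.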
We can now come back to that word, acyclic (the A in DAG). Look at this example. Can you find a path that starts at one node, follows the direction of the arrows, and ends at that very same node? The short answer is no! None of the paths in the DAG above that follow the arrows start and end at the same node.
Now, consider the diagram below.
Is this a DAG?
It’s not. Notice that as you follow the arrows starting from , you eventually return right back to . Paths that start and end at the same node are called closed paths. A diagram that contains a closed path is not acyclic, and therefore it is not a DAG.
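Checking for closed paths by eye gets hard as graphs grow. One standard way to automate it, shown here as a sketch rather than the module's own method, is Kahn's algorithm: repeatedly remove nodes that have no incoming edges; if every node can be removed, the graph has no closed path and is a DAG. The two example graphs use hypothetical node names.

```python
from collections import deque

def is_dag(graph):
    """Return True if the directed graph has no closed path (cycle).

    Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    Nodes on a closed path never reach zero incoming edges, so they
    are never removed."""
    nodes = set(graph)
    for children in graph.values():
        nodes.update(children)

    # Count incoming edges for every node.
    in_degree = {n: 0 for n in nodes}
    for children in graph.values():
        for child in children:
            in_degree[child] += 1

    queue = deque(n for n in nodes if in_degree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in graph.get(node, []):
            in_degree[child] -= 1  # "delete" the edge node -> child
            if in_degree[child] == 0:
                queue.append(child)
    return removed == len(nodes)

acyclic = {"A": ["B"], "B": [], "C": ["A", "B"]}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}  # A -> B -> C -> A

print(is_dag(acyclic))  # True
print(is_dag(cyclic))   # False
```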
Now, one last set of terms for the road…
Consider the DAG we saw previously:
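In this DAG:
Here, is the parent of , and is said to be the child of . By the same token, is a child of , and is the parent of . A node can have more than one child and more than one parent. For instance, has two parents: and .
Likewise, is a descendant of , and is an ancestor of . A node's descendants are all the nodes you can reach from it by following the arrows forward; its ancestors are all the nodes from which it can be reached.
Finally, is a root. Roots are nodes with no parents.
DAGs are a bit like family trees 🌳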
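The family-tree vocabulary maps directly onto simple graph queries. The sketch below assumes a hypothetical four-node DAG (A -> B, A -> C, B -> D, C -> D) stored as a dictionary; it is an illustration of the definitions, not the DAG from the figure.

```python
# Hypothetical DAG: A -> B, A -> C, B -> D, C -> D
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

def children(graph, node):
    """Nodes that node points to directly."""
    return sorted(graph.get(node, []))

def parents(graph, node):
    """Nodes that point directly to node."""
    return sorted(p for p, kids in graph.items() if node in kids)

def descendants(graph, node):
    """All nodes reachable from node by following arrows forward."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return sorted(seen)

def ancestors(graph, node):
    """All nodes from which node can be reached by following arrows."""
    return sorted(n for n in graph if node in descendants(graph, n))

def roots(graph):
    """Nodes with no parents."""
    return sorted(n for n in graph if not parents(graph, n))

print(parents(dag, "D"))      # ['B', 'C']  (D has two parents)
print(children(dag, "A"))     # ['B', 'C']
print(descendants(dag, "A"))  # ['B', 'C', 'D']
print(ancestors(dag, "D"))    # ['A', 'B', 'C']
print(roots(dag))             # ['A']
```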