
A difference-in-differences example

An empirical example in difference-in-differences

Let’s see the Card-Krueger example in action. We’re going to use Card and Krueger’s data and apply the two different DiD approaches. The data is slightly modified, so it may not return the exact numbers reported in Card and Krueger’s paper.

We’ve cleaned up the data already. The variable state indicates treatment: it equals 1 for New Jersey and 0 for Pennsylvania. The variable post indicates time: it equals 1 for the post-treatment period and 0 otherwise. Finally, the variable employment shows full-time employment at each restaurant. Let’s import the data first:

# The read.csv() function can read csv data from the web
minwage <- read.csv("https://bit.ly/minwage_card_krueger")
# Python codes will be added soon
* Let's load the data from the web link.
import delimited https://bit.ly/minwage_card_krueger

Now, we can find the DiD estimator by taking the differences. This is the simplest method:

# The pipe and grouping functions below come from dplyr
library(dplyr)

# We first create a dataframe that includes all the means
# for NJ and PA pre- and post-treatment
means <- minwage %>%
  group_by(post, state) %>%
  summarize(emp = mean(employment))

# Pre- and post-treatment difference for NJ
diff_NJ <- means$emp[4] - means$emp[2]

# Pre- and post-treatment difference for PA
diff_PA <- means$emp[3] - means$emp[1]

# Difference in differences estimate
did <- diff_NJ - diff_PA
# Python codes will be added soon
* Let's first calculate the average pre-treatment employment for PA
qui sum employment if state == 0 & post == 0
gen mean_PA_pre = r(mean)

* Then we calculate the average post-treatment employment for PA
qui sum employment if state == 0 & post == 1
gen mean_PA_post = r(mean)

* Now we calculate the average pre-treatment employment for NJ
qui sum employment if state == 1 & post == 0
gen mean_NJ_pre = r(mean)

* Then we calculate the average post-treatment employment for NJ
qui sum employment if state == 1 & post == 1
gen mean_NJ_post = r(mean)

* Pre- and post-treatment difference for PA
gen diff_PA = mean_PA_post - mean_PA_pre

* Pre- and post-treatment difference for NJ
gen diff_NJ = mean_NJ_post - mean_NJ_pre

* Difference in differences estimate
dis diff_NJ - diff_PA
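Since the Python version is still marked as forthcoming, here is a minimal pandas sketch of the same mean-difference calculation. The numbers below are a hypothetical toy dataset, not the actual Card-Krueger data; only the variable names (state, post, employment) mirror the dataset described above:

```python
import pandas as pd

# Hypothetical toy data: state (1 = NJ, 0 = PA), post (1 = post-treatment)
minwage = pd.DataFrame({
    "state":      [0, 0, 0, 0, 1, 1, 1, 1],
    "post":       [0, 0, 1, 1, 0, 0, 1, 1],
    "employment": [23.0, 24.0, 21.0, 22.0, 20.0, 21.0, 21.0, 22.0],
})

# Mean employment in each (post, state) cell
means = minwage.groupby(["post", "state"])["employment"].mean()

# Pre- and post-treatment difference for NJ
diff_NJ = means[(1, 1)] - means[(0, 1)]

# Pre- and post-treatment difference for PA
diff_PA = means[(1, 0)] - means[(0, 0)]

# Difference in differences estimate
did = diff_NJ - diff_PA
print(did)  # 3.0 for this toy data
```

With the real dataset loaded in place of the toy DataFrame, the same four lines of arithmetic reproduce the 2.91 estimate discussed below.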

The estimated effect is 2.91, which is a little different from the number reported in the paper. Again, this is only because we cleaned the data a bit differently.

In the second approach, we use a regression equation to find the DiD estimator. The coefficient we’re interested in is the one on the interaction term between the treatment and time variables. Here’s how we do it:

# By stating post * state, R automatically includes variables post,
# state, and the interaction of the two.
ols_model <- lm(employment ~ post * state, data = minwage)
summary(ols_model)
# Python codes will be added soon
* By stating post##state, Stata automatically includes variables post,
* state, and the interaction of the two.
reg employment post##state
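The Python version is again marked as forthcoming; as a stand-in, here is a NumPy sketch of the same regression on the same hypothetical toy data as before. Because the model is saturated (intercept, post, state, and their interaction), the coefficient on the interaction term mechanically equals the difference-in-differences of the four group means:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same variable names as the minwage dataset
minwage = pd.DataFrame({
    "state":      [0, 0, 0, 0, 1, 1, 1, 1],
    "post":       [0, 0, 1, 1, 0, 0, 1, 1],
    "employment": [23.0, 24.0, 21.0, 22.0, 20.0, 21.0, 21.0, 22.0],
})

# Design matrix for: employment ~ post + state + post:state
X = np.column_stack([
    np.ones(len(minwage)),               # intercept
    minwage["post"],                     # time dummy
    minwage["state"],                    # treatment dummy
    minwage["post"] * minwage["state"],  # interaction: the DiD estimator
])
y = minwage["employment"].to_numpy()

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did = beta[3]  # coefficient on the interaction term
print(round(did, 6))
```

On the real data this interaction coefficient would be 2.91, matching the mean-difference calculation above; a formula interface such as statsmodels' `ols("employment ~ post * state", ...)` is the more direct analogue of R's `lm()`, but the hand-built design matrix makes the interaction term explicit.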

You’ll notice that the coefficient on the interaction term is the same as the one we found earlier. It’s 2.91.

One advantage of the regression approach to DiD, as opposed to simply calculating the mean differences, is that we can include observed confounders in the model, helping ensure the estimated treatment effect reflects the causal effect rather than those factors.

As you can see, DiD is simple and straightforward with respect to the calculations involved. The Card and Krueger paper is a seminal paper written by world-renowned economists, but as this example shows, the estimation of the causal effect is rather easy. Identifying the data and defending all of the assumptions is the difficult part.

This example had only one period before and one period after the policy change (a minimum wage increase), but we can also use DiD in cases where there are multiple periods before or after the treatment.

As we saw, the most important assumption underlying DiD is the parallel-trends assumption.

The parallel-trends assumption isn’t something new. This assumption and the assumption used in the synthetic control method are two different takes on the ignorability assumption. In synthetic control, we assumed that the potential outcome under no treatment for both groups (treated and control) is the same conditional on observed covariates and past outcomes. Similarly, the parallel-trends assumption says that, in the absence of treatment, the potential outcomes of the treated and control units change to the same degree over time. In the short section that follows, let’s discuss the parallel-trends assumption in more detail.

The most important caveat of the assumption is that it’s generally untestable because we can’t observe the treated unit under no treatment (remember the fundamental problem of causal inference).

However, researchers usually use several pre-treatment time points (if available in the data) to show that the treated and control units follow parallel trends pre-treatment, lending support to parallel trends post-treatment. The argument is that if pre-treatment trends are parallel for a reasonable length of time, there’s less reason to believe they’d be non-parallel post-treatment. Note that this can’t be checked in a dataset with only one pre-treatment period.

In the Card-Krueger minimum-wage setting, we can perform this check because employment figures are available for many time points before the treatment year. Card and Krueger wrote another paper testing the parallel-trends assumption and found little visual support for it over 1991 to 1997. If that’s the case, one can argue that the DiD treatment effect estimated in the original paper could also be due, wholly or in part, to time-varying factors other than the minimum wage hike.

Similar to the synthetic control method, we can also perform a placebo test to test the parallel-trends assumption. Placebo testing in the context of DiD is similar to the in-time placebo test used in synthetic controls. You repeat the DiD estimation using only pre-treatment data with the expectation that you will not find a significant treatment effect.

An alternative approach to placebo testing in DiD is to use data from multiple points in time. In this case, the DiD regression would include interaction terms between each pre-treatment point in time and the treatment variable. The interaction terms between the treatment and the pre-treatment points in time should be insignificant.
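The multi-period check above can be sketched in a few lines of NumPy. The data here are hypothetical and noiseless, constructed so that the two groups follow exactly parallel trends before a treatment effect of 2 appears in the final period; as a result the pre-treatment interaction coefficients come out exactly zero:

```python
import numpy as np

# Hypothetical panel: periods 0-2 are pre-treatment; the treatment
# takes effect in period 3 with a true effect of 2.
periods = np.array([0, 1, 2, 3, 0, 1, 2, 3])
treat   = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y       = np.array([10., 11., 12., 13.,   # control group: trend of +1/period
                    12., 13., 14., 17.])  # treated: same trend, +2 in t = 3

# Period dummies (period 0 is the baseline) and their interactions with treat
cols = [np.ones(8), treat]
for t in (1, 2, 3):
    d = (periods == t).astype(float)
    cols += [d, d * treat]
X = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# Interaction coefficients sit in columns 3, 5, 7 (t = 1, 2, t = 3)
pre_trend_1, pre_trend_2, effect = beta[3], beta[5], beta[7]
print(pre_trend_1, pre_trend_2, effect)
```

In real data the pre-treatment interactions would be noisy rather than exactly zero, and the check is whether they are statistically indistinguishable from zero.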

Another, relatively minor, issue with the assumption is that it is scale-dependent. For instance, the assumption may hold for the original outcome variable but not for a non-linear transformation of it, such as its logarithm.

DiD vs. lagged regression vs. synthetic control

If you think about it, the difference-in-differences method is just a special case of a lagged regression. In a lagged regression we control for pre-treatment outcomes and covariates. In DiD, we restrict the coefficient on the lagged dependent variable to be 1.
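To make that restriction concrete, here is a small Python sketch on hypothetical data. Fixing the coefficient on the lagged outcome at 1 turns the lagged regression into a regression of the change score on treatment, and the treatment coefficient in that regression is exactly the DiD estimate:

```python
import pandas as pd

# Hypothetical wide-format data: one row per unit, pre and post outcomes
df = pd.DataFrame({
    "treat":  [0, 0, 0, 0, 1, 1, 1, 1],
    "y_pre":  [23., 24., 21., 22., 20., 21., 19., 20.],
    "y_post": [21., 22., 20., 21., 21., 22., 20., 21.],
})

# Restricting the lagged-outcome coefficient to 1 means modeling
# y_post - 1 * y_pre, i.e. the change score
df["change"] = df["y_post"] - df["y_pre"]

# Regressing the change on a binary treatment reduces to a difference
# in mean changes, which is the DiD estimate
did = (df.loc[df.treat == 1, "change"].mean()
       - df.loc[df.treat == 0, "change"].mean())
print(did)
```

An unrestricted lagged regression would instead fit y_post on both y_pre and treat, letting the data determine the coefficient on the lag.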

O’Neill et al. use simulations in their paper to show that when the parallel-trends assumption holds, DiD is generally less biased than lagged regression or synthetic control. However, when the assumption fails, DiD is more biased than lagged regression and synthetic control.

One advantage of the synthetic control method over DiD is that in DiD there is a bit of ambiguity in the choice of comparison (donor) units, whereas this process is transparent and data-driven in the synthetic control method.
