Causal Mediation Workshop

I have developed a 3-to-5-day workshop on causal mediation analysis with longitudinal data. This workshop was taught at the University of Helsinki, Finland, in 2018 and again in 2019. The master’s students, PhD students, and postdocs taking the course (mainly sociologists and demographers, but also a few epidemiologists and psychologists) rated the course very highly. Students especially praised the clear course material (e.g. annotated R code) and the teaching style. Please contact me if you are interested.


Course description

Mediation analysis refers to the techniques used to investigate the causal mechanisms or pathways by which a determinant affects an outcome. For example, to what extent does education affect a woman’s risk of childbirth directly, and to what extent indirectly by first affecting her employment career? To answer such a question, we take concepts and methods from the counterfactual causal inference framework (also known as the potential outcomes framework), which is common in biostatistics and epidemiology and is now becoming more popular in the social sciences.
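As a rough sketch of how the direct/indirect question can be formalized in this framework, consider one common decomposition, the natural direct and indirect effects. The notation is an assumption made for illustration: a binary exposure X (e.g. higher versus lower education), a mediator M (employment), and an outcome Y (childbirth), with Y(x, m) the outcome that would be observed under exposure x and mediator value m, and M(x) the mediator value that would be observed under exposure x.

```latex
\begin{align*}
\text{Total effect:} \quad
  TE  &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\text{Natural direct effect:} \quad
  NDE &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\text{Natural indirect effect:} \quad
  NIE &= E[Y(1, M(1))] - E[Y(1, M(0))] \\
\text{Decomposition:} \quad
  TE  &= NDE + NIE
\end{align*}
```

The direct effect changes the exposure while holding the mediator at the value it would take without exposure; the indirect effect holds the exposure fixed and changes only the mediator.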

The course starts with an introduction to Directed Acyclic Graphs (DAGs), distinguishing mediation, confounding (including confounders that are also mediators), and collider bias. It then covers the various concepts of effect decomposition that exist for mediation, such as natural direct and indirect effects and controlled direct effects. Afterwards, we discuss methods for estimating these quantities, including a brief comparison with more traditional approaches (e.g. KHB, Baron & Kenny, Oaxaca-Blinder). We will look at mediation in both cross-sectional and longitudinal settings, and at settings with multiple mediators.

The course will not focus on analytical solutions for mediation. Instead, we will focus on estimation methods that combine empirical data with simulation; these provide both a more intuitive understanding of the process under investigation and a more flexible approach to mediation. This approach offers a general solution for many mediation settings, including settings with mixed variable types, i.e. where variables can be continuous, categorical, count, and/or binary. To follow this approach, participants need basic knowledge of the R programming language. References will also be made to various Stata packages that offer similar (but less flexible) solutions. The course ends with mediation analysis in dynamic longitudinal settings using the parametric g-formula.
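To give a flavour of what estimation "in conjunction with simulation" can look like, here is a minimal sketch in Python (not the course's R material). Everything in it is an assumption made for illustration: the toy data-generating process, the function names, a randomized binary exposure with no confounding, and saturated models estimated as stratified means in place of the regression models one would normally fit. It fits a mediator model and an outcome model, then simulates the mediator under each exposure level to approximate the natural direct effect (NDE), natural indirect effect (NIE), and total effect (TE).

```python
import random
from statistics import mean


def simulate_data(n, rng):
    """Toy data: binary exposure X, binary mediator M, continuous outcome Y.

    The coefficients are invented for illustration only.
    """
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0            # randomized exposure
        m = 1 if rng.random() < 0.2 + 0.5 * x else 0  # mediator depends on X
        y = 1.0 * x + 2.0 * m + rng.gauss(0.0, 1.0)   # outcome depends on X and M
        rows.append((x, m, y))
    return rows


def fit_models(rows):
    """Fit saturated mediator and outcome models as stratified means."""
    # Mediator model: estimated P(M = 1 | X = x)
    p_m = {x: mean(m for xi, m, _ in rows if xi == x) for x in (0, 1)}
    # Outcome model: estimated E[Y | X = x, M = m]
    mu_y = {
        (x, m): mean(y for xi, mi, y in rows if xi == x and mi == m)
        for x in (0, 1)
        for m in (0, 1)
    }
    return p_m, mu_y


def mean_outcome(x_outcome, x_mediator, p_m, mu_y, n_draws, rng):
    """Monte Carlo approximation of E[Y(x_outcome, M(x_mediator))]."""
    total = 0.0
    for _ in range(n_draws):
        # Draw the mediator as it would be under exposure x_mediator ...
        m = 1 if rng.random() < p_m[x_mediator] else 0
        # ... and plug it into the outcome model under exposure x_outcome.
        total += mu_y[(x_outcome, m)]
    return total / n_draws


def estimate_effects(n=20000, n_draws=20000, seed=1):
    """Simulate toy data, fit the models, and decompose the total effect."""
    rng = random.Random(seed)
    p_m, mu_y = fit_models(simulate_data(n, rng))
    y00 = mean_outcome(0, 0, p_m, mu_y, n_draws, rng)  # E[Y(0, M(0))]
    y10 = mean_outcome(1, 0, p_m, mu_y, n_draws, rng)  # E[Y(1, M(0))]
    y11 = mean_outcome(1, 1, p_m, mu_y, n_draws, rng)  # E[Y(1, M(1))]
    return {"nde": y10 - y00, "nie": y11 - y10, "te": y11 - y00}


if __name__ == "__main__":
    print(estimate_effects())
```

In this toy setup the true NDE is 1.0 and the true NIE is 2.0 × 0.5 = 1.0 by construction, so the estimates should land close to those values. With real data, the stratified means would be replaced by regression models suited to each variable type (e.g. logistic, Poisson, or linear) and would include confounders; the simulation step stays the same.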

Simple Mediation

A simple mediation DAG showing exposure X, mediator M, and outcome Y.


Dynamic Mediation

A DAG showing a more complex set of cross-lagged relationships between time-dependent variables. When relationships are non-linear (e.g. logistic or Poisson), the parametric g-formula can be used to model this system.
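For orientation, the g-formula for such a system can be sketched as follows. The notation is an assumption introduced here and does not appear in the figure: A_t denotes the variables being intervened on at time t, L_t the other time-dependent variables, overbars denote histories up to the indexed time, and Y is the outcome measured after the final time T. The identity relies on the usual assumptions of consistency, positivity, and no unmeasured confounding at each time point.

```latex
\[
E\!\left[Y^{\bar a}\right]
  = \sum_{\bar l}
    E\!\left[Y \mid \bar A_T = \bar a_T,\ \bar L_T = \bar l_T\right]
    \prod_{t=0}^{T}
    P\!\left(L_t = l_t \mid \bar L_{t-1} = \bar l_{t-1},\ \bar A_{t-1} = \bar a_{t-1}\right)
\]
```

In the parametric version, each conditional expectation and probability is replaced by a fitted regression model (e.g. logistic or Poisson), and the sum over all covariate histories is approximated by simulating the system forward in time under the chosen intervention.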