
Core Concept: Forbidden Projection - A Critical Pitfall in Covariate Adjustment

  • Writer: Maria Alice Maia
  • Nov 25, 2024
  • 2 min read

To measure the ROI of your new leadership training, you controlled for employee morale. It seems smart. It seems rigorous.


But what if I told you that by adding that one "control," you likely made the program look like a failure?


This is one of the most dangerous and counter-intuitive pitfalls in data analysis. We are so focused on the fear of omitted variable bias that we fall into the opposite trap: controlling for a consequence of our own intervention.


In the language of causal inference, this error is so critical that it has a name worthy of its danger: adjusting for a "forbidden" variable.


The idea is simple but profound. When we want to measure the total causal effect of a Treatment (A) on an Outcome (Y), we must never control for a variable that lies on the causal path between them.


Let's use a real-world HR Department example.


The Wrong Way (Controlling for a Consequence): An HR team wants to know if their new leadership training (Treatment) increases team productivity (Outcome). The analyst wisely recognizes that many factors could confound this, but they also think, "Higher morale also improves productivity, so I should control for that to get a clean estimate." They measure morale after the training and include it in their regression model.


The result? The model shows the training has almost no effect on productivity. The program is deemed a failure.


This is a catastrophic misinterpretation. The training program itself likely caused an increase in employee morale, which in turn helped boost productivity. The causal chain is:

Training → Improved Morale → Increased Productivity


"Morale" is a mediator: one of the key channels through which the program works. By "controlling" for morale, the analyst statistically blocked this channel. They essentially asked the model, "What is the effect of the training, holding its positive impact on morale constant?" In other words, they forbade a key part of the program's success from entering the analysis.


The Right Way (Respecting the Causal Path): The right way is to understand your causal assumptions before you run the model. By drawing a simple causal map (a DAG), you would immediately see that morale is a descendant of the training program. It is a "forbidden" variable to control for when estimating the total effect.
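Drawing the DAG makes the rule mechanical: anything downstream of the treatment is off-limits when estimating the total effect. Here is a minimal sketch of that check using networkx, with a hypothetical DAG encoding the HR example (the edge names are illustrative, not from any real dataset):

```python
import networkx as nx

# Hypothetical causal map for the HR example:
# Training raises Morale, and both Training and Morale affect Productivity.
dag = nx.DiGraph([
    ("Training", "Morale"),
    ("Morale", "Productivity"),
    ("Training", "Productivity"),
])

# Every descendant of the treatment is a "forbidden" control
# when the goal is the total effect of Training on Productivity.
forbidden = nx.descendants(dag, "Training")
print(forbidden)  # includes 'Morale' and 'Productivity'
```

Running this check before fitting a model would flag Morale immediately, since it sits on a directed path out of Training.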


The correct analysis would measure the effect of Training on Productivity without including post-treatment Morale in the model. This allows you to capture the full effect of the program, including all the downstream pathways it influences.
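The attenuation is easy to demonstrate with simulated data. The sketch below assumes a simple hypothetical data-generating process that follows the causal chain above (the coefficients 2.0 and 1.5 are made up for illustration), then fits two OLS regressions: one without morale (correct) and one that "controls" for it (wrong):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical data-generating process: Training -> Morale -> Productivity
training = rng.integers(0, 2, n)                   # randomly assigned treatment
morale = 2.0 * training + rng.normal(0, 1, n)      # mediator, raised by training
productivity = 1.5 * morale + rng.normal(0, 1, n)  # outcome, driven via morale
# True total effect of training = 2.0 * 1.5 = 3.0

def ols_coef(y, *regressors):
    """OLS coefficients via least squares; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Right way: regress productivity on training only -> recovers ~3.0
total = ols_coef(productivity, training)[1]

# Wrong way: adding the post-treatment mediator blocks the channel -> ~0.0
blocked = ols_coef(productivity, training, morale)[1]

print(f"Total effect (correct):  {total:.2f}")
print(f"After adjusting (wrong): {blocked:.2f}")
```

With morale in the model, the training coefficient collapses toward zero even though the program's true total effect is large, which is exactly the misinterpretation described above.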


When I led the People Analytics function at Ambev, this was a constant point of discipline. We had to be rigorous in measuring the total impact of our initiatives, not accidentally erase their benefits by controlling for the very mechanisms that made them successful.


My mission is to bring this level of strategic, causal thinking out of dense academic papers and onto the whiteboards of every business leader and data analyst. This knowledge isn't mine to keep.


If you’re ready to stop making these subtle but costly mistakes and want to join a movement dedicated to a more rigorous application of data science, subscribe to my email list.


Stay Ahead of the Curve

Leave your e-mail to receive our weekly newsletter and access Ask-Me-Anything sessions exclusive to our subscribers.

If you prefer to discuss a specific, real-world challenge, schedule a 20-minute consultation call with Maria Alice or one of her business partners.

Looking for Insights on a Specific Topic?

You can navigate between categories on the top of the page, go to the Insights page to see all articles and navigate across all pages, or use the box below to look for your topic of interest.
