The Invisible Trap: Why Your HR Data is Lying to You About "Success" (and How to See the Real Story)
- Maria Alice Maia

- Jul 22, 2024
- 3 min read
I’ve built People Analytics functions from the ground up, designed strategies for massive workforces, and learned one undeniable truth: data, when misunderstood, is a silent killer of good intentions. Especially in HR. We celebrate programs, attribute success, and invest millions based on numbers that, all too often, are completely skewed. This isn’t just bad data; it’s a failure to understand the fundamental problem of Confounding and Selection Bias.
My entire career, from Itaú and Ambev to my own ventures and my current research at Oxford and FGV, has been about exposing these hidden traps. Because this knowledge? It’s not mine to keep. It's too critical for businesses to operate on false premises.
The "Doing Data Wrong" Scenario: The Voluntary Training Illusion
Let's dissect a scenario I’ve seen countless times in HR Departments. A company launches a fantastic new, voluntaryleadership training program. Six months later, the HR team proudly presents data: "Employees who completed the training show a 20% higher performance rating and 15% lower turnover compared to those who didn't!" The executive team is thrilled. More budget for training! This program is a massive success!
The Wrong Way: Naive Attribution (The Selection Bias Blind Spot)
This is the classic case of "doing data wrong" driven by selection bias. The HR department is looking at observed differences in means and naively attributing the outcome (higher performance, lower turnover) solely to the training program. What they're missing is the fundamental principle of causal inference as a missing data problem. We only see one outcome for each individual (trained or not), but we don't see the counterfactual – what would have happened to the trained employees if they hadn't received the training.
The core issue here is selection bias. Who self-selects into a voluntary training program? Typically, your most motivated, already high-performing, and engaged employees. These individuals tend to have better baseline performance and are less likely to leave the company regardless of the training. The training isn't causing their success; their pre-existing motivation and drive are confounding the results, influencing both their decision to take the training and their later performance, and creating a spurious correlation.
So the observed 20% performance gap isn't the causal effect of the training alone; it's a mix of the true training effect and the pre-existing differences between the two groups. That means you're investing in a program based on misleading numbers.
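The gap between correlation and causation is easy to demonstrate with a simulation. In the sketch below (all numbers invented for illustration), we build in a known training effect, let motivated employees self-select into training, and watch the naive difference in means overstate the true effect several times over.

```python
import random

random.seed(42)

TRUE_TRAINING_EFFECT = 2.0  # ground truth we build into the simulation

employees = []
for _ in range(10_000):
    motivation = random.gauss(0, 1)  # unobserved drive
    # Self-selection: motivated employees are far more likely to opt in
    trained = random.random() < (0.8 if motivation > 0 else 0.2)
    # Performance depends on BOTH motivation and the training itself
    performance = (50 + 5 * motivation
                   + (TRUE_TRAINING_EFFECT if trained else 0)
                   + random.gauss(0, 1))
    employees.append((trained, performance))

def mean(xs):
    return sum(xs) / len(xs)

treated = [p for t, p in employees if t]
control = [p for t, p in employees if not t]

naive_gap = mean(treated) - mean(control)
print(f"Naive difference in means: {naive_gap:.1f}")  # far above the true effect of 2.0
```

Because motivation drives both enrollment and performance, the naive comparison bundles the motivation gap into the "training effect" and reports a number several times larger than the truth.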
The Right Way: Isolating True Impact with Selection on Observables
Moving beyond this trap requires rigor and a deep understanding of selection on observables. This isn't about magical solutions; it's about making defensible assumptions and applying methodologies to isolate the true causal effect.
For HR, this means:
Tech Professionals (People Analytics, Data Scientists): Your role is to identify and control for pre-treatment covariates: the characteristics of employees before they opted into the training that might influence both their decision to join and their performance (e.g., prior performance ratings, tenure, department, manager quality, engagement survey scores, education level). If treatment assignment is ignorable conditional on these observed factors, you can construct statistically comparable groups. Techniques like matching or regression (which we'll explore in future posts) help simulate a randomized experiment from observational data. Your goal is to estimate the Average Treatment Effect on the Treated (ATT), or even the Average Treatment Effect (ATE), giving a clearer picture of the training's true impact. You're not just running reports; you're designing studies that illuminate causation.
Managers (HR Leaders, Business Partners): You must demand more than just "before-and-after" comparisons. Ask:
"Are we accounting for who chose to participate in this program? What are their pre-existing characteristics?"
"What are the alternative explanations for the observed outcomes?"
"How are we statistically controlling for these differences to isolate the program's true impact?"
"Can we confidently say this program caused the improvement, or is it merely associated with pre-existing high performers?"
Push your analytics teams to apply methods that address selection bias, turning mere correlation into actionable, causally sound insights. This isn't about blaming anyone; it's about ensuring your investments in human capital yield measurable, real returns.
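One concrete way to act on those questions is to stratify (exact-match) on an observed pre-treatment covariate and compare like with like. The sketch below uses synthetic data with an invented covariate ("prior high performer") and invented effect sizes; it shows a stratified ATT estimate recovering the true effect that the naive comparison overstates.

```python
import random

random.seed(0)
TRUE_EFFECT = 2.0  # ground truth built into the simulation

rows = []
for _ in range(20_000):
    high_prior = random.random() < 0.5  # observed pre-treatment covariate
    # Selection on the observable: prior high performers opt in far more often
    trained = random.random() < (0.8 if high_prior else 0.2)
    perf = (50 + (8 if high_prior else 0)
            + (TRUE_EFFECT if trained else 0)
            + random.gauss(0, 1))
    rows.append((high_prior, trained, perf))

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison: confounded by prior performance
naive = (mean([p for _, t, p in rows if t])
         - mean([p for _, t, p in rows if not t]))

# Stratified (exact-matching) estimate: compare treated and untreated
# WITHIN each covariate stratum, then weight each stratum by its share
# of the treated group -> Average Treatment Effect on the Treated (ATT)
att = 0.0
n_treated = sum(1 for _, t, _ in rows if t)
for g in (True, False):
    t_perf = [p for h, t, p in rows if h == g and t]
    c_perf = [p for h, t, p in rows if h == g and not t]
    att += (mean(t_perf) - mean(c_perf)) * len(t_perf) / n_treated

print(f"naive: {naive:.2f}, stratified ATT: {att:.2f} (truth: {TRUE_EFFECT})")
```

This only works because the confounder is observed; real programs usually need several covariates (prior ratings, tenure, engagement scores), which is where matching and regression methods earn their keep.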
By implementing methods rooted in selection on observables, HR departments can move from celebrating misleading correlations to confidently identifying programs that genuinely increase employee productivity, decrease costly turnover, and foster a truly high-performing culture. This is how you transform HR from a cost center into a strategic driver of value.
The time for guesswork is over. Let's fix our data practices and unlock the immense, often untapped, potential within our organizations. This knowledge is too important to keep quiet.
Want more no-nonsense, research-backed insights on unlocking real data value and fixing broken data practices? Subscribe to our mailing list!
Have a specific "doing data wrong" challenge holding your team back? Let's talk. Schedule a no-nonsense, 20-minute consultation call!


