How do we determine that certain products or events cause cancer (or other conditions/diseases) when the effects are latent and dispersed across a population?


For context, I was reading about the J&J powder settlement recently. The suits allege J&J’s talc products were contaminated with the carcinogen asbestos, which caused ovarian cancer in thousands of individuals. When there are so many things that can potentially cause cancer, how do scientists gather data and establish trends? More importantly, how do they know x number of people used y product? It seems there are an incredible number of variables to consider.


The Rubin Causal Model provides a formal framework for understanding causality in observational studies, where randomized controlled trials (RCTs) may not be feasible or ethical. It addresses the fundamental challenge of establishing causation in observational data by introducing the concept of potential outcomes.

In this framework, each unit or individual in the study population has two potential outcomes: one under the treatment or exposure condition (denoted Y(1)) and one under the non-treatment or control condition (denoted Y(0)). However, each unit can only ever be observed under one of these conditions, depending on whether it receives the treatment or not, so only one of its two potential outcomes is seen and the other remains a counterfactual. This is known as the fundamental problem of causal inference.

The key assumption in the Rubin Causal Model is the “ignorability assumption” or “unconfoundedness,” which states that, conditional on observed variables, the treatment assignment is independent of the potential outcomes. In other words, once the observed variables are accounted for, nothing unmeasured influences both who gets treated and what their outcome would be. Treated and control units with the same observed characteristics are then comparable, which is what allows for causal inference.
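Written out, the assumption looks roughly like this. The symbols T (a 0/1 treatment indicator), X (the observed variables) and Y (the outcome actually observed) are my own notation and not part of the description above. The implication also assumes that the observed outcome equals the potential outcome for the treatment actually received, and that every value of X contains both treated and untreated units.

```latex
\bigl(Y(0),\, Y(1)\bigr) \perp\!\!\!\perp T \mid X
\quad\Longrightarrow\quad
\mathbb{E}[\,Y(t) \mid X\,] = \mathbb{E}[\,Y \mid T = t,\, X\,],
\qquad t \in \{0, 1\}
```

The right-hand side contains only quantities that can be estimated from observed data, which is why the assumption is so useful and also why it is so strong.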

The Rubin Causal Model provides a framework for estimating the average treatment effect (ATE), which is the average over the population of the difference between each unit's potential outcome under treatment (Y(1)) and under non-treatment (Y(0)). Various statistical methods, such as matching, regression, and propensity score analysis, can be employed within this framework to estimate causal effects and assess their uncertainty.
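To make that concrete, here is a minimal sketch on simulated data, not a real analysis. Everything in it is an assumption for illustration: a single binary confounder `x`, exposure probabilities of 0.8 and 0.2, and a true effect of 2.0. It compares a naive difference in means with a simple stratified estimate (exact matching on `x`), which is valid here only because `x` is the sole confounder and is observed.

```python
import random
from statistics import fmean


def simulate(n=100_000, true_effect=2.0, seed=0):
    """Simulate (x, t, y) rows where x confounds exposure t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0                   # observed confounder
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0   # exposure depends on x
        y0 = 1.0 + 3.0 * x + rng.gauss(0, 1)                 # potential outcome Y(0)
        y1 = y0 + true_effect                                # potential outcome Y(1)
        rows.append((x, t, y1 if t else y0))                 # only one is ever observed
    return rows


def naive_difference(rows):
    """Difference in mean outcome between exposed and unexposed, ignoring x."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return fmean(treated) - fmean(control)


def stratified_ate(rows):
    """Compare exposed and unexposed within each level of x, then average
    the within-stratum differences weighted by stratum size."""
    ate = 0.0
    for value in (0, 1):
        stratum = [(t, y) for x, t, y in rows if x == value]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        ate += (len(stratum) / len(rows)) * (fmean(treated) - fmean(control))
    return ate


if __name__ == "__main__":
    data = simulate()
    print("naive difference:", round(naive_difference(data), 2))
    print("stratified ATE:  ", round(stratified_ate(data), 2))
```

With these made-up numbers the naive comparison lands near 3.8, because exposed people are more likely to have `x = 1`, while the stratified estimate lands near the true effect of 2.0. If `x` were unmeasured, the stratified step would not be possible, and that is exactly the situation the unconfoundedness assumption rules out.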

Overall, the Rubin Causal Model is a foundational framework in causal inference that allows researchers to make causal claims based on observational data by carefully considering potential outcomes and accounting for confounding variables.

But yes, as you mention, the number of confounding variables can be quite large, and the unconfoundedness assumption is a strong one indeed. One way to get around the issue of unmeasured confounders is to use so-called instrumental variables, but that’s a whole topic and can of worms in itself.

When it comes to data collection, the fact is that many countries have multiple high-quality registers with demographic, socioeconomic, environmental and medical information. Researchers can access and cross-reference these registers and databases to study a wide range of phenomena.