Causal Discovery with DAGitty and Simulated Data
🧭 Causal Discovery: Learning Structure from Data
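Causal discovery aims to infer the causal structure, i.e., a directed acyclic graph (DAG), from observational data. This is useful when domain knowledge is limited or when the analysis is exploratory. This article takes a simpler, related step: it simulates data from a known DAG and uses dagitty to check whether candidate DAGs are consistent with those data.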
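To make the idea concrete, here is a minimal sketch of the core step of constraint-based discovery methods such as the PC algorithm: start from a fully connected graph and remove an edge whenever the two variables test as (conditionally) independent. The helpers `pcor_test()` and `learn_skeleton()` are hypothetical, written here in base R, and the Fisher z-test on partial correlations assumes roughly linear-Gaussian data. With only three variables, the empty set and each single remaining variable cover every possible conditioning set; larger problems need the full search that real implementations perform.

```r
# Minimal sketch of the skeleton step of constraint-based causal discovery
# (as in the PC algorithm), written in base R. pcor_test() and
# learn_skeleton() are hypothetical helpers, not functions from a package.

# Fisher z-test for the partial correlation of columns i and j given `cond`.
# Assumes approximately linear-Gaussian data; returns a two-sided p-value.
pcor_test <- function(data, i, j, cond = character(0)) {
  if (length(cond) == 0) {
    r <- cor(data[[i]], data[[j]])
  } else {
    # Partial correlation = correlation of residuals after regressing on cond
    ri <- resid(lm(reformulate(cond, response = i), data = data))
    rj <- resid(lm(reformulate(cond, response = j), data = data))
    r <- cor(ri, rj)
  }
  z <- atanh(r) * sqrt(nrow(data) - length(cond) - 3)
  2 * pnorm(-abs(z))
}

# Keep an edge only if the pair is dependent under every conditioning set.
# With three variables, the empty set and each single remaining variable
# are all the possible conditioning sets.
learn_skeleton <- function(data, alpha = 0.05) {
  vars <- names(data)
  pairs <- combn(vars, 2, simplify = FALSE)
  keep <- vapply(pairs, function(p) {
    cond_sets <- c(list(character(0)), as.list(setdiff(vars, p)))
    all(vapply(cond_sets,
               function(s) pcor_test(data, p[1], p[2], s) < alpha,
               logical(1)))
  }, logical(1))
  vapply(pairs[keep], paste, character(1), collapse = " -- ")
}

# Usage on the same simulation as step 1 below (X -> Z -> Y plus X -> Y)
set.seed(123)
n <- 1000
X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
Y <- 0.5 * Z + 0.3 * X + rnorm(n)
skeleton <- learn_skeleton(data.frame(X, Z, Y))
skeleton
```

Because every pair in this simulation is strongly dependent, with or without conditioning, the sketch should keep all three edges. Orienting those edges is a separate step that this sketch does not attempt.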
1. ✨ Simulate Data with Known DAG
set.seed(123)
n <- 1000
X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
Y <- 0.5 * Z + 0.3 * X + rnorm(n)
df <- data.frame(X, Z, Y)
head(df)
##             X          Z          Y
## 1 -0.56047565 -1.3320841 -1.3457885
## 2 -0.23017749 -1.1780615 -0.4211461
## 3  1.55870831  0.9172447  0.3846457
## 4  0.07050839 -0.0898701  1.1954451
## 5  0.12928774 -2.4717701 -1.0229629
## 6  1.71506499  2.0696124  0.9340574
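As a quick sanity check, here is a sketch that assumes the same seed and coefficients as above. Because the data are linear-Gaussian, regressing each variable on its parents in the generating DAG should approximately recover the coefficients 0.6, 0.5, and 0.3. The block re-simulates `df` so it runs on its own. `fit_Z` and `fit_Y` are illustrative names.

```r
# Re-simulate the data from step 1 so this block is self-contained
set.seed(123)
n <- 1000
X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
Y <- 0.5 * Z + 0.3 * X + rnorm(n)
df <- data.frame(X, Z, Y)

# Regress each node on its parents in the true DAG
fit_Z <- lm(Z ~ X, data = df)      # generating coefficient on X: 0.6
fit_Y <- lm(Y ~ Z + X, data = df)  # generating coefficients: Z = 0.5, X = 0.3
round(coef(fit_Z), 2)
round(coef(fit_Y), 2)
```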
2. 🧠 Define the True DAG
library(dagitty)
library(ggdag)
library(ggplot2)

true_dag <- dagitty("dag {
X -> Z -> Y
X -> Y
}")
# Draw the DAG with a circular layout and add a title
ggdag(true_dag, layout = "circle") +
  ggtitle("True DAG Used to Generate Data")
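dagitty can list the testable implications of a DAG directly with `impliedConditionalIndependencies()`. The sketch below compares the true DAG with a hypothetical `chain_dag` that drops the direct `X -> Y` edge. The true DAG connects every pair of nodes, so its list should be empty, while the chain implies exactly one independence, `X _||_ Y | Z`.

```r
library(dagitty)

true_dag <- dagitty("dag {
  X -> Z -> Y
  X -> Y
}")
# Hypothetical comparison DAG: the same graph without the direct X -> Y edge
chain_dag <- dagitty("dag { X -> Z -> Y }")

# Every pair of nodes in true_dag is adjacent, so nothing is implied
true_cis <- impliedConditionalIndependencies(true_dag)
true_cis
# Removing X -> Y makes Z separate X from Y: X _||_ Y | Z
chain_cis <- impliedConditionalIndependencies(chain_dag)
chain_cis
```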
3. 🔍 Use localTests() to Test Implied Independencies
We use dagitty::localTests() to test the conditional independence statements implied by a candidate DAG:
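test_result <- localTests(true_dag, data = df, type = "cis")
test_result
## data frame with 0 columns and 0 rows
Each row of the result tests one conditional independence implied by the DAG. A small p-value (e.g., below 0.05) means the data contradict that independence, while a large p-value only means the data are consistent with it, not that it is proven. Here the result is empty because the true DAG is complete (every pair of variables is joined by an edge), so it implies no conditional independencies and there is nothing to test.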
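To see what a failed implication looks like, the sketch below runs `localTests()` on a hypothetical `chain_dag` (`X -> Z -> Y`, with no direct `X -> Y` edge). The simulation includes a direct effect of 0.3 from X to Y, so the implied independence `X _||_ Y | Z` should be rejected with a very small p-value. The block re-simulates `df` with the same seed so it is self-contained.

```r
library(dagitty)

# Same simulation as step 1
set.seed(123)
n <- 1000
X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
Y <- 0.5 * Z + 0.3 * X + rnorm(n)
df <- data.frame(X, Z, Y)

# Misspecified candidate: omits the direct X -> Y effect used to generate Y
chain_dag <- dagitty("dag { X -> Z -> Y }")

# One row per implied independence; here only X _||_ Y | Z
chain_test <- localTests(chain_dag, data = df, type = "cis")
chain_test
```

The `estimate` column is the partial correlation of X and Y given Z, which should be clearly positive here because of the omitted direct effect.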
4. 🧪 Try an Incorrect DAG
wrong_dag <- dagitty("dag {
Z -> X
Z -> Y
X -> Y
}")
# Draw the candidate DAG with a circular layout and add a title
ggdag(wrong_dag, layout = "circle") +
  ggtitle("Incorrect DAG Assumption")
Now test:
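localTests(wrong_dag, data = df, type = "cis")
## data frame with 0 columns and 0 rows
A misspecified DAG would normally show up as rows with very low p-values. This one produces no rows at all: the incorrect DAG is also complete, so it implies no conditional independencies either. Both DAGs have the same skeleton and no v-structures, so they are Markov equivalent, and conditional independence tests alone cannot distinguish them on these data.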
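One way to see why the two DAGs are indistinguishable, sketched in base R under the linear-Gaussian assumption of this simulation: each DAG factorizes the joint density into one regression per node on its parents, and both factorizations reach the same maximized log-likelihood. The helper `ll()` is a hypothetical name used only for illustration.

```r
# Same simulation as step 1
set.seed(123)
n <- 1000
X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
Y <- 0.5 * Z + 0.3 * X + rnorm(n)
df <- data.frame(X, Z, Y)

# Maximized Gaussian log-likelihood of one node given its parents
ll <- function(formula) as.numeric(logLik(lm(formula, data = df)))

# true_dag:  X -> Z, X -> Y, Z -> Y   =>  p(X) p(Z | X) p(Y | X, Z)
ll_true  <- ll(X ~ 1) + ll(Z ~ X) + ll(Y ~ Z + X)
# wrong_dag: Z -> X, Z -> Y, X -> Y   =>  p(Z) p(X | Z) p(Y | X, Z)
ll_wrong <- ll(Z ~ 1) + ll(X ~ Z) + ll(Y ~ Z + X)

c(true = ll_true, wrong = ll_wrong)
isTRUE(all.equal(ll_true, ll_wrong))
```

Both factorizations fit the sample covariance exactly, so the fit cannot prefer one orientation. dagitty's `equivalenceClass()` and `equivalentDAGs()` can list Markov equivalent graphs directly; choosing among them requires interventions, temporal ordering, or domain knowledge rather than observational tests.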
🧠 Assumptions for Causal Discovery
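- Causal Markov Condition: Each variable is independent of its non-descendants given its parents, which is what makes the DAG's implied independencies testable
- Causal Sufficiency: All common causes of the measured variables are measured
- Faithfulness: Every conditional independence in the data is implied by the DAG, so no independence arises from effects that exactly cancel
- No Measurement Error: Variables are recorded without error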
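Faithfulness can fail when effects cancel. The sketch below uses hypothetical coefficients chosen so that the indirect effect of X on Y through Z (0.6 × 0.5 = 0.3) exactly offsets a direct effect of −0.3. X still causes Y, yet X and Y are marginally uncorrelated, so a constraint-based method that tests marginal independence would wrongly remove the edge between X and Y.

```r
set.seed(1)
n <- 100000  # large n so sampling noise does not hide the cancellation
X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
# Hypothetical coefficients: direct effect -0.3 cancels indirect 0.6 * 0.5
Y <- 0.5 * Z - 0.3 * X + rnorm(n)

marginal_cor <- cor(X, Y)             # near 0 even though X -> Y
direct_X <- coef(lm(Y ~ Z + X))[["X"]]  # direct effect, near -0.3
c(marginal_cor = marginal_cor, direct_X = direct_X)
```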