Causal Discovery with DAGitty and Simulated Data

🧭 Causal Discovery: Learning Structure from Data

Causal discovery aims to infer the causal structure (i.e. DAG) from observational data. This is useful when domain knowledge is limited or exploratory analysis is needed.

1. ✨ Simulate Data with Known DAG

set.seed(123)
n <- 1000

X <- rnorm(n)
Z <- 0.6 * X + rnorm(n)
Y <- 0.5 * Z + 0.3 * X + rnorm(n)

df <- data.frame(X, Z, Y)
head(df)

##             X          Z          Y
## 1 -0.56047565 -1.3320841 -1.3457885
## 2 -0.23017749 -1.1780615 -0.4211461
## 3  1.55870831  0.9172447  0.3846457
## 4  0.07050839 -0.0898701  1.1954451
## 5  0.12928774 -2.4717701 -1.0229629
## 6  1.71506499  2.0696124  0.9340574

2. 🧠 Define the True DAG

true_dag <- dagitty("dag {
  X -> Z -> Y
  X -> Y
}")

ggdag(true_dag, layout = "circle")

  ggtitle("True DAG Used to Generate Data")

## $title
## [1] "True DAG Used to Generate Data"
## 
## attr(,"class")
## [1] "labels"

3. 🔍 Use `localTests()` to Test Dependencies

We use dagitty::localTests() to test conditional independence statements implied by a candidate DAG:

test_result <- localTests(true_dag, data = df, type = "cis")
test_result

## data frame with 0 columns and 0 rows

Each row checks whether a conditional independence assumption holds (p > 0.05 supports the claim).

4. 🧪 Try an Incorrect DAG

wrong_dag <- dagitty("dag {
  Z -> X
  Z -> Y
  X -> Y
}")

ggdag(wrong_dag, layout = "circle")

  ggtitle("Incorrect DAG Assumption")

## $title
## [1] "Incorrect DAG Assumption"
## 
## attr(,"class")
## [1] "labels"

Now test:

localTests(wrong_dag, data = df, type = "cis")

## data frame with 0 columns and 0 rows

Look for failed assumptions (very low p-values).

🧠 Assumptions for Causal Discovery

Causal Sufficiency: All common causes are measured
Faithfulness: Statistical independence reflects causal structure
No Measurement Error

✅ Summary

Causal discovery lets us: - Explore structures with little prior knowledge - Validate assumptions - Compare competing causal hypotheses

dagitty provides a simple entry point. For algorithmic discovery, explore pcalg, bnlearn, or cdcs.

📖 References

Textor et al. (2016). dagitty: Graphical Analysis of Structural Causal Models
Spirtes, Glymour, Scheines (2000). Causation, Prediction, and Search