Double Machine Learning (DML) for Causal Inference
double-machine-learning.Rmd๐ค Double Machine Learning (DML)
Double Machine Learning (DML) uses flexible machine learning models to estimate nuisance parameters (e.g., propensity scores, outcome models) and removes their bias via orthogonalization, enabling valid causal inference.
1. ๐งช Simulate Data with Confounding
set.seed(123)
n <- 1000
X1 <- rnorm(n)
X2 <- rnorm(n)
W <- rbinom(n, 1, prob = plogis(0.5 * X1 + 0.5 * X2))
Y <- 2 * W + 1.5 * X1 + 1.0 * X2 + rnorm(n)
df <- data.frame(X1, X2, W, Y)2. ๐ DML Steps
(a) Split into folds
folds <- createFolds(df$Y, k = 2)
train1 <- df[folds[[1]], ]
train2 <- df[folds[[2]], ](b) Estimate nuisance functions
# Outcome model: Y ~ X1 + X2 (train2 on train1)
y_model <- cv.glmnet(as.matrix(train1[, c("X1", "X2")]), train1$Y)
m_Y2 <- predict(y_model, as.matrix(train2[, c("X1", "X2")]), s = "lambda.min")
# Treatment model: W ~ X1 + X2
w_model <- cv.glmnet(as.matrix(train1[, c("X1", "X2")]), train1$W, family = "binomial")
m_W2 <- predict(w_model, as.matrix(train2[, c("X1", "X2")]), s = "lambda.min", type = "response")