Introduction to CausalInvestData
Overview
CausalInvestData provides simulated datasets for causal
inference in institutional investment management. The package includes
four core datasets designed to mirror the structure of real-world
investment data, so that users can prototype, teach, and evaluate
methods such as propensity score matching, causal forests, and impact
analysis.
Dataset: fund_performance
## fund_id market_return alpha beta treatment return
## 1 1 0.003952435 -0.009915974 0.9217857 0 -0.007775751
## 2 2 0.036982251 -0.010799101 1.1331275 1 0.032828934
## 3 3 0.215870831 0.009640395 1.0374590 1 0.224115880
## 4 4 0.067050839 0.007356497 1.1228787 1 0.080673608
## 5 5 0.072928774 -0.040986855 0.9176203 0 0.051918971
## 6 6 0.231506499 0.030811469 0.8564341 0 0.228707376
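The six rows shown are consistent with a CAPM-style relationship in which `return` is roughly `alpha + beta * market_return` plus noise. For fund 1, 0.9218 × 0.0040 - 0.0099 ≈ -0.0063, against an observed -0.0078. The vignette does not show how the package generates this table, so the sketch below only assumes that structure. `simulate_funds()`, its distribution parameters and the `effect` argument are hypothetical.

```r
# Hypothetical generator for a fund_performance-like table (not the package's code).
# Assumed model: return = alpha + beta * market_return + effect * treatment + noise.
simulate_funds <- function(n = 1000, effect = 0.01, seed = 123) {
  set.seed(seed)
  market_return <- rnorm(n, mean = 0.06, sd = 0.10)  # hypothetical parameters
  alpha         <- rnorm(n, mean = 0.01, sd = 0.02)
  beta          <- rnorm(n, mean = 1.00, sd = 0.10)
  treatment     <- rbinom(n, size = 1, prob = 0.5)   # randomised in this sketch
  noise         <- rnorm(n, mean = 0, sd = 0.01)
  data.frame(
    fund_id       = seq_len(n),
    market_return = market_return,
    alpha         = alpha,
    beta          = beta,
    treatment     = treatment,
    return        = alpha + beta * market_return + effect * treatment + noise
  )
}

funds <- simulate_funds()
head(funds)

# With randomised treatment, regressing return on the CAPM terms plus treatment
# should give a treatment coefficient close to the assumed effect.
fit_funds <- lm(return ~ treatment + alpha + beta:market_return, data = funds)
coef(fit_funds)[["treatment"]]
```

Because treatment is randomised here, a plain regression is enough. Propensity score matching becomes useful when assignment depends on the covariates.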
Propensity Score Matching Example
##
## Call:
## matchit(formula = treatment ~ market_return + alpha + beta, data = fund_performance)
##
## Summary of Balance for All Data:
##               Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max
## distance             0.4942        0.4919          0.0961     0.9914    0.0250   0.0725
## market_return        0.0623        0.0610          0.0134     0.9360    0.0079   0.0232
## alpha                0.0102        0.0115         -0.0687     0.9442    0.0194   0.0512
## beta                 0.9919        0.9993         -0.0654     0.9965    0.0186   0.0413
##
## Summary of Balance for Matched Data:
##               Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean eCDF Max Std. Pair Dist.
## distance             0.4942        0.4920          0.0887     1.0413    0.0230   0.0730          0.0927
## market_return        0.0623        0.0602          0.0208     0.9314    0.0088   0.0243          1.1393
## alpha                0.0102        0.0113         -0.0568     0.9830    0.0167   0.0467          0.8110
## beta                 0.9919        0.9992         -0.0650     1.0060    0.0185   0.0426          0.8197
##
## Sample Sizes:
## Control Treated
## All 507 493
## Matched 493 493
## Unmatched 14 0
## Discarded 0 0
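The output above comes from MatchIt's `matchit()` called with only a formula and data, so its defaults apply. These are a logistic-regression propensity score (the `distance` row), greedy 1:1 nearest-neighbour matching without replacement, and the ATT as the estimand. That is why all 493 treated funds are matched and 14 of the 507 controls are left over. Before matching, every standardized mean difference is already below 0.1 in absolute value. This suggests treatment in `fund_performance` is close to randomly assigned, which is why matching changes little. The base-R sketch below reproduces the same steps on a hypothetical stand-in in which assignment does depend on `market_return` and `alpha`, so the before/after contrast is visible. It illustrates the technique and is not MatchIt's implementation. The simulated data, the descending-score matching order (MatchIt exposes this choice as `m.order`) and the scaling of standardized mean differences by the full-sample treated standard deviation are all choices made for the sketch.

```r
set.seed(42)
n <- 1000
sim <- data.frame(
  market_return = rnorm(n, mean = 0.06, sd = 0.10),
  alpha         = rnorm(n, mean = 0.01, sd = 0.02),
  beta          = rnorm(n, mean = 1.00, sd = 0.10)
)
# Assumed assignment: higher market_return and alpha raise the odds of treatment.
sim$treatment <- rbinom(n, 1, plogis(-0.5 + 2 * sim$market_return + 10 * sim$alpha))
# Assumed outcome with a true treatment effect of 0.01.
sim$return <- sim$alpha + sim$beta * sim$market_return +
  0.01 * sim$treatment + rnorm(n, sd = 0.01)

# 1. Propensity score from a logistic regression on the covariates.
ps_fit <- glm(treatment ~ market_return + alpha + beta, data = sim, family = binomial())
sim$distance <- fitted(ps_fit)

# 2. Greedy 1:1 nearest-neighbour matching without replacement,
#    visiting treated units from the highest propensity score down.
treated  <- which(sim$treatment == 1)
treated  <- treated[order(sim$distance[treated], decreasing = TRUE)]
controls <- which(sim$treatment == 0)
used     <- rep(FALSE, length(controls))
pair     <- rep(NA_integer_, length(treated))
for (i in seq_along(treated)) {
  if (all(used)) break                 # no controls left to match
  gap <- abs(sim$distance[controls] - sim$distance[treated[i]])
  gap[used] <- Inf                     # each control is used at most once
  j <- which.min(gap)
  pair[i] <- controls[j]
  used[j] <- TRUE
}
keep    <- !is.na(pair)
matched <- sim[c(treated[keep], pair[keep]), ]

# 3. Standardized mean differences, scaled by the full-sample treated SD
#    so that the before and after columns share a denominator.
covs <- c("distance", "market_return", "alpha", "beta")
treated_sd <- sapply(covs, function(v) sd(sim[[v]][sim$treatment == 1]))
smd <- function(data) {
  sapply(covs, function(v) {
    (mean(data[[v]][data$treatment == 1]) -
       mean(data[[v]][data$treatment == 0])) / treated_sd[[v]]
  })
}
balance <- data.frame(all = smd(sim), matched = smd(matched))
print(round(balance, 4))

# 4. ATT estimate: mean within-pair difference in returns.
att <- mean(sim$return[treated[keep]] - sim$return[pair[keep]])
c(matched_pairs = sum(keep), unmatched_controls = sum(!used), att = att)
```

`sum(keep)` and `sum(!used)` play the role of the Matched and Unmatched rows in MatchIt's sample-size table.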
Dataset: portfolio_allocations
##   portfolio_id risk_level equity_allocation treatment     return bond_allocation
## 1            1        Low         0.4080645         1 0.06659800       0.5919355
## 2            2       High         0.7138592         1 0.09565076       0.2861408
## 3            3     Medium         0.7284506         1 0.07306955       0.2715494
## 4            4       High         0.2891309         1 0.09795321       0.7108691
## 5            5        Low         0.8376339         0 0.10054662       0.1623661
## 6            6     Medium         0.4115289         1 0.08804457       0.5884711
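In every row shown, `equity_allocation` and `bond_allocation` sum to 1 (for portfolio 1, 0.4080645 + 0.5919355 = 1). The two columns therefore carry a single degree of freedom, and only one of them can enter a regression alongside an intercept. The sketch below builds a hypothetical stand-in with that constraint, in which high-risk portfolios are treated more often and also earn a higher premium. It then compares the naive difference in means with a regression-adjusted estimate that controls for `risk_level` and `equity_allocation`. The assignment probabilities, premiums and the 0.005 treatment effect are assumptions, not the package's generator.

```r
set.seed(7)
n <- 1000
risk_level <- factor(sample(c("Low", "Medium", "High"), n, replace = TRUE),
                     levels = c("Low", "Medium", "High"))
equity_allocation <- runif(n, min = 0.2, max = 0.9)
portfolios <- data.frame(
  portfolio_id      = seq_len(n),
  risk_level        = risk_level,
  equity_allocation = equity_allocation,
  # Assumed assignment: high-risk portfolios are treated more often (confounding).
  treatment         = rbinom(n, 1, ifelse(risk_level == "High", 0.7, 0.4)),
  bond_allocation   = 1 - equity_allocation   # the two weights sum to one
)
# Assumed outcome: risk premium and equity share raise returns; treatment adds 0.005.
risk_premium <- c(Low = 0, Medium = 0.01, High = 0.02)
portfolios$return <- 0.04 + unname(risk_premium[as.character(portfolios$risk_level)]) +
  0.05 * portfolios$equity_allocation + 0.005 * portfolios$treatment +
  rnorm(n, sd = 0.01)

# The allocation constraint holds row by row.
stopifnot(isTRUE(all.equal(portfolios$equity_allocation + portfolios$bond_allocation,
                           rep(1, n))))

# Naive difference in means mixes the treatment effect with the risk premium.
naive_port <- with(portfolios, mean(return[treatment == 1]) - mean(return[treatment == 0]))
# Regression adjustment; bond_allocation is left out because it is collinear
# with equity_allocation once an intercept is included.
fit_port <- lm(return ~ treatment + risk_level + equity_allocation, data = portfolios)
c(naive = naive_port, adjusted = coef(fit_port)[["treatment"]])
```

Using `bond_allocation` instead of `equity_allocation` gives the same treatment estimate. Including both makes `lm()` report an `NA` coefficient for the aliased column.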
Dataset: client_behavior
## client_id age income satisfaction_score treatment churned
## 1 1 54 65785.65 4.898391 0 0
## 2 2 66 56907.43 2.481101 1 1
## 3 3 32 57223.12 2.072351 0 0
## 4 4 48 49584.93 6.903271 0 0
## 5 5 75 46669.48 3.650145 0 0
## 6 6 33 52759.41 1.038131 1 0
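`client_behavior` pairs a binary `treatment` with a binary `churned` outcome and three covariates, which makes it a natural setting for inverse probability weighting. The sketch below simulates a hypothetical stand-in in which less satisfied clients are more likely to be treated and also more likely to churn, so the naive treated-versus-control churn gap is confounded. It then estimates the average treatment effect with normalized (Hájek) IPW weights from a logistic propensity model. The covariate distributions, the assignment model and the churn model are all assumptions.

```r
set.seed(2025)
n <- 2000
clients <- data.frame(
  client_id          = seq_len(n),
  age                = sample(25:75, n, replace = TRUE),
  income             = rnorm(n, mean = 55000, sd = 8000),
  satisfaction_score = runif(n, min = 1, max = 7)
)
# Assumed assignment: less satisfied clients are more likely to be targeted.
clients$treatment <- rbinom(n, 1, plogis(1 - 0.4 * clients$satisfaction_score))
# Assumed outcome: churn falls with satisfaction, and treatment lowers churn odds.
p_churn <- plogis(0.5 - 0.5 * clients$satisfaction_score - 0.8 * clients$treatment)
clients$churned <- rbinom(n, 1, p_churn)

# Propensity model on the observed covariates (income rescaled to thousands).
ps <- fitted(glm(treatment ~ age + I(income / 1000) + satisfaction_score,
                 data = clients, family = binomial()))

# Normalized inverse probability weighting estimate of the ATE on churn.
w1 <- clients$treatment / ps
w0 <- (1 - clients$treatment) / (1 - ps)
ate_ipw <- sum(w1 * clients$churned) / sum(w1) - sum(w0 * clients$churned) / sum(w0)

naive_churn <- mean(clients$churned[clients$treatment == 1]) -
  mean(clients$churned[clients$treatment == 0])
c(naive = naive_churn, ipw = ate_ipw)
```

Here the naive gap understates the benefit because treated clients start out less satisfied. In practice it is worth checking the range of `ps` before trusting the weights, since scores near 0 or 1 produce extreme weights.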
Dataset: macro_shocks
## date interest_rate gdp_growth market_index
## 1 2020-01-01 0.04858548 0.02098241 0.03418012
## 2 2020-02-01 0.03740164 0.02505696 0.05356738
## 3 2020-03-01 0.04924685 0.02460699 0.04155380
## 4 2020-04-01 0.05109889 0.01260686 0.01707771
## 5 2020-05-01 0.02392792 0.02387595 0.02444632
## 6 2020-06-01 0.06718417 0.01306215 -0.04220713
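`macro_shocks` is a monthly series (starting 2020-01-01) of `interest_rate`, `gdp_growth` and `market_index`, the kind of data used for the impact analysis mentioned in the overview. A simple regression-based counterfactual shows the idea. It fits the pre-intervention relationship between the index and its drivers, predicts the post-intervention path, and reads the impact as observed minus predicted. This is a deliberately simplified stand-in for Bayesian structural time-series tools such as the CausalImpact package. The simulated series, the 2023-01-01 shock date and the -0.02 shift are hypothetical.

```r
set.seed(99)
month_start <- seq(as.Date("2020-01-01"), by = "month", length.out = 60)
n <- length(month_start)
macro <- data.frame(
  date          = month_start,
  interest_rate = rnorm(n, mean = 0.04, sd = 0.01),
  gdp_growth    = rnorm(n, mean = 0.02, sd = 0.005)
)
shock_date <- as.Date("2023-01-01")   # hypothetical intervention date
post <- macro$date >= shock_date
# Assumed outcome: the index tracks macro drivers and shifts by -0.02 after the shock.
macro$market_index <- 0.03 - 0.5 * macro$interest_rate + 1.5 * macro$gdp_growth +
  ifelse(post, -0.02, 0) + rnorm(n, sd = 0.005)

# 1. Learn the pre-period relationship between the index and its drivers.
pre_fit <- lm(market_index ~ interest_rate + gdp_growth, data = macro[!post, ])
# 2. Predict the post-period path the index would have followed without the shock.
counterfactual <- predict(pre_fit, newdata = macro[post, ])
# 3. Pointwise impact, then its average and cumulative sum.
impact <- macro$market_index[post] - counterfactual
round(c(mean_impact = mean(impact), cumulative = sum(impact)), 4)
```

The approach is only credible if the predictors themselves are unaffected by the intervention. In this sketch they are simulated independently of the shock by construction.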
Summary
The package is intended for:
- Financial data scientists building causal ML pipelines
- Academics teaching causal inference methods
- Practitioners evaluating financial interventions
To cite the package, run:
citation("CausalInvestData")
## To cite the CausalInvestData package in publications, use:
##
## Conilias Zvobwo E (2025). _CausalInvestData: Simulated Datasets for
## Causal Inference in Investment Management_. R package version 0.1.0,
## <https://github.com/edzai/CausalInvestData>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {CausalInvestData: Simulated Datasets for Causal Inference in Investment Management},
## author = {Edzai {Conilias Zvobwo}},
## year = {2025},
## note = {R package version 0.1.0},
## url = {https://github.com/edzai/CausalInvestData},
## }