
You can access our final project report, slides, and GitHub repo here. :)
In collaboration with Rajvi Jasani and Yiqiao Zhu, we conducted a comprehensive causal analysis of how debt-to-income ratio (DTI) affects loan default probability using a 2019 bank loan dataset (N=148,670). Our analysis tackled two major challenges: severe Missing Not At Random (MNAR) patterns in DTI data (44.5% missing among defaults vs. 7.0% among non-defaults) and endogeneity due to unobserved borrower characteristics.
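The differential missingness described above can be checked with a simple group-wise tabulation. The following is a minimal sketch on made-up toy records, not the project's actual code; the field names `default` and `dti` and the helper `missing_rate_by_group` are hypothetical. A gap like this shows that missingness depends on the outcome, though it does not by itself establish MNAR.

```python
from collections import defaultdict

def missing_rate_by_group(records, group_key, value_key):
    """Return {group: share of records whose value_key is None}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[value_key] is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

# Toy records (invented for illustration): default flag and DTI, None = missing.
records = [
    {"default": 1, "dti": None}, {"default": 1, "dti": 41.0},
    {"default": 1, "dti": None}, {"default": 1, "dti": 38.5},
    {"default": 0, "dti": 35.0}, {"default": 0, "dti": None},
    {"default": 0, "dti": 29.0}, {"default": 0, "dti": 44.0},
    {"default": 0, "dti": 33.5},
]

rates = missing_rate_by_group(records, "default", "dti")
for group in sorted(rates):
    print(f"default={group}: {rates[group]:.1%} of DTI values missing")
```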
We combined Multiple Imputation by Chained Equations (MICE) with Two-Stage Least Squares (2SLS) instrumental variables estimation, using pre-approval status as the instrument (first-stage F-statistic = 18.95). The results showed a striking sign reversal: naive OLS gave a small positive association (\(\beta = 0.002\)), whereas the IV estimate indicated that a one percentage point increase in DTI reduces default probability by 14.3 percentage points (\(\beta = -0.143\), \(p < 0.001\)) for the complier population. We attribute this counterintuitive result to pre-approved borrowers undergoing stricter institutional screening, which indicates superior creditworthiness despite mechanically higher approved loan amounts. Our analysis challenges conventional underwriting wisdom and underscores the importance of comprehensive screening over mechanical DTI cutoffs alone.
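To illustrate how an OLS estimate and an IV estimate can disagree in sign, here is a minimal sketch on simulated data. It is not the project's pipeline: it skips MICE and all covariates, uses a continuous risk score rather than a binary default flag, and every number in it (sample size, coefficients, `TRUE_EFFECT`) is an assumption chosen for illustration. With one binary instrument, one endogenous regressor, and no covariates, the 2SLS slope reduces to cov(z, y) / cov(z, x), which is what `iv_slope` computes.

```python
import random

TRUE_EFFECT = -0.01  # assumed causal effect of x on y in the simulation

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def ols_slope(x, y):
    """Naive regression slope of y on x (biased under confounding)."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Single-instrument 2SLS slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

def first_stage_f(z, x):
    """F-statistic for the instrument in the first stage (squared t-stat)."""
    n = len(z)
    slope = cov(z, x) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    mz = mean(z)
    szz = sum((zi - mz) ** 2 for zi in z)
    se_sq = (rss / (n - 2)) / szz
    return slope ** 2 / se_sq

def simulate(n=20000, seed=0):
    """Simulate an instrument z, endogenous x, and outcome y.

    u is an unobserved confounder that raises both x and y, which
    pushes the naive OLS slope upward relative to TRUE_EFFECT.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                       # unobserved confounder
        zi = 1.0 if rng.random() < 0.5 else 0.0   # binary instrument
        xi = 30 + 5 * zi + 3 * u + rng.gauss(0, 1)
        yi = 0.5 + TRUE_EFFECT * xi + 0.08 * u + rng.gauss(0, 0.1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

z, x, y = simulate()
ols = ols_slope(x, y)
iv = iv_slope(z, x, y)
f_stat = first_stage_f(z, x)
print(f"OLS slope: {ols:+.4f}")
print(f"IV slope:  {iv:+.4f} (simulated true effect {TRUE_EFFECT:+.4f})")
print(f"First-stage F: {f_stat:.1f}")
```

In this simulation the confounder biases OLS upward enough to flip its sign, while the IV slope stays close to the simulated effect because the instrument is independent of the confounder by construction. In real data that independence is an assumption that has to be argued, not something the estimator verifies.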