Explainable Machine Learning Framework for Early Detection of Loan Delinquency in Banking
M.Tech Project Presentation
Sanidhya Dash
M25AI1036 | M.Tech (AI)
Project Focus
Early Risk Detection & XAI
Faculty Guide
Dr. Binod Kumar
Key Technologies
Indian Institute of Technology, Jodhpur
Problem Statement
Loan delinquency challenges in the banking sector
Operational Pain Points
  • NPAs & Financial Losses: Delinquency leads to increased Non-Performing Assets and significant revenue erosion.
  • Rigid Rule-Based Systems: Static systems fail to adapt to evolving customer behavior and complex financial patterns.
  • Black-Box ML Models: Significant lack of explainability hinders trust and prevents effective manual overrides.
  • Regulatory Compliance: Compliance standards mandate transparent, fair, and fully interpretable AI decision systems.
  • No Early Warning System: There are no proactive indicators to flag high-risk customers before a default actually occurs.
Avg. NPAs: 15%
Global Losses: $2.5T
Default Risk: 30%
Technical Challenges
  • Imbalanced Data: Default cases are far scarcer than non-default cases.
  • Data Leakage: Features recorded after the prediction date leak the outcome and inflate training results.
  • Trade-offs: High predictive accuracy must be balanced against feature-level interpretability.
  • Temporal Analysis: The model must capture how financial behavior shifts across multi-month sequences.
Proposed Solution Architecture
Phase 1: XGBoost + SHAP Explainer
Phase 2: LSTM Temporal Modeling
Ensemble: Hybrid Decision Framework
Objectives of the Project
Key goals and deliverables for the loan delinquency detection system
🤖
ML Models
Build predictive ML models for delinquency detection using gradient boosting (XGBoost) and LSTM networks
🔍
Explainability
Integrate SHAP and LIME techniques for model interpretability
⚖️
Fairness
Evaluate fairness and bias in predictions across different groups
🔄
Hybrid Framework
Develop hybrid XGBoost + LSTM framework for enhanced accuracy
📊
Interactive Dashboard
Build interactive banking dashboard for risk monitoring
Key Features
XGBoost & LSTM Integration
Combined for superior performance
SHAP Explainability
Transparent AI predictions
Real-time Monitoring
Live risk assessment

Dataset and Feature Engineering

Synthetic banking loan records with 55 original features and 11 engineered features
Dataset Details
Total Records: 5,000
Features: 55 → 66
Delinquent: 725 (14.5%)
Non-Delinquent: 4,275 (85.5%)
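The class counts above quantify the imbalance the model has to cope with. The sketch below derives the delinquent share and the negative-to-positive ratio from them. That ratio is the value commonly passed to XGBoost's `scale_pos_weight` parameter, but whether the project applied such a weight is an assumption; the slides do not say.

```python
# Class counts taken from the dataset summary.
N_DELINQUENT = 725
N_NON_DELINQUENT = 4275


def class_balance(n_pos: int, n_neg: int) -> dict:
    """Return the positive-class share and the negative-to-positive ratio."""
    total = n_pos + n_neg
    return {
        "positive_rate": n_pos / total,
        # Assumption: this ratio could serve as a class weight during
        # training (e.g. scale_pos_weight); the slides do not confirm it.
        "neg_to_pos_ratio": n_neg / n_pos,
    }


balance = class_balance(N_DELINQUENT, N_NON_DELINQUENT)
print(f"Delinquent share: {balance['positive_rate']:.1%}")
print(f"Negative-to-positive ratio: {balance['neg_to_pos_ratio']:.2f}")
```

A ratio near 6 means a model that always predicts "non-delinquent" would already reach 85.5% accuracy, which is why recall and AUC are the more informative metrics here.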
Important Features Used
Annual Income: Primary income
Debt-to-Income: Ratio calculation
Loan Amount: Principal borrowed
Interest Rate: Annual percentage
Credit Utilization: Usage ratio
Delinquency History: Past payments
Legal Issues: Bankruptcy/Tax
Installment Load: Payment ratio
Recent Inquiries: Credit checks
Feature Engineering
EMI to Income Ratio
Added
Credit Utilization Ratio
Added
Open Credit Ratio
Added
Delinquency Score
Added
Bankruptcy/Tax Issue
Added
Installment Burden
Added
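As an illustration of how ratio features like these might be derived, the sketch below computes an EMI-to-income ratio and a credit utilization ratio for one record. The slides do not give the exact definitions, so the formulas (a standard amortized-loan EMI and balance divided by limit) and every field name (`loan_amount`, `term_months`, `revolving_balance`, `total_credit_limit`, and so on) are assumptions.

```python
def monthly_emi(principal: float, annual_rate_pct: float, months: int) -> float:
    """Standard amortized-loan EMI: P * r * (1+r)^n / ((1+r)^n - 1)."""
    r = annual_rate_pct / 100 / 12  # monthly interest rate
    if r == 0:
        return principal / months  # interest-free loan: equal instalments
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


def engineer_features(row: dict) -> dict:
    """Return a copy of one loan record with two ratio features added.

    All field names are hypothetical placeholders.
    """
    emi = monthly_emi(row["loan_amount"], row["interest_rate"], row["term_months"])
    monthly_income = row["annual_income"] / 12
    out = dict(row)
    # Guard against division by zero for missing income or credit limit.
    out["emi_to_income_ratio"] = emi / monthly_income if monthly_income else 0.0
    out["credit_utilization_ratio"] = (
        row["revolving_balance"] / row["total_credit_limit"]
        if row["total_credit_limit"]
        else 0.0
    )
    return out


example = engineer_features({
    "loan_amount": 12000.0,
    "interest_rate": 0.0,
    "term_months": 12,
    "annual_income": 60000.0,
    "revolving_balance": 2500.0,
    "total_credit_limit": 10000.0,
})
print(example["emi_to_income_ratio"], example["credit_utilization_ratio"])
```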
Target Distribution
Delinquent: 14.5%
Non-Delinquent: 85.5%

Phase-1 Architecture and Model

XGBoost-based risk prediction model with SHAP explainability
Pipeline Status:
Active
Phase-1 Pipeline Flow
Data Ingestion → Cleaning → Leakage Removal → Feature Eng. → XGBoost → SHAP → Dashboard
Leakage Columns Removed
Balance
Paid Total
Principal
Interest
Late Fees
Why Leakage Removal Matters
These columns record post-repayment information that would not be available at inference time. Removing them prevents leakage-inflated performance and keeps the model's results representative of real-world prediction.
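A minimal sketch of the leakage-removal step, assuming records are held as dictionaries. The snake_case column names are guesses at how the five listed columns appear in the data. Matching on exact names matters: a substring match on "interest" would also drop the legitimate `interest_rate` feature.

```python
# Post-repayment columns listed on the slide. The exact column names in
# the real dataset are an assumption.
LEAKAGE_COLUMNS = frozenset(
    {"balance", "paid_total", "principal", "interest", "late_fees"}
)


def drop_leakage(records: list, leakage: frozenset = LEAKAGE_COLUMNS) -> list:
    """Return copies of the records without the leakage columns.

    Uses exact name matching so that look-alike features such as
    interest_rate are kept.
    """
    return [
        {name: value for name, value in rec.items() if name not in leakage}
        for rec in records
    ]


raw = [
    {
        "annual_income": 54000,
        "interest_rate": 11.5,
        "balance": 8300.0,
        "paid_total": 3700.0,
        "late_fees": 25.0,
    }
]
cleaned = drop_leakage(raw)
print(cleaned)
```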
Model Performance
Accuracy: 93.9%
Recall: 90.3%
Precision: 45.2%
AUC: 91.6%
Model Trained Successfully
Key Features
SHAP Explainability
Feature Importance Rankings
Real-time Monitoring Integration

Phase-1 Results - XGBoost Model Performance

High-performing risk prediction model with 93.9% accuracy and 91.6% AUC
Model Status:
Trained Successfully
Performance Metrics
Accuracy: 93.9% (+2.1%)
Recall: 90.3% (+1.5%)
Precision: 45.2% (+3.2%)
F1-Score: 60.2% (+2.8%)
AUC: 91.6% (+1.9%)
Model Interpretation
The XGBoost model detects delinquency with high recall (90.3%), so most risky customers are identified. An AUC of 91.6% indicates strong separation between risky and safe customers.
Strong delinquency detection: High recall means most risky customers are detected
High AUC score: Strong separation between risky and safe customers
Recall-oriented trade-off: Precision of 45.2% means more than half of flagged customers are false alarms, a cost accepted in exchange for high recall
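The reported F1-Score can be checked directly from the precision and recall figures, since F1 is their harmonic mean. The short sketch below does that arithmetic and also derives the share of flagged customers who are false alarms; the function name is illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


PRECISION = 0.452  # reported XGBoost precision
RECALL = 0.903     # reported XGBoost recall

f1 = f1_from(PRECISION, RECALL)
false_alarm_share = 1 - PRECISION  # flagged customers who are not delinquent

print(f"F1-Score: {f1:.1%}")
print(f"False alarms among flagged customers: {false_alarm_share:.1%}")
```

The result agrees with the 60.2% F1-Score on the slide, and it shows why the model is better described as recall-oriented than as balanced.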
Model Details
XGBoost Classifier
Gradient boosting with 100 estimators
Training Time: 2.3 minutes
Features Used: 66
Validation Split: 80/20
Class Distribution
Delinquent: 14.5%
Non-Delinquent: 85.5%
Explainability and Trusted AI
SHAP-based interpretability for transparent risk predictions
Phase 1
XGBoost + SHAP
Explainability Techniques Used
SHAP Global: Model-wide feature importance analysis
SHAP Local: Individual prediction explanations
Feature Importance: Visual ranking of key features
Customer Risk: Personalized risk explanations
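To make the local/global distinction concrete, the sketch below computes exact Shapley values by brute force for a tiny hand-written scoring function. It illustrates the idea behind SHAP only: it is not the `shap` library, and `toy_risk` with its coefficients is a made-up stand-in for the trained XGBoost model. Local explanations are the per-customer attributions; the global ranking is the mean absolute attribution across customers.

```python
from itertools import combinations
from math import factorial

FEATURES = ["debt_to_income", "credit_utilization", "delinquency_score"]


def toy_risk(x: dict) -> float:
    """Hypothetical risk score with one interaction term."""
    return (0.3 * x["debt_to_income"]
            + 0.2 * x["credit_utilization"]
            + 0.1 * x["delinquency_score"]
            + 0.4 * x["debt_to_income"] * x["credit_utilization"])


def shapley_values(model, x: dict, baseline: dict) -> dict:
    """Exact Shapley attribution of model(x) - model(baseline) per feature."""
    n = len(FEATURES)

    def value(coalition: set) -> float:
        # Features outside the coalition are held at their baseline value.
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def global_importance(model, customers: list, baseline: dict) -> dict:
    """Mean absolute Shapley value per feature across customers."""
    sums = {f: 0.0 for f in FEATURES}
    for x in customers:
        for f, v in shapley_values(model, x, baseline).items():
            sums[f] += abs(v)
    return {f: s / len(customers) for f, s in sums.items()}


baseline = {"debt_to_income": 0.3, "credit_utilization": 0.3, "delinquency_score": 0.1}
customer = {"debt_to_income": 0.8, "credit_utilization": 0.9, "delinquency_score": 0.5}

local = shapley_values(toy_risk, customer, baseline)
ranking = global_importance(toy_risk, [customer, baseline], baseline)
print(local)
print(ranking)
```

The local attributions sum to the difference between the customer's score and the baseline score, which is the property that lets each prediction be explained feature by feature.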
Value Addition
Transparency: 95%
Trust: 92%
Compliance: 98%
Accuracy: 94%
Top Features Driving Delinquency
  • Debt-to-Income Ratio - Primary risk indicator
  • Credit Utilization Ratio - Current usage
  • Delinquency Score - Historical behavior
  • Installment Burden - Monthly capacity
  • Historical Failed Payments - Failure count
Key Benefits
Transparent: AI predictions are explained
Trust: Builds confidence among banking professionals
Compliance: Regulatory ready

Phase-2 Sequential Model

LSTM-based temporal modeling for capturing evolving customer behavior
6-month sequence
Why Phase-2 Was Needed
Behavior Changes
Customer risk patterns evolve dynamically
Evolving Risk
Static models miss temporal trends
Sequential Analysis
Repayment behavior tracking needed
Model Type: LSTM
Sequence: 6 months
Features: 12
Hidden Units: 128
LSTM Architecture
Input → LSTM Layer 1 → LSTM Layer 2 → Output
Captures temporal dependencies in customer behavior over 6-month periods, analyzing sequential patterns in repayment history and credit utilization.
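An LSTM with these settings expects each input as a 6 × 12 block: six monthly steps of twelve features. The sketch below shows one way such inputs could be cut from a customer's monthly history with a sliding window. Whether the project used overlapping windows or a single window per customer is not stated, so the windowing scheme and the function name are assumptions.

```python
SEQ_LEN = 6      # months per sequence (from the slide)
N_FEATURES = 12  # features per monthly step (from the slide)


def make_sequences(monthly_rows: list, seq_len: int = SEQ_LEN) -> list:
    """Slide a window of seq_len months over one customer's history.

    monthly_rows holds one feature vector per month, oldest first.
    Returns a list of windows, each seq_len rows long. Histories shorter
    than seq_len yield no windows.
    """
    return [
        monthly_rows[start:start + seq_len]
        for start in range(len(monthly_rows) - seq_len + 1)
    ]


# Eight months of placeholder data: each row is the month index repeated.
history = [[float(month)] * N_FEATURES for month in range(8)]
windows = make_sequences(history)

print(len(windows), len(windows[0]), len(windows[0][0]))
```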
Sequence Features
Delinquency Score: Risk indicator
Credit Utilization: Usage ratio
Interest Rate: Annual rate
Recent Inquiry: Credit checks
Failed Payments: Historical data
Payment Amount: Monthly payment
Key Metrics
Sequence Length: 6 months
Input Features: 12

Phase-2 Results and Hybrid Framework

LSTM model performance and hybrid risk scoring system
Model Status:
Trained Successfully
LSTM Model Performance
Accuracy: 47.3% (+5.2%)
Recall: 62.1% (+8.3%)
Precision: 16.0% (+2.1%)
F1-Score: 25.5% (+4.5%)
AUC: 53.5% (+6.2%)
Hybrid Framework Formula
Final Hybrid Risk Score = 0.7 × XGBoost Risk + 0.3 × LSTM Risk
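The formula is a fixed-weight average of the two model outputs. A minimal sketch, assuming both models emit a risk probability in [0, 1]; the function name and the input validation are illustrative additions.

```python
XGB_WEIGHT = 0.7   # weight on the XGBoost risk (from the formula)
LSTM_WEIGHT = 0.3  # weight on the LSTM risk (from the formula)


def hybrid_risk(xgb_risk: float, lstm_risk: float) -> float:
    """Final Hybrid Risk Score = 0.7 * XGBoost Risk + 0.3 * LSTM Risk."""
    for name, p in (("xgb_risk", xgb_risk), ("lstm_risk", lstm_risk)):
        # Assumption: both inputs are probabilities.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    return XGB_WEIGHT * xgb_risk + LSTM_WEIGHT * lstm_risk


score = hybrid_risk(0.8, 0.4)
print(f"Hybrid risk: {score:.2f}")
```

Because the weights sum to 1, the hybrid score stays in [0, 1] and always lies between the two model scores.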
High Accuracy: XGBoost provides 93.9% accuracy
Temporal Insights: LSTM captures behavior changes over time
Balanced Scoring: 70/30 weighting of XGBoost and LSTM scores
Hybrid Model Benefits
XGBoost Weight: 70%
LSTM Weight: 30%
Combined AUC: 91.6%
Combined Recall: 90.3%
Risk Distribution
High Risk: 15%
Medium Risk: 25%
Low Risk: 60%
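The slides give the High/Medium/Low split but not how customers are assigned to a tier. One plausible reading is rank-based segmentation: the riskiest 15% by hybrid score are High, the next 25% Medium, and the rest Low. The sketch below implements that reading purely as an assumption; fixed score thresholds would be an equally valid design.

```python
def assign_tiers(scores: list, high_share: float = 0.15,
                 medium_share: float = 0.25) -> list:
    """Label each score High, Medium, or Low by its rank among all scores.

    Assumption: tiers are rank-based and use the shares shown on the
    slide. The project may instead use fixed score thresholds.
    """
    n = len(scores)
    n_high = round(n * high_share)
    n_medium = round(n * medium_share)
    # Indices ordered from riskiest to safest.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    tiers = ["Low"] * n
    for rank, idx in enumerate(order):
        if rank < n_high:
            tiers[idx] = "High"
        elif rank < n_high + n_medium:
            tiers[idx] = "Medium"
    return tiers


scores = [i / 20 for i in range(20)]  # placeholder hybrid scores
tiers = assign_tiers(scores)
print({label: tiers.count(label) for label in ("High", "Medium", "Low")})
```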

Dashboard and Impact

Interactive banking dashboard for risk prediction and monitoring
System Status:
Active
Dashboard Features
Upload banking dataset
Predict delinquency risk
XGBoost, LSTM, Hybrid scores
SHAP explainability plots
Early warning alerts
Risk segmentation
Impact of the Project
Reduces NPAs: Improves loan recovery rates
Early Intervention: Prevents defaults proactively
Transparency: Enhances trust in AI decisions