Data Source: MongoDB Collection
Data Extraction: Export as DataFrame
Schema Validation: Against schema.yaml
Train/Test Split: 80/20 Ratio
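The schema validation and 80/20 split steps listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names, the `EXPECTED_SCHEMA` dict (standing in for the parsed contents of schema.yaml), and the helper names `validate_schema` and `split_train_test` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the parsed schema.yaml: column name -> dtype.
EXPECTED_SCHEMA = {"duration": "int64", "bytes_sent": "float64", "label": "int64"}


def validate_schema(df, schema):
    """Return a list of problems; an empty list means the frame complies."""
    problems = []
    for column, dtype in schema.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"{column}: expected {dtype}, got {df[column].dtype}")
    for column in df.columns:
        if column not in schema:
            problems.append(f"unexpected column: {column}")
    return problems


def split_train_test(df, train_fraction=0.8, seed=42):
    """Shuffle and split rows; assumes the index has no duplicate labels."""
    train = df.sample(frac=train_fraction, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Made-up sample data, standing in for the frame exported from MongoDB.
sample = pd.DataFrame(
    {
        "duration": list(range(1, 11)),
        "bytes_sent": [float(i) * 1.5 for i in range(10)],
        "label": [0, 1] * 5,
    }
)
```

Calling `validate_schema(sample, EXPECTED_SCHEMA)` before `split_train_test(sample)` mirrors the order of the steps: a frame that fails validation never reaches the split.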
Enforces strict schema compliance using schema.yaml, ensuring data quality and consistency
Automatically syncs processed data to AWS S3 for persistence and downstream access
Uses the two-sample Kolmogorov-Smirnov test to detect and report data drift between the train and test sets
Defines a custom NetworkSecurityException class for robust error tracking and handling
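To make the drift check and the custom exception concrete, here is a hedged, standard-library-only sketch. It computes the two-sample Kolmogorov-Smirnov statistic directly and uses the asymptotic Kolmogorov distribution for the p-value; a library routine such as `scipy.stats.ks_2samp` could be used instead. The `NetworkSecurityException` body, the `detect_drift` helper, the dict-of-lists input format, and the 0.05 threshold are assumptions for illustration, not the project's implementation.

```python
import bisect
import math


class NetworkSecurityException(Exception):
    """Illustrative stand-in; the project's real class may carry more context."""

    def __init__(self, message, cause=None):
        super().__init__(message)
        self.cause = cause  # original exception, if any


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_p_value(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:  # the survival function is 1 to within about 1e-12 here
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


def detect_drift(train, test, threshold=0.05):
    """Compare each column (name -> list of numbers) and flag p < threshold."""
    report = {}
    try:
        for column in train:
            a, b = train[column], test[column]
            if not a or not b:
                raise ValueError(f"column {column!r} is empty")
            p = ks_p_value(ks_statistic(a, b), len(a), len(b))
            report[column] = {"p_value": p, "drift": p < threshold}
    except (KeyError, ValueError) as exc:
        # Wrap low-level errors so callers handle a single exception type.
        raise NetworkSecurityException(f"drift check failed: {exc}", cause=exc) from exc
    return report
```

A column is reported as drifted when its p-value falls below the threshold, meaning the train and test values are unlikely to come from the same distribution. Wrapping `KeyError` and `ValueError` in one exception type is a design choice that keeps error handling in the calling pipeline uniform.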