Ensures all required features are present and conform to the expected data types
YAML-based schema defining 30 features (URL and website attributes) plus the target column
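A minimal sketch of the presence-and-type check, assuming a schema that maps each column name to a type name. The schema layout, the column names, `TYPE_MAP`, and the `load_schema`/`validate_columns` helpers are all hypothetical. Only three of the 30 features are shown. The data is a plain dict of column lists so the sketch needs no third-party packages, and `load_schema` only shows where PyYAML's `yaml.safe_load` would read the real file.

```python
from typing import Any

# Hypothetical schema: roughly what yaml.safe_load might return for
#   columns:
#     having_ip_address: int
#     url_length: int
#     result: int        # target column
# The real schema lists 30 features plus the target; 3 columns shown here.
SCHEMA: dict[str, Any] = {
    "columns": {"having_ip_address": "int", "url_length": "int", "result": "int"},
}

# Schema type names mapped to the Python types checked below (assumption).
TYPE_MAP = {"int": int, "float": float, "str": str}


def load_schema(path: str) -> dict[str, Any]:
    """Read a YAML schema file. PyYAML is imported here so the rest runs without it."""
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)


def validate_columns(columns: dict[str, list[Any]], schema: dict[str, Any]) -> list[str]:
    """Return every schema violation found; an empty list means the data passed."""
    errors = []
    for name, type_name in schema["columns"].items():
        if name not in columns:
            errors.append(f"missing column: {name}")
            continue
        expected_type = TYPE_MAP[type_name]
        bad = [v for v in columns[name] if not isinstance(v, expected_type)]
        if bad:
            errors.append(f"{name}: {len(bad)} value(s) not of type {type_name}")
    return errors


if __name__ == "__main__":
    data = {"having_ip_address": [1, -1, 1], "url_length": [54, 75, "long"]}
    print(validate_columns(data, SCHEMA))
    # ['url_length: 1 value(s) not of type int', 'missing column: result']
```

Collecting every error instead of stopping at the first one lets a single run report all problems in a dataset. Note that `isinstance(True, int)` is `True` in Python, so a stricter check would reject booleans explicitly.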
Uses the two-sample Kolmogorov-Smirnov test to detect distribution shifts between training and test data
Sample report showing the p-value and drift status for each feature
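The drift check and its per-feature report can be sketched as follows. SciPy's `scipy.stats.ks_2samp` is the usual way to run this test, and it can compute exact p-values for small samples. To stay dependency-free, this sketch computes the KS statistic directly and approximates the p-value with the asymptotic Kolmogorov distribution plus a common finite-sample correction. The `drift_report` helper, its output fields, and the 0.05 significance level are assumptions, not the project's actual settings.

```python
import math
from bisect import bisect_right
from typing import Any


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The ECDFs only change at observed values, so checking those points suffices.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / n  # fraction of a that is <= x
        fb = bisect_right(b_sorted, x) / m  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))  # effective sample size
    lam = (en + 0.12 + 0.11 / en) * d  # finite-sample correction
    if lam < 0.2:
        return 1.0  # Q_KS(lam) is indistinguishable from 1 in this range
    # Q_KS(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    q = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
            for j in range(1, 101))
    return min(max(q, 0.0), 1.0)


def drift_report(train: dict[str, list[float]], test: dict[str, list[float]],
                 alpha: float = 0.05) -> dict[str, dict[str, Any]]:
    """Per-feature KS statistic, p-value and drift flag (drift when p < alpha)."""
    report = {}
    for name, train_values in train.items():
        test_values = test[name]
        d = ks_statistic(train_values, test_values)
        p = ks_pvalue(d, len(train_values), len(test_values))
        report[name] = {"ks_statistic": d, "p_value": p, "drift": p < alpha}
    return report


if __name__ == "__main__":
    train_cols = {"url_length": list(range(100))}
    test_cols = {"url_length": list(range(50, 150))}  # shifted by 50: should drift
    print(drift_report(train_cols, test_cols)["url_length"]["drift"])  # True
```

With 30 features each tested at the 0.05 level, about 1.5 features would be flagged on average even if nothing had drifted. A stricter per-feature threshold, such as a Bonferroni-adjusted 0.05 / 30, may be worth considering.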
Prevents garbage-in, garbage-out by ensuring all data meets the expected structure before training
Identifies, before deployment, distribution shifts that could degrade model performance
Creates an audit trail of validation reports for tracing and reproducing model performance
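One way to keep such an audit trail is to write every report to its own timestamped JSON file and never overwrite earlier ones. The sketch below assumes a local directory of reports. The file-naming scheme, the optional `metadata` field (for example, which data files a run used), and the `save_report`/`load_reports` helpers are hypothetical.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


def save_report(report: dict[str, Any], out_dir: str,
                metadata: Optional[dict[str, Any]] = None) -> Path:
    """Write one validation report as a new JSON file and return its path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    # UTC timestamp first so filenames sort chronologically; the random
    # suffix keeps two reports written in the same instant distinct.
    path = out / f"validation_report_{stamp}_{uuid.uuid4().hex[:8]}.json"
    record = {"created_at_utc": stamp, "metadata": metadata or {}, "report": report}
    with open(path, "x", encoding="utf-8") as f:  # "x" fails rather than overwrite
        json.dump(record, f, indent=2, sort_keys=True)
    return path


def load_reports(out_dir: str) -> list[dict[str, Any]]:
    """Load all saved reports, oldest first (ordered by filename timestamp)."""
    paths = sorted(Path(out_dir).glob("validation_report_*.json"))
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder report and file names, for illustration only.
        save_report({"url_length": {"drift": False}}, tmp,
                    metadata={"train_file": "train.csv", "test_file": "test.csv"})
        print(len(load_reports(tmp)))  # 1
```

Storing the run metadata next to each report is what makes a past result traceable: a later reader can see which data a given drift decision was based on.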