PROJECT ARCHITECTURE & OVERVIEW

URL & website features captured from network traffic

MongoDB, CSV, S3

Ingestion, validation & transformation

KNNImputer, Schema Validation, Drift Detection
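
The validation step can be illustrated with a small sketch. The slide names schema validation and drift detection but does not say how either is implemented, so everything below is an assumption: the column names, the `validate_schema`, `ks_statistic`, and `has_drift` helpers, and the 0.3 threshold are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is only one common choice of drift measure.

```python
from bisect import bisect_right

# Hypothetical schema: column name -> expected Python type (not from the slide).
EXPECTED_SCHEMA = {"url_length": int, "having_ip_address": int, "result": int}


def validate_schema(rows, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable schema violations (empty if valid)."""
    errors = []
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        extra = set(row) - set(schema)
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            errors.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col, typ in schema.items():
            # None is allowed here: missing values are left for a later imputation step.
            if col in row and row[col] is not None and not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    return errors


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drift(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, current) > threshold
```

Calling `has_drift(train_column, new_column)` per numeric column would give a simple per-feature drift report; a production check would more likely compare p-values than a fixed statistic threshold.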

Version control & experiment tracking

MLflow, DagsHub, F1/Precision/Recall
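
For reference, the three tracked metrics can be computed from predictions as in the sketch below. This is a plain-Python illustration of the standard definitions, not the project's code; the function name `precision_recall_f1` and the choice of `1` as the positive label are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Two true positives, one false positive, one false negative.
metrics = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

The returned dictionary is the kind of payload an experiment tracker could record per run; the logging call itself is not shown on the slide and is left out here.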

Multiple classifiers with hyperparameter tuning

scikit-learn, GridSearchCV, RandomForest

Automated testing & deployment

GitHub Actions, Docker, AWS ECR

Real-time & batch prediction

FastAPI, model.pkl, preprocessor.pkl

Key Architecture Features

Independent pipeline stages for ingestion, validation, transformation, training, and deployment

End-to-end automation from data ingestion to model deployment with CI/CD pipeline

Custom exception hierarchy and logging framework for traceability and stability
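
The features listed above can be sketched together: independent stages chained by a small runner, with a custom exception hierarchy and logging recording which stage failed. This is a minimal illustration under stated assumptions, not the project's implementation; the class names (`PipelineError` and its subclasses), the stage functions, and the sample rows are all hypothetical, and the mean fill in `transform` is only a stand-in for real imputation.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("pipeline")


class PipelineError(Exception):
    """Base class (hypothetical) that records which stage failed and why."""

    def __init__(self, stage, original):
        self.stage = stage
        self.original = original
        super().__init__(f"{stage} failed: {original!r}")


class IngestionError(PipelineError):
    pass


class ValidationError(PipelineError):
    pass


class TransformationError(PipelineError):
    pass


def ingest(_):
    # Stand-in for reading from a real data source.
    return [{"url_length": 54}, {"url_length": None}]


def validate(rows):
    if not all("url_length" in r for r in rows):
        raise ValueError("missing column url_length")
    return rows


def transform(rows):
    # Fill missing values with the column mean (stand-in for real imputation).
    known = [r["url_length"] for r in rows if r["url_length"] is not None]
    mean = sum(known) / len(known)
    return [
        {"url_length": r["url_length"] if r["url_length"] is not None else mean}
        for r in rows
    ]


# Each stage is independent: a name, a callable, and the error type it maps to.
STAGES = [
    ("ingestion", ingest, IngestionError),
    ("validation", validate, ValidationError),
    ("transformation", transform, TransformationError),
]


def run_pipeline(stages=STAGES, data=None):
    """Run stages in order, wrapping any failure in the stage's own error type."""
    for name, func, error_cls in stages:
        logger.info("starting stage %s", name)
        try:
            data = func(data)
        except Exception as exc:
            logger.error("stage %s failed: %s", name, exc)
            raise error_cls(name, exc) from exc
    return data
```

Because each stage only depends on the data handed to it, a stage can be replaced or tested in isolation, and `raise ... from exc` keeps the original traceback attached to the wrapped error.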
