PROJECT ARCHITECTURE & OVERVIEW

Data Sources

URL & website features captured from network traffic

MongoDB, CSV, S3

Data Pipeline

Ingestion, validation & transformation

KNNImputer, Schema Validation, Drift Detection
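
As a rough illustration of the schema validation step, the sketch below checks each incoming row against an expected set of columns and types. The column names (`url_length`, `has_https`, `label`) and the `validate_schema` helper are hypothetical and not taken from the project; the real pipeline may validate against a different schema format.

```python
from typing import Any

# Hypothetical schema: column name -> expected Python type.
# These names are assumptions for illustration, not the project's real columns.
EXPECTED_SCHEMA: dict[str, type] = {
    "url_length": int,
    "has_https": int,
    "label": int,
}


def validate_schema(rows: list[dict[str, Any]], schema: dict[str, type]) -> list[str]:
    """Return a list of readable problems; an empty list means the batch passed."""
    problems: list[str] = []
    expected = set(schema)
    for i, row in enumerate(rows):
        missing = expected - set(row)
        extra = set(row) - expected
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        for name, expected_type in schema.items():
            value = row.get(name)
            # None is tolerated here on the assumption that missing values
            # are filled in by a later imputation step.
            if name in row and value is not None and not isinstance(value, expected_type):
                problems.append(
                    f"row {i}: {name} has type {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems
```

Returning a list of problems instead of raising on the first one lets a validation report show every issue in a batch at once.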

Model Tracking

Version control & experiment tracking

MLflow, DagsHub, F1/Precision/Recall
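
For reference, the three tracked metrics can be computed from true and predicted labels as in the minimal sketch below. The function name `precision_recall_f1` and the choice of `1` as the positive class are assumptions; how the project actually logs these values to MLflow is not shown on the slide.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.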

Model Training

Multiple classifiers with hyperparameter tuning

scikit-learn, GridSearchCV, RandomForest
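
A minimal sketch of hyperparameter tuning with `GridSearchCV` over a `RandomForestClassifier` is shown below. The synthetic dataset, the parameter grid, and the choice of F1 as the scoring metric are assumptions for illustration; the project's real grid and data are not given on the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real feature matrix (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Small illustrative grid; the project's real search space is not specified.
param_grid = {
    "n_estimators": [10, 20],
    "max_depth": [3, None],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    scoring="f1",  # select the combination with the best cross-validated F1
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
best_model = search.best_estimator_  # refit on all of X, y by default
```

The same pattern extends to multiple classifiers by running one search per estimator and keeping the model with the best cross-validated score.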

CI/CD Pipeline

Automated testing & deployment

GitHub Actions, Docker, AWS ECR

Inference Service

Real-time & batch prediction

FastAPI, model.pkl, preprocessor.pkl

Key Architecture Features

Modular Components

Independent pipeline stages for ingestion, validation, transformation, training, and deployment

Automated Workflow

End-to-end automation from data ingestion to model deployment via a CI/CD pipeline

Robust Error Handling

Custom exception hierarchy and logging framework for traceability and stability
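
One way such an exception hierarchy could look is sketched below. The class names (`PipelineError`, `DataValidationError`, `ModelTrainingError`), the `run_stage` helper, and the logger name are hypothetical; the project's real classes may differ.

```python
import logging

logger = logging.getLogger("pipeline")  # hypothetical logger name


class PipelineError(Exception):
    """Base error that records where the underlying exception was raised."""

    def __init__(self, message, cause=None):
        super().__init__(message)
        self.cause = cause
        tb = cause.__traceback__ if cause is not None else None
        # Walk to the innermost frame, where the original error occurred.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        self.filename = tb.tb_frame.f_code.co_filename if tb else None
        self.lineno = tb.tb_lineno if tb else None

    def __str__(self):
        base = super().__str__()
        if self.filename is None:
            return base
        return f"{base} (raised at {self.filename}:{self.lineno})"


class DataValidationError(PipelineError):
    """Raised when incoming data fails validation."""


class ModelTrainingError(PipelineError):
    """Raised when model training fails."""


def run_stage(name, func):
    """Run one pipeline stage, logging and wrapping any unexpected error."""
    try:
        return func()
    except PipelineError:
        raise  # already carries location details
    except Exception as exc:
        logger.error("stage %s failed: %s", name, exc)
        raise PipelineError(f"stage '{name}' failed: {exc}", cause=exc) from exc
```

Wrapping every stage in the same helper means each failure is logged once and surfaces with the file and line of the original error, which is the traceability the slide refers to.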
