Overview
This project implements an end-to-end MLOps pipeline in
R using the UCI Iris Dataset. The pipeline
covers data preprocessing, model training, evaluation, deployment, and
validation using reproducible and industry-standard practices.
What Has Been Implemented
Dataset
- UCI Iris Dataset
- 150 samples
- 4 numerical features
- 3 target classes
Feature Engineering
The following engineered features were created, alongside the original petal features:
- petal_to_sepal_length
- petal_to_sepal_width
- sepal_ratio
- Original petal features
The same feature pipeline is used during training and
inference.
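A minimal sketch of how one shared feature function could serve both training and inference. The README names the features but not their formulas, so the ratio definitions below are assumptions, as is the helper name `add_features`.

```r
# Hypothetical helper; the formulas are assumed from the feature names.
add_features <- function(df) {
  required <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
  stopifnot(all(required %in% names(df)))
  data.frame(
    petal_to_sepal_length = df$Petal.Length / df$Sepal.Length,  # assumed definition
    petal_to_sepal_width  = df$Petal.Width / df$Sepal.Width,    # assumed definition
    sepal_ratio           = df$Sepal.Length / df$Sepal.Width,   # assumed definition
    Petal.Length          = df$Petal.Length,                    # original petal features
    Petal.Width           = df$Petal.Width
  )
}

# Training time: apply the function to the full dataset.
train_features <- add_features(iris)

# Inference time: apply the same function to a single new observation.
new_flower <- data.frame(
  Sepal.Length = 5.1, Sepal.Width = 3.5,
  Petal.Length = 1.4, Petal.Width = 0.2
)
inference_features <- add_features(new_flower)

# Both paths produce the same columns in the same order.
identical(names(train_features), names(inference_features))
```

Routing both paths through one function is what keeps the column set and order from drifting between training and serving.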
Model Training & Selection
- Models trained:
  - Logistic Regression
  - Random Forest
  - Decision Tree
  - Naive Bayes
- Models compared using Recall
- Best model selected automatically
- Artifacts saved:
  - model.rds
  - scaler.rds
  - feature_names.rds
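The selection step can be sketched as follows. This is an illustration under stated assumptions, not the project's training script: it fits only two of the four model families (a decision tree via `rpart` and a multinomial logistic regression via `nnet`, both of which ship with standard R installations), assumes an 80/20 split with seed 42, and assumes that "Recall" means macro-averaged recall. The artifact is written to a temporary directory rather than `artifacts/`.

```r
library(rpart)  # decision tree
library(nnet)   # multinomial logistic regression

# Macro-averaged recall: mean of per-class recall (assumed averaging scheme).
macro_recall <- function(truth, pred) {
  truth <- factor(truth)
  pred <- factor(pred, levels = levels(truth))
  cm <- table(truth, pred)
  mean(diag(cm) / rowSums(cm), na.rm = TRUE)
}

# Assumed 80/20 train/test split with a fixed seed.
set.seed(42)
train_idx <- sample(nrow(iris), size = 120)
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Fit the candidate models.
models <- list(
  decision_tree = rpart(Species ~ ., data = train, method = "class"),
  logistic_regression = multinom(Species ~ ., data = train, trace = FALSE)
)

# Score every model on the held-out set.
scores <- vapply(
  models,
  function(m) macro_recall(test$Species, predict(m, test, type = "class")),
  numeric(1)
)

# Pick the highest-scoring model and persist it.
best_name <- names(which.max(scores))
model_path <- file.path(tempdir(), "model.rds")
saveRDS(models[[best_name]], model_path)
```

Random Forest and Naive Bayes would slot into the same `models` list, provided their `predict` methods are wrapped to return class labels.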
Reproducibility
- Fixed random seed
- Same input → same output
- Deterministic pipeline
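As an illustration of the fixed-seed idea, the sketch below wraps a random train/test split in a function that resets the seed on every call. The function name `split_indices`, the seed value 42, and the 80% training fraction are assumptions made for the example.

```r
# Hypothetical helper: returns the same training indices on every call
# because the random number generator is re-seeded first.
split_indices <- function(n, train_fraction = 0.8, seed = 42) {
  set.seed(seed)
  sample(n, size = floor(train_fraction * n))
}

first_run <- split_indices(nrow(iris))
second_run <- split_indices(nrow(iris))

# Same input and seed give the same split.
identical(first_run, second_run)
```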
Deployment
- Model deployed using Plumber REST API
- Endpoints:
  - /health
  - /predict
  - /predict-csv
- Swagger UI available for testing
- Input validation and safe error handling implemented
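A rough sketch of what the `/health` and `/predict` routes with input validation might look like in a Plumber file. The `#*` comments are Plumber's route annotations; the parameter names (`sepal_length` and so on), the helper `validate_measurements`, and the response shapes are hypothetical and may differ from the project's `api/plumber.R`.

```r
# Hypothetical validation helper: returns NULL when the input is valid,
# otherwise a message describing the problem.
validate_measurements <- function(x) {
  required <- c("sepal_length", "sepal_width", "petal_length", "petal_width")
  missing_fields <- setdiff(required, names(x))
  if (length(missing_fields) > 0) {
    return(paste("Missing fields:", paste(missing_fields, collapse = ", ")))
  }
  values <- suppressWarnings(as.numeric(unlist(x[required])))
  if (length(values) != length(required) || anyNA(values) || any(values <= 0)) {
    return("All measurements must be single positive numbers")
  }
  NULL
}

#* Liveness check
#* @get /health
function() {
  list(status = "ok")
}

#* Validate one flower's measurements before prediction
#* @post /predict
function(res, sepal_length = NA, sepal_width = NA,
         petal_length = NA, petal_width = NA) {
  input <- list(
    sepal_length = sepal_length, sepal_width = sepal_width,
    petal_length = petal_length, petal_width = petal_width
  )
  problem <- validate_measurements(input)
  if (!is.null(problem)) {
    res$status <- 400  # report a client error instead of crashing
    return(list(error = problem))
  }
  # The real handler would apply the saved feature pipeline and model here.
  list(status = "valid", input = input)
}
```

Returning a 400 response with a message, rather than letting the handler throw, keeps malformed requests from surfacing as opaque server errors.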
Containerization
- Dockerfile created
- API can be run inside a Docker container
- Environment fully reproducible
How to Run the Project
1. Train the Model
From RStudio (project root):
```r
source("src/train.R")
```
This generates trained artifacts in the artifacts/ directory.
2. Run the API
```r
library(plumber)
pr <- plumb("api/plumber.R")
pr$run(port = 7860)
```
Open in browser:
http://localhost:7860/__docs__/
3. Run with Docker (Optional)
```bash
docker build -t iris-r-mlops .
docker run -p 7860:7860 iris-r-mlops
```