End-to-End MLOps Pipeline in R (UCI Iris Dataset)

Overview

This project implements an end-to-end MLOps pipeline in R using the UCI Iris Dataset. The pipeline covers data preprocessing, model training, evaluation, deployment, and validation using reproducible and industry-standard practices.

What Has Been Implemented

Dataset

UCI Iris Dataset
150 samples
4 numerical features
3 target classes

Feature Engineering

The following engineered features were created:
- petal_to_sepal_length
- petal_to_sepal_width
- sepal_ratio
- Original petal features

The same feature pipeline is used during training and inference.
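One way to keep training and inference aligned is to route both through a single function. The sketch below is illustrative only: `engineer_features` is a hypothetical name, and the ratio definitions (in particular `sepal_ratio` as sepal length divided by sepal width) are assumptions inferred from the feature names, not taken from the project code.

```r
# Hypothetical shared feature function; the ratio definitions are assumptions.
engineer_features <- function(df) {
  required <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  data.frame(
    petal_to_sepal_length = df$Petal.Length / df$Sepal.Length,
    petal_to_sepal_width  = df$Petal.Width / df$Sepal.Width,
    sepal_ratio           = df$Sepal.Length / df$Sepal.Width,
    Petal.Length          = df$Petal.Length,  # original petal features kept as-is
    Petal.Width           = df$Petal.Width
  )
}

# The same function serves the training set and a single inference row.
train_features <- engineer_features(iris)
new_features   <- engineer_features(iris[1, ])
```

Because both calls go through one function, the column names and their order cannot drift between training and serving.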

Model Training & Selection

Models trained:
- Logistic Regression
- Random Forest
- Decision Tree
- Naive Bayes

Models compared using recall
Best model selected automatically

Artifacts saved:
- model.rds
- scaler.rds
- feature_names.rds
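The selection step can be sketched in base R as follows. The README only says models are compared using recall, so the macro-averaging here is an assumption. The labels and predictions are toy values made up for illustration, and `macro_recall`, `model_a`, and `model_b` are hypothetical names.

```r
# Macro-averaged recall: the mean over classes of TP / (TP + FN).
# Macro-averaging is an assumption; the project may average differently.
macro_recall <- function(truth, predicted) {
  truth <- factor(truth)
  predicted <- factor(predicted, levels = levels(truth))
  cm <- table(truth = truth, predicted = predicted)
  # Divide correct predictions per class by the true count of that class.
  per_class <- diag(cm) / tabulate(truth, nbins = nlevels(truth))
  mean(per_class)
}

# Toy labels and predictions, for illustration only.
truth <- c("setosa", "setosa", "versicolor", "versicolor",
           "virginica", "virginica")
candidate_predictions <- list(
  model_a = c("setosa", "setosa", "versicolor", "virginica",
              "virginica", "virginica"),
  model_b = c("setosa", "setosa", "versicolor", "versicolor",
              "virginica", "virginica")
)

# Score every candidate and pick the one with the highest recall.
scores <- vapply(candidate_predictions, macro_recall, numeric(1), truth = truth)
best_name <- names(which.max(scores))
```

With real models, each entry of `candidate_predictions` would hold that model's predictions on a held-out set.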

Reproducibility

Fixed random seed
Same input → same output
Deterministic pipeline
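A minimal sketch of how a fixed seed makes a random step repeatable. The seed value `42`, the 80/20 split, and the `split_indices` helper are all assumptions for illustration; the project's actual values are not stated here.

```r
# Hypothetical helper: draw training-row indices with a fixed seed.
split_indices <- function(n, train_fraction = 0.8, seed = 42) {
  set.seed(seed)  # reset the RNG so every call draws the same sample
  sample(n, size = floor(train_fraction * n))
}

first_run  <- split_indices(nrow(iris))
second_run <- split_indices(nrow(iris))

# Both runs select exactly the same rows.
same_split <- identical(first_run, second_run)
```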

Deployment

Model deployed as a Plumber REST API

Endpoints:
- /health
- /predict
- /predict-csv

Swagger UI available for testing
Input validation and safe error handling implemented
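The endpoints and validation could look roughly like the sketch below, written in the style of a Plumber file. The request field names, the validation rules, the response shapes, and the `validate_input` helper are assumptions, not the project's actual `api/plumber.R`. It also assumes Plumber 1.0 or later, where the parsed request body is available as `req$body`.

```r
# Assumed request fields: the four raw iris measurements.
required_fields <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Returns NULL for valid input, otherwise a message describing the problem.
validate_input <- function(input) {
  missing_fields <- setdiff(required_fields, names(input))
  if (length(missing_fields) > 0) {
    return(paste("Missing fields:", paste(missing_fields, collapse = ", ")))
  }
  values <- suppressWarnings(as.numeric(unlist(input[required_fields])))
  if (anyNA(values) || any(values <= 0)) {
    return("All measurements must be positive numbers")
  }
  NULL
}

#* Liveness check
#* @get /health
function() {
  list(status = "ok")
}

#* Validate one flower's measurements before prediction
#* @post /predict
function(req, res) {
  problem <- validate_input(req$body)
  if (!is.null(problem)) {
    res$status <- 400  # client error, with a safe message and no stack trace
    return(list(error = problem))
  }
  # A real handler would apply the saved feature pipeline and model here.
  list(status = "valid input")
}
```

Keeping validation in a plain function makes it testable without starting the API.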

Containerization

Dockerfile created
API can be run inside a Docker container
Environment fully reproducible

How to Run the Project

1. Train the Model

From RStudio (project root):

```r
source("src/train.R")
```

This generates trained artifacts in the artifacts/ directory.
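To illustrate why the scaler and feature names are saved alongside the model, here is a small round trip using `saveRDS` and `readRDS`. The scaler layout (a list of centering and scaling vectors) is an assumption, and the sketch writes to a temporary directory rather than the project's `artifacts/` folder.

```r
# Assumed scaler format: the center and scale vectors produced by scale().
features <- iris[, c("Petal.Length", "Petal.Width")]
scaled <- scale(features)
scaler <- list(
  center = attr(scaled, "scaled:center"),
  scale  = attr(scaled, "scaled:scale")
)

# Save to a temporary directory so the sketch does not touch real artifacts.
artifact_dir <- file.path(tempdir(), "artifacts")
dir.create(artifact_dir, showWarnings = FALSE, recursive = TRUE)
saveRDS(scaler, file.path(artifact_dir, "scaler.rds"))
saveRDS(colnames(features), file.path(artifact_dir, "feature_names.rds"))

# At inference time, reload both and apply the training-time scaling.
scaler_loaded <- readRDS(file.path(artifact_dir, "scaler.rds"))
feature_names <- readRDS(file.path(artifact_dir, "feature_names.rds"))
new_row <- iris[1, feature_names]
new_scaled <- scale(new_row,
                    center = scaler_loaded$center,
                    scale  = scaler_loaded$scale)
```

Reusing the stored parameters means a new observation is scaled exactly as the training data was, instead of being rescaled from its own statistics.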

2. Run the API

```r
library(plumber)
pr <- plumb("api/plumber.R")
pr$run(port = 7860)
```

Open in browser:

http://localhost:7860/__docs__/

3. Run with Docker (Optional)

```bash
docker build -t iris-r-mlops .
docker run -p 7860:7860 iris-r-mlops
```

Shruti Thakkar
