Overview

This project implements an end-to-end MLOps pipeline in R using the UCI Iris Dataset. The pipeline covers data preprocessing, model training, evaluation, deployment, and validation using reproducible and industry-standard practices.


What Has Been Implemented

Dataset

  • UCI Iris Dataset
  • 150 samples
  • 4 numerical features
  • 3 target classes

Feature Engineering

The feature set combines three engineered ratios with the original petal features:

  • petal_to_sepal_length
  • petal_to_sepal_width
  • sepal_ratio
  • Original petal features (petal length and petal width)

The same feature pipeline is used during training and inference.
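
As a rough illustration of sharing one feature function between training and inference, the sketch below builds the listed columns from the built-in `iris` data frame. The function name `add_features` and the exact ratio definitions (petal over sepal, sepal length over sepal width) are assumptions made for this example; the project's own definitions may differ.

```r
# Hypothetical feature pipeline. The ratio definitions are assumptions
# inferred from the feature names, not taken from the project source.
add_features <- function(df) {
  required <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  data.frame(
    petal_to_sepal_length = df$Petal.Length / df$Sepal.Length,
    petal_to_sepal_width  = df$Petal.Width / df$Sepal.Width,
    sepal_ratio           = df$Sepal.Length / df$Sepal.Width,
    Petal.Length          = df$Petal.Length,  # original petal features kept as-is
    Petal.Width           = df$Petal.Width
  )
}

# Call the same function on training data and on incoming requests.
features <- add_features(iris)
head(features, 3)
```

Routing both training and prediction through a single function like this is one way to keep the two paths from drifting apart.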


Model Training & Selection

  • Models trained:
    • Logistic Regression
    • Random Forest
    • Decision Tree
    • Naive Bayes
  • Models compared using Recall
  • Best model selected automatically
  • Artifacts saved:
    • model.rds
    • scaler.rds
    • feature_names.rds
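
The comparison step can be sketched in base R as follows. This is a minimal example that assumes recall is macro-averaged across the three classes, which the list above does not specify; `macro_recall`, `select_best`, and the toy predictions are hypothetical names and data used only for illustration.

```r
# Macro-averaged recall: for each class, TP / (TP + FN), then the mean
# over classes. Assumes every class appears at least once in `actual`.
macro_recall <- function(actual, predicted) {
  actual <- factor(actual)
  predicted <- factor(predicted, levels = levels(actual))
  cm <- table(actual, predicted)  # rows = actual, columns = predicted
  mean(diag(cm) / rowSums(cm))
}

# Score each candidate's predictions and return the name of the best one.
# which.max() keeps the first candidate when scores tie.
select_best <- function(actual, predictions) {
  scores <- vapply(predictions, function(p) macro_recall(actual, p), numeric(1))
  list(best = names(scores)[which.max(scores)], scores = scores)
}

# Toy data standing in for held-out labels and two models' predictions.
actual <- factor(c("a", "a", "b", "b", "c", "c"))
predictions <- list(
  model_x = c("a", "a", "b", "c", "c", "c"),
  model_y = c("a", "b", "b", "c", "c", "a")
)
result <- select_best(actual, predictions)
result$best
```

In a real run, the predictions would come from each trained model on a held-out split, and the winning model object could then be written with `saveRDS()`.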

Reproducibility

  • Fixed random seed
  • Same input → same output
  • Deterministic pipeline
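
A small sketch of what a fixed seed buys: the same train/test split on every run. The seed value `42`, the 80/20 split, and the helper name `split_indices` are illustrative assumptions, not values taken from the project.

```r
# Return training row indices. Re-seeding inside the function makes the
# result depend only on the arguments, not on earlier random draws.
split_indices <- function(n, train_fraction = 0.8, seed = 42) {
  set.seed(seed)
  sample(n, size = floor(train_fraction * n))
}

first_run  <- split_indices(nrow(iris))
second_run <- split_indices(nrow(iris))

# Both runs select exactly the same rows.
identical(first_run, second_run)
```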

Deployment

  • Model deployed using Plumber REST API
  • Endpoints:
    • /health
    • /predict
    • /predict-csv
  • Swagger UI available for testing
  • Input validation and safe error handling implemented
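
The following is a hedged sketch of how a Plumber file might combine a health check with input validation. The parameter names, the helper `validate_measurements`, the use of POST for `/predict`, and the response shapes are all assumptions made for this example; the project's `api/plumber.R` may be organized differently.

```r
# Hypothetical helper: convert raw request values to numerics and reject
# input that is missing, non-numeric, or non-positive.
validate_measurements <- function(values) {
  parsed <- suppressWarnings(as.numeric(unlist(values)))
  if (length(parsed) != 4 || anyNA(parsed)) {
    return(list(ok = FALSE, error = "Expected four numeric measurements."))
  }
  if (any(parsed <= 0)) {
    return(list(ok = FALSE, error = "Measurements must be positive."))
  }
  list(ok = TRUE, values = parsed)
}

#* Liveness check
#* @get /health
function() {
  list(status = "ok")
}

#* Validate the input and return it (model call omitted in this sketch)
#* @param sepal_length Sepal length in cm
#* @param sepal_width Sepal width in cm
#* @param petal_length Petal length in cm
#* @param petal_width Petal width in cm
#* @post /predict
function(res, sepal_length = NA, sepal_width = NA,
         petal_length = NA, petal_width = NA) {
  checked <- validate_measurements(
    list(sepal_length, sepal_width, petal_length, petal_width)
  )
  if (!checked$ok) {
    res$status <- 400  # client error, reported without exposing internals
    return(list(error = checked$error))
  }
  # A real handler would apply the feature pipeline and the saved model here.
  list(received = checked$values)
}
```

Returning a 400 status with a short message keeps malformed requests from reaching the model and avoids leaking stack traces to the client.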

Containerization

  • Dockerfile created
  • API can be run inside a Docker container
  • Environment fully reproducible

How to Run the Project

1. Train the Model

From RStudio (project root):

```r
source("src/train.R")
```

This writes the trained artifacts (model.rds, scaler.rds, feature_names.rds) to the artifacts/ directory.

2. Run the API

```r
library(plumber)
pr <- plumb("api/plumber.R")
pr$run(port = 7860)
```

Open in browser:

http://localhost:7860/__docs__/

3. Run with Docker (Optional)

```sh
docker build -t iris-r-mlops .
docker run -p 7860:7860 iris-r-mlops
```