How does the train/test ratio influence accuracy?

Course project

Anton Votinov
Coursera student

Summary of the App

You can play with the App here.

  • The aim of the App is to show the relationship between the train/test split and the accuracy of predictions.
  • The App lets you choose the proportion of data used for training and the randomization seed.
  • The output lets you examine the residuals of predictions made on the test set by a model fitted to the training set.
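The pipeline the App describes (split the data, fit on the training part, measure errors on the rest) can be sketched in plain R. This is a minimal sketch under stated assumptions, not the App's actual code: the helper name `evaluateSplit` is hypothetical, and the model `mpg ~ disp` on `mtcars` is taken from the Implementation slide.

```r
# Hypothetical helper (not the App's actual code): split mtcars by a training
# proportion, fit mpg ~ disp on the training rows, return test-set residuals.
evaluateSplit <- function(proportion, seed) {
  set.seed(seed)                                   # the App's randomization seed
  n <- nrow(mtcars)                                # 32 cars
  trainIndex <- sample(seq_len(n), round(proportion * n))
  trainData <- mtcars[trainIndex, ]
  testData <- mtcars[-trainIndex, ]                # every car not sampled for training
  fit <- lm(mpg ~ disp, data = trainData)
  # residual = observed mpg minus predicted mpg, on held-out cars only
  testData$mpg - predict(fit, newdata = testData)
}

res <- evaluateSplit(proportion = 0.75, seed = 10)
length(res)  # 8 test cars, since round(0.75 * 32) = 24 go to training
```

Returning the residuals, rather than a single score, leaves the choice of accuracy summary to the caller.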

Output example 1


Output example 2

As shown on the previous slide, the inputs are:

  1. the randomization seed is set to 10;
  2. the training set is 0.75 of the total data.

The outputs are:

  1. the actual data points and the fitted regression line;
  2. the residuals of the predictions;
  3. the sigma_sq of the residuals (15.49).

If you increase the training proportion to 0.875, sigma_sq also increases (to 22.92).
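To explore the relationship the App is built to show, one can sweep several training proportions with a fixed seed. This sketch assumes sigma_sq is the mean squared test residual (the App's exact formula is not given), and the helper `sigmaSqForProportion` is hypothetical. Its numbers need not match 15.49 and 22.92: besides the assumed formula, R 3.6.0 changed the default algorithm behind `sample()`, so the same seed can produce a different split on different R versions.

```r
# Hypothetical sweep: sigma_sq of test residuals for several training proportions.
# ASSUMPTION: sigma_sq = mean squared residual on the test set.
sigmaSqForProportion <- function(proportion, seed = 10) {
  set.seed(seed)
  n <- nrow(mtcars)
  trainIndex <- sample(seq_len(n), round(proportion * n))
  fit <- lm(mpg ~ disp, data = mtcars[trainIndex, ])
  testData <- mtcars[-trainIndex, ]
  res <- testData$mpg - predict(fit, newdata = testData)
  mean(res^2)
}

proportions <- c(0.5, 0.625, 0.75, 0.875)
sigmaSq <- sapply(proportions, sigmaSqForProportion)
data.frame(
  proportion = proportions,
  testSize = nrow(mtcars) - round(proportions * nrow(mtcars)),  # 16, 12, 8, 4
  sigmaSq = sigmaSq
)
```

Note that at 0.875 only 32 - 28 = 4 cars are left for testing, so each sigma_sq value there rests on four residuals.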

Implementation

Here the proportion is set to 0.5. The code in server.R differs slightly: it uses input$proportion in place of proportion.

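set.seed(10)                                    # randomization seed (10 in the example above)
proportion <- 1/2
nTrain <- round(proportion * nrow(mtcars))      # number of training rows (16 of 32 here)
trainIndex <- sample(seq_len(nrow(mtcars)), nTrain)
trainData <- mtcars[trainIndex, ]
testData <- mtcars[-trainIndex, ]               # all rows not sampled for training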

Creating train and test data.
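The split arithmetic can be checked directly: with 32 cars and a proportion of 1/2, `round(0.5 * 32)` gives 16 training rows, and the negative index leaves the other 16 for testing, with no car in both sets. A minimal sketch, assuming the built-in `mtcars` data; for proportions that do not divide 32 evenly, `round()` picks the nearest whole number (e.g. `round(0.6 * 32)` is 19).

```r
# Worked check of the split, assuming proportion = 1/2 and the 32-row mtcars data.
set.seed(10)
proportion <- 1/2
nTrain <- round(proportion * nrow(mtcars))   # round(0.5 * 32) = 16
trainIndex <- sample(seq_len(nrow(mtcars)), nTrain)
trainData <- mtcars[trainIndex, ]
testData <- mtcars[-trainIndex, ]            # negative indices drop the training rows

c(train = nrow(trainData), test = nrow(testData))            # 16 and 16
length(intersect(rownames(trainData), rownames(testData)))   # 0: no car is in both sets
```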

fit <- lm(mpg ~ disp, data = trainData)                # linear regression of mpg on disp
predictedValues <- predict(fit, newdata = testData)    # predictions for the held-out cars

Fitting the model and predicting on the test data, to be plotted later.
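The slides stop at the predictions; a plausible next step is to compute the test residuals and sigma_sq and draw plots like the App's outputs. This is a sketch, not the App's server.R: the names `testResiduals` and `sigmaSq` are hypothetical, and sigma_sq is assumed to be the mean squared residual, since the slides do not give its formula. The inputs follow Output example 2 (seed 10, proportion 0.75).

```r
# Reproduce the split and fit with the inputs of Output example 2.
set.seed(10)
proportion <- 0.75
trainIndex <- sample(seq_len(nrow(mtcars)), round(proportion * nrow(mtcars)))
trainData <- mtcars[trainIndex, ]
testData <- mtcars[-trainIndex, ]
fit <- lm(mpg ~ disp, data = trainData)
predictedValues <- predict(fit, newdata = testData)

# Residuals on the test set: observed minus predicted mpg.
testResiduals <- testData$mpg - predictedValues
# ASSUMPTION: sigma_sq = mean squared residual; the App may use var() instead.
sigmaSq <- mean(testResiduals^2)

# Test points with the fitted line, then residuals around zero.
plot(testData$disp, testData$mpg, xlab = "disp", ylab = "mpg")
abline(fit)
plot(testData$disp, testResiduals, xlab = "disp", ylab = "residual")
abline(h = 0, lty = 2)
```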