Developing Data Products Course Project

Raffaele Martino
11/21/2020

Introduction to the Application

The analysis aims to determine whether ozone levels are better predicted from air temperature alone or from air temperature together with wind speed

  • Preliminary exploratory analysis suggests a significant correlation between ozone and each of these two variables
  • Two models have been compared:
    • A simple linear regression model of ozone against temperature
    • A multiple linear regression model of ozone against temperature and wind speed
  • The customer requested a web application that makes it easy to obtain predictions of ozone levels
  • However, the customer expressed a preference for temperature-only predictions, since wind speed measurements are sometimes difficult for them to collect
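
The correlations mentioned above can be checked with a short sketch like the following. It assumes the built-in `airquality` dataset used on the later slides; the object names `corOzone`, `testTemp` and `testWind` are introduced here for illustration only and are not part of the original analysis.

```r
# Pairwise correlations between ozone, temperature and wind speed.
# use = "complete.obs" drops rows where any of the three columns is missing.
corOzone <- cor(airquality[, c("Ozone", "Temp", "Wind")], use = "complete.obs")
corOzone["Ozone", c("Temp", "Wind")]

# Significance test for each correlation with ozone (illustrative object names)
testTemp <- cor.test(airquality$Ozone, airquality$Temp)
testWind <- cor.test(airquality$Ozone, airquality$Wind)
c(Temp = testTemp$p.value, Wind = testWind$p.value)
```

A positive coefficient for temperature and a negative one for wind speed would be in line with the two candidate models compared on the next slides.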

Linear Regression against Temperature

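# Fit ozone against temperature only
modelTemp <- lm(Ozone ~ Temp, data = airquality)
trainPredictionTemp <- predict(modelTemp, newdata = airquality)
# Square root of the summed squared errors on the training data (rows with missing Ozone are skipped)
sqrt(sum((trainPredictionTemp - airquality$Ozone)^2, na.rm = TRUE))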
[1] 253.1993

Multiple Linear Regression against Temperature and Wind

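# Fit ozone against temperature and wind speed
modelMLR <- lm(Ozone ~ Temp + Wind, data = airquality)
trainPredictionMLR <- predict(modelMLR, newdata = airquality)
# Same error measure as for the temperature-only model
sqrt(sum((trainPredictionMLR - airquality$Ozone)^2, na.rm = TRUE))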
[1] 232.3209
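
The two figures above are square roots of the total squared error, so their size depends on the number of observations. As a possible complement, the sketch below computes a per-observation root mean squared error and runs an F-test between the two nested models. The helper `rmse` and the objects `aq`, `fitTemp` and `fitMLR` are illustrative assumptions, not part of the original analysis; both models are refitted on the same complete rows so that the comparison is like for like.

```r
# Keep only rows where the response and both predictors are observed
aq <- airquality[complete.cases(airquality[, c("Ozone", "Temp", "Wind")]), ]

fitTemp <- lm(Ozone ~ Temp, data = aq)
fitMLR  <- lm(Ozone ~ Temp + Wind, data = aq)

# Hypothetical helper: root mean squared error of a model on its training data
rmse <- function(model) sqrt(mean(residuals(model)^2))

c(TempOnly = rmse(fitTemp), TempWind = rmse(fitMLR))

# F-test for nested models: does adding Wind reduce the residual sum of squares
# by more than chance alone would explain?
modelComparison <- anova(fitTemp, fitMLR)
modelComparison
```

Because these are training errors, the larger model can never score worse on them; the F-test gives a rough indication of whether the improvement is more than noise.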

Conclusions and Final Remarks

Including wind speed improves the accuracy of the prediction: the training error drops from 253.2 to 232.3

  • Nevertheless, due to the customer's needs, both models have been included in the web application
    • The user can simply set values of temperature and wind speed to instantly get a prediction of the ozone level
      • If the user is not interested in the MLR model, the wind setting can simply be ignored
    • A graphical visualization of the model is also provided
  • The customer should consider an effort to collect more wind speed measurements in order to obtain more accurate predictions
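
The prediction step behind such an application might look like the following sketch. The function name `predictOzone` and the example inputs are hypothetical, and this is not the application's actual server code; the two models from the earlier slides are refitted here so that the snippet is self-contained.

```r
modelTemp <- lm(Ozone ~ Temp, data = airquality)
modelMLR  <- lm(Ozone ~ Temp + Wind, data = airquality)

# Hypothetical helper: returns both predictions for one pair of inputs.
# temp is in degrees Fahrenheit and wind in miles per hour, as in airquality.
predictOzone <- function(temp, wind) {
  newData <- data.frame(Temp = temp, Wind = wind)
  c(TempOnly = unname(predict(modelTemp, newdata = newData)),
    TempWind = unname(predict(modelMLR, newdata = newData)))
}

# Example call with made-up inputs
predictOzone(temp = 80, wind = 10)
```

The `TempOnly` value does not depend on the wind argument, which matches the remark that a user who is only interested in the temperature model can leave the wind setting untouched.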