Developing Data Products Course Project

Raffaele Martino

Introduction to the Application

The analysis aims to determine whether ozone levels are better predicted from air temperature alone or from air temperature together with wind speed

  • Preliminary exploratory analysis suggests a significant correlation between ozone and these two variables
  • Two models have been compared:
    • A simple linear regression model of ozone against temperature
    • A multiple linear regression model of ozone against temperature and wind speed
  • A web application has been requested to provide predictions of ozone levels easily
  • However, the customer expressed a preference for temperature-only predictions since it is sometimes difficult for them to collect wind speed measurements
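
The exploratory step mentioned in the first bullet can be sketched as follows. This is a minimal sketch, assuming the analysis uses R's built-in airquality dataset (the one used by the model code in this deck); the names corTemp, corWind, testTemp and testWind are illustrative and do not come from the original analysis.

```r
# Illustrative correlation check (variable names are hypothetical)
data(airquality)

# Ozone contains missing values, so use only complete observations
corTemp <- cor(airquality$Ozone, airquality$Temp, use = "complete.obs")
corWind <- cor(airquality$Ozone, airquality$Wind, use = "complete.obs")

# cor.test() tests whether each correlation differs from zero
testTemp <- cor.test(airquality$Ozone, airquality$Temp)
testWind <- cor.test(airquality$Ozone, airquality$Wind)

print(c(Temp = corTemp, Wind = corWind))
print(c(Temp = testTemp$p.value, Wind = testWind$p.value))
```

A positive coefficient for temperature and a negative one for wind speed would be consistent with using both variables as predictors.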

Linear Regression against Temperature

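# Fit ozone against temperature only
modelTemp <- lm(Ozone ~ Temp, data = airquality)
trainPredictionTemp <- predict(modelTemp, newdata = airquality)
# Training error: square root of the sum of squared errors (rows with missing Ozone are dropped)
sqrt(sum((trainPredictionTemp - airquality$Ozone)^2, na.rm = TRUE))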
[1] 253.1993

Multiple Linear Regression against Temperature and Wind

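# Fit ozone against temperature and wind speed
modelMLR <- lm(Ozone ~ Temp + Wind, data = airquality)
trainPredictionMLR <- predict(modelMLR, newdata = airquality)
# Training error: square root of the sum of squared errors (rows with missing Ozone are dropped)
sqrt(sum((trainPredictionMLR - airquality$Ozone)^2, na.rm = TRUE))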
[1] 232.3209
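
The two error figures above are roots of summed squared errors on the training data, so they grow with the number of observations. One possible way to put the comparison on a per-observation scale, and to check whether the improvement is more than noise, is sketched below. The helper rmse and the variables rmseTemp, rmseMLR and comparison are hypothetical names introduced for illustration; the F-test for nested models via anova() is a suggested addition, not part of the original analysis.

```r
# Refit both models so the sketch is self-contained
data(airquality)
modelTemp <- lm(Ozone ~ Temp, data = airquality)
modelMLR  <- lm(Ozone ~ Temp + Wind, data = airquality)

# Hypothetical helper: root mean squared error on the rows used for fitting
rmse <- function(model) sqrt(mean(residuals(model)^2))
rmseTemp <- rmse(modelTemp)
rmseMLR  <- rmse(modelMLR)

# F-test for nested models: does adding Wind significantly reduce the residual sum of squares?
comparison <- anova(modelTemp, modelMLR)

print(c(Temp = rmseTemp, TempWind = rmseMLR))
print(comparison)
```

Both figures remain training errors; assessing predictive accuracy on unseen data would require a held-out set or cross-validation.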

Conclusions and Final Remarks

Adding wind speed improves the fit: the training error drops from 253.2 to 232.3

  • Nevertheless, due to the customer's needs, both models have been included in the web application
    • The user can simply set the temperature and wind speed values to instantly get the predicted ozone level
      • If the user is not interested in the MLR model, the wind setting can simply be ignored
    • A graphical visualization of the model is also provided
  • An effort to collect more wind speed measurements, in order to get more accurate predictions, should be considered
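
The prediction behaviour described above could be sketched as a single function that falls back to the temperature-only model when no wind speed is supplied. This is a minimal sketch under stated assumptions: predictOzone is a hypothetical helper, not the application's actual code, and the web framework that would call it is left out.

```r
# Refit both models so the sketch is self-contained
data(airquality)
modelTemp <- lm(Ozone ~ Temp, data = airquality)
modelMLR  <- lm(Ozone ~ Temp + Wind, data = airquality)

# Hypothetical helper: predicted ozone (ppb) from temperature (degrees F)
# and, optionally, wind speed (mph), the units used by airquality
predictOzone <- function(temp, wind = NULL) {
  if (is.null(wind)) {
    # No wind measurement available: use the temperature-only model
    unname(predict(modelTemp, newdata = data.frame(Temp = temp)))
  } else {
    # Wind measurement available: use the multiple regression model
    unname(predict(modelMLR, newdata = data.frame(Temp = temp, Wind = wind)))
  }
}

print(predictOzone(80))             # temperature-only prediction
print(predictOzone(80, wind = 10))  # prediction using both variables
```

Making wind an optional argument mirrors the customer's situation: a prediction is always available from temperature, and a wind speed measurement refines it when one exists.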