# Developing Data Products Course Project

Raffaele Martino
11/21/2020

### Introduction to the Application

The analysis aims to determine whether ozone levels are better predicted from air temperature alone or from air temperature together with wind speed.

• Preliminary exploratory analysis suggests a significant correlation between ozone and these two variables
• Two models have been compared:
  • A simple linear regression model of ozone against temperature
  • A multiple linear regression model of ozone against temperature and wind speed
• A web application has been requested to provide ozone level predictions easily
• However, the customer expressed a preference for temperature-only predictions, since it is sometimes difficult for them to collect wind speed measurements
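The exploratory step mentioned above is not shown in the slides. A minimal sketch of how such a check could look, assuming plain Pearson correlations on the built-in `airquality` data (the names `vars` and `corMatrix` are illustrative, not taken from the original analysis):

```r
# Columns of interest from the built-in airquality data set
vars <- airquality[, c("Ozone", "Temp", "Wind")]

# Pairwise Pearson correlations, skipping rows where a value is missing
corMatrix <- cor(vars, use = "pairwise.complete.obs")

# Correlation of ozone with each candidate predictor
print(round(corMatrix["Ozone", c("Temp", "Wind")], 2))

# Significance tests for the two correlations (incomplete pairs are dropped)
pTemp <- cor.test(airquality$Ozone, airquality$Temp)$p.value
pWind <- cor.test(airquality$Ozone, airquality$Wind)$p.value
```

A positive coefficient for `Temp` and a negative one for `Wind` would be consistent with including both as predictors.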

### Linear Regression against Temperature

modelTemp <- lm(Ozone ~ Temp, data = airquality)
trainPredictionTemp <- predict(modelTemp, newdata = airquality)
sqrt(sum((trainPredictionTemp - airquality$Ozone)^2, na.rm = TRUE))

[1] 253.1993

### Multiple Linear Regression against Temperature and Wind

modelMLR <- lm(Ozone ~ Temp + Wind, data = airquality)
trainPredictionMLR <- predict(modelMLR, newdata = airquality)
sqrt(sum((trainPredictionMLR - airquality$Ozone)^2, na.rm = TRUE))

[1] 232.3209
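The two figures above are the square root of the summed squared training errors, so they grow with the number of observations. As a hedged sketch, the same comparison can be expressed as a per-observation root-mean-square error, and the nested models can be compared with an F-test. The helper `rmse` and the variable names are assumptions introduced here for illustration:

```r
# Refit both models so this block is self-contained
modelTemp <- lm(Ozone ~ Temp, data = airquality)
modelMLR  <- lm(Ozone ~ Temp + Wind, data = airquality)

# Root-mean-square error over the rows where Ozone was actually observed
rmse <- function(model, data) {
  errors <- predict(model, newdata = data) - data$Ozone
  sqrt(mean(errors^2, na.rm = TRUE))
}

rmseTemp <- rmse(modelTemp, airquality)
rmseMLR  <- rmse(modelMLR, airquality)

# F-test for nested models: does adding Wind significantly reduce
# the residual sum of squares?
comparison <- anova(modelTemp, modelMLR)
print(comparison)
```

Both errors are measured on the training data, so they describe goodness of fit rather than out-of-sample accuracy.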


### Conclusions and Final Remarks

Including wind speed improves the fit: the training error (square root of the sum of squared residuals) drops from 253.2 to 232.3.

• Nevertheless, due to the customer's needs, both models have been included in the web application
• The user can simply set values of temperature and wind speed to instantly get an ozone level prediction
• If the user is not interested in the MLR model, the wind setting can simply be ignored
• A graphical visualization of the model is also provided
• An effort to collect more wind speed measurements, in order to get more accurate predictions, should be considered
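The application's own code is not part of these slides. The following is only a sketch of the prediction step such an app might run behind its inputs; `predictOzone` is a hypothetical helper, and the input values are arbitrary examples in the units of `airquality` (degrees Fahrenheit and miles per hour):

```r
# Fit both models once, as in the previous sections
modelTemp <- lm(Ozone ~ Temp, data = airquality)
modelMLR  <- lm(Ozone ~ Temp + Wind, data = airquality)

# Hypothetical helper: returns the prediction of each model for one
# temperature / wind speed setting
predictOzone <- function(temp, wind) {
  newObs <- data.frame(Temp = temp, Wind = wind)
  c(tempOnly    = unname(predict(modelTemp, newdata = newObs)),
    tempAndWind = unname(predict(modelMLR, newdata = newObs)))
}

# The temperature-only value does not depend on the wind setting,
# so a user without wind measurements can ignore that input
prediction <- predictOzone(temp = 80, wind = 10)
print(prediction)
```

Returning both values from one call mirrors the choice to ship both models in the same application.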