The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was located in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of responses from field-deployed air quality chemical sensor devices. Ground-truth hourly averaged concentrations for CO, Non-Methanic Hydrocarbons, Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located, certified reference analyzer. Evidence of cross-sensitivities, as well as of both concept and sensor drift, is present, as described in De Vito et al., Sens. and Act. B, Vol. 129, No. 2, 2008 (citation required), possibly affecting the sensors' concentration estimation capabilities. Missing values are tagged with the value -200. This dataset can be used exclusively for research purposes; commercial purposes are fully excluded.
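Because -200 is a placeholder rather than a measurement, it is safer to convert it to NA before computing summaries or fitting models; otherwise it is averaged in like a real reading. A minimal sketch, assuming the data are already in a data frame; the helper name recode_sentinel is hypothetical and the toy data are made up:

```r
# Hypothetical helper: replace the -200 missing-value code with NA in every
# numeric column, leaving non-numeric columns (e.g. Date, Time) untouched.
recode_sentinel <- function(df, code = -200) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) {
    x[which(x == code)] <- NA  # which() skips entries that are already NA
    x
  })
  df
}

# Usage on a small made-up data frame (not the real dataset):
toy <- data.frame(Time = c("18:00:00", "19:00:00", "20:00:00"),
                  CO.GT. = c(2.6, -200, 2.2),
                  T = c(13.6, 13.3, -200))
clean <- recode_sentinel(toy)
mean(clean$CO.GT., na.rm = TRUE)  # 2.4, no longer pulled toward -200
```

When reading the file, read.csv(..., na.strings = "-200") may achieve the same thing, provided the code appears in the file exactly as the string -200.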

Attribute Information:

0 Date (DD/MM/YYYY)
1 Time (HH.MM.SS)
2 True hourly averaged CO concentration in mg/m^3 (reference analyzer)
3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
4 True hourly averaged overall Non-Methanic Hydrocarbons concentration in microg/m^3 (reference analyzer)
5 True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
7 True hourly averaged NOx concentration in ppb (reference analyzer)
8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
9 True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
12 Temperature in °C
13 Relative Humidity (%)
14 AH Absolute Humidity

Let's take a look at the data. Our main goal is to examine the effect of the attributes listed above on temperature (T).

dataS <- read.csv(file= "/Users/GD/Desktop/AirQualityUCI.csv", header=TRUE, sep=",")
head(dataS)
##      Date     Time CO.GT. PT08.S1.CO. NMHC.GT. C6H6.GT. PT08.S2.NMHC.
## 1 3/10/04 18:00:00    2.6        1360      150     11.9          1046
## 2 3/10/04 19:00:00    2.0        1292      112      9.4           955
## 3 3/10/04 20:00:00    2.2        1402       88      9.0           939
## 4 3/10/04 21:00:00    2.2        1376       80      9.2           948
## 5 3/10/04 22:00:00    1.6        1272       51      6.5           836
## 6 3/10/04 23:00:00    1.2        1197       38      4.7           750
##   NOx.GT. PT08.S3.NOx. NO2.GT. PT08.S4.NO2. PT08.S5.O3.    T   RH     AH
## 1     166         1056     113         1692        1268 13.6 48.9 0.7578
## 2     103         1174      92         1559         972 13.3 47.7 0.7255
## 3     131         1140     114         1555        1074 11.9 54.0 0.7502
## 4     172         1092     122         1584        1203 11.0 60.0 0.7867
## 5     131         1205     116         1490        1110 11.2 59.6 0.7888
## 6      89         1337      96         1393         949 11.2 59.2 0.7848
##    X X.1
## 1 NA  NA
## 2 NA  NA
## 3 NA  NA
## 4 NA  NA
## 5 NA  NA
## 6 NA  NA
summary(dataS)
##       Date            Time          CO.GT.         PT08.S1.CO.  
##         : 114   0:00:00 : 390   Min.   :-200.00   Min.   :-200  
##  1/1/05 :  24   1:00:00 : 390   1st Qu.:   0.60   1st Qu.: 921  
##  1/10/05:  24   10:00:00: 390   Median :   1.50   Median :1053  
##  1/11/05:  24   11:00:00: 390   Mean   : -34.21   Mean   :1049  
##  1/12/05:  24   12:00:00: 390   3rd Qu.:   2.60   3rd Qu.:1221  
##  1/13/05:  24   13:00:00: 390   Max.   :  11.90   Max.   :2040  
##  (Other):9237   (Other) :7131   NA's   :114       NA's   :114   
##     NMHC.GT.         C6H6.GT.        PT08.S2.NMHC.       NOx.GT.      
##  Min.   :-200.0   Min.   :-200.000   Min.   :-200.0   Min.   :-200.0  
##  1st Qu.:-200.0   1st Qu.:   4.000   1st Qu.: 711.0   1st Qu.:  50.0  
##  Median :-200.0   Median :   7.900   Median : 895.0   Median : 141.0  
##  Mean   :-159.1   Mean   :   1.866   Mean   : 894.6   Mean   : 168.6  
##  3rd Qu.:-200.0   3rd Qu.:  13.600   3rd Qu.:1105.0   3rd Qu.: 284.0  
##  Max.   :1189.0   Max.   :  63.700   Max.   :2214.0   Max.   :1479.0  
##  NA's   :114      NA's   :114        NA's   :114      NA's   :114     
##   PT08.S3.NOx.     NO2.GT.         PT08.S4.NO2.   PT08.S5.O3.    
##  Min.   :-200   Min.   :-200.00   Min.   :-200   Min.   :-200.0  
##  1st Qu.: 637   1st Qu.:  53.00   1st Qu.:1185   1st Qu.: 700.0  
##  Median : 794   Median :  96.00   Median :1446   Median : 942.0  
##  Mean   : 795   Mean   :  58.15   Mean   :1391   Mean   : 975.1  
##  3rd Qu.: 960   3rd Qu.: 133.00   3rd Qu.:1662   3rd Qu.:1255.0  
##  Max.   :2683   Max.   : 340.00   Max.   :2775   Max.   :2523.0  
##  NA's   :114    NA's   :114       NA's   :114    NA's   :114     
##        T                  RH                AH               X          
##  Min.   :-200.000   Min.   :-200.00   Min.   :-200.0000   Mode:logical  
##  1st Qu.:  10.900   1st Qu.:  34.10   1st Qu.:   0.6923   NA's:9471     
##  Median :  17.200   Median :  48.60   Median :   0.9768                 
##  Mean   :   9.778   Mean   :  39.49   Mean   :  -6.8376                 
##  3rd Qu.:  24.100   3rd Qu.:  61.90   3rd Qu.:   1.2962                 
##  Max.   :  44.600   Max.   :  88.70   Max.   :   2.2310                 
##  NA's   :114        NA's   :114       NA's   :114                       
##    X.1         
##  Mode:logical  
##  NA's:9471     
##                
##                
##                
##                
## 
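The summary already shows the effect of the -200 code: every numeric column has a minimum of -200, the means of CO.GT., T and AH are negative, and the median of NMHC.GT. is -200, so more than half of that column is missing. The trailing X and X.1 columns are entirely NA, presumably created by trailing commas in the CSV. A sketch that drops all-NA columns and counts the code per column; both helper names are hypothetical and the toy data are made up:

```r
# Hypothetical helper: drop columns that contain only NA (like X and X.1).
drop_empty_columns <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

# Hypothetical helper: count entries equal to the missing-value code in each
# numeric column; entries that are already NA are ignored.
count_sentinel <- function(df, code = -200) {
  num <- vapply(df, is.numeric, logical(1))
  vapply(df[num], function(x) sum(x == code, na.rm = TRUE), integer(1))
}

toy <- data.frame(NMHC.GT. = c(150, -200, -200, NA),
                  T = c(13.6, 11.9, -200, NA),
                  X = c(NA, NA, NA, NA))
toy <- drop_empty_columns(toy)  # X is removed
count_sentinel(toy)             # NMHC.GT. = 2, T = 1
```

On the real data, count_sentinel(dataS) would show how many rows each model silently fits on placeholder values.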

Multiple Linear Regression Model I

model1 <- lm(T ~ PT08.S5.O3.+ PT08.S4.NO2.,data = dataS)
summary(model1)
## 
## Call:
## lm(formula = T ~ PT08.S5.O3. + PT08.S4.NO2., data = dataS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -99.072  -9.405   4.226  17.530  54.030 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -8.746e+01  9.161e-01 -95.472   <2e-16 ***
## PT08.S5.O3.  -8.483e-03  9.246e-04  -9.174   <2e-16 ***
## PT08.S4.NO2.  7.583e-02  9.043e-04  83.850   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.2 on 9354 degrees of freedom
##   (114 observations deleted due to missingness)
## Multiple R-squared:  0.5739, Adjusted R-squared:  0.5739 
## F-statistic:  6301 on 2 and 9354 DF,  p-value: < 2.2e-16
plot(model1)

The summary shows R² = 0.5739, so the two sensor responses explain only about 57% of the variance in temperature. This alone is not enough to conclude that the model is a good fit or that its residuals are normally distributed.
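For reference, R² is the share of the variance of the response explained by the model, R² = 1 - SS_res / SS_tot. A sketch that recomputes it from a fitted lm object and compares it with the value reported by summary(); the helper r_squared is hypothetical and the data are simulated, not taken from the air-quality file:

```r
# Recompute R-squared by hand: 1 - (residual sum of squares / total sum of squares).
r_squared <- function(fit) {
  y <- model.response(model.frame(fit))  # response values actually used (NA rows excluded)
  ss_res <- sum(residuals(fit)^2)
  ss_tot <- sum((y - mean(y))^2)
  1 - ss_res / ss_tot
}

set.seed(42)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + rnorm(200)
fit <- lm(y ~ x1 + x2, data = d)
all.equal(r_squared(fit), summary(fit)$r.squared)  # TRUE
```

Because model.frame() only contains the rows lm kept, the calculation mirrors the "observations deleted due to missingness" note in the output above.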

Let's try Model II, which attempts to add a squared term:

model2 <- lm(T ~ PT08.S5.O3.+ (PT08.S4.NO2.^2) + PT08.S4.NO2.,data = dataS)
summary(model2)
## 
## Call:
## lm(formula = T ~ PT08.S5.O3. + (PT08.S4.NO2.^2) + PT08.S4.NO2., 
##     data = dataS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -99.072  -9.405   4.226  17.530  54.030 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -8.746e+01  9.161e-01 -95.472   <2e-16 ***
## PT08.S5.O3.  -8.483e-03  9.246e-04  -9.174   <2e-16 ***
## PT08.S4.NO2.  7.583e-02  9.043e-04  83.850   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.2 on 9354 degrees of freedom
##   (114 observations deleted due to missingness)
## Multiple R-squared:  0.5739, Adjusted R-squared:  0.5739 
## F-statistic:  6301 on 2 and 9354 DF,  p-value: < 2.2e-16

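Nope, there is no difference from Model I: the coefficients, residuals and R² are identical. Inside an R formula, ^ is the crossing operator rather than arithmetic exponentiation, so (PT08.S4.NO2.^2) reduces to PT08.S4.NO2. itself and Model II is exactly Model I. A genuine squared term has to be wrapped in I(), e.g. I(PT08.S4.NO2.^2).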
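A small simulated example (made-up data, not the air-quality file) showing the difference between (x^2) and I(x^2) in a model formula:

```r
set.seed(1)
d <- data.frame(x = runif(100, 0, 10))
d$y <- 3 + 0.5 * d$x + 0.2 * d$x^2 + rnorm(100)

# Without I(), ^ is the formula crossing operator: (x^2) reduces to x.
fit_plain <- lm(y ~ x + (x^2), data = d)
# With I(), x^2 is evaluated arithmetically, giving a genuine squared term.
fit_quad <- lm(y ~ x + I(x^2), data = d)

names(coef(fit_plain))  # "(Intercept)" "x"
names(coef(fit_quad))   # "(Intercept)" "x" "I(x^2)"
```

Applied to this data, the intended Model II would presumably be lm(T ~ PT08.S5.O3. + PT08.S4.NO2. + I(PT08.S4.NO2.^2), data = dataS); its output is not reported here. poly(x, 2, raw = TRUE) is an equivalent way to write the same polynomial term.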

Model III:

model3 <- lm(T ~ CO.GT.+ PT08.S1.CO.+ NMHC.GT.+ C6H6.GT.+PT08.S2.NMHC.+NOx.GT.+PT08.S3.NOx., data = dataS )
summary(model3)
## 
## Call:
## lm(formula = T ~ CO.GT. + PT08.S1.CO. + NMHC.GT. + C6H6.GT. + 
##     PT08.S2.NMHC. + NOx.GT. + PT08.S3.NOx., data = dataS)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.837  -4.984  -0.134   4.523  25.838 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   42.0847883  1.3559790  31.036  < 2e-16 ***
## CO.GT.         0.0195921  0.0012462  15.722  < 2e-16 ***
## PT08.S1.CO.   -0.0259428  0.0009332 -27.801  < 2e-16 ***
## NMHC.GT.      -0.0039548  0.0006483  -6.101  1.1e-09 ***
## C6H6.GT.       1.2125154  0.0084096 144.182  < 2e-16 ***
## PT08.S2.NMHC. -0.0001632  0.0007860  -0.208    0.835    
## NOx.GT.       -0.0194799  0.0004494 -43.344  < 2e-16 ***
## PT08.S3.NOx.  -0.0048846  0.0005878  -8.310  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.49 on 9349 degrees of freedom
##   (114 observations deleted due to missingness)
## Multiple R-squared:   0.97,  Adjusted R-squared:  0.9699 
## F-statistic: 4.313e+04 on 7 and 9349 DF,  p-value: < 2.2e-16
plot(model3)

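Model III shows a large improvement: the adjusted R² rises to 0.9699 and the residual standard error drops from 28.2 to 7.49. A high R² alone, however, does not establish a good fit, and the output does not support the claim that temperature depends on every factor in the model: PT08.S2.NMHC. is not significant (p = 0.835). The other coefficients are significant, but the residual plots still need to be checked, and because the -200 missing-value codes were left in the data, the fit may be distorted by those rows.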
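To see how placeholder values can distort R², here is a toy simulation with made-up data: x and y are independent, but a few rows carry -200 in both columns, as would happen if the same failure affected several columns at once. Whether that is what happens in this dataset is an assumption worth checking, not something the output above demonstrates.

```r
set.seed(7)
n <- 100
d <- data.frame(x = rnorm(n), y = rnorm(n))  # x and y are independent
# Simulate five rows where both columns carry the -200 missing-value code.
d <- rbind(d, data.frame(x = rep(-200, 5), y = rep(-200, 5)))

r2_raw <- summary(lm(y ~ x, data = d))$r.squared

d_clean <- d
d_clean[d_clean == -200] <- NA  # recode the placeholder to NA
r2_clean <- summary(lm(y ~ x, data = d_clean))$r.squared  # lm drops the NA rows

c(raw = r2_raw, clean = r2_clean)
```

The five placeholder rows alone push R² close to 1, while the cleaned fit shows essentially no relationship.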

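Residual Analysis:

A normal Q-Q plot compares the quantiles of the residuals with those of a normal distribution; points lying close to the reference line suggest approximately normal residuals.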

For Model I:

qqnorm(model1$residuals)
qqline(model1$residuals)

For Model II:

qqnorm(model2$residuals)
qqline(model2$residuals)

For Model III:

qqnorm(model3$residuals)
qqline(model3$residuals)
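The Q-Q plots can be complemented by a formal test. shapiro.test() only accepts between 3 and 5000 values, and these models have over 9000 residuals, so the sketch below tests a random subsample. The helper residual_normality is hypothetical and the example uses simulated data; with samples this large, even small departures from normality tend to be flagged, so the plots remain the main check.

```r
# Hypothetical helper: Shapiro-Wilk test on the residuals of a fitted model,
# randomly subsampling when there are more than max_n residuals.
residual_normality <- function(fit, max_n = 5000, seed = 1) {
  r <- residuals(fit)
  if (length(r) > max_n) {
    set.seed(seed)  # note: this resets the global random seed
    r <- sample(r, max_n)
  }
  shapiro.test(r)
}

set.seed(3)
d <- data.frame(x = rnorm(8000))
d$y <- 2 * d$x + rnorm(8000)
res <- residual_normality(lm(y ~ x, data = d))
res$p.value
```

On this dataset the same call would be residual_normality(model1), and likewise for model3.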