The dataset contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The device was deployed in the field in a significantly polluted area, at road level, within an Italian city. Data were recorded from March 2004 to February 2005 (one year), representing the longest freely available recordings of field-deployed air quality chemical sensor device responses. Ground-truth hourly averaged concentrations for CO, Non-Methane Hydrocarbons (NMHC), Benzene, Total Nitrogen Oxides (NOx) and Nitrogen Dioxide (NO2) were provided by a co-located certified reference analyzer. Evidence of cross-sensitivities as well as both concept and sensor drift is present, as described in De Vito et al., Sens. and Act. B, Vol. 129, 2, 2008 (citation required), eventually affecting the sensors' concentration estimation capabilities. Missing values are tagged with the value -200. This dataset can be used exclusively for research purposes. Commercial purposes are fully excluded.
Attribute Information:
0 Date (DD/MM/YYYY)
1 Time (HH.MM.SS)
2 True hourly averaged concentration CO in mg/m^3 (reference analyzer)
3 PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
4 True hourly averaged overall Non-Methane Hydrocarbons (NMHC) concentration in microg/m^3 (reference analyzer)
5 True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
6 PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
7 True hourly averaged NOx concentration in ppb (reference analyzer)
8 PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
9 True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
10 PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
11 PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
12 Temperature in °C
13 Relative Humidity (%)
14 AH Absolute Humidity
dataS <- read.csv(file= "/Users/GD/Desktop/AirQualityUCI.csv", header=TRUE, sep=",")
head(dataS)
## Date Time CO.GT. PT08.S1.CO. NMHC.GT. C6H6.GT. PT08.S2.NMHC.
## 1 3/10/04 18:00:00 2.6 1360 150 11.9 1046
## 2 3/10/04 19:00:00 2.0 1292 112 9.4 955
## 3 3/10/04 20:00:00 2.2 1402 88 9.0 939
## 4 3/10/04 21:00:00 2.2 1376 80 9.2 948
## 5 3/10/04 22:00:00 1.6 1272 51 6.5 836
## 6 3/10/04 23:00:00 1.2 1197 38 4.7 750
## NOx.GT. PT08.S3.NOx. NO2.GT. PT08.S4.NO2. PT08.S5.O3. T RH AH
## 1 166 1056 113 1692 1268 13.6 48.9 0.7578
## 2 103 1174 92 1559 972 13.3 47.7 0.7255
## 3 131 1140 114 1555 1074 11.9 54.0 0.7502
## 4 172 1092 122 1584 1203 11.0 60.0 0.7867
## 5 131 1205 116 1490 1110 11.2 59.6 0.7888
## 6 89 1337 96 1393 949 11.2 59.2 0.7848
## X X.1
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
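The head() output shows Date values such as 3/10/04, and the later summary lists dates such as 1/13/05, so the file as loaded appears to store dates as month/day/two-digit year rather than the DD/MM/YYYY stated in the attribute list. A minimal sketch of combining the two columns into one timestamp under that assumption follows; the toy data frame, the DateTime column name and the UTC time zone are illustrative choices, not part of the original analysis.

```r
# Toy rows copied in spirit from the head()/summary() output above.
toy <- data.frame(
  Date = c("3/10/04", "1/13/05"),
  Time = c("18:00:00", "0:00:00"),
  stringsAsFactors = FALSE
)

# Assumed layout: month/day/two-digit-year, 24-hour clock.
# tz = "UTC" is a placeholder; the recordings were made in Italy.
toy$DateTime <- as.POSIXct(
  paste(toy$Date, toy$Time),
  format = "%m/%d/%y %H:%M:%S",
  tz = "UTC"
)

format(toy$DateTime, "%Y-%m-%d %H:%M")
```

A parsed timestamp makes it possible to order the rows in time and to plot sensor drift over the year.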
summary(dataS)
## Date Time CO.GT. PT08.S1.CO.
## : 114 0:00:00 : 390 Min. :-200.00 Min. :-200
## 1/1/05 : 24 1:00:00 : 390 1st Qu.: 0.60 1st Qu.: 921
## 1/10/05: 24 10:00:00: 390 Median : 1.50 Median :1053
## 1/11/05: 24 11:00:00: 390 Mean : -34.21 Mean :1049
## 1/12/05: 24 12:00:00: 390 3rd Qu.: 2.60 3rd Qu.:1221
## 1/13/05: 24 13:00:00: 390 Max. : 11.90 Max. :2040
## (Other):9237 (Other) :7131 NA's :114 NA's :114
## NMHC.GT. C6H6.GT. PT08.S2.NMHC. NOx.GT.
## Min. :-200.0 Min. :-200.000 Min. :-200.0 Min. :-200.0
## 1st Qu.:-200.0 1st Qu.: 4.000 1st Qu.: 711.0 1st Qu.: 50.0
## Median :-200.0 Median : 7.900 Median : 895.0 Median : 141.0
## Mean :-159.1 Mean : 1.866 Mean : 894.6 Mean : 168.6
## 3rd Qu.:-200.0 3rd Qu.: 13.600 3rd Qu.:1105.0 3rd Qu.: 284.0
## Max. :1189.0 Max. : 63.700 Max. :2214.0 Max. :1479.0
## NA's :114 NA's :114 NA's :114 NA's :114
## PT08.S3.NOx. NO2.GT. PT08.S4.NO2. PT08.S5.O3.
## Min. :-200 Min. :-200.00 Min. :-200 Min. :-200.0
## 1st Qu.: 637 1st Qu.: 53.00 1st Qu.:1185 1st Qu.: 700.0
## Median : 794 Median : 96.00 Median :1446 Median : 942.0
## Mean : 795 Mean : 58.15 Mean :1391 Mean : 975.1
## 3rd Qu.: 960 3rd Qu.: 133.00 3rd Qu.:1662 3rd Qu.:1255.0
## Max. :2683 Max. : 340.00 Max. :2775 Max. :2523.0
## NA's :114 NA's :114 NA's :114 NA's :114
## T RH AH X
## Min. :-200.000 Min. :-200.00 Min. :-200.0000 Mode:logical
## 1st Qu.: 10.900 1st Qu.: 34.10 1st Qu.: 0.6923 NA's:9471
## Median : 17.200 Median : 48.60 Median : 0.9768
## Mean : 9.778 Mean : 39.49 Mean : -6.8376
## 3rd Qu.: 24.100 3rd Qu.: 61.90 3rd Qu.: 1.2962
## Max. : 44.600 Max. : 88.70 Max. : 2.2310
## NA's :114 NA's :114 NA's :114
## X.1
## Mode:logical
## NA's:9471
##
##
##
##
##
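The summary above reports a minimum of -200 in every numeric column, 114 NA's per column, and two columns (X and X.1) that are entirely NA. Since -200 is the dataset's missing-value tag, one possible cleaning step is to recode it to NA and drop the empty columns and rows before modelling. The sketch below runs on a small toy data frame; clean_sentinel and toy are hypothetical names, and the models in this report were fitted without this step.

```r
# Toy data frame standing in for dataS (values are illustrative only).
toy <- data.frame(
  CO.GT. = c(2.6, -200, 2.2, NA),
  T      = c(13.6, 13.3, -200, NA),
  X      = c(NA, NA, NA, NA)
)

# Recode the -200 sentinel to NA in every numeric column.
clean_sentinel <- function(df, sentinel = -200) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(col) {
    replace(col, which(col == sentinel), NA)
  })
  df
}

toy_clean <- clean_sentinel(toy)

# Drop columns that are entirely NA (such as X and X.1) ...
toy_clean <- toy_clean[, colSums(!is.na(toy_clean)) > 0, drop = FALSE]
# ... and rows that are entirely NA (such as blank trailing rows).
toy_clean <- toy_clean[rowSums(!is.na(toy_clean)) > 0, , drop = FALSE]

summary(toy_clean)
```

After recoding, lm() would drop the affected rows through its default handling of missing values instead of treating -200 as a measured temperature or concentration.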
model1 <- lm(T ~ PT08.S5.O3.+ PT08.S4.NO2.,data = dataS)
summary(model1)
##
## Call:
## lm(formula = T ~ PT08.S5.O3. + PT08.S4.NO2., data = dataS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -99.072 -9.405 4.226 17.530 54.030
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.746e+01 9.161e-01 -95.472 <2e-16 ***
## PT08.S5.O3. -8.483e-03 9.246e-04 -9.174 <2e-16 ***
## PT08.S4.NO2. 7.583e-02 9.043e-04 83.850 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.2 on 9354 degrees of freedom
## (114 observations deleted due to missingness)
## Multiple R-squared: 0.5739, Adjusted R-squared: 0.5739
## F-statistic: 6301 on 2 and 9354 DF, p-value: < 2.2e-16
plot(model1)
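The summary shows R-squared = 0.5739, so the two sensor responses explain only about 57% of the variance in T. The residuals are also wide and asymmetric (from -99.07 to 54.03, with a median of 4.23), so this summary alone does not support concluding that the residuals are normal or that the model is a good fit.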
model2 <- lm(T ~ PT08.S5.O3.+ (PT08.S4.NO2.^2) + PT08.S4.NO2.,data = dataS)
summary(model2)
##
## Call:
## lm(formula = T ~ PT08.S5.O3. + (PT08.S4.NO2.^2) + PT08.S4.NO2.,
## data = dataS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -99.072 -9.405 4.226 17.530 54.030
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.746e+01 9.161e-01 -95.472 <2e-16 ***
## PT08.S5.O3. -8.483e-03 9.246e-04 -9.174 <2e-16 ***
## PT08.S4.NO2. 7.583e-02 9.043e-04 83.850 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.2 on 9354 degrees of freedom
## (114 observations deleted due to missingness)
## Multiple R-squared: 0.5739, Adjusted R-squared: 0.5739
## F-statistic: 6301 on 2 and 9354 DF, p-value: < 2.2e-16
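No: there is no difference or improvement compared with model I. The coefficients, residuals and R-squared (0.5739) are identical because, inside an R formula, ^ is a formula (crossing) operator rather than arithmetic, so (PT08.S4.NO2.^2) reduces to the linear term PT08.S4.NO2. that is already in the model. A genuine squared term has to be wrapped in I(), as in I(PT08.S4.NO2.^2).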
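The identical output for models I and II can be reproduced on synthetic data. The sketch below compares a formula that writes the square without I() against one that protects it with I(); x, y and the model names are illustrative and unrelated to dataS.

```r
set.seed(1)
x <- runif(200, 0, 10)
y <- 1 + 2 * x + 0.5 * x^2 + rnorm(200)

m_linear  <- lm(y ~ x)
# Without I(), ^ is interpreted as a formula operator, so (x^2) collapses to x.
m_crossed <- lm(y ~ x + (x^2))
# I() makes R evaluate x^2 arithmetically and add it as its own term.
m_quad    <- lm(y ~ x + I(x^2))

names(coef(m_crossed))  # "(Intercept)" "x"
names(coef(m_quad))     # "(Intercept)" "x" "I(x^2)"
```

poly(x, 2) is another way to request a quadratic fit; it uses orthogonal polynomials by default, so its coefficients are not directly comparable to those from I(x^2).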
model3 <- lm(T ~ CO.GT.+ PT08.S1.CO.+ NMHC.GT.+ C6H6.GT.+PT08.S2.NMHC.+NOx.GT.+PT08.S3.NOx., data = dataS )
summary(model3)
##
## Call:
## lm(formula = T ~ CO.GT. + PT08.S1.CO. + NMHC.GT. + C6H6.GT. +
## PT08.S2.NMHC. + NOx.GT. + PT08.S3.NOx., data = dataS)
##
## Residuals:
## Min 1Q Median 3Q Max
## -59.837 -4.984 -0.134 4.523 25.838
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.0847883 1.3559790 31.036 < 2e-16 ***
## CO.GT. 0.0195921 0.0012462 15.722 < 2e-16 ***
## PT08.S1.CO. -0.0259428 0.0009332 -27.801 < 2e-16 ***
## NMHC.GT. -0.0039548 0.0006483 -6.101 1.1e-09 ***
## C6H6.GT. 1.2125154 0.0084096 144.182 < 2e-16 ***
## PT08.S2.NMHC. -0.0001632 0.0007860 -0.208 0.835
## NOx.GT. -0.0194799 0.0004494 -43.344 < 2e-16 ***
## PT08.S3.NOx. -0.0048846 0.0005878 -8.310 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.49 on 9349 degrees of freedom
## (114 observations deleted due to missingness)
## Multiple R-squared: 0.97, Adjusted R-squared: 0.9699
## F-statistic: 4.313e+04 on 7 and 9349 DF, p-value: < 2.2e-16
plot(model3)
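Model III: this model shows a large improvement, with adjusted R-squared = 0.9699 (multiple R-squared = 0.97) and a residual standard error of 7.49 against 28.2 for model I. All predictors are significant except PT08.S2.NMHC. (p = 0.835), so Temperature is associated with most, but not all, of the factors listed in the model. The summary of dataS still shows a minimum of -200 in every numeric column, which means the missing-value code was fitted as real data in all three models; the R-squared values should be read with that caveat.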
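Whether each predictor contributes can be read from the coefficient table rather than from R-squared alone. A minimal sketch on synthetic data follows; toy, x1 and noise are illustrative names, not columns of dataS, and the 0.05 threshold is just the conventional choice.

```r
set.seed(7)
toy <- data.frame(x1 = rnorm(200), noise = rnorm(200))
toy$y <- 2 * toy$x1 + rnorm(200)   # y depends on x1 only

fit <- lm(y ~ x1 + noise, data = toy)

# The coefficient matrix has one row per term; the last column holds p-values.
p <- coef(summary(fit))[, "Pr(>|t|)"]

# Terms (excluding the intercept) whose p-value exceeds the threshold.
weak <- names(p)[p > 0.05 & names(p) != "(Intercept)"]
weak
```

Applied to a fitted model such as model3, the same extraction would list the terms that the summary table shows without significance stars.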
qqnorm(model1$residuals)
qqline(model1$residuals)
qqnorm(model2$residuals)
qqline(model2$residuals)
qqnorm(model3$residuals)
qqline(model3$residuals)
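The three Q-Q plots above are drawn one after another; placing them side by side can make the comparison easier. The sketch below wraps the same qqnorm()/qqline() calls in a small helper; qq_panel, fits and the toy models are hypothetical names used only for illustration.

```r
set.seed(42)
toy <- data.frame(x = rnorm(100))
toy$y <- 3 + 2 * toy$x + rnorm(100)

fits <- list(
  linear    = lm(y ~ x, data = toy),
  quadratic = lm(y ~ x + I(x^2), data = toy)
)

# Draw one normal Q-Q plot of the raw residuals per model, in a single row.
qq_panel <- function(models) {
  old <- par(mfrow = c(1, length(models)))
  on.exit(par(old))              # restore the previous layout afterwards
  for (nm in names(models)) {
    r <- resid(models[[nm]])
    qqnorm(r, main = paste("Normal Q-Q:", nm))
    qqline(r)
  }
  invisible(names(models))
}

qq_panel(fits)
```

With the models from this report, the call would be qq_panel(list(model1 = model1, model2 = model2, model3 = model3)).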