Carnegie Mellon University Database Archive: http://lib.stat.cmu.edu/datasets/ Submitted by Magne Aldrin (magne.aldrin(at)nr.no). [28/Jul/04]
This data set is a subsample of 500 observations from a larger data set collected for a study of air pollution at a road, in which traffic volume and meteorological conditions at the site were recorded alongside the pollution measurements. The data were collected by the Norwegian Public Roads Administration.
Column 1, NO2: hourly values of the logarithm of the concentration of NO2 (particles) at Alnabru in Oslo, Norway, between October 2001 and August 2003.
Column 2, CAR: number of cars per hour (log)
Column 3, TEMP: temperature 2 meters above ground (degree C)
Column 4, WIND_SPEED: wind speed (meters/second)
Column 5, TEMP_DIFF: the temperature difference between 2 and 25 meters above ground (degree C)
Column 6, WIND_DIR: wind direction (degrees between 0 and 360)
Column 7, HOUR: hour of day
Column 8, DAY: day number from October 1, 2001
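The analysis that follows expects a data frame called NO2 whose columns carry the names listed above. As a minimal sketch of how such a data frame might be built: the file name `NO2.dat` is a hypothetical local path, and the assumption that the file is whitespace-separated with no header row should be checked against the downloaded file.

```r
# Column names, in the order given in the column description.
no2_cols <- c("NO2", "CAR", "TEMP", "WIND_SPEED",
              "TEMP_DIFF", "WIND_DIR", "HOUR", "DAY")

# Hypothetical local path; adjust to wherever the downloaded file lives.
no2_path <- "NO2.dat"

if (file.exists(no2_path)) {
  # ASSUMPTION: whitespace-separated numeric values, no header row.
  NO2 <- read.table(no2_path, header = FALSE, col.names = no2_cols)
  str(NO2)  # quick check: 8 numeric columns are expected
}
```

Naming the data frame NO2 matches the `data=NO2` argument used in the model calls, so `NO2$NO2` refers to the response column inside that data frame.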
REG = lm(NO2 ~ CAR + TEMP + WIND_SPEED + TEMP_DIFF + WIND_DIR + HOUR + DAY, data=NO2)
summary(REG)
##
## Call:
## lm(formula = NO2 ~ CAR + TEMP + WIND_SPEED + TEMP_DIFF + WIND_DIR +
## HOUR + DAY, data = NO2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.26257 -0.30217 0.02559 0.34609 1.80855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.5703055 0.1883614 3.028 0.00259 **
## CAR 0.5052947 0.0283248 17.839 < 2e-16 ***
## TEMP -0.0238875 0.0042874 -5.572 4.16e-08 ***
## WIND_SPEED -0.1253512 0.0139503 -8.986 < 2e-16 ***
## TEMP_DIFF 0.1670117 0.0262156 6.371 4.33e-10 ***
## WIND_DIR 0.0007838 0.0002984 2.627 0.00889 **
## HOUR -0.0195713 0.0043701 -4.478 9.35e-06 ***
## DAY 0.0003633 0.0001233 2.946 0.00337 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5258 on 492 degrees of freedom
## Multiple R-squared: 0.5161, Adjusted R-squared: 0.5092
## F-statistic: 74.97 on 7 and 492 DF, p-value: < 2.2e-16
The regression shows that traffic volume (CAR) has a significant positive association with NO2 levels, while wind speed has a significant negative association. One can hypothesize that heavy traffic raises NO2 levels in the area and that wind helps to disperse it. However, it is important to note that we cannot infer causation from correlation.
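Because NO2 and CAR are both recorded as logarithms, the fitted coefficients can be read as multiplicative effects on the concentration itself. The short calculation below is a sketch of that reading, using the estimates printed in the summary output. The variable names are illustrative, and the wind speed interpretation assumes the logarithm is the natural log, which the data description does not state.

```r
# Coefficients copied from the summary(REG) output.
b_car  <- 0.5052947   # CAR: log of cars per hour
b_wind <- -0.1253512  # WIND_SPEED: meters/second

# NO2 and CAR are both on a log scale, so b_car acts as an elasticity:
# doubling traffic multiplies the predicted NO2 concentration by 2^b_car.
# This holds for any log base, as long as both logs share it.
ratio_double_traffic <- 2^b_car

# WIND_SPEED is on its original scale. ASSUMPTION: NO2 is a natural log,
# so +1 m/s multiplies the predicted concentration by exp(b_wind).
ratio_wind_plus_1 <- exp(b_wind)

round(c(double_traffic = ratio_double_traffic,
        wind_plus_1    = ratio_wind_plus_1), 2)
```

Under these assumptions, and holding the other predictors fixed, doubling the hourly car count goes with a predicted NO2 concentration roughly 42% higher, and each additional meter per second of wind with one roughly 12% lower.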
pairs(NO2 ~ CAR + TEMP + WIND_SPEED + TEMP_DIFF + WIND_DIR + HOUR + DAY, data=NO2)
plot(NO2$CAR, NO2$NO2)
Pairwise scatterplots show only loose relationships between NO2 and traffic, wind speed, and temperature difference.
library(glmnet)
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-16
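# design matrix; model.matrix keeps its own intercept column, which is why a second, always-empty "(Intercept)" row appears in the output
x = model.matrix(NO2 ~ CAR + TEMP + WIND_SPEED + TEMP_DIFF + WIND_DIR + HOUR + DAY, data=NO2)
y = as.numeric(as.matrix(NO2$NO2)) # response: log NO2 concentration
LASSO = glmnet(x, y, alpha = 1) # alpha = 1 selects the LASSO penalty
# a plain glmnet fit has no lambda.min element, so s is NULL here and the whole coefficient path (one column per lambda) is printed
coef(LASSO, s=LASSO$lambda.min)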
## 9 x 65 sparse Matrix of class "dgCMatrix"
## [[ suppressing 65 column names 's0', 's1', 's2' ... ]]
##
## (Intercept) 3.698368 3.47936017 3.27980852 3.09798449 2.9323132
## (Intercept) . . . . .
## CAR . 0.03140642 0.06002277 0.08609693 0.1098547
## TEMP . . . . .
## WIND_SPEED . . . . .
## TEMP_DIFF . . . . .
## WIND_DIR . . . . .
## HOUR . . . . .
## DAY . . . . .
##
## (Intercept) 2.80264076 2.69093163 2.58914642 2.49640353 2.41189967
## (Intercept) . . . . .
## CAR 0.13325789 0.15511364 0.17502779 0.19317282 0.20970589
## TEMP . . . . .
## WIND_SPEED -0.01097047 -0.02428804 -0.03642251 -0.04747899 -0.05755324
## TEMP_DIFF . . . . .
## WIND_DIR . . . . .
## HOUR . . . . .
## DAY . . . . .
##
## (Intercept) 2.310225e+00 2.20293848 2.105142476 2.016061280
## (Intercept) . . . .
## CAR 2.273958e-01 0.24541050 0.261830588 0.276788093
## TEMP -6.268144e-05 -0.00177569 -0.003337483 -0.004759878
## WIND_SPEED -6.519272e-02 -0.07123804 -0.076745699 -0.081764259
## TEMP_DIFF 1.149167e-02 0.02212953 0.031821739 0.040653433
## WIND_DIR . . . .
## HOUR . . . .
## DAY . . . .
##
## (Intercept) 1.934893808 1.860937031 1.793550370 1.732150155
## (Intercept) . . . .
## CAR 0.290416815 0.302834799 0.314149603 0.324459231
## TEMP -0.006055911 -0.007236808 -0.008312798 -0.009293199
## WIND_SPEED -0.086336984 -0.090503480 -0.094299837 -0.097758936
## TEMP_DIFF 0.048700543 0.056032770 0.062713623 0.068800966
## WIND_DIR . . . .
## HOUR . . . .
## DAY . . . .
##
## (Intercept) 1.67620457 1.62522903 1.57878202 1.53646123 1.49790010
## (Intercept) . . . . .
## CAR 0.33385298 0.34241222 0.35021107 0.35731710 0.36379184
## TEMP -0.01018650 -0.01100045 -0.01174209 -0.01241784 -0.01303356
## WIND_SPEED -0.10091074 -0.10378254 -0.10639922 -0.10878345 -0.11095586
## TEMP_DIFF 0.07434753 0.07940135 0.08400620 0.08820197 0.09202500
## WIND_DIR . . . . .
## HOUR . . . . .
## DAY . . . . .
##
## (Intercept) 1.455484e+00 1.3942402181 1.323066e+00 1.256164e+00
## (Intercept) . . . .
## CAR 3.693957e-01 0.3803318706 3.913450e-01 4.014716e-01
## TEMP -1.388003e-02 -0.0146155162 -1.542030e-02 -1.617280e-02
## WIND_SPEED -1.124323e-01 -0.1138613497 -1.149214e-01 -1.158463e-01
## TEMP_DIFF 9.510908e-02 0.0998393978 1.056138e-01 1.110703e-01
## WIND_DIR 5.655028e-05 0.0001117524 1.702709e-04 2.247886e-04
## HOUR . -0.0015061465 -3.098985e-03 -4.562683e-03
## DAY . . 2.823841e-05 5.800762e-05
##
## (Intercept) 1.195234e+00 1.1397168665 1.0891319271 1.0430408144
## (Intercept) . . . .
## CAR 4.106950e-01 0.4190989622 0.4267563578 0.4337334914
## TEMP -1.685814e-02 -0.0174826083 -0.0180515971 -0.0185700385
## WIND_SPEED -1.166907e-01 -0.1174600834 -0.1181611049 -0.1187998495
## TEMP_DIFF 1.160399e-01 0.1205681421 0.1246940631 0.1284534486
## WIND_DIR 2.744495e-04 0.0003196986 0.0003609279 0.0003984946
## HOUR -5.896005e-03 -0.0071108770 -0.0082178232 -0.0092264313
## DAY 8.512775e-05 0.0001098386 0.0001323542 0.0001528696
##
## (Intercept) 1.0010443090 0.9629717099 0.9280890731 0.8963045664
## (Intercept) . . . .
## CAR 0.4400907957 0.4458495661 0.4511303747 0.4559421873
## TEMP -0.0190424230 -0.0194757955 -0.0198677305 -0.0202248317
## WIND_SPEED -0.1193818499 -0.1199134294 -0.1203965031 -0.1208366611
## TEMP_DIFF 0.1318788607 0.1349818000 0.1378271752 0.1404198510
## WIND_DIR 0.0004327239 0.0004639813 0.0004923934 0.0005182810
## HOUR -0.0101454374 -0.0109793940 -0.0117426578 -0.0124381293
## DAY 0.0001715625 0.0001885930 0.0002041123 0.0002182529
##
## (Intercept) 0.8673437049 0.8409556465 0.8169118305 0.7950040008
## (Intercept) . . . .
## CAR 0.4603265323 0.4643213841 0.4679613444 0.4712779405
## TEMP -0.0205502090 -0.0208466806 -0.0211168145 -0.0213629505
## WIND_SPEED -0.1212377168 -0.1216031437 -0.1219361071 -0.1222394910
## TEMP_DIFF 0.1427822008 0.1449346861 0.1468959504 0.1486829815
## WIND_DIR 0.0005418688 0.0005633612 0.0005829442 0.0006007875
## HOUR -0.0130718170 -0.0136492097 -0.0141753084 -0.0146546700
## DAY 0.0002311373 0.0002428771 0.0002535739 0.0002633205
##
## (Intercept) 0.7750424023 0.7568541372 0.7404916913 0.7253749080
## (Intercept) . . . .
## CAR 0.4742998995 0.4770533959 0.4795263282 0.4818151436
## TEMP -0.0215872204 -0.0217915668 -0.0219804104 -0.0221498682
## WIND_SPEED -0.1225159231 -0.1227677977 -0.1229992363 -0.1232081787
## TEMP_DIFF 0.1503112576 0.1517948822 0.1531277601 0.1543609645
## WIND_DIR 0.0006170457 0.0006318595 0.0006454164 0.0006577109
## HOUR -0.0150914464 -0.0154894208 -0.0158484349 -0.0161791218
## DAY 0.0002722012 0.0002802930 0.0002876613 0.0002943796
##
## (Intercept) 0.7115989862 0.6990468595 0.6876098290 0.6771888327
## (Intercept) . . . .
## CAR 0.4839010028 0.4858015636 0.4875332839 0.4891111629
## TEMP -0.0223042309 -0.0224448801 -0.0225730344 -0.0226898038
## WIND_SPEED -0.1233985553 -0.1235720193 -0.1237300732 -0.1238740861
## TEMP_DIFF 0.1554848222 0.1565088415 0.1574418898 0.1582920487
## WIND_DIR 0.0006689121 0.0006791183 0.0006884178 0.0006968911
## HOUR -0.0164804698 -0.0167550471 -0.0170052318 -0.0172331908
## DAY 0.0003005011 0.0003060788 0.0003111610 0.0003157917
##
## (Intercept) 0.6676936090 0.6590419149 0.6511588134 0.6442043565
## (Intercept) . . . .
## CAR 0.4905488675 0.4918588504 0.4930524580 0.4941020920
## TEMP -0.0227961997 -0.0228931437 -0.0229814755 -0.0230639815
## WIND_SPEED -0.1240053052 -0.1241248673 -0.1242338077 -0.1243359291
## TEMP_DIFF 0.1590666818 0.1597724986 0.1604156126 0.1609822955
## WIND_DIR 0.0007046117 0.0007116465 0.0007180562 0.0007239366
## HOUR -0.0174408985 -0.0176301540 -0.0178025966 -0.0179559520
## DAY 0.0003200110 0.0003238555 0.0003273584 0.0003305410
##
## (Intercept) 0.6376452513 0.6316631133 0.6262122724 0.6212456660
## (Intercept) . . . .
## CAR 0.4950953680 0.4960014262 0.4968270181 0.4975792674
## TEMP -0.0231372410 -0.0232038908 -0.0232646168 -0.0233199479
## WIND_SPEED -0.1244261425 -0.1245083186 -0.1245831943 -0.1246514182
## TEMP_DIFF 0.1615173616 0.1620054522 0.1624501964 0.1628554311
## WIND_DIR 0.0007292571 0.0007341025 0.0007385173 0.0007425399
## HOUR -0.0180993452 -0.0182301035 -0.0183492482 -0.0184578084
## DAY 0.0003334500 0.0003361006 0.0003385157 0.0003407163
##
## (Intercept) 0.6167202790 0.6125969150 0.6091392056 0.6057093002
## (Intercept) . . . .
## CAR 0.4982646889 0.4988892195 0.4994104040 0.4999296884
## TEMP -0.0233703636 -0.0234163004 -0.0234592669 -0.0234975846
## WIND_SPEED -0.1247135813 -0.1247702221 -0.1248266433 -0.1248733878
## TEMP_DIFF 0.1632246658 0.1635610988 0.1638448014 0.1641243078
## WIND_DIR 0.0007462052 0.0007495449 0.0007525984 0.0007553765
## HOUR -0.0185567245 -0.0186468531 -0.0187242920 -0.0187991861
## DAY 0.0003427214 0.0003445483 0.0003461936 0.0003477118
##
## (Intercept) 0.6025653772 0.5996995724 0.5976047921 0.5947446856
## (Intercept) . . . .
## CAR 0.5004060793 0.5008403607 0.5011559091 0.5015905198
## TEMP -0.0235322430 -0.0235638006 -0.0235923847 -0.0236191932
## WIND_SPEED -0.1249158359 -0.1249545090 -0.1249993904 -0.1250221686
## TEMP_DIFF 0.1643806946 0.1646144213 0.1647902303 0.1650182462
## WIND_DIR 0.0007579020 0.0007602026 0.0007622644 0.0007642186
## HOUR -0.0188677522 -0.0189302485 -0.0189794272 -0.0190384685
## DAY 0.0003490954 0.0003503561 0.0003514603 0.0003525507
##
## (Intercept) 0.5929755492 0.5910308996
## (Intercept) . .
## CAR 0.5018576046 0.5021514198
## TEMP -0.0236425758 -0.0236649072
## WIND_SPEED -0.1250591475 -0.1250858890
## TEMP_DIFF 0.1651668568 0.1653249988
## WIND_DIR 0.0007659227 0.0007675222
## HOUR -0.0190798084 -0.0191223255
## DAY 0.0003534681 0.0003543375
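# cv.glmnet assigns cross-validation folds at random, so results can vary slightly between runs unless a seed is set
LASSO_CV = cv.glmnet(x, y, alpha = 1)
coef(LASSO_CV, s = "lambda.min") # coefficients at the lambda with the lowest cross-validated error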
## 9 x 1 sparse Matrix of class "dgCMatrix"
## 1
## (Intercept) 0.5947446856
## (Intercept) .
## CAR 0.5015905198
## TEMP -0.0236191932
## WIND_SPEED -0.1250221686
## TEMP_DIFF 0.1650182462
## WIND_DIR 0.0007642186
## HOUR -0.0190384685
## DAY 0.0003525507
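The LASSO path shows CAR entering the model first, followed by WIND_SPEED, which suggests that traffic and wind speed are the strongest predictors of NO2. At the cross-validated lambda.min, however, all seven predictors keep nonzero coefficients, and the estimates are close to the least-squares ones.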
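The two coef() calls above behave differently because lambda.min is stored on a cv.glmnet object and not on a plain glmnet fit. The sketch below illustrates this on simulated stand-in data (not the NO2 data). It also shows one way to read off the order in which predictors enter the path. The names `sim`, `fit`, `cv_fit`, and `entry_step` are illustrative, and dropping the intercept column from model.matrix is a choice made here, not something the original analysis does.

```r
library(glmnet)

set.seed(1)  # cv.glmnet assigns folds at random

# Simulated stand-in data (NOT the NO2 data); x3 has no true effect.
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 0.5 * sim$x2 + rnorm(n, sd = 0.5)

# Drop model.matrix's intercept column; glmnet adds its own intercept.
x <- model.matrix(y ~ x1 + x2 + x3, data = sim)[, -1]
y <- sim$y

# Full path: one column of coefficients per lambda value.
fit <- glmnet(x, y, alpha = 1)

# Index of the first lambda (largest penalty first) at which each
# predictor gets a nonzero coefficient; smaller means it enters earlier.
entry_step <- apply(as.matrix(fit$beta) != 0, 1, function(nz) which(nz)[1])
print(entry_step)

# lambda.min lives on the cv.glmnet object, not on the plain glmnet fit.
cv_fit <- cv.glmnet(x, y, alpha = 1)
print(coef(cv_fit, s = "lambda.min"))  # lowest cross-validated error
print(coef(cv_fit, s = "lambda.1se"))  # larger penalty, within one SE of the minimum
```

Entry order depends on the scale of the predictors. glmnet standardizes them by default, so the comparison is between standardized effects.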