[Reference: http://catalog.data.gov/dataset/energy-usage-2010-24a67]
Below, the “Energy Usage 2010” dataset is loaded into R, and its summary statistics and structure are displayed (along with the “head” and “tail” of the dataset).
#Load the "Energy Usage 2010" dataset into R, assigning the complete data frame to the variable "energy_raw".
rm(list=ls())
energy_raw <- read.csv("~/Academics (RPI)/10. Spring 2015/Applied Regression Analysis/Assignments/Assignment #4/Energy_Usage_2010.csv", header=TRUE, stringsAsFactors = FALSE)
#Then, display the "head" and "tail" of the dataset, "energy_raw".
head(energy_raw)
## COMMUNITY.AREA.NAME CENSUS.BLOCK BUILDING_TYPE BUILDING_SUBTYPE
## 1 Albany Park 1.7e+14 Residential Multi 7+
## 2 Albany Park 1.7e+14 Residential Multi < 7
## 3 Albany Park 1.7e+14 Residential Single Family
## 4 Albany Park 1.7e+14 Residential Multi 7+
## 5 Albany Park 1.7e+14 Residential Multi < 7
## 6 Albany Park 1.7e+14 Commercial Multi < 7
## KWH.JANUARY.2010 KWH.FEBRUARY.2010 KWH.MARCH.2010 KWH.APRIL.2010
## 1 11921 12145 9759 11542
## 2 1233 1645 994 1055
## 3 4141 3798 2939 4727
## 4 1230 1333 1260 1405
## 5 12977 14639 12718 14973
## 6 2878 3755 4571 2984
## KWH.MAY.2010 KWH.JUNE.2010 KWH.JULY.2010 KWH.AUGUST.2010
## 1 14348 26617 24210 20383
## 2 1284 3527 3099 2527
## 3 5324 9676 7591 6287
## 4 1699 2094 732 1312
## 5 16384 32940 24454 23926
## 6 3111 4808 4132 3564
## KWH.SEPTEMBER.2010 KWH.OCTOBER.2010 KWH.NOVEMBER.2010 KWH.DECEMBER.2010
## 1 11983 10335 25327 22462
## 2 904 626 2092 1622
## 3 2920 2565 5979 5073
## 4 1462 1358 1372 1495
## 5 15012 13679 31979 30660
## 6 2174 1985 5968 5400
## TOTAL_KWH ELECTRICITY.ACCOUNTS ZERO.KWH.ACCOUNTS THERM.JANUARY.2010
## 1 201032 48 22 7247
## 2 20608 Less than 4 1 321
## 3 61020 6 2 1222
## 4 16752 Less than 4 2 2961
## 5 244341 49 32 11508
## 6 45330 7 0 1793
## THERM.FEBRUARY.2010 THERM.MARCH.2010 TERM.APRIL.2010 THERM.MAY.2010
## 1 5904 5180 3113 1822
## 2 130 86 49 19
## 3 1016 860 543 346
## 4 2664 1616 798 344
## 5 9057 8000 4529 2809
## 6 1573 1352 890 853
## THERM.JUNE.2010 THERM.JULY.2010 THERM.AUGUST.2010 THERM.SEPTEMBER.2010
## 1 1272 1234 952 1780
## 2 13 7 10 12
## 3 247 203 179 170
## 4 404 320 272 368
## 5 1507 1179 991 994
## 6 541 448 438 439
## THERM.OCTOBER.2010 THERM.NOVEMBER.2010 THERM.DECEMBER.2010 TOTAL_THERMS
## 1 1472 1961 4885 36822
## 2 9 21 78 755
## 3 190 298 791 6065
## 4 745 1260 2901 14653
## 5 1254 2595 7167 51590
## 6 565 787 1538 11217
## GAS.ACCOUNTS KWH.TOTAL.SQFT THERMS.TOTAL.SQFT KWH.MEAN.2010
## 1 21 48825 48825 20103.20
## 2 Less than 4 3306 3306 20608.00
## 3 6 9472 9472 10170.00
## 4 6 14407 14407 16752.00
## 5 54 58835 58835 15271.31
## 6 6 8240 8240 22665.00
## KWH.STANDARD.DEVIATION.2010 KWH.MINIMUM.2010 KWH.1ST.QUARTILE.2010
## 1 8609.69 9414 12563.0
## 2 NA 20608 20608.0
## 3 4410.10 5619 6746.0
## 4 NA 16752 16752.0
## 5 8089.70 5462 10343.5
## 6 9526.14 15929 15929.0
## KWH.2ND.QUARTILE.2010 KWH.3RD.QUARTILE.2010 KWH.MAXIMUM.2010
## 1 19072.5 22177.0 36781
## 2 20608.0 20608.0 20608
## 3 9055.5 13014.0 17530
## 4 16752.0 16752.0 16752
## 5 12427.0 17495.5 34236
## 6 22665.0 29401.0 29401
## KWH.SQFT.MEAN.2010 KWH.SQFT.STANDARD.DEVIATION.2010
## 1 24412.50 5698.57
## 2 3306.00 NA
## 3 1578.67 863.85
## 4 14407.00 NA
## 5 3677.19 1061.65
## 6 8240.00 NA
## KWH.SQFT.MINIMUM.2010 KWH.SQFT.1ST.QUARTILE.2010
## 1 20383 20383
## 2 3306 3306
## 3 1226 1226
## 4 14407 14407
## 5 2414 2546
## 6 8240 8240
## KWH.SQFT.2ND.QUARTILE.2010 KWH.SQFT.3RD.QUARTILE.2010
## 1 24412.5 28442
## 2 3306.0 3306
## 3 1226.0 1226
## 4 14407.0 14407
## 5 3553.5 4692
## 6 8240.0 8240
## KWH.SQFT.MAXIMUM.2010 THERM.MEAN.2010 THERM.STANDARD.DEVIATION.2010
## 1 28442 5260.29 8435.63
## 2 3306 755.00 NA
## 3 3342 1010.83 620.53
## 4 14407 14653.00 NA
## 5 5530 3224.38 1079.13
## 6 8240 5608.50 5620.79
## THERM.MINIMUM.2010 THERM.1ST.QUARTILE.2010 THERM.2ND.QUARTILE.2010
## 1 882 957 1102.0
## 2 755 755 755.0
## 3 496 514 835.5
## 4 14653 14653 14653.0
## 5 2071 2499 2933.5
## 6 1634 1634 5608.5
## THERM.3RD.QUARTILE.2010 THERM.MAXIMUM.2010 THERMS.SQFT.MEAN.2010
## 1 8024.0 23460 24412.50
## 2 755.0 755 3306.00
## 3 1240.0 2144 1578.67
## 4 14653.0 14653 14407.00
## 5 3593.5 5754 3677.19
## 6 9583.0 9583 8240.00
## THERMS.SQFT.STANDARD.DEVIATION.2010 THERMS.SQFT.MINIMUM.2010
## 1 5698.57 20383
## 2 NA 3306
## 3 863.85 1226
## 4 NA 14407
## 5 1061.65 2414
## 6 NA 8240
## THERMS.SQFT.1ST.QUARTILE.2010 THERMS.SQFT.2ND.QUARTILE.2010
## 1 20383 24412.5
## 2 3306 3306.0
## 3 1226 1226.0
## 4 14407 14407.0
## 5 2546 3553.5
## 6 8240 8240.0
## THERMS.SQFT.3RD.QUARTILE.2010 THERMS.SQFT.MAXIMUM.2010 TOTAL_POPULATION
## 1 28442 28442 132
## 2 3306 3306 132
## 3 1226 3342 132
## 4 14407 14407 228
## 5 4692 5530 228
## 6 8240 8240 231
## TOTAL.UNITS AVERAGE.STORIES AVERAGE.BUILDING.AGE AVERAGE.HOUSESIZE
## 1 64 3.00 65.50 2.20
## 2 64 2.00 86.00 2.20
## 3 64 1.17 14.33 2.20
## 4 79 3.00 86.00 3.51
## 5 79 2.50 87.69 3.51
## 6 70 1.00 0.00 3.73
## OCCUPIED.UNITS OCCUPIED.UNITS.PERCENTAGE RENTER.OCCUPIED.HOUSING.UNITS
## 1 60 0.9375 33
## 2 60 0.9375 33
## 3 60 0.9375 33
## 4 65 0.8228 49
## 5 65 0.8228 49
## 6 62 0.8856 49
## RENTER.OCCUPIED.HOUSING.PERCENTAGE OCCUPIED.HOUSING.UNITS
## 1 0.550 60
## 2 0.550 60
## 3 0.550 60
## 4 0.754 65
## 5 0.754 65
## 6 0.790 62
tail(energy_raw)
## COMMUNITY.AREA.NAME CENSUS.BLOCK BUILDING_TYPE BUILDING_SUBTYPE
## 66969 Woodlawn 1.7e+14 Residential Multi < 7
## 66970 Woodlawn 1.7e+14 Residential Single Family
## 66971 Woodlawn 1.7e+14 Commercial Multi < 7
## 66972 Woodlawn 1.7e+14 Residential Multi < 7
## 66973 Woodlawn 1.7e+14 Residential Single Family
## 66974 Woodlawn 1.7e+14 Residential Multi < 7
## KWH.JANUARY.2010 KWH.FEBRUARY.2010 KWH.MARCH.2010 KWH.APRIL.2010
## 66969 9572 9104 8525 7756
## 66970 2705 1318 1582 1465
## 66971 1005 1760 1521 1832
## 66972 3567 3031 2582 2295
## 66973 1208 1055 1008 1109
## 66974 2717 3057 2695 3793
## KWH.MAY.2010 KWH.JUNE.2010 KWH.JULY.2010 KWH.AUGUST.2010
## 66969 11256 11669 12099 13200
## 66970 1494 2990 2449 2351
## 66971 2272 2361 3018 3030
## 66972 7902 4987 5773 3996
## 66973 1591 1367 1569 1551
## 66974 4237 5383 5544 6929
## KWH.SEPTEMBER.2010 KWH.OCTOBER.2010 KWH.NOVEMBER.2010
## 66969 9694 8419 19077
## 66970 1213 2174 2888
## 66971 2886 3833 6290
## 66972 3050 3103 3880
## 66973 1376 1236 2108
## 66974 5280 5971 6986
## KWH.DECEMBER.2010 TOTAL_KWH ELECTRICITY.ACCOUNTS ZERO.KWH.ACCOUNTS
## 66969 18869 139240 21 18
## 66970 5025 27654 6 7
## 66971 12169 41977 9 5
## 66972 4684 48850 7 2
## 66973 2529 17707 7 9
## 66974 5144 57736 12 17
## THERM.JANUARY.2010 THERM.FEBRUARY.2010 THERM.MARCH.2010
## 66969 6914 5433 5054
## 66970 2166 1681 1858
## 66971 985 1152 1238
## 66972 2202 1874 1647
## 66973 95 11 47
## 66974 2372 1787 1449
## TERM.APRIL.2010 THERM.MAY.2010 THERM.JUNE.2010 THERM.JULY.2010
## 66969 2967 2241 1107 770
## 66970 1172 708 360 72
## 66971 630 475 192 141
## 66972 906 645 346 84
## 66973 9 45 18 22
## 66974 718 572 286 155
## THERM.AUGUST.2010 THERM.SEPTEMBER.2010 THERM.OCTOBER.2010
## 66969 674 788 954
## 66970 67 77 185
## 66971 162 144 210
## 66972 150 150 260
## 66973 9 17 11
## 66974 134 161 303
## THERM.NOVEMBER.2010 THERM.DECEMBER.2010 TOTAL_THERMS GAS.ACCOUNTS
## 66969 2423 4619 33944 25
## 66970 623 1800 10769 9
## 66971 653 1744 7726 8
## 66972 694 1335 10293 5
## 66973 18 13 315 5
## 66974 588 1469 9994 13
## KWH.TOTAL.SQFT THERMS.TOTAL.SQFT KWH.MEAN.2010
## 66969 48349 48349 12658.18
## 66970 7801 7801 6913.50
## 66971 11838 11838 13992.33
## 66972 11028 11028 16283.33
## 66973 4653 4653 4426.75
## 66974 17812 13776 9622.67
## KWH.STANDARD.DEVIATION.2010 KWH.MINIMUM.2010 KWH.1ST.QUARTILE.2010
## 66969 7948.06 2691 7635.0
## 66970 5695.82 2444 2872.5
## 66971 2989.54 10754 10754.0
## 66972 15000.83 7010 7010.0
## 66973 2297.29 1878 2635.0
## 66974 5625.23 1312 6288.0
## KWH.2ND.QUARTILE.2010 KWH.3RD.QUARTILE.2010 KWH.MAXIMUM.2010
## 66969 11370.0 19168.0 30287
## 66970 5139.0 10954.5 14932
## 66971 14576.0 16647.0 16647
## 66972 8250.0 33590.0 33590
## 66973 4325.0 6218.5 7179
## 66974 9586.5 15290.0 15673
## KWH.SQFT.MEAN.2010 KWH.SQFT.STANDARD.DEVIATION.2010
## 66969 4834.9 2180.96
## 66970 3900.5 1429.06
## 66971 5919.0 725.49
## 66972 3676.0 1022.80
## 66973 4653.0 NA
## 66974 3562.4 2911.56
## KWH.SQFT.MINIMUM.2010 KWH.SQFT.1ST.QUARTILE.2010
## 66969 2810 3166
## 66970 2890 2890
## 66971 5406 5406
## 66972 2800 2800
## 66973 4653 4653
## 66974 1866 2170
## KWH.SQFT.2ND.QUARTILE.2010 KWH.SQFT.3RD.QUARTILE.2010
## 66969 3771.0 7232
## 66970 3900.5 4911
## 66971 5919.0 6432
## 66972 3428.0 4800
## 66973 4653.0 4653
## 66974 2472.0 2556
## KWH.SQFT.MAXIMUM.2010 THERM.MEAN.2010 THERM.STANDARD.DEVIATION.2010
## 66969 8016 3085.82 1542.64
## 66970 4911 2692.25 3661.92
## 66971 6432 2575.33 3492.97
## 66972 4800 3431.00 1155.32
## 66973 4653 105.00 80.30
## 66974 8748 2498.50 2372.88
## THERM.MINIMUM.2010 THERM.1ST.QUARTILE.2010 THERM.2ND.QUARTILE.2010
## 66969 621 2300 2669.0
## 66970 272 464 1195.5
## 66971 42 42 1124.0
## 66972 2449 2449 3140.0
## 66973 49 49 69.0
## 66974 487 578 2029.0
## THERM.3RD.QUARTILE.2010 THERM.MAXIMUM.2010 THERMS.SQFT.MEAN.2010
## 66969 4408.0 6246 4834.9
## 66970 4920.5 8106 3900.5
## 66971 6560.0 6560 5919.0
## 66972 4704.0 4704 3676.0
## 66973 197.0 197 4653.0
## 66974 4419.0 5449 4592.0
## THERMS.SQFT.STANDARD.DEVIATION.2010 THERMS.SQFT.MINIMUM.2010
## 66969 2180.96 2810
## 66970 1429.06 2890
## 66971 725.49 5406
## 66972 1022.80 2800
## 66973 NA 4653
## 66974 3599.45 2472
## THERMS.SQFT.1ST.QUARTILE.2010 THERMS.SQFT.2ND.QUARTILE.2010
## 66969 3166 3771.0
## 66970 2890 3900.5
## 66971 5406 5919.0
## 66972 2800 3428.0
## 66973 4653 4653.0
## 66974 2472 2556.0
## THERMS.SQFT.3RD.QUARTILE.2010 THERMS.SQFT.MAXIMUM.2010
## 66969 7232 8016
## 66970 4911 4911
## 66971 6432 6432
## 66972 4800 4800
## 66973 4653 4653
## 66974 8748 8748
## TOTAL_POPULATION TOTAL.UNITS AVERAGE.STORIES AVERAGE.BUILDING.AGE
## 66969 116 55 2.00 51.90
## 66970 116 55 1.00 0.00
## 66971 31 24 3.00 104.50
## 66972 31 24 2.33 100.67
## 66973 0 0 1.00 0.00
## 66974 77 49 2.00 79.40
## AVERAGE.HOUSESIZE OCCUPIED.UNITS OCCUPIED.UNITS.PERCENTAGE
## 66969 3.14 37 0.6727
## 66970 3.14 37 0.6727
## 66971 2.07 15 0.6250
## 66972 2.07 15 0.6250
## 66973 0.00 0 NA
## 66974 2.57 30 0.6122
## RENTER.OCCUPIED.HOUSING.UNITS RENTER.OCCUPIED.HOUSING.PERCENTAGE
## 66969 26 0.7030
## 66970 26 0.7030
## 66971 13 0.8670
## 66972 13 0.8670
## 66973 0 NA
## 66974 28 0.9329
## OCCUPIED.HOUSING.UNITS
## 66969 37
## 66970 37
## 66971 15
## 66972 15
## 66973 0
## 66974 30
#Display the summary statistics and the structure of the data
summary(energy_raw)
## COMMUNITY.AREA.NAME CENSUS.BLOCK BUILDING_TYPE
## Length:66974 Min. :1.7e+14 Length:66974
## Class :character 1st Qu.:1.7e+14 Class :character
## Mode :character Median :1.7e+14 Mode :character
## Mean :1.7e+14
## 3rd Qu.:1.7e+14
## Max. :1.7e+14
##
## BUILDING_SUBTYPE KWH.JANUARY.2010 KWH.FEBRUARY.2010
## Length:66974 Min. : 0 Min. : 0
## Class :character 1st Qu.: 1369 1st Qu.: 1612
## Mode :character Median : 3476 Median : 3806
## Mean : 12810 Mean : 12582
## 3rd Qu.: 7138 3rd Qu.: 7396
## Max. :21214017 Max. :21065500
## NA's :871 NA's :871
## KWH.MARCH.2010 KWH.APRIL.2010 KWH.MAY.2010
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 1585 1st Qu.: 1578 1st Qu.: 1955
## Median : 3676 Median : 3636 Median : 4522
## Mean : 11707 Mean : 11463 Mean : 13853
## 3rd Qu.: 7042 3rd Qu.: 6989 3rd Qu.: 8922
## Max. :18503691 Max. :17310058 Max. :21344049
## NA's :871 NA's :871 NA's :871
## KWH.JUNE.2010 KWH.JULY.2010 KWH.AUGUST.2010
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 2695 1st Qu.: 3199 1st Qu.: 2834
## Median : 6283 Median : 7375 Median : 6404
## Mean : 17213 Mean : 18845 Mean : 16989
## 3rd Qu.: 12793 3rd Qu.: 14624 3rd Qu.: 12274
## Max. :20209197 Max. :21478035 Max. :18586958
## NA's :871 NA's :871 NA's :871
## KWH.SEPTEMBER.2010 KWH.OCTOBER.2010 KWH.NOVEMBER.2010
## Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 2024 1st Qu.: 1951 1st Qu.: 2639
## Median : 4566 Median : 4354 Median : 5851
## Mean : 13595 Mean : 12595 Mean : 15705
## 3rd Qu.: 8612 3rd Qu.: 8154 3rd Qu.: 11044
## Max. :19280342 Max. :18423025 Max. :20670698
## NA's :871 NA's :871 NA's :871
## KWH.DECEMBER.2010 TOTAL_KWH ELECTRICITY.ACCOUNTS
## Min. : 0 Min. : 102 Length:66974
## 1st Qu.: 3076 1st Qu.: 28188 Class :character
## Median : 6813 Median : 62272 Mode :character
## Mean : 18315 Mean : 175672
## 3rd Qu.: 12602 3rd Qu.: 118172
## Max. :25060008 Max. :231280522
## NA's :871 NA's :871
## ZERO.KWH.ACCOUNTS THERM.JANUARY.2010 THERM.FEBRUARY.2010 THERM.MARCH.2010
## Min. : 0.000 Min. : 1 Min. : 1 Min. : 1
## 1st Qu.: 1.000 1st Qu.: 1022 1st Qu.: 897 1st Qu.: 736
## Median : 2.000 Median : 2141 Median : 1901 Median : 1558
## Mean : 4.771 Mean : 3306 Mean : 2893 Mean : 2406
## 3rd Qu.: 5.000 3rd Qu.: 3866 3rd Qu.: 3418 3rd Qu.: 2808
## Max. :601.000 Max. :566238 Max. :511323 Max. :557509
## NA's :2230 NA's :4232 NA's :1482
## TERM.APRIL.2010 THERM.MAY.2010 THERM.JUNE.2010 THERM.JULY.2010
## Min. : 1 Min. : 1.0 Min. : 1.0 Min. : 1.0
## 1st Qu.: 354 1st Qu.: 209.0 1st Qu.: 113.0 1st Qu.: 87.0
## Median : 779 Median : 469.0 Median : 256.0 Median : 197.0
## Mean : 1261 Mean : 807.2 Mean : 498.3 Mean : 418.4
## 3rd Qu.: 1440 3rd Qu.: 875.0 3rd Qu.: 486.0 3rd Qu.: 369.0
## Max. :624882 Max. :651226.0 Max. :631383.0 Max. :680201.0
## NA's :1575 NA's :1857 NA's :1767 NA's :1820
## THERM.AUGUST.2010 THERM.SEPTEMBER.2010 THERM.OCTOBER.2010
## Min. : 1.0 Min. : 1.0 Min. : 1.0
## 1st Qu.: 79.0 1st Qu.: 82.0 1st Qu.: 122.0
## Median : 180.0 Median : 187.0 Median : 276.0
## Mean : 399.7 Mean : 401.2 Mean : 568.2
## 3rd Qu.: 340.0 3rd Qu.: 347.0 3rd Qu.: 509.2
## Max. :693230.0 Max. :634051.0 Max. :593026.0
## NA's :1908 NA's :2282 NA's :1722
## THERM.NOVEMBER.2010 THERM.DECEMBER.2010 TOTAL_THERMS
## Min. : 1 Min. : 1 Min. : 25
## 1st Qu.: 282 1st Qu.: 774 1st Qu.: 4879
## Median : 629 Median : 1631 Median : 10340
## Mean : 1150 Mean : 2645 Mean : 16524
## 3rd Qu.: 1167 3rd Qu.: 2965 3rd Qu.: 18570
## Max. :539356 Max. :566326 Max. :7035940
## NA's :1559 NA's :1544 NA's :1296
## GAS.ACCOUNTS KWH.TOTAL.SQFT THERMS.TOTAL.SQFT
## Length:66974 Min. : 300 Min. : 300
## Class :character 1st Qu.: 5385 1st Qu.: 5368
## Mode :character Median : 10858 Median : 10844
## Mean : 21093 Mean : 20347
## 3rd Qu.: 18721 3rd Qu.: 18844
## Max. :6548217 Max. :6548217
## NA's :1150 NA's :1673
## KWH.MEAN.2010 KWH.STANDARD.DEVIATION.2010 KWH.MINIMUM.2010
## Min. : 102 Min. : 0 Min. : 100
## 1st Qu.: 8229 1st Qu.: 3630 1st Qu.: 2164
## Median : 10515 Median : 5148 Median : 4377
## Mean : 62493 Mean : 40323 Mean : 36852
## 3rd Qu.: 15645 3rd Qu.: 8065 3rd Qu.: 8774
## Max. :227750000 Max. :162851049 Max. :227752064
## NA's :871 NA's :9956 NA's :871
## KWH.1ST.QUARTILE.2010 KWH.2ND.QUARTILE.2010 KWH.3RD.QUARTILE.2010
## Min. : 100 Min. : 102 Min. : 102
## 1st Qu.: 4766 1st Qu.: 7636 1st Qu.: 10477
## Median : 6746 Median : 9944 Median : 13623
## Mean : 39158 Mean : 55773 Mean : 85608
## 3rd Qu.: 10374 3rd Qu.: 14603 3rd Qu.: 20018
## Max. :227752064 Max. :227752064 Max. :230793342
## NA's :871 NA's :871 NA's :871
## KWH.MAXIMUM.2010 KWH.SQFT.MEAN.2010 KWH.SQFT.STANDARD.DEVIATION.2010
## Min. : 102 Min. : 300 Min. : 0
## 1st Qu.: 13281 1st Qu.: 1326 1st Qu.: 240
## Median : 18033 Median : 2214 Median : 471
## Mean : 103512 Mean : 7665 Mean : 3446
## 3rd Qu.: 26276 3rd Qu.: 3790 3rd Qu.: 1048
## Max. :230793342 Max. :6548217 Max. :3840818
## NA's :871 NA's :1150 NA's :15385
## KWH.SQFT.MINIMUM.2010 KWH.SQFT.1ST.QUARTILE.2010
## Min. : 100 Min. : 100
## 1st Qu.: 954 1st Qu.: 1078
## Median : 1534 Median : 1760
## Mean : 5604 Mean : 5792
## 3rd Qu.: 2684 3rd Qu.: 2854
## Max. :6548217 Max. :6548217
## NA's :1150 NA's :1150
## KWH.SQFT.2ND.QUARTILE.2010 KWH.SQFT.3RD.QUARTILE.2010
## Min. : 300 Min. : 300
## 1st Qu.: 1250 1st Qu.: 1490
## Median : 2132 Median : 2470
## Mean : 7268 Mean : 9534
## 3rd Qu.: 3612 3rd Qu.: 4491
## Max. :6548217 Max. :6548217
## NA's :1150 NA's :1150
## KWH.SQFT.MAXIMUM.2010 THERM.MEAN.2010 THERM.STANDARD.DEVIATION.2010
## Min. : 300 Min. : 25 Min. : 0
## 1st Qu.: 1890 1st Qu.: 1365 1st Qu.: 351
## Median : 2810 Median : 1842 Median : 577
## Mean : 10581 Mean : 4062 Mean : 2649
## 3rd Qu.: 5254 3rd Qu.: 2707 3rd Qu.: 1183
## Max. :6548217 Max. :6600274 Max. :4941759
## NA's :1150 NA's :1296 NA's :10230
## THERM.MINIMUM.2010 THERM.1ST.QUARTILE.2010 THERM.2ND.QUARTILE.2010
## Min. : 25 Min. : 25 Min. : 25
## 1st Qu.: 592 1st Qu.: 957 1st Qu.: 1286
## Median : 990 Median : 1290 Median : 1724
## Mean : 2267 Mean : 2545 Mean : 3634
## 3rd Qu.: 1643 3rd Qu.: 1878 3rd Qu.: 2474
## Max. :6600274 Max. :6600274 Max. :6600274
## NA's :1296 NA's :1296 NA's :1296
## THERM.3RD.QUARTILE.2010 THERM.MAXIMUM.2010 THERMS.SQFT.MEAN.2010
## Min. : 25 Min. : 25 Min. : 300
## 1st Qu.: 1595 1st Qu.: 1934 1st Qu.: 1318
## Median : 2182 Median : 2603 Median : 2200
## Mean : 5490 Mean : 6955 Mean : 7175
## 3rd Qu.: 3241 3rd Qu.: 4069 3rd Qu.: 3736
## Max. :7012321 Max. :7012321 Max. :6548217
## NA's :1296 NA's :1296 NA's :1673
## THERMS.SQFT.STANDARD.DEVIATION.2010 THERMS.SQFT.MINIMUM.2010
## Min. : 0 Min. : 100
## 1st Qu.: 239 1st Qu.: 950
## Median : 467 Median : 1520
## Mean : 3140 Mean : 5282
## 3rd Qu.: 1034 3rd Qu.: 2651
## Max. :3840818 Max. :6548217
## NA's :15684 NA's :1673
## THERMS.SQFT.1ST.QUARTILE.2010 THERMS.SQFT.2ND.QUARTILE.2010
## Min. : 132 Min. : 300
## 1st Qu.: 1075 1st Qu.: 1244
## Median : 1756 Median : 2116
## Mean : 5462 Mean : 6799
## 3rd Qu.: 2820 3rd Qu.: 3564
## Max. :6548217 Max. :6548217
## NA's :1673 NA's :1673
## THERMS.SQFT.3RD.QUARTILE.2010 THERMS.SQFT.MAXIMUM.2010 TOTAL_POPULATION
## Min. : 300 Min. : 300 Min. : 0.00
## 1st Qu.: 1479 1st Qu.: 1888 1st Qu.: 37.00
## Median : 2450 Median : 2796 Median : 64.00
## Mean : 8897 Mean : 9851 Mean : 83.85
## 3rd Qu.: 4410 3rd Qu.: 5191 3rd Qu.: 104.00
## Max. :6548217 Max. :6548217 Max. :1590.00
## NA's :1673 NA's :1673 NA's :14
## TOTAL.UNITS AVERAGE.STORIES AVERAGE.BUILDING.AGE
## Min. : 0.00 Min. : 1.000 Min. : 0.00
## 1st Qu.: 15.00 1st Qu.: 1.140 1st Qu.: 53.00
## Median : 25.00 Median : 1.750 Median : 80.00
## Mean : 38.11 Mean : 1.887 Mean : 71.61
## 3rd Qu.: 42.00 3rd Qu.: 2.000 3rd Qu.: 96.50
## Max. :1365.00 Max. :110.000 Max. :158.00
## NA's :14
## AVERAGE.HOUSESIZE OCCUPIED.UNITS OCCUPIED.UNITS.PERCENTAGE
## Min. : 0.000 Min. : 0.0 Min. :0.0000
## 1st Qu.: 2.140 1st Qu.: 13.0 1st Qu.:0.8332
## Median : 2.700 Median : 22.0 Median :0.9148
## Mean : 2.722 Mean : 33.5 Mean :0.8804
## 3rd Qu.: 3.310 3rd Qu.: 37.0 3rd Qu.:0.9677
## Max. :12.000 Max. :1034.0 Max. :1.0000
## NA's :14 NA's :14 NA's :2445
## RENTER.OCCUPIED.HOUSING.UNITS RENTER.OCCUPIED.HOUSING.PERCENTAGE
## Min. : 0.00 Min. :0.0000
## 1st Qu.: 3.00 1st Qu.:0.2860
## Median : 11.00 Median :0.5379
## Mean : 19.78 Mean :0.5116
## 3rd Qu.: 23.00 3rd Qu.:0.7330
## Max. :1009.00 Max. :1.0000
## NA's :14 NA's :2618
## OCCUPIED.HOUSING.UNITS
## Min. : 0.0
## 1st Qu.: 13.0
## Median : 22.0
## Mean : 33.5
## 3rd Qu.: 37.0
## Max. :1034.0
## NA's :14
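The summary above reports missing values column by column. A compact way to tabulate them is `colSums(is.na(...))`, sketched here on a small hypothetical data frame (`toy` and its values are illustrative, not taken from the dataset); on the real data the call would be `colSums(is.na(energy_raw))`.

```r
# Hypothetical miniature of the energy data, used only for illustration
toy <- data.frame(
  TOTAL_KWH        = c(201032, NA, 61020, 16752),
  TOTAL_POPULATION = c(132, 132, NA, 228),
  BUILDING_TYPE    = c("Residential", "Residential", "Commercial", NA),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Columns ordered from most to least missing
print(sort(na_counts, decreasing = TRUE))
```

Note that `read.csv()` only converts the strings listed in `na.strings` (by default `"NA"`) to missing, so empty strings in character columns are not counted here.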
str(energy_raw)
## 'data.frame': 66974 obs. of 73 variables:
## $ COMMUNITY.AREA.NAME : chr "Albany Park" "Albany Park" "Albany Park" "Albany Park" ...
## $ CENSUS.BLOCK : num 1.7e+14 1.7e+14 1.7e+14 1.7e+14 1.7e+14 ...
## $ BUILDING_TYPE : chr "Residential" "Residential" "Residential" "Residential" ...
## $ BUILDING_SUBTYPE : chr "Multi 7+" "Multi < 7" "Single Family" "Multi 7+" ...
## $ KWH.JANUARY.2010 : int 11921 1233 4141 1230 12977 2878 1478 4985 4926 16639 ...
## $ KWH.FEBRUARY.2010 : int 12145 1645 3798 1333 14639 3755 1890 2636 6413 23502 ...
## $ KWH.MARCH.2010 : int 9759 994 2939 1260 12718 4571 1364 2353 5586 19587 ...
## $ KWH.APRIL.2010 : int 11542 1055 4727 1405 14973 2984 1271 4761 5606 23327 ...
## $ KWH.MAY.2010 : int 14348 1284 5324 1699 16384 3111 1464 4391 6271 26537 ...
## $ KWH.JUNE.2010 : int 26617 3527 9676 2094 32940 4808 2118 7362 11549 40725 ...
## $ KWH.JULY.2010 : int 24210 3099 7591 732 24454 4132 2384 6462 8549 41430 ...
## $ KWH.AUGUST.2010 : int 20383 2527 6287 1312 23926 3564 3767 8015 6709 41268 ...
## $ KWH.SEPTEMBER.2010 : int 11983 904 2920 1462 15012 2174 2059 7314 3963 26208 ...
## $ KWH.OCTOBER.2010 : int 10335 626 2565 1358 13679 1985 1387 3816 3480 23230 ...
## $ KWH.NOVEMBER.2010 : int 25327 2092 5979 1372 31979 5968 2874 7496 7998 43196 ...
## $ KWH.DECEMBER.2010 : int 22462 1622 5073 1495 30660 5400 3244 6391 8613 43582 ...
## $ TOTAL_KWH : int 201032 20608 61020 16752 244341 45330 25300 65982 79663 369231 ...
## $ ELECTRICITY.ACCOUNTS : chr "48" "Less than 4" "6" "Less than 4" ...
## $ ZERO.KWH.ACCOUNTS : int 22 1 2 2 32 0 2 3 2 106 ...
## $ THERM.JANUARY.2010 : int 7247 321 1222 2961 11508 1793 1554 3107 3371 22813 ...
## $ THERM.FEBRUARY.2010 : int 5904 130 1016 2664 9057 1573 1195 2749 2647 18905 ...
## $ THERM.MARCH.2010 : int 5180 86 860 1616 8000 1352 1280 2228 2396 16890 ...
## $ TERM.APRIL.2010 : int 3113 49 543 798 4529 890 821 1331 1407 10504 ...
## $ THERM.MAY.2010 : int 1822 19 346 344 2809 853 663 738 833 6981 ...
## $ THERM.JUNE.2010 : int 1272 13 247 404 1507 541 607 443 460 4455 ...
## $ THERM.JULY.2010 : int 1234 7 203 320 1179 448 487 329 286 3456 ...
## $ THERM.AUGUST.2010 : int 952 10 179 272 991 438 476 284 260 3232 ...
## $ THERM.SEPTEMBER.2010 : int 1780 12 170 368 994 439 382 288 246 3306 ...
## $ THERM.OCTOBER.2010 : int 1472 9 190 745 1254 565 459 301 323 3477 ...
## $ THERM.NOVEMBER.2010 : int 1961 21 298 1260 2595 787 590 520 632 5898 ...
## $ THERM.DECEMBER.2010 : int 4885 78 791 2901 7167 1538 971 1821 1919 14630 ...
## $ TOTAL_THERMS : int 36822 755 6065 14653 51590 11217 9485 14139 14780 114547 ...
## $ GAS.ACCOUNTS : chr "21" "Less than 4" "6" "6" ...
## $ KWH.TOTAL.SQFT : int 48825 3306 9472 14407 58835 8240 13305 16654 9690 127916 ...
## $ THERMS.TOTAL.SQFT : int 48825 3306 9472 14407 58835 8240 13305 16654 10840 127916 ...
## $ KWH.MEAN.2010 : num 20103 20608 10170 16752 15271 ...
## $ KWH.STANDARD.DEVIATION.2010 : num 8610 NA 4410 NA 8090 ...
## $ KWH.MINIMUM.2010 : int 9414 20608 5619 16752 5462 15929 7285 8496 5388 4397 ...
## $ KWH.1ST.QUARTILE.2010 : num 12563 20608 6746 16752 10344 ...
## $ KWH.2ND.QUARTILE.2010 : num 19073 20608 9056 16752 12427 ...
## $ KWH.3RD.QUARTILE.2010 : num 22177 20608 13014 16752 17496 ...
## $ KWH.MAXIMUM.2010 : int 36781 20608 17530 16752 34236 29401 18015 16794 19735 39809 ...
## $ KWH.SQFT.MEAN.2010 : num 24413 3306 1579 14407 3677 ...
## $ KWH.SQFT.STANDARD.DEVIATION.2010 : num 5699 NA 864 NA 1062 ...
## $ KWH.SQFT.MINIMUM.2010 : int 20383 3306 1226 14407 2414 8240 13305 2448 1116 24751 ...
## $ KWH.SQFT.1ST.QUARTILE.2010 : num 20383 3306 1226 14407 2546 ...
## $ KWH.SQFT.2ND.QUARTILE.2010 : num 24413 3306 1226 14407 3554 ...
## $ KWH.SQFT.3RD.QUARTILE.2010 : num 28442 3306 1226 14407 4692 ...
## $ KWH.SQFT.MAXIMUM.2010 : int 28442 3306 3342 14407 5530 8240 13305 4554 1334 27975 ...
## $ THERM.MEAN.2010 : num 5260 755 1011 14653 3224 ...
## $ THERM.STANDARD.DEVIATION.2010 : num 8436 NA 621 NA 1079 ...
## $ THERM.MINIMUM.2010 : int 882 755 496 14653 2071 1634 1866 2689 835 114 ...
## $ THERM.1ST.QUARTILE.2010 : num 957 755 514 14653 2499 ...
## $ THERM.2ND.QUARTILE.2010 : num 1102 755 836 14653 2934 ...
## $ THERM.3RD.QUARTILE.2010 : num 8024 755 1240 14653 3594 ...
## $ THERM.MAXIMUM.2010 : int 23460 755 2144 14653 5754 9583 7619 2956 2372 28459 ...
## $ THERMS.SQFT.MEAN.2010 : num 24413 3306 1579 14407 3677 ...
## $ THERMS.SQFT.STANDARD.DEVIATION.2010: num 5699 NA 864 NA 1062 ...
## $ THERMS.SQFT.MINIMUM.2010 : int 20383 3306 1226 14407 2414 8240 13305 2448 1116 24751 ...
## $ THERMS.SQFT.1ST.QUARTILE.2010 : num 20383 3306 1226 14407 2546 ...
## $ THERMS.SQFT.2ND.QUARTILE.2010 : num 24413 3306 1226 14407 3554 ...
## $ THERMS.SQFT.3RD.QUARTILE.2010 : num 28442 3306 1226 14407 4692 ...
## $ THERMS.SQFT.MAXIMUM.2010 : int 28442 3306 3342 14407 5530 8240 13305 4554 1334 27975 ...
## $ TOTAL_POPULATION : int 132 132 132 228 228 231 231 231 231 456 ...
## $ TOTAL.UNITS : int 64 64 64 79 79 70 70 70 70 180 ...
## $ AVERAGE.STORIES : num 3 2 1.17 3 2.5 1 3 2.2 1 3 ...
## $ AVERAGE.BUILDING.AGE : num 65.5 86 14.3 86 87.7 ...
## $ AVERAGE.HOUSESIZE : num 2.2 2.2 2.2 3.51 3.51 3.73 3.73 3.73 3.73 2.73 ...
## $ OCCUPIED.UNITS : int 60 60 60 65 65 62 62 62 62 167 ...
## $ OCCUPIED.UNITS.PERCENTAGE : num 0.938 0.938 0.938 0.823 0.823 ...
## $ RENTER.OCCUPIED.HOUSING.UNITS : int 33 33 33 49 49 49 49 49 49 167 ...
## $ RENTER.OCCUPIED.HOUSING.PERCENTAGE : num 0.55 0.55 0.55 0.754 0.754 0.79 0.79 0.79 0.79 1 ...
## $ OCCUPIED.HOUSING.UNITS : int 60 60 60 65 65 62 62 62 62 167 ...
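`str()` shows that ELECTRICITY.ACCOUNTS and GAS.ACCOUNTS were read as character because some entries hold the censored label "Less than 4". If these columns were needed later (the analysis below does not use them), one option, sketched here on a short hypothetical vector, is to convert them to integers while keeping a separate flag for the censored entries. The same missing-value conversion could be done at load time by passing `na.strings = c("NA", "Less than 4")` to `read.csv()`.

```r
# Illustrative values mirroring the ELECTRICITY.ACCOUNTS column shown by str()
accounts <- c("48", "Less than 4", "6", "Less than 4")

# Treat the censored label as missing; as.integer() warns on non-numeric strings,
# so the expected coercion warning is suppressed
accounts_int <- suppressWarnings(as.integer(accounts))

# Keep track of which entries were censored rather than truly missing
censored <- accounts == "Less than 4"

print(accounts_int)
print(censored)
```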
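#Create a subset of "energy_raw" that keeps only the variables of interest (the character variable BUILDING_TYPE and the numeric variables TOTAL_KWH and TOTAL_POPULATION)
energy_data0 <- subset(energy_raw, select = c(BUILDING_TYPE, TOTAL_KWH, TOTAL_POPULATION))
#Remove rows with a missing value in any of these variables
energy_data1 <- na.omit(energy_data0)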
#Display the "head" and "tail" of the dataset, "energy_data1"
head(energy_data1)
## BUILDING_TYPE TOTAL_KWH TOTAL_POPULATION
## 1 Residential 201032 132
## 2 Residential 20608 132
## 3 Residential 61020 132
## 4 Residential 16752 228
## 5 Residential 244341 228
## 6 Commercial 45330 231
tail(energy_data1)
## BUILDING_TYPE TOTAL_KWH TOTAL_POPULATION
## 66969 Residential 139240 116
## 66970 Residential 27654 116
## 66971 Commercial 41977 31
## 66972 Residential 48850 31
## 66973 Residential 17707 0
## 66974 Residential 57736 77
#Display the summary statistics and the structure of the data
summary(energy_data1)
## BUILDING_TYPE TOTAL_KWH TOTAL_POPULATION
## Length:66089 Min. : 102 Min. : 0.00
## Class :character 1st Qu.: 28189 1st Qu.: 37.00
## Mode :character Median : 62271 Median : 64.00
## Mean : 175675 Mean : 83.81
## 3rd Qu.: 118156 3rd Qu.: 104.00
## Max. :231280522 Max. :1590.00
str(energy_data1)
## 'data.frame': 66089 obs. of 3 variables:
## $ BUILDING_TYPE : chr "Residential" "Residential" "Residential" "Residential" ...
## $ TOTAL_KWH : int 201032 20608 61020 16752 244341 45330 25300 65982 79663 369231 ...
## $ TOTAL_POPULATION: int 132 132 132 228 228 231 231 231 231 456 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:885] 67 85 104 128 328 415 494 522 804 853 ...
## .. ..- attr(*, "names")= chr [1:885] "67" "85" "104" "128" ...
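The 885 rows dropped by `na.omit()` (66,974 - 66,089) are recorded in the "na.action" attribute shown by `str()`, a count consistent with the 871 missing TOTAL_KWH values plus the 14 missing TOTAL_POPULATION values reported by `summary(energy_raw)`. A small hypothetical example showing that `complete.cases()` selects the same rows, which makes the number of dropped rows easy to report:

```r
# Hypothetical rows; two of them have a missing value
toy <- data.frame(
  BUILDING_TYPE    = c("Residential", "Commercial", "Residential", "Residential"),
  TOTAL_KWH        = c(201032, NA, 61020, 17707),
  TOTAL_POPULATION = c(132, 231, NA, 0),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE where a row has no missing values
keep <- complete.cases(toy)
toy_complete <- toy[keep, ]

# na.omit() keeps the same rows and stores the indices of the dropped ones
toy_omit <- na.omit(toy)
dropped  <- attr(toy_omit, "na.action")

cat("rows dropped:", sum(!keep), "\n")
```

On the real data, `nrow(energy_data0) - nrow(energy_data1)` gives the same count directly.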
#Recode 'BUILDING_TYPE' as a binary categorical variable (1 represents residential buildings and 0 represents non-residential buildings, which could correspond to commercial or industrial buildings)
energy_data1$BUILDING_TYPE = as.character(energy_data1$BUILDING_TYPE)
energy_data1$BUILDING_TYPE[energy_data1$BUILDING_TYPE != "Residential"] = 0
energy_data1$BUILDING_TYPE[energy_data1$BUILDING_TYPE == "Residential"] = 1
#Convert 'BUILDING_TYPE' to a factor and display its resulting levels
energy_data1$BUILDING_TYPE = as.factor(energy_data1$BUILDING_TYPE)
levels(energy_data1$BUILDING_TYPE)
## [1] "0" "1"
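An equivalent one-step version of the recoding above, sketched on a short hypothetical vector. Setting the levels explicitly matters because `glm()` with `family = binomial` treats the first factor level ("0", non-residential) as failure and models the probability of the second level ("1", residential).

```r
# Hypothetical building types, including a non-residential category
building_type <- c("Residential", "Commercial", "Industrial", "Residential")

# 1 = residential, 0 = non-residential (commercial or industrial)
residential <- factor(ifelse(building_type == "Residential", 1, 0), levels = c(0, 1))

print(levels(residential))
print(table(building_type, residential))
```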
After this initial summary-statistics analysis, a hierarchical approach is used to begin developing a multiple logistic regression model. A U.S. Department of Energy document entitled “Energy Efficiency Trends in Residential and Commercial Buildings” [reference: http://apps1.eere.energy.gov/buildings/publications/pdfs/corporate/bt_stateindustry.pdf] indicates that a relationship exists between energy consumption, building type (residential, commercial, etc.), and building population. Building on this, we use the “Energy Usage 2010” dataset to determine whether building type can be predicted from energy consumption (in kilowatt-hours) and/or building population. To answer this question, building type is treated as a dichotomous dependent variable, and both building population and energy consumption (in kilowatt-hours) are treated as continuous independent variables.
With this hierarchical approach, we test whether the variation observed in the dependent variable (‘BUILDING_TYPE’) can be explained by the variation in either of the independent variables (‘TOTAL_KWH’ and ‘TOTAL_POPULATION’). The null hypothesis states that total energy consumption (in kilowatt-hours) and building population do not have a significant effect on building type (i.e., residential or non-residential). Conversely, the alternative hypothesis states that total energy consumption (in kilowatt-hours) and building population do have a significant effect on building type. Our aim is to build an explanatory model that uses these independent variables to account for the dichotomous dependent variable.
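For a binary outcome, these hypotheses can be stated on the log-odds scale. Writing $p$ for the probability that a building is residential (BUILDING_TYPE = 1), the model fitted with `glm(..., family = "binomial")` can be sketched as follows, with the joint null hypothesis tested by the likelihood-ratio statistic $G$ (null deviance minus residual deviance), which is approximately $\chi^2$ with 2 degrees of freedom under $H_0$:

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\mathrm{TOTAL\_KWH} + \beta_2\,\mathrm{TOTAL\_POPULATION},
\qquad H_0:\ \beta_1 = \beta_2 = 0,
\qquad H_1:\ \beta_1 \neq 0 \ \text{or}\ \beta_2 \neq 0,
$$

$$
G = D_{\text{null}} - D_{\text{residual}} \ \overset{H_0}{\sim}\ \chi^2_{2}.
$$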
In this experiment, a hierarchical multiple logistic regression model is generated. It offers some insight into whether building type can be explained by each of the independent variables and whether suppression is likely to be present in a regression model built from these data. The independent variables are total energy consumption (in kilowatt-hours) and building population, and the dependent variable is building type, coded as either residential or non-residential.
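The blockwise logic of a hierarchical model (enter one predictor, then add the next and test the improvement) can be sketched on simulated data. Everything in this block, including the names `kwh`, `pop`, and `sim` and the coefficient values, is hypothetical. Because the outcome is binary, the sketch compares logistic models from `glm()` with a likelihood-ratio (chi-square) test rather than an F test on the change in R². A sign flip, or a marked increase in the magnitude of the `kwh` coefficient once `pop` enters, would be one informal hint of suppression.

```r
set.seed(1)
n   <- 500
kwh <- rlnorm(n, meanlog = 11, sdlog = 1)    # hypothetical annual kWh
pop <- rpois(n, lambda = 80)                 # hypothetical population
eta <- 1.5 - 2e-6 * kwh - 1e-3 * pop         # hypothetical log-odds
y   <- rbinom(n, size = 1, prob = plogis(eta))
sim <- data.frame(y, kwh, pop)

# Block 1: energy use only; Block 2: add population
m1 <- glm(y ~ kwh,       family = binomial, data = sim)
m2 <- glm(y ~ kwh + pop, family = binomial, data = sim)

# Likelihood-ratio test for the block added in step 2
lr <- anova(m1, m2, test = "Chisq")
print(lr)

# Compare the kwh coefficient before and after pop is added
print(c(block1 = coef(m1)[["kwh"]], block2 = coef(m2)[["kwh"]]))
```

With the real data, the same comparison would use `data = energy_final` and the formulas `BUILDING_TYPE ~ TOTAL_KWH` and `BUILDING_TYPE ~ TOTAL_KWH + TOTAL_POPULATION`.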
The original “Energy Usage 2010” dataset contains 66,974 observations (66,089 after rows with missing values are removed). With this many observations, even trivially small effects become statistically significant, so a power analysis is performed to determine the most appropriate sample size for our final model (with an alpha-level of 0.05, a desired power of 0.95, an effect size of 0.02, and 2 predictors).
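A sketch of this kind of sample-size calculation in base R, assuming the effect size is Cohen's f² and the test is G*Power's "Linear multiple regression: Fixed model, R² deviation from zero" F test, whose noncentrality parameter is λ = f²·N. The helper names `f2_power()` and `required_n()` are hypothetical. Because the chosen test family drives the result, the N returned here may differ from the value G*Power reported for this analysis. The `pwr` package's `pwr.f2.test(u = 2, f2 = 0.02, sig.level = 0.05, power = 0.95)` should perform the equivalent calculation, returning the error degrees of freedom v, with N = u + v + 1.

```r
# Power of the overall F test (R^2 = 0) with u predictors and total sample size n,
# using the noncentrality convention lambda = f2 * n
f2_power <- function(n, f2, u, alpha = 0.05) {
  v      <- n - u - 1                                  # error degrees of freedom
  f_crit <- qf(1 - alpha, df1 = u, df2 = v)            # critical F under H0
  pf(f_crit, df1 = u, df2 = v, ncp = f2 * n, lower.tail = FALSE)
}

# Smallest n whose power reaches the target
required_n <- function(f2, u, alpha = 0.05, target = 0.95) {
  n <- u + 2                                           # smallest n with v >= 1
  while (f2_power(n, f2, u, alpha) < target) n <- n + 1
  n
}

n_req <- required_n(f2 = 0.02, u = 2, alpha = 0.05, target = 0.95)
cat("required N:", n_req, " power:", round(f2_power(n_req, 0.02, 2), 4), "\n")
```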
#Generate an initial multiple logistic regression model that uses all 66,089 complete observations
energy_model <- glm(energy_data1$BUILDING_TYPE~energy_data1$TOTAL_KWH+energy_data1$TOTAL_POPULATION, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#Display the summary of the initial multiple logistic regression model
summary(energy_model)
##
## Call:
## glm(formula = energy_data1$BUILDING_TYPE ~ energy_data1$TOTAL_KWH +
## energy_data1$TOTAL_POPULATION, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8184 -0.0002 0.7041 0.7457 5.0715
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.441e+00 1.389e-02 103.76 <2e-16 ***
## energy_data1$TOTAL_KWH -2.062e-06 6.351e-08 -32.47 <2e-16 ***
## energy_data1$TOTAL_POPULATION -1.265e-03 1.073e-04 -11.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 74606 on 66088 degrees of freedom
## Residual deviance: 71640 on 66086 degrees of freedom
## AIC: 71646
##
## Number of Fisher Scoring iterations: 8
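Logistic coefficients are on the log-odds scale, so per-unit values such as -2.062e-06 per kWh are hard to read directly. This sketch converts the coefficients, copied by hand from the summary above, into a baseline probability and odds ratios; the 100,000 kWh and 100-person increments are arbitrary choices. The same arithmetic suggests why `glm.fit` warned about fitted probabilities of 0 or 1: at the largest TOTAL_KWH in the data (about 231 million kWh), the kWh term alone contributes roughly -477 to the linear predictor.

```r
# Coefficients copied from summary(energy_model)
b0    <- 1.441
b_kwh <- -2.062e-06
b_pop <- -1.265e-03

# Probability of "Residential" when both predictors equal 0
p0 <- plogis(b0)

# Odds ratios for more interpretable increments
or_kwh_100k <- exp(b_kwh * 1e5)   # per additional 100,000 kWh
or_pop_100  <- exp(b_pop * 100)   # per additional 100 residents

# Contribution of the kWh term at the maximum TOTAL_KWH in summary(energy_raw)
eta_max_kwh <- b_kwh * 231280522

print(round(c(p0 = p0, or_kwh_100k = or_kwh_100k, or_pop_100 = or_pop_100), 3))
print(eta_max_kwh)
```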
Using this effect size, the software G*Power is used to determine the most appropriate sample size for this regression analysis, and it returned a sample size of 543. The dataset “energy_data1” is therefore randomly sampled to this size, creating a new dataset for the final model, which is then used to determine whether building type can be explained by the variation present in energy consumption and building population.
#Randomly take a sample of 543 observations from "energy_data1", creating "energy_final".
S <- 543
set.seed(23)
energy.index <- sample(1:nrow(energy_data1),S,replace=FALSE)
energy_final <- energy_data1[energy.index,]
#Generate a new multiple logistic regression model that uses the 543 sampled observations
energy_model_final <- glm(energy_final$BUILDING_TYPE~energy_final$TOTAL_KWH+energy_final$TOTAL_POPULATION, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#Display the summary of the final hierarchical multiple logistic regression model
summary(energy_model_final)
##
## Call:
## glm(formula = energy_final$BUILDING_TYPE ~ energy_final$TOTAL_KWH +
## energy_final$TOTAL_POPULATION, family = "binomial")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0014 0.5584 0.6022 0.6548 1.1704
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.876e+00 1.799e-01 10.429 <2e-16 ***
## energy_final$TOTAL_KWH -2.727e-06 9.221e-07 -2.957 0.0031 **
## energy_final$TOTAL_POPULATION -2.134e-03 1.348e-03 -1.583 0.1133
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 544.54 on 542 degrees of freedom
## Residual deviance: 510.97 on 540 degrees of freedom
## AIC: 516.97
##
## Number of Fisher Scoring iterations: 7
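Logistic-regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios for the outcome level that glm() treats as "success" (the non-first level when BUILDING_TYPE is a factor). Because one kilowatt-hour or one person is a tiny change, the sketch rescales to 100,000 kWh and 100 people (illustrative unit choices) and attaches 95% Wald intervals, estimate ± 1.96 × SE, using the estimates and standard errors printed above. In a live session, `exp(cbind(OR = coef(energy_model_final), confint.default(energy_model_final)))` returns the per-unit version directly.

```r
# Estimates and standard errors copied from summary(energy_model_final)
coefs <- data.frame(
  term     = c("TOTAL_KWH", "TOTAL_POPULATION"),
  estimate = c(-2.727e-06, -2.134e-03),
  se       = c( 9.221e-07,  1.348e-03),
  per      = c(1e5, 100)        # illustrative rescaling: per 100,000 kWh, per 100 people
)

z <- qnorm(0.975)               # about 1.96 for a 95% Wald interval
coefs$odds_ratio <- exp(coefs$estimate * coefs$per)
coefs$lower_95   <- exp((coefs$estimate - z * coefs$se) * coefs$per)
coefs$upper_95   <- exp((coefs$estimate + z * coefs$se) * coefs$per)
print(coefs[, c("term", "per", "odds_ratio", "lower_95", "upper_95")], digits = 3)
```

The interval for energy consumption lies entirely below 1, while the interval for building population spans 1, which agrees with the printed p-values of 0.0031 and 0.1133.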
#Collinearity check: regress one predictor on the other (a high R-squared would signal collinearity)
col.test <- lm(energy_final$TOTAL_KWH~energy_final$TOTAL_POPULATION)
summary(col.test)
##
## Call:
## lm(formula = energy_final$TOTAL_KWH ~ energy_final$TOTAL_POPULATION)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2846881 -134912 3254 109245 9959580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -179506.9 35729.6 -5.024 6.88e-07 ***
## energy_final$TOTAL_POPULATION 4008.3 291.1 13.769 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 616800 on 541 degrees of freedom
## Multiple R-squared: 0.2595, Adjusted R-squared: 0.2581
## F-statistic: 189.6 on 1 and 541 DF, p-value: < 2.2e-16
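With only two predictors, regressing one on the other is exactly the auxiliary regression behind the variance inflation factor, VIF = 1 / (1 - R²), and both predictors share the same VIF. The sketch turns the printed R² of 0.2595 into a tolerance, a VIF, and the implied Pearson correlation; the helper name `vif_from_r2` is hypothetical.

```r
# Collinearity diagnostics from the auxiliary regression's R-squared
vif_from_r2 <- function(r2) {
  c(tolerance = 1 - r2,         # share of predictor variance not explained by the other
    vif       = 1 / (1 - r2))   # variance inflation factor
}

aux_r2 <- 0.2595                # Multiple R-squared printed above
diagnostics <- vif_from_r2(aux_r2)
# Pearson correlation; the sign follows the positive slope (4008.3) printed above
r_kwh_pop <- sqrt(aux_r2)
print(round(c(diagnostics, correlation = r_kwh_pop), 3))
```

A VIF of about 1.35 is far below the common rule-of-thumb cutoffs of 5 or 10, so the overlap between energy consumption and building population looks mild, even though the slope in the auxiliary regression is highly significant.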
Before checking the model against the four “LINE” assumptions associated with linear regression (Linearity, Independence, Normality, and Equal variance), histograms, boxplots, scatterplots, and a “quality of fit” plot of residuals against fitted values are generated; these graphics are used in the interpretations that follow.
#Generate histograms of the independent variables in the sampled data ('TOTAL_KWH' and 'TOTAL_POPULATION')
hist(energy_final$TOTAL_KWH, xlab = "Total Energy Consumption [in kilowatt-hours]", main = "Histogram of Total Energy Consumption")
hist(energy_final$TOTAL_POPULATION, xlab = "Total Building Population", main = "Histogram of Total Building Population")
#Generate a boxplot of the data (Independent Variable = Energy Consumption)
boxplot(x = energy_final$TOTAL_KWH, pch=21, bg="darkviolet", main="Total Energy Consumption", ylab = "Total Energy Consumption [in kilowatt-hours]")
#Generate a boxplot of the data (Independent Variable = Population)
boxplot(x = energy_final$TOTAL_POPULATION, pch=21, bg="darkviolet", main="Total Building Population", ylab = "Building Population")
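The points drawn beyond the whiskers in these boxplots are the values outside Tukey's fences, 1.5 times the hinge spread below the lower hinge or above the upper hinge. The sketch reproduces that rule with `fivenum()` and checks it against `boxplot.stats()`, which `boxplot()` calls internally; the toy vector and the helper name `tukey_outliers` are made up for illustration. Applied to `energy_final$TOTAL_KWH`, it would list the consumption values that appear as separate points.

```r
# Tukey fences as used by boxplot(): 1.5 x hinge spread beyond the hinges
tukey_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  h <- fivenum(x)               # min, lower hinge, median, upper hinge, max
  spread <- h[4] - h[2]
  lower <- h[2] - coef * spread
  upper <- h[4] + coef * spread
  x[x < lower | x > upper]      # strictly outside the fences, as in boxplot.stats()
}

toy_kwh <- c(120, 135, 150, 160, 175, 190, 2400)   # made-up values with one extreme point
print(tukey_outliers(toy_kwh))
print(boxplot.stats(toy_kwh)$out)                  # same rule, via grDevices
```

Strongly right-skewed variables such as total consumption tend to produce many flagged points under this rule, so flagged values are candidates for inspection rather than automatic removal.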
#Generate a scatterplot of the data: "Building Type" vs. "Energy Consumption"
plot(y = energy_final$BUILDING_TYPE, x = energy_final$TOTAL_KWH, pch=21, bg="darkviolet", main="Building Type vs. Total Energy Consumption", ylab = "Building Type", xlab = "Energy Consumption [in kilowatt-hours]")
#Generate a scatterplot of the data: "Building Type" vs. "Building Population"
plot(y = energy_final$BUILDING_TYPE, x = energy_final$TOTAL_POPULATION, pch=21, bg="darkviolet", main="Building Type vs. Total Building Population", ylab = "Building Type", xlab = "Building Population")
#Create a quality-of-fit plot of the (unstandardized) deviance residuals of "energy_model_final" against its fitted values.
par(mfrow=c(1,1))
plot(fitted(energy_model_final), residuals(energy_model_final), main = "Residuals vs. Fitted Values of 'energy_model_final' [Not Standardized]", xlab = "Fitted probability", ylab = "Deviance residual")
abline(h = 0, col='darkviolet', lwd=2.5)
#Create a quality-of-fit plot of the standardized deviance residuals of "energy_model_final" against its fitted values.
par(mfrow=c(1,1))
standardized_energy_model <- rstandard(energy_model_final)
plot(fitted(energy_model_final), standardized_energy_model, main = "Standardized Residuals vs. Fitted Values of 'energy_model_final'", xlab = "Fitted probability", ylab = "Standardized deviance residual")
abline(h = 0, col='darkviolet', lwd=2.5)
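For a glm, `rstandard()` returns standardized deviance residuals by default, and a common screening heuristic lists the observations whose absolute value exceeds 2. With a 0/1 outcome these residuals are not normally distributed, so the cutoff is a rough flag rather than a formal test. The sketch uses simulated data with made-up coefficients (the names `kwh`, `is_type`, and `sim_model` are hypothetical); the same two lines applied to `energy_model_final` would list the flagged buildings.

```r
set.seed(1)
n <- 200
kwh <- rexp(n, rate = 1 / 5e5)                        # simulated consumption, kWh
is_type <- rbinom(n, 1, plogis(1.9 - 2.7e-6 * kwh))   # simulated 0/1 building type
sim_model <- glm(is_type ~ kwh, family = binomial)

std_res <- rstandard(sim_model)                       # standardized deviance residuals
flagged <- which(abs(std_res) > 2)                    # rule-of-thumb cutoff
print(data.frame(fitted = fitted(sim_model)[flagged], std_res = std_res[flagged]))
```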
In interpreting our hierarchical multiple logistic regression model and the statistical significance of its results, it is important to test the model against the four “LINE” assumptions associated with linear regression.
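For a logistic model, the linearity ("L") assumption concerns the log-odds rather than the raw outcome: the logit of the success probability should change linearly with each predictor. One common check, offered here as a sketch rather than as the method used in this analysis, bins a predictor into quantile groups and plots each bin's empirical logit against the bin mean; a roughly straight pattern supports linearity. The data are simulated and the helper name `empirical_logit` is hypothetical; `energy_final$TOTAL_KWH` and a 0/1 version of `BUILDING_TYPE` could be passed in their place.

```r
# Empirical logit per quantile bin, with a 0.5 continuity correction to avoid log(0)
empirical_logit <- function(x, y, bins = 10) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = bins + 1)))
  group <- cut(x, breaks = breaks, include.lowest = TRUE)
  successes <- tapply(y, group, sum)
  totals    <- tapply(y, group, length)
  data.frame(
    x_mean = as.numeric(tapply(x, group, mean)),
    logit  = as.numeric(log((successes + 0.5) / (totals - successes + 0.5)))
  )
}

set.seed(2)
x <- runif(500, 0, 10)
y <- rbinom(500, 1, plogis(-1 + 0.4 * x))    # true log-odds are linear in x
el <- empirical_logit(x, y)
plot(el$x_mean, el$logit, pch = 21, bg = "darkviolet",
     xlab = "Bin mean of predictor", ylab = "Empirical logit")
abline(lm(logit ~ x_mean, data = el), col = "darkviolet", lwd = 2)
```

Systematic curvature in such a plot would suggest transforming the predictor (for example, a log of a right-skewed consumption variable) before refitting.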
Likewise, in interpreting the model and the significance of its results, it is important to check it against the four main issues surrounding linear regression.