Assignment #04: Logistic Regression [Outline]

Logistic Regression Project - Analysis of Building Type by Energy Usage

Brendan Howell

Rensselaer Polytechnic Institute

05/07/15 - Version 1.0

1. Data

The dataset reports several measures of energy consumption (electricity in kWh and natural gas in therms) for households, businesses, and industries in the City of Chicago during 2010.

Description: [To be included in Final Version.]

[Reference: http://catalog.data.gov/dataset/energy-usage-2010-24a67]

Data Organization

Below, the “Energy Usage 2010” dataset is loaded into R, and its summary statistics and structure are displayed (along with the “head” and the “tail” of the dataset).

#Load the "Energy Usage 2010" dataset into R and assign the complete data frame to the variable "energy_raw".
rm(list=ls())
energy_raw <- read.csv("~/Academics (RPI)/10. Spring 2015/Applied Regression Analysis/Assignments/Assignment #4/Energy_Usage_2010.csv", header=TRUE, stringsAsFactors = FALSE)
#Then, display the "head" and "tail" of the dataset, "energy_raw".
head(energy_raw)
##   COMMUNITY.AREA.NAME CENSUS.BLOCK BUILDING_TYPE BUILDING_SUBTYPE
## 1         Albany Park      1.7e+14   Residential         Multi 7+
## 2         Albany Park      1.7e+14   Residential        Multi < 7
## 3         Albany Park      1.7e+14   Residential    Single Family
## 4         Albany Park      1.7e+14   Residential         Multi 7+
## 5         Albany Park      1.7e+14   Residential        Multi < 7
## 6         Albany Park      1.7e+14    Commercial        Multi < 7
##   KWH.JANUARY.2010 KWH.FEBRUARY.2010 KWH.MARCH.2010 KWH.APRIL.2010
## 1            11921             12145           9759          11542
## 2             1233              1645            994           1055
## 3             4141              3798           2939           4727
## 4             1230              1333           1260           1405
## 5            12977             14639          12718          14973
## 6             2878              3755           4571           2984
##   KWH.MAY.2010 KWH.JUNE.2010 KWH.JULY.2010 KWH.AUGUST.2010
## 1        14348         26617         24210           20383
## 2         1284          3527          3099            2527
## 3         5324          9676          7591            6287
## 4         1699          2094           732            1312
## 5        16384         32940         24454           23926
## 6         3111          4808          4132            3564
##   KWH.SEPTEMBER.2010 KWH.OCTOBER.2010 KWH.NOVEMBER.2010 KWH.DECEMBER.2010
## 1              11983            10335             25327             22462
## 2                904              626              2092              1622
## 3               2920             2565              5979              5073
## 4               1462             1358              1372              1495
## 5              15012            13679             31979             30660
## 6               2174             1985              5968              5400
##   TOTAL_KWH ELECTRICITY.ACCOUNTS ZERO.KWH.ACCOUNTS THERM.JANUARY.2010
## 1    201032                   48                22               7247
## 2     20608          Less than 4                 1                321
## 3     61020                    6                 2               1222
## 4     16752          Less than 4                 2               2961
## 5    244341                   49                32              11508
## 6     45330                    7                 0               1793
##   THERM.FEBRUARY.2010 THERM.MARCH.2010 TERM.APRIL.2010 THERM.MAY.2010
## 1                5904             5180            3113           1822
## 2                 130               86              49             19
## 3                1016              860             543            346
## 4                2664             1616             798            344
## 5                9057             8000            4529           2809
## 6                1573             1352             890            853
##   THERM.JUNE.2010 THERM.JULY.2010 THERM.AUGUST.2010 THERM.SEPTEMBER.2010
## 1            1272            1234               952                 1780
## 2              13               7                10                   12
## 3             247             203               179                  170
## 4             404             320               272                  368
## 5            1507            1179               991                  994
## 6             541             448               438                  439
##   THERM.OCTOBER.2010 THERM.NOVEMBER.2010 THERM.DECEMBER.2010 TOTAL_THERMS
## 1               1472                1961                4885        36822
## 2                  9                  21                  78          755
## 3                190                 298                 791         6065
## 4                745                1260                2901        14653
## 5               1254                2595                7167        51590
## 6                565                 787                1538        11217
##   GAS.ACCOUNTS KWH.TOTAL.SQFT THERMS.TOTAL.SQFT KWH.MEAN.2010
## 1           21          48825             48825      20103.20
## 2  Less than 4           3306              3306      20608.00
## 3            6           9472              9472      10170.00
## 4            6          14407             14407      16752.00
## 5           54          58835             58835      15271.31
## 6            6           8240              8240      22665.00
##   KWH.STANDARD.DEVIATION.2010 KWH.MINIMUM.2010 KWH.1ST.QUARTILE.2010
## 1                     8609.69             9414               12563.0
## 2                          NA            20608               20608.0
## 3                     4410.10             5619                6746.0
## 4                          NA            16752               16752.0
## 5                     8089.70             5462               10343.5
## 6                     9526.14            15929               15929.0
##   KWH.2ND.QUARTILE.2010 KWH.3RD.QUARTILE.2010 KWH.MAXIMUM.2010
## 1               19072.5               22177.0            36781
## 2               20608.0               20608.0            20608
## 3                9055.5               13014.0            17530
## 4               16752.0               16752.0            16752
## 5               12427.0               17495.5            34236
## 6               22665.0               29401.0            29401
##   KWH.SQFT.MEAN.2010 KWH.SQFT.STANDARD.DEVIATION.2010
## 1           24412.50                          5698.57
## 2            3306.00                               NA
## 3            1578.67                           863.85
## 4           14407.00                               NA
## 5            3677.19                          1061.65
## 6            8240.00                               NA
##   KWH.SQFT.MINIMUM.2010 KWH.SQFT.1ST.QUARTILE.2010
## 1                 20383                      20383
## 2                  3306                       3306
## 3                  1226                       1226
## 4                 14407                      14407
## 5                  2414                       2546
## 6                  8240                       8240
##   KWH.SQFT.2ND.QUARTILE.2010 KWH.SQFT.3RD.QUARTILE.2010
## 1                    24412.5                      28442
## 2                     3306.0                       3306
## 3                     1226.0                       1226
## 4                    14407.0                      14407
## 5                     3553.5                       4692
## 6                     8240.0                       8240
##   KWH.SQFT.MAXIMUM.2010 THERM.MEAN.2010 THERM.STANDARD.DEVIATION.2010
## 1                 28442         5260.29                       8435.63
## 2                  3306          755.00                            NA
## 3                  3342         1010.83                        620.53
## 4                 14407        14653.00                            NA
## 5                  5530         3224.38                       1079.13
## 6                  8240         5608.50                       5620.79
##   THERM.MINIMUM.2010 THERM.1ST.QUARTILE.2010 THERM.2ND.QUARTILE.2010
## 1                882                     957                  1102.0
## 2                755                     755                   755.0
## 3                496                     514                   835.5
## 4              14653                   14653                 14653.0
## 5               2071                    2499                  2933.5
## 6               1634                    1634                  5608.5
##   THERM.3RD.QUARTILE.2010 THERM.MAXIMUM.2010 THERMS.SQFT.MEAN.2010
## 1                  8024.0              23460              24412.50
## 2                   755.0                755               3306.00
## 3                  1240.0               2144               1578.67
## 4                 14653.0              14653              14407.00
## 5                  3593.5               5754               3677.19
## 6                  9583.0               9583               8240.00
##   THERMS.SQFT.STANDARD.DEVIATION.2010 THERMS.SQFT.MINIMUM.2010
## 1                             5698.57                    20383
## 2                                  NA                     3306
## 3                              863.85                     1226
## 4                                  NA                    14407
## 5                             1061.65                     2414
## 6                                  NA                     8240
##   THERMS.SQFT.1ST.QUARTILE.2010 THERMS.SQFT.2ND.QUARTILE.2010
## 1                         20383                       24412.5
## 2                          3306                        3306.0
## 3                          1226                        1226.0
## 4                         14407                       14407.0
## 5                          2546                        3553.5
## 6                          8240                        8240.0
##   THERMS.SQFT.3RD.QUARTILE.2010 THERMS.SQFT.MAXIMUM.2010 TOTAL_POPULATION
## 1                         28442                    28442              132
## 2                          3306                     3306              132
## 3                          1226                     3342              132
## 4                         14407                    14407              228
## 5                          4692                     5530              228
## 6                          8240                     8240              231
##   TOTAL.UNITS AVERAGE.STORIES AVERAGE.BUILDING.AGE AVERAGE.HOUSESIZE
## 1          64            3.00                65.50              2.20
## 2          64            2.00                86.00              2.20
## 3          64            1.17                14.33              2.20
## 4          79            3.00                86.00              3.51
## 5          79            2.50                87.69              3.51
## 6          70            1.00                 0.00              3.73
##   OCCUPIED.UNITS OCCUPIED.UNITS.PERCENTAGE RENTER.OCCUPIED.HOUSING.UNITS
## 1             60                    0.9375                            33
## 2             60                    0.9375                            33
## 3             60                    0.9375                            33
## 4             65                    0.8228                            49
## 5             65                    0.8228                            49
## 6             62                    0.8856                            49
##   RENTER.OCCUPIED.HOUSING.PERCENTAGE OCCUPIED.HOUSING.UNITS
## 1                              0.550                     60
## 2                              0.550                     60
## 3                              0.550                     60
## 4                              0.754                     65
## 5                              0.754                     65
## 6                              0.790                     62
tail(energy_raw)
##       COMMUNITY.AREA.NAME CENSUS.BLOCK BUILDING_TYPE BUILDING_SUBTYPE
## 66969            Woodlawn      1.7e+14   Residential        Multi < 7
## 66970            Woodlawn      1.7e+14   Residential    Single Family
## 66971            Woodlawn      1.7e+14    Commercial        Multi < 7
## 66972            Woodlawn      1.7e+14   Residential        Multi < 7
## 66973            Woodlawn      1.7e+14   Residential    Single Family
## 66974            Woodlawn      1.7e+14   Residential        Multi < 7
##       KWH.JANUARY.2010 KWH.FEBRUARY.2010 KWH.MARCH.2010 KWH.APRIL.2010
## 66969             9572              9104           8525           7756
## 66970             2705              1318           1582           1465
## 66971             1005              1760           1521           1832
## 66972             3567              3031           2582           2295
## 66973             1208              1055           1008           1109
## 66974             2717              3057           2695           3793
##       KWH.MAY.2010 KWH.JUNE.2010 KWH.JULY.2010 KWH.AUGUST.2010
## 66969        11256         11669         12099           13200
## 66970         1494          2990          2449            2351
## 66971         2272          2361          3018            3030
## 66972         7902          4987          5773            3996
## 66973         1591          1367          1569            1551
## 66974         4237          5383          5544            6929
##       KWH.SEPTEMBER.2010 KWH.OCTOBER.2010 KWH.NOVEMBER.2010
## 66969               9694             8419             19077
## 66970               1213             2174              2888
## 66971               2886             3833              6290
## 66972               3050             3103              3880
## 66973               1376             1236              2108
## 66974               5280             5971              6986
##       KWH.DECEMBER.2010 TOTAL_KWH ELECTRICITY.ACCOUNTS ZERO.KWH.ACCOUNTS
## 66969             18869    139240                   21                18
## 66970              5025     27654                    6                 7
## 66971             12169     41977                    9                 5
## 66972              4684     48850                    7                 2
## 66973              2529     17707                    7                 9
## 66974              5144     57736                   12                17
##       THERM.JANUARY.2010 THERM.FEBRUARY.2010 THERM.MARCH.2010
## 66969               6914                5433             5054
## 66970               2166                1681             1858
## 66971                985                1152             1238
## 66972               2202                1874             1647
## 66973                 95                  11               47
## 66974               2372                1787             1449
##       TERM.APRIL.2010 THERM.MAY.2010 THERM.JUNE.2010 THERM.JULY.2010
## 66969            2967           2241            1107             770
## 66970            1172            708             360              72
## 66971             630            475             192             141
## 66972             906            645             346              84
## 66973               9             45              18              22
## 66974             718            572             286             155
##       THERM.AUGUST.2010 THERM.SEPTEMBER.2010 THERM.OCTOBER.2010
## 66969               674                  788                954
## 66970                67                   77                185
## 66971               162                  144                210
## 66972               150                  150                260
## 66973                 9                   17                 11
## 66974               134                  161                303
##       THERM.NOVEMBER.2010 THERM.DECEMBER.2010 TOTAL_THERMS GAS.ACCOUNTS
## 66969                2423                4619        33944           25
## 66970                 623                1800        10769            9
## 66971                 653                1744         7726            8
## 66972                 694                1335        10293            5
## 66973                  18                  13          315            5
## 66974                 588                1469         9994           13
##       KWH.TOTAL.SQFT THERMS.TOTAL.SQFT KWH.MEAN.2010
## 66969          48349             48349      12658.18
## 66970           7801              7801       6913.50
## 66971          11838             11838      13992.33
## 66972          11028             11028      16283.33
## 66973           4653              4653       4426.75
## 66974          17812             13776       9622.67
##       KWH.STANDARD.DEVIATION.2010 KWH.MINIMUM.2010 KWH.1ST.QUARTILE.2010
## 66969                     7948.06             2691                7635.0
## 66970                     5695.82             2444                2872.5
## 66971                     2989.54            10754               10754.0
## 66972                    15000.83             7010                7010.0
## 66973                     2297.29             1878                2635.0
## 66974                     5625.23             1312                6288.0
##       KWH.2ND.QUARTILE.2010 KWH.3RD.QUARTILE.2010 KWH.MAXIMUM.2010
## 66969               11370.0               19168.0            30287
## 66970                5139.0               10954.5            14932
## 66971               14576.0               16647.0            16647
## 66972                8250.0               33590.0            33590
## 66973                4325.0                6218.5             7179
## 66974                9586.5               15290.0            15673
##       KWH.SQFT.MEAN.2010 KWH.SQFT.STANDARD.DEVIATION.2010
## 66969             4834.9                          2180.96
## 66970             3900.5                          1429.06
## 66971             5919.0                           725.49
## 66972             3676.0                          1022.80
## 66973             4653.0                               NA
## 66974             3562.4                          2911.56
##       KWH.SQFT.MINIMUM.2010 KWH.SQFT.1ST.QUARTILE.2010
## 66969                  2810                       3166
## 66970                  2890                       2890
## 66971                  5406                       5406
## 66972                  2800                       2800
## 66973                  4653                       4653
## 66974                  1866                       2170
##       KWH.SQFT.2ND.QUARTILE.2010 KWH.SQFT.3RD.QUARTILE.2010
## 66969                     3771.0                       7232
## 66970                     3900.5                       4911
## 66971                     5919.0                       6432
## 66972                     3428.0                       4800
## 66973                     4653.0                       4653
## 66974                     2472.0                       2556
##       KWH.SQFT.MAXIMUM.2010 THERM.MEAN.2010 THERM.STANDARD.DEVIATION.2010
## 66969                  8016         3085.82                       1542.64
## 66970                  4911         2692.25                       3661.92
## 66971                  6432         2575.33                       3492.97
## 66972                  4800         3431.00                       1155.32
## 66973                  4653          105.00                         80.30
## 66974                  8748         2498.50                       2372.88
##       THERM.MINIMUM.2010 THERM.1ST.QUARTILE.2010 THERM.2ND.QUARTILE.2010
## 66969                621                    2300                  2669.0
## 66970                272                     464                  1195.5
## 66971                 42                      42                  1124.0
## 66972               2449                    2449                  3140.0
## 66973                 49                      49                    69.0
## 66974                487                     578                  2029.0
##       THERM.3RD.QUARTILE.2010 THERM.MAXIMUM.2010 THERMS.SQFT.MEAN.2010
## 66969                  4408.0               6246                4834.9
## 66970                  4920.5               8106                3900.5
## 66971                  6560.0               6560                5919.0
## 66972                  4704.0               4704                3676.0
## 66973                   197.0                197                4653.0
## 66974                  4419.0               5449                4592.0
##       THERMS.SQFT.STANDARD.DEVIATION.2010 THERMS.SQFT.MINIMUM.2010
## 66969                             2180.96                     2810
## 66970                             1429.06                     2890
## 66971                              725.49                     5406
## 66972                             1022.80                     2800
## 66973                                  NA                     4653
## 66974                             3599.45                     2472
##       THERMS.SQFT.1ST.QUARTILE.2010 THERMS.SQFT.2ND.QUARTILE.2010
## 66969                          3166                        3771.0
## 66970                          2890                        3900.5
## 66971                          5406                        5919.0
## 66972                          2800                        3428.0
## 66973                          4653                        4653.0
## 66974                          2472                        2556.0
##       THERMS.SQFT.3RD.QUARTILE.2010 THERMS.SQFT.MAXIMUM.2010
## 66969                          7232                     8016
## 66970                          4911                     4911
## 66971                          6432                     6432
## 66972                          4800                     4800
## 66973                          4653                     4653
## 66974                          8748                     8748
##       TOTAL_POPULATION TOTAL.UNITS AVERAGE.STORIES AVERAGE.BUILDING.AGE
## 66969              116          55            2.00                51.90
## 66970              116          55            1.00                 0.00
## 66971               31          24            3.00               104.50
## 66972               31          24            2.33               100.67
## 66973                0           0            1.00                 0.00
## 66974               77          49            2.00                79.40
##       AVERAGE.HOUSESIZE OCCUPIED.UNITS OCCUPIED.UNITS.PERCENTAGE
## 66969              3.14             37                    0.6727
## 66970              3.14             37                    0.6727
## 66971              2.07             15                    0.6250
## 66972              2.07             15                    0.6250
## 66973              0.00              0                        NA
## 66974              2.57             30                    0.6122
##       RENTER.OCCUPIED.HOUSING.UNITS RENTER.OCCUPIED.HOUSING.PERCENTAGE
## 66969                            26                             0.7030
## 66970                            26                             0.7030
## 66971                            13                             0.8670
## 66972                            13                             0.8670
## 66973                             0                                 NA
## 66974                            28                             0.9329
##       OCCUPIED.HOUSING.UNITS
## 66969                     37
## 66970                     37
## 66971                     15
## 66972                     15
## 66973                      0
## 66974                     30
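The "head" output above shows that ELECTRICITY.ACCOUNTS and GAS.ACCOUNTS contain the text entry "Less than 4", so read.csv stores both columns as character rather than numeric. The following is a minimal sketch of one way such a column could be handled before modeling. It assumes the censored entries should become NA while being flagged in a separate indicator; the vector "accounts" is a hypothetical miniature built from the values printed above, not the real column, and this step is not part of the original analysis.

```r
# Hypothetical miniature of an accounts column; the values mirror the
# head() output above, but this vector is only an illustration.
accounts <- c("48", "Less than 4", "6", "Less than 4", "49", "7")

# Flag the censored entries first so that the information is not lost.
censored <- accounts == "Less than 4"

# Convert to integer; the "Less than 4" entries cannot be parsed and become NA.
# suppressWarnings() hides the expected coercion warning.
accounts_num <- suppressWarnings(as.integer(accounts))

print(censored)
print(accounts_num)
```

Keeping the indicator alongside the numeric column leaves open the choice of dropping those rows, imputing a value between 1 and 3, or using the flag itself as a predictor.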
#Display the summary statistics and the structure of the data.
summary(energy_raw)
##  COMMUNITY.AREA.NAME  CENSUS.BLOCK     BUILDING_TYPE     
##  Length:66974        Min.   :1.7e+14   Length:66974      
##  Class :character    1st Qu.:1.7e+14   Class :character  
##  Mode  :character    Median :1.7e+14   Mode  :character  
##                      Mean   :1.7e+14                     
##                      3rd Qu.:1.7e+14                     
##                      Max.   :1.7e+14                     
##                                                          
##  BUILDING_SUBTYPE   KWH.JANUARY.2010   KWH.FEBRUARY.2010 
##  Length:66974       Min.   :       0   Min.   :       0  
##  Class :character   1st Qu.:    1369   1st Qu.:    1612  
##  Mode  :character   Median :    3476   Median :    3806  
##                     Mean   :   12810   Mean   :   12582  
##                     3rd Qu.:    7138   3rd Qu.:    7396  
##                     Max.   :21214017   Max.   :21065500  
##                     NA's   :871        NA's   :871       
##  KWH.MARCH.2010     KWH.APRIL.2010      KWH.MAY.2010     
##  Min.   :       0   Min.   :       0   Min.   :       0  
##  1st Qu.:    1585   1st Qu.:    1578   1st Qu.:    1955  
##  Median :    3676   Median :    3636   Median :    4522  
##  Mean   :   11707   Mean   :   11463   Mean   :   13853  
##  3rd Qu.:    7042   3rd Qu.:    6989   3rd Qu.:    8922  
##  Max.   :18503691   Max.   :17310058   Max.   :21344049  
##  NA's   :871        NA's   :871        NA's   :871       
##  KWH.JUNE.2010      KWH.JULY.2010      KWH.AUGUST.2010   
##  Min.   :       0   Min.   :       0   Min.   :       0  
##  1st Qu.:    2695   1st Qu.:    3199   1st Qu.:    2834  
##  Median :    6283   Median :    7375   Median :    6404  
##  Mean   :   17213   Mean   :   18845   Mean   :   16989  
##  3rd Qu.:   12793   3rd Qu.:   14624   3rd Qu.:   12274  
##  Max.   :20209197   Max.   :21478035   Max.   :18586958  
##  NA's   :871        NA's   :871        NA's   :871       
##  KWH.SEPTEMBER.2010 KWH.OCTOBER.2010   KWH.NOVEMBER.2010 
##  Min.   :       0   Min.   :       0   Min.   :       0  
##  1st Qu.:    2024   1st Qu.:    1951   1st Qu.:    2639  
##  Median :    4566   Median :    4354   Median :    5851  
##  Mean   :   13595   Mean   :   12595   Mean   :   15705  
##  3rd Qu.:    8612   3rd Qu.:    8154   3rd Qu.:   11044  
##  Max.   :19280342   Max.   :18423025   Max.   :20670698  
##  NA's   :871        NA's   :871        NA's   :871       
##  KWH.DECEMBER.2010    TOTAL_KWH         ELECTRICITY.ACCOUNTS
##  Min.   :       0   Min.   :      102   Length:66974        
##  1st Qu.:    3076   1st Qu.:    28188   Class :character    
##  Median :    6813   Median :    62272   Mode  :character    
##  Mean   :   18315   Mean   :   175672                       
##  3rd Qu.:   12602   3rd Qu.:   118172                       
##  Max.   :25060008   Max.   :231280522                       
##  NA's   :871        NA's   :871                             
##  ZERO.KWH.ACCOUNTS THERM.JANUARY.2010 THERM.FEBRUARY.2010 THERM.MARCH.2010
##  Min.   :  0.000   Min.   :     1     Min.   :     1      Min.   :     1  
##  1st Qu.:  1.000   1st Qu.:  1022     1st Qu.:   897      1st Qu.:   736  
##  Median :  2.000   Median :  2141     Median :  1901      Median :  1558  
##  Mean   :  4.771   Mean   :  3306     Mean   :  2893      Mean   :  2406  
##  3rd Qu.:  5.000   3rd Qu.:  3866     3rd Qu.:  3418      3rd Qu.:  2808  
##  Max.   :601.000   Max.   :566238     Max.   :511323      Max.   :557509  
##                    NA's   :2230       NA's   :4232        NA's   :1482    
##  TERM.APRIL.2010  THERM.MAY.2010     THERM.JUNE.2010    THERM.JULY.2010   
##  Min.   :     1   Min.   :     1.0   Min.   :     1.0   Min.   :     1.0  
##  1st Qu.:   354   1st Qu.:   209.0   1st Qu.:   113.0   1st Qu.:    87.0  
##  Median :   779   Median :   469.0   Median :   256.0   Median :   197.0  
##  Mean   :  1261   Mean   :   807.2   Mean   :   498.3   Mean   :   418.4  
##  3rd Qu.:  1440   3rd Qu.:   875.0   3rd Qu.:   486.0   3rd Qu.:   369.0  
##  Max.   :624882   Max.   :651226.0   Max.   :631383.0   Max.   :680201.0  
##  NA's   :1575     NA's   :1857       NA's   :1767       NA's   :1820      
##  THERM.AUGUST.2010  THERM.SEPTEMBER.2010 THERM.OCTOBER.2010
##  Min.   :     1.0   Min.   :     1.0     Min.   :     1.0  
##  1st Qu.:    79.0   1st Qu.:    82.0     1st Qu.:   122.0  
##  Median :   180.0   Median :   187.0     Median :   276.0  
##  Mean   :   399.7   Mean   :   401.2     Mean   :   568.2  
##  3rd Qu.:   340.0   3rd Qu.:   347.0     3rd Qu.:   509.2  
##  Max.   :693230.0   Max.   :634051.0     Max.   :593026.0  
##  NA's   :1908       NA's   :2282         NA's   :1722      
##  THERM.NOVEMBER.2010 THERM.DECEMBER.2010  TOTAL_THERMS    
##  Min.   :     1      Min.   :     1      Min.   :     25  
##  1st Qu.:   282      1st Qu.:   774      1st Qu.:   4879  
##  Median :   629      Median :  1631      Median :  10340  
##  Mean   :  1150      Mean   :  2645      Mean   :  16524  
##  3rd Qu.:  1167      3rd Qu.:  2965      3rd Qu.:  18570  
##  Max.   :539356      Max.   :566326      Max.   :7035940  
##  NA's   :1559        NA's   :1544        NA's   :1296     
##  GAS.ACCOUNTS       KWH.TOTAL.SQFT    THERMS.TOTAL.SQFT
##  Length:66974       Min.   :    300   Min.   :    300  
##  Class :character   1st Qu.:   5385   1st Qu.:   5368  
##  Mode  :character   Median :  10858   Median :  10844  
##                     Mean   :  21093   Mean   :  20347  
##                     3rd Qu.:  18721   3rd Qu.:  18844  
##                     Max.   :6548217   Max.   :6548217  
##                     NA's   :1150      NA's   :1673     
##  KWH.MEAN.2010       KWH.STANDARD.DEVIATION.2010 KWH.MINIMUM.2010   
##  Min.   :      102   Min.   :        0           Min.   :      100  
##  1st Qu.:     8229   1st Qu.:     3630           1st Qu.:     2164  
##  Median :    10515   Median :     5148           Median :     4377  
##  Mean   :    62493   Mean   :    40323           Mean   :    36852  
##  3rd Qu.:    15645   3rd Qu.:     8065           3rd Qu.:     8774  
##  Max.   :227750000   Max.   :162851049           Max.   :227752064  
##  NA's   :871         NA's   :9956                NA's   :871        
##  KWH.1ST.QUARTILE.2010 KWH.2ND.QUARTILE.2010 KWH.3RD.QUARTILE.2010
##  Min.   :      100     Min.   :      102     Min.   :      102    
##  1st Qu.:     4766     1st Qu.:     7636     1st Qu.:    10477    
##  Median :     6746     Median :     9944     Median :    13623    
##  Mean   :    39158     Mean   :    55773     Mean   :    85608    
##  3rd Qu.:    10374     3rd Qu.:    14603     3rd Qu.:    20018    
##  Max.   :227752064     Max.   :227752064     Max.   :230793342    
##  NA's   :871           NA's   :871           NA's   :871          
##  KWH.MAXIMUM.2010    KWH.SQFT.MEAN.2010 KWH.SQFT.STANDARD.DEVIATION.2010
##  Min.   :      102   Min.   :    300    Min.   :      0                 
##  1st Qu.:    13281   1st Qu.:   1326    1st Qu.:    240                 
##  Median :    18033   Median :   2214    Median :    471                 
##  Mean   :   103512   Mean   :   7665    Mean   :   3446                 
##  3rd Qu.:    26276   3rd Qu.:   3790    3rd Qu.:   1048                 
##  Max.   :230793342   Max.   :6548217    Max.   :3840818                 
##  NA's   :871         NA's   :1150       NA's   :15385                   
##  KWH.SQFT.MINIMUM.2010 KWH.SQFT.1ST.QUARTILE.2010
##  Min.   :    100       Min.   :    100           
##  1st Qu.:    954       1st Qu.:   1078           
##  Median :   1534       Median :   1760           
##  Mean   :   5604       Mean   :   5792           
##  3rd Qu.:   2684       3rd Qu.:   2854           
##  Max.   :6548217       Max.   :6548217           
##  NA's   :1150          NA's   :1150              
##  KWH.SQFT.2ND.QUARTILE.2010 KWH.SQFT.3RD.QUARTILE.2010
##  Min.   :    300            Min.   :    300           
##  1st Qu.:   1250            1st Qu.:   1490           
##  Median :   2132            Median :   2470           
##  Mean   :   7268            Mean   :   9534           
##  3rd Qu.:   3612            3rd Qu.:   4491           
##  Max.   :6548217            Max.   :6548217           
##  NA's   :1150               NA's   :1150              
##  KWH.SQFT.MAXIMUM.2010 THERM.MEAN.2010   THERM.STANDARD.DEVIATION.2010
##  Min.   :    300       Min.   :     25   Min.   :      0              
##  1st Qu.:   1890       1st Qu.:   1365   1st Qu.:    351              
##  Median :   2810       Median :   1842   Median :    577              
##  Mean   :  10581       Mean   :   4062   Mean   :   2649              
##  3rd Qu.:   5254       3rd Qu.:   2707   3rd Qu.:   1183              
##  Max.   :6548217       Max.   :6600274   Max.   :4941759              
##  NA's   :1150          NA's   :1296      NA's   :10230                
##  THERM.MINIMUM.2010 THERM.1ST.QUARTILE.2010 THERM.2ND.QUARTILE.2010
##  Min.   :     25    Min.   :     25         Min.   :     25        
##  1st Qu.:    592    1st Qu.:    957         1st Qu.:   1286        
##  Median :    990    Median :   1290         Median :   1724        
##  Mean   :   2267    Mean   :   2545         Mean   :   3634        
##  3rd Qu.:   1643    3rd Qu.:   1878         3rd Qu.:   2474        
##  Max.   :6600274    Max.   :6600274         Max.   :6600274        
##  NA's   :1296       NA's   :1296            NA's   :1296           
##  THERM.3RD.QUARTILE.2010 THERM.MAXIMUM.2010 THERMS.SQFT.MEAN.2010
##  Min.   :     25         Min.   :     25    Min.   :    300      
##  1st Qu.:   1595         1st Qu.:   1934    1st Qu.:   1318      
##  Median :   2182         Median :   2603    Median :   2200      
##  Mean   :   5490         Mean   :   6955    Mean   :   7175      
##  3rd Qu.:   3241         3rd Qu.:   4069    3rd Qu.:   3736      
##  Max.   :7012321         Max.   :7012321    Max.   :6548217      
##  NA's   :1296            NA's   :1296       NA's   :1673         
##  THERMS.SQFT.STANDARD.DEVIATION.2010 THERMS.SQFT.MINIMUM.2010
##  Min.   :      0                     Min.   :    100         
##  1st Qu.:    239                     1st Qu.:    950         
##  Median :    467                     Median :   1520         
##  Mean   :   3140                     Mean   :   5282         
##  3rd Qu.:   1034                     3rd Qu.:   2651         
##  Max.   :3840818                     Max.   :6548217         
##  NA's   :15684                       NA's   :1673            
##  THERMS.SQFT.1ST.QUARTILE.2010 THERMS.SQFT.2ND.QUARTILE.2010
##  Min.   :    132               Min.   :    300              
##  1st Qu.:   1075               1st Qu.:   1244              
##  Median :   1756               Median :   2116              
##  Mean   :   5462               Mean   :   6799              
##  3rd Qu.:   2820               3rd Qu.:   3564              
##  Max.   :6548217               Max.   :6548217              
##  NA's   :1673                  NA's   :1673                 
##  THERMS.SQFT.3RD.QUARTILE.2010 THERMS.SQFT.MAXIMUM.2010 TOTAL_POPULATION 
##  Min.   :    300               Min.   :    300          Min.   :   0.00  
##  1st Qu.:   1479               1st Qu.:   1888          1st Qu.:  37.00  
##  Median :   2450               Median :   2796          Median :  64.00  
##  Mean   :   8897               Mean   :   9851          Mean   :  83.85  
##  3rd Qu.:   4410               3rd Qu.:   5191          3rd Qu.: 104.00  
##  Max.   :6548217               Max.   :6548217          Max.   :1590.00  
##  NA's   :1673                  NA's   :1673             NA's   :14       
##   TOTAL.UNITS      AVERAGE.STORIES   AVERAGE.BUILDING.AGE
##  Min.   :   0.00   Min.   :  1.000   Min.   :  0.00      
##  1st Qu.:  15.00   1st Qu.:  1.140   1st Qu.: 53.00      
##  Median :  25.00   Median :  1.750   Median : 80.00      
##  Mean   :  38.11   Mean   :  1.887   Mean   : 71.61      
##  3rd Qu.:  42.00   3rd Qu.:  2.000   3rd Qu.: 96.50      
##  Max.   :1365.00   Max.   :110.000   Max.   :158.00      
##  NA's   :14                                              
##  AVERAGE.HOUSESIZE OCCUPIED.UNITS   OCCUPIED.UNITS.PERCENTAGE
##  Min.   : 0.000    Min.   :   0.0   Min.   :0.0000           
##  1st Qu.: 2.140    1st Qu.:  13.0   1st Qu.:0.8332           
##  Median : 2.700    Median :  22.0   Median :0.9148           
##  Mean   : 2.722    Mean   :  33.5   Mean   :0.8804           
##  3rd Qu.: 3.310    3rd Qu.:  37.0   3rd Qu.:0.9677           
##  Max.   :12.000    Max.   :1034.0   Max.   :1.0000           
##  NA's   :14        NA's   :14       NA's   :2445             
##  RENTER.OCCUPIED.HOUSING.UNITS RENTER.OCCUPIED.HOUSING.PERCENTAGE
##  Min.   :   0.00               Min.   :0.0000                    
##  1st Qu.:   3.00               1st Qu.:0.2860                    
##  Median :  11.00               Median :0.5379                    
##  Mean   :  19.78               Mean   :0.5116                    
##  3rd Qu.:  23.00               3rd Qu.:0.7330                    
##  Max.   :1009.00               Max.   :1.0000                    
##  NA's   :14                    NA's   :2618                      
##  OCCUPIED.HOUSING.UNITS
##  Min.   :   0.0        
##  1st Qu.:  13.0        
##  Median :  22.0        
##  Mean   :  33.5        
##  3rd Qu.:  37.0        
##  Max.   :1034.0        
##  NA's   :14
str(energy_raw)
## 'data.frame':    66974 obs. of  73 variables:
##  $ COMMUNITY.AREA.NAME                : chr  "Albany Park" "Albany Park" "Albany Park" "Albany Park" ...
##  $ CENSUS.BLOCK                       : num  1.7e+14 1.7e+14 1.7e+14 1.7e+14 1.7e+14 ...
##  $ BUILDING_TYPE                      : chr  "Residential" "Residential" "Residential" "Residential" ...
##  $ BUILDING_SUBTYPE                   : chr  "Multi 7+" "Multi < 7" "Single Family" "Multi 7+" ...
##  $ KWH.JANUARY.2010                   : int  11921 1233 4141 1230 12977 2878 1478 4985 4926 16639 ...
##  $ KWH.FEBRUARY.2010                  : int  12145 1645 3798 1333 14639 3755 1890 2636 6413 23502 ...
##  $ KWH.MARCH.2010                     : int  9759 994 2939 1260 12718 4571 1364 2353 5586 19587 ...
##  $ KWH.APRIL.2010                     : int  11542 1055 4727 1405 14973 2984 1271 4761 5606 23327 ...
##  $ KWH.MAY.2010                       : int  14348 1284 5324 1699 16384 3111 1464 4391 6271 26537 ...
##  $ KWH.JUNE.2010                      : int  26617 3527 9676 2094 32940 4808 2118 7362 11549 40725 ...
##  $ KWH.JULY.2010                      : int  24210 3099 7591 732 24454 4132 2384 6462 8549 41430 ...
##  $ KWH.AUGUST.2010                    : int  20383 2527 6287 1312 23926 3564 3767 8015 6709 41268 ...
##  $ KWH.SEPTEMBER.2010                 : int  11983 904 2920 1462 15012 2174 2059 7314 3963 26208 ...
##  $ KWH.OCTOBER.2010                   : int  10335 626 2565 1358 13679 1985 1387 3816 3480 23230 ...
##  $ KWH.NOVEMBER.2010                  : int  25327 2092 5979 1372 31979 5968 2874 7496 7998 43196 ...
##  $ KWH.DECEMBER.2010                  : int  22462 1622 5073 1495 30660 5400 3244 6391 8613 43582 ...
##  $ TOTAL_KWH                          : int  201032 20608 61020 16752 244341 45330 25300 65982 79663 369231 ...
##  $ ELECTRICITY.ACCOUNTS               : chr  "48" "Less than 4" "6" "Less than 4" ...
##  $ ZERO.KWH.ACCOUNTS                  : int  22 1 2 2 32 0 2 3 2 106 ...
##  $ THERM.JANUARY.2010                 : int  7247 321 1222 2961 11508 1793 1554 3107 3371 22813 ...
##  $ THERM.FEBRUARY.2010                : int  5904 130 1016 2664 9057 1573 1195 2749 2647 18905 ...
##  $ THERM.MARCH.2010                   : int  5180 86 860 1616 8000 1352 1280 2228 2396 16890 ...
##  $ TERM.APRIL.2010                    : int  3113 49 543 798 4529 890 821 1331 1407 10504 ...
##  $ THERM.MAY.2010                     : int  1822 19 346 344 2809 853 663 738 833 6981 ...
##  $ THERM.JUNE.2010                    : int  1272 13 247 404 1507 541 607 443 460 4455 ...
##  $ THERM.JULY.2010                    : int  1234 7 203 320 1179 448 487 329 286 3456 ...
##  $ THERM.AUGUST.2010                  : int  952 10 179 272 991 438 476 284 260 3232 ...
##  $ THERM.SEPTEMBER.2010               : int  1780 12 170 368 994 439 382 288 246 3306 ...
##  $ THERM.OCTOBER.2010                 : int  1472 9 190 745 1254 565 459 301 323 3477 ...
##  $ THERM.NOVEMBER.2010                : int  1961 21 298 1260 2595 787 590 520 632 5898 ...
##  $ THERM.DECEMBER.2010                : int  4885 78 791 2901 7167 1538 971 1821 1919 14630 ...
##  $ TOTAL_THERMS                       : int  36822 755 6065 14653 51590 11217 9485 14139 14780 114547 ...
##  $ GAS.ACCOUNTS                       : chr  "21" "Less than 4" "6" "6" ...
##  $ KWH.TOTAL.SQFT                     : int  48825 3306 9472 14407 58835 8240 13305 16654 9690 127916 ...
##  $ THERMS.TOTAL.SQFT                  : int  48825 3306 9472 14407 58835 8240 13305 16654 10840 127916 ...
##  $ KWH.MEAN.2010                      : num  20103 20608 10170 16752 15271 ...
##  $ KWH.STANDARD.DEVIATION.2010        : num  8610 NA 4410 NA 8090 ...
##  $ KWH.MINIMUM.2010                   : int  9414 20608 5619 16752 5462 15929 7285 8496 5388 4397 ...
##  $ KWH.1ST.QUARTILE.2010              : num  12563 20608 6746 16752 10344 ...
##  $ KWH.2ND.QUARTILE.2010              : num  19073 20608 9056 16752 12427 ...
##  $ KWH.3RD.QUARTILE.2010              : num  22177 20608 13014 16752 17496 ...
##  $ KWH.MAXIMUM.2010                   : int  36781 20608 17530 16752 34236 29401 18015 16794 19735 39809 ...
##  $ KWH.SQFT.MEAN.2010                 : num  24413 3306 1579 14407 3677 ...
##  $ KWH.SQFT.STANDARD.DEVIATION.2010   : num  5699 NA 864 NA 1062 ...
##  $ KWH.SQFT.MINIMUM.2010              : int  20383 3306 1226 14407 2414 8240 13305 2448 1116 24751 ...
##  $ KWH.SQFT.1ST.QUARTILE.2010         : num  20383 3306 1226 14407 2546 ...
##  $ KWH.SQFT.2ND.QUARTILE.2010         : num  24413 3306 1226 14407 3554 ...
##  $ KWH.SQFT.3RD.QUARTILE.2010         : num  28442 3306 1226 14407 4692 ...
##  $ KWH.SQFT.MAXIMUM.2010              : int  28442 3306 3342 14407 5530 8240 13305 4554 1334 27975 ...
##  $ THERM.MEAN.2010                    : num  5260 755 1011 14653 3224 ...
##  $ THERM.STANDARD.DEVIATION.2010      : num  8436 NA 621 NA 1079 ...
##  $ THERM.MINIMUM.2010                 : int  882 755 496 14653 2071 1634 1866 2689 835 114 ...
##  $ THERM.1ST.QUARTILE.2010            : num  957 755 514 14653 2499 ...
##  $ THERM.2ND.QUARTILE.2010            : num  1102 755 836 14653 2934 ...
##  $ THERM.3RD.QUARTILE.2010            : num  8024 755 1240 14653 3594 ...
##  $ THERM.MAXIMUM.2010                 : int  23460 755 2144 14653 5754 9583 7619 2956 2372 28459 ...
##  $ THERMS.SQFT.MEAN.2010              : num  24413 3306 1579 14407 3677 ...
##  $ THERMS.SQFT.STANDARD.DEVIATION.2010: num  5699 NA 864 NA 1062 ...
##  $ THERMS.SQFT.MINIMUM.2010           : int  20383 3306 1226 14407 2414 8240 13305 2448 1116 24751 ...
##  $ THERMS.SQFT.1ST.QUARTILE.2010      : num  20383 3306 1226 14407 2546 ...
##  $ THERMS.SQFT.2ND.QUARTILE.2010      : num  24413 3306 1226 14407 3554 ...
##  $ THERMS.SQFT.3RD.QUARTILE.2010      : num  28442 3306 1226 14407 4692 ...
##  $ THERMS.SQFT.MAXIMUM.2010           : int  28442 3306 3342 14407 5530 8240 13305 4554 1334 27975 ...
##  $ TOTAL_POPULATION                   : int  132 132 132 228 228 231 231 231 231 456 ...
##  $ TOTAL.UNITS                        : int  64 64 64 79 79 70 70 70 70 180 ...
##  $ AVERAGE.STORIES                    : num  3 2 1.17 3 2.5 1 3 2.2 1 3 ...
##  $ AVERAGE.BUILDING.AGE               : num  65.5 86 14.3 86 87.7 ...
##  $ AVERAGE.HOUSESIZE                  : num  2.2 2.2 2.2 3.51 3.51 3.73 3.73 3.73 3.73 2.73 ...
##  $ OCCUPIED.UNITS                     : int  60 60 60 65 65 62 62 62 62 167 ...
##  $ OCCUPIED.UNITS.PERCENTAGE          : num  0.938 0.938 0.938 0.823 0.823 ...
##  $ RENTER.OCCUPIED.HOUSING.UNITS      : int  33 33 33 49 49 49 49 49 49 167 ...
##  $ RENTER.OCCUPIED.HOUSING.PERCENTAGE : num  0.55 0.55 0.55 0.754 0.754 0.79 0.79 0.79 0.79 1 ...
##  $ OCCUPIED.HOUSING.UNITS             : int  60 60 60 65 65 62 62 62 62 167 ...
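#Create a subset of "energy_raw" that contains only the variables used in the model (building type, total kWh, and total population), then remove rows with missing values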
energy_data0 <- subset(energy_raw, select = c(BUILDING_TYPE, TOTAL_KWH, TOTAL_POPULATION))
energy_data1 <- na.omit(energy_data0)
#Display the "head" and "tail" of the dataset, "energy_data1"
head(energy_data1)
##   BUILDING_TYPE TOTAL_KWH TOTAL_POPULATION
## 1   Residential    201032              132
## 2   Residential     20608              132
## 3   Residential     61020              132
## 4   Residential     16752              228
## 5   Residential    244341              228
## 6    Commercial     45330              231
tail(energy_data1)
##       BUILDING_TYPE TOTAL_KWH TOTAL_POPULATION
## 66969   Residential    139240              116
## 66970   Residential     27654              116
## 66971    Commercial     41977               31
## 66972   Residential     48850               31
## 66973   Residential     17707                0
## 66974   Residential     57736               77
#Display the summary statistics and the structure of the data
summary(energy_data1)
##  BUILDING_TYPE        TOTAL_KWH         TOTAL_POPULATION 
##  Length:66089       Min.   :      102   Min.   :   0.00  
##  Class :character   1st Qu.:    28189   1st Qu.:  37.00  
##  Mode  :character   Median :    62271   Median :  64.00  
##                     Mean   :   175675   Mean   :  83.81  
##                     3rd Qu.:   118156   3rd Qu.: 104.00  
##                     Max.   :231280522   Max.   :1590.00
str(energy_data1)
## 'data.frame':    66089 obs. of  3 variables:
##  $ BUILDING_TYPE   : chr  "Residential" "Residential" "Residential" "Residential" ...
##  $ TOTAL_KWH       : int  201032 20608 61020 16752 244341 45330 25300 65982 79663 369231 ...
##  $ TOTAL_POPULATION: int  132 132 132 228 228 231 231 231 231 456 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:885] 67 85 104 128 328 415 494 522 804 853 ...
##   .. ..- attr(*, "names")= chr [1:885] "67" "85" "104" "128" ...
#Transform 'BUILDING_TYPE' into a binary variable (1 represents residential buildings and 0 represents non-residential buildings, which could correspond to commercial or industrial buildings)
energy_data1$BUILDING_TYPE = as.character(energy_data1$BUILDING_TYPE)
energy_data1$BUILDING_TYPE[energy_data1$BUILDING_TYPE != "Residential"] = 0
energy_data1$BUILDING_TYPE[energy_data1$BUILDING_TYPE == "Residential"] = 1
#Categorize 'BUILDING_TYPE' as a factor and display its resulting levels
energy_data1$BUILDING_TYPE = as.factor(energy_data1$BUILDING_TYPE)
levels(energy_data1$BUILDING_TYPE)
## [1] "0" "1"
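
The recoding above can also be expressed in a single vectorized step. The following is a minimal sketch on a hypothetical four-element vector (`building_type` and `is_residential` are illustrative names, not objects from this analysis); it assumes the same coding as the code above, with 1 for residential and 0 for everything else.

```r
# Hypothetical example values; the real column is energy_data1$BUILDING_TYPE.
building_type <- c("Residential", "Commercial", "Industrial", "Residential")

# TRUE/FALSE -> 1/0, then stored as a factor with explicit levels "0" and "1".
is_residential <- factor(as.integer(building_type == "Residential"),
                         levels = c(0, 1))

levels(is_residential)
table(building_type, is_residential)
```

Building the indicator from a single logical comparison avoids the intermediate state in which the column holds a mix of "0" strings and the original labels.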

Data Selection for Hierarchical Multiple Linear Regression Model

Following the initial summary statistics, a hierarchical approach is used to develop a multiple logistic regression model. A U.S. Department of Energy document entitled “Energy Efficiency Trends in Residential and Commercial Buildings” [reference: http://apps1.eere.energy.gov/buildings/publications/pdfs/corporate/bt_stateindustry.pdf] indicates that a relationship exists between energy consumption, building type (residential, commercial, etc.), and building population. Using the “Energy Usage 2010” dataset, we therefore ask whether building type can be determined from energy consumption (in kilowatt-hours) and/or building population. Building type is treated as a dichotomous dependent variable, and both building population and energy consumption (in kilowatt-hours) are treated as continuous independent variables.

Description of the null hypothesis (H_0) and the alternate hypothesis (H_1)

Under this hierarchical approach, we test whether the variation observed in the dependent variable (‘BUILDING_TYPE’) can be explained by variation in either of the independent variables (‘TOTAL_KWH’ and ‘TOTAL_POPULATION’). The null hypothesis states that total energy consumption (in kilowatt-hours) and building population do not have a significant effect on the determination of building type (i.e., either residential or non-residential). Conversely, the alternate hypothesis states that total energy consumption (in kilowatt-hours) and building population do have a significant effect on the determination of building type. Our aim is an explanatory model that uses these independent variables to determine our dichotomous dependent variable.
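
Stated formally, and assuming the standard logistic regression form, the model and hypotheses can be written as follows. The symbols $p$, $\beta_0$, $\beta_1$, and $\beta_2$ are notation introduced here for illustration; $p$ denotes the probability that ‘BUILDING_TYPE’ equals 1.

```latex
\begin{align*}
\log\!\left(\frac{p}{1-p}\right) &= \beta_0 + \beta_1\,\text{TOTAL\_KWH} + \beta_2\,\text{TOTAL\_POPULATION} \\
H_0 &:\ \beta_1 = \beta_2 = 0 \\
H_1 &:\ \beta_1 \neq 0 \ \text{ or } \ \beta_2 \neq 0
\end{align*}
```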

2. The Linear Model (A Hierarchical Multiple Linear Logistic Regression Model)

Description of independent variables and dependent variable

In this experiment, a hierarchical multiple logistic regression model is generated to assess whether building type can be explained by each of the independent variables under consideration, and whether suppression is likely to be present in a model built from this data. The independent variables are total energy consumption (in kilowatt-hours) and building population; the dependent variable is building type, characterized as either residential or non-residential.

Power Analysis for Multiple Linear Regression Modeling

The “Energy Usage 2010” dataset originally contains 66,974 observations (66,089 after rows with missing values are removed). With a sample this large, even negligible effects are likely to register as statistically significant, so a power analysis is performed to determine an appropriate sample size for our final model (with a desired alpha level of 0.05, a desired power of 0.95, an effect size of 0.02, and 2 predictors).
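
As a rough illustration of how such a sample-size calculation works, the sketch below searches for the smallest sample size at which the overall F test of a two-predictor regression reaches the target power. It is an assumption-laden sketch: the function names `power_f2` and `min_sample_size` are hypothetical, the noncentrality parameter is assumed to be the effect size times the sample size, and this F-test framing is only one of several test families available in power software, so the value it returns is not guaranteed to match the figure used in this report.

```r
# Power of the overall F test for a regression with `predictors` regressors
# and `n` observations, given Cohen's f^2 effect size.
# Assumption: noncentrality parameter lambda = f2 * n.
power_f2 <- function(n, predictors, f2, alpha) {
  df1  <- predictors
  df2  <- n - predictors - 1
  crit <- qf(1 - alpha, df1, df2)            # critical value under H0
  pf(crit, df1, df2, ncp = f2 * n, lower.tail = FALSE)
}

# Smallest n whose power reaches the target, found by a simple upward search.
min_sample_size <- function(predictors, f2, alpha, target_power) {
  n <- predictors + 2                        # smallest n with df2 >= 1
  while (power_f2(n, predictors, f2, alpha) < target_power) {
    n <- n + 1
  }
  n
}

n_needed <- min_sample_size(predictors = 2, f2 = 0.02,
                            alpha = 0.05, target_power = 0.95)
n_needed
```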

#Generate an initial Hierarchical Multiple Logistic Regression Model that uses all 66,089 complete observations in "energy_data1"
energy_model <- glm(energy_data1$BUILDING_TYPE~energy_data1$TOTAL_KWH+energy_data1$TOTAL_POPULATION, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#Display summary of the initial Hierarchical Multiple Logistic Regression Model
summary(energy_model)
## 
## Call:
## glm(formula = energy_data1$BUILDING_TYPE ~ energy_data1$TOTAL_KWH + 
##     energy_data1$TOTAL_POPULATION, family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8184  -0.0002   0.7041   0.7457   5.0715  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    1.441e+00  1.389e-02  103.76   <2e-16 ***
## energy_data1$TOTAL_KWH        -2.062e-06  6.351e-08  -32.47   <2e-16 ***
## energy_data1$TOTAL_POPULATION -1.265e-03  1.073e-04  -11.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 74606  on 66088  degrees of freedom
## Residual deviance: 71640  on 66086  degrees of freedom
## AIC: 71646
## 
## Number of Fisher Scoring iterations: 8

Given this effect size, the software G*Power is used to determine an appropriate sample size for this analysis. G*Power returned a sample size of 543. The dataset “energy_data1” is therefore randomly sampled to create a new dataset of 543 observations, which is used to fit the final model and to determine whether building type can be explained by the variation in both energy consumption and building population.

#Randomly take a sample of 543 observations from "energy_data1", creating "energy_final".
S <- 543
set.seed(23)
energy.index <- sample(1:nrow(energy_data1),S,replace=FALSE)
energy_final <- energy_data1[energy.index,]
#Generate a new Hierarchical Multiple Logistic Regression Model that uses 543 observations
energy_model_final <- glm(energy_final$BUILDING_TYPE~energy_final$TOTAL_KWH+energy_final$TOTAL_POPULATION, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
#Display summary of the final Hierarchical Multiple Logistic Regression Model
summary(energy_model_final)
## 
## Call:
## glm(formula = energy_final$BUILDING_TYPE ~ energy_final$TOTAL_KWH + 
##     energy_final$TOTAL_POPULATION, family = "binomial")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.0014   0.5584   0.6022   0.6548   1.1704  
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    1.876e+00  1.799e-01  10.429   <2e-16 ***
## energy_final$TOTAL_KWH        -2.727e-06  9.221e-07  -2.957   0.0031 ** 
## energy_final$TOTAL_POPULATION -2.134e-03  1.348e-03  -1.583   0.1133    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 544.54  on 542  degrees of freedom
## Residual deviance: 510.97  on 540  degrees of freedom
## AIC: 516.97
## 
## Number of Fisher Scoring iterations: 7
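
The deviances and coefficients printed above are enough to compute an overall likelihood-ratio test and an odds ratio by hand. The sketch below copies those numbers from the summary; the variable names are illustrative, and the choice of a 100,000 kWh increment for the odds ratio is an assumption made only to give a readable scale.

```r
# Values copied from the printed summary of the 543-observation model.
null_deviance     <- 544.54   # on 542 degrees of freedom
residual_deviance <- 510.97   # on 540 degrees of freedom

# Likelihood-ratio test of H0: both slope coefficients are zero.
lr_statistic <- null_deviance - residual_deviance   # 33.57
lr_df        <- 542 - 540                           # 2
lr_p_value   <- pchisq(lr_statistic, df = lr_df, lower.tail = FALSE)

# Odds ratio for a 100,000 kWh increase in TOTAL_KWH (illustrative scale).
beta_kwh        <- -2.727e-06
odds_ratio_100k <- exp(beta_kwh * 1e5)              # roughly 0.76

lr_p_value
odds_ratio_100k
```

Because ‘BUILDING_TYPE’ is coded 1 for residential, an odds ratio below 1 indicates that higher consumption is associated with lower odds of a record being residential.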
#Collinearity Check
col.test <- lm(energy_final$TOTAL_KWH~energy_final$TOTAL_POPULATION)
summary(col.test)
## 
## Call:
## lm(formula = energy_final$TOTAL_KWH ~ energy_final$TOTAL_POPULATION)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2846881  -134912     3254   109245  9959580 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   -179506.9    35729.6  -5.024 6.88e-07 ***
## energy_final$TOTAL_POPULATION    4008.3      291.1  13.769  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 616800 on 541 degrees of freedom
## Multiple R-squared:  0.2595, Adjusted R-squared:  0.2581 
## F-statistic: 189.6 on 1 and 541 DF,  p-value: < 2.2e-16
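
With only two predictors, the R-squared from this auxiliary regression converts directly into a variance inflation factor (VIF). The short sketch below uses the R-squared printed above; the variable names are illustrative, and the threshold mentioned afterward is a common rule of thumb rather than something stated in the source.

```r
# R-squared copied from the collinearity regression summary above.
r_squared <- 0.2595

# Variance inflation factor shared by both predictors in a two-predictor model.
vif <- 1 / (1 - r_squared)       # about 1.35

# Implied correlation between the predictors (positive, since the slope is positive).
pearson_r <- sqrt(r_squared)     # about 0.51

vif
pearson_r
```

A VIF near 1.35 is well below the values of 5 or 10 that are often treated as signs of problematic collinearity.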

3. Diagnostic Plots

Before checking the model against the four “LINE” assumptions associated with linear regression modeling, histograms, boxplots, scatterplots, and a “Quality of Fit” plot (fitted vs. residual values) are generated to support a graphical interpretation.

#Generate histograms for both independent variables in our sampled data ('TOTAL_KWH' and 'TOTAL_POPULATION')
hist(energy_final$TOTAL_KWH, xlab = "Total Energy Consumption [in kilowatt-hours]", main = "Histogram of Total Energy Consumption")

hist(energy_final$TOTAL_POPULATION, xlab = "Total Building Population", main = "Histogram of Total Building Population")

#Generate a boxplot of the data (Independent Variable = Energy Consumption)
boxplot(x = energy_final$TOTAL_KWH, pch=21, bg="darkviolet", main="Total Energy Consumption", xlab = "Total Energy Consumption [in kilowatt-hours]")

#Generate a boxplot of the data (Independent Variable = Population)
boxplot(x = energy_final$TOTAL_POPULATION, pch=21, bg="darkviolet", main="Total Building Population", xlab = "Building Population")

#Generate a scatterplot of the data: "Building Type" vs. "Energy Consumption"
plot(y = energy_final$BUILDING_TYPE,x = energy_final$TOTAL_KWH, pch=21, bg="darkviolet", main="Total Energy Consumption vs. Building Type", ylab = "Building Type", xlab = "Energy Consumption (in kilowatt-hours)")

#Generate a scatterplot of the data: "Building Type" vs. "Building Population"
plot(y = energy_final$BUILDING_TYPE,x = energy_final$TOTAL_POPULATION, pch=21, bg="darkviolet", main="Total Building Population vs. Building Type", ylab = "Building Type", xlab = "Building Population")

#Create a "Quality of Fit Model" that plots the residuals of "energy_model_final" against its fitted model.
par(mfrow=c(1,1))
plot(fitted(energy_model_final),residuals(energy_model_final), main = "Residuals of 'energy_model_final' Against Fitted Model 'energy_model_final' [Not Standardized]")
abline(0,0, col='darkviolet', lwd=2.5)

#Create a "Quality of Fit Model" that plots the standardized residuals of "energy_model_final" against its fitted model.
par(mfrow=c(1,1))
standardized_energy_model <- rstandard(energy_model_final)
plot(fitted(energy_model_final),standardized_energy_model, main = "Standardized Residuals of 'energy_model_final' Against Fitted Model 'energy_model_final'")
abline(0,0, col='darkviolet', lwd=2.5)

4. Interpretation via LINE Assumptions

In interpreting our hierarchical multiple logistic regression model and the statistical significance of its results, it is important to test the model against the four “LINE” assumptions corresponding to linear regression.

1. The mean of the response at each set of values of the predictor is a linear function of the predictors (‘L’).

2. The errors are independent (‘I’).

3. The errors at each set of values of the predictor are normally distributed (‘N’).

4. The errors at each set of values of the predictor have equal variances (‘E’).

5. Interpretation of the Breusch-Pagan Test for Heteroscedasticity
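
As a hedged illustration of what this test computes, the sketch below implements the studentized (Koenker) form of the Breusch-Pagan statistic in base R. The test is defined for ordinary least-squares fits, so the sketch uses a simulated linear model rather than the logistic model in this report; all data and variable names here are hypothetical.

```r
set.seed(1)
n  <- 500
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
# Simulated response whose error spread grows with x1 (heteroscedastic by design).
y  <- 2 + 0.5 * x1 - x2 + rnorm(n, sd = x1)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the same predictors.
aux <- lm(residuals(fit)^2 ~ x1 + x2)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression,
# compared against a chi-square with df equal to the number of predictors.
bp_statistic <- n * summary(aux)$r.squared
bp_df        <- 2
bp_p_value   <- pchisq(bp_statistic, df = bp_df, lower.tail = FALSE)

bp_statistic
bp_p_value
```

A small p-value leads to rejecting the null hypothesis of constant error variance.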

6. Interpretation via Issues

In interpreting our hierarchical multiple logistic regression model and the statistical significance of its results, it is important to check the model against the four main issues surrounding linear regression.

1. Causality

2. Sample Sizes

3. Collinearity

4. Measurement Error

7. Final Interpretation of Explanatory Hierarchical Multiple Linear Logistic Regression Model