Solar Installations data

I am looking at customer and quotes datasets for some quick insights from the data. Quotes dataset has 100 records of quotes by solar installers.

quote<-read.csv("C:/Users/PGD/Downloads/quote.csv")
customer<-read.csv("C:/Users/PGD/Downloads/customer.csv")
length(unique(quote$property_id))
## [1] 40
summary(customer$state)
## AZ CA CT FL IL MA MD NC NJ NY PA TX 
##  1 12  4  2  2  8  3  1  2  3  1  1

These 100 entries actually come in for 40 customers(properties) each receiving atleast 2 quotes upto a max of 4. Highest quotes are 20% for Panel_brand = ‘LG’ followed by ‘SolarWorld’ mainly for roof installation (98%). 30% customers are based out of California - a very sunny state followed by 20% from not so sunny Massachusetts.

library(psych)
describe(quote)
##                            vars   n     mean       sd   median  trimmed
## property_id                   1 100    19.65    11.09    20.00    19.55
## quote_id                      2 100  1049.50    29.01  1049.50  1049.50
## num_quotes_received           3 100     2.73     0.78     3.00     2.66
## submitted_date*               4 100    35.21    19.76    36.00    35.19
## modified_date*                5 100    36.74    21.09    36.50    36.76
## est_install_date*             6 100    29.13    17.57    29.00    29.10
## system_size_wdc               7 100  9736.87 11533.97  6240.00  7316.56
## annual_prod_est_kwh           8 100 13361.22 17095.12  9418.50  9764.94
## percent_need_met              9  97    81.02    22.15    86.65    83.44
## panel_brand*                 10 100    12.82     5.93    13.00    12.75
## inverter_type*               11 100     2.05     0.81     2.00     2.06
## inverter_brand*              12 100     7.13     3.06     8.00     7.26
## mount_location*              13 100     2.97     0.22     3.00     3.00
## mount_system*                14 100     1.00     0.00     1.00     1.00
## pur_offered*                 15 100     1.93     0.26     2.00     2.00
## pur_gross_cost               16  93 33310.08 31614.92 23786.00 26993.09
## pur_rebate_value             17  19  4987.13  9160.27  2391.00  3124.91
## pur_state_tax_credit_value   18  32  2777.39  2551.73  1000.00  2370.30
## pur_itc_value                19  93  9687.36  9619.00  6881.40  7848.28
## pur_has_srecs_pbi*           20 100     2.31     0.60     2.00     2.35
## loan_offered*                21 100     1.59     0.49     2.00     1.61
## loan_down_payment            22  59   854.88  3073.52     0.00    72.45
## loan_amount                  23  59 26045.69 27410.55 20125.00 21969.41
## loan_term                    24  59    14.73     5.79    15.00    14.73
## loan_rate                    25  59     5.41     1.91     5.99     5.60
## bridge_loan_principal        26  42 11126.41  5424.31 10054.67 10428.92
## bridge_loan_term             27  42    13.79     2.63    12.00    13.50
## lease_offered*               28 100     1.11     0.31     1.00     1.01
## ppa_offered*                 29 100     1.09     0.29     1.00     1.00
##                                 mad      min       max     range  skew
## property_id                   13.34     1.00     40.00     39.00  0.08
## quote_id                      37.06  1000.00   1099.00     99.00  0.00
## num_quotes_received            1.48     2.00      4.00      2.00  0.50
## submitted_date*               25.20     1.00     70.00     69.00 -0.02
## modified_date*                27.43     1.00     72.00     71.00 -0.02
## est_install_date*             22.24     1.00     59.00     58.00 -0.01
## system_size_wdc             2653.85  2835.00  71250.00  68415.00  4.07
## annual_prod_est_kwh         4717.63  3338.00 103977.00 100639.00  4.22
## percent_need_met              20.93    20.95    111.47     90.52 -0.86
## panel_brand*                   8.15     1.00     23.00     22.00  0.02
## inverter_type*                 1.48     1.00      3.00      2.00 -0.09
## inverter_brand*                2.97     1.00     12.00     11.00 -0.38
## mount_location*                0.00     1.00      3.00      2.00 -7.75
## mount_system*                  0.00     1.00      1.00      0.00   NaN
## pur_offered*                   0.00     1.00      2.00      1.00 -3.32
## pur_gross_cost             11413.80 12960.00 216922.77 203962.77  4.07
## pur_rebate_value            2062.30     0.00  41632.00  41632.00  3.33
## pur_state_tax_credit_value     0.00     0.00  10500.00  10500.00  1.27
## pur_itc_value               3228.21 -2973.60  65076.83  68050.43  3.96
## pur_has_srecs_pbi*             0.00     1.00      3.00      2.00 -0.23
## loan_offered*                  0.00     1.00      2.00      1.00 -0.36
## loan_down_payment              0.00     0.00  16088.00  16088.00  4.19
## loan_amount                 7960.08  6065.30 216922.77 210857.47  5.74
## loan_term                      7.41     5.00     30.00     25.00  0.20
## loan_rate                      0.76     0.00      8.99      8.99 -1.13
## bridge_loan_principal       5479.21  4556.00  26130.00  21574.00  1.02
## bridge_loan_term               0.00    12.00     18.00      6.00  0.82
## lease_offered*                 0.00     1.00      2.00      1.00  2.46
## ppa_offered*                   0.00     1.00      2.00      1.00  2.82
##                            kurtosis      se
## property_id                   -1.16    1.11
## quote_id                      -1.24    2.90
## num_quotes_received           -1.20    0.08
## submitted_date*               -1.13    1.98
## modified_date*                -1.22    2.11
## est_install_date*             -1.32    1.76
## system_size_wdc               17.06 1153.40
## annual_prod_est_kwh           17.94 1709.51
## percent_need_met              -0.14    2.25
## panel_brand*                  -1.12    0.59
## inverter_type*                -1.48    0.08
## inverter_brand*               -1.42    0.31
## mount_location*               61.83    0.02
## mount_system*                   NaN    0.00
## pur_offered*                   9.11    0.03
## pur_gross_cost                18.15 3278.31
## pur_rebate_value              10.50 2101.51
## pur_state_tax_credit_value     0.98  451.09
## pur_itc_value                 17.53  997.44
## pur_has_srecs_pbi*            -0.67    0.06
## loan_offered*                 -1.89    0.05
## loan_down_payment             17.38  400.14
## loan_amount                   36.94 3568.55
## loan_term                     -0.82    0.75
## loan_rate                      1.17    0.25
## bridge_loan_principal          0.55  836.99
## bridge_loan_term              -1.25    0.41
## lease_offered*                 4.07    0.03
## ppa_offered*                   6.03    0.03
summary(quote)
##   property_id       quote_id    num_quotes_received    submitted_date
##  Min.   : 1.00   Min.   :1000   Min.   :2.00        7/27/2015 : 4    
##  1st Qu.:10.00   1st Qu.:1025   1st Qu.:2.00        11/16/2015: 3    
##  Median :20.00   Median :1050   Median :3.00        6/8/2015  : 3    
##  Mean   :19.65   Mean   :1050   Mean   :2.73        7/31/2015 : 3    
##  3rd Qu.:28.25   3rd Qu.:1074   3rd Qu.:3.00        10/12/2015: 2    
##  Max.   :40.00   Max.   :1099   Max.   :4.00        10/14/2015: 2    
##                                                     (Other)   :83    
##     modified_date   est_install_date system_size_wdc annual_prod_est_kwh
##  7/27/2015 : 4    10/1/2015 : 6      Min.   : 2835   Min.   :  3338     
##  11/16/2015: 3    8/1/2015  : 6      1st Qu.: 4948   1st Qu.:  6539     
##  6/8/2015  : 3    9/15/2015 : 5      Median : 6240   Median :  9418     
##  7/31/2015 : 3    10/30/2015: 4      Mean   : 9737   Mean   : 13361     
##  9/1/2015  : 3    11/30/2015: 4      3rd Qu.:10000   3rd Qu.: 13050     
##  10/12/2015: 2    1/15/2016 : 3      Max.   :71250   Max.   :103977     
##  (Other)   :82    (Other)   :72                                         
##  percent_need_met               panel_brand   inverter_type
##  Min.   : 20.95   LG                  :20   Micro    :30   
##  1st Qu.: 67.61   SolarWorld          :15   Optimizer:35   
##  Median : 86.65   Canadian Solar Inc. :10   String   :35   
##  Mean   : 81.02   Hanwha              : 7                  
##  3rd Qu.: 99.17   CSUN - China Sunergy: 5                  
##  Max.   :111.47   Hyundai             : 5                  
##  NA's   :3        (Other)             :38                  
##                 inverter_brand mount_location      mount_system
##  SolarEdge Technologies:35     carport: 1     penetrating:100  
##  Enphase Energy        :22     ground : 1                      
##  SMA America           :17     roof   :98                      
##  Fronius USA LLC       : 7                                     
##  Power-One, Inc        : 5                                     
##  LG                    : 4                                     
##  (Other)               :10                                     
##  pur_offered pur_gross_cost   pur_rebate_value pur_state_tax_credit_value
##  no : 7      Min.   : 12960   Min.   :    0    Min.   :    0             
##  yes:93      1st Qu.: 19440   1st Qu.: 1490    1st Qu.: 1000             
##              Median : 23786   Median : 2391    Median : 1000             
##              Mean   : 33310   Mean   : 4987    Mean   : 2777             
##              3rd Qu.: 35750   3rd Qu.: 5334    3rd Qu.: 4020             
##              Max.   :216923   Max.   :41632    Max.   :10500             
##              NA's   :7        NA's   :81       NA's   :68                
##  pur_itc_value   pur_has_srecs_pbi loan_offered loan_down_payment
##  Min.   :-2974      : 7            no :41       Min.   :    0.0  
##  1st Qu.: 5325   no :55            yes:59       1st Qu.:    0.0  
##  Median : 6881   yes:38                         Median :    0.0  
##  Mean   : 9687                                  Mean   :  854.9  
##  3rd Qu.:10650                                  3rd Qu.:    0.0  
##  Max.   :65077                                  Max.   :16088.0  
##  NA's   :7                                      NA's   :41       
##   loan_amount       loan_term       loan_rate     bridge_loan_principal
##  Min.   :  6065   Min.   : 5.00   Min.   :0.000   Min.   : 4556        
##  1st Qu.: 15100   1st Qu.:10.00   1st Qu.:4.975   1st Qu.: 6619        
##  Median : 20125   Median :15.00   Median :5.990   Median :10055        
##  Mean   : 26046   Mean   :14.73   Mean   :5.409   Mean   :11126        
##  3rd Qu.: 27917   3rd Qu.:20.00   3rd Qu.:6.490   3rd Qu.:14625        
##  Max.   :216923   Max.   :30.00   Max.   :8.990   Max.   :26130        
##  NA's   :41       NA's   :41      NA's   :41      NA's   :58           
##  bridge_loan_term lease_offered ppa_offered
##  Min.   :12.00    no :89        no :91     
##  1st Qu.:12.00    yes:11        yes: 9     
##  Median :12.00                             
##  Mean   :13.79                             
##  3rd Qu.:16.00                             
##  Max.   :18.00                             
##  NA's   :58
#str(quote)

Columns with more than 40% missing or NA values in quotes dataset such as: pur_rebate_value, pur_state_tax_credit_value, bridge_loan_principal, bridge_loan_term,, loan_down_payment,loan_amount, loan_term, loan_rate ; will not be of much significance for further analysis.

Time to installation -

hist(quote$annual_prod_est_kwh)

hist(quote$percent_need_met)

quote$actual_need_est_kwh<-(quote$annual_prod_est_kwh/quote$percent_need_met)*100

cust_q<-aggregate(quote[,c(1,30)],list(quote$property_id),FUN=mean)

df<-cbind(customer[,c("property.ID","state")],actual_need_est_kwh=cust_q$actual_need_est_kwh)

Total units these installations can produce:

pur_gross_cost

% of electricity bill covered - The given sample looks left skewed with a good no. of quotes claiming to generate atleast 80% of the needed electricity.

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
# re-order levels
reorder_size <- function(x) {
        factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}

ggplot(df, aes(x = reorder_size(`state`))) +
        geom_bar() +
        xlab("State") +
#        scale_y_continuous(labels = scales::percent, name = "Proportion") +
#        facet_grid(`Marital Status` ~ Gender) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(quote, aes(x = reorder_size(`inverter_type`),y=quote$annual_prod_est_kwh/1000)) +
        geom_bar(stat = "identity") + 
        xlab("Inverter Type") + 
#        title("Estimated energy Production by each solar installation") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(quote, aes(x = reorder_size(`inverter_type`),y=quote$pur_gross_cost)) +
        geom_bar(stat = "identity") + 
#                 , aes(y = (..count..)/sum(..count..))) +
        xlab("Inverter Type") + 
#        scale_y_continuous(labels = scales::percent, name = "Proportion") +
#        facet_grid( ~ state) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 7 rows containing missing values (position_stack).

ggplot(quote, aes(pur_gross_cost, panel_brand, colour = inverter_type)) + 
  geom_point()
## Warning: Removed 7 rows containing missing values (geom_point).