I am looking at customer and quotes datasets for some quick insights from the data. Quotes dataset has 100 records of quotes by solar installers.
quote<-read.csv("C:/Users/PGD/Downloads/quote.csv")
customer<-read.csv("C:/Users/PGD/Downloads/customer.csv")
length(unique(quote$property_id))
## [1] 40
summary(customer$state)
## AZ CA CT FL IL MA MD NC NJ NY PA TX
## 1 12 4 2 2 8 3 1 2 3 1 1
These 100 entries actually come in for 40 customers(properties) each receiving atleast 2 quotes upto a max of 4. Highest quotes are 20% for Panel_brand = ‘LG’ followed by ‘SolarWorld’ mainly for roof installation (98%). 30% customers are based out of California - a very sunny state followed by 20% from not so sunny Massachusetts.
library(psych)
describe(quote)
## vars n mean sd median trimmed
## property_id 1 100 19.65 11.09 20.00 19.55
## quote_id 2 100 1049.50 29.01 1049.50 1049.50
## num_quotes_received 3 100 2.73 0.78 3.00 2.66
## submitted_date* 4 100 35.21 19.76 36.00 35.19
## modified_date* 5 100 36.74 21.09 36.50 36.76
## est_install_date* 6 100 29.13 17.57 29.00 29.10
## system_size_wdc 7 100 9736.87 11533.97 6240.00 7316.56
## annual_prod_est_kwh 8 100 13361.22 17095.12 9418.50 9764.94
## percent_need_met 9 97 81.02 22.15 86.65 83.44
## panel_brand* 10 100 12.82 5.93 13.00 12.75
## inverter_type* 11 100 2.05 0.81 2.00 2.06
## inverter_brand* 12 100 7.13 3.06 8.00 7.26
## mount_location* 13 100 2.97 0.22 3.00 3.00
## mount_system* 14 100 1.00 0.00 1.00 1.00
## pur_offered* 15 100 1.93 0.26 2.00 2.00
## pur_gross_cost 16 93 33310.08 31614.92 23786.00 26993.09
## pur_rebate_value 17 19 4987.13 9160.27 2391.00 3124.91
## pur_state_tax_credit_value 18 32 2777.39 2551.73 1000.00 2370.30
## pur_itc_value 19 93 9687.36 9619.00 6881.40 7848.28
## pur_has_srecs_pbi* 20 100 2.31 0.60 2.00 2.35
## loan_offered* 21 100 1.59 0.49 2.00 1.61
## loan_down_payment 22 59 854.88 3073.52 0.00 72.45
## loan_amount 23 59 26045.69 27410.55 20125.00 21969.41
## loan_term 24 59 14.73 5.79 15.00 14.73
## loan_rate 25 59 5.41 1.91 5.99 5.60
## bridge_loan_principal 26 42 11126.41 5424.31 10054.67 10428.92
## bridge_loan_term 27 42 13.79 2.63 12.00 13.50
## lease_offered* 28 100 1.11 0.31 1.00 1.01
## ppa_offered* 29 100 1.09 0.29 1.00 1.00
## mad min max range skew
## property_id 13.34 1.00 40.00 39.00 0.08
## quote_id 37.06 1000.00 1099.00 99.00 0.00
## num_quotes_received 1.48 2.00 4.00 2.00 0.50
## submitted_date* 25.20 1.00 70.00 69.00 -0.02
## modified_date* 27.43 1.00 72.00 71.00 -0.02
## est_install_date* 22.24 1.00 59.00 58.00 -0.01
## system_size_wdc 2653.85 2835.00 71250.00 68415.00 4.07
## annual_prod_est_kwh 4717.63 3338.00 103977.00 100639.00 4.22
## percent_need_met 20.93 20.95 111.47 90.52 -0.86
## panel_brand* 8.15 1.00 23.00 22.00 0.02
## inverter_type* 1.48 1.00 3.00 2.00 -0.09
## inverter_brand* 2.97 1.00 12.00 11.00 -0.38
## mount_location* 0.00 1.00 3.00 2.00 -7.75
## mount_system* 0.00 1.00 1.00 0.00 NaN
## pur_offered* 0.00 1.00 2.00 1.00 -3.32
## pur_gross_cost 11413.80 12960.00 216922.77 203962.77 4.07
## pur_rebate_value 2062.30 0.00 41632.00 41632.00 3.33
## pur_state_tax_credit_value 0.00 0.00 10500.00 10500.00 1.27
## pur_itc_value 3228.21 -2973.60 65076.83 68050.43 3.96
## pur_has_srecs_pbi* 0.00 1.00 3.00 2.00 -0.23
## loan_offered* 0.00 1.00 2.00 1.00 -0.36
## loan_down_payment 0.00 0.00 16088.00 16088.00 4.19
## loan_amount 7960.08 6065.30 216922.77 210857.47 5.74
## loan_term 7.41 5.00 30.00 25.00 0.20
## loan_rate 0.76 0.00 8.99 8.99 -1.13
## bridge_loan_principal 5479.21 4556.00 26130.00 21574.00 1.02
## bridge_loan_term 0.00 12.00 18.00 6.00 0.82
## lease_offered* 0.00 1.00 2.00 1.00 2.46
## ppa_offered* 0.00 1.00 2.00 1.00 2.82
## kurtosis se
## property_id -1.16 1.11
## quote_id -1.24 2.90
## num_quotes_received -1.20 0.08
## submitted_date* -1.13 1.98
## modified_date* -1.22 2.11
## est_install_date* -1.32 1.76
## system_size_wdc 17.06 1153.40
## annual_prod_est_kwh 17.94 1709.51
## percent_need_met -0.14 2.25
## panel_brand* -1.12 0.59
## inverter_type* -1.48 0.08
## inverter_brand* -1.42 0.31
## mount_location* 61.83 0.02
## mount_system* NaN 0.00
## pur_offered* 9.11 0.03
## pur_gross_cost 18.15 3278.31
## pur_rebate_value 10.50 2101.51
## pur_state_tax_credit_value 0.98 451.09
## pur_itc_value 17.53 997.44
## pur_has_srecs_pbi* -0.67 0.06
## loan_offered* -1.89 0.05
## loan_down_payment 17.38 400.14
## loan_amount 36.94 3568.55
## loan_term -0.82 0.75
## loan_rate 1.17 0.25
## bridge_loan_principal 0.55 836.99
## bridge_loan_term -1.25 0.41
## lease_offered* 4.07 0.03
## ppa_offered* 6.03 0.03
summary(quote)
## property_id quote_id num_quotes_received submitted_date
## Min. : 1.00 Min. :1000 Min. :2.00 7/27/2015 : 4
## 1st Qu.:10.00 1st Qu.:1025 1st Qu.:2.00 11/16/2015: 3
## Median :20.00 Median :1050 Median :3.00 6/8/2015 : 3
## Mean :19.65 Mean :1050 Mean :2.73 7/31/2015 : 3
## 3rd Qu.:28.25 3rd Qu.:1074 3rd Qu.:3.00 10/12/2015: 2
## Max. :40.00 Max. :1099 Max. :4.00 10/14/2015: 2
## (Other) :83
## modified_date est_install_date system_size_wdc annual_prod_est_kwh
## 7/27/2015 : 4 10/1/2015 : 6 Min. : 2835 Min. : 3338
## 11/16/2015: 3 8/1/2015 : 6 1st Qu.: 4948 1st Qu.: 6539
## 6/8/2015 : 3 9/15/2015 : 5 Median : 6240 Median : 9418
## 7/31/2015 : 3 10/30/2015: 4 Mean : 9737 Mean : 13361
## 9/1/2015 : 3 11/30/2015: 4 3rd Qu.:10000 3rd Qu.: 13050
## 10/12/2015: 2 1/15/2016 : 3 Max. :71250 Max. :103977
## (Other) :82 (Other) :72
## percent_need_met panel_brand inverter_type
## Min. : 20.95 LG :20 Micro :30
## 1st Qu.: 67.61 SolarWorld :15 Optimizer:35
## Median : 86.65 Canadian Solar Inc. :10 String :35
## Mean : 81.02 Hanwha : 7
## 3rd Qu.: 99.17 CSUN - China Sunergy: 5
## Max. :111.47 Hyundai : 5
## NA's :3 (Other) :38
## inverter_brand mount_location mount_system
## SolarEdge Technologies:35 carport: 1 penetrating:100
## Enphase Energy :22 ground : 1
## SMA America :17 roof :98
## Fronius USA LLC : 7
## Power-One, Inc : 5
## LG : 4
## (Other) :10
## pur_offered pur_gross_cost pur_rebate_value pur_state_tax_credit_value
## no : 7 Min. : 12960 Min. : 0 Min. : 0
## yes:93 1st Qu.: 19440 1st Qu.: 1490 1st Qu.: 1000
## Median : 23786 Median : 2391 Median : 1000
## Mean : 33310 Mean : 4987 Mean : 2777
## 3rd Qu.: 35750 3rd Qu.: 5334 3rd Qu.: 4020
## Max. :216923 Max. :41632 Max. :10500
## NA's :7 NA's :81 NA's :68
## pur_itc_value pur_has_srecs_pbi loan_offered loan_down_payment
## Min. :-2974 : 7 no :41 Min. : 0.0
## 1st Qu.: 5325 no :55 yes:59 1st Qu.: 0.0
## Median : 6881 yes:38 Median : 0.0
## Mean : 9687 Mean : 854.9
## 3rd Qu.:10650 3rd Qu.: 0.0
## Max. :65077 Max. :16088.0
## NA's :7 NA's :41
## loan_amount loan_term loan_rate bridge_loan_principal
## Min. : 6065 Min. : 5.00 Min. :0.000 Min. : 4556
## 1st Qu.: 15100 1st Qu.:10.00 1st Qu.:4.975 1st Qu.: 6619
## Median : 20125 Median :15.00 Median :5.990 Median :10055
## Mean : 26046 Mean :14.73 Mean :5.409 Mean :11126
## 3rd Qu.: 27917 3rd Qu.:20.00 3rd Qu.:6.490 3rd Qu.:14625
## Max. :216923 Max. :30.00 Max. :8.990 Max. :26130
## NA's :41 NA's :41 NA's :41 NA's :58
## bridge_loan_term lease_offered ppa_offered
## Min. :12.00 no :89 no :91
## 1st Qu.:12.00 yes:11 yes: 9
## Median :12.00
## Mean :13.79
## 3rd Qu.:16.00
## Max. :18.00
## NA's :58
#str(quote)
Columns with more than 40% missing or NA values in quotes dataset such as: pur_rebate_value, pur_state_tax_credit_value, bridge_loan_principal, bridge_loan_term,, loan_down_payment,loan_amount, loan_term, loan_rate ; will not be of much significance for further analysis.
Time to installation -
hist(quote$annual_prod_est_kwh)
hist(quote$percent_need_met)
quote$actual_need_est_kwh<-(quote$annual_prod_est_kwh/quote$percent_need_met)*100
cust_q<-aggregate(quote[,c(1,30)],list(quote$property_id),FUN=mean)
df<-cbind(customer[,c("property.ID","state")],actual_need_est_kwh=cust_q$actual_need_est_kwh)
Total units these installations can produce:
pur_gross_cost
% of electricity bill covered - The given sample looks left skewed with a good no. of quotes claiming to generate atleast 80% of the needed electricity.
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
# re-order levels
reorder_size <- function(x) {
factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
ggplot(df, aes(x = reorder_size(`state`))) +
geom_bar() +
xlab("State") +
# scale_y_continuous(labels = scales::percent, name = "Proportion") +
# facet_grid(`Marital Status` ~ Gender) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(quote, aes(x = reorder_size(`inverter_type`),y=quote$annual_prod_est_kwh/1000)) +
geom_bar(stat = "identity") +
xlab("Inverter Type") +
# title("Estimated energy Production by each solar installation") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(quote, aes(x = reorder_size(`inverter_type`),y=quote$pur_gross_cost)) +
geom_bar(stat = "identity") +
# , aes(y = (..count..)/sum(..count..))) +
xlab("Inverter Type") +
# scale_y_continuous(labels = scales::percent, name = "Proportion") +
# facet_grid( ~ state) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Warning: Removed 7 rows containing missing values (position_stack).
ggplot(quote, aes(pur_gross_cost, panel_brand, colour = inverter_type)) +
geom_point()
## Warning: Removed 7 rows containing missing values (geom_point).