Executive Summary

This project is to explore the transaction datasets of 2009 US Gorvenment C.A.R.S (also as known as “Cash for Clunkder”) program. During this program, consumers turned in gas guzzlers and bought nearly 700,000 more fuel-efficient vehicles in fewer than 30 days. NHTSA website provides all the data collected from the program. In this analysis, we first try to create two metrics to identify 10 most successful states and 10 least successful states during the C.A.R.S program. Next, we would like to see whether the fuel-efficient car peference differs among different regions consumers by using the established metrics. Furthermore, by exploring the dataset, we find some behavioral patterns that can help us understand how consumers buy new cars. Lastly, we try to answer whether the data from the NHTSA sufficiently support that the program was “wildly successful” by the government. Although we only analyze Final Paid Transaction Data, other data and materials are also used to answer the last question.

Data Import

FinPaid_Trans <- read.delim("CARS_PAID_Final.txt", sep=",", header=T, na.strings="NA", dec=".", strip.white=TRUE)

I. 10 Most Successful States & 10 Least Successful States

In order to identify 10 most successful states and 10 least successful states, two metrics will be used:

1) Trade-in Count by State

2) Total MPG Increase by States

1. Trade-in Count by State

This metric is to count the transaction on the state level. The performance of each state depends on transaction amount, and more transactions occured means more successful of the C.A.R.S program in the state.

  1. Calculate Transaction Count

“Transaction Count” Summary Statistics

## transaction_State_Count$transaction_Count 
##       n missing  unique    Info    Mean     .05     .10     .25     .50 
##      55       0      55       1   12313   112.2   811.8  2703.0  8524.0 
##     .75     .90     .95 
## 16524.0 32573.4 34997.5 
## 
## lowest :     7    13    17   153   505
## highest: 33950 34246 36751 42949 76672
  1. Top 10 States
transaction_State_Count_top10
##           State Count
## 1    California 76672
## 2         Texas 42949
## 3      New York 36751
## 4       Florida 34246
## 5      Illinois 33950
## 6  Pennsylvania 32879
## 7          Ohio 32115
## 8      Michigan 31180
## 9    New Jersey 24538
## 10     Virginia 23555
  1. Last 10 States
transaction_State_Count_least10
##                       State Count
## 1  Northern Mariana Islands     7
## 2            Virgin Islands    13
## 3      District of Columbia    17
## 4                      Guam   153
## 5               Puerto Rico   505
## 6                   Wyoming   591
## 7                    Alaska  1143
## 8                   Montana  1469
## 9                    Hawaii  1717
## 10             North Dakota  2102

2. Total MPG Increase by States

This metric is to sum the MPG change (MPG difference between new vehicle and trade-in vehicle) of each transaction on the state level. First 10 states with greatest MPG increase are the top 10 most successful states, and last 10 successful states are with least MPG increase.

  1. Calculate the “MPG Change”

“Total MPG Change” Summary Statistics

## mpg_change_State_Sum$mpg_change_Sum 
##        n  missing   unique     Info     Mean      .05      .10      .25 
##       55        0       55        1   112944    983.3   6795.8  23840.5 
##      .50      .75      .90      .95 
##  76629.0 146145.5 290191.2 318967.9 
## 
## lowest :     69     90    158   1337   4554
## highest: 299140 314905 328448 380140 800064
  1. Top 10 States
mpg_change_State_Sum_top10
##           State MPG Increase
## 1    California       800064
## 2         Texas       380140
## 3      New York       328448
## 4       Florida       314905
## 5      Illinois       299140
## 6  Pennsylvania       291306
## 7          Ohio       288519
## 8      Michigan       256893
## 9      Virginia       224187
## 10   New Jersey       219168
  1. Last 10 States
mpg_change_State_Sum_least10
##                       State MPG Increase
## 1  Northern Mariana Islands           69
## 2            Virgin Islands           90
## 3      District of Columbia          158
## 4                      Guam         1337
## 5               Puerto Rico         4554
## 6                   Wyoming         5123
## 7                    Alaska         9305
## 8                   Montana        14045
## 9                    Hawaii        15408
## 10             North Dakota        17993

Both metrics come to the same conclusion: California, Texas, New York, Florida, Illinois, Pennsylvania, Ohio, Michigan, New Jersey and Virginia are the top 10 most successful states; while Northern Mariana Islands, Virgin Islands, District of Columbia, Guam, Puerto Rico, Wyoming, Alaska, Montana, Hawaii and North Dakota are top 10 least successful states.

II. West Coast Consumers Purchase More Fuel Efficient Cars?

Map-1: Trade-in Count by States Map

Map-2: Total MPG Increase by States Map

III. Consumer Behavioral Patterns

1. New Vehicle Category Purchase Patterns

1) First, we want to see whether there is association between trade-in vehicle category and new vehicle category by conducting Chi-Square test. The null hypothesis is that the trade-in car category does not affect the new car category.

Chi-Square Test Results:

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  677238 
## 
##  
##                                     | Paid_Cars$new_vehicle_category 
## Paid_Cars$trade_in_vehicle_category |         1 |         2 |         3 |         P | Row Total | 
## ------------------------------------|-----------|-----------|-----------|-----------|-----------|
##                                   1 |    182473 |       235 |         6 |    265098 |    447812 | 
##                                     |     0.407 |     0.001 |     0.000 |     0.592 |     0.661 | 
##                                     |     0.792 |     0.005 |     0.003 |     0.668 |           | 
## ------------------------------------|-----------|-----------|-----------|-----------|-----------|
##                                   2 |     31372 |     42289 |        19 |     45909 |    119589 | 
##                                     |     0.262 |     0.354 |     0.000 |     0.384 |     0.177 | 
##                                     |     0.136 |     0.890 |     0.008 |     0.116 |           | 
## ------------------------------------|-----------|-----------|-----------|-----------|-----------|
##                                   3 |       184 |      4990 |      2253 |       149 |      7576 | 
##                                     |     0.024 |     0.659 |     0.297 |     0.020 |     0.011 | 
##                                     |     0.001 |     0.105 |     0.989 |     0.000 |           | 
## ------------------------------------|-----------|-----------|-----------|-----------|-----------|
##                                   P |     16436 |        22 |         0 |     85803 |    102261 | 
##                                     |     0.161 |     0.000 |     0.000 |     0.839 |     0.151 | 
##                                     |     0.071 |     0.000 |     0.000 |     0.216 |           | 
## ------------------------------------|-----------|-----------|-----------|-----------|-----------|
##                        Column Total |    230465 |     47536 |      2278 |    396959 |    677238 | 
##                                     |     0.340 |     0.070 |     0.003 |     0.586 |           | 
## ------------------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  452923.1     d.f. =  9     p =  0 
## 
## 
## 
Since the p value is 0 (Chi^2 = 452923.1), which is significant at the 5% level. We can conclude that the trade-in car category does affect the new car category, therefore, there is an association between trade-in vehicle category and new vehicle category.
2) Then we would like to see if there are some purchase patterns in the vehicle category when consumers pick their new cars.

Barplot-1: Percentage of New Vehicle Category by Trade-in Vehicle Category

From above barplot, we can find below patterns:
<1> All categories car owners more prefer passenger car, except for category-3 car owners. Because people who want to upgrade to a fuel economic car will more likely to participate the C.A.R.S program. Also, C.A.R.S program rules more favor compact and fuel-efficient car type.
<2> When consumers trade their category-1 car, only 40% of them buy a new category-1 car, and almost 60% choose passenger car. One of the possible reasons is the similar functionality between the passenger car and category-1 car but better fuel-efficency in passenger car. We can see the similar preference change in the category-3 trade-in owners, who more prefer to the category-2 new car.
<3> More than 80% passenger car owners still purchase a passenger car. Because for most passenger car owners, passenger car is good for their daily commute.

2. New Vehicle Maker Origin Purchase Patterns

1) Based on the makder of origin, we categorize the makers into FIVE groups: USA, Japan Europe, Korean and Other (other countries or unkown origin). Below is the summary table:

Origin of Maker Dataset Summary Table

(“Origin of Trade-in Vehicle” as Row and “Origin of New Vehicle” as Column)
##         
##          Europe  Japan  Korea  Other    USA
##   Europe   1894  10489   1770      0   2541
##   Japan    3112  58633   8498      0  11267
##   Korea      77   1739   1170      0    766
##   Other       1     18      3      0     10
##   USA     14481 248462  66357      0 245950
2) Due to untracebility and small size on “Other” group, we exclude “Other” group from our further analysis. As previous analysis, we then use the Chi-Square test to check the independency between trade-in vehicle origin and new vehicle origin. The null hypothesis is that the trade-in car origin does not affect the new car origin.

Origin of Maker Dataset without “Other”" Group Summary Table

(“Origin of Trade-in Vehicle” as Row and “Origin of New Vehicle” as Column)
##         
##          Europe  Japan  Korea    USA
##   Europe   1894  10489   1770   2541
##   Japan    3112  58633   8498  11267
##   Korea      77   1739   1170    766
##   USA     14481 248462  66357 245950

Chi-Square Test Results:

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  677206 
## 
##  
##                                | Paid_Cars_new$new_vehicle_country 
## Paid_Cars_new$trade_in_country |    Europe |     Japan |     Korea |       USA | Row Total | 
## -------------------------------|-----------|-----------|-----------|-----------|-----------|
##                         Europe |      1894 |     10489 |      1770 |      2541 |     16694 | 
##                                |     0.113 |     0.628 |     0.106 |     0.152 |     0.025 | 
##                                |     0.097 |     0.033 |     0.023 |     0.010 |           | 
## -------------------------------|-----------|-----------|-----------|-----------|-----------|
##                          Japan |      3112 |     58633 |      8498 |     11267 |     81510 | 
##                                |     0.038 |     0.719 |     0.104 |     0.138 |     0.120 | 
##                                |     0.159 |     0.184 |     0.109 |     0.043 |           | 
## -------------------------------|-----------|-----------|-----------|-----------|-----------|
##                          Korea |        77 |      1739 |      1170 |       766 |      3752 | 
##                                |     0.021 |     0.463 |     0.312 |     0.204 |     0.006 | 
##                                |     0.004 |     0.005 |     0.015 |     0.003 |           | 
## -------------------------------|-----------|-----------|-----------|-----------|-----------|
##                            USA |     14481 |    248462 |     66357 |    245950 |    575250 | 
##                                |     0.025 |     0.432 |     0.115 |     0.428 |     0.849 | 
##                                |     0.740 |     0.778 |     0.853 |     0.944 |           | 
## -------------------------------|-----------|-----------|-----------|-----------|-----------|
##                   Column Total |     19564 |    319323 |     77795 |    260524 |    677206 | 
##                                |     0.029 |     0.472 |     0.115 |     0.385 |           | 
## -------------------------------|-----------|-----------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  37699.88     d.f. =  9     p =  0 
## 
## 
## 
Since the p value is 0 (Chi^2 = 37699.88), which is significant at the 5% level. We can conclude that the trade-in car origin does affect the the new car origin, therefore, there is an association between trade-in car origin and new car origin.
3) Then we would like to see if there are purchase patterns of maker origin that can help us understand how consumers choose their new cars.

Barplot-2: Percentage of Origin of New Vehicle by Origin of Trade-in Vehicle

New Vehicle Average MPG (Mile Per Gallon) Based on Origin of Vehicle Table

##   Origin of Maker Average MPG
## 1          Europe       26.98
## 2           Japan       26.59
## 3           Korea       25.48
## 4             USA       22.43

New Vehicle Average MSRP (Manufacturer Suggested Retail Price) Based on Origin of Vehicle Table

##   Origin of Maker Average MSRP
## 1           Korea        19007
## 2           Japan        22163
## 3             USA        23630
## 4          Europe        25206

IV. “Wildly Successful” by Government?

I think there is sufficient data from NHTSA to support that conclusion. The transaction data from NHTSA shows that consumers are very likely to trade-in their gas cluncker and buy a fuel-efficient car due to C.A.R.S program. NHTSA also conducted a consumer survey that provides more information about consumer trade-in decision without C.A.R.S program. Question 1 of the consumer survey indicates the number of years the consumer would have delayed the resale or trade of the trade-in vehicle. The number of vehicles sold not as a result of the program can be identified from Question 1, if consumer indicates they would have purchased a vehicle this month without the incentive. Question 3 indicates whether consumer trades a primary vehicle or a rarely used vehicle. Consumer surveys can help to eliminate any interaction effects by other factors in the analysis. Consumer surveys results in the report available on NHTSA website show that, “88 percent indicated the main incentive for purchasing a vehicle in the current month was the CARS program. This correlates to roughly 597,950 sales due to the CARS Act.” These data and results prove government’s conclusion.

APPENDIX

1. Structure of Final Paid Transaction Dataset

## 'data.frame':    677238 obs. of  43 variables:
##  $ vendor_id                     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ dealer_name                   : Factor w/ 17020 levels "#N/A","0","10-2002  LLC",..: 7366 7366 7366 7366 7366 7366 7366 7366 7366 7366 ...
##  $ address_line1                 : Factor w/ 16676 levels "# 1 Swope AuroCenter Dr",..: 11538 11538 11538 11538 11538 11538 11538 11538 11538 11538 ...
##  $ address_line2                 : Factor w/ 2038 levels "","(M-99)","`",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ address_line3                 : Factor w/ 108 levels "","`","100 Blackheath Drive",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ address_line4                 : Factor w/ 27 levels "","1012 Fox Creek Circle",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ city                          : Factor w/ 4103 levels "Abbeville","Abbotsford",..: 2270 2270 2270 2270 2270 2270 2270 2270 2270 2270 ...
##  $ state                         : Factor w/ 55 levels "AK","AL","AR",..: 39 39 39 39 39 39 39 39 39 39 ...
##  $ ZIP                           : int  73110 73110 73110 73110 73110 73110 73110 73110 73110 73110 ...
##  $ area_code                     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ phone                         : Factor w/ 13598 levels "","1-800-528-8993",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ invoice_id                    : int  844935 380253 129314 726314 817520 297526 1027993 378066 952741 388873 ...
##  $ invoice_num                   : Factor w/ 480465 levels "_35439","-106086",..: 454011 453260 453298 453305 453908 453914 454936 453981 453924 453932 ...
##  $ invoice_date                  : Factor w/ 518 levels "1-Apr-09","1-Aug-06",..: 203 375 396 119 170 4 251 375 236 375 ...
##  $ sale_date                     : Factor w/ 99 levels "1-Aug-09","1-Jul-09",..: 22 80 29 5 18 19 48 71 5 1 ...
##  $ disposal_status               : Factor w/ 5 levels "CRUSH","No Entity",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ disposal_facility_nmvtis_id   : int  3101 3640 3640 3640 3640 3640 3101 3640 3640 3640 ...
##  $ disposal_facility_contact_info: Factor w/ 3694 levels "","#1 AUTO","#1 AUTO SALVAGE",..: 1462 1462 1462 1462 1462 1462 1462 1462 1462 1462 ...
##  $ sales_type                    : Factor w/ 2 levels "LEASED","PURCHASED": 2 2 2 2 2 2 2 2 2 2 ...
##  $ invoice_amount                : int  4500 4500 4500 4500 4500 4500 4500 4500 4500 4500 ...
##  $ trade_in_VIN                  : Factor w/ 677222 levels "/G6KD52YXTU229685",..: 3340 8204 16100 16125 21585 21749 24009 27842 33194 34882 ...
##  $ trade_in_vehicle_category     : Factor w/ 4 levels "1","2","3","P": 1 1 1 1 1 1 1 1 1 1 ...
##  $ trade_in_make                 : Factor w/ 60 levels "Acura","Alfa Romeo",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ trade_in_model                : Factor w/ 952 levels "1-Ton Truck 2WD",..: 272 272 350 350 351 351 163 331 331 331 ...
##  $ trade_in_year                 : int  1994 1997 1999 1999 2000 2000 1984 1995 1999 1997 ...
##  $ trade_in_vehicle_drive_train  : Factor w/ 1943 levels " 0.0L, 8 cyl, Automatic 4-spd, RWD, (MPFI)     (NO-CAT), Diesel ",..: 624 499 1273 1273 1271 1271 1092 650 651 644 ...
##  $ trade_in_mileage              : int  17 18 13 13 12 12 12 17 17 16 ...
##  $ trade_in_title_state          : Factor w/ 60 levels "AK","AL","AR",..: 42 42 42 42 42 42 42 42 42 42 ...
##  $ trade_in_registration_state   : Factor w/ 60 levels "AK","AL","AR",..: 42 42 42 42 42 42 42 42 42 42 ...
##  $ trade_in_registration_start   : Factor w/ 8971 levels "1-Apr-00","1-Apr-01",..: 4449 4198 4294 7319 670 5479 3065 6265 2304 1049 ...
##  $ trade_in_registration_end     : Factor w/ 2543 levels "1-Apr-06","1-Apr-07",..: 1985 1963 1863 2008 1986 2075 2008 55 1608 2025 ...
##  $ trade_in_insurance_start      : Factor w/ 8285 levels "","1-Apr-00",..: 1 1 7109 1 1 1 1 1 1 1 ...
##  $ trade_in_NMVTIS_flag          : Factor w/ 1626 levels "","1-Apr-03",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ trade_in_odometer_reading     : int  166000 231519 130648 178733 93105 169662 68102 175000 162545 106190 ...
##  $ new_vehicle_VIN_trunc         : Factor w/ 25436 levels "!FMCU49329KXXXXXX",..: 8258 8195 21160 15641 15687 22222 15378 21031 8229 20895 ...
##  $ new_vehicle_category          : Factor w/ 4 levels "1","2","3","P": 4 4 1 1 4 4 4 4 4 4 ...
##  $ new_vehicle_make              : Factor w/ 35 levels "Acura","Audi",..: 33 33 33 33 33 33 33 33 33 33 ...
##  $ new_vehicle_model             : Factor w/ 437 levels "128i","3","300/SRT-8",..: 88 88 200 411 54 54 54 276 88 88 ...
##  $ new_vehicle_year              : int  2010 2009 2009 2009 2010 2010 2010 2010 2010 2010 ...
##  $ new_vehicle_drive_train       : Factor w/ 384 levels "1.3L, 4 cyl, Auto(AV-S7), FWD, Regular",..: 19 19 213 219 91 91 91 18 19 19 ...
##  $ new_vehicle_car_mileage       : int  29 30 20 22 26 26 26 50 29 29 ...
##  $ new_vehicle_MSRP              : int  17250 16750 27600 27800 21900 24565 23165 22000 16150 16750 ...
##  $ mpg_change                    : int  12 12 7 9 14 14 14 33 12 13 ...

2. First Row of Final Paid Transaction Dataset

##   vendor_id           dealer_name        address_line1 address_line2
## 1         1 HUDIBURG MOTORS, INC. 5800 Tinker Diagonal              
##   address_line3 address_line4         city state   ZIP area_code phone
## 1                             Midwest City    OK 73110        NA      
##   invoice_id invoice_num invoice_date sale_date disposal_status
## 1     844935     T40262A    20-Aug-09 15-Aug-09           CRUSH
##   disposal_facility_nmvtis_id disposal_facility_contact_info sales_type
## 1                        3101                 G&R AUTO PARTS  PURCHASED
##   invoice_amount      trade_in_VIN trade_in_vehicle_category trade_in_make
## 1           4500 1B4GH54R7RX188224                         1         Dodge
##              trade_in_model trade_in_year
## 1 Caravan/Grand Caravan 2WD          1994
##                          trade_in_vehicle_drive_train trade_in_mileage
## 1  3.8L, 6 cyl, Automatic 4-spd, FWD, (FFS), Regular                17
##   trade_in_title_state trade_in_registration_state
## 1                   OK                          OK
##   trade_in_registration_start trade_in_registration_end
## 1                   23-Apr-07                 31-Dec-09
##   trade_in_insurance_start trade_in_NMVTIS_flag trade_in_odometer_reading
## 1                                                                  166000
##   new_vehicle_VIN_trunc new_vehicle_category new_vehicle_make
## 1     1NXBU4EE7AZXXXXXX                    P           Toyota
##   new_vehicle_model new_vehicle_year
## 1           Corolla             2010
##                      new_vehicle_drive_train new_vehicle_car_mileage
## 1 1.8L, 4 cyl, Automatic 4-spd, FWD, Regular                      29
##   new_vehicle_MSRP mpg_change
## 1            17250         12