Luke Browne s3689420 and Simon Boyd s3046049
Last updated: 28 October, 2018
In March 2014, journalist Nina Nina Hardy(2014) published an article claiming 73% of new cars sold are coloured white, silver, grey or black.
As car enthusiasts, we’ve decided to test this assertation by examining car registrations recorded by Vic Roads for the month of August 2018.
The hypothesis is based on Hardy’s article that 73% of new cars sold are coloured white, silver, grey or black.
The downloaded data was filtered to produce a list of suspected new cars.
This created a subset of the total data.
The assumption is that cars newly registered for the month were either built in 2018 or 2017 and the sellers were registered LMCT providers.
A chi-squared “Goodness of fit” test was applied to the data subset.
The results prompted the authors to test the colour distribution ratio to all older cars registered in the month of August 2018 and all cars registered in 2014 that remain on the register.
The same test was applied to these two additional subsets.
The car registrations transfer data were taken from https://www.data.vic.gov.au/data/dataset/motor_vehicle_transfers and produced by Vic Roads.
carreg <- read_xlsx('Monthly Vehicle Transfer data.xlsx',skip=3,sheet="VACC Monthly Vehicle Transfers")## Classes 'tbl_df', 'tbl' and 'data.frame': 61276 obs. of 8 variables:
## $ LMCT acquirer : chr "n" "n" "n" "n" ...
## $ LMCT disposer : chr "n" "n" "n" "n" ...
## $ Vehicle Make : chr "ALFA R" "ALFA R" "ALFA R" "ALFA R" ...
## $ Model Name : chr "147" "147" "147" "147" ...
## $ Colour primary : chr "BLK" "BLK" "BLU" "GLD" ...
## $ Year of Manufacture : num 2008 2009 2005 2001 2005 ...
## $ Resi. address postcode: num 3083 3004 3089 3058 3337 ...
## $ Count Transfer Id : num 1 1 1 1 1 1 1 1 1 1 ...
| LMCT acquirer | LMCT disposer | Vehicle Make | Model Name | Colour primary | Year of Manufacture | Resi. address postcode | Count Transfer Id |
|---|---|---|---|---|---|---|---|
| n | n | ALFA R | 147 | BLK | 2008 | 3083 | 1 |
| n | n | ALFA R | 147 | BLK | 2009 | 3004 | 1 |
| n | n | ALFA R | 147 | BLU | 2005 | 3089 | 1 |
| n | n | ALFA R | 147 | GLD | 2001 | 3058 | 1 |
| n | n | ALFA R | 147 | RED | 2005 | 3337 | 1 |
| n | n | ALFA R | 147 | SIL | 2005 | 3034 | 1 |
carreg$`LMCT acquirer` <- factor(carreg$`LMCT acquirer`,levels=c("n","y"),labels=c("No","Yes"))
carreg$`LMCT disposer` <- factor(carreg$`LMCT disposer`,levels=c("n","y"),labels=c("No","Yes"))
carreg$`Colour primary` <- factor(carreg$`Colour primary`) carreg <- carreg %>% mutate(Age=year(now())-`Year of Manufacture`)## Classes 'tbl_df', 'tbl' and 'data.frame': 61276 obs. of 9 variables:
## $ LMCT acquirer : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ LMCT disposer : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ Vehicle Make : chr "ALFA R" "ALFA R" "ALFA R" "ALFA R" ...
## $ Model Name : chr "147" "147" "147" "147" ...
## $ Colour primary : Factor w/ 17 levels "BLK","BLU","BRN",..: 1 1 2 6 14 15 15 14 1 1 ...
## $ Year of Manufacture : num 2008 2009 2005 2001 2005 ...
## $ Resi. address postcode: num 3083 3004 3089 3058 3337 ...
## $ Count Transfer Id : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Age : num 10 9 13 17 13 13 10 14 16 14 ...
| LMCT acquirer | LMCT disposer | Vehicle Make | Model Name | Colour primary | Year of Manufacture | Resi. address postcode | Count Transfer Id | Age |
|---|---|---|---|---|---|---|---|---|
| No | No | ALFA R | 147 | BLK | 2008 | 3083 | 1 | 10 |
| No | No | ALFA R | 147 | BLK | 2009 | 3004 | 1 | 9 |
| No | No | ALFA R | 147 | BLU | 2005 | 3089 | 1 | 13 |
| No | No | ALFA R | 147 | GLD | 2001 | 3058 | 1 | 17 |
| No | No | ALFA R | 147 | RED | 2005 | 3337 | 1 | 13 |
| No | No | ALFA R | 147 | SIL | 2005 | 3034 | 1 | 13 |
Examination of the age variable highlights an error - some cars had no year of manufacture, creating a age outlier of 2018 years.
We have made the assumption that the year of manuafacture for these cars are 2018, and, using a ifelse statement to change the year of manufacture.
We then created a subset of data with cars that are made in 2017 and 2018.
carreg_subsetnew <- carreg %>% filter(Age<=1 & `LMCT disposer`=="Yes")
table1<- head(carreg_subsetnew)
knitr::kable(table1)| LMCT acquirer | LMCT disposer | Vehicle Make | Model Name | Colour primary | Year of Manufacture | Resi. address postcode | Count Transfer Id | Age |
|---|---|---|---|---|---|---|---|---|
| No | Yes | ALFA R | 4C | RED | 2017 | 3101 | 1 | 1 |
| No | Yes | ALFA R | GIULIA | BLK | 2017 | 3002 | 1 | 1 |
| No | Yes | ALFA R | GIULIA | BLK | 2018 | 3145 | 1 | 0 |
| No | Yes | ALFA R | GIULIA | GRY | 2017 | 3107 | 1 | 1 |
| No | Yes | ALFA R | GIULIA | RED | 2018 | 3002 | 1 | 0 |
| No | Yes | ALFA R | GIULIA | WHI | 2017 | 3113 | 1 | 1 |
\[H_0: \mu_1 = \mu_2 \]
\[H_A: \mu_1 \ne \mu_2\] To peform the test we needed to build the expected frequency table based on the article, this gave us specific values for 4 colors and 73% of new cars and stated the remaing colors were distributed evenly across the remaing 27%. This yielded expected colors of:
expcol <- c(0.17,0.02076923,0.02076923,0.02076923,0.02076923,0.02076923,0.02076923,
0.18,0.02076923,0.02076923,0.02076923,0.02076923,0.02076923,0.02076923,0.17,0.21,0.02076923)We then conducted a goodness of fit test to test our Hypothesis. The result of 26.30 tells us that \(x^2\) above this number has less than a 0.05 probability of occuring if \(H_0\) is True.
qchisq(p=0.05,df=16,lower.tail = FALSE)## [1] 26.29623
carnewtab <- table(carreg_subsetnew$`Colour primary`)
carnewtab##
## BLK BLU BRN CRM FWN GLD GRN GRY MRN MVE OGE PNK PUR RED SIL
## 614 476 55 2 2 15 19 875 4 0 23 1 1 425 690
## WHI YLW
## 1616 24
prop.table(carnewtab)##
## BLK BLU BRN CRM FWN
## 0.1268071045 0.0983064849 0.0113589426 0.0004130525 0.0004130525
## GLD GRN GRY MRN MVE
## 0.0030978934 0.0039239983 0.1807104502 0.0008261049 0.0000000000
## OGE PNK PUR RED SIL
## 0.0047501033 0.0002065262 0.0002065262 0.0877736473 0.1425030979
## WHI YLW
## 0.3337463858 0.0049566295
GOFT <- chisq.test(carnewtab,p=expcol)
GOFT$observed##
## BLK BLU BRN CRM FWN GLD GRN GRY MRN MVE OGE PNK PUR RED SIL
## 614 476 55 2 2 15 19 875 4 0 23 1 1 425 690
## WHI YLW
## 1616 24
GOFT$expected## BLK BLU BRN CRM FWN GLD GRN
## 823.1400 100.5646 100.5646 100.5646 100.5646 100.5646 100.5646
## GRY MRN MVE OGE PNK PUR RED
## 871.5600 100.5646 100.5646 100.5646 100.5646 100.5646 100.5646
## SIL WHI YLW
## 823.1400 1016.8200 100.5646
GOFT##
## Chi-squared test for given probabilities
##
## data: carnewtab
## X-squared = 3737.4, df = 16, p-value < 2.2e-16
We then tested 2014, which is when the article was written. Again, if \(H_0\) is true, then any \(X^2\) value > 26.30 would have a 0.05 probability of occuring.
qchisq(p=0.05,df=16,lower.tail = FALSE)## [1] 26.29623
carreg_subset2014 <- carreg %>% filter(`Year of Manufacture`==2014 | `Year of Manufacture`==2013 )
car2014tab <- table(carreg_subset2014$`Colour primary`)
car2014tab##
## BLK BLU BRN CRM FWN GLD GRN GRY MRN MVE OGE PNK PUR RED SIL
## 1151 669 102 4 9 52 77 1201 26 0 99 6 14 615 1267
## WHI YLW
## 3032 28
prop.table(car2014tab)##
## BLK BLU BRN CRM FWN
## 0.1378113027 0.0801005747 0.0122126437 0.0004789272 0.0010775862
## GLD GRN GRY MRN MVE
## 0.0062260536 0.0092193487 0.1437978927 0.0031130268 0.0000000000
## OGE PNK PUR RED SIL
## 0.0118534483 0.0007183908 0.0016762452 0.0736350575 0.1517001916
## WHI YLW
## 0.3630268199 0.0033524904
GOFT3 <- chisq.test(car2014tab,p=expcol)
GOFT3$observed##
## BLK BLU BRN CRM FWN GLD GRN GRY MRN MVE OGE PNK PUR RED SIL
## 1151 669 102 4 9 52 77 1201 26 0 99 6 14 615 1267
## WHI YLW
## 3032 28
GOFT3$expected## BLK BLU BRN CRM FWN GLD GRN
## 1419.8400 173.4646 173.4646 173.4646 173.4646 173.4646 173.4646
## GRY MRN MVE OGE PNK PUR RED
## 1503.3600 173.4646 173.4646 173.4646 173.4646 173.4646 173.4646
## SIL WHI YLW
## 1419.8400 1753.9200 173.4646
GOFT3##
## Chi-squared test for given probabilities
##
## data: car2014tab
## X-squared = 4849.6, df = 16, p-value < 2.2e-16
carreg_subsetold <- carreg %>% filter(Age>1) # this subset excludes cars that have been removed from the register
qchisq(p=0.05,df=16,lower.tail = FALSE)## [1] 26.29623
caroldtab <- table(carreg_subsetold$`Colour primary`)
caroldtab##
## BLK BLU BRN CRM FWN GLD GRN GRY MRN MVE OGE PNK
## 7547 5770 464 67 264 1463 1453 6787 397 12 334 43
## PUR RED SIL WHI YLW
## 201 4184 11341 14500 264
prop.table(caroldtab)##
## BLK BLU BRN CRM FWN
## 0.1369915231 0.1047358008 0.0084224283 0.0012161696 0.0047920713
## GLD GRN GRY MRN MVE
## 0.0265560618 0.0263745439 0.1231961663 0.0072062587 0.0002178214
## OGE PNK PUR RED SIL
## 0.0060626963 0.0007805268 0.0036485088 0.0759470694 0.2058593963
## WHI YLW
## 0.2632008858 0.0047920713
GOFT2 <- chisq.test(caroldtab,p=expcol)
GOFT2$observed##
## BLK BLU BRN CRM FWN GLD GRN GRY MRN MVE OGE PNK
## 7547 5770 464 67 264 1463 1453 6787 397 12 334 43
## PUR RED SIL WHI YLW
## 201 4184 11341 14500 264
GOFT2$expected## BLK BLU BRN CRM FWN GLD GRN
## 9365.470 1144.198 1144.198 1144.198 1144.198 1144.198 1144.198
## GRY MRN MVE OGE PNK PUR RED
## 9916.380 1144.198 1144.198 1144.198 1144.198 1144.198 1144.198
## SIL WHI YLW
## 9365.470 11569.110 1144.198
GOFT2##
## Chi-squared test for given probabilities
##
## data: caroldtab
## X-squared = 36241, df = 16, p-value < 2.2e-16
As \(x^2=36241\) and \(36241 > 26.30\) we reject \(H_0\)
A Chi-squared goodness of fit test was conducted to determine whether the distribution of colours for new cards match the figures given by Nina Hardy in 2014. The test was statistically significant, with the \(x^2=3747.4\), \(df=16\) and \(p<0.001\).
The results suggest that for 2018. the color distribution of new cars does not match the figures given in Nina Hardy’s article.
We then conducted a second test on cars sold in 2014, the year the article was written. This test was also statistically significant, with \(x^2=4849.6\), \(df=16\) and \(p<0.001\).
Finally, we tested all calls greater then 1 year in age, and again we found statistically significant results, with \(x^2=386241\), \(df=16\) and \(p<0.001\)
The results obtained do not support the figures given in Nina Hardy’s article with sufficent statistical reliability.
The limitations for this test is that the data is limited to Victoria only, and the assumption that the cars sold were new cars only, not second-hand cars.
Hardy, Nina “Any colour as long as it’s white”, https://www.smh.com.au/money/borrowing/any-colour-you-like-as-long-as-its-white-20140308-34e0p.html , Sydney Morning Herald, 10 March 2014.