As an international city, shanghai has become the center of attention of East Asia, along with the expedited development, traffic became one of the most cumbersome issues. We will try to do an analysis on the overall car license price, by year, by season and by month.
We will check if there are missing data, assign seasons and month.
shcar$b <- as.character.Date(shcar$Date)
shcar$a <- strsplit(as.character(shcar$b), ".", fixed=TRUE)
#seperate year and month
shcar$year <- substr(shcar$b, 1,4)
shcar$t1 <- substr(shcar$b, 6,8)
#clean 20-90
shcar$t <- ifelse ((shcar$t %in% c("20","30","40","50","60","70","80","90")),
substr(shcar$t,1,1), shcar$t)
shcar$o <- lead(shcar$t, 1)
#clean 10 (oct, Jan)
shcar$tt <- ifelse( shcar$t=="10" & shcar$o=="2", substr(shcar$t,1,1), shcar$t)
shcar$ttt <- ifelse( shcar$t=="10" & is.na(shcar$o) ,substr(shcar$t,1,2), shcar$tt)
#clean 01-09
shcar$ttt <- ifelse (shcar$ttt %in% c("01" , "02", "03", "04","05", "06","07","08" ,"09"),
substr(shcar$ttt,2,3), shcar$ttt)
#assign season winter(12,1,2), spring(3,4,5) , summer(6,7,8), fall(9,10,11)
shcar$s <- ifelse(shcar$ttt %in% c("1" , "2", "12"), "Winter",
ifelse( shcar$ttt %in% c("3" , "4" , "5"), "Spring",
ifelse ( shcar$ttt %in% c("6", "7", "8"), "Summer",
"Fall")))
#combine year and month
shcar$d <- paste( as.numeric(shcar$year), as.numeric(shcar$ttt), sep="-")
shcar$date <- as.yearmon(shcar$d)
shcar$date1 <- as.Date(as.yearmon(shcar$d))
shcar$tttt <- as.numeric(shcar$ttt)
shcar$ttt3 <- factor(shcar$tttt,
labels=c("Jan","Feb","Mar","Apr","May","Jun","Jul",
"Aug","Sep","Oct","Nov","Dec"))
we will perform our analysis in three steps, we first look the overall price over the 12 years from 2002-2017. then we will perform an analysis by each year. followed by we will group all the seasons over 12 years see if the price will affected by the season. If so, we will break down to analysis by month.
From the graph we can see the license plate price increased drastically, especially from 2014. It went up from around 20,000Yuan to nearly 90,000Yuan with in 16 years. Anther feature we can see from the graph is lowest bid price is very close to the average bid price, and difference is getting closer and closer espcially after 2013, the difference is within 200 yuan. The second plot showed the participants of license bid has an outrageous increasing rate, particually happen around 2014. However, the number of plate offer to public is significantly less than the bid number.The number of bids went from average of 4,373 a year to 253,335 applications a year. Netherless, the license plate is only offer 2,654 to 11,023 per year on average.
# facet grid time series by year
#f1 <- select (shcar, date, avg_deal_price )
#f1$year <- format(f1$date, "%Y")
#f1$month <-format(f1$date, "%b")
#f1$commond <- as.Date( paste0("2002-", format(f1$date,"%j")), "%Y-%j")
#ff1 <- ggplot(data=f1, mapping=aes(x=commond, y=avg_deal_price , shape=year, colour=year))+
# geom_point()+geom_line()+ facet_grid(facets= year~ .)+
#scale_x_date(labels=function(x) format(x,"%b") )
#ggplotly()
#plate price
car1 <- cbind(shcar$avg_deal_price, shcar$lowest_deal_price)
car2<-xts(car1, shcar$date)
dygraph(car2, main="ShangHai Car License Bid Price") %>%
dyOptions(colors=RColorBrewer::brewer.pal(3, "Set2"), fillGraph=TRUE, fillAlpha=0.4 ,
axisLineWidth=1.5) %>%
dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.2, hideOnMouseOut=FALSE)%>% dyAxis("x", label = "year") %>% dyLegend(show="follow") %>%
dySeries("V1", label="Average Deal Price") %>% dySeries("V2", label="Lowest Deal Price")
#plate price difference
car3 <- shcar %>% mutate(pricediff= avg_deal_price -lowest_deal_price)
car4 <- xts(car3$pricediff, car3$date)
dygraph(car4, main="ShangHai Car License Bid Price Difference") %>%
dyOptions(colors=RColorBrewer::brewer.pal(3, "Set2"), fillGraph=TRUE, fillAlpha=0.4 ,
axisLineWidth=1.5) %>%
dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.2, hideOnMouseOut=FALSE)%>% dyAxis("x", label = "year") %>% dyLegend(show="follow") %>%
dySeries("V1", label="Bid Price Difference")
#plate number
plate1 <- cbind(shcar$num_bidder, shcar$num_plates)
plate2 <- xts(plate1, shcar$date)
dygraph(plate2, main="Shanghai Car License bid number") %>%
dyOptions(colors=RColorBrewer::brewer.pal(4, "Set1"), fillGraph=TRUE, fillAlpha=0.4 ,
axisLineWidth=1.5) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.2)%>% dyAxis("x", label = "year") %>%
dySeries("V1", label="Number of Bidder") %>% dySeries("V2", label="Number of Plates")
#line plot
line1 <- select(shcar, avg_deal_price, ttt3, year)
colnames(line1) <-c("avgprice","mon","year")
line1$avgprice <- as.numeric(line1$avgprice)
l1<- ggplot(line1, aes(x=mon, y=avgprice, colour=year, group=year))+geom_point()+geom_line()+
scale_y_continuous( breaks=c(0,10000,20000,30000,40000,50000,60000,70000,80000,90000,100000))+
geom_dl(aes(label= year), method=list("first.points", alpha=0.6 ,rot=-45))+
labs(x="Month", y="Avgerage Price", title="Average Price of ShangHai Car License Plate From 2002-2017")
l1
Now we will investigate more into the data, by compare them in year. We will group the data by year then take the average price of each year. We first show a bar plot of the relationship between car license plate applicants and number of car plates. The graph demonstrated from 2014, the number of applicants grow excessively out of the plates. Then the growth rate plot showed the number of license plates are relatively stable compare the the number of applicants. The Car Plate Winning rate supported the previous conclusion, before 2014, the winning rate is 30% or more. from 2014, the winning rate decreased to less than 10%, nowadays it is around 5%. Moreover, the growth of average price is within 10% from 2015. Then we will plot the growth rate with 2002 as the baseline, see how it changes every year respect to 2002. We can see, the winning rate, the average price is gradually converges to stable from 2015. I guess the main reason is the market is gradually saturated.
#dlyr aggregate them in year.
year1 <- group_by(shcar, year)
#Calculate the average of deal price, lowest price, number of bid and number of plates
year2 <- summarise(year1, count=n(),
aprice= mean(avg_deal_price),
lprice=mean(lowest_deal_price), abid=mean(num_bidder),
anum=mean(num_plates))
#Calucate the chance getting win the plate
year2<- year2 %>% mutate(win=anum/abid*100,
gbid=(abid-lag(abid))*100/lag(abid),
gnum=(anum-lag(anum))*100/lag(anum),
gaprice=(aprice-lag(aprice))*100/lag(aprice),
gbaprice=round(((aprice-20956)*100/20956),0), gbbid=round((abid-4373)*100/4373,0),
gbnum=round((anum-2654)*100/2654,0),
mon1=12)
#Compare the average bid number and plate number
p1 <- ggplot(year2, aes(x=year, fill="white"))+
geom_bar(aes(y=abid,fill="coral"), stat="identity", position="dodge",col="coral")+
geom_bar(aes(y=anum), stat="identity", position="dodge",col="cyan")+
labs(x="Year", y="Number of license Plate",
title="Number of Bids Vs Number of License Plates")+ theme(legend.position="none", axis.text.x = element_text(angle = 60, hjust = 1)) #+ scale_x_continuous(breaks=year2$year)
p1
#Growth Rate of number plate and bid number
year2$d <- paste( as.numeric(year2$year), year2$mon1, sep="-")
year2$d1 <- as.yearmon(year2$d)
num <- cbind(year2$gbid, year2$gnum)
num1 <-xts( num , order.by=as.POSIXct(as.Date(year2$d1)))
gw0<-dygraph(num1, main="Shanghai Car License plate growth rate vs bid growth rate") %>%
dyOptions(colors=RColorBrewer::brewer.pal(4, "Set1"), fillGraph=TRUE, fillAlpha=0.4, drawXAxis = TRUE ) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.1)%>% dyAxis("x", label = "Year from 2002-2017", c(1002,2017)) %>% dyLegend(show="follow") %>%
dySeries("V1", label="Growth rate of Bids") %>% dySeries("V2", label="Growth rate of Car Plates")
gw0
#Growth Rate of number plate and bid number with baseline 2002
bnum <- cbind(year2$gbbid, year2$gbnum)
bnum1 <-xts( bnum , order.by=as.POSIXct(as.Date(year2$d1)))
bgw0<-dygraph(bnum1, main="Shanghai License Plate vs Bid growth rate with baseline 2002") %>%
dyOptions(colors=RColorBrewer::brewer.pal(4, "Set2"), fillGraph=TRUE, fillAlpha=0.4, drawXAxis = TRUE ) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.1)%>% dyAxis("x", label = "Year from 2002-2017", c(1002,2017)) %>% dyLegend(show="follow") %>%
dySeries("V1", label="Growth rate of Bids based on 2002") %>% dySeries("V2", label="Growth rate of Car Plates based on 2002")
bgw0
#winning rate
year2$d <- paste( as.numeric(year2$year), year2$mon1, sep="-")
year2$d1 <- as.yearmon(year2$d)
win1 <- xts( year2$win , order.by=as.POSIXct(as.Date(year2$d1)))
gw1<-dygraph(win1, main="Shanghai Car License Lottery Winning Chance (%)") %>%
dyOptions(colors="pink" ,fillGraph=TRUE, fillAlpha=0.2, drawXAxis = TRUE ) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.1, hideOnMouseOut=FALSE)%>% dyAxis("x", label = "Year from 2002-2017", c(1002,2017)) %>% dyLegend(show="follow") %>%
dySeries("V1", label="Car Plate Winning (%)")
gw1
#Compare the average price and lowest bid price
price <- cbind(year2$lprice, year2$aprice)
price1 <- xts( price , order.by=as.POSIXct(as.Date(year2$d1)))
gp1<-dygraph(price1, main="Shanghai Car License Average Price VS Lowest Price by Year") %>%
dyOptions(colors=RColorBrewer::brewer.pal(4, "Set1"), fillGraph=TRUE, fillAlpha=0.4, drawXAxis = TRUE ) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.1)%>% dyAxis("x", label = "Year from 2002-2017", c(1002,2017)) %>% dyLegend(show="follow") %>%
dySeries("V1", label="Lowest Deal Price") %>%
dySeries("V2", label="Average Deal Price")
gp1
#Growth Rate of average price
gaprice1 <- xts( year2$gaprice , order.by=as.POSIXct(as.Date(year2$d1)))
gap1<-dygraph(gaprice1, main="Shanghai car License Plate average rate of growth") %>%
dyOptions(colors="green" ,fillGraph=TRUE, fillAlpha=0.2, drawXAxis = TRUE ) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.1, hideOnMouseOut=FALSE)%>% dyAxis("x", label = "Year from 2002-2017", c(1002,2017)) %>% dyLegend(show="follow") %>%
dySeries("V1", label="Rate of Average Car Plate price growth(%) by Year")
gap1
#Growth Rate of average price from baseline
gbaprice1 <- xts( year2$gbaprice , order.by=as.POSIXct(as.Date(year2$d1)))
gbap1<-dygraph(gbaprice1,
main="Shanghai car License Plate average rate of growth from 2002") %>% dyOptions(colors="purple" ,fillGraph=TRUE, fillAlpha=0.2, drawXAxis = TRUE ) %>% dyHighlight(highlightCircleSize=5, highlightSeriesBackgroundAlpha=0.1, hideOnMouseOut=FALSE)%>% dyAxis("x", label = "Year from 2002-2017", c(1002,2017)) %>% dyLegend(show="follow") %>%
dySeries("V1", label="Rate of average price growth(%) from 2002")
gbap1
We will group the data by season, as we can see, Winter has the highest winning rate with the lowest average bid price overall.
#dlyr aggregate them in Season.
s1 <- group_by(shcar, s)
#Calculate the average of deal price, lowest price, number of bid and number of plates by Month
s2 <- summarise(s1, count=n(),
aprice= mean(avg_deal_price), lprice=mean(lowest_deal_price), abid=mean(num_bidder), anum=mean(num_plates))
#Calucate the chance getting win the plate
s2<- s2 %>% mutate(swin=anum/abid*100)
#Plot the average price for each Month
ps1 <- ggplot(s2, aes(x=s))+
geom_bar(aes(y=aprice,fill=s), stat="identity", position="dodge")+
labs(x="Season", y="Average Price of License Plate",
title="Average Price of License Plate by Season")+ theme(legend.position="none", axis.text.x = element_text(angle = 60, hjust = 1))
ps1
#Plot the winning rate by season
wins1 <- ggplot(s2, aes(x=s , y=swin )) + geom_point(aes( size=aprice , colour=factor(aprice)), alpha=1) +
labs(Season="Month", y="Probability of Winning car license plate (%)" , title="Shanghai Car Plate Winning Rate by Season") + theme(legend.position="none")
wins1
We will group the data by months now. As it suggested in previous section, Winter has the highest winning rate and lowest bidding price, January in particular. March has the lowest winning rate overall.
#dlyr aggregate them in Month.
shcar$m1 <- substr(shcar$date,1,3)
month1 <- group_by(shcar, m1 ,ttt)
#Calculate the average of deal price, lowest price, number of bid and number of plates by Month
month2 <- summarise(month1, count=n(),
aprice= mean(avg_deal_price), lprice=mean(lowest_deal_price), abid=mean(num_bidder), anum=mean(num_plates))
#Calucate the chance getting win the plate
month2<- month2 %>% mutate(win=anum/abid*100)
month2$ttt <- factor(as.numeric(month2$ttt),
labels=c("jan","feb","Mar","apr","may","jun","jul",
"aug","sep","oct","nov","dec"))
month2<-arrange(month2,ttt)
#Plot the average price for each Month
pm1 <- ggplot(month2, aes(x=ttt))+
geom_bar(aes(y=aprice,fill=m1), stat="identity", position="dodge")+
labs(x="Year", y="Average Price of License Plate by Month",
title="Number of Bids Vs Number of License Plates")+ theme(legend.position="none", axis.text.x = element_text(angle = 60, hjust = 1))+scale_x_discrete(name = "Month")
pm1
#Plot the winning rate by Month
winm1 <- ggplot(month2, aes(x=ttt , y=win , group=1)) + geom_point(color="orangered1") +geom_line(color="orangered1")+
labs(x="Month", y="Probability of Winning car license plate (%)" , title="Winning rate by month") +
theme(panel.background = element_rect(fill = "azure1",
colour = "azure1"))
winm1
#heatmap
ggplot(shcar, aes(year, m1, fill=avg_deal_price))+
geom_tile(colour = "white")+
scale_fill_gradient(low="green", high="red") +
labs(x="Year ",y= "Month" , title="Shanghai Car Plate Average Deal Price Heatmap" ,subtitle=" Year vs Month " , fill="Avg Price")
We know the average deal price is very close to the lowest deal price. So we are trying to use number of bidder to predict the lowest deal price. We can see when the bid number > 50,000, there might be a linear relationship. Our Model supported our conjecture, the p-value <0.05. So if we know the total number of bidder then we can calucate the lowest price, and by the lowest price we add 300-500Yuan , it will be in the Lottery Range.
mod1<- ggplot(shcar, aes(x=num_bidder, y=lowest_deal_price))+geom_point(color="darkseagreen2")+
labs(x="Number of Bidder", y="Lowest price of car license plate", title="Number of Bidder Vs Lowest price of car license plate")
mod1
mod.shcar <- filter(shcar, num_bidder >50000)
mod2<- ggplot(mod.shcar, aes(x=num_bidder, y=lowest_deal_price))+geom_point(color="darkseagreen3")+ stat_smooth(color="violetred2", method="lm")+
labs(x="Number of Bidder", y="Lowest price of car license plate", title="Number of Bidder Vs Lowest price of car license plate")
mod2
lm.car <- lm(lowest_deal_price ~ num_bidder, data=mod.shcar)
summary(lm.car)
##
## Call:
## lm(formula = lowest_deal_price ~ num_bidder, data = mod.shcar)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33432 -438 1229 4138 9291
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.631e+04 3.623e+03 15.540 < 2e-16 ***
## num_bidder 1.326e-01 1.871e-02 7.089 8.46e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8106 on 44 degrees of freedom
## Multiple R-squared: 0.5332, Adjusted R-squared: 0.5226
## F-statistic: 50.26 on 1 and 44 DF, p-value: 8.462e-09
Although, the Car License Plate is expensive, and the chances of getting the plate is very low, ShangHai is still a young and energized city, the Pearl of East.