R Markdown

This assignment looks at a annual time series sales data of vehicles in India. The data is publicly available through government of India website and presents sales figures for car, jeep, light motor vehicles taxis and two wheeler. Across all the years, the two wheeler sales data surpasses the others. The plot_ly package is used to present a line graph of time series. The data is avilable from the Government of India Website https://data.gov.in/resources/all-india-and-state-wise-number-taxed-and-tax-exempted-motor-vehicles-registered-india/download

library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
vehicledata <- read.xlsx("C:/Users/Ranjan/Downloads/NUMBER_OF_MOTOR_VEHICLES_REGISTERED_IN_INDIA_TAXED_AND_TAX-EXEMPTED_1 (2).xls", sheetIndex=1, header = TRUE)

Subsets of the sales data

From the vehicle data, some fields such as total sales, etc are removed.Also, double checking that the structure is of a data frame.

is.data.frame(vehicledata)
## [1] TRUE
vehicledata <- vehicledata[, 1:6]
head(vehicledata)
##   Year Two.wheelers Light.Motor.Vehicles.Passengers.   Jeeps    Cars
## 1 2001     38556026                          1777130 1126148 5297219
## 2 2002     41581058                          1878261 1177245 5748036
## 3 2003     47519489                          2113781 1180057 6594166
## 4 2004     51921973                          2167324 1282113 7267174
## 5 2005     58799702                          2337264 1307926 8072650
## 6 2006     64743126                          2492726 1376744 9109855
##     Taxis
## 1  634357
## 2  688204
## 3  825416
## 4  901889
## 5  939738
## 6 1039845
vehicledata2 <- subset(vehicledata, select = c(Two.wheelers,Light.Motor.Vehicles.Passengers., Jeeps, Cars, Taxis))
head(vehicledata2)
##   Two.wheelers Light.Motor.Vehicles.Passengers.   Jeeps    Cars   Taxis
## 1     38556026                          1777130 1126148 5297219  634357
## 2     41581058                          1878261 1177245 5748036  688204
## 3     47519489                          2113781 1180057 6594166  825416
## 4     51921973                          2167324 1282113 7267174  901889
## 5     58799702                          2337264 1307926 8072650  939738
## 6     64743126                          2492726 1376744 9109855 1039845

Data Transformation

I then stack the data in two variables namely vehicle type and sales quantity. Also, I add one time variable that has the years from 2001 to 2011.

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
vehicledaat3 <- vehicledata2 %>% gather(vehicletype, quantity)
vehicledaat3$Year <- rep(2001:2011, 5)
head(vehicledaat3)
##    vehicletype quantity Year
## 1 Two.wheelers 38556026 2001
## 2 Two.wheelers 41581058 2002
## 3 Two.wheelers 47519489 2003
## 4 Two.wheelers 51921973 2004
## 5 Two.wheelers 58799702 2005
## 6 Two.wheelers 64743126 2006

Time Series Sales of Vehicles in India (2001 - 2011)

library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plot_ly(vehicledaat3, x= vehicledaat3$Year, y=vehicledaat3$quantity, color = vehicledaat3$vehicletype, type = "scatter", mode = "lines")