Dhananjay Kumar
1. Create a .CSV file that includes the Airline information.
2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.
Load the CSV file from disk and inspect it with datatable().
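library(dplyr); library(tidyr); library(DT)  # %>%, mutate(), gather(), spread(), datatable()
library(ggplot2); library(plotly)            # ggplot() and interactive ggplotly()
untidyData <- read.csv("C:/data/China.csv", header = TRUE, stringsAsFactors = FALSE)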
datatable(untidyData)
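The X prefix on the year columns comes from read.csv() itself: with the default check.names = TRUE, a header such as 2000 is not a syntactically valid R name, so it is passed through make.names(), which prepends an X. A minimal sketch on a hypothetical inline CSV (not the real China.csv; the values are placeholders) shows the effect and the check.names = FALSE alternative.

```r
# Hypothetical inline CSV standing in for China.csv; values are placeholders
csv_text <- "Series.Name,2000,2001
Example series,1,2"

# Default check.names = TRUE runs make.names(), so "2000" becomes "X2000"
with_x <- read.csv(text = csv_text, stringsAsFactors = FALSE)
print(names(with_x))

# check.names = FALSE keeps the header text exactly as written in the file
raw_names <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)
print(names(raw_names))
```

Keeping the X prefix is harmless here, since it is stripped again when the Year column is converted to a number.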
Let's tidy the dataset by gathering the year columns (X2000 to X2014) into a single Year column, with the matching figures in a Value column.
untidyData <- untidyData %>% gather(Year, Value, X2000:X2014)
untidyData$Series.Code <- NULL
head(untidyData)
##   Country.Name Country.Code                                Series.Name  Year       Value
## 1        China          CHN                          Population, total X2000  1262645000
## 2        China          CHN               Population growth (annual %) X2000 0.787956593
## 3        China          CHN                      Surface area (sq. km) X2000     9562920
## 4        China          CHN            GNI, Atlas method (current US$) X2000 1.17579E+12
## 5        China          CHN GNI per capita, Atlas method (current US$) X2000         930
## 6        China          CHN         GNI, PPP (current international $) X2000 3.63634E+12
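gather() still works, but current tidyr releases mark it as superseded by pivot_longer(). A minimal sketch on a hypothetical toy table (two indicators, two years; the numbers are placeholders, not World Bank values) performs the same wide-to-long reshape and also strips the X prefix through the names_prefix argument.

```r
library(tidyr)

# Hypothetical wide table shaped like China.csv after read.csv(); placeholder values
wide <- data.frame(
  Series.Name = c("Indicator A", "Indicator B"),
  X2000 = c(1, 10),
  X2001 = c(2, 20),
  stringsAsFactors = FALSE
)

# One row per (series, year): the year columns collapse into Year and Value
long <- pivot_longer(wide, cols = X2000:X2001,
                     names_to = "Year", values_to = "Value",
                     names_prefix = "X")   # removes the leading "X" from each name
long$Year <- as.integer(long$Year)
print(long)
```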
Since every row refers to China, the Country.Name and Country.Code columns carry no information and can be dropped.
untidyData$Country.Name <- NULL
untidyData$Country.Code <- NULL
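Instead of naming the country columns by hand, any column that holds the same value in every row can be dropped generically. A minimal base-R sketch; drop_constant_columns() is a hypothetical helper and the toy data frame is illustrative. Note that it would also remove an indicator that happens to be constant, so it is a convenience rather than a substitute for looking at the data.

```r
# Hypothetical helper: keep only columns with more than one distinct value
drop_constant_columns <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, keep, drop = FALSE]
}

# Toy example: every row is China, so both country columns are constant
toy <- data.frame(
  Country.Name = c("China", "China"),
  Country.Code = c("CHN", "CHN"),
  Year = c("X2000", "X2001"),
  stringsAsFactors = FALSE
)
print(names(drop_constant_columns(toy)))
```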
Next, spread the categorical Series.Name column so that each indicator gets its own column. This makes the dataset tidy: every variable is a column and every observation (one year) is a row.
tidyData <- untidyData %>% spread(Series.Name, Value)
datatable(tidyData)
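Like gather(), spread() is superseded in current tidyr releases, here by pivot_wider(). A minimal sketch on a hypothetical long table (placeholder values) shows the step: each distinct Series.Name becomes its own column and one row per year remains. The values are assumed to be character, mirroring the as.numeric() conversions applied in the next step.

```r
library(tidyr)

# Hypothetical long table like untidyData after dropping the country columns
long <- data.frame(
  Series.Name = c("GDP growth (annual %)", "GDP growth (annual %)",
                  "Agriculture, value added (% of GDP)",
                  "Agriculture, value added (% of GDP)"),
  Year = c("X2000", "X2001", "X2000", "X2001"),
  Value = c("1.5", "2.5", "30", "20"),  # assumed character, hence as.numeric() later
  stringsAsFactors = FALSE
)

# Series.Name values become column names; Year identifies the rows
wide <- pivot_wider(long, names_from = Series.Name, values_from = Value)
print(wide)
```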
chinaGDP <- data.frame(
  as.numeric(gsub("X", "", tidyData$Year)),
  as.numeric(tidyData$`GDP (current US$)`),
  as.numeric(tidyData$`GDP growth (annual %)`),
  as.numeric(tidyData$`Agriculture, value added (% of GDP)`))
colnames(chinaGDP) <- c("Year", "GDP-Current", "GDP-Growth", "Agriculture_as_%_of_GDP")
chinaGDP <- mutate(chinaGDP, Agriculture_in_USD =`GDP-Current`*`Agriculture_as_%_of_GDP`/100 )
datatable(chinaGDP)
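The Agriculture_in_USD column turns a percentage share into dollars: Agriculture_in_USD = GDP-Current × Agriculture_as_%_of_GDP / 100. A small sketch of that arithmetic and of the year parsing as standalone helpers; agriculture_usd() and parse_year() are hypothetical names, and parse_year() swaps the document's gsub("X", ...) for an anchored sub("^X", ...) that only removes a leading X. Quick check: 10% of a 2 trillion dollar GDP is 200 billion dollars.

```r
# Hypothetical helper mirroring the mutate() step; the share is given in percent
agriculture_usd <- function(gdp_current_usd, agri_share_pct) {
  gdp_current_usd * agri_share_pct / 100
}

# Year labels such as "X2000" -> 2000L; the ^ anchor strips only a leading X
parse_year <- function(x) as.integer(sub("^X", "", x))

print(agriculture_usd(2e12, 10))        # 2e+11
print(parse_year(c("X2000", "X2014")))  # 2000 2014
```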
ggplotly(ggplot(chinaGDP, aes(x = Year, y = `GDP-Growth`)) + geom_point() + geom_smooth() + labs(title = "GDP growth (annual %) by year"))
As the plot above shows, China's highest annual GDP growth rates fell in the interval between 2005 and 2010.
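The visual reading can be backed by a direct lookup of the year with the highest growth rate. A minimal sketch, assuming a data frame with the same Year and GDP-Growth columns as chinaGDP; the toy values are placeholders, not China's actual figures.

```r
# Hypothetical stand-in for chinaGDP; check.names = FALSE keeps the hyphenated name
toy_gdp <- data.frame(Year = 2004:2008,
                      `GDP-Growth` = c(8, 9, 12, 11, 7),
                      check.names = FALSE)

# which.max() returns the position of the first maximum (NA values are skipped)
peak_year <- toy_gdp$Year[which.max(toy_gdp$`GDP-Growth`)]
print(peak_year)
```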
ggplotly(ggplot(chinaGDP, aes(x = Year, y = `Agriculture_as_%_of_GDP`)) + geom_point() + geom_smooth() + labs(title = "Agriculture share of GDP (%) by year"))
As the plot above shows, agriculture's share of GDP fell substantially over the years, which is consistent with China's policy of aggressive industrialization.
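A linear fit condenses the trend into one number: a negative slope of Agriculture_as_%_of_GDP on Year means the share fell on average each year. A minimal sketch with lm() on hypothetical placeholder data; on the real chinaGDP the same formula applies, with backticks around the non-syntactic column name.

```r
# Hypothetical stand-in for chinaGDP (placeholder shares, in percent)
toy_agri <- data.frame(Year = 2000:2004,
                       `Agriculture_as_%_of_GDP` = c(15, 14, 13, 12, 11),
                       check.names = FALSE)

# Slope = average change in the agriculture share (percentage points) per year
fit <- lm(`Agriculture_as_%_of_GDP` ~ Year, data = toy_agri)
slope <- unname(coef(fit)["Year"])
print(slope)
```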
ggplotly(ggplot(chinaGDP, aes(x = `Agriculture_as_%_of_GDP`, y = `GDP-Growth`)) + geom_point() + geom_smooth() + labs(title = "GDP growth vs agriculture share of GDP"))
This chart supports the conclusion drawn from the second chart: agriculture's share of GDP has declined in recent years. It also suggests that, where total GDP growth lies between 7% and 10%, the agriculture share rises along with growth.
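The observation about the 7% to 10% growth range can be checked numerically with a correlation on that subset. A minimal sketch with cor() on hypothetical placeholder values; with only a handful of years in the range on the real data, such a coefficient is indicative at best.

```r
# Hypothetical stand-in for chinaGDP (placeholder values)
toy <- data.frame(`GDP-Growth` = c(6, 7.5, 8.5, 9.5, 12),
                  `Agriculture_as_%_of_GDP` = c(9, 10, 11, 12, 8),
                  check.names = FALSE)

# Keep only years whose total GDP growth lies in [7, 10]
in_range <- subset(toy, `GDP-Growth` >= 7 & `GDP-Growth` <= 10)

# Pearson correlation between growth and agriculture share within that range
r <- cor(in_range$`GDP-Growth`, in_range$`Agriculture_as_%_of_GDP`)
print(r)
```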