Assignment

1. Create a .CSV file that includes the Airline information.

2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.

Load Data

China

1.1 Load CSV file

Load the CSV file from disk and inspect it as an interactive table with datatable().

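# Packages used throughout: tidyr/dplyr (tidying), DT (tables), ggplot2/plotly (plots)
invisible(lapply(c("tidyr", "dplyr", "DT", "ggplot2", "plotly"), library, character.only = TRUE))
untidyData <- read.csv("C:/data/China.csv", header = TRUE, stringsAsFactors = FALSE)
datatable(untidyData)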
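China.csv itself is not reproduced here, so the sketch below builds a small hypothetical stand-in. It assumes the file follows the layout implied by the column names used later (Country.Name, Country.Code, Series.Name, Series.Code, plus one column per year); the series codes and all values are made up. It also shows why the year columns arrive as X2000, X2001, ...: read.csv() turns the non-syntactic header "2000" into a syntactic name by prefixing an X.

```r
# Hypothetical miniature of China.csv: same wide layout, made-up values.
raw <- data.frame(
  Country.Name = c("China", "China"),
  Country.Code = c("CHN", "CHN"),
  Series.Name  = c("GDP (current US$)", "Agriculture, value added (% of GDP)"),
  Series.Code  = c("S1", "S2"),   # placeholder codes, not the real ones
  `2000` = c(100, 15),
  `2001` = c(110, 14),
  `2002` = c(121, 13),
  check.names = FALSE             # keep the header exactly as "2000", "2001", ...
)

# Write it to a temporary file and read it back the same way as the real file.
path <- tempfile(fileext = ".csv")
write.csv(raw, path, row.names = FALSE)
untidyData <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

# check.names = TRUE (the read.csv default) renames "2000" to "X2000", etc.
names(untidyData)
```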

Tidy Data

2.1 Tidy Data - I

Let's tidy the dataset by gathering the year columns (X2000 to X2014) into a single column, Year, with the corresponding numbers in a Value column.

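# Stack the year columns X2000..X2014 into key/value pairs: one row per series per year
untidyData <- untidyData %>% gather(Year, Value, X2000:X2014)
# Series.Code is a one-to-one label for Series.Name; drop it so spread() later yields one row per year
untidyData$Series.Code <- NULL
head(untidyData)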
##   Country.Name Country.Code                                Series.Name
## 1        China          CHN                          Population, total
## 2        China          CHN               Population growth (annual %)
## 3        China          CHN                      Surface area (sq. km)
## 4        China          CHN            GNI, Atlas method (current US$)
## 5        China          CHN GNI per capita, Atlas method (current US$)
## 6        China          CHN         GNI, PPP (current international $)
##    Year       Value
## 1 X2000  1262645000
## 2 X2000 0.787956593
## 3 X2000     9562920
## 4 X2000 1.17579E+12
## 5 X2000         930
## 6 X2000 3.63634E+12
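gather() still works, but since tidyr 1.0.0 it is superseded by pivot_longer(). The sketch below shows the equivalent call on a hypothetical two-series, two-year extract (made-up values). The names_prefix argument strips the X that read.csv() added, which would make the later gsub("X", "", ...) step unnecessary. One difference to be aware of: gather() stacks column by column (all X2000 rows first), while pivot_longer() keeps the rows of each series together.

```r
library(tidyr)

# Hypothetical wide extract with the same shape as untidyData (made-up values).
wide <- data.frame(
  Series.Name = c("GDP (current US$)", "GDP growth (annual %)"),
  X2000 = c(100, 8),
  X2001 = c(110, 10),
  stringsAsFactors = FALSE
)

# Equivalent of gather(Year, Value, X2000:X2001), with the "X" prefix removed.
long <- pivot_longer(wide, cols = X2000:X2001,
                     names_to = "Year", names_prefix = "X",
                     values_to = "Value")
long$Year <- as.integer(long$Year)   # "2000" -> 2000L
long
```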

2.2 Remove unwanted Columns

Since every row describes China, the Country.Name and Country.Code columns carry no information and can be dropped.

untidyData$Country.Name <- NULL
untidyData$Country.Code <- NULL
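The same step can be written with dplyr::select(), and it is cheap to first check that the columns really are constant before discarding them. A minimal sketch on hypothetical rows shaped like untidyData after gather() (made-up values):

```r
library(dplyr)

# Hypothetical long-format rows shaped like untidyData after gather() (made-up values).
untidyData <- data.frame(
  Country.Name = "China", Country.Code = "CHN",
  Series.Name = c("GDP (current US$)", "GDP growth (annual %)"),
  Year = "X2000", Value = c("100", "8"),
  stringsAsFactors = FALSE
)

# Guard: the identifier columns should hold a single value before we drop them.
stopifnot(n_distinct(untidyData$Country.Code) == 1)

# Drop both constant columns in one step instead of two `<- NULL` assignments.
untidyData <- select(untidyData, -Country.Name, -Country.Code)
names(untidyData)
```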

2.3 Tidy Data - II

Let's spread the categorical column Series.Name into separate columns. This makes the dataset tidy: each variable (series) becomes a column and each observation (one year) becomes a row.

tidyData <- untidyData %>% spread(Series.Name, Value)
datatable(tidyData)
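As with gather(), spread() is superseded in current tidyr by pivot_wider(). The sketch below shows the equivalent call on hypothetical long data with two series and two years (made-up values). Every column not named in names_from or values_from (here only Year) identifies the rows, which is why Series.Code had to be removed beforehand.

```r
library(tidyr)

# Hypothetical long data shaped like untidyData before spread() (made-up values).
long <- data.frame(
  Series.Name = c("GDP (current US$)", "GDP growth (annual %)",
                  "GDP (current US$)", "GDP growth (annual %)"),
  Year  = c("X2000", "X2000", "X2001", "X2001"),
  Value = c(100, 8, 110, 10),
  stringsAsFactors = FALSE
)

# Equivalent of spread(Series.Name, Value): one column per series, one row per year.
wide <- pivot_wider(long, names_from = Series.Name, values_from = Value)
wide
```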

2.4 Calculate the Value of Agriculture in USD

chinaGDP <- tidyData %>%
  transmute(Year = as.numeric(gsub("X", "", Year)),
            `GDP-Current` = as.numeric(`GDP (current US$)`),
            `GDP-Growth` = as.numeric(`GDP growth (annual %)`),
            `Agriculture_as_%_of_GDP` = as.numeric(`Agriculture, value added (% of GDP)`)) %>%
  # Agriculture value added in US$ = total GDP * agriculture share / 100
  mutate(Agriculture_in_USD = `GDP-Current` * `Agriculture_as_%_of_GDP` / 100)
datatable(chinaGDP)
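The conversion is plain arithmetic: agriculture's share is stored in percent, so it is divided by 100 before multiplying by total GDP. A worked example with hypothetical figures (not taken from China.csv): a GDP of 1.2 trillion US$ with agriculture at 15 % gives 1.2e12 * 15 / 100 = 1.8e11 US$. Because the operation is vectorized, the same expression handles every year at once, just as mutate() does above.

```r
# Hypothetical two-year data frame with the same column names as chinaGDP (made-up values).
chinaGDP_demo <- data.frame(Year = c(2000, 2001),
                            `GDP-Current` = c(1.2e12, 1.3e12),
                            `Agriculture_as_%_of_GDP` = c(15, 14),
                            check.names = FALSE)

# Share is in percent, so divide by 100 to get US$.
chinaGDP_demo$Agriculture_in_USD <-
  chinaGDP_demo$`GDP-Current` * chinaGDP_demo$`Agriculture_as_%_of_GDP` / 100
chinaGDP_demo$Agriculture_in_USD   # 1.80e+11 1.82e+11
```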

2.5 Analyses

ggplotly(ggplot(chinaGDP, aes(x = Year, y = `GDP-Growth`)) + geom_point() + geom_smooth() + labs(title = "GDP Growth (annual %) by Year"))

As the plot shows, China's highest annual GDP growth rates occurred between 2005 and 2010.
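Reading the peak off a plot can be backed up by computing it directly. A minimal sketch on a hypothetical data frame with the same GDP-Growth column (the growth rates are made up, not China's actual figures); on the real data the same two lines would be run against chinaGDP, and the threshold of 10 % is an arbitrary illustrative choice.

```r
# Hypothetical chinaGDP-like data (made-up growth rates).
gdp_demo <- data.frame(Year = 2004:2008,
                       `GDP-Growth` = c(9.5, 10.0, 11.5, 12.5, 9.0),
                       check.names = FALSE)

# Year with the highest growth rate.
best_year <- gdp_demo$Year[which.max(gdp_demo$`GDP-Growth`)]
best_year   # 2007

# Years with growth strictly above an illustrative 10 % threshold.
above_10 <- gdp_demo$Year[gdp_demo$`GDP-Growth` > 10]
above_10    # 2006 2007
```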

ggplotly(ggplot(chinaGDP, aes(x = Year, y = `Agriculture_as_%_of_GDP`)) + geom_point() + geom_smooth() + labs(title = "Agriculture Share of GDP by Year"))

As the plot shows, agriculture's share of GDP fell markedly over the period, which is consistent with China pursuing a policy of aggressive industrialization.
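A fitted linear trend turns "fell markedly" into a number: the slope is the average change in agriculture's share, in percentage points of GDP per year. The sketch below uses made-up shares for five years; on the real data the formula would be `Agriculture_as_%_of_GDP` ~ Year with data = chinaGDP. A straight line is an assumption here; geom_smooth() in the plot uses a local (loess) fit instead.

```r
# Hypothetical agriculture shares (made-up values) for five years.
agri <- data.frame(Year = 2000:2004,
                   Share = c(15, 14.5, 14, 13.5, 13))

# Ordinary least-squares trend of the share over time.
fit <- lm(Share ~ Year, data = agri)
slope <- unname(coef(fit)["Year"])
slope   # about -0.5 percentage points of GDP per year
```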

ggplotly(ggplot(chinaGDP, aes(x = `Agriculture_as_%_of_GDP`, y = `GDP-Growth`)) + geom_point() + geom_smooth() + labs(title = "GDP Growth vs Agriculture Share of GDP"))

This chart supports the conclusion of the second chart: agriculture's share of GDP has declined in recent years. In the range where total GDP growth is between 7% and 10%, agriculture's share of GDP rises together with overall growth.
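The relationship within the 7% to 10% band can be quantified with a correlation coefficient restricted to those years. The sketch below uses made-up values for six years; on the real data the columns would be `Agriculture_as_%_of_GDP` and `GDP-Growth` from chinaGDP. The band limits come from the text above; everything else is illustrative.

```r
# Hypothetical values (made up): agriculture share and GDP growth for six years.
demo <- data.frame(Agri   = c(10, 11, 12, 13, 14, 15),
                   Growth = c(6, 7, 8, 9, 10, 12))

# Keep only the years where total GDP growth is between 7 % and 10 % inclusive.
in_range <- subset(demo, Growth >= 7 & Growth <= 10)

# Pearson correlation within that band; values near +1 mean both rise together.
r <- cor(in_range$Agri, in_range$Growth)
r
```

With only a handful of years in the band, a rank-based measure such as cor(x, y, method = "spearman") is a reasonable cross-check, and any single coefficient should be read cautiously.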