library("readr")
# read_csv() already returns a tibble, so no as_tibble() wrapper is needed
data <- read_csv("https://raw.githubusercontent.com/ati-ozgur/course-r-programming/master/data/Top-100-Global-Steel-Producers-2011-2016.csv")
## Rows: 100 Columns: 10
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Companies, Headquarters
## dbl (8): 2011 Tonnage (Millions), 2012 Tonnage (Millions), 2013 Tonnage (Mil...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 x 10
## Companies Headquarters `2011 Tonnage (M~ `2012 Tonnage (~ `2013 Tonnage (~
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 ArcelorMittal Luxembourg 97.2 93.6 96.1
## 2 China Baowu ~ China 81.0 79.1 83.2
## 3 HBIS Group China 44.4 42.8 45.8
## 4 Nippon Steel~ Japan 33.4 47.9 50.1
## 5 POSCO South Korea 39.1 39.9 38.4
## 6 Baosteel Gro~ China 43.3 42.7 43.9
## # ... with 5 more variables: 2014 Tonnage (Millions) <dbl>,
## # 2015 Tonnage (Millions) <dbl>, 2016 Tonnage (Millions) <dbl>,
## # 2015 Ranking <dbl>, 2016 Ranking <dbl>
Here we notice that the yearly tonnage variable is spread across six columns, one per year. We fix this with the pivot_longer() function from the tidyr package.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
# Stack the six yearly tonnage columns into a single "Tonnage" column and
# keep only the four-digit year from each column name in "years"
tidy <- data %>%
  pivot_longer(
    cols = ends_with("Tonnage (Millions)"),
    names_to = "years",
    names_pattern = "(\\d{4}) Tonnage",
    names_transform = list(years = as.integer),
    values_to = "Tonnage"
  )
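The sketch below runs the same kind of reshape on a small hypothetical tibble. The two companies and their values are made up, but the column names follow the real file. It shows that each company contributes one row per year. It also shows how names_pattern keeps only the year from a name such as "2011 Tonnage (Millions)". It assumes a tidyr release recent enough to support names_transform.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: two companies, two years, same column naming as the real file
toy <- tibble(
  Companies = c("A", "B"),
  Headquarters = c("China", "Japan"),
  `2011 Tonnage (Millions)` = c(10, 5),
  `2012 Tonnage (Millions)` = c(12, 6)
)

toy_long <- pivot_longer(
  toy,
  cols = ends_with("Tonnage (Millions)"),     # literal suffix match on the yearly columns
  names_to = "years",
  names_pattern = "(\\d{4}) Tonnage",         # capture only the four-digit year
  names_transform = list(years = as.integer), # "2011" -> 2011L
  values_to = "Tonnage"
)
toy_long
```

Two companies times two years gives four rows, one per company-year pair.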
The ranking columns have the same problem. However, rankings are available only for 2015 and 2016, so we drop both columns to obtain tidy data.
tidy <- tidy %>%
  # remove the two ranking columns, which exist only for 2015 and 2016
  select(-c("2015 Ranking", "2016 Ranking"))
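Listing the ranking columns by name works. A tidyselect helper such as ends_with("Ranking") is an equivalent alternative that does not need each name typed out. This is a sketch on a hypothetical tibble that mirrors only the relevant columns of the real file.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data with the same two ranking column names as the real file
toy <- tibble(
  Companies = c("A", "B"),
  `2015 Ranking` = c(1, 2),
  `2016 Ranking` = c(1, 2),
  Tonnage = c(10, 5)
)

# ends_with() matches the suffix literally, so both ranking columns are dropped
toy_no_rank <- select(toy, -ends_with("Ranking"))
names(toy_no_rank)
```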
Next, we sum the tonnage for each headquarters country over 2011-2016 and plot the totals as a bar chart.
library(ggplot2)
(bar <- tidy %>%
  group_by(Headquarters) %>%
  # na.rm = TRUE skips any missing yearly values instead of returning NA
  summarise(total_tonnage = sum(Tonnage, na.rm = TRUE)) %>%
  ungroup() %>%
  ggplot() +
  ggtitle("Total steel production by headquarters country, 2011-2016") +
  geom_col(aes(y = total_tonnage, x = Headquarters, fill = Headquarters)) +
  theme(plot.title = element_text(size = 15, face = "bold")) +
  ylab("Total tonnage (millions)"))
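By default geom_col() places the countries in alphabetical order. The sketch below sorts the summarised totals and orders the bars by size with reorder(). It uses hypothetical long-format data in the same shape as tidy. The values, including the missing Japan entry, are made up.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical long-format data shaped like `tidy` (not the real figures)
toy_long <- tibble(
  Headquarters = c("China", "China", "Japan", "Japan", "Luxembourg", "Luxembourg"),
  years = c(2011L, 2012L, 2011L, 2012L, 2011L, 2012L),
  Tonnage = c(80, 85, 30, NA, 97, 94)
)

totals <- toy_long %>%
  group_by(Headquarters) %>%
  summarise(total_tonnage = sum(Tonnage, na.rm = TRUE)) %>%  # the NA year is skipped
  arrange(desc(total_tonnage))                               # largest producer first

# reorder() turns Headquarters into a factor ordered by descending total
bar_sorted <- ggplot(totals, aes(x = reorder(Headquarters, -total_tonnage), y = total_tonnage)) +
  geom_col() +
  labs(x = "Headquarters", y = "Total tonnage (millions)")
totals
```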
## Question 5
A pie chart representation of the same information.
(pie <- tidy %>%
  group_by(Headquarters) %>%
  summarise(total_tonnage = sum(Tonnage, na.rm = TRUE)) %>%
  ungroup() %>%
  ggplot() +
  ggtitle("Total steel production by headquarters country, 2011-2016") +
  # a single stacked bar (x = "") becomes a pie once wrapped in polar coordinates
  geom_col(aes(y = total_tonnage, x = "", fill = Headquarters)) +
  theme(plot.title = element_text(size = 15, face = "bold")) +
  labs(x = NULL, y = "Total tonnage (millions)") +
  coord_polar("y", start = 0))
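In a pie chart, each slice's angle is proportional to its share of the grand total. Printing those shares as labels therefore makes the chart easier to read. The sketch below computes percentage shares and places them at the middle of each slice with position_stack(vjust = 0.5). It uses hypothetical per-country totals, not the real ones.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical per-country totals, shaped like the summarised data above
totals <- tibble(
  Headquarters = c("China", "Japan", "Luxembourg"),
  total_tonnage = c(150, 30, 20)
)

# Share of the grand total, in percent, rounded to one decimal place
shares <- totals %>%
  mutate(share_pct = round(100 * total_tonnage / sum(total_tonnage), 1))

# fill is set globally so geom_text stacks its labels in the same groups as geom_col
pie_labeled <- ggplot(shares, aes(x = "", y = total_tonnage, fill = Headquarters)) +
  geom_col() +
  geom_text(aes(label = paste0(share_pct, "%")), position = position_stack(vjust = 0.5)) +
  coord_polar("y", start = 0) +
  theme_void()
shares
```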
Both figures are included in the rendered R Markdown file.