Question 1 & 2

library("tibble")
library("readr")
# read_csv() already returns a tibble; as_tibble() is kept only to make that explicit.
data <- as_tibble(read_csv("https://raw.githubusercontent.com/ati-ozgur/course-r-programming/master/data/Top-100-Global-Steel-Producers-2011-2016.csv"))

## Rows: 100 Columns: 10

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Companies, Headquarters
## dbl (8): 2011 Tonnage (Millions), 2012 Tonnage (Millions), 2013 Tonnage (Mil...

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(data)

## # A tibble: 6 x 10
##   Companies     Headquarters `2011 Tonnage (M~ `2012 Tonnage (~ `2013 Tonnage (~
##   <chr>         <chr>                    <dbl>            <dbl>            <dbl>
## 1 ArcelorMittal Luxembourg                97.2             93.6             96.1
## 2 China Baowu ~ China                     81.0             79.1             83.2
## 3 HBIS Group    China                     44.4             42.8             45.8
## 4 Nippon Steel~ Japan                     33.4             47.9             50.1
## 5 POSCO         South Korea               39.1             39.9             38.4
## 6 Baosteel Gro~ China                     43.3             42.7             43.9
## # ... with 5 more variables: 2014 Tonnage (Millions) <dbl>,
## #   2015 Tonnage (Millions) <dbl>, 2016 Tonnage (Millions) <dbl>,
## #   2015 Ranking <dbl>, 2016 Ranking <dbl>

Question 3

Here we notice that the yearly tonnage values are spread across multiple columns, one per year. We fix this with pivot_longer() from tidyr, which gathers those columns into a single "years" column and a single "Tonnage" column.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
# Gather the six yearly tonnage columns into one (years, Tonnage) pair per row.
tidy <- data %>%
  pivot_longer(cols = c(
    "2011 Tonnage (Millions)", "2012 Tonnage (Millions)",
    "2013 Tonnage (Millions)", "2014 Tonnage (Millions)",
    "2015 Tonnage (Millions)", "2016 Tonnage (Millions)"
  ), names_to = "years", values_to = "Tonnage")
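As a minimal sketch of the same reshaping, the tonnage columns can also be selected by their shared suffix instead of being listed one by one, and the year can be extracted from the column name as an integer. The `toy` tibble and its values below are made up for illustration and are not taken from the steel data set.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical two-company table with the same column naming pattern.
toy <- tibble(
  Companies = c("A Steel", "B Steel"),
  Headquarters = c("China", "Japan"),
  `2011 Tonnage (Millions)` = c(10, 20),
  `2012 Tonnage (Millions)` = c(11, 21)
)

toy_long <- toy %>%
  pivot_longer(
    cols = ends_with("Tonnage (Millions)"),  # select by suffix
    names_to = "years",
    values_to = "Tonnage"
  ) %>%
  # Keep only the leading four-digit year from names like "2011 Tonnage (Millions)".
  mutate(years = as.integer(substr(years, 1, 4)))

print(toy_long)
```

Selecting by suffix avoids retyping each column name, and an integer year is easier to sort or plot than the full column label.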

The ranking columns show the same problem. However, since rankings are available only for 2015 and 2016 and not for the other years, we drop both columns to obtain a tidy data set.

tidy <- tidy %>%
  select(-c("2015 Ranking","2016 Ranking"))
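If the rankings were needed later, one possible alternative to dropping them is to keep them in a separate long table. The sketch below assumes a made-up `toy` tibble with the same ranking column names; it is an illustration, not part of the lab solution.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical rankings for two companies.
toy <- tibble(
  Companies = c("A Steel", "B Steel"),
  `2015 Ranking` = c(1, 2),
  `2016 Ranking` = c(2, 1)
)

toy_rank <- toy %>%
  pivot_longer(
    cols = ends_with("Ranking"),
    names_to = "years",
    values_to = "Ranking"
  ) %>%
  # Reduce "2015 Ranking" to the integer year 2015.
  mutate(years = as.integer(substr(years, 1, 4)))

print(toy_rank)
```

Such a table could be joined back to the tonnage table on company and year, with missing rankings for the years that have none.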

Question 4

We sum the tonnage for each headquarters country and draw a bar chart of the totals.

library(ggplot2)
(bar <- tidy %>%
  group_by(Headquarters) %>%
  summarise(total_tonnage = sum(Tonnage)) %>%
  ungroup() %>%
  ggplot() +
      ggtitle("Total Tonnage of steel production between 2011-2016") +
      geom_col(aes(y=total_tonnage, x = Headquarters, fill = Headquarters)) +
      theme(plot.title = element_text(size = 15, face = "bold")) + 
      ylab("Total Tonnage"))
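A small variation on the summary step, sketched on made-up data: if any tonnage value were missing, sum() would return NA for that country unless na.rm = TRUE is set, and reorder() can sort the bars by total. Whether the real data contains missing values is not checked here; the `toy_long` tibble is hypothetical.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical long-format data with one missing tonnage value.
toy_long <- tibble(
  Headquarters = c("China", "China", "Japan", "Japan"),
  Tonnage = c(10, NA, 20, 21)
)

totals <- toy_long %>%
  group_by(Headquarters) %>%
  summarise(total_tonnage = sum(Tonnage, na.rm = TRUE))  # ignore missing values

# Bars ordered from largest to smallest total.
bar_sorted <- ggplot(totals) +
  geom_col(aes(x = reorder(Headquarters, -total_tonnage), y = total_tonnage)) +
  xlab("Headquarters") +
  ylab("Total Tonnage")

print(totals)
```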

Question 5

A pie chart representation of the above information.

(pie <- tidy %>%
  group_by(Headquarters) %>%
  summarise(total_tonnage = sum(Tonnage)) %>%
  ungroup() %>%
  ggplot() +
      ggtitle("Total Tonnage of steel production between 2011-2016") +
      geom_col(aes(y=total_tonnage, x = "", fill = Headquarters)) +
      theme(plot.title = element_text(size = 15, face = "bold")) + 
      ylab("Total Tonnage") + coord_polar("y", start=0))
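Because slice sizes in a pie chart are hard to compare by eye, one option is to compute each country's share of the total and print it on the slice. This is a sketch on a made-up `totals` tibble, not the output of the lab data.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical totals per headquarters country.
totals <- tibble(
  Headquarters = c("China", "Japan"),
  total_tonnage = c(30, 70)
)

# Share of each country in the overall total.
shares <- totals %>%
  mutate(share = total_tonnage / sum(total_tonnage))

pie_labeled <- ggplot(shares, aes(x = "", y = total_tonnage, fill = Headquarters)) +
  geom_col() +
  # Place a percentage label in the middle of each slice.
  geom_text(aes(label = paste0(round(100 * share), "%")),
            position = position_stack(vjust = 0.5)) +
  coord_polar("y", start = 0)

print(shares)
```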

The two figures are already in the R Markdown file.

Lab 2021-11-23

Yash Shinde

1/4/2022
