Homework Week 2
BONUS: Loading the data from GitHub, where the original CSV is stored
# Setting up the environment
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
# Loading the data from GitHub
theURL <- "https://raw.githubusercontent.com/Umerfarooq122/Data_sets/main/OECDGas.csv"
Gas_consumption <- read.csv(theURL)
head(Gas_consumption)
## X country year gas income price cars
## 1 1 Austria 1960 4.173244 -6.474277 -0.3345476 -9.766840
## 2 2 Austria 1961 4.100989 -6.426006 -0.3513276 -9.608622
## 3 3 Austria 1962 4.073177 -6.407308 -0.3795177 -9.457257
## 4 4 Austria 1963 4.059509 -6.370679 -0.4142514 -9.343155
## 5 5 Austria 1964 4.037689 -6.322247 -0.4453354 -9.237739
## 6 6 Austria 1965 4.033983 -6.294668 -0.4970607 -9.123903
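The `X` column in the output above is not one of the data's attributes. Its values (1, 2, 3, ...) suggest it is a row index that was saved into the CSV, and `read.csv()` turns an empty header cell into the name `X`. That is an assumption about how the file was written. The sketch below reproduces the situation with a two-row excerpt typed in as text (values copied from the `head()` output) and shows two ways to drop the column.

```r
# Two-row excerpt of the CSV. ASSUMPTION: the header starts with an empty
# field ("") for a saved row index, as the X column above suggests.
csv_text <- '"","country","year","gas","income","price","cars"
"1","Austria",1960,4.173244,-6.474277,-0.3345476,-9.766840
"2","Austria",1961,4.100989,-6.426006,-0.3513276,-9.608622'

gas_raw <- read.csv(text = csv_text)
names(gas_raw)  # the empty header cell becomes "X"

# Option 1: drop the index column after reading
gas_no_index <- gas_raw[, names(gas_raw) != "X"]

# Option 2: have read.csv() use the first column as row names instead
gas_rownames <- read.csv(text = csv_text, row.names = 1)
names(gas_rownames)
```

Under the same assumption, `row.names = 1` could also be passed when reading the real file, e.g. `read.csv(theURL, row.names = 1)`.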
Question 1: Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
Answer:
summary(Gas_consumption)
## X country year gas
## Min. : 1.00 Length:342 Min. :1960 Min. :3.380
## 1st Qu.: 86.25 Class :character 1st Qu.:1964 1st Qu.:3.944
## Median :171.50 Mode :character Median :1969 Median :4.088
## Mean :171.50 Mean :1969 Mean :4.296
## 3rd Qu.:256.75 3rd Qu.:1974 3rd Qu.:4.556
## Max. :342.00 Max. :1978 Max. :6.157
## income price cars
## Min. :-8.073 Min. :-2.8965 Min. :-13.475
## 1st Qu.:-6.320 1st Qu.:-0.6523 1st Qu.: -9.307
## Median :-5.985 Median :-0.3795 Median : -8.658
## Mean :-6.139 Mean :-0.5231 Mean : -9.042
## 3rd Qu.:-5.720 3rd Qu.:-0.2234 3rd Qu.: -8.262
## Max. :-5.221 Max. : 1.1253 Max. : -7.536
mean(Gas_consumption$price)
## [1] -0.5231032
mean(Gas_consumption$year)
## [1] 1969
median(Gas_consumption$price)
## [1] -0.3794872
median(Gas_consumption$year)
## [1] 1969
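The answer above computes each statistic with a separate call. To get both statistics for several attributes at once, `sapply()` can apply a small function to each selected column. This sketch uses the six Austria rows from the `head()` output so it runs without downloading anything; on the full data the same call would take `Gas_consumption[c("price", "year")]`.

```r
# First six rows (Austria, 1960-1965), copied from the head() output above
gas_head <- data.frame(
  year  = 1960:1965,
  price = c(-0.3345476, -0.3513276, -0.3795177,
            -0.4142514, -0.4453354, -0.4970607)
)

# Apply mean() and median() to each column; the result is a matrix with
# one row per statistic and one column per attribute
stats <- sapply(gas_head[c("price", "year")],
                function(x) c(mean = mean(x), median = median(x)))
stats
```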
Question 2: Create a new data frame with a subset of the columns and rows. Make sure to rename it.
Answer:
new_df <- subset(Gas_consumption, select = c("price","cars","year"))
head(new_df)
## price cars year
## 1 -0.3345476 -9.766840 1960
## 2 -0.3513276 -9.608622 1961
## 3 -0.3795177 -9.457257 1962
## 4 -0.4142514 -9.343155 1963
## 5 -0.4453354 -9.237739 1964
## 6 -0.4970607 -9.123903 1965
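The question also asks for a subset of rows, while the `subset()` call above keeps all 342 rows and only selects columns. `subset()` accepts a row condition as its second argument, so rows and columns can be chosen in one call. The cutoff year (1963) below is an arbitrary illustrative choice, and the six rows are the Austria rows from the `head()` output.

```r
# First six rows (Austria, 1960-1965), copied from the head() output above
gas_head <- data.frame(
  country = "Austria",
  year    = 1960:1965,
  price   = c(-0.3345476, -0.3513276, -0.3795177,
              -0.4142514, -0.4453354, -0.4970607),
  cars    = c(-9.766840, -9.608622, -9.457257,
              -9.343155, -9.237739, -9.123903)
)

# Row condition (1963 onward) plus column selection in a single call
recent_df <- subset(gas_head, year >= 1963, select = c(price, cars, year))
recent_df
```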
Question 3: Create new column names for the new data frame.
Answer:
new_df <- set_names(new_df, c("value","vehicle","model_year")) # rename() would also work
head(new_df)
## value vehicle model_year
## 1 -0.3345476 -9.766840 1960
## 2 -0.3513276 -9.608622 1961
## 3 -0.3795177 -9.457257 1962
## 4 -0.4142514 -9.343155 1963
## 5 -0.4453354 -9.237739 1964
## 6 -0.4970607 -9.123903 1965
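`set_names()` assigns the new names by position, so it would mislabel columns if their order ever changed. A base R alternative that renames each column by its current name is sketched below; the lookup vector `new_names` is illustrative, and columns missing from it keep their old names.

```r
# Two rows copied from the new_df output above
small_df <- data.frame(price = c(-0.3345476, -0.3513276),
                       cars  = c(-9.766840, -9.608622),
                       year  = c(1960, 1961))

# Old name -> new name; the order here does not have to match the columns
new_names <- c(year = "model_year", price = "value", cars = "vehicle")

# For each column, find its entry in the lookup (NA if absent)
idx <- match(names(small_df), names(new_names))
names(small_df)[!is.na(idx)] <- unname(new_names[idx[!is.na(idx)]])
small_df
```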
Question 4: Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
Answer:
summary(new_df)
## value vehicle model_year
## Min. :-2.8965 Min. :-13.475 Min. :1960
## 1st Qu.:-0.6523 1st Qu.: -9.307 1st Qu.:1964
## Median :-0.3795 Median : -8.658 Median :1969
## Mean :-0.5231 Mean : -9.042 Mean :1969
## 3rd Qu.:-0.2234 3rd Qu.: -8.262 3rd Qu.:1974
## Max. : 1.1253 Max. : -7.536 Max. :1978
mean(new_df$value)
## [1] -0.5231032
mean(new_df$model_year)
## [1] 1969
median(new_df$value)
## [1] -0.3794872
median(new_df$model_year)
## [1] 1969
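Comparing the two sets of outputs: the mean and median of `value` and `model_year` in `new_df` are exactly those of `price` and `year` in `Gas_consumption` (mean -0.5231032 and median -0.3794872 for price; 1969 for both year statistics). This is expected, because `new_df` kept all 342 rows and renaming changed only the column labels, not the data. The sketch below checks that kind of equality programmatically on the six rows copied from `head()`.

```r
# First six rows (Austria, 1960-1965), copied from the head() output above
gas_head <- data.frame(
  year  = 1960:1965,
  price = c(-0.3345476, -0.3513276, -0.3795177,
            -0.4142514, -0.4453354, -0.4970607)
)

# Same rows, columns selected and renamed as in Questions 2 and 3
renamed <- setNames(gas_head[c("price", "year")], c("value", "model_year"))

# Statistics side by side, one row per attribute pair
comparison <- data.frame(
  attribute     = c("price / value", "year / model_year"),
  original_mean = c(mean(gas_head$price), mean(gas_head$year)),
  new_mean      = c(mean(renamed$value), mean(renamed$model_year)),
  original_med  = c(median(gas_head$price), median(gas_head$year)),
  new_med       = c(median(renamed$value), median(renamed$model_year))
)
comparison

# identical() is TRUE only when the values match exactly
identical(comparison$original_mean, comparison$new_mean)
```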
Question 5: For at least 3 values in a column, please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
Answer:
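# replace(x, list, values): change only the entries of country equal to "Austria"
Gas_consumption$country <- replace(Gas_consumption$country, Gas_consumption$country == "Austria", "Pakistan")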
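`replace(x, list, values)` takes the vector to modify first and the positions (or a logical condition) second. When several distinct values each need a new label, as in the question's “e” to “excellent” example, a named lookup vector handles them in one step. The codes and labels below (“g” to “good”, “p” to “poor”) are hypothetical; values without an entry in the lookup are left unchanged.

```r
# Hypothetical column with repeated codes
ratings <- c("e", "g", "e", "p", "g", "e", "x")

# Old value -> new value
lookup <- c(e = "excellent", g = "good", p = "poor")

# Replace only the positions whose value has an entry in the lookup
to_change <- ratings %in% names(lookup)
ratings[to_change] <- lookup[ratings[to_change]]
ratings
```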
Question 6: Display enough rows to see examples of all of steps 1-5 above.
Answer:
head(Gas_consumption)
## X country year gas income price cars
## 1 1 Pakistan 1960 4.173244 -6.474277 -0.3345476 -9.766840
## 2 2 Pakistan 1961 4.100989 -6.426006 -0.3513276 -9.608622
## 3 3 Pakistan 1962 4.073177 -6.407308 -0.3795177 -9.457257
## 4 4 Pakistan 1963 4.059509 -6.370679 -0.4142514 -9.343155
## 5 5 Pakistan 1964 4.037689 -6.322247 -0.4453354 -9.237739
## 6 6 Pakistan 1965 4.033983 -6.294668 -0.4970607 -9.123903
head(new_df)
## value vehicle model_year
## 1 -0.3345476 -9.766840 1960
## 2 -0.3513276 -9.608622 1961
## 3 -0.3795177 -9.457257 1962
## 4 -0.4142514 -9.343155 1963
## 5 -0.4453354 -9.237739 1964
## 6 -0.4970607 -9.123903 1965
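`head()` shows only the first six rows, which are all former Austria rows, so it cannot show whether the other countries were left untouched by the renaming in Question 5. Tabulating the column, or printing the rows on both sides of the point where the country changes, gives a fuller check. The small country column below is hypothetical.

```r
# Hypothetical data: three Austria rows followed by three Belgium rows
gas_demo <- data.frame(country = rep(c("Austria", "Belgium"), each = 3),
                       year    = rep(1960:1962, times = 2))

gas_demo$country <- replace(gas_demo$country, gas_demo$country == "Austria", "Pakistan")

# Rows per country: every former Austria row should now read Pakistan
table(gas_demo$country)

# Rows around the boundary where the country changes
gas_demo[2:5, ]
```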