Homework Week 2

BONUS: Loading the data from GitHub, where the original CSV is stored
# Setting up the environment

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Loading the data from GitHub

theURL <- "https://raw.githubusercontent.com/Umerfarooq122/Data_sets/main/OECDGas.csv"
Gas_consumption <- read.csv(theURL)
head(Gas_consumption)
##   X country year      gas    income      price      cars
## 1 1 Austria 1960 4.173244 -6.474277 -0.3345476 -9.766840
## 2 2 Austria 1961 4.100989 -6.426006 -0.3513276 -9.608622
## 3 3 Austria 1962 4.073177 -6.407308 -0.3795177 -9.457257
## 4 4 Austria 1963 4.059509 -6.370679 -0.4142514 -9.343155
## 5 5 Austria 1964 4.037689 -6.322247 -0.4453354 -9.237739
## 6 6 Austria 1965 4.033983 -6.294668 -0.4970607 -9.123903
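The `X` column looks like a row index saved in the original CSV rather than a real variable. The sketch below shows how `read.csv()` turns that index into `X` and how `row.names = 1` would move it into the row names instead. It assumes the file was written with an empty first header field, which is what `write.csv()` produces. `csv_text` is a made-up two-row excerpt, not the real file.

```r
# Made-up excerpt in the assumed OECDGas.csv layout: empty first header field
csv_text <- '"","country","year","gas"
"1","Austria",1960,4.173244
"2","Austria",1961,4.100989'

# Default: the empty header becomes the column name "X" via make.names()
with_x <- read.csv(text = csv_text)

# row.names = 1: the first column is used as row names, not kept as data
no_x <- read.csv(text = csv_text, row.names = 1)

names(with_x)
names(no_x)
```

Applied to the real data, this would be `read.csv(theURL, row.names = 1)`. Whether to drop `X` is a judgment call, because later answers here still use the default import.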
Question 1 : Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
Answer :
summary(Gas_consumption)
##        X            country               year           gas       
##  Min.   :  1.00   Length:342         Min.   :1960   Min.   :3.380  
##  1st Qu.: 86.25   Class :character   1st Qu.:1964   1st Qu.:3.944  
##  Median :171.50   Mode  :character   Median :1969   Median :4.088  
##  Mean   :171.50                      Mean   :1969   Mean   :4.296  
##  3rd Qu.:256.75                      3rd Qu.:1974   3rd Qu.:4.556  
##  Max.   :342.00                      Max.   :1978   Max.   :6.157  
##      income           price              cars        
##  Min.   :-8.073   Min.   :-2.8965   Min.   :-13.475  
##  1st Qu.:-6.320   1st Qu.:-0.6523   1st Qu.: -9.307  
##  Median :-5.985   Median :-0.3795   Median : -8.658  
##  Mean   :-6.139   Mean   :-0.5231   Mean   : -9.042  
##  3rd Qu.:-5.720   3rd Qu.:-0.2234   3rd Qu.: -8.262  
##  Max.   :-5.221   Max.   : 1.1253   Max.   : -7.536
mean(Gas_consumption$price)
## [1] -0.5231032
mean(Gas_consumption$year)
## [1] 1969
median(Gas_consumption$price)
## [1] -0.3794872
median(Gas_consumption$year)
## [1] 1969
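The four separate `mean()` and `median()` calls above can be collapsed into one table. The sketch below uses a hypothetical helper, `mean_median()`, and runs it on a small made-up data frame that stands in for `Gas_consumption`.

```r
# Hypothetical helper: mean and median for several numeric columns in one call.
# sapply() returns a matrix with one column per attribute and rows "mean"/"median".
mean_median <- function(df, cols) {
  sapply(df[cols], function(v) c(mean = mean(v), median = median(v)))
}

# Made-up toy data standing in for Gas_consumption
toy <- data.frame(price = c(1, 2, 6), year = c(1960, 1961, 1962))
stats <- mean_median(toy, c("price", "year"))
stats
```

On the real data, `mean_median(Gas_consumption, c("price", "year"))` should return the same four values as the individual calls above.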
Question 2 : Create a new data frame with a subset of the columns and rows. Make sure to rename it.
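Answer : (the call below selects three columns but keeps all 342 rows, so only the column part of the question is covered)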
new_df <- subset(Gas_consumption, select = c("price","cars","year"))
head(new_df)
##        price      cars year
## 1 -0.3345476 -9.766840 1960
## 2 -0.3513276 -9.608622 1961
## 3 -0.3795177 -9.457257 1962
## 4 -0.4142514 -9.343155 1963
## 5 -0.4453354 -9.237739 1964
## 6 -0.4970607 -9.123903 1965
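Question 2 asks for a subset of rows as well as columns, and `subset()` can do both in a single call. The sketch below runs on made-up data. The cutoff `year >= 1970` is an arbitrary filter chosen for illustration and is not part of the original answer.

```r
# Made-up toy data with the same column names as Gas_consumption
toy <- data.frame(
  year  = c(1968, 1969, 1970, 1971),
  price = c(-0.40, -0.42, -0.45, -0.47),
  cars  = c(-9.0, -8.9, -8.8, -8.7),
  gas   = c(4.10, 4.08, 4.05, 4.03)
)

# Second argument filters rows; select = keeps (and orders) the chosen columns
recent <- subset(toy, year >= 1970, select = c(price, cars, year))
recent
```

On the real data, `subset(Gas_consumption, year >= 1970, select = c(price, cars, year))` would keep only rows from 1970 onward. Its Question 4 statistics could then differ from those of the full data set.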
Question 3 : Create new column names for the new data frame

Answer :

new_df <- set_names(new_df, c("value","vehicle","model_year")) # assigns names by position; dplyr::rename() would also work

head(new_df)
##        value   vehicle model_year
## 1 -0.3345476 -9.766840       1960
## 2 -0.3513276 -9.608622       1961
## 3 -0.3795177 -9.457257       1962
## 4 -0.4142514 -9.343155       1963
## 5 -0.4453354 -9.237739       1964
## 6 -0.4970607 -9.123903       1965
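`set_names()` matches new names to columns by position. If the column order ever changes, the names silently land on the wrong columns. The sketch below renames by the current column name instead, using a hypothetical base R helper, `rename_cols()`. It follows the `new = old` convention of `dplyr::rename()`, and the toy data is made up.

```r
# Hypothetical helper: rename columns by their current names, not by position.
# mapping is a named character vector: c(new_name = "old_name", ...)
rename_cols <- function(df, mapping) {
  idx <- match(mapping, names(df))
  if (anyNA(idx)) {
    stop("unknown column(s): ", paste(mapping[is.na(idx)], collapse = ", "))
  }
  names(df)[idx] <- names(mapping)
  df
}

toy <- data.frame(price = c(-0.33, -0.35), cars = c(-9.77, -9.61), year = c(1960, 1961))
map <- c(value = "price", vehicle = "cars", model_year = "year")

renamed   <- rename_cols(toy, map)
# Same mapping still works after the columns are reordered
reordered <- rename_cols(toy[c("year", "price", "cars")], map)
names(renamed)
names(reordered)
```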
Question 4 : Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
Answer :
summary(new_df)
##      value            vehicle          model_year  
##  Min.   :-2.8965   Min.   :-13.475   Min.   :1960  
##  1st Qu.:-0.6523   1st Qu.: -9.307   1st Qu.:1964  
##  Median :-0.3795   Median : -8.658   Median :1969  
##  Mean   :-0.5231   Mean   : -9.042   Mean   :1969  
##  3rd Qu.:-0.2234   3rd Qu.: -8.262   3rd Qu.:1974  
##  Max.   : 1.1253   Max.   : -7.536   Max.   :1978
mean(new_df$value)
## [1] -0.5231032
mean(new_df$model_year)
## [1] 1969
median(new_df$value)
## [1] -0.3794872
median(new_df$model_year)
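## [1] 1969
Comparison: the mean and median of value (-0.5231 and -0.3795) and model_year (1969 and 1969) are identical to those of price and year in the original data frame. This is expected, because new_df keeps all 342 rows. Selecting and renaming columns does not change any of the values.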
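The comparison can also be laid out side by side. The sketch below uses a hypothetical helper, `compare_stats()`, on made-up price values. It shows that keeping every row leaves both statistics unchanged, while dropping rows can move the mean even when the median stays the same.

```r
# Hypothetical helper: side-by-side mean and median of one column from two sources
compare_stats <- function(x, y) {
  data.frame(
    stat     = c("mean", "median"),
    original = c(mean(x), median(x)),
    subset   = c(mean(y), median(y))
  )
}

price <- c(-0.33, -0.35, -0.38, -2.90, 1.13)   # made-up values

cmp_all  <- compare_stats(price, price)        # all rows kept: identical stats
cmp_half <- compare_stats(price, price[1:3])   # rows dropped: mean shifts, median here does not
cmp_all
cmp_half
```

For this document, `compare_stats(Gas_consumption$price, new_df$value)` would show matching columns, because no rows were removed.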
Question 5 : For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
Answer :
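# replace(x, list, values): overwrite only the elements of country equal to "Austria"
Gas_consumption$country <- replace(Gas_consumption$country, Gas_consumption$country == "Austria", "Pakistan")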
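To rename several distinct values in one pass, a named lookup vector works well. The sketch below uses the letter-grade example from the question. `recode_values()` and the `grades` data are hypothetical. `dplyr::recode()` offers similar behaviour, but this version sticks to base R.

```r
# Hypothetical helper: replace every occurrence of each old value with its new value.
# lookup is a named character vector: c(old = "new", ...)
recode_values <- function(x, lookup) {
  hit <- x %in% names(lookup)            # which elements have a replacement
  x[hit] <- unname(lookup[x[hit]])       # look up the new value for each of them
  x                                      # values without a lookup entry are unchanged
}

grades <- c("e", "g", "p", "e", "x")
recoded <- recode_values(grades, c(e = "excellent", g = "good", p = "poor"))
recoded
```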
Question 6 : Display enough rows to see examples of all of steps 1-5 above
Answer :
head(Gas_consumption)
##   X  country year      gas    income      price      cars
## 1 1 Pakistan 1960 4.173244 -6.474277 -0.3345476 -9.766840
## 2 2 Pakistan 1961 4.100989 -6.426006 -0.3513276 -9.608622
## 3 3 Pakistan 1962 4.073177 -6.407308 -0.3795177 -9.457257
## 4 4 Pakistan 1963 4.059509 -6.370679 -0.4142514 -9.343155
## 5 5 Pakistan 1964 4.037689 -6.322247 -0.4453354 -9.237739
## 6 6 Pakistan 1965 4.033983 -6.294668 -0.4970607 -9.123903
head(new_df)
##        value   vehicle model_year
## 1 -0.3345476 -9.766840       1960
## 2 -0.3513276 -9.608622       1961
## 3 -0.3795177 -9.457257       1962
## 4 -0.4142514 -9.343155       1963
## 5 -0.4453354 -9.237739       1964
## 6 -0.4970607 -9.123903       1965
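`head()` shows only the first six rows, and here they all belong to the renamed country. The sketch below makes both renamed and untouched values visible. It prints the first row for each country, then counts the rows per country. The toy data is made up.

```r
# Made-up toy data: one renamed country and one untouched country
toy <- data.frame(
  country = c("Pakistan", "Pakistan", "Belgium", "Belgium"),
  year    = c(1960, 1961, 1960, 1961)
)

# First row for each distinct country, in order of first appearance
first_per_country <- toy[!duplicated(toy$country), ]
first_per_country

# Number of rows per country, useful for checking that a rename was complete
counts <- table(toy$country)
counts
```

On the real data, `Gas_consumption[!duplicated(Gas_consumption$country), ]` would list the first row of every country.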