Load the dplyr package
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
Filmdata <- read.csv(file = "C:\\Users\\drake\\Film.csv", header=TRUE, sep=",")
summary(Filmdata)
##        X              Title                Year           Time
##  Min.   :  1.00   Length:100         Min.   :1924   Min.   : 45.00
##  1st Qu.: 25.75   Class :character   1st Qu.:1949   1st Qu.: 81.00
##  Median : 50.50   Mode  :character   Median :1966   Median : 93.00
##  Mean   : 50.50                      Mean   :1964   Mean   : 92.87
##  3rd Qu.: 75.25                      3rd Qu.:1978   3rd Qu.:101.25
##  Max.   :100.00                      Max.   :1995   Max.   :145.00
##       Cast           Rating       Description        Origin
##  Min.   : 3.00   Min.   :1.000   Min.   : 5.00   Min.   :0.00
##  1st Qu.: 5.00   1st Qu.:2.000   1st Qu.: 8.00   1st Qu.:0.00
##  Median : 6.00   Median :2.500   Median : 9.50   Median :0.00
##  Mean   : 6.75   Mean   :2.335   Mean   :10.02   Mean   :0.48
##  3rd Qu.: 8.00   3rd Qu.:3.000   3rd Qu.:12.00   3rd Qu.:0.00
##  Max.   :13.00   Max.   :4.000   Max.   :21.00   Max.   :6.00
##   Time_code              Good
##  Length:100         Min.   :0.00
##  Class :character   1st Qu.:0.00
##  Mode  :character   Median :0.00
##                     Mean   :0.31
##                     3rd Qu.:1.00
##                     Max.   :1.00
mean(Filmdata$Rating)
## [1] 2.335
median(Filmdata$Rating)
## [1] 2.5
mean(Filmdata$Time)
## [1] 92.87
median(Filmdata$Time)
## [1] 93
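The four separate calls above can also be collapsed into a single pass over the columns of interest. The following is a minimal sketch, assuming a small made-up data frame (`films`, with hypothetical values) in place of `Filmdata`; the same `sapply` pattern should work on any set of numeric columns.

```r
# Toy data standing in for Filmdata; the values are made up for illustration.
films <- data.frame(
  Rating = c(2.0, 3.0, 3.0, 2.5),
  Time   = c(89, 127, 138, 77)
)

# Apply mean and median to each selected column in one pass.
# The result is a matrix with one row per statistic and one column per attribute.
stats <- sapply(
  films[, c("Rating", "Time")],
  function(x) c(mean = mean(x), median = median(x))
)
print(stats)
```

Indexing the result by name, for example `stats["median", "Time"]`, retrieves a single statistic.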
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
Filmdata_80 <- subset(Filmdata, Year >= 1980)
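The call above filters rows only and keeps every column. Since the task also asks for a subset of the columns, here is a minimal sketch of doing both at once with the `select` argument of `subset`. It uses a made-up data frame (`films`, with hypothetical titles and values), and `Year` is compared with the number 1980 so the comparison does not rely on coercion to character.

```r
# Toy data standing in for Filmdata; titles and values are made up.
films <- data.frame(
  Title  = c("A", "B", "C", "D"),
  Year   = c(1963, 1984, 1989, 1978),
  Time   = c(89, 138, 119, 97),
  Rating = c(2.0, 3.0, 2.5, 3.0)
)

# Keep rows from 1980 onward and only three of the four columns.
films_80 <- subset(films, Year >= 1980, select = c(Title, Year, Rating))
print(films_80)
```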
3. Create new column names for the new data frame.
colnames(Filmdata_80) <- paste(colnames(Filmdata_80), '80', sep='_')
head(Filmdata_80)
##    X_80              Title_80 Year_80 Time_80 Cast_80 Rating_80 Description_80
## 3     3 And_the_Ship_Sails_On    1984     138       7       3.0             15
## 8     8                 Blaze    1989     119       8       2.5             15
## 14   14           City_Lights    1985      85      10       1.0             13
## 18   18                Dakota    1988      97       6       2.0             11
## 29   29    Hambone_and_Hillie    1984      89       8       2.5              8
## 32   32         House_Party_3    1994      94       8       1.5             12
##    Origin_80 Time_code_80 Good_80
## 3          3         long       1
## 8          0         long       0
## 14         0        short       0
## 18         0         long       0
## 29         0        short       0
## 32         0         long       0
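Appending a suffix renames every column the same way. When only some columns need new, more descriptive names, a named lookup vector is one option. This is a minimal sketch on a made-up data frame; the new names (`Runtime`, `Stars`) are hypothetical and chosen only for illustration.

```r
# Toy data; the values are made up.
films <- data.frame(
  Title  = c("A", "B"),
  Time   = c(89, 138),
  Rating = c(2.0, 3.0)
)

# Hypothetical new names, keyed by the old column names.
new_names <- c(Time = "Runtime", Rating = "Stars")

# Rename only the columns that appear in the lookup; leave the rest alone.
hits <- names(films) %in% names(new_names)
names(films)[hits] <- unname(new_names[names(films)[hits]])
print(names(films))
```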
4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(Filmdata_80)
##       X_80        Title_80            Year_80        Time_80
##  Min.   : 3.0   Length:25          Min.   :1980   Min.   : 85.0
##  1st Qu.:34.0   Class :character   1st Qu.:1983   1st Qu.: 94.0
##  Median :63.0   Mode  :character   Median :1985   Median : 97.0
##  Mean   :55.6                      Mean   :1986   Mean   :103.8
##  3rd Qu.:81.0                      3rd Qu.:1989   3rd Qu.:119.0
##  Max.   :99.0                      Max.   :1995   Max.   :138.0
##     Cast_80        Rating_80    Description_80    Origin_80
##  Min.   : 4.00   Min.   :1.00   Min.   : 8.00   Min.   :0.00
##  1st Qu.: 6.00   1st Qu.:1.50   1st Qu.:11.00   1st Qu.:0.00
##  Median : 7.00   Median :2.50   Median :12.00   Median :0.00
##  Mean   : 7.28   Mean   :2.26   Mean   :12.32   Mean   :0.84
##  3rd Qu.: 8.00   3rd Qu.:3.00   3rd Qu.:13.00   3rd Qu.:0.00
##  Max.   :12.00   Max.   :3.50   Max.   :21.00   Max.   :6.00
##  Time_code_80          Good_80
##  Length:25          Min.   :0.00
##  Class :character   1st Qu.:0.00
##  Mode  :character   Median :0.00
##                     Mean   :0.32
##                     3rd Qu.:1.00
##                     Max.   :1.00
mean(Filmdata_80$Rating_80)
## [1] 2.26
median(Filmdata_80$Rating_80)
## [1] 2.5
mean(Filmdata_80$Time_80)
## [1] 103.8
median(Filmdata_80$Time_80)
## [1] 97
Comparison <- c("Rating_Mean", "Rating_Median", "Time_Mean", "Time_Median")
Alldata <- c(mean(Filmdata$Rating), median(Filmdata$Rating), mean(Filmdata$Time), median(Filmdata$Time))
After1980 <- c(mean(Filmdata_80$Rating_80), median(Filmdata_80$Rating_80), mean(Filmdata_80$Time_80), median(Filmdata_80$Time_80))
df <- data.frame(Comparison, Alldata, After1980)
df
##      Comparison Alldata After1980
## 1   Rating_Mean   2.335      2.26
## 2 Rating_Median   2.500      2.50
## 3     Time_Mean  92.870    103.80
## 4   Time_Median  93.000     97.00
Comparison: films from 1980 onward have a slightly lower mean rating (2.26 versus 2.335) and the same median rating (2.5), but they run longer, with a mean Time of 103.8 versus 92.87 and a median of 97 versus 93.
5. For at least 3 values in a column, rename them so that every occurrence of that value in the column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 show as “excellent”.
Filmdata_80$Time_code_80 <- gsub("long", "L", Filmdata_80$Time_code_80)
head(Filmdata_80)
##    X_80              Title_80 Year_80 Time_80 Cast_80 Rating_80 Description_80
## 3     3 And_the_Ship_Sails_On    1984     138       7       3.0             15
## 8     8                 Blaze    1989     119       8       2.5             15
## 14   14           City_Lights    1985      85      10       1.0             13
## 18   18                Dakota    1988      97       6       2.0             11
## 29   29    Hambone_and_Hillie    1984      89       8       2.5              8
## 32   32         House_Party_3    1994      94       8       1.5             12
##    Origin_80 Time_code_80 Good_80
## 3          3            L       1
## 8          0            L       0
## 14         0        short       0
## 18         0            L       0
## 29         0        short       0
## 32         0            L       0
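The `gsub` call recodes "long" only, so "short" is left unchanged in the output above. To recode every distinct value in a column in one step, a named lookup vector is one option. This is a minimal sketch on a made-up character vector; the label "S" for "short" is a hypothetical choice and does not appear in the original analysis.

```r
# Toy vector standing in for the Time_code_80 column.
codes <- c("long", "short", "long", "short", "long")

# Hypothetical replacement labels, keyed by the original values.
labels <- c(long = "L", short = "S")

# Indexing the lookup by the old values maps each entry to its new label.
recoded <- unname(labels[codes])
print(recoded)
```

Unlike `gsub`, which matches substrings, this approach matches whole values only; any value missing from the lookup would come back as `NA`, which makes unmapped codes easy to spot.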
BONUS: Place the original .csv file on GitHub and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.
theUrl <- "https://raw.githubusercontent.com/GabrielSantos33/Week-2-Assignment/main/Film.csv"
bonus <- read.table(file=theUrl, header=TRUE, sep=",")
head(bonus)
##   X                        Title Year Time Cast Rating Description Origin
## 1 1            A_Ticklish_Affair 1963   89    5    2.0           7      0
## 2 2 Action_in_the_North_Atlantic 1943  127    7    3.0           9      0
## 3 3        And_the_Ship_Sails_On 1984  138    7    3.0          15      3
## 4 4                Autumn_Sonata 1978   97    5    3.0          11      5
## 5 5           Bachelor_Apartment 1931   77    6    2.5           7      0
## 6 6           Benson_Murder_Case 1930   69    8    2.5          10      0
##   Time_code Good
## 1     short    0
## 2      long    1
## 3      long    1
## 4      long    1
## 5     short    0
## 6     short    0