R Bridge Week 2 Assignment

by Gabriel Santos

Load the dplyr package.

library(dplyr)

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

Filmdata <- read.csv(file = "C:\\Users\\drake\\Film.csv", header=TRUE, sep=",")
summary(Filmdata)
##        X             Title                Year           Time       
##  Min.   :  1.00   Length:100         Min.   :1924   Min.   : 45.00  
##  1st Qu.: 25.75   Class :character   1st Qu.:1949   1st Qu.: 81.00  
##  Median : 50.50   Mode  :character   Median :1966   Median : 93.00  
##  Mean   : 50.50                      Mean   :1964   Mean   : 92.87  
##  3rd Qu.: 75.25                      3rd Qu.:1978   3rd Qu.:101.25  
##  Max.   :100.00                      Max.   :1995   Max.   :145.00  
##       Cast           Rating       Description        Origin    
##  Min.   : 3.00   Min.   :1.000   Min.   : 5.00   Min.   :0.00  
##  1st Qu.: 5.00   1st Qu.:2.000   1st Qu.: 8.00   1st Qu.:0.00  
##  Median : 6.00   Median :2.500   Median : 9.50   Median :0.00  
##  Mean   : 6.75   Mean   :2.335   Mean   :10.02   Mean   :0.48  
##  3rd Qu.: 8.00   3rd Qu.:3.000   3rd Qu.:12.00   3rd Qu.:0.00  
##  Max.   :13.00   Max.   :4.000   Max.   :21.00   Max.   :6.00  
##   Time_code              Good     
##  Length:100         Min.   :0.00  
##  Class :character   1st Qu.:0.00  
##  Mode  :character   Median :0.00  
##                     Mean   :0.31  
##                     3rd Qu.:1.00  
##                     Max.   :1.00
mean(Filmdata$Rating)
## [1] 2.335
median(Filmdata$Rating)
## [1] 2.5
mean(Filmdata$Time)
## [1] 92.87
median(Filmdata$Time)
## [1] 93
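Calling mean() and median() separately for each column gets repetitive. Here is a minimal sketch that computes both statistics for several columns in one call with base R's sapply(). The `films` data frame is an assumption so the snippet runs without Film.csv. It copies a few columns from the first six films printed in the bonus section, so its numbers differ from the full 100-film results above.

```r
# Hypothetical six-row sample copied from the printed head of Film.csv;
# it stands in for Filmdata so the snippet runs without the CSV file.
films <- data.frame(
  Title  = c("A_Ticklish_Affair", "Action_in_the_North_Atlantic",
             "And_the_Ship_Sails_On", "Autumn_Sonata",
             "Bachelor_Apartment", "Benson_Murder_Case"),
  Year   = c(1963, 1943, 1984, 1978, 1931, 1930),
  Time   = c(89, 127, 138, 97, 77, 69),
  Rating = c(2.0, 3.0, 3.0, 3.0, 2.5, 2.5)
)

# Apply mean and median to each selected column; the result is a matrix
# whose rows are the statistics and whose columns are the attributes.
film_stats <- sapply(films[, c("Rating", "Time")],
                     function(x) c(mean = mean(x), median = median(x)))
print(film_stats)
```

On the real data, the same call on `Filmdata[, c("Rating", "Time")]` should reproduce the 2.335 / 2.5 and 92.87 / 93 values shown above.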

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

# Keep only films released in 1980 or later (compare Year as a number, not the string '1980')
Filmdata_80 <- subset(Filmdata, Year >= 1980)
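The task asks for a subset of both rows and columns, but the line above filters rows only. The sketch below does both at once through the `select` argument of subset(). It uses a hypothetical six-row sample taken from the printed head of Film.csv. Keeping Title, Year and Rating is only an illustrative choice.

```r
# Hypothetical sample from the printed head of Film.csv (not the full data set)
films <- data.frame(
  Title  = c("A_Ticklish_Affair", "Action_in_the_North_Atlantic",
             "And_the_Ship_Sails_On", "Autumn_Sonata",
             "Bachelor_Apartment", "Benson_Murder_Case"),
  Year   = c(1963, 1943, 1984, 1978, 1931, 1930),
  Time   = c(89, 127, 138, 97, 77, 69),
  Rating = c(2.0, 3.0, 3.0, 3.0, 2.5, 2.5)
)

# Rows: numeric comparison on Year. Columns: only the three named ones.
films_80 <- subset(films, Year >= 1980, select = c(Title, Year, Rating))
print(films_80)  # only And_the_Ship_Sails_On (1984) remains in this sample
```

With dplyr loaded, `filter(films, Year >= 1980) %>% select(Title, Year, Rating)` should select the same rows and columns.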

3. Create new column names for the new data frame.

colnames(Filmdata_80) <- paste(colnames(Filmdata_80), '80', sep='_')
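As a small sketch, here are two common variations on the renaming above. paste0() adds the same suffix without the `sep` argument. Matching on the current name renames a single column. The two-column data frame and the new name `Minutes_80` are hypothetical.

```r
# Hypothetical two-column sample, not the real Filmdata_80
films <- data.frame(Year = c(1984, 1989), Time = c(138, 119))

# paste0() is paste() with sep = "", so this matches paste(..., '80', sep = '_')
colnames(films) <- paste0(colnames(films), "_80")

# Rename one column by matching its current name rather than its position
colnames(films)[colnames(films) == "Time_80"] <- "Minutes_80"

print(colnames(films))  # "Year_80" "Minutes_80"
```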
head(Filmdata_80)
##    X_80              Title_80 Year_80 Time_80 Cast_80 Rating_80 Description_80
## 3     3 And_the_Ship_Sails_On    1984     138       7       3.0             15
## 8     8                 Blaze    1989     119       8       2.5             15
## 14   14           City_Lights    1985      85      10       1.0             13
## 18   18                Dakota    1988      97       6       2.0             11
## 29   29    Hambone_and_Hillie    1984      89       8       2.5              8
## 32   32         House_Party_3    1994      94       8       1.5             12
##    Origin_80 Time_code_80 Good_80
## 3          3         long       1
## 8          0         long       0
## 14         0        short       0
## 18         0         long       0
## 29         0        short       0
## 32         0         long       0

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(Filmdata_80)
##       X_80        Title_80            Year_80        Time_80     
##  Min.   : 3.0   Length:25          Min.   :1980   Min.   : 85.0  
##  1st Qu.:34.0   Class :character   1st Qu.:1983   1st Qu.: 94.0  
##  Median :63.0   Mode  :character   Median :1985   Median : 97.0  
##  Mean   :55.6                      Mean   :1986   Mean   :103.8  
##  3rd Qu.:81.0                      3rd Qu.:1989   3rd Qu.:119.0  
##  Max.   :99.0                      Max.   :1995   Max.   :138.0  
##     Cast_80        Rating_80    Description_80    Origin_80   
##  Min.   : 4.00   Min.   :1.00   Min.   : 8.00   Min.   :0.00  
##  1st Qu.: 6.00   1st Qu.:1.50   1st Qu.:11.00   1st Qu.:0.00  
##  Median : 7.00   Median :2.50   Median :12.00   Median :0.00  
##  Mean   : 7.28   Mean   :2.26   Mean   :12.32   Mean   :0.84  
##  3rd Qu.: 8.00   3rd Qu.:3.00   3rd Qu.:13.00   3rd Qu.:0.00  
##  Max.   :12.00   Max.   :3.50   Max.   :21.00   Max.   :6.00  
##  Time_code_80          Good_80    
##  Length:25          Min.   :0.00  
##  Class :character   1st Qu.:0.00  
##  Mode  :character   Median :0.00  
##                     Mean   :0.32  
##                     3rd Qu.:1.00  
##                     Max.   :1.00
mean(Filmdata_80$Rating_80)
## [1] 2.26
median(Filmdata_80$Rating_80)
## [1] 2.5
mean(Filmdata_80$Time_80)
## [1] 103.8
median(Filmdata_80$Time_80)
## [1] 97
Comparison <- c("Rating_Mean", "Rating_Median", "Time_Mean", "Time_Median")
Alldata <- c(mean(Filmdata$Rating), median(Filmdata$Rating), mean(Filmdata$Time), median(Filmdata$Time))
After1980 <- c(mean(Filmdata_80$Rating_80), median(Filmdata_80$Rating_80), mean(Filmdata_80$Time_80), median(Filmdata_80$Time_80))
df <- data.frame(Comparison, Alldata, After1980)
df
##      Comparison Alldata After1980
## 1   Rating_Mean   2.335      2.26
## 2 Rating_Median   2.500      2.50
## 3     Time_Mean  92.870    103.80
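## 4   Time_Median  93.000     97.00

Compared with all 100 films, the 25 films from 1980 onward have a slightly lower mean rating (2.26 vs. 2.335) and the same median rating (2.5). They are longer, with a mean Time of 103.8 vs. 92.87 and a median of 97 vs. 93.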
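Typing the same four statistics twice is error-prone. Below is a minimal sketch of a helper that guarantees both columns use identical definitions. The function name `summarise_film` is hypothetical. It looks columns up as `Rating` and `Time`, so it has to be applied before the `_80` suffix is added. The six-row sample again stands in for the real data.

```r
# Hypothetical six-row sample from the printed head of Film.csv
films <- data.frame(
  Year   = c(1963, 1943, 1984, 1978, 1931, 1930),
  Time   = c(89, 127, 138, 97, 77, 69),
  Rating = c(2.0, 3.0, 3.0, 3.0, 2.5, 2.5)
)

# Helper: the four statistics from the comparison table, as a named vector
summarise_film <- function(d) {
  c(Rating_Mean   = mean(d$Rating),
    Rating_Median = median(d$Rating),
    Time_Mean     = mean(d$Time),
    Time_Median   = median(d$Time))
}

# Same helper on the full sample and on the post-1980 subset;
# data.frame() takes its row names from the named vectors.
comparison <- data.frame(
  Alldata   = summarise_film(films),
  After1980 = summarise_film(subset(films, Year >= 1980))
)
print(comparison)
```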

5. For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 show as “excellent”.

# Replace every "long" in Time_code_80 with "L"
Filmdata_80$Time_code_80 <- gsub("long", "L", Filmdata_80$Time_code_80)
head(Filmdata_80)
##    X_80              Title_80 Year_80 Time_80 Cast_80 Rating_80 Description_80
## 3     3 And_the_Ship_Sails_On    1984     138       7       3.0             15
## 8     8                 Blaze    1989     119       8       2.5             15
## 14   14           City_Lights    1985      85      10       1.0             13
## 18   18                Dakota    1988      97       6       2.0             11
## 29   29    Hambone_and_Hillie    1984      89       8       2.5              8
## 32   32         House_Party_3    1994      94       8       1.5             12
##    Origin_80 Time_code_80 Good_80
## 3          3            L       1
## 8          0            L       0
## 14         0        short       0
## 18         0            L       0
## 29         0        short       0
## 32         0            L       0
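gsub() matches substrings, not whole values. A named lookup vector is one way to recode several values at once while matching them exactly. The sketch below recodes both Time_code values ("long" and "short"). It uses the codes of the first six films printed in the bonus section. The "S" code and the `grades` vector for the task's "e" example are hypothetical.

```r
# Time_code values of the first six films printed in the bonus section
time_code <- c("short", "long", "long", "long", "short", "short")

# Named lookup vector: names are the old values, elements are the new ones.
# A value missing from the lookup would become NA.
lookup <- c(long = "L", short = "S")
recoded <- unname(lookup[time_code])
print(recoded)  # "S" "L" "L" "L" "S" "S"

# gsub() replaces substrings, which can surprise on longer strings
print(gsub("long", "L", "longest"))  # "Lest"

# The task's "e" -> "excellent" example, matching whole values only
grades <- c("e", "g", "e", "p")
grades <- ifelse(grades == "e", "excellent", grades)
print(grades)
```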

BONUS: place the original .csv in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.

theUrl <- "https://raw.githubusercontent.com/GabrielSantos33/Week-2-Assignment/main/Film.csv"
bonus <- read.table(file=theUrl, header=TRUE, sep=",")
head(bonus)
##   X                        Title Year Time Cast Rating Description Origin
## 1 1            A_Ticklish_Affair 1963   89    5    2.0           7      0
## 2 2 Action_in_the_North_Atlantic 1943  127    7    3.0           9      0
## 3 3        And_the_Ship_Sails_On 1984  138    7    3.0          15      3
## 4 4                Autumn_Sonata 1978   97    5    3.0          11      5
## 5 5           Bachelor_Apartment 1931   77    6    2.5           7      0
## 6 6           Benson_Murder_Case 1930   69    8    2.5          10      0
##   Time_code Good
## 1     short    0
## 2      long    1
## 3      long    1
## 4      long    1
## 5     short    0
## 6     short    0
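read.csv() also accepts a URL as its `file` argument, so `read.csv(theUrl)` should be an equivalent one-liner for the bonus. The sketch below runs offline. It parses a two-row excerpt of Film.csv from inline text with both functions and checks that they agree. The inline excerpt stands in for the GitHub file.

```r
# Two rows in the same layout as Film.csv, held inline so no network access is needed
csv_text <- "X,Title,Year,Time,Cast,Rating,Description,Origin,Time_code,Good
1,A_Ticklish_Affair,1963,89,5,2.0,7,0,short,0
2,Action_in_the_North_Atlantic,1943,127,7,3.0,9,0,long,1"

# read.csv() wraps read.table() with header = TRUE and sep = "," preset
# (plus CSV-style defaults for quote, fill and comment.char)
via_csv   <- read.csv(text = csv_text)
via_table <- read.table(text = csv_text, header = TRUE, sep = ",")
print(identical(via_csv, via_table))  # TRUE for this simple file
```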