R Bridge Week 2 Assignment

Load the dplyr package. (The steps below use only base R functions, so dplyr is not strictly needed here.)

library(dplyr)

1. Use the summary function to get an overview of the data set. Then display the mean and median for at least two attributes.

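# stringsAsFactors = TRUE reproduces the R 3.x default in effect when this was run,
# so summary() lists Title and Time_code as level counts, as shown below
Filmdata <- read.csv(file = "c:/data/Film.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)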
summary(Filmdata)

##        X                                   Title         Year     
##  Min.   :  1.00   A_Ticklish_Affair           : 1   Min.   :1924  
##  1st Qu.: 25.75   Action_in_the_North_Atlantic: 1   1st Qu.:1949  
##  Median : 50.50   And_the_Ship_Sails_On       : 1   Median :1966  
##  Mean   : 50.50   Autumn_Sonata               : 1   Mean   :1964  
##  3rd Qu.: 75.25   Bachelor_Apartment          : 1   3rd Qu.:1978  
##  Max.   :100.00   Benson_Murder_Case          : 1   Max.   :1995  
##                   (Other)                     :94                 
##       Time             Cast           Rating       Description   
##  Min.   : 45.00   Min.   : 3.00   Min.   :1.000   Min.   : 5.00  
##  1st Qu.: 81.00   1st Qu.: 5.00   1st Qu.:2.000   1st Qu.: 8.00  
##  Median : 93.00   Median : 6.00   Median :2.500   Median : 9.50  
##  Mean   : 92.87   Mean   : 6.75   Mean   :2.335   Mean   :10.02  
##  3rd Qu.:101.25   3rd Qu.: 8.00   3rd Qu.:3.000   3rd Qu.:12.00  
##  Max.   :145.00   Max.   :13.00   Max.   :4.000   Max.   :21.00  
##                                                                  
##      Origin     Time_code       Good     
##  Min.   :0.00   long :58   Min.   :0.00  
##  1st Qu.:0.00   short:42   1st Qu.:0.00  
##  Median :0.00              Median :0.00  
##  Mean   :0.48              Mean   :0.31  
##  3rd Qu.:0.00              3rd Qu.:1.00  
##  Max.   :6.00              Max.   :1.00  
##

mean(Filmdata$Rating)

## [1] 2.335

median(Filmdata$Rating)

## [1] 2.5

mean(Filmdata$Time)

## [1] 92.87

median(Filmdata$Time)

## [1] 93
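The same statistics can be computed for several columns in one call with sapply(), instead of writing one mean() and one median() call per attribute. A minimal sketch: to keep it self-contained, it uses only the first six rows of Film.csv, typed in from the head(bonus) output at the end of this report, rather than reading the file.

```r
# First six rows of Film.csv (Rating and Time only), copied from head(bonus)
films_toy <- data.frame(
  Rating = c(2.0, 3.0, 3.0, 3.0, 2.5, 2.5),
  Time   = c(89, 127, 138, 97, 77, 69)
)

# For each selected column, compute the mean and the median;
# sapply() simplifies the results into a matrix with one column per attribute
stats <- sapply(films_toy[, c("Rating", "Time")],
                function(x) c(Mean = mean(x), Median = median(x)))
stats
```

On the full data, passing Filmdata[, c("Rating", "Time")] to the same call should give the four values printed above in a single table.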

2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.

# Keep films released in 1980 or later (compare against the number, not the string '1980')
Filmdata80 <- subset(Filmdata, Year >= 1980)
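The assignment asks for a subset of both rows and columns, and subset() can do both at once through its select argument. Comparing Year with the number 1980 rather than the string '1980' matters: with a string, R converts Year to character and compares alphabetically, which only happens to give the right answer here because every year has four digits. A minimal sketch on the six rows printed by head(bonus); the 1960 cut-off and the three kept columns are illustrative choices, not part of the original analysis.

```r
# First six rows of Film.csv (selected columns), copied from head(bonus)
films_toy <- data.frame(
  Title  = c("A_Ticklish_Affair", "Action_in_the_North_Atlantic",
             "And_the_Ship_Sails_On", "Autumn_Sonata",
             "Bachelor_Apartment", "Benson_Murder_Case"),
  Year   = c(1963, 1943, 1984, 1978, 1931, 1930),
  Time   = c(89, 127, 138, 97, 77, 69),
  Rating = c(2.0, 3.0, 3.0, 3.0, 2.5, 2.5)
)

# Rows: numeric comparison on Year; columns: only Title, Year and Rating
recent <- subset(films_toy, Year >= 1960, select = c(Title, Year, Rating))
recent
```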

3. Create new column names for the new data frame.

colnames(Filmdata80) <- paste(colnames(Filmdata80), '80', sep='_')
head(Filmdata80)

##    X_80              Title_80 Year_80 Time_80 Cast_80 Rating_80
## 3     3 And_the_Ship_Sails_On    1984     138       7       3.0
## 8     8                 Blaze    1989     119       8       2.5
## 14   14           City_Lights    1985      85      10       1.0
## 18   18                Dakota    1988      97       6       2.0
## 29   29    Hambone_and_Hillie    1984      89       8       2.5
## 32   32         House_Party_3    1994      94       8       1.5
##    Description_80 Origin_80 Time_code_80 Good_80
## 3              15         3         long       1
## 8              15         0         long       0
## 14             13         0        short       0
## 18             11         0         long       0
## 29              8         0        short       0
## 32             12         0         long       0
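Step 3 renames every column by appending a suffix. When only a few columns need new names, they can be located by name and overwritten individually. A minimal sketch on a two-row excerpt of the data; the new names Length and Score are hypothetical examples, not names used elsewhere in this report.

```r
# Two rows of Film.csv (selected columns), copied from head(bonus)
films_toy <- data.frame(Year = c(1963, 1943), Time = c(89, 127), Rating = c(2.0, 3.0))

# Find the position of each old name, then overwrite only those names
old_names <- c("Time", "Rating")
new_names <- c("Length", "Score")   # hypothetical new names
names(films_toy)[match(old_names, names(films_toy))] <- new_names
names(films_toy)
```

Because dplyr is loaded at the top, the same renaming could also be written as rename(films_toy, Length = Time, Score = Rating), with the new name on the left of each `=`.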

4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(Filmdata80)

##       X_80                       Title_80     Year_80        Time_80     
##  Min.   : 3.0   And_the_Ship_Sails_On: 1   Min.   :1980   Min.   : 85.0  
##  1st Qu.:34.0   Blaze                : 1   1st Qu.:1983   1st Qu.: 94.0  
##  Median :63.0   City_Lights          : 1   Median :1985   Median : 97.0  
##  Mean   :55.6   Dakota               : 1   Mean   :1986   Mean   :103.8  
##  3rd Qu.:81.0   Hambone_and_Hillie   : 1   3rd Qu.:1989   3rd Qu.:119.0  
##  Max.   :99.0   House_Party_3        : 1   Max.   :1995   Max.   :138.0  
##                 (Other)              :19                                 
##     Cast_80        Rating_80    Description_80    Origin_80   
##  Min.   : 4.00   Min.   :1.00   Min.   : 8.00   Min.   :0.00  
##  1st Qu.: 6.00   1st Qu.:1.50   1st Qu.:11.00   1st Qu.:0.00  
##  Median : 7.00   Median :2.50   Median :12.00   Median :0.00  
##  Mean   : 7.28   Mean   :2.26   Mean   :12.32   Mean   :0.84  
##  3rd Qu.: 8.00   3rd Qu.:3.00   3rd Qu.:13.00   3rd Qu.:0.00  
##  Max.   :12.00   Max.   :3.50   Max.   :21.00   Max.   :6.00  
##                                                               
##  Time_code_80    Good_80    
##  long :23     Min.   :0.00  
##  short: 2     1st Qu.:0.00  
##               Median :0.00  
##               Mean   :0.32  
##               3rd Qu.:1.00  
##               Max.   :1.00  
##

mean(Filmdata80$Rating_80)

## [1] 2.26

median(Filmdata80$Rating_80)

## [1] 2.5

mean(Filmdata80$Time_80)

## [1] 103.8

median(Filmdata80$Time_80)

## [1] 97

Comparison <- c("Rating_Mean", "Rating_Median", "Time_Mean", "Time_Median")
Alldata <- c(mean(Filmdata$Rating), median(Filmdata$Rating),
             mean(Filmdata$Time), median(Filmdata$Time))
After1980 <- c(mean(Filmdata80$Rating_80), median(Filmdata80$Rating_80),
               mean(Filmdata80$Time_80), median(Filmdata80$Time_80))
df <- data.frame(Comparison, Alldata, After1980)
df

##      Comparison Alldata After1980
## 1   Rating_Mean   2.335      2.26
## 2 Rating_Median   2.500      2.50
## 3     Time_Mean  92.870    103.80
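## 4   Time_Median  93.000     97.00

Compared with the full data set, the films from 1980 onward have a slightly lower mean rating (2.26 vs. 2.335) but the same median rating (2.5). They are longer on average: the mean Time rises from 92.87 to 103.8 and the median from 93 to 97. This matches the Time_code counts, where the subset holds 23 long and only 2 short films, against 58 long and 42 short in the full data.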
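The comparison table can also be built with a small helper function that computes the four statistics for any data frame, so that each mean() and median() call is written only once. A minimal sketch: summarise_films() is a hypothetical helper, and the data are the first six rows from head(bonus), split at 1960 instead of 1980 so that the subset contains more than one film.

```r
# Hypothetical helper: mean and median of a rating column and a time column
summarise_films <- function(d, rating_col, time_col) {
  c(Rating_Mean   = mean(d[[rating_col]]),
    Rating_Median = median(d[[rating_col]]),
    Time_Mean     = mean(d[[time_col]]),
    Time_Median   = median(d[[time_col]]))
}

# First six rows of Film.csv (selected columns), copied from head(bonus)
films_toy <- data.frame(
  Year   = c(1963, 1943, 1984, 1978, 1931, 1930),
  Time   = c(89, 127, 138, 97, 77, 69),
  Rating = c(2.0, 3.0, 3.0, 3.0, 2.5, 2.5)
)
recent <- subset(films_toy, Year >= 1960)

# The statistic names of the first vector become the row names
comparison <- data.frame(
  All       = summarise_films(films_toy, "Rating", "Time"),
  Since1960 = summarise_films(recent, "Rating", "Time")
)
comparison
```

Taking the column names as arguments lets the same helper handle the renamed columns, for example summarise_films(Filmdata80, "Rating_80", "Time_80").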

5. For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

# Replace every "long" in Time_code_80 with "L" (gsub() returns a character vector)
Filmdata80$Time_code_80 <- gsub("long", "L", Filmdata80$Time_code_80)
head(Filmdata80)

##    X_80              Title_80 Year_80 Time_80 Cast_80 Rating_80
## 3     3 And_the_Ship_Sails_On    1984     138       7       3.0
## 8     8                 Blaze    1989     119       8       2.5
## 14   14           City_Lights    1985      85      10       1.0
## 18   18                Dakota    1988      97       6       2.0
## 29   29    Hambone_and_Hillie    1984      89       8       2.5
## 32   32         House_Party_3    1994      94       8       1.5
##    Description_80 Origin_80 Time_code_80 Good_80
## 3              15         3            L       1
## 8              15         0            L       0
## 14             13         0        short       0
## 18             11         0            L       0
## 29              8         0        short       0
## 32             12         0            L       0
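The step asks for at least three values to be renamed. One way to rename several values at once, each matched exactly rather than as a regular expression as in gsub(), is a named lookup vector. A minimal sketch built on the assignment's own “e” → “excellent” example; the codes "g" and "p" and their labels are hypothetical.

```r
# Hypothetical column of single-letter codes
grades <- c("e", "g", "p", "e", "g", "e")

# Named lookup vector: names are the old values, elements are the new ones
lookup <- c(e = "excellent", g = "good", p = "poor")

# Indexing by the old values returns the new value for every occurrence;
# unname() drops the names carried over from the lookup vector
renamed <- unname(lookup[grades])
renamed
```

Values missing from the lookup become NA, which makes unmapped codes easy to spot. For a factor column, such as Time_code when read with stringsAsFactors = TRUE, index with as.character() of the column, because indexing by a factor uses its integer codes rather than its labels.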

BONUS: Place the original .csv file in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.

theUrl <- "https://raw.githubusercontent.com/ferrysany/Assignment2/master/Film.csv"
bonus <- read.table(file=theUrl, header=TRUE, sep=",")
head(bonus)

##   X                        Title Year Time Cast Rating Description Origin
## 1 1            A_Ticklish_Affair 1963   89    5    2.0           7      0
## 2 2 Action_in_the_North_Atlantic 1943  127    7    3.0           9      0
## 3 3        And_the_Ship_Sails_On 1984  138    7    3.0          15      3
## 4 4                Autumn_Sonata 1978   97    5    3.0          11      5
## 5 5           Bachelor_Apartment 1931   77    6    2.5           7      0
## 6 6           Benson_Murder_Case 1930   69    8    2.5          10      0
##   Time_code Good
## 1     short    0
## 2      long    1
## 3      long    1
## 4      long    1
## 5     short    0
## 6     short    0

Chun San Yip

2019/01/11