Bonus: Importing data into R from my GitHub file

mcu_films <- read.csv(file = 'https://raw.githubusercontent.com/pmahdi/cuny-bridge/d984d8223160390beab9b155b3904337a8c41cfd/mcu_films.csv', stringsAsFactors = FALSE)
  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
summary(mcu_films)
##        X           movie             length_hrs      length_min  
##  Min.   : 1.0   Length:23          Min.   :1.000   Min.   : 1.0  
##  1st Qu.: 6.5   Class :character   1st Qu.:1.500   1st Qu.: 7.5  
##  Median :12.0   Mode  :character   Median :2.000   Median :16.0  
##  Mean   :12.0                      Mean   :1.783   Mean   :23.3  
##  3rd Qu.:17.5                      3rd Qu.:2.000   3rd Qu.:40.5  
##  Max.   :23.0                      Max.   :3.000   Max.   :58.0  
##  release_date       opening_weekend_us     gross_us        
##  Length:23          Min.   : 55414050   Min.   :134806913  
##  Class :character   1st Qu.: 85398076   1st Qu.:224645330  
##  Mode  :character   Median :117027503   Median :333718600  
##                     Mean   :135096585   Mean   :371600489  
##                     3rd Qu.:176641864   3rd Qu.:417921916  
##                     Max.   :357115007   Max.   :858373000  
##   gross_world        
##  Min.   : 264770996  
##  1st Qu.: 623303735  
##  Median : 853983829  
##  Mean   : 982119760  
##  3rd Qu.:1184186450  
##  Max.   :2797800564
gross_us_mean <- mean(mcu_films$gross_us)
gross_us_median <- median(mcu_films$gross_us)

gross_world_mean <- mean(mcu_films$gross_world)
gross_world_median <- median(mcu_films$gross_world)
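The summary above reports release_date as a character column, so R gives no minimum, maximum, or quartiles for it. A minimal sketch of converting such strings to Date values follows. It assumes the dates are written month/day/year, as in the rows printed at the end of this report, and the vector name release_chr is made up for illustration.

```r
# Release dates copied from rows printed in this report.
release_chr <- c("5/2/2008", "6/12/2008", "5/7/2010", "11/8/2013")

# The strings are month/day/year, so the format is spelled out;
# without it as.Date() tries year-first formats.
release <- as.Date(release_chr, format = "%m/%d/%Y")

class(release)   # "Date" instead of "character"
range(release)   # earliest and latest release among these four
```

Once the column is a Date, summary() can report its range in the same way it does for the numeric columns.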
  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
mcu_subset <- mcu_films[1:10, -1]
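The negative index -1 drops the first column by position, which works only while X stays in first place. As a hedged alternative, the sketch below drops the column by name instead. The data frame films is a three-row stand-in built from values printed in this report, and it assumes X is simply a row index.

```r
# Toy stand-in for mcu_films: three rows copied from the report, plus an
# index column X (assumed here to be a plain row counter).
films <- data.frame(
  X = 1:3,
  movie = c("Iron Man", "The Incredible Hulk", "Iron Man 2"),
  gross_us = c(319034126, 134806913, 312433331),
  stringsAsFactors = FALSE
)

# Keep the first two rows and every column except X, selected by name.
keep_cols <- setdiff(names(films), "X")
films_subset <- films[1:2, keep_cols]

names(films_subset)
```

Selecting by name keeps working if the column order of the source file changes.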
  3. Create new column names for the new data frame.
colnames(mcu_subset)
## [1] "movie"              "length_hrs"         "length_min"        
## [4] "release_date"       "opening_weekend_us" "gross_us"          
## [7] "gross_world"
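# Note: 'date_release_us' is the new name for opening_weekend_us, which holds the US opening-weekend gross rather than a date.
colnames(mcu_subset) <- c('title', 'hr_length', 'min_length', 'date_release', 'date_release_us', 'total_us', 'total_world')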

colnames(mcu_subset)
## [1] "title"           "hr_length"       "min_length"      "date_release"   
## [5] "date_release_us" "total_us"        "total_world"
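Assigning a full vector to colnames() depends on listing every new name in exactly the right order. A possible alternative, sketched below, renames columns through an old-to-new mapping so that each new name is tied to the column it replaces. The toy data frame films and the mapping new_names are illustrative only.

```r
# Toy data frame with two of the original column names.
films <- data.frame(movie = "Iron Man", gross_us = 319034126,
                    stringsAsFactors = FALSE)

# Map old names to new ones; columns not listed in the map stay unchanged.
new_names <- c(movie = "title", gross_us = "total_us")
hits <- names(films) %in% names(new_names)
names(films)[hits] <- unname(new_names[names(films)[hits]])

names(films)
```

Because the pairing is explicit, a new name cannot end up on the wrong column when the column order changes.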
  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(mcu_subset)
##     title             hr_length      min_length    date_release      
##  Length:10          Min.   :1.00   Min.   : 1.00   Length:10         
##  Class :character   1st Qu.:1.25   1st Qu.: 4.50   Class :character  
##  Mode  :character   Median :2.00   Median :13.00   Mode  :character  
##                     Mean   :1.70   Mean   :22.30                     
##                     3rd Qu.:2.00   3rd Qu.:44.75                     
##                     Max.   :2.00   Max.   :55.00                     
##  date_release_us        total_us          total_world        
##  Min.   : 55414050   Min.   :134806913   Min.   : 264770996  
##  1st Qu.: 70726964   1st Qu.:187363503   1st Qu.: 483444025  
##  Median : 94672302   Median :286099952   Median : 634358236  
##  Mean   :106960280   Mean   :295617872   Mean   : 716056940  
##  3rd Qu.:120746527   3rd Qu.:330047482   3rd Qu.: 758611144  
##  Max.   :207438708   Max.   :623357910   Max.   :1518815515
total_us_mean <- mean(mcu_subset$total_us)
total_us_median <- median(mcu_subset$total_us)

total_world_mean <- mean(mcu_subset$total_world)
total_world_median <- median(mcu_subset$total_world)

print(c(gross_us_mean, gross_us_median, gross_world_mean, gross_world_median))
## [1] 371600489 333718600 982119760 853983829
print(c(total_us_mean, total_us_median, total_world_mean, total_world_median))
## [1] 295617872 286099952 716056940 634358236
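The subset statistics printed above can also be computed in a single call rather than one assignment per value. The sketch below applies mean() and median() to each column with sapply(). The data frame grosses is rebuilt here from the ten rows printed at the end of this report so that the block runs on its own.

```r
# US and worldwide grosses of the ten subset films, as printed in this report.
grosses <- data.frame(
  total_us = c(319034126, 134806913, 312433331, 181030624, 176654505,
               623357910, 409013994, 206362140, 259766572, 333718600),
  total_world = c(585796247, 264770996, 623933331, 449326618, 370569774,
                  1518815515, 1214811252, 644783140, 714421503, 773341024)
)

# Apply mean() and median() to each column; the result is a 2 x 2 matrix
# with one column per attribute and one row per statistic.
stats <- sapply(grosses, function(col) c(mean = mean(col), median = median(col)))
stats
```

The same call would work on the full data set by passing mcu_films[, c('gross_us', 'gross_world')] instead.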

Across all 23 MCU films, the mean gross within the United States is 371600489.434783 and the median is 333718600. Both values drop for the subset of the 10 earliest MCU films: the mean US gross for those films is 295617871.5 and the median is 286099951.5.

Worldwide, the mean gross for all MCU films is 982119760 and the median is 853983829. For the 10 earliest films, the mean falls to 716056940 and the median to 634358235.5.

Because the first 10 MCU films have lower mean and median earnings than the full set, both in the US and worldwide, the later films must have earned more on average. This suggests that the MCU became more commercially successful as the franchise grew.
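The comparison above sets the first 10 films against all 23, which still include those 10. As a rough check, the mean for the remaining 13 films can be backed out from the figures already quoted, since a mean times its count gives the total. The variable names below are illustrative, and the worldwide result is approximate because the full-set worldwide mean is quoted only as a whole number.

```r
# Figures quoted in this report.
n_all   <- 23                 # films in the full data set
n_early <- 10                 # films in the subset
mean_us_all   <- 371600489.434783
mean_us_early <- 295617871.5

# Mean x count = total, so the later films' total is the difference
# between the full total and the early total.
n_late <- n_all - n_early
mean_us_late <- (mean_us_all * n_all - mean_us_early * n_early) / n_late

# Same calculation for worldwide gross (approximate, see lead-in).
mean_world_all   <- 982119760
mean_world_early <- 716056940
mean_world_late <- (mean_world_all * n_all - mean_world_early * n_early) / n_late

round(c(us = mean_us_late, world = mean_world_late))
```

Under these figures the later 13 films average roughly 430 million in the US and roughly 1.19 billion worldwide, compared with about 296 million and 716 million for the first 10.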

  5. Rename at least 3 values in a column so that every occurrence of each value in that column is renamed.
mcu_subset[mcu_subset$title == 'Iron Man', 'title'] <- 'Iron Man I'
mcu_subset[mcu_subset$title == 'Captain America: The First Avenger', 'title'] <- 'Captain America I'
mcu_subset[mcu_subset$title == 'Thor: The Dark World', 'title'] <- 'Thor II'
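The three assignments above each repeat the same pattern. One way to express all of them at once is a named vector used as a lookup table, sketched below on a plain character vector. The names titles and new_titles are made up for illustration, and the titles are the ones used in this report.

```r
titles <- c("Iron Man", "The Incredible Hulk",
            "Captain America: The First Avenger", "Thor: The Dark World")

# Named vector acting as a lookup table: name = old title, value = new title.
new_titles <- c(
  "Iron Man" = "Iron Man I",
  "Captain America: The First Avenger" = "Captain America I",
  "Thor: The Dark World" = "Thor II"
)

# Replace only the titles that appear in the lookup table.
hits <- titles %in% names(new_titles)
titles[hits] <- unname(new_titles[titles[hits]])

titles
```

Titles missing from the table are left untouched, and adding another rename only requires one more entry in new_titles.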
  6. Display enough rows to see examples of all of steps 1-5 above.
head(x = mcu_subset, n = 10L)
##                                  title hr_length min_length date_release
## 1                           Iron Man I         2          6     5/2/2008
## 2                  The Incredible Hulk         1         52    6/12/2008
## 3                           Iron Man 2         2          4     5/7/2010
## 4                                 Thor         1         55     5/6/2011
## 5                    Captain America I         2          4    7/22/2011
## 6                Marvel's The Avengers         2         23     5/4/2012
## 7                           Iron Man 3         2         10     5/3/2013
## 8                              Thor II         1         52    11/8/2013
## 9  Captain America: The Winder Soldier         2         16     4/4/2014
## 10             Guardians of the Galaxy         2          1     8/1/2014
##    date_release_us  total_us total_world
## 1         98618668 319034126   585796247
## 2         55414050 134806913   264770996
## 3        128122480 312433331   623933331
## 4         65723338 181030624   449326618
## 5         65058524 176654505   370569774
## 6        207438708 623357910  1518815515
## 7        174144585 409013994  1214811252
## 8         85737841 206362140   644783140
## 9         95023721 259766572   714421503
## 10        94320883 333718600   773341024