Data Exploration
## Observations: 125,003
## Variables: 15
## $ tdid <chr> "f0fd92db-69a2-4daf-a6cf-2ed33a506f28", "10...
## $ logentrytime <dttm> 2015-05-02 19:18:00, 2015-05-02 19:29:00, ...
## $ logfileid <int> 371391483, 371399382, 371086147, 371087588,...
## $ site <chr> "www.youtube.com", "www.popupportal.com", "...
## $ userHourOfWeek <dbl> 158, 158, 150, 150, 142, 152, 139, 157, 150...
## $ country <chr> "United States", "United States", "United S...
## $ region <chr> "Massachusetts", "Maine", "Virginia", "Virg...
## $ metro <chr> "506", "500", "518", "518", "505", "618", "...
## $ city <chr> "Boston", "Waterville", "Lambsburg", "Lambs...
## $ devicetype <chr> "PC", "PC", "PC", "PC", "PC", "PC", "PC", "...
## $ osfamily <chr> "Windows", "Windows", "Windows", "Windows",...
## $ os <chr> "Windows8", "Windows8", "Windows7", "Window...
## $ browser <chr> "Chrome", "Firefox", "IE11", "IE11", "Firef...
## $ FavoriteMovieGenre <chr> "BlindedGenre1", "BlindedGenre1", "BlindedG...
## $ year <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2...
## tdid logentrytime logfileid
## Length:125003 Min. :2015-05-01 00:01:00 Min. :369749652
## Class :character 1st Qu.:2015-05-08 01:52:00 1st Qu.:376315832
## Mode :character Median :2015-05-14 15:48:00 Median :382569992
## Mean :2015-05-14 16:57:32 Mean :382351585
## 3rd Qu.:2015-05-21 02:30:00 3rd Qu.:388138959
## Max. :2015-05-27 22:02:00 Max. :394043647
##
## site userHourOfWeek country region
## Length:125003 Min. : 0.00 Length:125003 Length:125003
## Class :character 1st Qu.: 43.00 Class :character Class :character
## Mode :character Median : 81.00 Mode :character Mode :character
## Mean : 82.78
## 3rd Qu.:125.00
## Max. :167.00
## NA's :2627
## metro city devicetype
## Length:125003 Length:125003 Length:125003
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## osfamily os browser
## Length:125003 Length:125003 Length:125003
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## FavoriteMovieGenre year
## Length:125003 Min. :2015
## Class :character 1st Qu.:2015
## Mode :character Median :2015
## Mean :2015
## 3rd Qu.:2015
## Max. :2015
##
- userHourOfWeek has 2,627 missing values (about 2.1% of the 125,003 observations)
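The missing-value share can be checked directly from the two counts in the summary output. The following is a minimal Python sketch rather than the code used for this report; the helper name `missing_share` is hypothetical.

```python
def missing_share(n_missing: int, n_total: int) -> float:
    """Return the fraction of observations that are missing."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_missing / n_total


# Counts copied from the summary output above (NA's and Length).
share = missing_share(2627, 125003)
print(f"userHourOfWeek missing: {share:.1%}")
```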
Question 1: Guess Movie Genres
## # A tibble: 5 x 2
## FavoriteMovieGenre genre
## <chr> <chr>
## 1 BlindedGenre1 Adventure
## 2 BlindedGenre2 Drama
## 3 BlindedGenre3 Action
## 4 BlindedGenre4 Comedy
## 5 BlindedGenre5 Thriller
I mapped the blinded genres to the names shown above using data I found online about the most popular movie genres in the US by total revenue. In the US, Adventure has earned the most total revenue, Drama the second most, and so on.
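One way to read the mapping above is as a rank match: order the blinded labels by how often they appear among US observations, then pair them with the genres ordered by revenue. The sketch below is a Python illustration under that assumption, not the original analysis code. The function name `guess_genres` is hypothetical, the ranking beyond the first two genres is assumed to follow the order of the table above, and the counts are toy data rather than the real dataset.

```python
from collections import Counter

# Revenue ranking, highest first. Only the first two positions are stated
# in the text; the rest is assumed to follow the table order.
REVENUE_RANKING = ["Adventure", "Drama", "Action", "Comedy", "Thriller"]


def guess_genres(blinded_labels, ranking=REVENUE_RANKING):
    """Map blinded labels to genre names by matching popularity rank."""
    counts = Counter(blinded_labels)
    # most_common() returns labels sorted by count, highest first.
    ordered = [label for label, _ in counts.most_common()]
    return dict(zip(ordered, ranking))


# Toy data, NOT the real dataset: made-up counts for illustration only.
toy_labels = (
    ["BlindedGenre1"] * 5
    + ["BlindedGenre2"] * 4
    + ["BlindedGenre3"] * 3
    + ["BlindedGenre4"] * 2
    + ["BlindedGenre5"] * 1
)
mapping = guess_genres(toy_labels)
print(mapping)
```

A rank match like this is only as reliable as the gaps between counts: when two labels have similar frequencies, their assigned genres could easily swap.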
## Selecting by gCt
Using the data at https://www.nyfa.edu/student-resources/12-of-the-most-popular-movie-genres-by-country/ I tested the favorites for some other countries:
- Brazil aligns with Action, as expected
- Italy aligns with Comedy, as expected
- Germany aligns with Action, as expected
- Some observations do not align with the above reference, which may be attributable to sampling error
Question 2: 3 Insights
- The most popular browser among the top 5 countries in the dataset is Chrome, followed by IE11
- PC is the most prevalent devicetype in the US
- The top 5 US states by number of observations are Florida, California, Texas, Georgia, and Washington
- There are 27 regions within the US that prefer genres other than Adventure (which prevails in the US as a whole)
- Top 5 deviants:
  - Alabama prefers Comedy
  - Georgia prefers Drama
  - Washington, New York, and New Jersey prefer Action
## Selecting by n()
## $Australia
##
## $Canada
##
## $Ireland
##
## $`United Kingdom`
##
## $`United States`
## Selecting by count
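The regional comparison in the insights above (regions whose favorite genre differs from the national favorite) can be sketched as follows. This is a Python illustration, not the code behind the report; `deviating_regions` is a hypothetical helper and the rows are toy data, not the real observations.

```python
from collections import Counter, defaultdict


def deviating_regions(rows):
    """Return {region: top_genre} for regions whose most common genre
    differs from the most common genre across all rows.

    rows is an iterable of (region, genre) pairs.
    """
    rows = list(rows)
    if not rows:
        return {}
    # Most common genre over the whole input (the "national" favorite).
    overall = Counter(genre for _, genre in rows).most_common(1)[0][0]
    by_region = defaultdict(Counter)
    for region, genre in rows:
        by_region[region][genre] += 1
    result = {}
    for region, counts in by_region.items():
        top = counts.most_common(1)[0][0]
        if top != overall:
            result[region] = top
    return result


# Toy rows, NOT the real dataset.
toy_rows = (
    [("Maine", "Adventure")] * 3
    + [("Virginia", "Adventure")] * 2
    + [("Alabama", "Comedy")] * 2
    + [("Alabama", "Adventure")]
)
deviants = deviating_regions(toy_rows)
print(deviants)
```

Regions with very few observations can flip their top genre on a single row, so a minimum row count per region may be worth adding before counting deviations.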
Question 3: Friends
- The three users who I think would be great friends are:
  - 8adb33e7-45d8-4478-ac0c-a164a306bd5a
  - 8adb33e7-45d8-4478-ac0c-a164a306bd5a
  - 8adb33e7-45d8-4478-ac0c-a164a306bd5a
- They live in the same region, use the same browser and device, and like the same type of movie
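The matching criteria above (same region, browser, device, and favorite genre) amount to grouping users by a shared key. Below is a minimal Python sketch of that idea, not the original code. The function name `friend_groups` and the user IDs in the toy rows are hypothetical. Because a user may appear in more than one log row, the sketch collects IDs in a set so that each group contains distinct users.

```python
from collections import defaultdict


def friend_groups(rows, min_size=3):
    """Group distinct user IDs that share region, browser, device type,
    and favorite genre; keep only groups with at least min_size users."""
    groups = defaultdict(set)
    for row in rows:
        key = (
            row["region"],
            row["browser"],
            row["devicetype"],
            row["FavoriteMovieGenre"],
        )
        groups[key].add(row["tdid"])  # a set stores each user only once
    return {k: sorted(v) for k, v in groups.items() if len(v) >= min_size}


def make_row(tdid, region, browser="Chrome"):
    """Build a toy row using the column names from the dataset."""
    return {
        "tdid": tdid,
        "region": region,
        "browser": browser,
        "devicetype": "PC",
        "FavoriteMovieGenre": "BlindedGenre1",
    }


# Toy rows with made-up IDs, NOT the real dataset.
toy_rows = [
    make_row("user-a", "Maine"),
    make_row("user-a", "Maine"),  # repeated log entry for the same user
    make_row("user-b", "Maine"),
    make_row("user-c", "Maine"),
    make_row("user-d", "Virginia"),
]
groups = friend_groups(toy_rows)
print(groups)
```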