As the number of movies and TV shows keeps growing, viewers can easily feel overwhelmed by the sheer amount of choice and struggle to find good ones to watch.
We used the IMDB Movies Dataset by Harshit Shankhdhar, obtained from Kaggle. It contains 1,000 observations (movies) described by 16 features. We aimed to make use of every part of the data so that none of it went to waste. This is the dataset we used.
Data Acquisition
We compared many datasets available online and chose the one most suitable for our project, which came from Kaggle.
Data Cleaning
We cleaned the data to remove missing and incorrect values.
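Cleaning of this kind comes down to finding records with empty fields and deciding what to do with them. The project itself was written in R; the following is only a minimal Python sketch of one simple policy (count the gaps per column, then drop incomplete rows), assuming each row is a dictionary keyed by column name. The column names `Series_Title`, `Certificate` and `Gross` are assumed to match the Kaggle CSV header, and the sample rows are hypothetical, not taken from the real dataset.

```python
from typing import Dict, List, Optional

# One movie record: column name -> raw string value (None means missing).
Row = Dict[str, Optional[str]]


def is_filled(value: Optional[str]) -> bool:
    """A value counts as present if it is not None and not just whitespace."""
    return value is not None and value.strip() != ""


def count_missing(rows: List[Row], columns: List[str]) -> Dict[str, int]:
    """Count missing values per column, to see where cleaning is needed."""
    return {col: sum(not is_filled(r.get(col)) for r in rows) for col in columns}


def drop_incomplete(rows: List[Row], required: List[str]) -> List[Row]:
    """Keep only rows whose required columns are all filled."""
    return [r for r in rows if all(is_filled(r.get(col)) for col in required)]


# Hypothetical sample rows (assumed column names, made-up values).
movies: List[Row] = [
    {"Series_Title": "Movie A", "Certificate": "U", "Gross": "1,000"},
    {"Series_Title": "Movie B", "Certificate": None, "Gross": "2,500"},
    {"Series_Title": "Movie C", "Certificate": "A", "Gross": "  "},
]

columns = ["Series_Title", "Certificate", "Gross"]
print(count_missing(movies, columns))  # {'Series_Title': 0, 'Certificate': 1, 'Gross': 1}
print([m["Series_Title"] for m in drop_incomplete(movies, columns)])  # ['Movie A']
```

Dropping whole rows is the bluntest option; for a column such as `Gross`, keeping the row and treating the value as unknown would lose fewer movies. In R, `tidyr::drop_na()` or `dplyr::filter()` with `!is.na()` plays the same role.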
Data Analysis (Exploratory Data Analysis and Predictive Analysis)
The main library we used was “dplyr”; we also researched which other libraries we needed to build our Shiny app.
Implementing the recommendation system in R was challenging because we had had few opportunities to practise coding. Communication was also difficult, since our group consisted of both local and international students. Nevertheless, we completed the project by persistently testing and debugging our code, and we coordinated our work through online meetings.
This is the GitHub link to our code.
This is our Shiny app.
After discussion, we concluded that our data product is suitable for the general public, especially movie lovers. We also added a feature called “Kid mode”, which filters out all movies that are inappropriate for children.
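Kid mode is essentially a row filter on each movie's age certificate. A minimal Python sketch of the idea, assuming the filter keys on a `Certificate` column and that `U`, `G` and `PG` count as kid-friendly (both are assumptions, since the rule the app actually uses is not stated); the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical set of certificates treated as suitable for children.
# The real app's rule may differ; adjust to the rating system in the data.
KID_FRIENDLY = frozenset({"U", "G", "PG"})


def apply_kid_mode(rows: Iterable[Row], enabled: bool) -> List[Row]:
    """Return all rows when Kid mode is off; otherwise only kid-friendly ones.

    Rows with a missing certificate are excluded in Kid mode, erring on the
    side of caution.
    """
    rows = list(rows)
    if not enabled:
        return rows
    return [r for r in rows if r.get("Certificate") in KID_FRIENDLY]


# Hypothetical sample rows (made-up titles and certificates).
movies: List[Row] = [
    {"Series_Title": "Movie A", "Certificate": "U"},
    {"Series_Title": "Movie B", "Certificate": "A"},
    {"Series_Title": "Movie C", "Certificate": None},
    {"Series_Title": "Movie D", "Certificate": "PG"},
]

print([m["Series_Title"] for m in apply_kid_mode(movies, enabled=True)])  # ['Movie A', 'Movie D']
print(len(apply_kid_mode(movies, enabled=False)))  # 4
```

In an R/Shiny app the same idea would typically be a `dplyr::filter()` call such as `filter(movies, Certificate %in% c("U", "G", "PG"))`, applied only when the Kid mode toggle is switched on.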
This is a screenshot of the Discoverer tab and the Recommendation tab of our Shiny app.