MovieLens: Movie Recommendations based on Collaborative Topic Modeling
GroupLens: an Introduction
GroupLens is an academic research group in the Department of Computer Science and Engineering at the University of Minnesota. Its mission is to “advance the theory and practice of social computing by building and understanding systems used by real people,” and the projects the group undertakes are motivated by that aim. GroupLens has a history of advancing our understanding of recommender systems: it regularly publishes its work in relevant journals and makes datasets generated from its projects available to the public.
MovieLens: movie recommendations free of commercial influence
MovieLens is a non-commercial movie recommendation web site and GroupLens project that has been online since 1997. MovieLens is unusual in that its algorithms are driven purely by user data and are not biased by commercial influences. The project helps users find interesting movies to watch while generating data and acting as a petri dish in which GroupLens can experiment with automated content recommendations.
Scenario Design Analysis
Scenario Design provides a simple framework for evaluating an interactive system from a User Experience perspective. Scenario Design Analysis applies three questions to make sure that the users’ needs are addressed (Figure 1). The following section works through these questions in turn:
Scenario Design Analysis: three question framework (4)
1. Who are the target users?
MovieLens caters to two main types of users: 1) recreational users and 2) research users.
2. What are the key user goals?
The motivation of a recreational user is straightforward: to find new movie recommendations. The second user group, the researchers, has more nuanced goals and is primarily interested in using MovieLens as a research vehicle to explore aspects of recommender systems such as personalization and filtering technologies.
3. How does MovieLens accomplish these goals?
MovieLens brilliantly balances the needs of its two major user groups; its site has a simple and intuitive layout. Usability.gov outlines seven key concepts that serve as User Experience (UX) best-practice guidelines (see figure below). These concepts are implemented well on the MovieLens web site; several examples are reviewed for both user groups below.
Recreational UX
Useful: The MovieLens web site offers an ad-free alternative to commercial movie recommendation sites (e.g., IMDb or Rotten Tomatoes).
Usable: MovieLens is very intuitive to use. Additionally, new users are greeted with a simple and self-explanatory getting started page (see image below) and subsequently led to an optional tutorial for feature overviews.
Findable: The web site has an intuitive layout that follows organization conventions set by commercial movie recommendation sites (e.g., similar five-star rating systems).
Accessible: MovieLens is freely available and can be accessed by anyone with an internet connection.
Researcher UX
Credible: GroupLens is a respected academic group at the University of Minnesota that publishes cutting-edge research on recommender systems and related fields in peer-reviewed journals.
Valuable: GroupLens Research curates and maintains MovieLens datasets that are available to anyone interested in using the data for research purposes. The data is valuable largely because the sets are so extensive. For example, the most recent ‘MovieLens 25M Dataset’ holds 25 million ratings of ~62,000 movies. For more information, please visit the GroupLens datasets page.
MovieLens’ Implementation of Collaborative Topic Modeling
Being an academic research group, GroupLens has published extensively on the algorithmic approaches applied by the MovieLens recommender system. GroupLens applied a Collaborative Topic Regression (CTR) model to make movie recommendations. CTR models were first developed in 2011 by Wang and Blei of Princeton University to implement a recommender system for scientific articles on CiteULike, a now defunct web site that allowed academic researchers to share publications. CTR models combine two approaches in one: collaborative filtering based on latent factor models and content modeling based on probabilistic topic modeling. This combination addresses the shortcomings of each approach when it is used alone. For example, a shortcoming of collaborative filtering methods is the so-called ‘cold start’ problem: collaborative filtering fails when presented with new instances that have no rating history. Probabilistic topic modeling can address this issue by modeling the content of a new instance and estimating its similarity to instances that carry historical data. MovieLens applies Probabilistic Matrix Factorization (PMF) to users’ rating histories for the collaborative component, which captures similarities between users. Additionally, Latent Dirichlet Allocation (LDA) is used for topic modeling of the movie plot summary corpus; this helps build a similarity measure between a new movie and the existing movie library. The details of GroupLens’ algorithm are beyond the scope of this summary; the interested reader is encouraged to read Movie Recommendation based on Collaborative Topic Modeling for more information.
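To make the combination of the two approaches more concrete, here is a minimal sketch of how a CTR-style prediction can be formed, following the general formulation in Wang and Blei (2011): a movie's latent vector is its topic proportions plus an offset learned from ratings, and a brand-new movie with no ratings falls back to its topic proportions alone. This is not MovieLens' actual implementation; the function names and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def predict_rating(user_vec, topic_props, offset=None):
    """CTR-style score: dot(u_i, theta_j + eps_j).

    For a new movie with no rating history there is no learned
    offset, so the score falls back to dot(u_i, theta_j).
    """
    if offset is None:
        item_vec = list(topic_props)
    else:
        item_vec = [t + e for t, e in zip(topic_props, offset)]
    return sum(u * v for u, v in zip(user_vec, item_vec))

def cosine_similarity(a, b):
    """Similarity between two movies' topic proportions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy values with K = 3 latent topics.
user = [0.5, 1.0, 0.0]         # user latent vector (learned from ratings)
theta_old = [0.2, 0.7, 0.1]    # topic proportions of an already-rated movie
eps_old = [0.1, -0.1, 0.0]     # offset learned from that movie's ratings
theta_new = [0.6, 0.3, 0.1]    # topic proportions of a new, unrated movie

in_matrix = predict_rating(user, theta_old, eps_old)  # approximately 0.75
cold_start = predict_rating(user, theta_new)          # approximately 0.6
similarity = cosine_similarity(theta_new, theta_old)
```

The design point is that the topic proportions come from the plot summary alone, so a score can be produced for a movie before anyone has rated it; the offset only refines that score once ratings accumulate.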
Conclusions & Suggestions
The academic research group GroupLens has developed MovieLens, a project that is both an ad-free movie recommendation system freely available to the public and a valuable data resource accessible to anyone interested in using the dataset to further research into recommender systems. UX principles are applied well throughout the MovieLens web site, and the result is an elegantly simple and readily intuitive movie recommendation site. The needs of the recreational user are fully met in MovieLens’ current state. However, the research user must navigate to GroupLens’ web site to access MovieLens datasets. One suggested change is to add a menu option on the MovieLens web site for those interested in accessing the data. There are two clear benefits to this. First, research users would be able to access data directly from the MovieLens web site. Second, making the data and information about it available there would benefit recreational users by showing how their data is used; this increases transparency and democratizes the data.
References
- Bhowmick, Abhishek, Udbhav Prasad, and Satwik Kottur. “Movie Recommendation based on Collaborative Topic Modeling.” (2014).
- Wang, Chong, and David M. Blei. “Collaborative topic modeling for recommending scientific articles.” Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 2011.
- Spangher, Alexander. “Building the next New York Times recommendation engine.” The New York Times (2015).
- Temkin, Bruce D. “Scenario Design: A Disciplined Approach to Customer Experience.” Forrester Research (2004).