Sample R visualization. The movies dataset from ggplot2 package is used to demonstrate the graphs in R
library(ggplot2)
data(movies)
Let’s check budget range of movies from 1981-2005
after80=movies$year>1980
boxplot(budget/1000~year,movies[after80&!is.na(movies$budget),],range=0,ylab="Budget in Thousands",ylab="Year",col=c("orange","purple"))
title("Movie Budgets")
From the graph we could see that the movie budget were higher around 1996. Also we could see the mean line going down from 2000. One possible reason could be technology advancement in 2000’s which made it possible to produce movies in very low budget.
Note: All these observations are based on the sample size we have and may or may not apply to global population
Let’s check distribution of rating
hist(movies$rating[after80],breaks=10,col=movies$rating,freq=F,xlab="Rating",main="Movie Rating")
From the graph we could observe that the rating approximately follows normal distribution
Let’s compare the rating with various parameter using scatterplot
pairs(~rating+year+budget+votes,data=movies[after80,])
title(list("Rating and other movie factors"),line=3)
Now, let’s try to find famous movies
library(wordcloud)
## Loading required package: RColorBrewer
wordcloud(substr(movies$title[after80],1,30),movies$rating[after80]^2*movies$votes[after80],max.words = 100,colors = brewer.pal(8,"Dark2"),scale=c(2,0),random.order = F)
title(list("Famous Movies",col="purple"),line=3)