R-Week04-Assignment

Sample R visualizations. The movies dataset from the ggplot2 package is used to demonstrate graphs in R.

library(ggplot2)
# Note: from ggplot2 2.0.0 onward the movies data ships in the ggplot2movies package
data(movies)

Let’s check the budget range of movies from 1981 to 2005

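# Logical index: movies released after 1980
after80=movies$year>1980
# Budget (in thousands) by year, skipping movies with no recorded budget
boxplot(budget/1000~year,movies[after80&!is.na(movies$budget),],range=0,ylab="Budget in Thousands",xlab="Year",col=c("orange","purple"))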
title("Movie Budgets")

From the graph we can see that movie budgets were higher around 1996. We can also see the median line going down from 2000 onward. One possible reason is the technological advancement of the 2000s, which made it possible to produce movies on very low budgets.
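To check this trend numerically rather than by eye, the yearly medians can be tabulated. The following is a minimal sketch: the `toy` data frame and its values are invented for illustration and only mimic the `year` and `budget` columns of `movies`, and `median_budget` is a helper name introduced here.

```r
# Hypothetical stand-in for movies[, c("year", "budget")]; values are invented.
toy <- data.frame(
  year   = c(1996, 1996, 1996, 2000, 2000, 2001, 2001, 2001),
  budget = c(30e6, 50e6, NA, 20e6, 40e6, 1e6, 5e6, 9e6)
)

# Median budget (in thousands) per year, ignoring missing budgets.
# This corresponds to the line drawn inside each box of the boxplot.
median_budget <- function(df) {
  known <- df[!is.na(df$budget), ]
  tapply(known$budget / 1000, known$year, median)
}

median_budget(toy)
```

With the real data, `median_budget(movies[after80, ])` should give one value per year that has at least one recorded budget.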

Note: All these observations are based on the sample we have and may or may not apply to movies in general.

Let’s check the distribution of ratings

hist(movies$rating[after80],breaks=10,col=movies$rating,freq=FALSE,xlab="Rating",main="Movie Rating")

From the graph we can observe that the ratings approximately follow a normal distribution.
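One way to back up this visual impression is to compute the sample skewness and excess kurtosis, both of which are close to zero for normally distributed data. This is a sketch using only base R; `shape_summary` is a helper name introduced here, not part of any package, and the example vector is made up.

```r
# Moment-based skewness and excess kurtosis; both are near 0 for a
# normal distribution. `shape_summary` is a hypothetical helper.
shape_summary <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  z <- centered / sqrt(mean(centered^2))  # standardize with the population SD
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}

# Example on a small symmetric vector (invented values, not movie data).
shape_summary(c(1, 2, 3, 4, 5))
```

For the ratings themselves, `shape_summary(movies$rating[after80])` and a normal Q-Q plot via `qqnorm(movies$rating[after80])` would be natural follow-ups.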

Let’s compare rating with year, budget, and votes using a scatterplot matrix

pairs(~rating+year+budget+votes,data=movies[after80,])
title(list("Rating and other movie factors"),line=3)
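The scatterplot matrix can be complemented by a correlation matrix over the same four columns. A minimal sketch, with an invented data frame standing in for `movies`; `use = "pairwise.complete.obs"` is chosen here on the assumption that `budget` contains missing values, as the earlier `is.na` filter suggests.

```r
# Hypothetical stand-in for the movies columns used in pairs(); values are invented.
toy <- data.frame(
  rating = c(6.1, 7.4, 5.0, 8.2, 6.8),
  year   = c(1985, 1992, 1999, 2001, 2004),
  budget = c(5e6, NA, 2e7, 6e7, NA),
  votes  = c(120, 950, 60, 15000, 400)
)

# Pearson correlations, using every pair of rows where both values are present.
corr <- cor(toy[, c("rating", "year", "budget", "votes")],
            use = "pairwise.complete.obs")
round(corr, 2)
```

On the real data the same call would take `movies[after80, c("rating", "year", "budget", "votes")]` as its first argument.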

Now, let’s try to find famous movies, weighting each title by its squared rating times its number of votes

library(wordcloud)

## Loading required package: RColorBrewer

wordcloud(substr(movies$title[after80],1,30),
          movies$rating[after80]^2*movies$votes[after80],
          max.words = 100,colors = brewer.pal(8,"Dark2"),
          scale=c(2,0),random.order = FALSE)
title(list("Famous Movies",col="purple"),line=3)
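The weight passed to `wordcloud`, squared rating times vote count, can also be used to rank titles directly in a table. A sketch under that assumption, with made-up titles and numbers in place of the real data; the `fame` column name is introduced here for illustration.

```r
# Hypothetical stand-in for movies; titles and values are invented.
toy <- data.frame(
  title  = c("Film A", "Film B", "Film C", "Film D"),
  rating = c(8.0, 9.0, 6.0, 7.0),
  votes  = c(1000, 100, 5000, 2000),
  stringsAsFactors = FALSE
)

# Same weighting as the word cloud: rating squared times number of votes.
toy$fame <- toy$rating^2 * toy$votes

# Titles sorted from most to least "famous" under this score.
top <- toy[order(toy$fame, decreasing = TRUE), c("title", "fame")]
head(top, 3)
```

In this toy example the vote count dominates the score, so a well-rated film with few votes still ranks low; whether that is the desired notion of "famous" is a judgment call.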


Mohan

July 26, 2015