Use the Import Dataset feature in the workspace to import homePrice.csv and save it as a data.frame called homes.
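The Import Dataset dialog typically runs a read call for you and echoes it in the console. A rough scripted equivalent, useful for making the import reproducible, might look like the sketch below. Because homePrice.csv is not available here, the sketch first writes a tiny stand-in file; its rows and the `csvPath` name are made up for illustration and are not the workshop data.

```r
# Stand-in for homePrice.csv: the rows below are invented for illustration only.
csvPath <- file.path(tempdir(), "homePrice.csv")
writeLines(c("price,age,rating,state",
             "250000,12,3,CA",
             "410000,5,4,NY",
             "130000,40,1,TX"), csvPath)

# read.csv() returns a data.frame; header = TRUE takes column names from the first row.
homes <- read.csv(csvPath, header = TRUE, stringsAsFactors = FALSE)

class(homes)   # "data.frame"
names(homes)   # "price" "age" "rating" "state"
str(homes)     # compact overview of each column's type and first values
```

With the real file, only the path passed to read.csv() would change.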
Briefly explore (from the console) the data and view the variable names. While entering the commands, remember to use code completion with the tab key. Don't copy and paste commands.
names(homes)
## [1] "price" "age" "rating" "state"
price = Market price of the home
age = Age (years) of the house
rating = Grade given to the house from 1-5 (1=poor, 5=best)
state = State where the house is located
summary(homes)
hist(homes$price)
It appears that there are more homes in the lower price range than in the higher ones. However, the summary of the rating variable is not very helpful, because R treats rating as a number and reports quartiles and a mean. Check the class of the variable and change it to something more appropriate, such as a factor.
class(homes$rating)
homes$rating <- as.factor(homes$rating)
class(homes$rating)
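To see why a factor is the more useful class here, the sketch below compares summary() on an integer vector and on the same values stored as a factor. The `ratings` vector is a made-up example, not the homes data. It also shows a caveat worth knowing before a factor is ever converted back to a number: as.integer() on a factor returns the internal level codes, which match the original values only when the levels happen to be 1, 2, 3, and so on with none missing.

```r
ratings <- c(2L, 5L, 5L, 3L, 2L, 5L)   # hypothetical values; note that 1 and 4 never occur

summary(ratings)              # quartiles and mean: treats ratings as measurements
summary(as.factor(ratings))   # counts per level: "2" twice, "3" once, "5" three times

f <- as.factor(ratings)
as.integer(f)                 # level codes 1 3 3 2 1 3, not the original ratings
as.integer(as.character(f))   # original values 2 5 5 3 2 5
```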
Plot price by rating. Because rating is now a factor, plot() draws one boxplot per rating level.
plot(homes$rating, homes$price)
The boxplots are nice, but maybe we want to instead see the individual points and overlay some text on the plot. First change the rating variable back to an integer. We also want the mean for each rating group.
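# change rating back to an integer (via character, so the original values are kept rather than the factor's level codes)
homes$rating <- as.integer(as.character(homes$rating))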
class(homes$rating)
# calculate the mean for each rating group
avePrice <- rep(0, 5)
for (i in 1:5) {
    avePrice[i] <- mean(subset(homes, rating == i)$price)
}
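The loop works, but R can compute grouped means in a single call. As an alternative sketch, tapply() splits price by rating and applies mean() to each group. The small `homesDemo` data frame below is a hypothetical stand-in for homes so that the example runs on its own.

```r
# Hypothetical stand-in for the homes data.frame (values invented for illustration).
homesDemo <- data.frame(
  price  = c(100000, 140000, 200000, 260000, 300000, 500000),
  rating = c(1L, 1L, 2L, 2L, 3L, 3L)
)

# Loop version, following the pattern used in the exercise.
loopMeans <- rep(0, 3)
for (i in 1:3) {
  loopMeans[i] <- mean(subset(homesDemo, rating == i)$price)
}

# Vectorized version: one mean per distinct rating, named by rating.
groupMeans <- tapply(homesDemo$price, homesDemo$rating, mean)

loopMeans               # 120000 230000 400000
as.vector(groupMeans)   # same three values
```

With the real data, tapply(homes$price, homes$rating, mean) should give the same numbers as the loop, without hard-coding how many rating levels exist.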
Now plot price against rating. Also mark the mean of each group in red, with its value written to the side. You can copy and paste these commands.
# xaxt = "n" and yaxt = "n" suppress the default axes so that custom ones can be drawn next
plot(homes$rating, homes$price, main = "Breakdown of Price by Rating",
    xlab = "Rating", ylab = "Price", las = 2, xaxt = "n", yaxt = "n")
axis(1)
axis(2, at = c(0, 2e+05, 4e+05, 6e+05, 8e+05), labels = c("0", "200k",
"400k", "600k", "800k"), las = 2)
points(1:5, avePrice, col = "red", pch = 19)
text(1:4, avePrice[1:4], round(avePrice[1:4], -3), pos = 4)
text(5, avePrice[5], round(avePrice[5], -3), pos = 2)
Install and load the ggplot2 package from the Packages pane.
Open the Help pane and look at the documentation for the qplot function. Use the examples and the function arguments to plot price by rating.
Use other functions in the ggplot2 package to improve the look of the graph:
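p <- qplot(rating, price, color = age, data = homes, main = "Breakdown of Price by Rating",
    xlab = "Rating")
# current ggplot2 releases use theme() and element_text() in place of opts() and theme_text()
p <- p + theme(plot.title = element_text(size = 23))
p <- p + theme(axis.title.x = element_text(size = 16, vjust = -0.05))
p <- p + theme(axis.title.y = element_text(size = 16))
p <- p + theme(legend.title = element_text(size = 14))
# explicit breaks keep the four labels aligned with the values they name
# (break positions assumed to match the base-graphics axis drawn earlier)
p <- p + scale_y_continuous(name = "Price", breaks = c(2e+05, 4e+05, 6e+05, 8e+05),
    labels = c("200k", "400k", "600k", "800k"))
p <- p + geom_point()
print(p)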
We've done some decent analysis, but we want to make sure to save our work. Use the History pane to view the previous commands that worked and collect them into an R script. The To Source button inserts the selected history commands into the active source document.
The last plot with ggplot2 used a lot of formatting commands. Copy these commands and paste them into a new R source file for editing. Use the Extract Function feature to place them into their own function called formatPlot(). Save this function in its own source file called formatPlot.R.
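What the extracted function looks like depends on which lines were selected, so the following is only a sketch of one plausible formatPlot.R. It assumes a current ggplot2 release, where theme() and element_text() handle text styling, and it assumes the function should print the plot and return it invisibly; neither detail is specified in the exercise.

```r
library(ggplot2)

# Sketch of formatPlot(): applies the text sizes used in the exercise to any ggplot object.
formatPlot <- function(p) {
  p <- p + theme(
    plot.title   = element_text(size = 23),
    axis.title.x = element_text(size = 16, vjust = -0.05),
    axis.title.y = element_text(size = 16),
    legend.title = element_text(size = 14)
  )
  print(p)       # draw the plot (assumed behaviour)
  invisible(p)   # hand the formatted plot back for further use
}
```

The y-axis scale call could be moved into the function in the same way, provided every plot passed to it has price on the y axis.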
Go back to your first document and source formatPlot.R to use your new function to produce other plots.
source("~/Desktop/rstudioTraining/R/formatPlot.R")
p <- qplot(age, price, color = rating, data = homes, main = "Breakdown of Prices by Age and Rating",
xlab = "Age") + geom_smooth()
formatPlot(p)
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
p <- qplot(state, price, color = rating, data = homes, main = "Breakdown of Prices by State and Rating",
xlab = "State")
formatPlot(p)
Review your previous plots in the Plots pane. Select your favorite and use the export feature (“Save Plot as Image…”) to save it. Adjust the size manually with the drag and drop tool (bottom right), then save your plot.
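The same export can be scripted, which helps when a plot has to be regenerated later at an exact size. A minimal sketch using base R's png() device follows; the file name, dimensions, and plotted values are placeholders chosen for illustration.

```r
# Placeholder output path; substitute a real location when saving for keeps.
outFile <- file.path(tempdir(), "priceByRating.png")

# Open a PNG device at a fixed size (pixels), draw, then close the device to write the file.
png(outFile, width = 800, height = 600)
plot(c(1, 2, 3), c(150000, 220000, 410000),
     main = "Breakdown of Price by Rating", xlab = "Rating", ylab = "Price")
dev.off()

file.exists(outFile)   # TRUE once dev.off() has run
```

For ggplot2 plots, ggsave() plays a similar role.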
Looking back, our variable name “avePrice” wasn't very descriptive. Since we used it throughout the document, use the Find and Replace tool to change all instances of “avePrice” to something more descriptive, such as “ratingMeans”.