- Histogram
- Bar plot
- Pie chart
- Scatter plot
We will use the same House Price dataset. You can download it from the Dataset Module on Canvas.
df <- read.table(file="HousePrices.csv",
                 sep=",", header=TRUE, stringsAsFactors=FALSE)
class(df) # read.table() returns a data frame
## [1] "data.frame"
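For comma-separated files, read.csv() is a shorthand for read.table() with sep="," and header=TRUE already set. The sketch below checks this on a tiny made-up CSV written to a temporary file. The values are hypothetical, not taken from HousePrices.csv, so the sketch runs without the course dataset.

```r
# Write a tiny, made-up CSV (hypothetical values, not the real HousePrices data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("price,lot_size,air_cond",
             "132500,0.09,No",
             "181115,0.92,Yes",
             "109000,0.19,No"), tmp)

df1 <- read.table(file = tmp, sep = ",", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.csv(tmp, stringsAsFactors = FALSE)  # sep = "," and header = TRUE by default

identical(df1, df2)  # same data frame either way
str(df1)             # 3 rows: price, lot_size, air_cond
```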
A histogram takes a numeric vector as input and groups its values into bins to visualize their distribution.
hist(df$price)
hist(df$price, breaks=40) # breaks suggests the number of bins; R may adjust it to get "pretty" bin edges
# to learn more about these additional arguments, just type ?hist
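Calling hist() with plot=FALSE returns the computed bins instead of drawing them, which is a handy way to check what breaks actually did. The sketch uses simulated prices (a hypothetical stand-in for df$price) so it runs on its own; passing a vector of bin edges gives you exact control over the bins.

```r
set.seed(1)
# Simulated, right-skewed prices: a hypothetical stand-in for df$price
price <- round(rlnorm(500, meanlog = log(200000), sdlog = 0.4))

# breaks = 40 is only a suggestion; R rounds the bin edges to "pretty" values
h <- hist(price, breaks = 40, plot = FALSE)
length(h$counts)  # number of bins R actually used
sum(h$counts)     # every value lands in exactly one bin: 500

# For exact control, pass the bin edges yourself (here: 50,000-wide bins)
edges <- seq(0, ceiling(max(price) / 50000) * 50000, by = 50000)
h2 <- hist(price, breaks = edges, plot = FALSE)
all(diff(h2$breaks) == 50000)  # every bin has the same width
```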
hist(df$price, col="green", xlim=c(50000, 800000),
     xlab="Price of house", ylab="Count",
     main="Distribution of house price")
Bigger font size:
hist(df$price, col="plum", xlim=c(50000, 800000),
     xlab="Price of house", ylab="Count",
     main="Distribution of house price",
     cex.lab=1.5, cex.axis=1.5, cex.main=1.5)
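The cex.* arguments above apply to a single plot. If you want larger fonts on several plots in a row, one option is to set them once with par() and restore the old settings afterwards. A minimal sketch on simulated prices (hypothetical data):

```r
set.seed(1)
price <- round(rlnorm(500, meanlog = log(200000), sdlog = 0.4))  # hypothetical data

# par() returns the previous values, so they can be restored later
op <- par(cex.lab = 1.5, cex.axis = 1.5, cex.main = 1.5)
hist(price, col = "plum", xlab = "Price of house", ylab = "Count",
     main = "Distribution of house price")
hist(price, breaks = 40, col = "plum", xlab = "Price of house", ylab = "Count",
     main = "Same font settings, more bins")
par(op)  # back to the previous (default) font sizes
```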
Names of colors in R: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
More color palettes: https://www.datanovia.com/en/blog/top-r-color-palettes-to-know-for-great-data-visualization/
hist(df$price, col=cm.colors(10), xlim=c(50000, 800000),
     xlab="Price of house", ylab="Count",
     main="Distribution of house price")
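cm.colors(10) is one of several built-in palette functions that return a character vector of hex color codes. The sketch below tries a few of them on simulated prices (hypothetical data); hcl.colors() assumes R 3.6.0 or later. The colors are passed on to the bars and recycled if there are more bars than colors.

```r
set.seed(1)
price <- round(rlnorm(500, meanlog = log(200000), sdlog = 0.4))  # hypothetical data

cols <- cm.colors(10)
cols  # a character vector of 10 hex color codes

op <- par(mfrow = c(1, 3))  # three histograms side by side
hist(price, col = heat.colors(10), main = "heat.colors")
hist(price, col = terrain.colors(10), main = "terrain.colors")
hist(price, col = hcl.colors(10), main = "hcl.colors")  # needs R >= 3.6.0
par(op)                     # restore the single-panel layout
```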
Create the following graph with the default number of bins.
Do you think this is a good visualization? Re-generate the graph to show the distribution of the majority of houses. Add your favorite color to the chart.
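One possible approach, sketched on simulated right-skewed prices (hypothetical data, not the real dataset): use quantile() to find the range where most values lie and plot only that range. The 95% cutoff is an arbitrary choice for illustration.

```r
set.seed(2)
price <- round(rlnorm(1000, meanlog = log(200000), sdlog = 0.5))  # hypothetical

cutoff <- quantile(price, 0.95)  # 95% of houses cost at most this much
share <- mean(price <= cutoff)   # fraction of houses kept, about 0.95
hist(price[price <= cutoff], col = "darkseagreen",
     xlab = "Price of house", ylab = "Count",
     main = "Distribution of house price (cheapest 95%)")
```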
table(df$air_cond)
## 
##   No  Yes 
## 1093  635 
table(df$heat)
## 
##  Electric   Hot Air Hot Water 
##       305      1121       302 
table(df$fuel)
## 
## Electric      Gas      Oil 
##      315     1197      216 
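table() counts how often each category occurs. To turn counts into proportions, one option is prop.table(); sort() orders the categories by count. A small sketch on a made-up vector (hypothetical values standing in for df$fuel):

```r
fuel <- c("Gas", "Gas", "Oil", "Electric", "Gas", "Oil")  # hypothetical

counts <- table(fuel)
counts                           # Electric 1, Gas 3, Oil 2 (alphabetical order)
prop.table(counts)               # proportions that sum to 1
sort(counts, decreasing = TRUE)  # most common category first
```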
In a bar plot, the x axis shows categories; in a histogram, the x axis shows numeric values grouped into bins.
distribution <- table(df$heat)
distribution
## 
##  Electric   Hot Air Hot Water 
##       305      1121       302 
barplot(distribution)
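barplot() accepts many of the same styling arguments as hist(). The sketch below uses a hypothetical heating vector, sorts the bars from most to least common, and labels the y axis; las = 1 makes the axis numbers horizontal. barplot() invisibly returns the x positions of the bar midpoints, which is useful for adding text on top of the bars.

```r
heat <- c(rep("Hot Air", 6), rep("Electric", 2), rep("Hot Water", 2))  # hypothetical

distribution <- sort(table(heat), decreasing = TRUE)
mids <- barplot(distribution, col = "steelblue", ylab = "Count",
                main = "Heating type", las = 1)
# write each count just below the top of its bar
text(mids, as.numeric(distribution), labels = as.numeric(distribution),
     pos = 1, col = "white")
```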
distribution <- table(df$fuel)
pie(distribution)
distribution <- table(df$fuel)
pct <- round(distribution/nrow(df)*100) # calculate percentages
lbls <- paste0(names(distribution), " ", pct, "%") # add percentages to the labels
pie(distribution, col=c("steelblue4", "grey", "grey"),
    labels=lbls)
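prop.table() gives each category's share directly, and paste0() puts the % sign right next to the number. Because each percentage is rounded separately, the labels do not always add up to exactly 100. A sketch with hypothetical counts for 10 houses:

```r
fuel <- c(rep("Gas", 7), rep("Electric", 2), rep("Oil", 1))  # hypothetical: 10 houses

distribution <- table(fuel)
pct <- round(100 * prop.table(distribution))         # 20, 70, 10
lbls <- paste0(names(distribution), " ", pct, "%")
lbls  # "Electric 20%" "Gas 70%" "Oil 10%"
pie(distribution, labels = lbls, col = c("grey", "steelblue4", "grey"))  # highlight Gas
```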
Show a histogram of lot_size
Show a bar plot of air_cond
Show a pie chart of construction
Customize your plots to enhance the visualization.
The template for a formula in R is as follows:
outcome ~ predictor_1 + predictor_2 + ...
A formula usually has two parts: 1) one outcome variable on the left and 2) one or more predictors on the right.
The two parts are separated by a tilde (~).
Formulas are frequently used in regression analysis and data mining, but you can also use them in some basic graphics.
Essentially, a formula tells R that you are interested in the relationship between the outcome variable and the predictors.
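A formula is an ordinary R object, so you can store it and inspect its parts before passing it to a function. A short sketch (the variable names match the dataset, but no data is needed because a formula is not evaluated until a function uses it):

```r
f <- price ~ lot_size + bathrooms + rooms + age

class(f)     # "formula"
all.vars(f)  # "price" "lot_size" "bathrooms" "rooms" "age"
f[[2]]       # price: the left-hand side, i.e. the outcome
length(f)    # 3: the ~ operator, the left-hand side, and the right-hand side
```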
plot(price ~ lot_size + bathrooms + rooms + age, data=df) # y_axis ~ x_axis; one scatter plot of price per predictor
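With several predictors on the right-hand side, plot() draws a separate scatter plot for each predictor, each with price on the y axis. In an interactive session R may ask you to hit Return between plots; one way to see them together is to split the plotting window with par(mfrow = ...) first. A sketch on a small simulated data frame (houses and its values are hypothetical):

```r
set.seed(3)
houses <- data.frame(lot_size = runif(50, 0.1, 2),
                     age = sample(0:100, 50, replace = TRUE))  # hypothetical
houses$price <- round(150000 + 60000 * houses$lot_size - 500 * houses$age +
                      rnorm(50, sd = 20000))

op <- par(mfrow = c(1, 2))                   # 1 row, 2 panels
plot(price ~ lot_size + age, data = houses)  # one panel per predictor
par(op)                                      # restore the single-panel layout
```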
ifelse(condition, do_this_if_true, do_this_if_false)
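ifelse() is a vectorized function: it checks the condition element by element and returns the matching value for each element. Example: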
a <- 2; b <- 1
ifelse(a > b, "a is greater than b", "a is not greater than b")
## [1] "a is greater than b"
color <- ifelse(a==b, "red", "blue")
print(color)
## [1] "blue"
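Because ifelse() works element by element, a whole column can be turned into one color per point, which is exactly what the scatter plot below relies on. A sketch on a hypothetical air_cond vector; note that an NA in the condition produces NA in the result:

```r
air_cond <- c("Yes", "No", "No", "Yes")  # hypothetical

point_cols <- ifelse(air_cond == "Yes", "red", "blue")
point_cols  # "red" "blue" "blue" "red"

sizes <- ifelse(c(5, NA, 1) > 2, "big", "small")
sizes       # "big" NA "small": an NA condition gives NA
```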
plot(price ~ lot_size, data=df,
     pch=ifelse(df$air_cond=="Yes", 0, 1),          # pch: symbols for the points
     col=ifelse(df$air_cond=="Yes", "red", "blue")) # col: colors of the points
legend("topleft", c("w/ aircon", "w/o aircon"),
       pch=c(0, 1), col=c("red", "blue"))
For the full list of point symbols (pch values), see ?points, or just google “R pch symbols”.
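You can also draw the symbols yourself. The sketch below plots pch values 1 to 25 in a row with each number underneath; bg only affects the fillable symbols 21 to 25.

```r
plot(1:25, rep(1, 25), pch = 1:25, cex = 2, bg = "gold",
     axes = FALSE, xlab = "", ylab = "", main = "pch = 1 to 25")
text(1:25, rep(1, 25), labels = 1:25, pos = 1)  # number under each symbol
```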
Create a scatter plot with living_area on the x axis and price on the y axis. Color the dots according to whether the house has a fireplace.