- Histogram
- Bar plot
- Pie chart
- Box plot
- Scatter plot
We will use the same House Price dataset. You can go to Canvas - Sample Dataset Module to download the following data.
df<-read.table(file="HousePrices.csv",
sep=",", header=TRUE, stringsAsFactors=FALSE)
class(df) # R will convert the file to a data frame
## [1] "data.frame"
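As a side note, read.csv() is a convenience wrapper around read.table() with sep="," and header=TRUE as defaults. The sketch below checks this on a tiny made-up file written to a temporary location; the column names and values are invented for illustration and are not the HousePrices data.

```r
# Write a small, made-up CSV to a temporary file (hypothetical values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("price,rooms,air_cond",
             "150000,5,No",
             "230000,7,Yes"), tmp)

# Read it both ways
d1 <- read.table(file = tmp, sep = ",", header = TRUE, stringsAsFactors = FALSE)
d2 <- read.csv(tmp, stringsAsFactors = FALSE)

identical(d1, d2)   # compare the two data frames
str(d1)             # check the column types R inferred
```

For a simple comma-separated file like this one, either function can be used.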
A histogram takes a numeric vector as input and groups its values into bins to visualize how the numbers in the vector are distributed.
hist(df$price)
hist(df$price, breaks=20) # breaks suggests the number of bins; R may adjust it to get round cut points
# to learn more about these additional arguments, just type ?hist
hist(df$price, col="green", xlim=c(50000, 800000),
xlab="Price of house", ylab="Count",
main="Distribution of house price")
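To see what the bins actually contain, hist() can return the binning without drawing anything. A minimal sketch on a small made-up vector (not the house data), with the bin edges given explicitly:

```r
# Made-up values for illustration only
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 9)

# plot = FALSE returns the bin information instead of drawing
h <- hist(x, breaks = c(0, 2, 4, 6, 8, 10), plot = FALSE)

h$breaks   # bin edges: 0 2 4 6 8 10
h$counts   # values per bin: 3 5 1 0 1
```

By default each bin includes its right edge, so the value 2 is counted in the first bin and 4 in the second.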
Bigger font size: Adjusting the cex argument
hist(df$price, col="plum", xlim=c(50000, 800000),
xlab="Price of house", ylab="Count",
main="Distribution of house price",
cex.lab=1.2, cex.axis=1.2, cex.main=1.5) # defaults: cex.lab=1, cex.axis=1, cex.main=1.2
Create the following graph with a slightly larger font size for the captions and axis (cex=2)
table(df$air_cond)
##
##   No  Yes
## 1093  635
table(df$heat)
##
##  Electric   Hot Air Hot Water
##       305      1121       302
table(df$fuel)
##
## Electric      Gas      Oil
##      315     1197      216
In a bar plot, the x axis is for categorical values while in a histogram, the x axis is for numerical values.
distribution <- table(df$heat)
distribution
##
##  Electric   Hot Air Hot Water
##       305      1121       302
barplot(distribution)
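Because barplot() simply draws the table it is given, the table can be reordered or converted to proportions first. A small sketch with made-up categories (the heat vector here is invented, not taken from the dataset):

```r
# Made-up categorical data for illustration
heat <- c("Hot Air", "Electric", "Hot Air", "Hot Water", "Hot Air", "Electric")

counts <- table(heat)                       # counts per category
sorted <- sort(counts, decreasing = TRUE)   # largest category first
shares <- prop.table(sorted)                # counts converted to proportions

barplot(sorted, ylab = "Count", main = "Heating type (made-up data)")
barplot(shares, ylab = "Proportion", main = "Heating type (made-up data)")
```

Sorting the bars is a presentation choice; it usually makes the largest and smallest categories easier to spot.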
distribution <- table(df$fuel)
pie(distribution)
distribution <- table(df$fuel)
pct <- round(distribution/nrow(df)*100) # calculate percentages
lbls <- paste(names(distribution), pct, "%") # add percentages to the labels
pie(distribution, col=c("steelblue4", "grey", "grey"),
labels=lbls)
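The label-building step can be checked on its own before drawing. A sketch with made-up counts; paste0() is used here instead of paste() only to avoid the space before the percent sign:

```r
# Made-up counts for illustration
fuel <- c(Electric = 30, Gas = 120, Oil = 50)

pct  <- round(fuel / sum(fuel) * 100)        # 15, 60, 25
lbls <- paste0(names(fuel), " ", pct, "%")   # "Electric 15%" "Gas 60%" "Oil 25%"

pie(fuel, labels = lbls, col = c("grey", "steelblue4", "grey"))
```

Note that rounded percentages do not always add up to exactly 100.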
This is a simple example of generating a box plot in R:
boxplot(df$price, main="House Price BoxPlot") # data= is only needed with a formula
We can set the outline argument to FALSE to leave outliers out of the plot:
boxplot(df$lot_size,outline=FALSE,
ylab="Lot size of house", main="Box plot of house lot size")
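By default, boxplot() treats points lying more than 1.5 times the box length beyond the box as outliers. boxplot.stats() exposes the same computation, so the effect of outline=FALSE can be previewed. A sketch on a made-up vector:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)   # made-up data with one extreme value

s <- boxplot.stats(x)
s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # values that would be drawn as individual points: 100

boxplot(x, outline = FALSE, main = "Made-up data, outliers hidden")
```

Hiding outliers only changes what is drawn; the values remain in the data.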
A box plot can also show how one variable is distributed across the values of another, for example the distribution of price for houses with different numbers of rooms.
boxplot(price~rooms, data=df, main="House Prices by Number of Rooms",
xlab="Number of rooms", ylab="House Prices ($)")
Show a histogram of lot_size
Show a bar plot of air_cond
Show a pie chart of construction
Customize your plots to enhance the visualization.
The template for a formula in R is as follows:
outcome ~ predictor_1 + predictor_2 + ...
A formula usually has two parts: 1) one outcome variable on the left and 2) a set of predictors on the right
The two parts are separated by a tilde sign (~)
Formulas are frequently used in regression analyses and data mining. But you can also use formulas in some basic graphics.
Essentially, you use a formula to tell R that you are interested in the relationship between the outcome variable and the predictors
plot(price ~ lot_size + rooms + living_area, data=df) # y_axis ~ x_axis; draws one scatter plot per predictor
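A formula is an ordinary R object that is stored unevaluated, so it can be created and inspected without any data. A small sketch using the column names from the call above:

```r
f <- price ~ lot_size + rooms + living_area   # same formula as in the plot() call

class(f)      # "formula"
all.vars(f)   # "price" "lot_size" "rooms" "living_area"
f[[2]]        # left-hand side (outcome): price
f[[3]]        # right-hand side (predictors): lot_size + rooms + living_area
```

The variable names are only looked up later, in the data frame passed through the data argument.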
ifelse(condition, do_this_if_true, do_this_if_false)
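ifelse() is a vectorized function: it tests every element of the condition and returns a vector of the same length. Do not confuse it with if( ){...}else{...}, which checks a single condition to decide which block of code to run. For a single value, the two give the same result. Example: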
a <- 2; b <- 1
ifelse(a > b, "a is greater than b", "a is not greater than b")
## [1] "a is greater than b"
color <- ifelse(a==b, "red", "blue")
print(color)
## [1] "blue"
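The difference from if/else shows up once the condition has more than one element. A minimal sketch with made-up numbers:

```r
sizes <- c(900, 1500, 2400)   # made-up living areas

# ifelse() tests each element and returns a vector of the same length
size_class <- ifelse(sizes > 1200, "large", "small")
size_class   # "small" "large" "large"

# if () ... else ... expects a single TRUE or FALSE
label <- if (sizes[1] > 1200) "large" else "small"
label        # "small"
```

This element-by-element behavior is what makes ifelse() convenient for assigning a symbol or color to every point in a plot.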
plot(price ~ living_area, data=df,
pch=ifelse(df$air_cond== "Yes", 0, 1), # pch: symbols for the points
col=ifelse(df$air_cond=="Yes", "blue", "red")) # col: colors of the points
legend("topleft", c("with AC", "without AC"),
pch=c(0, 1), col=c("blue","red"))
To find other symbols and colors, you can search for “R pch symbols” and “R color chart”.
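The available symbols and color names can also be listed from within R. A short sketch; the layout of the chart is just one way to draw it:

```r
# Draw the point symbols for pch 0 to 25, each with its number above it
plot(0:25, rep(1, 26), pch = 0:25, ylim = c(0.5, 1.5),
     xlab = "pch value", ylab = "", yaxt = "n", main = "Point symbols")
text(0:25, rep(1.2, 26), labels = 0:25, cex = 0.7)

# colors() returns the color names R understands
head(colors())
known <- c("plum", "steelblue4", "grey") %in% colors()
known   # checks that the names used in the earlier plots are valid
```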
Create a scatter plot with living_area on the x axis and price on the y axis. Color the dots in the plot according to whether the house has a fireplace