## 4.3 Using the sahp dataset with the ggplot2 package, answer the following questions.
### Q1 Create a scatterplot to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
#install.packages("ggplot2")
library(ggplot2)
#install.packages("r02pro")
library(r02pro)
sahp
## # A tibble: 165 × 12
## dt_sold bedroom bathroom gar_car oa_qual liv_area lot_area house_style
## <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2010-03-25 3 2.5 2 6 1479 13517 2Story
## 2 2009-04-10 4 3.5 2 7 2122 11492 2Story
## 3 2010-01-15 3 2 1 5 1057 7922 1Story
## 4 2010-04-19 3 2.5 2 5 1444 9802 2Story
## 5 2010-03-22 3 2 2 6 1445 14235 1.5Fin
## 6 2010-06-06 2 2.5 2 6 1888 16492 1Story
## 7 2006-06-14 2 3 2 6 1072 3675 SFoyer
## 8 2010-05-08 3 2 2 5 1188 12160 1Story
## 9 2007-06-14 2 1 1 5 924 15783 1Story
## 10 2007-09-01 5 2.5 2 5 2080 11606 2Story
## # … with 155 more rows, and 4 more variables: kit_qual <chr>, heat_qual <chr>,
## # central_air <chr>, sale_price <dbl>
ggplot(data = sahp) + geom_point(mapping = aes(x = lot_area, y = sale_price))
## Warning: Removed 1 rows containing missing values (geom_point).
### Q2 In the scatterplot from Q1, change the size of all points to 3 and use different colors according to the value of house_style.
ggplot(data = sahp) + geom_point(mapping = aes(x = lot_area, y = sale_price, color = house_style), size = 3)
## Warning: Removed 1 rows containing missing values (geom_point).
### Q3 Change the legend order in the scatterplot from Q2 to be "1Story", "SFoyer", "1.5Fin", "2Story", "SLvl" from top to bottom.
sahp$house_style <- factor(sahp$house_style, ordered = TRUE, levels = c("1Story", "SFoyer", "1.5Fin", "2Story", "SLvl"))
ggplot(data = sahp) + geom_point(mapping = aes(x = lot_area, y = sale_price, color = house_style), size = 3)
## Warning: Removed 1 rows containing missing values (geom_point).
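The legend order in Q3 follows the order of the factor levels rather than the order in which values appear in the data. A minimal base R sketch of that idea, assuming a small hypothetical vector `styles` in place of the real sahp column:

```r
# Hypothetical toy vector standing in for sahp$house_style
styles <- c("2Story", "1Story", "SLvl", "1.5Fin", "SFoyer", "1Story")

# Without explicit levels, factor() uses the sorted unique values
# (the exact sort order depends on the locale)
default_levels <- levels(factor(styles))

# With explicit levels, the order is exactly the one supplied
wanted <- c("1Story", "SFoyer", "1.5Fin", "2Story", "SLvl")
custom_levels <- levels(factor(styles, levels = wanted))

print(default_levels)
print(custom_levels)
```

Because ggplot2 draws a discrete legend in level order, supplying `levels` is usually enough to control which entry appears at the top.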
### Q4 In the scatterplot from Q1, change the shape of all points to triangle and map bedroom to the color aesthetic. Then map the low value of bedroom to the yellow color and map the high value of bedroom to the green color.
ggplot(data = sahp) + geom_point(mapping = aes(x = lot_area, y = sale_price, color = bedroom), shape = 2) + scale_color_gradient(low = "yellow", high = "green")
## Warning: Removed 1 rows containing missing values (geom_point).
### Q5 In the scatterplot from Q1, change the color of all points to green and use different shapes to distinguish whether the house has more than 3 bedrooms.
ggplot(data = sahp) + geom_point(mapping = aes(x = lot_area, y = sale_price, shape = bedroom > 3), color = "green")
## Warning: Removed 1 rows containing missing values (geom_point).
## 4.4 Using the sahp dataset with the ggplot2 package, answer the following questions.
### Q1 Create a smoothline fit to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
ggplot(data = sahp) + geom_smooth(mapping = aes(x = lot_area, y = sale_price))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
### Q2 Create several smoothlines with different colors corresponding to the value of kit_qual to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
ggplot(data = sahp) + geom_smooth(mapping = aes(x = lot_area, y = sale_price, color = kit_qual))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
### Q3 Create smoothlines without confidence bands and with different linetypes to distinguish whether the house has more than 2 bedrooms, to visualize the relationship between lot_area (on the x-axis) and sale_price (on the y-axis).
sahp$big_bedroom <- sahp$bedroom > 2
ggplot(data = sahp) + geom_smooth(mapping = aes(x = lot_area, y = sale_price, linetype = big_bedroom), se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## 4.5 Using the sahp dataset with the ggplot2 package, answer the following questions.
### Q1 With lot_area on the x-axis and sale_price on the y-axis, create a plot that contains both the scatterplot and smoothline fits, where we use different colors in the scatterplot to distinguish whether heat_qual is excellent and different linetypes for the smoothline fits depending on whether house_style is 2Story.
ggplot(data = sahp, mapping = aes(x = lot_area, y = sale_price)) + geom_point(mapping = aes(color = heat_qual == "Excellent")) + geom_smooth(mapping = aes(linetype = house_style == "2Story"))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
## 4.6 Using the sahp dataset with the ggplot2 package, and using global and local mappings, answer the following questions.
### Q1 Create a plot of liv_area (on the x-axis) and sale_price (on the y-axis) that contains both the scatterplot and the smoothline fit.
ggplot(data = sahp, mapping = aes(x = liv_area, y = sale_price)) + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
### Q2 In the plot from Q1, use different colors for both the scatterplot and the smoothline fit to distinguish whether the house has more than 3 bedrooms, and make all points size 2.
sahp$threebedroom <- sahp$bedroom > 3
ggplot(data = sahp, mapping = aes(x = liv_area, y = sale_price, color = threebedroom)) + geom_point(size = 2) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
### Q3 If you run the following code `ggplot(data = sahp, mapping = aes(x = liv_area, y = sale_price), color = "green") + geom_point() + geom_smooth()`, do you think all the points and the smoothline will be green? If not, explain the reason and make them green.
ggplot(data = sahp, mapping = aes(x = liv_area, y = sale_price)) + geom_point(color="green") + geom_smooth(color="green")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
No. In the code from the question, color = "green" is passed to ggplot() outside aes(). ggplot() does not treat it as an aesthetic, so it is ignored: only the mappings given in the mapping argument are passed on to the geoms. To make both the points and the smoothline green, set color = "green" as a constant in each geom, as in the code above.
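To check that a constant set inside a geom really reaches every point and line, the computed layer data can be inspected with `layer_data()`. This is a minimal sketch on a hypothetical data frame `toy` (not the sahp data), and it uses `geom_line()` instead of `geom_smooth()` only to keep the toy example small:

```r
library(ggplot2)

# Hypothetical toy data, not the sahp dataset
toy <- data.frame(x = c(1, 2, 3, 4), y = c(2, 4, 3, 5))

# color = "green" is set as a constant inside each geom, outside aes()
p_green <- ggplot(data = toy, mapping = aes(x = x, y = y)) +
  geom_point(color = "green") +
  geom_line(color = "green")

# layer_data() returns the data ggplot2 computed for layer i
point_colors <- layer_data(p_green, 1)$colour
line_colors <- layer_data(p_green, 2)$colour

print(unique(point_colors))
print(unique(line_colors))
```

Both layers report the single colour that was set in the geom, which is the behavior the answer relies on.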
### Q4 If you run `ggplot(data = sahp, mapping = aes(x = liv_area, y = sale_price, color = house_style)) + geom_point(mapping = aes(color = bedroom > 3)) + geom_smooth(mapping = aes(color = bedroom > 3))`, explain why you only see two colors in the plot.
ggplot(data = sahp, mapping = aes(x = liv_area, y = sale_price, color = house_style)) + geom_point(mapping = aes(color = bedroom > 3)) + geom_smooth(mapping = aes(color = bedroom > 3))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
When a geom's local mapping uses the same aesthetic as the global mapping, the local mapping overrides the global one for that geom. Here color = bedroom > 3 overrides color = house_style in both geoms, and bedroom > 3 takes only two values (TRUE and FALSE), so only two colors appear.
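The override can be seen by counting the distinct colours that ggplot2 assigns in each case. The sketch below assumes a hypothetical data frame `toy` with a three-level `group` column standing in for house_style, and uses only `geom_point()` to keep it short:

```r
library(ggplot2)

# Hypothetical toy data: `group` plays the role of house_style
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 3, 5, 7, 6),
  group = c("a", "b", "c", "a", "b", "c")
)

# Global color mapping only: one colour per level of `group`
p_global <- ggplot(data = toy, mapping = aes(x = x, y = y, color = group)) +
  geom_point()

# Local color mapping overrides the global one inside this geom
p_local <- ggplot(data = toy, mapping = aes(x = x, y = y, color = group)) +
  geom_point(mapping = aes(color = x > 3))

n_global <- length(unique(layer_data(p_global, 1)$colour))
n_local <- length(unique(layer_data(p_local, 1)$colour))

print(c(global = n_global, local = n_local))
```

With the local mapping in place, the number of colours drops from the three group levels to the two values of the logical expression.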
## 4.7 First, create a data set sahp_2006 by running the following code: `sahp_2006 <- sahp[format(sahp$dt_sold, "%Y") < 2007, ] # all houses sold before 2007`
sahp_2006 <- sahp[format(sahp$dt_sold, "%Y") < 2007, ]
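Note that `format()` returns the year as a character string, so the comparison with 2007 is a string comparison after R coerces the number to "2007". For four-digit years this gives the same result as a numeric comparison, as the following sketch on a few hypothetical dates suggests:

```r
# Hypothetical sale dates, not taken from sahp
dates <- as.Date(c("2006-06-14", "2007-09-01", "2010-03-25"))

# format() returns character strings such as "2006"
years_chr <- format(dates, "%Y")

# Character vs numeric: 2007 is coerced to "2007" and strings are compared
before_2007_chr <- years_chr < 2007

# Explicit numeric comparison, which states the intent more directly
before_2007_int <- as.integer(years_chr) < 2007

print(before_2007_chr)
print(before_2007_int)
```

Converting with `as.integer()` is optional here, but it avoids relying on implicit coercion.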
### Q1 Use plot() to create a line plot of dt_sold (on the x-axis) and sale_price (on the y-axis) to show the trend of the sale price as a function of the sold date of the house. Give the x-axis the title "DS" and the y-axis the title "SP", and make the line a "twodash" line.
dt_sold_order <- order(sahp_2006$dt_sold)
plot(sahp_2006$dt_sold[dt_sold_order], sahp_2006$sale_price[dt_sold_order], type = "l", xlab = "DS", ylab = "SP", lty = "twodash")
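The call to `order()` matters because plot() with type = "l" connects points in the order they are given, not in date order. A small sketch with hypothetical dates and prices shows what the index vector does:

```r
# Hypothetical unsorted sale dates and prices, not taken from sahp
dates <- as.Date(c("2006-09-01", "2006-03-15", "2006-06-14"))
prices <- c(180, 120, 150)

# order() returns the positions that would sort `dates` increasingly
idx <- order(dates)

# Applying the same index to both vectors keeps each price with its date
sorted_dates <- dates[idx]
sorted_prices <- prices[idx]

print(idx)
print(sorted_dates)
print(sorted_prices)
```

Indexing both vectors with the same `idx` is what keeps the line from zigzagging back and forth in time.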
### Q2 Use the ggplot2 package to create a line plot of dt_sold (on the x-axis) and sale_price (on the y-axis) with different linetypes depending on the value of house_style.
ggplot(data = sahp_2006) + geom_line(mapping = aes(x = dt_sold, y = sale_price, linetype = house_style))
## 4.8
### Q1 Using the sahp data, use plot() to generate a scatterplot between lot_area (x-axis) and sale_price (y-axis), and add horizontal lines at 150 with color "blue" and at 200 with color "red".
plot(sahp$lot_area, sahp$sale_price)
abline(h = 150, col = "blue")
abline(h = 200, col = "red")
### Q2 Using the sahp data, use ggplot() to generate a scatterplot between sale_price (x-axis) and liv_area (y-axis), and add the following lines to the plot: y = 5 * x + 1000 (in green color, dashed line) and y = 3 * x + 1500 (in purple color, solid line).
ggplot(data = sahp) + geom_point(mapping = aes(x = sale_price, y = liv_area)) + geom_abline(slope = 5, intercept = 1000, linetype = "dashed", color = "green") + geom_abline(slope = 3, intercept = 1500, linetype = "solid", color = "purple")
## Warning: Removed 1 rows containing missing values (geom_point).