http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
Let’s look at housing prices.
library(ggplot2)
housing <- read.csv("dataSets/landdata-states.csv")
Old way…kind of lame:
hist(housing$Home.Value)
ggplot way…better?:
ggplot(housing, aes(x = Home.Value)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(subset(housing, State %in% c("MA", "TX")),
       aes(x = Date,
           y = Home.Value,
           color = State)) +
  geom_point()
In ggplot land, an aesthetic means “something you can see”. Examples include: position (i.e., on the x and y axes), color (“outside” color), fill (“inside” color), shape (of points), linetype, and size.
Geometric objects (geoms) are the actual marks we put on a plot. A plot must have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator.
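To see layering in action, here is a minimal sketch on a made-up data frame. The names toy, p_points and p_both are hypothetical and not part of the tutorial data. Each + appends one more layer to the plot.

```r
library(ggplot2)

# Hypothetical toy data, not the tutorial's housing data
toy <- data.frame(x = 1:10, y = c(2, 4, 3, 6, 5, 8, 7, 9, 11, 10))

# One geom: points only
p_points <- ggplot(toy, aes(x = x, y = y)) +
  geom_point()

# Two geoms: the second `+` draws a line layer on top of the points
p_both <- p_points +
  geom_line()

# Each geom is stored as one layer of the plot object
length(p_points$layers)
length(p_both$layers)
```

Printing p_both draws the points first and the line over them, because layers are drawn in the order they were added.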
hp2001Q1 <- subset(housing, Date == 2001.25)
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = log(Land.Value))) +
  geom_point()
First, fit a linear regression model and use the predict function to get the fitted values:
hp2001Q1$pred.SC <- predict(lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))
Then, add new variable for prediction line to plot:
# base chart
p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))
# with prediction line
p1 + geom_point(aes(color = Home.Value)) +
geom_line(aes(y = pred.SC))
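As a cross-check on the manual approach above, the sketch below uses simulated data (toy, fit and the other names are hypothetical) to compare a hand-built prediction line with geom_smooth(method = "lm"). It assumes the same model on both sides: y regressed on log(x).

```r
library(ggplot2)

# Simulated stand-in for the housing data (an assumption for illustration)
set.seed(1)
toy <- data.frame(x = runif(50, min = 1, max = 10))
toy$y <- 3 + 2 * log(toy$x) + rnorm(50, sd = 0.5)

# Manual route: fit the model, store the predictions, draw them with geom_line()
fit <- lm(y ~ log(x), data = toy)
toy$pred <- predict(fit)
p_manual <- ggplot(toy, aes(x = log(x), y = y)) +
  geom_point() +
  geom_line(aes(y = pred))

# Shortcut: let geom_smooth() fit the same straight line on the plotted x
p_smooth <- ggplot(toy, aes(x = log(x), y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Compare both routes on the grid of x values that geom_smooth() evaluated
smooth_data <- layer_data(p_smooth, 2)
manual_at_grid <- predict(fit, newdata = data.frame(x = exp(smooth_data$x)))
all.equal(unname(manual_at_grid), smooth_data$y)
```

The manual route is still useful when the predictions are needed as a column for other purposes; the shortcut only affects what is drawn.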
What is a Smoother? https://www.stat.berkeley.edu/~s133/Smooth-a.html
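There are various smoothing methods/formulas. With the default method = "auto", geom_smooth() picks loess when the data has fewer than 1,000 observations (and a GAM otherwise), which is why the graph below uses loess.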
p1 +
geom_point(aes(color = Home.Value)) +
geom_smooth()
## `geom_smooth()` using method = 'loess'
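The smoothing method can also be chosen explicitly instead of relying on the default. The following is a small sketch on simulated data (all object names are hypothetical) that contrasts a loess curve with a straight-line fit; the span value of 0.3 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Simulated wavy data (an assumption, not the tutorial's housing data)
set.seed(2)
toy <- data.frame(x = seq(0, 10, length.out = 200))
toy$y <- sin(toy$x) + rnorm(200, sd = 0.3)

base <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(alpha = 0.4)

# Local regression: a smaller span follows the data more closely
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x, span = 0.3)

# Linear model: always a straight line
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x)

# The fitted curves (and confidence bands) are available as layer data
loess_fit <- layer_data(p_loess, 2)
lm_fit <- layer_data(p_lm, 2)
head(loess_fit[, c("x", "y", "ymin", "ymax")])
```

Passing formula explicitly also silences the message about which method and formula were picked.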
p1 +
geom_text(aes(label=State), size = 3)
Confusing…need to research
https://www.r-bloggers.com/ggplot2-mapping-vs-setting/
First, we need to understand that any aesthetic in ggplot2 (such as colour, size, shape, etc.) can be used in two distinct ways in your plots:
Option 1 – you can use the aesthetic to reflect some properties of your data. For example, clarity of the diamonds. This is called MAPPING an aesthetic.
Option 2 - you can choose a certain value for an aesthetic. For example, make the colour blue for ALL points or make the shape a square for ALL points. This is called SETTING an aesthetic and the keyword here is ALL.
When mapping you can convey more insights, whereas when setting you get more control of how your chart looks.
#For example
## geom_point(aes(size = 2),# incorrect! 2 is not a variable
## color="red") # this is fine -- all points red
dat <- read.csv("dataSets/EconomistData.csv")
ex1<-ggplot(dat, aes(x=CPI, y = HDI))
#Mapping
ex1+geom_point(aes(color = HDI.Rank))
#Setting
ex1+geom_point(color = "blue")
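A common slip is to put a constant inside aes(), which maps rather than sets. The sketch below, on a made-up data frame (toy and grp are hypothetical names), compares the colors ggplot2 actually assigns in the three cases.

```r
library(ggplot2)

# Hypothetical toy data with a grouping column
toy <- data.frame(
  x = 1:5,
  y = c(3, 1, 4, 1, 5),
  grp = c("a", "a", "b", "b", "b")
)
base <- ggplot(toy, aes(x = x, y = y))

# Mapping: color depends on the data, one color per group
p_map <- base + geom_point(aes(color = grp))

# Setting: every point gets the literal color "blue"
p_set <- base + geom_point(color = "blue")

# Mistake: "blue" inside aes() is treated as a one-level variable,
# so the color comes from the default palette and a legend appears
p_oops <- base + geom_point(aes(color = "blue"))

# Inspect the colors that were actually assigned to the points
unique(layer_data(p_map)$colour)
unique(layer_data(p_set)$colour)
unique(layer_data(p_oops)$colour)
```

The same reasoning explains why aes(size = 2) in the commented example above does not do what it looks like it should.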
Some plot types (such as scatterplots) do not require transformations–each point is plotted at x and y coordinates equal to the original value. Other plots, such as boxplots, histograms, prediction lines etc. require statistical transformations.
Each geom has a default statistic, but these can be changed. For example, the default statistic for geom_histogram is stat_bin (geom_bar defaults to stat_count in current ggplot2 releases).
Arguments to stat_ functions can be passed through geom_ functions. This can be slightly annoying because in order to change it you have to first determine which stat the geom uses, then determine the arguments to that stat.
# binwidth example:
p2 <- ggplot(housing, aes(x = Home.Value))
p2 + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# with a manually entered binwidth
p2 + geom_histogram(stat = "bin", binwidth=4000)
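To illustrate that binwidth is really an argument of stat_bin passed through the geom, here is a minimal sketch on simulated values (toy, via_geom and via_stat are hypothetical names). It builds the same histogram once through geom_histogram() and once through stat_bin() and compares the computed bins.

```r
library(ggplot2)

# Simulated values between 0 and 100 (an assumption for illustration)
set.seed(3)
toy <- data.frame(value = runif(500, min = 0, max = 100))
p <- ggplot(toy, aes(x = value))

# binwidth given to the geom is handed on to its stat, stat_bin
via_geom <- layer_data(p + geom_histogram(binwidth = 10))

# Calling the stat directly with the same argument
via_stat <- layer_data(p + stat_bin(binwidth = 10))

# Width of every bin, and whether both routes counted the same
range(via_geom$xmax - via_geom$xmin)
all.equal(via_geom$count, via_stat$count)
```

The help page of a geom names its default stat, which is the place to look up arguments such as binwidth.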
Sometimes the default statistical transformation is not what you need. A common case is data that is already summarized: the chart would otherwise try to summarize it a second time. In this case, we can add stat = "identity" to the geom function.
#summarizing the data now:
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN=mean)
rbind(head(housing.sum), tail(housing.sum))
## State Home.Value
## 1 AK 147385.14
## 2 AL 92545.22
## 3 AR 82076.84
## 4 AZ 140755.59
## 5 CA 282808.08
## 6 CO 158175.99
## 46 VA 155391.44
## 47 VT 132394.60
## 48 WA 178522.58
## 49 WI 108359.45
## 50 WV 77161.71
## 51 WY 122897.25
# the data is already summarized, so the plot should not summarize it again
ggplot(housing.sum, aes(x=State, y=Home.Value)) +
geom_bar(stat="identity")
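ggplot2 also provides geom_col(), which draws bars from values that are already in the data and so behaves like geom_bar(stat = "identity"). The sketch below uses a tiny made-up table (toy and its numbers are hypothetical, not taken from landdata-states.csv) to check that both give the same bar heights.

```r
library(ggplot2)

# Hypothetical raw data: three values per state
toy <- data.frame(
  State = rep(c("MA", "TX", "VT"), each = 3),
  Home.Value = c(300, 320, 340, 150, 160, 170, 200, 210, 220)
)

# Summarize first, as in the notes above
toy.sum <- aggregate(toy["Home.Value"], toy["State"], FUN = mean)

# Two ways to plot values that are already summarized
p_identity <- ggplot(toy.sum, aes(x = State, y = Home.Value)) +
  geom_bar(stat = "identity")
p_col <- ggplot(toy.sum, aes(x = State, y = Home.Value)) +
  geom_col()

# Bar heights are the summarized means in both plots
layer_data(p_identity)$y
layer_data(p_col)$y
```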
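Scales control how data values are turned into visual values. Use them when you map an aesthetic to a variable (via aes()) and want to control how that mapping is drawn, e.g. color = HDI.Rank, but running toward a specific color such as red.
Scales are modified with a series of functions using a scale_<aesthetic>_<type> naming scheme.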
Examples:
ex2 <- ggplot(housing,
              aes(x = State,
                  y = Home.Price.Index)) +
  theme(legend.position = "top",
        axis.text = element_text(size = 6))
ex2 + geom_point(aes(color = Date),
                 alpha = 0.5,
                 size = 1.5,
                 position = position_jitter(width = 0.25, height = 0))
position_jitter() adds a small amount of random noise to charts with discrete variables to avoid overplotting, so you can see more of the points that would otherwise be drawn on top of one another. Just be careful not to add too much noise.
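The plot above jitters the points but leaves every scale at its default. As a rough sketch of the next step, the code below rebuilds a similar jittered plot on made-up data (toy, Index and the axis titles are hypothetical) and then adjusts two scales with scale_ functions. The seed argument only makes the jitter reproducible.

```r
library(ggplot2)

# Hypothetical stand-in for the housing data
toy <- data.frame(
  State = rep(c("MA", "TX", "VT"), each = 20),
  Date = rep(seq(1975, 2013, by = 2), times = 3),
  Index = rep(seq(0.5, 2.4, length.out = 20), times = 3)
)

# Jittered points with the default scales
p_default <- ggplot(toy, aes(x = State, y = Index)) +
  geom_point(aes(color = Date),
             alpha = 0.5,
             size = 1.5,
             position = position_jitter(width = 0.25, height = 0, seed = 1))

# Same plot, with the x and color scales modified
p_scaled <- p_default +
  scale_x_discrete(name = "State abbreviation") +
  scale_color_gradient(name = "Year", low = "blue", high = "red")

# Jitter moved points sideways only; y values are untouched
jittered <- layer_data(p_scaled)
range(as.numeric(jittered$x) - round(as.numeric(jittered$x)))
```

Keeping height = 0 matters here because the y axis carries real measurements, while a little sideways noise within a state column changes nothing about the reading.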