The dslabs package contains a number of datasets that can be used to practice creating visualizations.
# install.packages("dslabs") # these are data science labs
library("dslabs")
## Warning: package 'dslabs' was built under R version 4.1.3
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
## [1] "make-admissions.R"
## [2] "make-brca.R"
## [3] "make-brexit_polls.R"
## [4] "make-death_prob.R"
## [5] "make-divorce_margarine.R"
## [6] "make-gapminder-rdas.R"
## [7] "make-greenhouse_gases.R"
## [8] "make-historic_co2.R"
## [9] "make-mnist_27.R"
## [10] "make-movielens.R"
## [11] "make-murders-rda.R"
## [12] "make-na_example-rda.R"
## [13] "make-nyc_regents_scores.R"
## [14] "make-olive.R"
## [15] "make-outlier_example.R"
## [16] "make-polls_2008.R"
## [17] "make-polls_us_election_2016.R"
## [18] "make-reported_heights-rda.R"
## [19] "make-research_funding_rates.R"
## [20] "make-stars.R"
## [21] "make-temp_carbon.R"
## [22] "make-tissue-gene-expression.R"
## [23] "make-trump_tweets.R"
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"
As the listing above shows, the dslabs package also includes some of the scripts used to wrangle the data from their original sources. Next, load the murders dataset and the plotting packages:
data("murders")
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.1.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#install.packages("ggthemes")
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.1.3
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.1.3
view(murders)
str(murders)
## 'data.frame': 51 obs. of 5 variables:
## $ state : chr "Alabama" "Alaska" "Arizona" "Arkansas" ...
## $ abb : chr "AL" "AK" "AZ" "AR" ...
## $ region : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
## $ population: num 4779736 710231 6392017 2915918 37253956 ...
## $ total : num 135 19 232 93 1257 ...
write_csv(murders, "murders.csv", na="")
# The following is another way to get the murders dataset (commented out)
# load required packages
#install.packages("ggrepel")
#library(ggplot2)
#library(readr)
# load murders data
#murders <- read_csv("murders.csv")
#head(murders)
R's basic fonts are fairly limited; run names(postscriptFonts()) to view those available.
Using extrafont in three easy steps
The first step is to install extrafont and then import the fonts from your system into the extrafont database.
Installation: the function font_import() does not work on my computer, so you can omit this chunk of code.
# R's basic fonts are fairly limited. View those available by running this code
names(postscriptFonts())
## [1] "serif" "sans" "mono"
## [4] "AvantGarde" "Bookman" "Courier"
## [7] "Helvetica" "Helvetica-Narrow" "NewCenturySchoolbook"
## [10] "Palatino" "Times" "URWGothic"
## [13] "URWBookman" "NimbusMon" "NimbusSan"
## [16] "URWHelvetica" "NimbusSanCond" "CenturySch"
## [19] "URWPalladio" "NimbusRom" "URWTimes"
## [22] "ArialMT" "ComputerModern" "ComputerModernItalic"
## [25] "Japan1" "Japan1HeiMin" "Japan1GothicBBB"
## [28] "Japan1Ryumin" "Korea1" "Korea1deb"
## [31] "CNS1" "GB1"
# install.packages("extrafont")
#library(extrafont)
# the following command "font_import()" takes a long time to load - comment it out if you don't want to wait
#font_import()
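The "font family not found in Windows font database" warnings that appear later in this report suggest that a family listed by postscriptFonts() is not automatically usable on the Windows screen device. As a minimal sketch, assuming only base R's grDevices, the hypothetical helper safe_family() below returns the requested family when it is registered with windowsFonts() (on Windows) or is one of the generic families, and otherwise falls back to "sans". It is deliberately conservative and is not part of extrafont or ggplot2.

```r
# Hypothetical helper (not part of any package): pick a font family that the
# current platform is known to accept, falling back to a generic family.
safe_family <- function(family, fallback = "sans") {
  generic <- c("serif", "sans", "mono")   # device-independent generic families
  known <- generic
  if (.Platform$OS.type == "windows") {
    # windowsFonts() lists the families registered for the Windows devices
    known <- union(known, names(grDevices::windowsFonts()))
  }
  if (family %in% known) family else fallback
}

safe_family("sans")
safe_family("URWTimes")   # falls back to "sans" unless registered on Windows
```

On Windows, a system font can be registered first, for example with windowsFonts(Georgia = windowsFont("Georgia")), after which "Georgia" would pass this check.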
The dashed line in the plot represents the average murder rate for the whole country. Once we determine the per-million rate to be r, this line is defined by the formula y = rx, where y and x are our axes: total murders and population in millions, respectively.
On the log scale this line becomes log(y) = log(r) + log(x), so in our plot it is a line with slope 1 and intercept log(r). To compute r, we use dplyr:
# Figure out the murder rate
r <- murders %>%
summarize(rate = sum(total) / sum(population) * 10^6) %>%
pull(rate)
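The log-scale identity can be checked numerically. The sketch below uses only base R and the first four rows printed by str(murders) earlier in this report. Because it is a subset, r_sub is not the national rate; the names sub and r_sub are assumptions made for illustration.

```r
# First four states as printed by str(murders); a subset used only for illustration
sub <- data.frame(
  state      = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  population = c(4779736, 710231, 6392017, 2915918),
  total      = c(135, 19, 232, 93)
)

# Murders per million people for the subset (same formula as for r)
r_sub <- sum(sub$total) / sum(sub$population) * 10^6

# Points on the line y = r_sub * x, with x in millions
x <- sub$population / 10^6
y <- r_sub * x

# On log10 axes the line has slope 1 and intercept log10(r_sub)
all.equal(log10(y), log10(r_sub) + 1 * log10(x))
```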
Use the dslabs theme, set with ds_theme_set(). Plot each state's population in millions on the x-axis and its total murders on the y-axis.
Color the points by region and add a reference line for the average rate r computed above. This is not a fitted regression line; because the slope defaults to 1, only the intercept is needed: geom_abline(intercept = log10(r)).
Put the x- and y-axes on a log10 scale, and add axis labels and a title.
Use the nudge_x argument to move the text labels slightly to the right or to the left:
ds_theme_set()
murders %>% ggplot(aes(x = population/10^6, y = total, label = abb)) + # population in millions keeps the x-axis values readable
geom_abline(intercept = log10(r), lty=2, col="darkgrey") + # slope defaults to 1; the line equals y = r*x only on log10 axes
geom_point(aes(color=region), size = 3) +
geom_text_repel(nudge_x = 0.005) +
#scale_x_log10("Populations in millions (log scale)") + # log10 x-axis, commented out for this run
#scale_y_log10("Total number of murders (log scale)") + # log10 y-axis, commented out for this run
xlab("Populations in millions") +
ylab("Total number of murders") +
ggtitle("US Gun Murders in 2010") +
scale_color_discrete(name="Region")
## Warning: ggrepel: 31 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
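For comparison, here is a minimal sketch of the reference line on log10 axes, where the slope-1 line with intercept log10(r) does represent y = rx. The data frame toy and the rate r_toy are made up for illustration and are not taken from the murders data.

```r
library(ggplot2)

# Hypothetical data lying exactly on y = r_toy * x
r_toy <- 30
toy <- data.frame(x = c(0.5, 1, 5, 20))
toy$y <- r_toy * toy$x

p_log <- ggplot(toy, aes(x = x, y = y)) +
  geom_abline(intercept = log10(r_toy), slope = 1, lty = 2, col = "darkgrey") +
  geom_point(size = 3) +
  scale_x_log10("x (log scale)") +
  scale_y_log10("y (log scale)")

# Layer 2 is the point layer; its x and y are stored on the log10 scale,
# so every point satisfies y = log10(r_toy) + x
pts <- layer_data(p_log, 2)
all.equal(pts$y, log10(r_toy) + pts$x)
```

Printing p_log should show the points sitting on the dashed line, which is what the log-scale version of the murders plot relies on.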
The default gray theme of ggplot2 has a rather academic look. You can use one of the ggplot2 built-in themes, and then customize the fonts.
murders_chart <-ggplot(murders, aes(x = population/10^6, y = total)) +
#xlab("Populations in millions") +
#ylab("Total number of murders") +
scale_x_log10("Populations in millions (log scale)") + # scale_x_log10 in log scale
scale_y_log10("Total number of murders (log scale)") + # scale_y_log10 in log scale
ggtitle("US Gun Murders in 2010") +
scale_color_discrete(name="Region")+
theme_minimal(base_size = 14, base_family = "URWTimes")
#theme_minimal(base_size = 24, base_family = "Bookman") #Testing base_size and base_family
# The following variant, which tests other base_size and base_family values, also works; it is commented out
#murders_chart <-ggplot(murders, aes(x = population, y = total)) +
# xlab("Population in the State") +
#ylab("Total Murders") +
#ggtitle("Comparison of Total Murders and Population in the State") +
#theme_minimal(base_size = 24, base_family = "Bookman") #Testing base_size and base_family
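As a small sketch of what base_size and base_family control, the complete theme object can be inspected directly. The generic family "sans" is used here as an assumption, because it should be available on every graphics device.

```r
library(ggplot2)

# Build a complete theme with a chosen base font size and family
th <- theme_minimal(base_size = 14, base_family = "sans")

# The root text element carries both settings; other text elements inherit from it
th$text$size
th$text$family
```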
This code will add a geom layer with points to the template:
murders_chart +
geom_point()
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
The following code changes the appearance of the point layer and adds a second geom layer, a linear fit drawn by geom_smooth().
murders_chart +
#geom_point(size = 3, alpha = 0.5) + #Testing different size and alpha
#geom_point(size = 10, alpha = 5) + #Testing different size and alpha
geom_point(size = 6, alpha = 1) + # alpha must lie between 0 (transparent) and 1 (opaque)
geom_smooth(method = lm, se=FALSE, color = "red")
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
You can make a dashed line with linetype = "dashed", or equivalently lty = 2:
murders_chart +
geom_point(size = 3, alpha = 0.5, aes(color = state)) +
geom_smooth(method = lm, se =FALSE, color = "black", lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
#geom_point(size = 1, alpha = 0.9, aes(color = state)) + #Testing different values
#geom_smooth(method = lm, se =FALSE, color = "black", lty = 1, size = 0.1) #Testing different values
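For reference, R's integer line-type codes map to names as listed below: lty = 2 is "dashed" and lty = 4 is "dotdash". The sketch draws each type once; the object names linetypes, lt and p_lt are hypothetical.

```r
library(ggplot2)

# Standard R line types: name -> integer code
linetypes <- c(blank = 0, solid = 1, dashed = 2, dotted = 3,
               dotdash = 4, longdash = 5, twodash = 6)

lt <- data.frame(name = names(linetypes), code = unname(linetypes))

# One horizontal segment per line type, labelled by name on the y-axis
p_lt <- ggplot(lt, aes(x = 0, xend = 1, y = name, yend = name, linetype = name)) +
  geom_segment() +
  scale_linetype_identity() +
  labs(x = NULL, y = NULL)

linetypes[["dashed"]]
linetypes[["dotdash"]]
```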
The same chart, with the points colored by state abbreviation (abb) instead of state name. The dashed line again uses lty = 2, which is equivalent to linetype = "dashed":
murders_chart +
geom_point(size = 3, alpha = 0.5, aes(color = abb)) +
geom_smooth(method = lm, se =FALSE, color = "black", lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
Notice how the aes function colors the points by values in the data, rather than setting them to a single color. ggplot2 recognizes that state is a categorical variable, and uses its default qualitative color palette.
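The difference between mapping and setting a color can be sketched on a small made-up data frame (toy and its columns are hypothetical, not from the murders data). Inside aes() the color follows a variable; outside aes() it is a fixed value.

```r
library(ggplot2)

# Hypothetical data: three groups with four points each
toy <- data.frame(
  x = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  y = c(2, 3, 5, 6, 1, 3, 4, 7, 3, 4, 6, 9),
  g = rep(c("a", "b", "c"), each = 4)
)

# Mapped: color depends on the values of g, and a legend is drawn
p_mapped <- ggplot(toy, aes(x, y)) + geom_point(aes(color = g), size = 3)

# Set: every point gets the same fixed color, and no legend is drawn
p_set <- ggplot(toy, aes(x, y)) + geom_point(color = "red", size = 3)

# Number of distinct colors actually used by each point layer
length(unique(layer_data(p_mapped, 1)$colour))
length(unique(layer_data(p_set, 1)$colour))
```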
Now run this code to see the effect of setting the aes color mapping for the entire chart, rather than for just one geom layer.
ggplot(murders, aes(x = population/10^6, y = total, color=state)) +
xlab("Population in the State (million)") +
ylab("Total Murders") +
#scale_x_log10("Populations in millions (log scale)") + # optional log10 x-axis (tested, works)
#scale_y_log10("Total number of murders (log scale)") + # optional log10 y-axis (tested, works)
#ggtitle("US Gun Murders in 2010") + # add the title
#theme_minimal(base_size = 14, base_family = "URWTimes") + # alternative base_family tested
theme_minimal(base_size = 14, base_family = "Georgia") +
geom_point(size = 3, alpha = 0.5) +
geom_smooth(method=lm, se=FALSE, lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
ggplot(murders, aes(x = population/10^6, y = total, color=abb)) +
#xlab("Population in the State (million)") +
#ylab("Total Murders") +
scale_x_log10("Populations in millions (log scale)") + # log10 x-axis
scale_y_log10("Total number of murders (log scale)") + # log10 y-axis
#ggtitle("US Gun Murders in 2010") + # add the title
#theme_minimal(base_size = 14, base_family = "URWTimes") + # alternative base_family tested
theme_minimal(base_size = 14, base_family = "Georgia") +
geom_point(size = 3, alpha = 0.5) +
geom_smooth(method=lm, se=FALSE, lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
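To make the effect of a chart-level color mapping concrete, the sketch below counts the groups seen by geom_smooth() on a small made-up data frame (toy2 is hypothetical). With the mapping in ggplot(), the smoother inherits it and fits one line per group; with the mapping only in geom_point(), it fits a single line.

```r
library(ggplot2)

# Hypothetical data: three groups with four points each
toy2 <- data.frame(
  x = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  y = c(2, 3, 5, 6, 1, 3, 4, 7, 3, 4, 6, 9),
  g = rep(c("a", "b", "c"), each = 4)
)

# Color mapped for the whole chart: inherited by every layer
p_global <- ggplot(toy2, aes(x, y, color = g)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Color mapped in the point layer only: the smoother sees ungrouped data
p_layer <- ggplot(toy2, aes(x, y)) +
  geom_point(aes(color = g)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Number of separate fitted lines in the smooth layer (layer 2)
length(unique(layer_data(p_global, 2)$group))
length(unique(layer_data(p_layer, 2)$group))
```

In the murders charts each state (or abbreviation) appears in exactly one row, so a chart-level color mapping leaves every per-group fit with a single observation.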
You can apply ColorBrewer qualitative palettes by using the scale_color_brewer function. Add the text you want to appear as a legend title using name.
# set the axis ranges, change color palette
murders_chart +
xlab("Population in the State (million)") +
ylab("Total Murders") +
geom_point(size = 5, alpha = 1.0, aes(color = region)) +
geom_smooth(method = lm, se = FALSE, color = "black", lty=2, size = 0.2) +
scale_x_continuous(limits=c(0,10)) +
scale_y_continuous(limits=c(0,150)) +
scale_color_brewer(name="region", palette = "Set1")
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
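As a sketch of what scale_color_brewer() does, the example below applies the "Set1" palette to a made-up data frame (toy3 is hypothetical) and reads back the colors assigned to the points. scales::brewer_pal() is used only to show the palette's colors for comparison.

```r
library(ggplot2)

# Hypothetical data: three regions with two points each
toy3 <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, 3, 6, 5, 8),
  region = rep(c("North", "South", "West"), times = 2)
)

p_brewer <- ggplot(toy3, aes(x, y, color = region)) +
  geom_point(size = 5) +
  scale_color_brewer(name = "Region", palette = "Set1")

# Colors assigned to the three regions
unique(layer_data(p_brewer, 1)$colour)

# The first three colors of the Set1 palette, for comparison
scales::brewer_pal(palette = "Set1")(3)
```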
This assignment uses the murders dataset from dslabs, saved above as "murders.csv". The dataset contains gun murder data for US states in 2010. It has five columns, with the following names and data types: state (character), abb (character), region (factor with 4 levels), population (numeric), and total (numeric).
I use this dataset to practice what I learned in the week 8 class and to implement item 2 of the week 8 assignment in R.
Next, I list the available fonts and choose one for the assignment to use.
Next, I work with the murders dataset to create multiple graphs and explain how each of them was created:
However, if I comment out
scale_x_log10("Populations in millions (log scale)") +
scale_y_log10("Total number of murders (log scale)") +
and add the following:
xlab("Populations in millions") +
ylab("Total number of murders") +
I get a graph with linear axes, labeled "Populations in millions" on the x-axis and "Total number of murders" on the y-axis.
For the second graph, I change the theme. The default gray theme of ggplot2 has a rather academic look, so I use one of the ggplot2 built-in themes and then customize the fonts. I tested different base_size and base_family values, and commented out that code once it had been tested and worked.