DS Labs Datasets

Load the required packages and dataset in “dslabs”

Use the dslabs (Data Science Labs) package

The package includes a number of datasets for practicing data visualization.

# install.packages("dslabs")  # these are data science labs
library("dslabs")
## Warning: package 'dslabs' was built under R version 4.1.3
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
##  [1] "make-admissions.R"                   
##  [2] "make-brca.R"                         
##  [3] "make-brexit_polls.R"                 
##  [4] "make-death_prob.R"                   
##  [5] "make-divorce_margarine.R"            
##  [6] "make-gapminder-rdas.R"               
##  [7] "make-greenhouse_gases.R"             
##  [8] "make-historic_co2.R"                 
##  [9] "make-mnist_27.R"                     
## [10] "make-movielens.R"                    
## [11] "make-murders-rda.R"                  
## [12] "make-na_example-rda.R"               
## [13] "make-nyc_regents_scores.R"           
## [14] "make-olive.R"                        
## [15] "make-outlier_example.R"              
## [16] "make-polls_2008.R"                   
## [17] "make-polls_us_election_2016.R"       
## [18] "make-reported_heights-rda.R"         
## [19] "make-research_funding_rates.R"       
## [20] "make-stars.R"                        
## [21] "make-temp_carbon.R"                  
## [22] "make-tissue-gene-expression.R"       
## [23] "make-trump_tweets.R"                 
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"

Note that the dslabs package also includes some of the scripts used to wrangle the data from their original sources, as listed above.
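The dataset names can also be pulled into a character vector rather than read from a viewer window. This is a minimal sketch that assumes dslabs is installed; the variable name dataset_names is only illustrative.

```r
library(dslabs)

# data() returns an object whose $results matrix has one row per dataset;
# the "Item" column holds the dataset names.
dataset_names <- data(package = "dslabs")$results[, "Item"]

head(dataset_names)

# Check that the dataset used in this assignment is available.
"murders" %in% dataset_names
```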

US murders

Load murders data

This dataset includes gun murder data for US states in 2010. This assignment uses it to practice what I learned in the week 8 class and to implement item 2 of the week 8 assignment in R.

data("murders")
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#install.packages("ggthemes")
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.1.3
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.1.3
view(murders)
str(murders)
## 'data.frame':    51 obs. of  5 variables:
##  $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ abb       : chr  "AL" "AK" "AZ" "AR" ...
##  $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
##  $ population: num  4779736 710231 6392017 2915918 37253956 ...
##  $ total     : num  135 19 232 93 1257 ...
write_csv(murders, "murders.csv", na="")
# An alternative way to get the murders dataset (commented out): read it back from the CSV file

# load required packages
#install.packages("ggrepel")

#library(ggplot2)
#library(readr)

# load murders data
#murders <- read_csv("murders.csv")
#head(murders)
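One thing to keep in mind with the CSV route is that column types are not stored in the file. The sketch below, which assumes dslabs and readr (version 2.0 or later, for show_col_types) are installed, writes the data to a temporary file and reads it back; csv_path and murders_csv are illustrative names.

```r
library(dslabs)
library(readr)

data("murders")

# Write to a temporary file so the working directory is left untouched.
csv_path <- tempfile(fileext = ".csv")
write_csv(murders, csv_path, na = "")

murders_csv <- read_csv(csv_path, show_col_types = FALSE)

class(murders$region)      # factor in the packaged dataset
class(murders_csv$region)  # read back as character by default

# Restore the factor, reusing the original level order.
murders_csv$region <- factor(murders_csv$region, levels = levels(murders$region))
```

If region stays a character column, plots still work, but the legend order may differ from the one produced by the packaged dataset.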

Change the font

R’s basic fonts are fairly limited (run names(postscriptFonts()) to view those available). The extrafont package makes more fonts available. The first step is to install extrafont, and then import the fonts from your system into the extrafont database.

Installation: the function font_import() does not work on my computer, so it is commented out below and you can skip that part of the chunk.

# R's basic fonts are fairly limited. View those available by running this code
names(postscriptFonts())
##  [1] "serif"                "sans"                 "mono"                
##  [4] "AvantGarde"           "Bookman"              "Courier"             
##  [7] "Helvetica"            "Helvetica-Narrow"     "NewCenturySchoolbook"
## [10] "Palatino"             "Times"                "URWGothic"           
## [13] "URWBookman"           "NimbusMon"            "NimbusSan"           
## [16] "URWHelvetica"         "NimbusSanCond"        "CenturySch"          
## [19] "URWPalladio"          "NimbusRom"            "URWTimes"            
## [22] "ArialMT"              "ComputerModern"       "ComputerModernItalic"
## [25] "Japan1"               "Japan1HeiMin"         "Japan1GothicBBB"     
## [28] "Japan1Ryumin"         "Korea1"               "Korea1deb"           
## [31] "CNS1"                 "GB1"
# install.packages("extrafont")
#library(extrafont)
# the following command "font_import()" takes a long time to load - comment it out if you don't want to wait
#font_import()
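If font_import() is not an option, a family can still be chosen defensively. The sketch below is one possible approach, not part of the original assignment: on Windows it registers a family with the default graphics device through grDevices::windowsFonts(), which is likely what the "font family not found in Windows font database" warnings later in this document are asking for, and elsewhere it falls back to the generic "serif" family. "Georgia" is only an example and is assumed to be installed on the system.

```r
# Example family; assumed to be installed on the system.
wanted <- "Georgia"

if (.Platform$OS.type == "windows") {
  # On Windows, the default device only knows families registered here.
  if (!wanted %in% names(grDevices::windowsFonts())) {
    font_args <- setNames(list(grDevices::windowsFont(wanted)), wanted)
    do.call(grDevices::windowsFonts, font_args)
  }
  base_family <- wanted
} else {
  # Elsewhere, fall back to a generic family that graphics devices map
  # to a system font.
  base_family <- "serif"
}

base_family
```

The resulting base_family value could then be passed to theme_minimal(base_family = base_family).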

Work with the Murders Dataset

Calculate the average murder rate for the country

The average murder rate for the country can be drawn as a reference line. Once we determine the rate per million to be r, this line is defined by the formula y = rx, where y and x are our axes: total murders and population in millions, respectively.

On the log scale this line becomes log(y) = log(r) + log(x), so in a log-log plot it is a line with slope 1 and intercept log(r). To compute r, we use dplyr:

# Figure out the murder rate
r <- murders %>% 
  summarize(rate = sum(total) /  sum(population) * 10^6) %>% 
  pull(rate)
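The relationship between y = rx and its log-scale form can be checked numerically. The sketch below uses only base R and the population and total values of the first five rows shown in the str() output earlier, so r_sample is the rate for those five states, not the national rate; the point (x, y) is a hypothetical state placed exactly on the line.

```r
# First five rows of murders, copied from the str() output shown earlier.
sample_states <- data.frame(
  population = c(4779736, 710231, 6392017, 2915918, 37253956),
  total      = c(135, 19, 232, 93, 1257)
)

# Rate per million for these five states only (not the national rate).
r_sample <- sum(sample_states$total) / sum(sample_states$population) * 10^6

# A hypothetical state lying exactly on the line y = r * x ...
x <- 5                 # population in millions
y <- r_sample * x      # total murders on the line

# ... also satisfies the log-scale form log10(y) = log10(r) + log10(x).
all.equal(log10(y), log10(r_sample) + log10(x))
```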

The first graph - Create a static graph for which each point is labeled

Use the data science theme (ds_theme_set()). Plot the murders with the population of each state, in millions, on the x-axis and the total murders of each state on the y-axis.

Color the points by region and add a reference line for the average rate r calculated above. On log10 axes only the intercept is needed: geom_abline(intercept = log10(r))

The axes can be scaled by log 10 with scale_x_log10() and scale_y_log10(); those two lines are commented out in the code below, so the plot as written uses linear axes. Add axis labels and a title.

Use the nudge_x argument to move the text labels slightly to the right or to the left:

ds_theme_set()
murders %>% ggplot(aes(x = population/10^6, y = total, label = abb)) +  # population in millions keeps the x values readable
  # the log scales below are commented out, so on linear axes the average-rate line is y = r * x
  geom_abline(slope = r, intercept = 0, lty = 2, col = "darkgrey") +
  geom_point(aes(color=region), size = 3) +
  geom_text_repel(nudge_x = 0.005) +
  #scale_x_log10("Populations in millions (log scale)") +   # scale_x_log10 in log scale
  #scale_y_log10("Total number of murders (log scale)") +   # scale_y_log10 in log scale
  xlab("Populations in millions") + 
  ylab("Total number of murders") +
  ggtitle("US Gun Murders in 2010") +
  scale_color_discrete(name="Region")
## Warning: ggrepel: 31 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
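For comparison, here is a sketch of the log-scale version of the same chart, where the slope-1 line with intercept log10(r) applies. It also sets max.overlaps = Inf in geom_text_repel(), following the suggestion in the ggrepel warning, so that every state is labeled. It assumes dslabs, dplyr, ggplot2, and ggrepel are installed; p_log is an illustrative name.

```r
library(dslabs)
library(dplyr)
library(ggplot2)
library(ggrepel)

data("murders")

r <- murders %>%
  summarize(rate = sum(total) / sum(population) * 10^6) %>%
  pull(rate)

p_log <- murders %>%
  ggplot(aes(x = population / 10^6, y = total, label = abb)) +
  # On log10 axes the line y = r * x has slope 1 and intercept log10(r).
  geom_abline(intercept = log10(r), lty = 2, col = "darkgrey") +
  geom_point(aes(color = region), size = 3) +
  # max.overlaps = Inf asks ggrepel to label every point, even when crowded.
  geom_text_repel(max.overlaps = Inf) +
  scale_x_log10("Populations in millions (log scale)") +
  scale_y_log10("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010") +
  scale_color_discrete(name = "Region")

p_log
```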

For the second and following graphs - Change the theme

The default gray theme of ggplot2 has a rather academic look. You can use one of the ggplot2 built-in themes, and then customize the fonts.

murders_chart <-ggplot(murders, aes(x = population/10^6, y = total)) +
  #xlab("Populations in millions") + 
  #ylab("Total number of murders") +
  scale_x_log10("Populations in millions (log scale)") +   # scale_x_log10 in log scale
  scale_y_log10("Total number of murders (log scale)") +   # scale_y_log10 in log scale
  ggtitle("US Gun Murders in 2010") +
  scale_color_discrete(name="Region")+
  theme_minimal(base_size = 14, base_family = "URWTimes")
  #theme_minimal(base_size = 24, base_family = "Bookman")  #Testing base_size and base_family
# Testing base_size and base_family: the following alternative works and is commented out

#murders_chart <-ggplot(murders, aes(x = population, y = total)) +
 # xlab("Population in the State") + 
  #ylab("Total Murders") +
  #ggtitle("Comparison of Total Murders and Population in the State") +
  #theme_minimal(base_size = 24, base_family = "Bookman")  #Testing base_size and base_family
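The template idea can be made concrete by inspecting the plot object: it carries the data, aesthetics, scales, and theme, but no geom layers until one is added. The sketch below assumes dslabs and ggplot2 are installed; base_chart and with_points are illustrative names, and the generic "serif" family is an assumption used here to avoid depending on a specific installed font.

```r
library(dslabs)
library(ggplot2)

data("murders")

# A template with data, aesthetics, scales, and theme, but no geom layers yet.
base_chart <- ggplot(murders, aes(x = population / 10^6, y = total)) +
  scale_x_log10("Populations in millions (log scale)") +
  scale_y_log10("Total number of murders (log scale)") +
  theme_minimal(base_size = 14, base_family = "serif")

length(base_chart$layers)        # no layers in the template

# Adding a geom returns a new plot object; the template itself is unchanged.
with_points <- base_chart + geom_point()
length(with_points$layers)       # one layer after adding points
```

Because the template is never modified, each of the following graphs can start from the same object and add its own layers.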

The second graph - Add a layer with points

This code will add a geom layer with points to the template:

murders_chart +
  geom_point()
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

The third graph - Customize the two layers we’ve added to the chart

The following code modifies the two geom layers to change their appearance.

murders_chart +
  #geom_point(size = 3, alpha = 0.5) +    #Testing different size and alpha
  #geom_point(size = 10, alpha = 1) +     #Testing different size and alpha
  geom_point(size = 6, alpha = 1) +       # alpha must lie between 0 (transparent) and 1 (opaque)
  geom_smooth(method = lm, se=FALSE, color = "red")
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

The fourth graph - Customize again, coloring the points by state

You can make a dashed line with linetype = “dashed”, or equivalently, lty = 2
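For reference, line types can be given either as integer codes or as names. The small base R lookup below lists the standard pairs; lty_names and lty_codes are illustrative names.

```r
# Line type names in the order of their integer codes 0 to 6.
lty_names <- c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
lty_codes <- setNames(seq_along(lty_names) - 1, lty_names)

lty_codes["dashed"]   # the code that corresponds to lty = 2
lty_codes["dotdash"]  # a different pattern with its own code
```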

murders_chart + 
  geom_point(size = 3, alpha = 0.5, aes(color = state)) +
  geom_smooth(method = lm, se  =FALSE, color = "black", lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

  #geom_point(size = 1, alpha = 0.9, aes(color = state)) +      #Testing different values
  #geom_smooth(method = lm, se  =FALSE, color = "black", lty = 1, size = 0.1) #Testing different values

The fifth graph - Customize again, coloring the points by state abbreviation (abb), which looks clearer

You can make a dashed line with linetype = “dashed”, or equivalently, lty = 2

murders_chart + 
  geom_point(size = 3, alpha = 0.5, aes(color = abb)) +
  geom_smooth(method = lm, se  =FALSE, color = "black", lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

The sixth graph - Color the entire chart by state name

Notice how the aes function colors the points by values in the data, rather than setting them to a single color. ggplot2 recognizes that state is a categorical variable, and uses its default qualitative color palette.

Now run this code to see the effect of setting the color aes mapping for the entire chart, rather than for just one geom layer.

ggplot(murders, aes(x = population/10^6, y = total, color=state)) +
  xlab("Population in the State (million)") + 
  ylab("Total Murders") +
  #scale_x_log10("Populations in millions (log scale)") +   # scale_x_log10 in log scale and it is working
  #scale_y_log10("Total number of murders (log scale)") +   # scale_y_log10 in log scale and it is working
  #ggtitle("US Gun Murders in 2010") + + #Add the title
  #theme_minimal(base_size = 14, base_family = "URWTimes") + #Testing different value for base_family
  theme_minimal(base_size = 14, base_family = "Georgia") + 
  geom_point(size = 3, alpha = 0.5) +
  geom_smooth(method=lm, se=FALSE, lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family not
## found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
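The effect of a chart-wide color mapping on geom_smooth() can be checked with layer_data(), which returns the data computed for a layer. With color = state, each color group holds a single state, so a per-group line likely has too few points to be fit; the sketch below therefore uses region, which has four levels, to make the effect visible. It assumes dslabs and ggplot2 are installed; p_global and p_layer are illustrative names.

```r
library(dslabs)
library(ggplot2)

data("murders")

# Color mapped for the entire chart: geom_smooth() inherits it.
p_global <- ggplot(murders, aes(x = population / 10^6, y = total, color = region)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

# Color mapped in the point layer only: geom_smooth() sees one group.
p_layer <- ggplot(murders, aes(x = population / 10^6, y = total)) +
  geom_point(aes(color = region)) +
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)

# Layer 2 is the smooth; count the groups it was fitted for.
length(unique(layer_data(p_global, 2)$group))  # one fitted line per region
length(unique(layer_data(p_layer, 2)$group))   # a single fitted line
```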

The seventh graph - Color the entire chart by abb

ggplot(murders, aes(x = population/10^6, y = total, color=abb)) +
  #xlab("Population in the State (million)") + 
  #ylab("Total Murders") +
  scale_x_log10("Populations in millions (log scale)") +   # scale_x_log10 in log scale and it is working
  scale_y_log10("Total number of murders (log scale)") +   # scale_y_log10 in log scale and it is working
  #ggtitle("US Gun Murders in 2010") + + #Add the title
  #theme_minimal(base_size = 14, base_family = "URWTimes") + #Testing different value for base_family
  theme_minimal(base_size = 14, base_family = "Georgia") + 
  geom_point(size = 3, alpha = 0.5) +
  geom_smooth(method=lm, se=FALSE, lty = 2, size = 0.3)
## `geom_smooth()` using formula 'y ~ x'
## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database

## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

The eighth graph - Set the axis ranges, and use a different color palette

You can apply ColorBrewer qualitative palettes by using the scale_color_brewer function. Add the text you want to appear as a legend title using name.

# set the axis ranges, change color palette
murders_chart + 
  xlab("Population in the State (million)") + 
  ylab("Total Murders") +
  geom_point(size = 5, alpha = 1.0, aes(color = region)) +
  geom_smooth(method = lm, se = FALSE, color = "black", lty=2, size = 0.2) + 
  scale_x_continuous(limits=c(0,10)) + 
  scale_y_continuous(limits=c(0,150)) +
  scale_color_brewer(name="region", palette = "Set1")
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Scale for 'colour' is already present. Adding another scale for 'colour',
## which will replace the existing scale.
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
## Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
## font family not found in Windows font database

## Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
## family not found in Windows font database
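The "Removed 18 rows" warnings above come from the scale limits: states outside the ranges are dropped before the points are drawn and before the regression line is fitted. One possible alternative, sketched below under the assumption that dslabs and ggplot2 are installed, is coord_cartesian(), which zooms the view without discarding rows. The names n_outside, base, p_scale, and p_zoom are illustrative.

```r
library(dslabs)
library(ggplot2)

data("murders")

# States that fall outside the limits used in the eighth graph.
n_outside <- sum(murders$population / 10^6 > 10 | murders$total > 150)
n_outside

base <- ggplot(murders, aes(x = population / 10^6, y = total)) +
  geom_point(aes(color = region)) +
  scale_color_brewer(name = "Region", palette = "Set1")

# Scale limits: rows outside the ranges are removed from the plot.
p_scale <- base +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 150))

# Coordinate limits: the view is zoomed, but all rows are kept.
p_zoom <- base + coord_cartesian(xlim = c(0, 10), ylim = c(0, 150))
```

The choice matters mainly when a fitted line is added: with scale limits the fit uses only the remaining states, while with coord_cartesian() it uses all of them.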

The Essay About the Dataset and its Visualization

For this assignment I chose the “murders” dataset in “dslabs”. This dataset includes gun murder data for US states in 2010. It has five columns: state (character), abb (character), region (factor with four levels), population (numeric), and total (numeric).

This assignment uses the dataset to practice what I learned in the week 8 class and to implement item 2 of the week 8 assignment in R.

The first step is to load the required packages and the “murders” dataset from “dslabs”.

Next, look at the fonts and get the list of fonts available for the assignment.

Next, work with the murders dataset to create multiple graphs. Each graph is created as follows:

The first graph follows the log-scale approach. Once the rate per million, r, is determined, the average-rate line is defined by the formula y = rx, where y and x are our axes: total murders and population in millions, respectively. On the log scale this becomes log(y) = log(r) + log(x). I used exactly the approach I learned in Professor Saidi’s class.

However, I commented out

scale_x_log10("Populations in millions (log scale)") +

scale_y_log10("Total number of murders (log scale)") +

and added the following:

xlab("Populations in millions") +

ylab("Total number of murders") +

This gives a graph with the x-axis labeled “Populations in millions” and the y-axis labeled “Total number of murders”, both on linear scales.

For the second and following graphs, change the theme. The default gray theme of ggplot2 has a rather academic look. You can use one of the ggplot2 built-in themes, and then customize the fonts. I tested different values of base_size and base_family, and commented out the test code once it worked.

The second graph - Add a geom layer with points to the template.

The third graph - Modify the two geom layers to change their appearance.

The fourth graph - Customize again, coloring the points by state.

The fifth graph - Customize again, coloring the points by state abbreviation (abb), which looks clearer.

The sixth graph - Color the entire chart by state name.

The seventh graph - Color the entire chart by abb.

The eighth graph - Set the axis ranges, and use a different color palette.