Assignment Requirements

Tidyverse CREATE Assignment (25 points)

  • Clone the provided repository (1 point) 🗸
  • Write a vignette using one TidyVerse package (15 points) 🗸
  • Write a vignette using more than one TidyVerse package (+ 2 points) 🗸
  • Make a pull request on the shared repository (1 point)
  • Update the README.md file with your example (2 points)
  • Submit your GitHub handle name & link to Peergrade (1 point)
  • Grade your 3 peers and provide the feedback in Peergrade (2 points)
  • Submit the best peer link & your link to Blackboard (1 point)

Overview

The tidyverse package is an open-source collection of packages with practical, widely used tools for data science. Like any other CRAN package, tidyverse can be installed with the install.packages() function. For this assignment I focus on two of its packages: reprex and ggplot2. Running the code also requires the openintro package.

Load Package

After installation, the library can be loaded with the command below:

library(tidyverse)
Package      Description
broom        For summarizing statistical models as tidy tibbles
cli          Suite of tools for command-line interfaces
crayon       Colored terminal output
dbplyr       Database backend for dplyr
dplyr        Actions involving data manipulation
forcats      Suite of tools for factors
ggplot2      Suite of tools for creating plots
haven        Reading and writing SPSS, Stata and SAS files
hms          Storing durations or time-of-day values
httr         Wrapper for the curl package
jsonlite     JSON parser and generator for R
lubridate    Intuitive date-time data tools
magrittr     Operators for code readability
modelr       Modeling pipeline functions
pillar       Column formatting tools
purrr        Mapping functions over data
readr        Reading rectangular data
readxl       Reading data quickly from Excel files
reprex       Wrapper for creating snippets to post on websites and messaging apps
rlang        Core language features of the tidyverse
rstudioapi   Safe, conditional access to the RStudio API
rvest        Wrapper for scraping information from web pages
stringr      Tools for working with strings, useful for data cleaning and preparation
tibble       Modern data frames (tibbles)
tidyr        To ‘tidy up’ or simplify data
xml2         Working with HTML and XML in R
Table A-1: Tidyverse packages
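The tidyverse package can itself report which packages it bundles, which is a quick way to compare an installed version against Table A-1 (newer tidyverse releases add packages that are not listed there). A minimal sketch, assuming that tidyverse is installed from the cloud.r-project.org CRAN mirror when it is missing; that mirror URL is an assumption, not part of the original setup:

```r
# Install tidyverse only if it is not already available
# (assumption: the cloud.r-project.org CRAN mirror is reachable)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}

# Character vector of the packages tidyverse imports, plus "tidyverse" itself
pkgs <- tidyverse::tidyverse_packages(include_self = TRUE)
print(pkgs)

# Check that the packages this vignette relies on are part of the bundle
c("ggplot2", "dplyr", "reprex") %in% pkgs
```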

Reprex

As listed in Table A-1, reprex is a wrapper for creating snippets to post on websites and messaging apps. Its source information and details are listed below.

Reprex Source Information:

  1. Website for Reprex Package: reprex.tidyverse.org
  2. Reprex Github: github.com/tidyverse/reprex
  3. Good Tutorial for Reprex: How to use reprex, vignettes/articles/learn-reprex.Rmd

ggplot2

As listed in Table A-1, ggplot2 is a suite of tools for creating plots. The data used to create the plots below comes from the openintro package. ggplot2 source information is listed below.

  1. ggplot2 documentation: rdocumentation.org/packages/ggplot2/versions/3.3.3
  2. ggplot2 GitHub (read-only CRAN mirror): github.com/cran/ggplot2

Loading data

The plots use the evals dataset from the openintro package (OpenIntro GitHub: github.com/OpenIntroStat/openintro).

  • The command data() lists the datasets available in the currently attached packages; to see which packages are attached, use search()
  • To check whether openintro is installed on your local machine, use packageDescription("openintro"), which returns NA with a warning if the package is not found
  • If it is not installed, or the library is unavailable for some reason, install it with install.packages("openintro")
  • The command help(package = "openintro") opens further documentation on openintro
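The checks in the list above can be combined into one short script. This is a sketch, assuming the cloud.r-project.org CRAN mirror (an assumption, not part of the original instructions): it installs openintro only when requireNamespace() reports it missing, prints the installed version, and loads evals without attaching the whole package.

```r
# Install openintro only if it is missing
# (assumption: the cloud.r-project.org CRAN mirror is reachable)
if (!requireNamespace("openintro", quietly = TRUE)) {
  install.packages("openintro", repos = "https://cloud.r-project.org")
}

# Installed version, read from the package DESCRIPTION file
packageDescription("openintro")$Version

# Packages currently attached to the search path
search()

# Copy the evals dataset from openintro into the global environment
data("evals", package = "openintro")
names(evals)
```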

Step 1: Load Library

#Load library
library(openintro)

Step 2: Load Data

## Load Dataset `evals` from `OpenIntro`
data(evals)
head(evals)
## # A tibble: 6 x 23
##   course_id prof_id score rank  ethnicity gender language   age cls_perc_eval
##       <int>   <int> <dbl> <fct> <fct>     <fct>  <fct>    <int>         <dbl>
## 1         1       1   4.7 tenu~ minority  female english     36          55.8
## 2         2       1   4.1 tenu~ minority  female english     36          68.8
## 3         3       1   3.9 tenu~ minority  female english     36          60.8
## 4         4       1   4.8 tenu~ minority  female english     36          62.6
## 5         5       2   4.6 tenu~ not mino~ male   english     59          85  
## 6         6       2   4.3 tenu~ not mino~ male   english     59          87.5
## # ... with 14 more variables: cls_did_eval <int>, cls_students <int>,
## #   cls_level <fct>, cls_profs <fct>, cls_credits <fct>, bty_f1lower <int>,
## #   bty_f1upper <int>, bty_f2upper <int>, bty_m1lower <int>, bty_m1upper <int>,
## #   bty_m2upper <int>, bty_avg <dbl>, pic_outfit <fct>, pic_color <fct>

Step 3: Prepare Data

The data frame manipulated_data is built from the prof_id and score columns of the evals dataset. The rows are then grouped by Score with group_by(), and summarise() collapses each group into a single row with a new column, no_rows, that holds the number of evaluations with that score:

manipulated_data <- data.frame(Professors_ID = evals$prof_id, Score = evals$score)
head(manipulated_data, 3)
##   Professors_ID Score
## 1             1   4.7
## 2             1   4.1
## 3             1   3.9
# Count how many evaluations received each score
manipulated_data <- manipulated_data %>%
  group_by(Score) %>%
  summarise(no_rows = n())
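Because group_by() followed by a row count in summarise() is such a common pattern, dplyr also offers count() as a shortcut. The sketch below checks that both approaches give the same counts on a small, hypothetical toy data set; the values are made up for illustration and are not taken from evals.

```r
library(dplyr)

# Hypothetical toy data, not real evaluation scores
toy <- tibble(
  Professors_ID = c(1, 1, 2, 2, 2),
  Score = c(4.7, 4.1, 4.7, 4.3, 4.7)
)

# Pattern used in Step 3: group by score, then count rows per group
by_summarise <- toy %>%
  group_by(Score) %>%
  summarise(no_rows = n())

# Shortcut: count() groups, counts and names the new column in one call
by_count <- toy %>%
  count(Score, name = "no_rows")

by_count
```

Both versions return one row per distinct score, sorted by Score; count() simply saves writing the grouping step by hand.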

Step 4: Plot

With ggplot2, the plot type is chosen by adding a geom function such as geom_line(), geom_density(), geom_histogram() or geom_point(). Several layers can also be combined in one graph, as in the last plot of the chunk that follows, which overlays a density curve on a histogram.

# One plot per geom: line, density, histogram and points
ggplot(data = manipulated_data, aes(x = Score, y = no_rows)) + geom_line()
ggplot(data = manipulated_data, aes(x = no_rows)) + geom_density()
ggplot(data = manipulated_data, aes(x = no_rows)) + geom_histogram(aes(y = after_stat(density)))
ggplot(data = manipulated_data, aes(x = Score, y = no_rows)) + geom_point()

# Two layers in one graph: histogram of no_rows with a red density curve on top
# (linewidth needs ggplot2 >= 3.4.0; use size = 3 in older versions)
ggplot(data = manipulated_data, aes(x = no_rows)) +
  geom_histogram(aes(y = after_stat(density))) +
  geom_density(color = "red", linewidth = 3)
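Each + adds a layer to the plot object, so one ggplot can carry several geoms at once. The sketch below uses hypothetical toy counts (not the evals data) to build a histogram-plus-density plot and then computes the data behind each layer with layer_data(), without drawing anything. after_stat(density) is the current spelling of ..density.. (ggplot2 3.3.0 and later).

```r
library(ggplot2)

# Hypothetical toy counts, standing in for the no_rows column
toy_counts <- data.frame(no_rows = c(1, 3, 3, 5, 8, 8, 8, 12))

p <- ggplot(toy_counts, aes(x = no_rows)) +
  # Histogram rescaled to a density so it shares the y axis with the curve
  geom_histogram(aes(y = after_stat(density)), bins = 5) +
  # Kernel density estimate drawn on top in red
  geom_density(colour = "red")

# Compute the data behind each layer without rendering the plot
hist_data    <- layer_data(p, 1)  # bins from geom_histogram()
density_data <- layer_data(p, 2)  # curve from geom_density()
head(density_data[, c("x", "density")])
```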

Below is a more complex variation, which uses geom_text(), labs(), theme() and scale_x_continuous() to refine the plot.

# Use ggplot(), geom_col(), geom_text(), labs(), scale_x_continuous() and theme() to edit the plot
ggplot(data = manipulated_data, aes(x = Score, y = no_rows, fill = no_rows)) +
  geom_col() +                                       # bar heights taken from no_rows
  geom_text(aes(label = no_rows), vjust = -0.25) +   # count label just above each bar
  labs(title = "Score Distribution", x = "Score", y = "Count") +
  scale_x_continuous(breaks = unique(manipulated_data$Score)) +  # one tick per observed score
  theme(axis.text.x = element_text(angle = 90, hjust = 1))       # rotate the tick labels
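geom_bar(stat = "identity"), or its shorthand geom_col(), draws bars whose heights were computed beforehand, as with no_rows here. geom_bar() with its default stat counts the rows itself, so the summarise() step could be skipped by plotting the raw scores directly. A sketch on hypothetical toy scores (not the evals data):

```r
library(ggplot2)

# Hypothetical raw scores, one row per evaluation (not the evals data)
raw_scores <- data.frame(Score = c(4.7, 4.1, 4.7, 4.3, 4.7))

# geom_bar() uses stat_count(), which counts rows per distinct Score
p <- ggplot(raw_scores, aes(x = Score)) +
  geom_bar() +
  labs(title = "Score Distribution", x = "Score", y = "Count")

# Inspect the counts ggplot2 computed for the bar layer
layer_data(p, 1)[, c("x", "count")]
```

Pre-aggregating, as Step 3 does, is still handy when the counts are needed elsewhere, for example as the text labels in the plot above.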

Creating Snippet

The first step in using reprex is to copy the code you want to turn into a snippet and then run reprex::reprex(); if the library is already loaded, reprex() is enough. Called with no arguments, reprex() reads the code from the clipboard.

The example below makes a snippet out of all the steps used to build the bar plot above.

library(tidyverse)
library(openintro)
data(evals)
#head(evals)
manipulated_data <- data.frame(Professors_ID = evals$prof_id, Score = evals$score)
#head(manipulated_data, 3)

manipulated_data <- manipulated_data %>%
  group_by(Score) %>%
  summarise(no_rows = n())

ggplot(data = manipulated_data, aes(x = Score, y = no_rows, fill = no_rows)) +
  geom_col() +
  geom_text(aes(label = no_rows), vjust = -0.25) +
  labs(title = "Score Distribution", x = "Score", y = "Count") +
  scale_x_continuous(breaks = unique(manipulated_data$Score)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

(Image: copying the code)

(Image: console output)

The resulting snippet is easy to copy and paste, and it includes the rendered graphics.
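Instead of copying code to the clipboard first, reprex() can also take the code directly through its input argument, as a character vector with one line of code per element. This is a hedged sketch: reprex renders through rmarkdown, which needs pandoc, so the call is wrapped in interactive() to keep a batch script from failing, and the three-line snippet is a hypothetical example.

```r
# Hypothetical snippet: each element is one line of code to render
snippet_code <- c(
  "library(ggplot2)",
  "df <- data.frame(x = 1:3, y = c(2, 4, 6))",
  "ggplot(df, aes(x, y)) + geom_point()"
)

# reprex is designed for interactive use: it runs and renders the code,
# then puts the formatted result on the clipboard (venue "gh" = GitHub)
if (interactive()) {
  reprex::reprex(input = snippet_code, venue = "gh")
}
```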

Github Load Example

Conclusion

In my opinion, understanding ggplot2 is almost a requirement, since complex plots are best built with it. Reprex is also invaluable as a way to show a clear snippet of code to others without sharing the entire file. Snippets are most useful when posting on public forums, but they also help within a team when you only need advice on a specific section.