Tidyverse CREATE Assignment (25 points)
The tidyverse is an open-source collection of packages with widely applicable tools for data science. Like any other package, tidyverse can be installed with the install.packages() function. For this assignment I will focus on two of its packages, reprex and ggplot2. Running the code also requires the openintro package.
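As a minimal sketch of the installation step, the snippet below checks which of the required packages are missing before calling install.packages(). The helper name missing_packages is hypothetical (it is not part of any package), and the install is guarded by interactive() so that nothing is downloaded when the code runs in a script.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# The two packages this document relies on.
needed <- c("tidyverse", "openintro")
to_install <- missing_packages(needed)

# Only contact CRAN in an interactive session, and only for what is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking first avoids reinstalling packages that are already available on the machine.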
After installation, the library can be loaded with the command below:
library(tidyverse)
As explained in Table A-1, reprex is a wrapper for creating snippets to post on websites and messaging apps. Its source information and details can be found below.
Reprex Package: reprex.tidyverse.org
Reprex Github: github.com/tidyverse/reprex
Reprex: How to use reprex, vignettes/articles/learn-reprex.Rmd
As explained in Table A-1, ggplot2 is a suite of tools for creating plots. The data used in creating the ggplot below comes from the openintro package. OpenIntro package details can be found below.
ggplot2 website: rdocumentation.org/packages/ggplot2/versions/3.3.3
ggplot2 Github: github.com/cran/ggplot2
The data used to create the plot is the evals dataset from the OpenIntro package, noted below:
OpenIntro Github: github.com/OpenIntroStat/openintro
data() can be used to access the package's datasets.
To check whether the OpenIntro package directory exists on your local machine, use the command packageDescription("openintro").
Use install.packages("openintro") to install OpenIntro.
help(package = "openintro") can be used to access more documentation regarding OpenIntro.
#Load library
library(openintro)
## Load Dataset `evals` from `OpenIntro`
data(evals)
head(evals)
## # A tibble: 6 x 23
##   course_id prof_id score rank  ethnicity gender language   age cls_perc_eval
##       <int>   <int> <dbl> <fct> <fct>     <fct>  <fct>    <int>         <dbl>
## 1         1       1   4.7 tenu~ minority  female english     36          55.8
## 2         2       1   4.1 tenu~ minority  female english     36          68.8
## 3         3       1   3.9 tenu~ minority  female english     36          60.8
## 4         4       1   4.8 tenu~ minority  female english     36          62.6
## 5         5       2   4.6 tenu~ not mino~ male   english     59          85
## 6         6       2   4.3 tenu~ not mino~ male   english     59          87.5
## # ... with 14 more variables: cls_did_eval <int>, cls_students <int>,
## # cls_level <fct>, cls_profs <fct>, cls_credits <fct>, bty_f1lower <int>,
## # bty_f1upper <int>, bty_f2upper <int>, bty_m1lower <int>, bty_m1upper <int>,
## # bty_m2upper <int>, bty_avg <dbl>, pic_outfit <fct>, pic_color <fct>
The data frame manipulated_data is created from two columns of the evals data set, prof_id and score. The data is then condensed with the group_by() function to one row per distinct score, and a new column no_rows, holding the number of rows with that score, is added as shown below.
manipulated_data<-data.frame(Professors_ID = evals$prof_id,Score = evals$score)
head(manipulated_data,3)
##   Professors_ID Score
## 1             1   4.7
## 2             1   4.1
## 3             1   3.9
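To make the condensing step concrete, here is a small sketch on made-up scores (the toy data frame is an assumption for illustration and is not the evals data). It shows that group_by() followed by summarise() yields one row per distinct score, and that dplyr's count() is a shorthand for the same tally.

```r
library(dplyr)

# Toy scores standing in for evals$score (made-up values, not the real data).
toy <- data.frame(Score = c(4.7, 4.1, 4.7, 3.9, 4.1, 4.7))

# group_by() + summarise() collapses the rows to one per distinct Score.
by_group <- toy %>%
  group_by(Score) %>%
  summarise(no_rows = n())

# count() performs the same grouping and tallying in one call.
by_count <- toy %>%
  count(Score, name = "no_rows")

print(by_group)
print(by_count)
```

Both results hold the same counts; n() is used here in place of length(Score) only because it is the more common dplyr idiom.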
manipulated_data<-manipulated_data %>%
group_by(Score) %>%
summarise(no_rows = length(Score))
When plotting with ggplot2, the plot type is chosen by adding a geom function such as geom_line(), geom_density(), geom_histogram(), geom_point(), etc. Multiple geoms can be layered in one graph as well, as shown by running:
ggplot(data = manipulated_data,aes(x=Score, y=no_rows))+
geom_histogram(aes(x=no_rows,..density..))+
geom_density(aes(x=no_rows,..density..), color = "red", size=3)
ggplot(data = manipulated_data, aes(x=Score, y=no_rows))+geom_line()
ggplot(data = manipulated_data, aes(x=Score, y=no_rows))+geom_density(aes(x=no_rows,..density..))
ggplot(data = manipulated_data, aes(x=Score, y=no_rows))+geom_histogram(aes(x=no_rows,..density..))
ggplot(data = manipulated_data, aes(x=Score, y=no_rows))+geom_point()
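As a hedged aside on the layering idea, the sketch below overlays a histogram and a density curve on made-up scores (the toy data is an assumption, not the evals data). It uses after_stat(density), which, assuming ggplot2 3.3.0 or later, is the newer spelling of the ..density.. notation used in this document.

```r
library(ggplot2)

# Made-up scores for illustration only; not the evals data.
toy <- data.frame(Score = c(3.9, 4.1, 4.1, 4.3, 4.6, 4.7, 4.7, 4.7, 4.8))

# Both layers share the x aesthetic. after_stat(density) puts the histogram
# on the density scale so the two layers use the same y axis.
p <- ggplot(toy, aes(x = Score)) +
  geom_histogram(aes(y = after_stat(density)), bins = 5) +
  geom_density(color = "red")

print(p)
```

Each geom added with + becomes its own layer, so the order of the calls controls which layer is drawn on top.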
ggplot(data = manipulated_data, aes(x=Score, y=no_rows))+
geom_histogram(aes(x=no_rows,..density..))+
geom_density(aes(x=no_rows,..density..), color = "red", size=3)
Below is a more complex variation that uses geom_text(), labs(), theme() and scale_x_continuous() to build a more detailed plot.
# Use ggplot(), geom_bar(), geom_text(), labs(), scale_x_continuous(), and theme() to edit plot
ggplot(data = manipulated_data, aes(x=Score, y=no_rows,fill=no_rows)) +
geom_bar(stat = "identity")+
geom_text(aes(label=no_rows),position = position_dodge(width = .1),vjust = -0.25)+
labs(title = 'Score Distribution',x = 'Score', y="Count")+
scale_x_continuous(breaks = unique(manipulated_data$Score)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The first step in using reprex is to copy the code you would like to turn into a snippet, then run reprex::reprex(). If the library is already loaded, reprex() will suffice.
The example below shows how to make a snippet out of all the steps taken to build the ggplot in chunk ggplot_manipulated_data_intermediate:
library(tidyverse)
library(openintro)
data(evals)
#head(evals)
manipulated_data<-data.frame(Professors_ID = evals$prof_id,Score = evals$score)
#head(manipulated_data,3)
manipulated_data<-manipulated_data %>%
group_by(Score) %>%
summarise(no_rows = length(Score))
ggplot(data = manipulated_data, aes(x=Score, y=no_rows,fill=no_rows)) +
geom_bar(stat = "identity")+
geom_text(aes(label=no_rows),position = position_dodge(width = .1),vjust = -0.25)+
labs(title = 'Score Distribution',x = 'Score', y="Count")+
scale_x_continuous(breaks = unique(manipulated_data$Score)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
[Image: COPY_CODE]
[Image: CONSOLE_OUTPUT]
The resulting snippet can be copied and pasted easily, with the full graphics included.
Github Load Example
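Besides the clipboard workflow described above, reprex() can also be given the code directly. The following is a minimal sketch, assuming the reprex package is installed: the two code lines are made up for illustration, the input argument receives them as a character vector, and venue = "gh" asks for GitHub-flavored output. The call is guarded by interactive() because reprex renders the code and tries to use the clipboard.

```r
# Made-up code lines to turn into a snippet (illustration only).
code_lines <- c(
  "scores <- c(4.7, 4.1, 3.9)",
  "mean(scores)"
)

# Render the snippet only in an interactive session with reprex available.
if (interactive() && requireNamespace("reprex", quietly = TRUE)) {
  reprex::reprex(input = code_lines, venue = "gh")
}
```

Passing the code explicitly can be convenient when the snippet is assembled in a script rather than copied by hand.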
In my opinion, understanding how to use ggplot2 is almost a requirement, as complex plots are best built with this package. Reprex is also invaluable as a way to clearly show snippets of code to others without having to share an entire file. Snippets are most useful when posting on public forums, but they are also very useful when working within a team and needing advice on just one specific section.