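The readr package provides a fast way to read rectangular data, such as CSV files, into R. It can parse many kinds of values (numbers, text, dates, logicals) and guesses a type for each column automatically. You can override that guess when it is wrong.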
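A minimal sketch of that type guessing follows. It writes a small, made-up CSV (the rows are hypothetical, not taken from the Marvel data) to a temporary file and reads it back. `spec()` then reports the column types readr inferred.

```r
library(readr)

# Hypothetical CSV contents, written to a temporary file for the demo
path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,APPEARANCES,Year",
  "Hero A,120,1962",
  "Hero B,15,1975"
), path)

# read_csv() guesses a type for every column from the values it sees
heroes <- read_csv(path)

# spec() returns the column specification readr settled on:
# name as character, APPEARANCES and Year as numbers
spec(heroes)
```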
The forcats package provides tools for working with factors. Factors are R's data type for categorical variables, that is, variables with a fixed, known set of possible values. They also let you display character vectors in an order other than alphabetical. When you create a factor with an explicit set of levels, any value that is not in that set is converted to NA.
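The sketch below illustrates both points on a hypothetical vector of eye colours. Base R's `factor()` turns a value outside the chosen levels into NA, and forcats' `fct_infreq()` orders levels by frequency rather than alphabetically.

```r
library(forcats)

# Hypothetical eye colours for six characters
eyes <- c("Brown Eyes", "Blue Eyes", "Brown Eyes",
          "Purple Eyes", "Brown Eyes", "Blue Eyes")

# A factor with a fixed, known set of levels: "Purple Eyes" is not one of
# them, so factor() silently stores it as NA
eye_fct <- factor(eyes, levels = c("Blue Eyes", "Brown Eyes", "Green Eyes"))
eye_fct

# Level order need not be alphabetical: fct_infreq() orders the levels by
# frequency (Brown, then Blue, then Purple)
levels(fct_infreq(eyes))
```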
ggplot2 is used for creating graphics. First pass your data to the ggplot() function and specify how variables map to aesthetics with aes(). Then add layers, such as geom_point() for a scatter plot or geom_bar() for a bar chart, along with scales and coordinate systems.
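A minimal sketch of those building blocks follows, using a small hypothetical data frame (the counts are made up). It uses `geom_col()`, which is shorthand for `geom_bar(stat = "identity")` when the bar heights are already in the data.

```r
library(ggplot2)

# Hypothetical summary data: made-up counts per alignment
align_counts <- data.frame(
  ALIGN = c("Bad Characters", "Good Characters", "Neutral Characters"),
  n     = c(30, 20, 10)
)

p <- ggplot(align_counts, aes(x = ALIGN, y = n)) +    # data + aesthetic mapping
  geom_col() +                                        # layer: one bar per row, height = n
  scale_y_continuous(name = "Number of characters") + # scale for the y aesthetic
  coord_flip()                                        # coordinate system: horizontal bars

print(p)
```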
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.0 ✔ dplyr 0.8.0.1
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
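library(DT)  # interactive HTML tables; not part of the tidyverse
# readr, forcats and ggplot2 are already attached by library(tidyverse) above;
# loading them again is redundant but makes the dependencies explicit
library(readr)
library(forcats)
library(ggplot2)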
The table below shows the data we are working with: characteristics of individual Marvel comic book characters. Data source: https://www.kaggle.com/fivethirtyeight/fivethirtyeight-comic-characters-dataset.
The CSV file is parsed into a data frame (specifically a tibble) with read_csv() from the readr package. We then display it with datatable(), a function from the DT package, which is separate from the tidyverse.
comicsData <- read_csv("marvel-wikia-data.csv")
## Parsed with column specification:
## cols(
## page_id = col_double(),
## name = col_character(),
## urlslug = col_character(),
## ID = col_character(),
## ALIGN = col_character(),
## EYE = col_character(),
## HAIR = col_character(),
## SEX = col_character(),
## GSM = col_character(),
## ALIVE = col_character(),
## APPEARANCES = col_double(),
## `FIRST APPEARANCE` = col_character(),
## Year = col_double()
## )
datatable(comicsData, options = list(pageLength = 5))
## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
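The column specification that read_csv() printed above can also be passed back explicitly through `col_types`. This fixes the types and suppresses the message. The sketch below is self-contained, so it reads a temporary file containing one made-up row with the same header as the Marvel CSV. It also shows why an explicit type can matter: a column that is entirely empty (here GSM) would otherwise be guessed as logical.

```r
library(readr)

# One made-up row with the same header as marvel-wikia-data.csv;
# the GSM field is deliberately left empty
path <- tempfile(fileext = ".csv")
writeLines(c(
  "page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,Year",
  "1,Example Hero,/Example_Hero,Public Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living Characters,10,Jan-70,1970"
), path)

# Without a specification, the all-empty GSM column is guessed as logical
guessed <- read_csv(path)
class(guessed$GSM)

# Explicit types mirroring the printed specification:
# everything is character except the three numeric columns
marvel_cols <- cols(
  .default    = col_character(),
  page_id     = col_double(),
  APPEARANCES = col_double(),
  Year        = col_double()
)

sample_row <- read_csv(path, col_types = marvel_cols)
```

With the real file, the same specification would be passed as `read_csv("marvel-wikia-data.csv", col_types = marvel_cols)`.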
# Bar chart of eye colours, with bars ordered by frequency
ggplot(comicsData, aes(x = fct_infreq(EYE))) +
  geom_bar() +
  coord_flip()
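Because coord_flip() draws the first factor level at the bottom, fct_infreq() alone leaves the most frequent eye colour at the bottom of the flipped chart. A minimal sketch on hypothetical data follows. It wraps the factor in `fct_rev()` so that the most frequent bar appears at the top instead.

```r
library(ggplot2)
library(forcats)

# Hypothetical raw data: one row per character
chars <- data.frame(
  EYE = c("Blue Eyes", "Brown Eyes", "Brown Eyes", "Brown Eyes",
          "Blue Eyes", "Red Eyes"),
  stringsAsFactors = FALSE
)

# fct_infreq() orders levels by frequency; fct_rev() reverses that order
ordered_eyes <- fct_rev(fct_infreq(chars$EYE))
levels(ordered_eyes)  # least frequent first: Red, Blue, Brown

p <- ggplot(chars, aes(x = fct_rev(fct_infreq(EYE)))) +
  geom_bar() +
  coord_flip() +
  labs(x = "EYE")
print(p)
```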
datatable(comicsData %>%
            count(EYE, sort = TRUE))
# Keep the 5 most common eye colours and lump the rest into "Other"
eyecolors <- comicsData %>%
  mutate(EYE = fct_lump(EYE, n = 5)) %>%
  count(EYE, sort = TRUE)
## Warning: Factor `EYE` contains implicit NA, consider using
## `forcats::fct_explicit_na`
datatable(eyecolors)
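The warning above suggests `forcats::fct_explicit_na()`, which turns NA into an ordinary level so that missing values are labelled like any other category. A minimal sketch on hypothetical data follows, combining it with `fct_lump()` as in the step above. In forcats 1.0.0 and later, `fct_explicit_na()` is deprecated in favour of `fct_na_value_to_level()`, so newer installations may print a deprecation warning.

```r
library(tibble)
library(dplyr)
library(forcats)

# Hypothetical eye colours, including two missing values
chars <- tibble(EYE = c("Blue Eyes", "Blue Eyes", "Blue Eyes",
                        "Brown Eyes", "Brown Eyes",
                        "Green Eyes", "Red Eyes", NA, NA))

lumped <- chars %>%
  # keep the 2 most common colours and lump the rest into "Other"
  mutate(EYE = fct_lump(EYE, n = 2),
         # turn NA into an explicit level so it is counted like any other
         EYE = fct_explicit_na(EYE, na_level = "(Missing)")) %>%
  count(EYE, sort = TRUE)

lumped
```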
ggplot(data = eyecolors) +
  geom_col(mapping = aes(x = EYE, y = n, fill = EYE))
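For data that has already been counted, like eyecolors, fct_infreq() no longer applies because each level appears only once. A sketch under that assumption uses `fct_reorder()` to sort the levels by the `n` column, with made-up counts standing in for the real table. Plotting heights taken directly from `n` is what `geom_col()` does, which makes it equivalent to `geom_bar(stat = "identity")`.

```r
library(ggplot2)
library(forcats)

# Hypothetical pre-counted data shaped like the eyecolors table (made-up counts)
counts <- data.frame(
  EYE = c("Blue Eyes", "Brown Eyes", "Other", "Green Eyes"),
  n   = c(50, 80, 30, 20)
)

# fct_reorder() sorts the EYE levels by n, smallest first
counts$EYE <- fct_reorder(counts$EYE, counts$n)
levels(counts$EYE)

# geom_col() uses n directly as the bar height
p <- ggplot(counts, aes(x = EYE, y = n, fill = EYE)) +
  geom_col()
print(p)
```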
I extended Yohannes Deboch’s example using the ggplot2 package. See the link on Blackboard or GitHub.