1 dplyr
dplyr is a popular R package for data manipulation, known for its intuitive syntax and verbs that make it easy to transform datasets.
Useful functions
- select(): Choose specific columns from a data frame.
- filter(): Keep rows that meet specific conditions.
- mutate(): Create new columns or modify existing ones.
- arrange(): Order rows by column values.
- summarise(): Reduce multiple rows to a single summary row (one row per group after group_by()).
- group_by(): Group data by one or more columns for aggregation.
```
# A tibble: 10 × 5
   name               species hair_color    skin_color  eye_color
   <chr>              <chr>   <chr>         <chr>       <chr>
 1 Luke Skywalker     Human   blond         fair        blue
 2 C-3PO              Droid   <NA>          gold        yellow
 3 R2-D2              Droid   <NA>          white, blue red
 4 Darth Vader        Human   none          white       yellow
 5 Leia Organa        Human   brown         light       brown
 6 Owen Lars          Human   brown, grey   light       blue
 7 Beru Whitesun Lars Human   brown         light       blue
 8 R5-D4              Droid   <NA>          white, red  red
 9 Biggs Darklighter  Human   black         light       brown
10 Obi-Wan Kenobi     Human   auburn, white fair        blue-gray
```
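The code behind the table above is folded away on the original page. One way to produce it is to use select() on the starwars data that ships with dplyr and then keep the first ten rows with head(). This is a sketch, and the author's exact call may differ:

```r
library(dplyr)

# Keep the name, species and three colour columns of the built-in starwars data
sw_colours <- starwars %>%
  select(name, species, hair_color, skin_color, eye_color) %>%
  head(10)

sw_colours
```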
```
# A tibble: 10 × 4
   name               height  mass species
   <chr>               <dbl> <dbl> <chr>
 1 Luke Skywalker       1.72    77 Human
 2 C-3PO                1.67    75 Droid
 3 R2-D2                0.96    32 Droid
 4 Darth Vader          2.02   136 Human
 5 Leia Organa          1.5     49 Human
 6 Owen Lars            1.78   120 Human
 7 Beru Whitesun Lars   1.65    75 Human
 8 R5-D4                0.97    32 Droid
 9 Biggs Darklighter    1.83    84 Human
10 Obi-Wan Kenobi       1.82    77 Human
```
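In this table, height is shown in metres, while starwars stores it in centimetres. A plausible reconstruction, assumed rather than taken from the page, divides by 100 with mutate() and then selects four columns:

```r
library(dplyr)

# Convert height from centimetres to metres, then keep four columns
sw_sizes <- starwars %>%
  mutate(height = height / 100) %>%
  select(name, height, mass, species) %>%
  head(10)

sw_sizes
```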
```
# A tibble: 10 × 3
   genus         order          sleep_total
   <chr>         <chr>                <dbl>
 1 Giraffa       Artiodactyla           1.9
 2 Globicephalus Cetacea                2.7
 3 Equus         Perissodactyla         2.9
 4 Capreolus     Artiodactyla           3
 5 Equus         Perissodactyla         3.1
 6 Loxodonta     Proboscidea            3.3
 7 Phoca         Carnivora              3.5
 8 Ovis          Artiodactyla           3.8
 9 Elephas       Proboscidea            3.9
10 Bos           Artiodactyla           4
```
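This table lists the mammals in the msleep data set (bundled with ggplot2) that sleep the least. A sketch, assuming the table was built with arrange(), which sorts in ascending order by default:

```r
library(dplyr)
library(ggplot2)  # provides the msleep data set

# The ten mammals with the least total sleep, shortest first
least_sleep <- msleep %>%
  select(genus, order, sleep_total) %>%
  arrange(sleep_total) %>%
  head(10)

least_sleep
```

Wrapping the column in desc(), as in arrange(desc(sleep_total)), would list the longest sleepers first instead.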
```
# A tibble: 10 × 5
   name               hair_color    skin_color  eye_color species
   <chr>              <chr>         <chr>       <chr>     <chr>
 1 Luke Skywalker     blond         fair        blue      Human
 2 C-3PO              <NA>          gold        yellow    Robot
 3 R2-D2              <NA>          white, blue red       Robot
 4 Darth Vader        none          white       yellow    Human
 5 Leia Organa        brown         light       brown     Human
 6 Owen Lars          brown, grey   light       blue      Human
 7 Beru Whitesun Lars brown         light       blue      Human
 8 R5-D4              <NA>          white, red  red       Robot
 9 Biggs Darklighter  black         light       brown     Human
10 Obi-Wan Kenobi     auburn, white fair        blue-gray Human
```
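In the table above, species has moved to the last column and "Droid" has been relabelled "Robot". One way to get this result, assumed here because the original code is hidden, is to use the ends_with() selection helper and recode with mutate() and if_else():

```r
library(dplyr)

# Put species last and relabel droids as robots; other species are unchanged
sw_robots <- starwars %>%
  select(name, ends_with("color"), species) %>%
  mutate(species = if_else(species == "Droid", "Robot", species)) %>%
  head(10)

sw_robots
```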
```
# A tibble: 2 × 3
  sex    `Average height` `Average mass`
  <chr>             <dbl>          <dbl>
1 female             1.72           54.7
2 male               1.78           80.2
```
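The page does not show the filtering behind this summary, so the sketch below only illustrates the group_by() plus summarise() pattern. It assumes the starwars data, keeps only the "female" and "male" values of sex, and converts height to metres. Its numbers are not guaranteed to match the table above.

```r
library(dplyr)

# Mean height (in metres) and mean mass per sex;
# na.rm = TRUE skips characters whose height or mass is unknown
sw_by_sex <- starwars %>%
  filter(sex %in% c("female", "male")) %>%
  group_by(sex) %>%
  summarise(`Average height` = mean(height / 100, na.rm = TRUE),
            `Average mass` = mean(mass, na.rm = TRUE))

sw_by_sex
```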
2 ggplot2
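ggplot2 is a widely used package for data visualization. It implements a layered grammar of graphics: a plot is built from data, aesthetic mappings and geometric layers, which makes complex plots straightforward to assemble.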
Useful functions
- ggplot(): Initialize a plot object.
- geom_point(): Create scatter plots.
- geom_line(): Draw lines to show trends.
- geom_bar(): Create bar charts.
- facet_wrap() / facet_grid(): Split a plot into subplots (facets) by one or more variables.
- labs(): Customize axis labels, legend titles and the plot title.
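The examples that follow do not use geom_bar() or facet_wrap(). The sketch below shows both on the mpg data bundled with ggplot2; the choice of data and labels is illustrative only. geom_bar() counts rows per vehicle class, and facet_wrap(~ drv) draws one panel for each drive train:

```r
library(ggplot2)

# Count car models per vehicle class, one panel per drive train (4, f, r)
p_classes <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  facet_wrap(~ drv) +
  coord_flip() +
  labs(title = "Car classes by drive train",
       x = "Vehicle class",
       y = "Number of models")
```

Printing p_classes draws the three panels side by side. facet_grid(rows = vars(drv)) would stack them instead.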
```r
# Install pacman once, then use it to install (if needed) and load the packages
if (!require(pacman)) install.packages("pacman")
pacman::p_load(tidyverse, plotly, ggridges, palmerpenguins)

# Scatter plot of flipper length against body mass, coloured by species
ggplot(data = penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g,
           color = species)) +
  geom_point(size = 3,
             alpha = 0.5) +
  labs(title = "Flipper Length vs Body Mass by Species",
       x = "Flipper Length (mm)",
       y = "Body Mass (g)") +
  theme_minimal()
```
```r
# Chick weights by feed: jittered raw points, a large dot at each group mean,
# and a segment from the overall mean to each group mean
chickwts %>%
  group_by(feed) %>%
  mutate(mean_by_feed = mean(weight)) %>%
  ungroup() %>%
  mutate(feed = fct_reorder(feed, mean_by_feed)) %>%
  ggplot(aes(x = feed,
             y = weight,
             colour = feed)) +
  coord_flip() +
  geom_jitter(show.legend = FALSE,
              size = 4,
              alpha = 0.2,
              width = 0.05) +
  geom_point(stat = "summary",
             fun = "mean",
             size = 8,
             show.legend = FALSE) +
  # linewidth replaces size for line widths in ggplot2 >= 3.4.0
  geom_hline(aes(yintercept = mean(weight)),
             colour = "gray70",
             linewidth = 0.9) +
  geom_segment(aes(x = feed,
                   y = mean(weight),
                   xend = feed,
                   yend = mean_by_feed),
               linewidth = 2,
               show.legend = FALSE) +
  labs(title = "Weight of chickens by feed group",
       x = "Feed",
       y = "Weight of chickens") +
  # A complete theme must come before theme() tweaks, or it overrides them
  theme_minimal() +
  theme(legend.position = "none")
```
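```r
library(ggridges)
library(viridis)

# One density ridge per month, filled with a gradient along the temperature axis
ggplot(lincoln_weather,
       aes(x = `Mean Temperature [F]`,
           y = Month,
           fill = after_stat(x))) +  # after_stat(x) replaces the deprecated ..x..
  # alpha = 5 was dropped: alpha must lie in [0, 1], and ggridges does not
  # support transparency for gradient fills
  geom_density_ridges_gradient(scale = 3,
                               rel_min_height = 0.01) +
  scale_fill_viridis(name = "Temp. [F]",
                     option = "C") +
  labs(title = "Temperatures in Lincoln NE in 2016") +
  theme_bw() +
  theme(legend.position = "none",
        panel.spacing = unit(0.1, "lines"),
        strip.text.x = element_text(size = 8))
```

3 forcats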
forcats provides tools for working with categorical data (factors), making it easier to reorder, create, and modify factor levels.
Useful functions
- fct_relevel(): Manually change the order of factor levels.
- fct_reorder(): Reorder factor levels based on another variable.
- fct_infreq(): Reorder levels by their frequency.
- fct_rev(): Reverse the order of factor levels.
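The four functions above can be compared on one small factor. The fruit and price vectors here are hypothetical, and each call returns a new factor with different level order and the same values:

```r
library(forcats)

# Default factor levels are sorted alphabetically: apple, fig, pear
fruit <- factor(c("fig", "fig", "fig", "pear", "apple", "apple"))
# Hypothetical price for each observation, used by fct_reorder()
price <- c(1, 1, 1, 2, 3, 3)

relevelled <- fct_relevel(fruit, "pear", "apple")  # named levels move to the front
reordered  <- fct_reorder(fruit, price)            # by median price, ascending
by_freq    <- fct_infreq(fruit)                    # most frequent level first
reversed   <- fct_rev(fruit)                       # alphabetical order reversed

levels(relevelled)  # "pear"  "apple" "fig"
levels(reordered)   # "fig"   "pear"  "apple"
levels(by_freq)     # "fig"   "apple" "pear"
levels(reversed)    # "pear"  "fig"   "apple"
```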
```r
# Average daily TV hours per religion, with religions ordered by that average
gss_cat %>%
  group_by(relig) %>%
  summarise(meantv = mean(tvhours, na.rm = TRUE)) %>%
  mutate(relig = fct_reorder(relig, meantv)) %>%
  ggplot(aes(meantv, relig)) +
  geom_point(size = 4,
             color = "steelblue") +
  theme_minimal() +
  labs(title = "Average TV watch time by religion",
       x = "Average TV watch time (hours per day)",
       y = "")
```

4 stringr
stringr provides functions to work efficiently with strings, including pattern matching, replacement, and splitting.
Useful functions
- str_c(): Concatenate strings together.
- str_detect(): Detect whether a pattern is present.
- str_sub(): Extract substrings by position.
- str_to_upper(): Convert text to uppercase.
- str_replace(): Replace the first match of a pattern (str_replace_all() replaces every match).
- str_split(): Split strings into pieces.
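str_replace() and str_split() do not appear in the examples that follow. The sketch below applies them, along with str_detect() and str_to_upper(), to a few colour strings in the style of the starwars colour columns; the vector itself is made up for illustration:

```r
library(stringr)

colours <- c("white, blue", "brown, grey", "blond")

has_comma <- str_detect(colours, ",")             # TRUE TRUE FALSE
joined    <- str_replace(colours, ", ", " and ")  # "white and blue" "brown and grey" "blond"
parts     <- str_split(colours, ", ")             # a list: one character vector per input
shouted   <- str_to_upper(colours[3])             # "BLOND"
```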
```
   Species Sepal.Length    Description
1   setosa          5.1 setosa: 5.1 cm
2   setosa          4.9 setosa: 4.9 cm
3   setosa          4.7 setosa: 4.7 cm
4   setosa          4.6 setosa: 4.6 cm
5   setosa          5.0   setosa: 5 cm
6   setosa          5.4 setosa: 5.4 cm
7   setosa          4.6 setosa: 4.6 cm
8   setosa          5.0   setosa: 5 cm
9   setosa          4.4 setosa: 4.4 cm
10  setosa          4.9 setosa: 4.9 cm
```
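The Description column above looks like a str_c() concatenation of two iris columns with fixed text. The sketch below is a likely reconstruction, not the page's hidden code. str_c() converts the numbers with as.character(), so 5.0 becomes "5":

```r
library(dplyr)
library(stringr)

# Build a label such as "setosa: 5.1 cm" from the species and sepal length
iris_labels <- iris %>%
  select(Species, Sepal.Length) %>%
  mutate(Description = str_c(Species, ": ", Sepal.Length, " cm")) %>%
  head(10)

iris_labels
```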
```
           model  mpg cyl  disp
1      Mazda RX4 21.0   6 160.0
2  Mazda RX4 Wag 21.0   6 160.0
3      Merc 240D 24.4   4 146.7
4       Merc 230 22.8   4 140.8
5       Merc 280 19.2   6 167.6
6      Merc 280C 17.8   6 167.6
7     Merc 450SE 16.4   8 275.8
8     Merc 450SL 17.3   8 275.8
9    Merc 450SLC 15.2   8 275.8
10   AMC Javelin 15.2   8 304.0
```
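The rows above are mtcars models that match some str_detect() pattern. The page does not show which pattern was used. One pattern consistent with the output is a capital "M", which matches Mazda, Merc and AMC; it also matches Maserati Bora, but head(10) cuts that row off. The sketch assumes that pattern and uses tibble's rownames_to_column() to turn the row names into a model column:

```r
library(dplyr)
library(stringr)
library(tibble)  # rownames_to_column()

# Turn row names into a model column, then keep models containing a capital "M"
m_models <- mtcars %>%
  rownames_to_column("model") %>%
  select(model, mpg, cyl, disp) %>%
  filter(str_detect(model, "M")) %>%
  head(10)

m_models
```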
```
               model short name  mpg cyl  disp
1          Mazda RX4        Maz 21.0   6 160.0
2      Mazda RX4 Wag        Maz 21.0   6 160.0
3         Datsun 710        Dat 22.8   4 108.0
4     Hornet 4 Drive        Hor 21.4   6 258.0
5  Hornet Sportabout        Hor 18.7   8 360.0
6            Valiant        Val 18.1   6 225.0
7         Duster 360        Dus 14.3   8 360.0
8          Merc 240D        Mer 24.4   4 146.7
9           Merc 230        Mer 22.8   4 140.8
10          Merc 280        Mer 19.2   6 167.6
```
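The "short name" column above holds the first three characters of each model name, which is what str_sub(model, 1, 3) returns. A sketch of one way to build the table, assuming the .after argument of mutate() (dplyr 1.0 or later) was used to place the new column:

```r
library(dplyr)
library(stringr)
library(tibble)  # rownames_to_column()

# Take characters 1 to 3 of each model name as a short label
short_names <- mtcars %>%
  rownames_to_column("model") %>%
  select(model, mpg, cyl, disp) %>%
  mutate(`short name` = str_sub(model, 1, 3), .after = model) %>%
  head(10)

short_names
```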
```
# A tibble: 10 × 2
   name               species
   <chr>              <chr>
 1 Luke Skywalker     HUMAN
 2 C-3PO              DROID
 3 R2-D2              DROID
 4 Darth Vader        HUMAN
 5 Leia Organa        HUMAN
 6 Owen Lars          HUMAN
 7 Beru Whitesun Lars HUMAN
 8 R5-D4              DROID
 9 Biggs Darklighter  HUMAN
10 Obi-Wan Kenobi     HUMAN
```
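The uppercase species values above can be produced by applying str_to_upper() inside mutate(). This is a sketch on the starwars data; missing species stay NA:

```r
library(dplyr)
library(stringr)

# Convert the species column to uppercase
sw_upper <- starwars %>%
  select(name, species) %>%
  mutate(species = str_to_upper(species)) %>%
  head(10)

sw_upper
```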
5 gtExtras
gtExtras extends the gt package to add more flexibility in styling tables with additional formatting options and features.
Useful functions
- gt_color_box(): Add color shading to cell values.
- gt_highlight_rows(): Highlight specific rows in a table.
- gt_plt_sparkline(): Add sparklines to table cells.
- gt_fa_repeats(): Add font-awesome icons as repeat markers.
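The example below uses a different set of gtExtras helpers. This sketch shows two functions from the list above on mtcars: gt_plt_sparkline() reads a list column, and gt_highlight_rows() shades the first row. The fill colour and summary layout are illustrative choices. The sparklines are drawn only when the table is printed or rendered.

```r
library(dplyr)
library(gt)
library(gtExtras)

# One row per cylinder count, with that group's mpg values kept as a list column
cyl_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = round(mean(mpg), 1),
            mpg_data = list(mpg))

cyl_table <- cyl_summary %>%
  gt() %>%
  gt_plt_sparkline(mpg_data) %>%                    # list column -> sparkline
  gt_highlight_rows(rows = 1, fill = "lightgrey")   # shade the 4-cylinder row
```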
```r
library(gtExtras)
library(gapminder)
library(RColorBrewer)
library(svglite)

# The ten European countries with the highest mean GDP per capita (1952-2007),
# with a bar for GDP, colour-shaded population and a life-expectancy distribution
gapminder %>%
  rename(Country = country) %>%
  filter(continent == "Europe") %>%
  group_by(Country) %>%
  summarise(`GDP per capita` = round(mean(gdpPercap)),
            `Pop size` = round(mean(pop)),
            `Life expectancy` = list(lifeExp)) %>%
  arrange(desc(`GDP per capita`)) %>%
  head(10) %>%
  gt() %>%
  gt_theme_pff() %>%
  gt_plt_dist('Life expectancy') %>%
  gt_color_rows(columns = 'Pop size',
                palette = "Pastel1") %>%
  gt_plt_bar_pct('GDP per capita',
                 fill = "steelblue",
                 height = 15,
                 width = 120) %>%
  tab_header(title = "The GDP and Pop Size of Europe") %>%
  cols_align(align = "left")
```

The GDP and Pop Size of Europe

| Country | GDP per capita | Pop size | Life expectancy |
|---|---|---|---|
| Switzerland | | 6384293 | |
| Norway | | 4031441 | |
| Netherlands | | 13786798 | |
| Denmark | | 4994187 | |
| Germany | | 77547043 | |
| Iceland | | 226978 | |
| Austria | | 7583298 | |
| Sweden | | 8220029 | |
| Belgium | | 9725119 | |
| United Kingdom | | 56087801 | |
Life Expectancy, Population and GDP in 1972

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
| Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.8530 |
| Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.1007 |
| Afghanistan | Asia | 1967 | 34.020 | 11537966 | 836.1971 |
| Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.9811 |
| Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.1134 |
| Afghanistan | Asia | 1982 | 39.854 | 12881816 | 978.0114 |
| Afghanistan | Asia | 1987 | 40.822 | 13867957 | 852.3959 |
| Afghanistan | Asia | 1992 | 41.674 | 16317921 | 649.3414 |
| Afghanistan | Asia | 1997 | 41.763 | 22227415 | 635.3414 |
6 plotly
plotly is a package for creating interactive web-based plots, often used to enhance visualizations initially created with ggplot2.
Useful functions
- plot_ly(): Create a new interactive plot.
- ggplotly(): Convert ggplot2 plots to interactive plots.
- layout(): Customize the layout of a plotly object.
- add_trace(): Add a trace to an existing plot.
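The example below goes through ggplotly(). For comparison, this sketch builds a plot natively with plot_ly(), adds a second trace with add_trace(), and sets titles with layout(). It uses mtcars, where wt is in thousands of pounds. The straight reference line is purely illustrative and is not a fitted model.

```r
library(plotly)

# Interactive scatter plot of car weight against fuel economy
p_cars <- plot_ly(mtcars, x = ~wt, y = ~mpg,
                  type = "scatter", mode = "markers", name = "Cars") %>%
  # A second, independent trace: an illustrative straight line
  add_trace(x = c(1.5, 5.5), y = c(32, 10),
            type = "scatter", mode = "lines", name = "Reference line",
            inherit = FALSE) %>%
  layout(title = list(text = "Weight vs fuel economy"),
         xaxis = list(title = list(text = "Weight (1000 lbs)")),
         yaxis = list(title = list(text = "Miles per gallon")))
```

Printing p_cars in an interactive session opens the chart in the viewer, where hovering shows each point's values.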
```r
library(plotly)

# Height vs mass for characters with one of six eye colours
p <- starwars %>%
  drop_na(height, mass, eye_color) %>%
  filter(mass < 250) %>%  # drop the extreme mass outlier
  filter(eye_color %in% c("blue", "brown", "black",
                          "pink", "red", "orange")) %>%
  ggplot(aes(x = height,
             y = mass,
             color = eye_color)) +
  geom_jitter(size = 6,
              alpha = 0.5) +
  # Draw each point in the colour its eye_color value names
  scale_color_manual(values = c("blue" = "blue",
                                "brown" = "brown",
                                "black" = "black",
                                "pink" = "pink",
                                "red" = "red",
                                "orange" = "orange")) +
  theme_minimal() +
  theme(legend.position = c(0.05, 0.98),
        legend.justification = c("left", "top")) +
  labs(title = "Height, mass and eye color",
       x = "Height of characters (cm)",
       y = "Mass of characters (kg)",
       color = "Eye Color")

# Convert the static ggplot into an interactive plotly chart
ggplotly(p)
```

Courses with short, easy-to-digest video content are available at premieranalytics.com.bd. Each lesson uses data that is built into R or comes with installed packages, so you can replicate the work at home. premieranalytics.com.bd also offers teaching on statistics and research methods.