1. The select() function selects and reorders variables. contains() is a helper function that identifies variables by a pattern in their names rather than by naming each one. Other helper functions include starts_with(), ends_with(), all_of(), any_of(), one_of() (now superseded by any_of() and all_of()), everything(), num_range() and matches().
2. head(10) keeps only the first 10 rows.
# A tibble: 10 × 5
   name               species hair_color    skin_color  eye_color
   <chr>              <chr>   <chr>         <chr>       <chr>
 1 Luke Skywalker     Human   blond         fair        blue
 2 C-3PO              Droid   <NA>          gold        yellow
 3 R2-D2              Droid   <NA>          white, blue red
 4 Darth Vader        Human   none          white       yellow
 5 Leia Organa        Human   brown         light       brown
 6 Owen Lars          Human   brown, grey   light       blue
 7 Beru Whitesun Lars Human   brown         light       blue
 8 R5-D4              Droid   <NA>          white, red  red
 9 Biggs Darklighter  Human   black         light       brown
10 Obi-Wan Kenobi     Human   auburn, white fair        blue-gray
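A table like the one above could be produced along the following lines. This is a minimal sketch, assuming the starwars data that ships with dplyr and assuming the color columns were picked with contains("color"); the object name sw_colors is only illustrative.

```r
library(dplyr)

# Keep name and species, then every column whose name contains "color",
# and show only the first 10 rows.
sw_colors <- starwars %>%
  select(name, species, contains("color")) %>%
  head(10)

sw_colors
```

Because select() also reorders, the columns come out in the order they are listed, with the contains() matches in their original order.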
The data will now include only rows that meet all of the criteria listed. The species variable must have the value “Human” (the comparison is case-sensitive), the height variable must be less than 200, and the eye_color variable must be one of “blue”, “brown” or “black”.
# A tibble: 10 × 4
   name               height species eye_color
   <chr>               <int> <chr>   <chr>
 1 Luke Skywalker        172 Human   blue
 2 Leia Organa           150 Human   brown
 3 Owen Lars             178 Human   blue
 4 Beru Whitesun Lars    165 Human   blue
 5 Biggs Darklighter     183 Human   brown
 6 Anakin Skywalker      188 Human   blue
 7 Wilhuff Tarkin        180 Human   blue
 8 Han Solo              180 Human   brown
 9 Boba Fett             183 Human   brown
10 Lando Calrissian      177 Human   brown
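The filtering described above might be written as follows. This is a sketch under the assumption that the starwars data from dplyr is used and that the three conditions are passed to a single filter() call, where comma-separated conditions are combined with "and"; the name humans is illustrative.

```r
library(dplyr)

humans <- starwars %>%
  select(name, height, species, eye_color) %>%
  filter(
    species == "Human",                          # exact, case-sensitive match
    height < 200,                                # numeric comparison
    eye_color %in% c("blue", "brown", "black")   # any one of several values
  ) %>%
  head(10)

humans
```

%in% is a compact alternative to chaining several == tests together with |.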
mutate() will either create a new variable or overwrite an existing variable.
# A tibble: 10 × 4
   name               height  mass species
   <chr>               <dbl> <dbl> <chr>
 1 Luke Skywalker       1.72    77 Human
 2 C-3PO                1.67    75 Droid
 3 R2-D2                0.96    32 Droid
 4 Darth Vader          2.02   136 Human
 5 Leia Organa          1.5     49 Human
 6 Owen Lars            1.78   120 Human
 7 Beru Whitesun Lars   1.65    75 Human
 8 R5-D4                0.97    32 Droid
 9 Biggs Darklighter    1.83    84 Human
10 Obi-Wan Kenobi       1.82    77 Human
arrange() orders the rows by the variable named inside the parentheses. A numeric variable is sorted in ascending order by default; to sort in descending order, wrap the variable in desc() or place a “-” sign in front of its name. A character variable is sorted alphabetically, and a factor is sorted by the order of its levels.
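The sorting directions can be checked on a small table. This sketch uses a three-row toy tibble (chars, built here purely for illustration) rather than the full starwars data:

```r
library(dplyr)

# Toy data for illustration only
chars <- tibble(
  name   = c("Luke", "Leia", "Han"),
  height = c(172, 150, 180)
)

asc        <- arrange(chars, height)        # ascending is the default
desc_minus <- arrange(chars, -height)       # descending via a minus sign
desc_fn    <- arrange(chars, desc(height))  # descending via desc()
by_name    <- arrange(chars, name)          # alphabetical for character columns
```

desc() also works on character and factor columns, where a minus sign does not.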
The recode() function works within mutate(). Its first argument is the variable you want to recode, followed by the replacements, each written as old value = new value.
# A tibble: 10 × 5
   name               hair_color    skin_color  eye_color species
   <chr>              <chr>         <chr>       <chr>     <chr>
 1 Luke Skywalker     blond         fair        blue      Human
 2 C-3PO              <NA>          gold        yellow    Robot
 3 R2-D2              <NA>          white, blue red       Robot
 4 Darth Vader        none          white       yellow    Human
 5 Leia Organa        brown         light       brown     Human
 6 Owen Lars          brown, grey   light       blue      Human
 7 Beru Whitesun Lars brown         light       blue      Human
 8 R5-D4              <NA>          white, red  red       Robot
 9 Biggs Darklighter  black         light       brown     Human
10 Obi-Wan Kenobi     auburn, white fair        blue-gray Human
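A recode of this kind might look like the sketch below, assuming the starwars data and a single replacement of “Droid” with “Robot” (inferred from the output above; the name sw_recode is illustrative). Note that recent dplyr releases mark recode() as superseded, although it still works.

```r
library(dplyr)

sw_recode <- starwars %>%
  select(name, contains("color"), species) %>%
  # old value on the left, new value on the right
  mutate(species = recode(species, "Droid" = "Robot")) %>%
  head(10)

sw_recode
```

Values that are not mentioned, such as “Human”, are left unchanged.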
1. Remember to use == (not a single =) because this is a logical test asking R to identify observations for which it is true that the sex variable has the value “male”. The | symbol means “or” and indicates that an observation is included if either condition is met.
2. Remove missing values (NA).
3. Create a summary table with column headings for average height and mass.
# A tibble: 2 × 3
  sex    `Average height` `Average mass`
  <chr>             <dbl>          <dbl>
1 female             1.72           54.7
2 male               1.78           80.2
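The three steps above could be combined roughly as follows. This is only a sketch: it assumes the starwars data, assumes missing values are dropped with tidyr's drop_na() on height and mass, and assumes height is converted to metres; the original pipeline may differ, so the exact numbers are not guaranteed to match.

```r
library(dplyr)
library(tidyr)

sex_summary <- starwars %>%
  filter(sex == "male" | sex == "female") %>%  # either condition may hold
  drop_na(height, mass) %>%                    # remove rows with missing values
  mutate(height = height / 100) %>%            # assumed conversion to metres
  group_by(sex) %>%
  summarise(
    `Average height` = mean(height),
    `Average mass`   = mean(mass)
  )

sex_summary
```

Backticks are needed around the summary column names because they contain spaces.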
2 ggplot2
ggplot2 is a widely-used package for data visualization, providing a powerful system to create complex plots.
ggplot(): Initialize a plot object.
geom_point(): Create scatter plots.
geom_line(): Draw lines to show trends.
geom_bar(): Create bar charts.
facet_wrap() / facet_grid(): Create subplots based on factors.
library(palmerpenguins)
library(ggplot2)

ggplot(data = penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g,
           color = species)) +
  geom_point(size = 3,
             alpha = 0.5) +
  labs(title = "Flipper Length vs Body Mass by Species",
       x = "Flipper Length (mm)",
       y = "Body Mass (g)") +
  theme_minimal()
1. First define the data that will be used. The data can instead be piped into ggplot() with the pipe operator (%>%) rather than being named inside the function.
2. Next map the aesthetics. In this case, the x-axis, y-axis and color are all mapped to specific variables.
3. Next define the geometry. geom_point() will give you a scatterplot. Arguments inside the parentheses can be used to further control the look of the plot. The aesthetics for the geometry can be defined here. If they are not, the overall aesthetics defined above will be used.
4. The alpha value determines the transparency of the object.
5. Add labels. If you want a label to be left off, define it as “” (an empty string).
6. You can add a predefined theme or control all of the aspects of the canvas separately. In this case a “minimal” theme was used.
penguins %>%
  ggplot(aes(x = species, y = bill_length_mm, fill = species)) +
  geom_boxplot(alpha = 0.5) +
  labs(title = "Bill Length Distribution by Species",
       x = "Species",
       y = "Bill Length (mm)") +
  theme_minimal()
1. Here the data is piped into the ggplot() function.
2. Note that there are both “fill” and “color” aesthetics. For a filled shape such as a boxplot, fill sets the color inside the shape and color sets the outline.
ggplot(penguins, aes(x = species, y = body_mass_g, fill = species)) +
  geom_bar(stat = "summary",
           fun = "mean",
           alpha = 0.5) +
  labs(title = "Average Body Mass of Penguin Species",
       x = "Species",
       y = "Average Body Mass (g)") +
  theme_minimal()
1. geom_bar() is being told to summarize the data instead of counting it. In this case we want an average, so we set the summary function to “mean”.
1. We are calculating the average weight of all of the chicks for a particular feed and creating a new variable that assigns that average weight to each observation.
2. This function is from the forcats package. We’re ordering the feed variable by the average weight calculated above.
3. This flips the plot 90 degrees. From here on, the x-axis will be the vertical axis and the y-axis the horizontal.
4. geom_jitter() draws each point slightly offset from the exact coordinates of the data so that points don’t overlap.
5. This code creates a point for each feed at the average weight of the chickens on that feed. We’ve made it big (size = 8).
6. This creates a horizontal line (the grey line) that crosses the y-axis at the average weight for all of the chickens. Because the axes were flipped above with coord_flip(), the line is actually drawn vertically in this case.
7. This code draws a line from the grey vertical line (mean weight for all chickens) to the colored dot (mean weight for the chickens getting that feed). A line segment needs a starting point (x and y) and a finishing point (xend and yend).
8. The coordinates of the beginning of each line (for each feed).
9. The coordinates of the end of each line.
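The code these notes annotate is not reproduced here. One way the nine steps could fit together is sketched below, assuming the built-in chickwts data; the names overall_mean, mean_by_feed and chick_means are hypothetical, and the styling values other than size = 8 are guesses rather than the author's settings.

```r
library(dplyr)
library(forcats)
library(ggplot2)

overall_mean <- mean(chickwts$weight)   # mean weight across all chickens

chick_means <- chickwts %>%
  group_by(feed) %>%
  mutate(mean_by_feed = mean(weight)) %>%          # step 1: per-feed mean on every row
  ungroup() %>%
  mutate(feed = fct_reorder(feed, mean_by_feed))   # step 2: order feeds by that mean

p <- ggplot(chick_means, aes(x = feed, y = weight, color = feed)) +
  coord_flip() +                                              # step 3
  geom_jitter(size = 2, alpha = 0.4, width = 0.2, height = 0) +  # step 4
  geom_point(aes(y = mean_by_feed), size = 8) +               # step 5
  geom_hline(yintercept = overall_mean, color = "grey60") +   # step 6
  geom_segment(aes(x = feed, xend = feed,                     # steps 7 to 9
                   y = overall_mean, yend = mean_by_feed)) +
  theme_minimal() +
  theme(legend.position = "none")

p
```

Storing the per-feed mean on every row keeps the plot code simple, at the cost of drawing the same large point and segment once per observation.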
library(ggridges)
library(viridis)

# after_stat(x) replaces the deprecated ..x.. notation
ggplot(lincoln_weather,
       aes(x = `Mean Temperature [F]`, y = `Month`, fill = after_stat(x))) +
  geom_density_ridges_gradient(scale = 3,
                               rel_min_height = 0.01,
                               alpha = 5) +
  scale_fill_viridis(name = "Temp. [F]",
                     option = "C") +
  labs(title = 'Temperatures in Lincoln NE in 2016') +
  theme_bw() +
  theme(legend.position = "none",
        panel.spacing = unit(0.1, "lines"),
        strip.text.x = element_text(size = 8))
1. This package contains the data that we’ll use and the geom_density_ridges_gradient() function that provides the geometry for ggplot().
2. This package provides the color scheme that we’ll use for this plot.
3. Sets the fill color of the density ridges to be based on the x-values, meaning the fill color will vary according to the temperature values.
4. Adds ridgeline plots to the graph, where each line represents a density estimate (smoothed distribution) for temperatures grouped by month.
5. Sets the minimum height relative to the maximum height of the density curve. This removes very small tails from the ridges, making the plot cleaner.
6. Applies a color scale to the fill of the ridgelines using the viridis color palette, which is perceptually uniform and suitable for viewers with color vision deficiencies. name = “Temp. [F]” sets the title for the color legend to “Temp. [F]”. option = “C” selects which viridis color map to use; “C” is the “plasma” map, whose dark-to-bright gradient suits temperature data.
7. Sets the spacing between panels in the plot to 0.1 lines, making panels closer together if there are multiple facets (this example does not use faceting, so this setting might not have an effect).
8. Adjusts the font size of the strip text (facet labels), setting it to size 8. This is relevant if you have faceted plots, though it may not affect this specific plot.
3 forcats
forcats provides tools for working with categorical data (factors), making it easier to reorder, create, and modify factor levels.
Useful functions
fct_relevel(): Manually change the order of factor levels.
fct_reorder(): Reorder factor levels based on another variable.
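The two functions can be tried on small inputs. In this sketch, sizes is a toy factor made up for illustration, and the second part assumes the built-in chickwts data:

```r
library(forcats)

# Toy factor: levels default to alphabetical order
sizes <- factor(c("small", "large", "medium", "small"))
levels(sizes)

# fct_relevel(): state the order of the levels by hand
sizes_manual <- fct_relevel(sizes, "small", "medium", "large")
levels(sizes_manual)

# fct_reorder(): order the levels of feed by the mean weight within each feed
feed_by_weight <- fct_reorder(chickwts$feed, chickwts$weight, .fun = mean)
levels(feed_by_weight)
```

Only the level order changes; the values stored in the factor stay the same, which is what controls the order of categories on a plot axis.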
1. In the mtcars data, the model name is stored as the row name rather than as a standalone variable. This code creates a variable called “model” and removes the row names.
2. Use mutate() to create a new variable called has_M. Use str_detect() to create a logical vector in which any observation whose model includes the letter M is marked TRUE.
3. Use filter() to keep only the rows where has_M is TRUE.
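These three steps might be written as in the sketch below, assuming rownames_to_column() from the tibble package is used to move the row names into a column (the original call is not shown; m_cars is an illustrative name).

```r
library(dplyr)
library(tibble)
library(stringr)

m_cars <- mtcars %>%
  rownames_to_column(var = "model") %>%      # step 1: row names become a column
  mutate(has_M = str_detect(model, "M")) %>% # step 2: TRUE where model contains "M"
  filter(has_M)                              # step 3: keep only the TRUE rows

m_cars$model
```

str_detect() is case-sensitive, so only an upper-case “M” counts here.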
Use str_to_upper() to convert a variable to upper case; its argument is the variable to be changed.
# A tibble: 10 × 2
   name               species
   <chr>              <chr>
 1 Luke Skywalker     HUMAN
 2 C-3PO              DROID
 3 R2-D2              DROID
 4 Darth Vader        HUMAN
 5 Leia Organa        HUMAN
 6 Owen Lars          HUMAN
 7 Beru Whitesun Lars HUMAN
 8 R5-D4              DROID
 9 Biggs Darklighter  HUMAN
10 Obi-Wan Kenobi     HUMAN
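Output of this shape could come from a pipeline like the following. It is a minimal sketch, assuming the starwars data from dplyr; sw_upper is an illustrative name.

```r
library(dplyr)
library(stringr)

sw_upper <- starwars %>%
  select(name, species) %>%
  mutate(species = str_to_upper(species)) %>%  # overwrite species in upper case
  head(10)

sw_upper
```

str_to_lower() and str_to_title() work the same way for lower case and title case.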
5 gtExtras
gtExtras extends the gt package to add more flexibility in styling tables with additional formatting options and features.
gt_color_box(): Add color shading to cell values.
gt_highlight_rows(): Highlight specific rows in a table.
gt_plt_sparkline(): Add sparklines to table cells.
gt_fa_repeats(): Add font-awesome icons as repeat markers.
library(gtExtras)
library(gapminder)
library(RColorBrewer)
library(svglite)

gapminder %>%
  rename(Country = country) %>%
  filter(continent == "Europe") %>%
  group_by(Country) %>%
  summarise(`GDP per capita` = round(mean(gdpPercap)),
            `Pop size` = round(mean(pop)),
            `Life expectance` = list(lifeExp)) %>%
  arrange(-`GDP per capita`) %>%
  head(10) %>%
  gt() %>%
  gt_theme_pff() %>%
  gt_plt_dist('Life expectance') %>%
  gt_color_rows(column = 'Pop size',
                palette = "Pastel1") %>%
  gt_plt_bar_pct('GDP per capita',
                 fill = "steelblue",
                 height = 15,
                 width = 120) %>%
  tab_header(title = "The GDP and Pop Size of Europe") %>%
  cols_align(align = "left")
1. Load the gtExtras package.
2. The gapminder package contains the data that we’ll use.
3. RColorBrewer is a package that contains color palettes that we’ll use.
4. svglite provides the SVG graphics device used to draw the small plots that gtExtras embeds in table cells.
5. This code creates the data frame that we’ll use for the table.
6. The gt() function comes from the gt package, which gtExtras builds on, and creates a basic table.
7. Adding a theme is optional. This is one of a range of themes available.
8. Here gtExtras replaces the values in the variable with a “distribution” (curve) in each cell. Note that this variable is a list, not a single value (see code above).
9. We color the population size variable using a palette from the RColorBrewer package. The colors used in each cell are related to the value in that particular cell.
10. We can replace the values in the GDP per capita variable with a bar plot. We can also define the look of it in terms of color, height and width.
11. Add a table heading.
12. Align the columns to the left.
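The GDP and Pop Size of Europe
Country          GDP per capita   Pop size   Life expectance
Switzerland      (bar)             6384293   (distribution plot)
Norway           (bar)             4031441   (distribution plot)
Netherlands      (bar)            13786798   (distribution plot)
Denmark          (bar)             4994187   (distribution plot)
Germany          (bar)            77547043   (distribution plot)
Iceland          (bar)              226978   (distribution plot)
Austria          (bar)             7583298   (distribution plot)
Sweden           (bar)             8220029   (distribution plot)
Belgium          (bar)             9725119   (distribution plot)
United Kingdom   (bar)            56087801   (distribution plot)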
gapminder %>%
  head(10) %>%
  gt() %>%
  gt_highlight_rows(rows = year == 1972,
                    fill = 'steelblue') %>%
  tab_header(title = "Life Expectancy, Population and GDP in 1972") %>%
  gt_theme_espn()
1. Make sure that you have loaded the gapminder package with library(gapminder).
2. Use gt_highlight_rows() to color a row. Define the row and color in the arguments.
Life Expectancy, Population and GDP in 1972
country       continent   year   lifeExp        pop   gdpPercap
Afghanistan   Asia        1952    28.801    8425333    779.4453
Afghanistan   Asia        1957    30.332    9240934    820.8530
Afghanistan   Asia        1962    31.997   10267083    853.1007
Afghanistan   Asia        1967    34.020   11537966    836.1971
Afghanistan   Asia        1972    36.088   13079460    739.9811
Afghanistan   Asia        1977    38.438   14880372    786.1134
Afghanistan   Asia        1982    39.854   12881816    978.0114
Afghanistan   Asia        1987    40.822   13867957    852.3959
Afghanistan   Asia        1992    41.674   16317921    649.3414
Afghanistan   Asia        1997    41.763   22227415    635.3414
6 plotly
plotly is a package for creating interactive web-based plots, often used to enhance visualizations initially created with ggplot2.
Useful functions
plot_ly(): Create a new interactive plot.
ggplotly(): Convert ggplot2 plots to interactive plots.
layout(): Customize the layout of a plotly object.
library(plotly)

p <- starwars %>%
  drop_na(height, mass, eye_color) %>%
  filter(mass < 250) %>%
  filter(eye_color %in% c("blue", "brown", "black", "pink", "red", "orange")) %>%
  ggplot(aes(x = height,
             y = mass,
             color = eye_color)) +
  geom_jitter(size = 6,
              alpha = 0.5) +
  scale_color_manual(values = c("blue" = "blue",
                                "brown" = "brown",
                                "black" = "black",
                                "pink" = "pink",
                                "red" = "red",
                                "orange" = "orange")) +
  theme_minimal() +
  theme(legend.position = c(0.05, 0.98),
        legend.justification = c("left", "top")) +
  labs(title = "height, mass and eye color",
       x = "Height of characters",
       y = "Mass of characters",
       color = "Eye Color")

ggplotly(p)
1. Load the plotly package.
2. Create an object that will later have the ggplotly() function applied to it.
3. Here we are manually telling ggplot that if the value of a point is described as “blue”, the color assigned to that point should be “blue”.
4. Define the position of the legend (inside the plot itself). The first number is the x coordinate and the second number the y coordinate.
6. Apply the ggplotly() function to the object.
trees %>%
  plot_ly(x = ~ Girth,
          y = ~ Height,
          z = ~ Volume)
1. Use the plot_ly() function to create a 3D plot and define the x, y, and z coordinates.
plot_ly(z = volcano, type = "surface")
1. volcano is a dataset that comes with base R (in the datasets package), so it is available without loading anything extra. Define the type of plot as surface.
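layout(), listed among the useful functions above, can be chained onto a plotly object to adjust its title and other layout settings. A minimal sketch, reusing the volcano surface; the title text and the name fig are illustrative choices:

```r
library(plotly)

# Build the surface plot, then customise its layout
fig <- plot_ly(z = volcano, type = "surface") %>%
  layout(title = "Maunga Whau volcano surface")

fig
```

layout() returns a plotly object, so further calls can be chained with the pipe in the same way.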
7 lubridate
Watch this space. Content about how to work with date and time data using the lubridate package will be added soon.
8 Learn more
Courses that contain short, easy-to-digest video content are available at LearnMore365.com. Each lesson uses data that is built into R or comes with installed packages, so you can replicate the work at home. LearnMore365.com also includes teaching on statistics and research methods.