Course Description
Say you’ve found a great dataset and would like to learn more about it. How can you start to answer the questions you have about the data? You can use dplyr to answer those questions—it can also help with basic transformations of your data. You’ll also learn to aggregate your data and add, remove, or change the variables. Along the way, you’ll explore a dataset containing information about counties in the United States. You’ll finish the course by applying these tools to the babynames dataset to explore trends of baby names in the United States.
Learn verbs you can use to transform your data, including select, filter, arrange, and mutate. You’ll use these functions to modify the counties dataset to view particular observations and answer questions about the data.
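As a quick preview (a minimal sketch, not one of the exercises, assuming the counties data that the first exercise below loads; unemployed_people is just an illustrative column name), the four verbs can be chained together with the pipe:
counties %>%
  select(state, county, population, unemployment) %>%
  mutate(unemployed_people = population * unemployment / 100) %>%
  filter(population > 100000) %>%
  arrange(desc(unemployed_people))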
Take a look at the counties dataset using the glimpse() function. What is the first value in the income variable?
Select the following four columns from the counties dataset: state, county, population, and poverty. You don’t need to save the result to a variable.
# Load the tidyverse and read in the counties data
library(tidyverse)
counties <- readRDS("counties.rds")
counties %>%
# Select the columns
select(state, county, population, poverty)
Here you see the counties_selected dataset with a few interesting variables selected. These variables (private_work, public_work, and self_employed) describe whether people work for the government, for private companies, or for themselves.
In these exercises, you’ll sort these observations to find the most interesting cases.
Add a verb to sort the observations of the public_work variable in descending order.
counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
counties_selected %>%
# Add a verb to sort in descending order of public_work
arrange(desc(public_work))
You use the filter() verb to get only observations that match a particular condition, or match multiple conditions.
Find only the counties that have a population above one million (1000000).
Find only the counties in the state of California that also have a population above one million (1000000).
counties_selected <- counties %>%
select(state, county, population)
counties_selected %>%
# Filter for counties with a population above 1000000
filter(population > 1000000)
counties_selected <- counties %>%
select(state, county, population)
counties_selected %>%
# Filter for counties with a population above 1000000
filter(state == "California",
population > 1000000)
We’re often interested in both filtering and sorting a dataset, to focus on observations of particular interest to you. Here, you’ll find counties that are extreme examples of what fraction of the population works in the private sector.
Filter for counties in the state of Texas that have more than ten thousand people (10000), and sort them in descending order of the percentage of people employed in private work.
counties_selected <- counties %>%
select(state, county, population, private_work, public_work, self_employed)
counties_selected %>%
# Filter for Texas and more than 10000 people
filter(state == "Texas",
population > 10000) %>%
# Sort in descending order of private_work
arrange(desc(private_work))
In the video, you used the unemployment variable, which is a percentage, to calculate the number of unemployed people in each county. In this exercise, you’ll do the same with another percentage variable: public_work.
The code provided already selects the state, county, population, and public_work columns.
Use mutate() to add a column called public_workers to the dataset, with the number of people employed in public (government) work.
Sort the new column in descending order.
counties_selected <- counties %>%
select(state, county, population, public_work)
counties_selected %>%
# Add a new column public_workers with the number of people employed in public work
mutate(public_workers = public_work * population / 100)
counties_selected %>%
mutate(public_workers = public_work * population / 100) %>%
# Sort in descending order of the public_workers column
arrange(desc(public_workers))
The dataset includes columns for the total number (not percentage) of
men and women in each county. You could use this, along with the
population
variable, to compute the fraction of men (or
women) within each county.
In this exercise, you’ll select the relevant columns yourself.
Select the columns state, county, population, men, and women.
Calculate proportion_women as the fraction of the county’s population made up of women.
counties_selected <- counties %>%
# Select the columns state, county, population, men, and women
select(state, county, population, men, women)
counties_selected %>%
# Calculate proportion_women as the fraction of the population made up of women
mutate(proportion_women = women / population)
In this exercise, you’ll put together everything you’ve learned in this chapter (select(), mutate(), filter(), and arrange()) to find the counties with the highest proportion of men.
Select the columns state, county, population, men, and women.
Add the variable proportion_men with the fraction of the county’s population made up of men.
Filter for counties with a population of at least ten thousand (10000).
Arrange the table in descending order of the proportion of men.
counties %>%
# Select the five columns
select(state, county, population, men, women) %>%
# Add the proportion_men variable
mutate(proportion_men = men / population) %>%
# Filter for population of at least 10,000
filter(population >= 10000) %>%
# Arrange proportion of men in descending order
arrange(desc(proportion_men))
Now that you know how to transform your data, you’ll want to know more about how to aggregate your data to make it more interpretable. You’ll learn a number of functions you can use to take many observations in your data and summarize them, including count, group_by, summarize, ungroup, slice_max, and slice_min.
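As a quick preview (a minimal sketch using the counties data from the previous chapter, not one of the exercises), the group_by() and summarize() pattern collapses many observations into one summary row per group:
counties %>%
  group_by(state) %>%
  summarize(total_population = sum(population))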
The counties dataset contains columns for region, state, population, and the number of citizens, which we selected and saved as the counties_selected table. In this exercise, you’ll focus on the region column.
counties_selected <- counties %>%
select(county, region, state, population, citizens)
Use count() to find the number of counties in each region, using a second argument to sort in descending order.
counties_selected <- counties %>%
select(county, region, state, population, citizens)
# Use count to find the number of counties in each region
counties_selected %>%
count(region, sort = TRUE)
You can weight your count by particular variables rather than finding the number of counties. In this case, you’ll find the number of citizens in each state.
counties_selected <- counties %>%
select(county, region, state, population, citizens)
Count the number of counties in each state, weighted by the citizens column, and sorted in descending order.
# Find number of counties per state, weighted by citizens, sorted in descending order
counties_selected %>%
count(state, wt = citizens, sort = TRUE)
You can combine multiple verbs together to answer increasingly complicated questions of your data. For example: “What are the US states where the most people walk to work?”
You’ll use the walk column, which gives the percentage of people in each county who walk to work, to add a new column and count based on it.
counties_selected <- counties %>%
select(county, region, state, population, walk)
Use mutate() to calculate and add a column called population_walk, containing the total number of people who walk to work in a county.
Use count() to find the total number of people who walk to work in each state.
counties_selected <- counties %>%
select(county, region, state, population, walk)
counties_selected %>%
# Add population_walk containing the total number of people who walk to work
mutate(population_walk = population * walk / 100) %>%
# Count weighted by the new column, sort in descending order
count(state, wt = population_walk, sort = TRUE)
The summarize()
verb is very useful for collapsing a
large dataset into a single observation.
counties_selected <- counties %>%
select(county, population, income, unemployment)
Summarize the counties dataset to find the following columns: min_population (with the smallest population), max_unemployment (with the maximum unemployment), and average_income (with the mean of the income variable).
counties_selected <- counties %>%
select(county, population, income, unemployment)
counties_selected %>%
# Summarize to find minimum population, maximum unemployment, and average income
summarize(min_population = min(population),
max_unemployment = max(unemployment),
average_income = mean(income))
Another interesting column is land_area, which shows the land area in square miles. Here, you’ll summarize both population and land area by state, with the purpose of finding the density (in people per square mile).
counties_selected <- counties %>%
select(state, county, population, land_area)
Group the data by state, and summarize to create the columns total_area (with the total area in square miles) and total_population (with the total population).
Add a density column with the people per square mile, then arrange in descending order.
counties_selected <- counties %>%
select(state, county, population, land_area)
counties_selected %>%
# Group by state
group_by(state) %>%
# Find the total area and population
summarize(total_area = sum(land_area),
total_population = sum(population))
counties_selected %>%
group_by(state) %>%
summarize(total_area = sum(land_area),
total_population = sum(population)) %>%
# Add a density column
mutate(density = total_population / total_area) %>%
# Sort by density in descending order
arrange(desc(density))
You can group by multiple columns instead of grouping by one. Here, you’ll practice aggregating by state and region, and notice how useful it is for performing multiple aggregations in a row.
counties_selected <- counties %>%
select(region, state, county, population)
Summarize to find the total population, total_pop, in each combination of region and state, grouped in that order.
Notice the result is still grouped by region; use another summarize() step to calculate two new columns: the average state population in each region (average_pop) and the median state population in each region (median_pop).
counties_selected <- counties %>%
select(region, state, county, population)
counties_selected %>%
# Group and summarize to find the total population
group_by(region, state) %>%
summarize(total_pop = sum(population))
counties_selected %>%
# Group and summarize to find the total population
group_by(region, state) %>%
summarize(total_pop = sum(population)) %>%
# Calculate the average_pop and median_pop columns
summarize(average_pop = mean(total_pop),
median_pop = median(total_pop))
Previously, you used the walk column, which gives the percentage of people in each county who walk to work, to add a new column and count to find the total number of people who walk to work in each state.
Now, you’re interested in finding the county within each region with the highest percentage of citizens who walk to work.
counties_selected <- counties %>%
select(region, state, county, metro, population, walk)
counties_selected %>%
# Group by region
group_by(region) %>%
# Find the county with the highest percentage of people who walk to work
slice_max(walk, n = 1)
You’ve been learning to combine multiple dplyr verbs together. Here, you’ll combine group_by(), summarize(), and slice_min() to find the state in each region with the lowest income.
When you group by multiple columns and then summarize, it’s important to remember that the summarize “peels off” one of the groups, but leaves the rest on. For example, if you group_by(X, Y) then summarize, the result will still be grouped by X.
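A minimal sketch of this behavior (not part of the exercise), using dplyr’s group_vars() to inspect which grouping remains:
counties %>%
  group_by(region, state) %>%
  summarize(total_pop = sum(population)) %>%
  group_vars()   # returns "region": the state grouping was peeled off, region remains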
counties_selected <- counties %>%
select(region, state, county, population, income)
Calculate the average income (as average_income) of counties within each region and state (notice the group_by() has already been done for you).
Find the state with the lowest average_income in each region.
counties_selected <- counties %>%
select(region, state, county, population, income)
counties_selected %>%
group_by(region, state) %>%
# Calculate average income
summarize(average_income = mean(income)) %>%
# Find the lowest income state in each region
slice_min(average_income, n = 1)
In this chapter, you’ve learned to use six dplyr verbs related to aggregation: count(), group_by(), summarize(), ungroup(), slice_max(), and slice_min(). In this exercise, you’ll combine them to answer a question:
In how many states do more people live in metro areas than non-metro areas?
Recall that the metro column has one of the two values "Metro" (for high-density city areas) or "Nonmetro" (for suburban and country areas).
counties_selected <- counties %>%
select(state, metro, population)
For each combination of state and metro, find the total population as total_pop.
Extract the most populated row from each state, which will be either Metro or Nonmetro.
Ungroup, then count how often Metro or Nonmetro appears to see how many states have more people living in those areas.
counties_selected <- counties %>%
select(state, metro, population)
counties_selected %>%
# Find the total population for each combination of state and metro
group_by(state, metro) %>%
summarize(total_pop = sum(population))
counties_selected %>%
# Find the total population for each combination of state and metro
group_by(state, metro) %>%
summarize(total_pop = sum(population)) %>%
# Extract the most populated row for each state
slice_max(total_pop, n = 1)
counties_selected %>%
# Find the total population for each combination of state and metro
group_by(state, metro) %>%
summarize(total_pop = sum(population)) %>%
# Extract the most populated row for each state
slice_max(total_pop, n = 1) %>%
# Count the states with more people in Metro or Nonmetro areas
ungroup() %>%
count(metro)
Learn advanced methods to select and transform columns. Also learn about select helpers, which are functions that specify criteria for columns you want to choose, as well as the rename and transmute verbs.
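As a quick preview (a minimal sketch, not one of the exercises), select helpers such as contains() choose columns by a pattern in their names; contains() also appears in the final exercise of this chapter:
counties %>%
  select(state, county, contains("work"))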
Using the select() verb, we can answer interesting questions about our dataset by focusing in on related groups of variables. The colon (:) is useful for getting many columns at a time.
Use glimpse() to examine all the variables in the counties table.
Select the columns state, county, population, and (using a colon) all five of those industry-related variables; there are five consecutive variables in the table related to the industry of people’s work: professional, service, office, construction, and production.
Arrange the table in descending order of service to find which counties have the highest rates of working in the service industry.
# Glimpse the counties table
glimpse(counties)
counties %>%
# Select state, county, population, and industry-related columns
select(state, county, population, professional:production) %>%
# Arrange service in descending order
arrange(desc(service))
In the video you learned about the select helper starts_with(). Another select helper is ends_with(), which finds the columns that end with a particular string.
Select the columns state, county, population, and all those that end with work.
Filter for the counties that have at least 50% of people engaged in public work.
counties %>%
# Select the state, county, population, and those ending with "work"
select(state, county, population, ends_with("work")) %>%
# Filter for counties that have at least 50% of people engaged in public work
filter(public_work >= 50)
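For comparison, here is a minimal sketch of starts_with(), the helper mentioned above (this assumes the counties table has columns whose names begin with "income", such as the income column used earlier):
counties %>%
  select(state, county, starts_with("income"))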
The rename() verb is often useful for changing the name of a column that comes out of another verb, such as count(). In this exercise, you’ll rename the default n column generated from count() to something more descriptive.
Use count() to determine how many counties are in each state.
Notice the n column in the output; use rename() to rename that to num_counties.
counties %>%
# Count the number of counties in each state
count(state)
counties %>%
# Count the number of counties in each state
count(state) %>%
# Rename the n column to num_counties
rename(num_counties = n)
rename() isn’t the only way you can choose a new name for a column; you can also choose a name as part of a select().
Select the columns state, county, and poverty from the counties dataset; in the same step, rename the poverty column to poverty_rate.
counties %>%
# Select state, county, and poverty as poverty_rate
select(state, county, poverty_rate = poverty)
Which of the following verbs would you use to calculate new columns while dropping other columns?
As you learned in the video, the transmute
verb allows
you to control which variables you keep, which variables you calculate,
and which variables you drop.
Keep only the state, county, and population columns, and add a new column, density, that contains the population per land_area.
Filter for counties with a population greater than one million.
Sort the table in ascending order of density.
counties %>%
# Keep the state, county, and population columns, and add a density column
transmute(state, county, population, density = population / land_area) %>%
# Filter for counties with a population greater than one million
filter(population > 1000000) %>%
# Sort density in ascending order
arrange(density)
We’ve learned a number of new verbs in this chapter that you can use to modify and change the variables you have. Here is how the four verbs compare:
select(): must mention all the columns you keep; doesn’t allow you to calculate or change values.
rename(): leaves the columns you don’t mention alone; doesn’t allow you to calculate or change values.
transmute(): must mention all the columns you keep; allows you to calculate or change values.
mutate(): leaves the columns you don’t mention alone; allows you to calculate or change values.
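A minimal sketch of the last two rows of that comparison, reusing the density calculation from the previous exercise (not a new exercise step):
# mutate() keeps every existing column and adds density
counties %>%
  mutate(density = population / land_area)
# transmute() keeps only the columns you name
counties %>%
  transmute(state, county, density = population / land_area)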
In this chapter you’ve learned about the four verbs: select, mutate, transmute, and rename. Here, you’ll choose the appropriate verb for each situation. You won’t need to change anything inside the parentheses.
Choose the right verb to rename the unemployment column to unemployment_rate.
Choose the right verb to keep only the columns state, county, and the ones containing poverty.
Calculate a new column called fraction_women with the fraction of the population made up of women, without dropping any other columns.
Keep only three columns: the state, county, and employed / population, which you’ll call employment_rate.
# Change the name of the unemployment column
counties %>%
rename(unemployment_rate = unemployment)
# Keep the state and county columns, and the columns containing poverty
counties %>%
select(state, county, contains("poverty"))
# Calculate the fraction_women column without dropping the other columns
counties %>%
mutate(fraction_women = women / population)
# Keep only the state, county, and employment_rate columns
counties %>%
transmute(state, county, employment_rate = employed / population)
Work with a new dataset that represents the names of babies born in the United States each year. Learn how to use grouped mutates and window functions to ask and answer more complex questions about your data. And use a combination of dplyr and ggplot2 to make interesting graphs to further explore your data.
The dplyr
verbs you’ve learned are useful for exploring
data. For instance, you could find out the most common names in a
particular year.
# Load the babynames data
babynames <- readRDS("babynames.rds")
babynames %>%
# Filter for the year 1990
filter(year == 1990) %>%
# Sort the number column in descending order
arrange(desc(number))
You saw that you could use filter() and arrange() to find the most common names in one year. However, you could also use group_by() and slice_max() to find the most common name in every year.
Use group_by() and slice_max() to find the most common name for US babies in each year.
babynames %>%
# Find the most common name in each year
group_by(year) %>%
slice_max(number, n = 1)
The dplyr package is very useful for exploring data, but it’s especially useful when combined with other tidyverse packages like ggplot2.
Filter for only the names Steven, Thomas, and Matthew, and assign the result to an object called selected_names.
Visualize those names as a line plot over time, using a different color for each name.
selected_names <- babynames %>%
# Filter for the names Steven, Thomas, and Matthew
filter(name %in% c("Steven", "Thomas", "Matthew"))
selected_names <- babynames %>%
# Filter for the names Steven, Thomas, and Matthew
filter(name %in% c("Steven", "Thomas", "Matthew"))
# Plot the names using a different color for each name
ggplot(selected_names, aes(x = year, y = number, color = name)) +
geom_line()
In an earlier video, you learned how to filter for a particular name to determine the frequency of that name over time. Now, you’re going to explore which year each name was the most common.
To do this, you’ll be combining the grouped mutate approach with slice_max().
First, calculate the total number of people born in each year as year_total.
Next, use year_total to calculate the fraction of people born in each year that have each name.
Then use your new fraction column, in combination with slice_max(), to identify the year each name was most common.
# Calculate the fraction of people born each year with the same name
babynames %>%
group_by(year) %>%
mutate(year_total = sum(number)) %>%
ungroup() %>%
mutate(fraction = number / year_total)
# Calculate the fraction of people born each year with the same name
babynames %>%
group_by(year) %>%
mutate(year_total = sum(number)) %>%
ungroup() %>%
mutate(fraction = number / year_total) %>%
# Find the year each name is most common
group_by(name) %>%
slice_max(fraction, n = 1)
In the video, you learned how you could group by the year and use mutate() to add a total for that year.
In these exercises, you’ll learn to normalize by a different, but also interesting metric: you’ll divide each name by the maximum for that name. This means that every name will peak at 1.
Once you add new columns, the result will still be grouped by name. This splits it into 48,000 groups, which actually makes later steps like mutate() slower.
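A minimal sketch of that point (not part of the exercise): dplyr’s n_groups() shows how many groups remain after a grouped mutate, and how ungroup() drops them.
babynames %>%
  group_by(name) %>%
  mutate(name_total = sum(number)) %>%
  n_groups()   # one group per distinct name (about 48,000, per the note above)
babynames %>%
  group_by(name) %>%
  mutate(name_total = sum(number)) %>%
  ungroup() %>%
  n_groups()   # 1: the grouping has been dropped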
Add two columns for each name: name_total, the sum of the number of babies born with that name in the entire dataset, and name_max, the maximum number of babies born with that name in any year.
Ungroup the table.
Add a column called fraction_max, containing the number in the year divided by name_max.
babynames %>%
# Add columns name_total and name_max for each name
group_by(name) %>%
mutate(name_total = sum(number),
name_max = max(number))
babynames %>%
# Add columns name_total and name_max for each name
group_by(name) %>%
mutate(name_total = sum(number),
name_max = max(number)) %>%
# Ungroup the table
ungroup() %>%
# Add the fraction_max column containing the number by the name maximum
mutate(fraction_max = number / name_max)
You picked a few names and calculated each of them as a fraction of their peak. This is a type of “normalizing” a name, where you’re focused on the relative change within each name rather than the overall popularity of the name.
In this exercise, you’ll visualize the normalized popularity of each name. Your work from the previous exercise, names_normalized, has been provided for you.
names_normalized <- babynames %>%
group_by(name) %>%
mutate(name_total = sum(number),
name_max = max(number)) %>%
ungroup() %>%
mutate(fraction_max = number / name_max)
Filter the names_normalized table to limit it to the three names Steven, Thomas, and Matthew.
Visualize fraction_max over time, colored by name.
names_normalized <- babynames %>%
group_by(name) %>%
mutate(name_total = sum(number),
name_max = max(number)) %>%
ungroup() %>%
mutate(fraction_max = number / name_max)
names_filtered <- names_normalized %>%
# Filter for the names Steven, Thomas, and Matthew
filter(name %in% c("Steven", "Thomas", "Matthew"))
# Visualize these names over time
ggplot(names_filtered, aes(x = year, y = fraction_max, color = name)) +
geom_line()
In the video, you learned how to find the difference in the frequency of a baby name between consecutive years. What if instead of finding the difference, you wanted to find the ratio?
You’ll start with the babynames_fraction data, so that you can consider the popularity of each name within each year.
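Before the exercise, a minimal sketch (not from the course) of how dplyr’s lag() works: it shifts a vector down by one position, so dividing each value by its lag compares it to the previous value.
x <- c(2, 4, 8, 8)
lag(x)       # NA 2 4 8
x / lag(x)   # NA 2 2 1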
Arrange the data in order of name, then year.
Group the data by name.
Add a column called ratio, containing the ratio (not difference) of fraction between each year.
babynames_fraction <- babynames %>%
group_by(year) %>%
mutate(year_total = sum(number)) %>%
ungroup() %>%
mutate(fraction = number / year_total)
babynames_fraction %>%
# Arrange the data in order of name, then year
arrange(name, year) %>%
# Group the data by name
group_by(name) %>%
# Add a ratio column that contains the ratio of fraction between each year
mutate(ratio = fraction / lag(fraction))
Previously, you added a ratio column, containing the ratio of the frequency of a baby name between consecutive years, to describe changes in the popularity of a name. Now, you’ll look at a subset of that data, called babynames_ratios_filtered, to look further into the names that experienced the biggest jumps in popularity between consecutive years.
babynames_ratios_filtered <- babynames_fraction %>%
arrange(name, year) %>%
group_by(name) %>%
mutate(ratio = fraction / lag(fraction)) %>%
filter(fraction >= 0.00001)
From each name in the data, extract the row with the largest ratio; note the data is already grouped by name.
Sort the ratio column in descending order.
Filter the babynames_ratios_filtered data further, keeping only rows where fraction is greater than or equal to 0.001.
babynames_ratios_filtered <- babynames_fraction %>%
arrange(name, year) %>%
group_by(name) %>%
mutate(ratio = fraction / lag(fraction)) %>%
filter(fraction >= 0.00001)
babynames_ratios_filtered %>%
# Extract the largest ratio from each name
slice_max(ratio, n = 1) %>%
# Sort the ratio column in descending order
arrange(desc(ratio)) %>%
# Filter for fractions greater than or equal to 0.001
filter(fraction >= 0.001)
Congratulations! You’ve made it to the end of the course.
In this course, you’ve learned how to transform data with the dplyr package. You reviewed the select, filter, mutate, and arrange verbs for transforming data. You mastered the count verb, and learned the powerful group_by and summarize pattern for aggregating data.
You learned four verbs you can use for adding, removing, and changing variables in a table: select, rename, transmute, and mutate.
You also applied many of these tools to a new babynames dataset, and gained experience with grouped mutates and window functions.
You’ll find all of these skills valuable in your own projects, as well as in countless other DataCamp courses, including more intermediate and advanced dplyr courses, and courses that combine dplyr with other Tidyverse packages.
Congratulations again, and best of luck using your new-found dplyr skills in your own analyses!