Your task here is to Create an Example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset. (25 points)

Later, you’ll be asked to extend an existing vignette. Using one of your classmate’s examples (as created above), you’ll then extend his or her example with additional annotated code. (15 points)

You should clone the provided repository. Once you have code to submit, you should make a pull request on the shared repository. You should also update the README.md file with your example.

After you’ve created your vignette, please submit your GitHub handle name in the submission link provided below. This will let your instructor know that your work is ready to be peer-graded.

You should complete your submission on the schedule stated in the course syllabus.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The data I used for this assignment comes from FiveThirtyEight and covers alcohol consumption by country in 2010. To load the data I used read.csv, since the file is a CSV hosted on GitHub. Note that read.csv is a base R function (from the utils package); the tidyverse equivalent is read_csv from the readr package, which returns a tibble. The glimpse function, available through the tibble and dplyr packages in the tidyverse, gives a compact view of the data set, including the number of rows and columns. This data set has 193 rows and 5 columns.

Article: https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/

Data: https://github.com/fivethirtyeight/data/tree/master/alcohol-consumption

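# Read the CSV directly from the FiveThirtyEight GitHub repository
DF1 <- read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/alcohol-consumption/drinks.csv')
# Show the dimensions, column types, and first few values of each column
glimpse(DF1)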

## Rows: 193
## Columns: 5
## $ country                      <chr> "Afghanistan", "Albania", "Algeria", "And…
## $ beer_servings                <int> 0, 89, 25, 245, 217, 102, 193, 21, 261, 2…
## $ spirit_servings              <int> 0, 132, 0, 138, 57, 128, 25, 179, 72, 75,…
## $ wine_servings                <int> 0, 54, 14, 312, 45, 45, 221, 11, 212, 191…
## $ total_litres_of_pure_alcohol <dbl> 0.0, 4.9, 0.7, 12.4, 5.9, 4.9, 8.3, 3.8, …
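For comparison, here is a minimal sketch of the readr route. It assumes readr 2.0 or later, where wrapping a string in I() makes read_csv treat it as literal data. The three rows are copied from the glimpse output above so the example runs without a network connection, and the names csv_text and drinks_sample are only illustrative.

```r
library(readr)

# Three rows copied from the output above, passed as literal CSV text via I()
csv_text <- I("country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
Afghanistan,0,0,0,0.0
Albania,89,132,54,4.9
Algeria,25,0,14,0.7
")

# read_csv returns a tibble and guesses column types from the data
drinks_sample <- read_csv(csv_text, show_col_types = FALSE)

drinks_sample
```

Unlike read.csv, read_csv guesses whole-number columns as double rather than integer, so glimpse would report the serving columns as <dbl>. Passing the GitHub URL in place of csv_text should load the full file the same way.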

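From the tidyr package in the tidyverse, the pivot_longer function reshapes the data frame by collapsing the four numeric columns into two columns: one holding the original column name and one holding its value. The result is a longer data frame with one row per country and measure.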

##       country      alcohol_type_of_serving serving_amount
## 1 Afghanistan                beer_servings              0
## 2 Afghanistan              spirit_servings              0
## 3 Afghanistan                wine_servings              0
## 4 Afghanistan total_litres_of_pure_alcohol              0
## 5     Albania                beer_servings             89
## 6     Albania              spirit_servings            132
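The call that produced the output above is not shown, so the following is only a sketch of one pivot_longer call that yields the same shape. The output column names are taken from the printed result; the three input rows are copied from the earlier glimpse output, and the names drinks_sample and drinks_long are illustrative.

```r
library(tibble)
library(tidyr)

# Three rows copied from the glimpse output earlier in this vignette
drinks_sample <- tibble(
  country = c("Afghanistan", "Albania", "Algeria"),
  beer_servings = c(0L, 89L, 25L),
  spirit_servings = c(0L, 132L, 0L),
  wine_servings = c(0L, 54L, 14L),
  total_litres_of_pure_alcohol = c(0.0, 4.9, 0.7)
)

# Collapse every column except country into a name column and a value column
drinks_long <- pivot_longer(
  drinks_sample,
  cols = -country,
  names_to = "alcohol_type_of_serving",
  values_to = "serving_amount"
)

head(drinks_long)
```

Because the integer serving columns are combined with the double total_litres_of_pure_alcohol column, serving_amount ends up as a double. Note also that the long column mixes two units (servings and litres), which matters when filtering or plotting on it.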

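drop_na is a function from tidyr that drops rows containing missing values. It does not change column types; any conversion to floating-point values comes from the reshaping step, where integer and double columns are combined into one.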

##       country      alcohol_type_of_serving serving_amount
## 1 Afghanistan                beer_servings              0
## 2 Afghanistan              spirit_servings              0
## 3 Afghanistan                wine_servings              0
## 4 Afghanistan total_litres_of_pure_alcohol              0
## 5     Albania                beer_servings             89
## 6     Albania              spirit_servings            132
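The output above looks unchanged because none of the first six rows has a missing value. To make the effect of drop_na visible, here is a small sketch; the NA is a made-up gap added only for illustration and does not come from the real data set, and the object names are hypothetical.

```r
library(tibble)
library(tidyr)

# Long-format rows; the NA is invented purely to demonstrate drop_na
drinks_long <- tibble(
  country = c("Albania", "Albania", "Albania"),
  alcohol_type_of_serving = c("beer_servings", "spirit_servings", "wine_servings"),
  serving_amount = c(89, NA, 54)
)

# Drop every row that has a missing value in any column
drinks_complete <- drop_na(drinks_long)

# Only the row count changes; column types stay the same
nrow(drinks_long)
nrow(drinks_complete)
```

drop_na also accepts column names, for example drop_na(drinks_long, serving_amount), to check only those columns for missing values.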

The ggplot function, from the ggplot2 package in the tidyverse, is used to plot data sets. I also used the filter function from dplyr to keep only the rows whose serving amount is greater than or equal to 347.

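# DF1 here is assumed to be the long-format data frame from the pivot_longer and drop_na steps
# Keep only rows with a serving amount of at least 347
DF1 <- filter(DF1, serving_amount >= 347)
# Dodged bar chart: one bar per alcohol type within each country
ggplot(DF1, aes(x = country, y = serving_amount, fill = alcohol_type_of_serving)) +
  geom_col(position = "dodge")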
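The filter-then-plot step can also be tried on a few rows without downloading the file. This is a sketch under stated assumptions: the six values are copied from the glimpse output for Albania and Algeria, the threshold of 50 is chosen only so that this tiny sample keeps some rows (the vignette itself uses 347), and the names drinks_long, demo_threshold, drinks_high, and p are illustrative.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Long-format sample built from values shown in the glimpse output
drinks_long <- tibble(
  country = rep(c("Albania", "Algeria"), each = 3),
  alcohol_type_of_serving = rep(
    c("beer_servings", "spirit_servings", "wine_servings"),
    times = 2
  ),
  serving_amount = c(89, 132, 54, 25, 0, 14)
)

# Illustrative threshold for this small sample; the vignette uses 347
demo_threshold <- 50

# Keep only the rows at or above the threshold
drinks_high <- filter(drinks_long, serving_amount >= demo_threshold)

# Build the dodged bar chart and add readable axis and legend labels
p <- ggplot(
  drinks_high,
  aes(x = country, y = serving_amount, fill = alcohol_type_of_serving)
) +
  geom_col(position = "dodge") +
  labs(x = "Country", y = "Serving amount", fill = "Alcohol type")

p
```

Storing the filtered rows under a new name, rather than overwriting the original object, keeps the full data available if a different threshold is tried later.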

Conclusion

The tidyverse is a collection of packages that is handy for manipulating and transforming data. I used read.csv (from base R) to read the data from a CSV file into a data frame, glimpse to view the data, pivot_longer to reshape it, drop_na to remove rows with missing values, filter to subset the rows, and ggplot to visualize the data I was exploring.

Assignment_tidyverse

Andreina A

2024-11-02
