M06-Reflection Essay-Advanced Data Wrangling

Author

Juan De La Cruz

Published

February 16, 2026

1 Step 1 – Impressions of Revealjs

In Step 1, you learned how to design stunning presentation slides with Quarto. What’s your impression of the Revealjs presentation? Describe its capabilities as you learned from the presentations. What are its strengths and weaknesses compared with PPT?

After watching the Step 1 lecture, my impression of Revealjs is that it offers a modern and flexible approach to building presentations, especially for work that involves data and analysis. What stood out to me is how Revealjs allows you to create slides directly from a Quarto document, which keeps everything—writing, formatting, code, and visuals—in one place. Instead of switching between tools or copying screenshots into PowerPoint, Revealjs lets you generate a full presentation from the same source file that contains your analysis. This makes the workflow more efficient and reproducible.

The lecture also demonstrated how Revealjs supports features such as slide transitions, incremental bullet points, embedded images, and even live code or outputs. These capabilities make the presentation feel more dynamic and interactive. I also liked how simple it is to structure slides using Markdown headings, which keeps the focus on content rather than design. Compared with PowerPoint, Revealjs is much stronger for technical presentations because it integrates naturally with R and Quarto. However, PowerPoint is still more intuitive for quick, non‑technical presentations and for audiences who prefer a familiar format. Overall, Revealjs feels like a powerful tool for presenting analytical work because it keeps the presentation closely connected to the underlying code and data.

2 Step 2 – Revealjs Presentation Reflection

In Step 2, you were asked to create a Revealjs presentation. Provide a link to the presentation you published on the web. Add at least one reproducible chart or table in the presentation. Briefly describe what the presentation is about. What did you learn from the experience of the Revealjs presentation?

2.2 Reproducible Chart Included

The presentation includes a reproducible bar chart created with ggplot2 that visualizes the sample characteristics of three audience segments for Rising Stars Muay Thai. The chart is generated directly from an R code chunk, ensuring that the visualization updates automatically if the underlying data changes.


library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)

sample_data <- tibble(
  segment = c("Combat Fans", "Martial Artists", "Families"),
  count = c(120, 85, 60)
)

sample_data
# A tibble: 3 × 2
  segment         count
  <chr>           <dbl>
1 Combat Fans       120
2 Martial Artists    85
3 Families           60
ggplot(sample_data, aes(x = segment, y = count)) +
  geom_col(fill = "firebrick") +
  labs(
    title = "Sample Size by Audience Segment",
    x = "Segment",
    y = "Count"
  )
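
To make the "updates automatically" point concrete, the plotting code can be wrapped in a small helper and called on revised data. This is only a sketch: `plot_segments()` and the revised count are hypothetical and are not part of the published presentation.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: builds the same bar chart from any segment/count table
plot_segments <- function(data) {
  ggplot(data, aes(x = segment, y = count)) +
    geom_col(fill = "firebrick") +
    labs(
      title = "Sample Size by Audience Segment",
      x = "Segment",
      y = "Count"
    )
}

sample_data <- tibble(
  segment = c("Combat Fans", "Martial Artists", "Families"),
  count = c(120, 85, 60)
)

# Illustrative revision (made-up number): ten more respondents in one segment
updated_data <- sample_data |>
  mutate(count = if_else(segment == "Families", count + 10, count))

p_original <- plot_segments(sample_data)
p_updated <- plot_segments(updated_data)  # same code, chart reflects the new data
```

Because the chart is built from the data frame rather than pasted in as an image, re-rendering the document is enough to refresh the slide.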

2.3 What the Presentation Is About

The presentation introduces my MSDM Culminating Experience Project focused on Rising Stars Muay Thai, a Sacramento‑based Muay Thai promotion. It outlines the organization’s current strengths, the gaps in its digital infrastructure, and the goals of the analytics project. The slides walk through the introduction, literature review, methods, and sample characteristics, setting the foundation for a data‑driven marketing strategy.

2.4 What I Learned from the Revealjs Experience

Working with Revealjs taught me how powerful it is to build presentations directly from code. I learned how to structure slides using Markdown, how slide separators work, and how to embed reproducible tables and charts that render automatically. I also realized how sensitive Revealjs is to formatting—especially YAML indentation, heading levels, and the difference between Visual and Source mode in RStudio. Overall, the experience helped me understand how to create dynamic, reproducible presentations that stay connected to the underlying data and analysis.
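
The formatting points above (YAML indentation, heading levels, slide separators) can be illustrated with a minimal deck skeleton. The sketch below writes one to a temporary file from R. The title and slide text are placeholders rather than my actual presentation, and the file name `deck-sketch.qmd` is only an assumption for illustration.

```r
# Minimal sketch: write a small Revealjs deck skeleton to a temporary file.
# All titles and slide text are placeholders.
deck_lines <- c(
  "---",
  "title: \"Placeholder Title\"",
  "format:",
  "  revealjs:",            # nested options are indented under format
  "    incremental: true",  # reveal bullet points one at a time
  "    transition: slide",  # animate the change between slides
  "---",
  "",
  "## First Slide",         # each level-2 heading starts a new slide
  "",
  "- Point one",
  "- Point two",
  "",
  "---",                    # a horizontal rule starts a slide with no title
  "",
  "Content on an untitled slide"
)

deck_path <- file.path(tempdir(), "deck-sketch.qmd")
writeLines(deck_lines, deck_path)

# Rendering needs the Quarto CLI, e.g. from a terminal: quarto render deck-sketch.qmd
```

If the indentation under `format:` is off, the nested options are no longer read as Revealjs settings, which is the kind of sensitivity I ran into.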

3 Step 3 – Advanced Data Wrangling Functions


My advanced wrangling lecture in Step 3 introduced new concepts, including joining relational databases. Pick three different functions, each from a different category. Describe when to use them and give one coding example along with an output for each function. To clarify, you are expected to do literate programming to weave your narratives with the code.

3.1 1. filter() – Selecting Rows Based on Conditions

The filter() function is used when I want to keep only the rows that meet specific criteria. It is especially helpful when narrowing a dataset to a meaningful subset before further analysis. For example, if I want to focus on cars with high fuel efficiency in the mtcars dataset, I can filter based on the mpg variable.

library(dplyr)

# Keep only cars with fuel efficiency above 25 mpg
high_mpg <- mtcars |>
  filter(mpg > 25)

high_mpg
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
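
Conditions can also be combined. As a small sketch (the extra `gear == 5` condition is my own illustration, not something from the lecture), separating conditions with a comma keeps only the rows where all of them hold:

```r
library(dplyr)

# Comma-separated conditions are combined with AND:
# keep cars above 25 mpg that also have five forward gears
high_mpg_5gear <- mtcars |>
  filter(mpg > 25, gear == 5)

rownames(high_mpg_5gear)
# Per the table above, this leaves "Porsche 914-2" and "Lotus Europa"
```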

3.2 2. mutate() – Creating or Transforming Columns

The mutate() function is used when I need to add new variables or transform existing ones. It is useful for creating calculated fields that support deeper analysis. For example, I can create a new variable that converts weight from thousands of pounds (wt) into actual pounds.

# wt is recorded in thousands of pounds; multiply by 1000 to get pounds
mtcars_mutated <- mtcars |>
  mutate(weight_lbs = wt * 1000)

head(mtcars_mutated[, c("wt", "weight_lbs")])
                     wt weight_lbs
Mazda RX4         2.620       2620
Mazda RX4 Wag     2.875       2875
Datsun 710        2.320       2320
Hornet 4 Drive    3.215       3215
Hornet Sportabout 3.440       3440
Valiant           3.460       3460
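
A single mutate() call can also create several columns, and later columns can build on earlier ones. The sketch below is an illustrative extension of the example above. The `weight_kg` column and the name `mtcars_weights` are my own additions.

```r
library(dplyr)

mtcars_weights <- mtcars |>
  mutate(
    weight_lbs = wt * 1000,
    # weight_lbs was created on the line above and can be reused immediately
    weight_kg = weight_lbs * 0.45359237  # 1 lb = 0.45359237 kg
  )

head(mtcars_weights[, c("wt", "weight_lbs", "weight_kg")], 3)
```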

3.3 3. left_join() – Merging Relational Tables

The left_join() function is used when I want to merge two datasets while keeping all rows from the left table. This is especially useful when working with relational data, where information is stored across multiple tables. In the example below, I build a tibble of model names and mpg values from mtcars and join a small table of car categories to it. Models without a matching category receive NA in the type column.

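# Left table: one row per car model
# (note: this name shadows the built-in `cars` dataset for the session)
cars <- tibble(
  model = rownames(mtcars),
  mpg = mtcars$mpg
)

# Right table: categories for only two of the models
categories <- tibble(
  model = c("Mazda RX4", "Datsun 710"),
  type = c("Sport", "Compact")
)

# Keep every row of cars; add type where the model matches
cars_joined <- cars |>
  left_join(categories, by = "model")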

cars_joined
# A tibble: 32 × 3
   model               mpg type   
   <chr>             <dbl> <chr>  
 1 Mazda RX4          21   Sport  
 2 Mazda RX4 Wag      21   <NA>   
 3 Datsun 710         22.8 Compact
 4 Hornet 4 Drive     21.4 <NA>   
 5 Hornet Sportabout  18.7 <NA>   
 6 Valiant            18.1 <NA>   
 7 Duster 360         14.3 <NA>   
 8 Merc 240D          24.4 <NA>   
 9 Merc 230           22.8 <NA>   
10 Merc 280           19.2 <NA>   
# ℹ 22 more rows
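
To see what "keeping all rows from the left table" means in practice, it helps to compare left_join() with two related dplyr joins. This is a sketch using the same two tables. The comparison itself is my own addition, not part of the assignment output.

```r
library(dplyr)

cars <- tibble(model = rownames(mtcars), mpg = mtcars$mpg)
categories <- tibble(
  model = c("Mazda RX4", "Datsun 710"),
  type = c("Sport", "Compact")
)

# left_join() keeps all 32 models; unmatched ones get NA in type
cars_joined <- left_join(cars, categories, by = "model")
sum(is.na(cars_joined$type))  # 30 models have no category

# inner_join() keeps only models present in both tables
matched <- inner_join(cars, categories, by = "model")
nrow(matched)  # 2 rows

# anti_join() lists the left-table models that found no match
unmatched <- anti_join(cars, categories, by = "model")
nrow(unmatched)  # 30 rows
```

Checking the unmatched rows this way is a quick sanity test before relying on the joined column in later analysis.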