M06-Reflection Essay-Advanced Data Wrangling

Author

Savannah Ventic

Published

March 15, 2026

1 Essay Prompts:

1.1 Tidyverse Tricks

Note

Prompt: Summarize what you learned about Tidyverse tricks presented in Step 1 by focusing on the new things you learned that were not dealt with in M05.

Response: In Step 1, I watched David Robinson’s video about ten useful tricks in the Tidyverse. One thing I noticed right away was how quickly he could move through the data and transform it by chaining multiple functions. Compared with what I learned in M05, the examples in the video showed a more advanced way of using the tidyverse tools. In M05, I learned basic functions such as filter(), select(), mutate(), and summarize() to clean and organize data. In this video, I saw how these functions can be used together to solve problems efficiently. The speaker used the pipe operator to connect many steps, which made the code easier to read. Another new thing I learned was how data scientists explore data while they clean it. This illustrated that data wrangling is not only about cleaning data, but also about understanding it. Overall, the video helped me understand what can be done when the tools are combined and used efficiently.
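To make the idea of chaining concrete, here is a minimal sketch of the four M05 verbs connected with the pipe operator. It is not code from the video: the built-in mtcars dataset, the horsepower cutoff, and the column name wt_kg are assumptions chosen only for illustration.

```r
library(dplyr)

# mtcars ships with R; it stands in for the video's data (an assumption).
mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                      # keep cars with more than 100 horsepower
  select(mpg, cyl, wt) %>%                  # keep only the columns needed
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # wt is in units of 1000 lbs; convert to kg
  group_by(cyl) %>%                         # one summary row per cylinder count
  summarize(
    mean_mpg   = mean(mpg),
    mean_wt_kg = mean(wt_kg),
    n          = n()
  )

mpg_by_cyl
```

Each line does one job, so the pipeline can be read from top to bottom as a sequence of steps.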

1.2 David Robinson’s Performance

Note

Prompt: What do you think about David Robinson’s impromptu screencast performance in Step 2? How much were you able to understand?

Response: In Step 2, I watched David Robinson’s live screencast where he analyzed European energy data using the Tidyverse in R. His performance was impressive because he was able to quickly clean the data, explore it, and create visualizations. It was interesting to see how he combined different tidyverse functions to transform the data and generate insights. I could understand some parts of the code, especially the parts that used tools we learned in class, such as plotting with ggplot2. However, some of the video moved very fast and included tools that I do not fully understand yet, so it was challenging to follow every single step. Even so, the video was still helpful because it showed how an experienced data scientist uses the tidyverse.

1.3 Initial Cleaning and Further Cleaning

Note

Prompt: In Step 3, you saw two videos, one showing initial data cleaning and the other showing further cleaning. Describe the two processes. What are the big differences between the two? How do the differences result in the differences in final sample size?

Response: In Step 3, the first video showed the initial data cleaning process, which focused on organizing the raw survey data and removing problems such as missing values or incorrect responses. This step helped prepare the data for further analysis. The second video showed further cleaning, where stricter rules were applied to filter the data more carefully. The main difference is that the first step focuses on basic cleaning, while the second step focuses on improving data quality. Because the second step applies stricter conditions on top of the first, more responses are removed, which results in a smaller final sample size.
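The effect on sample size can be sketched with a small example. The survey data below is made up, and the specific rules (a 60-second minimum duration and an attention check) are hypothetical stand-ins, not the rules used in the videos.

```r
library(dplyr)

# Hypothetical survey responses; column names and thresholds are illustrative.
survey <- data.frame(
  id              = 1:6,
  age             = c(25, NA, 41, 33, 19, 52),
  duration_sec    = c(300, 280, 45, 310, 290, 30),
  attention_check = c("pass", "pass", "pass", "fail", "pass", "pass")
)

# Initial cleaning: drop responses with a missing value.
initial <- survey %>%
  filter(!is.na(age))

# Further cleaning: stricter quality rules applied on top of the initial step.
further <- initial %>%
  filter(duration_sec >= 60, attention_check == "pass")

# Compare the sample size at each stage.
c(raw = nrow(survey), initial = nrow(initial), further = nrow(further))
```

In this toy data the sample shrinks from 6 raw responses to 5 after initial cleaning and to 2 after further cleaning, because every added condition can only remove rows.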

1.4 Revealjs Presentation

Tip

Prompt: In Step 4, you learned how to create a Revealjs presentation. What’s your impression of the Revealjs presentation? Describe its capabilities as you learned from the video. What are its strengths and weaknesses compared with PPT?

Response: In Step 4, I learned how to create presentation slides using Revealjs with Quarto. Revealjs allows users to make slides using code and publish them as web presentations. It can include features like navigation, code blocks, and visualizations, which are useful for data science presentations. One strength is that it works well with R and allows code and results to be shown in the slides. However, compared with PowerPoint, it can be harder for beginners because it requires coding knowledge.

1.5 Creating Presentation

Note

Prompt: In Step 5, you were asked to create a Revealjs presentation. Provide a link to the presentation you published on the web. Briefly describe what the presentation is about. What did you learn from the experience of the Revealjs presentation?

Response: In Step 5, we practiced creating a Revealjs presentation using Quarto. The presentation was designed to introduce a portfolio website project and explain its main sections, such as the home page, project pages, and dashboards. While working on the slides, I learned how Revealjs allows presentations to be created using code instead of using Google Slides or PowerPoint. I also learned how slides can include features such as navigation, sections, and formatted content. This activity helped me understand how Quarto can be used to create professional presentations for various projects.

1.6 Literate Programming

Note

Prompt: My advanced wrangling lecture in Step 6 introduced new concepts not covered in M05, such as joining relational databases. Pick three of them, describe when to use them, and give one coding example along with an output for each function. To clarify, you are expected to do literate programming by weaving your narratives with the code.

Response: In Step 6, I learned several new data wrangling tools that were not covered in M05. Three useful ones were left_join(), across(), and case_when(). left_join() is used when I want to combine two datasets and keep all rows from the first table. across() is useful for applying the same function to multiple columns at the same time. case_when() is helpful when creating a new variable whose value depends on several conditions. These functions make data cleaning easier.

1.6.1 left_join()


library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
df1 <- data.frame(id = c(1, 2, 3), name = c("A", "B", "C"))
df2 <- data.frame(id = c(1, 2, 4), score = c(90, 85, 88))

left_join(df1, df2, by = "id")
  id name score
1  1    A    90
2  2    B    85
3  3    C    NA

1.6.2 across()

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

df %>% summarize(across(everything(), mean))
  a b
1 2 5

1.6.3 case_when()

df <- data.frame(score = c(95, 80, 60))

df %>% mutate(level = case_when(
  score >= 90 ~ "High",
  score >= 70 ~ "Medium",
  TRUE ~ "Low"
))
  score  level
1    95   High
2    80 Medium
3    60    Low

1.7 Wrangling Reflection

Note

Prompt: Data wrangling is like laying a foundation/basement for a building, while visualization is building a structure above the basement. Working in a basement is not a glamorous job at all. No one notices it; people will see what they can see above the ground and judge it, but without the hard work you did in the basement, the building will never stand well. Besides, wrangling is hard. You learned dplyr and tidyr in this module, but there is still more to learn in M07 and M08. What did you like the most about working with the tools you learned in this module? Elaborate on your point.

Response: What I liked most about working with data wrangling tools in this module is that they help organize overwhelming data into a cleaner format. Tidyverse packages such as dplyr and tidyr make it easier to transform and summarize data. I also liked how the pipe operator allows multiple steps to be connected in a clear way. These tools make the data preparation process more efficient and help prepare the data for analysis and visualization.

1.8 Challenges

Note

Prompt: What seems to be the challenge, if any, for you when you try to master the wrangling tools? How may you be able to overcome the challenge? Let me share a little bit about my learning journey. The first time I saw David Robinson’s performance, I was so impressed that I paused the video and typed his code. It took all day to type the code. That experience was very helpful, but I realized that I could shorten my learning curve by focusing on the basics. I couldn’t understand a lot of the tools he was using. So, I tried to study those packages in Tidyverse and watch his video once in a while to check how much I improved. I found this method very effective. Thus, I believe that you will be ready to absorb online resources like YouTube much more effectively once you finish this program. Hope this experience helps you.

Response: One challenge when learning data wrangling is that there are many functions, and it can be difficult to remember how each one works. Sometimes the code can be confusing when trying to combine multiple steps. To overcome this challenge, I think practicing regularly and reviewing examples will help. Running the code step by step can also make it easier to understand how each of the functions works.
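One way to run code step by step is to save the result of each verb and inspect it before adding the next one. The sketch below uses a made-up scores table, and the passing cutoff of 70 is an assumption for illustration only.

```r
library(dplyr)

# Made-up data for illustration.
scores <- data.frame(
  student = c("A", "B", "C", "D"),
  score   = c(95, 80, 60, NA)
)

# Step 1: run only the first verb and look at the result.
step1 <- scores %>% filter(!is.na(score))
print(step1)

# Step 2: add one more verb and check the new column.
step2 <- step1 %>% mutate(passed = score >= 70)
print(step2)

# Step 3: summarize once the earlier steps look right.
step3 <- step2 %>% summarize(pass_rate = mean(passed))
print(step3)

# The same steps written as one pipeline give the same result.
full <- scores %>%
  filter(!is.na(score)) %>%
  mutate(passed = score >= 70) %>%
  summarize(pass_rate = mean(passed))

all.equal(step3, full)
```

Once every intermediate result looks right, the steps can be joined back into a single pipeline with confidence.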