M02 – Introduction to Data Science in R: Reflection

Author

Nicole Bewley-Hudson

Published

February 15, 2026

1 Step 2: Preparing the HTML Report in Quarto

1.1 Prompt

Note

Prepare your report (HTML file) using Quarto. When you prepare the Quarto report, implement as many features and effects of the HTML format as you can, including: text formatting, headings, lists, blockquote, panel tabset or tables, cross-referencing, footnote, callout blocks, table of contents (toc-depth/toc-expand), section numbering, self-contained HTML, HTML theme options, hiding code, code overflow, and code tools.

1.2 Response

For Step 3, I am going to write my reflection as a polished Quarto HTML report (not a Word-style essay). Based on the instructor’s Quarto lecture and the advanced codebook, my plan is to organize each essay prompt as its own numbered section (Heading 1), and then clearly show the prompt and my response underneath it (Heading 2). I will use Quarto’s HTML tools so the document is easy to read, professional, and reproducible.

The sections below demonstrate how I will intentionally apply Quarto’s HTML features while responding to the Step 3 essay prompts:

Implementation Plan for HTML Features

I am intentionally using hierarchical headings (H1 for major prompts and H2 for prompts/responses) so Quarto automatically builds a structured Table of Contents. This makes the report easier to navigate and mirrors how professional reports and academic papers are organized.

Table of contents + toc-depth / toc-expand

I enabled the Table of Contents in the YAML header, adjusted the depth level, and expanded key subsections so the reader can navigate the report easily.

Section numbering

I will use Heading 1 for each essay prompt so Quarto automatically numbers each major section in the HTML. This makes the report feel structured and “report-like” rather than a single long page.

HTML theme options

I will use an HTML theme in the YAML so the report looks clean and readable. (I will stick with a standard theme like cosmo unless instructed otherwise.)

Self-contained HTML

I will include embed-resources: true in the YAML so the HTML file is fully self-contained. That way, when I upload it, it won’t require a separate folder of images/scripts and won’t appear broken.
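As a rough sketch of how the four settings above fit together, the relevant part of the YAML header could look like the fragment below. The values shown (`cosmo`, a depth of 2, and one expanded level) are the choices made for this report, not requirements.

```yaml
format:
  html:
    theme: cosmo            # built-in HTML theme
    toc: true               # show the table of contents
    toc-depth: 2            # include headings down to level 2
    toc-expand: 1           # expand the first TOC level by default
    number-sections: true   # number headings automatically
    embed-resources: true   # bundle images, CSS, and scripts into one HTML file
```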

1.3 Writing and Formatting Features Demonstrated

Note

This section explains how I intentionally applied Quarto formatting tools to improve clarity, structure, and professionalism.

1.3.1 Text Formatting

Inside my responses, I use:

- Bold for key terms such as reproducibility, literate programming, and tidyverse
- Inline code formatting for functions and file types like quarto publish, .qmd, and embed-resources

This improves readability and helps technical terms stand out clearly.

1.3.2 Lists

When describing workflows (such as reproducible research steps or tidyverse pipelines), I use:

1. Numbered lists for ordered processes
2. Bullet lists for conceptual groupings

This keeps analytical logic easy to follow and prevents long blocks of dense text.

1.3.3 Blockquote for Key Takeaway

“Reproducibility is not just about getting the right answer — it is about making the entire analytical process transparent, repeatable, and trustworthy.”¹

Formatting Techniques Applied In This Report
Feature	Purpose
Bold formatting	Emphasizes key ideas
Inline code	Distinguishes technical syntax
Columns	Improves visual organization
Callout blocks	Highlights important concepts
Footnotes	Adds clarification without interrupting flow

1.4 Organization features that make the HTML easier to read

Panel tabset or tables

I will use at least one tabset to group related ideas without making the page feel too long. For example, the R vs. Python response (see Section 2.6) is organized into three tabs: Practitioner View, Academic View, and My Take.

This allows readers to compare perspectives without scrolling through one long block of text.

I will also use at least one table to summarize concepts (for example, tidyverse packages by purpose or a simple R vs Python comparison). Tables will be created in Visual mode so they render cleanly.

Reproducibility features I will demonstrate in Step 3

Hiding code + code tools

When I include short code examples (tidyverse pipelines), I will use code folding so the HTML stays readable. The reader can choose to “show code” when they want it. Code tools in the HTML will allow toggling code visibility.

Code overflow

I enabled code wrapping so long lines will not break the HTML layout. This ensures code remains readable even when lines are wide.
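The code-related options described above can be sketched as the following YAML fragment. It assumes the options sit under `format: html` in the document header; the "Show code" label is simply the wording chosen for this report.

```yaml
format:
  html:
    code-fold: true            # collapse code chunks by default
    code-summary: "Show code"  # label shown on the fold toggle
    code-tools: true           # adds a menu to show or hide all code
    code-overflow: wrap        # wrap long lines instead of scrolling sideways
```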

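library(dplyr)  # provides %>%, group_by(), and summarise()

# Average mpg, horsepower, and weight for each cylinder count
mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_hp  = mean(hp),
    mean_wt  = mean(wt)
  )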

# A tibble: 3 × 4
    cyl mean_mpg mean_hp mean_wt
  <dbl>    <dbl>   <dbl>   <dbl>
1     4     26.7    82.6    2.29
2     6     19.7   122.     3.12
3     8     15.1   209.     4.00

Cross-referencing

#| label: tbl-summary
#| tbl-cap: "Example Summary Table"
#| message: false
#| warning: false

mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_hp  = mean(hp),
    mean_wt  = mean(wt),
    .groups = "drop"
  ) %>%
  knitr::kable(digits = 2)

Table 1

cyl	mean_mpg	mean_hp	mean_wt
4	26.66	82.64	2.29
6	19.74	122.29	3.12
8	15.10	209.21	4.00

As shown in Table 1, Quarto automatically numbers and links tables.

Footnotes

Quarto supports literate programming, which helps address the reproducibility crisis in research².

Callout blocks

Tip

Using embed-resources: true ensures the HTML file is fully self-contained for submission.

Key reminders from the lecture:

- Why embed-resources: true matters for submission
- Why consistent daily practice improves coding fluency
- Why Quarto reduces manual copy/paste errors and improves reproducibility

This approach will allow me to answer each Step 3 prompt clearly while also demonstrating that I can use Quarto features intentionally (not randomly). The final HTML should look like a professional report: easy to navigate, correctly structured, and reproducible.

2 Step 3: Essay Responses

2.1 Reproducible Data Science and Literate Coding

2.1.1 Prompt

Summarize new things you learned in this module about reproducible data science and literate coding.

2.1.2 Response

This module fundamentally changed how I think about data analysis. Before this course, I viewed coding primarily as a tool for producing results. What I learned is that reproducible data science is not just about getting the right output. It is about creating a transparent workflow where data, code, interpretation, and final output are integrated into a single, repeatable document.

The concept of literate coding stood out to me. Rather than separating analysis from reporting, Quarto allows narrative and executable code to exist side by side. This eliminates manual copying and pasting of tables, charts, and statistics into Word or PowerPoint, a workflow that the reproducibility crisis discussion highlighted as vulnerable to human error. I now understand that even small formatting decisions, such as hiding setup code, embedding resources, or structuring sections clearly, contribute to long-term credibility and clarity.

Most importantly, reproducibility protects both accuracy and memory. If I revisit a project months later, I do not need to remember what I clicked or which menu I used. The code documents the process. That shift, from “results-focused” to “process-focused,” is one of the most valuable lessons from this module.

2.2 Instructor’s Lecture About Quarto

2.2.1 Prompt

Summarize the instructor’s lecture about Quarto that you found impressive. Do you see yourself using Quarto every day for your job or study, like taking notes in your class or preparing a presentation? More eye-catching effects will be introduced in later modules.

2.2.2 Response

What impressed me most about the instructor’s lecture was the emphasis on consistency and habit-building when learning to code. The analogy comparing coding to learning a new language made the learning process feel realistic and achievable. Rather than cramming long sessions, the instructor emphasized short, consistent daily engagement, even 15 minutes a day, to maintain cognitive continuity. That advice framed coding as a skill built gradually through repetition.

I also found the practical workflow guidance helpful, especially regarding embed-resources: true and avoiding broken HTML submissions. This topic mattered to me because I have run into that error in previous assignment submissions. The demonstration of publishing to Quarto Pub and GitHub Pages showed that Quarto is not simply a reporting tool but a platform capable of producing websites, presentations, and shareable analytical products.

Yes, I can see myself using Quarto regularly. I have been trying to use it more often for class notes, outlines, and other tasks, which is helping me become more comfortable with its features. It provides structure for note-taking, project planning, and analytical assignments while maintaining professional formatting. The integration of narrative and code makes it ideal for study guides and documentation. As more advanced features are introduced, I see Quarto becoming a central organizational tool in my academic and professional workflow.

2.3 Reproducibility Crisis

2.3.1 Prompt

Summarize the video presentation about the reproducibility crisis. What do you think about the reproducibility crisis in your field? How can Quarto help address the concerns you have about reproducibility in your line of work?

2.3.2 Response

The reproducibility crisis discussion emphasized how fragile traditional analytical workflows can be. When researchers rely on menu-driven software, copy outputs into spreadsheets, screenshot charts, and manually type results into reports, the final product becomes disconnected from the actual analytical process. Even small transcription errors can undermine credibility, and years later, it may be impossible to reconstruct the original steps.

In business and analytics contexts, I see similar risks. Teams often rely heavily on spreadsheets, dashboards, and exported reports. When filters are changed or calculations adjusted without documentation, inconsistencies emerge. Over time, version control becomes difficult, and stakeholders may question which numbers are correct.

Quarto directly addresses this concern by embedding the analytical process within the report itself. The document becomes a single source of truth: if the data changes, the output updates automatically upon rendering. This significantly reduces manual error and increases transparency. In my future work, this structured workflow would help ensure that decisions are based on documented, reproducible logic rather than isolated spreadsheet outputs.
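To make the "single source of truth" idea concrete, here is a minimal base R sketch in which a sentence of the report is built from the data itself. The helper name `report_sentence` is hypothetical and used only for illustration; `mtcars` stands in for whatever dataset a real report would load.

```r
# Build a one-sentence finding directly from the data, so the text
# cannot drift out of sync with the numbers.
# `report_sentence` is a hypothetical helper, not part of Quarto or any package.
report_sentence <- function(df) {
  avg <- mean(df$mpg)
  sprintf("Across %d cars, the average fuel economy is %.1f mpg.", nrow(df), avg)
}

# Sentence generated from the original data
original <- report_sentence(mtcars)

# "Updated" data: drop the 8-cylinder cars and re-run the same code
updated <- report_sentence(mtcars[mtcars$cyl != 8, ])

print(original)
print(updated)
```

Because both sentences come from the same function, changing the data and re-rendering is enough to update the text; nothing is retyped by hand.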

2.4 Tidyverse Ecosystem

2.4.1 Prompt

Summarize what you learned in this module about the Tidyverse ecosystem in R. What can you do with R?

2.4.2 Response

The Tidyverse ecosystem introduced a consistent and readable approach to data analysis. Rather than memorizing disconnected functions, Tidyverse feels intuitive because its functions read like verbs (e.g., filter(), select(), mutate(), summarise()³) connected through pipes, so the workflow reads like a sentence. This structure makes analytical workflows easier to read, debug, and explain.

#| label: tidy-verb-demo
#| echo: false
#| message: false
#| warning: false

library(tidyverse)

mtcars |>
  as_tibble(rownames = "model") |>
  filter(cyl == 4) |>
  select(model, mpg, hp, wt) |>
  mutate(hp_per_wt = hp / wt) |>
  summarise(
    n = n(),
    avg_mpg = mean(mpg),
    avg_hp_per_wt = mean(hp_per_wt)
  ) |>
  knitr::kable(
    digits = 2,
    caption = "Summary statistics for 4-cylinder vehicles"
  )

Summary statistics for 4-cylinder vehicles
n	avg_mpg	avg_hp_per_wt
11	26.66	37.93

How I added this formatting (so you can see I learned it)

Feature used: Footnotes + inline code formatting.

How to do it (in Source mode):

1. Type a footnote marker where you want it to appear, like [^1].
2. Scroll to the bottom of the section (or bottom of the document) and define it like this:

[^1]: Your footnote text here.

Tidyverse workflow (visual overview)

flowchart LR
  A[Import] --> B[Tidy] --> C[Transform] --> D[Visualize] --> E[Model]

I also learned that R extends far beyond statistical testing. R can support the full analytical lifecycle: importing data, cleaning and transforming it, visualizing patterns, modeling outcomes, and communicating results through reproducible reports. The integration between tidyverse tools and Quarto reinforces the idea that analysis and communication are not separate tasks.
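The lifecycle described above can be sketched end to end in a few lines. This version deliberately uses base R only so it runs without extra packages; a tidyverse version would follow the same stages with different functions. The built-in `mtcars` data stands in for an imported file, and the choice of `mpg ~ wt` as the model is just an illustrative assumption.

```r
# Import: mtcars ships with R, so it stands in for reading a file
cars <- mtcars

# Tidy: move the row names into a proper column
cars$model <- rownames(cars)
rownames(cars) <- NULL

# Transform: derive a new variable (horsepower per 1000 lbs)
cars$hp_per_wt <- cars$hp / cars$wt

# Visualize: quick scatterplot of weight against fuel economy
plot(cars$wt, cars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Model: simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = cars)
coef(fit)
```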

Through this module, I now see R as a professional analytical environment capable of producing not just results, but structured, shareable insight.

2.5 Career-Enhancing Packages

2.5.1 Prompt

Refer to the video introducing 20 packages. Among the various packages introduced, choose two packages beyond the Tidyverse ecosystem, base R, and Quarto that could enhance your job/career. Tell me why you think so.

2.5.2 Response

From the “20 R Packages” presentation, two packages that stood out to me are tidymodels and data.table. I chose these because they strengthen two things that matter in real work: (1) credible modeling workflows and (2) efficiency with larger datasets.

- tidymodels appeals to me because it treats modeling as a process (preprocessing → training → tuning → validation), which supports transparency and repeatability in analytics work.
- data.table stands out because performance becomes a real constraint in practice, and fast data manipulation matters when datasets scale.
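To illustrate what "modeling as a process" means, the sketch below walks through preprocessing, training, and validation in plain base R. It is not the tidymodels API; it only mirrors the stages that tidymodels is designed to structure, and it skips the tuning step for brevity. The predictors, the 8-row holdout, and the seed are arbitrary assumptions.

```r
set.seed(42)  # fix the random split so the result is repeatable

# Preprocess: keep the outcome and two predictors
dat <- mtcars[, c("mpg", "wt", "hp")]

# Split: hold out 8 of the 32 rows for validation
test_idx <- sample(nrow(dat), size = 8)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

# Train: fit the model on the training rows only
fit <- lm(mpg ~ wt + hp, data = train)

# Validate: root mean squared error on the held-out rows
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

Keeping the split, the fit, and the error calculation in one script is what makes the modeling result repeatable and easy to audit.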

knitr::kable(
  tibble::tibble(
    package = c("tidymodels", "data.table"),
    strength = c("Structured modeling workflows", "High-performance data manipulation"),
    why_it_matters = c(
      "Improves credibility and repeatability for business decisions",
      "Helps scale analysis when data gets large"
    )
  ),
  caption = "Two packages beyond Tidyverse/base R/Quarto that could strengthen my career toolkit"
)

Two packages beyond Tidyverse/base R/Quarto that could strengthen my career toolkit
package	strength	why_it_matters
tidymodels	Structured modeling workflows	Improves credibility and repeatability for business decisions
data.table	High-performance data manipulation	Helps scale analysis when data gets large

2.6 R vs. Python Debate

2.6.1 Prompt

Refer to the debate on R vs. Python. What is your take on this debate?

2.6.2 Response

Practitioner View

From the practitioner’s perspective, the debate between R and Python centers on applicability and job market demand. The practitioner video emphasized that Python dominates in software engineering, automation, and production machine learning systems. However, R remains strong in statistical modeling, visualization, and exploratory data analysis. The key message was not that one language is superior, but that each serves different purposes within data workflows.

Academic View

The University of Michigan professor emphasized reproducible data science. R has historically excelled in literate programming through R Markdown and now Quarto. Its ecosystem was built around statistical rigor and academic transparency. Python, while extremely powerful, has required additional tooling to reach the same level of integrated reproducible reporting. The academic argument focused on clarity, documentation, and reproducibility as central pillars of research integrity.

My Take

The R vs. Python debate ultimately appears less about superiority and more about context and application. Modern analytical environments increasingly emphasize interoperability, allowing multiple languages within the same workflow. Rather than viewing the decision as either/or, it is more productive to understand the strengths each tool brings to different stages of analysis.

For my current academic and analytical goals, R provides a strong foundation in data manipulation, statistical modeling, visualization, and reproducible communication, particularly through tidyverse and Quarto. At the same time, I recognize Python’s value in broader technical ecosystems, including automation, software development, and production-level machine learning systems.

My perspective is not to choose sides, but to intentionally build complementary skills that expand flexibility and long-term adaptability in future roles.

Footnotes

1. The reproducibility crisis refers to the inability of researchers to recreate published findings due to undocumented workflows or manual data handling.
2. The reproducibility crisis refers to the inability of researchers to recreate published findings due to undocumented workflows or manual data handling.
3. Tidyverse verb quick glossary (with examples):
   - filter() keeps rows that match a condition (think “filter down the cases”). Example: keep 4-cylinder cars → mtcars |> dplyr::filter(cyl == 4)
   - select() keeps (or reorders) columns (think “select variables”). Example: keep only mpg + hp → mtcars |> dplyr::select(mpg, hp)
   - mutate() creates or transforms columns (think “make a new variable”). Example: add power-to-weight → mtcars |> dplyr::mutate(hp_per_wt = hp / wt)
   - summarise() collapses many rows into summary statistics (think “summarize the dataset”). Example: average mpg → mtcars |> dplyr::summarise(avg_mpg = mean(mpg))
