| Feature | Purpose |
|---|---|
| Bold formatting | Emphasizes key ideas |
| Inline code | Distinguishes technical syntax |
| Columns | Improves visual organization |
| Callout blocks | Highlights important concepts |
| Footnotes | Adds clarification without interrupting flow |
M02 – Introduction to Data Science in R: Reflection
1 Step 2: Preparing the HTML Report in Quarto
1.1 Prompt
Prepare your report (HTML file) using Quarto. When you prepare the Quarto report, implement as many features and effects of the HTML format as you can, including: text formatting, headings, lists, blockquote, panel tabset or tables, cross-referencing, footnote, callout blocks, table of contents (toc-depth/toc-expand), section numbering, self-contained HTML, HTML theme options, hiding code, code overflow, and code tools.
1.2 Response
For Step 3, I am going to write my reflection as a polished Quarto HTML report (not a Word-style essay). Based on the instructor’s Quarto lecture and the advanced codebook, my plan is to organize each essay prompt as its own numbered section (Heading 1), and then clearly show the prompt and my response underneath it (Heading 2). I will use Quarto’s HTML tools so the document is easy to read, professional, and reproducible.
The sections below demonstrate how I will intentionally apply Quarto’s HTML features while responding to the Step 3 essay prompts:
Implementation Plan for HTML Features
I am intentionally using hierarchical headings (H1 for major prompts and H2 for prompts/responses) so Quarto automatically builds a structured Table of Contents. This makes the report easier to navigate and mirrors how professional reports and academic papers are organized.
Table of contents + toc-depth / toc-expand
I enabled the Table of Contents in the YAML header, adjusted the depth level, and expanded key subsections so the reader can navigate the report easily.
Section numbering
I will use Heading 1 for each essay prompt so Quarto automatically numbers each major section in the HTML. This makes the report feel structured and “report-like” rather than a single long page.
HTML theme options
I will use an HTML theme in the YAML so the report looks clean and readable. (I will stick with a standard theme like cosmo unless instructed otherwise.)
Self-contained HTML
I will include embed-resources: true in the YAML so the HTML file is fully self-contained. That way, when I upload it, it won’t require a separate folder of images/scripts and won’t appear broken.
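A minimal sketch of what the YAML header for this plan could look like. The option names are standard Quarto HTML options, but the specific values (for example `toc-depth: 3` and `toc-expand: 2`) are illustrative assumptions, not settings confirmed for this report. The code options at the end correspond to the hiding code, code overflow, and code tools features listed in the prompt.

```yaml
---
title: "M02 – Introduction to Data Science in R: Reflection"
format:
  html:
    toc: true              # build the Table of Contents from the headings
    toc-depth: 3           # assumed depth: include headings down to level 3
    toc-expand: 2          # assumed: expand the first two levels by default
    number-sections: true  # number each heading automatically
    theme: cosmo           # standard theme named in the plan
    embed-resources: true  # produce a single self-contained HTML file
    code-fold: true        # hide code behind a "Show code" toggle
    code-overflow: wrap    # wrap long code lines instead of scrolling
    code-tools: true       # add the document-level code menu
---
```

Keeping every option under `format: html` means the settings apply only to the HTML output and can be adjusted in one place before rendering.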
1.3 Writing and Formatting Features Demonstrated
This section explains how I intentionally applied Quarto formatting tools to improve clarity, structure, and professionalism.
1.3.1 Text Formatting
Inside my responses, I use:
- Bold for key terms such as reproducibility, literate programming, and tidyverse
- Inline code formatting for commands, file types, and options like quarto publish, .qmd, and embed-resources
This improves readability and helps technical terms stand out clearly.
1.3.2 Lists
When describing workflows (such as reproducible research steps or tidyverse pipelines), I use:
- Numbered lists for ordered processes
- Bullet lists for conceptual groupings
This keeps analytical logic easy to follow and prevents long blocks of dense text.
1.3.3 Blockquote for Key Takeaway
“Reproducibility is not just about getting the right answer; it is about making the entire analytical process transparent, repeatable, and trustworthy.” [1]
1.4 Organization features that make the HTML easier to read
Panel tabset or tables
I will use at least one tabset to group related ideas without making the page feel too long. For example, the R vs. Python section (see Section 2.6) is organized into three tabs: Practitioner View, Academic View, and My Take.
This allows readers to compare perspectives without scrolling through one long block of text.
I will also use at least one table to summarize concepts (for example, tidyverse packages by purpose or a simple R vs Python comparison). Tables will be created in Visual mode so they render cleanly.
Reproducibility features I will demonstrate in Step 3
Hiding code + code tools
When I include short code examples (tidyverse pipelines), I will use code folding so the HTML stays readable. The reader can choose to “show code” when they want it. Code tools in the HTML will allow toggling code visibility.
Code overflow
I enabled code wrapping so long lines will not break the HTML layout. This ensures code remains readable even when lines are wide.
library(dplyr)  # provides %>%, group_by(), and summarise()

mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_hp = mean(hp),
    mean_wt = mean(wt)
  )

# A tibble: 3 × 4
    cyl mean_mpg mean_hp mean_wt
  <dbl>    <dbl>   <dbl>   <dbl>
1     4     26.7    82.6    2.29
2     6     19.7   122.     3.12
3     8     15.1   209.     4.00
Cross-referencing
#| label: tbl-summary
#| tbl-cap: "Example Summary Table"
#| message: false
#| warning: false
mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_hp = mean(hp),
    mean_wt = mean(wt),
    .groups = "drop"
  ) %>%
  knitr::kable(digits = 2)

| cyl | mean_mpg | mean_hp | mean_wt |
|---|---|---|---|
| 4 | 26.66 | 82.64 | 2.29 |
| 6 | 19.74 | 122.29 | 3.12 |
| 8 | 15.10 | 209.21 | 4.00 |
As shown in Table 1, Quarto automatically numbers and links tables.
Footnotes
Quarto supports literate programming, which helps address the reproducibility crisis in research [2].
Callout blocks
Using embed-resources: true ensures the HTML file is fully self-contained for submission.
Key reminders from the lecture:
- Why embed-resources: true matters for submission
- Why consistent daily practice improves coding fluency
- Why Quarto reduces manual copy/paste errors and improves reproducibility
This approach will allow me to answer each Step 3 prompt clearly while also demonstrating that I can use Quarto features intentionally (not randomly). The final HTML should look like a professional report: easy to navigate, correctly structured, and reproducible.
2 Step 3: Essay Responses
2.1 Reproducible Data Science and Literate Coding
2.1.1 Prompt
Summarize new things you learned in this module about reproducible data science and literate coding.
2.1.2 Response
This module fundamentally changed how I think about data analysis. Before this course, I viewed coding primarily as a tool for producing results. What I learned is that reproducible data science is not just about getting the right output. It is about creating a transparent workflow where data, code, interpretation, and final output are integrated into a single, repeatable document.
The concept of literate coding stood out to me. Rather than separating analysis from reporting, Quarto allows narrative and executable code to exist side by side. This eliminates manual copying and pasting of tables, charts, and statistics into Word or PowerPoint, a workflow that the reproducibility crisis discussion highlighted as vulnerable to human error. I now understand that even small formatting decisions, such as hiding setup code, embedding resources, or structuring sections clearly, contribute to long-term credibility and clarity.
Most importantly, reproducibility protects both accuracy and memory. If I revisit a project months later, I do not need to remember what I clicked or which menu I used. The code documents the process. That shift, from “results-focused” to “process-focused,” is one of the most valuable lessons from this module.
2.2 Instructor’s Lecture About Quarto
2.2.1 Prompt
Summarize the instructor’s lecture about Quarto that you found impressive. Do you see yourself using Quarto every day for your job or study, like taking notes in your class or preparing a presentation? More eye-catching effects will be introduced in later modules.
2.2.2 Response
What impressed me most about the instructor’s lecture was the emphasis on consistency and habit-building when learning to code. The analogy comparing coding to learning a new language made the learning process feel realistic and achievable. Rather than cramming long sessions, the instructor emphasized short, consistent daily engagement, even 15 minutes a day, to maintain cognitive continuity. That advice framed coding as a skill built gradually through repetition.
I also found the practical workflow guidance helpful, especially regarding embed-resources: true and avoiding broken HTML submissions. This topic mattered to me because I have run into that error in previous assignment submissions. The demonstration of publishing to Quarto Pub and GitHub Pages showed that Quarto is not simply a reporting tool but a platform capable of producing websites, presentations, and shareable analytical products.
Yes, I can see myself using Quarto regularly. I have started using it more often for class notes, outlines, and other tasks, which has made me more comfortable with its features. It provides structure for note-taking, project planning, and analytical assignments while maintaining professional formatting. The integration of narrative and code makes it ideal for study guides and documentation. As more advanced features are introduced, I see Quarto becoming a central organizational tool in my academic and professional workflow.
2.3 Reproducibility Crisis
2.3.1 Prompt
Summarize the video presentation about reproducibility crisis. What do you think about reproducible crisis in your field? How can Quarto help address the concerns you have about reproducibility in the line of your work?
2.3.2 Response
The reproducibility crisis discussion emphasized how fragile traditional analytical workflows can be. When researchers rely on menu-driven software, copy outputs into spreadsheets, screenshot charts, and manually type results into reports, the final product becomes disconnected from the actual analytical process. Even small transcription errors can undermine credibility, and years later, it may be impossible to reconstruct the original steps.
In business and analytics contexts, I see similar risks. Teams often rely heavily on spreadsheets, dashboards, and exported reports. When filters are changed or calculations adjusted without documentation, inconsistencies emerge. Over time, version control becomes difficult, and stakeholders may question which numbers are correct.
Quarto directly addresses this concern by embedding the analytical process within the report itself. The document becomes a single source of truth: if the data changes, the output updates automatically upon rendering. This significantly reduces manual error and increases transparency. In my future work, this structured workflow would help ensure that decisions are based on documented, reproducible logic rather than isolated spreadsheet outputs.
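To illustrate the "single source of truth" idea, here is a minimal base R sketch, assuming the built-in mtcars data stands in for a real dataset. The helper name `summarise_mpg` and the simulated data change are hypothetical and used only for illustration. The same code produces the summary both times, so a changed dataset yields a changed result without any manual edits.

```r
# Hypothetical helper: mean mpg by cylinder count for any data frame
summarise_mpg <- function(data) {
  aggregate(mpg ~ cyl, data = data, FUN = mean)
}

# Summary computed from the original data
original <- summarise_mpg(mtcars)

# Simulate a data update (illustrative only): drop the 4-cylinder cars
updated_data <- subset(mtcars, cyl != 4)

# Re-running the same code reflects the change; nothing is typed by hand
updated <- summarise_mpg(updated_data)

print(original)
print(updated)
```

In a Quarto report, this recomputation happens every time the document is rendered, which is why the numbers in the text cannot drift away from the data.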
2.4 Tidyverse Ecosystem
2.4.1 Prompt
Summarize what you learned in this module about the Tidyverse ecosystem in R. What can you do with R?
2.4.2 Response
The Tidyverse ecosystem introduced a consistent and readable approach to data analysis. Rather than memorizing disconnected functions, Tidyverse feels intuitive because its functions read like verbs (e.g., filter(), select(), mutate(), summarise()) [3] connected through pipes, so the workflow reads like a sentence. This structure makes analytical workflows easier to read, debug, and explain.
#| label: tidy-verb-demo
#| echo: false
#| message: false
#| warning: false
library(tidyverse)
mtcars |>
  as_tibble(rownames = "model") |>
  filter(cyl == 4) |>
  select(model, mpg, hp, wt) |>
  mutate(hp_per_wt = hp / wt) |>
  summarise(
    n = n(),
    avg_mpg = mean(mpg),
    avg_hp_per_wt = mean(hp_per_wt)
  ) |>
  knitr::kable(
    digits = 2,
    caption = "Summary statistics for 4-cylinder vehicles"
  )

| n | avg_mpg | avg_hp_per_wt |
|---|---|---|
| 11 | 26.66 | 37.93 |
Feature used: Footnotes + inline code formatting.
How to do it (in Source mode):
1. Type a footnote marker where you want it to appear, like [^1].
2. Scroll to the bottom of the section (or bottom of the document) and define it like this:
[^1]: Your footnote text here.
flowchart LR
  A[Import] --> B[Tidy] --> C[Transform] --> D[Visualize] --> E[Model]
I also learned that R extends far beyond statistical testing. R can support the full analytical lifecycle: importing data, cleaning and transforming it, visualizing patterns, modeling outcomes, and communicating results through reproducible reports. The integration between tidyverse tools and Quarto reinforces the idea that analysis and communication are not separate tasks.
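The lifecycle described above can be sketched in a few lines. This is a minimal illustration in base R only, so it runs without extra packages. The built-in mtcars data stands in for an imported file, and the variable names `cars` and `fit` are hypothetical. In a tidyverse workflow the same stages would typically use functions such as readr::read_csv(), dplyr::mutate(), and ggplot2::ggplot().

```r
# Import: mtcars ships with R and stands in for a real import step
cars <- mtcars

# Tidy: move the row names into a proper column
cars$model <- rownames(cars)
rownames(cars) <- NULL

# Transform: add a power-to-weight variable
cars$hp_per_wt <- cars$hp / cars$wt

# Visualize: scatter plot of weight against fuel economy
plot(cars$wt, cars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# Model: simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = cars)
print(coef(fit))
```

Placing these steps in one Quarto chunk keeps the whole path from raw data to model in the rendered report, which is the communication stage of the lifecycle.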
Through this module, I now see R as a professional analytical environment capable of producing not just results, but structured, shareable insight.
2.5 Career-Enhancing Packages
2.5.1 Prompt
Refer to the video introducing two 20 packages. Among the various packages introduced, choose two packages beyond the Tidyverse ecosystem, base R, and Quarto that could enhance your job/career. Tell me why you think so.
2.5.2 Response
From the “20 R Packages” presentation, two packages that stood out to me are tidymodels and data.table. I chose these because they strengthen two things that matter in real work: (1) credible modeling workflows and (2) efficiency with larger datasets.
- tidymodels appeals to me because it treats modeling as a process (preprocessing → training → tuning → validation), which supports transparency and repeatability in analytics work.
- data.table stands out because performance becomes a real constraint in practice, and fast data manipulation matters when datasets scale.
knitr::kable(
  tibble::tibble(
    package = c("tidymodels", "data.table"),
    strength = c("Structured modeling workflows", "High-performance data manipulation"),
    why_it_matters = c(
      "Improves credibility and repeatability for business decisions",
      "Helps scale analysis when data gets large"
    )
  ),
  caption = "Two packages beyond Tidyverse/base R/Quarto that could strengthen my career toolkit"
)

| package | strength | why_it_matters |
|---|---|---|
| tidymodels | Structured modeling workflows | Improves credibility and repeatability for business decisions |
| data.table | High-performance data manipulation | Helps scale analysis when data gets large |
2.6 R vs. Python Debate
2.6.1 Prompt
Refer to the debate on R vs. Python. What is your take on this debate?
2.6.2 Response
From the practitioner’s perspective, the debate between R and Python centers on applicability and job market demand. The practitioner video emphasized that Python dominates in software engineering, automation, and production machine learning systems. However, R remains strong in statistical modeling, visualization, and exploratory data analysis. The key message was not that one language is superior, but that each serves different purposes within data workflows.
The University of Michigan professor emphasized reproducible data science. R has historically excelled in literate programming through R Markdown and now Quarto. Its ecosystem was built around statistical rigor and academic transparency. Python, while extremely powerful, required additional tooling to reach the same level of integrated reproducible reporting. The academic argument focused on clarity, documentation, and reproducibility as central pillars of research integrity.
The R vs. Python debate ultimately appears less about superiority and more about context and application. Modern analytical environments increasingly emphasize interoperability, allowing multiple languages within the same workflow. Rather than viewing the decision as either/or, it is more productive to understand the strengths each tool brings to different stages of analysis.
For my current academic and analytical goals, R provides a strong foundation in data manipulation, statistical modeling, visualization, and reproducible communication, particularly through tidyverse and Quarto. At the same time, I recognize Python’s value in broader technical ecosystems, including automation, software development, and production-level machine learning systems.
My perspective is not to choose sides, but to intentionally build complementary skills that expand flexibility and long-term adaptability in future roles.
Footnotes
1. The reproducibility crisis refers to the inability of researchers to recreate published findings due to undocumented workflows or manual data handling.
2. The reproducibility crisis refers to the inability of researchers to recreate published findings due to undocumented workflows or manual data handling.
3. Tidyverse verb quick glossary (with examples):
   - filter() keeps rows that match a condition (think “filter down the cases”). Example: keep 4-cylinder cars → mtcars |> dplyr::filter(cyl == 4)
   - select() keeps (or reorders) columns (think “select variables”). Example: keep only mpg + hp → mtcars |> dplyr::select(mpg, hp)
   - mutate() creates or transforms columns (think “make a new variable”). Example: add power-to-weight → mtcars |> dplyr::mutate(hp_per_wt = hp / wt)
   - summarise() collapses many rows into summary statistics (think “summarize the dataset”). Example: average mpg → mtcars |> dplyr::summarise(avg_mpg = mean(mpg))