M05-Visualizing Data with Tables for MSDM CEP

Author

Seunghee Im

Published

March 4, 2025

1 Write a short essay on the following

Summarize what you learned from the videos you watched.

The video demonstrates how to use the gt package in R to create well-formatted tables from the palmerpenguins dataset. The key takeaways are:

1. Basic Table Creation

  • Load the gt and palmerpenguins packages.

  • Pipe data into gt() to create a basic table.

  • Use tab_header() to add a title and subtitle.

2. Using Markdown for Formatting

  • Wrap text with md() to apply Markdown formatting.

  • Use backticks for a code-like font and asterisks for bold or italic text.

3. Summarizing Data before Tabulation

  • Use the tidyverse (dplyr, tidyr) to group data (e.g., by species).

  • Apply summarize() to calculate means (e.g., bill length, body mass).

4. Improving Table Appearance

  • rowname_col(): Moves categorical labels to the left column for better readability.

  • cols_label(): Renames columns and adds units (e.g., “Bill Length (mm)”).

  • fmt_number(): Formats numeric values, controls decimal places, and adds digit separators.

  • Scaling values: the scale_by argument of fmt_number() converts grams to kilograms.

5. Adjusting Column Widths

  • Uses cols_width() to set consistent column sizes.

  • Adjusts stub (first column) width separately.

6. Adding Footnotes and Source Notes

  • tab_source_note(): Adds general notes (e.g., dataset source).

  • tab_footnote(): Links specific notes to data points (e.g., “Gentoo is the largest species”).

7. Styling Tables with Colors

  • tab_style(): Highlights rows or specific cells with cell_fill() (background color) and cell_text() (text color).

  • Uses color names like steelblue or HEX codes for styling.

8. Final Touches

  • tab_stubhead(): Adds a label to the top-left corner (e.g., “Penguin Species”).

  • opt_table_font(): Changes the font (e.g., Google Fonts like Montserrat).

  • opt_footnote_marks(): Customizes footnote markers.

Key benefits of using gt: easy declarative syntax for styling tables, seamless integration with Markdown and HTML, powerful footnote management, advanced customization with colors and widths, and auto-formatting of numbers with scaling options.
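The steps above can be combined into one pipeline. The following is a minimal sketch rather than the exact code from the video: the object names (penguin_summary, penguin_table), the column width, the colors, and the footnote wording are assumptions chosen for illustration.

```r
library(gt)
library(dplyr)
library(palmerpenguins)

# Summarize mean bill length and body mass per species
penguin_summary <- penguins %>%
  group_by(species) %>%
  summarize(
    bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
    body_mass_g = mean(body_mass_g, na.rm = TRUE),
    .groups = "drop"
  )

penguin_table <- penguin_summary %>%
  gt(rowname_col = "species") %>%              # species labels move to the stub
  tab_header(
    title = md("**Palmer Penguins**"),
    subtitle = md("Mean measurements by `species`")
  ) %>%
  tab_stubhead(label = "Penguin Species") %>%
  cols_label(
    bill_length_mm = "Bill Length (mm)",
    body_mass_g = "Body Mass (kg)"
  ) %>%
  fmt_number(columns = bill_length_mm, decimals = 1) %>%
  # scale_by converts grams to kilograms at display time only
  fmt_number(columns = body_mass_g, decimals = 2, scale_by = 1 / 1000) %>%
  cols_width(everything() ~ px(160)) %>%       # assumed width
  tab_source_note(source_note = "Source: palmerpenguins R package.") %>%
  tab_footnote(
    footnote = "Gentoo is the largest species.",
    locations = cells_stub(rows = "Gentoo")
  ) %>%
  tab_style(
    style = list(cell_fill(color = "steelblue"), cell_text(color = "white")),
    locations = cells_body(rows = "Gentoo")
  ) %>%
  opt_footnote_marks(marks = "letters")

penguin_table
```

Because scale_by only changes how the value is displayed, the underlying data stays in grams, so later calculations are unaffected.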

What did you like about the gt and gtExtras packages demonstrated in the videos?

I like the gt and gtExtras packages because they produce well-structured, visually appealing tables with minimal effort. gt keeps tables simple and readable through intuitive syntax and strong default formatting, while still allowing advanced customization through functions like tab_style(), cols_label(), and tab_footnote(). It supports row grouping, which makes it easy to structure data, and its data-driven styling allows conditional formatting, such as highlighting negative values. The gtExtras package extends gt with pre-built themes (e.g., 538, NYT, Guardian) and embedded visual elements like sparklines, bar charts, and density plots. Both packages offer thoughtful defaults but remain flexible enough for full customization, and their integration with the tidyverse makes it easy to summarize and reshape data before formatting. Overall, gt and gtExtras simplify table creation while offering extensive customization and visualization options, making them well suited to high-quality data presentation.
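As an illustration of the conditional formatting mentioned above, here is a minimal sketch that highlights negative values. The data frame (changes), its values, and the color are hypothetical and are not taken from the videos.

```r
library(gt)
library(tibble)

# Hypothetical data: percentage change per segment
changes <- tibble(
  segment = c("A", "B", "C"),
  change_pct = c(4.2, -1.5, 0.8)
)

changes_table <- changes %>%
  gt() %>%
  fmt_number(columns = change_pct, decimals = 1) %>%
  # Style only the cells whose value is negative
  tab_style(
    style = cell_text(color = "firebrick", weight = "bold"),
    locations = cells_body(columns = change_pct, rows = change_pct < 0)
  )

changes_table
```

The rows argument accepts a logical expression evaluated against the table data, so the styling follows the data if the values change.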

How do tables complement charts for data visualization?

Tables complement charts for data visualization by providing precise, structured data alongside visual representations. The gt and gtExtras packages demonstrate this by integrating icons, distributions, and formatting enhancements within tables. In the video, gt_plt_summary() automatically generates a table with key statistics like means, medians, and missing values, offering a quick snapshot of the dataset. Inserting visual elements, such as distribution plots within table cells, combines numeric precision with visual cues. Grouping data (e.g., by the number of cylinders in the mtcars dataset) helps structure insights, while themes (e.g., the Guardian theme) make tables visually appealing and easier to read. Overall, tables complement charts by balancing numerical accuracy with visual clarity, making data more accessible and informative.
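The ideas in this answer can be sketched with mtcars. This assumes the function the video calls "GT plot summary" is gtExtras::gt_plt_summary(); the object names and the choice of mpg as the plotted variable are illustrative, not taken from the video.

```r
library(gt)
library(gtExtras)
library(dplyr)

# One-call overview: column types, inline distributions, missing values, mean, median, SD
summary_table <- gt_plt_summary(mtcars, title = "mtcars overview")

# Per-cylinder table with an inline density plot of mpg in each row
dist_table <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    mean_mpg = mean(mpg),
    mpg_data = list(mpg),   # list-column holding the raw values for each plot
    .groups = "drop"
  ) %>%
  gt() %>%
  fmt_number(columns = mean_mpg, decimals = 1) %>%
  gt_plt_dist(mpg_data, type = "density") %>%
  gt_theme_guardian()

dist_table
```

Each row then shows both the exact mean and the shape of the distribution it came from, which is the table-plus-chart combination described above.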

Under what circumstances would you prefer to use tables over charts when visualizing data?

I would prefer to use a table over a chart when exact numerical values are important, such as in financial reports, scientific data, or statistical summaries where precision matters. Tables are ideal for comparing small datasets and presenting multiple categories of data, allowing for structured viewing and quick cross-referencing. When dealing with mixed data types (e.g., product lists with prices, availability, and ratings), tables accommodate diverse formats more effectively. They are also useful for data auditing and verification, where reviewing raw data for accuracy, anomalies, or missing values is necessary. If the dataset lacks strong trends or patterns that benefit from visualization, tables offer a more straightforward way to present the numbers. Additionally, for printed reports, regulatory filings, and academic papers, tables provide a structured, easy-to-reference format. In interactive dashboards, they allow users to sort, filter, and search specific values for deeper exploration. While charts are great for spotting trends, tables ensure that data remains precise, detailed, and accessible, making them the preferred choice when accuracy and direct comparison are the priority.

2 Use the data for your MSDM CEP. Pick some variables relevant to your Analytics Objectives and share some descriptive statistics using gt and gtExtras together. This table can be used in your presentation in IBM 6800. In this table, incorporate the following at a minimum.

Column Hiding & Moving


# Load necessary libraries
library(gt)
library(gtExtras)
library(dplyr)
library(tibble)
library(scales)

# Create a dataset
data <- tibble::tibble(
  Category = c(
    "SaaS Market", "SaaS Market", "Workforce", "Workforce", "Revenue", "Revenue", "Competitors"
  ),
  Metric = c(
    "Number of SaaS Companies (Global)", 
    "Number of SaaS Companies (US)",
    "Product Managers (LinkedIn Identified, 2023)", 
    "Total Product Managers (Global, 2023)",
    "Annual Sales (2023, USD Billion)", 
    "Projected Annual Sales (2032, USD Billion)",
    "Dominant Players"
  ),
  # Mixing numbers and text makes c() coerce every value to character
  Value = c(
    4326, 1985, 698909, 2494346, 7.36, 15.4, 
    "Aha!, ProdPad, ProductBoard, ProductPlan, Roadmunk, Jira Product Discovery, Craft.io, Airfocus"
  )
)

# Create a GT table with column hiding & moving
data %>%
  gt() %>%
  tab_header(title = "Productfolio SaaS Market Statistics") %>%
  cols_label(
    Category = "Market Segment",
    Metric = "Metric",
    Value = "Value"
  ) %>%
  cols_hide(columns = c(Category)) %>%  # Hide the "Category" column
  cols_move(
    columns = c(Value),  # Move the "Value" column
    after = Metric  # Place it after the "Metric" column
  ) %>%
  gt_theme_538()  # Apply a theme from gtExtras
Productfolio SaaS Market Statistics

| Metric | Value |
|---|---|
| Number of SaaS Companies (Global) | 4326 |
| Number of SaaS Companies (US) | 1985 |
| Product Managers (LinkedIn Identified, 2023) | 698909 |
| Total Product Managers (Global, 2023) | 2494346 |
| Annual Sales (2023, USD Billion) | 7.36 |
| Projected Annual Sales (2032, USD Billion) | 15.4 |
| Dominant Players | Aha!, ProdPad, ProductBoard, ProductPlan, Roadmunk, Jira Product Discovery, Craft.io, Airfocus |
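Because the Value column above is stored as character, gt cannot apply number formatting to it. One possible alternative, sketched here under the assumption that the text-only "Dominant Players" row is shown separately, keeps the numeric metrics in their own numeric column so fmt_number() can add digit separators. The object names numeric_metrics and metrics_table are hypothetical.

```r
library(gt)
library(tibble)
library(dplyr)

# Numeric metrics only, using the values from the table above
numeric_metrics <- tibble(
  Metric = c(
    "Number of SaaS Companies (Global)",
    "Number of SaaS Companies (US)",
    "Product Managers (LinkedIn Identified, 2023)",
    "Total Product Managers (Global, 2023)",
    "Annual Sales (2023, USD Billion)",
    "Projected Annual Sales (2032, USD Billion)"
  ),
  Value = c(4326, 1985, 698909, 2494346, 7.36, 15.4)
)

metrics_table <- numeric_metrics %>%
  gt() %>%
  tab_header(title = "Productfolio SaaS Market Statistics") %>%
  # Counts: no decimals, with thousands separators
  fmt_number(columns = Value, rows = Value >= 1000, decimals = 0, use_seps = TRUE) %>%
  # Dollar amounts in billions: two decimals
  fmt_number(columns = Value, rows = Value < 1000, decimals = 2) %>%
  gt_theme_538 <- NULL
```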

Code with Title, Subtitle, Column Merging, Relabeling, and Text Transformation

# Load necessary libraries
library(gt)
library(gtExtras)
library(dplyr)
library(tibble)
library(scales)

# Create a dataset
data <- tibble::tibble(
  Category = c(
    "SaaS Market", "SaaS Market", "Workforce", "Workforce", "Revenue", "Revenue", "Competitors"
  ),
  Metric = c(
    "Number of SaaS Companies (Global)", 
    "Number of SaaS Companies (US)",
    "Product Managers (LinkedIn Identified, 2023)", 
    "Total Product Managers (Global, 2023)",
    "Annual Sales (2023, USD Billion)", 
    "Projected Annual Sales (2032, USD Billion)",
    "Dominant Players"
  ),
  Growth_Rate = c("5% YoY", "3% YoY", "7% YoY", "6% YoY", "10% CAGR", "12% CAGR", "N/A")
)

# Create a GT table with text transformation
tab <- data %>%
  gt() %>%
  tab_header(
    title = md("**Productfolio SaaS Market Statistics**"),
    subtitle = md("*Key growth metrics and projections for the SaaS industry.*")
  ) %>%
  text_transform(
    locations = cells_body(columns = Growth_Rate),
    fn = function(x) {
      # Extract the number before the percent sign ("" when there is no "%")
      growth_value <- substr(x, 1, regexpr("%", x) - 1)
      
      # Determine growth type using `dplyr::case_when()`
      growth_type <- dplyr::case_when(
        grepl("YoY", x) ~ "Year-over-Year Growth",
        grepl("CAGR", x) ~ "Compound Annual Growth Rate",
        TRUE ~ "Not Available"
      )
      
      # Format with HTML styling; rows without a percentage show only the label
      ifelse(
        grepl("%", x),
        paste0(growth_value, "%<br><em>", growth_type, "</em>"),
        paste0("<em>", growth_type, "</em>")
      )
    }
  ) %>%
  cols_label(
    Category = "Market Segment",
    Metric = "Industry Metric",
    Growth_Rate = "Growth Rate"
  ) %>%
  gt_theme_538()  # Apply a polished theme from gtExtras

# Show the table
tab
Productfolio SaaS Market Statistics
Key growth metrics and projections for the SaaS industry.

| Market Segment | Industry Metric | Growth Rate |
|---|---|---|
| SaaS Market | Number of SaaS Companies (Global) | 5% *Year-over-Year Growth* |
| SaaS Market | Number of SaaS Companies (US) | 3% *Year-over-Year Growth* |
| Workforce | Product Managers (LinkedIn Identified, 2023) | 7% *Year-over-Year Growth* |
| Workforce | Total Product Managers (Global, 2023) | 6% *Year-over-Year Growth* |
| Revenue | Annual Sales (2023, USD Billion) | 10% *Compound Annual Growth Rate* |
| Revenue | Projected Annual Sales (2032, USD Billion) | 12% *Compound Annual Growth Rate* |
| Competitors | Dominant Players | *Not Available* |
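The section title mentions column merging, which the chunk above does not use. A minimal sketch of cols_merge() on two rows of the same data follows; the merge pattern, the combined column label, and the object names are assumptions for illustration.

```r
library(gt)
library(tibble)

# Two rows taken from the dataset above
growth <- tibble(
  Metric = c(
    "Number of SaaS Companies (Global)",
    "Annual Sales (2023, USD Billion)"
  ),
  Growth_Rate = c("5% YoY", "10% CAGR")
)

merged_table <- growth %>%
  gt() %>%
  # {1} and {2} refer to the first and second column listed in `columns`;
  # the second column is hidden after merging
  cols_merge(
    columns = c(Metric, Growth_Rate),
    pattern = "{1} ({2})"
  ) %>%
  cols_label(Metric = "Industry Metric (Growth Rate)")

merged_table
```

Merging happens at render time, so both original columns remain available in the underlying data.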

Continued from the Previous Table: Code with Footnotes and Source Note

# Load necessary libraries
library(gt)
library(gtExtras)
library(dplyr)
library(tibble)
library(scales)

# Create a dataset
data <- tibble::tibble(
  Category = c(
    "SaaS Market", "SaaS Market", "Workforce", "Workforce", "Revenue", "Revenue", "Competitors"
  ),
  Metric = c(
    "Number of SaaS Companies (Global)", 
    "Number of SaaS Companies (US)",
    "Product Managers (LinkedIn Identified, 2023)", 
    "Total Product Managers (Global, 2023)",
    "Annual Sales (2023, USD Billion)", 
    "Projected Annual Sales (2032, USD Billion)",
    "Dominant Players"
  ),
  Value = c(
    4326, 1985, 698909, 2494346, 7.36, 15.4, 
    "Aha!, ProdPad, ProductBoard, ProductPlan, Roadmunk, Jira Product Discovery, Craft.io, Airfocus"
  )
)

# Create a GT table with a footnote and source note
tab <- data %>%
  gt() %>%
  tab_header(
    title = md("**Productfolio SaaS Market Statistics**"),
    subtitle = md("*Key industry insights, including company count, workforce size, and revenue projections.*")
  ) %>%
  # Add a footnote explaining the revenue projection
  tab_footnote(
    footnote = "Revenue projections are based on estimated CAGR growth trends from industry reports.",
    locations = cells_body(columns = Value, rows = 6)  # Attaching footnote to "Projected Annual Sales (2032)"
  ) %>%
  # Add another footnote to define 'Dominant Players'
  tab_footnote(
    footnote = "Dominant players include major SaaS companies with significant market share in the product management space.",
    locations = cells_body(columns = Value, rows = 7)  # Attaching footnote to "Dominant Players"
  ) %>%
  # Add a source note at the bottom of the table
  tab_source_note(
    source_note = md(
      "**Sources:**  
      - Ahlgren, O., & Dalentoft, J. (n.d.). *Collecting and integrating customer feedback.*  
      - Arora, S., & Khare, P. (2024). *The Role of Machine Learning in Personalizing User Experiences in SaaS Products*, c809-c821.  
      - Faber, T. (2023, November). *Collaboration & Project Management Software in the US.* [IBISWorld](https://my-ibisworld-com.proxy.library.cpp.edu/us/en/industry-specialized/OD6191/at-a-glance)  
      - Google. *Ads Transparency.* [Google Ads](https://adstransparency.google.com/?region=US). Accessed **February 6th, 2025**.  
      - Google Analytics GA 4. *Demographic Details Report.* Google, 2025. [Analytics.Google.com](https://analytics.google.com)  
      - LinkedIn. *LinkedIn Ad Library.* Accessed **February 6th, 2025**. [LinkedIn Ad Library](https://www.linkedin.com/ad-library)  
      "
    )
  ) %>%
  cols_label(
    Category = "Market Segment",
    Metric = "Industry Metric",
    Value = "Value"
  ) %>%
  gt_theme_538()  # Apply a polished theme from gtExtras

# Show the table
tab
Productfolio SaaS Market Statistics
Key industry insights, including company count, workforce size, and revenue projections.

| Market Segment | Industry Metric | Value |
|---|---|---|
| SaaS Market | Number of SaaS Companies (Global) | 4326 |
| SaaS Market | Number of SaaS Companies (US) | 1985 |
| Workforce | Product Managers (LinkedIn Identified, 2023) | 698909 |
| Workforce | Total Product Managers (Global, 2023) | 2494346 |
| Revenue | Annual Sales (2023, USD Billion) | 7.36 |
| Revenue | Projected Annual Sales (2032, USD Billion) | 15.4¹ |
| Competitors | Dominant Players | Aha!, ProdPad, ProductBoard, ProductPlan, Roadmunk, Jira Product Discovery, Craft.io, Airfocus² |

Sources:
- Ahlgren, O., & Dalentoft, J. (n.d.). Collecting and integrating customer feedback.
- Arora, S., & Khare, P. (2024). The Role of Machine Learning in Personalizing User Experiences in SaaS Products, c809-c821.
- Faber, T. (2023, November). Collaboration & Project Management Software in the US. IBISWorld
- Google. Ads Transparency. Google Ads. Accessed February 6th, 2025.
- Google Analytics GA 4. Demographic Details Report. Google, 2025. Analytics.Google.com
- LinkedIn. LinkedIn Ad Library. Accessed February 6th, 2025. LinkedIn Ad Library

¹ Revenue projections are based on estimated CAGR growth trends from industry reports.
² Dominant players include major SaaS companies with significant market share in the product management space.
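The footnotes above are attached by row index (rows = 6 and rows = 7), which would point at the wrong cell if rows were added or reordered. One possible alternative, sketched here on a two-row subset with hypothetical object names, targets the row by a condition on the Metric column instead.

```r
library(gt)
library(tibble)

# Two revenue rows taken from the dataset above
market <- tibble(
  Metric = c(
    "Annual Sales (2023, USD Billion)",
    "Projected Annual Sales (2032, USD Billion)"
  ),
  Value = c(7.36, 15.4)
)

footnote_table <- market %>%
  gt() %>%
  tab_footnote(
    footnote = "Revenue projections are based on estimated CAGR growth trends from industry reports.",
    # Select the row by its content rather than its position
    locations = cells_body(columns = Value, rows = grepl("^Projected", Metric))
  ) %>%
  opt_footnote_marks(marks = "standard")  # symbols instead of numbers

footnote_table
```

Using symbol marks also avoids a numeric footnote mark being misread as part of a numeric value such as 15.4.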

3 Explain why the table is worth adding to MSDM project written report or presentation slides.

The table is worth adding to the MSDM project written report and presentation slides because it summarizes key industry insights in a clear, structured, and data-driven format. Because the project focuses on digital marketing optimization for Productfolio, the table provides essential market context, the competitive landscape, and revenue projections, making it a useful visual aid for decision-making and strategic planning.

3.1 Key Benefits of Including the Table

Enhances Credibility with Data-Driven Insights

The table presents factual, quantitative data about the SaaS industry, including market size, workforce distribution, and revenue forecasts.

Simplifies Complex Information for Your Audience

Instead of dense paragraphs, the table condenses essential statistics into an easily digestible format.

Strengthens Your Marketing & Industry Analysis Sections

The table supports Slide 9 (Industry Analysis) by quantifying market size and projections, aligns with Slide 10 (Customer Analysis) by illustrating workforce and market segmentation, and enhances Slide 11 (SWOT Analysis) by providing competitive landscape data.

Provides a Visual Representation of Key Data for Better Engagement

In a presentation, a well-formatted data table is visually appealing and helps maintain audience attention.