Analytics Types & Visualization

Learning Analytics — Analytics & Visualization (Required)

Author

Juliette Duthoit

Published

June 12, 2026


Learning objectives

By the end of this file, you will be able to:

  • Simulate and save an educational dataset in R
  • Apply descriptive analytics using colMeans() and rowMeans()
  • Reshape data from wide to long format using pivot_longer()
  • Create and interpret scatter plots, bar plots, line plots, and histograms
  • Compute and interpret correlation between two variables
  • Apply the analytics types (descriptive, diagnostic, predictive) to real questions

The analytics types — a reminder

Before coding, connect each technique to the type of question it answers:

| Analytics type | Question | Technique used in this file |
|---|---|---|
| Descriptive | What happened? | Summary stats, bar plot, histogram |
| Diagnostic | Why did it happen? | Scatter plot, correlation |
| Predictive | What will happen next? | Regression line, risk flagging |

Keep this table in mind as you work through the exercises below. Every output you produce should be connected to one of these questions.


Part 1 · Creating and saving a simulated dataset

Instead of loading existing data, we will create our own simulated dataset. This teaches you how data is structured in R — useful when you need to build a small dataset from scratch for testing or teaching.

Creating the dataset

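# Packages used throughout this file:
# tidyverse provides ggplot2, dplyr, tidyr, and readr; janitor provides clean_names()
library(tidyverse)
library(janitor)

# set.seed() makes the random data reproducible:
# everyone running this code gets the same values
set.seed(42)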

data_lms <- data.frame(
  Student_ID = paste("Student", 1:40, sep = "_"),
  Week_1  = sample(6:20, 40, replace = TRUE),
  Week_2  = sample(6:20, 40, replace = TRUE),
  Week_3  = sample(6:20, 40, replace = TRUE),
  Week_4  = sample(6:20, 40, replace = TRUE),
  Week_5  = sample(6:20, 40, replace = TRUE),
  Week_6  = sample(6:20, 40, replace = TRUE),
  Week_7  = sample(6:20, 40, replace = TRUE),
  Week_8  = sample(6:20, 40, replace = TRUE),
  Week_9  = sample(6:20, 40, replace = TRUE),
  Week_10 = sample(6:20, 40, replace = TRUE),
  Week_11 = sample(6:20, 40, replace = TRUE),
  Week_12 = sample(6:20, 40, replace = TRUE),
  Week_13 = sample(6:20, 40, replace = TRUE),
  Week_14 = sample(6:20, 40, replace = TRUE),
  Week_15 = sample(6:20, 40, replace = TRUE),
  Week_16 = sample(6:20, 40, replace = TRUE)
)
data_lms
# Inspect the first few rows
head(data_lms)

Question: What does sample(6:20, 40, replace = TRUE) do? What would change if you set replace = FALSE? As always, use your own words to answer the question.

  • [Each column receives 40 entries drawn at random from the integers 6 to 20 (inclusive). Since there are only 15 possible values, replace = TRUE means the same number can be drawn more than once. With replace = FALSE we could not draw 40 entries, because there are only 15 possibilities; R does not fill the gap with NA values but stops with an error saying it cannot take a sample larger than the population.]
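To check this behavior directly, here is a minimal sketch. The names draws and no_replace_result are illustrative and not part of the worksheet; tryCatch() is used only so the script keeps running after the error.

```r
set.seed(42)

# 40 draws from 15 possible values: repeats are allowed, so this works
draws <- sample(6:20, 40, replace = TRUE)
length(draws)            # 40 values
any(duplicated(draws))   # TRUE: 40 draws from only 15 values must repeat

# Without replacement, R cannot draw 40 distinct values from 15,
# so sample() raises an error; tryCatch() captures its message as text
no_replace_result <- tryCatch(
  sample(6:20, 40, replace = FALSE),
  error = function(e) conditionMessage(e)
)
no_replace_result
```

The error is a safeguard: R refuses to guess what the missing 25 values should be.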

Saving the dataset

# Save as a CSV file in your project folder
write.csv(data_lms, "40_students_LMS_time_spent.csv", row.names = FALSE)

# Confirm it saved — check your Files pane for the new file
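Besides looking in the Files pane, the save can also be confirmed from code. The sketch below is an assumption-laden illustration: toy_lms is a hypothetical three-row stand-in for data_lms, and tempfile() is used so nothing is written to your project folder.

```r
# Hypothetical toy data frame standing in for data_lms
toy_lms <- data.frame(
  Student_ID = paste("Student", 1:3, sep = "_"),
  Week_1     = c(12, 7, 19),
  Week_2     = c(9, 15, 11)
)

# Write to a temporary path instead of the project folder
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_lms, csv_path, row.names = FALSE)

file.exists(csv_path)            # TRUE if the file was written
toy_back <- read.csv(csv_path)   # read the file back in
dim(toy_back)                    # 3 rows, 3 columns
```

Reading the file back and checking its dimensions is a quick way to catch a wrong path or an accidental extra row-name column.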

Question: Why is it important to be able to create and save datasets manually, rather than only working with provided data?

  • [Sometimes you want to craft your own data for testing or debugging purposes.]

Part 2 · Descriptive analytics — what happened?

Summary statistics

# Summary of all weekly columns (excluding Student_ID column)
summary_stats <- summary(data_lms[, -1])
summary_stats
     Week_1          Week_2          Week_3          Week_4     
 Min.   : 6.00   Min.   : 6.00   Min.   : 6.00   Min.   : 6.00  
 1st Qu.: 9.00   1st Qu.: 8.00   1st Qu.: 9.75   1st Qu.:12.00  
 Median :13.00   Median :10.50   Median :12.50   Median :14.50  
 Mean   :12.32   Mean   :10.75   Mean   :12.53   Mean   :14.28  
 3rd Qu.:15.00   3rd Qu.:13.00   3rd Qu.:16.00   3rd Qu.:18.00  
 Max.   :20.00   Max.   :20.00   Max.   :19.00   Max.   :20.00  
     Week_5          Week_6          Week_7          Week_8     
 Min.   : 6.00   Min.   : 6.00   Min.   : 6.00   Min.   : 6.00  
 1st Qu.: 8.50   1st Qu.: 9.00   1st Qu.: 8.75   1st Qu.:10.75  
 Median :13.50   Median :15.00   Median :14.00   Median :13.50  
 Mean   :13.18   Mean   :13.22   Mean   :13.60   Mean   :13.05  
 3rd Qu.:17.00   3rd Qu.:17.00   3rd Qu.:18.25   3rd Qu.:16.00  
 Max.   :20.00   Max.   :20.00   Max.   :20.00   Max.   :20.00  
     Week_9         Week_10         Week_11         Week_12     
 Min.   : 6.00   Min.   : 6.00   Min.   : 6.00   Min.   : 6.00  
 1st Qu.:10.00   1st Qu.: 9.00   1st Qu.:10.75   1st Qu.: 8.00  
 Median :14.00   Median :13.00   Median :14.00   Median :12.00  
 Mean   :13.05   Mean   :12.75   Mean   :13.47   Mean   :12.12  
 3rd Qu.:16.00   3rd Qu.:16.00   3rd Qu.:17.00   3rd Qu.:15.00  
 Max.   :20.00   Max.   :20.00   Max.   :20.00   Max.   :19.00  
    Week_13         Week_14         Week_15         Week_16     
 Min.   : 6.00   Min.   : 6.00   Min.   : 6.00   Min.   : 6.00  
 1st Qu.: 9.75   1st Qu.:10.75   1st Qu.:10.75   1st Qu.:10.00  
 Median :14.00   Median :14.00   Median :15.00   Median :15.00  
 Mean   :12.90   Mean   :13.70   Mean   :13.75   Mean   :13.65  
 3rd Qu.:16.00   3rd Qu.:17.25   3rd Qu.:17.00   3rd Qu.:17.00  
 Max.   :20.00   Max.   :20.00   Max.   :20.00   Max.   :20.00  

Question: What insights do you gain from the summary? Pick one week and describe what the min, median, and max values tell you about student engagement that week.

  • [The minimum time spent is always 6, while the maximum is always 19 or 20 (not surprising, since we sampled randomly between 6 and 20). The median is around 12 to 15 (with one low value at 10.5), so most students are moderately engaged most weeks. Some weeks have higher medians and 3rd quartiles and could be considered peak engagement (weeks 4, 6, 7, 15, and 16); week 2 has the lowest median, noticeably below the others (10.5, while every other week is at 12 or more), so it is a surprising dip. Looking at week 4, the minimum is 6, so at least one student barely engaged (struggling?). The median is 14.5, so half of the students spent 14.5 hours or more, which means engagement was quite good that week. The maximum is 20, so at least one student was highly engaged.]
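To see where the numbers in summary() come from, here is a small sketch on a single made-up week. The vector hours is hypothetical and much smaller than the real 40-student columns.

```r
# Hypothetical hours for five students in one week
hours <- c(6, 9, 13, 15, 20)

summary(hours)                   # min, quartiles, median, mean, max in one call
median(hours)                    # 13: half the values are at or below this
quantile(hours, c(0.25, 0.75))   # 9 and 15: the middle 50% lies between them
range(hours)                     # 6 and 20: the extremes
```

The median and quartiles describe the typical student, while the minimum and maximum describe only the single most extreme students.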

Average time spent per week (colMeans)

# Select only the Week columns explicitly using grep()
# This protects against any extra columns added later (Semester_Average etc.)
# that would break names(average_time) if included accidentally
week_cols    <- grep("^Week_", names(data_lms), value = TRUE)
average_time <- colMeans(data_lms[, week_cols])
average_time
 Week_1  Week_2  Week_3  Week_4  Week_5  Week_6  Week_7  Week_8  Week_9 Week_10 
 12.325  10.750  12.525  14.275  13.175  13.225  13.600  13.050  13.050  12.750 
Week_11 Week_12 Week_13 Week_14 Week_15 Week_16 
 13.475  12.125  12.900  13.700  13.750  13.650 

Question: If some weeks show notably higher or lower average time, what actions might an instructor take?

  • [A notably higher average can mean students are engaging well, but it could also mean that week’s work is too complicated and demands more time. A notably lower average could mean there is a severe lack of interest in the week’s content, or that the content is too easy compared to the rest and demands less engagement. Ultimately, an extreme difference suggests changing something in the workload or in the content.]
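One way to find the weeks that stand out, sketched with the averages printed above retyped as a named vector so the example runs on its own. The names lowest_week and highest_week are illustrative.

```r
# Weekly averages copied from the colMeans() output above
average_time <- c(
  Week_1 = 12.325, Week_2 = 10.750, Week_3 = 12.525, Week_4 = 14.275,
  Week_5 = 13.175, Week_6 = 13.225, Week_7 = 13.600, Week_8 = 13.050,
  Week_9 = 13.050, Week_10 = 12.750, Week_11 = 13.475, Week_12 = 12.125,
  Week_13 = 12.900, Week_14 = 13.700, Week_15 = 13.750, Week_16 = 13.650
)

# which.min() / which.max() return the position; names() gives the week label
lowest_week  <- names(which.min(average_time))   # "Week_2"
highest_week <- names(which.max(average_time))   # "Week_4"

# Difference of each week from the overall mean, sorted from lowest to highest
sort(round(average_time - mean(average_time), 2))
```

Sorting the differences from the overall mean makes it easier to judge whether a week is only slightly off or clearly unusual.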

Each student’s semester average (rowMeans)

# rowMeans() calculates the mean across columns for each row (each student)
data_lms$Semester_Average <- rowMeans(data_lms[, 2:17])

head(data_lms |> select(Student_ID, Semester_Average))

Task: Calculate the average time spent for only Weeks 1–5 and save it as early_semester_average. Add it to the data frame.

# YOUR CODE HERE
# Hint: weeks 1–5 are columns 2–6 in the data frame.
# Follow the same pattern as the row-means chunk above,
# but change the column range to cover only the first 5 weeks.
data_lms$early_semester_average <- rowMeans(data_lms[, 2:6])

head(data_lms |> select(Student_ID, early_semester_average))

Question: How could the early semester average help an instructor identify at-risk students before midterm?

  • [The early-semester average will show whether one or several students are behaving differently from the rest of the class. We could see the overachievers (at risk of burnout, or spending too much time completing the content) and the underachievers (going too fast, not engaging enough with the content).]
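A minimal sketch of risk flagging based on the early-semester average. The data frame toy, the column at_risk, and the cutoff of 10 hours are all hypothetical choices for illustration; a real cutoff would need to be justified for the course in question.

```r
# Hypothetical data for four students over the first five weeks
toy <- data.frame(
  Student_ID = paste0("Student_", 1:4),
  Week_1 = c(8, 15, 6, 18),
  Week_2 = c(7, 14, 9, 20),
  Week_3 = c(9, 16, 7, 17),
  Week_4 = c(6, 13, 8, 19),
  Week_5 = c(10, 12, 10, 16)
)

# Select the columns by name rather than position, so added columns cannot shift them
early_cols <- paste0("Week_", 1:5)
toy$early_semester_average <- rowMeans(toy[, early_cols])

# Assumed cutoff: flag students averaging under 10 hours per week
cutoff <- 10
toy$at_risk <- toy$early_semester_average < cutoff

toy[toy$at_risk, c("Student_ID", "early_semester_average")]
```

With these toy values, Student_1 and Student_3 both average 8 hours and are flagged.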

Part 3 · Visualization — bar plot and line plot

Prepare data for plotting

# Confirm average_time exists and has names before reshaping
# This prevents the "zero-length variable name" error
stopifnot(
  "Run the col-means chunk first" = exists("average_time"),
  "average_time has no names"     = !is.null(names(average_time)),
  "average_time is empty"         = length(average_time) > 0
)

average_time_table <- data.frame(
  Week               = factor(names(average_time), levels = names(average_time)),
  Average_Time_Spent = average_time
)

# Quick check — should show 16 rows, one per week
nrow(average_time_table)
[1] 16
head(average_time_table)

Bar plot — average time per week

ggplot(average_time_table, aes(x = Week, y = Average_Time_Spent)) +
  geom_col(fill = "#1D9E75", color = "white") +
  labs(
    title = "Average Time Spent per Week",
    x = "Week",
    y = "Average Hours"
  ) +
  theme_minimal() +
  theme(
    plot.title  = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Average LMS time spent per week across all 40 students

Line plot — trend over time

ggplot(average_time_table, aes(x = Week, y = Average_Time_Spent, group = 1)) +
  geom_line(color = "#185FA5", linewidth = 1.2) +
  geom_point(color = "#185FA5", size = 3) +
  labs(
    title = "Trend of Average Time Spent per Week",
    x = "Week",
    y = "Average Hours"
  ) +
  theme_minimal() +
  theme(
    plot.title  = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Trend of average LMS time across the semester

Question: What differences do you notice between the bar plot and the line plot? Which is more effective for showing a trend and why? Use your own words.

  • [The bar plot has a y-axis that runs from 0 to 15, while the line plot’s y-axis runs only from the minimum to the maximum average (about 10.5 to 14.5). The bar plot’s span is so wide that it flattens the differences, while the line plot accentuates them because it zooms in on the actual range of values.]
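The axis difference described above can be controlled explicitly. The following sketch uses hypothetical weekly averages (toy_avg) and two ggplot2 functions, expand_limits() and coord_cartesian(), to put both plot types on comparable scales.

```r
library(ggplot2)

# Hypothetical weekly averages, for illustration only
toy_avg <- data.frame(
  Week               = factor(paste0("Week_", 1:4), levels = paste0("Week_", 1:4)),
  Average_Time_Spent = c(12.3, 10.8, 12.5, 14.3)
)

base_line <- ggplot(toy_avg, aes(x = Week, y = Average_Time_Spent, group = 1)) +
  geom_line() +
  geom_point()

# Same line plot, but with the y-axis forced to include 0 (like the bar plot)
line_from_zero <- base_line + expand_limits(y = 0)

# Bar plot zoomed to the data range; coord_cartesian() zooms without dropping data
bar_zoomed <- ggplot(toy_avg, aes(x = Week, y = Average_Time_Spent)) +
  geom_col() +
  coord_cartesian(ylim = c(10, 15))

line_from_zero
bar_zoomed
```

Zooming a bar plot is usually discouraged because bar length is read as the value itself, so the zoomed version is shown here only to make the comparison concrete.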

Line plot — individual students

# Reshape from wide to long format for individual student lines
data_long <- data_lms |>
  pivot_longer(
    cols      = starts_with("Week"),
    names_to  = "Week",
    values_to = "TimeSpent"
  ) |>
  mutate(
    Week = factor(Week, levels = paste0("Week_", 1:16))
  )
 
ggplot(data_long, aes(x = Week, y = TimeSpent,
                      group = Student_ID, color = Student_ID)) +
  geom_line(alpha = 0.5) +
  labs(
    title = "Weekly Time Spent by Each Student",
    x = "Week",
    y = "Hours"
  ) +
  theme_minimal() +
  theme(
    plot.title     = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x    = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

Weekly LMS time for all 40 students
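The reshape step in the chunk above is easier to follow on a tiny example. This sketch assumes the tidyr package is installed; toy_wide is a hypothetical two-student data set.

```r
library(tidyr)

# Wide format: one row per student, one column per week
toy_wide <- data.frame(
  Student_ID = c("Student_1", "Student_2"),
  Week_1 = c(12, 7),
  Week_2 = c(9, 15),
  Week_3 = c(14, 10)
)

# Long format: one row per student-week combination
toy_long <- pivot_longer(
  toy_wide,
  cols      = starts_with("Week"),
  names_to  = "Week",
  values_to = "TimeSpent"
)

dim(toy_wide)   # 2 rows, 4 columns
dim(toy_long)   # 6 rows, 3 columns
toy_long
```

ggplot2 needs the long form because it maps one column to the x-axis (Week) and one to the y-axis (TimeSpent), rather than reading a separate column for each week.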

Question: What patterns do you notice when looking at all 40 students at once? Is this visualization easy to interpret? Why or why not?

  • [Looking at all 40 lines, it is obvious that every student’s time varied during the semester. No one has a flat line; they all appear to have peaks and dips. We can also see that every week has a mix of high and low values; no week has all students high or all low. This graph is NOT easy to interpret, in my opinion. There are too many elements and colors, which makes it complicated. It’s very hard to pick out one line and follow its evolution through the semester.]
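One possible way to make a crowded line plot readable is to draw every student in gray and highlight a single one. This is a sketch of that idea, not the worksheet's method; the simulated toy_long data and the choice of "Student_3" are assumptions.

```r
library(ggplot2)

set.seed(1)

# Hypothetical long-format data: 10 students x 16 weeks
toy_long <- expand.grid(
  Week             = 1:16,
  Student_ID       = paste0("Student_", 1:10),
  stringsAsFactors = FALSE
)
toy_long$TimeSpent <- sample(6:20, nrow(toy_long), replace = TRUE)

# The student to highlight (an arbitrary choice for illustration)
highlight_id   <- "Student_3"
highlight_data <- toy_long[toy_long$Student_ID == highlight_id, ]

p_highlight <- ggplot(toy_long, aes(x = Week, y = TimeSpent, group = Student_ID)) +
  geom_line(color = "gray80") +                                         # everyone, as context
  geom_line(data = highlight_data, color = "#185FA5", linewidth = 1.2) + # the highlighted student
  labs(
    title = paste("Weekly Time Spent:", highlight_id, "vs. Class"),
    x = "Week",
    y = "Hours"
  ) +
  theme_minimal()

p_highlight
```

The gray lines keep the class-wide range visible while the colored line can be followed from week to week.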

Line plot — selected students only

Task: Choose 5 students you want to compare and update the code below.

# YOUR CODE HERE
# Step 1: Choose 5 Student_IDs from the data and filter for them.
#         Student IDs are in the format "Student_1", "Student_2", etc.
#         Pick students whose patterns you find interesting to compare —
#         for example, mix high and low average engagement.
#
data_only_5 <- data_lms |>
  filter(Student_ID %in% c("Student_10", "Student_19", "Student_23", "Student_33", "Student_34"))

# Step 2: Reshape with pivot_longer() — same as the lineplot-all chunk above.
#

data_long2 <- data_only_5 |>
  pivot_longer(
    cols      = starts_with("Week"),
    names_to  = "Week",
    values_to = "TimeSpent"
  ) |>
  mutate(
    Week = factor(Week, levels = paste0("Week_", 1:16))
  )

# Step 3: Plot with ggplot() — copy the structure from lineplot-all
#         and adjust the title and legend position.

ggplot(data_long2, aes(x = Week, y = TimeSpent,
                      group = Student_ID, color = Student_ID)) +
  geom_line(alpha = 0.5) +
  labs(
    title = "Weekly Time Spent by 5 Selected Students",
    x = "Week",
    y = "Hours"
  ) +
  theme_minimal() +
  theme(
    plot.title     = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.text.x    = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  )

Weekly LMS time for selected students

Question: What insights do you gain from this focused view? What design decisions did you make in choosing these five students?

  • [I chose the student with the lowest early-semester average, the two with the highest, and two students with mid-range early-semester averages. I can see that the early-semester average is not very predictive of the rest of the semester for these students: the student with the lowest average had a good six weeks of high engagement but was low in the other weeks, and the highest started very strong but has very low dips through the semester. I can see some common peaks between the students, but no very clear patterns. The design decision is, I suppose, sampling the sample and hoping for a less messy representation.]
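The selection rule described in this answer (one lowest, two mid-range, two highest) can also be written as code instead of reading IDs off a table. The sketch below uses hypothetical averages for eight students; "mid-range" is interpreted here as closest to the median, which is an assumption.

```r
# Hypothetical early-semester averages for eight students
toy <- data.frame(
  Student_ID             = paste0("Student_", 1:8),
  early_semester_average = c(12.4, 8.2, 15.6, 11.0, 17.8, 9.6, 13.2, 16.4)
)

# Sort students from lowest to highest early-semester average
ranked <- toy[order(toy$early_semester_average), ]

lowest_id   <- head(ranked$Student_ID, 1)   # single lowest
highest_ids <- tail(ranked$Student_ID, 2)   # two highest

# Assumed meaning of "mid-range": the two students closest to the median
dist_to_median <- abs(toy$early_semester_average - median(toy$early_semester_average))
middle_ids     <- toy$Student_ID[order(dist_to_median)][1:2]

selected_ids <- c(lowest_id, middle_ids, highest_ids)
selected_ids
```

The resulting vector can be passed straight to filter(Student_ID %in% selected_ids), which keeps the choice reproducible if the data change.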

Histogram — semester averages

ggplot(data_lms, aes(x = Semester_Average)) +
  geom_histogram(binwidth = 1, fill = "#378ADD", color = "white") +
  labs(
    title = "Distribution of Semester Average Time Spent",
    x = "Semester Average (hours/week)",
    y = "Number of Students"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5))

Distribution of semester averages across 40 students
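To interpret the bars of a histogram, it can help to compute the bin counts by hand. This sketch uses base R's hist() with plot = FALSE on hypothetical semester averages; the bin edges 11 to 15 are an assumption and will not necessarily match the edges ggplot2 picks for binwidth = 1.

```r
# Hypothetical semester averages for seven students
semester_avg <- c(11.2, 12.5, 12.8, 13.1, 13.4, 13.9, 14.6)

# One-hour-wide bins with edges at 11, 12, 13, 14, 15; no plot is drawn
bins <- hist(semester_avg, breaks = 11:15, plot = FALSE)

bins$breaks   # bin edges: 11 12 13 14 15
bins$counts   # students per bin: 1 2 3 1
```

Each count is the height of one bar: here, three students fall between 13 and 14 hours per week.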

Part 4 · Diagnostic analytics — why did it happen?

Now we switch to the sci-online-classes dataset to explore the relationship between time spent and final grades.

# Load the dataset used in the previous module
# Make sure sci-online-classes.csv is in your data folder
data_sci <- read_csv("data/sci-online-classes.csv") |>
  clean_names()

glimpse(data_sci)
Rows: 603
Columns: 30
$ student_id            <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id             <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned   <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned     <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject               <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "FrSc…
$ semester              <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section               <chr> "02", "01", "01", "01", "01", "03", "01", "01", …
$ gradebook_item        <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ grade_category        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ final_grade_cems      <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ points_possible       <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ points_earned         <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ gender                <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2                    <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3                    <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5                    <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6                    <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7                    <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8                    <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9                    <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10                   <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ time_spent            <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ time_spent_hours      <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ time_spent_std        <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int                   <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc                    <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv                    <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
Note

This is the same dataset used in the previous module. We are reloading it here because the LMS time data (Parts 1–3) and the sci-online-classes data (Part 4) are separate files. Reloading makes this file self-contained.

Scatter plot with regression line

ggplot(data_sci,
       aes(x = time_spent_hours, y = final_grade_cems)) +
  geom_point(color = "#185FA5", size = 2.5, alpha = 0.6) +
  geom_smooth(method = "lm", color = "#993C1D", se = TRUE) +
  labs(
    title = "Time Spent vs. Final Grade",
    x = "Time Spent on LMS (hours)",
    y = "Final Grade"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5))

Relationship between time spent and final grade

Question: Based on the scatter plot, what do you expect the relationship between time spent and final grades to be? Write your hypothesis before looking at the correlation.

  • [Looking at the plot, I think there might be a positive correlation, but it would be weak. We see lots of points far from the line, and we see lots of students who spent little time but had high grades, as well as some who spent lots of time but only got mid-range grades.]
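The line drawn by geom_smooth(method = "lm") is a straight-line fit, and the same kind of fit can be computed with lm() to read off its slope. The sketch below uses a hypothetical six-student data set (toy_sci) that reuses the worksheet's column names; the values are made up.

```r
# Hypothetical data reusing the worksheet's column names
toy_sci <- data.frame(
  time_spent_hours = c(5, 10, 15, 20, 25, 30),
  final_grade_cems = c(62, 70, 68, 78, 80, 85)
)

# Fit a straight line: grade as a function of hours
fit <- lm(final_grade_cems ~ time_spent_hours, data = toy_sci)

# Intercept = predicted grade at 0 hours; slope = grade points per extra hour
coef(fit)

# Predicted grade for a student who spends 18 hours (a predictive use of the line)
predicted <- predict(fit, newdata = data.frame(time_spent_hours = 18))
predicted
```

A positive slope matches the hypothesis of a positive relationship, but the slope alone says nothing about how tightly the points cluster around the line; that is what the correlation measures.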

Correlation

# cor() computes the Pearson correlation coefficient
# use = "complete.obs" ignores rows with missing data
correlation <- cor(data_sci$time_spent_hours,
                   data_sci$final_grade_cems,
                   use = "complete.obs")

correlation
[1] 0.3654121
Tip: Interpreting correlation
  • Values close to +1: strong positive relationship (more time → higher grade)
  • Values close to -1: strong negative relationship
  • Values close to 0: little or no linear relationship
  • This is NOT a statistics course — focus on interpreting what this number means for learners, not on p-values.
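To make use = "complete.obs" concrete, here is a small sketch with hypothetical vectors (hours, grade) that contain missing values. It also squares the correlation reported above, which is one common way to express its strength.

```r
# Hypothetical data with missing values in both variables
hours <- c(10, 20, NA, 40, 50, 30)
grade <- c(60, 65, 70, NA, 90, 72)

# Built-in Pearson correlation, using only rows where both values are present
r_builtin <- cor(hours, grade, use = "complete.obs")

# The same calculation by hand: drop incomplete pairs, then cov / (sd * sd)
keep     <- complete.cases(hours, grade)
r_manual <- cov(hours[keep], grade[keep]) / (sd(hours[keep]) * sd(grade[keep]))

all.equal(r_builtin, r_manual)   # TRUE: both approaches agree

# Squaring the worksheet's correlation of 0.3654121:
round(0.3654121^2, 2)            # 0.13
```

Read loosely, a squared correlation of 0.13 suggests that time spent lines up with only about 13% of the variation in final grades, leaving most of it to other factors.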

Question: With both the scatter plot and the correlation value in front of you, what can you say about the relationship between time spent and final grades? What would you recommend to an instructor based on this finding?

  • [Time spent is positively correlated with final grades: as time spent increases, grades tend to increase. It is not a strong relationship, though; it’s weak to moderate. Human behavior is difficult to predict, and we need to take into account that some students might have been absent and that grades might have been influenced by other factors (sickness, etc.). But if we assume there are no outliers, we can tell the instructor that in their course, more time does not always translate into a better grade, which means there might be an issue with the LMS content; maybe the activities don’t fully align with the assessment? It would be useful to see how students spend their time rather than just how much time they spend. The instructor should still identify the students who don’t spend much time on the material early on and check what is going on.]

Practice — grouped summary by subject

Task: Using data_sci, calculate the mean final_grade_cems and mean time_spent_hours grouped by subject. Arrange by mean grade descending. Which subject has the highest average grade? Is it also the subject with the most time spent?

Hint

You have used group_by() and summarise() in the previous file. Apply the same pattern here with a different grouping variable. If you need a column name reminder, run names(data_sci) in the Console.

# YOUR CODE HERE
# Steps: group_by(subject) |> summarise(mean_grade = ..., mean_time = ...) |> arrange(desc(...))

grouped_data <- data_sci |>
  group_by(subject) |>
  summarise(
    mean_grade = mean(final_grade_cems, na.rm = TRUE),
    mean_time = mean(time_spent_hours, na.rm = TRUE)
    ) |>
  arrange(desc(mean_grade))
grouped_data

Question: Does the subject with the highest average grade also have the most time spent? What might explain any differences you find?

  • [PhysA has the highest average grade (83.6) but one of the lowest average times spent (23.6 vs. the highest at 40.1). Such high grades with lower time spent could mean that the course is “too easy”, or that the course is very well designed, with no wandering around trying to find material and with efficient explanations of course concepts. I believe time spent is expected to behave differently depending on the subject: some subjects demand more practice activities, etc., while others are more straightforward. On the other hand, the subject with the lowest average grade (BioA, 65.1) also has the least time spent (21.22), which hints more clearly at a course design issue.]
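The comparison in this question can be checked in code rather than by eye. The sketch below assumes dplyr is installed and uses a hypothetical data set with made-up subjects "A", "B", and "C"; it does not reproduce the real sci-online-classes values.

```r
library(dplyr)

# Hypothetical data: three subjects, two students each, one missing grade
toy_sci <- data.frame(
  subject          = c("A", "A", "B", "B", "C", "C"),
  final_grade_cems = c(80, 90, 70, NA, 60, 70),
  time_spent_hours = c(20, 30, 40, 50, 10, 20)
)

toy_summary <- toy_sci |>
  group_by(subject) |>
  summarise(
    mean_grade = mean(final_grade_cems, na.rm = TRUE),
    mean_time  = mean(time_spent_hours, na.rm = TRUE)
  ) |>
  arrange(desc(mean_grade))

# Which subject tops each measure?
top_grade_subject <- toy_summary$subject[which.max(toy_summary$mean_grade)]
top_time_subject  <- toy_summary$subject[which.max(toy_summary$mean_time)]

top_grade_subject == top_time_subject   # FALSE here: "A" vs. "B"
```

In this toy data, subject "A" has the highest mean grade (85) while subject "B" has the most time spent (45 hours), the same kind of mismatch described in the answer above.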

Part 5 · Box plot

A box plot shows the distribution of a variable across categories — useful for comparing groups and spotting outliers.

ggplot(data_sci, aes(x = gender, y = final_grade_cems, fill = gender)) +
  geom_boxplot(color = "gray30",
               outlier.colour = "#993C1D",
               outlier.shape  = 16,
               outlier.size   = 2) +
  scale_fill_manual(values = c("F" = "#E1F5EE", "M" = "#E6F1FB")) +
  labs(
    title = "Final Grade Distribution by Gender",
    x     = "Gender",
    y     = "Final Grade"
  ) +
  theme_minimal() +
  theme(
    plot.title     = element_text(size = 14, face = "bold", hjust = 0.5),
    legend.position = "none"
  )

Final grade distribution by gender

Question: What does the box plot tell you about the distribution of final grades by gender? Are there differences worth investigating?

  • [Women seem to have a slightly bigger spread of grades than men, but the medians are very similar. The first and third quartiles are almost identical as well, so gender does not seem to be a factor in grades. We also have a lot of outliers for each gender among the low grades (dropouts, etc.). So, based on this plot, there is no difference worth investigating.]
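The quantities a box plot draws (median, quartile spread, outliers) can also be computed as numbers. This is a sketch on hypothetical grades for ten students; boxplot.stats() is base R's helper and applies the usual 1.5 × IQR rule, which may differ in small details from how ggplot2 computes its boxes.

```r
# Hypothetical grades: five students per group, one very low grade in each
toy <- data.frame(
  gender           = c("F", "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  final_grade_cems = c(20, 78, 82, 85, 90, 30, 76, 81, 84, 88)
)

# Median and interquartile range per group
group_medians <- tapply(toy$final_grade_cems, toy$gender, median)   # F = 82, M = 81
group_iqr     <- tapply(toy$final_grade_cems, toy$gender, IQR)

# Points beyond 1.5 * IQR from the box are reported as outliers
f_stats <- boxplot.stats(toy$final_grade_cems[toy$gender == "F"])
f_stats$out   # 20: the one very low grade in group F
```

Comparing the medians and IQRs side by side gives a numeric check on what the boxes appear to show.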

Final reflection

After completing both the LMS time analysis and the sci-online-classes analysis, reflect on the following:

Question: How could these analytics techniques be applied in a real classroom or course design context? Describe one specific scenario — from your track (K–12 or ID/higher ed) — where the combination of a bar plot, line plot, and correlation would help an educator or designer make a better decision.

  • [In a real course design context, we can use these analytics to monitor students’ engagement and performance early in the semester in online courses. This could help an instructor identify at-risk students. The line plot can track engagement through the semester and show dips and peaks. It could be a plot with all the students, or it could show one selected student in comparison to the top 3, the bottom 3, and the section average. The bar plot could help compare sections or any groups (labs, etc.) on average grades or average engagement. It could also be used to compare average grades per assignment type within one section, for instance. The correlation should be between time spent and final grades, or maybe between time spent on a module and the grade on the assessment at the end of that module. Altogether, these graphs would help the instructor make decisions by identifying at-risk students (drops in engagement, etc.) and preparing interventions, and they would also help the instructor improve their course design by evaluating whether the engagement is meaningful (time well spent?) or identifying weeks that lose engagement.]

Render & submit

Step 1 — Add your name

Change the author: field in the YAML header at the top to your name.

Step 2 — Render

Click Render in the toolbar. A formatted HTML page will appear in your Viewer tab or a new browser window. Check the Console for any error messages if the render fails.

Step 3 — Publish

| Option | Best for | Link |
|---|---|---|
| Posit Cloud | Quickest; one click from your workspace | Guide |
| RPubs | Free, public, easy to share a link | rpubs.com |
| Quarto Pub | Clean public portfolio pages | Guide |
| GitHub Pages | Best for a professional portfolio | Guide |
E-portfolio tip

This document shows three levels of analytics work: descriptive (summary statistics and bar plots), trend analysis (line plots), and diagnostic (scatter plot and correlation). Together they demonstrate a complete analytical workflow that is worth showcasing in a professional portfolio.

Share your published link with your instructor once you have rendered and published. Post in the course discussion board if you run into any technical issues.