# set.seed() makes the random data reproducible —
# everyone running this code gets the same values
set.seed(42)
data_lms <- data.frame(
Student_ID = paste("Student", 1:40, sep = "_"),
Week_1 = sample(6:20, 40, replace = TRUE),
Week_2 = sample(6:20, 40, replace = TRUE),
Week_3 = sample(6:20, 40, replace = TRUE),
Week_4 = sample(6:20, 40, replace = TRUE),
Week_5 = sample(6:20, 40, replace = TRUE),
Week_6 = sample(6:20, 40, replace = TRUE),
Week_7 = sample(6:20, 40, replace = TRUE),
Week_8 = sample(6:20, 40, replace = TRUE),
Week_9 = sample(6:20, 40, replace = TRUE),
Week_10 = sample(6:20, 40, replace = TRUE),
Week_11 = sample(6:20, 40, replace = TRUE),
Week_12 = sample(6:20, 40, replace = TRUE),
Week_13 = sample(6:20, 40, replace = TRUE),
Week_14 = sample(6:20, 40, replace = TRUE),
Week_15 = sample(6:20, 40, replace = TRUE),
Week_16 = sample(6:20, 40, replace = TRUE)
)
# Inspect the first few rows
head(data_lms)
Analytics Types & Visualization
Learning Analytics — Analytics & Visualization (Required)
Learning objectives
By the end of this file, you will be able to:
- Simulate and save an educational dataset in R
- Apply descriptive analytics using colMeans() and rowMeans()
- Reshape data from wide to long format using pivot_longer()
- Create and interpret scatter plots, bar plots, line plots, and histograms
- Compute and interpret correlation between two variables
- Apply the analytics type (descriptive, diagnostic, predictive) to real questions
The analytics types — a reminder
Before coding, connect each technique to the type of question it answers:
| Analytics type | Question | Technique used in this file |
|---|---|---|
| Descriptive | What happened? | Summary stats, bar plot, histogram |
| Diagnostic | Why did it happen? | Scatter plot, correlation |
| Predictive | What will happen next? | Regression line, risk flagging |
Keep this table in mind as you work through the exercises below. Every output you produce should be connected to one of these questions.
Part 1 · Creating and saving a simulated dataset
Instead of loading existing data, we will create our own simulated dataset. This teaches you how data is structured in R — useful when you need to build a small dataset from scratch for testing or teaching.
Creating the dataset
Question: What does sample(6:20, 40, replace = TRUE) do? What would change if you set replace = FALSE? As always, use your own words to answer the question.
- [I wasn’t sure what this meant, so I asked NAMU, and its response helped me understand. This command draws 40 random integers between 6 and 20, and some of them can appear more than once because replacement is allowed: each value is put back after it is drawn, so it can be drawn again. If TRUE is replaced with FALSE, each value can be drawn only once, so at most 15 draws are possible. NAMU said, “If you still ask for 40 (sample(6:20, 40, replace = FALSE)), R will give an error — because it can’t pick 40 unique values when there are only 15 numbers available.” In other words, R cannot complete 40 draws from only 15 values, I think…]
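The behaviour described in this answer can be checked directly in the Console. The sketch below is only an illustration: it reuses the sample(6:20, 40, ...) call from the dataset chunk, and the variable names (with_replacement, one_of_each, too_many) are made up for this example.

```r
set.seed(42)

# With replacement: 40 draws from only 15 possible values, so repeats must occur
with_replacement <- sample(6:20, 40, replace = TRUE)
length(with_replacement)             # 40
anyDuplicated(with_replacement) > 0  # TRUE: at least one value repeats

# Without replacement: at most 15 draws, each value exactly once
one_of_each <- sample(6:20, 15, replace = FALSE)
sort(one_of_each)                    # 6 7 8 ... 20

# Asking for 40 draws without replacement fails;
# tryCatch() stores the error message instead of stopping the script
too_many <- tryCatch(
  sample(6:20, 40, replace = FALSE),
  error = function(e) conditionMessage(e)
)
too_many
```

Wrapping the failing call in tryCatch() is a choice made for this sketch so the whole block runs from top to bottom; typing the call on its own would simply stop with the error.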
Saving the dataset
# Save as a CSV file in your project folder
write.csv(data_lms, "40_students_LMS_time_spent.csv", row.names = FALSE)
# Confirm it saved: check your Files pane for the new file
Question: Why is it important to be able to create and save datasets manually, rather than only working with provided data?
- [I think it is important to create and save datasets manually because if you make small or big edits, you can go back and find your mistakes if something is wrong. You can also keep multiple copies to work from. Also, doing it firsthand lets you learn the skills needed for formatting data.]
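One way to confirm that a saved file really matches the data in memory is to read it back and compare. This is a minimal sketch under stated assumptions: toy_lms is a small stand-in for data_lms, and the file goes to a temporary location (csv_path) instead of the project folder.

```r
# Small stand-in for data_lms (3 students, 2 weeks) so the example runs on its own
set.seed(42)
toy_lms <- data.frame(
  Student_ID = paste("Student", 1:3, sep = "_"),
  Week_1 = sample(6:20, 3, replace = TRUE),
  Week_2 = sample(6:20, 3, replace = TRUE)
)

# Write to a temporary file instead of the project folder
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_lms, csv_path, row.names = FALSE)

# Read the file back and compare it with the original
reloaded <- read.csv(csv_path)
file.exists(csv_path)         # TRUE
all.equal(toy_lms, reloaded)  # TRUE when nothing was lost on the way to disk
```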
Part 2 · Descriptive analytics — what happened?
Summary statistics
# Summary of all weekly columns (excluding Student_ID column)
summary_stats <- summary(data_lms[, -1])
summary_stats
Week_1 Week_2 Week_3 Week_4
Min. : 6.00 Min. : 6.00 Min. : 6.00 Min. : 6.00
1st Qu.: 9.00 1st Qu.: 8.00 1st Qu.: 9.75 1st Qu.:12.00
Median :13.00 Median :10.50 Median :12.50 Median :14.50
Mean :12.32 Mean :10.75 Mean :12.53 Mean :14.28
3rd Qu.:15.00 3rd Qu.:13.00 3rd Qu.:16.00 3rd Qu.:18.00
Max. :20.00 Max. :20.00 Max. :19.00 Max. :20.00
Week_5 Week_6 Week_7 Week_8
Min. : 6.00 Min. : 6.00 Min. : 6.00 Min. : 6.00
1st Qu.: 8.50 1st Qu.: 9.00 1st Qu.: 8.75 1st Qu.:10.75
Median :13.50 Median :15.00 Median :14.00 Median :13.50
Mean :13.18 Mean :13.22 Mean :13.60 Mean :13.05
3rd Qu.:17.00 3rd Qu.:17.00 3rd Qu.:18.25 3rd Qu.:16.00
Max. :20.00 Max. :20.00 Max. :20.00 Max. :20.00
Week_9 Week_10 Week_11 Week_12
Min. : 6.00 Min. : 6.00 Min. : 6.00 Min. : 6.00
1st Qu.:10.00 1st Qu.: 9.00 1st Qu.:10.75 1st Qu.: 8.00
Median :14.00 Median :13.00 Median :14.00 Median :12.00
Mean :13.05 Mean :12.75 Mean :13.47 Mean :12.12
3rd Qu.:16.00 3rd Qu.:16.00 3rd Qu.:17.00 3rd Qu.:15.00
Max. :20.00 Max. :20.00 Max. :20.00 Max. :19.00
Week_13 Week_14 Week_15 Week_16
Min. : 6.00 Min. : 6.00 Min. : 6.00 Min. : 6.00
1st Qu.: 9.75 1st Qu.:10.75 1st Qu.:10.75 1st Qu.:10.00
Median :14.00 Median :14.00 Median :15.00 Median :15.00
Mean :12.90 Mean :13.70 Mean :13.75 Mean :13.65
3rd Qu.:16.00 3rd Qu.:17.25 3rd Qu.:17.00 3rd Qu.:17.00
Max. :20.00 Max. :20.00 Max. :20.00 Max. :20.00
Question: What insights do you gain from the summary? Pick one week and describe what the min, median, and max values tell you about student engagement that week.
- [From the Week 16 summary, I can see that the lowest value, 6, is the minimum time any student spent that week, and the highest value is 20. The first quartile of 10 means that about 25% of students spent less than 10 hours, while the third quartile of 17 means about 75% of students were below that number. The median of 15 represents the middle point: half of the students were below 15 and half were above. The mean of 13.65 is the average, which is a little lower than the median, meaning that some lower values pulled the average down. Overall, most students were fairly consistent, but a few spent noticeably less time, creating some variation in the data.]
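Each number in a summary() column can also be computed on its own, which makes it easier to see what it measures. The sketch below uses a hypothetical vector of eight values (hours); it is not taken from data_lms.

```r
# Hypothetical hours for eight students in one week (not taken from data_lms)
hours <- c(6, 7, 14, 15, 15, 16, 17, 20)

range(hours)                            # minimum and maximum: 6 20
median(hours)                           # middle value: 15
mean(hours)                             # average: 13.75
quantile(hours, probs = c(0.25, 0.75))  # first and third quartiles

# A mean below the median suggests a few low values are pulling the average down
mean(hours) < median(hours)             # TRUE
```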
Average time spent per week (colMeans)
# Select only the Week columns explicitly using grep()
# This protects against any extra columns added later (Semester_Average etc.)
# that would break names(average_time) if included accidentally
week_cols <- grep("^Week_", names(data_lms), value = TRUE)
average_time <- colMeans(data_lms[, week_cols])
average_time
Week_1 Week_2 Week_3 Week_4 Week_5 Week_6 Week_7 Week_8 Week_9 Week_10
12.325 10.750 12.525 14.275 13.175 13.225 13.600 13.050 13.050 12.750
Week_11 Week_12 Week_13 Week_14 Week_15 Week_16
13.475 12.125 12.900 13.700 13.750 13.650
Question: If some weeks show notably higher or lower average time, what actions might an instructor take?
- [The instructor could check whether the workload is the same from week to week. One week may require students to work longer, which means spending more time on the LMS, while other weeks may require less time. But if the average hours stay flat across the weeks, this could mean students are disengaged from the lessons and not spending the quality time needed to pass the class or an assessment.]
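To decide which weeks count as notably higher or lower, an instructor could compare each weekly average with the semester-wide mean. The sketch below copies the values printed by colMeans() earlier in this file; the 1-hour threshold (threshold_hours) is an assumption chosen for illustration, not a course rule.

```r
# Weekly averages copied from the colMeans() output in this file
average_time <- c(
  Week_1 = 12.325, Week_2 = 10.750, Week_3 = 12.525, Week_4 = 14.275,
  Week_5 = 13.175, Week_6 = 13.225, Week_7 = 13.600, Week_8 = 13.050,
  Week_9 = 13.050, Week_10 = 12.750, Week_11 = 13.475, Week_12 = 12.125,
  Week_13 = 12.900, Week_14 = 13.700, Week_15 = 13.750, Week_16 = 13.650
)

names(which.min(average_time))  # "Week_2"
names(which.max(average_time))  # "Week_4"

# Assumed rule of thumb: flag weeks more than 1 hour from the semester-wide mean
threshold_hours <- 1
gap <- average_time - mean(average_time)
flagged_weeks <- names(average_time)[abs(gap) > threshold_hours]
flagged_weeks                   # "Week_2" "Week_4"
```

Because the data are simulated with the same sample() call every week, these gaps are random noise; with real LMS data the same check would point to weeks worth a closer look.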
Each student’s semester average (rowMeans)
# rowMeans() calculates the mean across columns for each row (each student)
library(dplyr)  # select() comes from dplyr
data_lms$Semester_Average <- rowMeans(data_lms[, 2:17])
head(data_lms |> select(Student_ID, Semester_Average))
Task: Calculate the average time spent for only Weeks 1–5 and save it as early_semester_average. Add it to the data frame.
# Average time spent for only Weeks 1–5 (columns 2–6).
# Stored in its own column so Semester_Average keeps the full 16-week mean
# that the histogram later in this file relies on.
data_lms$early_semester_average <- rowMeans(data_lms[, 2:6])
head(data_lms |> select(Student_ID, early_semester_average))
Hint: weeks 1–5 are columns 2–6 in the data frame.
Follow the same pattern as the row-means chunk above,
but change the column range to cover only the first 5 weeks.
Question: How could the early semester average help an instructor identify at-risk students before midterm?
- [The instructor could identify at-risk students before midterm by reviewing the early averages and time spent, to see whether students are struggling because they spend too little time on the LMS or because of misconceptions about the material and guided instruction.]
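Risk flagging of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: toy is a small simulated class of eight students, and the cutoff (the lowest quarter of early averages) is one possible rule, not a recommended standard.

```r
# Simulated stand-in for the first five weeks of data_lms (8 students)
set.seed(42)
toy <- data.frame(Student_ID = paste("Student", 1:8, sep = "_"))
early_cols <- paste0("Week_", 1:5)
for (week in early_cols) {
  toy[[week]] <- sample(6:20, 8, replace = TRUE)
}

# Early-semester average per student
toy$early_semester_average <- rowMeans(toy[, early_cols])

# Assumed cutoff: the lowest quarter of the class
cutoff <- quantile(toy$early_semester_average, probs = 0.25)
toy$at_risk <- toy$early_semester_average <= cutoff

# Students an instructor might contact before midterm
toy[toy$at_risk, c("Student_ID", "early_semester_average")]
```

A flag like this only says who spent the least time early on; it does not say why, so it works best as a prompt for a conversation with the student.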
Part 3 · Visualization — bar plot and line plot
Prepare data for plotting
library(dplyr)  # provides %>%
library(tidyr)  # provides pivot_longer()
# This confirms that average_time exists before continuing
exists("average_time")
time_data_long <- data_lms %>%
  pivot_longer(
    cols = starts_with("Week"),
    names_to = "Week",
    values_to = "Time_Spent"
  )
# Keep the weeks in calendar order; as plain text they would sort
# alphabetically (Week_1, Week_10, Week_11, ..., Week_2) on a plot axis
time_data_long$Week <- factor(time_data_long$Week, levels = unique(time_data_long$Week))
# Confirm average_time exists and has names before reshaping
# This prevents the "zero-length variable name" error
stopifnot(
  "Run the col-means chunk first" = exists("average_time"),
  "average_time has no names" = !is.null(names(average_time)),
  "average_time is empty" = length(average_time) > 0
)
# Prepare the data table for plotting
average_time_table <- data.frame(
  Week = factor(names(average_time), levels = names(average_time)),
  Average_Time_Spent = average_time
)
# Quick check: should show 16 rows, one per week
nrow(average_time_table)
[1] 16
head(average_time_table)
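The wide-to-long reshape is easier to follow on a table small enough to read at a glance. The sketch below assumes the tidyr package is installed; wide is a hypothetical two-student table, not part of data_lms.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per week
wide <- data.frame(
  Student_ID = c("Student_1", "Student_2"),
  Week_1 = c(10, 12),
  Week_2 = c(14, 8),
  Week_3 = c(9, 15)
)

# Long format: one row per student-week pair
long <- pivot_longer(
  wide,
  cols = starts_with("Week"),
  names_to = "Week",
  values_to = "Time_Spent"
)

dim(wide)  # 2 rows, 4 columns
dim(long)  # 6 rows, 3 columns
long
```

The column names move into the new Week column and the cell values move into Time_Spent, which is the shape ggplot() expects when Week goes on the x-axis.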
Bar plot — average time per week
library(ggplot2)
ggplot(average_time_table, aes(x = Week, y = Average_Time_Spent)) +
geom_bar(stat = "identity", fill = "#1D9E75", color = "white") +
labs(
title = "Average Time Spent per Week",
x = "Week",
y = "Average Hours"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1)
)
Line plot — trend over time
ggplot(average_time_table, aes(x = Week, y = Average_Time_Spent, group = 1)) +
geom_line(color = "#185FA5", linewidth = 1.2) +
geom_point(color = "#185FA5", size = 3) +
labs(
title = "Trend of Average Time Spent per Week",
x = "Week",
y = "Average Hours"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1)
)
Question: What differences do you notice between the bar plot and the line plot? Which is more effective for showing a trend and why? Use your own words.
- [Looking at both graphs, I can see that they show the same information. The bar graph makes it easy to compare individual weeks because each week is shown separately. I can see which weeks had the highest or lowest average time spent, but it’s a little harder to see how the data changes from one week to the next.
The line graph does a better job of showing the overall pattern throughout the semester. The points are connected, so it’s easier to notice increases, decreases, and periods where student activity stayed about the same. The flow of the line helps show how engagement changed over time. While the bar graph is useful for comparing values, I found the line graph more helpful for getting a picture of the class’s overall activity.]
Line plot — individual students
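# Reshape from wide to long format for individual student lines
data_long <- data_lms |>
  pivot_longer(
    cols = starts_with("Week"),
    names_to = "Week",
    values_to = "TimeSpent"
  )
# Keep weeks in calendar order on the x-axis
# (as plain text, Week_10 would sort before Week_2)
data_long$Week <- factor(data_long$Week, levels = unique(data_long$Week))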
ggplot(data_long, aes(x = Week, y = TimeSpent,
group = Student_ID, color = Student_ID)) +
geom_line(alpha = 0.5) +
labs(
title = "Weekly Time Spent by Each Student",
x = "Week",
y = "Hours"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
Question: What patterns do you notice when looking at all 40 students at once? Is this visualization easy to interpret? Why or why not?
- [When I looked at all 40 students together, I noticed that the data was much more consistent than when looking at individual students. Instead of seeing a lot of ups and downs, the overall trend stayed pretty steady from week to week. There were a few small increases and decreases throughout the semester, but nothing too dramatic. This tells me that even though some students may have spent more or less time on certain weeks, the class as a whole maintained a fairly consistent level of engagement.
I found this graph easy to understand because it takes a large amount of information and presents it in a simple way. The bars make it easy to compare each week, while the line helps show the overall trend across the semester. One drawback is that averaging the data hides individual student differences, so it doesn’t show who was spending the most or least amount of time. Even so, it does a good job of giving a quick picture of how the class performed overall and makes it easy to spot general patterns in student engagement.]
Line plot — selected students only
Task: Choose 5 students you want to compare and update the code below.
#| label: lineplot-selected
#| fig-cap: "Weekly LMS time for selected students"
# Step 1: Choose 5 students
selected_students <- c("Student_1", "Student_5", "Student_10", "Student_20", "Student_35")
# Step 2: Filter for those students
selected_data <- time_data_long %>%
filter(Student_ID %in% selected_students)
# Step 3: Plot
ggplot(selected_data, aes(x = Week, y = Time_Spent, group = Student_ID, color = Student_ID)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2) +
labs(
title = "Weekly LMS Time for Selected Students",
x = "Week",
y = "Time Spent (hours)",
color = "Student ID"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom"
)
# Step 1: Choose 5 Student_IDs from the data and filter for them.
# Student IDs are in the format "Student_1", "Student_2", etc.
# Pick students whose patterns you find interesting to compare,
# for example, mix high and low average engagement.
#
# Step 2: Reshape with pivot_longer(), same as the lineplot-all chunk above.
#
# Step 3: Plot with ggplot(): copy the structure from lineplot-all
# and adjust the title and legend position.
Question: What insights do you gain from this focused view? What design decisions did you make in choosing these five students?
- [By looking at five individual students instead of the whole class, I was able to notice patterns that weren’t obvious in the overall average. Some students were very consistent and spent about the same amount of time each week, and others had weeks where their activity increased or dropped a lot. I selected these five students because they represented different levels of participation, including high, average, and lower engagement. When looking at their data, it helped me see how students interacted differently with the LMS and showed how individual behaviors can influence the trends across the class.]
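Choosing students who span low, average, and high engagement can also be done from the data instead of by eye. This is a sketch under stated assumptions: toy holds hypothetical semester averages for ten students, and taking five evenly spaced positions in the ranking is just one possible selection rule.

```r
# Hypothetical semester averages for ten students (stand-in for data_lms)
toy <- data.frame(
  Student_ID = paste("Student", 1:10, sep = "_"),
  Semester_Average = c(12.1, 15.4, 9.8, 13.0, 16.2, 11.5, 14.3, 10.6, 13.7, 12.9)
)

# Rank students from lowest to highest average
ranked <- toy[order(toy$Semester_Average), ]

# Take five evenly spaced positions, from the lowest rank to the highest
positions <- round(seq(1, nrow(ranked), length.out = 5))
selected_students <- ranked$Student_ID[positions]

selected_students
ranked$Semester_Average[positions]
```

The resulting selected_students vector can be passed to filter(Student_ID %in% selected_students) exactly as in the chunk above.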
Histogram — semester averages
ggplot(data_lms, aes(x = Semester_Average)) +
geom_histogram(binwidth = 1, fill = "#378ADD", color = "white") +
labs(
title = "Distribution of Semester Average Time Spent",
x = "Semester Average (hours/week)",
y = "Number of Students"
) +
theme_minimal() +
theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
Part 4 · Diagnostic analytics — why did it happen?
Now we switch to the sci-online-classes dataset to explore the relationship between time spent and final grades.
# Load the dataset used in the previous module
# Make sure sci-online-classes.csv is in your data folder
library(readr)    # read_csv()
library(janitor)  # clean_names()
library(dplyr)    # glimpse()
data_sci <- read_csv("data/sci-online-classes.csv") |>
  clean_names()
glimpse(data_sci)
Rows: 603
Columns: 30
$ student_id <dbl> 43146, 44638, 47448, 47979, 48797, 51943, 52326,…
$ course_id <chr> "FrScA-S216-02", "OcnA-S116-01", "FrScA-S216-01"…
$ total_points_possible <dbl> 3280, 3531, 2870, 4562, 2207, 4208, 4325, 2086, …
$ total_points_earned <dbl> 2220, 2672, 1897, 3090, 1910, 3596, 2255, 1719, …
$ percentage_earned <dbl> 0.6768293, 0.7567261, 0.6609756, 0.6773345, 0.86…
$ subject <chr> "FrScA", "OcnA", "FrScA", "OcnA", "PhysA", "FrSc…
$ semester <chr> "S216", "S116", "S216", "S216", "S116", "S216", …
$ section <chr> "02", "01", "01", "01", "01", "03", "01", "01", …
$ gradebook_item <chr> "POINTS EARNED & TOTAL COURSE POINTS", "ATTEMPTE…
$ grade_category <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ final_grade_cems <dbl> 93.45372, 81.70184, 88.48758, 81.85260, 84.00000…
$ points_possible <dbl> 5, 10, 10, 5, 438, 5, 10, 10, 443, 5, 12, 10, 5,…
$ points_earned <dbl> NA, 10.00, NA, 4.00, 399.00, NA, NA, 10.00, 425.…
$ gender <chr> "M", "F", "M", "M", "F", "F", "M", "F", "F", "M"…
$ q1 <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q2 <dbl> 4, 4, 4, 5, 3, NA, 5, 3, 3, NA, NA, 5, 3, 3, NA,…
$ q3 <dbl> 4, 3, 4, 3, 3, NA, 3, 3, 3, NA, NA, 3, 3, 5, NA,…
$ q4 <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 3, 5, NA,…
$ q5 <dbl> 5, 4, 5, 5, 4, NA, 5, 3, 4, NA, NA, 5, 4, 5, NA,…
$ q6 <dbl> 5, 4, 4, 5, 4, NA, 5, 4, 3, NA, NA, 5, 3, 5, NA,…
$ q7 <dbl> 5, 4, 4, 4, 4, NA, 4, 3, 3, NA, NA, 5, 3, 5, NA,…
$ q8 <dbl> 5, 5, 5, 5, 4, NA, 5, 3, 4, NA, NA, 4, 3, 5, NA,…
$ q9 <dbl> 4, 4, 3, 5, NA, NA, 5, 3, 2, NA, NA, 5, 2, 2, NA…
$ q10 <dbl> 5, 4, 5, 5, 3, NA, 5, 3, 5, NA, NA, 4, 4, 5, NA,…
$ time_spent <dbl> 1555.1667, 1382.7001, 860.4335, 1598.6166, 1481.…
$ time_spent_hours <dbl> 25.91944500, 23.04500167, 14.34055833, 26.643610…
$ time_spent_std <dbl> -0.18051496, -0.30780313, -0.69325954, -0.148446…
$ int <dbl> 5.0, 4.2, 5.0, 5.0, 3.8, 4.6, 5.0, 3.0, 4.2, NA,…
$ pc <dbl> 4.50, 3.50, 4.00, 3.50, 3.50, 4.00, 3.50, 3.00, …
$ uv <dbl> 4.333333, 4.000000, 3.666667, 5.000000, 3.500000…
This is the same dataset from the previous module. We are reloading it here because the LMS time data (Parts 1–3) and the sci-online-classes data (Part 4) are separate files. Reloading makes this file self-contained.
Scatter plot with regression line
ggplot(data_sci,
aes(x = time_spent_hours, y = final_grade_cems)) +
geom_point(color = "#185FA5", size = 2.5, alpha = 0.6) +
geom_smooth(method = "lm", color = "#993C1D", se = TRUE) +
labs(
title = "Time Spent vs. Final Grade",
x = "Time Spent on LMS (hours)",
y = "Final Grade"
) +
theme_minimal() +
theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5))
Question: Based on the scatter plot, what do you expect the relationship between time spent and final grades to be? Write your hypothesis before looking at the correlation.
- [I think there will be a positive relationship between the amount of time students spent on the LMS and their final grades.]
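The line drawn by geom_smooth(method = "lm") is an ordinary linear regression, so a hypothesis about its direction can be checked by fitting the same model with lm() and reading the slope. The sketch below uses six hypothetical students (toy), not the sci-online-classes data.

```r
# Hypothetical hours and grades for six students (not from sci-online-classes)
toy <- data.frame(
  time_spent_hours = c(5, 10, 15, 20, 25, 30),
  final_grade_cems = c(62, 70, 68, 80, 85, 83)
)

# Fit the straight line that geom_smooth(method = "lm") would draw
fit <- lm(final_grade_cems ~ time_spent_hours, data = toy)
coef(fit)

# The slope is the expected change in grade for each extra hour
slope <- unname(coef(fit)["time_spent_hours"])
slope > 0  # TRUE here: a positive slope matches a positive-relationship hypothesis
```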
Correlation
# cor() computes the Pearson correlation coefficient
# use = "complete.obs" ignores rows with missing data
correlation <- cor(data_sci$time_spent_hours,
data_sci$final_grade_cems,
use = "complete.obs")
correlation
[1] 0.3654121
- Values close to +1: strong positive relationship (more time → higher grade)
- Values close to -1: strong negative relationship
- Values close to 0: little or no linear relationship
- This is NOT a statistics course — focus on interpreting what this number means for learners, not on p-values.
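To see what cor() and use = "complete.obs" are doing, it can help to reproduce the calculation on a handful of values. This is a minimal sketch with hypothetical numbers (hours, grade); only the final line uses the 0.3654121 value reported above.

```r
# Hypothetical values with gaps (NA) in both columns
hours <- c(5, 10, NA, 20, 25, 30)
grade <- c(60, 65, 70, NA, 85, 90)

cor(hours, grade)  # NA: missing values block the calculation by default
r <- cor(hours, grade, use = "complete.obs")

# Same number by hand, using only the rows where both values are present
keep <- complete.cases(hours, grade)
x <- hours[keep]
y <- grade[keep]
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

all.equal(r, r_manual)  # TRUE

# Squaring r gives the share of variation in one variable that a straight
# line on the other accounts for; for 0.3654121 this is about 0.13
0.3654121^2
```

Read this way, a correlation near 0.37 means time spent is related to grades, but most of the variation in grades is tied to something other than time.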
Question: With both the scatter plot and the correlation value in front of you, what can you say about the relationship between time spent and final grades? What would you recommend to an instructor based on this finding?
- [The correlation shows a moderate positive relationship between time spent on the LMS and final grades. The students who spent more time engaging with course content tended to earn higher grades. While spending more time online does not guarantee higher grades, the data shows that regular participation and engagement can have a positive impact on students’ grades. If they consistently access course materials, complete activities, and stay involved then they are better prepared for assessments and assignments.
As an instructor, I would encourage students to log in regularly and stay engaged with the course throughout the semester rather than waiting until deadlines approach. This is definitely what I try to do anyway. I would also mention that participation is more important than just gaining hours. Activities such as reviewing lessons, contributing to discussions, completing assignments on time, and using feedback to improve learning are likely to have the greatest impact on student success.]
Practice — grouped summary by subject
Task: Using data_sci, calculate the mean final_grade_cems and mean time_spent_hours grouped by subject. Arrange by mean grade descending. Which subject has the highest average grade? Is it also the subject with the most time spent?
You have used group_by() and summarise() in the previous file. Apply the same pattern here with a different grouping variable. If you need a column name reminder, run names(data_sci) in the Console.
#| label: grouped-summary-practice
library(dplyr)
data_sci_summary <- data_sci %>%
group_by(subject) %>%
summarise(
mean_grade = mean(final_grade_cems, na.rm = TRUE),
mean_time = mean(time_spent_hours, na.rm = TRUE)
) %>%
arrange(desc(mean_grade))
data_sci_summary
# Steps: group_by(subject) |> summarise(mean_grade = ..., mean_time = ...) |> arrange(desc(...))
Question: Does the subject with the highest average grade also have the most time spent? What might explain any differences you find?
- [When I looked at the grouped summary, I noticed that Computer Science had the highest average grade, while Mathematics had the highest average time spent. This stood out because the subject where students spent the most time was not the one with the best overall performance. It shows that spending more time on a subject does not guarantee higher grades.
This could be because Computer Science may have more interactive or hands-on learning activities that help students learn better. Mathematics often requires more practice and problem-solving, which takes more time. Student interest and confidence may also play a role, as students who enjoy a subject are often able to perform well without spending as much time on it. The data shows that the quality of learning and instructional strategies may have a bigger impact on student success than time spent.]
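The subject column in data_sci holds short codes (the glimpse() output shows FrScA, OcnA, PhysA, and so on), so it helps to read the top subject for each measure directly from the data when writing up the answer. The sketch below uses those codes with invented numbers; toy and its values are placeholders, not results from the real file.

```r
# Invented values using subject codes seen in the glimpse() output
toy <- data.frame(
  subject = c("FrScA", "FrScA", "OcnA", "OcnA", "PhysA", "PhysA"),
  final_grade_cems = c(80, 90, 70, 80, 88, 92),
  time_spent_hours = c(30, 40, 20, 22, 25, 27)
)

# Mean grade and mean time per subject
mean_grade <- tapply(toy$final_grade_cems, toy$subject, mean, na.rm = TRUE)
mean_time <- tapply(toy$time_spent_hours, toy$subject, mean, na.rm = TRUE)

# Which subject is on top for each measure?
top_grade_subject <- names(which.max(mean_grade))
top_time_subject <- names(which.max(mean_time))

top_grade_subject                      # "PhysA" in this toy data
top_time_subject                       # "FrScA" in this toy data
top_grade_subject == top_time_subject  # FALSE in this toy data
```

Running names(which.max(...)) on the columns of data_sci_summary gives the same two answers for the real data, along with the exact subject codes to quote.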
Part 5 · Box plot
A box plot shows the distribution of a variable across categories — useful for comparing groups and spotting outliers.
ggplot(data_sci, aes(x = gender, y = final_grade_cems, fill = gender)) +
geom_boxplot(color = "gray30",
outlier.colour = "#993C1D",
outlier.shape = 16,
outlier.size = 2) +
scale_fill_manual(values = c("F" = "#E1F5EE", "M" = "#E6F1FB")) +
labs(
title = "Final Grade Distribution by Gender",
x = "Gender",
y = "Final Grade"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
legend.position = "none"
)
Question: What does the box plot tell you about the distribution of final grades by gender? Are there differences worth investigating?
- [When I looked at the box plot, I noticed that the grades for male and female students were similar. There were some differences in how the scores were spread out. One group appeared to have a slightly higher median grade, while the other showed a wider range of scores. This could show that their overall performance was similar, and there may be differences in consistency among students within each group. There were a few outliers, which showed students who performed either much higher or much lower than their classmates. These students could be worth looking at more closely to better understand what factors contributed to their success or challenges. The differences shown in the box plot are not extreme, but they do raise a few questions about student engagement, learning preferences, and academic support.
As an instructional technology teacher, this information could help me evaluate whether digital tools and learning activities are meeting the needs of all students. If one group is performing better than another, it may be a good idea to look at how course materials, technology resources, and instructional strategies can be adjusted to better support student success across the board.]
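The points drawn as outliers on a box plot follow a simple rule: they lie more than 1.5 times the height of the box beyond its edges. Base R's boxplot.stats() applies that rule to a plain vector. The grades below are hypothetical, and the box edges that geom_boxplot() computes may differ slightly from these because the two functions define quartiles in slightly different ways.

```r
# Hypothetical final grades for ten students (not from sci-online-classes)
grades <- c(20, 70, 72, 75, 78, 80, 82, 85, 88, 90)

stats <- boxplot.stats(grades)

# Lower whisker, lower edge of box, median, upper edge of box, upper whisker
stats$stats

# Values beyond the whiskers, which a box plot draws as separate points
stats$out  # 20
```

Listing the outlying values this way makes it possible to look up which students they belong to, which is the follow-up the box plot itself cannot provide.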
Final reflection
After completing both the LMS time analysis and the sci-online-classes analysis, reflect on the following:
Question: How could these analytics techniques be applied in a real classroom or course design context? Describe one specific scenario — from your track (K–12 or ID/higher ed) — where the combination of a bar plot, line plot, and correlation would help an educator or designer make a better decision.
- [The analytics tools I used were bar plots, line plots, and correlations. These are helpful for me to understand how students are learning and performing. In a K–12 setting like mine, I could use these to look at LMS data and to get a better picture of both student engagement and their achievement across different subjects or classes. A bar plot helps show which subjects or assignments students are performing better or worse in, so it’s easier to spot patterns. A line plot is helpful for tracking how student engagement changes over time, like weekly participation or time spent in the LMS. The correlation piece helps me to connect the dots between effort and outcomes by showing whether things like time spent or participation actually relate to higher grades.
I think these tools make it easier to make better decisions in the classroom. For example, I could adjust pacing, add extra support where needed, or change how lessons are delivered using technology tools. It can help move my instruction from guessing to making decisions based on real student data, which can support students more and lead to better outcomes for all learners.]
Render & submit
Step 1 — Jennifer Rutherford
Change the author: field in the YAML header at the top to your name.
Step 2 — Render
Click Render in the toolbar. A formatted HTML page will appear in your Viewer tab or a new browser window. Check the Console for any error messages if the render fails.
Step 3 — Publish
| Option | Best for | Link |
|---|---|---|
| Posit Cloud | Quickest — one click from your workspace | Guide |
| RPubs | Free, public, easy to share a link | rpubs.com |
| Quarto Pub | Clean public portfolio pages | Guide |
| GitHub Pages | Best for a professional portfolio | Guide |
This document shows three levels of analytics work: descriptive (summary statistics and bar plots), trend analysis (line plots), and diagnostic (scatter plot and correlation). Together they demonstrate a complete analytical workflow that is worth showcasing in a professional portfolio.
Share your published link with your instructor once you have rendered and published. Post in the course discussion board if you run into any technical issues.