Load the tidyverse and skimr packages, then read in data_to_explore.
Are you getting an error? Make sure to run install.packages("") in the console, with the package name between the quotes, to fix that.
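If you are not sure which packages still need installing, a small check like the one below may help. This is only a sketch: find_missing_packages() is a hypothetical helper name, not part of the lab or of any package, and it only reports what is missing rather than installing anything.

```r
# Hypothetical helper (not from the lab): return the packages that are not installed.
find_missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(c("tidyverse", "skimr"))

# Print the command to run in the console instead of installing automatically
if (length(missing_pkgs) > 0) {
  message("Run in the console: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}
```

If nothing is printed, both packages are already available and library() should work.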
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(skimr)
data_to_explore <- read_csv("data/data_to_explore.csv")
## Rows: 943 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): student_id, subject, semester, section, gender, enrollment_reason...
## dbl (23): total_points_possible, total_points_earned, proportion_earned, ti...
## dttm (3): date_x, date_y, date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Use the skim() function to view data_to_explore.
#skim the data by adding the skim function in front of the data
skim(data_to_explore)
Name | data_to_explore |
Number of rows | 943 |
Number of columns | 34 |
_______________________ | |
Column type frequency: | |
character | 8 |
numeric | 23 |
POSIXct | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
student_id | 0 | 1.00 | 2 | 6 | 0 | 879 | 0 |
subject | 0 | 1.00 | 4 | 5 | 0 | 5 | 0 |
semester | 0 | 1.00 | 4 | 4 | 0 | 4 | 0 |
section | 0 | 1.00 | 2 | 2 | 0 | 4 | 0 |
gender | 227 | 0.76 | 1 | 1 | 0 | 2 | 0 |
enrollment_reason | 227 | 0.76 | 5 | 34 | 0 | 5 | 0 |
enrollment_status | 227 | 0.76 | 7 | 17 | 0 | 3 | 0 |
course_id | 281 | 0.70 | 12 | 13 | 0 | 36 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
total_points_possible | 226 | 0.76 | 1619.55 | 387.12 | 1212.00 | 1217.00 | 1676.00 | 1791.00 | 2425.00 | ▇▂▆▁▃ |
total_points_earned | 226 | 0.76 | 1229.98 | 510.64 | 0.00 | 1002.50 | 1177.13 | 1572.45 | 2413.50 | ▂▂▇▅▂ |
proportion_earned | 226 | 0.76 | 76.23 | 25.20 | 0.00 | 72.36 | 85.59 | 92.29 | 100.74 | ▁▁▁▃▇ |
time_spent | 232 | 0.75 | 1828.80 | 1363.13 | 0.45 | 895.57 | 1559.97 | 2423.94 | 8870.88 | ▇▅▁▁▁ |
time_spent_hours | 232 | 0.75 | 30.48 | 22.72 | 0.01 | 14.93 | 26.00 | 40.40 | 147.85 | ▇▅▁▁▁ |
int | 293 | 0.69 | 4.30 | 0.60 | 1.80 | 4.00 | 4.40 | 4.80 | 5.00 | ▁▁▂▆▇ |
val | 287 | 0.70 | 3.75 | 0.75 | 1.00 | 3.33 | 3.67 | 4.33 | 5.00 | ▁▁▆▇▆ |
percomp | 288 | 0.69 | 3.64 | 0.69 | 1.50 | 3.00 | 3.50 | 4.00 | 5.00 | ▁▁▇▃▃ |
tv | 292 | 0.69 | 4.07 | 0.59 | 1.00 | 3.71 | 4.12 | 4.46 | 5.00 | ▁▁▂▇▇ |
q1 | 285 | 0.70 | 4.34 | 0.66 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q2 | 285 | 0.70 | 3.66 | 0.93 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▂▆▇▃ |
q3 | 286 | 0.70 | 3.31 | 0.85 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▂▇▅▂ |
q4 | 289 | 0.69 | 4.35 | 0.80 | 1.00 | 4.00 | 5.00 | 5.00 | 5.00 | ▁▁▁▆▇ |
q5 | 286 | 0.70 | 4.28 | 0.69 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▆ |
q6 | 285 | 0.70 | 4.05 | 0.80 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▅ |
q7 | 286 | 0.70 | 3.96 | 0.85 | 1.00 | 3.00 | 4.00 | 5.00 | 5.00 | ▁▁▅▇▆ |
q8 | 286 | 0.70 | 4.35 | 0.65 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q9 | 286 | 0.70 | 3.55 | 0.92 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▂▇▇▃ |
q10 | 285 | 0.70 | 4.17 | 0.87 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▇ |
post_int | 848 | 0.10 | 3.88 | 0.94 | 1.00 | 3.50 | 4.00 | 4.50 | 5.00 | ▁▁▃▇▇ |
post_uv | 848 | 0.10 | 3.48 | 0.99 | 1.00 | 3.00 | 3.67 | 4.00 | 5.00 | ▂▂▅▇▅ |
post_tv | 848 | 0.10 | 3.71 | 0.90 | 1.00 | 3.29 | 3.86 | 4.29 | 5.00 | ▁▂▃▇▆ |
post_percomp | 848 | 0.10 | 3.47 | 0.88 | 1.00 | 3.00 | 3.50 | 4.00 | 5.00 | ▁▂▂▇▂ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
date_x | 393 | 0.58 | 2015-09-02 15:40:00 | 2016-05-24 15:53:00 | 2015-10-01 15:57:30 | 536 |
date_y | 848 | 0.10 | 2015-09-02 15:31:00 | 2016-01-22 15:43:00 | 2016-01-04 13:25:00 | 95 |
date | 834 | 0.12 | 2017-01-23 13:14:00 | 2017-02-13 13:00:00 | 2017-01-25 18:43:00 | 107 |
In the code chunk below:
1. use the data_to_explore data frame, then
2. group_by() the subject variable, then
3. add the skim() function
group_df <- data_to_explore %>%
  group_by(subject) %>%
  skim()
group_df
Name | Piped data |
Number of rows | 943 |
Number of columns | 34 |
_______________________ | |
Column type frequency: | |
character | 7 |
numeric | 23 |
POSIXct | 3 |
________________________ | |
Group variables | subject |
Variable type: character
skim_variable | subject | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|---|
student_id | AnPhA | 0 | 1.00 | 2 | 6 | 0 | 207 | 0 |
student_id | BioA | 0 | 1.00 | 3 | 6 | 0 | 47 | 0 |
student_id | FrScA | 0 | 1.00 | 2 | 6 | 0 | 414 | 0 |
student_id | OcnA | 0 | 1.00 | 2 | 6 | 0 | 171 | 0 |
student_id | PhysA | 0 | 1.00 | 3 | 6 | 0 | 74 | 0 |
semester | AnPhA | 0 | 1.00 | 4 | 4 | 0 | 4 | 0 |
semester | BioA | 0 | 1.00 | 4 | 4 | 0 | 4 | 0 |
semester | FrScA | 0 | 1.00 | 4 | 4 | 0 | 4 | 0 |
semester | OcnA | 0 | 1.00 | 4 | 4 | 0 | 4 | 0 |
semester | PhysA | 0 | 1.00 | 4 | 4 | 0 | 4 | 0 |
section | AnPhA | 0 | 1.00 | 2 | 2 | 0 | 2 | 0 |
section | BioA | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
section | FrScA | 0 | 1.00 | 2 | 2 | 0 | 4 | 0 |
section | OcnA | 0 | 1.00 | 2 | 2 | 0 | 3 | 0 |
section | PhysA | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
gender | AnPhA | 45 | 0.79 | 1 | 1 | 0 | 2 | 0 |
gender | BioA | 4 | 0.92 | 1 | 1 | 0 | 2 | 0 |
gender | FrScA | 130 | 0.70 | 1 | 1 | 0 | 2 | 0 |
gender | OcnA | 42 | 0.76 | 1 | 1 | 0 | 2 | 0 |
gender | PhysA | 6 | 0.92 | 1 | 1 | 0 | 2 | 0 |
enrollment_reason | AnPhA | 45 | 0.79 | 5 | 34 | 0 | 4 | 0 |
enrollment_reason | BioA | 4 | 0.92 | 5 | 34 | 0 | 5 | 0 |
enrollment_reason | FrScA | 130 | 0.70 | 5 | 34 | 0 | 5 | 0 |
enrollment_reason | OcnA | 42 | 0.76 | 5 | 34 | 0 | 5 | 0 |
enrollment_reason | PhysA | 6 | 0.92 | 5 | 34 | 0 | 4 | 0 |
enrollment_status | AnPhA | 45 | 0.79 | 7 | 17 | 0 | 2 | 0 |
enrollment_status | BioA | 4 | 0.92 | 7 | 17 | 0 | 3 | 0 |
enrollment_status | FrScA | 130 | 0.70 | 7 | 17 | 0 | 3 | 0 |
enrollment_status | OcnA | 42 | 0.76 | 7 | 17 | 0 | 3 | 0 |
enrollment_status | PhysA | 6 | 0.92 | 7 | 17 | 0 | 2 | 0 |
course_id | AnPhA | 58 | 0.72 | 13 | 13 | 0 | 7 | 0 |
course_id | BioA | 7 | 0.86 | 12 | 12 | 0 | 4 | 0 |
course_id | FrScA | 150 | 0.66 | 13 | 13 | 0 | 12 | 0 |
course_id | OcnA | 55 | 0.69 | 12 | 12 | 0 | 9 | 0 |
course_id | PhysA | 11 | 0.85 | 13 | 13 | 0 | 4 | 0 |
Variable type: numeric
skim_variable | subject | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|---|
total_points_possible | AnPhA | 45 | 0.79 | 1776.52 | 12.28 | 1655.00 | 1775.00 | 1775.00 | 1775.00 | 1805.00 | ▁▁▁▇▁ |
total_points_possible | BioA | 4 | 0.92 | 2421.00 | 2.02 | 2420.00 | 2420.00 | 2420.00 | 2420.00 | 2425.00 | ▇▁▁▁▂ |
total_points_possible | FrScA | 129 | 0.70 | 1230.81 | 38.26 | 1212.00 | 1212.00 | 1217.00 | 1232.00 | 1361.00 | ▇▁▁▁▁ |
total_points_possible | OcnA | 42 | 0.76 | 1738.47 | 78.48 | 1480.00 | 1676.00 | 1676.00 | 1833.00 | 1833.00 | ▁▁▇▁▇ |
total_points_possible | PhysA | 6 | 0.92 | 2225.00 | 0.00 | 2225.00 | 2225.00 | 2225.00 | 2225.00 | 2225.00 | ▁▁▇▁▁ |
total_points_earned | AnPhA | 45 | 0.79 | 1340.16 | 423.45 | 0.00 | 1269.09 | 1511.14 | 1616.37 | 1732.52 | ▁▁▁▂▇ |
total_points_earned | BioA | 4 | 0.92 | 1546.66 | 813.01 | 0.00 | 1035.16 | 1865.13 | 2198.50 | 2413.50 | ▃▁▁▃▇ |
total_points_earned | FrScA | 129 | 0.70 | 952.30 | 305.60 | 0.00 | 914.92 | 1062.75 | 1130.00 | 1319.02 | ▁▁▁▅▇ |
total_points_earned | OcnA | 42 | 0.76 | 1283.25 | 427.25 | 0.00 | 1216.68 | 1396.85 | 1572.50 | 1786.76 | ▁▁▁▆▇ |
total_points_earned | PhysA | 6 | 0.92 | 1898.45 | 469.31 | 110.00 | 1891.75 | 2072.00 | 2149.12 | 2216.00 | ▁▁▁▂▇ |
proportion_earned | AnPhA | 45 | 0.79 | 75.44 | 23.84 | 0.00 | 71.57 | 84.90 | 90.96 | 97.61 | ▁▁▁▂▇ |
proportion_earned | BioA | 4 | 0.92 | 63.89 | 33.58 | 0.00 | 42.78 | 77.07 | 90.85 | 99.73 | ▃▁▁▃▇ |
proportion_earned | FrScA | 129 | 0.70 | 77.42 | 24.82 | 0.00 | 74.85 | 86.43 | 92.19 | 100.74 | ▁▁▁▃▇ |
proportion_earned | OcnA | 42 | 0.76 | 73.99 | 24.70 | 0.00 | 69.76 | 81.60 | 91.04 | 99.22 | ▁▁▁▃▇ |
proportion_earned | PhysA | 6 | 0.92 | 85.32 | 21.09 | 4.94 | 85.02 | 93.12 | 96.59 | 99.60 | ▁▁▁▂▇ |
time_spent | AnPhA | 45 | 0.79 | 2374.39 | 1669.58 | 0.45 | 1209.85 | 2164.90 | 3134.97 | 7084.70 | ▆▇▃▂▁ |
time_spent | BioA | 5 | 0.90 | 1404.57 | 1528.14 | 1.22 | 297.02 | 827.30 | 1955.08 | 6664.45 | ▇▂▁▁▁ |
time_spent | FrScA | 134 | 0.69 | 1591.90 | 1016.76 | 2.42 | 935.03 | 1404.90 | 2130.75 | 6537.02 | ▇▇▂▁▁ |
time_spent | OcnA | 42 | 0.76 | 2031.44 | 1496.82 | 0.58 | 1133.47 | 1800.22 | 2573.45 | 8870.88 | ▇▆▂▁▁ |
time_spent | PhysA | 6 | 0.92 | 1431.76 | 990.40 | 0.70 | 749.32 | 1282.81 | 2049.85 | 5373.35 | ▇▆▃▁▁ |
time_spent_hours | AnPhA | 45 | 0.79 | 39.57 | 27.83 | 0.01 | 20.16 | 36.08 | 52.25 | 118.08 | ▆▇▃▂▁ |
time_spent_hours | BioA | 5 | 0.90 | 23.41 | 25.47 | 0.02 | 4.95 | 13.79 | 32.58 | 111.07 | ▇▂▁▁▁ |
time_spent_hours | FrScA | 134 | 0.69 | 26.53 | 16.95 | 0.04 | 15.58 | 23.42 | 35.51 | 108.95 | ▇▇▂▁▁ |
time_spent_hours | OcnA | 42 | 0.76 | 33.86 | 24.95 | 0.01 | 18.89 | 30.00 | 42.89 | 147.85 | ▇▆▂▁▁ |
time_spent_hours | PhysA | 6 | 0.92 | 23.86 | 16.51 | 0.01 | 12.49 | 21.38 | 34.16 | 89.56 | ▇▆▃▁▁ |
int | AnPhA | 62 | 0.70 | 4.42 | 0.57 | 1.80 | 4.00 | 4.40 | 5.00 | 5.00 | ▁▁▁▅▇ |
int | BioA | 9 | 0.82 | 3.69 | 0.63 | 2.40 | 3.35 | 3.80 | 4.00 | 5.00 | ▂▆▇▆▂ |
int | FrScA | 154 | 0.65 | 4.42 | 0.52 | 2.60 | 4.00 | 4.40 | 5.00 | 5.00 | ▁▁▃▃▇ |
int | OcnA | 56 | 0.68 | 4.24 | 0.58 | 2.20 | 4.00 | 4.20 | 4.60 | 5.00 | ▁▁▂▇▆ |
int | PhysA | 12 | 0.84 | 4.00 | 0.65 | 2.20 | 3.60 | 4.00 | 4.40 | 5.00 | ▁▂▆▇▅ |
val | AnPhA | 59 | 0.72 | 4.29 | 0.62 | 1.00 | 4.00 | 4.33 | 4.67 | 5.00 | ▁▁▁▅▇ |
val | BioA | 7 | 0.86 | 3.50 | 0.58 | 2.67 | 3.00 | 3.33 | 3.67 | 5.00 | ▆▆▇▁▂ |
val | FrScA | 155 | 0.64 | 3.53 | 0.72 | 1.67 | 3.00 | 3.67 | 4.00 | 5.00 | ▂▅▇▅▂ |
val | OcnA | 55 | 0.69 | 3.62 | 0.77 | 1.00 | 3.00 | 3.67 | 4.00 | 5.00 | ▁▁▅▇▃ |
val | PhysA | 11 | 0.85 | 3.89 | 0.56 | 2.00 | 3.67 | 4.00 | 4.33 | 5.00 | ▁▁▇▇▃ |
percomp | AnPhA | 61 | 0.71 | 3.80 | 0.67 | 2.00 | 3.50 | 4.00 | 4.50 | 5.00 | ▂▃▇▆▇ |
percomp | BioA | 8 | 0.84 | 3.34 | 0.75 | 2.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▅▇▃▇▂ |
percomp | FrScA | 152 | 0.65 | 3.64 | 0.63 | 1.50 | 3.00 | 3.50 | 4.00 | 5.00 | ▁▁▇▅▃ |
percomp | OcnA | 56 | 0.68 | 3.57 | 0.67 | 2.00 | 3.00 | 3.50 | 4.00 | 5.00 | ▂▇▆▅▅ |
percomp | PhysA | 11 | 0.85 | 3.56 | 0.84 | 2.00 | 3.00 | 3.50 | 4.00 | 5.00 | ▅▅▇▅▇ |
tv | AnPhA | 60 | 0.71 | 4.35 | 0.57 | 1.00 | 4.00 | 4.43 | 4.83 | 5.00 | ▁▁▁▅▇ |
tv | BioA | 9 | 0.82 | 3.61 | 0.56 | 2.29 | 3.14 | 3.57 | 3.86 | 5.00 | ▁▃▇▂▁ |
tv | FrScA | 156 | 0.64 | 4.04 | 0.52 | 2.29 | 3.71 | 4.00 | 4.43 | 5.00 | ▁▂▆▇▅ |
tv | OcnA | 55 | 0.69 | 3.97 | 0.62 | 1.71 | 3.71 | 4.00 | 4.38 | 5.00 | ▁▁▂▇▅ |
tv | PhysA | 12 | 0.84 | 3.94 | 0.56 | 2.14 | 3.57 | 4.00 | 4.29 | 5.00 | ▁▂▃▇▂ |
q1 | AnPhA | 59 | 0.72 | 4.43 | 0.64 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q1 | BioA | 7 | 0.86 | 3.76 | 0.66 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▃▁▇▁ |
q1 | FrScA | 153 | 0.65 | 4.50 | 0.57 | 2.00 | 4.00 | 5.00 | 5.00 | 5.00 | ▁▁▁▆▇ |
q1 | OcnA | 55 | 0.69 | 4.20 | 0.69 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▂▁▇▅ |
q1 | PhysA | 11 | 0.85 | 4.03 | 0.72 | 2.00 | 4.00 | 4.00 | 4.50 | 5.00 | ▁▃▁▇▃ |
q2 | AnPhA | 59 | 0.72 | 4.30 | 0.74 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▇ |
q2 | BioA | 7 | 0.86 | 3.48 | 0.71 | 2.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▇▁▆▁ |
q2 | FrScA | 152 | 0.65 | 3.35 | 0.89 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▃▇▆▂ |
q2 | OcnA | 56 | 0.68 | 3.46 | 0.93 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▂▆▇▂ |
q2 | PhysA | 11 | 0.85 | 4.03 | 0.76 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▂▁▇▅ |
q3 | AnPhA | 60 | 0.71 | 3.53 | 0.87 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▁▇▅▃ |
q3 | BioA | 7 | 0.86 | 2.98 | 0.87 | 2.00 | 2.00 | 3.00 | 3.00 | 5.00 | ▅▇▁▂▁ |
q3 | FrScA | 152 | 0.65 | 3.25 | 0.79 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▂▇▃▁ |
q3 | OcnA | 56 | 0.68 | 3.30 | 0.86 | 2.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▃▇▁▅▂ |
q3 | PhysA | 11 | 0.85 | 3.32 | 0.95 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▃▇▆▂ |
q4 | AnPhA | 61 | 0.71 | 4.52 | 0.78 | 1.00 | 4.00 | 5.00 | 5.00 | 5.00 | ▁▁▁▃▇ |
q4 | BioA | 7 | 0.86 | 3.69 | 0.81 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▂▃▁▇▂ |
q4 | FrScA | 154 | 0.65 | 4.44 | 0.74 | 1.00 | 4.00 | 5.00 | 5.00 | 5.00 | ▁▁▁▅▇ |
q4 | OcnA | 56 | 0.68 | 4.29 | 0.75 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▇ |
q4 | PhysA | 11 | 0.85 | 4.02 | 0.87 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▃▁▇▆ |
q5 | AnPhA | 59 | 0.72 | 4.36 | 0.69 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q5 | BioA | 8 | 0.84 | 3.88 | 0.68 | 2.00 | 4.00 | 4.00 | 4.00 | 5.00 | ▁▃▁▇▂ |
q5 | FrScA | 153 | 0.65 | 4.38 | 0.62 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q5 | OcnA | 55 | 0.69 | 4.20 | 0.77 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▇▆ |
q5 | PhysA | 11 | 0.85 | 4.06 | 0.67 | 2.00 | 4.00 | 4.00 | 4.00 | 5.00 | ▁▁▁▇▃ |
q6 | AnPhA | 59 | 0.72 | 4.50 | 0.65 | 1.00 | 4.00 | 5.00 | 5.00 | 5.00 | ▁▁▁▆▇ |
q6 | BioA | 7 | 0.86 | 3.83 | 0.70 | 3.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▅▁▇▁▂ |
q6 | FrScA | 153 | 0.65 | 3.88 | 0.79 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▃▁▇▃ |
q6 | OcnA | 55 | 0.69 | 3.84 | 0.84 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▁▅▇▃ |
q6 | PhysA | 11 | 0.85 | 4.27 | 0.68 | 2.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▆ |
q7 | AnPhA | 60 | 0.71 | 4.08 | 0.85 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▆ |
q7 | BioA | 8 | 0.84 | 3.71 | 0.96 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▂▇▁▇▆ |
q7 | FrScA | 152 | 0.65 | 4.02 | 0.83 | 1.00 | 3.00 | 4.00 | 5.00 | 5.00 | ▁▁▅▇▆ |
q7 | OcnA | 55 | 0.69 | 3.83 | 0.82 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▆▁▇▅ |
q7 | PhysA | 11 | 0.85 | 3.81 | 0.90 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▂▅▁▇▅ |
q8 | AnPhA | 60 | 0.71 | 4.45 | 0.65 | 1.00 | 4.00 | 5.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q8 | BioA | 7 | 0.86 | 3.79 | 0.72 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▃▁▇▂ |
q8 | FrScA | 152 | 0.65 | 4.45 | 0.58 | 3.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▇▁▇ |
q8 | OcnA | 55 | 0.69 | 4.33 | 0.60 | 3.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▇▁▆ |
q8 | PhysA | 12 | 0.84 | 4.05 | 0.73 | 2.00 | 4.00 | 4.00 | 4.00 | 5.00 | ▁▁▁▇▃ |
q9 | AnPhA | 59 | 0.72 | 4.07 | 0.81 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▆ |
q9 | BioA | 7 | 0.86 | 3.19 | 0.86 | 2.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▃▇▁▅▁ |
q9 | FrScA | 154 | 0.65 | 3.37 | 0.91 | 1.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▁▃▇▆▂ |
q9 | OcnA | 55 | 0.69 | 3.54 | 0.91 | 1.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▁▂▇▇▃ |
q9 | PhysA | 11 | 0.85 | 3.38 | 0.83 | 2.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▃▇▁▇▂ |
q10 | AnPhA | 59 | 0.72 | 4.35 | 0.74 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▁▇▇ |
q10 | BioA | 8 | 0.84 | 3.37 | 0.89 | 2.00 | 3.00 | 3.00 | 4.00 | 5.00 | ▂▇▁▅▂ |
q10 | FrScA | 152 | 0.65 | 4.30 | 0.81 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▂▆▇ |
q10 | OcnA | 55 | 0.69 | 4.13 | 0.93 | 1.00 | 4.00 | 4.00 | 5.00 | 5.00 | ▁▁▃▇▇ |
q10 | PhysA | 11 | 0.85 | 3.78 | 0.89 | 2.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▂▆▁▇▅ |
post_int | AnPhA | 209 | 0.00 | 1.00 | NA | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
post_int | BioA | 40 | 0.18 | 3.06 | 0.69 | 1.75 | 2.75 | 3.00 | 3.25 | 4.25 | ▂▃▇▂▂ |
post_int | FrScA | 392 | 0.10 | 4.00 | 0.93 | 1.50 | 3.75 | 4.00 | 4.88 | 5.00 | ▁▃▁▇▇ |
post_int | OcnA | 157 | 0.10 | 4.33 | 0.56 | 3.00 | 4.00 | 4.25 | 4.75 | 5.00 | ▁▂▅▅▇ |
post_int | PhysA | 50 | 0.32 | 3.75 | 0.88 | 1.50 | 3.50 | 4.00 | 4.25 | 5.00 | ▁▁▂▇▂ |
post_uv | AnPhA | 209 | 0.00 | 1.00 | NA | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
post_uv | BioA | 40 | 0.18 | 3.11 | 0.80 | 1.67 | 2.67 | 3.33 | 3.67 | 4.33 | ▂▃▂▇▂ |
post_uv | FrScA | 392 | 0.10 | 3.38 | 1.11 | 1.00 | 2.67 | 3.67 | 4.00 | 5.00 | ▃▃▆▇▆ |
post_uv | OcnA | 157 | 0.10 | 3.93 | 0.88 | 1.33 | 3.67 | 4.00 | 4.58 | 5.00 | ▁▁▁▇▇ |
post_uv | PhysA | 50 | 0.32 | 3.57 | 0.66 | 1.67 | 3.33 | 3.67 | 4.00 | 4.67 | ▁▁▃▇▂ |
post_tv | AnPhA | 209 | 0.00 | 1.00 | NA | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
post_tv | BioA | 40 | 0.18 | 3.08 | 0.70 | 1.71 | 2.86 | 3.00 | 3.29 | 4.29 | ▂▂▇▃▂ |
post_tv | FrScA | 392 | 0.10 | 3.73 | 0.96 | 1.29 | 3.29 | 4.00 | 4.43 | 5.00 | ▁▃▅▆▇ |
post_tv | OcnA | 157 | 0.10 | 4.16 | 0.60 | 3.00 | 3.86 | 4.14 | 4.71 | 4.86 | ▂▁▅▅▇ |
post_tv | PhysA | 50 | 0.32 | 3.67 | 0.74 | 1.57 | 3.43 | 3.86 | 4.04 | 4.71 | ▂▁▃▇▅ |
post_percomp | AnPhA | 209 | 0.00 | 3.00 | NA | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 | ▁▁▇▁▁ |
post_percomp | BioA | 40 | 0.18 | 3.06 | 0.58 | 2.00 | 2.50 | 3.50 | 3.50 | 3.50 | ▂▃▁▂▇ |
post_percomp | FrScA | 392 | 0.10 | 3.51 | 0.96 | 1.00 | 3.00 | 3.50 | 4.00 | 5.00 | ▁▂▆▇▅ |
post_percomp | OcnA | 157 | 0.10 | 3.69 | 0.75 | 2.00 | 3.50 | 4.00 | 4.00 | 5.00 | ▃▁▆▇▃ |
post_percomp | PhysA | 50 | 0.32 | 3.40 | 0.91 | 1.50 | 3.00 | 3.50 | 4.00 | 4.50 | ▂▂▂▆▇ |
Variable type: POSIXct
skim_variable | subject | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|---|
date_x | AnPhA | 80 | 0.62 | 2015-09-02 15:40:00 | 2016-03-23 16:11:00 | 2015-09-27 20:10:30 | 129 |
date_x | BioA | 9 | 0.82 | 2015-09-08 19:52:00 | 2016-03-09 14:07:00 | 2015-09-16 14:27:00 | 40 |
date_x | FrScA | 215 | 0.51 | 2015-09-08 13:10:00 | 2016-04-27 02:12:00 | 2015-10-08 19:19:30 | 218 |
date_x | OcnA | 75 | 0.57 | 2015-09-08 20:08:00 | 2016-03-03 15:57:00 | 2016-01-25 20:17:00 | 97 |
date_x | PhysA | 14 | 0.81 | 2015-09-09 12:24:00 | 2016-05-24 15:53:00 | 2015-10-08 21:17:00 | 60 |
date_y | AnPhA | 209 | 0.00 | 2015-09-02 15:31:00 | 2015-09-02 15:31:00 | 2015-09-02 15:31:00 | 1 |
date_y | BioA | 40 | 0.18 | 2015-11-17 03:04:00 | 2016-01-21 23:38:00 | 2016-01-16 23:48:00 | 9 |
date_y | FrScA | 392 | 0.10 | 2015-09-09 15:21:00 | 2016-01-22 15:43:00 | 2016-01-04 13:13:00 | 43 |
date_y | OcnA | 157 | 0.10 | 2015-09-12 15:56:00 | 2016-01-08 17:51:00 | 2015-09-18 04:08:30 | 18 |
date_y | PhysA | 50 | 0.32 | 2015-09-14 14:45:00 | 2016-01-22 05:36:00 | 2016-01-17 08:24:30 | 24 |
date | AnPhA | 189 | 0.10 | 2017-01-23 14:28:00 | 2017-02-10 15:25:00 | 2017-02-01 17:09:00 | 21 |
date | BioA | 47 | 0.04 | 2017-02-06 20:12:00 | 2017-02-09 19:15:00 | 2017-02-08 07:43:30 | 2 |
date | FrScA | 372 | 0.14 | 2017-01-23 13:14:00 | 2017-02-13 13:00:00 | 2017-01-24 17:23:00 | 62 |
date | OcnA | 155 | 0.11 | 2017-01-23 14:07:00 | 2017-02-09 18:45:00 | 2017-02-01 21:53:30 | 20 |
date | PhysA | 71 | 0.04 | 2017-01-30 14:41:00 | 2017-02-03 15:23:00 | 2017-02-02 20:54:00 | 3 |
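The grouped skim() output above reports statistics separately for each subject. To see what that grouping does for a single column, here is a rough sketch using group_by() and summarise() on a tiny made-up tibble. The toy values are invented for illustration and are not taken from data_to_explore, and this is not how skimr computes its summaries internally.

```r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  subject          = c("BioA", "BioA", "OcnA", "OcnA", "OcnA"),
  time_spent_hours = c(10, NA, 20, 30, NA)
)

# One row per subject, with a few of the columns skim() also reports
by_subject <- toy %>%
  group_by(subject) %>%
  summarise(
    n_missing     = sum(is.na(time_spent_hours)),
    complete_rate = mean(!is.na(time_spent_hours)),
    mean          = mean(time_spent_hours, na.rm = TRUE)
  )

by_subject
```

Each subject gets its own row, which is the same idea behind the extra subject column in the grouped skim() tables.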
ggplot2 is designed to work iteratively. You start with a layer that shows the raw data. Then you add layers of annotations and statistical summaries.
You can read more about ggplot2 in the book “ggplot2: Elegant Graphics for Data Analysis”. You can also find lots of inspiration in the R Graph Gallery, which includes code. Finally, you can use the ggplot2 cheat sheet to help.
“ggplot2: Elegant Graphics for Data Analysis” states that “every ggplot2 plot has three key components:
1. data,
2. A set of aesthetic mappings between variables in the data and visual properties, and
3. At least one layer which describes how to render each observation. Layers are usually created with a geom function.”
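The three components can be sketched in a few lines. The toy data frame and its column names (hours, score) are made up for illustration and are not part of data_to_explore.

```r
library(ggplot2)

# Component 1: data (made-up values for illustration only)
toy <- data.frame(
  hours = c(1, 2, 3, 4),
  score = c(55, 60, 72, 80)
)

# Component 2: aesthetic mappings from variables to visual properties
# Component 3: at least one layer, here created with a geom function
p <- ggplot(data = toy, mapping = aes(x = hours, y = score)) +
  geom_point()

p
```

Swapping geom_point() for another geom changes how each observation is drawn while the data and mappings stay the same.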
Create a basic visualization that examines a continuous variable of interest.
Which variable should we be looking at?
#inspect the data frame
data_to_explore
## # A tibble: 943 × 34
## student_id subject semester section total_points_possible total_points_earned
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 43146 FrScA S216 02 1217 1150
## 2 44638 OcnA S116 01 1676 1384.
## 3 47448 FrScA S216 01 1232 1116
## 4 47979 OcnA S216 01 1833 1493.
## 5 48797 PhysA S116 01 2225 1995.
## 6 51943 FrScA S216 03 1222 70
## 7 52326 AnPhA S216 01 1775 1519.
## 8 52446 PhysA S116 01 2225 2198
## 9 53447 FrScA S116 01 1212 1173
## 10 53475 FrScA S116 02 1212 0
## # ℹ 933 more rows
## # ℹ 28 more variables: proportion_earned <dbl>, gender <chr>,
## # enrollment_reason <chr>, enrollment_status <chr>, time_spent <dbl>,
## # time_spent_hours <dbl>, course_id <chr>, int <dbl>, val <dbl>,
## # percomp <dbl>, tv <dbl>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>,
## # q6 <dbl>, q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, date_x <dttm>,
## # post_int <dbl>, post_uv <dbl>, post_tv <dbl>, post_percomp <dbl>, …
Includes:
aes() function - one categorical variable: subject variable mapped to x position
geom_bar() function - bar graph
ggplot(data_to_explore, aes(x = subject)) +
  geom_bar()
ggplot(data_to_explore, aes(x = subject)) +
  geom_bar() +
  labs(title = "Number of Student Enrollments per Subject",
       caption = "Which online courses have had the largest enrollment numbers?")
ggplot(data_to_explore, aes(x = subject, fill = gender)) +
  geom_bar() +
  labs(title = "Gender Distribution of Students Across Subjects",
       caption = "Which subjects enroll more female students?")
aes() function - one continuous variable: tv variable mapped to x position
Need help? Try STHDA.
Yours could look something like the one below…
ggplot(data_to_explore, aes(x = tv)) +
  geom_histogram(bins = 5) +
  labs(title = "Number of Hours Students Watch TV per Day",
       caption = "Approximately how many students watch 4+ hours of TV per day?")
or maybe you added a theme, such as theme_classic()
data_to_explore %>%
  ggplot(aes(x = tv)) +
  geom_histogram(bins = 5, fill = "red", colour = "black") +
  labs(title = "Number of Hours Students Watch TV per Day",
       caption = "Approximately how many students watch 4+ hours of TV per day?") +
  theme_classic()
## Warning: Removed 292 rows containing non-finite values (`stat_bin()`).
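The warning above reports how many rows were dropped before binning because their value was not finite, and the count (292) matches n_missing for tv in the skim() output. A minimal base R sketch of the same count, using a made-up vector rather than the real tv column:

```r
# Made-up values for illustration only; NA stands in for a missing response
tv_example <- c(4.1, NA, 3.7, NA, 5.0)

# Count the values a histogram would have to drop
n_removed <- sum(!is.finite(tv_example))
n_removed
```

Running the same expression on data_to_explore$tv is one way to confirm the number in the warning before plotting.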
Create a basic visualization that examines the relationship between two categorical variables.
Includes:
count() function for subject, enrollment_reason then,
ggplot() function
aes() function - two categorical variables
subject variable mapped to x position
enrollment_reason variable mapped to y position
geom_tile() function
data_to_explore %>%
  count(subject, enrollment_reason) %>%
  ggplot() +
  geom_tile(mapping = aes(x = subject,
                          y = enrollment_reason,
                          fill = n)) +
  labs(title = "Reasons for Enrollment by Subject",
       caption = "Which subjects were the least available at local schools?")
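It may help to look at what count() returns before it reaches ggplot(). The sketch below uses a tiny made-up tibble; the reason labels (reason_a, reason_b) are placeholders, not values from data_to_explore.

```r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  subject           = c("BioA", "OcnA", "BioA", "BioA"),
  enrollment_reason = c("reason_a", "reason_a", "reason_b", "reason_a")
)

# One row per subject/reason combination, with the tally stored in a column named n
counts <- toy %>%
  count(subject, enrollment_reason)

counts
```

The n column created here is what fill = n refers to in the geom_tile() call.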
Create a basic visualization that examines the relationship between two continuous variables.
Which variables should we be looking at?
#look at the data frame
data_to_explore
## # A tibble: 943 × 34
## student_id subject semester section total_points_possible total_points_earned
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 43146 FrScA S216 02 1217 1150
## 2 44638 OcnA S116 01 1676 1384.
## 3 47448 FrScA S216 01 1232 1116
## 4 47979 OcnA S216 01 1833 1493.
## 5 48797 PhysA S116 01 2225 1995.
## 6 51943 FrScA S216 03 1222 70
## 7 52326 AnPhA S216 01 1775 1519.
## 8 52446 PhysA S116 01 2225 2198
## 9 53447 FrScA S116 01 1212 1173
## 10 53475 FrScA S116 02 1212 0
## # ℹ 933 more rows
## # ℹ 28 more variables: proportion_earned <dbl>, gender <chr>,
## # enrollment_reason <chr>, enrollment_status <chr>, time_spent <dbl>,
## # time_spent_hours <dbl>, course_id <chr>, int <dbl>, val <dbl>,
## # percomp <dbl>, tv <dbl>, q1 <dbl>, q2 <dbl>, q3 <dbl>, q4 <dbl>, q5 <dbl>,
## # q6 <dbl>, q7 <dbl>, q8 <dbl>, q9 <dbl>, q10 <dbl>, date_x <dttm>,
## # post_int <dbl>, post_uv <dbl>, post_tv <dbl>, post_percomp <dbl>, …
Includes:
aes() function - two continuous variables
geom_point() function - scatter plot
#layer 1: add data and aesthetics mapping
ggplot(data_to_explore,
       aes(x = time_spent_hours,
           y = proportion_earned)) +
  #layer 2: + geom function type
  geom_point()
#layer 1: add data and aesthetics mapping
ggplot(data_to_explore,
       aes(x = time_spent_hours,
           y = proportion_earned,
           #layer 3: add color scale by type
           color = enrollment_status)) +
  #layer 2: + geom function type
  geom_point() +
  #layer 4: add labels
  labs(title = "How Time Spent on Course LMS is Related to Points Earned in the Course",
       x = "Time Spent (Hours)",
       y = "Proportion of Points Earned")
#layer 1: add data and aesthetics mapping
ggplot(data_to_explore,
       aes(x = time_spent_hours,
           y = proportion_earned,
           #layer 3: add color scale by type
           color = enrollment_status)) +
  #layer 2: + geom function type
  geom_point() +
  #layer 4: add labels
  labs(title = "How Time Spent on Course LMS is Related to Points Earned in the Course",
       x = "Time Spent (Hours)",
       y = "Proportion of Points Earned") +
  #layer 5: add facet wrap
  facet_wrap(~ subject)
You can pipe the data frame into the drop_na() function to remove missing values before plotting.
drop_na() function to remove NAs from enrollment_status then,
ggplot() function like above
data_to_explore %>%
  drop_na(enrollment_status) %>%
  ggplot(aes(x = time_spent_hours,
             y = proportion_earned,
             color = enrollment_status)) +
  geom_point() +
  labs(title = "How Time Spent on Course LMS is Related to Points Earned in the Course",
       x = "Time Spent (Hours)",
       y = "Proportion of Points Earned") +
  facet_wrap(~ subject)
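Naming a column inside drop_na() matters: only rows missing that column are removed. Here is a minimal sketch on a made-up data frame; the status labels are placeholders, not the real values in data_to_explore.

```r
library(tidyr)

# Made-up data for illustration only
toy <- data.frame(
  enrollment_status = c("enrolled", NA, "dropped"),
  time_spent_hours  = c(12, 30, NA)
)

# Drops only the row where enrollment_status is NA
kept <- drop_na(toy, enrollment_status)

# With no columns named, any row containing an NA is dropped
complete_only <- drop_na(toy)

nrow(kept)
nrow(complete_only)
```

In the plot above, this means students with a known enrollment status but a missing time_spent_hours still reach ggplot(), where geom_point() may warn about removing them.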