Setup

Load Packages

library(janitor)
library(gt)
library(patchwork)
library(modelsummary)
library(tidyverse)

theme_set(theme_bw())

Note the use of theme_set() to draw ggplots with theme_bw() rather than the default theme. That’s just my personal preference. Several other complete themes are available; theme_light() is another one I use regularly.

Load Data

lab1data <- read_csv("data/lab-01-survey-2020.csv")

names(lab1data)

 [1] "fake_id"         "ats_11"          "ats_12"          "ats_13"         
 [5] "ats_14"          "ats_15"          "lab_1"           "lab_2"          
 [9] "lab_3"           "watch_live"      "watch_rec"       "ats_21"         
[13] "ats_22"          "ats_23"          "ats_24"          "ats_25"         
[17] "tools_None"      "tools_R"         "tools_Rstudio"   "tools_Rmarkdown"
[21] "tools_ggplot"    "tools_dplyr"     "tools_pipe"      "tools_tidyverse"
[25] "tools_github"    "other_classes"   "nervous"         "election"       
[29] "communicate"     "biostats"        "pvalue"          "comfort"        
[33] "proj_data"

How many people are included in these data?

nrow(lab1data)

[1] 70

Project Data Sets?

Let’s tabulate the responses to the proj_data item, which was:

Do you have a data set in mind that you are hoping to analyze as part of a project in this course?

Possible responses were Yes, Maybe and No.

lab1data %>% tabyl(proj_data)

 proj_data  n   percent
     Maybe 23 0.3285714
        No 38 0.5428571
       Yes  9 0.1285714

One problem with this table is the order of the options. So we’ll change the proj_data variable into a factor and order its levels in a sensible way, rather than in the default (alphabetical) order.

lab1data <- lab1data %>%
    mutate(proj_data = fct_relevel(factor(proj_data), "Yes", "Maybe", "No"))
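To see why the relevel step is needed, here is a minimal sketch on a hypothetical toy vector (not the survey data): factor() sorts levels alphabetically by default, while forcats::fct_relevel() moves the named levels to the front in the order given.

```r
library(forcats)

# Hypothetical toy responses, not the survey data
answers <- c("No", "Yes", "Maybe", "No")

# By default, factor() sorts the levels alphabetically
levels(factor(answers))   # "Maybe" "No" "Yes"

# fct_relevel() moves the named levels to the front, in the order listed
answers_f <- fct_relevel(factor(answers), "Yes", "Maybe", "No")
levels(answers_f)         # "Yes" "Maybe" "No"
```

The data values themselves are unchanged; only the order of the levels, which tabyl() and ggplot() use for display, is different.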

Better Table of Counts

Now, we’ll tabulate the proj_data counts.

lab1data %>% tabyl(proj_data) %>% adorn_pct_formatting()

 proj_data  n percent
       Yes  9   12.9%
     Maybe 23   32.9%
        No 38   54.3%

The adorn_pct_formatting() function converts the proportions into percentages, rounded to one decimal place by default, and labels them with a % sign.
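The arithmetic behind that column is simple. As a rough sketch, using a hypothetical helper that only mimics the look of the output (it is not janitor's own code), each count is divided by the total, multiplied by 100, rounded to one decimal place, and given a % sign:

```r
# Hypothetical helper mimicking the look of adorn_pct_formatting() output;
# it is not janitor's implementation
format_pct <- function(n, total, digits = 1) {
  paste0(formatC(100 * n / total, format = "f", digits = digits), "%")
}

# Counts for Yes, Maybe and No from the proj_data table (70 respondents)
format_pct(c(9, 23, 38), total = 70)   # "12.9%" "32.9%" "54.3%"
```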

I don’t expect anyone to have a workable data set in mind yet, since you don’t yet know what the projects will require. I will follow up soon by email with the people who answered Yes or Maybe and gave me some additional information about their potential data.

Plotting the `proj_data` counts

A bar chart is the most common choice for plotting a factor.

ggplot(lab1data, aes(x = proj_data)) +
    geom_bar()

That’s a reasonable start.

Lab 01 Status

How far along are you in completing Part 1 (the video)?

lab1data %>% tabyl(lab_1)

                               lab_1  n   percent
                I've completed this. 41 0.5857143
          I've not yet started this. 17 0.2428571
 I've started but not finished this. 12 0.1714286

I don’t love that ordering. It would make more sense to put “I’ve completed this.” at the end, after the other two options.

Let’s try:

lab1data <- lab1data %>%
    mutate(lab_1 = 
               fct_relevel(factor(lab_1), 
                           "I've completed this.", after = 2))

The after = 2 causes “I’ve completed this.” to be placed after the other two options, which are left in the same order they were in originally.
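A small sketch of what after = 2 does, using the three response strings on their own (so these are toy values rather than lab1data):

```r
library(forcats)

status <- c("I've completed this.", "I've not yet started this.",
            "I've started but not finished this.")

# Default alphabetical levels put "I've completed this." first
levels(factor(status))

# after = 2 moves the named level to sit after the first two remaining levels
status_f <- fct_relevel(factor(status), "I've completed this.", after = 2)
levels(status_f)
```

With three levels, after = Inf would have the same effect here (it always moves the named level to the end), which may be easier to read when the number of levels could change.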

For more on fct_relevel (which is part of the forcats package in the tidyverse) see our Course Notes and the RStudio Cheat Sheet for “Working with Factors”.

Now, let’s redraw the table to see the new order, and also pass it to knitr::kable() so that the proportions are rounded to three decimal places.

lab1data %>% tabyl(lab_1) %>% knitr::kable(digits = 3)

lab_1	n	percent
I’ve not yet started this.	17	0.243
I’ve started but not finished this.	12	0.171
I’ve completed this.	41	0.586

How far along are you in completing Part 2 (interpreting my visualization)?

Again, we’ll re-order the responses to produce a sensible order. Then we’ll use a different option to create the tabyl.

lab1data <- lab1data %>%
    mutate(lab_2 = fct_relevel(factor(lab_2), 
                           "I've completed this.", after = 2))

lab1data %>% tabyl(lab_2) %>% adorn_pct_formatting()

                               lab_2  n percent
          I've not yet started this.  7   10.0%
 I've started but not finished this. 17   24.3%
                I've completed this. 46   65.7%

How far along are you in completing Part 3 (reacting to Spiegelhalter’s Intro)?

This time, we’ll use fct_relevel to re-sort all three levels into the reverse of the order we’ve used so far.

lab1data <- lab1data %>%
    mutate(lab_3 = fct_relevel(factor(lab_3), 
                               "I've completed this.",
                               "I've started but not finished this.",
                               "I've not yet started this."))
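Listing every level works, but forcats also has fct_rev(), which reverses the current level order. The sketch below (toy values, not lab1data) shows that the two agree, with one caveat: fct_rev() reverses whatever order the factor already has, so it only matches when applied to a factor that is already in the earlier order, not directly to factor(lab_3), whose levels are alphabetical.

```r
library(forcats)

earlier_order <- c("I've not yet started this.",
                   "I've started but not finished this.",
                   "I've completed this.")
f <- factor(earlier_order, levels = earlier_order)

# Listing every level explicitly ...
explicit <- fct_relevel(f, "I've completed this.",
                        "I've started but not finished this.",
                        "I've not yet started this.")

# ... gives the same level order as reversing the existing levels
reversed <- fct_rev(f)

identical(levels(explicit), levels(reversed))
```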

And now we’ll use the gt() function from the gt package to make the table look nice.

lab1data %>% tabyl(lab_3) %>% adorn_pct_formatting() %>%
    gt()

lab_3	n	percent
I've completed this.	42	60.0%
I've started but not finished this.	21	30.0%
I've not yet started this.	7	10.0%

How many completed each part of the lab?

lab1data %>% 
    count(lab_1, lab_2, lab_3) %>% gt()

lab_1	lab_2	lab_3	n
I've not yet started this.	I've not yet started this.	I've not yet started this.	3
I've not yet started this.	I've started but not finished this.	I've started but not finished this.	3
I've not yet started this.	I've completed this.	I've completed this.	7
I've not yet started this.	I've completed this.	I've started but not finished this.	3
I've not yet started this.	I've completed this.	I've not yet started this.	1
I've started but not finished this.	I've not yet started this.	I've started but not finished this.	1
I've started but not finished this.	I've not yet started this.	I've not yet started this.	1
I've started but not finished this.	I've started but not finished this.	I've started but not finished this.	4
I've started but not finished this.	I've completed this.	I've completed this.	5
I've started but not finished this.	I've completed this.	I've started but not finished this.	1
I've completed this.	I've not yet started this.	I've completed this.	1
I've completed this.	I've not yet started this.	I've started but not finished this.	1
I've completed this.	I've started but not finished this.	I've completed this.	5
I've completed this.	I've started but not finished this.	I've started but not finished this.	5
I've completed this.	I've completed this.	I've completed this.	24
I've completed this.	I've completed this.	I've started but not finished this.	3
I've completed this.	I've completed this.	I've not yet started this.	2
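The count() table lists every combination, but it doesn’t directly say how many students finished all three parts. A minimal base R sketch, using hypothetical toy responses for four students rather than lab1data, counts the completed parts for each student:

```r
# Hypothetical toy responses for four students (not the survey data)
status <- data.frame(
  lab_1 = c("I've completed this.", "I've completed this.",
            "I've not yet started this.", "I've started but not finished this."),
  lab_2 = c("I've completed this.", "I've started but not finished this.",
            "I've completed this.", "I've completed this."),
  lab_3 = c("I've completed this.", "I've completed this.",
            "I've not yet started this.", "I've started but not finished this."),
  stringsAsFactors = FALSE
)

# Comparing the whole data frame to one string gives a logical matrix;
# rowSums() then counts the completed parts for each student (0 to 3)
parts_done <- rowSums(status == "I've completed this.")
parts_done

# How many students completed 0, 1, 2 or 3 parts
table(parts_done)
```

In the real data, the rows of the count() table with “I’ve completed this.” in all three columns give the students who finished everything (24 of them).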

Other Classes

I asked “How many other classes are you taking this semester?”

Which of these summaries seems more useful in this setting?

mosaic::favstats(~ other_classes, data = lab1data)

 min Q1 median Q3 max     mean       sd  n missing
   0  1      2  3   7 2.414286 1.731991 70       0

lab1data %>% tabyl(other_classes) %>% adorn_pct_formatting()

 other_classes  n percent
             0  6    8.6%
             1 20   28.6%
             2 12   17.1%
             3 18   25.7%
             4  8   11.4%
             5  1    1.4%
             6  1    1.4%
             7  4    5.7%

Plotting other class counts

And which plot seems more useful?

ggplot(lab1data, aes(x = other_classes)) +
    geom_bar()

ggplot(lab1data, aes(x = other_classes)) +
    geom_histogram(binwidth = 1, fill = "#626262", col = "#0a304e") +   
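    scale_x_continuous(breaks = 0:7)

Note the use of scale_x_continuous(breaks = 0:7) to place a labeled tick mark at each possible value (0 through 7) on the x axis, so that each observed value appears. Since other_classes is numeric, a continuous scale is the appropriate one here.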

Your Feelings on…

I asked you to react to six statements, in each case on a Strongly Agree to Strongly Disagree scale.

I feel nervous about the 431 course right now.
I think a lot about how to communicate scientific ideas.
I am taking 431 to help me become a biostatistician or data scientist.
I can accurately define a p value in a couple of sentences.
I am comfortable writing in English.
I am following the 2020 US Presidential Election closely.

Creating Factors all at once

First, we’ll create factors out of each of these variables, and assign the same set of levels (in the same order) to all six of them.

feel_levels <- c("Strongly Disagree", "Disagree", 
                 "Neutral", "Agree", "Strongly Agree")

lab1data <- lab1data %>%
    mutate(across(nervous:comfort, ~ factor(., levels = feel_levels)))

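The across() function selects every variable in the data set from nervous through comfort; in column order, those are exactly our six items. The ~ introduces a shorthand anonymous function in which . stands for each selected column, so factor() is applied with the same levels to all six variables in one line of code.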
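One caution about supplying levels this way: factor() silently turns any value that does not exactly match one of the listed levels into NA. A minimal base R sketch with hypothetical responses (the lowercase "agree" is an invented typo):

```r
feel_levels <- c("Strongly Disagree", "Disagree",
                 "Neutral", "Agree", "Strongly Agree")

# Hypothetical responses; the last one has a capitalization mismatch
responses <- c("Agree", "Strongly Agree", "Neutral", "agree")

f <- factor(responses, levels = feel_levels)
f              # "agree" is not a listed level, so it becomes NA
sum(is.na(f))  # count of values lost in the conversion
```

Comparing the number of missing values before and after the conversion is a quick check that the level labels were typed exactly as they appear in the data.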

Following the US Election?

We’d like to plot these results. As an example, we’ll look at responses to the last of these items: “I am following the 2020 US Presidential Election closely.”

A First Attempt

ggplot(lab1data, aes(x = election)) + 
    geom_bar(col = "black")

Improving the Plot

Let’s add a title, and also change the fill depending on the response.

ggplot(lab1data, aes(x = election, fill = election)) + 
    geom_bar(col = "black") + 
    labs(x = "", 
         title = "I am following the 2020 US Presidential Election closely")

We don’t really need the legend here, so we’ll drop that.

Getting Closer

ggplot(lab1data, aes(x = election, fill = election)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    labs(x = "", 
         title = "I am following the 2020 US Presidential Election closely")

I’m going to switch the fill colors of the bars to the viridis scale built into ggplot2, which should be easier to distinguish for people with color vision deficiencies.

ggplot(lab1data, aes(x = election, fill = election)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_fill_viridis_d() +
    labs(x = "", 
         title = "I am following the 2020 US Presidential Election closely")

For this election variable, all five options were selected by at least one student. But for one of the other items, not all options were selected. Consider this plot:

ggplot(lab1data, aes(x = comfort, fill = comfort)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_fill_viridis_d() +
    labs(x = "", 
         title = "I am comfortable writing in English")

I’d like to make sure that all five possible values (SA, A, N, D, and SD) appear in the plot even if no one gave that response, so I’ll use drop = FALSE in the scale calls for the x axis and the fill.
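The zero-count levels are still stored in the factor; it is the plotting scale that leaves them out by default. A base R sketch with hypothetical data makes the distinction visible:

```r
feel_levels <- c("Strongly Disagree", "Disagree",
                 "Neutral", "Agree", "Strongly Agree")

# Hypothetical responses in which nobody chose either "Disagree" option
comfort <- factor(c("Agree", "Strongly Agree", "Strongly Agree", "Neutral"),
                  levels = feel_levels)

# The factor still remembers all five levels, including the empty ones
table(comfort)

# droplevels() discards the unused levels, which is roughly what ggplot2's
# discrete scales do unless drop = FALSE is set
table(droplevels(comfort))
```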

Repairing the writing in English plot in this way, we have:

ggplot(lab1data, aes(x = comfort, fill = comfort)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "I am comfortable writing in English")

Using `drop = FALSE` on Election data

ggplot(lab1data, aes(x = election, fill = election)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "I am following the 2020 US Presidential Election closely")

Finally, I’d like to flip the axes, so that the bars extend out to the right instead of up from the bottom.

A Final Version

ggplot(lab1data, aes(x = election, fill = election)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I am following the 2020 US Presidential Election closely")

The main reason for flipping the axes is convenience, which will become clearer as we put plots together in the next section.

Plots for all Six Items

Here are some plots of those results, gathered two at a time using the patchwork package.

p1 <- ggplot(lab1data, aes(x = nervous, fill = nervous)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I feel nervous about the 431 course right now")

p2 <- ggplot(lab1data, aes(x = communicate, fill = communicate)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I think a lot about how to communicate scientific ideas")

p1 / p2

p1 <- ggplot(lab1data, aes(x = biostats, fill = biostats)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I am taking 431 to help me become a biostatistician or data scientist")

p2 <- ggplot(lab1data, aes(x = pvalue, fill = pvalue)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I can accurately define a p value in a couple of sentences")

p1 / p2

p1 <- ggplot(lab1data, aes(x = comfort, fill = comfort)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I am comfortable writing in English")

p2 <- ggplot(lab1data, aes(x = election, fill = election)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "I am following the 2020 US Presidential Election closely")

p1 / p2
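The six plots above differ only in the variable and the title, so the shared layers could be wrapped in a function. This is a sketch with a hypothetical helper, plot_item(), which is not part of ggplot2; it uses the {{ }} ("curly-curly") operator, supported in aes() since ggplot2 3.3.0, to pass a bare column name, and a tiny toy data frame standing in for lab1data.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): one flipped bar chart per item,
# keeping all response levels on the axis and in the fill scale
plot_item <- function(data, item, title) {
  ggplot(data, aes(x = {{ item }}, fill = {{ item }})) +
    geom_bar(col = "black") +
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", title = title)
}

# Toy data standing in for lab1data
feel_levels <- c("Strongly Disagree", "Disagree",
                 "Neutral", "Agree", "Strongly Agree")
toy <- data.frame(
  nervous = factor(c("Agree", "Neutral", "Agree"), levels = feel_levels)
)

p1 <- plot_item(toy, nervous, "I feel nervous about the 431 course right now")
```

With lab1data and patchwork loaded, a pair such as plot_item(lab1data, comfort, "I am comfortable writing in English") / plot_item(lab1data, election, "I am following the 2020 US Presidential Election closely") would then replace one of the longer blocks above.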

Do any of these results surprise you?

Attitudes Toward Statistics

Here’s a tabyl of the responses to the first of the ten items I asked about your attitudes towards statistics. Each item had five possible responses, ranging from Strongly Agree through Strongly Disagree.

The first item, labeled ats_11 in the data, was the statement: “I feel that statistics will be useful to me in my profession.”

Here are the results:

lab1data %>% tabyl(ats_11)

         ats_11  n   percent
          Agree 17 0.2428571
 Strongly Agree 53 0.7571429

There are several things to clean up here.

First, let’s assign levels in the correct order (from Strongly Agree down to Strongly Disagree) for this item, and make it a factor while we’re at it. We’ll do this in a temporary tibble, since we’ll show another way to accomplish this shortly.

ats_levels <- c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")

temp <- lab1data %>% 
    mutate(ats_11 = factor(ats_11, levels = ats_levels))

Now, let’s see what happens when we create a tabyl of the ats_11 data from this new temp tibble.

temp %>% tabyl(ats_11)

            ats_11  n   percent
    Strongly Agree 53 0.7571429
             Agree 17 0.2428571
           Neutral  0 0.0000000
          Disagree  0 0.0000000
 Strongly Disagree  0 0.0000000

That’s fine, but it would be tedious to repeat the mutate step for all 10 ats items. Instead, we can use the across() function from dplyr to convert them all at once.

lab1data <- lab1data %>%
    mutate(across(starts_with("ats_"), ~ factor(., levels = ats_levels)))

The across() function was introduced in dplyr 1.0.0 (released in 2020), so it was not available for 431 last year.
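A minimal sketch of the same pattern on a hypothetical toy tibble, showing that starts_with("ats_") converts only the matching columns and leaves the others alone. Inside the ~ shorthand, .x and . both refer to the current column.

```r
library(dplyr)

ats_levels <- c("Strongly Agree", "Agree", "Neutral",
                "Disagree", "Strongly Disagree")

# Hypothetical toy data: two ats_ items plus a column that should be untouched
toy <- tibble(
  ats_11 = c("Agree", "Strongly Agree"),
  ats_12 = c("Neutral", "Agree"),
  other_classes = c(2, 3)
)

toy <- toy %>%
  mutate(across(starts_with("ats_"), ~ factor(.x, levels = ats_levels)))

# ats_11 and ats_12 are now factors; other_classes is still numeric
sapply(toy, class)
```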

Item Group A

Let’s start with a closer look at three of the ten items, which we’ll call Group A.

Item	Description
`ats_11`	I feel that statistics will be useful to me in my profession.
`ats_12`	Most people would benefit from taking a statistics course.
`ats_14`	Statistics is an inseparable aspect of scientific research.

Tabulating Group A

lab1data %>% tabyl(ats_11) %>% adorn_pct_formatting()

            ats_11  n percent
    Strongly Agree 53   75.7%
             Agree 17   24.3%
           Neutral  0    0.0%
          Disagree  0    0.0%
 Strongly Disagree  0    0.0%

lab1data %>% tabyl(ats_12) %>% adorn_pct_formatting() %>%
    adorn_title(row_name = "Most people would benefit from a course.", 
                col_name = "")

                                                    
 Most people would benefit from a course.  n percent
                           Strongly Agree 43   61.4%
                                    Agree 19   27.1%
                                  Neutral  8   11.4%
                                 Disagree  0    0.0%
                        Strongly Disagree  0    0.0%

Note that I had to set the chunk option warning = FALSE to suppress a warning here. The warning appears because adorn_title() is being used on a one-way table, while it is designed for cross-classifications (two-way tables).

lab1data %>% tabyl(ats_14, show_missing_levels = FALSE) %>% 
    adorn_pct_formatting() %>%
    knitr::kable()

ats_14	n	percent
Strongly Agree	47	67.1%
Agree	20	28.6%
Neutral	3	4.3%

Plotting Group A

We’ll use ggplot functions to build the plots for each of the three items separately, and then combine them into a single plot with some tools from the patchwork package.

p11 <- ggplot(lab1data, aes(x = ats_11, fill = ats_11)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "I feel that statistics will be useful to me in my profession.")

p12 <- ggplot(lab1data, aes(x = ats_12, fill = ats_12)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "Most people would benefit from taking a statistics course.")

p14 <- ggplot(lab1data, aes(x = ats_14, fill = ats_14)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "Statistics is an inseparable aspect of scientific research.")


(p11 / p12 / p14) +
    plot_annotation("Attitudes towards Statistics Items with Positive Frames")

Item Group B

Group B consists of four additional items that were also framed positively, in that I was hoping for answers closer to “Strongly Agree” than “Strongly Disagree”.

Item	Description
`ats_15`	I am excited at the prospect of using statistics in my work.
`ats_21`	One becomes a more effective “consumer” of research findings if one has some training in statistics.
`ats_23`	Statistical training is relevant to my performance in my field of study.
`ats_25`	Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.

Plotting Group B

p15 <- ggplot(lab1data, aes(x = ats_15, fill = ats_15)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "Excited to use in my work")

p21 <- ggplot(lab1data, aes(x = ats_21, fill = ats_21)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "Effective consumer")

p23 <- ggplot(lab1data, aes(x = ats_23, fill = ats_23)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "Relevant to my field")

p25 <- ggplot(lab1data, aes(x = ats_25, fill = ats_25)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    coord_flip() +
    labs(x = "", 
         title = "Necessary for citizenship")


(p15 + p21) / (p23 + p25) +
    plot_annotation(
        "Attitudes towards Statistics: Four More Positively Framed Items")
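The layouts in this section rely on two patchwork operators: / stacks plots vertically, while + combines plots into a grid (side by side for two plots) and | always places them side by side. Parentheses group plots into rows, and plot_annotation() adds an overall title. A sketch using four placeholder plots built from R’s built-in mtcars data (stand-ins chosen only for illustration):

```r
library(ggplot2)
library(patchwork)

# Four small placeholder plots built from the built-in mtcars data
p_a <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar() + labs(title = "A")
p_b <- ggplot(mtcars, aes(x = factor(gear))) + geom_bar() + labs(title = "B")
p_c <- ggplot(mtcars, aes(x = factor(am))) + geom_bar() + labs(title = "C")
p_d <- ggplot(mtcars, aes(x = factor(vs))) + geom_bar() + labs(title = "D")

# Three plots in one column, as in Group A
stacked <- p_a / p_b / p_c

# A 2 x 2 grid, as in Group B: two rows, each holding two plots side by side
grid_2x2 <- (p_a | p_b) / (p_c | p_d) +
  plot_annotation(title = "A 2 x 2 layout")
```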

The last of these items (on the bottom right of the plot) drew the least agreement of the four among students in this year’s class.

You’ll note that I used coord_flip() to switch the X and Y axes here so that the categories (in particular Strongly Disagree) would be legible even when I put plots next to each other horizontally.

Tabulating Group B

We could use the approach we took previously (with tabyl) to look at these items, or we might consider using the datasummary() function from the modelsummary package to obtain a table of counts and percentages for all four items at once. More on this function at https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html.

datasummary(
    ('Excited to use in my work' = ats_15) + 
        ('Effective consumer' = ats_21) +
        ('Relevant to my field' = ats_23) +
        ('Necessary for citizenship' = ats_25)
    ~ (N = 1) + Percent(),
    data = lab1data, fmt = NULL)

		N	Percent
“Excited to use in my work”	Strongly Agree	51	72.857143
	Agree	16	22.857143
	Neutral	3	4.285714
	Disagree	0	0.000000
	Strongly Disagree	0	0.000000
“Effective consumer”	Strongly Agree	31	44.285714
	Agree	35	50.000000
	Neutral	4	5.714286
	Disagree	0	0.000000
	Strongly Disagree	0	0.000000
“Relevant to my field”	Strongly Agree	41	58.571429
	Agree	26	37.142857
	Neutral	2	2.857143
	Disagree	1	1.428571
	Strongly Disagree	0	0.000000
“Necessary for citizenship”	Strongly Agree	10	14.285714
	Agree	20	28.571429
	Neutral	31	44.285714
	Disagree	9	12.857143
	Strongly Disagree	0	0.000000
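For comparison, a similar table of counts and percentages can be sketched in base R without modelsummary. This uses hypothetical toy responses for two items; because each column is a factor with all five levels, table() reports zero counts too, just as datasummary() does above.

```r
ats_levels <- c("Strongly Agree", "Agree", "Neutral",
                "Disagree", "Strongly Disagree")

# Hypothetical toy responses for two items (four students)
toy <- data.frame(
  ats_15 = factor(c("Strongly Agree", "Agree", "Agree", "Neutral"),
                  levels = ats_levels),
  ats_21 = factor(c("Agree", "Agree", "Strongly Agree", "Agree"),
                  levels = ats_levels)
)

# table() on each column gives counts for all five levels, zeros included;
# sapply() binds them into a 5 x 2 matrix (levels by items)
counts <- sapply(toy, table)

# Column-wise proportions, expressed as percentages of each item's total
percents <- 100 * prop.table(counts, margin = 2)
round(percents, 1)
```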

Item Group C

The remaining three of the ten items were framed negatively, so that I was hoping for answers closer to “Strongly Disagree” than “Strongly Agree”. We’ll call those Group C.

Item	Description
`ats_13`	I have difficulty seeing how statistics relates to my field of study.
`ats_22`	Dealing with numbers makes me uneasy.
`ats_24`	Statistical analysis is best left to the “experts” and should not be part of a typical scientist’s job.

Plotting Group C

p13 <- ggplot(lab1data, aes(x = ats_13, fill = ats_13)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "I have difficulty seeing how statistics relates to my field of study.")

p22 <- ggplot(lab1data, aes(x = ats_22, fill = ats_22)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "Dealing with numbers makes me uneasy.")

p24 <- ggplot(lab1data, aes(x = ats_24, fill = ats_24)) + 
    geom_bar(col = "black") + 
    guides(fill = "none") +
    scale_x_discrete(drop = FALSE) +
    scale_fill_viridis_d(drop = FALSE) +
    labs(x = "", 
         title = "Statistical analysis is best left to the 'experts'...")

(p13 / p22 / p24) +
    plot_annotation("Attitudes towards Statistics Items with Negative Frames")

It’s gratifying, and not especially surprising, that people in this course tended to respond to these items further towards the “Strongly Disagree” side of the scale.

Tabulating Group C

datasummary(
    ('Difficulty relating to my field' = ats_13) + 
        ('Uneasy dealing with numbers' = ats_22) +
        ('Best left to the experts' = ats_24)
    ~ (N = 1) + Percent(),
    data = lab1data, fmt = NULL)

		N	Percent
“Difficulty relating to my field”	Strongly Agree	2	2.857143
	Agree	1	1.428571
	Neutral	2	2.857143
	Disagree	21	30.000000
	Strongly Disagree	44	62.857143
“Uneasy dealing with numbers”	Strongly Agree	1	1.428571
	Agree	12	17.142857
	Neutral	15	21.428571
	Disagree	33	47.142857
	Strongly Disagree	9	12.857143
“Best left to the experts”	Strongly Agree	1	1.428571
	Agree	2	2.857143
	Neutral	10	14.285714
	Disagree	32	45.714286
	Strongly Disagree	25	35.714286

Tools Used Before 431

I asked “Prior to starting this course, which of the following have you used to do something meaningful?” The options available to you (picking all that apply) included:

none of these
R
RStudio
R Markdown
Github
dplyr
ggplot2
the pipe %>%
the tidyverse

Working with data from “choose all that apply” questions always takes extra effort. Let’s start with the folks who chose “none of these”, meaning that all of this is new to them.

lab1data %>% tabyl(tools_None) %>% adorn_pct_formatting()

 tools_None  n percent
          0 38   54.3%
          1 32   45.7%

Here, 1 indicates that the option was selected, and 0 indicates that it was not. That’s certainly a substantial fraction of the class (45.7%) who gave the response “none of these”. You folks are not alone.

I’ll note that this 1/0 coding means that the mean of the tools_None variable also gives us the proportion of respondents who chose this option, and the sum gives us the count.

lab1data %>% 
    summarize(People = sum(tools_None), 
              Proportion = mean(tools_None))

# A tibble: 1 x 2
  People Proportion
   <dbl>      <dbl>
1     32      0.457

We can use the across() function again here to summarize all of the tools_ variables at once, and then show the results four at a time so that they fit in the available space:

lab1data %>% summarize(across(starts_with("tools_"), sum)) %>%
    select(2:5)

# A tibble: 1 x 4
  tools_R tools_Rstudio tools_Rmarkdown tools_ggplot
    <dbl>         <dbl>           <dbl>        <dbl>
1      25            30              16           17

lab1data %>% summarize(across(starts_with("tools_"), sum)) %>%
    select(6:9)

# A tibble: 1 x 4
  tools_dplyr tools_pipe tools_tidyverse tools_github
        <dbl>      <dbl>           <dbl>        <dbl>
1          13         12              11            6

It is a bit odd that more people said they had used RStudio (30) than R (25), but some people do use RStudio as an editor or development environment for work in languages other than R, like Python or SQL.

Plotting the Tools

our_table <- lab1data %>% summarize(across(starts_with("tools_"), mean)) %>%
    pivot_longer(cols = starts_with("tools_"), names_to = "tools_used", values_to = "proportion") %>%
    mutate(tools = str_remove(tools_used, "tools_")) %>%
    mutate(tools = fct_reorder(factor(tools), -proportion))

our_table

# A tibble: 9 x 3
  tools_used      proportion tools    
  <chr>                <dbl> <fct>    
1 tools_None          0.457  None     
2 tools_R             0.357  R        
3 tools_Rstudio       0.429  Rstudio  
4 tools_Rmarkdown     0.229  Rmarkdown
5 tools_ggplot        0.243  ggplot   
6 tools_dplyr         0.186  dplyr    
7 tools_pipe          0.171  pipe     
8 tools_tidyverse     0.157  tidyverse
9 tools_github        0.0857 github
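Note that the rows of our_table are still in their original order; fct_reorder() changes only the order of the factor’s levels, and that level order is what ggplot() uses along the x axis. A small sketch using four of the (rounded) proportions above:

```r
library(forcats)

# Four of the rows from our_table, with rounded proportions
tools <- c("None", "R", "Rstudio", "github")
proportion <- c(0.457, 0.357, 0.429, 0.0857)

# Reordering by -proportion sorts the levels from largest to smallest
tools_f <- fct_reorder(factor(tools), -proportion)
levels(tools_f)   # "None" "Rstudio" "R" "github"

# The values themselves stay in their original order
as.character(tools_f)
```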

OK. Now, I’m ready to plot the table. Since we’re plotting an actual value here for each tool (the proportion) we’ll use geom_col rather than geom_bar, and just for fun, I’ll be using CWRU’s blue (for the fill) and grey (for the lines around the bars.)

ggplot(our_table,  aes(x = tools, y = proportion)) +
    geom_col(col = "#626262", fill = "#0a304e") + 
    labs(title = "Proportion of 431 students with prior experience")

I hope that’s helpful for you.

How You’ll Take 431

Two remaining items were each on a five-point scale from Always to Never:

How often do you anticipate participating in (watching) our 431 classes live, as they happen? (stored in watch_live)
How often do you anticipate watching the recordings of our class Zoom sessions, after they happen? (stored in watch_rec)

lab1data %>% count(watch_live)

# A tibble: 4 x 2
  watch_live                 n
  <chr>                  <int>
1 About half of the time     6
2 Always                    42
3 Never                      1
4 Usually                   21

lab1data %>% count(watch_rec)

# A tibble: 5 x 2
  watch_rec                  n
  <chr>                  <int>
1 About half of the time    15
2 Always                     7
3 Never                      1
4 Seldom                    36
5 Usually                   11

First, let’s change these variables to factors, and re-order them in a sensible way.

watch_levels <- 
    c("Always", "Usually", "About half of the time", "Seldom", "Never")

lab1data <- lab1data %>%
    mutate(across(starts_with("watch_"), ~ factor(., levels = watch_levels)))

Now, let’s build a two-way table, with titles…

lab1data %>% tabyl(watch_live, watch_rec) %>% adorn_title()

                        watch_rec                                            
             watch_live    Always Usually About half of the time Seldom Never
                 Always         4       7                      8     23     0
                Usually         1       4                      3     13     0
 About half of the time         2       0                      4      0     0
                 Seldom         0       0                      0      0     0
                  Never         0       0                      0      0     1

The most common combination appears to be watching live always with watching the recording seldom.
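With a larger table, the most common combination can also be found programmatically. A base R sketch with hypothetical responses for five students: as.data.frame() turns the two-way table into one row per cell, and which.max() picks the cell with the largest count (the first one, if there are ties).

```r
watch_levels <- c("Always", "Usually", "About half of the time", "Seldom", "Never")

# Hypothetical responses for five students (not the survey data)
live <- factor(c("Always", "Always", "Always", "Usually", "Never"),
               levels = watch_levels)
rec  <- factor(c("Seldom", "Seldom", "Usually", "Seldom", "Never"),
               levels = watch_levels)

tab <- table(watch_live = live, watch_rec = rec)

# One row per cell of the 5 x 5 table, with its count in Freq
cells <- as.data.frame(tab)
best <- cells[which.max(cells$Freq), ]
best
```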

I am obviously concerned about the one student who listed that they were never going to watch the live version or the recording. That’s something I need to look into.

Session Information

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] forcats_0.5.0      stringr_1.4.0      dplyr_1.0.2        purrr_0.3.4       
 [5] readr_1.3.1        tidyr_1.1.2        tibble_3.0.3       ggplot2_3.3.2     
 [9] tidyverse_1.3.0    modelsummary_0.6.1 patchwork_1.0.1    gt_0.2.2          
[13] janitor_2.0.1     

loaded via a namespace (and not attached):
 [1] fs_1.5.0          lubridate_1.7.9   webshot_0.5.2     httr_1.4.2       
 [5] tools_4.0.2       backports_1.1.7   utf8_1.1.4        R6_2.4.1         
 [9] DBI_1.1.0         lazyeval_0.2.2    colorspace_1.4-1  withr_2.2.0      
[13] tidyselect_1.1.0  gridExtra_2.3     leaflet_2.0.3     compiler_4.0.2   
[17] cli_2.0.2         rvest_0.3.6       xml2_1.3.2        ggdendro_0.1.21  
[21] labeling_0.3      sass_0.2.0        mosaicCore_0.6.0  scales_1.1.1     
[25] checkmate_2.0.0   tables_0.9.4      digest_0.6.25     ggformula_0.9.4  
[29] rmarkdown_2.3.3   pkgconfig_2.0.3   htmltools_0.5.0   dbplyr_1.4.4     
[33] highr_0.8         htmlwidgets_1.5.1 rlang_0.4.7       readxl_1.3.1     
[37] rstudioapi_0.11   farver_2.0.3      generics_0.0.2    jsonlite_1.7.0   
[41] crosstalk_1.1.0.1 magrittr_1.5      kableExtra_1.2.1  mosaicData_0.18.0
[45] Matrix_1.2-18     Rcpp_1.0.5        munsell_0.5.0     fansi_0.4.1      
[49] lifecycle_0.2.0   stringi_1.4.6     yaml_2.2.1        snakecase_0.11.0 
[53] MASS_7.3-52       ggstance_0.3.4    grid_4.0.2        blob_1.2.1       
[57] ggrepel_0.8.2     crayon_1.3.4      lattice_0.20-41   haven_2.3.1      
[61] splines_4.0.2     hms_0.5.3         knitr_1.29        pillar_1.4.6     
[65] reprex_0.3.0      glue_1.4.2        evaluate_0.14     modelr_0.1.8     
[69] vctrs_0.3.3       tweenr_1.0.1      cellranger_1.1.0  gtable_0.3.0     
[73] polyclip_1.10-0   assertthat_0.2.1  xfun_0.16         ggforce_0.3.2    
[77] broom_0.7.0       viridisLite_0.3.0 mosaic_1.7.0      ellipsis_0.3.1

Exploring Lab 01 Surveys

Thomas E. Love

2020-09-08 10:35:36