Quantitative Methods Workshop

Anshul Kumar

12:45–2:30 p.m. on Oct 6, 2020 (MGHIHP HE942)

Access and configure the slides

Learning objectives

  1. Identify and distinguish the types of health professions education (HPEd) research questions that can be answered using quantitative methods. Review the types of methods that can be used to answer these research questions.

  2. Organize data in a spreadsheet or database to use these quantitative methods appropriately.

  3. Use examples of quantitative studies to review, critique, and evaluate the entire process of asking a question, gathering data, analyzing data, interpreting what the results can and cannot tell us, reporting results responsibly, identifying limitations and ethical concerns, and taking action.

Schedule

  1. 12:45 p.m. – Introductory presentation from Anshul
    1. Associative research question example (long)
    2. Predictive research question example (short)
    3. Descriptive research question example (short)
  2. 1:00 p.m. – Small group abstract reading and response activity
  3. 1:20 p.m. – Full group discussions
    1. 1:25 p.m. – Group A
    2. 1:35 p.m. – Group B
    3. 1:45 p.m. – Group C
    4. 1:55 p.m. – Group D
  4. 2:05 p.m. – Questions, answers, and discussion
  5. 2:20 p.m. – Review types of quantitative research questions and common analytic methods, if time permits
  6. 2:30 p.m. – End

Example 1 – Find the association

What is the apparent relationship (association) between MCAT score (mcat) and hours spent studying (study)?

  • We surveyed 15 students who were studying for the MCAT.
  • We asked them their MCAT score percentile and how many hours they spent studying.
  • Then we made this graph.


  • Question: What is the apparent relationship (association) between MCAT score (mcat) and hours spent studying (study)?

Draw the line that shows the relationship between mcat and study.

  • The slope of the line is -0.51.
  • For each additional hour spent studying, a student’s MCAT score percentile is predicted to drop by 0.51 percentile points.

Does this relationship between mcat and study make sense?

No. 

Why does it not make sense?

We would expect students who study more to score better, but the graph shows the opposite.

Disaggregate the data and check again.

  • It turns out that these students come from different groups (classrooms, in this case).
  • Now what is the apparent relationship between mcat and study?

Draw the lines for the new relationship.

  • The slope of each of the three lines is 0.44.
  • For each additional hour spent studying, a student’s MCAT score percentile is predicted to increase by 0.44 percentile points.

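Simpson’s Paradox

  • The pooled data show a negative trend, but within every classroom the trend is positive. A reversal like this, where a pattern in aggregated data flips once the data are split into groups, is called Simpson’s paradox.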

What do the raw data look like?

study mcat classroom
50 77 A
53 76 A
55 77 A
53 78 A
57 79 A
60 73 B
63 74 B
65 73 B
63 75 B
67 75 B
70 63 C
73 64 C
75 65 C
73 66 C
77 70 C

  1. What is the unit of observation? (What does a row represent?)
  2. The students in Classroom A all do better than those in the other two rooms. Why might this be the case?
  3. What are some additional columns that we could add to the data sheet?
  4. Separate from classroom-level characteristics, what could explain the differences we see across students from the three classrooms?

How do we apply this associative analytic approach to real data?

You will learn how to apply this in the course HE-902!
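The code on this slide assumes a data frame called prep that holds the 15 rows shown under "What do the raw data look like?". Below is a minimal sketch of how that data frame could be built and how the two slopes quoted earlier (-0.51 and 0.44) can be checked. The object names pooled and by_class are illustrative assumptions, not names from the course materials.

```r
# Raw data from the slide "What do the raw data look like?"
prep <- data.frame(
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat  = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70),
  classroom = rep(c("A", "B", "C"), each = 5)
)

# Pooled model: one trend line that ignores classrooms
pooled <- lm(mcat ~ study, data = prep)

# Adjusted model: parallel trend lines, one intercept per classroom
by_class <- lm(mcat ~ study + classroom, data = prep)

# Slope on study in each model, rounded as on the slides
slope_pooled   <- round(unname(coef(pooled)["study"]), 2)
slope_by_class <- round(unname(coef(by_class)["study"]), 2)
print(c(pooled = slope_pooled, by_classroom = slope_by_class))
```

The sign of the study coefficient flips from negative to positive once classroom enters the model, which is the reversal shown on the two graphs.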

library(ggplot2) # plotting package that provides ggplot()

# prep is assumed to be a data frame with columns study, mcat, and classroom
fit1 <- lm(mcat ~ study, data = prep) # trend line, ignoring classrooms
fit2 <- lm(mcat ~ study + classroom, data = prep) # trend lines, one per classroom

prep$fit1pred <- predict(fit1) # fitted values from 1st trend line
prep$fit2pred <- predict(fit2) # fitted values from 2nd set of trend lines

# linewidth requires ggplot2 >= 3.4.0; on older versions use size = 1 instead
ggplot(prep, aes(x = study, y = mcat, color = classroom)) + # make axes
     geom_point() + # add points
     geom_line(aes(y = fit2pred), linewidth = 1) + # add per-classroom lines in color
     geom_line(aes(y = fit1pred), linewidth = 1, color = 'black') # add pooled line in black

Associative research question


Example 2 – Predict specific outcomes

Fall 2019 students

classroom study mcat
A 50 77
A 53 76
A 55 77
A 53 78
A 57 79
B 60 73
B 63 74
B 65 73
B 63 75
B 67 75
C 70 63
C 73 64
C 75 65
C 73 66
C 77 70



Question: Given patterns we find in the Fall 2019 student data (first table), what are the unknown MCAT scores for the current Fall 2020 students (second table)?

Fall 2020 students

classroom study mcat
A 45 ?
A 57 ?
A 46 ?
A 60 ?
A 57 ?
B 70 ?
B 64 ?
B 68 ?
B 61 ?
B 69 ?
C 68 ?
C 80 ?
C 78 ?
C 74 ?
C 74 ?

How do we apply this predictive analytic approach to real data?

You will learn how to apply this in the course HE-930!
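One possible way to fill in the question marks is sketched below, under the assumption that the 2019 pattern (a linear model of mcat on study and classroom) also holds for the 2020 students. The model choice and the names students2019, students2020, fit, and mcat_pred are illustrative assumptions; HE-930 may use different predictive methods.

```r
# Fall 2019 students: complete data
students2019 <- data.frame(
  classroom = rep(c("A", "B", "C"), each = 5),
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat  = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70)
)

# Fall 2020 students: study hours known, mcat unknown
students2020 <- data.frame(
  classroom = rep(c("A", "B", "C"), each = 5),
  study = c(45, 57, 46, 60, 57, 70, 64, 68, 61, 69, 68, 80, 78, 74, 74)
)

# Learn the pattern from 2019 (assumed model: linear in study, shifted by classroom)
fit <- lm(mcat ~ study + classroom, data = students2019)

# Apply the 2019 pattern to the 2020 students
students2020$mcat_pred <- predict(fit, newdata = students2020)
print(students2020)
```

Several 2020 study values (for example 45 and 80 hours) fall outside the range seen in their classroom in 2019, so those predictions are extrapolations and deserve extra caution.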

Predictive research question

Example 3 – Describe specific outcomes



Question: What are the mean and standard deviation of the number of hours students study and of their MCAT scores?

Student Data

classroom study mcat
A 50 77
A 53 76
A 55 77
A 53 78
A 57 79
B 60 73
B 63 74
B 65 73
B 63 75
B 67 75
C 70 63
C 73 64
C 75 65
C 73 66
C 77 70
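
The question above can be answered directly from the table. Below is a minimal sketch in base R; the object names student_data, desc, and by_classroom are illustrative assumptions. Note that sd() computes the sample standard deviation (dividing by n - 1).

```r
# Student data from the table above
student_data <- data.frame(
  classroom = rep(c("A", "B", "C"), each = 5),
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat  = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70)
)

# Mean and standard deviation of each numeric column
desc <- sapply(student_data[c("study", "mcat")],
               function(x) c(mean = mean(x), sd = sd(x)))
print(round(desc, 2))

# The same means, broken out by classroom
by_classroom <- aggregate(cbind(study, mcat) ~ classroom,
                          data = student_data, FUN = mean)
print(by_classroom)
```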

How do we apply this descriptive analytic approach to real data?

You will learn how to apply this in the course HE-902!

  • We focus heavily on inference, meaning the process of using a sample of people or things to answer a question about a much larger population of those people or things that we actually want to study.

  • You may be familiar with terms such as confidence intervals and p-values. These tell us how certain or uncertain we can be about the answer to our research question in the entire population based on our analysis of our sample.
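
To make these terms concrete, here is a minimal sketch using the 15 example students. It treats them as a random sample from a larger population of students, which is an assumption made here purely for illustration; the names student_data, ci, fit, and p_study are likewise illustrative.

```r
# Student data from the example table
student_data <- data.frame(
  classroom = rep(c("A", "B", "C"), each = 5),
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat  = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70)
)

# 95% confidence interval for the population mean MCAT percentile
ci <- t.test(student_data$mcat)$conf.int
print(ci)

# p-value for the study slope after accounting for classroom
fit <- lm(mcat ~ study + classroom, data = student_data)
p_study <- summary(fit)$coefficients["study", "Pr(>|t|)"]
print(p_study)
```

The interval gives a range of plausible values for the population mean, and the p-value gauges how surprising the estimated slope would be if the true slope in the population were zero.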


Descriptive research question

Summary of selected quantitative research question types

Question type: Associative
  • Specific example: What is the relationship between MCAT score and hours spent studying?
  • Generic form: What is the relationship between Y and X in [some population]?

Question type: Predictive
  • Specific example: Given patterns we find in the student data from 2019, what are the unknown MCAT scores for the students in 2020?
  • Generic form: Given patterns we find in complete data, what are predicted outcomes for incomplete data?

Question type: Descriptive
  • Specific example: What are the mean and standard deviation of the number of hours students study and of their MCAT scores?
  • Generic form: How much of an outcome we care about is happening in our sample and/or population? How much does this outcome vary? How is this outcome distributed throughout our sample and/or population?


Group activity

For 20 minutes now you will…

  1. Go to the slide with your name and your group’s name on it and follow the instructions there.
  2. Work in a small group of four people total.
  3. Read the abstract of a published study that uses quantitative methods.
  4. Answer some questions about the study, based only on what you learned from the abstract.

Then…

  1. We will reconvene in a large group.
  2. Each group will present the answers to some of their questions.

Small group rosters

A B C D
Egide A. Nuha E. Maria B. Kevin A.
Robin K. Hani L. Paul L. Melissa M.
Alex M. Cynthia M. Ann M. Maura P.
Dawn W. Maria S.P. Kela R. Anne W.


OPEN these slides on your computer:

Click on HIDE TOOLBARS in the bottom-right corner for better viewing.

Group A task (complete by 1:20 p.m.)

Group members: Egide A., Robin K., Alex M., Dawn W.

First read this abstract on the left side…

Study: Winkler-Schwartz, A., et al. (2019). Machine learning identification of surgical and operative factors associated with surgical expertise in virtual reality simulation. JAMA Network Open, 2(8), e198363.

Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.

Key points and abstract

  • Question – Can a machine learning algorithm differentiate participants according to their stage of practice in a complex simulated neurosurgical task?
  • Findings – In this case series study, 50 individuals (14 neurosurgeons, 4 fellows, 10 senior residents, 10 junior residents, and 12 medical students) participated in 250 simulated tumor resections. An accuracy of 90% was achieved using 6 performance features by a K-nearest neighbor algorithm and 2 neurosurgeons, 1 fellow or senior resident, 1 junior resident, and 1 medical student were misclassified.
  • Meaning – The findings suggest that machine learning algorithms may be capable of classifying surgical expertise with greater granularity and precision than has been previously demonstrated in surgery.
  • Importance – Despite advances in the assessment of technical skills in surgery, a clear understanding of the composites of technical expertise is lacking. Surgical simulation allows for the quantitation of psychomotor skills, generating data sets that can be analyzed using machine learning algorithms.
  • Objective – To identify surgical and operative factors selected by a machine learning algorithm to accurately classify participants by level of expertise in a virtual reality surgical procedure.
  • Design, Setting, and Participants – Fifty participants from a single university were recruited between March 1, 2015, and May 31, 2016, to participate in a case series study at McGill University Neurosurgical Simulation and Artificial Intelligence Learning Centre. Data were collected at a single time point and no follow-up data were collected. Individuals were classified a priori as expert (neurosurgery staff), seniors (neurosurgical fellows and senior residents), juniors (neurosurgical junior residents), and medical students, all of whom participated in 250 simulated tumor resections.
  • Exposures – All individuals participated in a virtual reality neurosurgical tumor resection scenario. Each scenario was repeated 5 times.
  • Main Outcomes and Measures – Through an iterative process, performance metrics associated with instrument movement and force, resection of tissues, and bleeding generated from the raw simulator data output were selected by K-nearest neighbor, naive Bayes, discriminant analysis, and support vector machine algorithms to most accurately determine group membership.
  • Results – A total of 50 individuals (9 women and 41 men; mean [SD] age, 33.6 [9.5] years; 14 neurosurgeons, 4 fellows, 10 senior residents, 10 junior residents, and 12 medical students) participated. Neurosurgeons were in practice between 1 and 25 years, with 9 (64%) involving a predominantly cranial practice. The K-nearest neighbor algorithm had an accuracy of 90% (45 of 50), the naive Bayes algorithm had an accuracy of 84% (42 of 50), the discriminant analysis algorithm had an accuracy of 78% (39 of 50), and the support vector machine algorithm had an accuracy of 76% (38 of 50). The K-nearest neighbor algorithm used 6 performance metrics to classify participants, the naive Bayes algorithm used 9 performance metrics, the discriminant analysis algorithm used 8 performance metrics, and the support vector machine algorithm used 8 performance metrics. Two neurosurgeons, 1 fellow or senior resident, 1 junior resident, and 1 medical student were misclassified.
  • Conclusions and Relevance – In a virtual reality neurosurgical tumor resection study, a machine learning algorithm successfully classified participants into 4 levels of expertise with 90% accuracy. These findings suggest that algorithms may be capable of classifying surgical expertise with greater granularity and precision than has been previously demonstrated in surgery.

…then answer these questions on the right side.

Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul in a well-formatted message as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!

  1. What is the research question that the study addressed? (1 sentence, ending in a question mark; can be copy-pasted verbatim)
  2. What type(s) of quantitative question was asked? (1 word)
  3. What was the main finding? Pick only the most important one, if there were multiple. (1 sentence or bullet; can be copy-pasted verbatim)
  4. Based on the abstract, to what extent did the study adequately answer the research question? Are you convinced that the main finding adequately answers the question? In other words, what was good/strong? (1–3 sentences or bullets)
  5. Based on the abstract, what are the limitations of the study (including possible sources of bias)? In what ways does/might the study fail to fully answer the research question? In other words, what was bad/weak? (1–3 sentences or bullets)
  6. Identify at least one positive or negative ethical implication of the way this study was conducted and/or of its results. (1–3 sentences or bullets)
  7. How can the main finding now be applied to solve a real world problem, if at all? (1–2 sentences or bullets)
  8. Considering everything you know about the study just from reading the abstract, give the study a letter grade on the A–F scale. Write a one-sentence justification of this grade, if time permits. (One letter and one sentence)

If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.

Group B task (complete by 1:20 p.m.)

Group members: Nuha E., Hani L., Cynthia M., Maria S.P.

First read this abstract on the left side…

Study: Dyrbye, L. N., et al. (2019). Effect of a professional coaching intervention on the well-being and distress of physicians: a pilot randomized clinical trial. JAMA Internal Medicine, 179(10), 1406-1414.

Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.

Key points and abstract

  • Question – Does professional coaching result in measurable reductions in burnout and measurable improvements in quality of life, resilience, job satisfaction, engagement, and fulfillment in physicians?
  • Findings – In this pilot randomized clinical trial of 88 physicians, participants who received professional coaching had a significant reduction in emotional exhaustion and overall symptoms of burnout, as well as improvements in overall quality of life and resilience.
  • Meaning – Professional coaching may be an effective strategy to reduce burnout and improve well-being for physicians.
  • Importance – Burnout symptoms among physicians are common and have potentially serious ramifications for physicians and their patients. Randomized studies testing interventions to address burnout have been uncommon.
  • Objective – To explore the effect of individualized coaching on the well-being of physicians.
  • Design, Setting, and Participants – A pilot randomized clinical trial involving 88 practicing physicians in the departments of medicine, family medicine, and pediatrics who volunteered for coaching was conducted between October 9, 2017, and March 27, 2018, at Mayo Clinic sites in Arizona, Florida, Minnesota, and Wisconsin. Statistical analysis was conducted from August 24, 2018, to March 25, 2019.
  • Exposures – A total of 6 coaching sessions facilitated by a professional coach.
  • Main Outcomes and Measures – Burnout, quality of life, resilience, job satisfaction, engagement, and meaning at work using established metrics. Analysis was performed on an intent-to-treat basis.
  • Results – Among the 88 physicians in the study (48 women and 40 men), after 6 months of professional coaching, emotional exhaustion decreased by a mean (SD) of 5.2 (8.7) points in the intervention group compared with an increase of 1.5 (7.7) points in the control group by the end of the study (P < .001). Absolute rates of high emotional exhaustion at 5 months decreased by 19.5% in the intervention group and increased by 9.8% in the control group (−29.3% [95% CI, −34.0% to −24.6%]) (P < .001). Absolute rates of overall burnout at 5 months also decreased by 17.1% in the intervention group and increased by 4.9% in the control group (−22.0% [95% CI, −25.2% to −18.7%]) (P < .001). Quality of life improved by a mean (SD) of 1.2 (2.5) points in the intervention group compared with 0.1 (1.7) points in the control group (1.1 points [95% CI, 0.04-2.1 points]) (P = .005), and resilience scores improved by a mean (SD) of 1.3 (5.2) points in the intervention group compared with 0.6 (4.0) points in the control group (0.7 points [95% CI, 0.0-3.0 points]) (P = .04). No statistically significant differences in depersonalization, job satisfaction, engagement, or meaning in work were observed.
  • Conclusions and Relevance – Professional coaching may be an effective way to reduce emotional exhaustion and overall burnout as well as improve quality of life and resilience for some physicians.

…then answer these questions on the right side.

Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul in a well-formatted message as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!

  1. What is the research question that the study addressed? (1 sentence, ending in a question mark; can be copy-pasted verbatim)
  2. What type(s) of quantitative question was asked? (1 word)
  3. What was the main finding? Pick only the most important one, if there were multiple. (1 sentence or bullet; can be copy-pasted verbatim)
  4. Based on the abstract, to what extent did the study adequately answer the research question? Are you convinced that the main finding adequately answers the question? In other words, what was good/strong? (1–3 sentences or bullets)
  5. Based on the abstract, what are the limitations of the study (including possible sources of bias)? In what ways does/might the study fail to fully answer the research question? In other words, what was bad/weak? (1–3 sentences or bullets)
  6. Identify at least one positive or negative ethical implication of the way this study was conducted and/or of its results. (1–3 sentences or bullets)
  7. How can the main finding now be applied to solve a real world problem, if at all? (1–2 sentences or bullets)
  8. Considering everything you know about the study just from reading the abstract, give the study a letter grade on the A–F scale. Write a one-sentence justification of this grade, if time permits. (One letter and one sentence)

If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.

Group C task (complete by 1:20 p.m.)

Group members: Maria B., Paul L., Ann M., Kela R.

First read this abstract on the left side…

Study: Hauer, K. E., et al. (2008). Factors associated with medical students’ career choices regarding internal medicine. JAMA, 300(10), 1154-1164.

Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.

Abstract

  • Context – Shortfalls in the US physician workforce are anticipated as the population ages and medical students’ interest in careers in internal medicine (IM) has declined (particularly general IM, the primary specialty serving older adults). The factors influencing current students’ career choices regarding IM are unclear.
  • Objectives – To describe medical students’ career decision making regarding IM and to identify modifiable factors related to this decision making.
  • Design, Setting, and Participants – Web-based cross-sectional survey of 1177 fourth-year medical students (82% response rate) at 11 US medical schools in spring 2007.
  • Main Outcome Measures – Demographics, debt, educational experiences, and number who chose or considered IM careers were measured. Factor analysis was performed to assess influences on career chosen. Logistic regression analysis was conducted to assess independent association of variables with IM career choice.
  • Results – Of 1177 respondents, 274 (23.2%) planned careers in IM, including 24 (2.0%) in general IM. Only 228 (19.4%) responded that their core IM clerkship made a career in general IM seem more attractive, whereas 574 (48.8%) responded that it made a career in subspecialty IM more attractive. Three factors influenced career choice regarding IM: educational experiences in IM, the nature of patient care in IM, and lifestyle. Students were more likely to pursue careers in IM if they were male (odds ratio [OR] 1.75; 95% confidence interval [CI], 1.20-2.56), were attending a private school (OR, 1.88; 95% CI, 1.26-2.83), were favorably impressed with their educational experience in IM (OR, 4.57; 95% CI, 3.01-6.93), reported favorable feelings about caring for IM patients (OR, 8.72; 95% CI, 6.03-12.62), or reported a favorable impression of internists’ lifestyle (OR, 2.00; 95% CI, 1.39-2.87).
  • Conclusions – Medical students valued the teaching during IM clerkships but expressed serious reservations about IM as a career. Students who reported more favorable impressions of the patients cared for by internists, the IM practice environment, and internists’ lifestyle were more likely to pursue a career in IM.

…then answer these questions on the right side.

Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul in a well-formatted message as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!

  1. What is the research question that the study addressed? (1 sentence, ending in a question mark; can be copy-pasted verbatim)
  2. What type(s) of quantitative question was asked? (1 word)
  3. What was the main finding? Pick only the most important one, if there were multiple. (1 sentence or bullet; can be copy-pasted verbatim)
  4. Based on the abstract, to what extent did the study adequately answer the research question? Are you convinced that the main finding adequately answers the question? In other words, what was good/strong? (1–3 sentences or bullets)
  5. Based on the abstract, what are the limitations of the study (including possible sources of bias)? In what ways does/might the study fail to fully answer the research question? In other words, what was bad/weak? (1–3 sentences or bullets)
  6. Identify at least one positive or negative ethical implication of the way this study was conducted and/or of its results. (1–3 sentences or bullets)
  7. How can the main finding now be applied to solve a real world problem, if at all? (1–2 sentences or bullets)
  8. Considering everything you know about the study just from reading the abstract, give the study a letter grade on the A–F scale. Write a one-sentence justification of this grade, if time permits. (One letter and one sentence)

If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.

Group D task (complete by 1:20 p.m.)

Group members: Kevin A., Melissa M., Maura P., Anne W.

First read this abstract on the left side…

Study: Akçapınar, G., et al. (2019). Using learning analytics to develop early-warning system for at-risk students. International Journal of Educational Technology in Higher Education, 16(1), 40.

Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.

Abstract

In the current study interaction data of students in an online learning setting was used to research whether the academic performance of students at the end of term could be predicted in the earlier weeks. The study was carried out with 76 second-year university students registered in a Computer Hardware course. The study aimed to answer two principle questions: which algorithms and features best predict the end of term academic performance of students by comparing different classification algorithms and pre-processing techniques and whether or not academic performance can be predicted in the earlier weeks using these features and the selected algorithm. The results of the study indicated that the kNN algorithm accurately predicted unsuccessful students at the end of term with a rate of 89%. When findings were examined regarding the analysis of data obtained in weeks 3, 6, 9, 12, and 14 to predict whether the end-of-term academic performance of students could be predicted in the earlier weeks, it was observed that students who were unsuccessful at the end of term could be predicted with a rate of 74% in as short as 3 weeks’ time. The findings obtained from this study are important for the determination of features for early warning systems that can be developed for online learning systems and as indicators of student success. At the same time, it will aid researchers in the selection of algorithms and pre-processing techniques in the analysis of educational data.

…then answer these questions on the right side.

Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul in a well-formatted message as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!

  1. What is the research question that the study addressed? (1 sentence, ending in a question mark; can be copy-pasted verbatim)
  2. What type(s) of quantitative question was asked? (1 word)
  3. What was the main finding? Pick only the most important one, if there were multiple. (1 sentence or bullet; can be copy-pasted verbatim)
  4. Based on the abstract, to what extent did the study adequately answer the research question? Are you convinced that the main finding adequately answers the question? In other words, what was good/strong? (1–3 sentences or bullets)
  5. Based on the abstract, what are the limitations of the study (including possible sources of bias)? In what ways does/might the study fail to fully answer the research question? In other words, what was bad/weak? (1–3 sentences or bullets)
  6. Identify at least one positive or negative ethical implication of the way this study was conducted and/or of its results. (1–3 sentences or bullets)
  7. How can the main finding now be applied to solve a real world problem, if at all? (1–2 sentences or bullets)
  8. Considering everything you know about the study just from reading the abstract, give the study a letter grade on the A–F scale. Write a one-sentence justification of this grade, if time permits. (One letter and one sentence)

If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.