Anshul Kumar
12:45–2:30 p.m. on Oct. 6, 2020 (MGHIHP HE942)
If you see a Hide Toolbars option in the bottom-right corner, click it to make this document easier to read.
Link to these slides: https://rpubs.com/anshulkumar/he942quantworkshop2020
Identify and distinguish the types of health professions education (HPEd) research questions (RQs) that can be answered using quantitative methods. Review the types of methods that can be used to answer these RQs.
Organize data in a spreadsheet or database to use these quantitative methods appropriately.
Use examples of quantitative studies to review, critique, and evaluate the entire process: asking a question, gathering data, analyzing data, interpreting what the results can and cannot tell us, reporting results responsibly, identifying limitations and ethical concerns, and taking action.
What is the relationship between MCAT score (mcat) and hours spent studying (study)?
(Graph: mcat plotted against study.)
Does the relationship between mcat and study make sense? No.
We would expect students who study more to score higher, but the graph shows the opposite.
So what is the relationship between mcat and study? We just saw an example of Simpson’s Paradox: the trend we first see in the data is misleading because we fail to look at the group level. Once we look at the group level, we see a completely different trend.
In this case, the groups were classrooms.
Simpson’s Paradox can arise any time our data are clustered into separate groups. We must account for this clustering when we fit a line to the data.
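One quick way to see the paradox, sketched here under the assumption that the classroom data in the table below are loaded into a data frame named prep (the name the plotting code uses), is to compare the slope of a single line through all students with the slopes fitted inside each classroom.

```r
# Classroom data (same values as the table of study, mcat, and classroom)
prep <- data.frame(
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70),
  classroom = factor(rep(c("A", "B", "C"), each = 5))
)

# Slope of one line through all students, ignoring classrooms
pooled_slope <- coef(lm(mcat ~ study, data = prep))[["study"]]

# Slope fitted separately within each classroom
within_slopes <- sapply(split(prep, prep$classroom), function(d) {
  coef(lm(mcat ~ study, data = d))[["study"]]
})

pooled_slope   # negative: more study appears to go with lower scores
within_slopes  # positive in every classroom
```

With these data the pooled slope is negative (about -0.51), while each classroom's own slope is positive (about 0.25, 0.18, and 0.89 for A, B, and C). Fitting a separate line in each classroom is just one way to account for clustering; a model with one shared slope and a separate intercept per classroom is another.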
| study | mcat | classroom |
|---|---|---|
| 50 | 77 | A |
| 53 | 76 | A |
| 55 | 77 | A |
| 53 | 78 | A |
| 57 | 79 | A |
| 60 | 73 | B |
| 63 | 74 | B |
| 65 | 73 | B |
| 63 | 75 | B |
| 67 | 75 | B |
| 70 | 63 | C |
| 73 | 64 | C |
| 75 | 65 | C |
| 73 | 66 | C |
| 77 | 70 | C |
```r
library(ggplot2)

# Classroom data from the table above
prep <- data.frame(
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70),
  classroom = factor(rep(c("A", "B", "C"), each = 5))
)
fit1 <- lm(mcat ~ study, data = prep)              # trend line, ignore classrooms
fit2 <- lm(mcat ~ study + classroom, data = prep)  # trend line with classrooms
prep$fit1pred <- predict(fit1)  # extract fitted values from 1st trend line
prep$fit2pred <- predict(fit2)  # extract fitted values from 2nd trend line
ggplot(prep, aes(x = study, y = mcat, color = classroom)) +     # make axes
  geom_point() +                                                # add points
  geom_line(aes(y = fit2pred), linewidth = 1) +                 # add 2nd line with colors
  geom_line(aes(y = fit1pred), linewidth = 1, color = "black")  # add 1st line in black
```

What is the relationship between MCAT score (mcat) and hours spent studying (study)?
What is the relationship between Y and X in [some population]?
Fall 2019 students
| classroom | study | mcat |
|---|---|---|
| A | 50 | 77 |
| A | 53 | 76 |
| A | 55 | 77 |
| A | 53 | 78 |
| A | 57 | 79 |
| B | 60 | 73 |
| B | 63 | 74 |
| B | 65 | 73 |
| B | 63 | 75 |
| B | 67 | 75 |
| C | 70 | 63 |
| C | 73 | 64 |
| C | 75 | 65 |
| C | 73 | 66 |
| C | 77 | 70 |
Question: Given patterns we find in the student data from 2019 above, what are the unknown MCAT scores for the current 2020 students below?
Fall 2020 students
| classroom | study | mcat |
|---|---|---|
| A | 45 | ? |
| A | 57 | ? |
| A | 46 | ? |
| A | 60 | ? |
| A | 57 | ? |
| B | 70 | ? |
| B | 64 | ? |
| B | 68 | ? |
| B | 61 | ? |
| B | 69 | ? |
| C | 68 | ? |
| C | 80 | ? |
| C | 78 | ? |
| C | 74 | ? |
| C | 74 | ? |
Given patterns we find in the student data from 2019, what are the unknown MCAT scores for the students in 2020?
Given patterns we find in complete data, what are predicted outcomes for incomplete data?
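One way to sketch an answer, assuming a linear model with one shared study slope and a separate intercept per classroom (the same form as fit2 in the earlier code) describes the 2019 data reasonably well, is to fit that model to the complete 2019 rows and then call predict() on the incomplete 2020 rows. The data frame names prep2019 and new2020 are hypothetical.

```r
# Complete data: Fall 2019 students
prep2019 <- data.frame(
  classroom = factor(rep(c("A", "B", "C"), each = 5)),
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70)
)

# Incomplete data: Fall 2020 students, mcat unknown
new2020 <- data.frame(
  classroom = factor(rep(c("A", "B", "C"), each = 5),
                     levels = levels(prep2019$classroom)),
  study = c(45, 57, 46, 60, 57, 70, 64, 68, 61, 69, 68, 80, 78, 74, 74)
)

# Learn the pattern from 2019: shared study slope, one intercept per classroom
fit <- lm(mcat ~ study + classroom, data = prep2019)

# Apply the pattern to 2020 to get predicted MCAT scores
new2020$mcat_pred <- predict(fit, newdata = new2020)
new2020
```

These are point predictions only. Several 2020 study values (for example, 45 hours in classroom A or 80 hours in classroom C) fall outside the range seen in that classroom in 2019, so those predictions extrapolate beyond the data; predict(fit, newdata = new2020, interval = "prediction") would add prediction intervals showing how uncertain each one is.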
Question: What are the mean and standard deviation of the number of hours students study and of their MCAT scores?
Student Data
| classroom | study | mcat |
|---|---|---|
| A | 50 | 77 |
| A | 53 | 76 |
| A | 55 | 77 |
| A | 53 | 78 |
| A | 57 | 79 |
| B | 60 | 73 |
| B | 63 | 74 |
| B | 65 | 73 |
| B | 63 | 75 |
| B | 67 | 75 |
| C | 70 | 63 |
| C | 73 | 64 |
| C | 75 | 65 |
| C | 73 | 66 |
| C | 77 | 70 |
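For the 15 students in the table above, the descriptive question can be answered directly with base R's mean() and sd(); note that sd() uses the sample (n - 1) denominator. The data frame name students is hypothetical.

```r
# Student data from the table (hypothetical data frame name)
students <- data.frame(
  classroom = factor(rep(c("A", "B", "C"), each = 5)),
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70)
)

# Overall mean and standard deviation of each variable
overall <- sapply(students[, c("study", "mcat")],
                  function(x) c(mean = mean(x), sd = sd(x)))
overall

# The same summaries within each classroom, to see how values are distributed across groups
class_means <- aggregate(cbind(study, mcat) ~ classroom, data = students, FUN = mean)
class_sds <- aggregate(cbind(study, mcat) ~ classroom, data = students, FUN = sd)
class_means
class_sds
```

Overall, these students average 63.6 hours of study and about 72.3 MCAT points; the classroom-level summaries show how differently the three groups are distributed around those overall values.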
We focus heavily on inference, meaning the process of using a sample of people or things to answer a question about the much larger population that we actually want to study.
You may be familiar with terms such as confidence intervals and p-values. These help us judge how certain or uncertain we can be about the answer to our research question in the entire population, given what we found in our sample.
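As an illustration only, assuming the 15 classroom students are a sample from some larger population of interest, base R's confint() gives 95% confidence intervals for the coefficients of a fitted linear model, and summary() reports a p-value for each one. The model reuses the classroom-adjusted form from the plotting code.

```r
prep <- data.frame(
  study = c(50, 53, 55, 53, 57, 60, 63, 65, 63, 67, 70, 73, 75, 73, 77),
  mcat = c(77, 76, 77, 78, 79, 73, 74, 73, 75, 75, 63, 64, 65, 66, 70),
  classroom = factor(rep(c("A", "B", "C"), each = 5))
)

# Shared study slope, separate intercept per classroom
fit2 <- lm(mcat ~ study + classroom, data = prep)

# 95% confidence interval for each coefficient
ci <- confint(fit2, level = 0.95)
ci["study", ]

# Estimate, standard error, t statistic, and p-value for the study slope
summary(fit2)$coefficients["study", ]
```

Whether these intervals and p-values say anything useful about the population still depends on how the sample was drawn and whether the model's assumptions hold; 15 students in 3 classrooms is a very small sample.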
What are the mean and standard deviation of the number of hours students study and of their MCAT scores?
How much of the outcome we care about is occurring in our sample and/or population? How much does this outcome vary? How is it distributed throughout our sample and/or population?
| Question type | Specific Example | Generic form |
|---|---|---|
| Associative | What is the relationship between MCAT score and hours spent studying? | What is the relationship between Y and X in [some population]? |
| Predictive | Given patterns we find in the student data from 2019, what are the unknown MCAT scores for the students in 2020? | Given patterns we find in complete data, what are predicted outcomes for incomplete data? |
| Descriptive | What are the mean and standard deviation of the number of hours students study and of their MCAT scores? | How much of the outcome we care about is occurring in our sample and/or population? How much does this outcome vary? How is it distributed throughout our sample and/or population? |
For 20 minutes now you will…
Then…
Small group rosters
| A | B | C | D |
|---|---|---|---|
| Egide A. | Nuha E. | Maria B. | Kevin A. |
| Robin K. | Hani L. | Paul L. | Melissa M. |
| Alex M. | Cynthia M. | Ann M. | Maura P. |
| Dawn W. | Maria S.P. | Kela R. | Anne W. |
OPEN these slides on your computer:
Click on HIDE TOOLBARS in the bottom-right corner for better viewing.
Group members: Egide A., Robin K., Alex M., Dawn W.
First read this abstract on the left side…
Study: Winkler-Schwartz, A., et al. (2019). Machine learning identification of surgical and operative factors associated with surgical expertise in virtual reality simulation. JAMA Network Open, 2(8), e198363.
Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.
Key points and abstract
…then answer these questions on the right side.
Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul at akumar@mghihp.edu in a well-formatted message, as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!
If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.
Group members: Nuha E., Hani L., Cynthia M., Maria S.P.
First read this abstract on the left side…
Study: Dyrbye, L. N., et al. (2019). Effect of a professional coaching intervention on the well-being and distress of physicians: a pilot randomized clinical trial. JAMA Internal Medicine, 179(10), 1406-1414.
Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.
Key points and abstract
…then answer these questions on the right side.
Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul at akumar@mghihp.edu in a well-formatted message, as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!
If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.
Group members: Maria B., Paul L., Ann M., Kela R.
First read this abstract on the left side…
Study: Hauer, K. E., et al. (2008). Factors associated with medical students’ career choices regarding internal medicine. JAMA, 300(10), 1154-1164.
Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.
Abstract
…then answer these questions on the right side.
Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul at akumar@mghihp.edu in a well-formatted message, as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!
If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.
Group members: Kevin A., Melissa M., Maura P., Anne W.
First read this abstract on the left side…
Study: Akçapınar, G., et al. (2019). Using learning analytics to develop early-warning system for at-risk students. International Journal of Educational Technology in Higher Education, 16(1), 40.
Everything you need to read is below on this slide. You don’t need to click on anything or go anywhere. Do not look at the main text of the published article.
Abstract
In the current study interaction data of students in an online learning setting was used to research whether the academic performance of students at the end of term could be predicted in the earlier weeks. The study was carried out with 76 second-year university students registered in a Computer Hardware course. The study aimed to answer two principal questions: which algorithms and features best predict the end of term academic performance of students by comparing different classification algorithms and pre-processing techniques and whether or not academic performance can be predicted in the earlier weeks using these features and the selected algorithm. The results of the study indicated that the kNN algorithm accurately predicted unsuccessful students at the end of term with a rate of 89%. When findings were examined regarding the analysis of data obtained in weeks 3, 6, 9, 12, and 14 to predict whether the end-of-term academic performance of students could be predicted in the earlier weeks, it was observed that students who were unsuccessful at the end of term could be predicted with a rate of 74% in as short as 3 weeks’ time. The findings obtained from this study are important for the determination of features for early warning systems that can be developed for online learning systems and as indicators of student success. At the same time, it will aid researchers in the selection of algorithms and pre-processing techniques in the analysis of educational data.
…then answer these questions on the right side.
Answer the following questions about this study, based on ONLY reading the text to the left. E-mail your answers to Anshul at akumar@mghihp.edu in a well-formatted message, as soon as you are done with the activity. Your answers will be shared with the rest of the class. Please adhere to the word limits or you will be penalized!
If you are not able to answer one or more questions above just by reading the abstract of the study, explain what information you would need in order to answer the question.