Author: @HunterRatliff1
Published to: RPubs
Source Code: Available on Github
Although the High School camp had 100 registered students, a few were absent on the first day, the last day, or both. For this reason, our dataset contains only 86 observations (i.e. students).
Students came from 28 schools in 10 districts. The majority of students were in 11th grade, as depicted below:
Figure 01A
This may appear deceiving, because it’s not clear that the majority of students came from AISD. This, however, is not the case:
Figure 01B
Figure 01C
A few lines about methods can go here… Shouldn’t be that complicated or long
Students listed the languages spoken in their home in decreasing order of frequency. From these responses, five language variables were calculated:
Figure 2
This can also be expressed in percentage format:
40.00%, 11.76%, 48.24% (with 53.66% of these households having a PHLOTE)

Students were asked to select one or more boxes describing their gender identification. The options that were provided are as follows: Female, Male, and/or I’d prefer not answer. Additionally, students were allowed to write in their own responses. For the purpose of analysis, all responses other than male or female have been categorized as Other.
Figure 3
Students were asked to select one or more boxes describing their racial/ethnic identification among the options White, Hispanic or Latina/o, Asian, American Indian or Alaska Native, and Native Hawaiian or Other Pacific Islander. As with the gender identification question, respondents were also given the option to write in responses or select I’d prefer not answer. Given the survey’s limitations on this question (e.g. combining the ethnicity question with the race question, the social construction of race), these data were simplified into simple factors using the following heuristic:
- Hispanic (any combo) regardless of race
- White alone, Black alone, or Asian alone
- Other/2+ Race ID

The detailed Race/Ethnicity ID data are available in the “Demographics” table
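The heuristic above can be sketched in a few lines of R. This is only an illustration, not the code used for the report: the function name `simplify_race`, the example responses, and the exact output labels are assumptions, and the sketch sends anything that is not Hispanic or a single White/Black/Asian selection (including blank or "prefer not to answer" responses) to the Other group.

```r
# Collapse one student's ticked boxes into a simplified race/ethnicity factor.
# `selected` is a character vector of the options that student chose.
simplify_race <- function(selected) {
  # Any combination that includes a Hispanic option, regardless of race
  if (any(grepl("Hispanic", selected, fixed = TRUE))) {
    return("Hispanic")
  }
  # Exactly one box ticked, and it is White, Black, or Asian
  if (length(selected) == 1 && selected %in% c("White", "Black", "Asian")) {
    return(paste(selected, "alone"))
  }
  # Everything else: two or more races, other single races, write-ins, blanks
  "Other/2+ Race ID"
}

# Hypothetical responses, one list element per student
responses <- list(
  c("White", "Hispanic or Latina/o"),
  "Asian",
  c("White", "Asian"),
  "American Indian or Alaska Native"
)

simplified <- vapply(responses, simplify_race, character(1))
print(table(simplified))
```

Checking for Hispanic first is what makes the "any combo, regardless of race" rule take precedence over the single-race rule.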
Figure 4A
Figure 4B
Most measurements of socioeconomic status (SES) fall into two categories: proxy and composite measurements. Proxy measures capture only one dimension of SES, such as income, wealth, occupation, or educational attainment. They are most useful when the proxy is highly conserved, meaning that it explains a high degree of the variance in other dimensions of socioeconomic status or social mobility. Proxy measures are useful in their simplicity, but often fail to account for the full scope of SES. Composite indicators factor in multiple variables that influence SES by assigning weights to each dimension. This provides a more comprehensive indicator and simplifies multidimensional measures into a single ranking system.
We estimated students’ SES by asking about their parents’ educational attainment and occupation. We gathered these measurements for both parents, and by doing so could deduce other useful indicators (e.g. single parent households) for further consideration.
A common proxy for SES is the educational attainment of a student’s parents (Oakes). Educational attainment is of particular interest for us because of the camp’s aims, so this proxy will serve as one of our primary indicators. Students were asked to identify the attainment of both parents individually, by selecting one of the following seven options and/or writing in their own response:
A fair number of students chose to write in responses indicating less than Some high school (e.g. elementary school, middle school, third grade), so an additional Primary Education only category was added in order to capture these students.
Figure 5A
Comparing that to the 2014 Census estimates for Travis County, you’ll see we’re pretty spot on:
Figure 5B
More importantly, we ought to consider that traditional applicants to medical school are disproportionately from families with high educational attainment and not reflective of the general population. For this reason, it makes more sense to adopt a better measure of SES than education alone.
The Association of American Medical Colleges (AAMC) measures SES with a two-factor indicator comprising the highest education (E) and occupation (O) of either parent. This scale sorts students into five categories (EO-1 to EO-5) and has been demonstrated to be a robust method of identifying students from a socioeconomically disadvantaged background (Grbic et al., 2015; PMID: 25629949). More information about AAMC’s methodology can be found here, but the basic methodology is summarized well in the picture below:
We implemented a similar classification scheme, with a few minor adjustments. First, unlike AAMC, we considered households with only one parent present. Additionally, if a student left the occupation blank AND the educational attainment was a Bachelor’s degree, we assigned an EO score of 2.5. This seemed to be a fair compromise that avoids discarding these students’ responses, and the adjustment affected only two students once the other parent’s education and occupation were considered.
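The two adjustments described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the scoring code behind the figures: the function names, column names, and example values are all hypothetical, and the base AAMC-style score for each parent (`base_eo`, 1 to 5, or `NA` when it could not be assigned) is taken as an input instead of being derived here.

```r
# Per-parent score: a blank occupation combined with a Bachelor's degree
# gets 2.5; otherwise the parent keeps the base score (which may be NA).
parent_eo <- function(base_eo, education, occupation) {
  occupation_blank <- is.na(occupation) | occupation == ""
  is_bachelors <- !is.na(education) & education == "Bachelor's degree"
  ifelse(occupation_blank & is_bachelors, 2.5, base_eo)
}

# Household score: the higher of the two parents. na.rm = TRUE lets a
# single-parent household fall back on the one parent who is present.
household_eo <- function(eo_parent1, eo_parent2) {
  pmax(eo_parent1, eo_parent2, na.rm = TRUE)
}

# Hypothetical example data (not survey responses)
parents <- data.frame(
  p1_base = c(3, NA, 1),
  p1_edu  = c("Bachelor's degree", "Bachelor's degree", "Some high school"),
  p1_occ  = c("Engineer", "", "Cook"),
  p2_base = c(5, 1, NA),
  p2_edu  = c("Doctorate", "High school diploma", NA),
  p2_occ  = c("Physician", "Cashier", NA),
  stringsAsFactors = FALSE
)

eo1 <- parent_eo(parents$p1_base, parents$p1_edu, parents$p1_occ)
eo2 <- parent_eo(parents$p2_base, parents$p2_edu, parents$p2_occ)
household <- household_eo(eo1, eo2)
print(household)
```

In this toy data the second student keeps the 2.5 because the other parent scores lower, and the third student is scored from the only parent listed.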
Figure 6A
Now compare that to the 2012 AMCAS applicant pool:
Figure 6B
To do:
Figure 7
To do:
Figure 8
It’s worth noting that these percentages are measured on samples of very different sizes (e.g. 56 campers were in the 11th grade while only 16 and 12 were in the 10th and 12th grades, respectively). A more detailed breakdown with the group sizes (n) is available in the table below:
| Course | Grade Level | Have taken | Plan to take | n |
|---|---|---|---|---|
| Chemistry | 12 | 75.00% | 87.50% | 8 |
| Biology | 12 | 71.43% | 100.00% | 7 |
| Physics | 12 | 70.00% | 100.00% | 10 |
| Biology | 11 | 57.41% | 92.59% | 54 |
| Biology | 10 | 53.33% | 100.00% | 15 |
| Chemistry | 11 | 43.40% | 94.34% | 53 |
| Computer Science | 10 | 33.33% | 86.67% | 15 |
| Calculus | 12 | 30.00% | 80.00% | 10 |
| Computer Science | 12 | 25.00% | 50.00% | 8 |
| Computer Science | 11 | 16.67% | 33.33% | 48 |
| Statistics | 12 | 12.50% | 75.00% | 8 |
| Physics | 11 | 7.69% | 90.38% | 52 |
| Calculus | 11 | 3.57% | 92.86% | 56 |
| Statistics | 11 | 0.00% | 62.75% | 51 |
| Calculus | 10 | 0.00% | 92.86% | 14 |
| Chemistry | 10 | 0.00% | 100.00% | 15 |
| Physics | 10 | 0.00% | 100.00% | 15 |
| Statistics | 10 | 0.00% | 80.00% | 10 |
A simplified version of this table can be generated if we collapse the Grade Level variable:
| Course | Have taken | Plan to take | n |
|---|---|---|---|
| Biology | 57.69% | 94.87% | 78 |
| Chemistry | 38.46% | 94.87% | 78 |
| Computer Science | 20.55% | 47.95% | 73 |
| Physics | 15.19% | 93.67% | 79 |
| Calculus | 6.10% | 91.46% | 82 |
| Statistics | 1.41% | 66.20% | 71 |
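Because the grade groups differ so much in size, collapsing across grades means pooling counts rather than averaging the three percentages. Here is a minimal sketch using the Biology rows of the detailed table, retyped by hand; the counts are recovered by rounding, which is an assumption about how the percentages were produced.

```r
# Biology "Have taken" rows from the detailed table (grades 12, 11, 10)
biology <- data.frame(
  grade     = c(12, 11, 10),
  pct_taken = c(71.43, 57.41, 53.33),
  n         = c(7, 54, 15)
)

# Recover the number of students behind each percentage
biology$taken <- round(biology$pct_taken / 100 * biology$n)

# Pooled (count-weighted) percentage versus a naive mean of percentages
pooled_pct     <- 100 * sum(biology$taken) / sum(biology$n)
unweighted_pct <- mean(biology$pct_taken)

print(round(c(pooled = pooled_pct, unweighted = unweighted_pct), 2))
```

The pooled value from these three rows is 44 of 76 students (about 57.89%), while the simplified table reports 57.69% with n = 78. One possible explanation, which we have not verified here, is that the simplified table also counts a couple of students who fall outside the grade 10 to 12 rows.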
Now that we’ve explored the overall AP coursework of the students, let’s move on to another important question that we might have. How many AP courses has the average student already taken? This is highly dependent on their grade level (because many AP courses aren’t offered until later in high school), so we’ll create histograms by each grade level:
Figure 9
As we’d expect, the number of AP courses (i.e. the count) seems to follow a Poisson-like distribution. Perhaps if I get around to it I might be able to create a negative binomial regression to model the number of AP courses taken…
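For what it’s worth, here is a rough sketch of what that modeling step could look like. It runs on simulated counts only (the per-grade means and the dispersion are made-up values, not estimates from the survey), so it shows the mechanics rather than any result. The group sizes are the ones quoted earlier, and the negative binomial fit assumes the MASS package is installed.

```r
# Simulated data only: these are NOT the camp's survey responses.
set.seed(2016)
grade <- rep(c("10", "11", "12"), times = c(16, 56, 12))  # sizes quoted in the text
mu    <- c("10" = 0.5, "11" = 1.5, "12" = 3)              # hypothetical mean AP counts
ap <- data.frame(
  grade = factor(grade),
  n_ap  = rnbinom(length(grade), size = 2, mu = mu[grade])  # overdispersed counts
)

# Poisson regression of the AP course count on grade level
fit_pois <- glm(n_ap ~ grade, family = poisson(link = "log"), data = ap)

# Pearson dispersion statistic: values well above 1 suggest the Poisson
# variance assumption is too tight for the data
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)
print(dispersion)

# Negative binomial regression, which adds a dispersion parameter (theta)
if (requireNamespace("MASS", quietly = TRUE)) {
  fit_nb <- MASS::glm.nb(n_ap ~ grade, data = ap)
  print(AIC(fit_pois, fit_nb))
}
```

Comparing the two fits (for example by AIC, as above) is one simple way to judge whether the extra dispersion parameter is earning its keep.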
Q8. Likelihood to pursue a career in the health professions
Students were asked to rate the above statement on a Likert-like scale from (1) Not very likely to (10) Very likely.
Figure 10
Q11. How prepared do you feel for taking college level science courses?
Students were asked to rate the above statement on a Likert-like scale from (1) Not very well prepared to (10) Very well prepared.
Figure 11
Q12. How knowledgeable do you feel about potential career options in the health sciences?
Students were asked to rate the above statement on a Likert-like scale from (1) Not very knowledgeable to (10) Very knowledgeable.
Figure 12
46.51%, 41.86%, 11.63%

--- LICENSE ---
Copyright (C) 2016 Hunter Ratliff
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
In the spirit of Reproducible Research, below is the information about the R session at the time this document was compiled:
devtools::session_info()
## setting value
## version R version 3.2.4 (2016-03-10)
## system x86_64, darwin13.4.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/Chicago
## date 2016-08-07
##
## package      * version     date       source
## assertthat     0.1         2013-12-06 CRAN (R 3.2.0)
## cellranger     1.0.0       2015-06-20 CRAN (R 3.2.0)
## chron          2.3-47      2015-06-24 CRAN (R 3.2.0)
## colorspace     1.2-6       2015-03-11 CRAN (R 3.2.0)
## curl           0.9.7       2016-04-10 CRAN (R 3.2.4)
## data.table     1.9.6       2015-09-19 CRAN (R 3.2.0)
## DBI            0.4-1       2016-05-08 cran (@0.4-1)
## devtools       1.11.1.9000 2016-05-10 Github (hadley/devtools@46bcd74)
## digest         0.6.9       2016-01-08 CRAN (R 3.2.3)
## dplyr        * 0.5.0       2016-06-24 CRAN (R 3.2.5)
## evaluate       0.9         2016-04-29 CRAN (R 3.2.5)
## foreign        0.8-66      2015-08-19 CRAN (R 3.2.4)
## formatR        1.4         2016-05-09 cran (@1.4)
## formattable  * 0.1.7.1     2016-05-22 Github (renkun-ken/formattable@9e6511d)
## ggplot2      * 2.1.0       2016-05-03 Github (hadley/ggplot2@59c503b)
## ggthemes     * 3.0.2       2016-05-03 Github (jrnold/ggthemes@331d830)
## googlesheets * 0.2.0       2016-03-18 CRAN (R 3.2.4)
## gtable         0.2.0       2016-02-26 CRAN (R 3.2.3)
## haven          0.2.0       2015-04-09 CRAN (R 3.2.0)
## highr          0.6         2016-05-09 cran (@0.6)
## htmltools      0.3.5       2016-03-21 CRAN (R 3.2.4)
## htmlwidgets    0.6         2016-02-25 CRAN (R 3.2.3)
## httpuv         1.3.3       2015-08-04 CRAN (R 3.2.0)
## jsonlite       1.0         2016-07-01 cran (@1.0)
## knitr        * 1.13        2016-05-09 cran (@1.13)
## labeling       0.3         2014-08-23 CRAN (R 3.2.0)
## lazyeval       0.2.0       2016-06-12 cran (@0.2.0)
## magrittr       1.5         2014-11-22 CRAN (R 3.2.0)
## markdown       0.7.7       2015-04-22 CRAN (R 3.2.0)
## memoise        1.0.0       2016-01-29 CRAN (R 3.2.3)
## mime           0.5         2016-07-07 cran (@0.5)
## munsell        0.4.3       2016-02-13 CRAN (R 3.2.3)
## openxlsx       3.0.0       2015-07-03 CRAN (R 3.2.0)
## plyr           1.8.4       2016-06-08 cran (@1.8.4)
## purrr          0.2.2       2016-06-18 cran (@0.2.2)
## R6             2.1.2       2016-01-26 CRAN (R 3.2.3)
## RColorBrewer   1.1-2       2014-12-07 CRAN (R 3.2.0)
## Rcpp           0.12.6      2016-07-19 cran (@0.12.6)
## readODS        1.6.2       2016-03-09 CRAN (R 3.2.4)
## readr          0.2.2       2015-10-22 CRAN (R 3.2.0)
## readxl         0.1.1       2016-03-28 CRAN (R 3.2.4)
## rio          * 0.4.0       2016-05-01 CRAN (R 3.2.5)
## rmarkdown      0.9.6       2016-05-01 CRAN (R 3.2.5)
## scales       * 0.4.0       2016-02-26 CRAN (R 3.2.3)
## shiny          0.13.2.9003 2016-05-03 Github (rstudio/shiny@7e303b4)
## stringi        1.1.1       2016-05-27 cran (@1.1.1)
## stringr      * 1.0.0       2015-04-30 CRAN (R 3.2.0)
## tibble         1.1         2016-07-04 cran (@1.1)
## tidyr        * 0.5.1       2016-06-14 cran (@0.5.1)
## urltools       1.4.0       2016-04-12 CRAN (R 3.2.4)
## withr          1.0.1       2016-02-04 CRAN (R 3.2.3)
## xml2           0.1.2       2015-09-01 CRAN (R 3.2.0)
## xtable         1.8-2       2016-02-05 CRAN (R 3.2.3)
## yaml           2.1.13      2014-06-12 CRAN (R 3.2.0)