Author: @HunterRatliff1
Published to: RPubs
Source Code: Available on Github

Although the High School camp had 100 registered students, a few were absent on the first day, the last day, or both. For this reason, our dataset contains only 86 observations (i.e. students).

District, School, & Grade

Students came from 28 schools in 10 districts. The majority of students were in 11th grade, as depicted below:

Figure 01A

This may appear deceiving, because it’s not clear that the majority of students came from AISD. This, however, is not the case:

Figure 01B

Figure 01C

Self-Identified Variables

A few lines about methods can go here… Shouldn’t be that complicated or long

Languages Spoken at Home

Students listed the languages spoken in their home in decreasing order of frequency. From these responses, five language variables were calculated:

ML - Multilingual household
PHLOTE - Primary home language other than English
NES - Non-English speaker, meaning English is not spoken at all
L1 - Primary language
L2 - Secondary language
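
The five variables can be derived mechanically from each student's ordered list of home languages. The base R sketch below assumes each response is stored as a character vector in decreasing order of frequency. The function name `language_vars` and the three example households are hypothetical and are not taken from the survey code.

```r
# Derive the five language variables from one student's response.
# `langs` is assumed to be a character vector of home languages,
# ordered from most to least frequently spoken (hypothetical input format).
language_vars <- function(langs) {
  langs <- tolower(trimws(langs))
  langs <- unique(langs[!is.na(langs) & langs != ""])
  data.frame(
    ML     = length(langs) > 1,                             # more than one language at home
    PHLOTE = length(langs) > 0 && langs[1] != "english",    # primary language is not English
    NES    = length(langs) > 0 && !("english" %in% langs),  # English not spoken at all
    L1     = if (length(langs) >= 1) langs[1] else NA_character_,
    L2     = if (length(langs) >= 2) langs[2] else NA_character_,
    stringsAsFactors = FALSE
  )
}

# Three made-up households
responses <- list(c("English"), c("Spanish", "English"), c("Vietnamese"))
lang_df <- do.call(rbind, lapply(responses, language_vars))
print(lang_df)
```

Note that under these definitions every NES household is also a PHLOTE household, while a multilingual household may or may not be.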

Figure 2

This can also be expressed in percentage format:

Households that only speak English: 40.00%
Households that don’t speak English whatsoever: 11.76%
Households that speak more than one language: 48.24% (with 53.66% of these households having a PHLOTE)

Gender Identification

Students were asked to select one or more boxes describing their gender identification. The options that were provided are as follows: Female, Male, and/or I’d prefer not answer. Additionally, students were allowed to write in their own responses. For the purpose of analysis, all responses other than male or female have been categorized as Other.

Figure 3

Race/Ethnicity Identification

Students were asked to select one or more boxes describing their racial/ethnic identification among the options White, Hispanic or Latina/o, Asian, American Indian or Alaska Native, and Native Hawaiian or Other Pacific Islander. As with the gender identification question, respondents were also given the option to write in responses or select I’d prefer not answer. Given the survey’s limitations on this question (e.g. combining the ethnicity question with the race question, the social construction of race), these data were collapsed into simple factors using the following heuristic:

If the Hispanic/Latina ethnicity was selected, the response was categorized as Hispanic (any combo) regardless of race
If White, Black, or Asian was the only response given, the response was categorized as White alone, Black alone, or Asian alone
All other responses (including write-ins) were aggregated into Other/2+ Race ID
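
The three rules above can be sketched as a small base R function. This is only an illustration of the heuristic as described: the function name `simplify_race`, the input format (a character vector of ticked boxes plus any write-in text), and the exact option labels are assumptions.

```r
# Collapse one student's selections into the simplified factor.
# `selected` is assumed to hold the ticked boxes and any write-in text
# (hypothetical input format and labels).
simplify_race <- function(selected) {
  key <- tolower(trimws(selected))
  # Rule 1: any Hispanic/Latina/o selection wins, regardless of race
  if (any(grepl("hispanic|latin", key))) {
    return("Hispanic (any combo)")
  }
  # Rule 2: exactly one response, and it is White, Black, or Asian
  alone <- c(white = "White alone", black = "Black alone", asian = "Asian alone")
  if (length(key) == 1 && key %in% names(alone)) {
    return(unname(alone[key]))
  }
  # Rule 3: everything else, including write-ins and multiple races
  "Other/2+ Race ID"
}

# Made-up responses
examples <- list(c("White", "Hispanic or Latina/o"), "Asian",
                 c("White", "Asian"), "Middle Eastern")
race_simple <- vapply(examples, simplify_race, character(1))
print(table(race_simple))
```

Checking the Hispanic rule first matters, because it is the only rule that overrides the number of boxes selected.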

The detailed Race/Ethnicity ID data are available in the “Demographics” table

Figure 4A

Figure 4B

Socioeconomic Indicator

Most measurements of socioeconomic status (SES) fall into two categories: proxy and composite measurements. Proxy measures capture only one dimension of SES, such as income, wealth, occupation, or educational attainment. They are most useful when the proxy is highly conserved, meaning that it explains a high degree of the variance in other dimensions of socioeconomic status or social mobility. Proxy measures are useful in their simplicity, but often fail to account for the full scope of SES. Composite indicators factor in multiple variables that influence SES by assigning weights to each dimension. This provides a more comprehensive indicator and simplifies multidimensional measures into a single ranking system.

We estimated students’ SES by asking about their parents’ educational attainment and occupation. We gathered these measurements for both parents, and by doing so could deduce other useful indicators (e.g. single-parent households) for further consideration.

Educational Attainment

A common proxy for SES is the educational attainment of a student’s parents (Oakes). Educational attainment is of particular interest for us because of the camp’s aims, so this proxy will serve as one of our primary indicators. Students were asked to identify the attainment of both parents individually, by selecting one of the following seven options and/or writing in their own response:

Some high school
High school graduate
Some college
Associate degree
Bachelor’s degree
Post graduate degree
Doctoral degree

A fair number of students chose to write in responses indicating less than Some high school (e.g. elementary school, middle school, third grade), so an additional Primary Education only category was added in order to capture these students.
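
Because the categories are ordered, attainment is naturally stored as an ordered factor, with the added category placed below Some high school. The base R sketch below assumes that encoding and, as one possible summary, takes the highest attainment of either parent. The example responses and variable names are hypothetical.

```r
# Attainment levels from lowest to highest; "Primary education only" is the
# category added for write-ins below "Some high school".
edu_levels <- c("Primary education only", "Some high school",
                "High school graduate", "Some college", "Associate degree",
                "Bachelor's degree", "Post graduate degree", "Doctoral degree")

# Made-up responses for the two parents of three students
parent1 <- factor(c("Some college", "Primary education only", "Doctoral degree"),
                  levels = edu_levels, ordered = TRUE)
parent2 <- factor(c("Bachelor's degree", "Some high school", NA),
                  levels = edu_levels, ordered = TRUE)

# Highest attainment of either parent, computed on the integer level codes
# so that a missing parent is simply ignored
idx <- pmax(as.integer(parent1), as.integer(parent2), na.rm = TRUE)
highest <- factor(edu_levels[idx], levels = edu_levels, ordered = TRUE)
print(highest)
```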

Figure 5A

Compared with the 2014 Census estimates for Travis County, our sample is pretty spot on:

Figure 5B

More importantly, we ought to consider that traditional applicants to medical school are disproportionately from families with high educational attainment and not reflective of the general population. On account of this fact, it makes more sense to adopt a better measure of SES than education alone.

EO Scale

The Association of American Medical Colleges (AAMC) measures SES with a two-factor indicator comprising the highest education (E) and occupation (O) of either parent. This scale categorizes students into five categories (EO-1 to EO-5) and has been demonstrated to be a robust method of identifying students from a socioeconomically disadvantaged background (Grbic et al., 2015; PMID: 25629949). More information about AAMC’s methodology can be found here, but the basic methodology is summarized well in the picture below:

Source: Effective Practices for Using the AAMC Socioeconomic Status Indicators in Medical School Admissions

We implemented a similar classification scheme, with a few minor adjustments. First, unlike AAMC, we considered households with only one parent present. Additionally, if a student left the occupation blank AND the educational attainment was a Bachelor’s degree, we assigned an EO score of 2.5. This seemed to be a fair compromise that avoided discarding these students’ responses, and the adjustment only affected two students once the other parent’s education and occupation were considered.
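
The two adjustments can be sketched in base R as follows. This sketch does not reproduce AAMC's education/occupation table: it assumes a per-parent score (1 to 5, or NA when none could be assigned) is already available, and it assumes the household takes the higher of the two parents' scores. The function names and all example values are hypothetical.

```r
# Adjustment for a blank occupation.
# `eo` is assumed to be the score already assigned from AAMC's table,
# or NA when it could not be assigned (hypothetical input).
adjust_parent_eo <- function(eo, education, occupation) {
  blank_occ <- is.na(occupation) | trimws(occupation) == ""
  # Blank occupation + Bachelor's degree: use the compromise score of 2.5
  ifelse(blank_occ & education %in% "Bachelor's degree", 2.5, eo)
}

# Assumption: the household score is the higher parent score; with only one
# parent present (other parent NA), the available score is kept.
household_eo <- function(parent1_eo, parent2_eo) {
  pmax(parent1_eo, parent2_eo, na.rm = TRUE)
}

# Made-up scores for three students
p1 <- adjust_parent_eo(eo = c(NA, 1, 4),
                       education = c("Bachelor's degree", "High school graduate",
                                     "Post graduate degree"),
                       occupation = c("", "reported", "reported"))
p2 <- adjust_parent_eo(eo = c(1, NA, NA),
                       education = c("Some college", NA, "Bachelor's degree"),
                       occupation = c("reported", NA, NA))
hh <- household_eo(p1, p2)
print(hh)
```

In the third made-up student, the 2.5 assigned to one parent has no effect on the household score because the other parent scores higher, which is the situation in which the adjustment drops out.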

Figure 6A

Now compare that to the 2012 AMCAS applicant pool:

Figure 6B

Proprietary Scale

To do:

Explain what it is
Why it’s useful
Limitations

Figure 7

AP Coursework

To do:

What was measured
- Types of classes
- Taken/Planned
How it is presented
- Percentage of students by course
- Number of courses by student
Limitations of inference

By Course

Figure 8

It’s worth noting that these percentages describe groups of very different sizes (e.g. 56 campers were in the 11th grade, while only 16 and 12 were in the 10th and 12th grades, respectively). A more detailed breakdown with the group sizes (n) is available in the table below:

Course	Grade Level	Have taken	Plan to take	n
Chemistry	12	75.00%	87.50%	8
Biology	12	71.43%	100.00%	7
Physics	12	70.00%	100.00%	10
Biology	11	57.41%	92.59%	54
Biology	10	53.33%	100.00%	15
Chemistry	11	43.40%	94.34%	53
Computer Science	10	33.33%	86.67%	15
Calculus	12	30.00%	80.00%	10
Computer Science	12	25.00%	50.00%	8
Computer Science	11	16.67%	33.33%	48
Statistics	12	12.50%	75.00%	8
Physics	11	7.69%	90.38%	52
Calculus	11	3.57%	92.86%	56
Statistics	11	0.00%	62.75%	51
Calculus	10	0.00%	92.86%	14
Chemistry	10	0.00%	100.00%	15
Physics	10	0.00%	100.00%	15
Statistics	10	0.00%	80.00%	10

A simplified version of this table can be generated if we collapse the Grade Level variable:

Course	Have taken	Plan to take	n
Biology	57.69%	94.87%	78
Chemistry	38.46%	94.87%	78
Computer Science	20.55%	47.95%	73
Physics	15.19%	93.67%	79
Calculus	6.10%	91.46%	82
Statistics	1.41%	66.20%	71
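
Both tables are the same summary computed at two levels of grouping. The base R sketch below shows one way to do this, assuming a long-format data frame with one row per student per course. The data, column names, and the helper `summarise_ap` are hypothetical and only illustrate the calculation.

```r
# Made-up long-format responses: one row per student per course
ap <- data.frame(
  course  = c("Biology", "Biology", "Biology", "Biology",
              "Physics", "Physics", "Physics", "Physics"),
  grade   = c(11, 11, 12, 12, 11, 11, 12, NA),
  taken   = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  planned = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Percentage of students who have taken / plan to take each course,
# grouped by the columns named in `by`, with the group size n
summarise_ap <- function(df, by) {
  df$n <- 1L
  out <- aggregate(df[c("taken", "planned", "n")], by = df[by], FUN = sum)
  data.frame(out[by],
             have_taken   = round(100 * out$taken / out$n, 2),
             plan_to_take = round(100 * out$planned / out$n, 2),
             n            = out$n)
}

by_grade  <- summarise_ap(ap, c("course", "grade"))  # detailed table
by_course <- summarise_ap(ap, "course")              # grade level collapsed
print(by_grade)
print(by_course)
```

One thing to keep in mind with this approach: `aggregate` drops rows whose grouping value is missing, so in this toy data the per-grade group sizes for a course need not add up to the collapsed n.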

By Student

Now that we’ve explored the students’ overall AP coursework, let’s move on to another important question: how many AP courses has the average student already taken? This is highly dependent on their grade level (because many AP courses aren’t offered until later in high school), so we’ll create histograms for each grade level:

Figure 9

As we’d expect, the number of AP courses (i.e. the count) appears to follow a Poisson-like distribution. If I get around to it, I might fit a negative binomial regression to model the number of AP courses taken…
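
A quick way to judge whether a Poisson model is adequate, before reaching for a negative binomial, is to fit a Poisson regression by grade level and look at the Pearson dispersion statistic: values well above 1 suggest overdispersion. The sketch below uses made-up counts, not the camp data, and the variable names are hypothetical.

```r
# Made-up counts of AP courses already taken, one value per student
ap_counts <- data.frame(
  grade = c(10, 10, 10, 11, 11, 11, 11, 12, 12, 12),
  n_ap  = c(0, 0, 1, 0, 1, 2, 1, 2, 3, 4)
)

# Poisson regression with grade level as a categorical predictor
fit <- glm(n_ap ~ factor(grade), family = poisson, data = ap_counts)

# Fitted mean count per grade (equal to the observed group means here)
grade_means <- tapply(fitted(fit), ap_counts$grade, mean)

# Pearson dispersion: near 1 is consistent with Poisson, well above 1
# points toward a negative binomial model (e.g. MASS::glm.nb)
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

print(grade_means)
print(dispersion)
```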

Numerical Scales

Pursue a career

Q8. Likelihood to pursue a career in the health professions

Students were asked to rate the above statement on a Likert-like scale from (1) Not very likely to (10) Very likely.

Figure 10

Prepared for college

Q11. How prepared do you feel for taking college level science courses?

Students were asked to rate the above statement on a Likert-like scale from (1) Not very well prepared to (10) Very well prepared.

Figure 11

Knowledge of careers

Q12. How knowledgeable do you feel about potential career options in the health sciences?

Students were asked to rate the above statement on a Likert-like scale from (1) Not very knowledgeable to (10) Very knowledgeable.

Figure 12

Other variables

Text Responses

In your own words, why are you attending this camp?
Briefly describe your expectations for this year’s DMS Health Sciences?
What do you think you might want to do when you graduate college?

Factors

Favorite class/subject
- Why
If you’ve attended any other academic camps in the past 4 years, list them below:
- Camp name
- Academic field
- Date Attended
How did you become interested in attending this year’s Health Sciences Summer Camp? (all that apply)
- Teacher talked about it (46.51%)
- School Counselor recommendation (41.86%)
- Parents knew about it (11.63%)
- Other (fill in the blank)

License

--- LICENSE ---

Copyright (C) 2016 Hunter Ratliff

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

In the spirit of reproducible research, below is the information about the R session at the time it was compiled:

devtools::session_info()
##  setting  value                       
##  version  R version 3.2.4 (2016-03-10)
##  system   x86_64, darwin13.4.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/Chicago             
##  date     2016-08-07                  
## 
##  package      * version     date      
##  assertthat     0.1         2013-12-06
##  cellranger     1.0.0       2015-06-20
##  chron          2.3-47      2015-06-24
##  colorspace     1.2-6       2015-03-11
##  curl           0.9.7       2016-04-10
##  data.table     1.9.6       2015-09-19
##  DBI            0.4-1       2016-05-08
##  devtools       1.11.1.9000 2016-05-10
##  digest         0.6.9       2016-01-08
##  dplyr        * 0.5.0       2016-06-24
##  evaluate       0.9         2016-04-29
##  foreign        0.8-66      2015-08-19
##  formatR        1.4         2016-05-09
##  formattable  * 0.1.7.1     2016-05-22
##  ggplot2      * 2.1.0       2016-05-03
##  ggthemes     * 3.0.2       2016-05-03
##  googlesheets * 0.2.0       2016-03-18
##  gtable         0.2.0       2016-02-26
##  haven          0.2.0       2015-04-09
##  highr          0.6         2016-05-09
##  htmltools      0.3.5       2016-03-21
##  htmlwidgets    0.6         2016-02-25
##  httpuv         1.3.3       2015-08-04
##  jsonlite       1.0         2016-07-01
##  knitr        * 1.13        2016-05-09
##  labeling       0.3         2014-08-23
##  lazyeval       0.2.0       2016-06-12
##  magrittr       1.5         2014-11-22
##  markdown       0.7.7       2015-04-22
##  memoise        1.0.0       2016-01-29
##  mime           0.5         2016-07-07
##  munsell        0.4.3       2016-02-13
##  openxlsx       3.0.0       2015-07-03
##  plyr           1.8.4       2016-06-08
##  purrr          0.2.2       2016-06-18
##  R6             2.1.2       2016-01-26
##  RColorBrewer   1.1-2       2014-12-07
##  Rcpp           0.12.6      2016-07-19
##  readODS        1.6.2       2016-03-09
##  readr          0.2.2       2015-10-22
##  readxl         0.1.1       2016-03-28
##  rio          * 0.4.0       2016-05-01
##  rmarkdown      0.9.6       2016-05-01
##  scales       * 0.4.0       2016-02-26
##  shiny          0.13.2.9003 2016-05-03
##  stringi        1.1.1       2016-05-27
##  stringr      * 1.0.0       2015-04-30
##  tibble         1.1         2016-07-04
##  tidyr        * 0.5.1       2016-06-14
##  urltools       1.4.0       2016-04-12
##  withr          1.0.1       2016-02-04
##  xml2           0.1.2       2015-09-01
##  xtable         1.8-2       2016-02-05
##  yaml           2.1.13      2014-06-12
##  source                                 
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.4)                         
##  CRAN (R 3.2.0)                         
##  cran (@0.4-1)                          
##  Github (hadley/devtools@46bcd74)       
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.5)                         
##  CRAN (R 3.2.5)                         
##  CRAN (R 3.2.4)                         
##  cran (@1.4)                            
##  Github (renkun-ken/formattable@9e6511d)
##  Github (hadley/ggplot2@59c503b)        
##  Github (jrnold/ggthemes@331d830)       
##  CRAN (R 3.2.4)                         
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.0)                         
##  cran (@0.6)                            
##  CRAN (R 3.2.4)                         
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.0)                         
##  cran (@1.0)                            
##  cran (@1.13)                           
##  CRAN (R 3.2.0)                         
##  cran (@0.2.0)                          
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.3)                         
##  cran (@0.5)                            
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.0)                         
##  cran (@1.8.4)                          
##  cran (@0.2.2)                          
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.0)                         
##  cran (@0.12.6)                         
##  CRAN (R 3.2.4)                         
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.4)                         
##  CRAN (R 3.2.5)                         
##  CRAN (R 3.2.5)                         
##  CRAN (R 3.2.3)                         
##  Github (rstudio/shiny@7e303b4)         
##  cran (@1.1.1)                          
##  CRAN (R 3.2.0)                         
##  cran (@1.1)                            
##  cran (@0.5.1)                          
##  CRAN (R 3.2.4)                         
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.0)                         
##  CRAN (R 3.2.3)                         
##  CRAN (R 3.2.0)

High School Demographics

Hunter Ratliff, @HunterRatliff1

August 1, 2016