Albert Y. Kim
Wednesday 2016/2/16
We are not going to work off moodle anymore, rather
UCBAdmissions.xlsx in Excel
rvest package in RStudio
From The American Statistician (Cobb 2015): Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up
Cobb advocates “minimizing prerequisites to research”:
In the Humanities, students in a first course engage with original sources. You do not just prepare your students to read Austen; they read Austen. You do not just prepare students to hear Bach; they hear Bach.
In the context of this class: you need to be doing research QUICK!
Research here doesn't necessarily mean publishing in journals, but more simply looking at real data that's relevant to the student.
The University of California Berkeley was sued in 1973 for bias against women who had applied for admission to graduate schools. We consider the \( n=4526 \) people who applied to the 6 largest departments.
     Admit Gender Dept Freq
1 Admitted   Male    A  512
2 Rejected   Male    A  313
3 Admitted Female    A   89
4 Rejected Female    A   19
5 Admitted   Male    B  353
6 Rejected   Male    B  207
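The rows above appear to match the UCBAdmissions dataset that ships with R in the datasets package. As a minimal sketch, assuming that built-in copy holds the same counts as the spreadsheet used in class, the data can be inspected without any file at all:

```r
# UCBAdmissions ships with R as a 3-way contingency table
# (Admit x Gender x Dept); as.data.frame() flattens it to one row per cell.
ucb <- as.data.frame(UCBAdmissions)

head(ucb)      # first six rows, in the same layout as the listing above
sum(ucb$Freq)  # total number of applicants across the 6 departments
```

The name ucb is only an illustrative choice; any variable name works.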
We also consider data on tuition and financial aid from various institutions across the United States from the Washington Post.
We will explore two methods for loading data:
rvest package in RStudio
An .xlsx file not only contains data, but also lots of metadata, i.e. data about data, that we don't need.
CSV = No fluff, just stuff.
To load data in an Excel spreadsheet in RStudio, we need the values to be in tidy format:
We convert the UCBAdmissions.xlsx file to CSV format to eliminate the metadata:
Add .csv to the filename to Save As, i.e. your filename should be UCBAdmissions.csv
We load the CSV file into R.
UCBAdmissions.csv
Note: in your console panel, R spits out the command to do this automatically. You can copy this line into your R scripts.
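The command R prints in the console will likely resemble a read.csv() call pointing at wherever you saved the file. Here is a minimal sketch of that step. To keep it runnable anywhere, it first writes the six rows shown earlier to a temporary file; the path csv_path and the variable name admissions are assumptions for illustration, not what RStudio will generate for you.

```r
# Hypothetical stand-in for the exported file: write the six rows shown
# earlier to a temporary CSV so this example runs on any machine.
csv_path <- file.path(tempdir(), "UCBAdmissions.csv")
writeLines(c(
  "Admit,Gender,Dept,Freq",
  "Admitted,Male,A,512",
  "Rejected,Male,A,313",
  "Admitted,Female,A,89",
  "Rejected,Female,A,19",
  "Admitted,Male,B,353",
  "Rejected,Male,B,207"
), csv_path)

# In your own script, replace csv_path with the path to your UCBAdmissions.csv.
admissions <- read.csv(csv_path, header = TRUE)

str(admissions)  # check column names and types after loading
```

Putting a line like the read.csv() call at the top of an R script means the data is reloaded the same way every time the script runs, with no clicking through menus.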
From the Google Docs menu bar (not your browser's menu bar):
The other way we'll load data is by basic web scraping via the rvest package in RStudio. Run the following code in RStudio:
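library(rvest)  # also provides the %>% pipe

# Page containing the Washington Post tuition and financial aid table
webpage <- "http://apps.washingtonpost.com/g/page/local/college-grants-for-the-affluent/1526/"

# Read the page, find every <table>, keep the first one, convert it to a data frame
wp_data <- webpage %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table()

View(wp_data)  # open the result in RStudio's data viewer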
The rvest package works by scraping HTML code used to make webpages. To view a webpage's raw HTML code:
The html_nodes() function looks for HTML tags.
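To see what "looks for HTML tags" means without depending on a live webpage, here is a small sketch that runs html_nodes() on a hand-written HTML string. The HTML, the school names, and the numbers are all made-up placeholders, not Washington Post data.

```r
library(rvest)

# Toy HTML with made-up values; the school names and numbers are placeholders.
html <- '<html><body>
  <p>Some text that is not in a table.</p>
  <table>
    <tr><th>School</th><th>Tuition</th></tr>
    <tr><td>School A</td><td>100</td></tr>
    <tr><td>School B</td><td>200</td></tr>
  </table>
</body></html>'

page <- read_html(html)              # parse the HTML string

tables <- html_nodes(page, "table")  # every <table> tag on the page
length(tables)                       # how many tables were found

toy <- html_table(tables[[1]])       # convert the first table to a data frame
toy

paragraphs <- html_nodes(page, "p")  # same idea, but for <p> tags
html_text(paragraphs)                # extract the text inside them
```

Changing the string passed to html_nodes() changes which tags are picked up, which is why the scraping code above asks for "table".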