The data file EconEnrollment.csv includes enrollment data for every economics course at the University of Oregon from 2014Q1 to 2020Q1. A description of the variables in the data is given in the table below.
| Name | Description |
|---|---|
| Term | Quarter code |
| coursenumber | Course number |
| coursename | Descriptive name |
| instructor | Instructor |
| enrollment | Number of students in the course |
| level | Course level (0 - Intro, 1 - Intermediate, 2 - Masters, 3 - PhD) |
In this problem, we will explore a couple of R’s “tidyverse” packages while analyzing this data.
# You can type your R code here.
library(readr)  # library() attaches one package at a time; dplyr is attached below
# Erase eval=FALSE. You will need to do this for every block of code.
# I had to put it there to show but not run the code.
# It is a chunk option that tells R Markdown not to run the code.
# Complete the commands below
setwd("~/Downloads/School Stuff/EC423")
EconEnrollment <- read_csv("EconEnrollment.csv")
## Rows: 840 Columns: 6
## ── Column specification ─────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): coursename, instructor
## dbl (4): Term, coursenumber, enrollment, level
##
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
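The message above suggests two ways to quiet it. A minimal sketch of the first, declaring the column types up front (names and types copied from the column specification above):

```r
# Declare column types explicitly so read_csv() prints no guessing message
EconEnrollment <- read_csv(
  "EconEnrollment.csv",
  col_types = cols(
    Term         = col_double(),
    coursenumber = col_double(),
    coursename   = col_character(),
    instructor   = col_character(),
    enrollment   = col_double(),
    level        = col_double()
  )
)
```

The second option is simply `read_csv("EconEnrollment.csv", show_col_types = FALSE)`.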
View(EconEnrollment)
enroll <- EconEnrollment
‘dplyr’ is a popular package for cleaning data in R. You can use it to apply a sequence of functions to a data frame; it returns the data with all of the functions applied in order. dplyr uses “pipes”, written as “%>%”, which pass the object on the left as the first argument of the function on the right, so `enroll %>% filter(...)` is equivalent to `filter(enroll, ...)`.
For example, if you want to filter the data to only see courses taught by me, you can use the following code:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
DavisCourses <- enroll %>% filter(instructor=="Davis, Jon")
DavisCourses
## # A tibble: 8 x 6
## Term coursenumber coursename instructor enrollment level
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 202001 423 Econometrics Davis, Jon NA 2
## 2 201903 607 Experimental Econ Davis, Jon 13 3
## 3 201901 311 Inter Micro Theory Davis, Jon 75 1
## 4 201901 423 Econometrics Davis, Jon 25 2
## 5 201802 428 Behav and Exp Econ Davis, Jon 80 2
## 6 201802 607 Applied Behavioral Economics,… Davis, Jon 4 3
## 7 201801 311 Inter Micro Theory Davis, Jon 83 1
## 8 201801 311 Inter Micro Theory Davis, Jon 74 1
This shows that I have taught 8 classes in the dataset. My smallest class had 4 students and my largest class had 83 students. Note that the enrollment for this term is listed as NA.
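These numbers can be checked directly with `summarize()`; a minimal sketch (`na.rm = TRUE` skips the current term's NA enrollment):

```r
# Count my classes and pull the smallest and largest enrollments
DavisCourses %>%
  summarize(
    n_classes = n(),
    smallest  = min(enrollment, na.rm = TRUE),
    largest   = max(enrollment, na.rm = TRUE)
  )
```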
Let’s drop the current term to make things easier.
enroll <- enroll %>% filter(Term<202001)
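An equivalent approach here, since only the current term has a missing enrollment, is to drop rows with `NA` directly (a sketch):

```r
# Same result as filter(Term < 202001) for this dataset
enroll <- enroll %>% filter(!is.na(enrollment))
```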
byTerm <- enroll %>% group_by(Term) %>% summarize(avg = mean(enrollment)) %>% arrange(avg)
byTerm
## # A tibble: 18 x 2
## Term avg
## <dbl> <dbl>
## 1 201901 59.7
## 2 201801 64.7
## 3 201902 69.1
## 4 201903 70.3
## 5 201803 71.3
## 6 201802 78.2
## 7 201701 80.9
## 8 201601 81.3
## 9 201703 82.1
## 10 201603 84.8
## 11 201402 88.1
## 12 201602 88.3
## 13 201503 91.4
## 14 201502 91.4
## 15 201702 94.6
## 16 201401 95.6
## 17 201501 97.7
## 18 201403 98.1
# The largest average enrollment was spring term of 2014 (201403) and the smallest was fall term of 2019 (201901).
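Rather than reading the extremes off the sorted table, they can be pulled out directly; a sketch assuming dplyr 1.0+, which provides `slice_min()` and `slice_max()`:

```r
byTerm %>% slice_min(avg, n = 1)  # term with the smallest average enrollment
byTerm %>% slice_max(avg, n = 1)  # term with the largest average enrollment
```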
total_avg <- enroll %>%
summarize(avg = mean(enrollment))
total_avg
## # A tibble: 1 x 1
## avg
## <dbl>
## 1 82.1
# The average class size was 82.11 students.
byLevel <- enroll %>% group_by(level) %>% summarize(avg = mean(enrollment)) %>% arrange(avg)
byLevel
## # A tibble: 4 x 2
## level avg
## <dbl> <dbl>
## 1 3 12.4
## 2 2 53.8
## 3 1 75.9
## 4 0 204.
# For intro level, the average was 204.44; for intermediate, 75.88; for masters, 53.82; and for the PhD program, 12.41.
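For readability, the numeric level codes can be mapped to the names given in the data description; a sketch using dplyr's `recode()` (the `level_name` column is new here):

```r
# Label the numeric level codes with their names from the data description
byLevel %>%
  mutate(level_name = recode(level,
    `0` = "Intro",
    `1` = "Intermediate",
    `2` = "Masters",
    `3` = "PhD"
  ))
```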
enroll %>% group_by(level) %>% summarize(avg2 = weighted.mean(enrollment, enrollment)) %>% arrange(avg2)
## # A tibble: 4 x 2
## level avg2
## <dbl> <dbl>
## 1 3 14.3
## 2 2 65.5
## 3 1 87.0
## 4 0 251.
# The enrollment-weighted mean answers "how large is the class the average student sits in?", while the unweighted mean answers "how large is the average course?". A prospective student would want to be shown the unweighted mean: the number appears lower, suggesting a lower student-to-faculty ratio and making UO more compelling to apply to. For a prospective faculty member, the weighted mean shows a higher number of students taught, making the school seem more successful and prominent.
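Because the weights here are the enrollments themselves, `weighted.mean(enrollment, enrollment)` reduces to `sum(enrollment^2) / sum(enrollment)`. A minimal sketch verifying this for the intro-level courses:

```r
intro <- enroll %>% filter(level == 0)
sum(intro$enrollment^2) / sum(intro$enrollment)    # computed by hand
weighted.mean(intro$enrollment, intro$enrollment)  # same value (~251)
```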
This question will be based on the paper “Automated Inference on Criminality using Face Images” by Xiaolin Wu and Xi Zhang. The paper is posted on Canvas.
This paper attracted a lot of attention and stirred controversy when it was first posted in 2016; see, for example, this Vice article. Most of the media coverage of the paper focused on the ethics of predicting criminality. This question will help you assess how concerned you should be about the future of predicting criminality with only data on faces.
2.a What is the authors’ research question?
The authors’ research question is whether AI facial-recognition models can infer a person’s criminal tendencies from a face image alone.
2.b What do the authors find? How accurate are their predictions?
The authors report classification accuracies of 95.40% for the CNN, 93.03% for the SVM, 88.38% for KNN, and 86.66% for logistic regression (LR).
2.c As a student in a graduate economics course, would you describe their methods as accessible or inaccessible?
Their methods are somewhat accessible: the authors describe their research with caution, but the details are very difficult to dissect and understand, especially for readers who do not know what is really going on, which makes the analysis difficult to replicate.
2.d How do the authors collect their data? Are the photos of the criminals and non-criminals comparable?
Yes, they are comparable, because the authors ensure that the photos are not mugshots; mugshots would have biased the data by visibly incriminating the people in the photos.
2.e Look at Figure 10. What jumps out to you about the main difference between the “average” criminal’s face compared to the “average” non-criminal’s face?
The “average” criminal faces appear much blurrier than the “average” non-criminal faces.
2.f True/False/Uncertain. You need to understand the methodology of a paper to assess whether its conclusions are plausible. Justify your answer.
I think it is true to an extent, but the required threshold of understanding has to sit at a reasonable level. Many people are not equipped to interpret data or follow a methodology, even in the simplest terms; because they lack that knowledge, they should still be able to accept conclusions that have been peer reviewed and confirmed through replication. If a result has been fact-checked by multiple independent parties and found to hold, you can reasonably decide its conclusions are plausible without fully understanding the methodology.