# Install packages in console first if needed:
# install.packages("readr")
# install.packages("dplyr")
library(readr)
library(dplyr)
enroll <- read_csv("C:/Users/18545/Downloads/EconEnrollment.csv")
# View first few rows
head(enroll)
## # A tibble: 6 × 6
## Term coursenumber coursename instructor enrollment level
## <dbl> <dbl> <chr> <chr> <dbl> <dbl>
## 1 202001 101 Contemporary Economic Issues Urbancic, … NA 0
## 2 202001 201 Intro to Econ Analysis: Micro Stanford, … NA 0
## 3 202001 201 Intro to Econ Analysis: Micro Waddell, G… NA 0
## 4 202001 201 Intro to Econ Analysis: Micro Mitchell-N… NA 0
## 5 202001 201 Intro to Econ Analysis: Micro Ivanauskas… NA 0
## 6 202001 202 Intro to Econ Analysis: Macro Thompson, … NA 0
# Remove the current term (2020Q1) which has NA values
enroll <- enroll %>% filter(Term < 202001)
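As an optional sanity check, we can confirm how many missing enrollment values remain after the filter (this assumes 2020Q1 was the only term with NA enrollments; the na.rm = TRUE calls below guard against any stragglers):
# Count remaining NA enrollment values (should be 0)
sum(is.na(enroll$enrollment))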
# Find average class size in each term
byTerm <- enroll %>%
group_by(Term) %>%
summarize(avg = mean(enrollment, na.rm = TRUE)) %>%
arrange(avg)
# Display results
byTerm
## # A tibble: 18 × 2
## Term avg
## <dbl> <dbl>
## 1 201901 59.7
## 2 201801 64.7
## 3 201902 69.1
## 4 201903 70.3
## 5 201803 71.3
## 6 201802 78.2
## 7 201701 80.9
## 8 201601 81.3
## 9 201703 82.1
## 10 201603 84.8
## 11 201402 88.1
## 12 201602 88.3
## 13 201503 91.4
## 14 201502 91.4
## 15 201702 94.6
## 16 201401 95.6
## 17 201501 97.7
## 18 201403 98.1
# Identify terms with smallest and largest class sizes
cat("\nSmallest average class size:\n")
##
## Smallest average class size:
print(head(byTerm, 1))
## # A tibble: 1 × 2
## Term avg
## <dbl> <dbl>
## 1 201901 59.7
cat("\nLargest average class size:\n")
##
## Largest average class size:
print(tail(byTerm, 1))
## # A tibble: 1 × 2
## Term avg
## <dbl> <dbl>
## 1 201403 98.1
Answer: The terms with the smallest and largest average class sizes are shown above: the smallest average class size occurred in Term 201901, with an average of 59.69 students, and the largest occurred in Term 201403, with an average of 98.13 students.
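Since byTerm is sorted in ascending order of avg, head() and tail() pick out the extremes. A sketch of a more direct alternative that does not depend on sort order uses dplyr's slice_min() and slice_max():
# Extract the extreme terms directly, regardless of row order
byTerm %>% slice_min(avg, n = 1)  # smallest average class size
byTerm %>% slice_max(avg, n = 1)  # largest average class size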
# Calculate overall average class size
overall_avg <- enroll %>%
summarize(avg_enrollment = mean(enrollment, na.rm = TRUE))
overall_avg
## # A tibble: 1 × 1
## avg_enrollment
## <dbl>
## 1 82.1
Answer: The average class size in the economics department over this time period was 82.11 students (an unweighted average across all course offerings).
# Calculate average class size by course level
by_level <- enroll %>%
group_by(level) %>%
summarize(avg_enrollment = mean(enrollment, na.rm = TRUE)) %>%
arrange(level)
by_level
## # A tibble: 4 × 2
## level avg_enrollment
## <dbl> <dbl>
## 1 0 204.
## 2 1 75.9
## 3 2 53.8
## 4 3 12.4
Answer: The average class sizes by level are:

- Level 0 (Intro): 204.44 students
- Level 1 (Intermediate): 75.88 students
- Level 2 (Masters): 53.82 students
- Level 3 (PhD): 12.41 students
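For readability in later output, the numeric level codes could be mapped to these labels. A minimal sketch (the code-to-label mapping comes from the answer above; the object name enroll_labeled is ours):
# Attach descriptive labels to the numeric course-level codes
enroll_labeled <- enroll %>%
  mutate(level_name = factor(level, levels = 0:3,
                             labels = c("Intro", "Intermediate", "Masters", "PhD")))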
# Calculate weighted average by level
weighted_by_level <- enroll %>%
group_by(level) %>%
summarize(
unweighted_avg = mean(enrollment, na.rm = TRUE),
weighted_avg = weighted.mean(enrollment, enrollment, na.rm = TRUE),
total_students = sum(enrollment, na.rm = TRUE),
num_courses = n()
) %>%
arrange(level)
weighted_by_level
## # A tibble: 4 × 5
## level unweighted_avg weighted_avg total_students num_courses
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 0 204. 251. 26373 129
## 2 1 75.9 87.0 26939 355
## 3 2 53.8 65.5 10442 194
## 4 3 12.4 14.3 1439 116
Interpretation:
The weighted average tells us the average class size experienced by a typical student, while the unweighted average tells us the average size of a typical class.
- Weighted average: weights each class by its enrollment, so larger classes contribute more to the average. This represents the experience of a randomly selected student.
- Unweighted average: treats all classes equally regardless of size. This represents the average class from the instructor's perspective.
Student vs. Faculty perspective:
Prospective students would prefer knowing the weighted average because it tells them what class size they’re most likely to experience. Since students are more likely to be in larger classes, the weighted average is higher and more representative of their actual experience.
Prospective faculty would prefer knowing the unweighted average because it tells them the typical class size they would teach. Each class they teach counts equally, regardless of enrollment.
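To make the "randomly selected student" interpretation concrete, consider a toy check with two hypothetical classes of 10 and 100 students: the enrollment-weighted mean equals the plain mean taken over students rather than over classes.
sizes <- c(10, 100)              # two hypothetical class sizes
mean(sizes)                      # per-class average: 55
weighted.mean(sizes, sizes)      # per-student average: (10*10 + 100*100) / 110 = 91.8
mean(rep(sizes, times = sizes))  # identical: one entry per student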
The authors’ research question is: Can machine learning algorithms accurately infer criminality from facial images alone, without any subjective human judgment? They aim to test whether automated systems can distinguish between criminals and non-criminals based solely on ID photos.
The authors find that machine learning classifiers can achieve high accuracy in distinguishing criminals from non-criminals, with reported accuracy as high as 89.51%.
The authors also identify specific facial features that differ between groups, such as:

- Distance between eye inner corners
- Upper lip curvature
- Angle from nose tip to mouth corners
They claim criminal faces show greater variation and dissimilarity compared to non-criminal faces.
As a graduate economics student, I would describe their methods as moderately accessible.
The basic concepts (machine learning classification, cross-validation) are standard and well-explained. However, the technical details of CNN architecture and feature extraction would require some computer science background. The statistical analysis and interpretation are accessible to economics students familiar with classification methods.
The photos are NOT comparable, which is a critical flaw:
- Criminal photos: official ID photos from public security departments in China. These are mugshots taken after conviction for crimes ranging from murder to fraud.
- Non-criminal photos: downloaded from the internet, including people from “a wide gamut of professions and social status, including waiters, construction workers, taxi drivers, doctors, lawyers and professors.”
Key problems:

1. Different sources (official government photos vs. internet)
2. Different contexts (post-arrest vs. voluntary)
3. Potential differences in photo quality, lighting, and facial expressions
4. The non-criminal group may be systematically different (e.g., people who post professional photos online)
The most striking difference in Figure 10 is that the “average” criminal face appears to have a more neutral or serious expression, while the “average” non-criminal face appears to be smiling slightly.
This difference in facial expression likely reflects the different contexts in which the photos were taken:

- Criminals: mugshots taken in custody (serious, neutral expression)
- Non-criminals: photos from the internet, possibly professional headshots (slight smile)
This suggests the algorithm may be detecting photo type and facial expression rather than innate criminal features.
TRUE. You need to understand the methodology to assess whether conclusions are plausible.
Justification:
This paper is a perfect example. Without examining their methodology, the high accuracy rates (89.51%) seem impressive and might appear to support their conclusions. However, understanding their methods reveals the critical flaws outlined above.
The methodology reveals that their results don’t actually validate “automated face-induced inference on criminality” - they validate that machine learning can distinguish between mugshots and internet photos. Understanding the methodology shows why we should be skeptical of their conclusions, despite the seemingly high accuracy.
This demonstrates why methodological literacy is essential for evaluating research claims, especially on sensitive topics with potential for misuse.