This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown, see http://rmarkdown.rstudio.com, as well as Lesson 1, Lesson 15, and the PDF document rmarkdown-2.0 on Canvas.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
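In the .Rmd source, that chunk is written between backtick fences; a minimal sketch of the raw syntax (the chunk label cars-summary is just an example):

```{r cars-summary}
summary(cars)
```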
R Markdown can show the code, the result of the code, both, or neither, depending on chunk options such as echo and eval.
You can also embed plots, for example:
library(ggplot2)  # qplot() and geom_smooth()
qplot(speed, dist, data = cars) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Note that the warning = FALSE parameter was added to the code chunk to prevent printing of the warnings. It is also possible to prevent printing of the R code that generated the plot by adding the echo = FALSE parameter to the code chunk.
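For reference, a chunk header combining these options could look like the sketch below; the chunk label and the exact option set are illustrative, not copied from the original source:

```{r dist-plot, echo=FALSE, warning=FALSE}
qplot(speed, dist, data = cars) + geom_smooth()
```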
You can also show calculations directly in your text, which is useful in the homework: for example, 5+5 equals `r 5+5`, which knits as 10. The expression is not evaluated if you put a space between the opening backtick and the r, as in ` r 5+5 `.
You can write equations as in LaTeX:
\[ E=mc^2 \]
Additional online resources: Yihui Xie's knitr page and book, and the cheatsheets.
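Both inline and displayed math work in the .Rmd source; a short sketch (the formulas themselves are only illustrations):

Inline math goes between single dollar signs, e.g. $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and displayed math between \[ and \]:

\[ \operatorname{Var}(X) = E[X^2] - (E[X])^2 \]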
Academic integrity is the pursuit of scholarly activity in an open, honest and responsible manner. Academic integrity is a basic guiding principle for all academic activity at The Pennsylvania State University, and all members of the University community are expected to act in accordance with this principle. Consistent with this expectation, the University’s Code of Conduct states that all students should act with personal integrity, respect other students’ dignity, rights and property, and help create and maintain an environment in which all can succeed through the fruits of their efforts.
Academic integrity includes a commitment by all members of the University community not to engage in or tolerate acts of falsification, misrepresentation or deception. Such acts of dishonesty violate the fundamental ethical principles of the University community and compromise the worth of work completed by others.
library(maps)  # map() draws the state outlines; also supplies the data behind map_data()
map("state", boundary = TRUE, col = "black")
states <- map_data("state")
ggplot(data = states) +
  geom_polygon(aes(x = long, y = lat, fill = region, group = group),
               color = "white") +
  coord_fixed(1.3) +
  guides(fill = "none")
head(state.x77, 15) ## Built-in state-level data
##             Population Income Illiteracy Life Exp Murder HS Grad Frost   Area
## Alabama           3615   3624        2.1    69.05   15.1    41.3    20  50708
## Alaska             365   6315        1.5    69.31   11.3    66.7   152 566432
## Arizona           2212   4530        1.8    70.55    7.8    58.1    15 113417
## Arkansas          2110   3378        1.9    70.66   10.1    39.9    65  51945
## California       21198   5114        1.1    71.71   10.3    62.6    20 156361
## Colorado          2541   4884        0.7    72.06    6.8    63.9   166 103766
## Connecticut       3100   5348        1.1    72.48    3.1    56.0   139   4862
## Delaware           579   4809        0.9    70.06    6.2    54.6   103   1982
## Florida           8277   4815        1.3    70.66   10.7    52.6    11  54090
## Georgia           4931   4091        2.0    68.54   13.9    40.6    60  58073
## Hawaii             868   4963        1.9    73.60    6.2    61.9     0   6425
## Idaho              813   4119        0.6    71.87    5.3    59.5   126  82677
## Illinois         11197   5107        0.9    70.14   10.3    52.6   127  55748
## Indiana           5313   4458        0.7    70.88    7.1    52.9   122  36097
## Iowa              2861   4628        0.5    72.56    2.3    59.0   140  55941
# Match the state names in state.x77 to the lower-case region names used by map_data()
usdata <- data.frame(region = tolower(rownames(state.x77)), state.x77,
                     stringsAsFactors = TRUE)
mapIncome <- ggplot(usdata, aes(map_id = region)) +
  geom_map(aes(fill = Income), map = states) +   # one geom_map layer is enough
  scale_fill_gradientn(colours = c("lightblue", "darkblue")) +
  expand_limits(x = states$long, y = states$lat) +
  coord_fixed(1.3)
mapIncome
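The same pattern works for any other column of state.x77; for example, a variant (my own illustration, not in the original notes) that fills by Illiteracy instead of Income:

mapIlliteracy <- ggplot(usdata, aes(map_id = region)) +
  geom_map(aes(fill = Illiteracy), map = states) +   # Illiteracy column from state.x77
  scale_fill_gradientn(colours = c("lightblue", "darkblue")) +
  expand_limits(x = states$long, y = states$lat) +
  coord_fixed(1.3)
mapIlliteracy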
From https://www.nobelprize.org/prizes/economic-sciences/2021/summary/ :
> The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2021 was divided, one half awarded to David Card “for his empirical contributions to labour economics”, the other half jointly to Joshua D. Angrist and Guido W. Imbens “for their methodological contributions to the analysis of causal relationships.”
Parts of the introductions of their Nobel lectures: Source
nobel <- "Causality in Econometrics: Choice vs. Chance. Knowledge of causal effects is of great importance for decision makers in government, firms, as well as individuals in their private lives. Inferring the values of these effects from observed data is often a major challenge when causal mechanisms are not fully understood. These challenges have motivated methodological research in multiple disciplines. This research got a major boost in the 1920s and 1930s, thanks to advances in the design and analysis of randomized experiments in statistics and, separately, methodological work on observational studies in econometrics.
More recently, in the late 1980s and early 1990s, there was a sharp increase in empirical and methodological research in economics, as well as other disciplines, with an explicit focus on estimating causal effects. A convergence of the statistical and econometric traditions has been a catalyst for this increase. More than thirty years later, causality is a thriving
area of study. Researchers from many disciplines, including economics, statistics, political science, psychology, epidemiology, computer science and other fields, bring new questions and different methodological perspectives to the discussion. Applications range widely from biomedical to social science, with interest coming from academic, government, and private
sector organizations.
In this lecture I discuss some of the themes of this field. Per the charge of the committee awarding the prize, this article focuses primarily on my contributions to the study of causality, but I shall place them in the context of the broader interdisciplinary literature. I start by discussing briefly some of the history of methods for causal inference in statistics and econometrics. I then discuss the credibility crisis in the 1980s that provided some of the motivation for the work that was recognized in the prize. After that I discuss some of my contributions to the causal inference literature. In that part of the paper, I will also add some background and color to the specific research I describe, discussing the origins and questions that motivated my collaborators and myself, as well as pivotal moments in my intellectual journey. I see this prize as a recognition of the importance of this general interdisciplinary enterprise and hope it further invigorates the field.
Empirical Strategies in Economics: Illuminating the Path from Cause to Effect. In a chapter in the Handbook of Labor Economics, Alan Krueger and I employed the phrase “empirical strategy” to describe econometric analysis of natural experiments like the one John Snow (1855) used to establish that cholera is a waterborne illness. The Handbook volume in question (Ashenfelter and Card, 1999) was edited by two of my Princeton Ph.D. thesis advisors, Orley Ashenfelter and David Card, leaders in the battle to bring empirical strategies like Snow’s into the econometric mainstream. Ashenfelter and Card’s quest for an empirical strategy that reliably captures the causal effects of government training programs inspired me and others at Princeton to explore the econometrics of program evaluation.
An empirical strategy for program or policy evaluation is a research plan that encompasses data collection, identification, and estimation. As Krueger and I used it, the term “identification” is shorthand for research design. The Prize I share with David Card and Guido Imbens recognizes the prominent role research design has come to play in modern economics. A randomized clinical trial (RCT) is the simplest and most powerful research design. Random assignment ensures that treatment and control groups are comparable in the absence of treatment, so differences between them after random assignment reflect only the treatment effect. Not surprisingly, though also not without resistance, RCTs have come to be both an aspiration and a benchmark for empirical strategies in economics.
This past October, I worried about what I should expect from the Economics Prize treatment effect. The spotlight and disruption accompanying the prize made me wonder how the Economics Prize celebrity might change life for the Angrist family. It soon dawned on me that the matter of how public recognition affects a scholar’s life is a simple causal question: the Economics Prize intervention is substantial, sudden, and well-measured; outcomes like health and wealth are easy to record. Although the Economics Prizes are probably not randomly
assigned, a compelling empirical strategy for the Economics Prize treatment effect comes to mind, at least as a flight of empirical fancy."
library(tm)            # removeWords(), stopwords(), removePunctuation(), removeNumbers()
library(wordcloud)     # wordcloud()
library(RColorBrewer)  # brewer.pal()
library(dplyr)         # provides the %>% pipe
nobelClean <- nobel %>%
  tolower() %>%
  removeWords("’") %>%   # curly apostrophe causing trouble
  removeWords("…") %>%   # … causing trouble
  removeWords(stopwords("en")) %>%
  removePunctuation() %>%
  removeNumbers()
wordcloud(nobelClean, scale = c(2, 0.5), max.words = 200, random.order = FALSE,
          rot.per = 0.35, use.r.layout = FALSE, colors = brewer.pal(6, "Dark2"))
library(quanteda)  # tokens() and dfm() for a document-feature matrix
nobelClean_tokens <- tokens(nobelClean)
nobelClean_dfm <- dfm(nobelClean_tokens)
# Counts of a few causal-inference-related terms in the lecture text
causal_counts <- nobelClean_dfm[, c("causal", "causality", "treatment", "design",
                                    "methodological", "econometrics", "econometric",
                                    "statistics", "empirical", "strategy")]
causal_counts
## Document-feature matrix of: 1 document, 10 features (0.00% sparse) and 0 docvars.
##        features
## docs    causal causality treatment design methodological econometrics
##   text1      7         3         5      4              4             4
##        features
## docs    econometric statistics empirical strategy
##   text1           3          3         9        4
# Elon Musk tweets from data.world; keep only the timestamp and the tweet text
muskfile <- read.csv("https://query.data.world/s/yusehiqh3mj4usgmbk4bd3nkquresv",
                     header = TRUE, stringsAsFactors = FALSE)
musk <- data.frame(date = muskfile$created_at, tweet = as.character(muskfile$text),
                   stringsAsFactors = FALSE)
musk$tweet[1:20]
## [1] "b'And so the robots spared humanity ... https://t.co/v7JUJQWfCv'"
## [2] "b\"@ForIn2020 @waltmossberg @mims @defcon_5 Exactly. Tesla is absurdly overvalued if based on the past, but that's irr\\xe2\\x80\\xa6 https://t.co/qQcTqkzgMl\""
## [3] "b'@waltmossberg @mims @defcon_5 Et tu, Walt?'"
## [4] "b'Stormy weather in Shortville ...'"
## [5] "b\"@DaveLeeBBC @verge Coal is dying due to nat gas fracking. It's basically dead.\""
## [6] "b\"@Lexxxzis It's just a helicopter in helicopter's clothing\""
## [7] "b\"@verge It won't matter\""
## [8] "b'@SuperCoolCube Pretty good'"
## [9] "b\"Why did we waste so much time developing silly rockets? Damn you, aliens! So obtuse! You have all this crazy tech, but can't speak English!?\""
## [10] "b'Technology breakthrough: turns out chemtrails are actually a message from time-traveling aliens describing the secret of teleportation'"
## [11] "b\"RT @OpenAI: We've created the world's first Spam-detecting AI trained entirely in simulation and deployed on a physical robot: https://t.co\\xe2\\x80\\xa6\""
## [12] "b'RT @ProfBrianCox: This is extremely important from @elonmusk and @SpaceX - reusable rockets bring us MUCH closer to becoming a spacefaring\\xe2\\x80\\xa6'"
## [13] "b'@adamsbj Def P100D with Ludicrous+, although the rocket starts going a lot faster after that'"
## [14] "b'@BadAstronomer We can def bring it back like Dragon. Just a question of how much weight we need to add.'"
## [15] "b'@tesla_addict @TeslaMotors Working on it'"
## [16] "b\"@jasonlamb Looks like it could do 20% more with some structural upgrades to handle higher loads. But that's in fully expendable mode.\""
## [17] "b'@cheron A lot'"
## [18] "b'@Cardoso Silliest thing we can imagine! Secret payload of 1st Dragon flight was a giant wheel of cheese. Inspired b\\xe2\\x80\\xa6 https://t.co/68nMJkiPsC'"
## [19] "b'@redletterdave Good point, odds go from 0% to >0% :)'"
## [20] "b'Falcon Heavy test flight currently scheduled for late summer'"
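The printed tweets still carry Python-style b'...' byte-string wrappers and \x.. escape sequences from however the file was scraped; the sentiment analysis below runs on the raw strings, as in the original. If you wanted to strip that debris first, a rough sketch is given here (the regular expressions are my assumptions about this particular file, not a vetted cleaner):

muskTidy <- musk
muskTidy$tweet <- gsub("^b['\"]|['\"]$", "", muskTidy$tweet)       # drop the b'...' / b"..." wrapper
muskTidy$tweet <- gsub("\\\\x[0-9a-f]{2}", " ", muskTidy$tweet)    # blank out literal \xNN escape sequences
muskTidy$tweet <- gsub("https?://t\\.co/\\S+", "", muskTidy$tweet) # remove shortened t.co links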
library(syuzhet)  # get_nrc_sentiment() scores text against the NRC emotion lexicon
feelings <- get_nrc_sentiment(musk$tweet)
head(feelings, 20)
## anger anticipation disgust fear joy sadness surprise trust negative positive
## 1 0 0 0 0 1 0 0 1 0 1
## 2 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0
## 5 1 1 1 2 0 1 0 0 2 0
## 6 0 0 0 0 0 0 0 0 0 0
## 7 0 1 0 1 0 0 0 0 1 0
## 8 0 1 0 0 1 0 0 1 0 1
## 9 2 1 2 1 1 1 0 0 5 0
## 10 0 1 0 0 0 0 0 1 0 1
## 11 0 0 0 0 0 0 0 0 0 0
## 12 0 0 0 0 0 0 0 1 0 1
## 13 1 0 0 0 0 0 0 0 1 0
## 14 0 1 1 2 1 1 1 1 1 2
## 15 0 0 0 0 0 0 0 0 0 1
## 16 0 0 0 0 0 0 0 2 0 1
## 17 0 0 0 0 0 0 0 0 0 0
## 18 0 0 0 2 1 0 1 2 0 1
## 19 0 1 0 0 1 0 1 1 0 1
## 20 0 0 0 0 0 1 0 0 1 0
# Merge tweet data and feelings matrix
musk <- cbind(musk, feelings)
# Collapse data to monthly counts of positive and negative words
musk$month <- format(as.Date(musk$date), "%Y-%m")
muskMonthly <- musk %>%
  group_by(month) %>%
  summarize(sumNeg = sum(negative), sumPos = sum(positive))
# Sentiment score for each month: positive count divided by negative count
muskMonthly$positivity_ratio <- muskMonthly$sumPos / muskMonthly$sumNeg
# Convert "YYYY-MM" back to a Date (first of the month) so the x-axis plots nicely
muskMonthly$month <- as.Date(paste(muskMonthly$month, "01", sep = "-"))
ggplot(muskMonthly, aes(x = month, y = positivity_ratio)) + geom_line()
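One caveat with the raw ratio: a month whose tweets contain no negative words makes sumNeg zero and the ratio infinite. A simple alternative (my own variant, not part of the original analysis) adds 1 to both counts before dividing:

# Smoothed ratio avoids division by zero in months with no negative words
muskMonthly$positivity_smoothed <- (muskMonthly$sumPos + 1) / (muskMonthly$sumNeg + 1)
ggplot(muskMonthly, aes(x = month, y = positivity_smoothed)) + geom_line()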