Introduction

My project asks how crime and prison were talked about in the early 1990s. I chose this period because in 1994 President Bill Clinton signed a major piece of legislation into law: the Violent Crime Control and Law Enforcement Act of 1994. In 1992, 54% of Americans said there was more crime in their neighborhood than there had been a year earlier (Gallup). Also in 1992, 89% of Americans thought there was more crime in the US than there had been a year earlier (Gallup). This public concern led to the creation of the act, which, among many other things, incentivized states to increase policing, lengthen prison sentences, and build more prisons. I argue, as many others have, that this policy has led to the mass incarceration we currently see in the United States. Mass incarceration has also not affected everyone equally: Black Americans are arrested and imprisoned at rates extremely disproportionate to those of white Americans.

My focus for the project is to use sentiment analysis to see how the media, specifically the New York Times, talked about crime and prison from 1990 to 1994. My overall hypothesis is that media outlets sensationalized crime and prison policy, which led to widespread worry about crime among US citizens, which in turn started the policy process behind the Violent Crime Control and Law Enforcement Act of 1994.

Preprocessing

library(quanteda)

#removing the 14 extra characters at the end of each date
prison90.t$date <- gsub('.{14}$','',prison90.t$date)

#removing the 'LEAD:' label that is repeated at the start of each lead paragraph
prison90.t$lead <- gsub('LEAD:', '', prison90.t$lead)
#creating a corpus of the lead paragraph
corpus <- corpus(prison90.t$lead)
#making ids for each text
docid <- paste(prison90.t$date, prison90.t$byline)
docnames(corpus) <- docid
#removing punctuation, numbers, and symbols.
corpus.tokens <- tokens(corpus, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)

corpus.tokens <- tokens_remove(corpus.tokens, pattern = stopwords('en'))


dfm.prison.90 <- dfm(corpus.tokens)

dfm.prison.90 <- dfm_remove(dfm.prison.90, c('one','two','three', 'four', 'five', 'six', 'seven', 'eight','nine', 'ten'))



library(quanteda.textplots)
set.seed(1)
textplot_wordcloud(dfm.prison.90)

# keyword crime
crime90.t$date <- gsub('.{14}$','',crime90.t$date)

#removing the 'LEAD:' label that is repeated at the start of each lead paragraph
crime90.t$lead <- gsub('LEAD:', '', crime90.t$lead)
#creating a corpus of the lead paragraph
corpus.crime <- corpus(crime90.t$lead)
#making ids for each text
docid <- paste(crime90.t$date, crime90.t$byline)
docnames(corpus.crime) <- docid
#removing punctuation, numbers, and symbols.
corpus.tokens.crime <- tokens(corpus.crime, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)

corpus.tokens.crime <- tokens_remove(corpus.tokens.crime, pattern = stopwords('en'))



dfm.crime.90 <- dfm(corpus.tokens.crime)

dfm.crime.90 <- dfm_remove(dfm.crime.90, c('one','two','three', 'four', 'five', 'six', 'seven', 'eight','nine', 'ten'))

textplot_wordcloud(dfm.crime.90)
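
The cleaning steps above can be illustrated on a single toy record. This is only a sketch in base R, not the quanteda pipeline itself: the date string and lead text are hypothetical stand-ins (the raw format of `prison90.t` is not shown here), and the short `drop_words` list is an assumption standing in for `stopwords('en')` plus the number words removed from the dfm.

```r
# Hypothetical raw values; the real columns in prison90.t may look different.
raw_date <- "1990-01-01T05:00:00+0000"
raw_lead <- "LEAD: Two inmates escaped from the county prison on Monday."

# Drop the last 14 characters, leaving only the calendar date.
clean_date <- gsub(".{14}$", "", raw_date)

# Remove the repeated 'LEAD:' label and any surrounding whitespace.
clean_lead <- trimws(gsub("LEAD:", "", raw_lead, fixed = TRUE))

# Lowercase, strip punctuation and digits, then split on whitespace.
toks <- strsplit(gsub("[[:punct:][:digit:]]", "", tolower(clean_lead)), "\\s+")[[1]]

# Toy stopword list (assumption) plus some of the number words removed above.
drop_words <- c("the", "from", "on", "one", "two", "three")
toks <- toks[!toks %in% drop_words]

print(clean_date)
print(toks)
```

Only content words such as "inmates" and "prison" survive, which is why they are the ones that appear in the word clouds.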

Methods

For this part I use sentiment analysis with two different dictionaries. The first is based on Mohammad and Turney's NRC Emotion and Sentiment Lexicons, which categorize words by their association with 8 emotions and with either positive or negative sentiment. Each text then receives a numerical value for every category that rates the degree to which it matches that emotion or sentiment; a higher number means a stronger association. My second dictionary is Hu and Liu's lexicon of positive and negative words. I believe these methods will work well for my project because I am trying to assess how the media talks about, and attaches sentiment to, the issues of crime and prison policy. These methods will let me identify the sentiment expressed and measure how strong it is.
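
To make the idea of dictionary scoring concrete, here is a minimal base R sketch. It assumes the score for a category is the percentage of a text's tokens that appear in that category's word list, which is how I understand the `liwcalike()` output. The word lists in `toy_dict`, the helper `score_text()`, and the example sentence are all made up for illustration; they are not the real NRC or Hu and Liu lexicons.

```r
# Toy lexicon (assumption): a few hand-picked words per category.
toy_dict <- list(
  fear     = c("attack", "danger", "threat"),
  negative = c("attack", "crime", "danger", "violent"),
  positive = c("safe", "hope")
)

# Score one text: percentage of its tokens that fall in each category.
score_text <- function(text, dict) {
  toks <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]
  sapply(dict, function(words) 100 * sum(toks %in% words) / length(toks))
}

# Hypothetical sentence, written for illustration only.
scores <- score_text("Violent crime is a danger to every safe street.", toy_dict)
print(round(scores, 1))
```

A word can count toward more than one category ("danger" is both fear and negative here), so the category scores do not need to add up to 100.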

Results

A few graphs are shown below. They show the fear, disgust, anger, and sadness levels for my crime data and my prison data, along with positive and negative sentiment. The positive and negative sentiment graphs are especially revealing: the level of negative sentiment compared to positive is very telling of the negative view the media portrays, and the size of the difference is very clear. Fear was prevalent in both the crime and prison corpora. Disgust and anger were not as common as I had imagined. Sadness was another emotion that has a presence in these articles. While it is not at all surprising that prison and crime are not viewed positively, it is interesting to demonstrate it and to show the specific emotions conveyed in these texts.
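
The gap between negative and positive sentiment described above could also be summarized with a couple of numbers instead of being read off histograms. The sketch below uses a hypothetical data frame, `toy_sentiment`, shaped like a sentiment table with `positive` and `negative` columns; the values are invented and are not results from my data.

```r
# Hypothetical scores standing in for a per-document sentiment table; not real results.
toy_sentiment <- data.frame(
  docname  = c("doc1", "doc2", "doc3", "doc4"),
  positive = c(2.0, 0.0, 4.0, 2.0),
  negative = c(8.0, 6.0, 10.0, 4.0)
)

# Mean positive and negative score across documents.
means <- colMeans(toy_sentiment[, c("positive", "negative")])

# Share of documents whose negative score exceeds their positive score.
share_more_negative <- mean(toy_sentiment$negative > toy_sentiment$positive)

print(means)
print(share_more_negative)
```

The same two lines could be run on the actual sentiment tables to back up the visual comparison.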

library(devtools)
## Loading required package: usethis
devtools::install_github("kbenoit/quanteda.dictionaries") 
## Skipping install of 'quanteda.dictionaries' from a github remote, the SHA1 (9b97367f) has not changed since last install.
##   Use `force = TRUE` to force installation
library(quanteda.dictionaries)
devtools::install_github("quanteda/quanteda.sentiment")
## Skipping install of 'quanteda.sentiment' from a github remote, the SHA1 (a2aca88b) has not changed since last install.
##   Use `force = TRUE` to force installation
library(quanteda.sentiment)
## 
## Attaching package: 'quanteda.sentiment'
## The following object is masked from 'package:quanteda':
## 
##     data_dictionary_LSD2015
library(ggplot2)


# Prison with NRC dictionary
review.sentiment.prison <- liwcalike(as.character(corpus), data_dictionary_NRC)


#bins is set explicitly so ggplot2 does not fall back on its default
ggplot(review.sentiment.prison)+
  geom_histogram(aes(x=fear), bins = 30)+
  ggtitle('prison')

ggplot(review.sentiment.prison)+
  geom_histogram(aes(x=disgust), bins = 30)+
  ggtitle('prison')

ggplot(review.sentiment.prison)+
  geom_histogram(aes(x=anger), bins = 30)+
  ggtitle('prison')

ggplot(review.sentiment.prison)+
  geom_histogram(aes(x=sadness), bins = 30)+
  ggtitle('prison')

#crime with NRC dictionary

review.sentiment.crime <- liwcalike(corpus.crime, data_dictionary_NRC)

#bins is set explicitly so ggplot2 does not fall back on its default
ggplot(review.sentiment.crime)+
  geom_histogram(aes(x=fear), bins = 30)+
  ggtitle('crime')

ggplot(review.sentiment.crime)+
  geom_histogram(aes(x=disgust), bins = 30)+
  ggtitle('crime')

ggplot(review.sentiment.crime)+
  geom_histogram(aes(x=anger), bins = 30)+
  ggtitle('crime')

ggplot(review.sentiment.crime)+
  geom_histogram(aes(x=sadness), bins = 30)+
  ggtitle('crime')

#Crime with HuLiu Dictionary
sentiment.crime <- liwcalike(as.character(corpus.crime), data_dictionary_HuLiu)

#bins is set explicitly so ggplot2 does not fall back on its default
ggplot(sentiment.crime)+
  geom_histogram(aes(x=positive), bins = 30)+
  ggtitle('crime')

ggplot(sentiment.crime)+
  geom_histogram(aes(x=negative), bins = 30)+
  ggtitle('crime')

#prison with HuLiu Dictionary

sentiment.prison <- liwcalike(as.character(corpus), data_dictionary_HuLiu)

#bins is set explicitly so ggplot2 does not fall back on its default
ggplot(sentiment.prison)+
  geom_histogram(aes(x=positive), bins = 30)+
  ggtitle('prison')

ggplot(sentiment.prison)+
  geom_histogram(aes(x=negative), bins = 30)+
  ggtitle('prison')