Introduction

To analyze sentiment toward data science, our team chose to focus on the West Coast, specifically California, Oregon, and Washington state. Articles from the Los Angeles Times and USA Today were selected for each state and analyzed for their sentiment toward data science. We chose the West Coast because of the large tech influence in the area, ranging from Silicon Valley in California to major corporations such as Google and Amazon in Seattle. Below, we discuss the trends we found across states and periodicals.

Washington

Our first sentiment analysis is for Washington state. Most articles revolved around Seattle, which was expected due to the location of the large tech companies.

## 
## negative positive 
##       14       20
## 
##        anger anticipation      disgust         fear          joy     negative 
##            9           25            1            9           15           23 
##     positive      sadness     surprise        trust 
##           69            7            4           48

Overall, Washington expressed a generally positive sentiment, with an emphasis on joy and trust, suggesting that the state is comfortable with data science as part of its daily industry. Anticipation was also strongly expressed, which could reflect excitement for new breakthroughs in data science. The sentiment range was stronger in the positive region, as shown in the plot below.

[Figure: distribution of sentiment scores for Washington]
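The category counts reported in each state's section are the kind of output produced by lexicon-based counting: each word is looked up in a sentiment lexicon and every category it belongs to is incremented. The following is a minimal sketch of that idea, not the code used for this report. The lexicon entries, the sample sentence, and the names `TOY_LEXICON`, `tokenize`, and `count_categories` are all hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> categories. A real analysis would load a
# published sentiment lexicon; these four entries are for illustration only.
TOY_LEXICON = {
    "growth": {"positive", "anticipation"},
    "trusted": {"positive", "trust"},
    "layoffs": {"negative", "fear", "sadness"},
    "breakthrough": {"positive", "joy", "surprise"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_categories(text, lexicon):
    """Count how many tokens fall into each lexicon category."""
    counts = Counter()
    for token in tokenize(text):
        # Words missing from the lexicon contribute nothing.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return counts


# Hypothetical sample sentence, not taken from the analyzed articles.
sample = "A trusted breakthrough fuels growth, but layoffs worry staff."
category_counts = count_categories(sample, TOY_LEXICON)
print(sorted(category_counts.items()))
```

Because a single word can belong to several categories, the category totals can exceed the number of matched words, which is also why the emotion counts in the tables do not sum to the positive and negative totals.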

Common words in the Washington articles included listings, labor, and job, among others. Many of these words emphasized an industrial working environment, showing a focus on the job and labor side of data science. This can be explained by the growing presence of corporations.

Oregon

Next, we analyzed sentiment in Oregon. The Portland metropolitan area is home to a rapidly growing cluster of high-tech companies known as the “Silicon Forest”. Since this area is still developing, we wanted to closely monitor the sentiment to predict the future of the area.

Sentiment in Oregon appeared to be evenly split between positive and negative words, with anger and fear strongly expressed.

## 
## negative positive 
##        9        9
## 
##        anger anticipation      disgust         fear          joy     negative 
##           12            7            4           12            4           18 
##     positive      sadness     surprise        trust 
##           33            9            2           22

Despite this, further sentiment analysis revealed a somewhat more positive lean. Still, Oregon expresses a more negative sentiment than Washington.

[Figure: distribution of sentiment scores for Oregon]

Word frequencies revealed an interesting and unexpected trend in Oregon. Broad societal words such as crime, hate, and police appeared, indicating that data science in that area may be focused on crime analysis due to Oregon's higher crime occurrences.

California

Finally, we looked at sentiment in California. The sentiment was overwhelmingly positive, with more than twice as many positive words as negative ones (41 versus 20). Anticipation and trust were prominent sentiments, similar to what we observed in Washington. This can be explained by the presence of Silicon Valley and the large amount of tech innovation in the area.

## 
## negative positive 
##       20       41
## 
##        anger anticipation      disgust         fear          joy     negative 
##           10           38            3           16           38           26 
##     positive      sadness     surprise        trust 
##          117           10           14           65

California expressed a stronger positive sentiment than the other states on the West Coast.

[Figure: distribution of sentiment scores for California]

Analysis of the word cloud revealed a focus on innovation-related words, such as science, computer, and data, as well as more societal terms such as underrepresented, communities, and diversity. This indicates a focus on creating diversity within the workplace and community.

tf_idf Analysis

tf_idf analysis was used to identify words that were both frequent and distinctive, weighting how often a word appears against how widely it is used across the other documents. Below are the most frequent and most relevant words from the newspapers for each state.

Washington: listings, Amazon, job, Microsoft, openings

Oregon: hate, crime, social, police, identity

California: Women, black, opportunities, mentoring, computer
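To make the weighting concrete, the sketch below computes tf-idf with the standard definition: term frequency is a word's count divided by the document's length, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. This illustrates the technique under those assumptions and is not the code behind the word lists above. The corpus `toy_docs` is hypothetical and far smaller than the real article set.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_name: {term: score}} for a dict of tokenized documents."""
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # tf = count / document length; idf = ln(n_docs / doc_freq)
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical toy corpus, one short "document" per state.
toy_docs = {
    "washington": ["job", "listings", "data", "job"],
    "oregon": ["crime", "police", "data", "crime"],
    "california": ["diversity", "mentoring", "data", "diversity"],
}

scores = tf_idf(toy_docs)
top_terms = {name: max(s, key=s.get) for name, s in scores.items()}
print(top_terms)
```

In this toy corpus, "data" appears in every document, so its idf is ln(1) = 0 and it scores zero everywhere. This is why words shared by all three states drop out of a tf-idf ranking while state-specific words rise to the top.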

As supported by our analysis above, sentiment was most positive in California, followed by Washington, then Oregon. In California, the emphasis was on underrepresented communities and promoting growth within data science. In Oregon, a strong negative connotation surrounding crime was a recurring theme. In Washington, a corporate presence with a focus on jobs prevailed.
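This ordering can be checked against the positive and negative word counts reported in each state's section. The short sketch below computes the share of positive words among all positive and negative words. The counts are copied from the tables in this report. The positive-share metric and the names `reported_counts` and `positive_share` are our own illustrative choices, not part of the original analysis.

```python
# Positive/negative word counts copied from the per-state tables in this report.
reported_counts = {
    "Washington": {"negative": 14, "positive": 20},
    "Oregon": {"negative": 9, "positive": 9},
    "California": {"negative": 20, "positive": 41},
}


def positive_share(counts):
    """Fraction of sentiment-bearing words that are positive."""
    return counts["positive"] / (counts["positive"] + counts["negative"])


shares = {state: positive_share(c) for state, c in reported_counts.items()}

# Sort states from most to least positive.
ranking = sorted(shares, key=shares.get, reverse=True)
for state in ranking:
    print(f"{state}: {shares[state]:.2f}")
```

By this measure California comes out at roughly 0.67, Washington at roughly 0.59, and Oregon at exactly 0.50, matching the ordering described above.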

A stronger sentiment was expressed by the LA Times than by USA Today. This might be because the LA Times is a more local publication to the West Coast and so has a more in-depth focus on its industries. USA Today, as a national newspaper, might have chosen to reduce its focus on developments in data science and focus more on the negative aspects.

More information on current job openings and employment rates in data science would be needed to understand whether unemployment affects how sentiment differs across the West Coast. It would also be useful to gather data on what types of data science jobs are being offered in Oregon.

Based on the information above, we suggest presenting more information about data science to educate the public and promote the field, especially within the Silicon Forest and surrounding neighborhoods. We also suggest greater outreach from the corporations in Seattle to the developing communities in Portland, Oregon. Communication across the West Coast could improve sentiment surrounding data science.