Submission by: Abhilasha Kumar, Nadya Paramputri, Sharp Harry

Research Objective

To investigate whether the presence of STEM jobs affects neighborhood sentiment and infrastructure investment across 250+ Atlanta neighborhoods.

Study Background

-Why STEM jobs?

STEM workers have been acknowledged as economic drivers at the local and federal levels [1]. They drive the country's innovation, generating ideas and technologies that create jobs and raise the living standards of U.S. households [2].

These individuals earn about 29 percent more than their non-STEM counterparts (Langdon et al., 2011).

Five local service jobs, such as those of carpenters, taxi drivers, teachers, and nurses, are created for every STEM worker hired in a city with a large STEM workforce (Moretti, 2011).

This population has been growing over the past 40 years (Watson, 2017).

-Why Atlanta?

Atlanta has developed a deep-rooted ecosystem with a tech-savvy workforce, fostered by its proximity to tech-focused schools and by Georgia Tech's decision back in the 1990s to build Tech Square, which connects students to internships and research opportunities.

The number of tech job postings in Atlanta surpassed those in Chicago, Austin, and San Francisco (Burning Glass, 2022).

The rise of remote work has accelerated the migration of Silicon Valley STEM workers to other states with lower living costs.

Variables

-Dependent Variables:

Neighborhood Sentiment
- Number of STEM workers in the neighborhood
- Average household income and housing value
- H1: We hypothesize that the larger the STEM population in an NPU (Neighborhood Planning Unit), the lower the neighborhood sentiment.

Economic Mobility Index
- Number of STEM workers in the neighborhood
- Average household income and housing value
- H2: We hypothesize that the larger the STEM population in an NPU, the higher the economic mobility index.

The analysis proceeds in the following steps.

EXTRACTING THE TWEETS

For this project, we download Tweets that contain the names of neighborhoods in Atlanta. We apply sentiment analysis to the Tweets and then map and plot the sentiment associated with each neighborhood. Specifically, we performed the following steps:

Step 1. Load the packages.

library(rtweet)

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::flatten() masks rtweet::flatten()
## ✖ dplyr::lag()     masks stats::lag()

library(sf)

## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE

library(sentiment.ai)

library(SentimentAnalysis)

## 
## Attaching package: 'SentimentAnalysis'
## 
## The following object is masked from 'package:base':
## 
##     write

library(ggplot2)
library(here)

## here() starts at D:/Georgia Tech/Spec topic_/Project_proposal

library(tmap)

library(Hmisc);library(ff)

## Loading required package: lattice
## Loading required package: survival

## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units

## Loading required package: bit

## 
## Attaching package: 'bit'
## 
## The following object is masked from 'package:base':
## 
##     xor
## 
## Attaching package: 'ff'
## 
## The following objects are masked from 'package:utils':
## 
##     write.csv, write.csv2
## 
## The following objects are masked from 'package:base':
## 
##     is.factor, is.ordered

Step 2. Neighborhood Shapefile

Read the data into the current R environment.

# Read neighborhood shapefile
nb_shp <- st_read("D:/Georgia Tech/Spec topic_/major ass_5/Atlanta_Neighborhoods")

## Reading layer `Atlanta_Neighborhoods' from data source 
##   `D:\Georgia Tech\Spec topic_\major ass_5\Atlanta_Neighborhoods' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 248 features and 20 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.55085 ymin: 33.64799 xmax: -84.28962 ymax: 33.88687
## Geodetic CRS:  WGS 84

Step 3. Initiate Sentiment.ai in the environment

init_sentiment.ai(envname = "r-sentiment-ai", method = "conda") # feel free to change these arguments if you need to.

## <tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject object at 0x0000025BA852B340>

Step 4. Looping through neighborhood names to get Tweets

Prepare to use the Twitter API by passing your credentials as arguments to the create_token() function.

# Name that was assigned to the created app
appname <- "UrbanAnalytics_tutorial"

# Create a token named "twitter_token"
# The keys are read from environment variables; replace them with the keys obtained when creating your own app

twitter_token <- create_token(
  app = appname,
  consumer_key = Sys.getenv("twitter_key"),
  consumer_secret = Sys.getenv("twitter_key_secret"),
  access_token = Sys.getenv("twitter_access_token"),
  access_secret = Sys.getenv("twitter_access_token_secret"))

Step 5: Define a function that downloads the Tweets, cleans them, and applies sentiment analysis to them.

# Extract neighborhood names from the NAME column of nb_shp and store them in nb_names.
nb_names <- nb_shp$NAME

# Define a search function: download Tweets for one neighborhood name,
# clean the text, and score its sentiment
get_twt <- function(term){

  # Wrap the term in double quotes so that the exact phrase is searched
  term_mod <- paste0("\"", term, "\"")

  out <- search_tweets(q = term_mod,
                       n = 1000,
                       lang = "en",
                       geocode = "33.76,-84.41,50mi",
                       retryonratelimit = TRUE,
                       include_rts = FALSE)

  out <- out %>%
    select(created_at, id, id_str, full_text, geo, coordinates, place, text)

  # Basic cleaning: strip URLs, HTML entities, @ signs, and blank lines
  replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

  out <- out %>%
    mutate(text = str_replace_all(text, replace_reg, ""),
           text = gsub("@", "", text),
           text = gsub("\n\n", "", text))

  # Sentiment analysis (only when at least one Tweet was found)
  # Also add a column for neighborhood names
  if (nrow(out) > 0){
    out <- out %>%
      mutate(sentiment_ai = sentiment_score(text),
             sentiment_an = analyzeSentiment(text)$SentimentQDAP,
             nb = term)
    print(paste0("Search term:", term))
  }

  return(out)
}
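To see what the cleaning step inside get_twt() removes, the following minimal sketch applies the same pattern and substitutions to a made-up Tweet. It uses base R's gsub() with perl = TRUE instead of str_replace_all(), an assumption made only to keep the example free of package dependencies; the helper name clean_text and the example string are hypothetical.

```r
# Same pattern as in get_twt(): URLs and a few HTML entities
replace_reg <- "http[s]?://[A-Za-z\\d/\\.]+|&amp;|&lt;|&gt;"

# Hypothetical helper that mirrors the three cleaning substitutions
clean_text <- function(x) {
  x <- gsub(replace_reg, "", x, perl = TRUE)  # drop URLs and HTML entities
  x <- gsub("@", "", x, fixed = TRUE)         # drop @ signs but keep the handle text
  x <- gsub("\n\n", "", x, fixed = TRUE)      # drop blank lines
  x
}

# Made-up Tweet text for illustration
example <- "Love @Midtown &amp; the park https://t.co/abc123\n\nSo nice"
cleaned <- clean_text(example)
print(cleaned)
```

Note that the substitutions delete matches without collapsing the surrounding whitespace, so double spaces can remain in the cleaned text.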

Step 6: Apply the function to Tweets.

# Load the previously saved Tweets from disk
twt <- readRDS("twt_raw.rds")
# To download the Tweets again, apply the function to every neighborhood name
# twt <- map(nb_names, ~get_twt(.x))

Step 7. Clean and filter the collected Tweets.

There are two sets of collected Tweets: the first chunk cleans and filters the first set, and the second chunk cleans and filters the second set. The process is done in the following steps:

- Drop empty elements from the list twt. These are neighborhoods with no Tweets referring to them. Hint: you can create a logical vector that is FALSE if the corresponding element in twt has no Tweets and TRUE otherwise.
- The coordinates column is currently a list-column. Unnest this column so that lat, long, and type (i.e., the column names inside coordinates) are separate columns. You can use the unnest() function.
- Calculate the average sentiment score for each neighborhood. You can group_by() the nb column in the twt objects and summarize() to calculate means. Also add an additional column n that contains the number of rows in each group, using the n() function.
- Join the cleaned Tweet data back to the neighborhood shapefile. Use the neighborhood name as the join key.
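The first and third steps above can be tried on a toy list before running them on the real data. This is a minimal base R sketch under stated assumptions: the list twt_toy and its values are made up, the unnesting of the coordinates list-column is left out, and aggregate() stands in for group_by() and summarize().

```r
# Hypothetical list of per-neighborhood results; the second element is empty
twt_toy <- list(
  data.frame(nb = "Midtown", sentiment_ai = c(0.5, -0.1), stringsAsFactors = FALSE),
  data.frame(nb = character(0), sentiment_ai = numeric(0), stringsAsFactors = FALSE),
  data.frame(nb = "Bolton", sentiment_ai = -0.6, stringsAsFactors = FALSE)
)

# Logical vector: TRUE where the element holds at least one Tweet
has_rows <- vapply(twt_toy, nrow, integer(1)) > 0

# Keep the non-empty elements and stack them into one data frame
twt_all <- do.call(rbind, twt_toy[has_rows])

# Mean sentiment and number of Tweets per neighborhood
means  <- aggregate(sentiment_ai ~ nb, data = twt_all, FUN = mean)
counts <- aggregate(sentiment_ai ~ nb, data = twt_all, FUN = length)
names(counts)[2] <- "n"
summary_toy <- merge(means, counts, by = "nb")
print(summary_toy)
```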

Step 7(a)

library("data.table")

## 
## Attaching package: 'data.table'

## The following object is masked from 'package:bit':
## 
##     setattr

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

library(dplyr)
library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:Hmisc':
## 
##     is.discrete, summarize

## The following object is masked from 'package:here':
## 
##     here

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:purrr':
## 
##     compact

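# Drop list elements (neighborhoods) that returned no Tweets
twts <- twt[which(lapply(twt, nrow)!=0)]

# Stack the remaining elements into a single table
twts <- rbindlist(twts , fill = FALSE, idcol = NULL)
typeof(twts)

## [1] "list"

# Expand the coordinates list-column into separate columns
twts_unnest <- unnest(twts, cols= c("coordinates"))

# Average the sentiment scores and count the Tweets per neighborhood
twts_clean <- twts_unnest  %>% group_by(nb) %>%
  dplyr::summarise(sentiment_ai = mean(sentiment_ai),
            sentiment_an = mean(sentiment_an),
            n = n()
            )

# Rename nb to NAME so that it matches the join key in the shapefile
names(twts_clean)[names(twts_clean) == 'nb'] <- 'NAME'

# Attach the per-neighborhood summary to the neighborhood polygons
twt_poly <- merge(x= nb_shp, y = twts_clean, by= 'NAME')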

Step 7(b) Cleaning and filtering the previous two weeks of Twitter data

# Load the Tweets collected in the earlier window
bw_rds <- readRDS("D:/Georgia Tech/Spec topic_/Project_proposal/twt_nb_2022-11-13.rds")

# Stack the first 248 list elements into one data frame
twt_bw <- bw_rds[1:248] %>% do.call("rbind", .)

# Average the sentiment scores and count the Tweets per neighborhood
twts_clean_bw <- twt_bw  %>% group_by(nb) %>%
  dplyr::summarise(sentiment_ai = mean(sentiment_ai),
            sentiment_an = mean(sentiment_an),
            n = n()
            )

# Rename nb to NAME and attach the summary to the neighborhood polygons
names(twts_clean_bw)[names(twts_clean_bw) == 'nb'] <- 'NAME'
twt_poly_bw <- merge(x= nb_shp, y = twts_clean_bw, by= 'NAME')

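Step 7(c): Merging both sets of Tweets into one data frame

# Side-by-side join of the two sets with plyr::join(), keyed on the ACRES column
merged_final_twt_2 <- join(twt_poly %>%  as.data.frame(),twt_poly_bw %>%  as.data.frame(), by = "ACRES")

# Stack the two sets row-wise; this stacked object is the one used in the steps below
total_twts <-rbind(twt_poly, twt_poly_bw)
tibble(total_twts)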

## # A tibble: 111 × 24
##    NAME     OBJEC…¹ LOCALID GEOTYPE FULLF…² LEGAL…³ EFFEC…⁴ ENDDATE SRCREF ACRES
##    <chr>      <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  <dbl>
##  1 Adams P…      62 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    629.
##  2 Atlanti…     231 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    163.
##  3 Ben Hill      71 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    685.
##  4 Bolton        75 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    965.
##  5 Brandon       17 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    410.
##  6 Brookha…     225 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    637.
##  7 Brookwo…     207 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    101.
##  8 Buckhea…      32 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    127.
##  9 Cabbage…      48 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    112.
## 10 Campbel…      60 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>    283.
## # … with 101 more rows, 14 more variables: SQMILES <dbl>, OLDNAME <chr>,
## #   NPU <chr>, CREATED_US <chr>, CREATED_DA <date>, LAST_EDITE <chr>,
## #   LAST_EDI_1 <date>, GLOBALID <chr>, SHAPEAREA <dbl>, SHAPELEN <dbl>,
## #   sentiment_ai <dbl>, sentiment_an <dbl>, n <int>,
## #   geometry <MULTIPOLYGON [°]>, and abbreviated variable names ¹OBJECTID,
## #   ²FULLFIPS, ³LEGALAREA, ⁴EFFECTDATE

# Save the merged data to disk (saveRDS() returns NULL, so its result is not assigned)
saveRDS(total_twts,file = "merged_twts_1.rds")
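Because rbind() stacks the two sets, a neighborhood that appears in both collection windows contributes two rows to total_twts (Paces, for instance, shows up twice in the NPU-level table printed in Step 10). If one row per neighborhood is wanted, the two rounds could be pooled with a Tweet-weighted mean. The sketch below is one possible approach on made-up numbers, not the method used in this report; set_1, set_2, and pooled are hypothetical names.

```r
# Hypothetical per-neighborhood summaries from two collection windows
set_1 <- data.frame(NAME = c("Paces", "Bolton"),
                    sentiment_ai = c(0.40, -0.60),
                    n = c(3, 4))
set_2 <- data.frame(NAME = c("Paces", "Lenox"),
                    sentiment_ai = c(-0.20, 0.10),
                    n = c(1, 5))

# Row-wise stacking keeps both "Paces" rows
stacked <- rbind(set_1, set_2)

# Pool the rows of each neighborhood: total Tweet count and
# sentiment averaged with the Tweet counts as weights
pooled <- do.call(rbind, lapply(split(stacked, stacked$NAME), function(d) {
  data.frame(NAME = d$NAME[1],
             sentiment_ai = weighted.mean(d$sentiment_ai, d$n),
             n = sum(d$n))
}))
rownames(pooled) <- NULL
print(pooled)
```

Weighting by n keeps a neighborhood with a single Tweet in one window from counting as much as a window with many Tweets.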

Step 8. Analysis

Now that we have collected Tweets, calculated sentiment score, and merged it back to the original shapefile, we can map them to see spatial distribution and draw plots to see inter-variable relationships.

Step 8(a): First, we draw two interactive choropleth maps, one colored by sentiment score and the other by the number of Tweets. We use the tmap_arrange() function to display the two maps side by side.

tmap_mode("view")

## tmap mode set to interactive viewing

a <- tm_basemap("OpenStreetMap")+tm_shape(total_twts) + 
  tm_polygons(col = "sentiment_ai", style = "quantile")
a

## Variable(s) "sentiment_ai" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

b <- tm_basemap("OpenStreetMap")+ tm_shape(total_twts) +
  tm_polygons(col = "n", style="quantile")

tmap_arrange(a,b, sync = TRUE)

## Variable(s) "sentiment_ai" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Step 8(b): Correlation analysis between the number of Tweets for each neighborhood and the sentiment score, using either the cor.test() function or the ggpubr::stat_cor() function.

library(ggpubr)

## 
## Attaching package: 'ggpubr'

## The following object is masked from 'package:plyr':
## 
##     mutate

# Scatter plot of Tweet count against sentiment score with a fitted regression line
# (the method argument was dropped: ggscatter() ignored it with a warning)
twt_cor <- ggscatter(total_twts, x = "n", y = "sentiment_ai",
                     add = "reg.line",
                     add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
                     label.x = 3, label.y = 30)

# Add the correlation coefficient and p-value to the plot
twt_cor + stat_cor(p.accuracy = 0.001, r.accuracy = 0.01)

## `geom_smooth()` using formula = 'y ~ x'

twt_cor

## `geom_smooth()` using formula = 'y ~ x'

cor_map <- twt_cor + stat_cor(method = "pearson")
# Formal correlation test; note that this overwrites the plot object twt_cor
twt_cor <- cor.test(total_twts$n,total_twts$sentiment_ai)
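As a rough illustration of what cor.test() returns, the sketch below runs a Pearson test on only the ten neighborhoods whose n and sentiment_ai values are printed by print(total_twts) in Step 10. With ten of the 111 rows, the result says nothing about the full data; the names n_tweets, sentiment, and res are hypothetical.

```r
# n and sentiment_ai for the first ten features printed in Step 10
n_tweets  <- c(2, 52, 2, 4, 32, 10, 12, 1, 1, 2)
sentiment <- c(-0.67136337, -0.22949672, 0.28917164, -0.58723179, -0.03029922,
               -0.32019438, -0.06301082, 0.66483378, 0.50428414, 0.25729829)

# Pearson correlation test between Tweet count and sentiment score
res <- cor.test(n_tweets, sentiment, method = "pearson")

res$estimate   # correlation coefficient r
res$p.value    # p-value for the null hypothesis r = 0
res$conf.int   # 95% confidence interval for r
```

The returned object is a list, so the coefficient, p-value, and confidence interval can be extracted by name and reported alongside the scatter plot.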

Step 9: Convert the neighborhood shapefile from polygons to points.

Step 9(a): Find the centroid of each neighborhood polygon so that each neighborhood can be overlaid on the NPU whose boundaries contain it.

st_centroid(nb_shp)

## Warning in st_centroid.sf(nb_shp): st_centroid assumes attributes are constant
## over geometries of x

## Simple feature collection with 248 features and 20 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -84.54262 ymin: 33.65449 xmax: -84.30083 ymax: 33.87584
## Geodetic CRS:  WGS 84
## First 10 features:
##    OBJECTID LOCALID                   NAME      GEOTYPE FULLFIPS LEGALAREA
## 1         7    <NA> Peachtree Heights East Neighborhood     <NA>      <NA>
## 2         8    <NA>       Mt. Gilead Woods Neighborhood     <NA>      <NA>
## 3         9    <NA>     Meadowbrook Forest Neighborhood     <NA>      <NA>
## 4        10    <NA>            Niskey Cove Neighborhood     <NA>      <NA>
## 5        11    <NA>               Oakcliff Neighborhood     <NA>      <NA>
## 6        12    <NA>                Just Us Neighborhood     <NA>      <NA>
## 7        13    <NA>          Bush Mountain Neighborhood     <NA>      <NA>
## 8        14    <NA>             Briar Glen Neighborhood     <NA>      <NA>
## 9        15    <NA>               Fairburn Neighborhood     <NA>      <NA>
## 10       16    <NA>       Ben Hill Terrace Neighborhood     <NA>      <NA>
##    EFFECTDATE ENDDATE SRCREF  ACRES SQMILES                OLDNAME NPU
## 1        <NA>    <NA>   <NA> 133.22    0.21 Peachtree Heights East   B
## 2        <NA>    <NA>   <NA>  35.59    0.06       Mt. Gilead Woods   P
## 3        <NA>    <NA>   <NA>  70.85    0.11     Meadowbrook Forest   P
## 4        <NA>    <NA>   <NA>  52.50    0.08            Niskey Cove   P
## 5        <NA>    <NA>   <NA>  66.96    0.10               Oakcliff   H
## 6        <NA>    <NA>   <NA>  17.69    0.03                Just Us   T
## 7        <NA>    <NA>   <NA>  49.80    0.08          Bush Mountain   S
## 8        <NA>    <NA>   <NA>  66.55    0.10             Briar Glen   P
## 9        <NA>    <NA>   <NA> 114.84    0.18        Fairburn Avenue   P
## 10       <NA>    <NA>   <NA> 212.19    0.33       Ben Hill Terrace   P
##    CREATED_US CREATED_DA LAST_EDITE LAST_EDI_1
## 1        <NA>       <NA>        GIS 2022-05-24
## 2        <NA>       <NA>        GIS 2022-05-24
## 3        <NA>       <NA>        GIS 2022-05-24
## 4        <NA>       <NA>        GIS 2022-05-24
## 5        <NA>       <NA>        GIS 2022-05-24
## 6        <NA>       <NA>        GIS 2022-05-24
## 7        <NA>       <NA>        GIS 2022-05-24
## 8        <NA>       <NA>        GIS 2022-05-24
## 9        <NA>       <NA>        GIS 2022-05-24
## 10       <NA>       <NA>        GIS 2022-05-24
##                                  GLOBALID SHAPEAREA  SHAPELEN
## 1  {7040B465-59F1-4D32-BB1A-6340CCAB5471} 5803162.1  9738.439
## 2  {0B7F1854-18A6-42CD-BDA4-CF6119EFD4D5} 1550308.9  5341.891
## 3  {6FE32BA0-9C9E-496F-BF73-9DBB21427BAB} 3086026.6  7538.819
## 4  {C72B1769-93B6-42BA-AB9C-C7EEC486B46C} 2286925.9  8814.793
## 5  {B4963201-5EE9-4D7C-A2EE-E8CCDA782740} 2916579.1  6926.384
## 6  {1C57E7B0-9F92-477D-9DF9-98CE4C268F94}  770378.3  3760.127
## 7  {50462634-1A5E-4D72-9509-06DBB9EEA0FA} 2169448.6  7795.086
## 8  {114318E1-DE48-43AA-853B-1708CAB5607F} 2898718.3  8633.698
## 9  {670CCF24-29C4-4455-B6B0-22F130488D42} 5002626.5 10200.742
## 10 {BB2396B6-8004-4C3F-94BE-E9DAD516A31E} 9243017.2 12950.567
##                      geometry
## 1  POINT (-84.38295 33.82574)
## 2  POINT (-84.50368 33.69995)
## 3  POINT (-84.50308 33.69259)
## 4  POINT (-84.52892 33.70871)
## 5  POINT (-84.49806 33.76169)
## 6  POINT (-84.42486 33.75258)
## 7  POINT (-84.43204 33.72751)
## 8   POINT (-84.5033 33.69695)
## 9  POINT (-84.52342 33.69249)
## 10     POINT (-84.52177 33.7)

Step 9(b): Read the NPU shapefile and spatially join the neighborhood shapefile with it.

NPU_shape <- read_sf("D:/Georgia Tech/Spec topic_/Project_proposal/City_of_Atlanta_Neighborhood_Statistical_Areas/City_of_Atlanta_Neighborhood_Statistical_Areas/City_of_Atlanta_Neighborhood_Statistical_Areas.shp")

NPU_sf <- st_as_sf(NPU_shape)
sf_joined <- st_join(NPU_sf ,nb_shp, join = st_intersects)

tmap_mode("view")

## tmap mode set to interactive viewing

a <- tm_basemap("OpenStreetMap") + tm_shape(sf_joined) + tm_polygons(col = "NPU.x", style = "pretty")
a
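The difference between joining polygons directly and joining their centroids can be seen on a toy example. This is a minimal sketch with made-up square geometries, not the Atlanta data; it assumes the sf package is installed, and the object names zones, nbs, poly_join, and pt_join are hypothetical.

```r
library(sf)

# Two hypothetical square zones that share the boundary x = 2
zone_a <- st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0))))
zone_b <- st_polygon(list(rbind(c(2, 0), c(4, 0), c(4, 2), c(2, 2), c(2, 0))))
zones  <- st_sf(zone = c("A", "B"), geometry = st_sfc(zone_a, zone_b))

# A hypothetical neighborhood that straddles the boundary but lies mostly in zone A
nb  <- st_polygon(list(rbind(c(0.5, 0.5), c(2.5, 0.5), c(2.5, 1.5),
                             c(0.5, 1.5), c(0.5, 0.5))))
nbs <- st_sf(name = "Toy Neighborhood", geometry = st_sfc(nb))

# Polygon-to-polygon join: the neighborhood intersects both zones
poly_join <- st_join(nbs, zones, join = st_intersects)
nrow(poly_join)

# Centroid-to-polygon join: only the zone containing the centroid matches
nb_pts  <- st_centroid(nbs)
pt_join <- st_join(nb_pts, zones, join = st_within)
as.character(pt_join$zone)
```

A polygon that overlaps a boundary is matched to every zone it touches, whereas its centroid falls in exactly one zone, which is the motivation for converting polygons to points before the join.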

Step 10: Read the NPU-level data, with all the variable values, from a CSV file into a data frame.

all_data <- read.csv("NPU_Neighborhood_EconMobility_Pop_Race_MedHHInc_MedHouse_TotalJob_STEMJob (1).csv")
all_data

##    OBJECTID NPU            NEIGHBORHOOD Economic.Mobility.Index
## 1        37   A        Margaret Mitchel                      63
## 2         3   B  Peachtree Heights West                      63
## 3        38   C                Fernleaf                      61
## 4        50   D                  Bolton                      58
## 5        83   E             Ansley Park                      64
## 6        77   F        Piedmont Heights                      59
## 7        65   G Atlanta Industrial Park                      40
## 8        NA   G          West Highlands                      41
## 9        15   I           Beecher Hills                      42
## 10       24   J             Center Hill                      39
## 11       52   K            Hunter Hills                      40
## 12       10   L               Vine City                      45
## 13       12   M        Castleberry Hill                      58
## 14      102   N             Cabbagetown                      60
## 15       92   O               East Lake                      51
## 16       NA   P                Ben Hill                      45
## 17       55   Q         Midwest Cascade                      49
## 18       66   R        Campbellton Road                      33
## 19       19   S           Bush Mountain                      42
## 20       22   T         Ashview Heights                      40
## 21       46   V         Capitol Gateway                      49
## 22      100   W     Grant Park, Oakland                      59
## 23       18   X            Capitol View                      45
## 24       97   Y          Chosewood Park                      41
## 25       47   Z                Lakewood                      36
##    People.Based.Index Place.Based.Index Economic.System.Index
## 1                  71                62                    52
## 2                  69                59                    60
## 3                  65                59                    56
## 4                  64                58                    55
## 5                  66                69                    58
## 6                  60                61                    60
## 7                  38                46                    54
## 8                  36                46                    42
## 9                  41                43                    43
## 10                 31                44                    43
## 11                 37                43                    41
## 12                 45                46                    46
## 13                 49                58                    77
## 14                 66                64                    55
## 15                 61                44                    49
## 16                 45                48                    45
## 17                 46                49                    50
## 18                 26                33                    40
## 19                 42                48                    40
## 20                 43                42                    33
## 21                 46                54                    44
## 22                 62                56                    60
## 23                 45                47                    44
## 24                 43                40                    43
## 25                 38                36                    33
##    Education.System.Index   pop white black asian other hispanic medhhinc
## 1                      69  4061  85.9   5.7   4.4   1.3      2.7   299991
## 2                      65  4874  77.0  14.5   2.9   2.0      3.6   116250
## 3                      62  2662  74.1   9.7   1.6   1.2     13.3   116108
## 4                      56  5314  43.1  27.1   2.4   2.2     25.2   105331
## 5                      65  3350  86.6   8.0   1.5   0.8      3.1   109269
## 6                      57  2834  65.9  20.3   4.4   3.1      6.3   114292
## 7                      61  2083   3.6  92.8   0.6   1.1      2.0    35038
## 8                      41  3628   3.9  91.6   0.6   1.5      2.3    39589
## 9                      42  2881   1.3  95.3   0.1   1.5      1.8    45212
## 10                     37  2730   1.4  95.8   0.2   1.2      1.4    32051
## 11                     40  3836   1.5  96.2   0.2   1.0      1.0    41994
## 12                     45  2818   2.2  92.7   0.3   2.2      2.6    35244
## 13                     48 14560  32.2  53.4   6.8   2.8      4.8    73780
## 14                     56  3750  59.1  31.0   2.3   2.8      4.8   118100
## 15                     51  4046  27.2  67.7   0.9   1.8      2.4    85981
## 16                     43  3826   1.2  94.6   0.5   2.0      1.7    59371
## 17                     51  1898   1.5  96.3   0.9   0.9      0.4    96093
## 18                     34  6721   1.9  95.7   0.0   1.1      1.2    27689
## 19                     41  3672   1.7  95.8   0.2   1.1      1.1    40136
## 20                      4  2072   1.6  95.5   0.0   1.6      1.3   388803
## 21                     51  2874  12.0  81.0   1.8   2.2      3.0    31516
## 22                     60  6827  62.4  27.9   2.2   2.5      5.0   112075
## 23                     44  2648  12.8  82.6   0.4   1.8      2.4    34123
## 24                     36  3995  21.9  55.0   0.5   1.5     21.1    33252
## 25                     37  3135   2.6  84.3   1.0   0.8     11.4    36305
##    medhousevalue totaljob stemjob
## 1         974443     4799   56.80
## 2         606599    26278   55.40
## 3         676221     8540   33.30
## 4         397923    17993   28.60
## 5         467388    97905   42.00
## 6         646887    47137   35.24
## 7         208684     4370   16.80
## 8         100539     3319   22.90
## 9         168759     2647   18.30
## 10        105324     2581   16.20
## 11        104444     2541   28.20
## 12        236130     3002   20.60
## 13        374185   154829   40.80
## 14        567659     9411   12.90
## 15        421026     4161   15.70
## 16        192889     5133   12.80
## 17        325564       37    9.70
## 18        195071     2897    8.20
## 19         94993      904   24.90
## 20        282464     7852   20.50
## 21        231128     3286   16.30
## 22        407383     7018   14.40
## 23        142781    20542   48.00
## 24        179753     1791   48.80
## 25        143342     5707   29.40

Step 10: Combining the sentiment scores with the NPU-level data.

print(total_twts)

## Simple feature collection with 111 features and 23 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -84.53565 ymin: 33.65559 xmax: -84.28962 ymax: 33.88687
## Geodetic CRS:  WGS 84
## First 10 features:
##                NAME OBJECTID LOCALID      GEOTYPE FULLFIPS LEGALAREA EFFECTDATE
## 1        Adams Park       62    <NA> Neighborhood     <NA>      <NA>       <NA>
## 2  Atlantic Station      231    <NA> Neighborhood     <NA>      <NA>       <NA>
## 3          Ben Hill       71    <NA> Neighborhood     <NA>      <NA>       <NA>
## 4            Bolton       75    <NA> Neighborhood     <NA>      <NA>       <NA>
## 5           Brandon       17    <NA> Neighborhood     <NA>      <NA>       <NA>
## 6        Brookhaven      225    <NA> Neighborhood     <NA>      <NA>       <NA>
## 7         Brookwood      207    <NA> Neighborhood     <NA>      <NA>       <NA>
## 8  Buckhead Village       32    <NA> Neighborhood     <NA>      <NA>       <NA>
## 9       Cabbagetown       48    <NA> Neighborhood     <NA>      <NA>       <NA>
## 10 Campbellton Road       60    <NA> Neighborhood     <NA>      <NA>       <NA>
##    ENDDATE SRCREF  ACRES SQMILES          OLDNAME NPU CREATED_US CREATED_DA
## 1     <NA>   <NA> 628.53    0.98       Adams Park   R       <NA>       <NA>
## 2     <NA>   <NA> 163.06    0.25        Home Park   E       <NA>       <NA>
## 3     <NA>   <NA> 685.22    1.07         Ben Hill   P       <NA>       <NA>
## 4     <NA>   <NA> 964.68    1.51           Bolton   D       <NA>       <NA>
## 5     <NA>   <NA> 409.85    0.64          Brandon   C       <NA>       <NA>
## 6     <NA>   <NA> 636.92    1.00       Brookhaven   B       <NA>       <NA>
## 7     <NA>   <NA> 101.17    0.16        Brookwood   E       <NA>       <NA>
## 8     <NA>   <NA> 127.21    0.20 Buckhead Village   B       <NA>       <NA>
## 9     <NA>   <NA> 112.17    0.18     Cabbage Town   N       <NA>       <NA>
## 10    <NA>   <NA> 282.91    0.44 Campbellton Road   R       <NA>       <NA>
##    LAST_EDITE LAST_EDI_1                               GLOBALID SHAPEAREA
## 1         GIS 2022-05-24 {806632F1-D3FC-4DD2-978A-42B625F8C601}  27378543
## 2         GIS 2022-05-24 {0E3C70B8-FB96-4598-9392-35B0449DF8FB}   7103044
## 3         GIS 2022-05-24 {BE4276A0-2F34-4E15-8DDA-F8059ACC9E22}  29848192
## 4         GIS 2022-05-24 {68C7BA9F-7608-4A79-B59E-65E2C4C00FE6}  42021278
## 5         GIS 2022-05-24 {F29E1386-9FEC-4907-AFAC-FBD45A723D0A}  17853189
## 6         GIS 2022-05-24 {CDC5A9FB-29F8-4F51-97DA-ED8D9DB1482E}  27744187
## 7         GIS 2022-05-24 {7160E578-A16B-4F40-A51F-A692B717629C}   4406751
## 8         GIS 2022-05-24 {ED28CC83-BB5C-4EC2-8DA5-F47473E1FAB0}   5541158
## 9         GIS 2022-05-24 {267DA5D6-6421-4870-AA02-137C41930051}   4886223
## 10        GIS 2022-05-24 {7F9C6923-F17A-4C70-9F40-49235294CB69}  12323573
##     SHAPELEN sentiment_ai sentiment_an  n                       geometry
## 1  21028.365  -0.67136337  0.020000000  2 MULTIPOLYGON (((-84.45195 3...
## 2  13535.866  -0.22949672  0.020506518 52 MULTIPOLYGON (((-84.39357 3...
## 3  38492.440   0.28917164  0.100000000  2 MULTIPOLYGON (((-84.52858 3...
## 4  36336.392  -0.58723179  0.005681818  4 MULTIPOLYGON (((-84.45799 3...
## 5  22601.822  -0.03029922  0.004557292 32 MULTIPOLYGON (((-84.41975 3...
## 6  25663.856  -0.32019438  0.031845238 10 MULTIPOLYGON (((-84.34826 3...
## 7  11375.268  -0.06301082  0.041058859 12 MULTIPOLYGON (((-84.39306 3...
## 8  11010.963   0.66483378  0.157894737  1 MULTIPOLYGON (((-84.37131 3...
## 9   9042.749   0.50428414  0.250000000  1 MULTIPOLYGON (((-84.36264 3...
## 10 15863.560   0.25729829  0.000000000  2 MULTIPOLYGON (((-84.4667 33...

all_datum <- merge(total_twts, all_data, by = "NPU", all= TRUE)

tibble(all_datum)

## # A tibble: 117 × 41
##    NPU   NAME     OBJEC…¹ LOCALID GEOTYPE FULLF…² LEGAL…³ EFFEC…⁴ ENDDATE SRCREF
##    <chr> <chr>      <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr> 
##  1 A     Paces        221 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  2 A     Margare…      98 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  3 A     Kingswo…      96 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  4 A     Paces        221 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  5 A     Chastai…     240 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  6 A     Margare…      98 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  7 B     North B…     215 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  8 B     Buckhea…      32 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
##  9 B     Brookha…     225 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
## 10 B     Lenox         89 <NA>    Neighb… <NA>    <NA>    <NA>    <NA>    <NA>  
## # … with 107 more rows, 31 more variables: ACRES <dbl>, SQMILES <dbl>,
## #   OLDNAME <chr>, CREATED_US <chr>, CREATED_DA <date>, LAST_EDITE <chr>,
## #   LAST_EDI_1 <date>, GLOBALID <chr>, SHAPEAREA <dbl>, SHAPELEN <dbl>,
## #   sentiment_ai <dbl>, sentiment_an <dbl>, n <int>, OBJECTID.y <int>,
## #   NEIGHBORHOOD <chr>, Economic.Mobility.Index <int>,
## #   People.Based.Index <int>, Place.Based.Index <int>,
## #   Economic.System.Index <int>, Education.System.Index <int>, pop <int>, …
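
The merge above is a full outer join, so an NPU that contains several neighborhoods contributes several rows, and keys present in only one table are padded with NA. A minimal base R sketch on toy data illustrates this; the data frames `tweets_toy` and `hoods_toy` and their values are hypothetical stand-ins, not the report's data.

```r
# Hypothetical toy tables: one row per NPU on the left, one row per neighborhood on the right
tweets_toy <- data.frame(NPU = c("A", "B", "Z"), n = c(2L, 4L, 1L))
hoods_toy  <- data.frame(NPU = c("A", "A", "B", "C"),
                         NAME = c("hood1", "hood2", "hood3", "hood4"))

# all = TRUE keeps unmatched keys from both sides (full outer join)
merged_toy <- merge(tweets_toy, hoods_toy, by = "NPU", all = TRUE)

# NPU "A" appears twice (one row per neighborhood);
# "C" has no tweet count and "Z" has no neighborhood name
print(merged_toy)
```

Checking `nrow()` and the NA counts after a merge like this is a quick way to see how many keys failed to match.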

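# Average every column within each NPU. Character and geometry columns cannot be
# averaged, so mean() returns NA for them and emits the warnings shown below.
twt_clean_1 <- all_datum %>% group_by(NPU) %>% summarise_at(vars(1:40), mean)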

## Warning in mean.default(NAME): argument is not numeric or logical: returning NA

## Warning in mean.default(LOCALID): argument is not numeric or logical: returning
## NA

## Warning in mean.default(GEOTYPE): argument is not numeric or logical: returning
## NA

## Warning in mean.default(FULLFIPS): argument is not numeric or logical: returning
## NA

## Warning in mean.default(LEGALAREA): argument is not numeric or logical:
## returning NA

## Warning in mean.default(EFFECTDATE): argument is not numeric or logical:
## returning NA

## Warning in mean.default(ENDDATE): argument is not numeric or logical: returning
## NA

## Warning in mean.default(SRCREF): argument is not numeric or logical: returning
## NA

## Warning in mean.default(OLDNAME): argument is not numeric or logical: returning
## NA

## Warning in mean.default(CREATED_US): argument is not numeric or logical:
## returning NA

## Warning in mean.default(LAST_EDITE): argument is not numeric or logical:
## returning NA

## Warning in mean.default(GLOBALID): argument is not numeric or logical: returning
## NA

## Warning in mean.default(NEIGHBORHOOD): argument is not numeric or logical:
## returning NA

## Warning in mean.default(geometry): argument is not numeric or logical: returning
## NA

twt_clean_1

## Simple feature collection with 25 features and 40 fields (with 2 geometries empty)
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -84.53565 ymin: 33.65559 xmax: -84.28962 ymax: 33.88687
## Geodetic CRS:  WGS 84
## # A tibble: 25 × 41
##    NPU    NAME OBJECTID.x LOCALID GEOTYPE FULLF…¹ LEGAL…² EFFEC…³ ENDDATE SRCREF
##    <chr> <dbl>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1 A        NA      162.       NA      NA      NA      NA      NA      NA     NA
##  2 B        NA      141.       NA      NA      NA      NA      NA      NA     NA
##  3 C        NA       68.8      NA      NA      NA      NA      NA      NA     NA
##  4 D        NA       74.2      NA      NA      NA      NA      NA      NA     NA
##  5 E        NA      195.       NA      NA      NA      NA      NA      NA     NA
##  6 F        NA     1424.       NA      NA      NA      NA      NA      NA     NA
##  7 G        NA      154.       NA      NA      NA      NA      NA      NA     NA
##  8 H        NA      122.       NA      NA      NA      NA      NA      NA     NA
##  9 I        NA      161.       NA      NA      NA      NA      NA      NA     NA
## 10 J        NA      130        NA      NA      NA      NA      NA      NA     NA
## # … with 15 more rows, 31 more variables: ACRES <dbl>, SQMILES <dbl>,
## #   OLDNAME <dbl>, CREATED_US <dbl>, CREATED_DA <date>, LAST_EDITE <dbl>,
## #   LAST_EDI_1 <date>, GLOBALID <dbl>, SHAPEAREA <dbl>, SHAPELEN <dbl>,
## #   sentiment_ai <dbl>, sentiment_an <dbl>, n <dbl>, OBJECTID.y <dbl>,
## #   NEIGHBORHOOD <dbl>, Economic.Mobility.Index <dbl>,
## #   People.Based.Index <dbl>, Place.Based.Index <dbl>,
## #   Economic.System.Index <dbl>, Education.System.Index <dbl>, pop <dbl>, …

twt_clean_ <- twt_clean_1 %>%
  select("NPU", "NAME", "sentiment_ai", "n",
         "Economic.Mobility.Index", "People.Based.Index",
         "Economic.System.Index", "Education.System.Index",
         "pop", "white", "black", "asian", "other", "hispanic",
         "medhhinc", "medhousevalue", "totaljob", "stemjob")
twt_clean_

## Simple feature collection with 25 features and 18 fields (with 2 geometries empty)
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -84.53565 ymin: 33.65559 xmax: -84.28962 ymax: 33.88687
## Geodetic CRS:  WGS 84
## # A tibble: 25 × 19
##    NPU    NAME sentime…¹     n Econo…² Peopl…³ Econo…⁴ Educa…⁵   pop white black
##    <chr> <dbl>     <dbl> <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl>
##  1 A        NA   -0.123   3.83    63        71      52      69 4061  85.9    5.7
##  2 B        NA    0.244   7.12    63        69      60      65 4874  77     14.5
##  3 C        NA    0.0395 13.2     61        65      56      62 2662  74.1    9.7
##  4 D        NA   -0.276   4.6     58        64      55      56 5314  43.1   27.1
##  5 E        NA    0.226  26.2     64        66      58      65 3350  86.6    8  
##  6 F        NA    0.437   3.33    59        60      60      57 2834  65.9   20.3
##  7 G        NA   -0.149  29.5     40.5      37      48      51 2856.  3.75  92.2
##  8 H        NA    0.443   7       NA        NA      NA      NA   NA  NA     NA  
##  9 I        NA    0.0267  1       42        41      43      42 2881   1.3   95.3
## 10 J        NA    0.525   2       39        31      43      37 2730   1.4   95.8
## # … with 15 more rows, 8 more variables: asian <dbl>, other <dbl>,
## #   hispanic <dbl>, medhhinc <dbl>, medhousevalue <dbl>, totaljob <dbl>,
## #   stemjob <dbl>, geometry <GEOMETRY [°]>, and abbreviated variable names
## #   ¹sentiment_ai, ²Economic.Mobility.Index, ³People.Based.Index,
## #   ⁴Economic.System.Index, ⁵Education.System.Index

Step 11: Data Analysis and Visualization of all the tweets and sentiment_ai values.

ggplot(data = twt_clean_, mapping = aes(x=n, y=sentiment_ai)) +
  geom_point() +
  geom_smooth(method = "lm",se = FALSE) +
  labs(
    x = "Count_Tweets",
    y = "Avg_Sentiment_Score",
    title = "Tweet patterns in different NPU in Atlanta"
  )

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 2 rows containing missing values (`geom_point()`).

Step 12: Correlation of the mean sentiment score with the other variables

spw <- ggplot(data = twt_clean_, mapping = aes(x=Economic.Mobility.Index, y=sentiment_ai)) +
  geom_point() +
  geom_smooth(method = "lm",se = FALSE) +
  stat_cor(method = "pearson", label.x = 40, label.y = 0.65)

Description: This scatter plot shows a negative correlation between Economic.Mobility.Index and sentiment_ai.
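The correlation that `stat_cor()` prints on a plot can be cross-checked outside ggplot. The sketch below is a minimal illustration on made-up numbers: the vectors `index_score` and `sentiment` are hypothetical stand-ins for an index column and `sentiment_ai`, not the NPU data. It drops incomplete pairs, which the plotting layers also do with a warning, and then runs base R's `cor.test()`.

```r
# Hypothetical values for illustration only (not the NPU data)
index_score <- c(30, 35, 40, 45, 50, NA, 60, 65)
sentiment   <- c(0.40, 0.10, 0.30, -0.10, 0.05, 0.20, NA, -0.30)

# Keep only the pairs where both values are observed
ok <- complete.cases(index_score, sentiment)
n_used <- sum(ok)

# Pearson correlation together with its significance test
ct <- cor.test(index_score[ok], sentiment[ok], method = "pearson")
r_value <- unname(ct$estimate)
p_value <- ct$p.value

cat("pairs used:", n_used, "\n")
cat("r =", round(r_value, 3), " p =", signif(p_value, 3), "\n")
```

With only 25 NPUs, and fewer once missing rows are dropped, the p-value is worth reading alongside r before describing a relationship as positive or negative.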

library(ggplot2)
library(tmap)
library(cowplot)

## Warning: package 'cowplot' was built under R version 4.2.2

## 
## Attaching package: 'cowplot'

## The following object is masked from 'package:ggpubr':
## 
##     get_legend

library(ggplotify)

## Warning: package 'ggplotify' was built under R version 4.2.2

spw1 <- ggplot(data = twt_clean_, mapping = aes(x=Education.System.Index, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  # stat_cor() method names are lowercase; "Kendall" fails in match.arg()
  theme_bw()+ stat_cor(method = "kendall", label.x = 30, label.y = 0.55)
spw1

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).
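Correlation method names in R are matched case-sensitively, so `"kendall"` is accepted while `"Kendall"` is rejected. The sketch below shows this with base R's `cor()`, which validates its `method` argument through `match.arg()`; it is used here as a stand-in on the assumption that `stat_cor()` checks the name the same way. The vectors `x` and `y` are made-up values.

```r
# Made-up values for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# The lowercase method name is accepted
tau <- cor(x, y, method = "kendall")

# The capitalised name is rejected; capture the error instead of stopping
bad <- tryCatch(cor(x, y, method = "Kendall"), error = function(e) e)

cat("Kendall tau:", tau, "\n")
cat("capitalised name raised an error:", inherits(bad, "error"), "\n")
```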

spw2 <- ggplot(data = twt_clean_, mapping = aes(x=stemjob, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red")+ 
  theme_bw()+ stat_cor(method = "pearson", label.x = 30, label.y = 0.55)
spw2

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

spw3 <- ggplot(data = twt_clean_, mapping = aes(x=People.Based.Index, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+ stat_cor(method = "pearson", label.x = 30, label.y = 0.55)
spw3

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

spw4 <- ggplot(data = twt_clean_, mapping = aes(x=white, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+ stat_cor(method = "pearson", label.x = 30, label.y = 0.55)
spw4

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

tmap_mode()

## current tmap mode is "view"

plot_grid(spw1, spw2, spw3, spw4 )

spw5 <- ggplot(data = twt_clean_, 
               mapping = aes(x= black, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") + 
  theme_bw()+ 
  stat_cor(method = "pearson", label.x = 30, label.y = 0.55)
spw5

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

spw6 <- ggplot(data = twt_clean_, mapping = aes(x= asian, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+
  stat_cor(method = "pearson", label.x = 2, label.y = 0.55)
spw6

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

spw7 <- ggplot(data = twt_clean_, mapping = aes(x= other, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+
  stat_cor(method = "pearson", label.x = 2, label.y = 0.55)
spw7

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

spw8 <- ggplot(data = twt_clean_, mapping = aes(x= hispanic, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+
  stat_cor(method = "pearson", label.x = 2, label.y = 0.55)
spw8

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

plot_grid(spw5, spw6, spw7, spw8)

spw9 <- ggplot(data = twt_clean_, mapping = aes(x= medhhinc, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+
  stat_cor(method = "pearson", label.x = 75000, label.y = 0.55)


spw10 <- ggplot(data = twt_clean_, mapping = aes(x= medhousevalue, y=sentiment_ai)) +
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm",se = FALSE, color= "red") +
  theme_bw()+
  stat_cor(method = "pearson", label.x = 300000, label.y = 0.55)

plot_grid(spw9, spw10)

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

library(MASS)

## Warning: package 'MASS' was built under R version 4.2.2

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

lm_model <- lm(sentiment_ai ~ NPU+ Economic.Mobility.Index + People.Based.Index +Education.System.Index + pop+ white+ black+ asian+other+hispanic+medhhinc+medhousevalue+ stemjob, data = twt_clean_)
summary(lm_model)

## 
## Call:
## lm(formula = sentiment_ai ~ NPU + Economic.Mobility.Index + People.Based.Index + 
##     Education.System.Index + pop + white + black + asian + other + 
##     hispanic + medhhinc + medhousevalue + stemjob, data = twt_clean_)
## 
## Residuals:
## ALL 22 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (12 not defined because of singularities)
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)              -0.1234        NaN     NaN      NaN
## NPUB                      0.3672        NaN     NaN      NaN
## NPUC                      0.1629        NaN     NaN      NaN
## NPUD                     -0.1529        NaN     NaN      NaN
## NPUE                      0.3497        NaN     NaN      NaN
## NPUF                      0.5600        NaN     NaN      NaN
## NPUG                     -0.0258        NaN     NaN      NaN
## NPUI                      0.1501        NaN     NaN      NaN
## NPUJ                      0.6481        NaN     NaN      NaN
## NPUK                      0.8700        NaN     NaN      NaN
## NPUL                      0.1206        NaN     NaN      NaN
## NPUM                      0.3951        NaN     NaN      NaN
## NPUN                      0.6389        NaN     NaN      NaN
## NPUO                      0.5216        NaN     NaN      NaN
## NPUP                      0.3858        NaN     NaN      NaN
## NPUR                      0.3724        NaN     NaN      NaN
## NPUS                      0.3152        NaN     NaN      NaN
## NPUT                      0.6116        NaN     NaN      NaN
## NPUV                      0.3482        NaN     NaN      NaN
## NPUW                      0.3942        NaN     NaN      NaN
## NPUY                      0.4967        NaN     NaN      NaN
## NPUZ                      0.5994        NaN     NaN      NaN
## Economic.Mobility.Index       NA         NA      NA       NA
## People.Based.Index            NA         NA      NA       NA
## Education.System.Index        NA         NA      NA       NA
## pop                           NA         NA      NA       NA
## white                         NA         NA      NA       NA
## black                         NA         NA      NA       NA
## asian                         NA         NA      NA       NA
## other                         NA         NA      NA       NA
## hispanic                      NA         NA      NA       NA
## medhhinc                      NA         NA      NA       NA
## medhousevalue                 NA         NA      NA       NA
## stemjob                       NA         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 21 and 0 DF,  p-value: NA
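The summary above reports zero residual degrees of freedom and leaves all twelve numeric predictors undefined. A likely cause is that `NPU` takes a distinct value in every row, so its dummy variables alone reproduce each observation and nothing is left for the other predictors to explain. The sketch below reproduces that mechanism on synthetic data (the data frame `toy` and its columns are hypothetical) and shows that dropping the identifier restores residual degrees of freedom. It illustrates the effect only and is not a re-analysis of the NPU data.

```r
set.seed(1)

# Synthetic data: one unique label per row, playing the role of NPU
toy <- data.frame(id = letters[1:10], x = 1:10)
toy$y <- 0.5 * toy$x + rnorm(10)

# Saturated model: one dummy per row absorbs all the variation
fit_with_id <- lm(y ~ id + x, data = toy)

# Same response without the row identifier
fit_without_id <- lm(y ~ x, data = toy)

df_with_id    <- df.residual(fit_with_id)
df_without_id <- df.residual(fit_without_id)
x_coef_with_id <- coef(fit_with_id)[["x"]]

cat("residual df with id:   ", df_with_id, "\n")
cat("residual df without id:", df_without_id, "\n")
cat("x coefficient undefined with id:", is.na(x_coef_with_id), "\n")
```

Even without `NPU`, twelve predictors fitted to 22 complete rows would leave few residual degrees of freedom, so a smaller set of predictors may be easier to interpret.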

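# Note: x is the arithmetic sum of the listed columns (one composite value per NPU),
# so this plot relates sentiment_ai to that sum rather than showing a multivariate fit
lm_model_1 <- ggplot(data = twt_clean_, mapping = aes(
  x = n + Economic.Mobility.Index + People.Based.Index + Economic.System.Index + Education.System.Index + pop + white + black + asian + other + hispanic + medhhinc + medhousevalue + stemjob,
  y = sentiment_ai)) +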
  geom_point(alpha= 0.4, size = 3) +
  geom_smooth(method = "lm", color= "red") +
  theme_bw()+
  stat_cor(method = "pearson", label.x = 500000, label.y = 0.55)

lm_model_1

## `geom_smooth()` using formula = 'y ~ x'

## Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).

## Warning: Removed 3 rows containing non-finite values (`stat_cor()`).

## Warning: Removed 3 rows containing missing values (`geom_point()`).

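# predict() omits the rows that lm() dropped for missing values, so index the
# fitted values by row name to keep each prediction aligned with its NPU
# (rows that were dropped become NA)
pred_lm_ <- unname(predict(lm_model)[rownames(twt_clean_)])

plot_data <- data.frame(Predicted_value_sent = pred_lm_,  
                       Observed_value_sent = twt_clean_$sentiment_ai, 
                       twt_clean_$NPU)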

ggplot(plot_data, aes(x = Predicted_value_sent, y = Observed_value_sent)) +
                  geom_point(alpha= 0.4, size = 3) +
                 geom_abline(intercept = 0, slope = 1, color = "green")

## Warning: Removed 4 rows containing missing values (`geom_point()`).

Add a column of predicted sentiment values to the tweet data.

names(plot_data)[names(plot_data) == 'twt_clean_.NPU'] <- 'NPU'


twt_clean_2 <- merge(twt_clean_, plot_data, by = "NPU")
a_1 <- tm_basemap("OpenStreetMap") + tm_shape(twt_clean_2) + tm_polygons(col = "Predicted_value_sent", style = "pretty")
a_1

## Warning: The shape twt_clean_2 contains empty units.

## Variable(s) "Predicted_value_sent" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

b_1 <- tm_basemap("OpenStreetMap")+tm_shape(total_twts) + 
  tm_polygons(col = "sentiment_ai", style = "quantile")
b_1

## Variable(s) "sentiment_ai" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

tmap_arrange(a_1, b_1)

## Warning: The shape twt_clean_2 contains empty units.

## Variable(s) "Predicted_value_sent" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

## Variable(s) "sentiment_ai" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Untitled

Abhilasha

2022-12-11

Step 2. Neighborhood Shapefile

Step 3. Initiate Sentiment.ai in the environment

Step 4. Looping through neighborhood names to get Tweets

Step 7. Clean and filter the collected Tweets.