Part #1-This map shows the percentage of renter-occupied households by county, using county-level ACS data for Tennessee filtered to the Nashville MSA. The highest percentage of renter-occupied households in this data set can be seen in Davidson County, which can be attributed to the fact that it contains Nashville, a major metropolitan city that likely has more renters than homeowners.
##################################################################
# County level, Nashville MSA
##################################################################
# Installing and loading required packages
if (!require("tidyverse")) install.packages("tidyverse")
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require("tidycensus")) install.packages("tidycensus")
## Loading required package: tidycensus
if (!require("sf")) install.packages("sf")
## Loading required package: sf
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
if (!require("mapview")) install.packages("mapview")
## Loading required package: mapview
## Warning: package 'mapview' was built under R version 4.3.3
library(tidyverse)
library(tidycensus)
library(sf)
library(mapview)
# Transmitting API key
census_api_key("YOUR_CENSUS_API_KEY")  # use your own Census API key here
## To install your API key for use in future sessions, run this function with `install = TRUE`.
# Fetching ACS codebooks
DetailedTables <- load_variables(2022, "acs5", cache = TRUE)
SubjectTables <- load_variables(2022, "acs5/subject", cache = TRUE)
ProfileTables <- load_variables(2022, "acs5/profile", cache = TRUE)
All_ACS_Variables <- bind_rows(DetailedTables, ProfileTables)
All_ACS_Variables <- bind_rows(All_ACS_Variables, SubjectTables)
rm(DetailedTables, SubjectTables, ProfileTables)
# Specify a variable to estimate
# DP04_0047P = percent of occupied housing units that are renter-occupied
VariableList <- c(Estimate_ = "DP04_0047P")
# Fetching data
mydata <- get_acs(
  geography = "county",
  state = "TN",
  variables = VariableList,
  year = 2022,
  survey = "acs5",
  output = "wide",
  geometry = TRUE)
## Getting data from the 2018-2022 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Using the ACS Data Profile
# Reformatting data
mydata <- separate_wider_delim(mydata,
                               NAME,
                               delim = ", ",
                               names = c("County", "State"))
# Filtering data
mydata <- mydata %>%
  filter(County %in% c("Cheatham County",
                       "Davidson County",
                       "Dickson County",
                       "Robertson County",
                       "Rutherford County",
                       "Sumner County",
                       "Williamson County",
                       "Wilson County"))
# Mapping data
mapdata <- mydata %>%
  rename(Estimate = Estimate_E, Estimate_MOE = Estimate_M)
mapdata <- st_as_sf(mapdata)
mapviewOptions(basemaps.color.shuffle = FALSE)
mapview(mapdata, zcol = "Estimate",
        layer.name = "Estimate",
        popup = TRUE)
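# Optional sketch (not part of the original template): save the interactive
# map as a standalone HTML file with mapview's mapshot(); the file name
# below is just an example.
m <- mapview(mapdata, zcol = "Estimate", layer.name = "Estimate")
mapshot(m, url = "renter_map.html")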
# Exporting data in .csv format
CSVdata <- st_drop_geometry(mapdata)
write.csv(CSVdata, "mydata.csv", row.names = FALSE)
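# Quick check on the claim that Davidson County has the highest renter
# percentage among the counties kept above (a small sketch using the
# CSVdata frame just created):
CSVdata %>%
  arrange(desc(Estimate)) %>%
  select(County, Estimate) %>%
  head(3)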
Part #2
# Install and load tidyverse
if (!require("tidyverse")) install.packages("tidyverse")
library(tidyverse)
# Read the data
# NOTE: You may edit the URL to load a different dataset
mydata <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/SocialData.csv")
head(mydata,10)
## ID Type Impressions
## 1 1 Photo 695
## 2 2 Text 940
## 3 3 Photo 1196
## 4 4 Photo 936
## 5 5 Photo 1389
## 6 6 Photo 857
## 7 7 Text 797
## 8 8 Photo 1810
## 9 9 Photo 1086
## 10 10 Video 1416
# Specify the DV and IV
# NOTE: You may edit the Impressions and Type variable names
mydata$DV <- mydata$Impressions
mydata$IV <- mydata$Type
# Graph the group distributions and averages
averages <- group_by(mydata, IV) %>%
  summarise(mean = mean(DV, na.rm = TRUE))
ggplot(mydata, aes(x = DV)) +
  geom_histogram(color = "black", fill = "#1f78b4") +
  facet_grid(IV ~ .) +
  geom_vline(data = averages, aes(xintercept = mean))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Calculate and show the group counts, means, standard
# deviations, minimums, and maximums
group_by(mydata, IV) %>%
  summarise(
    count = n(),
    mean = mean(DV, na.rm = TRUE),
    sd = sd(DV, na.rm = TRUE),
    min = min(DV, na.rm = TRUE),
    max = max(DV, na.rm = TRUE))
## # A tibble: 3 × 6
## IV count mean sd min max
## <chr> <int> <dbl> <dbl> <int> <int>
## 1 Photo 58 1035. 297. 397 1810
## 2 Text 43 999. 278. 515 1746
## 3 Video 39 1370. 307. 829 1952
options(scipen = 999)
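# Welch's one-way test: var.equal = FALSE avoids assuming equal variances
# across the three post-type groups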
oneway.test(mydata$DV ~ mydata$IV,
var.equal = FALSE)
##
## One-way analysis of means (not assuming equal variances)
##
## data: mydata$DV and mydata$IV
## F = 19.119, num df = 2.000, denom df = 85.525, p-value = 0.000000137
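# Optional sketch (not in the original template): eta-squared as a rough
# effect size for post type, fit with the same DV and IV columns.
fit <- aov(DV ~ IV, data = mydata)
ss <- summary(fit)[[1]][["Sum Sq"]]
ss[1] / sum(ss)  # share of variance in impressions explained by post type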
# If the ANOVA detects significant difference, run
# this post-hoc procedure to learn which
# group pairs differed significantly.
anova_1 <- aov(mydata$DV ~ mydata$IV)
TukeyHSD(anova_1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mydata$DV ~ mydata$IV)
##
## $`mydata$IV`
## diff lwr upr p adj
## Text-Photo -36.35605 -176.6202 103.9081 0.8126345
## Video-Photo 334.87710 190.5414 479.2128 0.0000005
## Video-Text 371.23315 217.1076 525.3587 0.0000002
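-I was able to conclude that there were 39 posts with video in them. Judging from the group means and the Tukey results above, video posts averaged about 1,370 impressions, compared with roughly 1,035 for photos and 999 for text, and the video-photo and video-text differences are statistically significant while photo and text do not differ significantly from each other. Overall, video appears to have made a noticeable difference in the reach of the team’s social media posts.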
Part #3
# Load packages
if (!require("tidyverse")) install.packages("tidyverse")
if (!require("tidytext")) install.packages("tidytext")
## Loading required package: tidytext
## Warning: package 'tidytext' was built under R version 4.3.3
library(tidyverse)
library(tidytext)
# Read the data
mydata <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/WhiteHouse.csv")
# Extract individual words to a "tidytext" data frame
# (single-word tokens, so the stop-word removal below can match them)
tidy_text <- mydata %>%
  unnest_tokens(word, Full.Text) %>%
  count(word, sort = TRUE)
# Delete standard stop words
data("stop_words")
tidy_text <- tidy_text %>%
  anti_join(stop_words)
## Joining with `by = join_by(word)`
# Delete custom stop words
my_stopwords <- tibble(word = c("https",
                                "t.co",
                                "rt"))
tidy_text <- tidy_text %>%
  anti_join(my_stopwords)
## Joining with `by = join_by(word)`
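# Optional sketch: inspect the most frequent remaining terms
# (output omitted; the counts depend on the data)
head(tidy_text, 10)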
# Define search terms and count items that include them
# Health-related terms are used here
searchterms <- "health|care|debt"
mydata$HealthTerms <- ifelse(grepl(searchterms,
                                   mydata$Full.Text,
                                   ignore.case = TRUE), 1, 0)
sum(mydata$HealthTerms)
## [1] 857
sum(mydata$HealthTerms)/5508  # 5508 = total number of posts in the data set
## [1] 0.1555919
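# Note: the pattern above also matches words that merely contain a search
# term (e.g., "careful" contains "care"). A stricter count could use word
# boundaries (a sketch; the resulting count would differ):
strictterms <- "\\bhealth\\b|\\bcare\\b|\\bdebt\\b"
sum(grepl(strictterms, mydata$Full.Text, ignore.case = TRUE))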
-For Part #3, I decided to focus on the relevance of healthcare discussion in White House posts. A total of 857 posts mentioned health, care, or debt, which works out to about 15.6% of the 5,508 posts overall. This could reflect recent initiatives to improve health care or legislative updates, but it shows how health and overall wellness are being discussed.