library(tidyverse)
## ── Attaching packages ────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(tidycensus)
library(mapview)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
census_api_key("YOUR_CENSUS_API_KEY", install = TRUE, overwrite = TRUE)  # replace with your own key; never publish it
## Your original .Renviron will be backed up and stored in your R HOME directory if needed.
## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY").
## To use now, restart R or run `readRenviron("~/.Renviron")`
readRenviron("~/.Renviron")
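To confirm that the key was picked up after `readRenviron()` without printing the key itself, a minimal check can read the environment variable that the tidycensus message names. The helper `has_census_key()` is hypothetical, not part of tidycensus.

```r
# Hypothetical helper: TRUE when a Census API key is available in this session.
# tidycensus reports that the key is stored under CENSUS_API_KEY in .Renviron.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

# Warn early instead of failing later inside get_acs()
if (!has_census_key()) {
  warning("CENSUS_API_KEY is not set; run readRenviron(\"~/.Renviron\") or restart R.")
}
```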
While English is the most spoken language in the United States, Spanish has been on the rise for several decades. Spanish translations appear on public transportation signs, in movies, and in product manuals. As a result, English proficiency, while beneficial, is not strictly necessary for residents of the United States. In addition, services such as Google Translate make it easier for English and non-English speakers to interact. The popularity of the Spanish language, along with advances in natural language processing, has made it possible for many Spanish speakers to live, commute, work, and socialize in the United States with little English proficiency.
# Total population (ACS table B01003) for every New Jersey county, with geometry for mapping
nj_pop <-
  get_acs(geography = "county",
          variables = "B01003_001",
          state = "NJ",
          geometry = TRUE)
## Getting data from the 2013-2017 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
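The download message above suggests caching shapefiles so that later sessions do not fetch the county boundaries again. A minimal sketch, placed near the top of the script before the first `get_acs()` call with `geometry = TRUE`:

```r
# Cache TIGER/Line shapefiles on disk so later calls with geometry = TRUE
# reuse them instead of downloading them again (option named in the message above)
options(tigris_use_cache = TRUE)
```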
mapview(nj_pop, zcol = "estimate")
The map above shades each New Jersey county by its total population. Bergen, Essex, and Middlesex Counties are the three most populous counties in the state. Heavily populated areas tend to be the most culturally diverse.
NEWJERSEY <- get_acs(geography = "state", year = 2017,
variables = c(Argentina = "B05006_149", Uruguay = "B05006_157", Chile = "B05006_152", Brazil = "B05006_151", Bolivia = "B05006_150", Colombia = "B05006_153", Ecuador = "B05006_154", Peru = "B05006_156", Venezuela = "B05006_158"), state = 34)
## Getting data from the 2013-2017 5-year ACS
ggplot(NEWJERSEY, aes(x = reorder(variable, -estimate), y =estimate / 1000, color = variable, fill = variable))+
theme(axis.text.x=element_text(angle=90,hjust=1)) +
geom_col()+
ggtitle("PLACE OF BIRTH FOR THE \n FOREIGN-BORN POPULATION IN NJ")+
xlab("South American Countries of Birth")+
ylab("Population (In Thousands)")
The graph above looks at New Jersey's foreign-born population, focusing on residents born in South American countries. The largest group was born in Colombia, while the smallest was born in Bolivia. Being born in one of these countries does not necessarily mean that a respondent speaks Spanish as a first language or that English is not their native language; Brazil, where Portuguese is the official language, is also included. This graph is meant only as a rough look at New Jersey's foreign-born, largely Spanish-speaking population, to see whether the same patterns appear in the next graph, which covers the whole U.S. The graph also excludes Central American countries, since its focus is South America.
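Both the New Jersey and U.S. requests pass the same named vector of B05006 codes, and `get_acs()` uses the names as the `variable` column. A sketch, assuming one wants to define that vector once and, optionally, drop Brazil to keep only Spanish-speaking countries (the plots above do not do this; `south_america` and `spanish_speaking` are hypothetical names):

```r
# ACS table B05006 (place of birth for the foreign-born population) codes used above,
# defined once so the New Jersey and U.S. requests share the same list
south_america <- c(Argentina = "B05006_149", Uruguay = "B05006_157",
                   Chile = "B05006_152", Brazil = "B05006_151",
                   Bolivia = "B05006_150", Colombia = "B05006_153",
                   Ecuador = "B05006_154", Peru = "B05006_156",
                   Venezuela = "B05006_158")

# Optional: drop Brazil (Portuguese-speaking) to keep Spanish-speaking countries only
spanish_speaking <- south_america[names(south_america) != "Brazil"]

# The named vector could then be passed as, e.g.,
# get_acs(geography = "state", year = 2017, variables = spanish_speaking, state = 34)
```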
US <- get_acs(geography = "us", year = 2017,
variables = c(Argentina = "B05006_149", Uruguay = "B05006_157", Chile = "B05006_152", Brazil = "B05006_151", Bolivia = "B05006_150", Colombia = "B05006_153", Ecuador = "B05006_154", Peru = "B05006_156", Venezuela = "B05006_158"))
## Getting data from the 2013-2017 5-year ACS
ggplot(US, aes(x = reorder(variable, -estimate), y =estimate / 1000, color = variable, fill = variable))+
theme(axis.text.x=element_text(angle=90,hjust=1)) +
geom_col()+
ggtitle("PLACE OF BIRTH FOR THE FOREIGN-BORN \n POPULATION IN THE UNITED STATES")+
xlab("South American Countries of Birth")+
ylab("Population (In Thousands)")
The graph above shows that several patterns from the New Jersey graph hold nationally. Colombia remains the most common country of birth, and Peru and Ecuador remain in the top three. Bolivia and Chile stay on the lower end, while Uruguay has the smallest foreign-born population in the U.S. These rankings roughly track the countries' own populations: Colombia has over 49 million people, while Uruguay has only about 3 million.
englishprof <- get_acs(geography = "state", year = 2017,state = c("texas","new jersey", "new york","california","florida"),
variables = c(very_well = "B06007_036", not_very_well = "B06007_037")) %>%
filter(estimate > 4000)
## Getting data from the 2013-2017 5-year ACS
ggplot() + geom_bar(aes(y = (estimate/1000000), x = reorder(NAME,-estimate), fill = variable), data = englishprof, stat="identity")+
theme(axis.text.x=element_text(angle=90,hjust=1))+
ggtitle("English Proficiency by State \n (Highest Population of Spanish-Speakers)")+
xlab("State")+
ylab("Population (Millions)")
The graph above shows foreign-born Spanish speakers in the five states with the highest Spanish-speaking populations. The pink portion of each bar is the number of people who report speaking English less than “very well”, and the blue portion is the number who report speaking English “very well”. California appears to have far more people with low English proficiency, but the bars show raw counts, so they largely reflect California's much larger population; a fair comparison between states requires percentages.
test <- get_acs(geography = "state", year = 2017,state = c("new jersey", "texas","california","florida", "new york"),variables = c(very_well = "B06007_036", not_very_well = "B06007_037"), summary_var = "B06007_035")
## Getting data from the 2013-2017 5-year ACS
head(test)
## # A tibble: 6 x 7
## GEOID NAME variable estimate moe summary_est summary_moe
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 06 California very_well 1510026 8503 5101945 21037
## 2 06 California not_very_well 3591919 18790 5101945 21037
## 3 12 Florida very_well 827649 9324 2255733 11882
## 4 12 Florida not_very_well 1428084 11787 2255733 11882
## 5 34 New Jersey very_well 231473 3565 712758 6574
## 6 34 New Jersey not_very_well 481285 5854 712758 6574
test %>%
  mutate(pctvs = 100 * (estimate / summary_est)) %>%
  ggplot(aes(x = NAME, y = pctvs, fill = variable)) +
  # position is a geom argument, not an aesthetic; the stacked shares sum to 100
  geom_col(position = "stack") +
  # Label each segment at its vertical midpoint
  geom_text(aes(label = round(pctvs, 2)), position = position_stack(vjust = 0.5), color = "white") +
  scale_fill_manual(values = c("#a1d99b", "#31a354")) +
  ggtitle("English Proficiency by State \n (Highest Population of Spanish-Speakers)") +
  xlab("State") +
  ylab("Percentage")
Although over 3 million foreign-born Spanish speakers in California report low English proficiency, the share with low proficiency is fairly consistent across all five states: roughly 63% to 71% of foreign-born Spanish speakers in these states report speaking English less than “very well”.
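The 63% to 71% range can be checked against the rows printed by `head(test)` above. A minimal base-R sketch, with values copied from that output (New York and Texas are not among the printed rows, so they are left out):

```r
# Values copied from head(test): foreign-born Spanish speakers (summary_est)
# and those who speak English less than "very well" (not_very_well)
prof <- data.frame(
  state         = c("California", "Florida", "New Jersey"),
  not_very_well = c(3591919, 1428084, 481285),
  total         = c(5101945, 2255733, 712758)
)

# Share reporting low proficiency, in percent
prof$pct_low <- 100 * prof$not_very_well / prof$total
print(prof[, c("state", "pct_low")])

# Raw counts differ by about 7.5x between California and New Jersey,
# while the shares differ by only a few percentage points
count_ratio <- prof$not_very_well[1] / prof$not_very_well[3]
share_gap   <- prof$pct_low[1] - prof$pct_low[3]
```

This is the same normalization the percentage plot performs with `summary_var`, and it shows why the raw-count bars overstate California's difference.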
test1 <- get_acs(geography = "state", year = 2017,state = c("maine", "south dakota","north dakota","vermont", "west virginia"),variables = c(high_proficiency = "B06007_036",low_proficiency = "B06007_037"), summary_var = "B06007_035")
## Getting data from the 2013-2017 5-year ACS
test1 %>%
  mutate(pctv = 100 * (estimate / summary_est)) %>%
  ggplot() +
  # position is a geom argument, not an aesthetic; the stacked shares sum to 100
  geom_col(aes(x = NAME, y = pctv, fill = variable), position = "stack") +
  labs(y = "Percentage", x = "State") +
  scale_fill_manual(values = c("#a1d99b", "#31a354")) +
  ggtitle("English Proficiency by State \n (Lowest Population of Spanish Speakers)")
Among the states with the fewest Spanish speakers, the percentages change. The share of people who report speaking English very well rises from about 30% to an average of about 40% (Vermont exceeds 65%). This may suggest that when Spanish speakers do not live in a state with a large Hispanic population, they have a stronger need to become proficient in English in order to interact with the community. South Dakota may not fit this pattern, but it has the lowest number of immigrants of all the states, so the survey likely captured few respondents from this group and its estimates are less reliable.
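One way to judge whether a small-state percentage is trustworthy is to attach a margin of error to it. tidycensus provides `moe_prop()` for this; the base-R function below, `prop_moe()`, is a hypothetical stand-in that sketches the Census Bureau's approximation for a proportion whose numerator is a subset of its denominator. It is illustrated with the New Jersey row from `head(test)`, since the small-state rows are not printed here.

```r
# Hypothetical helper mirroring the Census Bureau approximation for the margin
# of error (90% level, like ACS MOEs) of a proportion p = num / denom
prop_moe <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  under_root <- moe_num^2 - p^2 * moe_denom^2
  # If the subtraction goes negative, the Bureau advises the ratio formula (+) instead
  if (under_root < 0) under_root <- moe_num^2 + p^2 * moe_denom^2
  sqrt(under_root) / denom
}

# New Jersey, from head(test): "very well" estimate and MOE, then the total and its MOE
nj_share <- 231473 / 712758
nj_moe   <- prop_moe(231473, 712758, 3565, 6574)

# Report as a percentage with its margin of error
sprintf("%.1f%% +/- %.1f points", 100 * nj_share, 100 * nj_moe)
```

For a large state the margin is a small fraction of the share; a small-state share whose margin is a large fraction of the share itself would support treating that state's result with caution.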
test2 <- get_acs(geography = "county", year = 2017,state = c("new jersey"),variables = c(High_Proficiency = "B06007_036", Low_Proficiency = "B06007_037"), summary_var = "B06007_035", geometry = TRUE)
## Getting data from the 2013-2017 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
test2 %>%
mutate(percentage = 100 * (estimate / summary_est)) %>%
ggplot(aes(fill = percentage)) +
facet_wrap(~variable) +
geom_sf(color = NA) +
coord_sf(crs = 26918) +  # NAD83 / UTM zone 18N, which covers New Jersey (26915 is zone 15N, in the central U.S.)
ggtitle("English Proficiency by County (NJ)")+
scale_fill_viridis_c()
Here we take a closer look at New Jersey's Spanish-speaking population. The left map shows, for each county, the share of foreign-born Spanish-speaking residents who report speaking English “very well”; the right map shows the share who report speaking it less than “very well”. Lighter counties have higher shares and darker counties lower ones. No county in New Jersey exceeds 60% high English proficiency, which aligns with the earlier statewide figure for New Jersey, where about 32% of foreign-born Spanish speakers speak English “very well”. This again suggests that in a state such as New Jersey, with a large Spanish-speaking immigrant population, high English proficiency is not the norm within this group.
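The two panels carry the same information. In the rows printed by `head(test)`, the “very well” (`B06007_036`) and less than “very well” (`B06007_037`) counts add up exactly to the Spanish-speaking total (`B06007_035`). Assuming the same additivity holds for the county rows, each county's two percentages sum to 100, so the right map mirrors the left. A quick check on the printed state rows:

```r
# State-level rows copied from head(test)
very_well     <- c(California = 1510026, Florida = 827649,  `New Jersey` = 231473)
not_very_well <- c(California = 3591919, Florida = 1428084, `New Jersey` = 481285)
summary_est   <- c(California = 5101945, Florida = 2255733, `New Jersey` = 712758)

# The two proficiency groups exhaust the total
stopifnot(all(very_well + not_very_well == summary_est))

# So the high-proficiency share is 100 minus the low-proficiency share
pct_high <- 100 * very_well / summary_est
pct_low  <- 100 * not_very_well / summary_est
round(pct_high + pct_low, 6)
```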
# Conclusion

In the United States, Spanish is the second most spoken language after English, and as a result many resources are available to Spanish speakers. In the states with the largest Spanish-speaking populations, approximately 30% of foreign-born Spanish speakers report speaking English “very well”. In the states with the smallest Spanish-speaking populations, that share is roughly 10 percentage points higher. Areas with few Spanish speakers may have little need to accommodate this group with Spanish translations or Spanish-language customer service, so Spanish speakers there may feel a stronger push to learn English in order to communicate with their community.
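The gap between roughly 30% and roughly 40% high proficiency can be expressed in two ways, and the two are easy to confuse. A small sketch using the approximate shares quoted in the text:

```r
# Approximate shares reporting high English proficiency, taken from the text:
# about 30% in the high-population states, about 40% in the low-population states
high_pop_share <- 30
low_pop_share  <- 40

# Absolute difference, in percentage points
pp_diff <- low_pop_share - high_pop_share

# Relative difference, in percent of the starting share
rel_diff <- 100 * (low_pop_share - high_pop_share) / high_pop_share
```

The absolute gap is 10 percentage points, while relative to the 30% starting share it is an increase of about one third.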