This R Notebook analyzes and visualizes data gathered from the website of the US Census Bureau. Its aim is to give the reader insight into the Languages Spoken at Home in Passaic, NJ (2015). To achieve this, charts and data drawn from the dataset are presented throughout the notebook.
The languages_in_Passaic dataset, short for
Languages Spoken at Home in Passaic, NJ (2015),
describes the languages spoken at home in Passaic County, NJ, by census tract.
The data comes from the American Community Survey (ACS) 5-year
estimates, which cover 2011 to 2015. To keep the data visualizations
simple and clean, only four languages other than English were chosen
for this assignment: Spanish, French (incl. Patois, Cajun), French
Creole, and Italian.
Additional Information: French and French Creole are considered two different languages.
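The code chunk that produced the output below is not shown, so the following is only a minimal sketch of how such a table could be requested with the tidycensus package. The variable codes are taken from the printed output; the object name `language_labels`, the helper `fetch_languages_in_passaic()`, and the pairing of each code with a label (based on ACS table B16001) are assumptions made for illustration. A Census API key, set once with `tidycensus::census_api_key()`, is assumed to be available.

```r
# Lookup from ACS variable code to the language label used in this notebook.
# The codes appear in the printed output; the pairing is an assumption
# based on ACS table B16001.
language_labels <- c(
  B16001_002 = "English",
  B16001_003 = "Spanish",
  B16001_006 = "French (incl. Patois, Cajun)",
  B16001_009 = "French Creole",
  B16001_012 = "Italian"
)

# Hypothetical helper: requests tract-level estimates with geometry for
# Passaic County, NJ, from the 2011-2015 5-year ACS.
# It needs network access and a Census API key, so it is only defined here.
fetch_languages_in_passaic <- function() {
  tidycensus::get_acs(
    geography = "tract",
    variables = names(language_labels),
    state     = "NJ",
    county    = "Passaic",
    year      = 2015,
    survey    = "acs5",
    geometry  = TRUE
  )
}
```

Calling `languages_in_Passaic <- fetch_languages_in_passaic()` would then be expected to return one row per census tract and variable, with `estimate` and `moe` columns as in the output below.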
## Simple feature collection with 500 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.50288 ymin: 40.81996 xmax: -74.10679 ymax: 41.20341
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME variable
## 1 34031124601 Census Tract 1246.01, Passaic County, New Jersey B16001_002
## 2 34031124601 Census Tract 1246.01, Passaic County, New Jersey B16001_003
## 3 34031124601 Census Tract 1246.01, Passaic County, New Jersey B16001_006
## 4 34031124601 Census Tract 1246.01, Passaic County, New Jersey B16001_009
## 5 34031124601 Census Tract 1246.01, Passaic County, New Jersey B16001_012
## 6 34031175200 Census Tract 1752, Passaic County, New Jersey B16001_002
## 7 34031175200 Census Tract 1752, Passaic County, New Jersey B16001_003
## 8 34031175200 Census Tract 1752, Passaic County, New Jersey B16001_006
## 9 34031175200 Census Tract 1752, Passaic County, New Jersey B16001_009
## 10 34031175200 Census Tract 1752, Passaic County, New Jersey B16001_012
## estimate moe geometry
## 1 1510 259 MULTIPOLYGON (((-74.16573 4...
## 2 1549 349 MULTIPOLYGON (((-74.16573 4...
## 3 6 13 MULTIPOLYGON (((-74.16573 4...
## 4 0 12 MULTIPOLYGON (((-74.16573 4...
## 5 43 37 MULTIPOLYGON (((-74.16573 4...
## 6 372 141 MULTIPOLYGON (((-74.1178 40...
## 7 3796 513 MULTIPOLYGON (((-74.1178 40...
## 8 1 2 MULTIPOLYGON (((-74.1178 40...
## 9 0 17 MULTIPOLYGON (((-74.1178 40...
## 10 0 17 MULTIPOLYGON (((-74.1178 40...
The first visualization is a bar plot, created with the function
geom_bar( ). The purpose of the bar plot is to compare population
estimates across categorical variables, here language and census
tract. It shows how many people speak a language other than English
at home, and how those counts differ between languages.
The bar plot displays the distribution of the chosen languages: English, Spanish, French (incl. Patois, Cajun), French Creole, and Italian. For readability, the languages are color-coded.
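As an illustration of this kind of bar plot, here is a minimal sketch that uses only the first ten rows of the printed output above (two census tracts). The names `languages_sample` and `bar_plot`, the axis labels, and the choice of dodged bars are assumptions; the original chunk may have been written differently.

```r
library(ggplot2)

# First ten rows of the printed output: two tracts, five variables each.
languages_sample <- data.frame(
  tract    = rep(c("Tract 1246.01", "Tract 1752"), each = 5),
  language = rep(c("English", "Spanish", "French (incl. Patois, Cajun)",
                   "French Creole", "Italian"), times = 2),
  estimate = c(1510, 1549, 6, 0, 43, 372, 3796, 1, 0, 0)
)

# stat = "identity" draws bar heights from the estimate column instead of
# counting rows; fill = language color-codes the languages.
bar_plot <- ggplot(languages_sample,
                   aes(x = tract, y = estimate, fill = language)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Census tract", y = "Population estimate", fill = "Language")
```

Typing `bar_plot` in the notebook renders the chart. With `position = "dodge"`, the bars for each language sit side by side within a tract rather than being stacked.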
## Getting data from the 2011-2015 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
The data show that in Passaic County, New Jersey, English was the language most spoken at home in 2015. Spanish was the second most spoken language, followed by French, French Creole, and Italian. Thinner lines were used to keep the plotted data from overlapping.
The second plot is a histogram, created with the function
geom_histogram( ). “Population Estimate” is plotted on the
x-axis and “Frequency” on the y-axis. Additionally, a
“Mean Estimate” legend was added at the bottom right of the
chart.
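A histogram of this kind could be sketched as follows, again using only the ten rows shown in the printed output. The names `languages_sample`, `mean_estimate`, and `histogram_plot`, the number of bins, and the colors are assumptions for illustration. The mean line uses `linewidth`, the current replacement for the `size` argument on lines.

```r
library(ggplot2)

# Population estimates from the first ten rows of the printed output.
languages_sample <- data.frame(
  estimate = c(1510, 1549, 6, 0, 43, 372, 3796, 1, 0, 0)
)

# Mean of the sample estimates, drawn as a vertical reference line.
mean_estimate <- mean(languages_sample$estimate)

histogram_plot <- ggplot(languages_sample, aes(x = estimate)) +
  geom_histogram(bins = 10, fill = "grey70", color = "white") +
  # Mapping color to a constant label creates the "Mean Estimate" legend entry.
  geom_vline(
    data = data.frame(mean_estimate = mean_estimate),
    aes(xintercept = mean_estimate, color = "Mean Estimate"),
    linewidth = 1, linetype = "dashed"
  ) +
  scale_color_manual(name = NULL, values = c("Mean Estimate" = "red")) +
  labs(x = "Population Estimate", y = "Frequency") +
  # Place the legend below the plot, aligned to the right.
  theme(legend.position = "bottom", legend.justification = "right")
```

Giving `geom_vline( )` its own one-row data frame draws the mean line once instead of once per row of the plotted data.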
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The third plot is a scatter plot, created with the function
geom_point( ), as can be seen below. The scatter plot gives a
cleaner visualization of the data and makes it easier for the
audience to see the distribution of the languages spoken at home
in Passaic County.
To prevent overlapping in this scatter plot, several options were
used. For example, the alpha argument was used to adjust the
transparency of the points. Additionally, the
position_jitterdodge( ) function was used to dodge the points
generated by geom_point( ) by group and add a small amount of
random jitter, so that the points do not sit on top of each other.
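The combination of alpha and position_jitterdodge( ) described above can be sketched as follows, using the ten rows from the printed output. The names `languages_sample` and `scatter_plot` and the specific alpha, jitter, and dodge values are assumptions chosen for illustration, not the settings of the original chunk.

```r
library(ggplot2)

# First ten rows of the printed output: two tracts, five variables each.
languages_sample <- data.frame(
  tract    = rep(c("Tract 1246.01", "Tract 1752"), each = 5),
  language = rep(c("English", "Spanish", "French (incl. Patois, Cajun)",
                   "French Creole", "Italian"), times = 2),
  estimate = c(1510, 1549, 6, 0, 43, 372, 3796, 1, 0, 0)
)

scatter_plot <- ggplot(languages_sample,
                       aes(x = tract, y = estimate, color = language)) +
  geom_point(
    alpha = 0.6,  # semi-transparent points so overlaps stay visible
    size  = 3,
    # Dodge points by language within each tract, then jitter them slightly.
    position = position_jitterdodge(jitter.width = 0.2,
                                    dodge.width  = 0.6,
                                    seed         = 1)
  ) +
  labs(x = "Census tract", y = "Population estimate", color = "Language")
```

Setting `seed` makes the jitter reproducible, so the points land in the same place each time the notebook is rendered.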
While doing this assignment, I realized that I should not have studied only on DataCamp; I should also have practiced on a different dataset while I was completing the lessons there. I found that I struggled to read and visualize data on my own, so I had to consult my notes very often. Therefore, I will review all the notes that I took while studying on DataCamp and will try to run each piece of code on various datasets to make sure that I actually understand and remember what I am doing.
I also realized that I could understand the code, which made me quite happy. Even though I still have trouble writing complex code from scratch, it feels amazing to be able to read and understand it. As programming is a ‘language’, I know that I am at the step where I say “I understand, but I can’t speak”, just as I did while I was learning English in 5th grade. Back then, I was sad because I thought I was not making any progress, but I was. Even though I feel like I am not making any progress right now, I know that I am, and it motivates me.