The data chosen for this exercise consist of one ESRI shapefile (Shelby.shp) and one CSV file (Data.csv). The shapefile contains the 2018 census tracts of Shelby County. Data.csv contains the population and median income of each census tract in the county, along with the White, African American, and Asian population of each tract.
Before loading the data, it is good practice to clear the global environment and the console so that no objects from an earlier session remain in memory. The following code clears the environment and the console.
rm(list = ls()) # Clears the global environment for a fresh start
cat('\f') # Cleans the console
This code loads multiple libraries at once.
my_library <- c("sf", "dplyr", "dbplyr", "tibble", "ggplot2")
lapply(my_library, library, character.only = TRUE)
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'dbplyr'
## The following objects are masked from 'package:dplyr':
##
## ident, sql
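If one of these packages is not installed, `library()` stops with an error at the first missing package. A minimal sketch of a pre-check is given below, assuming the same package vector as above. The names `pkgs`, `is_installed`, and `missing_pkgs` are introduced here for illustration only.

```r
# Packages used in this exercise (same vector as above).
pkgs <- c("sf", "dplyr", "dbplyr", "tibble", "ggplot2")

# requireNamespace() returns TRUE/FALSE instead of raising an error,
# so it can be used to list what still needs to be installed.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach everything only when nothing is missing.
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

Wrapping `lapply()` in `invisible()` suppresses the printed list of attached packages while keeping the startup messages.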
The shapefile is loaded with the sf package, and the data file is loaded into R as a tibble. The following code loads each file as an object.
Shelby <- st_read("Data/Assignment_5/Shelby.shp")
## Reading layer `Shelby' from data source `C:\Rizwan\Education\UofM PhD Work\Seminar in Earth Sciences\R-Spatial\Data\Assignment_5\Shelby.shp' using driver `ESRI Shapefile'
## Simple feature collection with 221 features and 9 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -90.3103 ymin: 34.99419 xmax: -89.63278 ymax: 35.40948
## geographic CRS: NAD83
Data <- as_tibble(read.csv("Data/Assignment_5/Data.csv"))
The object read from Shelby.shp has class sf (simple features) and is also a data.frame, as str() shows. The summary() output gives basic statistics for the numeric variables in the Shelby data frame; character variables are reported only by length and class.
str(Shelby)
## Classes 'sf' and 'data.frame': 221 obs. of 10 variables:
## $ STATEFP : chr "47" "47" "47" "47" ...
## $ COUNTYFP: chr "157" "157" "157" "157" ...
## $ TRACTCE : chr "021312" "021530" "021726" "022024" ...
## $ AFFGEOID: chr "1400000US47157021312" "1400000US47157021530" "1400000US47157021726" "1400000US47157022024" ...
## $ GEOID : chr "47157021312" "47157021530" "47157021726" "47157022024" ...
## $ NAME : chr "213.12" "215.30" "217.26" "220.24" ...
## $ LSAD : chr "CT" "CT" "CT" "CT" ...
## $ ALAND : num 1362074 10068590 1433803 2581685 1056567 ...
## $ AWATER : num 0 0 2132 0 45527 ...
## $ geometry:sfc_POLYGON of length 221; first list element: List of 1
## ..$ : num [1:26, 1:2] -89.8 -89.8 -89.8 -89.8 -89.8 ...
## ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:9] "STATEFP" "COUNTYFP" "TRACTCE" "AFFGEOID" ...
cat("Number of observations in the data frame is: ", nrow(Shelby), sep = '')
## Number of observations in the data frame is: 221
cat("Number of variables in the data frame is: ", ncol(Shelby), sep = '')
## Number of variables in the data frame is: 10
summary(Shelby)
## STATEFP COUNTYFP TRACTCE AFFGEOID
## Length:221 Length:221 Length:221 Length:221
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## GEOID NAME LSAD ALAND
## Length:221 Length:221 Length:221 Min. : 555457
## Class :character Class :character Class :character 1st Qu.: 1940071
## Mode :character Mode :character Mode :character Median : 3843270
## Mean : 8948641
## 3rd Qu.: 6099276
## Max. :138912108
## AWATER geometry
## Min. : 0 POLYGON :221
## 1st Qu.: 0 epsg:4269 : 0
## Median : 0 +proj=long...: 0
## Mean : 251458
## 3rd Qu.: 8845
## Max. :32514243
Data.csv is loaded into R as a tibble. The variable names are explained below.
FID: Feature ID
STATEFP : Federal Information Processing Standards (FIPS) code for the state
COUNTYFP : Federal Information Processing Standards (FIPS) code for the county
GEO_ID : Geographical ID
OBJECTID : Object ID
ALAND : Area of land in sq. meters
AWATER : Area of water body in sq. meters
Area_sqmil : Area of the census tract in sq. miles
Tot_Pop : Total population
Med_Inc : Median Income
Tot_Wht : Total White population
Tot_AA : Total African American population
Tot_Asian : Total Asian population
Pop_Den : Population density per sq. mile
PopDen_Wht : Population density of White population per sq. mile
PopDen_AA : Population density of African American population per sq. mile
PopDen_Asi : Population density of Asian population per sq. mile
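The area and density variables are related by simple arithmetic. The sketch below uses the first five tracts, with values copied from the str(Data) output in this section. The Census Bureau reports ALAND in square meters, so it is converted to square miles here. Computing Pop_Den as Tot_Pop divided by Area_sqmil is an assumption about how the column was built, not something the data file documents. The names `tracts`, `SQM_PER_SQMI`, `land_sqmil`, and `pop_den` are introduced for illustration.

```r
# First five tracts, values copied from the str(Data) output in this section.
tracts <- data.frame(
  ALAND      = c(1362074, 10068590, 1433803, 2581685, 1056567),  # square meters
  Area_sqmil = c(0.528, 3.886, 0.554, 0.994, 0.421),             # rounded as printed
  Tot_Pop    = c(2108, 5396, 3486, 3429, 790)
)

# 1 square mile = 1609.344^2 square meters (about 2,589,988 m^2).
SQM_PER_SQMI <- 1609.344^2
tracts$land_sqmil <- tracts$ALAND / SQM_PER_SQMI

# Density as population divided by area (assumed definition of Pop_Den).
tracts$pop_den <- tracts$Tot_Pop / tracts$Area_sqmil

print(round(tracts[, c("Area_sqmil", "land_sqmil", "pop_den")], 3))
```

The computed densities land close to the Pop_Den values printed for the same tracts (3990, 1388, 6294, 3449, 1877); the small differences come from the rounded Area_sqmil values. The converted land area is close to, but not identical with, Area_sqmil, which suggests Area_sqmil was derived in a slightly different way.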
str(Data)
## tibble [221 x 17] (S3: tbl_df/tbl/data.frame)
## $ FID : int [1:221] 0 1 2 3 4 5 6 7 8 9 ...
## $ STATEFP : int [1:221] 47 47 47 47 47 47 47 47 47 47 ...
## $ COUNTYFP : int [1:221] 157 157 157 157 157 157 157 157 157 157 ...
## $ GEO_ID : chr [1:221] "1400000US47157021312" "1400000US47157021530" "1400000US47157021726" "1400000US47157022024" ...
## $ OBJECTID : int [1:221] 1161 1176 1186 1201 1001 1006 1021 1029 1044 1059 ...
## $ ALAND : int [1:221] 1362074 10068590 1433803 2581685 1056567 2037629 1356159 1413184 1418277 1726745 ...
## $ AWATER : int [1:221] 0 0 2132 0 45527 154343 0 0 0 0 ...
## $ Area_sqmil: num [1:221] 0.528 3.886 0.554 0.994 0.421 ...
## $ Tot_Pop : int [1:221] 2108 5396 3486 3429 790 2286 2030 3138 2253 1402 ...
## $ Med_Inc : int [1:221] 54769 129375 26731 49667 14345 14299 45729 56111 20724 27344 ...
## $ Tot_Wht : int [1:221] 1902 4530 1012 167 0 42 882 2417 18 21 ...
## $ Tot_AA : int [1:221] 99 308 2423 3250 762 2244 857 595 2229 1379 ...
## $ Tot_Asian : int [1:221] 97 479 0 0 0 0 110 80 0 1 ...
## $ Pop_Den : num [1:221] 3990 1388 6294 3449 1877 ...
## $ PopDen_Wht: num [1:221] 3600 1166 1827 168 0 ...
## $ PopDen_AA : num [1:221] 187.4 79.3 4375.1 3269 1810.7 ...
## $ PopDen_Asi: num [1:221] 184 123 0 0 0 ...
cat("Number of variables in the data table is: ", ncol(Data), sep = '')
## Number of variables in the data table is: 17
cat("Number of observations in the data table is: ", nrow(Data), sep = '')
## Number of observations in the data table is: 221
cat("Number of NA values in the data table is: ", sum(is.na(Data)), sep = '')
## Number of NA values in the data table is: 0
summary(Data)
## FID STATEFP COUNTYFP GEO_ID OBJECTID
## Min. : 0 Min. :47 Min. :157 Length:221 Min. :1000
## 1st Qu.: 55 1st Qu.:47 1st Qu.:157 Class :character 1st Qu.:1055
## Median :110 Median :47 Median :157 Mode :character Median :1110
## Mean :110 Mean :47 Mean :157 Mean :1110
## 3rd Qu.:165 3rd Qu.:47 3rd Qu.:157 3rd Qu.:1165
## Max. :220 Max. :47 Max. :157 Max. :1220
## ALAND AWATER Area_sqmil Tot_Pop
## Min. : 555457 Min. : 0 Min. : 0.2163 Min. : 0
## 1st Qu.: 1940071 1st Qu.: 0 1st Qu.: 0.7730 1st Qu.: 2625
## Median : 3843270 Median : 0 Median : 1.4855 Median : 3958
## Mean : 8948641 Mean : 251458 Mean : 3.5522 Mean : 4240
## 3rd Qu.: 6099276 3rd Qu.: 8845 3rd Qu.: 2.4992 3rd Qu.: 5426
## Max. :138912108 Max. :32514243 Max. :66.1540 Max. :16377
## Med_Inc Tot_Wht Tot_AA Tot_Asian
## Min. : 0 Min. : 0 Min. : 0 Min. : 0.0
## 1st Qu.: 25296 1st Qu.: 167 1st Qu.: 787 1st Qu.: 0.0
## Median : 39375 Median : 845 Median :1800 Median : 23.0
## Mean : 48137 Mean : 1667 Mean :2270 Mean : 108.3
## 3rd Qu.: 65382 3rd Qu.: 2535 3rd Qu.:3425 3rd Qu.: 97.0
## Max. :165321 Max. :10300 Max. :8736 Max. :2219.0
## Pop_Den PopDen_Wht PopDen_AA PopDen_Asi
## Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.00
## 1st Qu.: 1722 1st Qu.: 122.9 1st Qu.: 456.9 1st Qu.: 0.00
## Median : 2993 Median : 432.5 Median :1473.2 Median : 16.44
## Mean : 3039 Mean :1008.1 Mean :1822.5 Mean : 52.86
## 3rd Qu.: 4197 3rd Qu.:1754.7 3rd Qu.:2922.1 3rd Qu.: 72.23
## Max. :11705 Max. :5276.3 Max. :9842.2 Max. :622.92
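The summary shows a minimum of 0 for Tot_Pop and Med_Inc even though the table has no NA values. A zero median income may stand for an unreported value rather than a true zero, so it can be worth counting the zeros before any analysis. The sketch below is one way to do that. `count_zeros` is a helper defined here, and the small `demo` table is hypothetical, not taken from Data.csv.

```r
# Count, per column, how many rows hold exactly zero.
count_zeros <- function(df, cols) {
  vapply(df[cols], function(x) sum(x == 0, na.rm = TRUE), integer(1))
}

# Hypothetical mini-table for illustration only (not taken from Data.csv).
demo <- data.frame(
  Tot_Pop = c(2108L, 0L, 3486L),
  Med_Inc = c(54769L, 0L, 0L)
)

zero_counts <- count_zeros(demo, c("Tot_Pop", "Med_Inc"))
print(zero_counts)
```

On the real table, the equivalent call would be `count_zeros(Data, c("Tot_Pop", "Med_Inc"))`.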
The shapefile will be visualized using the sf package.
plot(st_geometry(Shelby), main = "Shelby County Census Tract", cex.main = 2)
The data can be visualized using histograms. For example, the histograms of the population densities of the Shelby County census tracts can be drawn using the following code.
par(mfrow=c(2,2))
hist(Data$Pop_Den, main = "Histogram of Pop_Den", xlab = "Population Density", col = "deepskyblue",
cex.main = 1.5, cex.lab = 1.5)
hist(Data$PopDen_Wht, main = "Histogram of PopDen_Wht", xlab = "Population Density", col = "deepskyblue",
cex.main = 1.5, cex.lab = 1.5)
hist(Data$PopDen_AA, main = "Histogram of PopDen_AA", xlab = "Population Density", col = "deepskyblue",
cex.main = 1.5, cex.lab = 1.5)
hist(Data$PopDen_Asi, main = "Histogram of PopDen_Asi", xlab = "Population Density", col = "deepskyblue",
cex.main = 1.5, cex.lab = 1.5)
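The bins behind a histogram can also be inspected numerically, which helps when reading bar heights off a small panel is difficult. The sketch below uses only the first five Pop_Den values printed by str(Data) as a stand-in for the full column. The names `pop_den`, `h`, and `bins` are introduced for illustration.

```r
# First five Pop_Den values as printed by str(Data); a stand-in for Data$Pop_Den.
pop_den <- c(3990, 1388, 6294, 3449, 1877)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(pop_den, plot = FALSE)

# Bin edges and the number of tracts falling in each bin.
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```

Replacing `pop_den` with `Data$Pop_Den` would give the bin counts for all 221 tracts.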
The data can also be visualized using boxplots to check for potential outliers. For example, the outliers of the population densities can be visualized using the following code.
par(mfrow=c(2,2))
boxplot(Data$Pop_Den, main = "Boxplot of Pop_Den", col = "orange", cex.main = 1.5)
boxplot(Data$PopDen_Wht, main = "Boxplot of PopDen_Wht", col = "orange", cex.main = 1.5)
boxplot(Data$PopDen_AA, main = "Boxplot of PopDen_AA", col = "orange", cex.main = 1.5)
boxplot(Data$PopDen_Asi, main = "Boxplot of PopDen_Asi", col = "orange", cex.main = 1.5)
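The points drawn beyond the whiskers can be listed as values rather than read off the plot. The sketch below uses `boxplot.stats()` from base R, which applies the same 1.5 × IQR rule that `boxplot()` uses by default. The vector `dens` is hypothetical and only illustrates the call.

```r
# Hypothetical density values for illustration only; 9800 is deliberately extreme.
dens <- c(1400, 1900, 3000, 3400, 4000, 4200, 9800)

# boxplot.stats() returns the five-number summary and the points
# lying more than 1.5 * IQR beyond the hinges.
stats <- boxplot.stats(dens)
outliers <- stats$out
print(outliers)
```

On the real data, `boxplot.stats(Data$Pop_Den)$out` would return the tract densities flagged in the first panel.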
The correlation among these variables will be explored later. The data table will also be joined to the shapefile for the final spatial analysis.
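As a preview of that join, the sketch below matches the two tables on their tract identifiers. Using AFFGEOID from the shapefile and GEO_ID from the CSV as the key pair is an assumption based on the identical values printed by str() for both objects; the join used later may differ. The toy tables `shape_attrs` and `csv_rows` hold the first three rows of those printed values.

```r
# Attribute side of the shapefile: first three AFFGEOID values from str(Shelby).
shape_attrs <- data.frame(
  AFFGEOID = c("1400000US47157021312", "1400000US47157021530", "1400000US47157021726"),
  NAME     = c("213.12", "215.30", "217.26")
)

# Matching rows of the CSV: GEO_ID and Tot_Pop values from str(Data).
csv_rows <- data.frame(
  GEO_ID  = c("1400000US47157021312", "1400000US47157021530", "1400000US47157021726"),
  Tot_Pop = c(2108L, 5396L, 3486L)
)

# Left join on the differently named key columns; all.x = TRUE keeps every tract.
joined <- merge(shape_attrs, csv_rows,
                by.x = "AFFGEOID", by.y = "GEO_ID", all.x = TRUE)
print(joined)
```

With the packages loaded above, the same key pairing could be passed to `dplyr::left_join(Shelby, Data, by = c("AFFGEOID" = "GEO_ID"))`. Columns that appear in both tables, such as STATEFP, ALAND, and AWATER, would then receive `.x` and `.y` suffixes.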