At the beginning of the R script, we load all the packages (libraries) needed for the assignment. Base R already has many useful functions, but packages provide more specialized ones. In this assignment, we will use the packages “leaflet”, “magrittr”, “sf”, “geojsonio” and “graphics”. Install them with the “Install Packages” tool in the “Tools” tab (see Figure 2). The “graphics” package ships with R, so it does not need to be installed.
During the package installation, you may be asked:
“Do you want to install from sources the package which needs compilation? (Yes/no/cancel)”
If so, type no.
(If you still encounter errors when installing packages, check your R version and update it to the latest release.)
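As a quick check after installation, the sketch below lists which of the required packages R can actually find. The variable names (needed, is_installed, missing_pkgs) are made up for this illustration, and the install.packages() call is left commented out so nothing is installed by accident.

```r
# Packages this lab asks for ("graphics" ships with R, so it is not listed)
needed <- c("leaflet", "magrittr", "sf", "geojsonio")

# requireNamespace() returns TRUE when a package is installed and can be loaded
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from the console
} else {
  message("All required packages are installed.")
}
```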
Run the three lines below to load these packages in this R script. You don’t need to load every package you just installed, because some of them only serve as dependencies of other packages.
library(leaflet)
library(magrittr) #provide the support for %>% operator
library(graphics)
Before we start with real data, let’s first review the “vector” data type.
Now you are asked to create a vector v1, with 6 elements 0, 0.2, 0.4, 0.6, 0.8 and 1.0.
(Hint: use the c() function.)
#Complete the below line
v1=c(0, 0.2, 0.4, 0.6, 0.8, 1.0)
#
print(v1)
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
Find an easy way to create a vector v2 in which each element is twice the corresponding element of v1, namely 0 \(\times\) 2, 0.2 \(\times\) 2, 0.4 \(\times\) 2, 0.6 \(\times\) 2, 0.8 \(\times\) 2 and 1.0 \(\times\) 2.
#Complete the below line
v2=v1*2
#
print(v2)
## [1] 0.0 0.4 0.8 1.2 1.6 2.0
Next, you are given two numeric variables, a and b:
a=2
b=3
You are asked to create a vector v3, the first element of which is a and second is b.
#Complete the below line
v3= c(a, b)
#
print(v3)
## [1] 2 3
Create a vector v4 whose first part is v3 and whose second part is v2.
#Complete the below line
v4= c(v3, v2)
#
Print out v4 to see if its element values are as expected.
print(v4)
## [1] 2.0 3.0 0.0 0.4 0.8 1.2 1.6 2.0
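The vector exercises above rely on two ideas: arithmetic on a vector is applied element by element, and c() flattens its arguments into one vector. A minimal sketch of both follows, using seq() as an alternative way to build evenly spaced values. The variable names here are made up so they do not overwrite v1 to v4.

```r
# Evenly spaced values can be typed by hand or generated with seq()
by_hand <- c(0, 0.2, 0.4, 0.6, 0.8, 1.0)
by_seq  <- seq(0, 1, by = 0.2)

# all.equal() compares numbers with a small tolerance, which is safer than ==
same_values <- isTRUE(all.equal(by_hand, by_seq))

# Arithmetic is element-wise: every element is doubled
doubled <- by_seq * 2

# c() concatenates vectors into one longer vector
combined <- c(c(2, 3), doubled)
length(combined)
```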
Import US_data.xls into R using the “Import Dataset” tool located at the top left of RStudio (see Figure 3).
library(readxl)
US_data <- read_excel('US_data.xls')
## Error: `path` does not exist: 'US_data.xls'
After importing the Excel file, you will see a dataframe named US_data in the variable viewer. Use the summary() function to examine the dataframe “US_data”.
#Type your code in the blank
summary(US_data)
## Error in summary(US_data): object 'US_data' not found
#
Answer the questions:
1. How many rows and columns are there in this dataframe? What are the column names? Type your answer here: There are 53 rows and 6 columns. The column names are ‘State’, ‘Average_Income’, ‘High_school_graduate’, ‘Bachelor_degree’, ‘Advanced_degree’ and ‘Population’.
2. What are the data types of Average_Income and Population, respectively? Type your answer here: ‘chr’ (character) and ‘num’ (numeric), respectively.
3. What is the median of the population for all the states? Type your answer here: 4,298,483
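Questions like these can also be answered with a few base R functions besides summary(). The sketch below uses a small made-up data frame called toy (not the real US_data, and with invented values) to show the calls.

```r
# A made-up data frame with the same kinds of columns as in the lab
toy <- data.frame(
  State          = c("A", "B", "C"),
  Average_Income = c("$50,000", "$42,500", "$61,200"),
  Population     = c(1200000, 800000, 2500000),
  stringsAsFactors = FALSE
)

n_rows_cols <- dim(toy)            # number of rows, then number of columns
col_names   <- names(toy)          # column names
col_classes <- sapply(toy, class)  # data type of each column
pop_median  <- median(toy$Population)

str(toy)  # compact overview of the whole data frame
```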
The ‘$’ operator can be used to extract a single column of a data frame, for instance,
US_data$High_school_graduate
## Error in eval(expr, envir, enclos): object 'US_data' not found
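This command extracts only the “High_school_graduate” column of “US_data” and returns it as a vector. The same column can also be selected by name with single brackets, which returns a one-column data frame rather than a vector: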
US_data['High_school_graduate']
## Error in eval(expr, envir, enclos): object 'US_data' not found
Since the ‘High_school_graduate’ column is the third column of the dataframe, it can also be selected by position:
US_data[3]
## Error in eval(expr, envir, enclos): object 'US_data' not found
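The `$` and bracket forms select the same column, but they do not return the same kind of object. A small sketch with a made-up data frame (toy and its values are hypothetical) shows the difference, which matters for functions such as mean() that expect a numeric vector.

```r
toy <- data.frame(State = c("A", "B"), Graduates = c(50, 30))

as_vector  <- toy$Graduates        # numeric vector
as_vector2 <- toy[["Graduates"]]   # same vector, using double brackets
as_frame   <- toy["Graduates"]     # one-column data frame
as_frame2  <- toy[2]               # same one-column data frame, by position

class(as_vector)
class(as_frame)

# mean() works directly on the vector form
graduates_mean <- mean(as_vector)
```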
Now we can do some basic analysis on the ‘High_school_graduate’ data, for instance, getting the mean of it.
mean(US_data$High_school_graduate)
## Error in mean(US_data$High_school_graduate): object 'US_data' not found
Now it’s your turn.
4. Calculate the total population of the USA based on the “Population” column of “US_data” and store the result in the variable US_total_population.
(Hint: use the sum() function.)
#Complete the below line
US_total_population <- sum(US_data$Population)
## Error in eval(expr, envir, enclos): object 'US_data' not found
#
Now you have the total population stored in the variable “US_total_population”. We can print it with:
print(US_total_population)
## Error in print(US_total_population): object 'US_total_population' not found
Now let’s make a simple pie chart.
pie(x=US_data$Population,label=US_data$State)
## Error in pie(x = US_data$Population, label = US_data$State): object 'US_data' not found
5. Try to understand this line. What information does this pie chart give? Type your answer here: It shows each state’s share of the total US population, with one slice per state.
The “$” operator can also be used to add a column to a dataframe. For instance, the code below calculates the percentage of the population with a high school degree in each state and adds the result to US_data as a new column named “High_school_percent”.
US_data$High_school_percent<- US_data$High_school_graduate/US_data$Population*100
## Error in eval(expr, envir, enclos): object 'US_data' not found
Now add two more columns, “Bachelor_percent” and “Advanced_percent”, which are the percentages of the population with a bachelor’s degree and with an advanced degree, respectively.
#Complete the below lines
US_data$Bachelor_percent<- US_data$Bachelor_degree/US_data$Population*100
## Error in eval(expr, envir, enclos): object 'US_data' not found
US_data$Advanced_percent<- US_data$Advanced_degree/US_data$Population*100
## Error in eval(expr, envir, enclos): object 'US_data' not found
#
print(US_data$Bachelor_percent)
## Error in print(US_data$Bachelor_percent): object 'US_data' not found
print(US_data$Advanced_percent)
## Error in print(US_data$Advanced_percent): object 'US_data' not found
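The same pattern can be tried on a small made-up data frame, where the result is easy to check by hand. The names toy, Graduates and Graduate_percent are hypothetical and only serve this illustration.

```r
toy <- data.frame(
  State      = c("A", "B"),
  Graduates  = c(50, 30),
  Population = c(200, 100)
)

# Assigning to a column name that does not exist yet creates the column
toy$Graduate_percent <- toy$Graduates / toy$Population * 100

print(toy$Graduate_percent)
```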
6. Will you be able to calculate the average income for the whole US with the “Average_Income” column? Why or why not? Type your answer here: No, because the Average_Income column is stored as ‘character’ data, so R cannot do arithmetic on it.
You don’t need to calculate it at this time. First, we print the column out.
US_data$Average_Income
## Error in eval(expr, envir, enclos): object 'US_data' not found
If we want the unit of income to be thousands of dollars, we can simply divide the income column by 1000:
US_data$Average_Income/1000
## Error in eval(expr, envir, enclos): object 'US_data' not found
But R gives an error. Think about why, then answer question 6.
Next, we convert the format of the “Average_Income” column.
Each “number” in the “Average_Income” column has a dollar sign ‘$’ in front of it and a comma “,” in the middle, so R treats it as “character” (a string) rather than numeric. To convert the characters to numbers, we first remove the dollar signs and commas with the commands below.
# *
US_data$Average_Income <- gsub('\\$', '', US_data$Average_Income)
## Error in is.factor(x): object 'US_data' not found
US_data$Average_Income <- gsub(',', '', US_data$Average_Income)
## Error in is.factor(x): object 'US_data' not found
You don’t need to fully understand all of this code yet, but try to follow it.
7. Now, what is the data type of “Average_Income”? (Hint: use the class() function to examine the “Average_Income” column.) Type your answer here: It is still ‘chr’.
#Type your code in the blank below
class(US_data$Average_Income)
#
It seems that we need one more step to make the column “Average_Income” numeric.
Before running the line below, make sure you have executed the two gsub() lines above. Otherwise, as.numeric() will turn the values into NA, ruining the “Average_Income” column, and you will need to re-run sections III to VII.
US_data$Average_Income <- as.numeric(US_data$Average_Income)
## Error in eval(expr, envir, enclos): object 'US_data' not found
8. Now, examine the data type of “Average_Income” again.
#Type your code in the blank below
class(US_data$Average_Income)
#
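The whole conversion can be rehearsed on a short made-up character vector before touching the real column. The values and variable names below are hypothetical. The sketch also shows why the order of the steps matters: calling as.numeric() on the uncleaned strings yields NA.

```r
raw_income <- c("$52,000", "$48,500")   # made-up values in the same format

# Converting too early fails: strings with "$" and "," are not valid numbers
too_early <- suppressWarnings(as.numeric(raw_income))

# Remove the dollar sign (escaped, because $ is special in patterns), then the comma
no_dollar <- gsub("\\$", "", raw_income)
no_comma  <- gsub(",", "", no_dollar)
class(no_comma)                         # still "character"

# Only now does the conversion give real numbers
income <- as.numeric(no_comma)
class(income)
income / 1000                           # arithmetic works on the numeric vector
```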
9. As you did in the GIS lab, create a scatterplot showing the relationship between average income and educational level.
(Hint: since we don’t have an indicator for educational level, we can use the percentage of the population with a bachelor’s degree, namely the “Bachelor_percent” column, as the indicator.)
(Hint: we can use the plot function “plot(x, y)”, where x can be the education level and y can be the average income.)
#Type your code in the blank below
plot(US_data$Bachelor_percent, US_data$Average_Income)
#
10. Calculate the average income for the whole US.
(Hint: it is not just the average of the income column.)
(Hint: in the equation, you can use the variable “US_total_population” you created previously.)
#Complete the line below
#Population-weighted mean: weight each state's average income by its population
Average_income_US <- sum(US_data$Average_Income * US_data$Population) / US_total_population
#
print(Average_income_US)
## Error in print(Average_income_US): object 'Average_income_US' not found
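One reading of the hints for question 10 is that the national figure should be a population-weighted mean, since states with more people should count for more. The sketch below uses two made-up states to show how that differs from the plain mean of the column. The toy data and variable names are assumptions for illustration only.

```r
toy <- data.frame(
  State          = c("A", "B"),
  Average_Income = c(40000, 60000),
  Population     = c(3000000, 1000000)
)

# Plain mean treats both states equally
simple_mean <- mean(toy$Average_Income)

# Weighted mean: total income of all residents divided by total population
total_pop     <- sum(toy$Population)
weighted_mean <- sum(toy$Average_Income * toy$Population) / total_pop

# Base R's weighted.mean() computes the same quantity
check <- weighted.mean(toy$Average_Income, toy$Population)

print(c(simple = simple_mean, weighted = weighted_mean))
```

Because state A has three times the population of state B, the weighted result sits closer to A's income than the plain mean does.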
Now let’s put the data on a map!
First, we will create a US map in which the states are colored according to their income levels.
Run the code below to import the geodata file “us-states.json” and store the geodata in the variable “states_geodata”.
# *
states_geodata <- geojsonio::geojson_read("us-states.json", what = "sp")
## Error in loadNamespace(x): there is no package called 'geojsonio'
If R reports that the file does not exist, set the working directory to the folder that contains the “us-states.json” file.
If R reports that there is no package called ‘geojsonio’, you have not installed it yet. Install it now.
Then, we create a variable “income_map” that holds the base map for the income levels.
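To see whether R is looking in the right folder, a quick check like the following may help. The setwd() path is a placeholder, not a real location, so it is left commented out.

```r
geo_file <- "us-states.json"

# getwd() reports the folder R currently treats as the working directory
current_dir <- getwd()

if (file.exists(geo_file)) {
  message("Found ", geo_file, " in ", current_dir)
} else {
  message(geo_file, " is not in ", current_dir, "; change the working directory.")
  # setwd("path/to/your/lab/folder")  # hypothetical path, replace with your own
}
```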
# *
income_map <- leaflet(states_geodata) %>% #attach geodata to the map
  setView(-96, 37.8, 4) %>% #set the map view to the US
  addTiles()
## Error in structure(list(options = options), leafletData = data): object 'states_geodata' not found
income_map #show the map
## Error in eval(expr, envir, enclos): object 'income_map' not found
If R cannot recognize “%>%”, you forgot to load the “magrittr” library in this script. Go to the beginning of the R script and run “library(magrittr)”.
So far you should only see a map of the US without any other information, because we have not attached any socioeconomic data to it.
Before we attach the income data, we first need to classify it. The average income of each state is a different number, so without classification each state would get its own color on the map, making the map hard to read. To make the map readable, we use fewer classes, say 8.
There are many ways to classify the data. Here, we use percentiles, computed with the function quantile() on the “Average_Income” column of US_data.
First, let’s look at the function quantile().
quantile(x, probs) takes two main arguments: x is the data to be classified, and probs is a vector of the desired percentiles, expressed as probabilities between 0 and 1.
To illustrate, we create a vector x of 20 elements, each drawn from a uniform distribution between 0 and 10.
x=runif(20,0,10)
x
## [1] 1.79847464 1.40054574 4.04084308 3.08491994 3.93530496 9.39953881
## [7] 6.01121072 6.65332570 0.09038669 1.20785121 9.00933184 4.05568370
## [13] 3.96800874 8.54257795 3.54522259 1.79769108 0.17058120 3.38778815
## [19] 2.52780022 7.75589800
If we want the 20th and 60th percentiles of x, we call the function as follows:
quantile(x,c(0.2,0.6))
## 20% 60%
## 1.718262 3.997142
If we want to divide x into 4 equal parts by percentile, we call:
quantile(x,c(0.00,0.25,0.5,0.75,1.00))
## 0% 25% 50% 75% 100%
## 0.09038669 1.79827875 3.74026377 6.17173947 9.39953881
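Typing every probability by hand becomes tedious as the number of classes grows. As a sketch of one alternative, seq() can generate the evenly spaced probabilities: n equal parts need n + 1 break points. The seed and variable names below are arbitrary choices for this illustration.

```r
set.seed(1)                 # arbitrary seed so the example is repeatable
x <- runif(20, 0, 10)

# Four equal parts: probabilities written by hand and generated with seq()
by_hand <- quantile(x, c(0.00, 0.25, 0.50, 0.75, 1.00))
by_seq  <- quantile(x, seq(0, 1, by = 0.25))
identical(by_hand, by_seq)

# In general, n equal parts need n + 1 break points
n_parts <- 5
breaks  <- quantile(x, seq(0, 1, length.out = n_parts + 1))
length(breaks)
```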
If we want to divide US_data$Average_Income into 8 equal parts by percentile, how should we write the code?
#Replace the ???? with correct code
#Eight equal classes need nine break points, from 0% to 100%
bins_income <- quantile(US_data$Average_Income, seq(0, 1, length.out = 9))
#
print(bins_income)
Now Average_Income has been classified into 8 categories, and the quantile information is stored in the variable “bins_income”. With it, we can assign a color to each of the 8 classes and attach the data to the map.
11. Run the code below to generate the income level map. You may try to understand this part if you are interested, but it is not mandatory.
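To see what sorting values into quantile classes looks like without drawing a map, the base R function cut() can assign each value to its class. This is only an analogy in base R with made-up incomes, not a description of how leaflet assigns colors internally.

```r
# Sixteen made-up income values, evenly spaced for simplicity
income <- seq(30000, 75000, by = 3000)

# Nine break points define eight classes
breaks <- quantile(income, seq(0, 1, length.out = 9))

# include.lowest = TRUE keeps the minimum value inside the first class
classes <- cut(income, breaks = breaks, include.lowest = TRUE)

nlevels(classes)   # number of classes
table(classes)     # how many values fall in each class
```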
# *
pal_income <- colorBin("YlOrRd", #use a pre-defined color ramp
  domain = US_data$Average_Income, #the data used to assign colors
  bins = bins_income) #the class break points computed above
## Error in colorBin("YlOrRd", domain = US_data$Average_Income, bins = bins_income): object 'US_data' not found
# *
income_map %>% addPolygons( #add polygons to the map with the information below
  fillColor = ~pal_income(US_data$Average_Income), #colored income data
  weight = 2, #weight of polygon (state) boundaries
  opacity = 1, #opacity of polygon (state) boundaries
  color = "white", #color of polygon (state) boundaries
  dashArray = "3", #line type of polygon (state) boundaries
  fillOpacity = 0.7 #opacity of filled state colors
)
## Error in getMapData(map): object 'income_map' not found
12. (Bonus part, extra 5 points)
Create a US map showing, for each state, the percentage of the population with a high school or higher degree.
Requirements:
The percentages should be divided into 5 classes.
The color of the state boundaries should be “orange”.
The weight of the state boundaries should be 3.
You just need to replace every ????? in the code below with the correct code.
#Replace the ????? with correct code
#Assumes the High_school_percent column created earlier is the intended measure
edu_map <- leaflet(states_geodata) %>% #attach geodata to the map
  setView(-96, 37.8, 4) %>% #set the map range to US
  addTiles()
#Five equal classes need six break points, from 0% to 100%
bins_edu <- quantile(US_data$High_school_percent, seq(0, 1, length.out = 6))
pal_edu <- colorBin("YlGnBu", #use a pre-defined color ramp
  domain = US_data$High_school_percent, #the data used to assign colors
  bins = bins_edu)
edu_map %>% addPolygons(
  fillColor = ~pal_edu(US_data$High_school_percent),
  weight = 3,
  opacity = 1,
  color = "orange",
  dashArray = "3",
  fillOpacity = 0.7)
#
#The End