This document was written using R Markdown, RStudio's handy interface for working with R in a way that lets you see both your code and its output. An R Markdown document works a lot like a Word document, except that you can put in R commands AND see what they do. When you type normally without making a code block or comment block, R Markdown just prints what you type.
Before running the following code, make sure that the file `county.csv` is in the same folder (directory) as this document. We will also need this folder to be the working directory for R. You can do that with the following steps:
“/Users/robincunningham/Documents/STOR 151 R Projects”
Make sure this .Rmd document is saved in that directory.
To see your current working directory, type getwd() in the Console below. Whatever directory you see, you can save a lot of trouble for the rest of the course by going to File Explorer (on a PC) or Finder (on a Mac) and making a new folder inside the one you see as your working directory. Call it "R Projects".
Having completed the previous step, go to the Console below again and type `setwd("R Projects")`. This will be your working directory, and you will have to move to it every time you open R.
Save this document to "R Projects" and put the county.csv file there. Now everything is all synced up and the hardest part of the day is over.
Now you should be able to run all of the code below by pressing the “Knit HTML” button above. Do that now, but don’t be shocked if an error appears! We will get it working.
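The working-directory steps above can be sketched in code. This is only an illustration under stated assumptions: it runs inside a temporary folder (so none of your real folders change), `demo_dir` is a hypothetical stand-in for your own "R Projects" folder, and the file it writes is a placeholder, not the real county data.

```r
# A sketch of the working-directory steps, run in a temporary folder.
start_dir <- getwd()                            # where R is looking right now

# Hypothetical stand-in for your own "R Projects" folder
demo_dir <- file.path(tempdir(), "R Projects")
dir.create(demo_dir, showWarnings = FALSE)      # like making the folder by hand
setwd(demo_dir)                                 # move R into that folder

found_before <- file.exists("county.csv")       # the file is not there yet
writeLines("name,state", "county.csv")          # placeholder file, NOT the real data
found_after <- file.exists("county.csv")        # now R can see a file with that name

setwd(start_dir)                                # go back to where we started
unlink(demo_dir, recursive = TRUE)              # remove the temporary folder
```

If `file.exists("county.csv")` gives FALSE in your own session, R is looking in a different folder from the one that holds the file, and that is the usual cause of an error when knitting.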
# Load dataset
county_data = read.csv("county.csv")
The command read.csv( ) will read a dataset into R from a file on your computer or from a web address. "csv" stands for "comma-separated values", a common file type where the data is listed in a text file, with one row per line and the values in each row separated by commas. For now, you don't need to worry about the details of read.csv( ).
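To see what "comma-separated" means in practice, here is a small sketch that hands read.csv( ) a few lines of text instead of a file. The names `csv_text` and `tiny` are made up for this illustration; the two rows are copied from the county data shown in this document.

```r
# The first line holds the column names; each later line is one row.
csv_text <- "name,state,pop2010
Autauga County,Alabama,54571
Baldwin County,Alabama,182265"

# text = ... lets read.csv( ) read from a string rather than from a file
tiny <- read.csv(text = csv_text)

tiny          # print the small data set
names(tiny)   # the column names
nrow(tiny)    # the number of rows
```

Reading a real file works the same way, except that you give the file name in quotes, as in the command above.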
Now that we have the data set loaded, we will look at it in 3 different ways:
1. Using the command head(), which shows us the first 6 lines of the data set.
# Look at the dataset
head(county_data)
## name state pop2000 pop2010 fed_spend poverty homeownership
## 1 Autauga County Alabama 43671 54571 6.068095 10.6 77.5
## 2 Baldwin County Alabama 140415 182265 6.139862 12.2 76.7
## 3 Barbour County Alabama 29038 27457 8.752158 25.0 68.0
## 4 Bibb County Alabama 20826 22915 7.122016 12.6 82.9
## 5 Blount County Alabama 51024 57322 5.130910 13.4 82.0
## 6 Bullock County Alabama 11714 10914 9.973062 25.3 76.9
## multiunit income med_income
## 1 7.2 24568 53255
## 2 22.6 26469 50147
## 3 11.1 15875 33219
## 4 6.6 19918 41770
## 5 3.7 21070 45549
## 6 9.9 20289 31602
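If 6 lines is more or fewer than you want, head() accepts an optional `n` argument, and tail() does the same job from the bottom of the data set. The sketch below assumes a small stand-in data frame called `counties` (a made-up name) built from three columns of the six rows printed above, so it runs without the data file.

```r
# Small stand-in data frame copied from the head() output above
counties <- data.frame(
  name    = c("Autauga County", "Baldwin County", "Barbour County",
              "Bibb County", "Blount County", "Bullock County"),
  pop2010 = c(54571, 182265, 27457, 22915, 57322, 10914),
  poverty = c(10.6, 12.2, 25.0, 12.6, 13.4, 25.3)
)

first_two <- head(counties, n = 2)   # first 2 rows instead of the default 6
last_two  <- tail(counties, n = 2)   # last 2 rows

first_two
last_two
```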
Similarly, tail() shows you the last 6 lines of the data set.
# Look at dataset
head(county_data)
summary(county_data)
## name state pop2000
## Washington County: 30 Texas : 254 Min. : 67
## Jefferson County : 25 Georgia : 159 1st Qu.: 11211
## Franklin County : 24 Virginia: 134 Median : 24621
## Jackson County : 23 Kentucky: 120 Mean : 89627
## Lincoln County : 23 Missouri: 115 3rd Qu.: 61792
## Madison County : 19 Kansas : 105 Max. :9519338
## (Other) :2997 (Other) :2254
## pop2010 fed_spend poverty homeownership
## Min. : 82 Min. : 2.109 Min. : 0.0 Min. : 0.00
## 1st Qu.: 11119 1st Qu.: 6.970 1st Qu.:11.0 1st Qu.:69.50
## Median : 25887 Median : 8.673 Median :14.7 Median :74.60
## Mean : 98294 Mean : 10.003 Mean :15.5 Mean :73.27
## 3rd Qu.: 66861 3rd Qu.: 10.877 3rd Qu.:19.0 3rd Qu.:78.40
## Max. :9818605 Max. :204.616 Max. :53.5 Max. :91.30
##
## multiunit income med_income
## Min. : 0.00 Min. : 7772 Min. : 19351
## 1st Qu.: 6.10 1st Qu.:19030 1st Qu.: 36948
## Median : 9.70 Median :21765 Median : 42444
## Mean :12.32 Mean :22499 Mean : 44259
## 3rd Qu.:15.90 3rd Qu.:24801 3rd Qu.: 49120
## Max. :98.50 Max. :64381 Max. :115574
##
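One thing to watch for: the counts for name and state above appear because those columns were read in as factors (categories). Since R 4.0.0, read.csv( ) keeps text columns as plain text by default, so your own summary may show Length, Class and Mode for those columns instead. The sketch below shows the difference on three rows copied from the data. The names `csv_text`, `as_text` and `as_factor` are made up for this illustration.

```r
csv_text <- "name,state,poverty
Autauga County,Alabama,10.6
Baldwin County,Alabama,12.2
Barbour County,Alabama,25.0"

# Text columns kept as plain text (the default in R 4.0.0 and later)
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Text columns turned into factors, which summary( ) can count
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

summary(as_text$state)      # Length / Class / Mode
summary(as_factor$state)    # how many rows fall in each state
summary(as_factor$poverty)  # Min, quartiles, median, mean and max of one column
```

Adding `stringsAsFactors = TRUE` to the read.csv( ) call for the county data should give counts like those in the output above.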
3. Ok, our last task in R today will be to make the plots from the slides in class. The first is the plot of Federal Spending per Capita versus Poverty Rate. We need to tell R which columns to choose for the plot. Here is the command:
plot(county_data$poverty,county_data$fed_spend)
plot(county_data$poverty,county_data$fed_spend, ylim = c(0,30), col="blue")
Note I put in the blue just to be fancy.
Revise the command above to come up with the command to produce the second plot from the slides.
plot(county_data$multiunit,county_data$homeownership)
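In these commands, the `$` picks one column out of the data set, and plot( ) puts its first argument on the horizontal axis and its second on the vertical axis. Here is a sketch of the first plot with readable axis labels added through the `xlab`, `ylab` and `main` arguments. It assumes a stand-in data frame `counties` (a made-up name) holding only the six rows shown earlier, so the picture will have just six points.

```r
# Stand-in data: poverty and fed_spend for the six counties printed earlier
counties <- data.frame(
  poverty   = c(10.6, 12.2, 25.0, 12.6, 13.4, 25.3),
  fed_spend = c(6.068095, 6.139862, 8.752158, 7.122016, 5.130910, 9.973062)
)

x_values <- counties$poverty   # $ pulls out one column as a list of numbers
x_values

plot(counties$poverty, counties$fed_spend,
     xlab = "Poverty rate",                  # label for the horizontal axis
     ylab = "Federal spending per capita",   # label for the vertical axis
     main = "Six Alabama counties",          # title above the plot
     ylim = c(0, 30), col = "blue")
```

The same three arguments can be added to either of the plot commands above.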