This document was written using R Markdown, RStudio’s handy interface for working with R in a way that lets you see both your code and its output. An R Markdown document works a lot like a Word document, except that you can put in R commands AND see what they do. When you type normally without making a coding block or comment block, R Markdown just prints what you type.
Before using R Markdown, we will run some commands in the console today to get an idea of how R works.
First, we have to make sure that all the files we need are in the directory that R is using, which is called the ‘working directory’. We will do this with the following steps:
“/Users/robincunningham/Documents/STOR 151 R Projects”
To see your current ‘working directory’, type getwd() in the Console below. Whatever directory you see, you can save a lot of trouble for the rest of the course by going to the file explorer (on your PC or Mac) and making a new folder inside the one you see as your working directory. Call it “R Projects”.
Having created the folder, go to the console below again and type setwd("R Projects"). This will be your working directory, and you will have to move to it every time you open R.
Put County.csv in “R Projects” in 2 steps:
Now everything is all synched up and the hardest part of the day is over.
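If you would rather do the folder steps from the console, the sketch below is one way to go about it. It assumes you want the “R Projects” folder inside whatever directory getwd() reports. The names start_dir and project_dir are made up for this illustration.

```r
# Sketch only: check, create, and switch to the working directory.
start_dir <- getwd()          # where R is currently looking for files
print(start_dir)

# Build the path to the new folder (assumed to live inside start_dir)
project_dir <- file.path(start_dir, "R Projects")
if (!dir.exists(project_dir)) {
  dir.create(project_dir)     # alternative to making the folder by hand
}

setwd(project_dir)            # move R into the new folder
print(getwd())

# List any csv files R can see here, to confirm the data file arrived
list.files(pattern = "\\.csv$", ignore.case = TRUE)
```

If the last line prints character(0), R does not see any csv file in the folder yet.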
Enter the line of code you see below in the console to load and name the dataset. You can skip anything with a ‘#’ symbol in front; those are just comment lines, and R ignores them.
# Load dataset
county_data = read.csv("county.csv")
The command read.csv( ) will read a dataset into R from your computer or from online. “csv” stands for “comma-separated values”, a common file type where the data is listed in a text file, with the values for each variable separated by commas. For now, you don’t need to worry about the details of read.csv( ).
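To get a feel for what “comma separated” means, here is a small sketch that hands read.csv( ) a few lines of text directly instead of a file, using its text argument. The names csv_text and toy are just for illustration and are not part of the course files.

```r
# A tiny csv typed as text: the first line holds the variable names,
# and each later line is one row with values separated by commas.
csv_text <- "name,state,poverty
Autauga County,Alabama,10.6
Baldwin County,Alabama,12.2"

# read.csv() can parse text directly through its 'text' argument
toy <- read.csv(text = csv_text)

str(toy)   # shows each variable and its type
dim(toy)   # number of rows, then number of columns
```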
Now that we have the data set loaded, we will look at it in 3 different ways:
1. Using the command head(), which shows us the first 6 lines of the data set.
# Look at the dataset
head(county_data)
## name state pop2000 pop2010 fed_spend poverty homeownership
## 1 Autauga County Alabama 43671 54571 6.068095 10.6 77.5
## 2 Baldwin County Alabama 140415 182265 6.139862 12.2 76.7
## 3 Barbour County Alabama 29038 27457 8.752158 25.0 68.0
## 4 Bibb County Alabama 20826 22915 7.122016 12.6 82.9
## 5 Blount County Alabama 51024 57322 5.130910 13.4 82.0
## 6 Bullock County Alabama 11714 10914 9.973062 25.3 76.9
## multiunit income med_income
## 1 7.2 24568 53255
## 2 22.6 26469 50147
## 3 11.1 15875 33219
## 4 6.6 19918 41770
## 5 3.7 21070 45549
## 6 9.9 20289 31602
Similarly, tail() shows you the last 6 lines of the data set.
2. Another way to look at a data set is using the ‘summary()’ function.
summary(county_data)
## name state pop2000
## Washington County: 30 Texas : 254 Min. : 67
## Jefferson County : 25 Georgia : 159 1st Qu.: 11211
## Franklin County : 24 Virginia: 134 Median : 24621
## Jackson County : 23 Kentucky: 120 Mean : 89627
## Lincoln County : 23 Missouri: 115 3rd Qu.: 61792
## Madison County : 19 Kansas : 105 Max. :9519338
## (Other) :2997 (Other) :2254
## pop2010 fed_spend poverty homeownership
## Min. : 82 Min. : 2.109 Min. : 0.0 Min. : 0.00
## 1st Qu.: 11119 1st Qu.: 6.970 1st Qu.:11.0 1st Qu.:69.50
## Median : 25887 Median : 8.673 Median :14.7 Median :74.60
## Mean : 98294 Mean : 10.003 Mean :15.5 Mean :73.27
## 3rd Qu.: 66861 3rd Qu.: 10.877 3rd Qu.:19.0 3rd Qu.:78.40
## Max. :9818605 Max. :204.616 Max. :53.5 Max. :91.30
##
## multiunit income med_income
## Min. : 0.00 Min. : 7772 Min. : 19351
## 1st Qu.: 6.10 1st Qu.:19030 1st Qu.: 36948
## Median : 9.70 Median :21765 Median : 42444
## Mean :12.32 Mean :22499 Mean : 44259
## 3rd Qu.:15.90 3rd Qu.:24801 3rd Qu.: 49120
## Max. :98.50 Max. :64381 Max. :115574
##
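Both head() and summary() can be pointed at a smaller piece of the data. The sketch below uses a small made-up data frame called toy (not the course dataset), so you can try it without loading anything. The n argument controls how many rows head() and tail() show.

```r
# A small stand-in data frame, for illustration only
toy <- data.frame(
  name    = c("Autauga County", "Baldwin County", "Barbour County", "Bibb County"),
  poverty = c(10.6, 12.2, 25.0, 12.6)
)

head(toy, n = 2)       # first 2 rows instead of the default 6
tail(toy, n = 2)       # last 2 rows
summary(toy$poverty)   # summary of a single column
```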
3. Ok, our last task in R today will be to make the plots from the slides in class. The first is the plot of Federal Spending per Capita versus Poverty Rate. We need to tell R which columns to choose from the county_data dataset. Here is the command. Please take note of how we specify the columns with the ‘$’ symbol!
plot(county_data$poverty,county_data$fed_spend)
plot(county_data$poverty,county_data$fed_spend, ylim = c(0,30), col="blue")
The second command adds ylim = c(0,30), which limits the vertical axis to the range 0 to 30. Note that I put in the blue just to be fancy.
Revise the command above to produce the second plot from the slides.
plot()
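A closing note on the ‘$’ notation used in the plot commands: writing a dataset name, then ‘$’, then a column name pulls out that one column as a plain vector. The sketch below shows this on a small made-up data frame called toy, not on county_data.

```r
# A small stand-in data frame, for illustration only
toy <- data.frame(
  poverty   = c(10.6, 12.2, 25.0),
  fed_spend = c(6.068095, 6.139862, 8.752158)
)

toy$poverty        # the poverty column on its own
toy[["poverty"]]   # another way to ask for the same column

# The first argument goes on the horizontal axis, the second on the vertical
plot(toy$poverty, toy$fed_spend, xlab = "poverty", ylab = "fed_spend")
```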