This document was written using R Markdown, RStudio’s handy interface for working with R in a way that lets you see both your code and its output. An R Markdown document works a lot like a Word document except that you can put in R commands AND see what they do. When you type normally without making a coding block or comment block, R Markdown just prints what you type.

Our first R Code Box

Before running the following code, make sure that the file ‘county.csv’ is in the same folder (directory) as this document. We will also need this to be the working directory for R. You can do that with the following steps:

  1. Make a directory where you will do all of your R work this semester. I will use

“/Users/robincunningham/Documents/STOR 151 R Projects”

  2. Make sure this .Rmd document is saved in that directory.

  3. To see your current ‘working directory’, type getwd() in the Console below. Whatever directory you see, you can save a lot of trouble for the rest of the course by going to the file explorer (on your PC or Mac) and making a new folder inside the one you see as your working directory. Call it “R Projects”.

  4. Having completed #3 above, go to the Console below again and type ‘setwd("R Projects")’. This will be your working directory, and you will have to move to it every time you open R.

  5. Save this document to “R Projects” and put the county.csv file there. Now everything is all synched up and the hardest part of the day is over.
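
The folder setup above can be sketched in the Console as follows. This is only a minimal sketch, assuming the “R Projects” folder sits directly inside your current working directory; `project_dir` is just an illustrative name, not something the course requires.

```r
# Where is R currently working?
getwd()

# Assumed location: an "R Projects" folder inside the current working directory
project_dir <- file.path(getwd(), "R Projects")

# Create the folder only if it does not exist yet
if (!dir.exists(project_dir)) {
  dir.create(project_dir)
}

# Move R into that folder and confirm the change
setwd(project_dir)
getwd()

# TRUE once county.csv has been copied into the folder, FALSE otherwise
file.exists("county.csv")
```

If the last line prints FALSE, the data file is not yet in the working directory and read.csv("county.csv") will fail.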




Now you should be able to run all of the code below by pressing the “Knit HTML” button above. Do that now, but don’t be shocked if an error appears! We will get it working.

# Load dataset
county_data = read.csv("county.csv")

The command read.csv() will read a dataset into R from your computer or from online. “csv” stands for “comma-separated values”, a common file type where the data is listed in a text file, with the values in each row separated by commas. For now, you don’t need to worry about the details of read.csv().
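
To see what comma-separated data looks like, here is a small sketch that reads CSV text typed directly into the code instead of from a file. The two rows are copied from the first rows of the county data; `csv_text` and `tiny` are illustrative names, and the `text =` argument is used only so the example runs without any file on disk.

```r
# The first line holds the column names; each later line is one county
csv_text <- "name,state,pop2010
Autauga County,Alabama,54571
Baldwin County,Alabama,182265"

# read.csv() splits each line at the commas to build the columns
tiny <- read.csv(text = csv_text)

tiny        # print the small data set
dim(tiny)   # number of rows, then number of columns
```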

Now that we have the data set loaded, we will look at it in 3 different ways. The first is the command head(), which shows us the first 6 rows of the data set.

# Look at the dataset
head(county_data)
##             name   state pop2000 pop2010 fed_spend poverty homeownership
## 1 Autauga County Alabama   43671   54571  6.068095    10.6          77.5
## 2 Baldwin County Alabama  140415  182265  6.139862    12.2          76.7
## 3 Barbour County Alabama   29038   27457  8.752158    25.0          68.0
## 4    Bibb County Alabama   20826   22915  7.122016    12.6          82.9
## 5  Blount County Alabama   51024   57322  5.130910    13.4          82.0
## 6 Bullock County Alabama   11714   10914  9.973062    25.3          76.9
##   multiunit income med_income
## 1       7.2  24568      53255
## 2      22.6  26469      50147
## 3      11.1  15875      33219
## 4       6.6  19918      41770
## 5       3.7  21070      45549
## 6       9.9  20289      31602
  1. How many variables does the data set have?
  2. Which variables are categorical and which are numerical?
  3. Guess what the command tail() shows you.
  4. Look at the following code box and notice that there is no output from it in the RMarkdown document when you ‘Knit HTML’. Note what is different from the previous code box. (You will have to look at the .Rmd file to see what is different.)
# Look at dataset
head(county_data)
  1. So using ‘eval = FALSE’ in the header of a code box tells R Markdown to show the R commands without running them, which is why no output appears.


  1. Another way to look at a data set is using the ‘summary()’ function.
summary(county_data)
##                 name           state         pop2000       
##  Washington County:  30   Texas   : 254   Min.   :     67  
##  Jefferson County :  25   Georgia : 159   1st Qu.:  11211  
##  Franklin County  :  24   Virginia: 134   Median :  24621  
##  Jackson County   :  23   Kentucky: 120   Mean   :  89627  
##  Lincoln County   :  23   Missouri: 115   3rd Qu.:  61792  
##  Madison County   :  19   Kansas  : 105   Max.   :9519338  
##  (Other)          :2997   (Other) :2254                    
##     pop2010          fed_spend          poverty     homeownership  
##  Min.   :     82   Min.   :  2.109   Min.   : 0.0   Min.   : 0.00  
##  1st Qu.:  11119   1st Qu.:  6.970   1st Qu.:11.0   1st Qu.:69.50  
##  Median :  25887   Median :  8.673   Median :14.7   Median :74.60  
##  Mean   :  98294   Mean   : 10.003   Mean   :15.5   Mean   :73.27  
##  3rd Qu.:  66861   3rd Qu.: 10.877   3rd Qu.:19.0   3rd Qu.:78.40  
##  Max.   :9818605   Max.   :204.616   Max.   :53.5   Max.   :91.30  
##                                                                    
##    multiunit         income        med_income    
##  Min.   : 0.00   Min.   : 7772   Min.   : 19351  
##  1st Qu.: 6.10   1st Qu.:19030   1st Qu.: 36948  
##  Median : 9.70   Median :21765   Median : 42444  
##  Mean   :12.32   Mean   :22499   Mean   : 44259  
##  3rd Qu.:15.90   3rd Qu.:24801   3rd Qu.: 49120  
##  Max.   :98.50   Max.   :64381   Max.   :115574  
## 
  1. List 3 facts you can tell are true from the summary of the data set.
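
If your own summary() output shows only “Length”, “Class” and “Mode” for name and state instead of counts, a likely reason is that R version 4.0 and later reads text columns as plain character strings by default. The sketch below uses three rows copied from the data above to show how the `stringsAsFactors` argument changes the summary; `csv_text`, `as_text` and `as_factor` are illustrative names.

```r
csv_text <- "name,state,poverty
Autauga County,Alabama,10.6
Baldwin County,Alabama,12.2
Barbour County,Alabama,25.0"

# Text columns kept as plain character strings
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)
summary(as_text$state)

# Text columns read as categories (factors), which summary() counts
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
summary(as_factor$state)
```

With the real file, the same idea would be read.csv("county.csv", stringsAsFactors = TRUE).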



3. Ok, our last task in R today will be to make the plots from the slides in class. The first is the plot of Federal Spending per Capita versus Poverty Rate. We need to tell R which columns to choose for the plot. Here is the command:

plot(county_data$poverty,county_data$fed_spend)

  1. As you can see, a couple of the counties have really high federal spending, which is squishing up our picture of the data. We can see that it is the same picture by setting limits on the y-axis:
plot(county_data$poverty,county_data$fed_spend, ylim = c(0,30), col="blue")

  1. Note I put in the blue just to be fancy.

  2. Revise the command above to come up with the command to produce the second plot from the slides.

plot(county_data$multiunit,county_data$homeownership)
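
The plots above use R's default axis labels, which are just the text of the command. As a sketch of how the same kind of plot() call can be labeled, the example below uses the six poverty and fed_spend values shown in the head() output earlier. The standalone vectors and the label text are illustrative choices, not part of the class slides.

```r
# Six values copied from the head(county_data) output
poverty   <- c(10.6, 12.2, 25.0, 12.6, 13.4, 25.3)
fed_spend <- c(6.068095, 6.139862, 8.752158, 7.122016, 5.130910, 9.973062)

# xlab, ylab and main set the axis labels and the title
plot(poverty, fed_spend,
     ylim = c(0, 30),
     col  = "blue",
     xlab = "Poverty rate",
     ylab = "Federal spending per capita",
     main = "Federal spending vs. poverty")
```

The same xlab, ylab and main arguments can be added to the county_data plots.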