Source file ⇒ ~/Desktop/lec6.Rmd
For homework from now on, please submit an HTML R Markdown file with the .Rmd file embedded in it.
For example, to make an .Rmd file:
In RStudio, choose New File / R Markdown, then From Template / simple HTML.
To embed the .Rmd source code in your HTML document, read my instructions in Wednesday’s lecture, in the section headed “Embedded Files within HTML files”.
To submit your HTML document, save it on your computer. Then upload it to b-courses like you would upload any other document. You are allowed to upload multiple documents for your assignment. Don’t use RPubs, since other students could then go to the RPubs website and see your solution.
We will introduce graphing in R using ggplot steps:
Today: chapters 5 and 6
The main purpose of a scatter plot is to show the relationship between two variables across several or many cases. Most often, there is a Cartesian coordinate system in which the x-axis represents one variable and the y-axis the value of a second variable.
Example: Consider the NHANES data giving medical and morphometric measurements of individual people. Here is a scatter plot showing the relationship between two variables: height and age.
Each dot is one case. The position of that dot signifies the value of the two variables for that case.
Your book uses the word glyph to describe the basic graphical unit that represents one case. A glyph means a mark or a symbol. A glyph in this example is a point.
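A minimal sketch of how such a scatter plot can be written in ggplot2. The data frame `people` and its values are made up for illustration (a hypothetical stand-in for NHANES, not real survey data), so the picture will not match the lecture's plot.

```r
library(ggplot2)

# Hypothetical stand-in for NHANES: a few made-up cases, not real survey data.
people <- data.frame(
  Age    = c(5, 12, 18, 25, 40, 60),
  Height = c(1.10, 1.50, 1.72, 1.80, 1.65, 1.70)  # made-up heights in metres
)

# One point glyph per case: x position comes from Age, y position from Height.
p <- ggplot(people, aes(x = Age, y = Height)) +
  geom_point()
print(p)
```

Each row of the data frame becomes one dot, which is what "each dot is one case" means in practice.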
scatterGraphHelper()
Steps:
0. Make sure DataComputing is loaded.
1. In the console, type scatterGraphHelper(NHANES). (It won’t work in your R Markdown file, since you need to interact with the function during compiling.)
2. Map variables in NHANES to attributes (aesthetics) of our glyph (e.g. color or size).
Example:
map Age to x
map Height to y
map Sex to color
map Sex to facet
Warning: These interactive functions are very buggy, so don’t experiment with them too much or you will crash RStudio. In fact, don’t use any of the other interactive tools in this chapter besides scatterGraphHelper().
Vocabulary:
glyph = graphical unit (point)
aesthetic = a visual property of the glyph (position, shape, color).
scale = the relationship between a variable and the aesthetic to which it is mapped.
Age -> x
Height -> y
Sex -> color
frame = the position scale describing how data are mapped to x and y.
guide = an indication for human viewers of the scale.
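The vocabulary above can be lined up with ggplot2 code roughly as follows. This is a sketch under assumptions: `people` is a hypothetical data frame with made-up values standing in for NHANES, and the column names Age, Height, and Sex simply follow the mappings listed above.

```r
library(ggplot2)

# Hypothetical stand-in for NHANES (made-up values).
people <- data.frame(
  Age    = c(5, 12, 18, 25, 40, 60, 8, 30),
  Height = c(1.10, 1.50, 1.72, 1.80, 1.65, 1.70, 1.25, 1.60),
  Sex    = c("male", "female", "male", "male",
             "female", "male", "female", "female")
)

p <- ggplot(people,
            aes(x = Age, y = Height, color = Sex)) +  # aesthetics: variable -> visual property
  geom_point() +                                      # glyph: one point per case
  facet_wrap(~ Sex)                                   # facet: one panel per value of Sex
print(p)
```

Here the x and y axes form the frame, and the axis tick labels and the color legend that ggplot2 draws automatically are the guides.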
# load the data table at http://tiny.cc/dcf/table-6-2.csv
my_table <- read.csv("http://tiny.cc/dcf/table-6-2.csv")
head(my_table)
## country gdp educ roadways net_users
## 1 Albania 9383.46 3.3 0.63 >35%
## 2 Algeria 7335.03 4.3 0.05 >5%
## 3 Angola 6904.82 3.5 0.04 >0%
## 4 Anguilla 10903.89 2.8 1.92 >15%
## 5 Antigua and Barbuda 17635.14 2.4 2.64 >60%
## 6 Argentina 17920.07 6.3 0.08 >15%
# scatterGraphHelper(my_table)  # type this in the console
# You decide which attributes you want your glyph to have.
# "Show expression" prints the ggplot command in the console; paste it into an R Markdown chunk if you want the graph in your report.
Make an interactive scatterplot with scatterGraphHelper() Answer:
A histogram shows how many cases fall into given ranges of the variable. For instance, here’s a histogram of heights from NHANES:
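A minimal sketch of a histogram in ggplot2. The heights here are simulated (a hypothetical stand-in for the NHANES heights), and the bin width of 0.05 is an arbitrary choice for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical heights in metres: simulated values, not NHANES data.
people <- data.frame(Height = rnorm(200, mean = 1.7, sd = 0.1))

# Each bar counts how many cases fall into one range (bin) of Height.
p <- ggplot(people, aes(x = Height)) +
  geom_histogram(binwidth = 0.05)
print(p)
```

Only one variable is mapped (to x); the bar heights are counts that ggplot2 computes from the data.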
#not glyph ready
head(Minneapolis2013)
## Precinct First Second Third Ward
## 1 P-10 BETSY HODGES undervote undervote W-7
## 2 P-06 BOB FINE MARK ANDREW undervote W-10
## 3 P-09 KURTIS W. HANNA BOB FINE MIKE GOULD W-10
## 4 P-05 BETSY HODGES DON SAMUELS undervote W-13
## 5 P-01 DON SAMUELS undervote undervote W-5
## 6 P-04 undervote undervote undervote W-6
FirstPlaceTally <- Minneapolis2013 %>%
  rename(candidate = First) %>%
  group_by(candidate) %>%
  summarise(total = n()) %>%
  arrange(desc(total))
#glyph ready
FirstPlaceTally
## Source: local data frame [38 x 2]
##
## candidate total
## (chr) (int)
## 1 BETSY HODGES 28935
## 2 MARK ANDREW 19584
## 3 DON SAMUELS 8335
## 4 CAM WINTON 7511
## 5 JACKIE CHERRYHOMES 3524
## 6 BOB FINE 2094
## 7 DAN COHEN 1798
## 8 STEPHANIE WOODRUFF 1010
## 9 MARK V ANDERSON 975
## 10 undervote 834
## .. ... ...
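One way a glyph-ready table like this could be drawn is as a bar chart, with one bar per candidate. The sketch below is an assumption about how to plot it, not necessarily the graph used in lecture. To stay self-contained it retypes the top five rows of the output above into a small data frame named `top5` (a hypothetical name) instead of loading Minneapolis2013.

```r
library(ggplot2)

# Top five rows copied from the FirstPlaceTally output above.
top5 <- data.frame(
  candidate = c("BETSY HODGES", "MARK ANDREW", "DON SAMUELS",
                "CAM WINTON", "JACKIE CHERRYHOMES"),
  total = c(28935, 19584, 8335, 7511, 3524)
)

# One bar glyph per row; stat = "identity" uses total as the bar height
# instead of counting rows. reorder() sorts bars from most to fewest votes.
p <- ggplot(top5, aes(x = reorder(candidate, -total), y = total)) +
  geom_bar(stat = "identity") +
  labs(x = "candidate", y = "first-choice votes")
print(p)
```

The table is "glyph ready" because each row already corresponds to exactly one bar, so no further wrangling is needed before plotting.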
There are many more kinds of graphs (frequency polygons, maps, networks).
Geom=Glyph
See: http://docs.ggplot2.org/current/
In-class exercises