Florian Oswald
2018-09-22
data("mpg",package="ggplot2")
dim(mpg)
[1] 234 11
head(mpg)
manufacturer model displ year cyl trans drv cty hwy fl class
1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compact
2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compact
3 audi a4 2.0 2008 4 manual(m6) f 20 31 p compact
4 audi a4 2.0 2008 4 auto(av) f 21 30 p compact
5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compact
6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compact
tail gives you the last rows.
names gives the column names.
str shows the structure of the object:
str(mpg)
Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
$ manufacturer: chr "audi" "audi" "audi" "audi" ...
$ model : chr "a4" "a4" "a4" "a4" ...
$ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
$ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
$ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
$ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
$ drv : chr "f" "f" "f" "f" ...
$ cty : int 18 21 20 21 16 18 18 18 16 20 ...
$ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
$ fl : chr "p" "p" "p" "p" ...
$ class : chr "compact" "compact" "compact" "compact" ...
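As a small sketch of the inspection functions named above, the following uses a made-up data frame, `cars_small` (hypothetical, not part of `mpg`), so that it runs without any packages installed.

```r
# Hypothetical toy data, invented for illustration only
cars_small <- data.frame(
  model = c("a", "b", "c", "d"),
  hwy   = c(29, 31, 26, 27),
  stringsAsFactors = FALSE
)

first_two <- head(cars_small, n = 2)  # first 2 rows
last_two  <- tail(cars_small, n = 2)  # last 2 rows
col_names <- names(cars_small)        # column names
dims      <- dim(cars_small)          # number of rows, then number of columns

str(cars_small)                       # compact overview of every column
```

The same calls work unchanged on `mpg`.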
data.frame.
mean(x): the average of all values in x.
median: the value \( x_j \) below and above which 50% of the values in x lie.
x <- c(1,2,2,2,2,100)
mean(x)
[1] 18.16667
mean(x) == sum(x) / length(x)
[1] TRUE
median(x)
[1] 2
var(x)
[1] 1607.367
all.equal(var(x), sum((x - mean(x))^2) / (length(x)-1))
[1] TRUE
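To make the difference between mean and median concrete, here is a minimal sketch that computes the median by hand and then drops the outlier. The variable names (`median_by_hand`, `x_no_outlier`) are illustrative only.

```r
x <- c(1, 2, 2, 2, 2, 100)

# Median by hand: sort, then average the two middle values (n is even here)
x_sorted <- sort(x)
n <- length(x_sorted)
median_by_hand <- mean(x_sorted[c(n / 2, n / 2 + 1)])

# Dropping the outlier moves the mean a lot but leaves the median unchanged
x_no_outlier   <- x[x < 100]
mean_without   <- mean(x_no_outlier)    # 1.8, down from about 18.17
median_without <- median(x_no_outlier)  # 2, as before
```

A single extreme value drives the mean, while the median only depends on the ordering of the values.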
range gives the smallest and largest values of x:
range(x)
[1] 1 100
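A short check, as a sketch, that `range` is just the minimum and maximum put together, and that their difference gives the spread of the data:

```r
x <- c(1, 2, 2, 2, 2, 100)

# range(x) returns the same two numbers as min(x) and max(x)
same_as_min_max <- identical(range(x), c(min(x), max(x)))

# Largest minus smallest value
spread <- diff(range(x))  # 99
```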
table(x) is a useful function that counts the occurrences of each unique value in x:
table(x)
x
1 2 100
1 4 1
table(mpg$trans)
auto(av) auto(l3) auto(l4) auto(l5) auto(l6) auto(s4)
5 2 83 39 6 3
auto(s5) auto(s6) manual(m5) manual(m6)
3 16 58 19
Given two variables, table produces a contingency table:
table(mpg$trans,mpg$drv)
4 f r
auto(av) 0 5 0
auto(l3) 0 2 0
auto(l4) 34 37 12
auto(l5) 29 8 2
auto(l6) 2 2 2
auto(s4) 2 1 0
auto(s5) 1 2 0
auto(s6) 7 8 1
manual(m5) 21 33 4
manual(m6) 7 8 4
With prop.table, we can get proportions:
prop.table(table(mpg$trans,mpg$drv),margin=2)
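The `margin` argument decides what the proportions are relative to. The sketch below uses two small made-up vectors (`trans` and `drv` here are hypothetical toy data, not the `mpg` columns) so that the numbers are easy to verify by hand.

```r
# Hypothetical toy data: 6 cars
trans <- c("auto", "auto", "manual", "auto", "manual", "manual")
drv   <- c("f", "f", "f", "r", "r", "r")

counts <- table(trans, drv)

col_props <- prop.table(counts, margin = 2)  # each column sums to 1
row_props <- prop.table(counts, margin = 1)  # each row sums to 1
all_props <- prop.table(counts)              # all cells together sum to 1

colSums(col_props)
rowSums(row_props)
sum(all_props)
```

With `margin = 2`, as in the call on `mpg`, each cell is the share of a transmission type within one drive type.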
R base plotting is fairly good.
There is also ggplot2. We'll see both.
A histogram counts how many observations fall within a certain bin.
hist(mpg$cty)
hist(mpg$cty, xlab = "Miles Per Gallon (City)", main = "Histogram of MPG (City)", breaks = 12, col = "red",border = "blue")
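To see the counting that a histogram does, one can ask `hist` for the numbers without drawing anything. This is a minimal sketch on made-up values (`values` is hypothetical toy data) with explicit bin edges.

```r
# Hypothetical toy data
values <- c(1, 2, 2, 3, 5, 8, 9)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(values, breaks = c(0, 5, 10), plot = FALSE)

h$breaks  # bin edges: 0, 5, 10
h$counts  # observations per bin: 5 in [0, 5], 2 in (5, 10]
```

When `breaks` is a single number, as in `breaks = 12` above, R treats it as a suggestion and picks round bin edges close to that number of bins.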
head(mpg[,c("hwy","displ")])
hwy displ
1 29 1.8
2 29 1.8
3 31 2.0
4 30 2.0
5 26 2.8
6 26 2.8
plot(hwy ~ displ, data = mpg)
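For comparison, a sketch of the same scatter plot in ggplot2. The axis labels are my own wording, added for illustration.

```r
library(ggplot2)
data("mpg", package = "ggplot2")

# Map displ to the x axis and hwy to the y axis, then draw one point per car
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

p  # printing the object draws the plot
```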
Time for our first tutorial!! Type this into your RStudio console:
library(ScPoEconometrics)
runTutorial('chapter2')
The relevant section in the book is mandatory reading.
library(ScPoEconometrics)
runTutorial('correlation')
library(tidyr) # also loads library(tibble)
data(mpg,package = "ggplot2") # data from the ggplot2 package
mpg
# A tibble: 234 x 11
manufacturer model displ year cyl trans drv cty hwy fl cla…
<chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <ch>
1 audi a4 1.8 1999 4 auto… f 18 29 p com…
2 audi a4 1.8 1999 4 manu… f 21 29 p com…
3 audi a4 2 2008 4 manu… f 20 31 p com…
4 audi a4 2 2008 4 auto… f 21 30 p com…
5 audi a4 2.8 1999 6 auto… f 16 26 p com…
6 audi a4 2.8 1999 6 manu… f 18 26 p com…
7 audi a4 3.1 2008 6 auto… f 18 27 p com…
8 audi a4 q… 1.8 1999 4 manu… 4 18 26 p com…
9 audi a4 q… 1.8 1999 4 auto… 4 16 25 p com…
10 audi a4 q… 2 2008 4 manu… 4 20 28 p com…
# ... with 224 more rows
# mpg[row condition, col condition]
mpg[mpg$hwy > 35, c("manufacturer", "model", "year")]
# A tibble: 6 x 3
manufacturer model year
<chr> <chr> <int>
1 honda civic 2008
2 honda civic 2008
3 toyota corolla 2008
4 volkswagen jetta 1999
5 volkswagen new beetle 1999
6 volkswagen new beetle 1999
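The row condition works because a comparison such as `mpg$hwy > 35` returns a logical vector, and `[` keeps the positions that are `TRUE`. A minimal sketch on a made-up vector (`hwy` below is hypothetical toy data):

```r
hwy <- c(29, 44, 31, 36, 26)  # hypothetical values

keep <- hwy > 35              # FALSE TRUE FALSE TRUE FALSE

hwy[keep]                     # 44 36: the values that pass the condition
which(keep)                   # 2 4: their positions
sum(keep)                     # 2: how many pass (TRUE counts as 1)
```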
library(dplyr)
mpg %>% # %>% is the "pipe" operator
filter(hwy > 35) %>% # takes output and puts into next function
select(manufacturer, model, year)
# A tibble: 6 x 3
manufacturer model year
<chr> <chr> <int>
1 honda civic 2008
2 honda civic 2008
3 toyota corolla 2008
4 volkswagen jetta 1999
5 volkswagen new beetle 1999
6 volkswagen new beetle 1999
# as such, equivalent to
select(filter(mpg, hwy > 35), manufacturer, model, year)
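The equivalence can be checked directly. This sketch stores both versions and compares them, assuming dplyr and ggplot2 are installed; the object names (`piped`, `nested`, `base`) are illustrative.

```r
library(dplyr)
data("mpg", package = "ggplot2")

# Piped version: read left to right
piped <- mpg %>%
  filter(hwy > 35) %>%
  select(manufacturer, model, year)

# Nested version: read inside out
nested <- select(filter(mpg, hwy > 35), manufacturer, model, year)

# Base R version from earlier
base <- mpg[mpg$hwy > 35, c("manufacturer", "model", "year")]

identical(piped, nested)  # TRUE: the pipe only changes how the code reads
nrow(base)                # 6: the same six cars as in the output above
```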