Updated on Wed May 16 15:34:25 2018.
This section guides you through the process of decoding your data into information and, ultimately, intelligible insights. Along the way, we will explore the use of tidyverse and base R packages.
When working with a new data set, what initial questions do you have?
Consider the following questions to guide your understanding.
Once you have this basic understanding of your data, you can dig deeper. Visualization techniques let you explore the data and derive a basic understanding of the phenomena you are studying, such as the largest and smallest values for each variable. In addition, summary statistics translate data into information by revealing the shape of the data: the mean, median, minimum, maximum, and variability, all of which simple visualizations can display.
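As a rough illustration of the summary statistics mentioned above, the sketch below computes them for a single numeric variable. It assumes the built-in attitude data set (used later in this section) as a stand-in for your own data, and the name stats is only illustrative.

```r
# Summary statistics for one numeric variable, using the built-in
# `attitude` data set so the sketch runs without external files.
x <- attitude$rating

stats <- c(
  min    = min(x),
  median = median(x),
  mean   = mean(x),
  max    = max(x),
  sd     = sd(x)   # standard deviation as a measure of variability
)
print(round(stats, 2))

# summary() reports the minimum, quartiles, mean, and maximum in one call.
print(summary(x))
```

The same calls work on any numeric column, for example a single year of the internet usage data.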
For any data science project there are a few simple steps to follow.
Using the World internet usage data, we will compare read.csv() with read_csv() for importing data.
Import the data with read.csv() from the utils package
library(utils)
internet_utils <- read.csv("world_internet_usage.csv")
head(internet_utils)
## country X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## 1 China 1.78 2.64 4.60 6.20 7.30 8.52 10.52 16.00
## 2 Mexico 5.08 7.04 11.90 12.90 14.10 17.21 19.52 20.81
## 3 Panama 6.55 7.27 8.52 9.99 11.14 11.48 17.35 22.29
## 4 Senegal 0.40 0.98 1.01 2.10 4.39 4.79 5.61 7.70
## 5 Singapore 36.00 41.67 47.00 53.84 62.00 61.00 59.00 69.90
## 6 United Arab Emirates 23.63 26.27 28.32 29.48 30.13 40.00 52.00 61.00
## X2008 X2009 X2010 X2011 X2012
## 1 22.60 28.90 34.30 38.30 42.30
## 2 21.71 26.34 31.05 34.96 38.42
## 3 33.82 39.08 40.10 42.70 45.20
## 4 10.60 14.50 16.00 17.50 19.20
## 5 69.00 69.00 71.00 71.00 74.18
## 6 63.00 64.00 68.00 78.00 85.00
Use readr to import the data
library(readr)
internet_readr <- read_csv("world_internet_usage.csv")
## Parsed with column specification:
## cols(
## country = col_character(),
## `2000` = col_double(),
## `2001` = col_double(),
## `2002` = col_double(),
## `2003` = col_double(),
## `2004` = col_double(),
## `2005` = col_double(),
## `2006` = col_double(),
## `2007` = col_double(),
## `2008` = col_double(),
## `2009` = col_double(),
## `2010` = col_double(),
## `2011` = col_double(),
## `2012` = col_double()
## )
head(internet_readr)
## # A tibble: 6 x 14
## country `2000` `2001` `2002` `2003` `2004` `2005` `2006` `2007` `2008`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 China 1.78 2.64 4.60 6.20 7.30 8.52 10.5 16.0 22.6
## 2 Mexico 5.08 7.04 11.9 12.9 14.1 17.2 19.5 20.8 21.7
## 3 Panama 6.55 7.27 8.52 9.99 11.1 11.5 17.4 22.3 33.8
## 4 Senegal 0.400 0.980 1.01 2.10 4.39 4.79 5.61 7.70 10.6
## 5 Singapore 36.0 41.7 47.0 53.8 62.0 61.0 59.0 69.9 69.0
## 6 United A… 23.6 26.3 28.3 29.5 30.1 40.0 52.0 61.0 63.0
## # ... with 4 more variables: `2009` <dbl>, `2010` <dbl>, `2011` <dbl>,
## # `2012` <dbl>
Select the second row, first column.
internet_readr[[2,1]]
## [1] "Mexico"
internet_utils[2,1] # double [[ ]] works too
## [1] Mexico
## 7 Levels: China Mexico Panama Senegal Singapore ... United States
Extract the variable “country”
internet_readr$country
## [1] "China" "Mexico" "Panama"
## [4] "Senegal" "Singapore" "United Arab Emirates"
## [7] "United States"
internet_utils$country
## [1] China Mexico Panama
## [4] Senegal Singapore United Arab Emirates
## [7] United States
## 7 Levels: China Mexico Panama Senegal Singapore ... United States
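The "7 Levels" line in the read.csv() output indicates that country was imported as a factor, which was the default for string columns before R 4.0.0. The sketch below shows how the stringsAsFactors argument controls this. The two-row csv_text is a hypothetical sample standing in for world_internet_usage.csv.

```r
# Hypothetical two-row sample standing in for world_internet_usage.csv.
csv_text <- "country,X2000\nChina,1.78\nMexico,5.08"

# Ask explicitly for factors (the pre-4.0.0 default) or plain strings.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_string <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$country)  # "factor"
class(as_string$country)  # "character"

# A factor can always be converted back to plain strings.
as.character(as_factor$country)
```

Setting the argument explicitly keeps a script's behavior the same across R versions.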
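An alternative using the pipe (%>%), an infix operator
library(magrittr) # provides %>%; also attached by dplyr and the tidyverse
# when piping into $, refer to the piped object with a .
internet_readr %>% .$country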
## [1] "China" "Mexico" "Panama"
## [4] "Senegal" "Singapore" "United Arab Emirates"
## [7] "United States"
Rename columns first to remove the X in front of each year.
names(internet_utils) <- c("country", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012")
names(internet_utils)
## [1] "country" "2000" "2001" "2002" "2003" "2004" "2005"
## [8] "2006" "2007" "2008" "2009" "2010" "2011" "2012"
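Typing every column name by hand is error-prone, so here is a sketch of two alternatives. read.csv() adds the X because it passes headers through make.names() by default. The csv_text sample below is hypothetical and only mimics the layout of the internet usage file.

```r
# Hypothetical sample with year headers, mimicking the usage file.
csv_text <- "country,2000,2001\nChina,1.78,2.64\nMexico,5.08,7.04"

# Option 1: keep the original headers at import time.
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)

# Option 2: strip the leading X after a default import.
fixed <- read.csv(text = csv_text)
names(fixed) <- sub("^X", "", names(fixed))
names(fixed)
```

Option 2 would also strip a genuine leading X from any column name, so Option 1 is the safer choice when you control the import.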
Reshape a data frame
library(reshape2)
internet_utils_reshaped <- melt(internet_utils,id.vars="country", variable.name="year", value.name="usage")
internet_utils_reshaped
## country year usage
## 1 China 2000 1.78
## 2 Mexico 2000 5.08
## 3 Panama 2000 6.55
## 4 Senegal 2000 0.40
## 5 Singapore 2000 36.00
## 6 United Arab Emirates 2000 23.63
## 7 United States 2000 43.08
## 8 China 2001 2.64
## 9 Mexico 2001 7.04
## 10 Panama 2001 7.27
## 11 Senegal 2001 0.98
## 12 Singapore 2001 41.67
## 13 United Arab Emirates 2001 26.27
## 14 United States 2001 49.08
## 15 China 2002 4.60
## 16 Mexico 2002 11.90
## 17 Panama 2002 8.52
## 18 Senegal 2002 1.01
## 19 Singapore 2002 47.00
## 20 United Arab Emirates 2002 28.32
## 21 United States 2002 58.79
## 22 China 2003 6.20
## 23 Mexico 2003 12.90
## 24 Panama 2003 9.99
## 25 Senegal 2003 2.10
## 26 Singapore 2003 53.84
## 27 United Arab Emirates 2003 29.48
## 28 United States 2003 61.70
## 29 China 2004 7.30
## 30 Mexico 2004 14.10
## 31 Panama 2004 11.14
## 32 Senegal 2004 4.39
## 33 Singapore 2004 62.00
## 34 United Arab Emirates 2004 30.13
## 35 United States 2004 64.76
## 36 China 2005 8.52
## 37 Mexico 2005 17.21
## 38 Panama 2005 11.48
## 39 Senegal 2005 4.79
## 40 Singapore 2005 61.00
## 41 United Arab Emirates 2005 40.00
## 42 United States 2005 67.97
## 43 China 2006 10.52
## 44 Mexico 2006 19.52
## 45 Panama 2006 17.35
## 46 Senegal 2006 5.61
## 47 Singapore 2006 59.00
## 48 United Arab Emirates 2006 52.00
## 49 United States 2006 68.93
## 50 China 2007 16.00
## 51 Mexico 2007 20.81
## 52 Panama 2007 22.29
## 53 Senegal 2007 7.70
## 54 Singapore 2007 69.90
## 55 United Arab Emirates 2007 61.00
## 56 United States 2007 75.00
## 57 China 2008 22.60
## 58 Mexico 2008 21.71
## 59 Panama 2008 33.82
## 60 Senegal 2008 10.60
## 61 Singapore 2008 69.00
## 62 United Arab Emirates 2008 63.00
## 63 United States 2008 74.00
## 64 China 2009 28.90
## 65 Mexico 2009 26.34
## 66 Panama 2009 39.08
## 67 Senegal 2009 14.50
## 68 Singapore 2009 69.00
## 69 United Arab Emirates 2009 64.00
## 70 United States 2009 71.00
## 71 China 2010 34.30
## 72 Mexico 2010 31.05
## 73 Panama 2010 40.10
## 74 Senegal 2010 16.00
## 75 Singapore 2010 71.00
## 76 United Arab Emirates 2010 68.00
## 77 United States 2010 74.00
## 78 China 2011 38.30
## 79 Mexico 2011 34.96
## 80 Panama 2011 42.70
## 81 Senegal 2011 17.50
## 82 Singapore 2011 71.00
## 83 United Arab Emirates 2011 78.00
## 84 United States 2011 77.86
## 85 China 2012 42.30
## 86 Mexico 2012 38.42
## 87 Panama 2012 45.20
## 88 Senegal 2012 19.20
## 89 Singapore 2012 74.18
## 90 United Arab Emirates 2012 85.00
## 91 United States 2012 81.03
Reshape a tibble
internet_readr_reshaped <- melt(internet_readr,id.vars="country", variable.name="year", value.name="usage")
internet_readr_reshaped
## country year usage
## 1 China 2000 1.78
## 2 Mexico 2000 5.08
## 3 Panama 2000 6.55
## 4 Senegal 2000 0.40
## 5 Singapore 2000 36.00
## 6 United Arab Emirates 2000 23.63
## 7 United States 2000 43.08
## 8 China 2001 2.64
## 9 Mexico 2001 7.04
## 10 Panama 2001 7.27
## 11 Senegal 2001 0.98
## 12 Singapore 2001 41.67
## 13 United Arab Emirates 2001 26.27
## 14 United States 2001 49.08
## 15 China 2002 4.60
## 16 Mexico 2002 11.90
## 17 Panama 2002 8.52
## 18 Senegal 2002 1.01
## 19 Singapore 2002 47.00
## 20 United Arab Emirates 2002 28.32
## 21 United States 2002 58.79
## 22 China 2003 6.20
## 23 Mexico 2003 12.90
## 24 Panama 2003 9.99
## 25 Senegal 2003 2.10
## 26 Singapore 2003 53.84
## 27 United Arab Emirates 2003 29.48
## 28 United States 2003 61.70
## 29 China 2004 7.30
## 30 Mexico 2004 14.10
## 31 Panama 2004 11.14
## 32 Senegal 2004 4.39
## 33 Singapore 2004 62.00
## 34 United Arab Emirates 2004 30.13
## 35 United States 2004 64.76
## 36 China 2005 8.52
## 37 Mexico 2005 17.21
## 38 Panama 2005 11.48
## 39 Senegal 2005 4.79
## 40 Singapore 2005 61.00
## 41 United Arab Emirates 2005 40.00
## 42 United States 2005 67.97
## 43 China 2006 10.52
## 44 Mexico 2006 19.52
## 45 Panama 2006 17.35
## 46 Senegal 2006 5.61
## 47 Singapore 2006 59.00
## 48 United Arab Emirates 2006 52.00
## 49 United States 2006 68.93
## 50 China 2007 16.00
## 51 Mexico 2007 20.81
## 52 Panama 2007 22.29
## 53 Senegal 2007 7.70
## 54 Singapore 2007 69.90
## 55 United Arab Emirates 2007 61.00
## 56 United States 2007 75.00
## 57 China 2008 22.60
## 58 Mexico 2008 21.71
## 59 Panama 2008 33.82
## 60 Senegal 2008 10.60
## 61 Singapore 2008 69.00
## 62 United Arab Emirates 2008 63.00
## 63 United States 2008 74.00
## 64 China 2009 28.90
## 65 Mexico 2009 26.34
## 66 Panama 2009 39.08
## 67 Senegal 2009 14.50
## 68 Singapore 2009 69.00
## 69 United Arab Emirates 2009 64.00
## 70 United States 2009 71.00
## 71 China 2010 34.30
## 72 Mexico 2010 31.05
## 73 Panama 2010 40.10
## 74 Senegal 2010 16.00
## 75 Singapore 2010 71.00
## 76 United Arab Emirates 2010 68.00
## 77 United States 2010 74.00
## 78 China 2011 38.30
## 79 Mexico 2011 34.96
## 80 Panama 2011 42.70
## 81 Senegal 2011 17.50
## 82 Singapore 2011 71.00
## 83 United Arab Emirates 2011 78.00
## 84 United States 2011 77.86
## 85 China 2012 42.30
## 86 Mexico 2012 38.42
## 87 Panama 2012 45.20
## 88 Senegal 2012 19.20
## 89 Singapore 2012 74.18
## 90 United Arab Emirates 2012 85.00
## 91 United States 2012 81.03
class(internet_readr_reshaped) # turns into a data.frame!
## [1] "data.frame"
Use the gather() function from tidyr to reshape
library(tidyr) # provides gather()
tidy_internet_readr <-
  internet_readr %>%
  gather(`2000`,`2001`,`2002`,`2003`,`2004`,`2005`,`2006`,`2007`,`2008`,`2009`,`2010`,`2011`,`2012`, key="year", value="usage")
tidy_internet_readr
## # A tibble: 91 x 3
## country year usage
## <chr> <chr> <dbl>
## 1 China 2000 1.78
## 2 Mexico 2000 5.08
## 3 Panama 2000 6.55
## 4 Senegal 2000 0.400
## 5 Singapore 2000 36.0
## 6 United Arab Emirates 2000 23.6
## 7 United States 2000 43.1
## 8 China 2001 2.64
## 9 Mexico 2001 7.04
## 10 Panama 2001 7.27
## # ... with 81 more rows
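In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The sketch below shows an equivalent reshape on a small, hand-built data frame; the object names wide and long are illustrative and the values are copied from the first rows of the usage data.

```r
library(tidyr)

# Small wide table; check.names = FALSE keeps the year headers as-is.
wide <- data.frame(
  country = c("China", "Mexico"),
  `2000` = c(1.78, 5.08),
  `2001` = c(2.64, 7.04),
  check.names = FALSE
)

# Every column except country becomes a (year, usage) pair.
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "usage"
)

# Column names arrive as text; convert year to a number for plotting.
long$year <- as.integer(long$year)
long
```

Selecting columns with -country avoids listing each year by hand.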
Create a few statistical visualizations to understand the makeup of your data.
Build a boxplot
boxplot(internet_readr$`2000`, main="Range of internet users in 2000", sub="Median of 6.55 users per 100 people", col="#999999", frame=FALSE, las=1)
Build multiple box plots
boxplot(internet_readr[,2:14], main="Range of internet users per 100 people", col="#999999", frame=FALSE, las=1)
Build a single histogram
hist(internet_readr$`2000`, main="Frequency of internet users in 2000 per 100 people", xlab="Year: 2000", col="#999999", border="#FFFFFF", labels=TRUE, breaks=6, las=1)
Build a percentage (rather than count) histogram
library(lattice)
histogram(internet_readr$`2000`, main="Frequency of internet users in 2000 per 100 people", xlab="Year: 2000", col="#999999", border="#FFFFFF")
Build a histogram matrix
histogram(~ usage | year, data=tidy_internet_readr, layout=c(4,4), main="Histogram matrix: 2000-2012", col="#999999", border="#FFFFFF", xlab="Usage")
Re-arrange the years
h <- histogram(~tidy_internet_readr$usage|tidy_internet_readr$year, col="#999999", breaks=5, layout=c(3,5), xlab="Usage", main="Histogram matrix: 2000-2011")
update(h, index.cond=list(c(10:12, 7:9, 4:6, 1:3)))
Rearrange the years and show all years
tidy_internet_readr$year <- as.character(tidy_internet_readr$year)
h <- histogram(~tidy_internet_readr$usage|tidy_internet_readr$year, col="#999999", xlab="Usage", main="Histogram matrix: 2000-2012", breaks=5, layout=c(4,4))
update(h, index.cond=list(c(10:13, 6:9, 2:5, 1)))
Build a column chart using ggplot
library(ggplot2)
library(ggthemes) # provides theme_few()
g <- ggplot(tidy_internet_readr, aes(x = year, y = usage))
g + geom_col() + theme_few() + labs(title = "Internet Usage per 100 people", x = "Year", y = "Usage")
Create charts and reports.
Create a presentation-ready line chart using ggplot and apply a ggtheme.
library(ggthemes)
library(ggplot2)
ggplot(tidy_internet_readr,aes(x=year,y=usage,colour=country,group=country)) + geom_line() + labs(title = "Internet Usage per 100 people", subtitle = "Since 2011, the UAE has surpassed Singapore and the US in internet users", caption = "Source: World Bank, 2013",x = "Year",y ="Usage") + theme_few()
Create a new markdown document and publish it with the graph you created above.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.
For more details on using R Markdown see http://rmarkdown.rstudio.com.
This section will introduce control structures such as the while loop, for loop, if/else conditional statements, and functions.
Create a while loop
x <- 10
while (x > 0) {
print(x)
x <- x - 1
}
## [1] 10
## [1] 9
## [1] 8
## [1] 7
## [1] 6
## [1] 5
## [1] 4
## [1] 3
## [1] 2
## [1] 1
Using a variable as a counter
counter <- 0
while (counter < 9) {
print(counter)
counter <- counter + 1
}
## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
Iterate through a vector of numbers using a for loop
for (i in c(1,2,3,4)){
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
Iterate through a column in the bikeshare data
for (i in bikeshare$atemp){
print(i)
}
## [1] 0.363625
## [1] 0.353739
## [1] 0.189405
## [1] 0.212122
## [1] 0.22927
## [1] 0.233209
## [1] 0.208839
## [1] 0.162254
## [1] 0.116175
## [1] 0.150888
## [1] 0.191464
## [1] 0.160473
## [1] 0.150883
## [1] 0.188413
## [1] 0.248112
## [1] 0.234217
## [1] 0.176771
## [1] 0.232333
## [1] 0.298422
## [1] 0.25505
## [1] 0.157833
## [1] 0.0790696
## [1] 0.0988391
## [1] 0.11793
## [1] 0.234526
## [1] 0.2036
## [1] 0.2197
## [1] 0.223317
## [1] 0.212126
## [1] 0.250322
## [1] 0.18625
## [1] 0.23453
## [1] 0.254417
## [1] 0.177878
## [1] 0.228587
## [1] 0.243058
## [1] 0.291671
## [1] 0.303658
## [1] 0.198246
## [1] 0.144283
## [1] 0.149548
## [1] 0.213509
## [1] 0.232954
## [1] 0.324113
## [1] 0.39835
## [1] 0.254274
## [1] 0.3162
## [1] 0.428658
## [1] 0.511983
## [1] 0.391404
## [1] 0.27733
## [1] 0.284075
## [1] 0.186033
## [1] 0.245717
## [1] 0.289191
## [1] 0.350461
## [1] 0.282192
## [1] 0.351109
## [1] 0.400118
## [1] 0.263879
## [1] 0.320071
## [1] 0.200133
## [1] 0.255679
## [1] 0.378779
## [1] 0.366252
## [1] 0.238461
## [1] 0.3024
## [1] 0.286608
## [1] 0.385668
## [1] 0.305
## [1] 0.32575
## [1] 0.380091
## [1] 0.332
## [1] 0.318178
## [1] 0.36693
## [1] 0.410333
## [1] 0.527009
## [1] 0.466525
## [1] 0.32575
## [1] 0.409735
## [1] 0.440642
## [1] 0.337939
## [1] 0.270833
## [1] 0.256312
## [1] 0.257571
## [1] 0.250339
## [1] 0.257574
## [1] 0.292908
## [1] 0.29735
## [1] 0.257575
## [1] 0.283454
## [1] 0.315637
## [1] 0.378767
## [1] 0.542929
## [1] 0.39835
## [1] 0.387608
## [1] 0.433696
## [1] 0.324479
## [1] 0.341529
## [1] 0.426737
## [1] 0.565217
## [1] 0.493054
## [1] 0.417283
## [1] 0.462742
## [1] 0.441913
## [1] 0.425492
## [1] 0.445696
## [1] 0.503146
## [1] 0.489258
## [1] 0.564392
## [1] 0.453892
## [1] 0.321954
## [1] 0.450121
## [1] 0.551763
## [1] 0.5745
## [1] 0.594083
## [1] 0.575142
## [1] 0.578929
## [1] 0.497463
## [1] 0.464021
## [1] 0.448204
## [1] 0.532833
## [1] 0.582079
## [1] 0.40465
## [1] 0.441917
## [1] 0.474117
## [1] 0.512621
## [1] 0.518933
## [1] 0.525246
## [1] 0.522721
## [1] 0.5284
## [1] 0.523363
## [1] 0.4943
## [1] 0.500629
## [1] 0.536
## [1] 0.550512
## [1] 0.538529
## [1] 0.527158
## [1] 0.510742
## [1] 0.529042
## [1] 0.571975
## [1] 0.5745
## [1] 0.590296
## [1] 0.604813
## [1] 0.615542
## [1] 0.654688
## [1] 0.637008
## [1] 0.612379
## [1] 0.61555
## [1] 0.671092
## [1] 0.725383
## [1] 0.720967
## [1] 0.643942
## [1] 0.587133
## [1] 0.594696
## [1] 0.616804
## [1] 0.621858
## [1] 0.65595
## [1] 0.727279
## [1] 0.757579
## [1] 0.703292
## [1] 0.678038
## [1] 0.643325
## [1] 0.601654
## [1] 0.591546
## [1] 0.587754
## [1] 0.595346
## [1] 0.600383
## [1] 0.643954
## [1] 0.645846
## [1] 0.595346
## [1] 0.637646
## [1] 0.693829
## [1] 0.693833
## [1] 0.656583
## [1] 0.643313
## [1] 0.637629
## [1] 0.637004
## [1] 0.692558
## [1] 0.654688
## [1] 0.637008
## [1] 0.652162
## [1] 0.667308
## [1] 0.668575
## [1] 0.665417
## [1] 0.696338
## [1] 0.685633
## [1] 0.686871
## [1] 0.670483
## [1] 0.664158
## [1] 0.690025
## [1] 0.729804
## [1] 0.739275
## [1] 0.689404
## [1] 0.635104
## [1] 0.624371
## [1] 0.638263
## [1] 0.669833
## [1] 0.703925
## [1] 0.747479
## [1] 0.74685
## [1] 0.826371
## [1] 0.840896
## [1] 0.804287
## [1] 0.794829
## [1] 0.720958
## [1] 0.696979
## [1] 0.690667
## [1] 0.7399
## [1] 0.785967
## [1] 0.728537
## [1] 0.729796
## [1] 0.703292
## [1] 0.707071
## [1] 0.679937
## [1] 0.664788
## [1] 0.656567
## [1] 0.676154
## [1] 0.715292
## [1] 0.703283
## [1] 0.724121
## [1] 0.684983
## [1] 0.651521
## [1] 0.654042
## [1] 0.645858
## [1] 0.624388
## [1] 0.616167
## [1] 0.645837
## [1] 0.666671
## [1] 0.662258
## [1] 0.633221
## [1] 0.648996
## [1] 0.675525
## [1] 0.638254
## [1] 0.606067
## [1] 0.630692
## [1] 0.645854
## [1] 0.659733
## [1] 0.635556
## [1] 0.647959
## [1] 0.607958
## [1] 0.594704
## [1] 0.611121
## [1] 0.614921
## [1] 0.604808
## [1] 0.633213
## [1] 0.665429
## [1] 0.625646
## [1] 0.5152
## [1] 0.544229
## [1] 0.555361
## [1] 0.578946
## [1] 0.607962
## [1] 0.609229
## [1] 0.60213
## [1] 0.603554
## [1] 0.6269
## [1] 0.553671
## [1] 0.461475
## [1] 0.478512
## [1] 0.490537
## [1] 0.529675
## [1] 0.532217
## [1] 0.550533
## [1] 0.554963
## [1] 0.522125
## [1] 0.564412
## [1] 0.572637
## [1] 0.589042
## [1] 0.574525
## [1] 0.575158
## [1] 0.574512
## [1] 0.544829
## [1] 0.412863
## [1] 0.345317
## [1] 0.392046
## [1] 0.472858
## [1] 0.527138
## [1] 0.480425
## [1] 0.504404
## [1] 0.513242
## [1] 0.523983
## [1] 0.542925
## [1] 0.546096
## [1] 0.517717
## [1] 0.551804
## [1] 0.529675
## [1] 0.498725
## [1] 0.503154
## [1] 0.510725
## [1] 0.522721
## [1] 0.513848
## [1] 0.466525
## [1] 0.423596
## [1] 0.425492
## [1] 0.422333
## [1] 0.457067
## [1] 0.463375
## [1] 0.472846
## [1] 0.457046
## [1] 0.318812
## [1] 0.227913
## [1] 0.321329
## [1] 0.356063
## [1] 0.397088
## [1] 0.390133
## [1] 0.405921
## [1] 0.403392
## [1] 0.323854
## [1] 0.362358
## [1] 0.400871
## [1] 0.412246
## [1] 0.409079
## [1] 0.373721
## [1] 0.306817
## [1] 0.357942
## [1] 0.43055
## [1] 0.524612
## [1] 0.507579
## [1] 0.451988
## [1] 0.323221
## [1] 0.272721
## [1] 0.324483
## [1] 0.457058
## [1] 0.445062
## [1] 0.421696
## [1] 0.430537
## [1] 0.372471
## [1] 0.380671
## [1] 0.385087
## [1] 0.4558
## [1] 0.490122
## [1] 0.451375
## [1] 0.311221
## [1] 0.305554
## [1] 0.331433
## [1] 0.310604
## [1] 0.3491
## [1] 0.393925
## [1] 0.4564
## [1] 0.400246
## [1] 0.256938
## [1] 0.317542
## [1] 0.266412
## [1] 0.253154
## [1] 0.270196
## [1] 0.301138
## [1] 0.338362
## [1] 0.412237
## [1] 0.359825
## [1] 0.249371
## [1] 0.245579
## [1] 0.280933
## [1] 0.396454
## [1] 0.428017
## [1] 0.426121
## [1] 0.377513
## [1] 0.299242
## [1] 0.279961
## [1] 0.315535
## [1] 0.327633
## [1] 0.279974
## [1] 0.263892
## [1] 0.318812
## [1] 0.414121
## [1] 0.375621
## [1] 0.252304
## [1] 0.126275
## [1] 0.119337
## [1] 0.278412
## [1] 0.340267
## [1] 0.390779
## [1] 0.340258
## [1] 0.247479
## [1] 0.318826
## [1] 0.282821
## [1] 0.381938
## [1] 0.249362
## [1] 0.183087
## [1] 0.161625
## [1] 0.190663
## [1] 0.364278
## [1] 0.275254
## [1] 0.190038
## [1] 0.220958
## [1] 0.174875
## [1] 0.16225
## [1] 0.243058
## [1] 0.349108
## [1] 0.294821
## [1] 0.35605
## [1] 0.415383
## [1] 0.326379
## [1] 0.272721
## [1] 0.262625
## [1] 0.381317
## [1] 0.466538
## [1] 0.398971
## [1] 0.309346
## [1] 0.272725
## [1] 0.264521
## [1] 0.296426
## [1] 0.361104
## [1] 0.266421
## [1] 0.261988
## [1] 0.293558
## [1] 0.210867
## [1] 0.101658
## [1] 0.227913
## [1] 0.333946
## [1] 0.351629
## [1] 0.330162
## [1] 0.351629
## [1] 0.355425
## [1] 0.265788
## [1] 0.273391
## [1] 0.295113
## [1] 0.392667
## [1] 0.444446
## [1] 0.410971
## [1] 0.255675
## [1] 0.268308
## [1] 0.357954
## [1] 0.353525
## [1] 0.34847
## [1] 0.475371
## [1] 0.359842
## [1] 0.413492
## [1] 0.303021
## [1] 0.241171
## [1] 0.255042
## [1] 0.3851
## [1] 0.524604
## [1] 0.397083
## [1] 0.277767
## [1] 0.35967
## [1] 0.459592
## [1] 0.542929
## [1] 0.548617
## [1] 0.532825
## [1] 0.436229
## [1] 0.505046
## [1] 0.464
## [1] 0.532821
## [1] 0.538533
## [1] 0.513258
## [1] 0.531567
## [1] 0.570067
## [1] 0.486733
## [1] 0.437488
## [1] 0.43875
## [1] 0.315654
## [1] 0.47095
## [1] 0.482304
## [1] 0.375621
## [1] 0.421708
## [1] 0.417287
## [1] 0.427513
## [1] 0.461483
## [1] 0.53345
## [1] 0.431163
## [1] 0.390767
## [1] 0.426129
## [1] 0.492425
## [1] 0.476638
## [1] 0.436233
## [1] 0.337274
## [1] 0.387604
## [1] 0.431808
## [1] 0.487996
## [1] 0.573875
## [1] 0.614925
## [1] 0.598487
## [1] 0.457038
## [1] 0.493046
## [1] 0.515775
## [1] 0.542921
## [1] 0.389504
## [1] 0.301125
## [1] 0.405283
## [1] 0.470317
## [1] 0.483583
## [1] 0.452637
## [1] 0.377504
## [1] 0.450121
## [1] 0.457696
## [1] 0.577021
## [1] 0.537896
## [1] 0.537242
## [1] 0.590917
## [1] 0.584608
## [1] 0.546737
## [1] 0.527142
## [1] 0.557471
## [1] 0.553025
## [1] 0.491783
## [1] 0.520833
## [1] 0.544817
## [1] 0.585238
## [1] 0.5499
## [1] 0.576404
## [1] 0.595975
## [1] 0.572613
## [1] 0.551121
## [1] 0.566908
## [1] 0.583967
## [1] 0.565667
## [1] 0.580825
## [1] 0.584612
## [1] 0.6067
## [1] 0.627529
## [1] 0.642696
## [1] 0.641425
## [1] 0.6793
## [1] 0.672992
## [1] 0.611129
## [1] 0.631329
## [1] 0.607962
## [1] 0.566288
## [1] 0.575133
## [1] 0.578283
## [1] 0.525892
## [1] 0.542292
## [1] 0.569442
## [1] 0.597862
## [1] 0.648367
## [1] 0.663517
## [1] 0.659721
## [1] 0.597875
## [1] 0.611117
## [1] 0.624383
## [1] 0.599754
## [1] 0.594708
## [1] 0.571975
## [1] 0.544842
## [1] 0.654692
## [1] 0.720975
## [1] 0.752542
## [1] 0.724121
## [1] 0.652792
## [1] 0.674254
## [1] 0.654042
## [1] 0.594704
## [1] 0.640792
## [1] 0.675512
## [1] 0.786613
## [1] 0.687508
## [1] 0.750629
## [1] 0.702038
## [1] 0.70265
## [1] 0.732337
## [1] 0.761367
## [1] 0.752533
## [1] 0.804913
## [1] 0.790396
## [1] 0.654054
## [1] 0.664796
## [1] 0.650271
## [1] 0.654683
## [1] 0.667933
## [1] 0.666042
## [1] 0.705196
## [1] 0.724125
## [1] 0.755683
## [1] 0.745583
## [1] 0.714642
## [1] 0.613025
## [1] 0.549912
## [1] 0.623125
## [1] 0.690017
## [1] 0.70645
## [1] 0.654054
## [1] 0.739263
## [1] 0.734217
## [1] 0.697604
## [1] 0.667933
## [1] 0.684987
## [1] 0.662896
## [1] 0.667308
## [1] 0.707088
## [1] 0.722867
## [1] 0.751267
## [1] 0.731079
## [1] 0.710246
## [1] 0.697621
## [1] 0.707717
## [1] 0.699508
## [1] 0.667942
## [1] 0.638267
## [1] 0.644579
## [1] 0.662254
## [1] 0.676779
## [1] 0.654037
## [1] 0.654688
## [1] 0.2424
## [1] 0.618071
## [1] 0.603554
## [1] 0.595967
## [1] 0.601025
## [1] 0.621854
## [1] 0.637008
## [1] 0.6471
## [1] 0.618696
## [1] 0.595996
## [1] 0.654688
## [1] 0.66605
## [1] 0.635733
## [1] 0.652779
## [1] 0.6894
## [1] 0.702654
## [1] 0.649
## [1] 0.661629
## [1] 0.686888
## [1] 0.708983
## [1] 0.655329
## [1] 0.657204
## [1] 0.611121
## [1] 0.578925
## [1] 0.565654
## [1] 0.554292
## [1] 0.570075
## [1] 0.579558
## [1] 0.594083
## [1] 0.585867
## [1] 0.563125
## [1] 0.55305
## [1] 0.565067
## [1] 0.540404
## [1] 0.532192
## [1] 0.571971
## [1] 0.610488
## [1] 0.518933
## [1] 0.502513
## [1] 0.544179
## [1] 0.596613
## [1] 0.607975
## [1] 0.585863
## [1] 0.530296
## [1] 0.517663
## [1] 0.512
## [1] 0.542333
## [1] 0.599133
## [1] 0.607975
## [1] 0.580187
## [1] 0.538521
## [1] 0.419813
## [1] 0.387608
## [1] 0.438112
## [1] 0.503142
## [1] 0.431167
## [1] 0.433071
## [1] 0.391396
## [1] 0.508204
## [1] 0.53915
## [1] 0.460846
## [1] 0.450108
## [1] 0.512625
## [1] 0.537896
## [1] 0.472842
## [1] 0.456429
## [1] 0.482942
## [1] 0.530304
## [1] 0.558721
## [1] 0.529688
## [1] 0.52275
## [1] 0.515133
## [1] 0.467771
## [1] 0.4394
## [1] 0.309909
## [1] 0.3611
## [1] 0.369942
## [1] 0.356042
## [1] 0.323846
## [1] 0.329538
## [1] 0.308075
## [1] 0.281567
## [1] 0.274621
## [1] 0.341891
## [1] 0.355413
## [1] 0.393937
## [1] 0.421713
## [1] 0.475383
## [1] 0.323225
## [1] 0.281563
## [1] 0.324492
## [1] 0.347204
## [1] 0.326383
## [1] 0.337746
## [1] 0.375621
## [1] 0.380667
## [1] 0.364892
## [1] 0.350371
## [1] 0.378779
## [1] 0.248742
## [1] 0.257583
## [1] 0.339004
## [1] 0.281558
## [1] 0.289762
## [1] 0.298422
## [1] 0.323867
## [1] 0.316904
## [1] 0.359208
## [1] 0.455796
## [1] 0.469054
## [1] 0.428012
## [1] 0.258204
## [1] 0.321958
## [1] 0.389508
## [1] 0.390146
## [1] 0.435575
## [1] 0.338363
## [1] 0.297338
## [1] 0.294188
## [1] 0.294192
## [1] 0.338383
## [1] 0.369938
## [1] 0.4015
## [1] 0.409708
## [1] 0.342162
## [1] 0.335217
## [1] 0.301767
## [1] 0.236113
## [1] 0.259471
## [1] 0.2589
## [1] 0.294465
## [1] 0.220333
## [1] 0.226642
## [1] 0.255046
## [1] 0.2424
## [1] 0.2317
## [1] 0.223487
Next, use a for loop to round each number in bikeshare$atemp
output <- vector("double", nrow(bikeshare)) #1.output
for (i in seq_along(bikeshare$atemp)) { #2. sequence
output[[i]] <- round(bikeshare$atemp[[i]], 2) #3. body
}
output
## [1] 0.36 0.35 0.19 0.21 0.23 0.23 0.21 0.16 0.12 0.15 0.19 0.16 0.15 0.19
## [15] 0.25 0.23 0.18 0.23 0.30 0.26 0.16 0.08 0.10 0.12 0.23 0.20 0.22 0.22
## [29] 0.21 0.25 0.19 0.23 0.25 0.18 0.23 0.24 0.29 0.30 0.20 0.14 0.15 0.21
## [43] 0.23 0.32 0.40 0.25 0.32 0.43 0.51 0.39 0.28 0.28 0.19 0.25 0.29 0.35
## [57] 0.28 0.35 0.40 0.26 0.32 0.20 0.26 0.38 0.37 0.24 0.30 0.29 0.39 0.30
## [71] 0.33 0.38 0.33 0.32 0.37 0.41 0.53 0.47 0.33 0.41 0.44 0.34 0.27 0.26
## [85] 0.26 0.25 0.26 0.29 0.30 0.26 0.28 0.32 0.38 0.54 0.40 0.39 0.43 0.32
## [99] 0.34 0.43 0.57 0.49 0.42 0.46 0.44 0.43 0.45 0.50 0.49 0.56 0.45 0.32
## [113] 0.45 0.55 0.57 0.59 0.58 0.58 0.50 0.46 0.45 0.53 0.58 0.40 0.44 0.47
## [127] 0.51 0.52 0.53 0.52 0.53 0.52 0.49 0.50 0.54 0.55 0.54 0.53 0.51 0.53
## [141] 0.57 0.57 0.59 0.60 0.62 0.65 0.64 0.61 0.62 0.67 0.73 0.72 0.64 0.59
## [155] 0.59 0.62 0.62 0.66 0.73 0.76 0.70 0.68 0.64 0.60 0.59 0.59 0.60 0.60
## [169] 0.64 0.65 0.60 0.64 0.69 0.69 0.66 0.64 0.64 0.64 0.69 0.65 0.64 0.65
## [183] 0.67 0.67 0.67 0.70 0.69 0.69 0.67 0.66 0.69 0.73 0.74 0.69 0.64 0.62
## [197] 0.64 0.67 0.70 0.75 0.75 0.83 0.84 0.80 0.79 0.72 0.70 0.69 0.74 0.79
## [211] 0.73 0.73 0.70 0.71 0.68 0.66 0.66 0.68 0.72 0.70 0.72 0.68 0.65 0.65
## [225] 0.65 0.62 0.62 0.65 0.67 0.66 0.63 0.65 0.68 0.64 0.61 0.63 0.65 0.66
## [239] 0.64 0.65 0.61 0.59 0.61 0.61 0.60 0.63 0.67 0.63 0.52 0.54 0.56 0.58
## [253] 0.61 0.61 0.60 0.60 0.63 0.55 0.46 0.48 0.49 0.53 0.53 0.55 0.55 0.52
## [267] 0.56 0.57 0.59 0.57 0.58 0.57 0.54 0.41 0.35 0.39 0.47 0.53 0.48 0.50
## [281] 0.51 0.52 0.54 0.55 0.52 0.55 0.53 0.50 0.50 0.51 0.52 0.51 0.47 0.42
## [295] 0.43 0.42 0.46 0.46 0.47 0.46 0.32 0.23 0.32 0.36 0.40 0.39 0.41 0.40
## [309] 0.32 0.36 0.40 0.41 0.41 0.37 0.31 0.36 0.43 0.52 0.51 0.45 0.32 0.27
## [323] 0.32 0.46 0.45 0.42 0.43 0.37 0.38 0.39 0.46 0.49 0.45 0.31 0.31 0.33
## [337] 0.31 0.35 0.39 0.46 0.40 0.26 0.32 0.27 0.25 0.27 0.30 0.34 0.41 0.36
## [351] 0.25 0.25 0.28 0.40 0.43 0.43 0.38 0.30 0.28 0.32 0.33 0.28 0.26 0.32
## [365] 0.41 0.38 0.25 0.13 0.12 0.28 0.34 0.39 0.34 0.25 0.32 0.28 0.38 0.25
## [379] 0.18 0.16 0.19 0.36 0.28 0.19 0.22 0.17 0.16 0.24 0.35 0.29 0.36 0.42
## [393] 0.33 0.27 0.26 0.38 0.47 0.40 0.31 0.27 0.26 0.30 0.36 0.27 0.26 0.29
## [407] 0.21 0.10 0.23 0.33 0.35 0.33 0.35 0.36 0.27 0.27 0.30 0.39 0.44 0.41
## [421] 0.26 0.27 0.36 0.35 0.35 0.48 0.36 0.41 0.30 0.24 0.26 0.39 0.52 0.40
## [435] 0.28 0.36 0.46 0.54 0.55 0.53 0.44 0.51 0.46 0.53 0.54 0.51 0.53 0.57
## [449] 0.49 0.44 0.44 0.32 0.47 0.48 0.38 0.42 0.42 0.43 0.46 0.53 0.43 0.39
## [463] 0.43 0.49 0.48 0.44 0.34 0.39 0.43 0.49 0.57 0.61 0.60 0.46 0.49 0.52
## [477] 0.54 0.39 0.30 0.41 0.47 0.48 0.45 0.38 0.45 0.46 0.58 0.54 0.54 0.59
## [491] 0.58 0.55 0.53 0.56 0.55 0.49 0.52 0.54 0.59 0.55 0.58 0.60 0.57 0.55
## [505] 0.57 0.58 0.57 0.58 0.58 0.61 0.63 0.64 0.64 0.68 0.67 0.61 0.63 0.61
## [519] 0.57 0.58 0.58 0.53 0.54 0.57 0.60 0.65 0.66 0.66 0.60 0.61 0.62 0.60
## [533] 0.59 0.57 0.54 0.65 0.72 0.75 0.72 0.65 0.67 0.65 0.59 0.64 0.68 0.79
## [547] 0.69 0.75 0.70 0.70 0.73 0.76 0.75 0.80 0.79 0.65 0.66 0.65 0.65 0.67
## [561] 0.67 0.71 0.72 0.76 0.75 0.71 0.61 0.55 0.62 0.69 0.71 0.65 0.74 0.73
## [575] 0.70 0.67 0.68 0.66 0.67 0.71 0.72 0.75 0.73 0.71 0.70 0.71 0.70 0.67
## [589] 0.64 0.64 0.66 0.68 0.65 0.65 0.24 0.62 0.60 0.60 0.60 0.62 0.64 0.65
## [603] 0.62 0.60 0.65 0.67 0.64 0.65 0.69 0.70 0.65 0.66 0.69 0.71 0.66 0.66
## [617] 0.61 0.58 0.57 0.55 0.57 0.58 0.59 0.59 0.56 0.55 0.57 0.54 0.53 0.57
## [631] 0.61 0.52 0.50 0.54 0.60 0.61 0.59 0.53 0.52 0.51 0.54 0.60 0.61 0.58
## [645] 0.54 0.42 0.39 0.44 0.50 0.43 0.43 0.39 0.51 0.54 0.46 0.45 0.51 0.54
## [659] 0.47 0.46 0.48 0.53 0.56 0.53 0.52 0.52 0.47 0.44 0.31 0.36 0.37 0.36
## [673] 0.32 0.33 0.31 0.28 0.27 0.34 0.36 0.39 0.42 0.48 0.32 0.28 0.32 0.35
## [687] 0.33 0.34 0.38 0.38 0.36 0.35 0.38 0.25 0.26 0.34 0.28 0.29 0.30 0.32
## [701] 0.32 0.36 0.46 0.47 0.43 0.26 0.32 0.39 0.39 0.44 0.34 0.30 0.29 0.29
## [715] 0.34 0.37 0.40 0.41 0.34 0.34 0.30 0.24 0.26 0.26 0.29 0.22 0.23 0.26
## [729] 0.24 0.23 0.22
#simple way to round without a loop
#atemp_rounded<- round(bikeshare$atemp, 2)
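The sequence step in the loop above uses seq_along() rather than 1:length(x). A small sketch of why that matters, assuming a hypothetical empty vector named empty:

```r
empty <- numeric(0)

1:length(empty)    # counts down: 1 0
seq_along(empty)   # integer(0), so a loop body never runs

# Count how many times each style of loop executes its body.
runs_colon <- 0
for (i in 1:length(empty)) runs_colon <- runs_colon + 1

runs_seq <- 0
for (i in seq_along(empty)) runs_seq <- runs_seq + 1

c(colon = runs_colon, seq_along = runs_seq)
```

With 1:length(x) the loop runs twice on an empty vector, which usually leads to an indexing error or silent NA values; seq_along() skips the loop entirely.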
How would you compute the individual measures of central tendency and variability for the attitude data set?
You will need to think through this problem. First, you need a place to store the output, then you need to use the seq_along function to iterate through the dataset. Finally, you need to update the output.
Try it.
attitudestats <- vector("double", ncol (attitude)) #1 store output
for (i in seq_along(attitude)) { #2 sequence
attitudestats[[i]] <- median(attitude[[i]])
}
attitudestats
## [1] 65.5 65.0 51.5 56.5 63.5 77.5 41.0
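The loop above stores only the median. One possible extension, sketched here, keeps several measures of central tendency and variability at once by storing the output in a matrix. The names measures and attitude_summary are illustrative choices.

```r
measures <- c("mean", "median", "sd", "min", "max")

# 1. Output: one row per measure, one column per variable.
attitude_summary <- matrix(
  NA_real_,
  nrow = length(measures),
  ncol = ncol(attitude),
  dimnames = list(measures, names(attitude))
)

# 2. Sequence: iterate over column positions.
for (i in seq_along(attitude)) {
  x <- attitude[[i]]
  # 3. Body: fill in every measure for this column.
  attitude_summary[, i] <- c(mean(x), median(x), sd(x), min(x), max(x))
}

round(attitude_summary, 2)
```

The three numbered steps mirror the output, sequence, and body pattern described above.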
Refined example
attitudestats <- vector("double", ncol (attitude)) #1 store output
for (i in seq_along(attitude)) { #2 sequence
attitudestats[[i]] <- median(attitude[[i]])
print (paste(i, colnames(attitude[i]),":", attitudestats[[i]]))
}
## [1] "1 rating : 65.5"
## [1] "2 complaints : 65"
## [1] "3 privileges : 51.5"
## [1] "4 learning : 56.5"
## [1] "5 raises : 63.5"
## [1] "6 critical : 77.5"
## [1] "7 advance : 41"
Try coding the same problem using a while loop, just for fun
i<- 1
attitudestats_while <-vector("double", ncol(attitude))
while (i <= ncol(attitude)){
attitudestats_while[[i]] <- median(attitude[[i]])
print(paste(i, colnames(attitude[i]), ":",median(attitude[[i]])))
i <-i+1
}
## [1] "1 rating : 65.5"
## [1] "2 complaints : 65"
## [1] "3 privileges : 51.5"
## [1] "4 learning : 56.5"
## [1] "5 raises : 63.5"
## [1] "6 critical : 77.5"
## [1] "7 advance : 41"
Let’s review Boolean variables and logical operators
3 > 4
## [1] FALSE
c(1, 2, 3, 4, 5) > 4
## [1] FALSE FALSE FALSE FALSE TRUE
c(1, 2, 3, 4, 6) == 3
## [1] FALSE FALSE TRUE FALSE FALSE
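Comparisons like these can be combined with logical operators. The following is a brief sketch of the element-wise operators and a few helper functions from base R, using an illustrative vector x:

```r
x <- c(1, 2, 3, 4, 5)

x > 1 & x < 5      # element-wise AND
x < 2 | x > 4      # element-wise OR
!(x > 3)           # NOT

any(x > 4)         # is at least one element TRUE?
all(x > 0)         # are all elements TRUE?
which(x > 3)       # positions of the TRUE elements
```

The single-character forms & and | work element by element on vectors; the doubled forms && and || are meant for single values, such as the condition of an if statement.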
Build a program that checks to see which prices are considered “cheap”
prices <- c(12.43, 9.99, 18.22, 7.25, 0.50)
numCheap <- 0
for (p in prices){
if (p < 10){
numCheap <- numCheap + 1
}
}
print(numCheap)
## [1] 3
Alternative approach
prices <- c(12.43, 9.99, 18.22, 7.25, 0.50, 11)
sum(prices < 10)
## [1] 3
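The examples above use a bare if. To label every price rather than count the cheap ones, an else branch is needed. This sketch shows the loop form next to the vectorized ifelse(); the "expensive" label and the object names are illustrative.

```r
prices <- c(12.43, 9.99, 18.22, 7.25, 0.50)

# Loop form: pre-allocate the output, then fill it with if/else.
labels <- character(length(prices))
for (i in seq_along(prices)) {
  if (prices[[i]] < 10) {
    labels[[i]] <- "cheap"
  } else {
    labels[[i]] <- "expensive"
  }
}
labels

# Vectorized form: ifelse() tests every element in one call.
labels_vec <- ifelse(prices < 10, "cheap", "expensive")
identical(labels, labels_vec)
```

Both forms give the same labels; the vectorized version is shorter and is usually preferred in R.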
Some functions are built in, such as:
sqrt(25)
## [1] 5
mean(c(1,2,3,4,5))
## [1] 3
toupper("hello world")
## [1] "HELLO WORLD"
Write your own function. Here’s an example of the form… with one minor error…
f <- function(x) x + 2
#f("hello world") # causes an error because we need the parameter as a numeric.
f(3)
## [1] 5
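One way to handle the non-numeric case is to check the argument and fail with a clear message. The sketch below is one possible approach; add_two is a hypothetical name chosen to avoid overwriting f.

```r
add_two <- function(x) {
  # Fail early with a readable message when x is not a number.
  if (!is.numeric(x)) {
    stop("`x` must be numeric, not ", class(x)[1])
  }
  x + 2
}

add_two(3)

# tryCatch() lets the script continue and capture the error message.
result <- tryCatch(
  add_two("hello world"),
  error = function(e) conditionMessage(e)
)
result
```

Without the check, R would still stop, but with the less helpful "non-numeric argument to binary operator" error.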
Pass in multiple arguments
addTogether <- function(x, y) x + y
addTogether(5, 10)
## [1] 15
addTogether(x = 5, y = 10) #alternative
## [1] 15
Create multi-line functions
f <- function(x){
y <- x^2
z <- y/2
z
}
f(2)
## [1] 2
You try it: Write a function that averages two numbers
avg <- function(x,y){
(x + y)/2
}
avg(1,2)
## [1] 1.5
Apply a function over a vector
f <- function(x) x^2
sapply(c(1,2,3,4,5),f)
## [1] 1 4 9 16 25
sapply(attitude,f)
## rating complaints privileges learning raises critical advance
## [1,] 1849 2601 900 1521 3721 8464 2025
## [2,] 3969 4096 2601 2916 3969 5329 2209
## [3,] 5041 4900 4624 4761 5776 7396 2304
## [4,] 3721 3969 2025 2209 2916 7056 1225
## [5,] 6561 6084 3136 4356 5041 6889 2209
## [6,] 1849 3025 2401 1936 2916 2401 1156
## [7,] 3364 4489 1764 3136 4356 4624 1225
## [8,] 5041 5625 2500 3025 4900 4356 1681
## [9,] 5184 6724 5184 4489 5041 6889 961
## [10,] 4489 3721 2025 2209 3844 6400 1681
## [11,] 4096 2809 2809 3364 3364 4489 1156
## [12,] 4489 3600 2209 1521 3481 5476 1681
## [13,] 4761 3844 3249 1764 3025 3969 625
## [14,] 4624 6889 6889 2025 3481 5929 1225
## [15,] 5929 5929 2916 5184 6241 5929 2116
## [16,] 6561 8100 2500 5184 3600 2916 1296
## [17,] 5476 7225 4096 4761 6241 6241 3969
## [18,] 4225 3600 4225 5625 3025 6400 3600
## [19,] 4225 4900 2116 3249 5625 7225 2116
## [20,] 2500 3364 4624 2916 4096 6084 2704
## [21,] 2500 1600 1089 1156 1849 4096 1089
## [22,] 4096 3721 2704 3844 4356 6400 1681
## [23,] 2809 4356 2704 2500 3969 6400 1369
## [24,] 1600 1369 1764 3364 2500 3249 2401
## [25,] 3969 2916 1764 2304 4356 5625 1089
## [26,] 4356 5929 4356 3969 7744 5776 5184
## [27,] 6084 5625 3364 5476 6400 6084 2401
## [28,] 2304 3249 1936 2025 2601 6889 1444
## [29,] 7225 7225 5041 5041 5929 5476 3025
## [30,] 6724 6724 1521 3481 4096 6084 1521
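For a numeric vector, `sapply()` calls `f` once per element. Because `^` is already vectorized in R, the same result can be obtained by calling the function on the whole vector. The sketch below compares three routes. The names `square`, `via_sapply`, `via_vapply`, and `via_vector` are introduced here for illustration.

```r
square <- function(x) x^2
x <- c(1, 2, 3, 4, 5)

# sapply() calls square() once per element and simplifies the result
via_sapply <- sapply(x, square)

# vapply() does the same but also checks each result is one number
via_vapply <- vapply(x, square, numeric(1))

# Arithmetic is vectorized, so the function also accepts the whole vector
via_vector <- square(x)

print(identical(via_sapply, via_vector))
print(identical(via_vapply, via_vector))

# The vectorized form works on a data frame too (first rows shown)
print(head(attitude^2, 3))
```

`sapply()` and `vapply()` are most useful when the function cannot take a whole vector at once. For simple arithmetic the vectorized call is shorter.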
Try it using the bikeshare data.
f <- function(x) x^2
sapply(bikeshare$atemp,f)
## [1] 0.132223141 0.125131280 0.035874254 0.044995743 0.052564733
## [6] 0.054386438 0.043613728 0.026326361 0.013496631 0.022767189
## [11] 0.036658463 0.025751584 0.022765680 0.035499459 0.061559565
## [16] 0.054857603 0.031247986 0.053978623 0.089055690 0.065050502
## [21] 0.024911256 0.006252002 0.009769168 0.013907485 0.055002445
## [26] 0.041452960 0.048268090 0.049870482 0.044997440 0.062661104
## [31] 0.034689062 0.055004321 0.064728010 0.031640583 0.052252017
## [36] 0.059077191 0.085071972 0.092208181 0.039301477 0.020817584
## [41] 0.022364604 0.045586093 0.054267566 0.105049237 0.158682722
## [46] 0.064655267 0.099982440 0.183747681 0.262126592 0.153197091
## [51] 0.076911929 0.080698606 0.034608277 0.060376844 0.083631434
## [56] 0.122822913 0.079632325 0.123277530 0.160094414 0.069632127
## [61] 0.102445445 0.040053218 0.065371751 0.143473531 0.134140528
## [66] 0.056863649 0.091445760 0.082144146 0.148739806 0.093025000
## [71] 0.106113062 0.144469168 0.110224000 0.101237240 0.134637625
## [76] 0.168373171 0.277738486 0.217645576 0.106113062 0.167882770
## [81] 0.194165372 0.114202768 0.073350514 0.065695841 0.066342820
## [86] 0.062669615 0.066344365 0.085795096 0.088417022 0.066344881
## [91] 0.080346170 0.099626716 0.143464440 0.294771899 0.158682722
## [96] 0.150239962 0.188092220 0.105286621 0.116642058 0.182104467
## [101] 0.319470257 0.243102247 0.174125102 0.214130159 0.195287100
## [106] 0.181043442 0.198644924 0.253155897 0.239373391 0.318538330
## [111] 0.206017948 0.103654378 0.202608915 0.304442408 0.330050250
## [116] 0.352934611 0.330788320 0.335158787 0.247469436 0.215315488
## [121] 0.200886826 0.283911006 0.338815962 0.163741623 0.195290635
## [126] 0.224786930 0.262780290 0.269291458 0.275883361 0.273237244
## [131] 0.279206560 0.273908830 0.244332490 0.250629396 0.287296000
## [136] 0.303063462 0.290013484 0.277895557 0.260857391 0.279885438
## [141] 0.327155401 0.330050250 0.348449368 0.365798765 0.378891954
## [146] 0.428616377 0.405779192 0.375008040 0.378901803 0.450364472
## [151] 0.526180497 0.519793415 0.414661299 0.344725160 0.353663332
## [156] 0.380447174 0.386707372 0.430270403 0.528934744 0.573925941
## [161] 0.494619637 0.459735529 0.413867056 0.361987536 0.349926670
## [166] 0.345454765 0.354436860 0.360459747 0.414676754 0.417117056
## [171] 0.354436860 0.406592421 0.481398681 0.481404232 0.431101236
## [176] 0.413851616 0.406570742 0.405774096 0.479636583 0.428616377
## [181] 0.405779192 0.425315274 0.445299967 0.446992531 0.442779784
## [186] 0.484886610 0.470092611 0.471791771 0.449547453 0.441105849
## [191] 0.476134501 0.532613878 0.546527526 0.475277875 0.403357091
## [196] 0.389839146 0.407379657 0.448676248 0.495510406 0.558724855
## [201] 0.557784923 0.682889030 0.707106083 0.646877578 0.631753139
## [206] 0.519780438 0.485779726 0.477020905 0.547452010 0.617744125
## [211] 0.530766160 0.532602202 0.494619637 0.499949399 0.462314324
## [216] 0.441943085 0.431080225 0.457184232 0.511642645 0.494606978
## [221] 0.524351223 0.469201710 0.424479613 0.427770938 0.417132556
## [226] 0.389860375 0.379661772 0.417105431 0.444450222 0.438585659
## [231] 0.400968835 0.421195808 0.456334026 0.407368169 0.367317208
## [236] 0.397772399 0.417127389 0.435247631 0.403931429 0.419850866
## [241] 0.369612930 0.353672848 0.373468877 0.378127836 0.365792717
## [246] 0.400958703 0.442795754 0.391432917 0.265431040 0.296185204
## [251] 0.308425840 0.335178471 0.369617793 0.371159974 0.362560537
## [256] 0.364277431 0.393003610 0.306551576 0.212959176 0.228973734
## [261] 0.240626548 0.280555606 0.283254935 0.303086584 0.307983931
## [266] 0.272614516 0.318560906 0.327913134 0.346970478 0.330078976
## [271] 0.330806725 0.330064038 0.296838639 0.170455857 0.119243830
## [276] 0.153700066 0.223594688 0.277874471 0.230808181 0.254423395
## [281] 0.263417351 0.274558184 0.294767556 0.298220841 0.268030892
## [286] 0.304487654 0.280555606 0.248726626 0.253163948 0.260840026
## [291] 0.273237244 0.264039767 0.217645576 0.179433571 0.181043442
## [296] 0.178365163 0.208910242 0.214716391 0.223583340 0.208891046
## [301] 0.101641091 0.051944336 0.103252326 0.126780860 0.157678880
## [306] 0.152203758 0.164771858 0.162725106 0.104881413 0.131303320
## [311] 0.160697559 0.169946765 0.167345628 0.139667386 0.094136671
## [316] 0.128122475 0.185373303 0.275217751 0.257636441 0.204293152
## [321] 0.104471815 0.074376744 0.105289217 0.208902015 0.198080184
## [326] 0.177827516 0.185362108 0.138734646 0.144910410 0.148291998
## [331] 0.207753640 0.240219575 0.203739391 0.096858511 0.093363247
## [336] 0.109847833 0.096474845 0.121870810 0.155176906 0.208300960
## [341] 0.160196861 0.066017136 0.100832922 0.070975354 0.064086948
## [346] 0.073005878 0.090684095 0.114488843 0.169939344 0.129474031
## [351] 0.062185896 0.060309045 0.078923350 0.157175774 0.183198552
## [356] 0.181579107 0.142516065 0.089545775 0.078378162 0.099562336
## [361] 0.107343383 0.078385441 0.069638988 0.101641091 0.171496203
## [366] 0.141091136 0.063657308 0.015945376 0.014241320 0.077513242
## [371] 0.115781631 0.152708227 0.115775507 0.061245855 0.101650018
## [376] 0.079987718 0.145876636 0.062181407 0.033520850 0.026122641
## [381] 0.036352380 0.132698461 0.075764765 0.036114441 0.048822438
## [386] 0.030581266 0.026325063 0.059077191 0.121876396 0.086919422
## [391] 0.126771602 0.172543037 0.106523252 0.074376744 0.068971891
## [396] 0.145402654 0.217657705 0.159177859 0.095694948 0.074378926
## [401] 0.069971359 0.087868373 0.130396099 0.070980149 0.068637712
## [406] 0.086176299 0.044464892 0.010334349 0.051944336 0.111519931
## [411] 0.123642954 0.109006946 0.123642954 0.126326931 0.070643261
## [416] 0.074742639 0.087091683 0.154187373 0.197532247 0.168897163
## [421] 0.065369706 0.071989183 0.128131066 0.124979926 0.121431341
## [426] 0.225977588 0.129486265 0.170975634 0.091821726 0.058163451
## [431] 0.065046422 0.148302010 0.275209357 0.157674909 0.077154506
## [436] 0.129362509 0.211224806 0.294771899 0.300980613 0.283902481
## [441] 0.190295740 0.255071462 0.215296000 0.283898218 0.290017792
## [446] 0.263433775 0.282563475 0.324976384 0.236909013 0.191395750
## [451] 0.192501562 0.099637448 0.221793902 0.232617148 0.141091136
## [456] 0.177837637 0.174128440 0.182767365 0.212966559 0.284568902
## [461] 0.185901533 0.152698848 0.181585925 0.242482381 0.227183783
## [466] 0.190299230 0.113753751 0.150236861 0.186458149 0.238140096
## [471] 0.329332516 0.378132756 0.358186689 0.208883733 0.243094358
## [476] 0.266023851 0.294763212 0.151713366 0.090676266 0.164254310
## [481] 0.221198080 0.233852518 0.204880254 0.142509270 0.202608915
## [486] 0.209485628 0.332953234 0.289332107 0.288628967 0.349182901
## [491] 0.341766514 0.298921347 0.277878688 0.310773916 0.305836651
## [496] 0.241850519 0.271267014 0.296825563 0.342503517 0.302390010
## [501] 0.332241571 0.355186201 0.327885648 0.303734357 0.321384680
## [506] 0.341017457 0.319979155 0.337357681 0.341771191 0.368084890
## [511] 0.393792646 0.413058148 0.411426031 0.461448490 0.452918232
## [516] 0.373478655 0.398576306 0.369617793 0.320682099 0.330777968
## [521] 0.334411228 0.276562396 0.294080613 0.324264191 0.357438971
## [526] 0.420379767 0.440254809 0.435231798 0.357454516 0.373463988
## [531] 0.389854131 0.359704861 0.353677605 0.327155401 0.296852805
## [536] 0.428621615 0.519804951 0.566319462 0.524351223 0.426137395
## [541] 0.454618457 0.427770938 0.353672848 0.410614387 0.456316462
## [546] 0.618760012 0.472667250 0.563443896 0.492857353 0.493717022
## [551] 0.536317482 0.579679709 0.566305916 0.647884938 0.624725837
## [556] 0.427786635 0.441953722 0.422852373 0.428609830 0.446134492
## [561] 0.443611946 0.497301398 0.524357016 0.571056796 0.555894010
## [566] 0.510713188 0.375799651 0.302403208 0.388284766 0.476123460
## [571] 0.499071603 0.427786635 0.546509783 0.539074603 0.486651341
## [576] 0.446134492 0.469207190 0.439431107 0.445299967 0.499973440
## [581] 0.522536700 0.564402105 0.534476504 0.504449381 0.486675060
## [586] 0.500863352 0.489311442 0.446146515 0.407384763 0.415482087
## [591] 0.438580361 0.458029815 0.427764397 0.428616377 0.058757760
## [596] 0.382011761 0.364277431 0.355176665 0.361231051 0.386702397
## [601] 0.405779192 0.418738410 0.382784740 0.355211232 0.428616377
## [606] 0.443622603 0.404156447 0.426120423 0.475272360 0.493722644
## [611] 0.421201000 0.437752934 0.471815125 0.502656894 0.429456098
## [616] 0.431917098 0.373468877 0.335154156 0.319964448 0.307239621
## [621] 0.324985506 0.335887475 0.352934611 0.343240142 0.317109766
## [626] 0.305864303 0.319300714 0.292036483 0.283228325 0.327150825
## [631] 0.372695598 0.269291458 0.252519315 0.296130784 0.355947072
## [636] 0.369633601 0.343235455 0.281213848 0.267974982 0.262144000
## [641] 0.294125083 0.358960352 0.369633601 0.336616955 0.290004867
## [646] 0.176242955 0.150239962 0.191942125 0.253151872 0.185904982
## [651] 0.187550491 0.153190829 0.258271306 0.290682723 0.212379036
## [656] 0.202597212 0.262784391 0.289332107 0.223579557 0.208327432
## [661] 0.233232975 0.281222332 0.312169156 0.280569377 0.273267563
## [666] 0.265362008 0.218809708 0.193072360 0.096043588 0.130393210
## [671] 0.136857083 0.126765906 0.104876232 0.108595293 0.094910206
## [676] 0.079279975 0.075416694 0.116889456 0.126318401 0.155186360
## [681] 0.177841854 0.225988997 0.104474401 0.079277723 0.105295058
## [686] 0.120550618 0.106525863 0.114072361 0.141091136 0.144907365
## [691] 0.133146172 0.122759838 0.143473531 0.061872583 0.066349002
## [696] 0.114923712 0.079274907 0.083962017 0.089055690 0.104889834
## [701] 0.100428145 0.129030387 0.207749994 0.220011655 0.183194272
## [706] 0.066669306 0.103656954 0.151716482 0.152213901 0.189725581
## [711] 0.114489520 0.088409886 0.086546579 0.086548933 0.114503055
## [716] 0.136854124 0.161202250 0.167860645 0.117074834 0.112370437
## [721] 0.091063322 0.055749349 0.067325200 0.067029210 0.086709636
## [726] 0.048546631 0.051366596 0.065048462 0.058757760 0.053684890
## [731] 0.049946439
Create a function that computes the basic summary statistics (max, min, median, and mean) for the attitude data set.
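col_max <- function(df) {
  output <- vector("double", ncol(df))  # 1. pre-allocate the output
  for (i in seq_along(df)) {            # 2. loop over the columns
    output[[i]] <- max(df[[i]])         # 3. store the column maximum
    print(paste(i, colnames(df[i]), ":", output[[i]]))
  }
  invisible(output)  # return the values without auto-printing them
}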
Let’s call the function col_max
col_max(attitude)
## [1] "1 rating : 85"
## [1] "2 complaints : 90"
## [1] "3 privileges : 83"
## [1] "4 learning : 75"
## [1] "5 raises : 88"
## [1] "6 critical : 92"
## [1] "7 advance : 72"
Let’s improve on this function. We’re going to write a function that takes a parameter of another function. This way we can pass in our data and function for the mean(), median(), etc.
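col_summary <- function(df, fun) {
  output <- vector("double", length(df))  # 1. pre-allocate the output
  for (i in seq_along(df)) {              # 2. loop over the columns
    output[[i]] <- fun(df[[i]])           # 3. apply the supplied function
    print(paste(i, colnames(df[i]), ":", output[[i]]))
  }
  invisible(output)  # return the values without auto-printing them
}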
Now, let’s call the col_summary function
col_summary(attitude, mean)
## [1] "1 rating : 64.6333333333333"
## [1] "2 complaints : 66.6"
## [1] "3 privileges : 53.1333333333333"
## [1] "4 learning : 56.3666666666667"
## [1] "5 raises : 64.6333333333333"
## [1] "6 critical : 74.7666666666667"
## [1] "7 advance : 42.9333333333333"
col_summary(attitude[,2:4],median)
## [1] "1 complaints : 65"
## [1] "2 privileges : 51.5"
## [1] "3 learning : 56.5"
col_summary(attitude, max)
## [1] "1 rating : 85"
## [1] "2 complaints : 90"
## [1] "3 privileges : 83"
## [1] "4 learning : 75"
## [1] "5 raises : 88"
## [1] "6 critical : 92"
## [1] "7 advance : 72"
col_summary(attitude, min)
## [1] "1 rating : 40"
## [1] "2 complaints : 37"
## [1] "3 privileges : 30"
## [1] "4 learning : 34"
## [1] "5 raises : 43"
## [1] "6 critical : 49"
## [1] "7 advance : 25"
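The calls above run one statistic at a time and print the results. As a possible next step, the sketch below computes all four statistics in one call and returns them as a matrix that can be reused. `col_stats` is a hypothetical helper, not part of the original session, and it assumes every column is numeric.

```r
# Hypothetical helper: one row per statistic, one column per variable
col_stats <- function(df) {
  stats <- list(min = min, median = median, mean = mean, max = max)
  sapply(df, function(column) {
    # Apply each statistic to the column; each must return one number
    vapply(stats, function(fun) fun(column), numeric(1))
  })
}

result <- col_stats(attitude)
print(round(result, 2))

# Individual values can be looked up by statistic and variable name
print(result["median", "rating"])
```

Returning the values rather than only printing them lets later code, such as a plot or a table, use the results directly.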
execute = True
if execute:
    print("Of course!")
    print("This will execute as well")
## Of course!
## This will execute as well
pwd
ls -l
## /Users/ksosulsk/Dropbox/_becomingvisual_manuscript_2017/becomingvisual_R
## total 188160
## drwxr-xr-x@ 11 ksosulsk staff 374 Aug 29 2017 Bike-Sharing-Dataset
## -rw-r--r--@ 1 ksosulsk staff 27512 Aug 14 2017 Bike_Sharing_Carlos_Arias.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1562 Aug 15 2017 ChartTypes.Rmd
## -rw-r--r--@ 1 ksosulsk staff 32774 May 16 14:27 R_In_Class_Session.Rmd
## -rw-r--r--@ 1 ksosulsk staff 5083706 May 16 08:11 R_In_Class_Session.html
## -rw-r--r--@ 1 ksosulsk staff 31116 May 15 23:05 R_In_Class_Session_STUDENTVERSION.Rmd
## drwxr-xr-x@ 4 ksosulsk staff 136 May 15 22:52 R_In_Class_Session_STUDENTVERSION_files
## drwxr-xr-x@ 3 ksosulsk staff 102 May 16 15:34 R_In_Class_Session_files
## -rw-r--r--@ 1 ksosulsk staff 193934 Mar 21 2015 Sidewalk_Cafes.csv
## -rw-r--r--@ 1 ksosulsk staff 90 Apr 13 2017 Untitled.Rnw
## drwxr-xr-x@ 3 ksosulsk staff 102 Aug 15 2017 _bookdown_files
## -rw-r--r--@ 1 ksosulsk staff 70025 May 16 2017 _main.Rmd
## -rw-r--r--@ 1 ksosulsk staff 5511379 May 7 2017 _main.html
## -rw-r--r--@ 1 ksosulsk staff 1494 Oct 18 2017 ansombes.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1195959 Oct 17 2017 ansombes.html
## -rw-r--r--@ 1 ksosulsk staff 54426 May 17 2015 app1.tiff
## -rw-r--r--@ 1 ksosulsk staff 50926 Oct 17 2017 area01.png
## -rw-r--r--@ 1 ksosulsk staff 51827 Oct 17 2017 area02.png
## -rw-r--r--@ 1 ksosulsk staff 50707 Sep 14 2017 area03.png
## -rw-r--r--@ 1 ksosulsk staff 16210 Oct 17 2017 bar01.png
## -rw-r--r--@ 1 ksosulsk staff 31671 Oct 17 2017 bar02.png
## -rw-r--r--@ 1 ksosulsk staff 205 May 10 2017 becoming visual.Rproj
## -rw-r--r--@ 1 ksosulsk staff 205 May 16 09:17 becomingvisual_R.Rproj
## drwxr-xr-x@ 3 ksosulsk staff 102 Aug 15 2017 bikeshare-figure
## -rw-r--r--@ 1 ksosulsk staff 108521 Apr 13 2017 bikeshare-rpubs.html
## -rw-r--r--@ 1 ksosulsk staff 476 Apr 13 2017 bikeshare.Rhtml
## -rw-r--r--@ 1 ksosulsk staff 530 Apr 13 2017 bikeshare.Rpres
## -rw-r--r--@ 1 ksosulsk staff 2036 Apr 13 2017 bikeshare.html
## -rw-r--r--@ 1 ksosulsk staff 808 May 16 2017 bikeshare.md
## -rw-r--r--@ 1 ksosulsk staff 60431 Aug 9 2017 bikeshare_08_07_2017
## -rw-r--r--@ 1 ksosulsk staff 54384 May 16 2017 bikeshare_shinyhistogram.png
## drwxr-xr-x@ 4 ksosulsk staff 136 Aug 15 2017 bikeshareapp
## -rw-r--r--@ 1 ksosulsk staff 54609 May 10 2017 bikesharedailydata.csv
## -rw-r--r--@ 1 ksosulsk staff 19836 Oct 17 2017 boxplot01.png
## -rw-r--r--@ 1 ksosulsk staff 19988 Oct 17 2017 boxplot02.png
## -rw-r--r--@ 1 ksosulsk staff 177343 Sep 27 2017 casino.csv
## -rw-r--r--@ 1 ksosulsk staff 1106825 Dec 8 15:05 casino_games_sub.csv
## -rw-r--r--@ 1 ksosulsk staff 177343 Sep 27 2017 casino_new.csv
## -rw-r--r--@ 1 ksosulsk staff 1106675 Dec 8 14:51 casino_reshaped_2017.csv
## -rw-r--r--@ 1 ksosulsk staff 2057 Nov 1 2017 casinocasestudy.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1963835 Nov 1 2017 casinocasestudy.html
## -rw-r--r--@ 1 ksosulsk staff 4633 Dec 10 13:08 casinoscript.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1293 Jul 10 2017 chapter04_code.R
## -rw-r--r--@ 1 ksosulsk staff 11291 Sep 18 2017 cheatsheet_nicole_version03.Rmd
## -rw-r--r--@ 1 ksosulsk staff 461 Aug 16 2017 columbo.R
## -rw-r--r--@ 1 ksosulsk staff 19028 Oct 12 2015 crime.csv
## -rw-r--r--@ 1 ksosulsk staff 13515 Oct 12 2017 crime_edited.csv
## -rw-r--r--@ 1 ksosulsk staff 5022 Aug 29 2017 daniel_cheatsheet.Rmd
## -rw-r--r--@ 1 ksosulsk staff 6163 Aug 29 2017 daniel_cheatsheet_version02.Rmd
## -rw-r--r--@ 1 ksosulsk staff 71092 May 10 2017 datascienceslides.pptx
## -rw-r--r--@ 1 ksosulsk staff 200374 May 10 2017 datasciencesteps.png
## -rw-r--r--@ 1 ksosulsk staff 5608 May 16 2017 datavisinclasssession_2017.Rmd
## -rw-r--r--@ 1 ksosulsk staff 24527 Oct 17 2017 density01.png
## -rw-r--r--@ 1 ksosulsk staff 25360 Oct 17 2017 density02.png
## -rw-r--r--@ 1 ksosulsk staff 23758 Oct 17 2017 density03.png
## -rw-r--r--@ 1 ksosulsk staff 23997 Oct 17 2017 density04.png
## -rw-r--r--@ 1 ksosulsk staff 24044 Oct 17 2017 density05.png
## -rw-r--r--@ 1 ksosulsk staff 22066 Oct 17 2017 density06.png
## -rw-r--r--@ 1 ksosulsk staff 22164 Oct 17 2017 density07.png
## -rw-r--r--@ 1 ksosulsk staff 23387 Oct 17 2017 density08.png
## drwxr-xr-x@ 3 ksosulsk staff 102 Aug 15 2017 figure
## -rw-r--r--@ 1 ksosulsk staff 5712 Aug 16 2017 ggplot_primer.R
## -rw-r--r--@ 1 ksosulsk staff 791 May 16 11:33 ggplot_test.Rmd
## -rw-r--r--@ 1 ksosulsk staff 961340 May 16 11:22 ggplot_test.html
## -rwxrwxrwx@ 1 ksosulsk staff 5953 Oct 19 2015 ggplot_tutorial.R
## -rw-r--r--@ 1 ksosulsk staff 13534 Sep 14 2017 hist01.jpeg
## -rw-r--r--@ 1 ksosulsk staff 13534 Oct 17 2017 hist01.png
## -rw-r--r--@ 1 ksosulsk staff 13486 Oct 17 2017 hist02.png
## -rw-r--r--@ 1 ksosulsk staff 15952 Oct 17 2017 hist03.png
## -rw-r--r--@ 1 ksosulsk staff 15967 Oct 17 2017 hist04.png
## -rw-r--r--@ 1 ksosulsk staff 13678 Oct 17 2017 hist05.png
## -rw-r--r--@ 1 ksosulsk staff 13766 Oct 17 2017 hist06.png
## -rw-r--r--@ 1 ksosulsk staff 85318 May 16 2015 hist2.tiff
## drwxr-xr-x@ 6 ksosulsk staff 204 Aug 15 2017 histogram
## drwxr-xr-x@ 3 ksosulsk staff 102 May 14 15:47 inclass_presentation-figure
## -rw-r--r--@ 1 ksosulsk staff 541 May 14 15:47 inclass_presentation.Rpres
## -rw-r--r--@ 1 ksosulsk staff 830 May 14 15:47 inclass_presentation.md
## -rw-r--r--@ 1 ksosulsk staff 23039 Nov 1 2017 index.Rmd
## -rw-r--r--@ 1 ksosulsk staff 3109836 May 6 2017 index.html
## -rw-r--r--@ 1 ksosulsk staff 1849558 May 2 2017 index.nb.html
## -rw-r--r--@ 1 ksosulsk staff 24935 Oct 25 2017 lesson05_basic_charts_cheat_sheet.Rmd
## -rw-r--r--@ 1 ksosulsk staff 6258560 Sep 14 2017 lesson05_basic_charts_cheat_sheet.html
## -rw-r--r--@ 1 ksosulsk staff 218877 Sep 14 2017 lesson05_basic_charts_cheat_sheet.md
## drwxr-xr-x@ 4 ksosulsk staff 136 Sep 15 2017 lesson05_basic_charts_cheat_sheet_files
## -rw-r--r--@ 1 ksosulsk staff 17017 Dec 10 13:08 lesson05_solutions_and_demo.Rmd
## -rw-r--r--@ 1 ksosulsk staff 743152 Oct 18 2017 lesson05_solutions_and_demo.docx
## -rw-r--r--@ 1 ksosulsk staff 7395602 Nov 30 16:03 lesson05_solutions_and_demo.html
## -rw-r--r--@ 1 ksosulsk staff 264199 Nov 30 16:03 lesson05_solutions_and_demo.md
## drwxr-xr-x@ 4 ksosulsk staff 136 Feb 20 11:40 lesson05_solutions_and_demo_files
## -rw-r--r--@ 1 ksosulsk staff 68071 Oct 17 2017 line01.png
## -rw-r--r--@ 1 ksosulsk staff 85941 Oct 17 2017 line02.png
## -rw-r--r--@ 1 ksosulsk staff 4837 May 17 2017 markdown_report.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1912940 May 17 2017 markdown_report.html
## -rw-r--r--@ 1 ksosulsk staff 4806 Jul 26 2017 markdown_slides.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1611594 Jul 26 2017 markdown_slides.html
## -rw-r--r--@ 1 ksosulsk staff 1299 Oct 15 2017 multivariate_exercise.R
## -rw-r--r--@ 1 ksosulsk staff 2676 Jul 7 2017 myfile.csv
## -rw-r--r--@ 1 ksosulsk staff 38 Feb 21 10:38 myfirstRScriptToday.R
## -rw-r--r--@ 1 ksosulsk staff 639 Feb 20 09:40 myfirstnotebook.Rmd
## -rw-r--r--@ 1 ksosulsk staff 841650 Feb 20 09:40 myfirstnotebook.nb.html
## -rw-r--r--@ 1 ksosulsk staff 1853 Aug 14 2017 networkdiagram.R
## -rw-r--r--@ 1 ksosulsk staff 8305 Aug 24 2017 nicole_cheatsheet.Rmd
## -rw-r--r--@ 1 ksosulsk staff 7517784 Aug 24 2017 nicole_cheatsheet.html
## -rw-r--r--@ 1 ksosulsk staff 10906 Sep 12 2017 nicole_cheatsheet_02.Rmd
## -rw-r--r--@ 1 ksosulsk staff 7663834 Sep 12 2017 nicole_cheatsheet_02.html
## -rw-r--r--@ 1 ksosulsk staff 1647985 Aug 9 2017 nicole_week05_ck.html
## -rw-r--r--@ 1 ksosulsk staff 5996 Aug 14 2017 nicole_week05_ck.rmd
## -rwxrwxrwx@ 1 ksosulsk staff 308 Jul 30 2015 pakistan.childHIV.csv
## -rw-r--r--@ 1 ksosulsk staff 193443 May 7 2017 plot_id798936310.svg
## drwxr-xr-x@ 3 ksosulsk staff 102 Aug 15 2017 rsconnect
## -rw-r--r--@ 1 ksosulsk staff 2890 Oct 12 2017 sample.Rmd
## -rw-r--r--@ 1 ksosulsk staff 844 May 10 2017 sampleknit.Rmd
## -rw-r--r--@ 1 ksosulsk staff 792958 May 10 2017 sampleknit.html
## -rw-r--r--@ 1 ksosulsk staff 41842 Oct 17 2017 scatter01.png
## -rw-r--r--@ 1 ksosulsk staff 47486 Oct 17 2017 scatter02.png
## -rw-r--r--@ 1 ksosulsk staff 15573 Oct 17 2017 sosulski_visualization_02_cheat_sheet.Rmd
## -rw-r--r--@ 1 ksosulsk staff 5821303 Sep 7 2017 sosulski_visualization_02_cheat_sheet.html
## -rw-r--r--@ 1 ksosulsk staff 163844 Sep 7 2017 sosulski_visualization_02_cheat_sheet.md
## drwxr-xr-x@ 3 ksosulsk staff 102 Aug 30 2017 sosulski_visualization_02_cheat_sheet_files
## -rw-r--r--@ 1 ksosulsk staff 9020 Aug 30 2017 sosulski_visualization_cheat_sheet.Rmd
## -rw-r--r--@ 1 ksosulsk staff 5613184 Aug 30 2017 sosulski_visualization_cheat_sheet.html
## -rw-r--r--@ 1 ksosulsk staff 160975 Aug 30 2017 sosulski_visualization_cheat_sheet.md
## drwxr-xr-x@ 4 ksosulsk staff 136 Aug 29 2017 sosulski_visualization_cheat_sheet_files
## -rw-r--r--@ 1 ksosulsk staff 11187 Nov 30 15:29 sosulski_visualization_cheat_sheet_version02.Rmd
## -rw-r--r--@ 1 ksosulsk staff 5818929 Sep 5 2017 sosulski_visualization_cheat_sheet_version02.html
## -rw-r--r--@ 1 ksosulsk staff 163963 Sep 5 2017 sosulski_visualization_cheat_sheet_version02.md
## drwxr-xr-x@ 3 ksosulsk staff 102 Sep 5 2017 sosulski_visualization_cheat_sheet_version02_files
## -rwxrwxrwx@ 1 ksosulsk staff 794 Jul 30 2015 southasia.csv
## -rw-r--r--@ 1 ksosulsk staff 1782 Jan 8 23:02 sql_script.R
## -rw-r--r--@ 1 ksosulsk staff 447 Oct 18 2017 survey_skills.csv
## -rw-r--r--@ 1 ksosulsk staff 691 Apr 14 2017 test_templatermd.Rmd
## -rw-r--r--@ 1 ksosulsk staff 930 Apr 14 2017 test_templatermd.md
## -rw-r--r--@ 1 ksosulsk staff 1989 Apr 14 2017 vignette_rmdtemplate.Rmd
## -rw-r--r--@ 1 ksosulsk staff 42477 Apr 14 2017 vignette_rmdtemplate.html
## -rw-r--r--@ 1 ksosulsk staff 2330 Feb 25 13:32 visualizing _crime.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1387286 Oct 15 2017 visualizing__crime.docx
## -rw-r--r--@ 1 ksosulsk staff 6267687 Nov 1 2017 visualizing__crime.html
## -rw-r--r--@ 1 ksosulsk staff 22333 Nov 1 2017 visualizing__crime.md
## drwxr-xr-x@ 4 ksosulsk staff 136 Oct 15 2017 visualizing__crime_files
## -rw-r--r--@ 1 ksosulsk staff 1357 Nov 1 2017 visualizing_ancombes.Rmd
## -rw-r--r--@ 1 ksosulsk staff 1028434 Nov 1 2017 visualizing_ancombes.html
## -rw-r--r--@ 1 ksosulsk staff 839 Nov 1 2017 visualizing_ancombes.md
## drwxr-xr-x@ 3 ksosulsk staff 102 Oct 18 2017 visualizing_ancombes_files
## -rw-r--r--@ 1 ksosulsk staff 3420 Oct 17 2017 visualizing_skills.Rmd
## drwxr-xr-x@ 3 ksosulsk staff 102 Oct 17 2017 visualizing_skills_files
## -rw-r--r--@ 1 ksosulsk staff 25886 May 14 21:21 week04_decode.Rmd
## -rw-r--r--@ 1 ksosulsk staff 7846577 Aug 22 2017 week04_decode.html
## -rw-r--r--@ 1 ksosulsk staff 226149 Aug 22 2017 week04_decode.md
## drwxr-xr-x@ 3 ksosulsk staff 102 Aug 17 2017 week04_decode_files
## -rw-r--r--@ 1 ksosulsk staff 13009 Aug 14 2017 week07_shiny.Rmd
## -rw-r--r--@ 1 ksosulsk staff 579 May 7 2017 world_internet_usage.csv
## -rw-r--r--@ 1 ksosulsk staff 4325 Aug 4 2013 worldexports_yearly.csv
## -rw-r--r--@ 1 ksosulsk staff 29263 May 6 2017 worldinternet.png
Review session 7 from R Fundamentals (Sosulski, 2018): SQL & R http://becomingvisual.com/rfundamentals/sql-r.html
SELECT all applicable data
The players on the San Antonio Spurs in 2014
Top 5 blockers in 2010
Top 10 combination power-forwards with the most defensive rebounds
Top 20 Player-seasons in the NBA 50-40-90 Club (players who have hit over 50% for FG%, 40% for 3P%, 90% for FT%, 300 field goals, 55 3-pointers, and 125 free throws) ordered by their amount of points
Top 10 oldest Milwaukee Bucks players with over 1000 points.
Exercise available at: http://becomingvisual.com/rfundamentals/sql-r.html#exercise-7.1
Review session 8 from R Fundamentals (Sosulski, 2018): RShiny http://becomingvisual.com/rfundamentals/rshiny.html
Modify your shiny app.
Use ggplot to create an interactive scatterplot of the same data
Exercise available at: http://becomingvisual.com/rfundamentals/rshiny.html#exercise-8.1
Submit all the files for exercises U & V (with your name on it) to NYU Classes > Assignments > R in class session. Due date: Official end date of module 1.
Write a script to determine the average ridership on weekends versus weekdays.
Let’s imagine it costs $10 per day to rent a bike on a weekday and $12 on a weekend. What is the annual weekday rental revenue in 2011 and 2012? What is the annual weekend revenue in 2011 and 2012?
Hint: Use a for loop and if/else logic.
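The for loop with if/else pattern from the hint can be sketched on a tiny made-up table before applying it to the real data. Everything in this block is an assumption for illustration: the `rides` data frame, its column names (`year`, `is_weekend`, `cnt`), and the values are hypothetical and do not come from the bikeshare file. The split by year is left to you.

```r
# Hypothetical toy data; the real bikeshare columns may be named differently
rides <- data.frame(
  year       = c(2011, 2011, 2011, 2012, 2012, 2012),
  is_weekend = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  cnt        = c(100, 150, 120, 200, 180, 160)
)

weekday_rate <- 10  # dollars per rental on a weekday
weekend_rate <- 12  # dollars per rental on a weekend

weekday_revenue <- 0
weekend_revenue <- 0

# Visit each row and add its revenue to the matching running total
for (i in seq_len(nrow(rides))) {
  if (rides$is_weekend[i]) {
    weekend_revenue <- weekend_revenue + rides$cnt[i] * weekend_rate
  } else {
    weekday_revenue <- weekday_revenue + rides$cnt[i] * weekday_rate
  }
}

print(weekday_revenue)
print(weekend_revenue)

# Average ridership per group, computed without a loop
avg_by_group <- tapply(rides$cnt, rides$is_weekend, mean)
print(avg_by_group)
```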
At this point in the process, you should have gained enough insight to frame a question to guide the rest of your analysis. Sometimes you don’t know what to ask of the data, and other times the questions you have cannot be answered by the data that you have. In most visual analytical explorations there will be a back and forth between defining the questions and identifying the data sources that contain the information you need to extract.
Often your question will fall into one of three categories: Past, present, or future.
Some questions that can guide an historical analysis of past events are:
These questions serve a purpose of guiding reports, where the analyst is reporting on past events.
A question based on the present is:
How many bikes were rented in the past hour or today?
This type of question is used to report the current state of an event.
Can we answer this question?
The data we are using cannot answer this question since it is historical data from 2011 and 2012.
A question about the future could be framed as the following:
Will bike rentals be higher in the summer than in the winter due to weather?
Questions about the future usually involve analysis that requires prediction or forecasting methods. The analyst in this case is trying to predict the future from past data.
Use the data from: https://www.weather.gov/media/unr/heatindex.pdf
As a next step, I encourage you to select a data set from one of the resources provided below and explore it using the process we applied in class.
General Datasets
UCI Machine Learning Repository: A diverse collection of datasets (360 at the time of writing, and still growing) for analytics and machine learning. http://archive.ics.uci.edu/ml/
Kaggle datasets: Perfect for exploring data through visualization. https://www.kaggle.com/datasets
Amazon Public Datasets: Large datasets, with sizes in the gigabyte to terabyte range. https://aws.amazon.com/public-datasets/
Google Public Data: A set of datasets provided by Google, including a book corpus, US names, genome data, BigQuery datasets, and many more. https://cloud.google.com/public-datasets/
Open Data by Socrata: Thousands of free datasets for exploration. https://opendata.socrata.com/
Data.gov: A website dedicated to supplying datasets from different domains, e.g., education, nutrition, and sports. https://catalog.data.gov/dataset?res_format=CSV
Datahub: Just as its tagline says, “The easy way to get, share data”. https://datahub.io/dataset?tags=weather
Harvard Dataverse: Hosts many of the datasets used in research and cited in publications. https://dataverse.harvard.edu/
Challenge-based datasets
KDD Data Center: Having trouble coming up with a problem statement? No worries, KDD provides both the datasets and the problem statements through its challenges. http://www.kdd.org/kdd-cup
CrowdANALYTIX: More challenges to solve, with datasets. https://www.crowdanalytix.com/community
DrivenData: Problems for data scientists to solve. https://www.drivendata.org/competitions/
Big Data Innovation Challenge: Tackle real problems with analytics, and win a challenge along the way. https://bigdatainnovationchallenge.org/challenges/food-security-nutrition/
Census Datasets
Open Census Data: Population details for cities in different countries are just a click away with this open data. http://census.okfn.org/en/latest/
Census.gov: Census data for the United States. http://www.census.gov/data.html
Weather/Climate Datasets
Wunderground: Want to work with weather data? Use Wunderground’s API to get your own dataset. https://www.wunderground.com/weather/api/
National Centers for Environmental Information: Climate datasets available for analytics. https://www.ncdc.noaa.gov/cdo-web/datasets
News Datasets
BBC Dataset: Consists of documents from the BBC news website corresponding to stories in five topical areas. http://mlg.ucd.ie/datasets/bbc.html
The Guardian: A collection of news datasets from The Guardian, which is updated regularly. https://www.theguardian.com/news/datablog/interactive/2013/jan/14/all-our-datasets-index
Food and Nutrition Datasets
United States Department of Agriculture: Datasets provided by the Center for Nutrition Policy and Promotion, including food prices and the Healthy Eating Index. https://www.cnpp.usda.gov/data
Nutritional Science Blog: A blog listing some datasets in the nutrition domain. http://nutsci.org/open-nutrition-food-data/