Updated on Tue May 16 20:53:23 2017.

DATA BASICS

This section guides you through the process of turning your data into information and, ultimately, into intelligible insights. Along the way, we will use both tidyverse and base R packages.

When working with a new dataset, what initial questions should you ask?

Consider the following questions to guide your understanding.

What does your data represent in the real world?
How is this real-world phenomenon characterized by the data that you have?
From what time period is the data?

Once you have this basic understanding of your data, you can dig deeper. You can use visualization techniques to explore your data and derive some basic understanding of the phenomena you are studying, such as the largest and smallest values of each variable. In addition, summary statistics translate data into information by revealing the shape of the data: the mean, median, minimum and maximum values, and variability. Simple visualizations make these properties easy to see.

For any data science project, there are a few simple steps to follow.

A. Exercise: Importing your data

Using the world internet usage data, we will compare read.csv() with read_csv() for importing data.

utils package using read.csv()

# in R versions before 4.0.0, read.csv() converts text columns to factors by default
internet_utils <- read.csv("world_internet_usage.csv")
head(internet_utils)

##                country X2000 X2001 X2002 X2003 X2004 X2005 X2006 X2007
## 1                China  1.78  2.64  4.60  6.20  7.30  8.52 10.52 16.00
## 2               Mexico  5.08  7.04 11.90 12.90 14.10 17.21 19.52 20.81
## 3               Panama  6.55  7.27  8.52  9.99 11.14 11.48 17.35 22.29
## 4              Senegal  0.40  0.98  1.01  2.10  4.39  4.79  5.61  7.70
## 5            Singapore 36.00 41.67 47.00 53.84 62.00 61.00 59.00 69.90
## 6 United Arab Emirates 23.63 26.27 28.32 29.48 30.13 40.00 52.00 61.00
##   X2008 X2009 X2010 X2011 X2012
## 1 22.60 28.90 34.30 38.30 42.30
## 2 21.71 26.34 31.05 34.96 38.42
## 3 33.82 39.08 40.10 42.70 45.20
## 4 10.60 14.50 16.00 17.50 19.20
## 5 69.00 69.00 71.00 71.00 74.18
## 6 63.00 64.00 68.00 78.00 85.00

readr package using read_csv()

library(readr)
internet_readr <- read_csv("world_internet_usage.csv")

## Parsed with column specification:
## cols(
##   country = col_character(),
##   `2000` = col_double(),
##   `2001` = col_double(),
##   `2002` = col_double(),
##   `2003` = col_double(),
##   `2004` = col_double(),
##   `2005` = col_double(),
##   `2006` = col_double(),
##   `2007` = col_double(),
##   `2008` = col_double(),
##   `2009` = col_double(),
##   `2010` = col_double(),
##   `2011` = col_double(),
##   `2012` = col_double()
## )

head(internet_readr)

## # A tibble: 6 × 14
##                country `2000` `2001` `2002` `2003` `2004` `2005` `2006`
##                  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1                China   1.78   2.64   4.60   6.20   7.30   8.52  10.52
## 2               Mexico   5.08   7.04  11.90  12.90  14.10  17.21  19.52
## 3               Panama   6.55   7.27   8.52   9.99  11.14  11.48  17.35
## 4              Senegal   0.40   0.98   1.01   2.10   4.39   4.79   5.61
## 5            Singapore  36.00  41.67  47.00  53.84  62.00  61.00  59.00
## 6 United Arab Emirates  23.63  26.27  28.32  29.48  30.13  40.00  52.00
## # ... with 6 more variables: `2007` <dbl>, `2008` <dbl>, `2009` <dbl>,
## #   `2010` <dbl>, `2011` <dbl>, `2012` <dbl>

Accessing specific rows and columns

#extract by position
internet_readr[[2,1]]

## [1] "Mexico"

internet_utils[2,1] # double [[ ]] works too

## [1] Mexico
## 7 Levels: China Mexico Panama Senegal Singapore ... United States

#extract by name
internet_readr$country

## [1] "China"                "Mexico"               "Panama"              
## [4] "Senegal"              "Singapore"            "United Arab Emirates"
## [7] "United States"

internet_utils$country

## [1] China                Mexico               Panama              
## [4] Senegal              Singapore            United Arab Emirates
## [7] United States       
## 7 Levels: China Mexico Panama Senegal Singapore ... United States

# with the %>% pipe (from magrittr, also loaded by dplyr), use . to refer to the piped data
library(magrittr)
internet_readr %>% .$country

## [1] "China"                "Mexico"               "Panama"              
## [4] "Senegal"              "Singapore"            "United Arab Emirates"
## [7] "United States"

B. Exercise: Tidy data - reshaping

read.csv() turns the year headers into syntactically valid names by adding an X in front of each year, so rename the columns first to remove the X.

names(internet_utils) <-c("country", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012")
names(internet_utils)

##  [1] "country" "2000"    "2001"    "2002"    "2003"    "2004"    "2005"   
##  [8] "2006"    "2007"    "2008"    "2009"    "2010"    "2011"    "2012"
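Alternatively, you can avoid or strip the X prefix without typing every name. A minimal sketch, assuming a small inline CSV (`csv_text`, a hypothetical stand-in built from the first two countries and years shown above) in place of world_internet_usage.csv: read.csv() has a `check.names` argument, and `sub()` can remove the prefix after import.

```r
# Small inline stand-in for world_internet_usage.csv (values from the output above)
csv_text <- "country,2000,2001
China,1.78,2.64
Mexico,5.08,7.04"

# Default: read.csv() makes names syntactically valid, so years get an X prefix
with_x <- read.csv(text = csv_text)
names(with_x)   # "country" "X2000" "X2001"

# Option 1: keep the original headers by turning off name checking
no_x <- read.csv(text = csv_text, check.names = FALSE)
names(no_x)     # "country" "2000" "2001"

# Option 2: strip a leading X after import
names(with_x) <- sub("^X", "", names(with_x))
names(with_x)   # "country" "2000" "2001"
```

With the real file, the same ideas apply as `read.csv("world_internet_usage.csv", check.names = FALSE)` or `names(internet_utils) <- sub("^X", "", names(internet_utils))`.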

Reshape a data frame

library(reshape2)
internet_utils_reshaped <- melt(internet_utils,id.vars="country", variable.name="year", value.name="usage")

Reshape a tibble

internet_readr_reshaped <- melt(internet_readr,id.vars="country", variable.name="year", value.name="usage")
internet_readr_reshaped

##                 country year usage
## 1                 China 2000  1.78
## 2                Mexico 2000  5.08
## 3                Panama 2000  6.55
## 4               Senegal 2000  0.40
## 5             Singapore 2000 36.00
## 6  United Arab Emirates 2000 23.63
## 7         United States 2000 43.08
## 8                 China 2001  2.64
## 9                Mexico 2001  7.04
## 10               Panama 2001  7.27
## 11              Senegal 2001  0.98
## 12            Singapore 2001 41.67
## 13 United Arab Emirates 2001 26.27
## 14        United States 2001 49.08
## 15                China 2002  4.60
## 16               Mexico 2002 11.90
## 17               Panama 2002  8.52
## 18              Senegal 2002  1.01
## 19            Singapore 2002 47.00
## 20 United Arab Emirates 2002 28.32
## 21        United States 2002 58.79
## 22                China 2003  6.20
## 23               Mexico 2003 12.90
## 24               Panama 2003  9.99
## 25              Senegal 2003  2.10
## 26            Singapore 2003 53.84
## 27 United Arab Emirates 2003 29.48
## 28        United States 2003 61.70
## 29                China 2004  7.30
## 30               Mexico 2004 14.10
## 31               Panama 2004 11.14
## 32              Senegal 2004  4.39
## 33            Singapore 2004 62.00
## 34 United Arab Emirates 2004 30.13
## 35        United States 2004 64.76
## 36                China 2005  8.52
## 37               Mexico 2005 17.21
## 38               Panama 2005 11.48
## 39              Senegal 2005  4.79
## 40            Singapore 2005 61.00
## 41 United Arab Emirates 2005 40.00
## 42        United States 2005 67.97
## 43                China 2006 10.52
## 44               Mexico 2006 19.52
## 45               Panama 2006 17.35
## 46              Senegal 2006  5.61
## 47            Singapore 2006 59.00
## 48 United Arab Emirates 2006 52.00
## 49        United States 2006 68.93
## 50                China 2007 16.00
## 51               Mexico 2007 20.81
## 52               Panama 2007 22.29
## 53              Senegal 2007  7.70
## 54            Singapore 2007 69.90
## 55 United Arab Emirates 2007 61.00
## 56        United States 2007 75.00
## 57                China 2008 22.60
## 58               Mexico 2008 21.71
## 59               Panama 2008 33.82
## 60              Senegal 2008 10.60
## 61            Singapore 2008 69.00
## 62 United Arab Emirates 2008 63.00
## 63        United States 2008 74.00
## 64                China 2009 28.90
## 65               Mexico 2009 26.34
## 66               Panama 2009 39.08
## 67              Senegal 2009 14.50
## 68            Singapore 2009 69.00
## 69 United Arab Emirates 2009 64.00
## 70        United States 2009 71.00
## 71                China 2010 34.30
## 72               Mexico 2010 31.05
## 73               Panama 2010 40.10
## 74              Senegal 2010 16.00
## 75            Singapore 2010 71.00
## 76 United Arab Emirates 2010 68.00
## 77        United States 2010 74.00
## 78                China 2011 38.30
## 79               Mexico 2011 34.96
## 80               Panama 2011 42.70
## 81              Senegal 2011 17.50
## 82            Singapore 2011 71.00
## 83 United Arab Emirates 2011 78.00
## 84        United States 2011 77.86
## 85                China 2012 42.30
## 86               Mexico 2012 38.42
## 87               Panama 2012 45.20
## 88              Senegal 2012 19.20
## 89            Singapore 2012 74.18
## 90 United Arab Emirates 2012 85.00
## 91        United States 2012 81.03

class(internet_readr_reshaped) # turns into a data.frame!

## [1] "data.frame"
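reshape2's melt() stores the year column as a factor (you can confirm with `class(internet_readr_reshaped$year)`). That matters if you later want the years as numbers, because as.numeric() on a factor returns the internal level codes rather than the labels. A minimal base-R sketch with a hypothetical `year` factor:

```r
# Hypothetical stand-in for a melted year column stored as a factor
year <- factor(c("2000", "2001", "2012"))

as.numeric(year)                 # level codes: 1 2 3 (not the years!)
as.numeric(as.character(year))   # actual years: 2000 2001 2012
as.numeric(levels(year))[year]   # same result, converting each level only once
```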

Use the gather function from the tidyr package to reshape

library(tidyr)
# gather every column except country into year/usage pairs
tidy_internet_readr <- internet_readr %>%
  gather(key = "year", value = "usage", -country)

tidy_internet_readr

## # A tibble: 91 × 3
##                 country  year usage
##                   <chr> <chr> <dbl>
## 1                 China  2000  1.78
## 2                Mexico  2000  5.08
## 3                Panama  2000  6.55
## 4               Senegal  2000  0.40
## 5             Singapore  2000 36.00
## 6  United Arab Emirates  2000 23.63
## 7         United States  2000 43.08
## 8                 China  2001  2.64
## 9                Mexico  2001  7.04
## 10               Panama  2001  7.27
## # ... with 81 more rows

C. Exercise: Understand - Visualize

Create a few statistical visualizations to understand the makeup of your data.

Single boxplot

boxplot(internet_readr$`2000`, main="Range of internet users in 2000", sub="Median of 6.55 users per 100 people")
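The subtitle above hard-codes the median. A small sketch that computes it instead, so the label stays correct if the data change; `usage_2000` is a hypothetical stand-in holding the seven 2000 values from the reshaped output above. With the real data you would use `median(internet_readr$`2000`)`.

```r
# 2000 usage values for the seven countries in the data (from the output above)
usage_2000 <- c(China = 1.78, Mexico = 5.08, Panama = 6.55, Senegal = 0.40,
                Singapore = 36.00, UAE = 23.63, US = 43.08)

med <- median(usage_2000)
subtitle <- paste("Median of", med, "users per 100 people")
subtitle  # "Median of 6.55 users per 100 people"

boxplot(usage_2000, main = "Range of internet users in 2000", sub = subtitle)
```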

Single histogram

hist(internet_readr$`2000`, main="Frequency of internet users in 2000 per 100 people", xlab="2000")

Percentage histogram

library(lattice)
histogram(internet_readr$`2000`, main="Frequency of internet users in 2000 per 100 people", xlab="2000")

Multiple box plots

boxplot(internet_readr[,2:14], main="Range of internet users per 100 people")

Simple point plot

plot(tidy_internet_readr$year, tidy_internet_readr$usage,main="Internet usage per 100 people",xlab="Year",ylab="Usage", type="p")

***

D. Exercise: Communicate

Create charts and reports.

Create a presentation-ready chart using ggplot2 and apply a theme from the ggthemes package.

library(ggthemes)
library(ggplot2)
# line chart: one line per country
ggplot(tidy_internet_readr, aes(x = year, y = usage, colour = country, group = country)) +
  geom_line() + theme_few() +
  labs(title = "Internet Usage per 100 people", x = "Year", y = "Usage", caption = "Source: World Bank, 2013",
       subtitle = "Since 2011, the UAE has surpassed Singapore and the US in internet users")

Create an R Markdown document and publish it

Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents.

See the sample markdown here:

For more details on using R Markdown see http://rmarkdown.rstudio.com.

APPLICATION: Capital Bikeshare

Understand your data

The type of data you have will dictate the types of questions you use to guide your analysis. To begin, import the bike sharing data from the Capital Bikeshare system.

E. Exercise: Import the bike sharing data

The data cover the District of Columbia, Arlington County, Alexandria, Montgomery County, and Fairfax County. The Capital Bikeshare system is owned by the participating jurisdictions and is operated by Motivate, a Brooklyn, NY-based company that operates several other bike-sharing systems, including Citibike in New York City, Hubway in Boston, and Divvy Bikes in Chicago.

library(readr)
# adjust the path to wherever you saved the Bike-Sharing-Dataset folder
bikeshare <- read_csv("~/Desktop/BECOMING_VISUAL/becoming visual/Bike-Sharing-Dataset/bikesharedailydata.csv")

## Parsed with column specification:
## cols(
##   instant = col_integer(),
##   dteday = col_character(),
##   season = col_integer(),
##   yr = col_integer(),
##   mnth = col_integer(),
##   holiday = col_integer(),
##   weekday = col_integer(),
##   workingday = col_integer(),
##   weathersit = col_integer(),
##   temp = col_double(),
##   atemp = col_double(),
##   hum = col_double(),
##   windspeed = col_double(),
##   casual = col_integer(),
##   registered = col_integer(),
##   cnt = col_integer()
## )

F. Exercise: Take a look at the data

Preview the data

You can preview the data using the head function to show the first few observations.

head(bikeshare)

## # A tibble: 6 × 16
##   instant dteday season    yr  mnth holiday weekday workingday weathersit
##     <int>  <chr>  <int> <int> <int>   <int>   <int>      <int>      <int>
## 1       1 1/1/11      1     0     1       0       6          0          2
## 2       2 1/2/11      1     0     1       0       0          0          2
## 3       3 1/3/11      1     0     1       0       1          1          1
## 4       4 1/4/11      1     0     1       0       2          1          1
## 5       5 1/5/11      1     0     1       0       3          1          1
## 6       6 1/6/11      1     0     1       0       4          1          1
## # ... with 7 more variables: temp <dbl>, atemp <dbl>, hum <dbl>,
## #   windspeed <dbl>, casual <int>, registered <int>, cnt <int>

Next, you can view the variables and their types by using the str function.

str(bikeshare)

One of the first things you may notice is the dimensions of the data, that is, the number of rows and columns. Specifically, there are 731 rows (observations) and 16 columns (variables or attributes).

Rows are commonly referred to as observations or records and columns are described as attributes or variables.
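You can also ask R for the dimensions directly instead of reading them from str(). A minimal sketch using `bike_head`, a hypothetical stand-in built from the first six rows and four columns printed by head() above; on the full data, `dim(bikeshare)` should report 731 rows and 16 columns.

```r
# Hypothetical stand-in: the first six rows and four columns shown by head() above
bike_head <- data.frame(
  instant = 1:6,
  dteday  = c("1/1/11", "1/2/11", "1/3/11", "1/4/11", "1/5/11", "1/6/11"),
  season  = c(1, 1, 1, 1, 1, 1),
  yr      = c(0, 0, 0, 0, 0, 0)
)

dim(bike_head)    # rows, columns: 6 4
nrow(bike_head)   # number of observations: 6
ncol(bike_head)   # number of variables: 4
```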

However, the variable names in the header row are not very descriptive.

G. Exercise: Understanding the variables

Take a look at the column named season. What is the meaning of season? What are the possible values for this variable?

bikeshare$season

##   [1]  1  1  1  1  1  1 NA  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [24]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [47]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
##  [70]  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  2
##  [93]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [116]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [139]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [162]  2  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3
## [185]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [208]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [231]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [254]  3  3  3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4
## [277]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
## [300]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
## [323]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
## [346]  4  4  4  4  4  4  4  4  4  1  1  1  1  1  1  1  1  1  1  1  1  1  1
## [369]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
## [392]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
## [415]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
## [438]  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [461]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [484]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [507]  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2  2
## [530]  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [553]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [576]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [599]  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
## [622]  3  3  3  3  3  3  3  3  3  3  4  4  4  4  4  4  4  4  4  4  4  4  4
## [645]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
## [668]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
## [691]  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4  4
## [714]  4  4  4  4  4  4  4  1  1  1  1  1  1  1  1  1  1  1

What type of variable is it?

It is an integer. You’ll notice that the values in the season column are integers between 1 and 4 (the value in row 7 is missing and shows as NA).

What do the numbers represent?

If we think about it, it’s unlikely that the numbers represent quantities. Instead, they probably represent the seasons of the year, because we know there are four seasons. The numbers 1 through 4 are probably a code for each of the four seasons. Without additional information, such as a data dictionary or README file, it would be impossible for the user of the data to know which season each of the values 1 through 4 corresponds to in the categorical variable named season.

This leads us to the next step, reviewing the data dictionary along with the data set to better understand the meaning behind the values.

Review the data dictionary

A data dictionary defines the characteristics of each attribute in the data. If your data comes from a reputable source, odds are that it is accompanied by a data dictionary or metadata. To find out which season each number in the season variable represents, we can review the data dictionary.

Field	Definition
instant	record index
dteday	date
season	season (1: spring, 2: summer, 3: fall, 4: winter)
yr	year (0: 2011, 1: 2012)
mnth	month (1 to 12)
hr	hour (0 to 23; hourly data only, not present in this daily file)
holiday	whether the day is a holiday (1) or not (0)
weekday	day of the week (0 to 6, where 0 is Sunday)
workingday	1 if the day is neither a weekend nor a holiday, otherwise 0
weathersit	weather situation, coded 1, 2, 3, 4:
– 1	Clear, Few clouds, Partly cloudy
– 2	Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
– 3	Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
– 4	Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp	normalized temperature in Celsius; the values are divided by 41 (max)
atemp	normalized feeling temperature in Celsius; the values are divided by 50 (max)
hum	normalized humidity; the values are divided by 100 (max)
windspeed	normalized wind speed; the values are divided by 67 (max)
casual	count of casual users
registered	count of registered users
cnt	count of total rental bikes, including both casual and registered

For example, season is a categorical variable defined by one of four values, each representing a season (1: spring, 2: summer, 3: fall, 4: winter).
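Once you know what the codes mean, you can attach the labels so that tables and charts read naturally. A minimal sketch with a hypothetical `season` vector standing in for `bikeshare$season`; on the real data you might store the result in a new column (for example a hypothetical `season_label`) so that later steps that use the numeric codes keep working.

```r
# Hypothetical stand-in for bikeshare$season (integer codes 1-4)
season <- c(1L, 1L, 2L, 3L, 4L, 1L)

# Attach the labels from the data dictionary
season_f <- factor(season, levels = 1:4,
                   labels = c("spring", "summer", "fall", "winter"))
season_f          # spring spring summer fall winter spring
table(season_f)   # counts per labeled season
```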

You’ll notice that the variable yr (year) is coded with 0 for 2011 and 1 for 2012, rather than the actual year values 2011 or 2012.
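Because the coding is a simple offset, recovering the calendar year takes one addition. A sketch with a hypothetical `yr` vector; on the real data the same expression would be `bikeshare$yr + 2011L`.

```r
# Hypothetical stand-in for the yr column (0 = 2011, 1 = 2012)
yr <- c(0L, 0L, 1L, 1L)

# Add the offset to recover the calendar year
year_actual <- yr + 2011L
year_actual  # 2011 2011 2012 2012
```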

The variable weathersit is encoded with four possible values, 1 through 4, which represent the daily weather situation:

1: Clear, Few clouds, Partly cloudy
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

It is essential to go through this process of understanding, because it informs the questions you formulate for exploration and further analysis. Visualizing data without understanding the meaning of the variables makes it difficult to interpret the results. By approaching a data visualization task with knowledge of the data and its attributes, you can formulate better questions for visual exploration. The next step is to prepare the data for analytical and visualization tasks.

At this point, you may want to rename the columns in your data set to make the data easier to work with during the analysis. Renaming is a manual process that involves changing each column name individually. It is best practice to use lowercase letters and to avoid spaces and hyphens.
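If you inherit column names that break these conventions, you can clean them in one pass rather than by hand. A minimal sketch with hypothetical `raw_names`; it lowercases each name and replaces runs of spaces or hyphens with underscores.

```r
# Hypothetical messy column names
raw_names <- c("Feeling Temp", "Wind-Speed", "Casual Users")

# Lower-case everything and replace spaces and hyphens with underscores
clean_names <- gsub("[ -]+", "_", tolower(raw_names))
clean_names  # "feeling_temp" "wind_speed" "casual_users"
```

On a data frame, the same expression can be applied as `names(df) <- gsub("[ -]+", "_", tolower(names(df)))`.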

Preparing your data

H. Exercise: Renaming columns

There are many ways to rename columns. Two approaches are presented below.

Renaming columns with the rename function from the dplyr package.

library(dplyr)
bikeshare <- rename(bikeshare, humidity = hum)
names(bikeshare)

##  [1] "instant"    "dteday"     "season"     "yr"         "mnth"      
##  [6] "holiday"    "weekday"    "workingday" "weathersit" "temp"      
## [11] "atemp"      "humidity"   "windspeed"  "casual"     "registered"
## [16] "cnt"

Renaming columns with R base functions.

# Rename the column whose name is "yr"
names(bikeshare)[names(bikeshare) == "yr"] <- "year"
names(bikeshare)

##  [1] "instant"    "dteday"     "season"     "year"       "mnth"      
##  [6] "holiday"    "weekday"    "workingday" "weathersit" "temp"      
## [11] "atemp"      "humidity"   "windspeed"  "casual"     "registered"
## [16] "cnt"
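To rename several columns at once, dplyr's rename() accepts multiple pairs (for example `rename(bikeshare, month = mnth, count = cnt)`). In base R, a lookup vector does the same job. A minimal sketch with a hypothetical `bike_cols` data frame and `new_names` lookup; the values in it are placeholders, not taken from the data set.

```r
# Hypothetical stand-in with a few of the original bikeshare column names
bike_cols <- data.frame(instant = 1:2, mnth = c(1, 1), cnt = c(100, 200))

# Lookup vector: names are the old column names, values are the new ones
new_names <- c(mnth = "month", cnt = "count")

# Replace only the names that appear in the lookup
hits <- names(bike_cols) %in% names(new_names)
names(bike_cols)[hits] <- new_names[names(bike_cols)[hits]]
names(bike_cols)  # "instant" "month" "count"
```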

I. Exercise: Dealing with missing values

Before you even define the questions you want the data to answer, the data needs to be formatted appropriately: the rows should correspond to observations and the columns to the observed variables. This makes it easier to map the data to visual properties such as position, color, size, or shape. A preprocessing step is also necessary to check the data set for correctness and consistency, because incomplete information can easily lead to incorrect results.

Tactics

There are several ways to handle incomplete data. Each has its pros and cons.

Ignore any record with missing values
Replace empty fields with a pre-defined value
Replace empty fields with the most frequent value (the mode)
Replace empty fields with the mean value
Manual approach
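The first four tactics can be sketched in a few lines of base R. This assumes a hypothetical numeric vector `x` with two missing entries; the fixed value 0 is an arbitrary choice for illustration. The manual approach is the one used for the bike sharing data in this exercise.

```r
# Hypothetical vector of daily values with two missing entries
x <- c(3, 5, NA, 5, 8, NA, 5)

# 1. Ignore records with missing values
x_dropped <- x[!is.na(x)]

# 2. Replace missing values with a pre-defined value (0 here, chosen arbitrarily)
x_fixed <- ifelse(is.na(x), 0, x)

# 3. Replace with the most frequent value (the mode)
counts <- table(x)               # table() skips NA by default
mode_value <- as.numeric(names(counts)[which.max(counts)])
x_mode <- ifelse(is.na(x), mode_value, x)

# 4. Replace with the mean of the observed values
x_mean <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
```

Each replacement changes the distribution in a different way: the mean and mode pull values toward the center, while dropping records reduces the number of observations.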

Problem

Row 7, column 3: the season variable has no value.
Row 10, column 5: the month (mnth) has no value.
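Problems like these can be located programmatically rather than by scanning the data. A minimal sketch using `bike_na`, a hypothetical three-row stand-in with one missing season and one missing month; on the real data the same calls would be `colSums(is.na(bikeshare))` and `which(is.na(bikeshare), arr.ind = TRUE)`.

```r
# Hypothetical stand-in: three rows with one missing season and one missing month
bike_na <- data.frame(instant = 1:3,
                      season  = c(1, NA, 1),
                      mnth    = c(1, 1, NA))

colSums(is.na(bike_na))                 # missing values per column: 0 1 1
which(is.na(bike_na), arr.ind = TRUE)   # row and column of each missing value
```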

Solution

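In these two cases, it’s easy to replace the missing value with a known value. Both records fall in the first days of January 2011 (the data begin on 1/1/11), where the surrounding records all have season 1 and month 1, so both missing values must be 1. We wouldn’t want to ignore these records, because the values can be easily determined.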

Updating the records

bikeshare$season[7]

## [1] NA

bikeshare$season[7] <- 1
bikeshare$season[7]

## [1] 1

bikeshare$mnth[10]

## [1] NA

bikeshare$mnth[10] <- 1
bikeshare$mnth[10]

## [1] 1

J. Exercise: Understand - Calculate basic summary statistics

It is helpful to calculate some summary statistics to learn more about your data: its distribution, the median, minimum and maximum values, variance, standard deviation, and the number of observations and attributes.

summary(bikeshare)

##     instant         dteday              season           year       
##  Min.   :  1.0   Length:731         Min.   :1.000   Min.   :0.0000  
##  1st Qu.:183.5   Class :character   1st Qu.:2.000   1st Qu.:0.0000  
##  Median :366.0   Mode  :character   Median :3.000   Median :1.0000  
##  Mean   :366.0                      Mean   :2.497   Mean   :0.5007  
##  3rd Qu.:548.5                      3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :731.0                      Max.   :4.000   Max.   :1.0000  
##       mnth          holiday           weekday        workingday   
##  Min.   : 1.00   Min.   :0.00000   Min.   :0.000   Min.   :0.000  
##  1st Qu.: 4.00   1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.000  
##  Median : 7.00   Median :0.00000   Median :3.000   Median :1.000  
##  Mean   : 6.52   Mean   :0.02873   Mean   :2.997   Mean   :0.684  
##  3rd Qu.:10.00   3rd Qu.:0.00000   3rd Qu.:5.000   3rd Qu.:1.000  
##  Max.   :12.00   Max.   :1.00000   Max.   :6.000   Max.   :1.000  
##    weathersit         temp             atemp            humidity     
##  Min.   :1.000   Min.   :0.05913   Min.   :0.07907   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:0.33708   1st Qu.:0.33784   1st Qu.:0.5200  
##  Median :1.000   Median :0.49833   Median :0.48673   Median :0.6267  
##  Mean   :1.395   Mean   :0.49538   Mean   :0.47435   Mean   :0.6279  
##  3rd Qu.:2.000   3rd Qu.:0.65542   3rd Qu.:0.60860   3rd Qu.:0.7302  
##  Max.   :3.000   Max.   :0.86167   Max.   :0.84090   Max.   :0.9725  
##    windspeed           casual         registered        cnt      
##  Min.   :0.02239   Min.   :   2.0   Min.   :  20   Min.   :  22  
##  1st Qu.:0.13495   1st Qu.: 315.5   1st Qu.:2497   1st Qu.:3152  
##  Median :0.18097   Median : 713.0   Median :3662   Median :4548  
##  Mean   :0.19049   Mean   : 848.2   Mean   :3656   Mean   :4504  
##  3rd Qu.:0.23321   3rd Qu.:1096.0   3rd Qu.:4776   3rd Qu.:5956  
##  Max.   :0.50746   Max.   :3410.0   Max.   :6946   Max.   :8714

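The summary function shows the minimum, first quartile, median, mean, third quartile, and maximum for each variable in the data set. This is particularly useful for numeric variables such as temp, cnt, casual, and registered. For example, you can see that the average number of rentals (casual plus registered) is about 4,504 per day. Keep in mind that summary() treats coded categorical variables such as season and weathersit as numbers, so their means are not meaningful.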
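summary() does not report variance or standard deviation, so those need separate calls. A minimal sketch with a hypothetical `rentals` vector (the values are not taken from the data set); on the real data you would use, for example, `sd(bikeshare$cnt)`.

```r
# Hypothetical daily rental counts (not taken from the data set)
rentals <- c(1000, 1500, 2000, 2500, 3000)

var(rentals)        # sample variance (n - 1 in the denominator): 625000
sd(rentals)         # standard deviation: square root of the variance
IQR(rentals)        # spread of the middle 50% (3rd minus 1st quartile): 1000
length(rentals)     # number of observations: 5
```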

K. Exercise: Understand - Visualize

Explore the data visually. As a first step, consider scatterplots to show relationships between variables, histograms for frequencies, density plots to show distributions, and box plots to show the range of values.

Kernel density plot

Let’s say you want to know the distribution of ridership.

Kernel density plots are an effective way to view the distribution of a variable. Create the plot using plot(density(x)), where x is a numeric vector.

A density plot that shows the shape of the data for the number of riders per day.

density_riders <- density(bikeshare$cnt)
plot(density_riders, main = "Number of riders per day",
     sub = paste("Mean =", round(mean(bikeshare$cnt), 2)), frame = FALSE)
polygon(density_riders, col="gray", border="gray")

How would we interpret the density plot?

Histogram

A histogram that shows the frequency of the weather situation by day.

hist(bikeshare$weathersit, col="gray",border="gray", xlab="Weather", main="Frequency of weather situations")

Value	Meaning
1	Clear, Few clouds, Partly cloudy
2	Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
3	Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
4	Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

How would we interpret the histogram?

You can check whether your histogram is accurate by counting the number of days for each value of weathersit.

table(bikeshare$weathersit)

## 
##   1   2   3 
## 463 247  21
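Because weathersit is categorical, a bar chart of the table() counts is an alternative to hist(), which bins values into numeric intervals. A sketch using the counts printed above; the short labels are our own abbreviation of the data dictionary entries. With the real data, `barplot(table(bikeshare$weathersit))` draws the same chart with numeric labels.

```r
# Day counts per weather situation, as printed by table(bikeshare$weathersit)
weather_counts <- c(463, 247, 21)

# Shortened labels (our own abbreviation of the data dictionary entries)
names(weather_counts) <- c("Clear / partly cloudy", "Mist / cloudy", "Light snow / rain")

stopifnot(sum(weather_counts) == 731)  # every day in the data is accounted for
barplot(weather_counts, col = "gray", border = "gray",
        main = "Frequency of weather situations", ylab = "Number of days")
```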

L. Exercise: Scatter plots

To see relationships, scatter plots are useful. In this case, we are looking for positive or negative correlations.

Scatter plot

A simple scatter plot that shows the relationship between rentals and feeling temperature (atemp)

plot(bikeshare$cnt, bikeshare$atemp, main= "Relationship between bike rentals and daily feeling temperature", frame=FALSE, xlab="Number of rentals per day", ylab="Normalized feeling temperature (atemp)")

Scatter plot with fit lines

To aid interpretation, it is helpful to add a linear regression line if the relationship appears linear, or a lowess (locally weighted smoothing) line. A lowess line follows the data more closely because it does not assume a straight-line relationship.

plot(bikeshare$cnt, bikeshare$atemp, main= "Relationship between bike rentals and daily feeling temperature", frame=FALSE, xlab="Number of rentals per day", ylab="Normalized feeling temperature (atemp)")

# Add fit lines
abline(lm(bikeshare$atemp~bikeshare$cnt), col="blue") # regression line (y~x) 
lines(lowess(bikeshare$cnt, bikeshare$atemp), col="orange") # lowess line (x,y)

How would we interpret this scatter plot? Use this to inform the title of your plot.
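A correlation coefficient puts a number on what the scatter plot shows. A minimal sketch with hypothetical paired vectors `rentals` and `atemp` (not taken from the data set); on the real data the calls would be `cor(bikeshare$cnt, bikeshare$atemp)` and `coef(lm(atemp ~ cnt, data = bikeshare))`.

```r
# Hypothetical paired values (not from the data set): rentals and normalized feeling temperature
rentals <- c(1200, 2100, 3300, 4100, 5200)
atemp   <- c(0.20, 0.31, 0.45, 0.52, 0.66)

# Pearson correlation: +1 strong positive, -1 strong negative, 0 no linear relationship
r <- cor(rentals, atemp)
r

# Intercept and slope of the same kind of line that abline(lm(...)) draws
fit <- lm(atemp ~ rentals)
coef(fit)
```

A value of r close to +1 together with a positive slope would support a title that describes a positive relationship.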

Scatter plot with grouped categorical data (season)

Consider using color to group categorical data. In this example, we are grouping the points by season. We’re using the ggvis package.

# static chart
library(ggvis)
bikeshare %>%
  ggvis(x = ~cnt, y = ~atemp) %>%
  layer_points(fill = ~factor(season)) %>%  # factor() gives one color per season
  add_axis("x", title = "Number of rentals per day") %>%
  add_axis("y", title = "Normalized feeling temperature (atemp)")

Scatter plot with grouped categorical data (year)

We can even look at the data by year.

# static chart
library(ggvis)
bikeshare %>%
  ggvis(x = ~cnt, y = ~atemp) %>%
  layer_points(fill = ~factor(year)) %>%  # 0 = 2011, 1 = 2012
  add_axis("x", title = "Number of rentals per day") %>%
  add_axis("y", title = "Normalized feeling temperature (atemp)")

M. Exercise: Interactive chart - Use ggvis to filter

Then we can build on the example above and add a filter to hide and reveal different seasons.

library(ggvis)
# interactive chart: needs a Shiny session (e.g. the RStudio viewer), not a static knitr document
bikeshare %>%
  ggvis(x = ~cnt, y = ~atemp) %>%
  filter(season %in% eval(input_checkboxgroup(choices = unique(bikeshare$season),
    selected = "1"))) %>%
  layer_points(fill = ~factor(season)) %>%
  add_axis("x", title = "Number of rentals per day") %>%
  add_axis("y", title = "Normalized feeling temperature (atemp)") %>%
  add_legend("fill", title = "Seasons")

N. Homework: Communicate - Create an RMarkdown document

Complete on your own.

Create an RMarkdown document named Bike_Sharing.Rmd. Include the code and markup for exercises E-L.

NEW CONCEPTS

This section will introduce control structures such as the while loop, for loop, if/else conditional statements, and functions.

O. Exercise: Iteration using the while loop

The while loop

x <- 10
while (x > 0) {
 print(x)
 x <- x - 1 
}

## [1] 10
## [1] 9
## [1] 8
## [1] 7
## [1] 6
## [1] 5
## [1] 4
## [1] 3
## [1] 2
## [1] 1

Using a variable as a counter

counter <- 0
while (counter < 9) {
  print(counter)
  counter <- counter + 1
}

## [1] 0
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8

P. Exercise: Iteration using a for loop

The for loop

Iterate through a vector of numbers

for (i in c(1,2,3,4)){
    print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
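A for loop often builds up a result rather than just printing. A minimal sketch with a hypothetical `counts` vector that keeps a running total; it preallocates the output and uses seq_along() for the index. For this particular task, base R's vectorized cumsum() gives the same answer without a loop.

```r
# Hypothetical daily counts
counts <- c(4, 8, 15, 16)

running <- numeric(length(counts))  # preallocate the result vector
total <- 0
for (i in seq_along(counts)) {      # seq_along() is safe even for empty vectors
  total <- total + counts[i]
  running[i] <- total
}
running          # 4 12 27 43
cumsum(counts)   # vectorized equivalent: 4 12 27 43
```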

Iterate through a column in the bikeshare data

for (i in bikeshare$atemp){
    print(i)
}

## [1] 0.363625
## [1] 0.353739
## [1] 0.189405
## [1] 0.212122
## [1] 0.22927
## [1] 0.233209
## [1] 0.208839
## [1] 0.162254
## [1] 0.116175
## [1] 0.150888
## [1] 0.191464
## [1] 0.160473
## [1] 0.150883
## [1] 0.188413
## [1] 0.248112
## [1] 0.234217
## [1] 0.176771
## [1] 0.232333
## [1] 0.298422
## [1] 0.25505
## [1] 0.157833
## [1] 0.0790696
## [1] 0.0988391
## [1] 0.11793
## [1] 0.234526
## [1] 0.2036
## [1] 0.2197
## [1] 0.223317
## [1] 0.212126
## [1] 0.250322
## [1] 0.18625
## [1] 0.23453
## [1] 0.254417
## [1] 0.177878
## [1] 0.228587
## [1] 0.243058
## [1] 0.291671
## [1] 0.303658
## [1] 0.198246
## [1] 0.144283
## [1] 0.149548
## [1] 0.213509
## [1] 0.232954
## [1] 0.324113
## [1] 0.39835
## [1] 0.254274
## [1] 0.3162
## [1] 0.428658
## [1] 0.511983
## [1] 0.391404
## [1] 0.27733
## [1] 0.284075
## [1] 0.186033
## [1] 0.245717
## [1] 0.289191
## [1] 0.350461
## [1] 0.282192
## [1] 0.351109
## [1] 0.400118
## [1] 0.263879
## [1] 0.320071
## [1] 0.200133
## [1] 0.255679
## [1] 0.378779
## [1] 0.366252
## [1] 0.238461
## [1] 0.3024
## [1] 0.286608
## [1] 0.385668
## [1] 0.305
## [1] 0.32575
## [1] 0.380091
## [1] 0.332
## [1] 0.318178
## [1] 0.36693
## [1] 0.410333
## [1] 0.527009
## [1] 0.466525
## [1] 0.32575
## [1] 0.409735
## [1] 0.440642
## [1] 0.337939
## [1] 0.270833
## [1] 0.256312
## [1] 0.257571
## [1] 0.250339
## [1] 0.257574
## [1] 0.292908
## [1] 0.29735
## [1] 0.257575
## [1] 0.283454
## [1] 0.315637
## [1] 0.378767
## [1] 0.542929
## [1] 0.39835
## [1] 0.387608
## [1] 0.433696
## [1] 0.324479
## [1] 0.341529
## [1] 0.426737
## [1] 0.565217
## [1] 0.493054
## [1] 0.417283
## [1] 0.462742
## [1] 0.441913
## [1] 0.425492
## [1] 0.445696
## [1] 0.503146
## [1] 0.489258
## [1] 0.564392
## [1] 0.453892
## [1] 0.321954
## [1] 0.450121
## [1] 0.551763
## [1] 0.5745
## [1] 0.594083
## [1] 0.575142
## [1] 0.578929
## [1] 0.497463
## [1] 0.464021
## [1] 0.448204
## [1] 0.532833
## [1] 0.582079
## [1] 0.40465
## [1] 0.441917
## [1] 0.474117
## [1] 0.512621
## [1] 0.518933
## [1] 0.525246
## [1] 0.522721
## [1] 0.5284
## [1] 0.523363
## [1] 0.4943
## [1] 0.500629
## [1] 0.536
## [1] 0.550512
## [1] 0.538529
## [1] 0.527158
## [1] 0.510742
## [1] 0.529042
## [1] 0.571975
## [1] 0.5745
## [1] 0.590296
## [1] 0.604813
## [1] 0.615542
## [1] 0.654688
## [1] 0.637008
## [1] 0.612379
## [1] 0.61555
## [1] 0.671092
## [1] 0.725383
## [1] 0.720967
## [1] 0.643942
## [1] 0.587133
## [1] 0.594696
## [1] 0.616804
## [1] 0.621858
## [1] 0.65595
## [1] 0.727279
## [1] 0.757579
## [1] 0.703292
## [1] 0.678038
## [1] 0.643325
## [1] 0.601654
## [1] 0.591546
## [1] 0.587754
## [1] 0.595346
## [1] 0.600383
## [1] 0.643954
## [1] 0.645846
## [1] 0.595346
## [1] 0.637646
## [1] 0.693829
## [1] 0.693833
## [1] 0.656583
## [1] 0.643313
## [1] 0.637629
## [1] 0.637004
## [1] 0.692558
## [1] 0.654688
## [1] 0.637008
## [1] 0.652162
## [1] 0.667308
## [1] 0.668575
## [1] 0.665417
## [1] 0.696338
## [1] 0.685633
## [1] 0.686871
## [1] 0.670483
## [1] 0.664158
## [1] 0.690025
## [1] 0.729804
## [1] 0.739275
## [1] 0.689404
## [1] 0.635104
## [1] 0.624371
## [1] 0.638263
## [1] 0.669833
## [1] 0.703925
## [1] 0.747479
## [1] 0.74685
## [1] 0.826371
## [1] 0.840896
## [1] 0.804287
## [1] 0.794829
## [1] 0.720958
## [1] 0.696979
## [1] 0.690667
## [1] 0.7399
## [1] 0.785967
## [1] 0.728537
## [1] 0.729796
## [1] 0.703292
## [1] 0.707071
## [1] 0.679937
## [1] 0.664788
## [1] 0.656567
## [1] 0.676154
## [1] 0.715292
## [1] 0.703283
## [1] 0.724121
## [1] 0.684983
## [1] 0.651521
## [1] 0.654042
## [1] 0.645858
## [1] 0.624388
## [1] 0.616167
## [1] 0.645837
## [1] 0.666671
## [1] 0.662258
## [1] 0.633221
## [1] 0.648996
## [1] 0.675525
## [1] 0.638254
## [1] 0.606067
## [1] 0.630692
## [1] 0.645854
## [1] 0.659733
## [1] 0.635556
## [1] 0.647959
## [1] 0.607958
## [1] 0.594704
## [1] 0.611121
## [1] 0.614921
## [1] 0.604808
## [1] 0.633213
## [1] 0.665429
## [1] 0.625646
## [1] 0.5152
## [1] 0.544229
## [1] 0.555361
## [1] 0.578946
## [1] 0.607962
## [1] 0.609229
## [1] 0.60213
## [1] 0.603554
## [1] 0.6269
## [1] 0.553671
## [1] 0.461475
## [1] 0.478512
## [1] 0.490537
## [1] 0.529675
## [1] 0.532217
## [1] 0.550533
## [1] 0.554963
## [1] 0.522125
## [1] 0.564412
## [1] 0.572637
## [1] 0.589042
## [1] 0.574525
## [1] 0.575158
## [1] 0.574512
## [1] 0.544829
## [1] 0.412863
## [1] 0.345317
## [1] 0.392046
## [1] 0.472858
## [1] 0.527138
## [1] 0.480425
## [1] 0.504404
## [1] 0.513242
## [1] 0.523983
## [1] 0.542925
## [1] 0.546096
## [1] 0.517717
## [1] 0.551804
## [1] 0.529675
## [1] 0.498725
## [1] 0.503154
## [1] 0.510725
## [1] 0.522721
## [1] 0.513848
## [1] 0.466525
## [1] 0.423596
## [1] 0.425492
## [1] 0.422333
## [1] 0.457067
## [1] 0.463375
## [1] 0.472846
## [1] 0.457046
## [1] 0.318812
## [1] 0.227913
## [1] 0.321329
## [1] 0.356063
## [1] 0.397088
## [1] 0.390133
## [1] 0.405921
## [1] 0.403392
## [1] 0.323854
## [1] 0.362358
## [1] 0.400871
## [1] 0.412246
## [1] 0.409079
## [1] 0.373721
## [1] 0.306817
## [1] 0.357942
## [1] 0.43055
## [1] 0.524612
## [1] 0.507579
## [1] 0.451988
## [1] 0.323221
## [1] 0.272721
## [1] 0.324483
## [1] 0.457058
## [1] 0.445062
## [1] 0.421696
## [1] 0.430537
## [1] 0.372471
## [1] 0.380671
## [1] 0.385087
## [1] 0.4558
## [1] 0.490122
## [1] 0.451375
## [1] 0.311221
## [1] 0.305554
## [1] 0.331433
## [1] 0.310604
## [1] 0.3491
## [1] 0.393925
## [1] 0.4564
## [1] 0.400246
## [1] 0.256938
## [1] 0.317542
## [1] 0.266412
## [1] 0.253154
## [1] 0.270196
## [1] 0.301138
## [1] 0.338362
## [1] 0.412237
## [1] 0.359825
## [1] 0.249371
## [1] 0.245579
## [1] 0.280933
## [1] 0.396454
## [1] 0.428017
## [1] 0.426121
## [1] 0.377513
## [1] 0.299242
## [1] 0.279961
## [1] 0.315535
## [1] 0.327633
## [1] 0.279974
## [1] 0.263892
## [1] 0.318812
## [1] 0.414121
## [1] 0.375621
## [1] 0.252304
## [1] 0.126275
## [1] 0.119337
## [1] 0.278412
## [1] 0.340267
## [1] 0.390779
## [1] 0.340258
## [1] 0.247479
## [1] 0.318826
## [1] 0.282821
## [1] 0.381938
## [1] 0.249362
## [1] 0.183087
## [1] 0.161625
## [1] 0.190663
## [1] 0.364278
## [1] 0.275254
## [1] 0.190038
## [1] 0.220958
## [1] 0.174875
## [1] 0.16225
## [1] 0.243058
## [1] 0.349108
## [1] 0.294821
## [1] 0.35605
## [1] 0.415383
## [1] 0.326379
## [1] 0.272721
## [1] 0.262625
## [1] 0.381317
## [1] 0.466538
## [1] 0.398971
## [1] 0.309346
## [1] 0.272725
## [1] 0.264521
## [1] 0.296426
## [1] 0.361104
## [1] 0.266421
## [1] 0.261988
## [1] 0.293558
## [1] 0.210867
## [1] 0.101658
## [1] 0.227913
## [1] 0.333946
## [1] 0.351629
## [1] 0.330162
## [1] 0.351629
## [1] 0.355425
## [1] 0.265788
## [1] 0.273391
## [1] 0.295113
## [1] 0.392667
## [1] 0.444446
## [1] 0.410971
## [1] 0.255675
## [1] 0.268308
## [1] 0.357954
## [1] 0.353525
## [1] 0.34847
## [1] 0.475371
## [1] 0.359842
## [1] 0.413492
## [1] 0.303021
## [1] 0.241171
## [1] 0.255042
## [1] 0.3851
## [1] 0.524604
## [1] 0.397083
## [1] 0.277767
## [1] 0.35967
## [1] 0.459592
## [1] 0.542929
## [1] 0.548617
## [1] 0.532825
## [1] 0.436229
## [1] 0.505046
## [1] 0.464
## [1] 0.532821
## [1] 0.538533
## [1] 0.513258
## [1] 0.531567
## [1] 0.570067
## [1] 0.486733
## [1] 0.437488
## [1] 0.43875
## [1] 0.315654
## [1] 0.47095
## [1] 0.482304
## [1] 0.375621
## [1] 0.421708
## [1] 0.417287
## [1] 0.427513
## [1] 0.461483
## [1] 0.53345
## [1] 0.431163
## [1] 0.390767
## [1] 0.426129
## [1] 0.492425
## [1] 0.476638
## [1] 0.436233
## [1] 0.337274
## [1] 0.387604
## [1] 0.431808
## [1] 0.487996
## [1] 0.573875
## [1] 0.614925
## [1] 0.598487
## [1] 0.457038
## [1] 0.493046
## [1] 0.515775
## [1] 0.542921
## [1] 0.389504
## [1] 0.301125
## [1] 0.405283
## [1] 0.470317
## [1] 0.483583
## [1] 0.452637
## [1] 0.377504
## [1] 0.450121
## [1] 0.457696
## [1] 0.577021
## [1] 0.537896
## [1] 0.537242
## [1] 0.590917
## [1] 0.584608
## [1] 0.546737
## [1] 0.527142
## [1] 0.557471
## [1] 0.553025
## [1] 0.491783
## [1] 0.520833
## [1] 0.544817
## [1] 0.585238
## [1] 0.5499
## [1] 0.576404
## [1] 0.595975
## [1] 0.572613
## [1] 0.551121
## [1] 0.566908
## [1] 0.583967
## [1] 0.565667
## [1] 0.580825
## [1] 0.584612
## [1] 0.6067
## [1] 0.627529
## [1] 0.642696
## [1] 0.641425
## [1] 0.6793
## [1] 0.672992
## [1] 0.611129
## [1] 0.631329
## [1] 0.607962
## [1] 0.566288
## [1] 0.575133
## [1] 0.578283
## [1] 0.525892
## [1] 0.542292
## [1] 0.569442
## [1] 0.597862
## [1] 0.648367
## [1] 0.663517
## [1] 0.659721
## [1] 0.597875
## [1] 0.611117
## [1] 0.624383
## [1] 0.599754
## [1] 0.594708
## [1] 0.571975
## [1] 0.544842
## [1] 0.654692
## [1] 0.720975
## [1] 0.752542
## [1] 0.724121
## [1] 0.652792
## [1] 0.674254
## [1] 0.654042
## [1] 0.594704
## [1] 0.640792
## [1] 0.675512
## [1] 0.786613
## [1] 0.687508
## [1] 0.750629
## [1] 0.702038
## [1] 0.70265
## [1] 0.732337
## [1] 0.761367
## [1] 0.752533
## [1] 0.804913
## [1] 0.790396
## [1] 0.654054
## [1] 0.664796
## [1] 0.650271
## [1] 0.654683
## [1] 0.667933
## [1] 0.666042
## [1] 0.705196
## [1] 0.724125
## [1] 0.755683
## [1] 0.745583
## [1] 0.714642
## [1] 0.613025
## [1] 0.549912
## [1] 0.623125
## [1] 0.690017
## [1] 0.70645
## [1] 0.654054
## [1] 0.739263
## [1] 0.734217
## [1] 0.697604
## [1] 0.667933
## [1] 0.684987
## [1] 0.662896
## [1] 0.667308
## [1] 0.707088
## [1] 0.722867
## [1] 0.751267
## [1] 0.731079
## [1] 0.710246
## [1] 0.697621
## [1] 0.707717
## [1] 0.699508
## [1] 0.667942
## [1] 0.638267
## [1] 0.644579
## [1] 0.662254
## [1] 0.676779
## [1] 0.654037
## [1] 0.654688
## [1] 0.2424
## [1] 0.618071
## [1] 0.603554
## [1] 0.595967
## [1] 0.601025
## [1] 0.621854
## [1] 0.637008
## [1] 0.6471
## [1] 0.618696
## [1] 0.595996
## [1] 0.654688
## [1] 0.66605
## [1] 0.635733
## [1] 0.652779
## [1] 0.6894
## [1] 0.702654
## [1] 0.649
## [1] 0.661629
## [1] 0.686888
## [1] 0.708983
## [1] 0.655329
## [1] 0.657204
## [1] 0.611121
## [1] 0.578925
## [1] 0.565654
## [1] 0.554292
## [1] 0.570075
## [1] 0.579558
## [1] 0.594083
## [1] 0.585867
## [1] 0.563125
## [1] 0.55305
## [1] 0.565067
## [1] 0.540404
## [1] 0.532192
## [1] 0.571971
## [1] 0.610488
## [1] 0.518933
## [1] 0.502513
## [1] 0.544179
## [1] 0.596613
## [1] 0.607975
## [1] 0.585863
## [1] 0.530296
## [1] 0.517663
## [1] 0.512
## [1] 0.542333
## [1] 0.599133
## [1] 0.607975
## [1] 0.580187
## [1] 0.538521
## [1] 0.419813
## [1] 0.387608
## [1] 0.438112
## [1] 0.503142
## [1] 0.431167
## [1] 0.433071
## [1] 0.391396
## [1] 0.508204
## [1] 0.53915
## [1] 0.460846
## [1] 0.450108
## [1] 0.512625
## [1] 0.537896
## [1] 0.472842
## [1] 0.456429
## [1] 0.482942
## [1] 0.530304
## [1] 0.558721
## [1] 0.529688
## [1] 0.52275
## [1] 0.515133
## [1] 0.467771
## [1] 0.4394
## [1] 0.309909
## [1] 0.3611
## [1] 0.369942
## [1] 0.356042
## [1] 0.323846
## [1] 0.329538
## [1] 0.308075
## [1] 0.281567
## [1] 0.274621
## [1] 0.341891
## [1] 0.355413
## [1] 0.393937
## [1] 0.421713
## [1] 0.475383
## [1] 0.323225
## [1] 0.281563
## [1] 0.324492
## [1] 0.347204
## [1] 0.326383
## [1] 0.337746
## [1] 0.375621
## [1] 0.380667
## [1] 0.364892
## [1] 0.350371
## [1] 0.378779
## [1] 0.248742
## [1] 0.257583
## [1] 0.339004
## [1] 0.281558
## [1] 0.289762
## [1] 0.298422
## [1] 0.323867
## [1] 0.316904
## [1] 0.359208
## [1] 0.455796
## [1] 0.469054
## [1] 0.428012
## [1] 0.258204
## [1] 0.321958
## [1] 0.389508
## [1] 0.390146
## [1] 0.435575
## [1] 0.338363
## [1] 0.297338
## [1] 0.294188
## [1] 0.294192
## [1] 0.338383
## [1] 0.369938
## [1] 0.4015
## [1] 0.409708
## [1] 0.342162
## [1] 0.335217
## [1] 0.301767
## [1] 0.236113
## [1] 0.259471
## [1] 0.2589
## [1] 0.294465
## [1] 0.220333
## [1] 0.226642
## [1] 0.255046
## [1] 0.2424
## [1] 0.2317
## [1] 0.223487

Round each number in atemp

output <- vector("double", length(bikeshare$atemp)) # 1. output: one slot per atemp value
for (i in seq_along(bikeshare$atemp)) {             # 2. sequence
  output[[i]] <- round(bikeshare$atemp[[i]], 2)     # 3. body
}
output

##   [1] 0.36 0.35 0.19 0.21 0.23 0.23 0.21 0.16 0.12 0.15 0.19 0.16 0.15 0.19
##  [15] 0.25 0.23 0.18 0.23 0.30 0.26 0.16 0.08 0.10 0.12 0.23 0.20 0.22 0.22
##  [29] 0.21 0.25 0.19 0.23 0.25 0.18 0.23 0.24 0.29 0.30 0.20 0.14 0.15 0.21
##  [43] 0.23 0.32 0.40 0.25 0.32 0.43 0.51 0.39 0.28 0.28 0.19 0.25 0.29 0.35
##  [57] 0.28 0.35 0.40 0.26 0.32 0.20 0.26 0.38 0.37 0.24 0.30 0.29 0.39 0.30
##  [71] 0.33 0.38 0.33 0.32 0.37 0.41 0.53 0.47 0.33 0.41 0.44 0.34 0.27 0.26
##  [85] 0.26 0.25 0.26 0.29 0.30 0.26 0.28 0.32 0.38 0.54 0.40 0.39 0.43 0.32
##  [99] 0.34 0.43 0.57 0.49 0.42 0.46 0.44 0.43 0.45 0.50 0.49 0.56 0.45 0.32
## [113] 0.45 0.55 0.57 0.59 0.58 0.58 0.50 0.46 0.45 0.53 0.58 0.40 0.44 0.47
## [127] 0.51 0.52 0.53 0.52 0.53 0.52 0.49 0.50 0.54 0.55 0.54 0.53 0.51 0.53
## [141] 0.57 0.57 0.59 0.60 0.62 0.65 0.64 0.61 0.62 0.67 0.73 0.72 0.64 0.59
## [155] 0.59 0.62 0.62 0.66 0.73 0.76 0.70 0.68 0.64 0.60 0.59 0.59 0.60 0.60
## [169] 0.64 0.65 0.60 0.64 0.69 0.69 0.66 0.64 0.64 0.64 0.69 0.65 0.64 0.65
## [183] 0.67 0.67 0.67 0.70 0.69 0.69 0.67 0.66 0.69 0.73 0.74 0.69 0.64 0.62
## [197] 0.64 0.67 0.70 0.75 0.75 0.83 0.84 0.80 0.79 0.72 0.70 0.69 0.74 0.79
## [211] 0.73 0.73 0.70 0.71 0.68 0.66 0.66 0.68 0.72 0.70 0.72 0.68 0.65 0.65
## [225] 0.65 0.62 0.62 0.65 0.67 0.66 0.63 0.65 0.68 0.64 0.61 0.63 0.65 0.66
## [239] 0.64 0.65 0.61 0.59 0.61 0.61 0.60 0.63 0.67 0.63 0.52 0.54 0.56 0.58
## [253] 0.61 0.61 0.60 0.60 0.63 0.55 0.46 0.48 0.49 0.53 0.53 0.55 0.55 0.52
## [267] 0.56 0.57 0.59 0.57 0.58 0.57 0.54 0.41 0.35 0.39 0.47 0.53 0.48 0.50
## [281] 0.51 0.52 0.54 0.55 0.52 0.55 0.53 0.50 0.50 0.51 0.52 0.51 0.47 0.42
## [295] 0.43 0.42 0.46 0.46 0.47 0.46 0.32 0.23 0.32 0.36 0.40 0.39 0.41 0.40
## [309] 0.32 0.36 0.40 0.41 0.41 0.37 0.31 0.36 0.43 0.52 0.51 0.45 0.32 0.27
## [323] 0.32 0.46 0.45 0.42 0.43 0.37 0.38 0.39 0.46 0.49 0.45 0.31 0.31 0.33
## [337] 0.31 0.35 0.39 0.46 0.40 0.26 0.32 0.27 0.25 0.27 0.30 0.34 0.41 0.36
## [351] 0.25 0.25 0.28 0.40 0.43 0.43 0.38 0.30 0.28 0.32 0.33 0.28 0.26 0.32
## [365] 0.41 0.38 0.25 0.13 0.12 0.28 0.34 0.39 0.34 0.25 0.32 0.28 0.38 0.25
## [379] 0.18 0.16 0.19 0.36 0.28 0.19 0.22 0.17 0.16 0.24 0.35 0.29 0.36 0.42
## [393] 0.33 0.27 0.26 0.38 0.47 0.40 0.31 0.27 0.26 0.30 0.36 0.27 0.26 0.29
## [407] 0.21 0.10 0.23 0.33 0.35 0.33 0.35 0.36 0.27 0.27 0.30 0.39 0.44 0.41
## [421] 0.26 0.27 0.36 0.35 0.35 0.48 0.36 0.41 0.30 0.24 0.26 0.39 0.52 0.40
## [435] 0.28 0.36 0.46 0.54 0.55 0.53 0.44 0.51 0.46 0.53 0.54 0.51 0.53 0.57
## [449] 0.49 0.44 0.44 0.32 0.47 0.48 0.38 0.42 0.42 0.43 0.46 0.53 0.43 0.39
## [463] 0.43 0.49 0.48 0.44 0.34 0.39 0.43 0.49 0.57 0.61 0.60 0.46 0.49 0.52
## [477] 0.54 0.39 0.30 0.41 0.47 0.48 0.45 0.38 0.45 0.46 0.58 0.54 0.54 0.59
## [491] 0.58 0.55 0.53 0.56 0.55 0.49 0.52 0.54 0.59 0.55 0.58 0.60 0.57 0.55
## [505] 0.57 0.58 0.57 0.58 0.58 0.61 0.63 0.64 0.64 0.68 0.67 0.61 0.63 0.61
## [519] 0.57 0.58 0.58 0.53 0.54 0.57 0.60 0.65 0.66 0.66 0.60 0.61 0.62 0.60
## [533] 0.59 0.57 0.54 0.65 0.72 0.75 0.72 0.65 0.67 0.65 0.59 0.64 0.68 0.79
## [547] 0.69 0.75 0.70 0.70 0.73 0.76 0.75 0.80 0.79 0.65 0.66 0.65 0.65 0.67
## [561] 0.67 0.71 0.72 0.76 0.75 0.71 0.61 0.55 0.62 0.69 0.71 0.65 0.74 0.73
## [575] 0.70 0.67 0.68 0.66 0.67 0.71 0.72 0.75 0.73 0.71 0.70 0.71 0.70 0.67
## [589] 0.64 0.64 0.66 0.68 0.65 0.65 0.24 0.62 0.60 0.60 0.60 0.62 0.64 0.65
## [603] 0.62 0.60 0.65 0.67 0.64 0.65 0.69 0.70 0.65 0.66 0.69 0.71 0.66 0.66
## [617] 0.61 0.58 0.57 0.55 0.57 0.58 0.59 0.59 0.56 0.55 0.57 0.54 0.53 0.57
## [631] 0.61 0.52 0.50 0.54 0.60 0.61 0.59 0.53 0.52 0.51 0.54 0.60 0.61 0.58
## [645] 0.54 0.42 0.39 0.44 0.50 0.43 0.43 0.39 0.51 0.54 0.46 0.45 0.51 0.54
## [659] 0.47 0.46 0.48 0.53 0.56 0.53 0.52 0.52 0.47 0.44 0.31 0.36 0.37 0.36
## [673] 0.32 0.33 0.31 0.28 0.27 0.34 0.36 0.39 0.42 0.48 0.32 0.28 0.32 0.35
## [687] 0.33 0.34 0.38 0.38 0.36 0.35 0.38 0.25 0.26 0.34 0.28 0.29 0.30 0.32
## [701] 0.32 0.36 0.46 0.47 0.43 0.26 0.32 0.39 0.39 0.44 0.34 0.30 0.29 0.29
## [715] 0.34 0.37 0.40 0.41 0.34 0.34 0.30 0.24 0.26 0.26 0.29 0.22 0.23 0.26
## [729] 0.24 0.23 0.22

# A simpler way to round without a loop (round() is vectorized):
# atemp_rounded <- round(bikeshare$atemp, 2)
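As a quick check that the loop and the one-line version agree, here is a sketch on three of the atemp values printed above, copied into a plain vector so it runs without the bikeshare data:

```r
# Three atemp values copied from the printed output above
atemp <- c(0.250322, 0.18625, 0.23453)

# Loop version: preallocate one slot per value, then fill each slot
rounded_loop <- vector("double", length(atemp))
for (i in seq_along(atemp)) {
  rounded_loop[[i]] <- round(atemp[[i]], 2)
}

# Vectorized version: round() handles the whole vector at once
rounded_vec <- round(atemp, 2)

identical(rounded_loop, rounded_vec)  # TRUE
```

Preallocating with length() of the vector being looped over (rather than the number of columns) means the output never has to grow inside the loop.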

Q. Exercise - Conditionals

Review of Boolean variables and logical operators

3 > 4

## [1] FALSE

c(1, 2, 3, 4, 5) > 4

## [1] FALSE FALSE FALSE FALSE  TRUE

c(1, 2, 3, 4, 6) == 3

## [1] FALSE FALSE  TRUE FALSE FALSE

Conditional statements using if/else logic

prices <- c(12.43, 9.99, 18.22, 7.25, 0.50)
numCheap <- 0
for (p in prices) {
  if (p < 10) {
    numCheap <- numCheap + 1
  }
}
print(numCheap)

## [1] 3

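Alternative approach: the comparison prices < 10 returns a logical vector, and sum() counts each TRUE as 1, so no loop is needed.

prices <- c(12.43, 9.99, 18.22, 7.25, 0.50)
sum(prices < 10)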

## [1] 3
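The price example above only uses if. A minimal sketch of the full if/else pattern labels every price instead of counting, and ifelse() does the same job in one vectorized call (the variable name price_labels is just for illustration):

```r
prices <- c(12.43, 9.99, 18.22, 7.25, 0.50)

# Loop with if/else: build a label for every price
price_labels <- vector("character", length(prices))
for (i in seq_along(prices)) {
  if (prices[[i]] < 10) {
    price_labels[[i]] <- "cheap"
  } else {
    price_labels[[i]] <- "expensive"
  }
}
price_labels
# [1] "expensive" "cheap"     "expensive" "cheap"     "cheap"

# Vectorized equivalent: test, value if TRUE, value if FALSE
ifelse(prices < 10, "cheap", "expensive")
```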

R. Homework: Determine the average ridership

Write a script to determine the average ridership on weekends versus weekdays. Next, let’s imagine it costs $10 per day to rent a bike on a weekday and $12 on a weekend. What is the annual weekday rental revenue in 2011 and 2012? What is the annual weekend revenue in 2011 and 2012?

Hint: Use a for loop and if/else logic.

S. Exercise: Functions

Some functions are built in, such as:

sqrt(25)

## [1] 5

mean(c(1,2,3,4,5))

## [1] 3

toupper("hello world")

## [1] "HELLO WORLD"

Write your own function. Here’s an example of the basic form, along with one call that would cause an error:

f <- function(x) x + 2
f(3)

## [1] 5

# f("hello world")  # causes an error: + needs a numeric argument, not a character string
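One way to make that error easier to understand is to check the input before using it. A sketch, with the hypothetical name f_checked:

```r
f_checked <- function(x) {
  # Stop early with a readable message if x is not numeric
  if (!is.numeric(x)) {
    stop("x must be numeric, not ", class(x)[1])
  }
  x + 2
}

f_checked(3)  # 5
# f_checked("hello world")  # error message: x must be numeric, not character
```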

Multiple arguments

addTogether <- function(x, y) x + y
addTogether(5, 10)

## [1] 15

addTogether(x = 5, y = 10) # alternative: name the arguments explicitly

## [1] 15

Multi-line functions

f <- function(x) {
  y <- x^2  # square the input
  z <- y/2  # halve the square
  z         # the last expression is the return value
}
f(2)

## [1] 2

Try it yourself: write a function that averages two numbers.

avg <- function(x,y){
    (x + y)/2
}
avg(1,2)

## [1] 1.5

You can apply a function to every element of a vector with sapply():

f <- function(x) x^2
sapply(c(1,2,3,4,5),f)

## [1]  1  4  9 16 25
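Because ^ is vectorized in R, f can also be called on the whole vector directly, and vapply() is a stricter relative of sapply() that declares the type and length of each result. A quick check that all three give the same answer:

```r
f <- function(x) x^2
v <- c(1, 2, 3, 4, 5)

with_sapply <- sapply(v, f)
with_vapply <- vapply(v, f, numeric(1))  # each result must be a single number
vectorized  <- f(v)                      # no apply needed: ^ works elementwise

identical(with_sapply, vectorized)  # TRUE
identical(with_vapply, vectorized)  # TRUE
```

For simple arithmetic like this, the vectorized call is usually the clearest choice; sapply() and vapply() are more useful when the function only works on one value at a time.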

Try it using the bikeshare data.

f <- function(x) x^2
sapply(bikeshare$atemp,f)

##   [1] 0.132223141 0.125131280 0.035874254 0.044995743 0.052564733
##   [6] 0.054386438 0.043613728 0.026326361 0.013496631 0.022767189
##  [11] 0.036658463 0.025751584 0.022765680 0.035499459 0.061559565
##  [16] 0.054857603 0.031247986 0.053978623 0.089055690 0.065050502
##  [21] 0.024911256 0.006252002 0.009769168 0.013907485 0.055002445
##  [26] 0.041452960 0.048268090 0.049870482 0.044997440 0.062661104
##  [31] 0.034689062 0.055004321 0.064728010 0.031640583 0.052252017
##  [36] 0.059077191 0.085071972 0.092208181 0.039301477 0.020817584
##  [41] 0.022364604 0.045586093 0.054267566 0.105049237 0.158682722
##  [46] 0.064655267 0.099982440 0.183747681 0.262126592 0.153197091
##  [51] 0.076911929 0.080698606 0.034608277 0.060376844 0.083631434
##  [56] 0.122822913 0.079632325 0.123277530 0.160094414 0.069632127
##  [61] 0.102445445 0.040053218 0.065371751 0.143473531 0.134140528
##  [66] 0.056863649 0.091445760 0.082144146 0.148739806 0.093025000
##  [71] 0.106113062 0.144469168 0.110224000 0.101237240 0.134637625
##  [76] 0.168373171 0.277738486 0.217645576 0.106113062 0.167882770
##  [81] 0.194165372 0.114202768 0.073350514 0.065695841 0.066342820
##  [86] 0.062669615 0.066344365 0.085795096 0.088417022 0.066344881
##  [91] 0.080346170 0.099626716 0.143464440 0.294771899 0.158682722
##  [96] 0.150239962 0.188092220 0.105286621 0.116642058 0.182104467
## [101] 0.319470257 0.243102247 0.174125102 0.214130159 0.195287100
## [106] 0.181043442 0.198644924 0.253155897 0.239373391 0.318538330
## [111] 0.206017948 0.103654378 0.202608915 0.304442408 0.330050250
## [116] 0.352934611 0.330788320 0.335158787 0.247469436 0.215315488
## [121] 0.200886826 0.283911006 0.338815962 0.163741623 0.195290635
## [126] 0.224786930 0.262780290 0.269291458 0.275883361 0.273237244
## [131] 0.279206560 0.273908830 0.244332490 0.250629396 0.287296000
## [136] 0.303063462 0.290013484 0.277895557 0.260857391 0.279885438
## [141] 0.327155401 0.330050250 0.348449368 0.365798765 0.378891954
## [146] 0.428616377 0.405779192 0.375008040 0.378901803 0.450364472
## [151] 0.526180497 0.519793415 0.414661299 0.344725160 0.353663332
## [156] 0.380447174 0.386707372 0.430270403 0.528934744 0.573925941
## [161] 0.494619637 0.459735529 0.413867056 0.361987536 0.349926670
## [166] 0.345454765 0.354436860 0.360459747 0.414676754 0.417117056
## [171] 0.354436860 0.406592421 0.481398681 0.481404232 0.431101236
## [176] 0.413851616 0.406570742 0.405774096 0.479636583 0.428616377
## [181] 0.405779192 0.425315274 0.445299967 0.446992531 0.442779784
## [186] 0.484886610 0.470092611 0.471791771 0.449547453 0.441105849
## [191] 0.476134501 0.532613878 0.546527526 0.475277875 0.403357091
## [196] 0.389839146 0.407379657 0.448676248 0.495510406 0.558724855
## [201] 0.557784923 0.682889030 0.707106083 0.646877578 0.631753139
## [206] 0.519780438 0.485779726 0.477020905 0.547452010 0.617744125
## [211] 0.530766160 0.532602202 0.494619637 0.499949399 0.462314324
## [216] 0.441943085 0.431080225 0.457184232 0.511642645 0.494606978
## [221] 0.524351223 0.469201710 0.424479613 0.427770938 0.417132556
## [226] 0.389860375 0.379661772 0.417105431 0.444450222 0.438585659
## [231] 0.400968835 0.421195808 0.456334026 0.407368169 0.367317208
## [236] 0.397772399 0.417127389 0.435247631 0.403931429 0.419850866
## [241] 0.369612930 0.353672848 0.373468877 0.378127836 0.365792717
## [246] 0.400958703 0.442795754 0.391432917 0.265431040 0.296185204
## [251] 0.308425840 0.335178471 0.369617793 0.371159974 0.362560537
## [256] 0.364277431 0.393003610 0.306551576 0.212959176 0.228973734
## [261] 0.240626548 0.280555606 0.283254935 0.303086584 0.307983931
## [266] 0.272614516 0.318560906 0.327913134 0.346970478 0.330078976
## [271] 0.330806725 0.330064038 0.296838639 0.170455857 0.119243830
## [276] 0.153700066 0.223594688 0.277874471 0.230808181 0.254423395
## [281] 0.263417351 0.274558184 0.294767556 0.298220841 0.268030892
## [286] 0.304487654 0.280555606 0.248726626 0.253163948 0.260840026
## [291] 0.273237244 0.264039767 0.217645576 0.179433571 0.181043442
## [296] 0.178365163 0.208910242 0.214716391 0.223583340 0.208891046
## [301] 0.101641091 0.051944336 0.103252326 0.126780860 0.157678880
## [306] 0.152203758 0.164771858 0.162725106 0.104881413 0.131303320
## [311] 0.160697559 0.169946765 0.167345628 0.139667386 0.094136671
## [316] 0.128122475 0.185373303 0.275217751 0.257636441 0.204293152
## [321] 0.104471815 0.074376744 0.105289217 0.208902015 0.198080184
## [326] 0.177827516 0.185362108 0.138734646 0.144910410 0.148291998
## [331] 0.207753640 0.240219575 0.203739391 0.096858511 0.093363247
## [336] 0.109847833 0.096474845 0.121870810 0.155176906 0.208300960
## [341] 0.160196861 0.066017136 0.100832922 0.070975354 0.064086948
## [346] 0.073005878 0.090684095 0.114488843 0.169939344 0.129474031
## [351] 0.062185896 0.060309045 0.078923350 0.157175774 0.183198552
## [356] 0.181579107 0.142516065 0.089545775 0.078378162 0.099562336
## [361] 0.107343383 0.078385441 0.069638988 0.101641091 0.171496203
## [366] 0.141091136 0.063657308 0.015945376 0.014241320 0.077513242
## [371] 0.115781631 0.152708227 0.115775507 0.061245855 0.101650018
## [376] 0.079987718 0.145876636 0.062181407 0.033520850 0.026122641
## [381] 0.036352380 0.132698461 0.075764765 0.036114441 0.048822438
## [386] 0.030581266 0.026325063 0.059077191 0.121876396 0.086919422
## [391] 0.126771602 0.172543037 0.106523252 0.074376744 0.068971891
## [396] 0.145402654 0.217657705 0.159177859 0.095694948 0.074378926
## [401] 0.069971359 0.087868373 0.130396099 0.070980149 0.068637712
## [406] 0.086176299 0.044464892 0.010334349 0.051944336 0.111519931
## [411] 0.123642954 0.109006946 0.123642954 0.126326931 0.070643261
## [416] 0.074742639 0.087091683 0.154187373 0.197532247 0.168897163
## [421] 0.065369706 0.071989183 0.128131066 0.124979926 0.121431341
## [426] 0.225977588 0.129486265 0.170975634 0.091821726 0.058163451
## [431] 0.065046422 0.148302010 0.275209357 0.157674909 0.077154506
## [436] 0.129362509 0.211224806 0.294771899 0.300980613 0.283902481
## [441] 0.190295740 0.255071462 0.215296000 0.283898218 0.290017792
## [446] 0.263433775 0.282563475 0.324976384 0.236909013 0.191395750
## [451] 0.192501562 0.099637448 0.221793902 0.232617148 0.141091136
## [456] 0.177837637 0.174128440 0.182767365 0.212966559 0.284568902
## [461] 0.185901533 0.152698848 0.181585925 0.242482381 0.227183783
## [466] 0.190299230 0.113753751 0.150236861 0.186458149 0.238140096
## [471] 0.329332516 0.378132756 0.358186689 0.208883733 0.243094358
## [476] 0.266023851 0.294763212 0.151713366 0.090676266 0.164254310
## [481] 0.221198080 0.233852518 0.204880254 0.142509270 0.202608915
## [486] 0.209485628 0.332953234 0.289332107 0.288628967 0.349182901
## [491] 0.341766514 0.298921347 0.277878688 0.310773916 0.305836651
## [496] 0.241850519 0.271267014 0.296825563 0.342503517 0.302390010
## [501] 0.332241571 0.355186201 0.327885648 0.303734357 0.321384680
## [506] 0.341017457 0.319979155 0.337357681 0.341771191 0.368084890
## [511] 0.393792646 0.413058148 0.411426031 0.461448490 0.452918232
## [516] 0.373478655 0.398576306 0.369617793 0.320682099 0.330777968
## [521] 0.334411228 0.276562396 0.294080613 0.324264191 0.357438971
## [526] 0.420379767 0.440254809 0.435231798 0.357454516 0.373463988
## [531] 0.389854131 0.359704861 0.353677605 0.327155401 0.296852805
## [536] 0.428621615 0.519804951 0.566319462 0.524351223 0.426137395
## [541] 0.454618457 0.427770938 0.353672848 0.410614387 0.456316462
## [546] 0.618760012 0.472667250 0.563443896 0.492857353 0.493717022
## [551] 0.536317482 0.579679709 0.566305916 0.647884938 0.624725837
## [556] 0.427786635 0.441953722 0.422852373 0.428609830 0.446134492
## [561] 0.443611946 0.497301398 0.524357016 0.571056796 0.555894010
## [566] 0.510713188 0.375799651 0.302403208 0.388284766 0.476123460
## [571] 0.499071603 0.427786635 0.546509783 0.539074603 0.486651341
## [576] 0.446134492 0.469207190 0.439431107 0.445299967 0.499973440
## [581] 0.522536700 0.564402105 0.534476504 0.504449381 0.486675060
## [586] 0.500863352 0.489311442 0.446146515 0.407384763 0.415482087
## [591] 0.438580361 0.458029815 0.427764397 0.428616377 0.058757760
## [596] 0.382011761 0.364277431 0.355176665 0.361231051 0.386702397
## [601] 0.405779192 0.418738410 0.382784740 0.355211232 0.428616377
## [606] 0.443622603 0.404156447 0.426120423 0.475272360 0.493722644
## [611] 0.421201000 0.437752934 0.471815125 0.502656894 0.429456098
## [616] 0.431917098 0.373468877 0.335154156 0.319964448 0.307239621
## [621] 0.324985506 0.335887475 0.352934611 0.343240142 0.317109766
## [626] 0.305864303 0.319300714 0.292036483 0.283228325 0.327150825
## [631] 0.372695598 0.269291458 0.252519315 0.296130784 0.355947072
## [636] 0.369633601 0.343235455 0.281213848 0.267974982 0.262144000
## [641] 0.294125083 0.358960352 0.369633601 0.336616955 0.290004867
## [646] 0.176242955 0.150239962 0.191942125 0.253151872 0.185904982
## [651] 0.187550491 0.153190829 0.258271306 0.290682723 0.212379036
## [656] 0.202597212 0.262784391 0.289332107 0.223579557 0.208327432
## [661] 0.233232975 0.281222332 0.312169156 0.280569377 0.273267563
## [666] 0.265362008 0.218809708 0.193072360 0.096043588 0.130393210
## [671] 0.136857083 0.126765906 0.104876232 0.108595293 0.094910206
## [676] 0.079279975 0.075416694 0.116889456 0.126318401 0.155186360
## [681] 0.177841854 0.225988997 0.104474401 0.079277723 0.105295058
## [686] 0.120550618 0.106525863 0.114072361 0.141091136 0.144907365
## [691] 0.133146172 0.122759838 0.143473531 0.061872583 0.066349002
## [696] 0.114923712 0.079274907 0.083962017 0.089055690 0.104889834
## [701] 0.100428145 0.129030387 0.207749994 0.220011655 0.183194272
## [706] 0.066669306 0.103656954 0.151716482 0.152213901 0.189725581
## [711] 0.114489520 0.088409886 0.086546579 0.086548933 0.114503055
## [716] 0.136854124 0.161202250 0.167860645 0.117074834 0.112370437
## [721] 0.091063322 0.055749349 0.067325200 0.067029210 0.086709636
## [726] 0.048546631 0.051366596 0.065048462 0.058757760 0.053684890
## [731] 0.049946439

T. Challenge: Optional

Use temp and humidity to calculate the heat index for temperatures >= 80.

Use the data from: https://www.weather.gov/media/unr/heatindex.pdf
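One widely used formula behind charts like this is the Rothfusz regression published by the U.S. National Weather Service. A minimal sketch, assuming temperature in degrees Fahrenheit and relative humidity in percent; the bikeshare temp and humidity columns may need converting to those units first, and the function name heat_index is illustrative. It returns NA below 80 to match the challenge, and it leaves out the NWS's small extra adjustments for very low or very high humidity.

```r
# Heat index via the NWS Rothfusz regression.
# temp_f: air temperature in degrees Fahrenheit
# rh:     relative humidity in percent (0-100)
heat_index <- function(temp_f, rh) {
  hi <- -42.379 + 2.04901523 * temp_f + 10.14333127 * rh -
    0.22475541 * temp_f * rh - 0.00683783 * temp_f^2 -
    0.05481717 * rh^2 + 0.00122874 * temp_f^2 * rh +
    0.00085282 * temp_f * rh^2 - 0.00000199 * temp_f^2 * rh^2
  # Only report a value when the temperature is 80 F or more
  ifelse(temp_f >= 80, hi, NA_real_)
}

heat_index(c(90, 70), c(50, 50))  # about 94.6, then NA
```

At 90 F and 50% humidity this gives roughly 94.6, close to the 95 shown in the NWS chart.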

BUILDING APPS IN R

U. Exercise - Shiny

The first thing you need to do is make sure the shiny package is installed (install.packages("shiny")) and loaded (library(shiny)). Next, we will look at the basic operation of a Shiny app.

The basic structure of a Shiny app consists of a folder in R's working directory, for example app_1. That folder contains two R script files, server.R and ui.R. server.R contains the R commands the server uses to perform calculations, analyze data, and create visualizations. ui.R contains the instructions for laying out the user interface and controls the interaction with the user. The app is then launched with the command runApp("app_1"). While an app is running, you cannot use the R console, because runApp keeps running so that it can respond to input from the user interface.

The server.R script contains the instructions that your computer needs to build your app.

Review: Shiny apps have a basic file structure:

my_app (directory)
    any data files
    ui.R
    server.R

Quick Demo Example (Try it)

Shiny includes a set of built-in examples. Let’s look at the first one, a basic histogram. To run it, first make sure the shiny package is installed and loaded, then run:

library(shiny)
runExample("01_hello")

This example brings up a sample histogram and a slider that controls the number of bins. You can also see the code used to generate the histogram and slider below the histogram.

If you look at the console window of RStudio, you will see a small STOP sign icon. Clicking it stops the server code and lets you interact with RStudio again.

Other Shiny examples

library(shiny)
#runExample("02_text")
#runExample("05_sliders")
#runExample("06_tabsets")
#runExample("07_widgets")

Modifying the Example to use the Bikeshare data

Now try making your own histogram using this example as a base, but substituting in the bikeshare dataset. First create a new directory called histogram in your working directory. Next, copy and paste the code below into two new R script files in that directory.

First, the server.R file:

library(shiny)

# Define server logic required to draw a histogram
shinyServer(function(input, output) {
  
  # Expression that generates a histogram. The expression is
  # wrapped in a call to renderPlot to indicate that:
  #
  #  1) It is "reactive" and therefore should re-execute automatically
  #     when inputs change
  #  2) Its output type is a plot
  
  output$distPlot <- renderPlot({
    x    <- faithful[, 2]  # Old Faithful Geyser data
    bins <- seq(min(x), max(x), length.out = input$bins + 1)
    
    # draw the histogram with the specified number of bins
    hist(x, breaks = bins, col = 'darkgray', border = 'white')
  })
})

And now the ui.R file:

library(shiny)

# Define UI for application that draws a histogram
shinyUI(fluidPage(
  
  # Application title
  titlePanel("Hello Shiny!"),
  
  # Sidebar with a slider input for the number of bins
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins",
                  "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30)
    ),
    
    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("distPlot")
    )
  )
))

You can read the comments in the code to get a basic idea of what each part does. The ui.R file looks pretty good, except that we want to change the title: we’ll edit “Hello Shiny!” to say “Capital Bikeshare data: Frequency of users”. Now we’ll pull in the bikeshare data and create a histogram of the frequency of users. The line of code that pulls in the data is in the server.R file:

    x    <- faithful[, 2]  # Old Faithful Geyser data

This needs to be changed to pull in the bikeshare data instead. Also notice that the next line calculates the break points for the histogram bins. That calculation will fail if there are any NA values in the data, so the NA values need to be dropped first. Assuming that the data is already imported into a tibble called “bikeshare”, the new code will look like this:

    x <- bikeshare$cnt[!is.na(bikeshare$cnt)]  # rental counts with NA values dropped
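To see why the NA values matter: min() and max() return NA when any value is missing, and seq() then refuses to build the break points. A small demonstration on a made-up vector (not the bikeshare data):

```r
x_raw <- c(5, NA, 20)  # made-up counts with one missing value
min(x_raw)             # NA

# seq() cannot start from NA, so this line would stop with an error:
# seq(min(x_raw), max(x_raw), length.out = 4)

x <- x_raw[!is.na(x_raw)]            # drop the missing values first
seq(min(x), max(x), length.out = 4)  # 5 10 15 20
```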

Now if you run the app again, you should see a histogram of the bikeshare data. You should also add a new title and axis labels to the chart, using the main, xlab, and ylab arguments of hist().
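A sketch of the plotting step with a title and axis labels, wrapped in a small helper so it can be tried outside Shiny. The helper name plot_rental_hist and the label wording are illustrative; inside server.R, the body of renderPlot() could simply call plot_rental_hist(bikeshare$cnt, input$bins).

```r
plot_rental_hist <- function(counts, n_bins = 30) {
  x <- counts[!is.na(counts)]  # drop NAs so min() and max() are defined
  bins <- seq(min(x), max(x), length.out = n_bins + 1)
  # hist() draws the plot and invisibly returns the bin breaks and counts
  hist(x, breaks = bins, col = "darkgray", border = "white",
       main = "Capital Bikeshare data: Frequency of users",
       xlab = "Rental count (cnt)",
       ylab = "Frequency")
}
```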

Elements of a Shiny App

As previously stated, the code that runs a Shiny app resides in two files: server.R for the server commands and ui.R for the user interface. We’ll now explore the elements used to build an app and how they interact with one another.

The User Interface: ui.R

The user interface begins with the fluidPage function, which creates a blank webpage that automatically resizes to fit the browser window. Panels are then embedded in the page by passing them as arguments to fluidPage. It is common to use a titlePanel and the sidebarLayout function; sidebarLayout requires a sidebarPanel and a mainPanel. Each of these functions takes arguments, which can be plain, non-interactive text or much more advanced functions. Here is a minimal example that uses only non-interactive text:

shinyUI(fluidPage(
  titlePanel("Hello Shiny!"),
  sidebarLayout(
    sidebarPanel("Hello from sidebarPanel"),
    mainPanel("Hello from mainPanel")
  )
))

Create a directory called “app1” in your working directory and save the above code in it as ui.R.

The Basic Server File: server.R

A minimal server file consists of the shinyServer function, which receives input from and delivers output to the user interface. The code is shown below.

shinyServer(function(input, output) {
})

Copy and paste that code into a file called server.R in the app1 directory. You now have a Shiny app that displays static text in a title panel, a sidebar panel, and a main panel, with a server listening for input from the UI. If you run the app with runApp("app1"), you should see a webpage with the title “Hello Shiny!” and the two text strings in the sidebar and main panels.
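If you prefer to create the files from R rather than by copying and pasting, a sketch using base R's dir.create() and writeLines() writes the same two minimal files. The launch commands are left commented out because runApp() blocks the console until you stop the app.

```r
# Create the app1 directory (no warning if it already exists)
dir.create("app1", showWarnings = FALSE)

# ui.R: static text in the title, sidebar, and main panels
writeLines(c(
  'shinyUI(fluidPage(',
  '  titlePanel("Hello Shiny!"),',
  '  sidebarLayout(',
  '    sidebarPanel("Hello from sidebarPanel"),',
  '    mainPanel("Hello from mainPanel")',
  '  )',
  '))'
), file.path("app1", "ui.R"))

# server.R: a server that listens for input but produces no output yet
writeLines(c(
  'shinyServer(function(input, output) {',
  '})'
), file.path("app1", "server.R"))

# library(shiny)
# runApp("app1")
```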

Reviewing the Histogram Example

The code in the histogram example should make more sense now, and it is also a good example of some simple interaction between the UI and the server. Let’s review it. (All the comments have been removed for a more compact layout.)

shinyUI(fluidPage(
  
  titlePanel("Bike sharing rental frequency"),

  sidebarLayout(
    sidebarPanel(

Apart from the title, everything above is the same as our simple “app1” example. The next lines of code define the slider widget.

      sliderInput("bins",
                  "Number of bins:",
                  min = 1,
                  max = 50,
                  value = 30)
    ),

There are a few new things going on in the above code. First, the sliderInput function is a UI widget. A number of predefined widgets are available for building your page. RStudio offers a dedicated tutorial on building widgets, and a gallery of available widgets here.

The next thing to notice is the first argument of sliderInput, “bins”: it makes the value of the slider available to the server as input$bins. This is how the UI interacts with the server. The remaining arguments set the label displayed above the slider and the slider’s minimum, maximum, and starting values.

    mainPanel(
      plotOutput("distPlot")
    )
  )
))

The next lines of code illustrate how output from the server is displayed in the UI. The call plotOutput("distPlot") reserves space in the mainPanel for a plot named distPlot, which is defined in the server code. The final lines of code in the UI file are straightforward: they are just the closing parentheses for the various functions in the UI file.
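To see the link between an input ID and the server in isolation, here is a minimal single-file sketch that is not part of the original histogram example. It defines ui and server as ordinary objects and combines them with shinyApp() instead of using separate ui.R and server.R files; the output ID binsText is hypothetical and simply echoes input$bins as text.

```r
library(shiny)

ui <- fluidPage(
  # The first argument, "bins", is the ID the server reads as input$bins
  sliderInput("bins", "Number of bins:", min = 1, max = 50, value = 30),
  textOutput("binsText")  # placeholder filled by output$binsText
)

server <- function(input, output) {
  # Re-runs automatically whenever the slider value changes
  output$binsText <- renderText({
    paste("The slider is set to", input$bins, "bins")
  })
}

# Launch only in an interactive R session
if (interactive()) {
  shinyApp(ui, server)
}
```

The same pairing drives the histogram app: the ID in plotOutput("distPlot") on the UI side must match output$distPlot on the server side.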

Now let’s take another look at the server.R file.

shinyServer(function(input, output) {

  output$distPlot <- renderPlot({
    x <- bikeshare$cnt
    bins <- seq(min(x), max(x), length.out = input$bins + 1)

The server file opens with the shinyServer function, which sets the server listening for input and output. The next line assigns a plot to output$distPlot; this is the distPlot referenced by plotOutput in the UI’s mainPanel. The plot is built by the renderPlot function, and, as the removed comments noted, renderPlot causes the plot to be redrawn automatically whenever one of the inputs it uses changes. Functions like renderPlot are what add interactivity to your R code, and they are explained in much more detail here.

The next line stores our data, the cnt (rental count) column of the bikeshare data frame, in x, so bikeshare must already be loaded when the app runs. The following line is a little more complicated. It defines the break points for the histogram bins as a sequence from the minimum of x to the maximum of x with input$bins + 1 evenly spaced points, which yields exactly the number of bins chosen on the slider. Now the entire sequence should be clear. Moving the slider causes the UI to send the new value of bins to the server. The server takes that input, recalculates the break points, redraws the plot, and returns it to the UI, which updates the display.

    hist(x, breaks = bins, col = 'darkgray', border = 'white',
         main = 'Frequency of bike sharing rentals', ylab = 'Count', xlab = 'Rentals')
  })
})

The remaining code is simply standard R for drawing a histogram, and the closing braces indicating the end of the various functions.
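The break-point arithmetic in the server code can be checked outside Shiny. In this small sketch, a made-up vector of rental counts stands in for bikeshare$cnt and a fixed n_bins stands in for input$bins; hist(..., plot = FALSE) returns the bin counts without drawing anything.

```r
# Made-up rental counts standing in for bikeshare$cnt
x <- c(0, 40, 120, 250, 260, 380, 500)
n_bins <- 5  # plays the role of input$bins

# n_bins + 1 evenly spaced break points define exactly n_bins bins
bins <- seq(min(x), max(x), length.out = n_bins + 1)
print(bins)
# → [1]   0 100 200 300 400 500

# Count how many values fall in each bin without drawing the plot
h <- hist(x, breaks = bins, plot = FALSE)
print(h$counts)
# → [1] 2 1 2 1 1
```

By default hist() uses right-closed bins, (100, 200] and so on, with the first bin also including its lower edge, which is why 0 and 40 both land in the first bin.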

Publishing Your Shiny App

Once you have built your interactive display, there are several ways to share it. First, you can simply continue to run it from the R console. There are several reasons you might want to do this, including data privacy. Anyone with R can view your data and visualization, as long as you share your Shiny app and the original data set.

A more complicated method of sharing is to set up your own web server, and resources are available to help you do this. GitHub is also a popular place to host your app’s code, which others can then run from their own R session with shiny’s runGitHub function. Finally, there is shinyapps.io, which is hosted by RStudio and supports publishing your apps directly from within RStudio.
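A sketch of the sharing routes above, under stated assumptions: publishing to shinyapps.io assumes the rsconnect package is installed and your account has already been linked with rsconnect::setAccountInfo(), and the GitHub repository and user names are placeholders. The publish_to_shinyapps flag is a hypothetical switch for this example only.

```r
library(shiny)

publish_to_shinyapps <- FALSE  # hypothetical switch; TRUE once rsconnect is set up

if (interactive()) {
  if (publish_to_shinyapps && requireNamespace("rsconnect", quietly = TRUE)) {
    # Upload the app1 folder to shinyapps.io (account linked beforehand)
    rsconnect::deployApp("app1")
  } else {
    # Otherwise run the app locally in this R session
    runApp("app1")
  }
}

# Someone else can run an app kept in a public GitHub repository with:
# runGitHub("your-repo", "your-username")   # placeholder names
```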

Useful Shiny References

1. Shiny Tutorial

2. Galleries (Visualizations, Widgets, Layouts, etc.)

3. Function Reference

V. Homework: Devise the problem, challenge, and/or questions

At this point in the process, you should have gained enough insight to frame a question to guide the rest of your analysis. Sometimes you don’t know what to ask of the data, and other times the questions you have cannot be answered by the data you have. In most visual analytical explorations there will be a back and forth between defining the questions and identifying the data sources that contain the information you need to extract.

Often your question will fall into one of three categories: Past, present, or future.

Some questions that can guide an historical analysis of past events are:

Do weather conditions affect rental behaviors?
Do precipitation, day of the week, season, hour of the day, etc. affect rental behavior?
Which weather conditions affect behavior the most? Do they differ by season?
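As a starting point for questions like these, here is a minimal base R sketch: aggregate() computes mean rentals for each combination of season and weather, and a grouped bar chart shows the comparison. The small data frame is made up for illustration; its column names are borrowed from the bike sharing data, but the real data codes seasons and weather numerically and should be substituted for bikes.

```r
# Made-up example rows; substitute your bikeshare data frame here
bikes <- data.frame(
  season     = c("winter", "winter", "summer", "summer", "summer", "winter"),
  weathersit = c("clear",  "rain",   "clear",  "clear",  "rain",   "clear"),
  cnt        = c(120,       40,       300,      340,      150,      100)
)

# Mean rentals for every season/weather combination
rental_means <- aggregate(cnt ~ season + weathersit, data = bikes, FUN = mean)
print(rental_means)

# Grouped bar chart: one group per season, one bar per weather condition
mean_matrix <- tapply(bikes$cnt, list(bikes$weathersit, bikes$season), mean)
barplot(mean_matrix, beside = TRUE, legend.text = TRUE,
        ylab = "Mean rentals", main = "Rentals by season and weather")
```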

These questions serve to guide reports in which the analyst describes past events.

A question based on the present is:

How many bikes were rented in the past hour or today?

This type of question is used to report the current state of an event.

Can we answer this question?

The data we are using cannot answer this question since it is historical data from 2011 and 2012.

A question about the future could be framed as the following:

Will bike rentals be higher in the summer than in the winter due to weather?

Questions about the future usually involve analysis that requires prediction or forecasting methods. The analyst in this case is trying to predict the future from past data.
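One of the simplest forecasting tools in R is a linear model. This minimal sketch uses made-up numbers: it fits rentals against a normalized temperature with lm() and then uses predict() for a cooler, winter-like day and a warmer, summer-like day. It illustrates the idea of predicting from past data under a strong simplifying assumption (a straight-line relationship), not a recommended model for the bike data.

```r
# Made-up daily observations: normalized temperature and rentals
bikes <- data.frame(
  temp = c(0.10, 0.20, 0.35, 0.50, 0.65, 0.80),
  cnt  = c(900, 1500, 2600, 3900, 4800, 5600)
)

# Fit a straight-line relationship between temperature and rentals
fit <- lm(cnt ~ temp, data = bikes)

# Predict rentals for a cooler (winter-like) and a warmer (summer-like) day
new_days <- data.frame(temp = c(0.15, 0.75))
preds <- predict(fit, newdata = new_days)
print(round(preds))
```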

To complete on your own: try to answer the following questions. Show your work as a data visualization.

Do weather conditions affect rental behaviors?
Do precipitation, day of the week, season, hour of the day, etc. affect rental behavior?
Which weather conditions affect behavior the most? Do they differ by season?

FURTHER STUDY

As a next step, I encourage you to select a data set from one of the resources provided below.

General Datasets

UCI Machine Learning Repository: A diverse collection of datasets (360 datasets currently, and still growing) for performing analytics and running machine learning algorithms. http://archive.ics.uci.edu/ml/
Kaggle datasets: Perfect for exploring data through visualization. https://www.kaggle.com/datasets
Amazon Public Datasets: Large datasets, often gigabytes or terabytes in size. https://aws.amazon.com/public-datasets/
Google Public Data: Datasets provided by Google, including a book corpus, US names, genome data, BigQuery datasets, and many more. https://cloud.google.com/public-datasets/
Open Data by Socrata: Thousands of free datasets for exploration. https://opendata.socrata.com/
Data.gov: A website dedicated to supplying datasets from different domains, e.g., education, nutrition, and sports. https://catalog.data.gov/dataset?res_format=CSV
Datahub: As its tagline says, “The easy way to get, share data”. https://datahub.io/dataset?tags=weather
Harvard Dataverse: Datasets used for research purposes and cited in publications. https://dataverse.harvard.edu/

Challenge-Based Datasets

KDD Data Center: Having trouble coming up with a problem statement? KDD provides both datasets and problem statements through its challenges. http://www.kdd.org/kdd-cup
CrowdAnalytix: More challenges to solve with datasets. https://www.crowdanalytix.com/community
DrivenData: Problems for data scientists to solve. https://www.drivendata.org/competitions/
Big Data Innovation Challenge: Tackle real problems with analytics, and possibly win a challenge. https://bigdatainnovationchallenge.org/challenges/food-security-nutrition/

Census Datasets

Open Census Data: Population details for cities in different countries are just a click away with this open data. http://census.okfn.org/en/latest/
Census.gov: Census data for the United States. http://www.census.gov/data.html

Weather/Climate Datasets

Wunderground: Want to work with weather data? Use Wunderground’s API to get your own dataset. https://www.wunderground.com/weather/api/
National Centers for Environmental Information: Climate datasets available for analytics. https://www.ncdc.noaa.gov/cdo-web/datasets

News Datasets

BBC Dataset: It consists of documents from the BBC news website corresponding to stories in five topical areas. http://mlg.ucd.ie/datasets/bbc.html
The Guardian: A collection of news datasets from The Guardian, updated regularly. https://www.theguardian.com/news/datablog/interactive/2013/jan/14/all-our-datasets-index

Food and Nutrition Datasets

United States Department of Agriculture: Datasets from the Center for Nutrition Policy and Promotion, including food price data and the Healthy Eating Index. https://www.cnpp.usda.gov/data

Nutritional Science Blog: A blog listing some datasets related to nutrition. http://nutsci.org/open-nutrition-food-data/

HOMEWORK

Complete items N, R, and V. Submit on NYU Classes > Assignments.

R basics - Working with data

Kristen Sosulski