R Introduction

Illya Mowerman, Ph.D.

University of Bridgeport

What is an object?

An object:

  • Contains data
  • May have a hierarchy
  • Has a name
  • Has a class (is of a type)
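
As a minimal sketch of these properties, the snippet below builds one object and inspects it. The names `prices` and `store` and their values are illustrative assumptions, not part of the course data files.

```r
# Create an object: its name is `prices`, its data are three numbers
prices <- c(0.49, 0.60, 10.47)

# Every object has a class (its type)
prices_class <- class(prices)
print(prices_class)

# A list can hold other objects, which gives it a hierarchy
store <- list(name = "Tops", prices = prices)

# str() displays the structure of an object, including nested elements
str(store)
```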

Examples of types of objects

  • Vector
  • Dataframe
  • Graph
  • Function
  • List
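
The following sketch creates one small object for most of the types listed above and asks R for the class of each. All object names are hypothetical. A graph object is left out because it depends on a plotting package.

```r
# One small example per object type (all names are illustrative)
v  <- c(1, 2, 3)                                  # vector
df <- data.frame(item = c("a", "b"),
                 price = c(1.5, 2.0))             # data frame
f  <- function(x) x / 2                           # function
l  <- list(numbers = v, table = df)               # list

# class() reports the type of each object
types <- c(vector = class(v), dataframe = class(df),
           fun = class(f), list = class(l))
print(types)
```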

Packages

Packages are collections of functions created by the R community and made available to everyone.

Installing packages is easy:

install.packages('dplyr' , repos='http://cran.us.r-project.org')

The downloaded binary packages are in
    /var/folders/8c/w4htphd93r7cq7tcyp65xr780000gn/T//Rtmply3yn0/downloaded_packages

Loading Packages is even easier

To be able to use the functions of a package, you must load it first

library(dplyr)
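
A minimal sketch of two related idioms: checking whether a package is installed before loading it, and calling a single function with the `package::function` form without loading the whole package. The `stats` package is used here only because it ships with R; substitute the package you actually need.

```r
# requireNamespace() returns TRUE if the package is installed
has_stats <- requireNamespace("stats", quietly = TRUE)

if (has_stats) {
  library(stats)   # attach the package so its functions can be called by name
}

# The :: operator calls one function without attaching the package
mid <- stats::median(c(0.49, 0.60, 10.47))
print(mid)
```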

Highly used packages for this class

  • tidyverse (tibble, tidyr, readr, and dplyr)
  • stringr
  • rmarkdown
  • lubridate

Help Facility

To get help on a specific function, simply put a ? in front of the function name, and the help facility will display its documentation. Note that it is extremely useful to scroll down to the examples at the bottom of the help page.

?sum
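
As a small sketch, the same lookup can be written with `help()`, and `example()` runs the examples from the bottom of a help page without scrolling. Both functions come with R's `utils` package; the variable name `sum_help` is illustrative.

```r
# Equivalent to ?sum: look up the documentation by function name
sum_help <- help("sum")

# Run the examples section of the help page directly in the console
example("sum")
```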

Getting data into R

There are many ways to get data into R:

  • Reading from a local file
  • Through a connection to a database (any database)
  • API
  • Web scraping

We will be reading in files
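
Reading a local file can also be done in code. The sketch below uses base R's `read.csv()` on a small temporary file so that it runs anywhere; the file contents and the names `csv_path` and `demo` are illustrative assumptions, not the course data.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("item,price", "Bananas,0.49", "Eggs,1.59"), csv_path)

# read.csv() is part of base R and returns a data frame
demo <- read.csv(csv_path, stringsAsFactors = FALSE)

print(demo)
str(demo)
```

For a real file, replace `csv_path` with the path to the file on your computer.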

The easiest way to import data is to use the Import Dataset functionality within RStudio. You will find it under the Environment tab.

Let's try importing some data

Import the following files using the Import Dataset button

  • Grocery.xlsx
  • Uber.csv

The files can be found on the course website: https://bridgeport.instructure.com/courses/1511792/files/folder/Data/Class%202

Code generated by the Import Dataset dialog

library(readxl)
Grocery <- read_excel("~/Dropbox/Bridgeport/ITKM 560 - Fall 2017/Data/Class 2/Grocery.xlsx")

print(Grocery)
# A tibble: 15 x 4
                                Item  Tops `Wal-Mart` Wegmans
                               <chr> <dbl>      <dbl>   <dbl>
 1                   Bananas (1 lb.)  0.49       0.48    0.49
 2        Campbell's soup (10.75 oz)  0.60       0.54    0.77
 3          Chicken breasts (3 lbs.) 10.47       8.61    8.07
 4       Colgate toothpaste (6.2 oz)  1.99       2.40    1.97
 5              Large eggs (1 dozen)  1.59       0.88    0.79
 6             Heinz ketchup (36 oz)  2.59       1.78    2.59
 7            Jell-O (cherry, 3 oz.)  0.67       0.42    0.65
 8        Jif peanut butter (18 oz.)  2.29       1.78    2.09
 9         Milk (fat free, 1/2 gal.)  1.34       1.24    1.34
10      Oscar Meyer hot dogs (1 lb.)  3.29       1.50    3.39
11   Ragu past sauce (1 lb., 10 oz.)  2.09       1.50    1.25
12             Ritz crackers (1 lb.)  3.29       2.00    3.39
13  Tide detergent (liquid, 100 oz.)  6.79       5.24    5.99
14 Tropicana orange juice (1/2 gal.)  2.50       2.50    2.50
15     Twizzlers (strawberry, 1 lb.)  1.19       1.27    1.69

Modifying an object

Depending on the type of object and the syntax you use, you will either:

  • Add to it
  • Overwrite it (dangerous)
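
The difference can be sketched with a small data frame. The object names and values below are illustrative; the point is that assigning to a new name adds to the object, while assigning to an existing name replaces what was there.

```r
# A small illustrative data frame
df <- data.frame(price = c(0.49, 0.60, 10.47))

# Add to the object: the original column is untouched
df$half_price <- df$price / 2

# Overwrite part of the object: the original prices are replaced
df$price <- df$price * 100

# Safer habit: keep a copy before overwriting the whole object
backup <- df
df <- df[df$price > 50, , drop = FALSE]

print(backup)
print(df)
```

Keeping a copy under a different name, as `backup` does here, is one simple way to recover from an accidental overwrite.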

Always start with summary stats of the data

summary(Grocery)
     Item                Tops           Wal-Mart        Wegmans     
 Length:15          Min.   : 0.490   Min.   :0.420   Min.   :0.490  
 Class :character   1st Qu.: 1.265   1st Qu.:1.060   1st Qu.:1.020  
 Mode  :character   Median : 2.090   Median :1.500   Median :1.970  
                    Mean   : 2.745   Mean   :2.143   Mean   :2.465  
                    3rd Qu.: 2.940   3rd Qu.:2.200   3rd Qu.:2.990  
                    Max.   :10.470   Max.   :8.610   Max.   :8.070  
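
Alongside `summary()`, a few other base R functions give a quick first look at a data set. The sketch below rebuilds the first three rows of the Grocery data by hand (the object name `g` is an assumption) so that it runs without the Excel file.

```r
# First three rows of the Grocery data, typed in by hand
g <- data.frame(
  Item = c("Bananas (1 lb.)", "Campbell's soup (10.75 oz)",
           "Chicken breasts (3 lbs.)"),
  Tops = c(0.49, 0.60, 10.47),
  `Wal-Mart` = c(0.48, 0.54, 8.61),
  Wegmans = c(0.49, 0.77, 8.07),
  check.names = FALSE   # keep the hyphen in the `Wal-Mart` column name
)

dims <- dim(g)                      # number of rows and columns
str(g)                              # column types and first values
print(head(g, 2))                   # first two rows
missing <- colSums(is.na(g))        # count of missing values per column
print(missing)
```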

Let's create a new variable

Using the Grocery data, create a new variable that is the variable Tops divided by 2

Grocery$new_var <- Grocery$Tops/2

summary(Grocery$Tops)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.490   1.265   2.090   2.745   2.940  10.470 
summary(Grocery$new_var)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2450  0.6325  1.0450  1.3730  1.4700  5.2350 

Let's create a new variable using dplyr

Using the Grocery data, create a new variable that is the variable Wegmans divided by 2

library(dplyr)
Grocery <- Grocery %>% 
  mutate(new_var_dplyr = Wegmans/2)
summary(Grocery$Wegmans)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.490   1.020   1.970   2.465   2.990   8.070 
summary(Grocery$new_var_dplyr)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.245   0.510   0.985   1.232   1.495   4.035 

Useful Websites