R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and more) and graphical techniques, and is highly extensible.
This notebook is a tutorial on how to use R.
We begin with a few basic operations.
A variable stores a value or an object (e.g. a function) under a name.
x = 3874
y = 23874
vars = c(2,4,8,16,32,45,53,64,76,82,106,146) # This is a vector
vars[1] #This calls the first value in the vector vars
## [1] 2
vars[2] #This calls the second value in the vector vars
## [1] 4
vars[1:3] #This calls the first through third values in the vector vars
## [1] 2 4 8
vars #This calls the vector
## [1] 2 4 8 16 32 45 53 64 76 82 106 146
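Positions in an R vector start at 1, and the index itself can be negative or logical. The following is a minimal sketch that reuses the vars vector from above; the threshold of 50 is an arbitrary value chosen for illustration.

```r
vars = c(2,4,8,16,32,45,53,64,76,82,106,146)

length(vars)      # number of elements: 12
vars[-1]          # every element except the first
vars[c(1, 12)]    # first and last elements: 2 146
vars[vars > 50]   # elements greater than 50: 53 64 76 82 106 146
```

A negative index drops positions instead of selecting them, and a logical index keeps the positions where the condition is TRUE.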
Below are some simple arithmetic operations.
mult = 12*6
div = 128/16
pow = 9^2
pow*mult
## [1] 5832
div/pow
## [1] 0.09876543
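A few more arithmetic operators are worth knowing, along with the order in which R evaluates them. The numbers in this short sketch are arbitrary and chosen only to make the results easy to check by hand.

```r
17 %/% 5      # integer division: 3
17 %% 5       # remainder (modulo): 2
2 + 3 * 4     # multiplication is evaluated before addition: 14
(2 + 3) * 4   # parentheses change the order: 20
-2^2          # exponentiation binds tighter than unary minus: -4
```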
R works with numerous data types. Some of the most basic types are numeric, integer, logical (Boolean: TRUE/FALSE), and character (string: "TEXT"). Factors represent categorical data.
#Type: Character
#Example:"TRUE",'23.4'
v = "TRUE"
class(v)
## [1] "character"
#Type: Numeric
#Example: 12.3,5
v = 38
class(v)
## [1] "numeric"
#Type: Logical
#Example: TRUE,FALSE
v = FALSE
class(v)
## [1] "logical"
#Type: Factor
#Example: m f m f m
v = as.factor(c("m", "f", "m"))
class(v)
## [1] "factor"
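The examples above do not show an integer or a conversion between types, so here is a small sketch of both. The variable names n and sex are made up for this illustration.

```r
n = 5L                # the L suffix creates an integer
class(n)              # "integer"
class(5)              # "numeric" (a plain number is stored as a double)
is.numeric(n)         # TRUE: integers also count as numeric
as.numeric("23.4")    # 23.4, converted from character
as.logical("TRUE")    # TRUE, converted from character

sex = as.factor(c("m", "f", "m"))
levels(sex)           # "f" "m" (the sorted unique values)
as.integer(sex)       # 2 1 2 (the internal integer codes)
```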
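Before reading data files, we need to set the working directory so that relative paths such as data/il_income.csv point to the right place. We can then load the two data sets.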
il_income = read.csv(file = "data/il_income.csv")
top_il_income = read.csv(file = "data/top_il_income.csv")
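The working directory can be inspected with getwd() and changed with setwd(). The sketch below uses R's temporary directory as a stand-in for a project folder so that it runs anywhere; the file toy_income.csv and its contents are hypothetical and are not one of the tutorial's data sets.

```r
old_wd = getwd()       # remember the current working directory
setwd(tempdir())       # switch to a temporary directory (stand-in for a project folder)

# Write a tiny hypothetical CSV so that there is something to read back
toy = data.frame(county = c("A", "B"), per_capita_income = c(100, 200))
write.csv(toy, file = "toy_income.csv", row.names = FALSE)

# The relative path is resolved against the working directory
toy_income = read.csv(file = "toy_income.csv")
toy_income$per_capita_income   # 100 200

setwd(old_wd)          # restore the original working directory
```

In a real session, setwd() would point at the folder that contains the data directory.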
We can extract values from the dataset to perform calculations.
DuPage = top_il_income$per_capita_income[1]
Lake = top_il_income$per_capita_income[2]
DuPage-Lake*5
## [1] -153364
DuPage+Lake/78
## [1] 39424.06
(DuPage+Lake)/2
## [1] 38695
mean(il_income$per_capita_income)
## [1] 25164.14
median(il_income$per_capita_income)
## [1] 24808.5
quantile(il_income$per_capita_income)
## 0% 25% 50% 75% 100%
## 14052.00 22666.00 24808.50 26899.75 38931.00
summary(top_il_income$county)
## DuPage Kane Kendall Lake McHenry McLean Monroe Piatt
## 1 1 1 1 1 1 1 1
## Sangamon Will
## 1 1
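The same summary functions work on any numeric vector, which makes them easy to try without a CSV file. This sketch types in the ten per_capita_income values of top_il_income by hand (the values are copied from the str() output shown later in this notebook); the name income is made up for the example.

```r
income = c(38931, 38459, 33118, 33059, 31750, 31110, 30791, 30728, 30645, 30594)

mean(income)            # 32918.5
median(income)          # 31430
range(income)           # smallest and largest value: 30594 38931
income[1] - income[2]   # difference between the top two values: 472
```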
A sequence of data elements of the same basic type is defined as a vector.
# vector of numeric values
c(2, 3, 5, 8, 16, 24)
## [1] 2 3 5 8 16 24
# vector of logical values.
c(TRUE, FALSE, TRUE)
## [1] TRUE FALSE TRUE
# vector of character strings.
c("A-", "B+", "B-", "C-", "F")
## [1] "A-" "B+" "B-" "C-" "F"
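Because all elements of a vector share one type, R silently converts mixed input to a common type. A short sketch of that behaviour, using arbitrary values:

```r
class(c(1, 2, 3))          # "numeric"
class(c(1, "a", TRUE))     # "character": every element becomes a string
c(1, "a", TRUE)            # "1" "a" "TRUE"
class(c(1, TRUE, FALSE))   # "numeric": logicals become 1 and 0
c(1, TRUE, FALSE)          # 1 1 0
```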
Lists, as opposed to vectors, can hold components of different types.
scores = c(80, 75, 55) # vector of numeric values
grades = c("B", "C", "D-") # vector of character strings.
office_hours = c(TRUE, TRUE, FALSE) # vector of logical values.
student = list(scores,grades,office_hours) # list of vectors
student
## [[1]]
## [1] 80 75 55
##
## [[2]]
## [1] "B" "C" "D-"
##
## [[3]]
## [1] TRUE TRUE FALSE
We can retrieve a slice of the list with the single square bracket [] operator. The result is itself a list, even when it holds a single component.
student[1]
## [[1]]
## [1] 80 75 55
student[2]
## [[1]]
## [1] "B" "C" "D-"
student[3]
## [[1]]
## [1] TRUE TRUE FALSE
# first two components of the list
student[1:2]
## [[1]]
## [1] 80 75 55
##
## [[2]]
## [1] "B" "C" "D-"
Using the double square bracket [[]] operator we can reference a member of the list directly.
student[[1]] # Components of the Scores Vector
## [1] 80 75 55
First element of the scores vector:
student[[1]][1]
## [1] 80
First three elements of the scores vector:
student[[1]][1:3]
## [1] 80 75 55
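The difference between the two operators shows up when we check the class of what they return. The sketch below rebuilds the unnamed student list from above so that it can be run on its own.

```r
student = list(c(80, 75, 55), c("B", "C", "D-"), c(TRUE, TRUE, FALSE))

class(student[1])      # "list": single brackets return a sub-list
class(student[[1]])    # "numeric": double brackets return the member itself
length(student[1:2])   # 2
mean(student[[1]])     # 70
```

Functions such as mean() expect the vector itself, so they need the double bracket form.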
It’s possible to assign names to list members and reference them by names instead of by numeric indexes.
student = list(scores = c(50, 99, 93), grades = c("A", "B", "F"), office_hours = c(TRUE, TRUE, FALSE))
student
## $scores
## [1] 50 99 93
##
## $grades
## [1] "A" "B" "F"
##
## $office_hours
## [1] TRUE TRUE FALSE
student$scores
## [1] 50 99 93
student$grades
## [1] "A" "B" "F"
student$office_hours
## [1] TRUE TRUE FALSE
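Names also work inside double square brackets, and a new member can be added by assigning to a name that does not exist yet. In this sketch the attendance member and its values are hypothetical.

```r
student = list(scores = c(50, 99, 93), grades = c("A", "B", "F"), office_hours = c(TRUE, TRUE, FALSE))

names(student)                      # "scores" "grades" "office_hours"
student[["scores"]]                 # same as student$scores: 50 99 93
student$scores[2]                   # 99
student$attendance = c(10, 12, 9)   # hypothetical new member, added by name
length(student)                     # 4
```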
When we need to store data in table form, we use data frames. A data frame is a list of vectors of equal length: the variables of a data set are the columns and the observations are the rows.
The str() function displays the internal structure of any R object, which lets us check that the data were read as expected.
str(top_il_income)
## 'data.frame': 10 obs. of 5 variables:
## $ rank : int 2 3 32 44 67 16 4 8 5 90
## $ county : Factor w/ 10 levels "DuPage","Kane",..: 1 4 5 7 8 3 10 6 2 9
## $ per_capita_income: int 38931 38459 33118 33059 31750 31110 30791 30728 30645 30594
## $ population : int 933736 703910 46045 33879 16387 123355 687263 266209 530847 7032
## $ region : int 2 2 4 5 4 2 2 5 2 4
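A few other base R functions give a quick overview of a data frame. The sketch below builds a two-row data frame by hand from the DuPage and Lake values shown above, so it does not depend on the CSV files; the name df is made up for the example.

```r
df = data.frame(
  county = c("DuPage", "Lake"),
  per_capita_income = c(38931, 38459),
  population = c(933736, 703910)
)

dim(df)     # rows and columns: 2 3
nrow(df)    # 2
names(df)   # "county" "per_capita_income" "population"
str(df)     # compact overview of every column
```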
As an example, consider a small snapshot of the solar system stored as separate vectors, one per variable.
name = c("Mercury", "Venus", "Pluto")
type = c("Terrestrial","Terrestrial", "Dwarf planet")
diameter = c(1, .2, .3)
rotation = c(.5, 2.5, .6)
rings = c(FALSE, FALSE, FALSE)
Now, by combining the vectors of equal size, we can create a data frame object.
planets_df = data.frame(name,type,diameter,rotation,rings)
planets_df
## name type diameter rotation rings
## 1 Mercury Terrestrial 1.0 0.5 FALSE
## 2 Venus Terrestrial 0.2 2.5 FALSE
## 3 Pluto Dwarf planet 0.3 0.6 FALSE
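Once the data frame exists, columns and rows can be selected much like list members and vector elements. This sketch reuses the name, diameter and rotation vectors from above; the name bodies and the 0.55 threshold are made up for illustration, and stringsAsFactors = FALSE is set so that the names stay character strings in older R versions too.

```r
name = c("Mercury", "Venus", "Pluto")
diameter = c(1, .2, .3)
rotation = c(.5, 2.5, .6)
bodies = data.frame(name, diameter, rotation, stringsAsFactors = FALSE)

bodies$diameter                           # one column as a vector: 1.0 0.2 0.3
bodies[2, ]                               # second row (Venus), all columns
bodies[bodies$rotation > 0.55, "name"]    # "Venus" "Pluto"
bodies[order(bodies$diameter), "name"]    # sorted by diameter: "Venus" "Pluto" "Mercury"
```

The form is always data_frame[rows, columns]; leaving one side empty selects everything along that dimension.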
DataCamp - Learn data science from your browser
R-tutor - An R intro to stats that explains basic R concepts