R is a language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible.
This notebook is a tutorial on how to use R.
We begin with a few basic operations.
A variable stores a value or an object (e.g. a function) under a name so that it can be reused later.
x = 967
y = 162
vars = c(122,4432,8634,16745,323422,1,2,3,4,5,6,323452,2234234,234) # This is a vector
vars[1] #This calls the first value in the vector vars
## [1] 122
vars[2] #This calls the second value in the vector vars
## [1] 4432
vars[1:7] #This calls the first through seventh values in the vector vars
## [1] 122 4432 8634 16745 323422 1 2
vars #This calls the vector
## [1] 122 4432 8634 16745 323422 1 2 3
## [9] 4 5 6 323452 2234234 234
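Beyond positive positions, a vector can also be indexed in a few other ways. The following is a minimal sketch using a short illustrative vector v (its name and values are assumptions for this example, not part of the tutorial data). It shows length(), a negative index, and a logical condition.

```r
# Illustrative vector; the name and values are assumptions for this sketch
v = c(10, 20, 30, 40, 50)

n = length(v)            # number of elements in the vector
last = v[n]              # the last element
without_first = v[-1]    # a negative index drops that position
big = v[v > 25]          # a logical condition keeps the matching elements

print(last)
print(without_first)
print(big)
```

Note that R counts positions from 1, so v[1] is the first element.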
Below are some simple arithmetic operations.
1234829*6123
## [1] 7560857967
12834554/1612
## [1] 7961.882
9^8
## [1] 43046721
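In addition to multiplication, division and exponentiation, R has operators for integer division and for the remainder. A small sketch with made-up numbers (the variable names are only for illustration):

```r
quotient = 17 %/% 5    # integer division: how many times 5 fits into 17
remainder = 17 %% 5    # modulo: what is left over
root = sqrt(81)        # square root, a built-in function

print(quotient)    # 3
print(remainder)   # 2
print(root)        # 9
```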
R works with numerous data types. Some of the most basic are numeric, integer, logical (Boolean: TRUE/FALSE), character (strings such as "TEXT") and factor (categorical values).
#Type: Character
#Example:"TRUE",'23.4'
v = "TRUE"
class(v)
## [1] "character"
#Type: Numeric
#Example: 12.3,5
v = 105
class(v)
## [1] "numeric"
#Type: Logical
#Example: TRUE,FALSE
v = FALSE
class(v)
## [1] "logical"
#Type: Factor
#Example: e v e
v = as.factor(c("e", "v", "e"))
class(v)
## [1] "factor"
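The integer type is mentioned above but not shown. As a hedged sketch: a whole number written with the L suffix is stored as an integer, and a character string that looks like a number can be converted with as.numeric(). The variable names below are assumptions for this example.

```r
n = 105L                     # the L suffix creates an integer rather than a numeric
n_class = class(n)           # "integer"

s = "23.4"                   # a character string, not a number
num = as.numeric(s)          # convert the string to a numeric value
num_class = class(num)       # "numeric"

print(n_class)
print(num_class)
print(is.numeric(n))         # integers also count as numeric: TRUE
```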
Before reading data files, we need to make sure the working directory is set, so that relative paths such as data/il_income.csv resolve correctly.
il_income = read.csv(file = "data/il_income.csv")
top_il_income = read.csv(file = "data/top_il_income.csv")
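To illustrate how the working directory and relative paths interact with read.csv(), here is a self-contained sketch. It does not use the tutorial's files: the folder, the file name demo_income.csv and its contents are hypothetical, and are created in a temporary directory so the example runs anywhere.

```r
# Show the current working directory; relative paths are resolved against it
print(getwd())

# Hypothetical stand-in for a data folder, created inside a temporary directory
demo_dir = file.path(tempdir(), "data")
dir.create(demo_dir, showWarnings = FALSE)
demo_file = file.path(demo_dir, "demo_income.csv")

# Write a tiny made-up table, then read it back with read.csv()
write.csv(data.frame(county = c("A", "B"), per_capita_income = c(100, 200)),
          file = demo_file, row.names = FALSE)

print(file.exists(demo_file))       # check that the path is valid before reading
demo = read.csv(file = demo_file)
print(demo)
```

In the notebook's own setting, calling setwd() with the folder that contains the data directory would play the same role; that path depends on your machine, so it is not shown here.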
We can extract values from the dataset to perform calculations.
DuPage = top_il_income$per_capita_income[1]
Lake = top_il_income$per_capita_income[2]
DuPage-Lake/(10^2)
## [1] 38546.41
DuPage+Lake*(40)
## [1] 1577291
(DuPage+Lake)/4
## [1] 19347.5
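The first calculation above depends on operator precedence: division is evaluated before subtraction unless parentheses say otherwise. The sketch below assumes the two income values are 38931 and 38459, which is consistent with the three outputs shown above, and compares both groupings.

```r
# Values assumed for this sketch; they are consistent with the outputs above
DuPage = 38931
Lake = 38459

a = DuPage - Lake / (10^2)     # the division happens before the subtraction
b = (DuPage - Lake) / (10^2)   # parentheses force the subtraction first

print(a)   # 38546.41
print(b)   # 4.72
```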
mean(il_income$per_capita_income)
## [1] 25164.14
median(il_income$per_capita_income)
## [1] 24808.5
quantile(il_income$per_capita_income)
## 0% 25% 50% 75% 100%
## 14052.00 22666.00 24808.50 26899.75 38931.00
summary(il_income)
## rank county per_capita_income population
## Min. : 1.00 Adams : 1 Min. :14052 Min. : 4135
## 1st Qu.: 26.25 Alexander: 1 1st Qu.:22666 1st Qu.: 14284
## Median : 51.50 Bond : 1 Median :24809 Median : 26610
## Mean : 51.50 Boone : 1 Mean :25164 Mean : 126078
## 3rd Qu.: 76.75 Brown : 1 3rd Qu.:26900 3rd Qu.: 53319
## Max. :102.00 Bureau : 1 Max. :38931 Max. :5238216
## (Other) :96
## region
## Min. :1.000
## 1st Qu.:3.000
## Median :4.000
## Mean :3.735
## 3rd Qu.:5.000
## Max. :5.000
##
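To see how these summary functions behave, it can help to try them on a vector small enough to check by hand. The values below are made up for illustration and are not taken from il_income.

```r
incomes = c(10, 20, 30, 40, 100)   # made-up values, already sorted

m = mean(incomes)       # (10 + 20 + 30 + 40 + 100) / 5
med = median(incomes)   # the middle value
q = quantile(incomes)   # minimum, quartiles and maximum

print(m)     # 40
print(med)   # 30
print(q)
```

The single large value pulls the mean above the median, which is the same pattern as in the population column of the summary above.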
A sequence of data elements of the same basic type is defined as a vector.
# vector of numeric values
c(22, 33, 45, 58)
## [1] 22 33 45 58
# vector of logical values.
c(FALSE, TRUE, FALSE)
## [1] FALSE TRUE FALSE
# vector of character strings.
c("A+", "B-", "D-", "C+", "A")
## [1] "A+" "B-" "D-" "C+" "A"
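Because all elements of a vector share one basic type, R silently converts mixed input to a common type. A short sketch of this behaviour (variable names are only for illustration):

```r
mixed = c(1, "a", TRUE)      # a number, a string and a logical together
print(class(mixed))          # "character": everything became a string
print(mixed)                 # "1" "a" "TRUE"

nums = c(1, TRUE, FALSE)     # numbers and logicals together
print(class(nums))           # "numeric": TRUE became 1 and FALSE became 0
print(nums)
```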
Lists, as opposed to vectors, can hold components of different types.
scores = c(801, 275, 535) # vector of numeric values
annoyance = c("F", "C", "A") # vector of character strings.
efficient = c(FALSE, FALSE, TRUE) # vector of logical values.
student = list(scores,annoyance,efficient) # list of vectors
student
## [[1]]
## [1] 801 275 535
##
## [[2]]
## [1] "F" "C" "A"
##
## [[3]]
## [1] FALSE FALSE TRUE
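One way to confirm that a list keeps each component's own type is to ask for the class of every member. The sketch below rebuilds a list like the one above under the hypothetical name record, so that it runs on its own.

```r
# A list with three components of different types (same values as above)
record = list(c(801, 275, 535), c("F", "C", "A"), c(FALSE, FALSE, TRUE))

n_members = length(record)        # number of components in the list
types = sapply(record, class)     # class of each component

print(n_members)   # 3
print(types)       # "numeric" "character" "logical"
```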
We can retrieve components of the list with the single square bracket [] operator, which returns the result as a list.
student[2]
## [[1]]
## [1] "F" "C" "A"
student[3]
## [[1]]
## [1] FALSE FALSE TRUE
student[1]
## [[1]]
## [1] 801 275 535
# first two components of the list
student[1:2]
## [[1]]
## [1] 801 275 535
##
## [[2]]
## [1] "F" "C" "A"
Using the double square bracket [[]] operator we can reference a member of the list directly.
student[[1]] # Components of the Scores Vector
## [1] 801 275 535
First element of the Scores vector
student[[1]][1]
## [1] 801
First element of the efficient vector (indexed directly rather than through the list)
efficient[[1]][1]
## [1] FALSE
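The difference between the two operators is easiest to see by checking what each one returns. A minimal sketch, using a small list under the hypothetical name record:

```r
record = list(c(801, 275, 535), c("F", "C", "A"))

sub = record[1]      # single brackets: a list that contains the vector
item = record[[1]]   # double brackets: the vector itself

print(class(sub))    # "list"
print(class(item))   # "numeric"

# Arithmetic works on the extracted vector, not on the one-element list
total = sum(item)
print(total)         # 1611
```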
It’s possible to assign names to list members and reference them by names instead of by numeric indexes.
student = list(scores = c(99, 32, 100), grades = c("A+", "F", "A+"), office_hours = c(FALSE, TRUE, FALSE))
student
## $scores
## [1] 99 32 100
##
## $grades
## [1] "A+" "F" "A+"
##
## $office_hours
## [1] FALSE TRUE FALSE
student$scores
## [1] 99 32 100
student$grades
## [1] "A+" "F" "A+"
student$office_hours
## [1] FALSE TRUE FALSE
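Named members can also be listed with names(), reached with double brackets and a string, and extended by assignment. The sketch below uses a separate list called s_demo, and the added attendance member is hypothetical.

```r
s_demo = list(scores = c(99, 32, 100), grades = c("A+", "F", "A+"))

print(names(s_demo))            # "scores" "grades"
top = max(s_demo[["scores"]])   # [["name"]] is equivalent to $name
print(top)                      # 100

# Assigning to a new name adds a member (hypothetical example field)
s_demo$attendance = c(TRUE, TRUE, FALSE)
print(names(s_demo))            # "scores" "grades" "attendance"
```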
When we need to store data in table form, we use data frames. A data frame is a list of vectors of equal length: the variables of a data set are the columns and the observations are the rows.
The str() function displays the internal structure of any R object, which lets us check that the data were read in correctly.
str(il_income)
## 'data.frame': 102 obs. of 5 variables:
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ county : Factor w/ 102 levels "Adams","Alexander",..: 16 22 49 99 45 60 101 64 86 10 ...
## $ per_capita_income: int 30468 38931 38459 30791 30645 23937 24802 30728 23279 26087 ...
## $ population : int 5238216 933736 703910 687263 530847 307343 287078 266209 264052 208861 ...
## $ region : int 1 2 2 2 2 2 2 5 5 3 ...
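Alongside str(), a few other functions give a quick overview of a data frame. The sketch below uses a tiny made-up data frame called demo (not the il_income data) so that it is self-contained.

```r
# Made-up data frame for illustration only
demo = data.frame(county = c("A", "B", "C"),
                  income = c(100, 200, 300),
                  stringsAsFactors = FALSE)

dims = dim(demo)                  # number of rows, then number of columns
col_names = names(demo)           # column (variable) names
col_types = sapply(demo, class)   # class of each column

print(dims)            # 3 2
print(col_names)       # "county" "income"
print(col_types)
print(head(demo, 2))   # first two rows
```

Setting stringsAsFactors = FALSE keeps text columns as character vectors regardless of the R version; without it, older versions of R convert them to factors, as in the str() output above.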
Who is hungry?
name = c("Elio", "Michele", "Laura")
gender = c("Male","Female", "Female")
weight = c(187, 123, 119) # stored as numbers so that arithmetic on weights works
hometown = c("Chicago", "Glen Ellyn", "Saint Charles")
apples = c(2, 3, 4)
oranges = c(1, 3, 4)
hungry = c(FALSE, FALSE, TRUE)
Now, by combining the vectors of equal length, we can create a data frame object.
whos_hungry = data.frame(name,gender,weight,hometown,apples,oranges,hungry)
whos_hungry
## name gender weight hometown apples oranges hungry
## 1 Elio Male 187 Chicago 2 1 FALSE
## 2 Michele Female 123 Glen Ellyn 3 3 FALSE
## 3 Laura Female 119 Saint Charles 4 4 TRUE
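Once the data frame exists, its columns behave like vectors, so we can compute new columns and filter rows. The following is a sketch on a reduced copy of the table, built here under the hypothetical name whos so that the example runs on its own.

```r
# Reduced copy of the table above, rebuilt so the sketch is self-contained
whos = data.frame(name = c("Elio", "Michele", "Laura"),
                  apples = c(2, 3, 4),
                  oranges = c(1, 3, 4),
                  hungry = c(FALSE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

# New column computed from two existing columns
whos$fruit = whos$apples + whos$oranges

# Keep only the rows where hungry is TRUE
hungry_rows = whos[whos$hungry, ]
hungry_names = whos$name[whos$hungry]

print(whos$fruit)      # 3 6 8
print(hungry_names)    # "Laura"
print(hungry_rows)
```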
Datacamp - Learn Data Science from your browser
R-tutor - An R intro to stats that explains basic R concepts