R is a programming language and software environment for statistical analysis, graphical representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.
To follow most R tutorials, all you need is a computer with a current version of R installed; R is free and open source. R has extensive documentation and an active online community, which makes it a good environment for getting started in statistical computing.
* R is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities.
* R has an effective data handling and storage facility.
* R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
* R provides a large, coherent and integrated collection of tools for data analysis.
* R provides graphical facilities for data analysis and display, either directly on screen or printed on paper.
As a convention, we will start learning R programming by writing a “Hello, World!” program.
# My first program in R Programming
myString <- "Hello, World!"
print(myString)
## [1] "Hello, World!"
In any programming language, you use variables to store information. A variable is a name bound to a reserved memory location that holds a value, so creating a variable reserves some space in memory.
Unlike languages such as C and Java, R does not declare a variable with a data type. Instead, a variable is assigned an R object, and the data type of that object becomes the data type of the variable. There are many types of R objects. The most frequently used are:
* Vectors
* Lists
* Matrices
* Arrays
* Factors
* Data Frames
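As a quick illustration of the object types listed above, the sketch below creates one small object of each kind and asks R for its class. The variable names and values are made up for this example.

```r
# One small example of each frequently used R object (values are illustrative)
v  <- c(1.5, 2.5, 3.5)                   # numeric vector
l  <- list(1, "a", TRUE)                 # list: components may differ in type
m  <- matrix(1:6, nrow = 2, ncol = 3)    # matrix: two dimensions
a  <- array(1:24, dim = c(2, 3, 4))      # array: any number of dimensions
f  <- factor(c("low", "high", "low"))    # factor: categorical values with levels
df <- data.frame(id = 1:3, score = v)    # data frame: table of equal-length columns

# The variable takes its type from the object assigned to it
class(v)       # "numeric"
class(l)       # "list"
is.matrix(m)   # TRUE
class(a)       # "array"
class(f)       # "factor"
class(df)      # "data.frame"
levels(f)      # "high" "low"

# Reassigning the same name to a different object changes its type
v <- "now a string"
class(v)       # "character"
```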
setwd("D:/Analytics/BACP-Dec2017/R_Programming")
getwd()
## [1] "D:/Analytics/BACP-Dec2017/R_Programming"
x<-1:100
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
## [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
## [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
## [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
## [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
12+5
## [1] 17
(12+5)*(39-13)/45
## [1] 9.822222
A variable allows you to store a value (e.g. 6) or an object (e.g. a function definition) in R. You can later use the variable's name to access the value or object stored in it.
my_var<-29
my_var
## [1] 29
my_oranges<-6
my_apples<-6
my_fruits<-my_apples+my_oranges
my_fruits
## [1] 12
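A variable can hold an object such as a function just as easily as a number. In this minimal sketch, add_fruits is a hypothetical helper written for illustration.

```r
# Store a function in a variable, then use the variable's name to call it
add_fruits <- function(apples, oranges) {
  apples + oranges
}

my_apples  <- 6
my_oranges <- 6
my_fruits  <- add_fruits(my_apples, my_oranges)
my_fruits            # 12
class(add_fruits)    # "function"
```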
class(5)
## [1] "numeric"
class('six')
## [1] "character"
class(1.2)
## [1] "numeric"
Vectors
* R operates on named data structures.
* The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers.
x<-c(3,4,6,7,8) # <- is the assignment operator; c() is the combine function
c(3,4,6,7,8) ->x1 # rightward assignment is also possible
x1
## [1] 3 4 6 7 8
1/x #gives the reciprocal
## [1] 0.3333333 0.2500000 0.1666667 0.1428571 0.1250000
C<-2*x #Vector arithmetic: each element is multiplied by 2
C
## [1] 6 8 12 14 16
Character strings are delimited by double quotes, e.g., "x-values", "New iteration results". A character vector holds several such strings.
names<-c("ram","shyam","john")
type<- c("Compact","Minivan","SUV","Roadster","Pickup Truck")
mileage<-c(1256,237,6780,1000,12000)
names
## [1] "ram" "shyam" "john"
type
## [1] "Compact" "Minivan" "SUV" "Roadster"
## [5] "Pickup Truck"
mileage
## [1] 1256 237 6780 1000 12000
mileage[3]
## [1] 6780
type[2]
## [1] "Minivan"
type[5]
## [1] "Pickup Truck"
mileage[4]
## [1] 1000
M<-matrix(1:9, 3, 3)
M
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
M is the name of the matrix.
matrix() is the function that creates the matrix.
1:9 is the data (1,2,3,4,5,6,7,8,9) to be arranged within the matrix, filled column by column.
3,3 gives the number of rows and the number of columns.
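For comparison, the same call can be written with named arguments. By default matrix() fills the data column by column; the byrow argument switches to row-by-row filling. M_col and M_row are names chosen for this sketch.

```r
# Same data, two filling orders
M_col <- matrix(data = 1:9, nrow = 3, ncol = 3)                # column by column (default)
M_row <- matrix(data = 1:9, nrow = 3, ncol = 3, byrow = TRUE)  # row by row

M_col[1, 2]   # 4: first row, second column
M_row[1, 2]   # 2: first row, second column
dim(M_col)    # 3 3
```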
To select the element in the 1st row and 2nd column:
M[1,2]
## [1] 4
Use negative subscripts to exclude elements: M[-1,-2] drops the 1st row and the 2nd column and returns what remains.
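A short sketch of negative subscripts on the same 3 x 3 matrix. The drop = FALSE argument is included because R otherwise simplifies a one-row or one-column result to a plain vector.

```r
M <- matrix(1:9, 3, 3)

# Drop the 1st row and the 2nd column; rows 2-3 and columns 1 and 3 remain
M[-1, -2]
##      [,1] [,2]
## [1,]    2    8
## [2,]    3    9

# Reducing to a single row gives a vector unless drop = FALSE is set
M[-(1:2), ]                 # 3 6 9 (a plain vector)
M[-(1:2), , drop = FALSE]   # still a 1 x 3 matrix
```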
m<-matrix(data=c(2,3,4,5), nrow=2, ncol=2)
m
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
m2 <-matrix(c(2,3,4,5),2,2)
m2
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
x<-c(1,2,3,4)
m3<-matrix(x,2,2)
m3
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
m3[1,2]
## [1] 3
m4<-m3[-1,-2]
m3
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
m4
## [1] 2
m<-matrix(data=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16), nrow=4, ncol=4)
m
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
m5<-m[1,]
m5
## [1] 1 5 9 13
m6<-m[,1]
m6
## [1] 1 2 3 4
m7<-m[1:3,2:4]
m7
## [,1] [,2] [,3]
## [1,] 5 9 13
## [2,] 6 10 14
## [3,] 7 11 15
m<-matrix(data=c(2,3,4,5), nrow=2, ncol=2)
m
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
m_rsum<-rowSums(m)
m_rsum
## [1] 6 8
m_csum<-colSums(m)
m_csum
## [1] 5 9
m
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
a<-c(8,9)
m_newc<-cbind(m,a) #use cbind to insert new column
m_newc
## a
## [1,] 2 4 8
## [2,] 3 5 9
b<-c(6,7)
m_newr<-rbind(m,b) #use rbind to insert new row
m_newr
## [,1] [,2]
## 2 4
## 3 5
## b 6 7
type<- c("Compact","Minivan","SUV","Roadster","Pickup Truck")
mileage<-c(1256,237,6780,1000,12000)
price<-c(36790,3445,6678,2455,76889)
no.cyl<-c(3,4,4,4,4)
cars<-data.frame(type,price,mileage,no.cyl)
cars
## type price mileage no.cyl
## 1 Compact 36790 1256 3
## 2 Minivan 3445 237 4
## 3 SUV 6678 6780 4
## 4 Roadster 2455 1000 4
## 5 Pickup Truck 76889 12000 4
cars[1,2]
## [1] 36790
cars[1,]
## type price mileage no.cyl
## 1 Compact 36790 1256 3
cars[,1]
## [1] Compact Minivan SUV Roadster Pickup Truck
## Levels: Compact Minivan Pickup Truck Roadster SUV
cars[2:4,1:3]
## type price mileage
## 2 Minivan 3445 237
## 3 SUV 6678 6780
## 4 Roadster 2455 1000
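Columns of a data frame can also be selected by name, which tends to be easier to read than numeric positions. The sketch below rebuilds a smaller version of the cars data frame; stringsAsFactors = FALSE is set explicitly because its default changed in R 4.0.0, and cars_small is a name chosen for this example.

```r
type    <- c("Compact", "Minivan", "SUV")
price   <- c(36790, 3445, 6678)
mileage <- c(1256, 237, 6780)
cars_small <- data.frame(type, price, mileage, stringsAsFactors = FALSE)

cars_small$price                     # 36790  3445  6678
cars_small[["mileage"]]              # 1256  237 6780
cars_small[2, "type"]                # "Minivan"
cars_small[, c("type", "price")]     # two columns, all rows
nrow(cars_small); ncol(cars_small)   # 3 and 3
```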
my_list <- list(component1, component2 …)
my_vector<-1:10
my_matrix<-matrix(1:9,ncol=3)
my_list<-list(my_vector,my_matrix)
my_list
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
my_list[[2]]
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
my_list[[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
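List components can also be given names and then accessed with $ as well as with [[ ]]. A minimal sketch, with component names invented for illustration:

```r
my_vector <- 1:10
my_matrix <- matrix(1:9, ncol = 3)

# Name the components when building the list
my_named_list <- list(vec = my_vector, mat = my_matrix)

my_named_list$vec          # the vector 1 to 10
my_named_list[["mat"]]     # the 3 x 3 matrix
my_named_list$mat[2, 3]    # 8: row 2, column 3 of the stored matrix
names(my_named_list)       # "vec" "mat"
length(my_named_list)      # 2
```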
install.packages("package name")
library("package name")
install.packages("psych",repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/Ranvir Kumar/Documents/R/win-library/3.4'
## (as 'lib' is unspecified)
## package 'psych' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\Ranvir Kumar\AppData\Local\Temp\RtmpURLW4n\downloaded_packages
library("psych")
## Warning: package 'psych' was built under R version 3.4.3
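When a script may run on machines where the package is already present, it can be convenient to install only if needed. The sketch below uses requireNamespace() for the check; ensure_package is a hypothetical helper, and the repository URL is the one used above.

```r
# Install a package only when it is not already available, then report the outcome.
# ensure_package is a hypothetical helper written for this example.
ensure_package <- function(pkg, repos = "http://cran.us.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  requireNamespace(pkg, quietly = TRUE)   # TRUE if the package can now be loaded
}

# "stats" ships with R, so this call does not download anything
ensure_package("stats")    # TRUE
```

Once ensure_package("psych") returns TRUE, library("psych") attaches the package as before.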
A<-c(1:10,21:35)
B<-matrix(A,5,5)
C<-B[1:3,1:3]
A
## [1] 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25 26 27 28 29 30 31 32 33
## [24] 34 35
B
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 6 21 26 31
## [2,] 2 7 22 27 32
## [3,] 3 8 23 28 33
## [4,] 4 9 24 29 34
## [5,] 5 10 25 30 35
C
## [,1] [,2] [,3]
## [1,] 1 6 21
## [2,] 2 7 22
## [3,] 3 8 23
x<- c(1:10)
xsq<-x^2
xsq
## [1] 1 4 9 16 25 36 49 64 81 100
logx<-log(x)
logx
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
File formats:
* Text (.txt)
* CSV (.csv)
* Excel (.xls)
* SPSS (.sav)
* STATA (.dta)
* SAS (.ssd)
* For more formats, see http://cran.r-project.org/doc/manuals/R-data.pdf, which also covers importing image files.
Comma-delimited text files:
data1<- read.table("C:/Users/xyz/Desktop/folderX/mydata.txt",header=TRUE, sep=",")
Space as the separator:
data1<- read.table("C:/Users/xyz/Desktop/folderX/mydata.txt", header=TRUE)
An easier way is to set your working directory first; the command then becomes:
data1<- read.table("mydata.txt", header=TRUE)
CSV files: in the same way, use read.csv instead of read.table.
Excel files: use read.xls (needs the package gdata; run library(gdata) after installing it).
SPSS files: load the library foreign and use read.spss.
STATA files: load the library foreign and use read.dta.
SAS files: load the library foreign and use read.ssd.
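To try these commands without an existing data file, the following sketch writes a tiny comma-delimited file to a temporary location and reads it back. The column names and values are invented for illustration.

```r
# Write a small comma-delimited file to a temporary path
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,mileage",
             "ram,1256",
             "shyam,237",
             "john,6780"), tmp)

# read.table needs the separator spelled out; read.csv assumes a comma and a header
data1 <- read.table(tmp, header = TRUE, sep = ",")
data2 <- read.csv(tmp)

dim(data1)                # 3 2
data1$mileage             # 1256  237 6780
all.equal(data1, data2)   # TRUE

unlink(tmp)               # remove the temporary file
```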
library(readr)
houseprices<-read_csv("D:/Analytics/BACP-Dec2017/02_Introduction_To_R/ClassMaterial/houseprices.csv")
## Parsed with column specification:
## cols(
## Price = col_integer(),
## LivingArea = col_integer(),
## Bathrooms = col_double(),
## Bedrooms = col_integer(),
## LotSize = col_double(),
## Age = col_integer(),
## Fireplace = col_integer()
## )
houseprices
## # A tibble: 1,047 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 142212 1982 1.00 3 2.00 133 0
## 2 134865 1676 1.50 3 0.380 14 1
## 3 118007 1694 2.00 3 0.960 15 1
## 4 138297 1800 1.00 2 0.480 49 1
## 5 129470 2088 1.00 3 1.84 29 1
## 6 206512 1456 2.00 3 0.980 10 0
## 7 50709 960 1.50 2 0.0100 12 0
## 8 108794 1464 1.00 2 0.110 87 0
## 9 68353 1216 1.00 2 0.610 101 0
## 10 123266 1632 1.50 3 0.230 14 0
## # ... with 1,037 more rows
View(houseprices) #View dataset in a separate window
head(houseprices) #First six rows of the dataset
## # A tibble: 6 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 142212 1982 1.00 3 2.00 133 0
## 2 134865 1676 1.50 3 0.380 14 1
## 3 118007 1694 2.00 3 0.960 15 1
## 4 138297 1800 1.00 2 0.480 49 1
## 5 129470 2088 1.00 3 1.84 29 1
## 6 206512 1456 2.00 3 0.980 10 0
head(houseprices,10) #First 10 rows of the dataset
## # A tibble: 10 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 142212 1982 1.00 3 2.00 133 0
## 2 134865 1676 1.50 3 0.380 14 1
## 3 118007 1694 2.00 3 0.960 15 1
## 4 138297 1800 1.00 2 0.480 49 1
## 5 129470 2088 1.00 3 1.84 29 1
## 6 206512 1456 2.00 3 0.980 10 0
## 7 50709 960 1.50 2 0.0100 12 0
## 8 108794 1464 1.00 2 0.110 87 0
## 9 68353 1216 1.00 2 0.610 101 0
## 10 123266 1632 1.50 3 0.230 14 0
tail(houseprices) #Last six rows of the dataset
## # A tibble: 6 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 206480 2310 2.50 3 1.00 18 0
## 2 107695 1802 2.00 4 0.970 56 1
## 3 236737 3239 3.50 4 2.50 1 1
## 4 154829 1440 2.00 2 0.610 66 1
## 5 179492 2030 2.50 3 1.00 3 1
## 6 189108 2097 2.50 3 1.93 10 1
tail(houseprices,10) #Last 10 rows of the dataset
## # A tibble: 10 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 107973 1388 1.00 3 0.230 60 0
## 2 119875 1512 1.50 4 1.00 61 1
## 3 66027 1653 2.00 3 0.480 79 0
## 4 182649 1758 2.50 3 0.270 1 1
## 5 206480 2310 2.50 3 1.00 18 0
## 6 107695 1802 2.00 4 0.970 56 1
## 7 236737 3239 3.50 4 2.50 1 1
## 8 154829 1440 2.00 2 0.610 66 1
## 9 179492 2030 2.50 3 1.00 3 1
## 10 189108 2097 2.50 3 1.93 10 1
dim(houseprices) # Dimension of the dataset
## [1] 1047 7
summary(houseprices) # Summary of the dataset
## Price LivingArea Bathrooms Bedrooms
## Min. : 16858 Min. : 672 Min. :1.000 Min. :1.000
## 1st Qu.:112014 1st Qu.:1336 1st Qu.:1.500 1st Qu.:3.000
## Median :151917 Median :1672 Median :2.000 Median :3.000
## Mean :163862 Mean :1807 Mean :1.918 Mean :3.183
## 3rd Qu.:205235 3rd Qu.:2206 3rd Qu.:2.500 3rd Qu.:4.000
## Max. :446436 Max. :4534 Max. :4.500 Max. :6.000
## LotSize Age Fireplace
## Min. :0.0000 Min. : 0.00 Min. :0.0000
## 1st Qu.:0.2100 1st Qu.: 6.00 1st Qu.:0.0000
## Median :0.3900 Median : 18.00 Median :1.0000
## Mean :0.5696 Mean : 28.06 Mean :0.5931
## 3rd Qu.:0.6000 3rd Qu.: 34.00 3rd Qu.:1.0000
## Max. :9.0000 Max. :247.00 Max. :1.0000
summary(houseprices$Bedrooms)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 3.000 3.183 4.000 6.000
table(houseprices$Bedrooms)
##
## 1 2 3 4 5 6
## 3 176 522 321 22 3
summary(houseprices$Bathrooms)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.500 2.000 1.918 2.500 4.500
table(houseprices$Bathrooms)
##
## 1 1.5 2 2.5 3 3.5 4 4.5
## 198 261 172 373 23 15 3 2
mean(houseprices$Price)
## [1] 163862.1
var(houseprices$Price)
## [1] 4576733424
sd(houseprices$Price)
## [1] 67651.56
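The three functions are related: var() divides the sum of squared deviations from the mean by n - 1, and sd() is the square root of var(). A small sketch on made-up numbers:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative values
n <- length(x)

m <- mean(x)                     # 5
v <- sum((x - m)^2) / (n - 1)    # sample variance computed by hand: 32 / 7
s <- sqrt(v)                     # sample standard deviation

all.equal(v, var(x))   # TRUE
all.equal(s, sd(x))    # TRUE
```

The house price output above is consistent with this: the square root of 4576733424 is approximately 67651.56.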
Subset1<-data[which(data$Price<=amount),]
housepricessubset1<-houseprices[which(houseprices$Price<=50000),]
housepricessubset1
## # A tibble: 12 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 45004 960 1.00 2 0.540 11 0
## 2 45904 1328 1.00 4 0.190 103 0
## 3 44674 1214 1.00 3 0.140 103 0
## 4 16858 1629 1.00 3 0.760 180 0
## 5 26130 822 1.00 2 0.560 173 0
## 6 47630 1235 1.00 3 0.320 84 1
## 7 40932 1320 1.00 3 0.170 90 0
## 8 26049 1344 2.00 3 0.920 13 0
## 9 49211 800 1.00 2 0.460 55 0
## 10 31113 1540 1.00 2 0.0400 115 0
## 11 44873 882 1.50 3 0.180 71 0
## 12 49564 1363 2.00 3 2.40 39 0
housepricessubset2<-houseprices[which(houseprices$Price<=50000 & houseprices$Bedrooms==3),]
housepricessubset2
## # A tibble: 7 x 7
## Price LivingArea Bathrooms Bedrooms LotSize Age Fireplace
## <int> <int> <dbl> <int> <dbl> <int> <int>
## 1 44674 1214 1.00 3 0.140 103 0
## 2 16858 1629 1.00 3 0.760 180 0
## 3 47630 1235 1.00 3 0.320 84 1
## 4 40932 1320 1.00 3 0.170 90 0
## 5 26049 1344 2.00 3 0.920 13 0
## 6 44873 882 1.50 3 0.180 71 0
## 7 49564 1363 2.00 3 2.40 39 0
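The same subsetting pattern can be tried on a small data frame. This sketch uses invented values in a data frame named homes, and also shows base R's subset() function as an alternative way to write the condition.

```r
homes <- data.frame(Price    = c(45004, 142212, 44674, 206512, 26049),
                    Bedrooms = c(2, 3, 3, 3, 3))

# which() returns the row numbers where the condition is TRUE
which(homes$Price <= 50000)                                 # 1 3 5

cheap   <- homes[which(homes$Price <= 50000), ]
cheap3  <- homes[which(homes$Price <= 50000 & homes$Bedrooms == 3), ]
cheap3b <- subset(homes, Price <= 50000 & Bedrooms == 3)    # same rows as cheap3

nrow(cheap)     # 3
nrow(cheap3)    # 2
```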
x<-c(2:200)
y<-2*x-8
plot(x,y)
Plot
plot(houseprices$Price,houseprices$Bedrooms)
plot(houseprices$Price,houseprices$LotSize)
Barplot
barplot(table(houseprices$Bedrooms),main = "No. of Bedrooms")
barplot(table(houseprices$Price), main = "Prices")
barplot(table(houseprices$LotSize), main = "LotSize")
Histogram
hist(houseprices$Bedrooms, main = "Bedrooms")
hist(houseprices$Price, main = "Prices",xlab = "Home Prices", ylab = "frequency of prices")
hist(houseprices$Age, main = "Age of the House", xlab = "House Age", ylab = "frequency of the house age",col = "blue")
Boxplot
boxplot(houseprices$Bedrooms, horizontal = TRUE)
boxplot(houseprices$Price, horizontal = TRUE)
boxplot(houseprices$Age, horizontal = TRUE,main="House Age", xlab="Age",col="red")
par(mfrow=c(2,2))
plot(houseprices$Price,houseprices$Bedrooms,main = "Prices vs No. of Bedrooms")
hist(houseprices$Age, main = "Age of the House", xlab = "House Age", ylab = "frequency of the house age",col ="blue")
boxplot(houseprices$Age, horizontal = TRUE,main="House Age", xlab="Age",col="green")
barplot(table(houseprices$Bedrooms),main = "No. of Bedrooms", col = heat.colors(10))
dev.off()
## null device
## 1
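par(mfrow = ...) changes the layout for every later plot on the device, so it is often convenient to restore the previous settings afterwards. The sketch below draws a 1 x 2 layout into a temporary PDF file using made-up data; out_file and old_par are names chosen for this example.

```r
x <- 2:200
y <- 2 * x - 8

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)                      # open a PDF graphics device

old_par <- par(mfrow = c(1, 2))    # 1 row, 2 columns; previous settings are returned
plot(x, y, main = "y = 2x - 8")
hist(y, main = "Histogram of y")
par(old_par)                       # restore the previous layout

dev.off()                          # close the device so the file is written
file.exists(out_file)              # TRUE
```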