Assignment Operator

The assignment operator <- stores a value in a variable.

a<-2 # Assign the value 2 to the variable 'a'
a # Print the value of a
[1] 2

x<-1:10 # Assign a sequence of values to a variable
print(x) # print() can also be used to display a value
[1]  1  2  3  4  5  6  7  8  9 10

Shortcuts

Ctrl+R (or) Ctrl+Enter -> Run the current line or selection
Ctrl+L -> Clear the console  
Ctrl+Alt+I -> Insert a code chunk in R Markdown
Ctrl+Shift+K -> Knit the R Markdown document to HTML

Comments

Text that starts with # (or ##) is a comment

1+1 # This is my first comment

Removing Variables

We can remove variables/objects that we created earlier

a<-10
a
[1] 10
rm(a) # Remove the global variable 'a'
ls() # List all the global variables we have created
[1] "m" "x" "y" "z"
rm(list=ls()) # Remove all global variables

Functions

We can call built-in functions such as

c(1,2,3) # Combine the values 1, 2, 3 into a vector
ls() # List all the global variables we have created
mean(argument) # Mean of the argument passed
sd(argument) # Standard deviation of the argument passed

Installing External Packages

Go to Tools->Install Packages->type the package name, e.g. "ggplot2"->Enter  
(or)  
install.packages("ggplot2") # Install the external package 'ggplot2'

Mean Value

To find the mean we can use the predefined mean() function

x<-1:20 
mean(x) # Find mean value of x
[1] 10.5

Library

library(ggplot2) # An installed package must be loaded with library() before it can be used
data(datasets)

Help

help(functionname) or ?functionname  # Open the help page of a particular function
[ex:] help(c), help(mean)

Passing Arguments

x<-c(1,2,3,5:10,NA)

args(sd) # Print the syntax of sd
function(x,na.rm=FALSE)

sd(x) # Standard deviation of x vector
[1] NA

sd(x,na.rm=TRUE) # NA (Not Available) values are ignored
[1] 3.162278

sd(x,TRUE) # Unnamed arguments are matched by position
[1] 3.162278

sd(na.rm=TRUE,x) # Named arguments can be given in any order
[1] 3.162278

sd(TRUE,x=x) # x is matched by name, so the unnamed TRUE goes to na.rm
[1] 3.162278

sd(TRUE,x) # Wrong order: TRUE is matched to x and the vector to na.rm (a warning here, an error from R 4.2.0 on)
[1] NA
Warning message:
In if (na.rm) "na.or.complete" else "everything" :
  the condition has length > 1 and only the first element will be used
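
The matching rules above can be tried on a small user-defined function. This is only an illustrative sketch: scale_by and its arguments are hypothetical names made up for this example, not part of base R.

```r
# Hypothetical helper: multiply x by a factor, optionally dropping NA values first
scale_by <- function(x, factor = 2, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  x * factor
}

v <- c(1, NA, 3)

r1 <- scale_by(v, 10)               # positional matching: x = v, factor = 10
r2 <- scale_by(factor = 10, x = v)  # named matching: the order does not matter
r3 <- scale_by(v, na.rm = TRUE)     # naming na.rm lets us skip 'factor'

print(r1)
print(r2)
print(r3)
```

Naming every argument after the first one is usually the safest habit, because it avoids the silent mismatch shown with sd(TRUE,x).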

Queries

For help with errors, we can use the following resources

r-help@r-project.org
github.com
stackoverflow

Assignment-1

Questions:

  1. Create a vector ‘x’ of length 5 with the values 4,7,3,2,5
  2. Add 2 to x and assign the result to y
  3. Find the mean and sd of y
  4. Create another variable z as a combination of 2,3
  5. Add x and z

Solutions:

1.x<-c(4,7,3,2,5) # Assigning values to x
  x # Print x
  [1] 4 7 3 2 5

2.y<- x+2 # Add 2 to x & assign the result to y
  y # Print y
  [1] 6 9 5 4 7

3.mean(y) # Find mean of y
  [1] 6.2
  sd(y) # Find Std Deviation of y
  [1] 1.923538

4.z<-c(2,3) # Assign combination of 2,3 to z variable
  z # Print z
  [1] 2 3

5.x+z # Add x and z and print the result; the shorter vector z is recycled
  [1]  6 10  5  5  7
  Warning message:
  In x + z : longer object length is not a multiple of shorter object length
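
The warning in solution 5 comes from R's recycling rule: the shorter vector is repeated until it is as long as the longer one. A minimal sketch that makes the recycling explicit with rep_len():

```r
x <- c(4, 7, 3, 2, 5)
z <- c(2, 3)

# Recycle z by hand to the length of x: 2 3 2 3 2
z_recycled <- rep_len(z, length(x))
print(z_recycled)

# Same values as x + z, but without the warning
manual <- x + z_recycled
print(manual)
```

The warning appears only because 5 is not a multiple of 2; with a z of length 1 or 5 the addition would be silent.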

Data Types:

R has 5 basic or “atomic” classes of objects:

character
numeric (real numbers)
integer
complex
logical (TRUE/FALSE)

How to find the data type of a variable

x<-1
class(x) # class() prints the data type of a variable
[1] "numeric"
x<-1L # The L suffix creates an integer instead of a numeric
class(x)
[1] "integer"
x<-as.character(x) # Convert integer to character
class(x)
[1] "character"
x<-as.factor(x) # Convert character to factor
class(x)
[1] "factor"

R also supports infinity (Inf) and Not a Number (NaN) values

1/0
[1] Inf
0/0
[1] NaN

We can also find the length of an object

x<-1:20
x
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
length(x) # Number of elements in x
[1] 20
x<-c(1:20,"Ranjith") # The string coerces every element to character
x
 [1] "1"       "2"       "3"       "4"       "5"       "6"       "7"       "8"       "9"       "10"      "11"      "12"     
[13] "13"      "14"      "15"      "16"      "17"      "18"      "19"      "20"      "Ranjith"
length(x)
[1] 21

Attributes

Objects in R can have the following attributes

names, dimnames
dimensions (matrices, arrays)
class
length
other user-defined attributes
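
As a small sketch of how these attributes can be inspected and set, the example below uses attributes() and attr(). The attribute name "unit" is made up for illustration.

```r
m <- matrix(1:6, nrow = 2)

# dim is stored as an attribute of the underlying vector
print(attributes(m))

# names is an attribute of a vector
v <- c(a = 1, b = 2)
print(names(v))

# A user-defined attribute (the name "unit" is hypothetical)
attr(v, "unit") <- "kg"
print(attr(v, "unit"))
```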

Sequence Operator

The colon (:) operator creates a sequence of numbers

from : to
1:20  # Sequence of values from 1 to 20
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

Creating Vectors

c() is used to create vectors of objects (R is case-sensitive, so it must be a lowercase c)

x<-c(0.5,0.6) # numeric
x<-c(TRUE,FALSE) # logical
x<-c(T,F) # logical
x<-c("a","b","c") # character
x<-9:29 # integer
x<-c(1+0i,2+4i) # complex

Using the vector() function

x<-vector("numeric",length=10)
x
[1] 0 0 0 0 0 0 0 0 0 0
c(TRUE,2,FALSE,"Ranjith") # If any value is a string, all values are coerced to character
[1] "TRUE"    "2"       "FALSE"   "Ranjith"
c(TRUE,2,FALSE) # Without strings, logicals are coerced to numeric: TRUE=1 & FALSE=0
[1] 1 2 0

x<-c(1:50,20:50)  # Multiple sequences can be combined into one vector
x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
[41] 41 42 43 44 45 46 47 48 49 50 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
[81] 50

Mixing objects

When values of different classes are mixed in one vector, R coerces them all to a single class

y<-c(1.7,"a") # Numeric & character values: everything becomes character
y
[1] "1.7" "a"
y<-c(TRUE,2) # Logical & numeric values: TRUE becomes 1 (FALSE would become 0)
y
[1] 1 2

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions

x<-0:6 # Integer vector used in the examples below
class(x)
[1] "integer"
as.numeric(x)
[1] 0 1 2 3 4 5 6
class(x)
[1] "integer"
as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
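
Explicit coercion does not always succeed. The sketch below shows what happens when a value cannot be converted, and summarises the implicit coercion order seen in the previous section.

```r
x <- c("1", "2", "three")

# "three" cannot be read as a number, so it becomes NA (R also emits a warning,
# suppressed here so the example runs quietly)
n <- suppressWarnings(as.numeric(x))
print(n)

# Implicit coercion order: logical -> integer -> numeric -> character
print(class(c(TRUE, 1L)))
print(class(c(1L, 2.5)))
print(class(c(2.5, "a")))
```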

Matrices

m<-matrix(nrow=2,ncol=3)
m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA

matrix(1:20,nrow=2,byrow=TRUE) # byrow=TRUE fills the matrix row by row
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]   11   12   13   14   15   16   17   18   19    20

matrix(1:20,nrow=2) # By default the matrix is filled column by column
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    3    5    7    9   11   13   15   17    19
[2,]    2    4    6    8   10   12   14   16   18    20

x<-1:10 # Sequence of numbers from 1 to 10
x
 [1]  1  2  3  4  5  6  7  8  9 10

dim(x)<-c(2,5) # Setting the dimensions converts the vector into a 2 x 5 matrix
x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
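
The two fill orders can be compared side by side on a smaller matrix. This is a minimal sketch; the variable names are chosen only for this example.

```r
# Default: fill column by column
by_col <- matrix(1:6, nrow = 2)

# byrow = TRUE: fill row by row
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(by_col)
print(by_row)

# Filling a 3 x 2 matrix by row and transposing it gives the same values
# as filling a 2 x 3 matrix by column
print(all(by_col == t(matrix(1:6, nrow = 3, byrow = TRUE))))
```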

Binding

Binding values column-wise and row-wise

x<-1:10
y<-21:30

cbind(x,y) # Bind the vectors as columns
       x  y
 [1,]  1 21
 [2,]  2 22
 [3,]  3 23
 [4,]  4 24
 [5,]  5 25
 [6,]  6 26
 [7,]  7 27
 [8,]  8 28
 [9,]  9 29
[10,] 10 30

rbind(x,y) # Bind the vectors as rows
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    5    6    7    8    9    10
y   21   22   23   24   25   26   27   28   29    30

x<-c(1,2)
y<-c(3,4,5)
cbind(x,y)
     x y
[1,] 1 3
[2,] 2 4
[3,] 1 5
Warning message:
In cbind(x, y) :
  number of rows of result is not a multiple of vector length (arg 1)

Lists

A list can store values of different data types, and specific values can be accessed by index

x<-list(c(1,2,3),c("Ranjith","Kumar")) # Store different types of values in a list
x
[[1]]
[1] 1 2 3

[[2]]
[1] "Ranjith" "Kumar"  

x[1] # Single brackets return a list that contains only the 1st element
[[1]]
[1] 1 2 3

x[[c(1,2)]] # 2nd value of the 1st element
[1] 2

x[[c(2,1)]] # 1st value of the 2nd element
[1] "Ranjith"

x<-list(a=1,b=2,c=3)
x
$a
[1] 1

$b
[1] 2

$c
[1] 3
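
For a named list, elements can be reached with single brackets, double brackets or $. The following sketch shows how the three differ; the list contents are made up for the example.

```r
x <- list(a = 1, b = "two", c = c(3, 4, 5))

sub_list <- x["c"]    # single brackets: a list of length 1
element  <- x[["c"]]  # double brackets: the element itself
same     <- x$c       # $ is shorthand for [[ ]] with a name

print(class(sub_list))
print(class(element))
print(element[2])
```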

Factors

Factors are used to represent categorical data and can be ordered or unordered

x<-factor(c("yes","yes","no","yes")) 
x
[1] yes yes no  yes
Levels: no yes
class(x)
[1] "factor"

y<-factor(c("yes","yes","no","yes"),labels=c("2","1")) # Assigning labels to the levels ("no" becomes "2", "yes" becomes "1")
y
[1] 1 1 2 1
Levels: 2 1

Frequency / table() -> number of occurrences of each level

x<-factor(c("yes","yes","no","yes")) 
x
[1] yes yes no  yes
Levels: no yes
table(x) # Count the occurrences of each level
x
 no yes 
  1   3 
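
The factors above are unordered. As a sketch of the ordered case, the example below sets the level order explicitly; the size labels are made up for illustration.

```r
sizes <- c("medium", "small", "large", "small")

# Without 'levels', the levels are sorted alphabetically
f <- factor(sizes)
print(levels(f))

# With explicit levels and ordered = TRUE, comparisons become meaningful
o <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(o < "large")
print(table(o))
```

table() reports the counts in level order, so an explicit level order also controls how frequency tables are laid out.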

Assignment-2

Questions:

1. Create a list whose first element is 2-element set of names say "Atanu" and "Karam", second element is a set of numbers like 2,3,5,19,2,7,5,5.
2. Check the class of second element of the list.
3. Print the 4th element of the 2nd element of the list.
4. Change the 1st element of the list as factor.
5. Find out the frequency of each element of the 2nd element of the list.

Solutions:

1.x<-list(c("Atanu","Karam"),c(2,3,5,19,2,7,5,5))
  x
  [[1]]
  [1] "Atanu" "Karam"

  [[2]]
  [1]  2  3  5 19  2  7  5  5

2.class(x[[2]]) # Find the data type / class of values in the 2nd element
  [1] "numeric"

3.x[[c(2,4)]]
  [1] 19

4.y<-as.factor(x[[1]])
  y
  [1] Atanu Karam
  Levels: Atanu Karam

5.table(x[[2]])

  2  3  5  7 19 
  2  1  3  1  1 

Missing Values

Missing values are denoted by NA or NaN.

is.na() # Tests each element for NA (NaN counts as NA)
is.nan() # Tests each element for NaN
complete.cases() # TRUE where no value is missing (for a vector, the inverse of is.na())
NA values have a class too, so there are integer NA, character NA, etc.
A NaN value is also NA:
x<-c(1,2,3,NA,5,NaN)
is.na(x)
[1] FALSE FALSE FALSE  TRUE FALSE  TRUE

But an NA value is not NaN:
is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE

complete.cases(x)
[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE
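
In practice these tests are used to count or drop missing values. A minimal sketch on the same vector:

```r
x <- c(1, 2, 3, NA, 5, NaN)

n_missing <- sum(is.na(x))     # counts both NA and NaN
clean     <- x[!is.na(x)]      # drop the missing values
m         <- mean(x, na.rm = TRUE)

print(n_missing)
print(clean)
print(m)
```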

Data Frames

Data frames are used to store tabular data  
An object created with data.frame() has the class data.frame  
Every column must have the same number of rows (length)
A data frame also has a special attribute called row.names
Different columns can hold values of different classes/data types
Data frames are usually created by calling read.table() or read.csv()
A data frame can also be converted to a matrix by calling data.matrix()
x<-data.frame(foo=1:4,bar=c(T,T,F,F))
x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE

nrow(x) # Number of rows in the data frame x
[1] 4
ncol(x) # Number of columns in the data frame x
[1] 2

names(x) # Column names; colnames(x) gives the same result
[1] "foo" "bar"

row.names(x) # Row names; rownames(x) gives the same result
[1] "1" "2" "3" "4"

Creating a data frame with columns of different classes

x<-data.frame(name=c("Ranjith","Pream","Sharavanan"),age=c(26,23,27),gender=c("M","M","M")) # Stored as a data frame
x
        name age gender
1    Ranjith  26      M
2      Pream  23      M
3 Sharavanan  27      M
class(x)
[1] "data.frame"

y<-cbind(name=c("Ranjith","Pream","Sharavanan"),age=c(26,23,27),gender=c("M","M","M")) # Stored as a matrix, so every value becomes character
y
     name         age  gender
[1,] "Ranjith"    "26" "M"   
[2,] "Pream"      "23" "M"   
[3,] "Sharavanan" "27" "M"   
class(y)
[1] "matrix"

Converting matrix class to data frame class

z<-as.data.frame(y) # Convert matrix to data frame and assign to z
z
        name age gender
1    Ranjith  26      M
2      Pream  23      M
3 Sharavanan  27      M
class(z)
[1] "data.frame"

str(x) # Overview of the data frame (strings appear as factors because R versions before 4.0 converted them by default)
'data.frame':   3 obs. of  3 variables: # 3 obs/rows and 3 variables/cols/attributes
 $ name  : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
 $ age   : num  26 23 27
 $ gender: Factor w/ 1 level "M": 1 1 1
 
str(x$age) # Display the overview of the age column
num [1:3] 26 23 27

Converting class of individual columns

x[,1]=as.character(x[,1]) # Convert the first column from factor to character
x
str(x)
'data.frame':   3 obs. of  3 variables:
 $ name  : chr  "Ranjith" "Pream" "Sharavanan"
 $ age   : num  26 23 27
 $ gender: Factor w/ 1 level "M": 1 1 1
 
x[,2]=as.integer(x[,2]) # Convert the second column from numeric to integer
str(x)
'data.frame':   3 obs. of  3 variables:
 $ name  : chr  "Ranjith" "Pream" "Sharavanan"
 $ age   : int  26 23 27
 $ gender: Factor w/ 1 level "M": 1 1 1
 
View(x) # Display the data frame in table format

To convert a factor column to integer in a data frame, convert it to character first

Otherwise as.integer() returns only the internal level codes, as shown below

str(x)
'data.frame':   3 obs. of  3 variables:
 $ name  : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
 $ age   : Factor w/ 3 levels "23","26","27": 2 1 3
 $ gender: Factor w/ 1 level "M": 1 1 1

x[,2]=as.integer(x[,2]) # Converting the factor directly gives the level codes, not the ages

str(x)
'data.frame':   3 obs. of  3 variables:
 $ name  : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
 $ age   : int  2 1 3
 $ gender: Factor w/ 1 level "M": 1 1 1

So we first convert the factor to character and then to integer

str(x)
'data.frame':   3 obs. of  3 variables:
 $ name  : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
 $ age   : Factor w/ 3 levels "23","26","27": 2 1 3
 $ gender: Factor w/ 1 level "M": 1 1 1
 
x[,2]=as.integer(as.character(x[,2])) # Convert factor->character->integer

str(x)
'data.frame':   3 obs. of  3 variables:
 $ name  : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
 $ age   : int  26 23 27
 $ gender: Factor w/ 1 level "M": 1 1 1
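
The same pitfall can be reproduced on a standalone factor, without a data frame. A minimal sketch:

```r
age <- factor(c("26", "23", "27"))

codes  <- as.integer(age)                # internal level codes
values <- as.integer(as.character(age))  # the numbers the labels represent

# An equivalent approach: convert the levels once, then index by the factor
via_levels <- as.integer(levels(age))[age]

print(codes)
print(values)
print(via_levels)
```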

Changing column names

z=data.frame(c("Ranjith","Pream","Sharavanan"),c(26,23,27),c("M","M","M"))
z
  c..Ranjith....Pream....Sharavanan.. c.26..23..27. c..M....M....M..
1                             Ranjith            26                M
2                               Pream            23                M
3                          Sharavanan            27                M

colnames(z)<-c("Name","Age","Gender") # Replace the default names with our own (the names() function also works)

colnames(z)
[1] "Name"   "Age"    "Gender"
z
        Name Age Gender
1    Ranjith  26      M
2      Pream  23      M
3 Sharavanan  27      M

Importing and Exporting data sets

Importing data sets from a file or URL

wine<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv")

adult<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\adult.csv")

iris<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\iris.csv")

In RStudio: go to Tools->Import Data set->From Web URL->give the path or URL

wine1<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv") # Read the file & load it into wine1

summary(dataset) # Print the min, 1st quartile, median, mean, 3rd quartile and max of each column

summary(wine)
       V1              V2              V3              V4              V5              V6               V7       
 Min.   :1.000   Min.   :11.03   Min.   :0.740   Min.   :1.360   Min.   :10.60   Min.   : 70.00   Min.   :0.980  
 1st Qu.:1.000   1st Qu.:12.36   1st Qu.:1.603   1st Qu.:2.210   1st Qu.:17.20   1st Qu.: 88.00   1st Qu.:1.742  
 Median :2.000   Median :13.05   Median :1.865   Median :2.360   Median :19.50   Median : 98.00   Median :2.355  
 Mean   :1.938   Mean   :13.00   Mean   :2.336   Mean   :2.367   Mean   :19.49   Mean   : 99.74   Mean   :2.295  
 3rd Qu.:3.000   3rd Qu.:13.68   3rd Qu.:3.083   3rd Qu.:2.558   3rd Qu.:21.50   3rd Qu.:107.00   3rd Qu.:2.800  
 Max.   :3.000   Max.   :14.83   Max.   :5.800   Max.   :3.230   Max.   :30.00   Max.   :162.00   Max.   :3.880  
       V8              V9              V10             V11              V12              V13             V14        
 Min.   :0.340   Min.   :0.1300   Min.   :0.410   Min.   : 1.280   Min.   :0.4800   Min.   :1.270   Min.   : 278.0  
 1st Qu.:1.205   1st Qu.:0.2700   1st Qu.:1.250   1st Qu.: 3.220   1st Qu.:0.7825   1st Qu.:1.938   1st Qu.: 500.5  
 Median :2.135   Median :0.3400   Median :1.555   Median : 4.690   Median :0.9650   Median :2.780   Median : 673.5  
 Mean   :2.029   Mean   :0.3619   Mean   :1.591   Mean   : 5.058   Mean   :0.9574   Mean   :2.612   Mean   : 746.9  
 3rd Qu.:2.875   3rd Qu.:0.4375   3rd Qu.:1.950   3rd Qu.: 6.200   3rd Qu.:1.1200   3rd Qu.:3.170   3rd Qu.: 985.0  
 Max.   :5.080   Max.   :0.6600   Max.   :3.580   Max.   :13.000   Max.   :1.7100   Max.   :4.000   Max.   :1680.0  

Exporting data sets into csv

write.csv(wine,"C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv")
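
By default write.csv() also writes the row names, so a file that is written and read back gains an extra first column. The sketch below shows a round trip that avoids this; the data frame is made up for the example, and tempfile() is used so that no machine-specific path is needed.

```r
df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# tempfile() returns a throwaway path in the session's temporary directory
path <- tempfile(fileext = ".csv")

# row.names = FALSE stops the row names being written as an extra column
write.csv(df, path, row.names = FALSE)

back <- read.csv(path)
print(back)
print(names(back))
```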

wine$V10 # Print the 10th col values

  [1] 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 2.38 1.57 1.81 2.81 2.96 1.46 1.97 1.72 1.86 1.66 2.10 1.98 1.69
 [24] 1.46 1.66 1.92 1.45 1.35 1.76 1.98 2.38 1.95 1.97 1.35 1.54 1.86 1.36 1.44 1.37 2.08 2.34 1.48 1.70 1.66 2.03 1.25
 [47] 2.19 2.14 2.38 2.08 2.91 2.29 1.87 1.68 1.62 2.45 2.03 1.66 2.04 0.42 0.41 0.62 0.73 1.87 1.03 2.08 2.28 1.04 0.42
 [70] 2.50 1.46 1.87 1.03 1.96 1.65 1.15 1.46 0.95 2.76 1.95 1.43 1.77 1.40 1.62 2.35 1.46 1.56 1.34 1.35 1.38 1.64 1.63
 [93] 1.62 1.99 1.35 3.28 1.56 1.77 1.95 2.81 1.40 1.35 1.31 1.42 1.48 1.42 1.63 1.63 2.08 2.49 3.58 1.22 1.05 1.44 1.04
[116] 2.01 1.53 1.61 0.83 1.87 1.83 1.87 1.71 2.01 2.91 1.35 1.77 1.76 1.90 1.35 0.94 0.83 0.83 0.84 1.25 0.94 0.80 1.10
[139] 0.88 0.81 0.75 0.64 0.55 1.02 1.14 1.30 0.68 0.86 1.25 1.14 1.25 1.26 1.56 1.87 1.40 1.55 1.56 1.14 2.70 2.29 1.04
[162] 0.80 0.96 0.94 1.03 1.15 1.46 0.97 1.54 1.11 0.73 0.64 1.24 1.06 1.41 1.35 1.46 1.35

Removing columns and rows from data set

wine4<-wine1[,-c(1,15)] # Remove the 1st & 15th columns from the wine1 data set

wine5<-wine1[1:100,] # Keep only the first 100 rows

wine5<-wine1[10:50,5:10] # Keep rows 10 to 50 & columns 5 to 10

Subsetting

Filtering rows according to a given condition (iris1 is assumed to be the imported iris data with the columns named sepal.length, sepal.width, petal.length, petal.width and class)

iris_setosa<-subset(iris1,class=="Iris-setosa") # Keep only the rows whose class is "Iris-setosa"

iris4<-subset(iris1,petal.length==1.4) # Keep rows where petal.length equals 1.4

iris5<-subset(iris1,sepal.width>=3.1) # Keep rows where sepal.width is at least 3.1

iris5<-subset(iris1,sepal.width>3.0 & petal.width==0.2) # Keep rows where sepal.width is greater than 3.0 and petal.width equals 0.2

iris5<-subset(iris1,class=="Iris-setosa" & petal.width==0.2) # Keep rows where class is "Iris-setosa" and petal.width equals 0.2

iris5<-subset(iris1,class=="Iris-setosa" | petal.width==0.2) # Keep rows where class is "Iris-setosa" or petal.width equals 0.2

Subsetting using the “which” function

iris6<-iris1[which(iris1$sepal.width>3.0 & iris1$petal.width==0.2),]
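
subset() and which() select the same rows. The sketch below checks this on R's built-in iris data, whose columns are named Sepal.Width and Petal.Width rather than the renamed columns of iris1, and then shows why which() is preferred over a bare logical index when the condition can be NA.

```r
# Built-in iris data, referenced explicitly so a user-defined 'iris' does not interfere
d <- datasets::iris

a <- subset(d, Sepal.Width > 3.0 & Petal.Width == 0.2)
b <- d[which(d$Sepal.Width > 3.0 & d$Petal.Width == 0.2), ]
print(identical(rownames(a), rownames(b)))

# With an NA in the condition, plain logical indexing returns an NA entry,
# while which() drops it
v <- c(5, NA, 1)
print(v[v > 2])
print(v[which(v > 2)])
```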

Assignment-3

Questions:

Import the Adult dataset in RStudio, then do the following:
1.Rename the variables
2.How many observations have income > 50k, and how many have income <= 50k
3.How many observations have age greater than 20 & income > 50k
4.How many observations have age less than 30 or income > 50k
5.How many observations have education bachelors and age less than 24

Solutions:

1.colnames(adult)<-c("age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","income")

2.adult2<-subset(adult,income==" >50K")
  nrow(adult2) # Number of observations with income > 50k
  adult2<-subset(adult,income==" <=50K")
  nrow(adult2) # Number of observations with income <= 50k
  
3.adult2<-subset(adult,income==" >50K" & age>20)
  nrow(adult2)

4.adult3<-adult[which(adult$age<30 | adult$income==" >50K"),] # "or" needs |, not &
  nrow(adult3)

5.adult4<-adult[which(adult$education==" Bachelors" & adult$age<24),]
  nrow(adult4)

Trimming

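library(stringr) # str_trim() comes from the stringr package

adult$education=str_trim(adult$education) # Remove leading and trailing spaces in one column

adult8<-adult # adult8 is assumed to be a copy of adult
for(i in 1:15){adult8[,i]=str_trim(adult8[,i])} # Remove leading and trailing spaces in all 15 columns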
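
If stringr is not installed, base R's trimws() does the same job. A minimal sketch on made-up values that mimic the leading spaces in the adult data:

```r
x <- c(" Bachelors", "HS-grad ", "  Masters  ")

trimmed <- trimws(x)                  # both sides (the default)
left    <- trimws(x, which = "left")  # leading whitespace only

print(trimmed)
print(left)

# After trimming, comparisons no longer need the leading space
print(trimmed == "Bachelors")
```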

Substring

nchar("Ranjith") # Number of characters in the given string
[1] 7

a<-c("Ranjith","Kumar","Hai")
nchar(a) # Print no of characters in each string
[1] 7 5 3

substr("Statistics",1,4) # Extract characters 1 to 4 of the given string
[1] "Stat"

substr("Statistics",7,10) # Extract characters 7 to 10 of the given string
[1] "tics"

substr(a,1,3) # Extract characters 1 to 3 of every string in the vector
[1] "Ran" "Kum" "Hai"

Replace string

# sub(old,new,string)
s<-"Curly is the smart one. Curly is funny, too."
s1<-c("Curly is the smart one. Curly is funny, too.","Curly is beauty.")
# sub() replaces "Curly" with "Meo" only at the 1st occurrence in each string
sub("Curly","Meo",s1)
[1] "Meo is the smart one. Curly is funny, too." "Meo is beauty." 

# gsub() replaces "Curly" with "Meo" at every occurrence
gsub("Curly","Meo",s)
[1] "Meo is the smart one. Meo is funny, too."

# Use outer and paste to create a matrix with all possible combinations
locations<-c("NY","LA","CHI","HOU")
treatments<-c("T1","T2","T3")
outer(locations,treatments,paste,sep="-")
     [,1]     [,2]     [,3]    
[1,] "NY-T1"  "NY-T2"  "NY-T3" 
[2,] "LA-T1"  "LA-T2"  "LA-T3" 
[3,] "CHI-T1" "CHI-T2" "CHI-T3"
[4,] "HOU-T1" "HOU-T2" "HOU-T3"

paste("Ranjith","Kumar",sep="-") # Combine both strings with the given separator
[1] "Ranjith-Kumar"

class(paste("Ranjith","Kumar",sep=" "))
[1] "character"
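
paste() is also vectorised, and its collapse argument joins a whole vector into one string. A short sketch of the common variants:

```r
words <- c("Ranjith", "Kumar", "Hai")

joined  <- paste(words, collapse = " ")           # one string from a whole vector
no_sep  <- paste0("T", 1:3)                       # paste0() is paste() with sep = ""
labeled <- paste(words, nchar(words), sep = ":")  # element-wise pasting

print(joined)
print(no_sep)
print(labeled)
```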

Dates

Sys.Date() # Display the current date
[1] "2015-10-06"
date()
[1] "Tue Oct 06 15:26:01 2015"

class(Sys.Date())
[1] "Date"

# Convert string into date
as.Date("10/6/2015") # Wrong: without a format, the string is read as year/month/day
[1] "0010-06-20"

as.Date("10/6/15",format="%d/%m/%y")
[1] "2015-06-10"

as.Date("10/6/2015",format="%d/%m/%Y") # A 4-digit year needs "%Y"
[1] "2015-06-10"

as.Date("6-10-15",format="%m-%d-%y") # A 2-digit year needs "%y"
[1] "2015-06-10"

ISO Dates

# ISOdate(year,month,day)
ISOdate(2015,10,26) # The time is optional; it defaults to 12:00:00 GMT
[1] "2015-10-26 12:00:00 GMT"

ISOdate(2015,10,6,10,15,16,tz="GMT") # It will print ISOdate with given time
[1] "2015-10-06 10:15:16 GMT"

#Convert ISOdate into date
as.Date(ISOdate(2015,10,26))
[1] "2015-10-26"

# ISOdatetime(year,month,day,hour,minute,seconds)
ISOdatetime(2015,10,26,10,36,57)
[1] "2015-10-26 10:36:57 IST" # Without tz, the local time zone is used (IST on this machine)

ISOdatetime(2015,10,6) # ISOdatetime() requires the time arguments as well
Error in ISOdatetime(2015, 10, 6) : 
  argument "min" is missing, with no default

Extracting day/month/year from a date and doing calculations

s<-ISOdate(2015,2,25)
y<-as.Date(s)

format(y,"%Y")
[1] "2015"
format(y,"%d")
[1] "25"
class(format(y,"%d"))
[1] "character"

# To subtract one year from another, first convert the formatted year to integer
as.integer(format(y,"%Y"))
[1] 2015
2020-as.integer(format(y,"%Y")) # Subtract the year from 2020 after converting it to integer
[1] 5
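
Date objects also support arithmetic directly, without going through format(). A minimal sketch using two of the dates that appear above:

```r
start <- as.Date("2015-02-25")
end   <- as.Date("2015-10-06")

gap  <- end - start                      # a difftime object
days <- as.numeric(gap, units = "days")  # plain number of days
print(days)

later <- start + 30                      # adding a number adds days
print(later)
```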

Plots

plot(cars) # Scatter plot of the cars data frame

plot(cars,main="CARS",xlab="Speed",ylab="Distance") # Scatter plot with our own title and axis labels
grid() # Add grid lines to the existing plot
lines(cars) # Join the points with lines

colnames(iris)<-c("sepal.length","sepal.width","petal.length","petal.width","class")
f<-factor(iris$class) # One factor level per class
with(iris,plot(petal.length,petal.width,pch=as.integer(f))) # A different plotting symbol for each class
legend(1.5,2.4,levels(f),pch=1:length(levels(f))) # Legend symbols match the pch values used in the plot
colnames(adult)<-c("age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","income")

Coplot

data(Cars93,package = "MASS")
coplot(Horsepower~MPG.city | Origin,data=Cars93) # Conditioning plot: Horsepower against MPG.city, one panel per Origin

coplot(Type~Model | AirBags,data=Cars93)

Simple Linear Correlation And Regression

Correlation

#http://ww2.coastal.edu/kingw/statistics/R-tutorials/simplelinear.html

library(MASS) # The cats data set comes from the MASS package
data(cats)
str(cats)
## 'data.frame':    144 obs. of  3 variables:
##  $ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Bwt: num  2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
##  $ Hwt: num  7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...
summary(cats)
##  Sex         Bwt             Hwt       
##  F:47   Min.   :2.000   Min.   : 6.30  
##  M:97   1st Qu.:2.300   1st Qu.: 8.95  
##         Median :2.700   Median :10.10  
##         Mean   :2.724   Mean   :10.63  
##         3rd Qu.:3.025   3rd Qu.:12.12  
##         Max.   :3.900   Max.   :20.50
with(cats,plot(Bwt,Hwt))
title(main="Heart Weight vs. Body Weight plot")

with(cats,plot(Hwt~Bwt))

with(cats,cor(Bwt,Hwt))
## [1] 0.8041274
with(cats,cor(Bwt,Hwt))^2
## [1] 0.6466209
with(cats,cor.test(Bwt,Hwt))
## 
##  Pearson's product-moment correlation
## 
## data:  Bwt and Hwt
## t = 16.119, df = 142, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7375682 0.8552122
## sample estimates:
##       cor 
## 0.8041274
with(cats,cor.test(Bwt,Hwt,alternative = "greater",conf.level = .8))
## 
##  Pearson's product-moment correlation
## 
## data:  Bwt and Hwt
## t = 16.119, df = 142, p-value < 2.2e-16
## alternative hypothesis: true correlation is greater than 0
## 80 percent confidence interval:
##  0.7776141 1.0000000
## sample estimates:
##       cor 
## 0.8041274
with(cats,cor.test(~Bwt+Hwt))
## 
##  Pearson's product-moment correlation
## 
## data:  Bwt and Hwt
## t = 16.119, df = 142, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7375682 0.8552122
## sample estimates:
##       cor 
## 0.8041274
with(cats, cor.test(~ Bwt + Hwt, subset=(Sex=="F")))
## 
##  Pearson's product-moment correlation
## 
## data:  Bwt and Hwt
## t = 4.2152, df = 45, p-value = 0.0001186
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2890452 0.7106399
## sample estimates:
##       cor 
## 0.5320497
with(cats, plot(Bwt, Hwt, type="n", xlab="Body Weight in kg",ylab="Heart Weight in g",main="Heart Weight vs. Body Weight of Cats"))
with(cats,points(Bwt[Sex=="F"],Hwt[Sex=="F"],pch=16,col="red"))
with(cats,points(Bwt[Sex=="M"],Hwt[Sex=="M"],pch=17,col="blue"))

Correlation and Covariance Matrices

data(cement)
str(cement)
## 'data.frame':    13 obs. of  5 variables:
##  $ x1: int  7 1 11 11 7 11 3 1 2 21 ...
##  $ x2: int  26 29 56 31 52 55 71 31 54 47 ...
##  $ x3: int  6 15 8 8 6 9 17 22 18 4 ...
##  $ x4: int  60 52 20 47 33 22 6 44 22 26 ...
##  $ y : num  78.5 74.3 104.3 87.6 95.9 ...
cor(cement)
##            x1         x2         x3         x4          y
## x1  1.0000000  0.2285795 -0.8241338 -0.2454451  0.7307175
## x2  0.2285795  1.0000000 -0.1392424 -0.9729550  0.8162526
## x3 -0.8241338 -0.1392424  1.0000000  0.0295370 -0.5346707
## x4 -0.2454451 -0.9729550  0.0295370  1.0000000 -0.8213050
## y   0.7307175  0.8162526 -0.5346707 -0.8213050  1.0000000
cov(cement)
##           x1         x2         x3          x4          y
## x1  34.60256   20.92308 -31.051282  -24.166667   64.66346
## x2  20.92308  242.14103 -13.878205 -253.416667  191.07949
## x3 -31.05128  -13.87821  41.025641    3.166667  -51.51923
## x4 -24.16667 -253.41667   3.166667  280.166667 -206.80833
## y   64.66346  191.07949 -51.519231 -206.808333  226.31359
cov.matr=cov(cement)
cov2cor(cov.matr)
##            x1         x2         x3         x4          y
## x1  1.0000000  0.2285795 -0.8241338 -0.2454451  0.7307175
## x2  0.2285795  1.0000000 -0.1392424 -0.9729550  0.8162526
## x3 -0.8241338 -0.1392424  1.0000000  0.0295370 -0.5346707
## x4 -0.2454451 -0.9729550  0.0295370  1.0000000 -0.8213050
## y   0.7307175  0.8162526 -0.5346707 -0.8213050  1.0000000
pairs(cement)

Simple Linear Regression

autompg<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\autompg.csv")
colnames(autompg)<-c("mpg","cylinders","displacement","horsepower","weight","acceleration","model_year","origin","car_name")
autompg1=na.omit(autompg) # Drop rows with missing values; the column names are kept
cor(autompg1[,c(1,3,4,5,6)])
##                     mpg displacement horsepower     weight acceleration
## mpg           1.0000000   -0.3615028 -0.3885417 -0.4246212   -0.3223630
## displacement -0.3615028    1.0000000  0.9508233  0.8429834    0.8975273
## horsepower   -0.3885417    0.9508233  1.0000000  0.8972570    0.9329944
## weight       -0.4246212    0.8429834  0.8972570  1.0000000    0.8645377
## acceleration -0.3223630    0.8975273  0.9329944  0.8645377    1.0000000
lm.out=lm(horsepower~mpg+displacement+weight+acceleration,autompg1) # Regress horsepower on four predictors (a multiple linear regression)
summary(lm.out)
## 
## Call:
## lm(formula = horsepower ~ mpg + displacement + weight + acceleration, 
##     data = autompg1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -85.950 -11.982   1.236  12.490 105.397 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.370e+02  5.813e+00 -23.573  < 2e-16 ***
## mpg          -2.087e-02  1.167e-02  -1.788   0.0745 .  
## displacement  3.077e+01  1.688e+00  18.232  < 2e-16 ***
## weight        5.777e-01  6.791e-02   8.507 3.96e-16 ***
## acceleration  3.593e-02  3.646e-03   9.857  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23.83 on 387 degrees of freedom
## Multiple R-squared:  0.9487, Adjusted R-squared:  0.9481 
## F-statistic:  1788 on 4 and 387 DF,  p-value: < 2.2e-16
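# dev.off() closes the current graphics device
par(mfrow=c(1,2)) # Lay out the diagnostic plots in 1 row and 2 columns
plot(lm.out)

par(mfrow=c(2,2)) # 2 x 2 grid: all four diagnostic plots on one page
plot(lm.out)

par(mfrow=c(2,1)) # 2 rows and 1 column
plot(lm.out)

predict(lm.out,autompg[7,]) # Predict horsepower for the 7th row of autompg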
##        7 
## 392.5502
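
The model above uses four predictors. For a regression with a single predictor, here is a minimal sketch on the built-in cars data set (chosen because the autompg file is specific to the author's machine). It also shows that predict() is just the fitted line evaluated at new values.

```r
fit <- lm(dist ~ speed, data = cars)

# Coefficients: intercept and slope of the fitted line
print(coef(fit))

# predict() needs a data frame whose column names match the predictors
new_cars <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_cars)
print(pred)

# The same prediction computed by hand from the coefficients
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["speed"]] * new_cars$speed
print(by_hand)
```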