The assignment operator <- stores a value in a variable.
a<-2 # Assigning Value '2' to 'a' variable
a # Print a value
[1] 2
x<-1:10 # Assign a sequence of values to a variable
print(x) # To print we can also use print function
[1] 1 2 3 4 5 6 7 8 9 10
Ctrl+R (or) Ctrl+Enter -> Run the current line or selection from markdown
Ctrl+L -> Clear the console
Alt+Ctrl+I -> Insert a code chunk in markdown
Ctrl+Shift+K -> Knit the markdown document to HTML
We can remove variables/objects which we have created earlier
a<-10
a # Print the value of a
[1] 10
rm(a) # Remove global variable 'a'
ls() # List all the global variables we created
[1] "m" "x" "y" "z"
rm(list=ls()) # Remove all global variables
We can call built-in functions like
c(1,2,3) # Combine the values 1,2,3 into a vector
ls() # List all the global variables which we created
mean(argument) # Return the mean of the argument passed
sd(argument) # Return the standard deviation of the argument passed
Go to Tools->Install Packages->type the package name, e.g. "ggplot2"->Enter
(or)
install.packages("ggplot2") # It will install the external package 'ggplot2'
To find the mean we can use the predefined mean() function
x<-1:20
mean(x) # Find mean value of x
[1] 10.5
library(ggplot2) # An installed package must be loaded with library() before we can use it
data(datasetname) # Load a built-in data set by name
help(functionname) # or ?functionname: opens the help document of a particular function
Ex: help(c), help(mean)
x<-c(1,2,3,5:10,NA)
args(sd) # Print the syntax of sd
function(x,na.rm=FALSE)
sd(x) # Standard deviation of x vector
[1] NA
sd(x,na.rm=TRUE) # NA (Not Available) values will be ignored
[1] 3.162278
sd(x,TRUE) # Unnamed arguments are matched in positional order
[1] 3.162278
sd(na.rm=TRUE,x) # Named arguments can be passed in any order
[1] 3.162278
sd(TRUE,x=x) # Naming x lets the unnamed TRUE match na.rm
[1] 3.162278
sd(TRUE,x) # Wrong order: TRUE is matched to x and x to na.rm, giving a warning (an error from R 4.2.0 onwards)
[1] NA
Warning message:
In if (na.rm) "na.or.complete" else "everything" :
the condition has length > 1 and only the first element will be used
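The sd() calls above rely on how R matches arguments: named arguments are matched first, and the remaining ones are filled in by position. A minimal sketch of this, using a hypothetical helper spread() (not part of R, written only for illustration) with the same signature as sd():

```r
# Hypothetical helper with the same signature as sd(); for illustration only
spread <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]   # drop missing values when asked to
  max(x) - min(x)                # width of the range of the remaining values
}

v <- c(1, 2, 3, 5:10, NA)

r1 <- spread(v)                  # na.rm defaults to FALSE, so the result is NA
r2 <- spread(v, na.rm = TRUE)    # x by position, na.rm by name
r3 <- spread(na.rm = TRUE, v)    # named argument first, v still matches x
r4 <- spread(TRUE, x = v)        # x by name, so TRUE falls into na.rm

print(c(r1, r2, r3, r4))
```

Naming the arguments is usually the safer habit, because the call no longer depends on the order in which the function declares them.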
We can get help with our errors from the following places
r-help@r-project.org
github.com
stackoverflow
Questions:
Solutions:
1. x<-c(4,7,3,2,5) # Assign values to x
x # Print x
[1] 4 7 3 2 5
2. y<-x+2 # Add 2 to x & assign the result to y
y # Print y
[1] 6 9 5 4 7
3. mean(y) # Find the mean of y
[1] 6.2
sd(y) # Find the standard deviation of y
[1] 1.923538
4. z<-c(2,3) # Assign the combination of 2,3 to z
z # Print z
[1] 2 3
5. x+z # Add x and z (z is recycled) and print the result
[1] 6 10 5 5 7
Warning message:
In x + z : longer object length is not a multiple of shorter object length
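The warning above comes from vector recycling: the shorter vector is repeated until it is as long as the longer one. A small sketch of what R does internally here; the variable names z_recycled, manual_sum and even_sum are only illustrative:

```r
x <- c(4, 7, 3, 2, 5)
z <- c(2, 3)

# Recycling repeats z until it is as long as x: 2 3 2 3 2
z_recycled <- rep(z, length.out = length(x))
manual_sum <- x + z_recycled     # same values as x + z, but without the warning

# When the longer length is a multiple of the shorter one, no warning is raised
even_sum <- c(1, 2, 3, 4) + c(10, 20)

print(manual_sum)
print(even_sum)
```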
R has 5 basic or “atomic” classes of objects:
character
numeric (real numbers)
integer
complex
logical (TRUE/FALSE)
How to find the data type of a variable
x<-1
class(x) # class() returns the data type of a variable
[1] "numeric"
x<-1L # The L suffix creates an integer instead of a numeric
class(x)
[1] "integer"
x<-as.character(x) # Convert integer to character
class(x)
[1] "character"
x<-as.factor(x) # Convert character to factor
class(x)
[1] "factor"
It also supports infinity (Inf) & Not a Number (NaN) values
1/0
[1] Inf
0/0
[1] NaN
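To check for these special values in a vector we can use is.finite(), is.infinite() and is.nan(). A short sketch (the vector vals is just an example):

```r
vals <- c(1/0, -1/0, 0/0, 5)        # Inf, -Inf, NaN and an ordinary number

finite_flags <- is.finite(vals)     # TRUE only for ordinary numbers
inf_flags    <- is.infinite(vals)   # TRUE for Inf and -Inf
nan_flags    <- is.nan(vals)        # TRUE only for NaN

print(finite_flags)
print(inf_flags)
print(nan_flags)
```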
We can also find the length of an object
x<-1:20
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
length(x) # Return the number of elements in x
[1] 20
x<-c(1:20,"Ranjith")
x
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
[13] "13" "14" "15" "16" "17" "18" "19" "20" "Ranjith"
length(x)
[1] 21
List of attributes we have in R
names, dimnames
dimensions (matrices,array)
class
length
other user-defined attributes
The colon (:) operator is used to create a sequence of numbers
from : to
1:20 # Print sequence of values from 1 to 20
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
c() is used to create vectors of objects
x<-c(0.5,0.6) # numeric
x<-c(TRUE,FALSE) # logical
x<-c(T,F) # logical
x<-c("a","b","c") # character
x<-9:29 # integer
x<-c(1+0i,2+4i) # complex
Using the vector() function
x<-vector("numeric",length=10)
x
[1] 0 0 0 0 0 0 0 0 0 0
c(TRUE,2,FALSE,"Ranjith") # If any value is a string, all values are coerced to character
[1] "TRUE" "2" "FALSE" "Ranjith"
c(TRUE,2,FALSE) # With numbers, logicals are coerced to TRUE=1 & FALSE=0
[1] 1 2 0
x<-c(1:50,20:50) # We can combine multiple sequences into one vector
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
[41] 41 42 43 44 45 46 47 48 49 50 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
[81] 50
We can combine different kinds of values in one vector; they are coerced to a common class
y<-c(1.7,"a") # Numeric & character values: both become character
y
[1] "1.7" "a"
y<-c(TRUE,2) # Logical & numeric values: TRUE becomes 1 (FALSE would become 0)
y
[1] 1 2
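When classes are mixed inside c(), R coerces everything to the most flexible class involved, in the order logical < integer < numeric < complex < character. A quick sketch to see that order (k1 to k4 are only example names):

```r
k1 <- class(c(TRUE, 1L))        # logical + integer   -> integer
k2 <- class(c(1L, 2.5))         # integer + numeric   -> numeric
k3 <- class(c(2.5, 1+0i))       # numeric + complex   -> complex
k4 <- class(c(1+0i, "a"))       # complex + character -> character

print(c(k1, k2, k3, k4))
```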
Objects can be explicitly coerced from one class to another using the as.* functions
x<-0:6
class(x)
[1] "integer"
as.numeric(x)
[1] 0 1 2 3 4 5 6
class(x)
[1] "integer"
as.logical(x)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
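Explicit coercion does not always succeed. When a value cannot be converted, R returns NA (and as.numeric() also gives a warning). A small sketch with made-up input strings:

```r
s <- c("1", "2.5", "abc")

# "abc" cannot be parsed as a number, so it becomes NA;
# suppressWarnings() only hides the "NAs introduced by coercion" warning
n <- suppressWarnings(as.numeric(s))

# as.logical() understands "TRUE"/"FALSE"/"T"/"F" style strings, but not "yes"
l <- as.logical(c("TRUE", "yes", "F"))

print(n)
print(l)
```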
m<-matrix(nrow=2,ncol=3)
m
[,1] [,2] [,3]
[1,] NA NA NA
[2,] NA NA NA
matrix(1:20,nrow=2,byrow=TRUE) # byrow=TRUE fills the matrix row by row
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
[2,] 11 12 13 14 15 16 17 18 19 20
matrix(1:20,nrow=2) # By default the matrix is filled column by column
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 3 5 7 9 11 13 15 17 19
[2,] 2 4 6 8 10 12 14 16 18 20
x<-1:10
x
[1] 1 2 3 4 5 6 7 8 9 10
dim(x) # A plain vector has no dimensions
NULL
dim(x)<-c(2,5) # Convert the vector 1 to 10 into a 2 x 5 matrix
x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
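The fill order is easy to mix up, so here is a compact side-by-side sketch on a smaller matrix (by_col and by_row are only example names):

```r
by_col <- matrix(1:6, nrow = 2)                 # default: fill down each column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across each row

print(by_col)
print(by_row)

first_row_by_col <- by_col[1, ]   # first row when filled column by column
first_row_by_row <- by_row[1, ]   # first row when filled row by row
```

Setting dim(x)<-c(2,5) on a vector behaves like the default case: the values are laid out column by column.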
Binding vectors column-wise and row-wise
x<-1:10
y<-21:30
cbind(x,y) # Bind the vectors as columns
x y
[1,] 1 21
[2,] 2 22
[3,] 3 23
[4,] 4 24
[5,] 5 25
[6,] 6 26
[7,] 7 27
[8,] 8 28
[9,] 9 29
[10,] 10 30
rbind(x,y) # Bind the vectors as rows
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x 1 2 3 4 5 6 7 8 9 10
y 21 22 23 24 25 26 27 28 29 30
x<-c(1,2)
y<-c(3,4,5)
cbind(x,y)
x y
[1,] 1 3
[2,] 2 4
[3,] 1 5
Warning message:
In cbind(x, y) :
number of rows of result is not a multiple of vector length (arg 1)
A list can store values of different data types, and we can access specific values using index values
x<-list(c(1,2,3),c("Ranjith","Kumar")) # Store different types of values in a list
x
[[1]]
[1] 1 2 3
[[2]]
[1] "Ranjith" "Kumar"
x[1] # Return a list holding only the 1st element
[[1]]
[1] 1 2 3
x[[c(1,2)]] # 2nd value of the 1st element (1,2)
[1] 2
x[[c(2,1)]] # 1st value of the 2nd element (2,1)
[1] "Ranjith"
x<-list(a=1,b=2,c=3)
x
$a
[1] 1
$b
[1] 2
$c
[1] 3
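With a named list there are three common ways to reach an element. A minimal sketch of the difference between [ ], [[ ]] and $; the list p and its element names are made up for illustration:

```r
p <- list(name = "Ranjith", scores = c(1, 2, 3))

sub_list <- p["scores"]        # single brackets keep the list wrapper
element  <- p[["scores"]]      # double brackets extract the element itself
same_el  <- p$scores           # $ is shorthand for [[ ]] with a name
second   <- p[["scores"]][2]   # then index inside the extracted vector

print(class(sub_list))
print(class(element))
```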
Factors are used to represent categorical data; they can be ordered or unordered
x<-factor(c("yes","yes","no","yes"))
x
[1] yes yes no yes
Levels: no yes
class(x)
[1] "factor"
y<-factor(c("yes","yes","no","yes"),labels=c("2","1")) # Assigning labels to the levels
y
[1] 1 1 2 1
Levels: 2 1
Frequency / table() -> number of occurrences of each level
x<-factor(c("yes","yes","no","yes"))
x
[1] yes yes no yes
Levels: no yes
table(x) # Count the occurrences of each level
x
no yes
1 3
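By default the levels of a factor are sorted alphabetically, which is why "no" comes before "yes" above. If a different order is needed, we can pass it with the levels argument. A short sketch (the names f_default, f_custom, counts and codes are only illustrative):

```r
answers <- c("yes", "yes", "no", "yes")

f_default <- factor(answers)                            # levels sorted: no, yes
f_custom  <- factor(answers, levels = c("yes", "no"))   # explicit level order

counts <- table(f_custom)         # counts follow the level order: yes, then no
codes  <- as.integer(f_custom)    # underlying integer codes of each value

print(levels(f_default))
print(counts)
print(codes)
```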
Questions:
1. Create a list whose first element is 2-element set of names say "Atanu" and "Karam", second element is a set of numbers like 2,3,5,19,2,7,5,5.
2. Check the class of second element of the list.
3. Print the 4th element of the 2nd element of the list.
4. Change the 1st element of the list as factor.
5. Find out the frequency of each element of the 2nd element of the list.
Solutions:
1.x<-list(c("Atanu","Karam"),c(2,3,5,19,2,7,5,5))
x
[[1]]
[1] "Atanu" "Karam"
[[2]]
[1] 2 3 5 19 2 7 5 5
2.class(x[[2]]) # Find the data type / class of values in the 2nd element
[1] "numeric"
3.x[[c(2,4)]]
[1] 19
4.y<-as.factor(x[[1]])
y
[1] Atanu Karam
Levels: Atanu Karam
5.table(x[[2]])
2 3 5 7 19
2 1 3 1 1
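Missing values are denoted by NA or NaN.
is.na() # Test each element for NA (NaN also counts as NA)
is.nan() # Test each element for NaN
complete.cases() # TRUE where nothing is missing (for a vector, the inverse of is.na())
NA values have a class also, so there are integer NA, character NA, etc.
Every NaN is also treated as NA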
x<-c(1,2,3,NA,5,NaN)
is.na(x)
[1] FALSE FALSE FALSE TRUE FALSE TRUE
But an NA is not a NaN
is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE TRUE
complete.cases(x)
[1] TRUE TRUE TRUE FALSE TRUE FALSE
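In practice these tests are mostly used to count or drop the missing values before a calculation. A minimal sketch on the same vector (the result names are only illustrative):

```r
x <- c(1, 2, 3, NA, 5, NaN)

n_missing <- sum(is.na(x))            # NA and NaN are counted together
clean     <- x[!is.na(x)]             # keep only the observed values
avg_raw   <- mean(x)                  # result is missing because x has NA/NaN
avg_clean <- mean(x, na.rm = TRUE)    # (1 + 2 + 3 + 5) / 4

print(n_missing)
print(clean)
print(avg_clean)
```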
Data frames are used to store tabular data
Once data is stored in such a table, the object has class data.frame
Each column should have the same number of rows (length)
It also has a special attribute called row.names
Different columns can hold values of different classes/data types
Data frames are usually created by calling read.table() or read.csv()
It can also be converted to a matrix by calling data.matrix()
x<-data.frame(foo=1:4,bar=c(T,T,F,F))
x
foo bar
1 1 TRUE
2 2 TRUE
3 3 FALSE
4 4 FALSE
nrow(x) # Print no of rows in x data frame
[1] 4
ncol(x) # Print no of columns in x data frame
[1] 2
names(x) # or colnames(x): display the column names
[1] "foo" "bar"
row.names(x) # or rownames(x): display the row names
[1] "1" "2" "3" "4"
Creating a data frame with columns of different classes
x<-data.frame(name=c("Ranjith","Pream","Sharavanan"),age=c(26,23,27),gender=c("M","M","M")) # store in data frame
x
name age gender
1 Ranjith 26 M
2 Pream 23 M
3 Sharavanan 27 M
class(x)
[1] "data.frame"
y<-cbind(name=c("Ranjith","Pream","Sharavanan"),age=c(26,23,27),gender=c("M","M","M")) # store in matrix
y
name age gender
[1,] "Ranjith" "26" "M"
[2,] "Pream" "23" "M"
[3,] "Sharavanan" "27" "M"
class(y)
[1] "matrix"
Converting matrix class to data frame class
z<-as.data.frame(y) # Convert matrix to data frame and assign to z
z
name age gender
1 Ranjith 26 M
2 Pream 23 M
3 Sharavanan 27 M
class(z)
[1] "data.frame"
str(x) # Overview of data frame
'data.frame': 3 obs. of 3 variables: # 3 obs/rows and 3 variables/cols/attributes
$ name : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
$ age : num 26 23 27
$ gender: Factor w/ 1 level "M": 1 1 1
str(x$age) # display the overview of age column
num [1:3] 26 23 27
Converting class of individual columns
x[,1]=as.character(x[,1]) # Convert the first column from factor to character
x
str(x)
'data.frame': 3 obs. of 3 variables:
$ name : chr "Ranjith" "Pream" "Sharavanan"
$ age : num 26 23 27
$ gender: Factor w/ 1 level "M": 1 1 1
x[,2]=as.integer(x[,2]) # Convert the second column from numeric to integer
str(x)
'data.frame': 3 obs. of 3 variables:
$ name : chr "Ranjith" "Pream" "Sharavanan"
$ age : int 26 23 27
$ gender: Factor w/ 1 level "M": 1 1 1
View(x) # display the data frame in a table format
To convert a factor column to integer in a data frame, we must convert it to character first
Otherwise as.integer() returns only the internal level codes, like below
str(x)
'data.frame': 3 obs. of 3 variables:
$ name : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
$ age : Factor w/ 3 levels "23","26","27": 2 1 3
$ gender: Factor w/ 1 level "M": 1 1 1
x[,2]=as.integer(x[,2]) # Convert factor into integer
str(x)
'data.frame': 3 obs. of 3 variables:
$ name : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
$ age : int 2 1 3
$ gender: Factor w/ 1 level "M": 1 1 1
First we convert the factor into character, then into integer
str(x)
'data.frame': 3 obs. of 3 variables:
$ name : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
$ age : Factor w/ 3 levels "23","26","27": 2 1 3
$ gender: Factor w/ 1 level "M": 1 1 1
x[,2]=as.integer(as.character(x[,2])) # Convert from factor->character->integer
str(x)
'data.frame': 3 obs. of 3 variables:
$ name : Factor w/ 3 levels "Pream","Ranjith",..: 2 1 3
$ age : int 26 23 27
$ gender: Factor w/ 1 level "M": 1 1 1
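The same pitfall can be reproduced on a standalone factor, without a data frame. A minimal sketch; the variable names are only illustrative, and the last line shows a commonly used equivalent idiom:

```r
age <- factor(c("26", "23", "27"))         # levels are sorted: "23" "26" "27"

codes   <- as.integer(age)                 # level codes, not the ages
values  <- as.integer(as.character(age))   # the actual numbers
values2 <- as.integer(levels(age))[age]    # same result, converting the levels once

print(codes)
print(values)
```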
Changing column names
z=data.frame(c("Ranjith","Pream","Sharavanan"),c(26,23,27),c("M","M","M"))
z
c..Ranjith....Pream....Sharavanan.. c.26..23..27. c..M....M....M..
1 Ranjith 26 M
2 Pream 23 M
3 Sharavanan 27 M
colnames(z)<-c("Name","Age","Gender") # Changing default names into our own(we can also use names() function)
colnames(z)
[1] "Name" "Age" "Gender"
z
Name Age Gender
1 Ranjith 26 M
2 Pream 23 M
3 Sharavanan 27 M
Importing data sets from a file or URL
#wine<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv")
#write.csv(wine,"C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv")
wine<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv")
#adult<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\adult.csv")
#write.csv(adult,"C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\adult.csv")
adult<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\adult.csv")
#iris<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\iris.csv")
#write.csv(iris,"C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\iris.csv")
iris<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\iris.csv")
Go to Tools->Import Dataset->From Web URL->give the URL of the data set
wine1<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv") # Read file & load into wine1
summary(dataset) # Print the min, 1st quartile, median, mean, 3rd quartile and max of each column
summary(wine)
V1 V2 V3 V4 V5 V6 V7
Min. :1.000 Min. :11.03 Min. :0.740 Min. :1.360 Min. :10.60 Min. : 70.00 Min. :0.980
1st Qu.:1.000 1st Qu.:12.36 1st Qu.:1.603 1st Qu.:2.210 1st Qu.:17.20 1st Qu.: 88.00 1st Qu.:1.742
Median :2.000 Median :13.05 Median :1.865 Median :2.360 Median :19.50 Median : 98.00 Median :2.355
Mean :1.938 Mean :13.00 Mean :2.336 Mean :2.367 Mean :19.49 Mean : 99.74 Mean :2.295
3rd Qu.:3.000 3rd Qu.:13.68 3rd Qu.:3.083 3rd Qu.:2.558 3rd Qu.:21.50 3rd Qu.:107.00 3rd Qu.:2.800
Max. :3.000 Max. :14.83 Max. :5.800 Max. :3.230 Max. :30.00 Max. :162.00 Max. :3.880
V8 V9 V10 V11 V12 V13 V14
Min. :0.340 Min. :0.1300 Min. :0.410 Min. : 1.280 Min. :0.4800 Min. :1.270 Min. : 278.0
1st Qu.:1.205 1st Qu.:0.2700 1st Qu.:1.250 1st Qu.: 3.220 1st Qu.:0.7825 1st Qu.:1.938 1st Qu.: 500.5
Median :2.135 Median :0.3400 Median :1.555 Median : 4.690 Median :0.9650 Median :2.780 Median : 673.5
Mean :2.029 Mean :0.3619 Mean :1.591 Mean : 5.058 Mean :0.9574 Mean :2.612 Mean : 746.9
3rd Qu.:2.875 3rd Qu.:0.4375 3rd Qu.:1.950 3rd Qu.: 6.200 3rd Qu.:1.1200 3rd Qu.:3.170 3rd Qu.: 985.0
Max. :5.080 Max. :0.6600 Max. :3.580 Max. :13.000 Max. :1.7100 Max. :4.000 Max. :1680.0
Exporting data sets into csv
write.csv(wine,"C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\wine.csv")
wine$V10 # Print the 10th col values
[1] 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 2.38 1.57 1.81 2.81 2.96 1.46 1.97 1.72 1.86 1.66 2.10 1.98 1.69
[24] 1.46 1.66 1.92 1.45 1.35 1.76 1.98 2.38 1.95 1.97 1.35 1.54 1.86 1.36 1.44 1.37 2.08 2.34 1.48 1.70 1.66 2.03 1.25
[47] 2.19 2.14 2.38 2.08 2.91 2.29 1.87 1.68 1.62 2.45 2.03 1.66 2.04 0.42 0.41 0.62 0.73 1.87 1.03 2.08 2.28 1.04 0.42
[70] 2.50 1.46 1.87 1.03 1.96 1.65 1.15 1.46 0.95 2.76 1.95 1.43 1.77 1.40 1.62 2.35 1.46 1.56 1.34 1.35 1.38 1.64 1.63
[93] 1.62 1.99 1.35 3.28 1.56 1.77 1.95 2.81 1.40 1.35 1.31 1.42 1.48 1.42 1.63 1.63 2.08 2.49 3.58 1.22 1.05 1.44 1.04
[116] 2.01 1.53 1.61 0.83 1.87 1.83 1.87 1.71 2.01 2.91 1.35 1.77 1.76 1.90 1.35 0.94 0.83 0.83 0.84 1.25 0.94 0.80 1.10
[139] 0.88 0.81 0.75 0.64 0.55 1.02 1.14 1.30 0.68 0.86 1.25 1.14 1.25 1.26 1.56 1.87 1.40 1.55 1.56 1.14 2.70 2.29 1.04
[162] 0.80 0.96 0.94 1.03 1.15 1.46 0.97 1.54 1.11 0.73 0.64 1.24 1.06 1.41 1.35 1.46 1.35
Removing columns and rows from data set
wine4<-wine1[,-c(1,15)] # Remove the 1st & 15th columns from the wine1 data set
wine5<-wine1[1:100,] # Keep the first 100 rows
wine5<-wine1[10:50,5:10] # Keep rows 10-50 & columns 5-10
Filtering rows depending on the condition given
iris_setosa<-subset(iris1,class=="Iris-setosa") # Keep only the rows whose class is Iris-setosa
iris4<-subset(iris1,petal.length==1.4) # Keep rows where petal.length is equal to 1.4
iris5<-subset(iris1,sepal.width>=3.1) # Keep rows where sepal.width is greater than or equal to 3.1
iris5<-subset(iris1,sepal.width>3.0 & petal.width==0.2) # Keep rows where sepal.width is greater than 3.0 and petal.width is equal to 0.2
iris5<-subset(iris1,class=="Iris-setosa" & petal.width==0.2) # Keep rows where class is "Iris-setosa" and petal.width is equal to 0.2
iris5<-subset(iris1,class=="Iris-setosa" | petal.width==0.2) # Keep rows where class is "Iris-setosa" or petal.width is equal to 0.2
Subsetting using the "which" function
iris6<-iris1[which(iris1$sepal.width>3.0 & iris1$petal.width==0.2),]
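To try the two styles without the CSV file, here is a sketch on the copy of iris that ships with R. Note the assumption: that built-in copy uses the column names Sepal.Width, Petal.Width and Species, not the lowercase names used in these notes.

```r
iris_builtin <- datasets::iris   # built-in copy, columns Sepal.Length ... Species

# Same filter written with subset() and with which()
a <- subset(iris_builtin, Sepal.Width > 3.0 & Petal.Width == 0.2)
b <- iris_builtin[which(iris_builtin$Sepal.Width > 3.0 &
                        iris_builtin$Petal.Width == 0.2), ]

same_rows <- identical(rownames(a), rownames(b))   # both pick the same rows
n_setosa  <- nrow(subset(iris_builtin, Species == "setosa"))

print(same_rows)
print(n_setosa)
```

Both subset() and which() drop rows where the condition is NA, while plain logical indexing without which() would return NA rows for them.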
Questions:
Import Adult dataset in R studio then do following:
1.Rename the variables
2.How many observations have income > 50k, and how many <= 50k
3.How many observations have age greater than 20 & income > 50k
4.How many observations have age less than 30 or income > 50k
5.How many observations have education bachelors and age less than 24
Solutions:
1. colnames(adult)<-c("age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","income")
2. adult2<-subset(adult,income==" >50K") # nrow(adult2) gives the number of observations
adult2<-subset(adult,income==" <=50K")
3. adult2<-subset(adult,income==" >50K" & age>20)
4. adult3<-adult[which(adult$age<30 | adult$income==" >50K"),] # | means "or"
5. adult4<-adult[which(adult$education==" Bachelors" & adult$age<24),]
library(stringr) # str_trim() comes from the stringr package
adult$education=str_trim(adult$education) # Remove leading/trailing spaces in the column
for(i in 1:15){adult8[,i]=str_trim(adult8[,i])} # Removing spaces in all columns
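If the stringr package is not installed, base R's trimws() does the same trimming. A small sketch with made-up values that imitate the leading spaces in the adult data:

```r
edu <- c(" Bachelors", " HS-grad ", "Masters")   # example values with stray spaces

trimmed <- trimws(edu)                  # remove leading and trailing whitespace
n_bach  <- sum(trimmed == "Bachelors")  # the comparison now works without the space

print(trimmed)
print(n_bach)
```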
nchar("Ranjith") # Print no of characters in given string
[1] 7
a<-c("Ranjith","Kumar","Hai")
nchar(a) # Print no of characters in each string
[1] 7 5 3
substr("Statistics",1,4) # It will print 1 to 4 characters from given string
[1] "Stat"
substr("Statistics",7,10) # It will print 7 to 10 characters from given string
[1] "tics"
substr(a,1,3) # It will print 1 to 3 characters from all given string
[1] "Ran" "Kum" "Hai"
Replace string
#sub(old,new,string)
s<-"Curly is the smart one. Curly is funny, too."
# sub() replaces "Curly" with "Meo" only at the 1st occurrence in each string
s1<-c("Curly is the smart one. Curly is funny, too.","Curly is beauty.")
sub("Curly","Meo",s1)
[1] "Meo is the smart one. Curly is funny, too." "Meo is beauty."
gsub("Curly","Meo",s) # Replace "Curly" with "Meo" at every occurrence
[1] "Meo is the smart one. Meo is funny, too."
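One thing to keep in mind with sub() and gsub() is that the pattern is a regular expression, so characters such as "." have a special meaning unless fixed=TRUE is given. A short sketch (the result names are only illustrative):

```r
s <- "Curly is the smart one. Curly is funny, too."

first_only <- sub("Curly", "Meo", s)    # replaces the first match only
all_hits   <- gsub("Curly", "Meo", s)   # replaces every match

# As a regular expression "." matches any character;
# fixed = TRUE treats the pattern as literal text instead
regex_dot   <- gsub(".", "!", "a.b")
literal_dot <- gsub(".", "!", "a.b", fixed = TRUE)

print(c(regex_dot, literal_dot))
```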
# Use outer and paste to create matrix with all possible combinations
locations<-c("NY","LA","CHI","HOU")
treatments<-c("T1","T2","T3")
outer(locations,treatments,paste,sep="-")
[,1] [,2] [,3]
[1,] "NY-T1" "NY-T2" "NY-T3"
[2,] "LA-T1" "LA-T2" "LA-T3"
[3,] "CHI-T1" "CHI-T2" "CHI-T3"
[4,] "HOU-T1" "HOU-T2" "HOU-T3"
paste("Ranjith","Kumar",sep="-") # Combine both strings with the separator value
[1] "Ranjith-Kumar"
class(paste("Ranjith","Kumar",sep=" "))
[1] "character"
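paste() is vectorised, and it also has a collapse argument that joins the whole result into one string. A minimal sketch (ids, ids0 and joined are only example names):

```r
ids    <- paste("T", 1:3, sep = "")      # element-wise: one string per number
ids0   <- paste0("T", 1:3)               # paste0() is paste() with sep = ""
joined <- paste(ids, collapse = ", ")    # collapse the vector into one string

print(ids)
print(joined)
```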
Sys.Date() # Display the current date
[1] "2015-10-06"
date()
[1] "Tue Oct 06 15:26:01 2015"
class(Sys.Date())
[1] "Date"
# Convert string into date
as.Date("10/6/2015") # Wrong way: without a format the string is read as year/month/day
[1] "0010-06-20"
as.Date("10/6/15",format="%d/%m/%y")
[1] "2015-06-10"
as.Date("10/6/2015",format="%d/%m/%Y") # For a 4-digit year we should use "%Y"
[1] "2015-06-10"
as.Date("6-10-15",format="%m-%d-%y") # For a 2-digit year we should use "%y"
[1] "2015-06-10"
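Once strings are converted to the Date class we can do arithmetic with them. A small sketch using two of the dates from these notes (the variable names are only illustrative):

```r
d1 <- as.Date("10/6/2015", format = "%d/%m/%Y")   # 10 June 2015
d2 <- as.Date("2015-10-06")                       # ISO format needs no format string

gap_days  <- as.numeric(difftime(d2, d1, units = "days"))   # days between the dates
next_week <- d1 + 7                                         # adding a number adds days

print(gap_days)
print(next_week)
```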
ISO Dates
# ISOdate(year,month,day)
ISOdate(2015,10,26) # It will print ISOdate with default time (Time is optional)
[1] "2015-10-26 12:00:00 GMT"
ISOdate(2015,10,6,10,15,16,tz="GMT") # It will print ISOdate with given time
[1] "2015-10-06 10:15:16 GMT"
#Convert ISOdate into date
as.Date(ISOdate(2015,10,26))
[1] "2015-10-26"
# ISOdatetime(year,month,day,hour,minute,seconds)
ISOdatetime(2015,10,26,10,36,57)
[1] "2015-10-26 10:36:57 IST" # By default the local time zone is used (IST on this machine)
ISOdatetime(2015,10,6) # With ISOdatetime the time arguments are mandatory
Error in ISOdatetime(2015, 10, 6) :
argument "min" is missing, with no default
Extracting day/month/year from a date and doing calculations
s<-ISOdate(2015,2,25)
y<-as.Date(s)
format(y,"%Y")
[1] "2015"
format(y,"%d")
[1] "25"
class(format(y,"%d"))
[1] "character"
# To subtract one year from another, we should convert the formatted year into an integer
as.integer(format(y,"%Y"))
[1] 2015
2020-as.integer(format(y,"%Y")) # Subtract the year from 2020 after converting into integer
[1] 5
plot(cars) # Scatter plot for cars data frame
plot(cars,main="CARS",xlab="Speed",ylab="Distance") # Scatter plot for cars data frame with our own labels
grid() # Add grid lines to the existing scatter plot
lines(cars) # Join all the points with lines
colnames(iris)<-c("sepal.length","sepal.width","petal.length","petal.width","class")
with(iris,plot(petal.length,petal.width,pch=as.integer(class)))
colnames(adult)<-c("age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hours-per-week","native-country","income")
f<-factor(iris$class)
legend(1.5,2.4,as.character(levels(f)),pch=17:19)
legend(1.5,2.4,as.character(levels(f)),pch=1:length(levels(f)))
Conditioning plot (coplot)
data(Cars93,package = "MASS")
coplot(Horsepower~MPG.city | Origin,data=Cars93) # Plot two variables against each other, one panel per level of a third
coplot(Type~Model | AirBags,data=Cars93)
#http://ww2.coastal.edu/kingw/statistics/R-tutorials/simplelinear.html
library(MASS) # The cats data set ships with the MASS package
data(cats)
str(cats)
## 'data.frame': 144 obs. of 3 variables:
## $ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
## $ Bwt: num 2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
## $ Hwt: num 7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...
summary(cats)
## Sex Bwt Hwt
## F:47 Min. :2.000 Min. : 6.30
## M:97 1st Qu.:2.300 1st Qu.: 8.95
## Median :2.700 Median :10.10
## Mean :2.724 Mean :10.63
## 3rd Qu.:3.025 3rd Qu.:12.12
## Max. :3.900 Max. :20.50
with(cats,plot(Bwt,Hwt))
title(main="Heart Weight vs. Body Weight plot")
with(cats,plot(Hwt~Bwt))
with(cats,cor(Bwt,Hwt))
## [1] 0.8041274
with(cats,cor(Bwt,Hwt))^2
## [1] 0.6466209
with(cats,cor.test(Bwt,Hwt))
##
## Pearson's product-moment correlation
##
## data: Bwt and Hwt
## t = 16.119, df = 142, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7375682 0.8552122
## sample estimates:
## cor
## 0.8041274
with(cats,cor.test(Bwt,Hwt,alternative = "greater",conf.level = .8))
##
## Pearson's product-moment correlation
##
## data: Bwt and Hwt
## t = 16.119, df = 142, p-value < 2.2e-16
## alternative hypothesis: true correlation is greater than 0
## 80 percent confidence interval:
## 0.7776141 1.0000000
## sample estimates:
## cor
## 0.8041274
with(cats,cor.test(~Bwt+Hwt))
##
## Pearson's product-moment correlation
##
## data: Bwt and Hwt
## t = 16.119, df = 142, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7375682 0.8552122
## sample estimates:
## cor
## 0.8041274
with(cats, cor.test(~ Bwt + Hwt, subset=(Sex=="F")))
##
## Pearson's product-moment correlation
##
## data: Bwt and Hwt
## t = 4.2152, df = 45, p-value = 0.0001186
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2890452 0.7106399
## sample estimates:
## cor
## 0.5320497
with(cats, plot(Bwt, Hwt, type="n", xlab="Body Weight in kg",ylab="Heart Weight in g",main="Heart Weight vs. Body Weight of Cats"))
with(cats,points(Bwt[Sex=="F"],Hwt[Sex=="F"],pch=16,col="red"))
with(cats,points(Bwt[Sex=="M"],Hwt[Sex=="M"],pch=17,col="blue"))
data(cement)
str(cement)
## 'data.frame': 13 obs. of 5 variables:
## $ x1: int 7 1 11 11 7 11 3 1 2 21 ...
## $ x2: int 26 29 56 31 52 55 71 31 54 47 ...
## $ x3: int 6 15 8 8 6 9 17 22 18 4 ...
## $ x4: int 60 52 20 47 33 22 6 44 22 26 ...
## $ y : num 78.5 74.3 104.3 87.6 95.9 ...
cor(cement)
## x1 x2 x3 x4 y
## x1 1.0000000 0.2285795 -0.8241338 -0.2454451 0.7307175
## x2 0.2285795 1.0000000 -0.1392424 -0.9729550 0.8162526
## x3 -0.8241338 -0.1392424 1.0000000 0.0295370 -0.5346707
## x4 -0.2454451 -0.9729550 0.0295370 1.0000000 -0.8213050
## y 0.7307175 0.8162526 -0.5346707 -0.8213050 1.0000000
cov(cement)
## x1 x2 x3 x4 y
## x1 34.60256 20.92308 -31.051282 -24.166667 64.66346
## x2 20.92308 242.14103 -13.878205 -253.416667 191.07949
## x3 -31.05128 -13.87821 41.025641 3.166667 -51.51923
## x4 -24.16667 -253.41667 3.166667 280.166667 -206.80833
## y 64.66346 191.07949 -51.519231 -206.808333 226.31359
cov.matr=cov(cement)
cov2cor(cov.matr)
## x1 x2 x3 x4 y
## x1 1.0000000 0.2285795 -0.8241338 -0.2454451 0.7307175
## x2 0.2285795 1.0000000 -0.1392424 -0.9729550 0.8162526
## x3 -0.8241338 -0.1392424 1.0000000 0.0295370 -0.5346707
## x4 -0.2454451 -0.9729550 0.0295370 1.0000000 -0.8213050
## y 0.7307175 0.8162526 -0.5346707 -0.8213050 1.0000000
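The output of cov2cor() matches cor() because a correlation is just a covariance divided by the two standard deviations. A sketch that checks this on the built-in mtcars data (an assumption made so the example runs without the MASS package; the column choice is arbitrary):

```r
m <- datasets::mtcars[, c("mpg", "disp", "hp", "wt")]

from_cov <- cov2cor(cov(m))   # rescale the covariance matrix
direct   <- cor(m)            # correlation matrix computed directly
agree    <- isTRUE(all.equal(from_cov, direct))

# One entry by hand: cov(x, y) / (sd(x) * sd(y))
by_hand <- cov(m$mpg, m$wt) / (sd(m$mpg) * sd(m$wt))

print(agree)
print(by_hand)
```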
pairs(cement)
autompg<-read.csv("C:\\Users\\localadmin\\Desktop\\Ranjith\\R\\autompg.csv")
colnames(autompg)<-c("mpg","cylinders","displacement","horsepower","weight","acceleration","model_year","origin","car_name")
autompg1=na.omit(autompg)
colnames(autompg1)<-c("mpg","cylinders","displacement","horsepower","weight","acceleration","model_year","origin","car_name")
cor(autompg1[,c(1,3,4,5,6)])
## mpg displacement horsepower weight acceleration
## mpg 1.0000000 -0.3615028 -0.3885417 -0.4246212 -0.3223630
## displacement -0.3615028 1.0000000 0.9508233 0.8429834 0.8975273
## horsepower -0.3885417 0.9508233 1.0000000 0.8972570 0.9329944
## weight -0.4246212 0.8429834 0.8972570 1.0000000 0.8645377
## acceleration -0.3223630 0.8975273 0.9329944 0.8645377 1.0000000
lm.out=lm(horsepower~mpg+displacement+weight+acceleration,autompg1)
summary(lm.out)
##
## Call:
## lm(formula = horsepower ~ mpg + displacement + weight + acceleration,
## data = autompg1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -85.950 -11.982 1.236 12.490 105.397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.370e+02 5.813e+00 -23.573 < 2e-16 ***
## mpg -2.087e-02 1.167e-02 -1.788 0.0745 .
## displacement 3.077e+01 1.688e+00 18.232 < 2e-16 ***
## weight 5.777e-01 6.791e-02 8.507 3.96e-16 ***
## acceleration 3.593e-02 3.646e-03 9.857 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23.83 on 387 degrees of freedom
## Multiple R-squared: 0.9487, Adjusted R-squared: 0.9481
## F-statistic: 1788 on 4 and 387 DF, p-value: < 2.2e-16
#dev.off() # Close the current graphics device (graphics.off() closes all of them)
par(mfrow=c(1,2))
plot(lm.out)
par(mfrow=c(2,2))
plot(lm.out)
par(mfrow=c(2,1))
plot(lm.out)
predict(lm.out,autompg[7,])
## 7
## 392.5502
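predict() simply evaluates the fitted equation for new rows. A minimal sketch on the built-in cars data, so it runs without the autompg file; the new observation with speed 15 is hypothetical:

```r
fit <- lm(dist ~ speed, data = datasets::cars)   # simple linear regression

b       <- coef(fit)                    # intercept and slope
new_car <- data.frame(speed = 15)       # hypothetical new observation
pred    <- predict(fit, newdata = new_car)

# The same prediction computed by hand from the coefficients
by_hand <- b[["(Intercept)"]] + b[["speed"]] * 15

print(pred)
print(by_hand)
```

The column names in newdata must match the predictor names used in the formula, otherwise predict() cannot find them.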
Comments
Lines that start with # (or ##) are comments