A data frame is a two-dimensional data structure in R. It is a list whose components all have equal length. Each column in a data frame can store a different class of object.
We can create a data frame by importing it from an external file (JSON, CSV), loading it from a built-in R dataset, or creating it manually.
df <- data.frame(int = 1:3, boolean = c(TRUE, TRUE, FALSE), str = c("one", "two", "three"))
print(df)
## int boolean str
## 1 1 TRUE one
## 2 2 TRUE two
## 3 3 FALSE three
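To make the "list with equal-length components" description concrete, the following minimal sketch inspects a data frame with base R functions. The name `df_demo` is hypothetical and only used here so the tutorial's `df` is left alone. The class reported for the `str` column depends on the R version: character from R 4.0.0 onward, factor in older versions with default settings.

```r
# Rebuild the example under a separate, hypothetical name
df_demo <- data.frame(int = 1:3,
                      boolean = c(TRUE, TRUE, FALSE),
                      str = c("one", "two", "three"))

is.list(df_demo)        # a data frame is a list under the hood
lengths(df_demo)        # every component (column) has the same length
sapply(df_demo, class)  # each column keeps its own class
```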
A data frame can also be created from existing vectors.
a <- c(1, 2, 3)
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(a, b)
print(df)
## a b
## 1 1 TRUE
## 2 2 FALSE
## 3 3 TRUE
When creating a data frame, each component needs to have the same length (or a length that R can recycle evenly); otherwise `data.frame()` stops with an error.
try(data.frame(int = 1:3, boolean = c(T,T)))
## Error in data.frame(int = 1:3, boolean = c(T, T)) :
## arguments imply differing number of rows: 3, 2
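One nuance worth sketching: `data.frame()` recycles a shorter atomic vector when the number of rows is a whole multiple of its length, and only fails when it is not. The example below shows both cases. The names `recycled` and `msg` are hypothetical, and `tryCatch()` is used here simply as one way to capture the error message without stopping the script.

```r
# A length-2 vector is recycled to fill 4 rows, because 4 is a multiple of 2
recycled <- data.frame(int = 1:4, boolean = c(TRUE, FALSE))
print(recycled)

# 3 is not a multiple of 2, so this call fails; capture the message instead
msg <- tryCatch(
  data.frame(int = 1:3, boolean = c(TRUE, TRUE)),
  error = function(e) conditionMessage(e)
)
print(msg)
```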
### Reading from an R dataset
data("mtcars")
print(head(mtcars))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
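Importing from an external file is mentioned at the start but not shown, so here is a minimal sketch for the CSV case. To stay self-contained, it first writes a few rows of `mtcars` to a temporary file and then reads them back; the names `csv_path` and `cars_csv` are hypothetical. JSON is not covered because base R has no JSON reader and an add-on package would be needed.

```r
# Write a small data frame to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 3), csv_path)

# row.names = 1 uses the first column of the file as the row names
cars_csv <- read.csv(csv_path, row.names = 1)
print(cars_csv)

unlink(csv_path)  # clean up the temporary file
```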
We can view and edit the column and row labels with these functions.
#see column name
names(df)
## [1] "a" "b"
#edit column name
names(df) <- c("one","two")
names(df)
## [1] "one" "two"
#see row names
row.names(df)
## [1] "1" "2" "3"
#edit row names
row.names(df) <- c("a","b","c")
row.names(df)
## [1] "a" "b" "c"
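Replacing the whole `names()` vector works for small data frames, but it is often handier to rename a single column. The sketch below shows one common base R idiom for that; `labels_demo` and the new column name `flag` are hypothetical.

```r
labels_demo <- data.frame(a = c(1, 2, 3), b = c(TRUE, FALSE, TRUE))

# Rename only column "b", leaving the other columns untouched
names(labels_demo)[names(labels_demo) == "b"] <- "flag"
names(labels_demo)

# colnames() and rownames() return the same labels for a data frame
colnames(labels_demo)
rownames(labels_demo)
```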
We can check the dimensions of our data frame with these functions.
#check row size
nrow(df)
## [1] 3
#check col size
ncol(df)
## [1] 2
#length() also gives the column count, because a data frame is a list of columns
length(df)
## [1] 2
#check overall dimension
dim(df)
## [1] 3 2
We can access components of a data frame with these methods.
df["one"]
## one
## a 1
## b 2
## c 3
df[1,2]
## [1] TRUE
df[,2]
## [1] TRUE FALSE TRUE
df[1,]
## one two
## a 1 TRUE
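Besides single-bracket indexing, base R offers `$` and `[[ ]]` for pulling out a column as a vector, and logical indexing for filtering rows. The following sketch illustrates these on a hypothetical copy named `access_demo`, built to match the data frame used above.

```r
access_demo <- data.frame(one = c(1, 2, 3),
                          two = c(TRUE, FALSE, TRUE),
                          row.names = c("a", "b", "c"))

access_demo$one                      # column as a plain vector
access_demo[["two"]]                 # same idea, column name given as a string
access_demo["b", "one"]              # single cell, by row and column name
access_demo[access_demo$two, ]       # rows where column 'two' is TRUE
access_demo[, "one", drop = FALSE]   # keep the result as a data frame
```

Without `drop = FALSE`, selecting a single column with `[ , ]` returns a vector, as `df[,2]` does above.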
We can add new components (rows with `rbind()`, columns with `cbind()`) to a data frame using these methods.
df
## one two
## a 1 TRUE
## b 2 FALSE
## c 3 TRUE
df <- rbind(df, list(1, TRUE))
df
## one two
## a 1 TRUE
## b 2 FALSE
## c 3 TRUE
## 4 1 TRUE
df
## one two
## a 1 TRUE
## b 2 FALSE
## c 3 TRUE
## 4 1 TRUE
df <- cbind(df, newcol = c("one", "two", "three", "four"))
df
## one two newcol
## a 1 TRUE one
## b 2 FALSE two
## c 3 TRUE three
## 4 1 TRUE four
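As an alternative to `cbind()`, a column can be added by assigning to a new name with `$`, and a row can be appended from a one-row data frame whose column names match. This is a minimal sketch on a hypothetical data frame `add_demo`; the column names `three` and `doubled` are made up for illustration.

```r
add_demo <- data.frame(one = c(1, 2, 3), two = c(TRUE, FALSE, TRUE))

# Add a column by assigning to a new name
add_demo$three <- c("x", "y", "z")

# Add a column computed from an existing one
add_demo$doubled <- add_demo$one * 2

# Append a row from a one-row data frame with matching column names
new_row <- data.frame(one = 4, two = FALSE, three = "w", doubled = 8)
add_demo <- rbind(add_demo, new_row)
print(add_demo)
```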
We can remove components from a data frame using these methods: assign `NULL` to drop a column, or use a negative index to drop a row.
df
## one two newcol
## a 1 TRUE one
## b 2 FALSE two
## c 3 TRUE three
## 4 1 TRUE four
df$newcol <- NULL
df
## one two
## a 1 TRUE
## b 2 FALSE
## c 3 TRUE
## 4 1 TRUE
df
## one two
## a 1 TRUE
## b 2 FALSE
## c 3 TRUE
## 4 1 TRUE
df <- df[-2, ]
df
## one two
## a 1 TRUE
## c 3 TRUE
## 4 1 TRUE
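Removal can also be expressed by name or by condition rather than by position, which tends to be less fragile when the data changes. The sketch below assumes a hypothetical data frame `remove_demo` shaped like the one above and stores each result under a new name instead of overwriting the original.

```r
remove_demo <- data.frame(one = c(1, 2, 3, 1),
                          two = c(TRUE, FALSE, TRUE, TRUE),
                          newcol = c("one", "two", "three", "four"))

# Drop a column by name, leaving remove_demo itself unchanged
no_newcol <- remove_demo[, names(remove_demo) != "newcol"]

# Drop rows by condition instead of by position
only_true <- remove_demo[remove_demo$two, ]

# drop = FALSE keeps a data frame even when only one column is left
one_col <- remove_demo[, "one", drop = FALSE]
```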
We can get a summary of every column in a data frame using these functions: `str()` shows the structure and `summary()` shows descriptive statistics.
str(df)
## 'data.frame': 3 obs. of 2 variables:
## $ one: num 1 3 1
## $ two: logi TRUE TRUE TRUE
summary(df)
## one two
## Min. :1.000 Mode:logical
## 1st Qu.:1.000 TRUE:3
## Median :1.000
## Mean :1.667
## 3rd Qu.:2.000
## Max. :3.000
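When only one statistic is needed, it can be computed directly rather than read off the `summary()` printout. Here is a small sketch on a hypothetical data frame `stats_demo` holding the same values as `df` above.

```r
stats_demo <- data.frame(one = c(1, 3, 1), two = c(TRUE, TRUE, TRUE))

sapply(stats_demo, class)   # class of each column
mean(stats_demo$one)        # mean of a single numeric column
colMeans(stats_demo)        # column means; a logical TRUE counts as 1
```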