Data Frame Practice

This Rmd file is meant to get you familiar with data frames and some of the simpler things one can do with them.

# setwd("C:/Users/APFACER-0497/Documents/R")  # machine-specific folder; nothing below reads or writes files, so this line is optional
RClassNames1<-c("Aabha", "Arima", "Anvesh", "Ardra", "Aashish", "Anagha", "Bhuvana", 
               "Trupti","Shama", "Ramya", "Gangadhar", "Kritika", "Sharon", "Lavanya",
               "Mahalakshmi","Nazir", "Nirupam", "Ramesh", "Kedar", "Michael", 
               "Sudarshan", "Satyavati", "Rajaram" )


set.seed(100)
seating <- matrix(sample(RClassNames1[-23],22,replace=FALSE),4,6,byrow=TRUE)

## Warning in matrix(sample(RClassNames1[-23], 22, replace = FALSE), 4, 6, :
## data length [22] is not a sub-multiple or multiple of the number of rows
## [4]

seating[4,2:5]<-seating[4,1:4] #shifting the first four to the centre
seating[4,c(1,6)]<-c(NA,NA) # deleting the two ends of the last row

We have created a character vector with the class names. We then create a random seating arrangement. Note that we have set the seed: this means we get the same result every time the code runs. (In real use we would not do this, but here it lets different members of the class compare their results.)
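Here is a small check of what set.seed() does. It uses a toy vector 1:10 rather than the class data, and the names first, second and third are illustrative only.

```r
# Setting the same seed before each call reproduces the same "random" draws.
set.seed(100)
first <- sample(1:10)
set.seed(100)
second <- sample(1:10)
identical(first, second)   # TRUE: same seed, same permutation

# Without resetting the seed, the next call continues the random stream,
# so it will generally give a different permutation of 1:10.
third <- sample(1:10)
third
```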

Since the classroom has six seats per row, we sample this vector without replacement and arrange the names in a 4-row, 6-column matrix, filling it by rows (byrow=TRUE). Note that we remove the last member of the vector (the instructor) with the index [-23]. The matrix has 24 cells but there are only 22 students. matrix() therefore recycles the first two names into the last two cells and issues the warning shown above. The next two lines of code shift the four back-row students to the centre. They then get rid of the repeated (recycled) entries by setting the two end seats of the last row to NA.
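The recycling and the shift-and-blank step can be seen on a smaller scale in this sketch. The letters and the seat labels s19 to s22, s1 and s2 are hypothetical placeholders, not the class names.

```r
# Toy version: 5 names into a 2 x 3 matrix (6 cells), filled by rows.
# The 6th cell is filled by recycling the first name, so matrix() warns;
# suppressWarnings() hides that warning here.
toy <- suppressWarnings(matrix(c("a", "b", "c", "d", "e"), 2, 3, byrow = TRUE))
toy   # row 1: a b c; row 2: d e a (the final "a" is recycled)

# The shift-and-blank trick on a single 6-seat row: four real students
# (s19..s22) followed by two recycled names (s1, s2).
row4 <- c("s19", "s20", "s21", "s22", "s1", "s2")
row4[2:5] <- row4[1:4]   # the right-hand side is evaluated first, so the shift is safe
row4[c(1, 6)] <- NA      # blank the duplicate in seat 1 and the recycled name in seat 6
row4                     # NA, s19, s20, s21, s22, NA
```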

RClassGender <-  c("F","F","M","F","M","F","F","F","F",
                   "F","M","F","F","F","F","M","M","M","M","M","M","F","M")
set.seed(101)
fruitpreference <- sample(c("apple","banana","chikoo"),23,replace=TRUE)
set.seed(102)
beverage <- sample(c("coffee","tea","pepsi","limejuice"),23, replace=TRUE)
set.seed(103)
diet<- sample(c("veg","veg_egg","jain","vegan","non-veg","kosher","halal"),23,replace=TRUE)

The preceding code creates more vectors, all of character type. The gender vector is based on the names and follows the same order as the students. The remaining vectors (fruit, beverage, diet) are filled randomly by sampling with replacement, since many students can share the same preference.
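This sketch shows why replace=TRUE is needed and how table() counts the result. It repeats the fruit sampling under the name fruit, so that fruitpreference is left untouched. The tryCatch() wrapper exists only to capture the error message for display.

```r
# With replace = TRUE each of the 23 draws is made from all three fruits,
# so a fruit can be chosen many times.
set.seed(101)
fruit <- sample(c("apple", "banana", "chikoo"), 23, replace = TRUE)
table(fruit)   # how many students got each fruit; the counts add up to 23

# Without replacement we cannot draw 23 items from only 3, so R stops with an error.
result <- tryCatch(sample(c("apple", "banana", "chikoo"), 23),
                   error = function(e) conditionMessage(e))
result   # the error message, captured as a character string
```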

set.seed(104)
# rnorm() draws from a normal distribution with mean 0 and standard deviation 1.
# Adding a constant gives the mean we want; multiplying by a constant gives the s.d. we want.
height <- 1.6 + 0.075*rnorm(23) + 0.3*(RClassGender == "M")
mheight <- height[RClassGender == "M"]
fheight <- height[RClassGender == "F"]
set.seed(105)
bmim <- 22 +3*rnorm(23)
set.seed(106)
bmif <- 24+2*rnorm(23)
bmi <- ifelse(RClassGender=="M",bmim, bmif)
set.seed(103)
weight<- height^2*bmi + 0.2*rnorm(23)

RClassInfo1 <-data.frame(RClassNames1,RClassGender,diet,beverage,height,weight,bmi)

We are now filling in data for height and weight using a Gaussian (normal) distribution. The heights have a mean of 1.6 metres and a standard deviation of 0.075 metres. Almost all values therefore fall in a band about five to six standard deviations wide, around 0.4 metres. An extra 0.3 metres has been added for the men. Note that the logical vector RClassGender=="M" has been coerced to numeric for this purpose (TRUE becomes 1 and FALSE becomes 0). We are also using logical indexing to create separate vectors mheight and fheight for the men and the women.
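These three ideas can be checked separately in the sketch below: rescaling a standard normal sample, logical-to-numeric coercion and logical indexing. The sample size of 10000, the seed 1 and the four-element gender vector g are hypothetical choices made for illustration only.

```r
# 1. Scaling a standard normal sample: mean 0, s.d. 1 -> mean 1.6, s.d. 0.075.
set.seed(1)
z <- rnorm(10000)
x <- 1.6 + 0.075 * z
c(mean = mean(x), sd = sd(x))        # close to 1.6 and 0.075
mean(abs(x - 1.6) < 3 * 0.075)       # fraction within 3 s.d. of the mean, about 0.997

# 2. Logical-to-numeric coercion: TRUE counts as 1, FALSE as 0.
g <- c("F", "M", "M", "F")           # hypothetical toy gender vector
h <- 1.6 + 0.3 * (g == "M")          # 1.6 1.9 1.9 1.6

# 3. Logical indexing keeps only the positions where the condition is TRUE.
h[g == "M"]                          # 1.9 1.9
h[g == "F"]                          # 1.6 1.6
```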

The weight has been created using a quantity called ‘bmi’ (body mass index), defined as weight in kg divided by the square of the height in metres. Ideally it should be between 20 and 25, with the values for women on the higher side. This is again simulated using normal random vectors with a fixed seed, to reflect variation in the population. Rearranging the definition gives weight = bmi × height^2. That is what the code computes, plus a little extra noise (standard deviation 0.2 kg).
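The sketch below works through the arithmetic of that definition. The heights and BMI values are hypothetical numbers, not taken from the simulated class.

```r
# BMI = weight (kg) / height (m)^2, so weight = BMI * height^2.
height_m   <- 1.6            # hypothetical height in metres
bmi_target <- 24             # hypothetical BMI
weight_kg  <- bmi_target * height_m^2
weight_kg                    # 24 * 2.56 = 61.44 kg
weight_kg / height_m^2       # recovers the BMI of 24

# The same formula works element by element on vectors, as in the class data.
heights <- c(1.6, 1.9)
bmis    <- c(24, 22)
weights <- bmis * heights^2  # 61.44 and 79.42 kg
```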

Notice that we have created two vectors of bmi, one for the men and one for the women. The final consolidated bmi vector is built with the function ifelse, whose syntax is ifelse(logical vector, vector, vector). The result is a vector built element by element. Where the i-th element of the first argument is TRUE, it takes the i-th element of the second argument. Where it is FALSE, it takes the i-th element of the third argument.
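The element-by-element rule can be seen on short toy vectors in this sketch. All values here are hypothetical. The second call mirrors the bmi line, with fixed numbers in place of random ones.

```r
cond <- c(TRUE, FALSE, TRUE, FALSE)
yes  <- c(1, 2, 3, 4)
no   <- c(10, 20, 30, 40)
picked <- ifelse(cond, yes, no)
picked   # 1 20 3 40: element i comes from yes if cond[i] is TRUE, else from no

# Same pattern as the bmi line, on a toy gender vector (hypothetical values).
g <- c("M", "F", "F", "M")
toy_bmi <- ifelse(g == "M", c(22, 22, 22, 22), c(24, 24, 24, 24))
toy_bmi  # 22 24 24 22
```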

RClassInfo1_sorted <- RClassInfo1[order(RClassInfo1$RClassNames1), ]
# order() returns the permutation of indices that would put a vector in ascending
# order (alphabetical for characters, numerical for numbers). Using it as the row
# index reorders whole rows of the data frame, so that the chosen column (in this
# case RClassNames1) is in alphabetical order while each row stays intact.
set.seed(104)

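# Extract the height column in several ways
ht0 <- RClassInfo1[5]            # single brackets, by position: a one-column data frame
ht1 <- RClassInfo1["height"]     # single brackets, by name: also a one-column data frame
ht2 <- RClassInfo1[["height"]]   # double brackets: the column itself, a numeric vector
ht3 <- RClassInfo1$height        # $ operator: the same numeric vector as ht2
hw1 <- RClassInfo1[c("height", "weight")]               # this is a data frame
hw2 <- subset(RClassInfo1, select = c(height, weight))  # all rows, two columns of the data frame
# Rows can be selected by a logical vector condition: here we look for women
# vegetarians and get their names, heights, and weights
hw3 <- subset(RClassInfo1, (diet == "veg") & (RClassGender == "F"),
              select = c(RClassNames1, height, weight))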
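This sketch uses a small toy data frame (df, with made-up heights) to show what order() returns and which extraction operators give a data frame and which give a plain vector. It passes stringsAsFactors = FALSE so that the names stay as characters. That is the default from R 4.0.0 onwards, but older versions converted strings to factors by default.

```r
# Toy data frame with hypothetical values.
df <- data.frame(name = c("Kedar", "Aabha", "Nazir"),
                 height = c(1.9, 1.6, 1.8),
                 stringsAsFactors = FALSE)

idx <- order(df$name)    # 2 1 3: row 2 ("Aabha") first, then row 1, then row 3
df_sorted <- df[idx, ]   # whole rows move together
df_sorted$name           # "Aabha" "Kedar" "Nazir"
df_sorted$height         # 1.6 1.9 1.8 (each height stays with its name)

class(df[2])             # "data.frame": single brackets keep a data frame
class(df["height"])      # "data.frame"
class(df[["height"]])    # "numeric": double brackets extract the column vector
class(df$height)         # "numeric"
```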


Rajaram Nityananda

September 17, 2019