Take a list L generated as follows.
# Creation of list
set.seed(4321)
L <- list(u = sample(c(rep(NA, 4), runif(96))),
          n = rnorm(200),
          t = sample(c(rep(NA, 10), rt(290, df = 3))))
# Calculation of mean, variance, standard error and number of observations of the list elements using sapply, bound row-wise with rbind
mytable <- rbind(sapply(L, mean, na.rm = TRUE),
                 sapply(L, var, na.rm = TRUE),
                 sapply(L, sd, na.rm = TRUE) / sqrt(sapply(L, function(x) length(x[!is.na(x)]))),
                 sapply(L, function(x) length(x[!is.na(x)])))
# Checking the output mytable
mytable
##                u            n            t
## [1,]  0.48994450  -0.03591338  -0.07279339
## [2,]  0.07203766   1.00220674   2.50666720
## [3,]  0.02739329   0.07078865   0.09297139
## [4,] 96.00000000 200.00000000 290.00000000
# Adding names
dimnames(mytable)<-list(c("mean","variance","standard error","observation"),names(L))
mytable
##                          u            n            t
## mean            0.48994450  -0.03591338  -0.07279339
## variance        0.07203766   1.00220674   2.50666720
## standard error  0.02739329   0.07078865   0.09297139
## observation    96.00000000 200.00000000 290.00000000
str(mytable)
## num [1:4, 1:3] 0.4899 0.072 0.0274 96 -0.0359 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:4] "mean" "variance" "standard error" "observation"
## ..$ : chr [1:3] "u" "n" "t"
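Note: lapply is short for "list apply". It applies a function to every element of its input and always returns a list of the same length; the elements of that list need not all be of the same type. sapply performs an lapply and then checks whether the result can be simplified to a vector or a matrix.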
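As a small illustration of this difference, the sketch below applies the same functions with lapply and sapply. The list `toy` is made up for this example and is not part of the assignment data.

```r
# Toy list made up for this illustration (not the assignment data)
toy <- list(a = 1:3, b = c(2.5, NA, 4))

# lapply always returns a list with one element per input element
res_list <- lapply(toy, mean, na.rm = TRUE)
is.list(res_list)      # TRUE

# sapply simplifies the same result to a named numeric vector
res_vec <- sapply(toy, mean, na.rm = TRUE)
is.numeric(res_vec)    # TRUE

# When every call returns a vector of the same length (> 1),
# sapply simplifies to a matrix with one column per list element
res_mat <- sapply(toy, range, na.rm = TRUE)
dim(res_mat)           # 2 2
```

This matrix simplification is what makes it possible to build a table of statistics with one column per list element.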
Approach: I wrote a function that takes a list L as input and returns a matrix in which each column holds the mean, variance, standard error and number of observations of one list element. First, I defined the function as convert_matrix. Next, I used sapply four times to calculate the mean, variance, standard error and number of observations of the list elements. To ignore missing values I passed na.rm=TRUE to mean, var and sd, and I counted the non-missing observations with length(x[!is.na(x)]). I then used rbind to bind the four result vectors row-wise into a matrix, added row and column names with dimnames, rounded the values to two decimal places and used return to output the table from the function.
I called the function convert_matrix with L as input and got the matrix as output.
# Defining the function convert_matrix
convert_matrix <- function(L) {
  # Number of non-missing observations in each list element
  n <- sapply(L, function(x) length(x[!is.na(x)]))
  # Mean, variance, standard error and number of observations, bound row-wise with rbind
  mytable <- rbind(sapply(L, mean, na.rm = TRUE),
                   sapply(L, var, na.rm = TRUE),
                   sapply(L, sd, na.rm = TRUE) / sqrt(n),
                   n)
  # Adding names to mytable
  dimnames(mytable) <- list(c("mean", "variance", "standard error", "observations"), names(L))
  # Rounding to two decimal places and returning the table
  return(round(mytable, 2))
}
# Calling the function
convert_matrix(L)
##                    u      n      t
## mean            0.49  -0.04  -0.07
## variance        0.07   1.00   2.51
## standard error  0.03   0.07   0.09
## observations   96.00 200.00 290.00
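An alternative way to get the same kind of table, shown here only as a sketch, is to compute all four statistics for one list element in a single helper and let sapply assemble the matrix. The helper name `summarise_one` and the list `toy` are hypothetical and were made up for this illustration.

```r
# Sketch: compute all four statistics in one pass per list element.
# summarise_one is a hypothetical helper name, not part of the solution above.
summarise_one <- function(x) {
  x <- x[!is.na(x)]          # drop missing values once
  n <- length(x)
  c(mean = mean(x),
    variance = var(x),
    "standard error" = sd(x) / sqrt(n),
    observations = n)
}

# Small made-up list so the example runs on its own
toy <- list(a = c(1, 2, 3, NA), b = c(10, 20, 30, 40))

# sapply simplifies the equal-length named vectors to a matrix:
# statistics in rows, list elements in columns
round(sapply(toy, summarise_one), 2)
```

Because the helper returns a named vector, the row names come for free and no separate dimnames step is needed.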
2.a) Write a function that calculates the location (MidIQR) as the midpoint of the 25th and 75th percentiles of the data.
The function needs to eliminate all missing values. Apply it to data1, generated as follows. Summarize the data first using a histogram and summary.
set.seed(123)
data1 <- c(rep(NA, 10), rcauchy(20), rnorm(970))
# Removing missing values
data2=data1[!is.na(data1)]
# Creating histogram
hist(data2, col = "lightblue", xlab = "Numbers", main = "Histogram of Data without NA")
# Summary of data2
summary(data2)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## -11.30000  -0.62920   0.01519   0.01040   0.66500   7.29100
MidIQR <- function(x)
{
  # Removing missing values
  data2 <- x[!is.na(x)]
  # 25th and 75th percentiles
  q1 <- quantile(data2, 0.25)
  q3 <- quantile(data2, 0.75)
  # Midpoint of the two quartiles, with the quantile name dropped
  midiqr <- (q1 + q3) / 2
  names(midiqr) <- NULL
  return(midiqr)
}
MidIQR(data1)
## [1] 0.01790665
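To see what the midpoint of the quartiles looks like on numbers that can be checked by hand, here is a minimal sketch on a made-up vector (the vector `x` is an assumption for illustration only). It asks quantile for both quartiles in one call and relies on R's default quantile definition (type 7).

```r
# Made-up vector with one missing value
x <- c(1, 2, 3, 4, 5, NA)

# Both quartiles in one call; na.rm = TRUE drops the NA
q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)

# Midpoint of the quartiles (2 and 4 with the default quantile type 7)
mid <- mean(unname(q))
mid   # 3
```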
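b) The robust standard deviation (RSD) is the interquartile range of the data divided by the interquartile range of the standard normal distribution.
rsd <- function(x, na.rm = TRUE) {
  # Interquartile range of the data; missing values are dropped when na.rm is TRUE
  iqr_x <- diff(quantile(x, c(0.25, 0.75), na.rm = na.rm))
  # Interquartile range of the standard normal distribution (about 1.349)
  iqr_norm <- qnorm(0.75) - qnorm(0.25)
  # RSD = IQR of the data / IQR of the standard normal
  return(unname(iqr_x / iqr_norm))
}
rsd(data1)
## [1] 0.9593433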
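The reason for dividing by the interquartile range of the standard normal distribution is that, for normally distributed data, the result estimates the ordinary standard deviation. The sketch below checks this on a simulated sample; the seed, sample size and true standard deviation of 2 are arbitrary choices made for this illustration, and it uses the built-in IQR function rather than the rsd function above.

```r
set.seed(1)                      # arbitrary seed chosen for this illustration
z <- rnorm(10000, mean = 0, sd = 2)

# IQR of the sample divided by the IQR of the standard normal (about 1.349)
rsd_z <- IQR(z) / (qnorm(0.75) - qnorm(0.25))

rsd_z   # should be close to the true standard deviation, 2
sd(z)   # the classical estimate, for comparison
```

Unlike sd, the IQR-based estimate is barely affected by a few extreme values, which matters for data1 because of its Cauchy-distributed observations.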
c) The median absolute deviation is defined as the median of the absolute deviations from the median, multiplied by 1.4826: median(abs(x - median(x)))*1.4826. Write a function MAD that calculates the median absolute deviation of x and eliminates missing values if the parameter na.rm is TRUE. Apply it to data1.
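MAD <- function(x, na.rm = TRUE) {
  # Eliminating missing values when na.rm is TRUE
  if (na.rm) x <- x[!is.na(x)]
  # Median of the absolute deviations from the median, scaled by 1.4826
  z <- median(abs(x - median(x))) * 1.4826
  return(z)
}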
MAD(data1)
## [1] 0.9630704
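As a cross-check, base R's stats package already provides mad, which uses the same definition with a default constant of 1.4826. The sketch below compares a manual computation with the built-in function on a made-up vector (the vector `x` is an assumption for illustration only).

```r
# Made-up vector with a missing value and an outlier
x <- c(2, 4, 4, 5, 7, 100, NA)

# Manual computation following the definition above
y <- x[!is.na(x)]
manual <- median(abs(y - median(y))) * 1.4826

# Built-in equivalent; constant = 1.4826 is the default
builtin <- mad(x, na.rm = TRUE)

all.equal(manual, builtin)   # TRUE
```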