DESCRIPTIVE STATISTICS
Statistical analysis in R is performed with many built-in functions. Most of these functions ship with the standard R distribution. They take an R vector as input, along with optional arguments, and return the result.
The functions discussed in this chapter are the measures of central tendency: mean, median and mode.
Mean
The mean is calculated by taking the sum of the values and dividing by the number of values in a data series.
The function mean() is used to calculate this in R.
The basic syntax for calculating mean in R is:
mean(x, trim = 0, na.rm = FALSE, ...)
The parameters are described below:
x - is the input vector.
trim - is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed.
na.rm - is a logical value indicating whether missing (NA) values should be removed from the input vector before the calculation.
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
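As a quick check of the definition, the mean can also be computed by hand with sum() and length(). This is only an illustrative sketch; the variable name manual.mean is made up for this example.

```r
# Same vector as in the example above.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Sum of the values divided by the number of values.
manual.mean <- sum(x) / length(x)
print(manual.mean)

# Compare with the built-in function, allowing for floating-point rounding.
print(isTRUE(all.equal(manual.mean, mean(x))))
```

Both approaches should agree; mean() is preferred in practice because it also handles the trim and na.rm arguments.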
Applying Trim Option
When the trim parameter is supplied, the values in the vector are sorted and the given fraction of observations is dropped from each end before the mean is calculated.
With trim = 0.3 and a vector of 10 values, floor(10 * 0.3) = 3 values are dropped from each end.
In this case the sorted vector is (-21, -5, 2, 3, 4.2, 7, 8, 12, 18, 54), so the values excluded from the mean are (-21, -5, 2) on the left and (12, 18, 54) on the right. The mean is then taken over the remaining values (3, 4.2, 7, 8).
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find the mean after trimming 30% of the values from each end.
result.mean <- mean(x, trim = 0.3)
print(result.mean)
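The trimming steps described above can be reproduced by hand. The following is a minimal sketch, not the internal implementation of mean(); the names trim.fraction, k, kept and manual.trimmed are made up for this example.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)
trim.fraction <- 0.3

# Number of values to drop from each end of the sorted vector.
n <- length(x)
k <- floor(n * trim.fraction)

# Sort the vector and keep only the middle values.
kept <- sort(x)[(k + 1):(n - k)]
print(kept)

# Mean of the values that remain after trimming.
manual.trimmed <- mean(kept)
print(manual.trimmed)
```

The result should match mean(x, trim = 0.3), which makes it easy to see exactly which observations the trim parameter excludes.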
Applying NA Option
If the vector contains missing values, the mean function returns NA.
To drop the missing values from the calculation, use na.rm = TRUE, which removes the NA values before the mean is computed.
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5,NA)
# Find mean.
result.mean <- mean(x)
print(result.mean)
# Find mean dropping NA values.
result.mean <- mean(x,na.rm = TRUE)
print(result.mean)
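Before dropping missing values it can be useful to know how many there are. The sketch below counts them with is.na() and shows that filtering them out by hand gives the same mean as na.rm = TRUE. The names na.count and manual.mean are made up for this example.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5, NA)

# Count the missing values in the vector.
na.count <- sum(is.na(x))
print(na.count)

# Drop the NA entries by hand, then take the mean of what is left.
manual.mean <- mean(x[!is.na(x)])
print(manual.mean)
```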
Median
The median is the middle value of a sorted data series; when the series has an even number of values, it is the average of the two middle values. The median() function is used in R to calculate this value.
The basic syntax for calculating median in R is:
median(x, na.rm = FALSE)
The parameters are described below:
x - is the input vector.
na.rm - is a logical value indicating whether missing (NA) values should be removed from the input vector before the calculation.
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find the median.
median.result <- median(x)
print(median.result)
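To see where this value comes from, the median can be worked out by hand from the sorted vector. This is a minimal sketch of the textbook definition, with made-up names sorted.x and manual.median; it assumes the vector has no missing values.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

sorted.x <- sort(x)
n <- length(sorted.x)

if (n %% 2 == 1) {
  # Odd number of values: take the single middle value.
  manual.median <- sorted.x[(n + 1) / 2]
} else {
  # Even number of values: average the two middle values.
  manual.median <- mean(sorted.x[c(n / 2, n / 2 + 1)])
}
print(manual.median)
```

Here the vector has 10 values, so the two middle values of the sorted vector (4.2 and 7) are averaged.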
Mode
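The mode is the value that has the highest number of occurrences in a set of data. Unlike the mean and median, the mode can be found for both numeric and character data.
R does not have a standard built-in function to calculate the statistical mode (the base function mode() returns the storage mode of an object, not its most frequent value). So we create a user function to calculate the mode of a data set in R. This function takes a vector as input and gives the mode value as output. If several values tie for the highest count, it returns the one that appears first in the vector.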
# Create the function.
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Create the vector with numbers.
v <- c(2,1,2,3,1,2,3,4,1,5,5,3,2,3)
# Calculate the mode using the user function.
result <- getmode(v)
print(result)
# Create the vector with characters.
charv <- c("o","it","the","it","it")
# Calculate the mode using the user function.
result <- getmode(charv)
print(result)
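In the numeric vector above, both 2 and 3 occur four times, yet getmode() reports only 2 because which.max() returns the first maximum. When ties matter, a variant that returns every most frequent value may be more useful. The function name getmodes below is made up for this example; it is a sketch, not a standard R function.

```r
# Return all values that share the highest number of occurrences.
getmodes <- function(v) {
  uniqv <- unique(v)
  # Count how often each unique value appears in v.
  counts <- tabulate(match(v, uniqv))
  # Keep every value whose count equals the maximum count.
  uniqv[counts == max(counts)]
}

# Numeric vector with a tie between 2 and 3.
v <- c(2, 1, 2, 3, 1, 2, 3, 4, 1, 5, 5, 3, 2, 3)
print(getmodes(v))

# Character vector with a single mode.
charv <- c("o", "it", "the", "it", "it")
print(getmodes(charv))
```

The values are returned in order of first appearance in the input vector.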