Haiding Luo
2023.9.19
Write one paragraph each (in your own words) describing what are classes and one paragraph on what are data structures (with examples).
Classes are the data types R uses to represent and manipulate data; an object's class determines how functions treat it. Common classes include:
Numeric: The numeric data class is used to represent numerical values, including integers and floating-point numbers.
Character: The character class is used for text data.
Factor: Factors are used to represent categorical data.
Logical: The logical class has two possible values: TRUE and FALSE.
Integer: The integer class is used to represent whole numbers.
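The classes listed above can be checked with class(). This is a minimal sketch; the variable names and values are made up for illustration.

```r
# Example values, one per class (all hypothetical)
num_value <- 3.14                       # numeric
chr_value <- "hello"                    # character
fct_value <- factor(c("low", "high"))   # factor (categorical)
lgl_value <- TRUE                       # logical
int_value <- 5L                         # integer; the L suffix requests an integer

# class() reports the class of each object
class(num_value)
class(chr_value)
class(fct_value)
class(lgl_value)
class(int_value)
```

Note that a whole number typed without the L suffix, such as 5, is stored as numeric rather than integer.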
Data structures are the fundamental objects R uses to store and organize data. Examples include:
Vectors: A vector is a one-dimensional array that stores elements of the same data type.
Matrix: A matrix is a two-dimensional array that stores elements of the same data type.
Data frames: A data frame is a two-dimensional table in which each column can have a different data type.
Lists: A list is a collection of elements that can be of different types, including other lists.
Factors: A factor stores categorical data as a fixed set of levels.
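Each of these structures can be created in one line. The sketch below uses made-up names and values only to show the differences.

```r
# Vector: one dimension, one type
v <- c(1, 4, 6)

# Matrix: two dimensions, one type (2 rows, 3 columns)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: a table whose columns may differ in type
df <- data.frame(name = c("a", "b"), score = c(90, 85))

# List: elements may be of any type
l <- list(1, "text", TRUE)

# Factor: categorical data with a fixed set of levels
f <- factor(c("low", "high", "low"))

str(df)     # shows the type of each column
levels(f)   # shows the categories of the factor
```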
Pick up a vector with 7 elements. Apply the “sd()” function and store the result in an object called R_StandardDeviation_InBuilt. Then, please calculate the standard deviation without using the “sd()” function and show that you get the same results. Store the manual calculation results in an object called R_StandardDeviation_Hand.
vector <- c(1,4,6,7,13,12,8)
R_StandardDeviation_InBuilt <- sd(vector)
R_StandardDeviation_InBuilt
## [1] 4.231402
R_StandardDeviation_Hand <- sqrt(sum((vector - mean(vector))^2 ) / (length(vector)-1))
R_StandardDeviation_Hand
## [1] 4.231402
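Rather than comparing the printed values by eye, the two results can also be compared in code. This is a small sketch that repeats the same calculation with its own variable names (x, sd_builtin and sd_by_hand are illustrative, not part of the assignment).

```r
# Same seven values as above
x <- c(1, 4, 6, 7, 13, 12, 8)

# Built-in standard deviation
sd_builtin <- sd(x)

# Manual calculation: square root of the sum of squared deviations over n - 1
sd_by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# all.equal() compares numbers within a small tolerance,
# which is safer than == for floating-point values
all.equal(sd_builtin, sd_by_hand)
```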
mad
## function (x, center = median(x), constant = 1.4826, na.rm = FALSE,
## low = FALSE, high = FALSE)
## {
## if (na.rm)
## x <- x[!is.na(x)]
## n <- length(x)
## constant * if ((low || high) && n%%2 == 0) {
## if (low && high)
## stop("'low' and 'high' cannot be both TRUE")
## n2 <- n%/%2 + as.integer(high)
## sort(abs(x - center), partial = n2)[n2]
## }
## else median(abs(x - center))
## }
## <bytecode: 0x0000021a55a3a9f0>
## <environment: namespace:stats>
I guess that mad stands for “Median Absolute Deviation”, because one line of the code, median(abs(x - center)), is the formula for it.
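To check this reading of the source, the default case (center = median(x), constant = 1.4826, low and high both FALSE) can be reproduced by hand. This is a sketch using the vector from the earlier question; x and mad_by_hand are illustrative names.

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

# Median of the absolute deviations from the median,
# scaled by the default constant shown in the function source
mad_by_hand <- 1.4826 * median(abs(x - median(x)))

# Compare with the built-in function
all.equal(mad(x), mad_by_hand)
```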
sd
## function (x, na.rm = FALSE)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
## na.rm = na.rm))
## <bytecode: 0x0000021a5325e4c8>
## <environment: namespace:stats>
I guess that sd stands for “standard deviation”, because the code takes the square root of the variance, sqrt(var(x)), which is the formula for it.
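The source above suggests that sd() is just the square root of var(). A quick sketch to confirm this on the earlier vector (x is an illustrative name):

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

# sd() should equal the square root of the sample variance
all.equal(sd(x), sqrt(var(x)))
```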
My own functions:
meter_to_decimetre <- function(meter1) {
  decimetre <- meter1 * 10
  return(decimetre)
}
meter_to_decimetre(1)
## [1] 10
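As a brief usage sketch: because the function only multiplies its input, it also works element-wise on a vector. The function is repeated so the block runs on its own, and the input values are made up.

```r
meter_to_decimetre <- function(meter1) {
  decimetre <- meter1 * 10
  return(decimetre)
}

# Convert several lengths at once
lengths_dm <- meter_to_decimetre(c(1, 2.5, 10))
lengths_dm
```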
data(cars)
library(ggplot2)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(cars$speed)
##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 50 15.4 5.29     15   15.47 5.93   4  25    21 -0.11    -0.67 0.75
describe(cars$dist)
##    vars  n  mean    sd median trimmed   mad min max range skew kurtosis   se
## X1    1 50 42.98 25.77     36   40.88 23.72   2 120   118 0.76     0.12 3.64
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00
average_speed <- mean(cars$speed)
ggplot(data = cars, aes(x = speed)) +
  geom_density(fill = 'red') + ggtitle("Density of speed") +
  geom_vline(xintercept = average_speed)
average_dist <- mean(cars$dist)
ggplot(data = cars, aes(x = dist)) +
  geom_density(fill = 'red') + ggtitle("Density of distance") +
  geom_vline(xintercept = average_dist)
library(moments)
skewness(cars$speed)
## [1] -0.1139548
skewness(cars$dist)
## [1] 0.7824835
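Based on the graph, the density of speed looks slightly left-skewed, so I would expect its skewness to be negative. The computed value of about -0.11 agrees, although it is close to zero, so the distribution is nearly symmetric.
For distance, the density has a long right tail, so I would expect its skewness to be positive. The computed value of about 0.78 confirms that it is right-skewed.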
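The skewness values can also be reproduced without the moments package. The sketch below assumes that skewness() divides the third central moment by the second central moment raised to the power 3/2, with both moments divided by n. The helper name skewness_by_hand is made up for this example. The skew column of describe() above shows slightly different values (-0.11 and 0.76), presumably because psych uses a different estimator by default.

```r
# Hypothetical helper: moment-based skewness (assumes moments are divided by n)
skewness_by_hand <- function(x) {
  n  <- length(x)
  m2 <- sum((x - mean(x))^2) / n   # second central moment
  m3 <- sum((x - mean(x))^3) / n   # third central moment
  m3 / m2^(3/2)
}

# cars ships with base R in the datasets package
skewness_by_hand(cars$speed)   # negative: slightly left-skewed
skewness_by_hand(cars$dist)    # positive: right-skewed
```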