Discussion 3

Haiding Luo

2023.9.19

Part I

Write one paragraph (in your own words) describing what classes are and one paragraph describing what data structures are (with examples).

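In R, a class is an attribute of an object that tells R what kind of data the object holds and how generic functions such as print() and summary() should handle it. For example, numeric, character, logical, factor, and data.frame are all classes.

Data structures are the containers R uses to organize values, such as vectors, matrices, lists, and data frames. For example, c(1, 4, 6) is a numeric vector, and the built-in cars data set is a data frame.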
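As a rough illustration of the difference, the sketch below asks R for the class of a few objects and then builds one object of each common data structure. The object names (v, m, l, d) are made up for this example.

```r
# Classes: class() reports the label R attaches to an object
class(3.14)                  # "numeric"
class("hello")               # "character"
class(TRUE)                  # "logical"
class(factor(c("a", "b")))   # "factor"
class(cars)                  # "data.frame"

# Data structures: different containers for values
v <- c(1, 4, 6)                        # atomic vector: all elements share one type
m <- matrix(1:6, nrow = 2)             # matrix: two-dimensional, one type
l <- list(name = "x", values = v)      # list: elements may have different types
d <- data.frame(id = 1:3, score = v)   # data frame: columns of equal length
str(d)                                 # compact view of the structure
```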

Part II

Pick a vector with 7 elements.  Apply the “sd()” function and store the result in an object called R_StandardDeviation_InBuilt.  Then calculate the standard deviation without using the “sd()” function and show that you get the same result. Store the result of the manual calculation in an object called R_StandardDeviation_Hand.

vector <- c(1,4,6,7,13,12,8)
R_StandardDeviation_InBuilt <- sd(vector)
R_StandardDeviation_InBuilt
## [1] 4.231402
R_StandardDeviation_Hand <- sqrt(sum((vector - mean(vector))^2 ) / (length(vector)-1))
R_StandardDeviation_Hand
## [1] 4.231402
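Printed values are rounded, so one way to confirm that the two results really agree is to compare them with all.equal(), which allows for tiny floating-point differences. This is a minimal sketch using the same seven numbers; the names x, sd_builtin, and sd_hand are chosen here only for illustration.

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

sd_builtin <- sd(x)

# Sample standard deviation: squared deviations from the mean,
# divided by n - 1, then square-rooted
sd_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# Compare with a numerical tolerance instead of relying on printed digits
same <- isTRUE(all.equal(sd_builtin, sd_hand))
same
```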

Part III

mad
## function (x, center = median(x), constant = 1.4826, na.rm = FALSE, 
##     low = FALSE, high = FALSE) 
## {
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     n <- length(x)
##     constant * if ((low || high) && n%%2 == 0) {
##         if (low && high) 
##             stop("'low' and 'high' cannot be both TRUE")
##         n2 <- n%/%2 + as.integer(high)
##         sort(abs(x - center), partial = n2)[n2]
##     }
##     else median(abs(x - center))
## }
## <bytecode: 0x0000021a55a3a9f0>
## <environment: namespace:stats>

I think mad stands for “Median Absolute Deviation”, because the last line of the function body, median(abs(x - center)), is the formula for it. The result is then multiplied by constant (1.4826 by default).
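To check this reading of the source, the default behaviour can be reproduced by hand. The sketch below assumes the default arguments shown above (center = median(x), constant = 1.4826, low and high both FALSE); the names x and mad_hand are illustrative.

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

# Median of the absolute deviations from the median, scaled by the default constant
mad_hand <- 1.4826 * median(abs(x - median(x)))

mad_builtin <- mad(x)
isTRUE(all.equal(mad_builtin, mad_hand))
```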

sd
## function (x, na.rm = FALSE) 
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <bytecode: 0x0000021a5325e4c8>
## <environment: namespace:stats>

I think sd stands for standard deviation, because the function body is sqrt(var(x)), the square root of the variance.
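The same relationship can be checked directly: taking the square root of var() should give the value that sd() returns. A small sketch, with x and sd_from_var as illustrative names:

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

# sd() is defined as the square root of the sample variance
sd_from_var <- sqrt(var(x))

isTRUE(all.equal(sd_from_var, sd(x)))
```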

My own function:

meter_to_decimetre <- function(meter1){
  decimetre <- (meter1 * 10)
  return(decimetre)
}
meter_to_decimetre(1)
## [1] 10

Part IV

data(cars)
library(ggplot2)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
describe(cars$speed)
##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 50 15.4 5.29     15   15.47 5.93   4  25    21 -0.11    -0.67 0.75
describe(cars$dist)
##    vars  n  mean    sd median trimmed   mad min max range skew kurtosis   se
## X1    1 50 42.98 25.77     36   40.88 23.72   2 120   118 0.76     0.12 3.64
summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
average_speed <- mean(cars$speed)
ggplot(data = cars, aes(x = speed)) +
  geom_density(fill = 'red') + ggtitle("Density of speed") +
  geom_vline(xintercept = average_speed)

average_dist <- mean(cars$dist)
ggplot(data = cars, aes(x = dist)) +
  geom_density(fill = 'red') + ggtitle("Density of distance") +
  geom_vline(xintercept = average_dist)

library(moments)
skewness(cars$speed)
## [1] -0.1139548
skewness(cars$dist)
## [1] 0.7824835
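In the same spirit as the hand calculation in Part II, the skewness values can be reproduced without the moments package. The sketch below assumes the moment-based definition (third central moment divided by the second central moment raised to the power 3/2, both using n in the denominator); skew_hand is a hypothetical helper written for this example. describe() from psych reports slightly different values (-0.11 and 0.76), most likely because it applies a small-sample adjustment by default.

```r
# Moment-based skewness: m3 / m2^(3/2), with central moments divided by n
skew_hand <- function(x) {
  n  <- length(x)
  m2 <- sum((x - mean(x))^2) / n   # second central moment
  m3 <- sum((x - mean(x))^3) / n   # third central moment
  m3 / m2^(3 / 2)
}

skew_speed <- skew_hand(cars$speed)
skew_dist  <- skew_hand(cars$dist)
skew_speed
skew_dist
```

A negative result indicates a longer left tail and a positive result a longer right tail, which is how the values are read in the interpretation of the plots.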

Based on the density plot, speed looks slightly left-skewed, so I would expect its skewness to be negative. The computed value of about -0.11 agrees, although it is close to zero, so the distribution is nearly symmetric.

Distance looks clearly right-skewed, with a long tail toward large stopping distances, so I would expect its skewness to be positive. The computed value of about 0.78 agrees.