Discussion 3

Haiding Luo

2023.9.19

Part I

Write one paragraph each (in your own words) describing what are classes and one paragraph on what are data structures (with examples).

In R, the class of an object describes what kind of data it holds and determines how generic functions such as print() and summary() treat it. Common classes include:

Numeric: The numeric class represents numbers; by default R stores them as double-precision floating-point values, such as 3.14 or 5.
Character: The character class is used for text data, such as "hello".
Factor: The factor class represents categorical data, such as "low" and "high", as a fixed set of levels.
Logical: The logical class has two possible values, TRUE and FALSE (plus NA for missing values).
Integer: The integer class represents whole numbers; an integer literal needs the L suffix, as in 5L.
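A short sketch that checks these classes with base R's class() function. The variable names are made up for illustration; the key detail is that a plain 5 is numeric (double), while 5L is an integer.

```r
# Example values of each class (names are illustrative only)
num_val <- 3.14                             # double -> "numeric"
int_val <- 5L                               # the L suffix makes an integer
chr_val <- "hello"                          # text -> "character"
lgl_val <- TRUE                             # TRUE/FALSE -> "logical"
fct_val <- factor(c("low", "high", "low"))  # categories -> "factor"

class(num_val)   # "numeric"
class(int_val)   # "integer"
class(chr_val)   # "character"
class(lgl_val)   # "logical"
class(fct_val)   # "factor"
levels(fct_val)  # "high" "low" (levels are sorted alphabetically by default)
```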

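Data structures are the containers R uses to organize and store data.
Vectors: A vector is a one-dimensional sequence whose elements all share the same data type, such as c(1, 2, 3).
Matrix: A matrix is a two-dimensional array whose elements all share the same data type.
Data frames: A data frame is a two-dimensional table in which each column can have a different data type but all columns have the same length, such as the built-in cars dataset.
Lists: A list is an ordered collection whose elements can have different types and lengths, including other lists.
Factor: A factor stores categorical data as integer codes together with a set of levels.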
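A minimal sketch building one of each structure. The object names are illustrative, and the column types reported for the data frame assume R 4.0 or later, where strings are no longer converted to factors by default.

```r
# Vector: one dimension, one type
v <- c(2, 4, 6)
# Mixing types in a vector coerces everything to one type
mixed <- c(1, "a")                   # becomes "1" "a" (character)

# Matrix: two dimensions, one type (filled column by column)
m <- matrix(1:6, nrow = 2)

# Data frame: columns may differ in type but share one length
df <- data.frame(name = c("a", "b"), score = c(90, 85))

# List: elements of any type or length, including other structures
lst <- list(numbers = v, table = df, note = "mixed")

# Factor: categorical values stored as integer codes plus levels
f <- factor(c("red", "blue", "red"))

dim(m)            # 2 3
sapply(df, class) # name: "character", score: "numeric"
length(lst)       # 3
as.integer(f)     # 2 1 2 (codes into the sorted levels "blue", "red")
```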

Part II

Pick up a vector with 7 elements. Apply the “sd()” function and store the result in an object called R_StandardDeviation_InBuilt. Then, please calculate the standard deviation without using the “sd()” function and show that you get the same results. Store the manual calculation results in an object called R_StandardDeviation_Hand.

vector <- c(1,4,6,7,13,12,8)
R_StandardDeviation_InBuilt <- sd(vector)
R_StandardDeviation_InBuilt

## [1] 4.231402

R_StandardDeviation_Hand <- sqrt(sum((vector - mean(vector))^2 ) / (length(vector)-1))
R_StandardDeviation_Hand

## [1] 4.231402
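The one-line manual formula can be broken into named steps to make each part of the sample standard deviation visible. This sketch uses the same seven values; comparing with all.equal() rather than == is a choice made here because two different floating-point calculations are not guaranteed to agree in the last bits.

```r
x <- c(1, 4, 6, 7, 13, 12, 8)   # same values as `vector` above

n          <- length(x)            # number of observations (7)
x_bar      <- mean(x)              # sample mean
sq_dev     <- (x - x_bar)^2        # squared deviations from the mean
sample_var <- sum(sq_dev) / (n - 1)  # divide by n - 1 (Bessel's correction)
sd_steps   <- sqrt(sample_var)

# Tolerance-based comparison with the built-in function
isTRUE(all.equal(sd_steps, sd(x)))   # TRUE

# Dividing by n instead gives the population version, which is smaller
sd_population <- sqrt(sum(sq_dev) / n)
```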

Part III

mad

## function (x, center = median(x), constant = 1.4826, na.rm = FALSE, 
##     low = FALSE, high = FALSE) 
## {
##     if (na.rm) 
##         x <- x[!is.na(x)]
##     n <- length(x)
##     constant * if ((low || high) && n%%2 == 0) {
##         if (low && high) 
##             stop("'low' and 'high' cannot be both TRUE")
##         n2 <- n%/%2 + as.integer(high)
##         sort(abs(x - center), partial = n2)[n2]
##     }
##     else median(abs(x - center))
## }
## <bytecode: 0x0000021a55a3a9f0>
## <environment: namespace:stats>

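The mad function computes the median absolute deviation (MAD). The line median(abs(x - center)) takes the median of the absolute deviations from center, which defaults to median(x), and the result is multiplied by constant = 1.4826 so that, for normally distributed data, the MAD estimates the same quantity as the standard deviation.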
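Following the source printed above, the default path of mad() (with low and high both FALSE) can be reproduced by hand. A sketch on the same seven values used in Part II:

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

center   <- median(x)                 # default center used by mad() (7 here)
abs_dev  <- abs(x - center)           # absolute deviations from the median
mad_hand <- 1.4826 * median(abs_dev)  # scale by mad()'s default constant

mad_hand                              # 4.4478 (median of abs_dev is 3)
isTRUE(all.equal(mad_hand, mad(x)))   # TRUE
```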

sd

## function (x, na.rm = FALSE) 
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <bytecode: 0x0000021a5325e4c8>
## <environment: namespace:stats>

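The sd function computes the standard deviation: it returns the square root of var(x), and var() uses the n - 1 denominator, which is why the manual calculation in Part II divides by length(vector) - 1.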
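Because sd() simply wraps var(), two properties follow directly from the printed source: it equals sqrt(var(x)), and its na.rm argument is passed through to var(). A small sketch; the NA value is added only for illustration.

```r
x <- c(1, 4, 6, 7, 13, 12, 8)

# sd() is the square root of the sample variance
isTRUE(all.equal(sd(x), sqrt(var(x))))   # TRUE

# Missing values are dropped only when na.rm = TRUE is requested
y <- c(x, NA)
sd(y)                  # NA
sd(y, na.rm = TRUE)    # same value as sd(x)
```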

My own functions:

# Convert a length in metres to decimetres (1 m = 10 dm)
meter_to_decimetre <- function(meter1){
  decimetre <- meter1 * 10
  return(decimetre)
}

meter_to_decimetre(1)

## [1] 10
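Since multiplication in R is vectorized, the same function converts a whole vector of lengths at once. The sketch below repeats the function (relying on R returning the last evaluated expression) and adds decimetre_to_meter, a hypothetical inverse that is not part of the original assignment, to check a round trip.

```r
# Same conversion as meter_to_decimetre above (1 m = 10 dm)
meter_to_decimetre <- function(meter1) {
  meter1 * 10   # last expression is returned; works element-wise on vectors
}

# Hypothetical inverse conversion, added only for illustration
decimetre_to_meter <- function(decimetre1) {
  decimetre1 / 10
}

meter_to_decimetre(c(0.5, 1, 2.5))         # 5 10 25
decimetre_to_meter(meter_to_decimetre(3))  # 3
```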

Part IV

data(cars)
library(ggplot2)
library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

describe(cars$speed)

##    vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 50 15.4 5.29     15   15.47 5.93   4  25    21 -0.11    -0.67 0.75

describe(cars$dist)

##    vars  n  mean    sd median trimmed   mad min max range skew kurtosis   se
## X1    1 50 42.98 25.77     36   40.88 23.72   2 120   118 0.76     0.12 3.64
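Several columns of the describe() output can be recomputed with base R. This sketch assumes describe()'s default trim = 0.1 for the trimmed column and that its mad column comes from stats::mad(); the results should match the printed speed row after rounding to two decimals.

```r
data(cars)
x <- cars$speed
n <- length(x)                          # 50 observations

se_hand      <- sd(x) / sqrt(n)         # standard error of the mean
trimmed_hand <- mean(x, trim = 0.1)     # drop 10% of values from each end
mad_hand     <- mad(x)                  # scaled median absolute deviation

# Compare with the se, trimmed, and mad columns printed above
round(c(se = se_hand, trimmed = trimmed_hand, mad = mad_hand), 2)
```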

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

average_speed <- mean(cars$speed)
ggplot(data = cars, aes(x = speed)) +
  geom_density(fill = 'red') + ggtitle("Density of speed") +
  geom_vline(xintercept = average_speed)

average_dist <- mean(cars$dist)
ggplot(data = cars, aes(x = dist)) +
  geom_density(fill = 'red') + ggtitle("Density of distance") +
  geom_vline(xintercept = average_dist)

library(moments)
skewness(cars$speed)

## [1] -0.1139548

skewness(cars$dist)

## [1] 0.7824835
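The skewness values from moments::skewness() (-0.114 and 0.782) differ slightly from the skew column of describe() (-0.11 and 0.76) because the two packages use different formulas. A sketch with two hypothetical helpers, assuming moments uses the moment-based g1 and psych's default (type = 3) uses b1 = g1 * ((n - 1) / n)^(3/2); both assumptions agree with the printed outputs.

```r
# Hypothetical helper: moment-based skewness g1 = m3 / m2^(3/2)
skew_g1 <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- sum(d^2) / n   # second central moment (divides by n, not n - 1)
  m3 <- sum(d^3) / n   # third central moment
  m3 / m2^(3 / 2)
}

# Hypothetical helper: b1 rescales g1, which shrinks it slightly
skew_b1 <- function(x) {
  n <- length(x)
  skew_g1(x) * ((n - 1) / n)^(3 / 2)
}

data(cars)
skew_g1(cars$speed)            # compare with skewness(cars$speed)
skew_g1(cars$dist)             # compare with skewness(cars$dist)
round(skew_b1(cars$dist), 2)   # compare with the describe() skew column
```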

The density plot of speed has a slightly longer left tail, so I expected its skewness to be negative. The computed value of -0.11 confirms this, but it is close to zero, so the left skew is mild.

The density plot of distance has a long right tail, so I expected its skewness to be positive. The computed value of 0.78 confirms a moderate right skew.