Centrality: Mean, Median, Mode
Sample: 2, 4.4, 3, 3, 2, 2.2, 2, 4
Sorted: 2, 2, 2, 2.2, 3, 3, 4, 4.4 (n = 8, n/2 = 4)
With n even, the median is the average of the 4th and 5th sorted values: (2.2 + 3)/2 = 2.6.
xdata <- c(2,4.4,3,3,2,2.2,2,4)
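The three measures of centrality can be computed for this sample roughly as follows. This is a minimal sketch: the names x_mean, x_median, x_counts and x_mode are illustrative, and since base R has no function for the statistical mode, it is obtained here by tabulating the values.

```r
xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)

x_mean <- mean(xdata)      # arithmetic mean
x_median <- median(xdata)  # middle of the sorted values

# Tabulate the values and keep the most frequent one(s) as the mode
x_counts <- table(xdata)
x_mode <- as.numeric(names(x_counts)[x_counts == max(x_counts)])

x_mean    # 2.825
x_median  # 2.6
x_mode    # 2
```

If several values tie for the highest count, x_mode holds all of them.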
Quantiles, Percentiles, and the Five-Number Summary
The median = the 0.5th quantile = the 50th percentile
Sample: 2, 4.4, 3, 3, 2, 2.2, 2, 4
Sorted: 2, 2, 2, 2.2, 3, 3, 4, 4.4
0.5th quantile = median = 2.6
xdata <- c(2,4.4,3,3,2,2.2,2,4)
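quantile(xdata,probs=0.8) # the 0.8th quantile (or 80th percentile)
## 80%
## 3.6
quantile(xdata,probs=c(0,0.25,0.5,0.75,1)) # minimum, quartiles and maximum
## 0% 25% 50% 75% 100%
## 2.00 2.00 2.60 3.25 4.40
summary(xdata) # five-number summary plus the mean
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.600 2.825 3.250 4.400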
A quartile is a type of quantile.
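To make the link between quartiles and quantiles concrete, the sketch below asks for the 0.25th, 0.5th and 0.75th quantiles of the sample and checks that the middle one agrees with the median. The name quartiles is illustrative, and R's default quantile method (type 7) is assumed.

```r
xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)

# The three quartiles are the 0.25th, 0.5th and 0.75th quantiles
quartiles <- quantile(xdata, probs = c(0.25, 0.5, 0.75))
quartiles

# The second quartile coincides with the median
isTRUE(all.equal(unname(quartiles["50%"]), median(xdata)))
```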
Spread: Variance, Standard Deviation, and the Interquartile Range
ydata <- c(1,4.4,1,3,2,2.2,2,7) # same values as y in the covariance examples
mean(xdata)
## [1] 2.825
mean(ydata)
## [1] 2.825
plot(xdata,type="n",xlab="values",ylab="data vector",yaxt="n",bty="n", xlim = c(0,7) )
abline(h=c(2.5,3),lty=2,col="red")
abline(v=2.825,lwd=2,lty=3)
text(c(0.0,0.0),c(2.5,3),labels=c("x","y"))
text(c(2.825),c(4),labels=c("mean"))
points(jitter(c(xdata,ydata)),c(rep(2.5,length(xdata)), rep(3,length(ydata))))
Both vectors have the same mean, but the observations in ydata are more “spread out”.
Sample: 2, 4.4, 3, 3, 2, 2.2, 2, 4 (mean = 2.825)
The sample variance is 0.908, and its square root, the standard deviation, is 0.953.
The standard deviation of 0.953 roughly represents the average distance of each observation from the mean.
var(xdata) # sample variance
## [1] 0.9078571
sd(xdata) # standard deviation
## [1] 0.9528154
IQR(xdata) # interquartile range
## [1] 1.25
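The three values above can be reproduced by hand, which shows what each measure does. This is a sketch only: it assumes the usual sample variance with an n - 1 denominator and R's default quantile method, and the names x_var, x_sd and x_iqr are illustrative.

```r
xdata <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)
n <- length(xdata)

# Sample variance: squared deviations from the mean, divided by n - 1
x_var <- sum((xdata - mean(xdata))^2) / (n - 1)

# Standard deviation: square root of the variance
x_sd <- sqrt(x_var)

# Interquartile range: 0.75th quantile minus 0.25th quantile
x_iqr <- unname(diff(quantile(xdata, probs = c(0.25, 0.75))))

c(x_var, var(xdata))  # both 0.9078571
c(x_sd, sd(xdata))    # both 0.9528154
c(x_iqr, IQR(xdata))  # both 1.25
```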
Covariance and Correlation
x = {x1, x2, …, xn}
y = {y1, y2, …, yn}
for i = 1, …, n
The sample covariance rxy is computed from the paired observations (xi, yi). A positive rxy indicates a positive linear relationship, and a negative rxy a negative one. When rxy = 0, there is no linear relationship.
x = {2, 4.4, 3, 3, 2, 2.2, 2, 4}
y = {1, 4.4, 1, 3, 2, 2.2, 2, 7}
mean of x = mean of y = 2.825
rxy = 1.479 > 0, so there is a positive relationship
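The covariance for this example can be worked through step by step. The sketch below assumes the sample covariance with an n - 1 denominator, which is what R's cov function computes; the names dev_products and r_xy are illustrative.

```r
x <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)
y <- c(1, 4.4, 1, 3, 2, 2.2, 2, 7)
n <- length(x)

# Products of the paired deviations from the two means
dev_products <- (x - mean(x)) * (y - mean(y))

# Sample covariance: sum of the products divided by n - 1
r_xy <- sum(dev_products) / (n - 1)

r_xy       # 1.479286
cov(x, y)  # built-in equivalent
```

Most of the products are positive (x and y tend to lie on the same side of their means together), which is why the result is positive.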
x <- c(2,4.4,3,3,2,2.2,2,4)
y <- c(1,4.4,1,3,2,2.2,2,7)
plot(x,y, col="red", pch=13,cex=1.5 )
abline(lm(y~x), col="blue")
The most common measure is Pearson’s product-moment correlation coefficient, which is the default in R.
The correlation coefficient measures the direction and strength of the linear relationship between two sets of observations.
−1 ≤ ρxy ≤ 1
ρxy = 1 indicates a perfect positive linear relationship, and ρxy = −1 a perfect negative one.
x = {2, 4.4, 3, 3, 2, 2.2, 2, 4}
y = {1, 4.4, 1, 3, 2, 2.2, 2, 7}
(mean of x = mean of y = 2.825)
(sx = 0.953 and sy = 2.013)
(rxy = 1.479)
ρxy = rxy / (sx sy) = 1.479 / (0.953 × 2.013) ≈ 0.771, so ρxy is positive
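The same calculation can be checked in R by dividing the covariance by the product of the two standard deviations. This is a small sketch with illustrative names (s_x, s_y, rho_xy); cor uses Pearson's method by default.

```r
x <- c(2, 4.4, 3, 3, 2, 2.2, 2, 4)
y <- c(1, 4.4, 1, 3, 2, 2.2, 2, 7)

s_x <- sd(x)  # 0.9528154
s_y <- sd(y)  # about 2.013

# Correlation: covariance scaled by both standard deviations
rho_xy <- cov(x, y) / (s_x * s_y)

rho_xy     # 0.7713962
cor(x, y)  # built-in equivalent
```

Scaling by the standard deviations removes the units, so the result always falls between -1 and 1.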
cov(x,y) # sample covariance
## [1] 1.479286
cor(x,y) # Pearson correlation coefficient
## [1] 0.7713962
Barplots
station_data <- read.csv("https://web.itu.edu.tr/~tokerem/18397_Cekmekoy_Omerli_15min.txt", header=TRUE, sep = ";")
head(station_data)
## sta_no year month day hour minutes temp precipitation pressure
## 1 18397 2017 7 26 18 0 23.9 0 1003.0
## 2 18397 2017 7 26 18 15 23.9 0 1003.1
## 3 18397 2017 7 26 18 30 23.8 0 1003.2
## 4 18397 2017 7 26 18 45 23.8 0 1003.2
## 5 18397 2017 7 26 19 0 23.6 0 1003.2
## 6 18397 2017 7 26 19 15 23.2 0 1003.1
## relative_humidity
## 1 94
## 2 95
## 3 96
## 4 96
## 5 96
## 6 97
head(station_data$temp)
## [1] 23.9 23.9 23.8 23.8 23.6 23.2
station_data$temp
## [1] 23.9 23.9 23.8 23.8 23.6 23.2 23.2 23.1 23.0 22.8 22.5 22.4 22.2 22.3 22.2
## [16] 21.7 21.9 21.7 21.6 22.2 22.2 22.1 22.3 22.5 22.3 22.2 22.5 22.6 22.6 22.6
## [31] 22.6 22.7 22.6 22.5 22.6 22.5 22.5 22.4 22.5 22.4 22.5 22.6 23.0 23.2 24.2
## [46] 25.1 25.5 26.1 27.1 26.9 27.6 28.0 28.4 28.5 29.3 30.2 30.1 30.1 30.4 30.4
## [61] 30.8 30.9 31.0 31.5 31.2 30.9 30.9 30.4 30.4 30.0 29.2 29.5 29.4 29.3 29.6
## [76] 28.8 29.0 29.0 29.2 28.4 27.8 27.4 26.6 26.2 25.8 25.6 25.4 24.2 19.2 19.5
## [91] 20.1 20.8 21.2 21.4 21.4 21.4 21.2 21.0 20.8 20.9 20.8 20.7 20.8 20.8 20.9
## [106] 20.6 20.6 20.5 20.7 20.8 20.4 20.4 20.6 20.5 20.4 20.5 20.5 20.6 20.5 20.5
## [121] 20.4
table(station_data$temp)
##
## 19.2 19.5 20.1 20.4 20.5 20.6 20.7 20.8 20.9 21 21.2 21.4 21.6 21.7 21.9 22.1
## 1 1 1 4 6 4 2 6 2 1 2 3 1 2 1 1
## 22.2 22.3 22.4 22.5 22.6 22.7 22.8 23 23.1 23.2 23.6 23.8 23.9 24.2 25.1 25.4
## 5 3 3 8 7 1 1 2 1 3 1 2 2 2 1 1
## 25.5 25.6 25.8 26.1 26.2 26.6 26.9 27.1 27.4 27.6 27.8 28 28.4 28.5 28.8 29
## 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 2
## 29.2 29.3 29.4 29.5 29.6 30 30.1 30.2 30.4 30.8 30.9 31 31.2 31.5
## 2 2 1 1 1 1 2 1 4 1 3 1 1 1
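A frequency table like this is the natural input for a bar plot. The sketch below is only an illustration: to stay self-contained it uses the first six temperature readings printed earlier instead of downloading the file, and the names temp_sample and temp_counts are hypothetical. With the full data set, table(station_data$temp) could presumably be passed to barplot in the same way.

```r
# First six temperature readings, copied from the output shown earlier
temp_sample <- c(23.9, 23.9, 23.8, 23.8, 23.6, 23.2)

# Count how often each distinct value occurs
temp_counts <- table(temp_sample)
temp_counts

# One bar per distinct temperature, bar height = frequency
barplot(temp_counts, xlab = "temp", ylab = "frequency",
        main = "Temperature frequencies")
```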
Histogram
library(ggplot2)
qplot(station_data$temp,geom="blank",main="Temp Hist",xlab="Temp")+
geom_histogram(color="black",fill="white",breaks=seq(19,32,1),closed="right") +
geom_vline(mapping=aes(xintercept=c(mean(station_data$temp), median(station_data$temp)), linetype=factor(c("mean","median"))) , col=c("blue","red"),show.legend=TRUE)+
scale_linetype_manual(values=c(2,3)) +
labs(linetype="")
Boxplot
Histogram and Boxplot
Scatter Plots
A probability is a number that describes the “magnitude of chance” associated with making a particular observation or statement.
Example: what is the probability of rolling a 3 with a fair six-sided die? Each of the six faces is equally likely, so the answer is 1/6.
A probability is always a number between 0 and 1 (inclusive) and is often expressed as a fraction.
Quantitative data can be counted, measured, and expressed using numbers. Qualitative data is descriptive and conceptual.
Discrete data can only take certain distinct values. Continuous data can take any value within a range.
Probability mass and density functions are used to describe discrete and continuous probability distributions, respectively.
The cumulative distribution function (CDF) gives the probability that a random variable X with a given probability distribution takes a value less than or equal to x, that is, Pr(X <= x).
X.outcomes <- c(2:12) # possible sums of two dice
X.prob <- c((1/36),(2/36),(3/36),(4/36),(5/36),(6/36),(5/36),(4/36),(3/36),(2/36),(1/36))
barplot(X.prob,ylim=c(0,0.20),names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X = x)", main = "probability mass function")
X.cumul <- cumsum(X.prob) # running total of the probabilities gives Pr(X <= x)
barplot(X.cumul,names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X <= x)", main = "cumulative distribution function")
barplot(X.prob,ylim=c(0,0.20),names.arg=X.outcomes,space=0,xlab="x",ylab="Pr(X = x)", main = "probability mass function")
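abline(v=c(0.5:10.5))
Treating X as continuous, the density is defined in two pieces on either side of 7 (lower < 7 < upper):
lower: X >= 2 & X <= 7, with f(x) = (X[lower] - 1)/36
upper: X > 7 & X <= 12, with f(x) = (13 - X[upper])/36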
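The piecewise definition above can also be written as an R function and integrated numerically, for example to confirm that the total area is 1 and to find a probability such as Pr(X <= 4.5). This is a sketch under one assumption: the rising piece is extended down to x = 1 and the falling piece up to x = 13, so that the density reaches zero at both ends. The names f, total_area and p_below_4.5 are illustrative.

```r
# Triangular density for the sum of two dice, treated as continuous.
# Assumption: the pieces are extended to x = 1 and x = 13, where f is zero.
f <- function(x) {
  ifelse(x >= 1 & x <= 7, (x - 1) / 36,
         ifelse(x > 7 & x <= 13, (13 - x) / 36, 0))
}

# Total area under the density, integrated separately on each linear piece
total_area <- integrate(f, lower = 1, upper = 7)$value +
  integrate(f, lower = 7, upper = 13)$value

# Pr(X <= 4.5): area under the density to the left of 4.5
p_below_4.5 <- integrate(f, lower = 1, upper = 4.5)$value

total_area
p_below_4.5
0.5 * 3.5 * f(4.5)  # same area from the triangle formula (base 3.5, height f(4.5))
```

For a continuous variable, probabilities are areas under the density, so the value of f(x) at a single point is not itself a probability.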
X.outcomes <- c(1,2,3,4,5,6,7,8,9,10,11,12,13)
lower <- X.outcomes >= 2 & X.outcomes <= 7
upper <- X.outcomes > 7 & X.outcomes <= 12
fx <- rep(0,length(X.outcomes))
fx[lower] <- (X.outcomes[lower] - 1)/36
fx[upper] <- (13 - X.outcomes[upper])/36
plot(X.outcomes,fx,type="l",ylab="f(x)", xlim = c(0,14), main = "probability density function")
abline(h=0,col="gray",lty=2)
fx.specific <- (4.5-1)/36
fx.specific.area <- 3.5*fx.specific*0.5
fx.specific.vertices <- rbind(c(1,0),c(4.5,0),c(4.5,fx.specific))
plot(X.outcomes,fx,type="l",ylab="f(x)", xlim = c(0,14), main = "probability density function")
abline(h=0,col="gray",lty=2)
polygon(fx.specific.vertices,col="gray",border=NA)
abline(v=4.5,lty=3)
text(4,0.01,labels=fx.specific.area)
Symmetry: A distribution is symmetric if a vertical line drawn down its center splits it into two mirror-image halves, each holding 0.5 of the probability.
Skewness: If a distribution is asymmetric, look at its “tail”. Positive (right) skew indicates a tail extending longer to the right of center; negative (left) skew indicates a tail extending longer to the left.
Modality: Describes the number of easily identifiable peaks in the distribution of interest: unimodal, bimodal, trimodal, and so on.
Kurtosis: A measure of the “tailedness” of the probability distribution. Positive excess kurtosis (relative to the normal distribution) indicates heavier tails, with more of the values located far from the center.
as.data.frame(table(station_data$temp)) # the same counts as a data frame
## Var1 Freq
## 1 19.2 1
## 2 19.5 1
## 3 20.1 1
## 4 20.4 4
## 5 20.5 6
## 6 20.6 4
## 7 20.7 2
## 8 20.8 6
## 9 20.9 2
## 10 21 1
## 11 21.2 2
## 12 21.4 3
## 13 21.6 1
## 14 21.7 2
## 15 21.9 1
## 16 22.1 1
## 17 22.2 5
## 18 22.3 3
## 19 22.4 3
## 20 22.5 8
## 21 22.6 7
## 22 22.7 1
## 23 22.8 1
## 24 23 2
## 25 23.1 1
## 26 23.2 3
## 27 23.6 1
## 28 23.8 2
## 29 23.9 2
## 30 24.2 2
## 31 25.1 1
## 32 25.4 1
## 33 25.5 1
## 34 25.6 1
## 35 25.8 1
## 36 26.1 1
## 37 26.2 1
## 38 26.6 1
## 39 26.9 1
## 40 27.1 1
## 41 27.4 1
## 42 27.6 1
## 43 27.8 1
## 44 28 1
## 45 28.4 2
## 46 28.5 1
## 47 28.8 1
## 48 29 2
## 49 29.2 2
## 50 29.3 2
## 51 29.4 1
## 52 29.5 1
## 53 29.6 1
## 54 30 1
## 55 30.1 2
## 56 30.2 1
## 57 30.4 4
## 58 30.8 1
## 59 30.9 3
## 60 31 1
## 61 31.2 1
## 62 31.5 1
Density, distribution function, quantile function and random generation for the binomial distribution with parameters size and prob.
## [1] 0.0002441406 0.0029296875 0.0161132813 0.0537109375 0.1208496094
## [1] 0.1938477
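The calls that produced the two outputs above are not shown. The values are consistent with a binomial distribution with size = 12 and prob = 0.5, so the sketch below uses those parameters as an assumption; the names p_each and p_upto4 are illustrative.

```r
# Assumed setting: 12 independent trials, success probability 0.5

# Pr(X = 0), ..., Pr(X = 4) from the probability mass function
p_each <- dbinom(x = 0:4, size = 12, prob = 0.5)
p_each

# Pr(X <= 4) from the cumulative distribution function
p_upto4 <- pbinom(q = 4, size = 12, prob = 0.5)
p_upto4

# The cumulative probability is the sum of the individual probabilities
sum(p_each)
```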
For the Poisson distribution, the parameter λp should be interpreted as the “mean number of occurrences”.
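R provides four functions for the binomial distribution: dbinom (density), pbinom (distribution function), qbinom (quantile function) and rbinom (random generation).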
## [1] 0.1041956
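The call behind the value above is not shown. One Poisson probability that reproduces it is Pr(X = 6) with a mean of 4 occurrences, so the sketch below uses lambda = 4 and x = 6 purely as an assumption. The names lambda, p_six and p_by_hand are illustrative.

```r
lambda <- 4  # assumed mean number of occurrences

# Pr(X = 6) from the Poisson probability mass function
p_six <- dpois(x = 6, lambda = lambda)
p_six

# The same probability from the formula exp(-lambda) * lambda^x / x!
p_by_hand <- exp(-lambda) * lambda^6 / factorial(6)
p_by_hand

# Pr(X <= 6) from the cumulative distribution function
ppois(q = 6, lambda = lambda)
```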
The uniform distribution is a simple density function that describes a continuous random variable whose probability density is constant over its interval of possible values.
## [1] 2.821992 2.765888 2.561574 2.424556 2.928520 1.079249 2.050842 1.525587
## [9] 1.482912 1.915594
## r1
## 1.08275828650221 1.22311208304018 1.29288631817326 1.4663178306073
## 1 1 1 1
## 1.68820196110755 1.88678338751197 2.05697202542797 2.19751014048234
## 1 1 1 1
## 2.34811947587878 2.48047358682379
## 1 1
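The calls behind the uniform outputs above are not shown. Since all printed values lie between 1 and 3, the sketch below assumes a uniform distribution on that interval; the seed and the names a, b, dens_at_2 and r1 are also assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only so that the sketch is reproducible
a <- 1       # assumed lower limit
b <- 3       # assumed upper limit

# The density is constant on the interval: 1 / (b - a)
dens_at_2 <- dunif(2, min = a, max = b)
dens_at_2

# Pr(X <= 2): the share of the interval lying below 2
punif(2, min = a, max = b)

# Ten random draws; with continuous data each value occurs once,
# so a frequency table shows a count of 1 for every value
r1 <- runif(10, min = a, max = b)
table(r1)
```

This is why a bar plot of raw counts is uninformative for continuous data, and a histogram with binned values is used instead.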
The standard normal distribution is a normal distribution with a mean of 0 and standard deviation of 1.
## [1] -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3
## [20] 4 5 6 7 8 9 10 11 12 13 14 15
## [1] 0.01125172 0.01340898 0.01578770 0.01836489 0.02110592 0.02396441
## [7] 0.02688287 0.02979414 0.03262365 0.03529236 0.03772032 0.03983055
## [13] 0.04155314 0.04282898 0.04361321 0.04387780 0.04361321 0.04282898
## [19] 0.04155314 0.03983055 0.03772032 0.03529236 0.03262365 0.02979414
## [25] 0.02688287 0.02396441 0.02110592 0.01836489 0.01578770 0.01340898
## [31] 0.01125172
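The calls behind the two outputs above are not shown. The density values are consistent with evaluating a normal density at the integers -15 to 15, using the mean and standard deviation of those integers as parameters, so the sketch below makes that assumption. The names x and dens are illustrative.

```r
x <- -15:15

# Normal density at each x, with the mean and standard deviation of x
# itself as parameters (an assumption that matches the printed values)
dens <- dnorm(x, mean = mean(x), sd = sd(x))
dens

# Standard normal for comparison: mean 0 and standard deviation 1
dnorm(0)  # equals 1 / sqrt(2 * pi)
```

The density is highest at the mean (the 16th value) and falls off symmetrically on both sides.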