Exercise 1.2 Generating samples, histograms and QQ-plots

a

Generate 4 times a random sample of size 15 from the exponential distribution with λ = 4, using rexp(15,4). Plot for each of the 4 samples the histogram and compare. (Use par(mfrow=c(2,2)) to draw the 4 figures in one plot.)

a=rexp(15,4)
b=rexp(15,4)
c=rexp(15,4)
d=rexp(15,4)

par(mfrow = c(2, 2))

hist(a,prob=T)
hist(b,prob=T)
hist(c,prob=T)
hist(d,prob=T)

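Comparing the four histograms, both the x-axis ranges and the y-axis (density) scales differ between samples, and the shapes vary considerably. With only 15 observations per sample, some histograms are concentrated close to 0 while others are more spread out, so the exponential shape is not clearly visible in every panel.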
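The comparison is easier when all four panels share the same breaks and x-range and the true density is drawn on top. The following is a minimal sketch, not part of the original answer; the seed, the list `samples` and the choice of 10 bins are assumptions made for illustration.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
samples <- replicate(4, rexp(15, rate = 4), simplify = FALSE)

# Shared breaks and x-range so the four panels are directly comparable
x_max  <- 1.05 * max(unlist(samples))
breaks <- seq(0, x_max, length.out = 11)

old_par <- par(mfrow = c(2, 2))
for (i in seq_along(samples)) {
  hist(samples[[i]], breaks = breaks, prob = TRUE,
       xlim = c(0, x_max), main = paste("Sample", i), xlab = "x")
  # True Exp(4) density as a dashed reference curve
  curve(dexp(x, rate = 4), from = 0, to = x_max, add = TRUE, lty = 2)
}
par(old_par)
```

With a common scale, the differences between panels reflect sampling variation only, not different axis choices made by hist().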

b

Repeat part (a) for sample size 1000. What is the main difference between part (a) and this part?

a=rexp(1000,4)
b=rexp(1000,4)
c=rexp(1000,4)
d=rexp(1000,4)

par(mfrow = c(2, 2))

hist(a,prob=T)
hist(b,prob=T)
hist(c,prob=T)
hist(d,prob=T)

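The main difference between (a) and (b) is that in (b) the four histograms are much more alike than in (a), and each clearly shows the shape of the exponential distribution: most values lie close to 0 and a long tail extends to the right (the distribution is right-skewed, i.e. positively skewed).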
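The direction and stability of the skew can also be checked numerically. Base R has no skewness function, so the sketch below defines a simple moment-based one; the helper `skewness()`, the seed and the vectors `small` and `large` are assumptions for illustration. A positive value means a long right tail.

```r
# Moment-based sample skewness (hypothetical helper, not a base R function)
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

set.seed(2)  # assumed seed for reproducibility
small <- replicate(4, skewness(rexp(15, rate = 4)))
large <- replicate(4, skewness(rexp(1000, rate = 4)))

round(small, 2)  # expected to vary strongly between samples of size 15
round(large, 2)  # expected to scatter around 2, the skewness of any exponential distribution
```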

c

Generate several samples from normal distributions and vary with the values for n (sample size), μ (mean) and σ (sd). Plot for each of the samples the normal QQ-plot. Does the shape of the QQ-plot depend on these 3 parameters? If so, in what way? Specify your answer for each of the 3 variables.

par(mfrow = c(2, 2))

x1 = rnorm(50,0,sqrt(2))
qqnorm(x1)

x2 = rnorm(1000,0,sqrt(2))
qqnorm(x2)

x3 = rnorm(1000,100,sqrt(2))
qqnorm(x3)

x4 = rnorm(1000,100,0.1)
qqnorm(x4)

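The shape of the QQ-plot (a straight line) does not depend on the mean or the sd. It does depend on the sample size: with a larger n the points follow a straight line more closely, with less scatter, especially in the tails. The mean and sd only change the scale of the y-axis: the mean shifts the line up or down (intercept) and the sd changes its slope.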
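The role of the mean and sd can be made explicit by fitting a straight line through the QQ-plot coordinates. This is a sketch under assumed settings (seed and the names `qq`, `fit` and `z` are illustrative); it uses the fact that qqnorm() returns the plotted points.

```r
set.seed(3)  # assumed seed for reproducibility
x <- rnorm(1000, mean = 100, sd = 0.1)

# qqnorm() returns the coordinates: x = theoretical, y = sample quantiles
qq  <- qqnorm(x, plot.it = FALSE)
fit <- lm(qq$y ~ qq$x)
coef(fit)  # intercept should be near the mean (100), slope near the sd (0.1)

# After standardising, any normal sample falls around the line y = x
z <- (x - mean(x)) / sd(x)
qqnorm(z); abline(0, 1)
```

So μ and σ only move and stretch the line, while n determines how tightly the points follow it.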

Exercise 1.3 Summarizing data

a

Make numerical and graphical summaries of the birth rates and mortality rates separately. You can use min, max, range, mean, median, sd, var etc. Use the help-function to see what these commands produce.

setwd("C:/Users/Saxion/Desktop/BADS")

mortality <- read.table("C:/Users/Saxion/Desktop/BADS/Statistiek/data_bestanden/mortality.txt", header=TRUE)

#a

summary(mortality$teen)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.30    9.85   11.65   12.43   15.22   20.50

sd (mortality$teen)

## [1] 3.293019

var(mortality$teen)

## [1] 10.84397

summary(mortality$mort)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.20   10.15   10.32   11.30   13.30

sd (mortality$mort)

## [1] 1.349941

var(mortality$mort)

## [1] 1.82234
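The exercise also asks for graphical summaries of each rate. One possible sketch is given below; the helper `plot_rate()` and its labels are hypothetical, and the calls at the end only run when the `mortality` data frame read in above is available.

```r
# Hypothetical helper: histogram, boxplot and normal QQ-plot for one variable
plot_rate <- function(x, label) {
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))
  hist(x, prob = TRUE, main = paste("Histogram of", label), xlab = label)
  boxplot(x, main = paste("Boxplot of", label), ylab = label)
  qqnorm(x, main = paste("Normal QQ-plot of", label))
  qqline(x)
  # Return a few numerical summaries invisibly for reuse
  invisible(c(mean = mean(x), median = median(x), sd = sd(x)))
}

# Only runs when the data frame from read.table() above exists
if (exists("mortality")) {
  plot_rate(mortality$teen, "teen birth rate")
  plot_rate(mortality$mort, "infant mortality rate")
}
```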

b

Add numerical and graphical summaries to assess the relation between the two rates.

cor(mortality$teen, mortality$mort)

## [1] 0.5490758

plot(mortality$teen, mortality$mort)

qqplot(mortality$teen, mortality$mort)

c

What conclusions can be drawn from the summaries in parts (a) and (b)?

Teenage birth rate per 1000 (teen) has a mean of 12.43. Infant mortality rate per 1000 live births (mort) has a mean of 10.32.

A correlation of .55 between teenage birth rate and infant mortality rate shows that the two are related. The correlation is positive, indicating that a higher teenage birth rate is associated with a higher infant mortality rate (and likewise a lower birth rate with a lower mortality rate). A correlation of .55 indicates a moderate relation.
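To put the size of the correlation in perspective, it can be squared to give the share of variance in one rate that is linearly explained by the other. The short sketch below only reuses the numbers reported in the output above; the variable `r` is introduced for illustration.

```r
r <- 0.5490758  # cor(mortality$teen, mortality$mort) as reported above
r^2             # about 0.30: roughly 30% of the variance is shared

# Consistency check on the reported summaries: var should equal sd^2
c(teen = 3.293019^2, mort = 1.349941^2)  # compare with 10.84397 and 1.82234
```

A correlation alone does not show that a high teenage birth rate causes a high infant mortality rate; it only describes how the two rates vary together.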

Exercise 1.4 Different distributions

Generate the following samples, and plot for each of the generated samples the histogram, the boxplot and the normal QQ-plot.

• sample size 10 from the normal distribution with μ = −3 and σ = 5

• sample size 40 from the binomial distribution with 50 trials and p = 0.25

• sample size 60 from the uniform distribution on the interval [min,max]=[-3,3]

• sample size 200 from the Poisson distribution with λ = 350

Comment on the plots (symmetry, normality, …). Can you explain what you see in the different plots? Specify your answer per sample.

normaldistribution=rnorm(10, -3, 5)
binomialdistribution=rbinom(40, 50, 0.25)  # sample size 40, 50 trials, p = 0.25
uniformdistribution=runif(60, min = -3, max = 3)
Poissondistribution=rpois(200, 350)

#create histogram, boxplot and normal QQ-plot

par(mfrow = c(3, 4))
hist(normaldistribution,prob=T, main="Hist. normal dist", xlab="normal dist", ylab="Density")
hist(binomialdistribution,prob=T, main="Hist. binomial dist", xlab="binomial dist", ylab="Density")
hist(uniformdistribution,prob=T, main="Hist. uniform dist", xlab="uniform dist", ylab="Density")
hist(Poissondistribution,prob=T, main="Hist. Poisson dist", xlab="Poisson dist", ylab="Density")
boxplot(normaldistribution, main="Boxplot normal dist", xlab="", ylab="normal dist")
boxplot(binomialdistribution, main="Boxplot binomial dist", xlab="", ylab="binomial dist")
boxplot(uniformdistribution, main="Boxplot uniform dist", xlab="", ylab="uniform dist")
boxplot(Poissondistribution, main="Boxplot Poisson dist", xlab="", ylab="Poisson dist")
qqnorm(normaldistribution, pch = 1, frame = FALSE, main="QQplot normal dist", xlab="Theoretical Quantiles", ylab="Sample Quantiles")
qqnorm(binomialdistribution, pch = 1, frame = FALSE, main="QQplot binomial dist", xlab="Theoretical Quantiles", ylab="Sample Quantiles")
qqnorm(uniformdistribution, pch = 1, frame = FALSE, main="QQplot uniform dist", xlab="Theoretical Quantiles", ylab="Sample Quantiles")
qqnorm(Poissondistribution, pch = 1, frame = FALSE, main="QQplot Poisson dist", xlab="Theoretical Quantiles", ylab="Sample Quantiles")

Comments:

1. For the normal distribution (the first column) we see a rather symmetrical and normal distribution (histogram and QQ-plot), but since there are only a few observations (n = 10) the distribution is not smooth. We see no outliers (boxplot).

2. For the binomial distribution (the second column) we see a somewhat normal distribution (histogram and QQ-plot) and no outliers (boxplot). It is not perfectly symmetrical: for p = 0.25 the binomial distribution is slightly skewed to the right. The QQ-plot shows horizontal steps because the values are discrete, not continuous.

3. For the uniform distribution (the third column) we don't see a normal distribution (histogram and QQ-plot) and no outliers (boxplot). The distribution is roughly symmetrical, but the values are spread fairly evenly over the whole range instead of being concentrated around the centre, so the histogram is flat rather than bell-shaped.

4. For the Poisson distribution (the fourth column) we see a rather symmetrical and normal distribution (histogram and QQ-plot), which is expected because a Poisson distribution with a large λ is well approximated by a normal distribution. We see some outliers (boxplot); a few points beyond the whiskers are not unusual in a sample of 200.
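The visual impressions can be checked against the theoretical mean, standard deviation and skewness of the four distributions. The table below is a sketch using the standard formulas; the data frame `theory` and the short variable names are introduced here for illustration only.

```r
# Theoretical mean, sd and skewness for the four distributions in the exercise
n <- 50; p <- 0.25; lambda <- 350; a <- -3; b <- 3

theory <- data.frame(
  dist = c("normal(-3, 5)", "binomial(50, 0.25)", "uniform(-3, 3)", "poisson(350)"),
  mean = c(-3, n * p, (a + b) / 2, lambda),
  sd   = c(5, sqrt(n * p * (1 - p)), (b - a) / sqrt(12), sqrt(lambda)),
  skew = c(0, (1 - 2 * p) / sqrt(n * p * (1 - p)), 0, 1 / sqrt(lambda))
)
theory
```

The binomial and Poisson skewness values are small and positive (about 0.16 and 0.05), which is consistent with histograms that look close to normal. The uniform distribution has skewness 0 but lacks tails, which typically shows up as an S-shaped normal QQ-plot.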

knit2html()

BADS Module Advanced Statistical Methods - Assignment 1

Elian de Kleine, 2792106

13-9-2022
