Example 1

Suppose we would like to estimate the mean height (in inches) of a certain type of plant in a certain field. We gather a simple random sample of 13 plants and measure the height of each plant.

Solution:

The following code shows how to calculate the sample mean:

#define data
data <- c(8, 8, 9, 12, 13, 13, 14, 15, 19, 22, 23, 23, 24)
data

##  [1]  8  8  9 12 13 13 14 15 19 22 23 23 24

#calculate sample mean
xbar=mean(data, na.rm = TRUE)
xbar

## [1] 15.61538

print(paste("The sample mean is", round(xbar,2), "inches. This represents our point estimate for the population mean."))

## [1] "The sample mean is 15.62 inches. This represents our point estimate for the population mean."

We can also use the following code to calculate a 95% confidence interval for the population mean:

#find sample size, sample mean, and sample standard deviation
n <- length(data)
s <- sd(data)

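#calculate margin of error (two equivalent ways: qt(0.975, ...) and
#qt(0.025, ..., lower.tail = FALSE) return the same t critical value)
margin <- qt(0.975, df=n-1)*s/sqrt(n)
margin2 <- qt(0.05/2, df=n-1, lower.tail = FALSE)*s/sqrt(n)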

#calculate lower and upper bounds of confidence interval
low <- xbar - margin2
low

## [1] 12.03575

high <- xbar + margin2
high

## [1] 19.19502

print(paste("The 95% confidence interval for the population mean is [",round(low,2),",", round(high,2),"] inches."))

## [1] "The 95% confidence interval for the population mean is [ 12.04 , 19.2 ] inches."
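As a cross-check, base R's `t.test()` reports the same t-based interval in its `conf.int` component. The sketch below assumes the same 13 plant heights as above; the names `heights` and `ci` are chosen here purely for illustration.

```r
# heights (in inches) of the 13 sampled plants, same values as above
heights <- c(8, 8, 9, 12, 13, 13, 14, 15, 19, 22, 23, 23, 24)

# one-sample t interval for the population mean
ci <- t.test(heights, conf.level = 0.95)$conf.int
ci
```

The two bounds should agree with the values of `low` and `high` computed by hand (about 12.04 and 19.20).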

Example 2

Suppose we would like to estimate the proportion of people in a certain city that support a certain law. We survey a simple random sample of 20 citizens.

The following code shows how to calculate the sample proportion:

#define data
data <- c('Y', 'Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'Y',
          'N', 'Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N')

#find total sample size
n <- length(data)

#find number who responded 'Yes'
k <- sum(data == 'Y') 

#find sample proportion
p <- k/n

p

## [1] 0.6

print(paste("The sample proportion of citizens who support the law is", round(p,2), ". This represents our point estimate for the population proportion."))

## [1] "The sample proportion of citizens who support the law is 0.6 . This represents our point estimate for the population proportion."
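The same point estimate can be obtained in one step: `data == 'Y'` is a logical vector, and the mean of a logical vector is the fraction of `TRUE` values. A minimal sketch, using the same 20 responses (the names `responses` and `p_hat` are illustrative only):

```r
# survey responses, same values as above
responses <- c('Y', 'Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'Y',
               'N', 'Y', 'Y', 'N', 'N', 'Y', 'Y', 'Y', 'N', 'N')

# proportion of 'Y' responses: TRUE counts as 1, FALSE as 0
p_hat <- mean(responses == 'Y')
p_hat
```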

We can also use the following code to calculate a 95% confidence interval for the population proportion:

#calculate margin of error
margin <- qnorm(0.025, lower.tail=FALSE)*sqrt(p*(1-p)/n)

#calculate lower and upper bounds of confidence interval
low <- p - margin
low

## [1] 0.3852967

high <- p + margin
high

## [1] 0.8147033

print(paste("The 95% confidence interval for the population proportion is [",round(low,2),",", round(high,2),"]."))

## [1] "The 95% confidence interval for the population proportion is [ 0.39 , 0.81 ]."
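The calculation above is the normal-approximation (Wald) interval, \(p \pm z_{\alpha/2}\sqrt{p(1-p)/n}\). The sketch below wraps it in a small helper so that the confidence level can be varied; `wald_ci` is a hypothetical name introduced here, not a base R function.

```r
# hypothetical helper: Wald (normal-approximation) interval for a proportion
# k = number of successes, n = sample size
wald_ci <- function(k, n, conf.level = 0.95) {
  p_hat <- k / n                               # sample proportion
  alpha <- 1 - conf.level
  z <- qnorm(alpha / 2, lower.tail = FALSE)    # upper alpha/2 critical value
  margin <- z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
  c(lower = p_hat - margin, upper = p_hat + margin)
}

# 12 of the 20 surveyed citizens answered 'Y'
res <- wald_ci(12, 20)
res
```

Base R's `prop.test()` and `binom.test()` also return intervals for a proportion, but they use different methods (a Wilson score interval and an exact Clopper-Pearson interval, respectively), so their bounds will not match these exactly.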

Example 3

A wine importer needs to report the average percentage of alcohol in bottles of French wine. From experience with previous kinds of wine, the importer believes that alcohol percentages are normally distributed and the population standard deviation is 1.2%. The importer randomly samples 60 bottles of the new wine and obtains a sample mean \(\bar{x}=9.3\%\). Give a 90% confidence interval for the average percentage of alcohol in all bottles of the new wine.

Solution:

From the problem, we know the following. \[\begin{aligned} n &= 60\\ \sigma&=1.2\\ \bar{x} &=9.3\\ \alpha&=0.10 \end{aligned} \]

We must first figure out which type of confidence interval to use. Notice that we are trying to estimate the average percentage of alcohol, so our parameter is a mean \(\mu\). Moreover, we are told to assume that the data are normally distributed and the population standard deviation \(\sigma\) is known. Therefore, our confidence interval will be of the form:

\[\bar{x}\pm z_{\alpha/2}{\sigma\over\sqrt{n}}.\] We can now define each object in R and construct the confidence interval.

n = 60
sigma = 1.2
xbar = 9.3
zalpha = qnorm(0.05, mean=0, sd=1, lower.tail=FALSE)

xbar - zalpha*sigma/sqrt(n)

## [1] 9.04518

xbar + zalpha*sigma/sqrt(n)

## [1] 9.55482

Therefore, we are 90% confident that the true average alcohol content in all bottles of the new wine is between 9.05% and 9.55%.
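The value 0.05 passed to `qnorm()` above is \(\alpha/2\), where \(\alpha = 1 - 0.90\). A short sketch of that step, with illustrative variable names, shows that the upper-tail and lower-tail calls give the same critical value:

```r
conf_level <- 0.90
alpha <- 1 - conf_level

# upper-tail form: the point with probability alpha/2 to its right
z_upper <- qnorm(alpha / 2, lower.tail = FALSE)

# equivalent lower-tail form: the point with probability 1 - alpha/2 to its left
z_lower_tail <- qnorm(1 - alpha / 2)

c(z_upper, z_lower_tail)
```

Both entries should be approximately 1.645, the usual critical value for a 90% interval.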

Example 4

An economist wants to estimate the average amount in checking accounts at banks in a given region. A random sample of 100 accounts gives \(\bar{x}=£357.60\) and \(s=£140.00\). Give a 95% confidence interval for \(\mu\), the average amount in any checking account at a bank in the given region.

Solution:

From the problem, we know the following. \[\begin{aligned} n &= 100\\ \bar{x} &=357.60\\ s &= 140\\ \alpha&=0.05 \end{aligned} \]

Here we are not told whether the data are normally distributed. However, this does not change the interval we use, because \(\sigma\) is unknown and must be estimated by \(s\) (recall that, among the four types of confidence intervals we considered earlier, there is no difference between cases II and III). Therefore, our confidence interval will be of the form: \[\bar{x}\pm t_{n-1,\alpha/2}{s\over\sqrt{n}}. \]

We can again define each object in R and construct the confidence interval.

n = 100
xbar = 357.60
s = 140
talpha = qt(0.025, df=n-1, lower.tail=FALSE)

xbar - talpha*s/sqrt(n)

## [1] 329.821

xbar + talpha*s/sqrt(n)

## [1] 385.379

Therefore, we are 95% confident that the true average checking account balance in the given region is between £329.82 and £385.38.
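With a sample this large, the t critical value is already close to the corresponding normal critical value. The sketch below compares the two for \(n = 100\); the names `t_crit` and `z_crit` are illustrative.

```r
n <- 100

# t critical value with n - 1 degrees of freedom (as used above)
t_crit <- qt(0.025, df = n - 1, lower.tail = FALSE)

# standard normal critical value for the same tail area
z_crit <- qnorm(0.025, lower.tail = FALSE)

c(t = t_crit, z = z_crit)
```

The t value is slightly larger (roughly 1.98 versus 1.96), so the t-based interval is only marginally wider than a z-based one would be.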

Example 5

The EuStockMarkets data set in R provides daily closing prices of four major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Using this data set, produce a 99% confidence interval for the average closing price of the UK FTSE.

Solution:

First, let’s load in the data from R.

data(EuStockMarkets)
head(EuStockMarkets)

##          DAX    SMI    CAC   FTSE
## [1,] 1628.75 1678.1 1772.8 2443.6
## [2,] 1613.63 1688.5 1750.5 2460.2
## [3,] 1606.51 1678.6 1718.0 2448.2
## [4,] 1621.04 1684.1 1708.1 2470.4
## [5,] 1618.16 1686.6 1723.1 2484.7
## [6,] 1610.61 1671.6 1714.3 2466.8

Now let’s pull the subset of data we care about (i.e., the UK FTSE column).

uk = EuStockMarkets[,4]

Notice that we are not told anything about the true distribution of the data. Therefore, our confidence interval will be of the form: \[\bar{x}\pm t_{n-1,\alpha/2}{s\over\sqrt{n}}. \]

Next, let’s compute each component necessary to construct the 99% confidence interval.

n = length(uk)
xbar = mean(uk)
s = sd(uk)
talpha = qt(0.005, df=n-1, lower.tail=FALSE)

xbar - talpha*s/sqrt(n)

## [1] 3507.248

xbar + talpha*s/sqrt(n)

## [1] 3624.038

Therefore, we are 99% confident that the true average closing price of the UK FTSE index is between 3,507.25 and 3,624.04 index points.
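As a cross-check, `t.test()` with `conf.level = 0.99` should reproduce these bounds. In this sketch the column is selected by name and converted to a plain numeric vector first; the names `ftse` and `ci` are illustrative.

```r
# FTSE closing prices as a plain numeric vector
ftse <- as.numeric(EuStockMarkets[, "FTSE"])

# one-sample t interval at the 99% level
ci <- t.test(ftse, conf.level = 0.99)$conf.int
ci
```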

R Examples and Illustrations

STAT 115 Basic Statistical Methods

First Semester, A.Y. 2022 - 2023
