Introduction

Questions for the sampling distribution

  1. Suppose the credit ratings of a population of companies have a mean of 400 and a standard deviation of 24. If a random sample of 144 companies is drawn from the population, calculate the probability that the sample mean credit score lies between 395.5 and 404.5.

Population parameters: \(\mu = 400\) and \(\sigma = 24\). The sample size (n = 144) is large, so by the central limit theorem the sample mean \(\bar{X}\) is approximately normally distributed with a mean of 400 and a standard error of \(\frac{\sigma}{\sqrt{n}} = \frac{24}{\sqrt{144}} = \frac{24}{12} = 2\).

\[P \left(395.5 \leq \bar{X} \leq 404.5 \right)= P \left(\frac{395.5-400}{2} \leq \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \leq \frac{404.5 - 400}{2} \right) = P(-2.25 \leq Z \leq 2.25)\]

We therefore need the probability that a standard normal variable \(Z\) lies between -2.25 and +2.25.

\[P(-2.25 \leq Z \leq 2.25) = 2\left[\Phi(2.25) - \Phi(0)\right] = 2\Phi(2.25) - 1\]

Here \(\Phi(2.25)\), computed in R as pnorm(2.25), is the probability that a standard normal variable lies below 2.25 (2.25 standard deviations above the mean). By symmetry, the area between -2.25 and +2.25 is twice the area between the mean (0) and 2.25.

round(2 * (pnorm(2.25) - pnorm(0)), 2)
## [1] 0.98

This probability is the light blue area under the standard normal curve in the following figure; it represents about 98 percent of the total area.
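As a cross-check, the same probability can be computed without standardising by passing the sampling distribution's own mean and standard error to pnorm(). This is a minimal sketch; the variable names are illustrative only.

```r
# Population parameters and sample size from the question
mu    <- 400
sigma <- 24
n     <- 144

# Standard error of the sample mean: 24 / sqrt(144) = 2
se <- sigma / sqrt(n)

# Probability computed directly on the original scale of X-bar
p_direct <- pnorm(404.5, mean = mu, sd = se) - pnorm(395.5, mean = mu, sd = se)

# Same probability after standardising: P(-2.25 <= Z <= 2.25) = 2 * Phi(2.25) - 1
z_upper <- (404.5 - mu) / se
p_std   <- 2 * pnorm(z_upper) - 1

round(c(p_direct, p_std), 4)
```

Both values come out at about 0.976, which rounds to the 0.98 above; the direct form simply skips the hand standardisation step.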

  2. Below you are given the ages of a random sample of 6 MBA students. Assume the population is normally distributed.

                             40 42 43 39 37 39
  1. Calculate the point estimate of the population mean, μ.
  2. Construct a 90% confidence interval for the average age of the students.
  3. Construct a 99% confidence interval for the average age of the students.

popdata <- c(40, 42, 43, 39, 37, 39)
popmean <- mean(popdata)   # sample mean: point estimate of mu
popvar <- var(popdata)     # sample variance, divisor n - 1
popsd <- sqrt(popvar)      # sample standard deviation

The point estimate of the population mean is the sample mean, \(\bar{x} = 40\), and the sample standard deviation is \(s = 2.19\).

This is a small sample from a normally distributed population with unknown variance, so for the \(90\%\) confidence interval we use the t-distribution and find the 5% cutoff in each tail (half of 1 - 0.90). There are n - 1 = 5 degrees of freedom.
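To see why the t-distribution matters here, the sketch below compares its critical values for 5 degrees of freedom with the standard normal ones at the same tail probabilities. Variable names are illustrative.

```r
df <- 6 - 1   # n - 1 degrees of freedom for a sample of 6

# Upper critical values for 90% (0.95 quantile) and 99% (0.995 quantile) intervals
t_crit <- qt(c(0.95, 0.995), df = df)
z_crit <- qnorm(c(0.95, 0.995))

round(rbind(t = t_crit, z = z_crit), 3)
```

With only 5 degrees of freedom the t critical values (about 2.015 and 4.032) are noticeably larger than the normal ones (about 1.645 and 2.576), so the t-based intervals are wider, reflecting the extra uncertainty from estimating the standard deviation from the sample.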

t_stat_90 <- qt(0.95, 5)                                   # upper 5% critical value, 2.015
myconfint_90 <- t_stat_90 * popsd / sqrt(length(popdata))  # margin of error
popmean - myconfint_90                                     # lower limit
## [1] 38.19769
popmean + myconfint_90                                     # upper limit
## [1] 41.80231

The critical value is the 95th percentile of the t-distribution with 5 degrees of freedom, \(t_{0.05,5} = 2.015\) (by symmetry, qt(0.05, 5) returns -2.015), so the \(90\%\) confidence interval is 38.1977 to 41.8023.


For the \(99\%\) confidence interval, we put half of one percent (0.005) in each tail, again with 5 degrees of freedom.

t_stat_99 <- qt(0.995, 5)                                  # upper 0.5% critical value, 4.032
myconfint_99 <- t_stat_99 * popsd / sqrt(length(popdata))  # margin of error
popmean - myconfint_99                                     # lower limit
## [1] 36.39354
popmean + myconfint_99                                     # upper limit
## [1] 43.60646

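Therefore the \(99\%\) confidence interval is 36.3935 to 43.6065. It is wider than the \(90\%\) interval because a higher confidence level requires a larger critical value (4.032 instead of 2.015).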
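Both intervals can be wrapped in a small helper and checked against R's built-in t.test(), whose conf.int component holds the same one-sample t interval. The function name t_conf_int is hypothetical, not part of any package.

```r
# Hypothetical helper: one-sample t confidence interval for a mean
t_conf_int <- function(x, level = 0.95) {
  n      <- length(x)
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 1)  # upper critical value
  margin <- t_crit * sd(x) / sqrt(n)       # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

ages  <- c(40, 42, 43, 39, 37, 39)
ci_90 <- t_conf_int(ages, level = 0.90)
ci_99 <- t_conf_int(ages, level = 0.99)

# Cross-check against stats::t.test
tt_90 <- t.test(ages, conf.level = 0.90)$conf.int
tt_99 <- t.test(ages, conf.level = 0.99)$conf.int

round(rbind(ci_90, ci_99), 4)
```

Using t.test() directly is usually simpler in practice; the helper only makes each step of the hand calculation explicit.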

  3. A sample of 200 women from the labour force found an average wage of £6000 p.a. with standard deviation £2500. A sample of 100 men found an average wage of £8000 with standard deviation £1500. Estimate the true difference in wages between men and women using a 95% confidence interval.

We need a confidence interval for the difference in means, where \(n_m = 100\) and \(n_w = 200\), \(\bar{W_m} = 8000\) and \(\bar{W_w} = 6000\), and \(s_m = 1500\) and \(s_w = 2500\).

\[\text{Standard error of the difference} = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{1500^2}{100} + \frac{2500^2}{200}} = 231.8\]

Given the large sample sizes, the critical value for a 95% confidence interval is z = 1.96.

Therefore, the confidence limits are given by

\[\bar{W_m} - \bar{W_w} \pm z_{\alpha/2} \times \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}\]

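WageDiff <- 8000 - 6000                       # men's mean minus women's mean
Z_score_95 <- qnorm(0.975)                    # 1.96 for a two-sided 95% interval
StandErr <- sqrt(1500^2/100 + 2500^2/200)     # standard error of the difference, 231.8
UCI <- WageDiff + Z_score_95 * StandErr       # upper confidence limit
LCI <- WageDiff - Z_score_95 * StandErr       # lower confidence limit
c(LCI, UCI)
## [1] 1545.601 2454.399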

Therefore the 95% confidence interval for the wage difference between men and women is £1546 to £2454.

  4. A different survey, of 50 men and 75 women doing similar jobs, found that women were paid on average £7200, with standard deviation £1225 and men were paid on average £7600 with standard deviation £750. Estimate the difference between male and female wages using these new data. What can be concluded from the results of the two surveys?

Details

  • \(n_w = 75, \bar{W_w} = 7200, s_w = 1225\)
  • \(n_m = 50, \bar{W_m} = 7600, s_m = 750\)

The wage difference is \(\bar{W_m} -\bar{W_w} = 7600 - 7200 = 400\)

The standard error of the difference is \(\sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{750^2}{50} + \frac{1225^2}{75}} = 176.8\)

Given the large samples, we take the critical value from the standard normal distribution:

# 2.5 percent in each tail
qnorm(0.975)
## [1] 1.959964

The 95% confidence interval is therefore

\[\bar{W_m} - \bar{W_w} \pm z_{\alpha/2} \times \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = 400 \pm 1.96 \times 176.8 = (53.5,\ 746.5)\]

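# Information from the second survey
Nmen <- 50
Nwomen <- 75
Mmean <- 7600
Wmean <- 7200
s_m <- 750
s_w <- 1225
# Lower and upper limits of the 95% confidence interval
Mmean - Wmean - qnorm(0.975) * sqrt(s_m^2/Nmen + s_w^2/Nwomen)
## [1] 53.47785
Mmean - Wmean + qnorm(0.975) * sqrt(s_m^2/Nmen + s_w^2/Nwomen)
## [1] 746.5221

The 95% confidence interval for the wage difference in the second survey is £53 to £747. Both surveys give intervals that lie entirely above zero, so both suggest that men are paid more than women on average. However, the second survey, which compares men and women doing similar jobs, shows a much smaller gap, and its interval does not overlap the first survey's interval (£1546 to £2454). This suggests that much of the gap in the first survey may reflect differences in the jobs men and women do rather than unequal pay for similar work.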
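To compare the two surveys side by side, the sketch below recomputes both large-sample 95% intervals with one helper and checks whether they overlap. The function name diff_mean_ci is hypothetical; men are passed first, so a positive value means men earn more.

```r
# Hypothetical helper: large-sample (z-based) CI for a difference in means,
# (mean1 - mean2) +/- z * sqrt(sd1^2 / n1 + sd2^2 / n2)
diff_mean_ci <- function(mean1, sd1, n1, mean2, sd2, n2, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  se     <- sqrt(sd1^2 / n1 + sd2^2 / n2)
  est    <- mean1 - mean2
  c(lower = est - z_crit * se, upper = est + z_crit * se)
}

survey1 <- diff_mean_ci(8000, 1500, 100, 6000, 2500, 200)  # whole labour force
survey2 <- diff_mean_ci(7600,  750,  50, 7200, 1225,  75)  # similar jobs

round(rbind(survey1, survey2), 1)

# Two intervals overlap if each one's upper limit reaches the other's lower limit
overlap <- survey2[["upper"]] >= survey1[["lower"]] &&
           survey1[["upper"]] >= survey2[["lower"]]
overlap
```

Here overlap is FALSE: the second survey's upper limit (about 746.5) is well below the first survey's lower limit (about 1545.6).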

  5. An IT manager considers that the average number of hits on the company intranet pages should be 75. The mean number of hits per week on a random sample of 60 pages is 71.4 with a standard deviation of 31.9. Is there sufficient evidence to support the IT manager’s claim using 95% confidence intervals?

The information that we have is

\(\mu = 75;\ \bar{x} = 71.4;\ s = 31.9;\ n = 60\)

The null and the alternative hypothesis are:

  • \(H0: \mu = 75\)
  • \(H1: \mu \neq 75\)

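As the sample size (n = 60) is more than 30, the sample mean is approximately normally distributed by the central limit theorem, so we use a two-tailed z-test with the sample standard deviation \(s\) in place of \(\sigma\).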

\[z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{71.4 - 75}{31.9 / \sqrt{60}} = -0.87\]

The critical values for z at the 5% significance level are

qnorm(p = c(0.025, 0.975))
## [1] -1.959964  1.959964

That is, -1.96 and +1.96, which leave 2.5% in each tail of the standard normal distribution (95% in the middle).

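Since z = -0.87 lies between -1.96 and +1.96, we fail to reject the null hypothesis. The data are consistent with the IT manager's claim that the mean number of hits is 75; there is insufficient evidence to reject it.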
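Because the question asks about 95% confidence intervals, the sketch below computes both the z-statistic with its two-sided p-value and a 95% interval for the mean, using the large-sample normal approximation assumed above. Variable names are illustrative.

```r
mu0  <- 75     # hypothesised mean number of hits
xbar <- 71.4   # sample mean
s    <- 31.9   # sample standard deviation
n    <- 60     # number of pages sampled

se <- s / sqrt(n)
z  <- (xbar - mu0) / se
p_value <- 2 * pnorm(-abs(z))             # two-tailed p-value

z_crit <- qnorm(0.975)
ci_95  <- xbar + c(-1, 1) * z_crit * se   # 95% confidence interval for the mean

round(c(z = z, p = p_value), 3)
round(ci_95, 2)
```

The p-value is about 0.38, well above 0.05, and the interval (roughly 63.3 to 79.5) contains 75, so the test and the interval lead to the same conclusion.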

  6. It is claimed that the starting salary for Economics graduates in 2020/21 was £28,000. A random sample of 50 Economics graduates in 2024 were asked for their starting annual salary. The mean salary was £32,200 with a standard deviation of £8,000. Is there sufficient evidence to suggest that the starting salary for Economics graduates has increased?

The information is \(\mu = 28000;\ \bar{x} = 32200;\ s = 8000;\ n = 50\) (all amounts in £).

  • \(H_0: \mu = 28000\)
  • \(H_1: \mu > 28000\)

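The sample size is greater than 30, so we use a z-test; because the alternative hypothesis states that the salary has increased, the test is one-tailed (upper tail).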

\[z = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{32200 - 28000}{8000/\sqrt{50}} = 3.71\]

The critical value for a one-sided z-test at the 5% significance level is

qnorm(0.95)
## [1] 1.644854

That is, 1.64. Since z = 3.71 exceeds 1.64, we reject the null hypothesis: there is sufficient evidence at the 5% level to conclude that the mean starting salary for Economics graduates has increased.
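The same upper-tailed test can be reproduced in R, adding the p-value alongside the critical-value comparison. This is a minimal sketch under the large-sample normal approximation used above; variable names are illustrative.

```r
mu0  <- 28000   # claimed starting salary (H0)
xbar <- 32200   # sample mean salary
s    <- 8000    # sample standard deviation
n    <- 50      # number of graduates sampled

z       <- (xbar - mu0) / (s / sqrt(n))
z_crit  <- qnorm(0.95)                    # one-sided 5% critical value
p_value <- pnorm(z, lower.tail = FALSE)   # upper-tail p-value

reject_h0 <- z > z_crit
c(z = round(z, 2), p = signif(p_value, 2))
reject_h0
```

The p-value is roughly 0.0001, far below 0.05, which agrees with the comparison against the critical value.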