In this handout, I use ChatGPT 4.0 and the KeyMate.AI Search plugin
to synthesize the information from the textbook, give examples, and
create R code. All prompts are exact, but some of the output may be
modified for correctness.
Hanck, Arnold, Gerber, and Schmelzer (2023). Introduction to
Econometrics with R. https://www.econometrics-with-r.org/index.html. This
book is licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License.
Prompt
Synthesize the information in these websites. Include all relevant
formulas and definitions.
https://www.econometrics-with-r.org/3.1-estimation-of-the-population-mean.html
https://www.econometrics-with-r.org/3.2-potsm.html
https://www.econometrics-with-r.org/3.3-hypothesis-tests-concerning-the-population-mean.html
3.1 Estimation of the Population Mean
- Population Mean (μ): The average of all possible
observations in a population.
- Sample Mean (x̄): The average of observations in a
sample.
- Central Limit Theorem: For a sufficiently large
sample of independent draws, the distribution of the sample mean is approximately
normal, regardless of the population’s distribution (provided its variance is finite).
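- Standard Error (SE): Measures the dispersion of the
sample mean across repeated samples. It is calculated as: \[ SE =
\frac{\sigma}{\sqrt{n}} \] Where:
- \(\sigma\) is the population standard
deviation (in practice usually estimated by the sample standard deviation \(s\)).
- \(n\) is the sample size.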
- Confidence Interval: Gives an estimated range of
values which is likely to include the population mean. It is calculated
as: \[ \bar{x} \pm z \times SE \] Where:
- \(z\) is the z-value corresponding
to the desired confidence level.
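A small simulation can make the Central Limit Theorem and the standard error formula concrete. The sketch below is only an illustration under assumed settings: the exponential population (mean 1, standard deviation 1), the sample size, the number of replications, and the seed are arbitrary choices, not taken from the textbook.

```r
# Illustrative simulation (assumed setup): sample means drawn from a
# right-skewed exponential population with mean 1 and sd 1.
set.seed(123)            # arbitrary seed so the run is reproducible
n    <- 50               # observations per sample
reps <- 10000            # number of simulated samples

sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)       # should be close to the population mean, 1
sd(sample_means)         # should be close to sigma / sqrt(n)
1 / sqrt(n)              # theoretical standard deviation of the sample mean

# The histogram should look roughly bell-shaped even though the
# exponential population itself is strongly skewed.
hist(sample_means, breaks = 40, main = "Sample means, n = 50")
```

Increasing `n` should make the histogram narrower and closer to a normal shape.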
3.3 Hypothesis Tests Concerning the Population Mean
- Null Hypothesis (H0): A statement that there is no
effect or no difference. It is a starting point for statistical
significance testing.
- Alternative Hypothesis (H1): A statement that
indicates the presence of an effect or difference.
- Test Statistic: A standardized value that is
calculated from sample data during a hypothesis test. It helps in
deciding whether to reject the null hypothesis.
- P-value: The probability of observing a test
statistic as extreme as, or more extreme than, the statistic computed
from the sample, given that the null hypothesis is true.
- Decision Rule: If the p-value is less than the
chosen significance level (α), then the null hypothesis is rejected in
favor of the alternative hypothesis.
- Type I Error: Occurs when the null hypothesis is
true, but it is rejected.
- Type II Error: Occurs when the null hypothesis is
false, but it is not rejected.
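The p-value and the decision rule can be sketched in a few lines of R. The test statistic value below is hypothetical, and the sketch assumes the statistic follows a standard normal distribution under the null hypothesis (the large-sample case).

```r
# Hypothetical standardized test statistic (assumed value for illustration)
z_stat <- 2.1
alpha  <- 0.05           # chosen significance level

# Two-sided p-value: probability of a statistic at least this extreme
# in either tail of the standard normal distribution
p_two_sided <- 2 * pnorm(-abs(z_stat))

# One-sided p-value for the alternative "mean is greater than mu_0"
p_greater <- pnorm(z_stat, lower.tail = FALSE)

p_two_sided
p_greater

# Decision rule: reject H0 when the p-value is below alpha
reject_two_sided <- p_two_sided < alpha
reject_greater   <- p_greater < alpha
reject_two_sided
reject_greater
```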
This synthesis provides an overview of the concepts related to the
estimation of the population mean, the properties of the sample mean, and
hypothesis testing concerning the population mean.
Prompt
Go into more detail about the population mean, standard error, and
confidence intervals.
Population Mean (μ)
The population mean, denoted by \(\mu\), represents the average of all
possible observations in a population. For instance, consider an
economic variable like the hourly earnings of college graduates. To
precisely calculate \(\mu\), one would
need to interview every working graduate in the economy. However, due to
time and cost constraints, this is often not feasible. Instead, a random
sample is drawn, and the sample mean is used as an estimator for the
population mean.
Standard Error (SE)
The standard error measures the dispersion or variability of the
sample mean. It provides an indication of how much the sample mean is
expected to vary from the true population mean. The formula for the
standard error is: \[ SE = \frac{\sigma}{\sqrt{n}}
\] Where:
- \(\sigma\) is the population standard
deviation.
- \(n\) is the sample size.
The standard error is crucial because it helps in determining the
precision of the sample mean as an estimate of the population mean. A
smaller SE indicates that the sample mean is a more precise estimator of
the population mean.
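The effect of the sample size on the standard error can be seen with a short calculation. This is a sketch under assumed values: the population standard deviation of 10, the sample sizes, and the simulated sample are all hypothetical.

```r
sigma <- 10                        # assumed population standard deviation
n     <- c(25, 100, 400, 1600)     # sample sizes to compare

se <- sigma / sqrt(n)              # SE = sigma / sqrt(n)
data.frame(n = n, se = se)         # quadrupling n halves the SE

# In practice sigma is unknown, so it is replaced by the sample
# standard deviation computed from the data.
set.seed(1)                        # arbitrary seed
x      <- rnorm(100, mean = 50, sd = sigma)   # hypothetical sample
se_hat <- sd(x) / sqrt(length(x))
se_hat
```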
Confidence Interval
A confidence interval provides a range of values within which the
true population mean is likely to fall, with a certain level of
confidence. The formula for a confidence interval (for large sample
sizes) is: \[ \bar{x} \pm z \times SE \]
Where:
- \(\bar{x}\) is the sample mean.
- \(z\) is the z-value corresponding
to the desired confidence level (e.g., 1.96 for a 95% confidence
level).
- \(SE\) is the standard error.
The width of the confidence interval provides insight into the
precision of the estimate. A narrower confidence interval indicates a
more precise estimate of the population mean. Conversely, a wider
interval suggests more uncertainty.
To return to the earlier example, consider hourly earnings of
college graduates. If we were to draw a sample and calculate the sample
mean, the standard error would give us an idea of how much this sample
mean might vary from the true average earnings of all college graduates.
Using this standard error, we can then construct a confidence interval
to estimate a range within which the true average earnings likely
fall.
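The construction just described can be sketched in R. The earnings data below are simulated and purely hypothetical (the mean of 30, standard deviation of 8, and sample size of 200 are assumptions for illustration), and the population standard deviation is estimated from the sample.

```r
set.seed(42)                                  # arbitrary seed
earnings <- rnorm(200, mean = 30, sd = 8)     # hypothetical hourly earnings

n     <- length(earnings)
x_bar <- mean(earnings)                       # sample mean
se    <- sd(earnings) / sqrt(n)               # estimated standard error
z     <- qnorm(0.975)                         # about 1.96 for 95% confidence

# Large-sample 95% confidence interval: x_bar +/- z * SE
ci_z <- c(lower = x_bar - z * se, upper = x_bar + z * se)
ci_z

# t.test() reports a t-based interval, which is slightly wider
# because it uses a t quantile instead of 1.96
ci_t <- t.test(earnings)$conf.int
ci_t
```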
In summary, while the population mean provides an overall average,
the standard error and confidence intervals offer insights into the
reliability and precision of estimates derived from sample data.
Prompt
Create a couple real-world, economics related examples of creating
hypothesis tests concerning the population mean.
Example 1: Average Household Income
Background: A government agency believes that the
average annual household income in a particular city is $50,000. A local
economist thinks that this figure is outdated and that the average
income has increased due to recent economic growth in the region.
Hypothesis Test:
Null Hypothesis (H0): The average annual
household income \(\mu\) is $50,000. (
H0: \(\mu\) = $50,000 )
Alternative Hypothesis (H1): The average annual
household income \(\mu\) is greater
than $50,000. ( H1: \(\mu\) > $50,000 )
Procedure:
- Collect a random sample of annual incomes from households in the
city.
- Calculate the sample mean and standard error.
- Conduct a one-tailed t-test to determine if the sample mean is
significantly greater than $50,000.
- If the p-value is less than a significance level (e.g., 0.05),
reject the null hypothesis.
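The procedure above could be carried out in R roughly as follows. The income data are simulated and hypothetical (a log-normal distribution is assumed only to mimic the right skew typical of incomes), so the result says nothing about any real city.

```r
set.seed(2023)                                # arbitrary seed
# Hypothetical random sample of 100 annual household incomes
incomes <- rlnorm(100, meanlog = log(50000), sdlog = 0.4)

mean(incomes)                                 # sample mean
sd(incomes) / sqrt(length(incomes))           # estimated standard error

# One-tailed t-test of H0: mu = 50000 against H1: mu > 50000
result <- t.test(incomes, mu = 50000, alternative = "greater")
result

# Decision at the 5% significance level
result$p.value < 0.05
```

The `alternative = "greater"` argument is what makes the test one-tailed; the default is a two-sided test.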
Example 2: Unemployment Rate
Background: A recent report suggests that the
national unemployment rate is 6%. An economist believes that in a
specific state, due to recent layoffs in major industries, the
unemployment rate is higher than the national average.
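Hypothesis Test:
Null Hypothesis (H0): The state's unemployment rate \(p\) is
6%. ( H0: \(p\) = 0.06 )
Alternative Hypothesis (H1): The state's unemployment rate \(p\) is
greater than 6%. ( H1: \(p\) > 0.06 )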
Procedure:
- Collect a random sample of employment statuses from residents of the
state.
- Calculate the sample unemployment rate.
- Conduct a one-tailed test to determine if the sample unemployment rate is
significantly greater than 6%. Because employment status is a yes/no
outcome, a z-test for a proportion is the usual choice when the sample is large.
- If the p-value is less than a significance level (e.g., 0.05),
reject the null hypothesis.
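Because the unemployment rate is a proportion, the test can be sketched as a one-sample z-test for a proportion. The survey counts below are hypothetical numbers chosen for illustration.

```r
# Hypothetical survey results (assumed numbers, not from any report)
n_surveyed   <- 500    # labor-force participants sampled in the state
n_unemployed <- 38     # number of them who are unemployed
p_0          <- 0.06   # national unemployment rate under H0

p_hat <- n_unemployed / n_surveyed            # sample unemployment rate

# Under H0 the standard error uses the hypothesized proportion p_0
se_0   <- sqrt(p_0 * (1 - p_0) / n_surveyed)
z_stat <- (p_hat - p_0) / se_0

# One-tailed p-value for H1: p > 0.06
p_value <- pnorm(z_stat, lower.tail = FALSE)

z_stat
p_value
p_value < 0.05         # TRUE would mean rejecting H0 at the 5% level

# Base R's prop.test() gives the same p-value when the continuity
# correction is switched off
prop_result <- prop.test(n_unemployed, n_surveyed, p = p_0,
                         alternative = "greater", correct = FALSE)
prop_result$p.value
```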
In both examples, the hypothesis tests help in making informed
decisions based on sample data and provide insights into the economic
conditions of the region or state in question.
Prompt
Create some example problems where students must calculate the
z-statistic and conduct a hypothesis test from data.
Example 1: Exam Scores
Background: The national average score for a
standardized exam is known to be 500 with a standard deviation of 100. A
school believes that its students, on average, score above the national
average. To test this claim, they randomly select 36 students and find
that their average score is 520.
Problem:
- State the null and alternative hypotheses.
- Calculate the z-statistic.
- Using a significance level of 0.05, conduct a hypothesis test to
determine if the school’s claim is valid.
Solution:
- Null Hypothesis (H0): \(\mu = 500\)
- Alternative Hypothesis (H1): \(\mu > 500\)
\(z = \frac{\bar{x} -
\mu}{\frac{\sigma}{\sqrt{n}}}\)
\(z = \frac{520 -
500}{\frac{100}{\sqrt{36}}} = \frac{20}{16.67} \approx 1.2\)
For a one-tailed test at a significance level of 0.05, the
critical z-value is approximately 1.645. Since 1.2 < 1.645, we fail to
reject the null hypothesis. The sample does not provide sufficient
evidence that the school’s students score above the national average.
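The arithmetic in this solution can be checked directly in R. The sketch below uses only the numbers given in the problem; the variable names are illustrative.

```r
x_bar <- 520     # sample mean of the 36 students
mu_0  <- 500     # national average under H0
sigma <- 100     # known population standard deviation
n     <- 36      # sample size

se     <- sigma / sqrt(n)          # 100 / 6, about 16.67
z_stat <- (x_bar - mu_0) / se      # z-statistic

z_crit  <- qnorm(0.95)             # one-tailed critical value, about 1.645
p_value <- pnorm(z_stat, lower.tail = FALSE)

z_stat
z_crit
p_value

# Reject H0 only if the statistic exceeds the critical value
reject <- z_stat > z_crit
reject
```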
Example 2: Soda Can Filling
Background: A soda company claims that their machine
fills cans with an average of 12 ounces of soda. A quality control team
suspects that the machine is not filling to that level. They randomly
select 49 cans and find that the average fill is 11.95 ounces with a
standard deviation of 0.3 ounces.
Problem:
- State the null and alternative hypotheses.
- Calculate the z-statistic.
- Using a significance level of 0.05, conduct a hypothesis test to
determine if the machine is filling cans with less than 12 ounces of
soda.
Solution:
- Null Hypothesis (H0): \(\mu = 12\)
- Alternative Hypothesis (H1): \(\mu < 12\)
\(z = \frac{\bar{x} -
\mu}{\frac{s}{\sqrt{n}}}\) (the sample standard deviation \(s\) stands in
for the unknown \(\sigma\), which is reasonable for a sample of this size)
\(z = \frac{11.95 -
12}{\frac{0.3}{\sqrt{49}}} = \frac{-0.05}{0.0429} \approx
-1.167\)
For a one-tailed test at a significance level of 0.05, the
critical z-value is approximately -1.645. Since -1.167 > -1.645, we
fail to reject the null hypothesis. There is not sufficient evidence
that the machine fills cans with less than 12 ounces.
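This calculation can also be reproduced in R from the summary statistics in the problem. Since the standard deviation here comes from the sample, the sketch additionally shows the t-based p-value for comparison; the variable names are illustrative.

```r
x_bar <- 11.95   # sample mean fill (ounces)
mu_0  <- 12      # claimed mean fill under H0
s     <- 0.3     # sample standard deviation
n     <- 49      # number of cans sampled

se     <- s / sqrt(n)              # 0.3 / 7
z_stat <- (x_bar - mu_0) / se

# Lower-tail test, H1: mu < 12
z_crit <- qnorm(0.05)              # about -1.645
p_z    <- pnorm(z_stat)            # normal approximation
p_t    <- pt(z_stat, df = n - 1)   # t distribution with 48 degrees of freedom

z_stat
p_z
p_t

# Reject H0 only if the statistic falls below the critical value
reject <- z_stat < z_crit
reject
```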
Prompt
Go into detail about the difference between one-sample, two-sample,
and paired t-tests.
1. One-sample t-test:
Purpose: Used to test if the mean of a single sample
is significantly different from a known or hypothesized population
mean.
Hypotheses:
- Null Hypothesis (H0): The population mean is equal to
the hypothesized value. \(H0: \mu =
\mu_0\)
- Alternative Hypothesis (H1): The population mean is not
equal to the hypothesized value. \(H1: \mu \neq
\mu_0\)
Example: Suppose a manufacturer claims that a type
of battery lasts 100 hours on average. You take a sample of batteries,
test them, and want to see if the average lifespan differs from the
claimed 100 hours.
2. Two-sample t-test (Independent t-test):
Purpose: Used to compare the means of two
independent samples to see if they come from populations with the same
mean.
Hypotheses:
- Null Hypothesis (H0): The means of the two
populations are equal. \(H0: \mu_1 =
\mu_2\)
- Alternative Hypothesis (H1): The means of the two
populations are not equal. \(H1: \mu_1 \neq
\mu_2\)
Example: Imagine you want to test if there’s a
difference in average test scores between students taught by two
different teachers. You would collect scores from a sample of students
from each teacher’s class and then compare the means using the
two-sample t-test.
3. Paired t-test (Dependent t-test):
Purpose: Used to compare the means of the same group
at two different times or under two different conditions. The
observations are paired, meaning each observation in one sample
corresponds to an observation in the other sample.
Hypotheses:
- Null Hypothesis (H0): The mean difference between
paired observations is zero. \(H0: \mu_d =
0\)
- Alternative Hypothesis (H1): The mean difference
between paired observations is not zero. \(H1:
\mu_d \neq 0\)
Example: Consider a clinical trial where you want to
test the effectiveness of a new drug. Patients’ health metrics are
measured before taking the drug and after completing the treatment.
Since the measurements before and after are from the same individuals,
the data is paired.
Prompt
Create some examples of using R to conduct hypothesis testing of
means.
Example 1: One-sample t-test
Suppose we have a sample of exam scores from 10 students, and we want
to test if the average score is different from 75.
# Sample data
scores <- c(78, 82, 74, 69, 77, 83, 72, 76, 80, 71)
# One-sample t-test
t.test(scores, mu = 75)
One Sample t-test
data: scores
t = 0.80579, df = 9, p-value = 0.4411
alternative hypothesis: true mean is not equal to 75
95 percent confidence interval:
72.83115 79.56885
sample estimates:
mean of x
76.2
Example 2: Two-sample t-test
Imagine we have exam scores from two different classes, and we want
to test if there’s a significant difference in their means.
# Sample data
class1_scores <- c(78, 82, 74, 69, 77, 83, 72, 76, 80, 71)
class2_scores <- c(85, 88, 84, 86, 87, 89, 83, 84, 82, 81)
# Two-sample t-test
t.test(class1_scores, class2_scores)
Welch Two Sample t-test
data: class1_scores and class2_scores
t = -5.1137, df = 14.024, p-value = 0.0001568
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.348348 -5.051652
sample estimates:
mean of x mean of y
76.2 84.9
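The output header says "Welch" because `t.test()` does not assume equal variances by default. As a rough comparison, the sketch below reruns the same data with `var.equal = TRUE` to get the classical pooled-variance test.

```r
class1_scores <- c(78, 82, 74, 69, 77, 83, 72, 76, 80, 71)
class2_scores <- c(85, 88, 84, 86, 87, 89, 83, 84, 82, 81)

# Default: Welch test, which allows the two variances to differ
welch  <- t.test(class1_scores, class2_scores)
# Pooled test, which assumes both populations share one variance
pooled <- t.test(class1_scores, class2_scores, var.equal = TRUE)

welch$parameter     # Welch-adjusted degrees of freedom
pooled$parameter    # n1 + n2 - 2 degrees of freedom

# With equal group sizes the two t statistics coincide; only the
# degrees of freedom, and hence the p-values, differ.
welch$statistic
pooled$statistic
```

The Welch version is generally the safer default when there is no strong reason to believe the variances are equal.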
Example 3: Paired t-test
Suppose we measure the blood pressure of 5 patients before and after
a treatment, and we want to test if the treatment had a significant
effect.
# Sample data
before_treatment <- c(150, 165, 155, 142, 138)
after_treatment <- c(145, 160, 152, 140, 135)
# Paired t-test
t.test(before_treatment, after_treatment, paired = TRUE)
Paired t-test
data: before_treatment and after_treatment
t = 6, df = 4, p-value = 0.003883
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
1.934133 5.265867
sample estimates:
mean difference
3.6
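A paired t-test is equivalent to a one-sample t-test on the within-pair differences. The sketch below checks this on the same blood pressure data and recomputes the t statistic by hand.

```r
before_treatment <- c(150, 165, 155, 142, 138)
after_treatment  <- c(145, 160, 152, 140, 135)

# Within-patient differences (before minus after)
differences <- before_treatment - after_treatment

paired_result <- t.test(before_treatment, after_treatment, paired = TRUE)
diff_result   <- t.test(differences, mu = 0)

# Manual t statistic: mean difference divided by its standard error
t_manual <- mean(differences) /
  (sd(differences) / sqrt(length(differences)))

paired_result$statistic
diff_result$statistic
t_manual
```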
Example 4: One-sample z-test (using external package)
For large samples, we might prefer a z-test. Suppose we have a large
sample of weights from a factory, and we want to test if the average
weight is different from 50 grams. (Note: R doesn’t have a built-in
function for the one-sample z-test, so we’ll use the z.test
function from the BSDA package.)
# Install and load the BSDA package
install.packages("BSDA")
library(BSDA)
# Sample data (assuming a large sample)
# No seed is set, so the numbers in the output below will differ from run to run
weights <- rnorm(1000, mean = 50.5, sd = 5) # Generating random data around 50.5 with sd = 5
# One-sample z-test
z.test(weights, mu = 50, sigma.x = sd(weights))
One-sample z-Test
data: weights
z = 3.6077, p-value = 0.0003089
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
50.25829 50.87274
sample estimates:
mean of x
50.56552
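If installing an extra package is not desirable, the same z-test can be computed in base R. The helper `z_test_manual` below is a hypothetical function written for this handout, not part of any package; it is a minimal sketch that defaults to the sample standard deviation in place of the population value, and the seed is arbitrary.

```r
# Hypothetical helper: two-sided one-sample z-test in base R
z_test_manual <- function(x, mu, sigma = sd(x), conf_level = 0.95) {
  n     <- length(x)
  se    <- sigma / sqrt(n)                      # standard error of the mean
  z     <- (mean(x) - mu) / se                  # z-statistic
  p     <- 2 * pnorm(-abs(z))                   # two-sided p-value
  z_mul <- qnorm(1 - (1 - conf_level) / 2)      # e.g. about 1.96 for 95%
  ci    <- mean(x) + c(-1, 1) * z_mul * se      # confidence interval
  list(z = z, p_value = p, conf_int = ci, estimate = mean(x))
}

set.seed(100)                                   # arbitrary seed
weights <- rnorm(1000, mean = 50.5, sd = 5)     # simulated weights

res <- z_test_manual(weights, mu = 50)
res$z
res$p_value
res$conf_int
```

With the same data, these values should agree with `z.test(weights, mu = 50, sigma.x = sd(weights))` from BSDA.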
