Welcome to my mathematical statistics report.

In the first section we compute some descriptive statistics and look at a boxplot and a histogram to get to know the data we will be working with. The second section focuses on point and interval estimation. The third section gives a short summary.

1. Descriptive statistics

This is an R Markdown document describing survey data collected by questionnaire. What is our data?

It is cross-section wage data: a random sample taken from the U.S. Current Population Survey for the year 1976, stored in a data frame with 526 rows (observations) and 24 columns.

The visualization of the dataset:

1.1

Boxplot and violin plot comparing married and unmarried respondents:

1.2

Histograms of wage, with the mean (blue) and the density (red area under the curve):

2. Point & Interval estimation

A point estimate is a single value estimate of a parameter. For instance, a sample mean is a point estimate of a population mean.

An interval estimate gives a range of values where the parameter is expected to lie. A confidence interval is the most common type of interval estimate.

2.1 Point estimate of population mean

2.1.1

Let’s calculate the sample size, mean and standard deviation:
n <- sum(!is.na(wage1$wage))            # number of non-missing wages
xbar <- mean(wage1$wage, na.rm = TRUE)  # sample mean
s <- sd(wage1$wage, na.rm = TRUE)       # sample standard deviation
c("Number of wages:"=n, "Mean:"=xbar, "Standard deviation:"=s)
##    Number of wages:               Mean: Standard deviation: 
##          526.000000            5.896103            3.693086

2.1.2

Now we will calculate the margin of error and the lower and upper bounds of the 95% confidence interval:
margin <- qt(0.975, df=n-1) * s / sqrt(n)
low <- xbar - margin
high <- xbar + margin
c("From:" = low, "To:" = high)
##    From:      To: 
## 5.579768 6.212437
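The bounds above follow from the usual t-based interval for a mean. Written out with the values printed in 2.1.1 (the critical value 1.9645 is back-calculated from the printed bounds, so treat it as rounded):

$$
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
= 5.896103 \pm 1.9645 \times \frac{3.693086}{\sqrt{526}}
\approx 5.896103 \pm 0.3163
$$

This gives an interval of roughly (5.580, 6.212), in line with the output.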

2.2 Interval estimation

2.2.1

Let’s calculate the standard error of the mean.
# Standard error of the mean (whole sample)
s/sqrt(n)
## [1] 0.1610262

2.2.2

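Let’s now create empty vectors to hold the mean and the standard deviation used for each of 55 random samples of size 44. Note that the loop stores the full-sample standard deviation s for every sample, so it is treated as a known value and all 55 intervals have the same width.
samp_mean <- rep(NA, 55)
samp_sd <- rep(NA, 55)
samp_n <- 44
for(i in 1:55) {
  samp <- sample(wage1$wage, samp_n)  # random draw without replacement
  samp_mean[i] <- mean(samp)
  samp_sd[i] <- s  # full-sample sd from 2.1.1, not sd(samp)
}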

2.2.3

Let’s get the lower and upper bounds of those 55 confidence intervals and look at the first one. The samples are random and no seed is set, so the values below change from run to run:
lower_ie <- samp_mean - 1.96 * samp_sd / sqrt(samp_n)
upper_ie <- samp_mean + 1.96 * samp_sd / sqrt(samp_n)
c("Lower bound:" = lower_ie[1], "Upper bound:" = upper_ie[1])
## Lower bound: Upper bound: 
##     4.561490     6.743964
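The intervals above use the full-sample standard deviation and the normal quantile 1.96. As a point of comparison, here is a minimal sketch of the variant in which each sample supplies its own standard deviation and the t quantile replaces 1.96. To keep it self-contained it runs on a simulated, hypothetical `wage` vector (a log-normal stand-in, not the CPS data). The seed and the names `n_samples`, `half_width` and `covered` are assumptions introduced for illustration.

```r
# Sketch only: simulated stand-in for wage1$wage (hypothetical data, not the CPS sample)
set.seed(1)                                    # fix the RNG so the draws are repeatable
wage <- rlnorm(526, meanlog = 1.6, sdlog = 0.5)

n_samples <- 55
samp_n    <- 44
samp_mean <- numeric(n_samples)
samp_sd   <- numeric(n_samples)

for (i in seq_len(n_samples)) {
  samp         <- sample(wage, samp_n)         # draw without replacement
  samp_mean[i] <- mean(samp)
  samp_sd[i]   <- sd(samp)                     # per-sample standard deviation
}

# t-based half-width, because sigma is estimated separately from each sample
half_width <- qt(0.975, df = samp_n - 1) * samp_sd / sqrt(samp_n)
lower <- samp_mean - half_width
upper <- samp_mean + half_width

# Share of the 55 intervals that contain the mean of the full simulated vector
covered <- lower <= mean(wage) & mean(wage) <= upper
mean(covered)
```

With the real data, the simulated vector would be replaced by `wage1$wage`. The last line reports the fraction of intervals that contain the full-data mean. It is expected to land in the neighbourhood of 0.95, although with only 55 intervals it will vary from seed to seed.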

2.2.4

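Finally we can plot the 55 confidence intervals for the mean:
# plotCI() comes from an add-on package such as gplots, which must be loaded first
plotCI(1:55,
    samp_mean,
    uiw = qnorm(0.975) * samp_sd / sqrt(samp_n),  # half-width of each interval
    pt.bg=par("bg"),
    pch=21,
    xlab = "Sample number (1 to 55)",
    ylab = "Mean wage (samples of size 44)",
    main = "Confidence intervals for the mean")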