DASS 2016-17 Homework 1

Problem 1

Consider the following table, which shows the results of a race. Each row gives the result of one athlete.

country	place	time	year_of_birth	main_sponsor
USA	2	10.5	1997	1
USA	3	11.5	1995	2
UK	1	8.3	1996	1
China	4	13.2	1997	3

Here the main_sponsor column contains the id of the company that sponsors the athlete. The actual names of these companies are unknown to the researcher.

To which scales (nominal, ordinal, interval, ratio) do the variables of this data frame belong?
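
Before classifying the variables, it may help to enter the table into R and look at how each column is stored. The following is only a sketch; the name `race` is an arbitrary choice. Keep in mind that the storage type R reports (character, numeric) does not by itself determine the measurement scale.

```r
# Enter the table from the problem as a data frame.
race <- data.frame(
  country       = c("USA", "USA", "UK", "China"),
  place         = c(2, 3, 1, 4),
  time          = c(10.5, 11.5, 8.3, 13.2),
  year_of_birth = c(1997, 1995, 1996, 1997),
  main_sponsor  = c(1, 2, 1, 3)
)

# Show the storage type and first values of every column.
str(race)
```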

Problem 2

Given the set of numbers {1, 5, 122, 12.3, 9, 103, 15}, use R to calculate the following descriptive statistics: mean, median, variance, standard deviation.

Provide your answers and the code you used.
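
As a reminder of the relevant R functions, here is a minimal sketch on a different, made-up vector (the name `x` and its values are assumptions for illustration, not the data of this problem). Note that `var()` and `sd()` compute the sample versions, with \(n - 1\) in the denominator.

```r
# Hypothetical data, not the numbers from the problem.
x <- c(2, 4, 4, 6, 9)

mean(x)    # arithmetic mean
median(x)  # middle value of the sorted data
var(x)     # sample variance (denominator n - 1)
sd(x)      # sample standard deviation
```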

Problem 3

Choose all correct statements:

Standard deviation is the square root of variance.
Variance is the square root of standard deviation.
Variance is the square of standard deviation.
Standard deviation is the square of variance.

Problem 4

The following histogram describes a population (i.e. the numbers written on the balls in a basket). Let us draw one value from this population (i.e. pick a random ball from the basket and record the number written on it). Denote this value by \(X\) (it is in fact a random variable, and the histogram represents its distribution). Find the following probabilities:

\(P(X\ge 0)\) (i.e. the probability that \(X\) is greater than or equal to 0)
\(P(X>3)\) (i.e. the probability that \(X\) is greater than 3)
\(P(X<3.5)\)
\(P(X=3)\)
\(P(X\ge 4)\)
\(P(X>9)\)
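
To check your reading of a histogram, recall that when every ball is equally likely to be picked, the probability of an event equals the proportion of balls that satisfy it. A minimal sketch in R, assuming a small hypothetical basket (the vector `basket` is made up and is not the population from the figure):

```r
# Hypothetical population: the numbers written on ten balls.
basket <- c(1, 1, 2, 3, 3, 3, 4, 5, 5, 7)

# A comparison gives TRUE/FALSE for each ball;
# mean() of a logical vector is the proportion of TRUE values.
p_ge_3 <- mean(basket >= 3)  # P(X >= 3)
p_gt_3 <- mean(basket > 3)   # P(X > 3)
p_eq_3 <- mean(basket == 3)  # P(X = 3)

print(c(p_ge_3, p_gt_3, p_eq_3))
```

In this example the first probability is the sum of the other two, because \(X \ge 3\) means that either \(X > 3\) or \(X = 3\).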

Problem 5

The following histogram is similar to the one from the previous problem but corresponds to a different population. Let us draw one number from this population and denote it by \(Y\).

Find the probability \(P(Y > 2)\).
Estimate the median of the numbers in the basket.

Explain your answers.

Problem 6

Some population is fixed. We draw many random samples from this population. (Every sample is drawn "with replacement": we return each ball to the basket before choosing the next one.) The size of each sample is \(k\). Then we calculate the mean of each sample and plot the histogram of the means. After that we repeat the same procedure for a different \(k\) (with the same population) and draw another histogram, and so on.

Doing so four times, we obtain the following four figures. Each row corresponds to one histogram and one value of \(k\); the left and right images in a row differ only in the scale of the horizontal axis.

It is known that the values of \(k\) were selected from the set \(\{25, 100, 400, 900\}\).

For every figure, find the corresponding \(k\). Explain your answers.
Estimate the mean of the population (in other terms, the expected value of our random variable).
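
The sampling procedure described in this problem can be reproduced by simulation. The sketch below is only an illustration: the population (`population`), the number of samples (`n_samples`) and the choice \(k = 25\) are assumptions and do not come from the figures.

```r
set.seed(1)  # make the simulation reproducible

population <- c(0, 1, 1, 2, 5, 10)  # hypothetical basket
k <- 25                             # size of each sample
n_samples <- 1000                   # number of samples to draw

# Draw n_samples samples of size k with replacement
# and record the mean of each sample.
sample_means <- replicate(
  n_samples,
  mean(sample(population, size = k, replace = TRUE))
)

# Plot the histogram of the sample means.
hist(sample_means,
     main = paste("Sample means, k =", k),
     xlab = "sample mean")
```

Rerunning the code with another value of `k` and the same population gives the next histogram in the series.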