# If you want to work together, you can, just make sure you all understand it
# In task 4 the code should be sample(2*31,1), otherwise your computer will not finish calculating in your lifetime!
#
# Task 2:
# Use it again to produce 2x 10, 1000, and 100000 values (mean = 50, sd = 20).
# Use the functions mean(), median(), sd() on the vector and look at the result. Are
# they what you expect them to be?
# Make a histogram again each time.
#
# Mean and standard deviation
n_mean <- 50
n_stdev <- 100  # note: the task text asks for sd = 20; this run uses sd = 100
# 10 random observations
n_observation <- 10
vec_norm_1 <- rnorm(n = n_observation, mean = n_mean, sd = n_stdev)
vec_norm_2 <- rnorm(n_observation, n_mean, n_stdev)
# mean, median, sd, hist #1
mean(vec_norm_1)
[1] 1.58238
median(vec_norm_1)
[1] -35.1225
sd(vec_norm_1)
[1] 99.06874
hist(vec_norm_1)

# mean, median, sd, hist #2
mean(vec_norm_2)
[1] 42.09803
median(vec_norm_2)
[1] 29.23854
sd(vec_norm_2)
[1] 103.8325
hist(vec_norm_2)

Task 2: Answer:
Ten observations drawn with rnorm (mean = 50, sd = 100). With only 10
observations, the histogram does not show the symmetric shape of a
normal distribution.
# 1000 random observations
n_observation <- 1000
vec_norm_1 <- rnorm(n_observation, n_mean, n_stdev)
vec_norm_2 <- rnorm(n_observation, n_mean, n_stdev)
# mean, median, sd, hist #1
mean(vec_norm_1)
[1] 52.67748
median(vec_norm_1)
[1] 48.65339
sd(vec_norm_1)
[1] 100.0692
hist(vec_norm_1)

# mean, median, sd, hist #2
mean(vec_norm_2)
[1] 42.3473
median(vec_norm_2)
[1] 44.35544
sd(vec_norm_2)
[1] 98.47205
hist(vec_norm_2)

Task 2: Answer:
One thousand observations drawn with rnorm (mean = 50, sd = 100).
Increasing the sample to one thousand observations produced a histogram
that closely resembles a normal distribution.
# 100,000 observations
n_observation <- 100 * 1000
vec_norm_1 <- rnorm(n_observation, n_mean, n_stdev)
vec_norm_2 <- rnorm(n_observation, n_mean, n_stdev)
# mean, median, sd, hist #1
mean(vec_norm_1)
[1] 50.29052
median(vec_norm_1)
[1] 50.02448
sd(vec_norm_1)
[1] 99.88641
hist(vec_norm_1)

# mean, median, sd, hist #2
mean(vec_norm_2)
[1] 50.41016
median(vec_norm_2)
[1] 50.2367
sd(vec_norm_2)
[1] 100.1026
hist(vec_norm_2)

Task 2: Answer:
One hundred thousand observations drawn with rnorm (mean = 50, sd =
100). Increasing the sample to one hundred thousand observations
produced a symmetric histogram of a normal distribution.
#Task 3:
#Research the function sample(). It can get a vector to sample (draw from). What does
#replace = TRUE do?
# Use it on a vector 1:100 with replace = TRUE, drawing 10, 100, 1000 elements from.
#Use a histogram to determine what distribution underlies the sampling. What is the
#difference to the function you have used before?
#
# Vector of numbers from 1 to 100
vec_nums <- c(1:100)
# 10 samples
n_sample <- 10
vec_sample <- sample(vec_nums, n_sample, replace = TRUE)
hist(vec_sample)

# 100 samples
n_sample <- 100
vec_sample <- sample(vec_nums, n_sample, replace = TRUE)
hist(vec_sample)

# 1000 samples
n_sample <- 1000
vec_sample <- sample(vec_nums, n_sample, replace = TRUE)
hist(vec_sample)

Task 3: Answer:
replace = TRUE means an element can be drawn again after it has been
picked. With replace = FALSE, an element that has been drawn cannot be
drawn again within the same call to sample(). As we increased the number
of draws from ten to a thousand, the histogram became more and more
similar to a uniform distribution.
With the rnorm function, values follow a normal distribution, so values
close to the mean are more likely than values far from it. With the
sample function, every element of the vector is equally likely to be
drawn, so the result follows a (discrete) uniform distribution.
# Task 4:
# Use the command sample(2*31) to create your own seed and give it out.
#Set this seed with set.seed().
#Use runif(10) and look at the values. Do runif(10) again. Are they same?
# Use set.seed() again with the same seed and do runif(10) again.
#Do you understand the idea behind seeds now?
#
# Single seed value (sample(n, 1) draws one number; sample(n) alone returns a whole permutation)
v_seed <- sample(2*31, 1)
set.seed(v_seed)
# First runif(10)
v_unif_first <- runif(10)
v_unif_first
[1] 0.09878282 0.48823179 0.36403673 0.42061913 0.30096439 0.14763513 0.89857491 0.22355651 0.96596330 0.14106704
# Second runif(10)
v_unif_second <- runif(10)
v_unif_second
[1] 0.06535501 0.39725471 0.54399676 0.87963113 0.22338939 0.92094417 0.25146172 0.83046382 0.60815737 0.43135087
# Set seed again
set.seed(v_seed)
# Final runif(10)
v_unif_final <- runif(10)
v_unif_final
[1] 0.09878282 0.48823179 0.36403673 0.42061913 0.30096439 0.14763513 0.89857491 0.22355651 0.96596330 0.14106704
Task 4: Answer:
Setting the seed lets us reproduce the same sequence of random numbers
by setting the same seed value again. Without resetting the seed, each
call to runif(10) returns different values. In both cases the values are
drawn from a uniform distribution (between 0 and 1 by default), so every
value in that range is equally likely.
# Task 5:
# Research the function pnorm(). Can you summarize in your own words what it does?
#
# pnorm function
# This function returns the value of the cumulative distribution function (CDF) of the normal distribution at a given value q, for a population mean μ and a population standard deviation σ.
# Example 1
# Find the percentage of males taller than 78 inches in a population with mean = 74 and sd = 2.
pnorm(78, mean = 74, sd = 2, lower.tail = FALSE)
[1] 0.02275013
# Example 2
# Find the percentage of otters that weigh less than 33 lbs in a population with mean = 40 and sd = 8.
pnorm(33, mean=40, sd = 8)
[1] 0.190787
Task 5: Answer:
The pnorm function gives the proportion of a normal distribution that
lies below a given value, provided we know the mean and standard
deviation of the population. With lower.tail = FALSE it gives the
proportion above that value instead.
---
title: "W02 Tasks"
output: html_notebook
---


```{r message=FALSE, warning=FALSE}
# If you want to work together, you can, just make sure you all understand it
# In task 4 the code should be sample(2*31,1), otherwise your computer will not finish calculating in your lifetime!
#
# Task 2:
# Use it again to produce 2x 10, 1000, and 100000 values (mean = 50, sd = 20).
# Use the functions mean(), median(), sd() on the vector and look at the result. Are
# they what you expect them to be?
#   Make a histogram again each time.
#
# Mean and standard deviation
n_mean  <- 50
n_stdev <- 100  # note: the task text asks for sd = 20; this run uses sd = 100
```

```{r message=FALSE, warning=FALSE}
# 10 random observations
n_observation <- 10
vec_norm_1 <- rnorm(n = n_observation, mean = n_mean, sd = n_stdev)
vec_norm_2 <- rnorm(n_observation, n_mean, n_stdev)
# mean, median, sd, hist #1
mean(vec_norm_1)
median(vec_norm_1)
sd(vec_norm_1)
hist(vec_norm_1)
# mean, median, sd, hist #2
mean(vec_norm_2)
median(vec_norm_2)
sd(vec_norm_2)
hist(vec_norm_2)
```

::: {.infobox data-latex="caution"}
**Task 2: Answer:**

Ten observations drawn with rnorm (mean = 50, sd = 100). 
With only 10 observations, the histogram does not show the symmetric shape of a normal distribution.
:::

```{r message=FALSE, warning=FALSE}
# 1000 random observations
n_observation <- 1000
vec_norm_1 <- rnorm(n_observation, n_mean, n_stdev)
vec_norm_2 <- rnorm(n_observation, n_mean, n_stdev)
# mean, median, sd, hist #1
mean(vec_norm_1)
median(vec_norm_1)
sd(vec_norm_1)
hist(vec_norm_1)
# mean, median, sd, hist #2
mean(vec_norm_2)
median(vec_norm_2)
sd(vec_norm_2)
hist(vec_norm_2)
```
::: {.infobox data-latex="caution"}
**Task 2: Answer:**

One thousand observations drawn with rnorm (mean = 50, sd = 100). 
Increasing the sample to one thousand observations produced a histogram that closely resembles a normal distribution.
:::
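
To judge "similar to a normal distribution" more directly, the theoretical density can be drawn over the histogram. The following is a minimal sketch, not part of the original task: the name `vec_demo`, the seed, and the `breaks = 30` setting are assumptions chosen only for illustration.

```r
# Sketch: compare a histogram of 1000 draws with the theoretical normal density.
set.seed(1)  # arbitrary seed, used only so the sketch is reproducible
vec_demo <- rnorm(1000, mean = 50, sd = 100)

# freq = FALSE puts the histogram on the density scale,
# so it is comparable to the dnorm() curve.
hist(vec_demo, freq = FALSE, breaks = 30,
     main = "1000 draws vs. normal density")

# curve() evaluates the expression over the x-range of the current plot.
curve(dnorm(x, mean = 50, sd = 100), add = TRUE, lwd = 2)
```

With 1000 draws the bars should follow the curve reasonably well; with 10 draws they usually do not.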

```{r message=FALSE, warning=FALSE}
# 100,000 observations
n_observation <- 100 * 1000
vec_norm_1 <- rnorm(n_observation, n_mean, n_stdev)
vec_norm_2 <- rnorm(n_observation, n_mean, n_stdev)
# mean, median, sd, hist #1
mean(vec_norm_1)
median(vec_norm_1)
sd(vec_norm_1)
hist(vec_norm_1)
# mean, median, sd, hist #2
mean(vec_norm_2)
median(vec_norm_2)
sd(vec_norm_2)
hist(vec_norm_2)
```
::: {.infobox data-latex="caution"}
**Task 2: Answer:**

One hundred thousand observations drawn with rnorm (mean = 50, sd = 100). 
Increasing the sample to one hundred thousand observations produced a symmetric histogram of a normal distribution.
:::
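
One way to see why the sample statistics settle down as the sample grows is the standard error of the mean, sd / sqrt(n). For sd = 100 this is about 31.6 for n = 10, 3.16 for n = 1000, and 0.316 for n = 100000. The sketch below is an illustration under assumptions: the seed and the variable names `sample_sizes`, `std_error`, and `observed_error` are not part of the original notebook.

```r
# Sketch: the sample mean gets closer to the population mean as n grows.
set.seed(42)  # arbitrary seed, used only so the sketch is reproducible
n_mean  <- 50
n_stdev <- 100
sample_sizes <- c(10, 1000, 100000)

# Theoretical standard error of the mean: sd / sqrt(n)
std_error <- n_stdev / sqrt(sample_sizes)

# Observed absolute error of one sample mean for each sample size
observed_error <- sapply(sample_sizes, function(n) {
  abs(mean(rnorm(n, mean = n_mean, sd = n_stdev)) - n_mean)
})

data.frame(n = sample_sizes,
           std_error = std_error,
           observed_error = observed_error)
```

The observed errors vary from run to run, but they are typically of the same order as the theoretical standard error in each row.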

```{r message=FALSE, warning=FALSE}
#Task 3:
#Research the function sample(). It can get a vector to sample (draw from). What does
#replace = TRUE do?
#  Use it on a vector 1:100 with replace = TRUE, drawing 10, 100, 1000 elements from.
#Use a histogram to determine what distribution underlies the sampling. What is the
#difference to the function you have used before?
#
# Vector of numbers from 1 to 100
vec_nums <- c(1:100)
# 10 samples
n_sample <- 10
vec_sample <- sample(vec_nums, n_sample, replace = TRUE)
hist(vec_sample)
# 100 samples
n_sample <- 100
vec_sample <- sample(vec_nums, n_sample, replace = TRUE)
hist(vec_sample)
# 1000 samples
n_sample <- 1000
vec_sample <- sample(vec_nums, n_sample, replace = TRUE)
hist(vec_sample)
```
::: {.infobox data-latex="caution"}
**Task 3: Answer:**

`replace = TRUE` means an element can be drawn again after it has been picked. With `replace = FALSE`, an element that has been drawn cannot be drawn again within the same call to `sample()`. As we increased the number of draws from ten to a thousand, the histogram became more and more similar to a uniform distribution.

With the rnorm function, values follow a normal distribution, so values close to the mean are more likely than values far from it. With the sample function, every element of the vector is equally likely to be drawn, so the result follows a (discrete) uniform distribution.
:::
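
The effect of `replace` can also be checked directly rather than read off a histogram. This is a small sketch under assumptions: the seed and the names `draw_no_replace`, `draw_replace`, and `result` are introduced here only for illustration.

```r
# Sketch: what replace = TRUE / FALSE changes in sample().
set.seed(7)  # arbitrary seed, used only so the sketch is reproducible
vec_nums <- 1:100

# Without replacement, drawing all 100 elements gives a permutation:
# every value appears exactly once.
draw_no_replace <- sample(vec_nums, 100, replace = FALSE)
anyDuplicated(draw_no_replace) == 0

# With replacement, we may draw more values than the vector holds,
# so repeated values are unavoidable here.
draw_replace <- sample(vec_nums, 1000, replace = TRUE)
anyDuplicated(draw_replace) > 0

# Without replacement, asking for more draws than elements is an error.
# tryCatch() captures R's own error message instead of stopping the script.
result <- tryCatch(
  sample(vec_nums, 1000, replace = FALSE),
  error = function(e) conditionMessage(e)
)
result
```

This is why the task needs `replace = TRUE` for the 1000-element draw from `1:100`.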

```{r message=FALSE, warning=FALSE}
#  Task 4:
#  Use the command sample(2*31) to create your own seed and give it out.
#Set this seed with set.seed().
#Use runif(10) and look at the values. Do runif(10) again. Are they same?
#  Use set.seed() again with the same seed and do runif(10) again.
#Do you understand the idea behind seeds now?
#
# Single seed value (sample(n, 1) draws one number; sample(n) alone returns a whole permutation)
v_seed <- sample(2*31, 1)
set.seed(v_seed)
# First runif(10)
v_unif_first  <- runif(10)
v_unif_first
# Second runif(10)
v_unif_second <- runif(10)
v_unif_second
# Set seed again
set.seed(v_seed)
# Final runif(10)
v_unif_final <- runif(10)
v_unif_final
```

::: {.infobox data-latex="caution"}
**Task 4: Answer:**

Setting the seed lets us reproduce the same sequence of random numbers by setting the same seed value again. Without resetting the seed, each call to `runif(10)` returns different values. In both cases the values are drawn from a uniform distribution (between 0 and 1 by default), so every value in that range is equally likely.
:::
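
Comparing the printed vectors by eye works, but `identical()` makes the check explicit. A minimal sketch, assuming a fixed hypothetical seed (`seed_value <- 2024`) in place of the randomly drawn one:

```r
# Sketch: the same seed replays the same sequence of random numbers.
seed_value <- 2024  # hypothetical seed; any single integer works

set.seed(seed_value)
first_draw  <- runif(10)
second_draw <- runif(10)  # generator state has advanced, so these differ

set.seed(seed_value)      # reset the generator to the same starting state
replayed_draw <- runif(10)

identical(first_draw, replayed_draw)  # TRUE: same seed, same sequence
identical(first_draw, second_draw)    # FALSE: no reset in between
```

Sharing the seed value is therefore enough for someone else to reproduce the same "random" numbers with the same code, at least under the same R version and generator settings.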

```{r message=FALSE, warning=FALSE}
#  Task 5:
#  Research the function pnorm(). Can you summarize in your own words what it does?
#  
# pnorm function
# This function returns the value of the cumulative distribution function (CDF) of the normal distribution at a given value q, for a population mean μ and a population standard deviation σ.

# The percentage of males taller than 78 inches in a population with mean = 74 and sd = 2
pnorm(78, mean = 74, sd = 2, lower.tail = FALSE)
# The percentage of otters that weigh less than 33 lbs in a population with mean = 40 and sd = 8.
pnorm(33, mean=40, sd = 8)
```

::: {.infobox data-latex="caution"}
**Task 5: Answer:**

The pnorm function gives the proportion of a normal distribution that lies below a given value, provided we know the mean and standard deviation of the population. With `lower.tail = FALSE` it gives the proportion above that value instead.
:::
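
A few identities may help make the behaviour of `pnorm()` concrete. The sketch below reuses the two examples from the chunk above; the variable names, the seed, and the simulation size are assumptions made for illustration.

```r
# Sketch: relationships between pnorm(), its upper tail, and qnorm().

# Upper tail equals one minus the lower tail.
upper_tail <- pnorm(78, mean = 74, sd = 2, lower.tail = FALSE)
all.equal(upper_tail, 1 - pnorm(78, mean = 74, sd = 2))

# Standardizing gives the same answer: z = (78 - 74) / 2 = 2.
all.equal(upper_tail, pnorm(2, lower.tail = FALSE))

# qnorm() is the inverse of pnorm(): it maps the proportion back to the value.
p_otter <- pnorm(33, mean = 40, sd = 8)
qnorm(p_otter, mean = 40, sd = 8)

# An empirical check: the share of simulated values below 33
# should be close to p_otter.
set.seed(3)  # arbitrary seed, used only so the sketch is reproducible
empirical_share <- mean(rnorm(100000, mean = 40, sd = 8) < 33)
c(theoretical = p_otter, empirical = empirical_share)
```

The empirical share will not match exactly, but with 100,000 simulated values it should agree with `pnorm()` to about two decimal places.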

```{css, echo=FALSE}
tr {
    border-bottom: 1px solid #dddddd;
}
tr:nth-of-type(even) {
    background-color: #f4f4f4 !important;
    -webkit-print-color-adjust: exact;
}
tr:last-of-type {
    border-bottom: 2px solid #009980;
}
.infobox {
  padding: 1em 1em 1em 5em;
  margin-bottom: 10px;
  border: 2px dotted orange;
  border-radius: 10px;
  background: #dcdcdc 1em center/3em no-repeat;
  
}
```