We have been given some values of \(X_i,Y_i\) for \(i=1,2.\) Using the formulas given to us we can calculate the slope and intercept accordingly.
X1 <- 3
X2 <- -4
Y1 <- 2
Y2 <- 100
b <- (Y1-Y2)/(X1-X2) #Slope
a <- (Y2*X1 - Y1*X2)/(X1 - X2) #Intercept
print(paste("Intercept is", a))
## [1] "Intercept is 44"
print(paste("Slope is",b))
## [1] "Slope is -14"
We can reuse the code from before and just change the numbers.
X1 <- 0
X2 <- -11
Y1 <- -2
Y2 <- -100
b <- (Y1-Y2)/(X1-X2) #Slope
a <- (Y2*X1 - Y1*X2)/(X1 - X2) #Intercept
print(paste("Intercept is", a))
## [1] "Intercept is -2"
print(paste("Slope is", round(b,6)))
## [1] "Slope is 8.909091"
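Since the same two formulas are applied twice with different numbers, they could also be wrapped in a small function. The following is only a sketch; the name `line_through` and its argument order are made up for illustration and are not part of the exercise.

```r
# Hypothetical helper (not from the exercise): line through (X1, Y1) and (X2, Y2)
line_through <- function(X1, Y1, X2, Y2) {
  stopifnot(X1 != X2)                    # a vertical line has no finite slope
  b <- (Y1 - Y2) / (X1 - X2)             # slope
  a <- (Y2 * X1 - Y1 * X2) / (X1 - X2)   # intercept
  c(intercept = a, slope = b)
}

line_through(3, 2, -4, 100)      # first pair of points
line_through(0, -2, -11, -100)   # second pair of points
```

The two calls use the same numbers as the two solutions above, so the results can be compared directly with the printed intercepts and slopes.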
set.seed(123) #For reproducibility
x_sim <- rnorm(500) #Simulate 500 standard normal draws
#Figure
hist(x_sim, main="Histogram of Simulated Data", xlab="x", col="red")
#Median
summary(x_sim)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -2.66092 -0.57463  0.02072  0.03459  0.68521  3.24104
print(paste("Median is", round(median(x_sim), 4)))
## [1] "Median is 0.0207"
set.seed(123) #For reproducibility
X <- rnorm(20) #Simulate 20 standard normal draws
barX <- mean(X) #Average of X
tildeX <- X - barX #X Tilde
#Verify that the average of tildeX is zero (up to floating-point rounding error)
print(paste("tilde X is on average", mean(tildeX)))
## [1] "tilde X is on average 2.77013655070046e-18"
set.seed(123) #For reproducibility
X <- rnorm(20) #Simulate 20 standard normal draws
barX <- mean(X) #Average of X
sX <- sd(X)
tildeX <- (X - barX)/sX #X Tilde
#Verify that average of tildeX is zero
print(paste("tilde X is on average", mean(tildeX)))
## [1] "tilde X is on average 1.50107790780618e-17"
#Verify that sd of tildeX is 1
print(paste("Sd of tilde X is", sd(tildeX)))
## [1] "Sd of tilde X is 1"
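Rather than reading the printed numbers by eye, the two checks above can also be written as tolerance-based comparisons. This is a minimal sketch using base R's `all.equal()` and `scale()`; it assumes the same seed and sample size as above.

```r
set.seed(123)                    # same seed as above
X <- rnorm(20)
tildeX <- (X - mean(X)) / sd(X)  # standardized values

# all.equal() compares numbers up to a small numerical tolerance
isTRUE(all.equal(mean(tildeX), 0))
isTRUE(all.equal(sd(tildeX), 1))

# scale() centers and rescales in one step; it returns a matrix,
# so convert it to a plain vector before comparing
isTRUE(all.equal(as.numeric(scale(X)), tildeX))
```

All three comparisons should evaluate to `TRUE`, which says that the tiny non-zero mean is just rounding error.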
To check that the result holds beyond a single draw, we can repeat the exercise either manually or in a loop. I will use a loop, since it makes it easy to change the number of repetitions and thereby strengthen our belief that the result holds for every sequence of numbers \(X_1,X_2,...,X_n.\)
n <- 5 #Number of repetitions
#Run loop
for (i in 1:n){
  #Simulate standard normal
  X <- rnorm(20)
  #Compute mean and sd
  barX <- mean(X)
  sX <- sd(X)
  #Compute tildeX
  tildeX <- (X-barX)/sX
  #Print result for each round
  print(paste("Repetition", i, ":"))
  print(paste("Mean of tildeX:", mean(tildeX)))
  print(paste("Sd of tildeX:", sd(tildeX)))
}
## [1] "Repetition 1 :"
## [1] "Mean of tildeX: -2.62891921773423e-17"
## [1] "Sd of tildeX: 1"
## [1] "Repetition 2 :"
## [1] "Mean of tildeX: 1.90819582357449e-17"
## [1] "Sd of tildeX: 1"
## [1] "Repetition 3 :"
## [1] "Mean of tildeX: -1.31752586813617e-17"
## [1] "Sd of tildeX: 1"
## [1] "Repetition 4 :"
## [1] "Mean of tildeX: -2.33103467084383e-18"
## [1] "Sd of tildeX: 1"
## [1] "Repetition 5 :"
## [1] "Mean of tildeX: 9.03682510766668e-18"
## [1] "Sd of tildeX: 1"
As we can see, the mean varies a bit but is always zero up to floating-point rounding error (on the order of \(10^{-17}\)). The standard deviation is one in every repetition.
If you would like to try this with, say, \(n\) repetitions, I would suggest that you create two vectors before the loop and fill in entry \(i\) with the mean and sd of tildeX in repetition \(i\). This gives you two vectors with \(n\) entries each. To confirm the result you can, for instance, compute the average of the averages and the average of the sds. Doing this gives a number close to zero for the mean and one (up to rounding error) for the sd.
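The suggestion above could look roughly as follows. This is a sketch only: the number of repetitions (1000) and the vector names `means` and `sds` are arbitrary choices, not part of the exercise.

```r
set.seed(123)          # for reproducibility
n <- 1000              # number of repetitions (arbitrary choice)
means <- numeric(n)    # mean of tildeX in each repetition
sds   <- numeric(n)    # sd of tildeX in each repetition

for (i in seq_len(n)) {
  X <- rnorm(20)                     # simulate standard normal
  tildeX <- (X - mean(X)) / sd(X)    # standardize
  means[i] <- mean(tildeX)           # store result of repetition i
  sds[i]   <- sd(tildeX)
}

mean(means)   # average of the averages
mean(sds)     # average of the sds
```

Preallocating the two vectors with `numeric(n)` avoids growing them inside the loop, which matters little here but is the usual habit in R.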
To figure out what the code is doing, let us add comments to the code and in that way get an understanding of what is going on.
n <- 15 #Largest exponent (the series has n+1 = 16 terms)
x <- 0.5 #Value of x variable
series <- x^(0:n) #Creates a vector with entries 0.5^k for k = 0 to 15
#Gives values 1, 0.5, 0.25, 0.125,... = 0.5^0, 0.5^1, 0.5^2, 0.5^3,...
series
## [1] 1.000000e+00 5.000000e-01 2.500000e-01 1.250000e-01 6.250000e-02
## [6] 3.125000e-02 1.562500e-02 7.812500e-03 3.906250e-03 1.953125e-03
## [11] 9.765625e-04 4.882812e-04 2.441406e-04 1.220703e-04 6.103516e-05
## [16] 3.051758e-05
sum(series) #LHS of equation
## [1] 1.999969
(1-x^(n+1))/(1-x) #RHS of equation
## [1] 1.999969
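The two sides agree for this particular \(x\) and \(n\). To see that this is not a coincidence, the same comparison can be repeated for other values. The helper name `check_geometric` and the extra test values below are made up for illustration.

```r
# Hypothetical helper: compares both sides of the equation for given x and n
check_geometric <- function(x, n) {
  stopifnot(x != 1)                    # the closed form divides by (1 - x)
  lhs <- sum(x^(0:n))                  # 1 + x + x^2 + ... + x^n
  rhs <- (1 - x^(n + 1)) / (1 - x)     # closed-form expression
  isTRUE(all.equal(lhs, rhs))
}

check_geometric(0.5, 15)   # the case from the exercise
check_geometric(-0.3, 10)  # a negative x
check_geometric(2, 8)      # an x larger than one
```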
Since this is user specific, I leave it for you to do on your own :).
The mean can be viewed as a weighted average \[\bar{X}_n=\sum_{i=1}^n w_iX_i,\] where \(w_i\) is the share of the population associated with that particular \(i=1,...,n.\) Hence, \[\bar{X}_{200}=\frac{100}{200}\cdot 180+\frac{100}{200}\cdot 178=179.\]
Use the weighted average formula again: \[\bar{X}_{200}=\frac{100}{200}y+\frac{100}{200}z=\frac{y+z}{2}.\]
Following the formula again: \[\bar{X}_n=\frac{30}{100}y+\frac{70}{100}z.\]
Following the formula: \[\bar{X}_n=\frac{a}{n}y+\frac{n-a}{n}z.\]
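The weighted-average calculations above can be reproduced in R, either by hand or with the built-in `weighted.mean()`. This is only a sketch; the values chosen for `y` and `z` are made up, since the exercise leaves them as symbols.

```r
# Two groups of 100 with means 180 and 178
group_means <- c(180, 178)
group_sizes <- c(100, 100)
weights <- group_sizes / sum(group_sizes)   # w_i = share of each group

sum(weights * group_means)                  # weighted average by hand
weighted.mean(group_means, group_sizes)     # built-in; normalizes the weights itself

# Unequal groups: 30 observations with mean y and 70 with mean z
y <- 10   # made-up value for illustration
z <- 20   # made-up value for illustration
weighted.mean(c(y, z), c(30, 70))           # equals 0.3 * y + 0.7 * z
```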
Following the hint, it is easy to see that \[\bar{X}_{1:n}=\frac{1}{n-1+1}\sum_{i=1}^nX_i=\frac{1}{n}\sum_{i=1}^nX_i=\bar{X}_n,\] where the last equality follows from the fact that the LHS expression is the definition of the mean.
Set \(u=v\), then \[\bar{X}_{u:u}=\frac{1}{u-u+1}\sum_{i=u}^uX_i=X_u.\] The last equality holds because a sum that runs from one index to the same index contains a single term.
It is just a generalization. It is actually straightforward to split a sum into the sum of its parts. For instance, take \(X_i=i\) and consider the sum \[\sum_{i=1}^{10}X_i=1+2+3+\dots+10.\] This is the same as writing:
\[ \sum_{i=1}^{10}X_i=\sum_{i=1}^3X_i+\sum_{i=4}^6X_i+\sum_{i=7}^{10}X_i=(1+2+3) + (4+5+6) + (7+8+9+10). \]
It should not come as a surprise that this partitioning of the sum can be done in even smaller intervals.
*) From the RHS: \[\frac{a}{n}\frac{1}{a}\sum_{i=1}^aX_i=\frac{1}{n}\sum_{i=1}^aX_i.\]
**) From the RHS: \[\frac{b-a}{n}\frac{1}{b-a}\sum_{i=a+1}^bX_i=\frac{1}{n}\sum_{i=a+1}^bX_i.\]
***) From the RHS: \[\frac{n-b}{n}\frac{1}{n-b}\sum_{i=b+1}^nX_i=\frac{1}{n}\sum_{i=b+1}^nX_i.\]
\[\begin{align} \bar{X}_n&=\sum_{i=1}^nw_iX_i=\sum_{i=1}^aw_iX_i+\sum_{i=a+1}^bw_iX_i+\sum_{i=b+1}^nw_iX_i \\ &= \frac{a}{n}\bar{X}_{1:a}+\frac{b-a}{n}\bar{X}_{a+1:b}+\frac{n-b}{n}\bar{X}_{b+1:n}. \end{align}\]
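The three-part decomposition can be checked numerically. The sketch below uses simulated data and arbitrary split points `a` and `b`; the helper `part_mean` is a made-up name that plays the role of \(\bar{X}_{u:v}\).

```r
set.seed(123)
n <- 20; a <- 5; b <- 12     # arbitrary split points with 1 <= a < b < n
X <- rnorm(n)

# Mean of X_u, ..., X_v (hypothetical helper)
part_mean <- function(u, v) mean(X[u:v])

combined <- (a / n) * part_mean(1, a) +
  ((b - a) / n) * part_mean(a + 1, b) +
  ((n - b) / n) * part_mean(b + 1, n)

isTRUE(all.equal(combined, mean(X)))   # weighted part means give the overall mean
```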
Following the hints, doing some simplifications, and applying the definition of the mean, we can obtain the expression. That is,
\[\begin{align} S_X^2&=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X}_n)^2 \\ &= \frac{1}{n-1}\sum_{i=1}^n\left[X_i^2-2X_i\bar{X}_n+\bar{X}_n^2 \right] \\ &= \frac{1}{n-1}\left[\sum_{i=1}^nX_i^2-2\bar{X}_n\sum_{i=1}^nX_i+\sum_{i=1}^n\bar{X}_n^2 \right]\\ &= \frac{1}{n-1}\left[\sum_{i=1}^nX_i^2-2n\bar{X}_n^2+n\bar{X}_n^2 \right]\\ &=\frac{1}{n-1}\sum_{i=1}^nX_i^2-\frac{n}{n-1}\bar{X}_n^2. \end{align}\]
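As a quick numerical check of the last line of the derivation, the expression can be compared with R's `var()`, which uses the same \(n-1\) denominator. The simulated data are only an example.

```r
set.seed(123)
X <- rnorm(20)
n <- length(X)

# Last line of the derivation: sum of squares minus n times the squared mean
shortcut <- sum(X^2) / (n - 1) - n / (n - 1) * mean(X)^2

isTRUE(all.equal(shortcut, var(X)))   # agrees with the built-in sample variance
```

The formula subtracts two numbers of similar size, so it can lose precision when the mean is large relative to the spread. For actual computations `var()` is the safer choice.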
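We can split the sum as before and then add up the parts. That is, \[S_X^2 = S_{1:a} +S_{a+1:b}+S_{b+1:n},\] provided each part is defined with the overall mean and the common factor \(1/(n-1)\), that is, \(S_{u:v}=\frac{1}{n-1}\sum_{i=u}^v(X_i-\bar{X}_n)^2.\) Do we have to weight these as well? Not in this form, since the factor \(1/(n-1)\) is already shared by all parts. If each part were instead its own sample variance, computed around its own mean, a plain sum would no longer work: the parts would need weights and an additional term for the spread between the part means.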
Given that the weights should add to one and are all equal, we can set \(w=1/n.\) Then we have \[\bar{Z}_n=\sum_{i=1}^n\frac{1}{n}Z_i=\frac{1}{n}\sum_{i=1}^nZ_i,\] which is precisely the definition of the classical mean.
Here it is just about setting the weights correctly, which is exactly what we argued in question E) (v.). We set the weights as \(w_1=a/n\), \(w_2=(b-a)/n\), and \(w_3=(n-b)/n\). To confirm that this works, we just need to check that \(w_1+w_2+w_3=1.\) That is, \[\frac{a}{n}+\frac{b-a}{n}+\frac{n-b}{n}=\frac{a+b-a+n-b}{n}=\frac{n}{n}=1,\] so we are fine! Thus, we can write \(\bar{X}_n\) as a weighted average of the three part means.
If the sequence \(X_1,...,X_n\) is split into \(M\) parts, each part will have its own average, say \(\bar{X}_1,\bar{X}_2,...,\bar{X}_M,\) and its own weight based on the number of elements in each part, say \(w_1,w_2,...,w_M.\) The general formula is therefore just the weighted average of all means: \[\bar{X}_n=\sum_{j=1}^Mw_j\bar{X}_j,\] where \(w_j\) is the number of elements in the \(j\)-th part divided by the total number of elements \(n.\) Thus, the general formula is just a weighted average of the part-specific means.
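The general formula can be illustrated in R with `tapply()`, which computes a statistic separately for each part. The split into \(M=4\) parts of sizes 3, 7, 4, and 6 is a made-up example.

```r
set.seed(123)
X <- rnorm(20)

# Hypothetical split into M = 4 parts of unequal size (3 + 7 + 4 + 6 = 20)
part <- rep(1:4, times = c(3, 7, 4, 6))

part_means <- tapply(X, part, mean)         # mean of each part
w <- tapply(X, part, length) / length(X)    # w_j = share of elements in part j

sum(w)                  # the weights add up to one
sum(w * part_means)     # weighted average of the part means
mean(X)                 # ordinary mean, for comparison
```

The last two lines should print the same number, whatever split is chosen, as long as every element belongs to exactly one part.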