Descriptives Part 2

Rasim Muzaffer Musal

Gatos Curioses

I can invest in one of two portfolios. Portfolios A and B have the same return of 6%, but Portfolio A has a variance of 2% whereas Portfolio B has a variance of 5%.
If you are a risk-seeking person, which one of these would you invest in?
If you are a risk-averse person, which one of these would you invest in?

Descriptives of Discrete Random Variables

So far, the means and variances we have calculated assumed the random variable to be continuous. However, continuous data can be discretized by grouping values together, and grouped data can also arise directly from discrete raw observations.

Discrete Variable Mean

\[E(X)=\sum_{i=1}^{i=N} P(X=x_{i})\times x_{i} \]

E(X) is the expected value of X.
P(X=x) will later be referred to as a probability; for now, it is the proportion of observations in which the random variable X takes the value x.
If it is not given, how is it calculated?

\[P(X=x_{i})=\frac{\# x_{i}}{N} \]

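- where \(\# x_{i}\) is the number of times the value \(x_{i}\) is observed and N is the total number of observations. Note that in the summation formulas the upper limit counts the distinct values that X takes, not the individual observations.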
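
When only the raw observations are available, the relative frequencies can be tallied directly. The following is a minimal Python sketch of that counting step; the list `observations` and the function name `relative_frequencies` are illustrative assumptions, not part of the course material.

```python
from collections import Counter


def relative_frequencies(observations):
    """Return {value: count / N} for a list of raw observations."""
    n = len(observations)
    counts = Counter(observations)
    return {value: count / n for value, count in counts.items()}


# Hypothetical raw data: 10 observations of a variable taking the values 1, 2, 3.
observations = [1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
freqs = relative_frequencies(observations)
print(freqs)  # {1: 0.2, 2: 0.5, 3: 0.3}
```

The relative frequencies always sum to 1, which is a quick sanity check on any table of this kind.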

Discrete Variable Variance

\[Var(X)=\sigma^{2}=\sum_{i=1}^{i=N} P(X=x_i)\times(x_i-\mu_{x})^{2} \]

As you can see, this is a similar operation to the discrete mean formulation. What changes from the expected value calculation is the quantity being multiplied by the weight \(P(X=x_i)\).
In the expected value calculation, \(P(X=x_{i})\) is multiplied by \(x_{i}\), whereas in the variance calculation \(P(X=x_{i})\) is multiplied by \((x_{i}-\mu_{x})^{2}\).
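
The parallel between the two formulas may be easier to see in code. Below is a minimal Python sketch in which both functions loop over the same weights and differ only in what each weight multiplies. The function names `discrete_mean` and `discrete_variance`, and the fair-die example, are assumptions chosen for illustration.

```python
def discrete_mean(dist):
    """E(X): sum of P(X = x) * x over the distinct values x."""
    return sum(p * x for x, p in dist.items())


def discrete_variance(dist):
    """Var(X): sum of P(X = x) * (x - mu)^2 over the distinct values x."""
    mu = discrete_mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


# Hypothetical example: a fair six-sided die, each face with weight 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
print(round(discrete_mean(die), 4))      # 3.5
print(round(discrete_variance(die), 4))  # 2.9167
```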

Continuous vs Discrete

The continuous and discrete formulations of the mean and of the variance are really equivalent to each other.
In the calculation of the continuous mean, each observation has the same weight. Therefore each value is multiplied by \(\frac{1}{N}\), whereas for discrete random variables this weight is replaced with \(P(X=x_{i})\), the relative frequency with which the value \(x_{i}\) occurs.
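
This equivalence can be checked numerically. The sketch below, written in Python with a small made-up data set (`raw` is a hypothetical list of observations), computes the mean and variance once with the weight 1/N on every observation and once with the weight P(X = x) on every distinct value. It assumes the variance divides by N, as in the formulas above.

```python
import math
from collections import Counter

# Hypothetical raw observations.
raw = [5, 5, 7, 7, 7, 9]
n = len(raw)

# Ungrouped: every observation carries the weight 1/N.
mean_raw = sum(x / n for x in raw)
var_raw = sum((x - mean_raw) ** 2 / n for x in raw)

# Grouped: each distinct value carries the weight P(X = x) = count / N.
probs = {x: count / n for x, count in Counter(raw).items()}
mean_grouped = sum(p * x for x, p in probs.items())
var_grouped = sum(p * (x - mean_grouped) ** 2 for x, p in probs.items())

print(math.isclose(mean_raw, mean_grouped))  # True
print(math.isclose(var_raw, var_grouped))    # True
```

The comparison uses `math.isclose` rather than `==` because the two routes add the same quantities in a different order, which can change the last few floating-point digits.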

Examples

\(x_{i}\)	Count (#)	\(P(X=x_{i})\)
4	10	\(\frac{10}{50}\)=0.2
3	20	\(\frac{20}{50}\)=0.4
2	10	\(\frac{10}{50}\)=0.2
1	10	\(\frac{10}{50}\)=0.2
Sum	50	\(\frac{50}{50}\)= 1.00

Grades of individuals, quantified from 1 to 4, and the number of times each grade was received. \(x_{1}\) is 4, \(x_{2}\) is 3, and so forth.

Calculating the Mean for the grades of individuals

\[E(X)=\mu_{x}=\sum_{i=1}^{i=4} P(X=x_{i}) \times x_{i}\]

Note that here the upper limit 4 is the number of distinct values that the grouped variable X takes.

Calculating the Mean for the grades of individuals

When \(i = 1\), the first operation in the formulation is \(P(X=4)\times 4\). We then repeat this for i=2, i=3 and i=4, and sum all the values.
P(X=4)=0.2, P(X=3)=0.4, and so forth.

\[\begin{aligned} \text{When } i=1 &\Rightarrow P(X=x_{1}) \times x_{1}\\ \text{When } i=2 &\Rightarrow P(X=x_{2}) \times x_{2}\\ \text{When } i=3 &\Rightarrow P(X=x_{3}) \times x_{3}\\ \text{When } i=4 &\Rightarrow P(X=x_{4}) \times x_{4} \end{aligned}\]

Calculating the Mean for the grades of individuals

\[\begin{aligned} \text{When } i=1 &\Rightarrow 0.2 \times 4 = 0.8\\ \text{When } i=2 &\Rightarrow 0.4 \times 3 = 1.2\\ \text{When } i=3 &\Rightarrow 0.2 \times 2 = 0.4\\ \text{When } i=4 &\Rightarrow 0.2 \times 1 = 0.2 \end{aligned}\]

Of course, we need to sum them together, as implied by the \(\sum_{i=1}^{i=4}\) operator: \(0.8+1.2+0.4+0.2=2.6\).
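
The same steps can be reproduced with a few lines of Python. This is only a sketch of the calculation for the grade table; the variable names (`counts`, `terms`, `mean`) are illustrative choices.

```python
# Grade table from the example: grade value -> number of times observed.
counts = {4: 10, 3: 20, 2: 10, 1: 10}
n = sum(counts.values())  # total number of observations, 50

terms = []
for x, count in counts.items():
    p = count / n  # relative frequency P(X = x)
    terms.append(p * x)
    print(f"P(X={x}) * {x} = {p:.1f} * {x} = {p * x:.1f}")

# Summing the four terms gives the expected value.
mean = sum(terms)
print(f"E(X) = {mean:.1f}")  # E(X) = 2.6
```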

Calculating the Variance for the grades of individuals

\[Var(X)=\sigma^{2}_{x}=\sum_{i=1}^{i=4} P(X=x_{i}) \times (x_{i}-\mu_{X})^{2}\]

\[\begin{aligned} \text{When } i=1 &\Rightarrow P(X=x_{1}) \times (x_{1} - \mu_{X})^{2}\\ \text{When } i=2 &\Rightarrow P(X=x_{2}) \times (x_{2}- \mu_{X})^{2}\\ \text{When } i=3 &\Rightarrow P(X=x_{3}) \times (x_{3}- \mu_{X})^{2}\\ \text{When } i=4 &\Rightarrow P(X=x_{4}) \times (x_{4}- \mu_{X})^{2} \end{aligned}\]

Do not forget to SUM

Calculating the Variance for the grades of individuals

\[Var(X)=\sigma^{2}_{x}=\sum_{i=1}^{i=4} P(X=x_{i}) \times (x_{i}-\mu_{X})^{2}\]

\[\begin{aligned} \text{When } i=1 &\Rightarrow 0.2 \times (4 - 2.6)^{2}=0.2 \times 1.96\\ \text{When } i=2 &\Rightarrow 0.4 \times (3- 2.6)^{2}=0.4 \times 0.16 \\ \text{When } i=3 &\Rightarrow 0.2 \times (2- 2.6)^{2}=0.2 \times 0.36\\ \text{When } i=4 &\Rightarrow 0.2 \times (1- 2.6)^{2}=0.2 \times 2.56 \end{aligned}\]

Do not forget to SUM

Calculating the Variance for the grades of individuals

\(0.2 \times 1.96+ 0.4 \times 0.16+0.2 \times 0.36+0.2 \times 2.56 =1.04\)

If you wanted to calculate the standard deviation, take the square root of 1.04: \(\sigma=\sqrt{1.04}\approx 1.02\).
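
The variance and standard deviation of the grade distribution can be recomputed term by term with a short Python sketch. The dictionary `probs` holds the relative frequencies from the grade table; the variable names are illustrative assumptions. Running it prints each squared deviation and its weighted contribution, so every step of the hand calculation can be checked.

```python
import math

# Relative frequencies from the grade table: grade value -> P(X = value).
probs = {4: 0.2, 3: 0.4, 2: 0.2, 1: 0.2}

# Expected value: weights multiplied by the values themselves.
mu = sum(p * x for x, p in probs.items())

# Variance: the same weights multiplied by the squared deviations from the mean.
variance = 0.0
for x, p in probs.items():
    squared_deviation = (x - mu) ** 2
    variance += p * squared_deviation
    print(f"x={x}: ({x} - {mu:.1f})^2 = {squared_deviation:.2f}, "
          f"weighted = {p * squared_deviation:.3f}")

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)
print(f"mean = {mu:.2f}, variance = {variance:.2f}, sd = {std_dev:.2f}")
```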
Variance or standard deviation by itself does not mean much; it is most useful for comparison purposes. We will return to this when we get to hypothesis tests, but for now let us think about a couple of simple scenarios.

Gatos Curioses

Assume you are evaluating two candidates, A and B, for an internal promotion. Both have the same average performance score of 4.2 out of 5 over their 10 years in the company. Candidate A has a variance of 2.1 and candidate B has a variance of 1.2. Which candidate should be preferred? What questions should you ask? (Think only in terms of statistics.)

Gatos Curioses

You are managing a call center and measuring average call times. Why could the variance of call times over time also be important, even if the average call time stays approximately the same?

Gatos Curioses

You are managing a chip manufacturing process. Would you like high or low variance in the process? How can you use variance to improve quality control?