GE193 - Course Introduction

Equation Review

Dr Robert Batzinger
Instructor Emeritus

2/12/23

1 Descriptive Statistics

1.1 Average

\[\bar x = \frac{\sum x_i}{n}\]

\[\eqalign{a &=& \left[4,4,4,5,5,5,5,6\right]\\ \bar a &=& \frac{\left(4+4+4+5+5+5+5+6\right)}{8}=\frac{38}{8} = 4.75\\ b &=& \left[4,5,5,6,6,7,7,8\right]\\ \bar b &=& \frac{\left(4+5+5+6+6+7+7+8\right)}{8}=\frac{48}{8}=6.0\\ }\]

1.2 Standard deviation

\[\eqalign{\displaystyle\sigma &=& \sqrt{\frac{\sum (x - \bar x)^2}{n-1}}\\ &=&\sqrt{\frac{\sum x^2 - \left(\sum x\right)^2/n}{n-1}}\\}\]

1.3 Method 1: Group A: Sums of deviations

Step 1: Calculating sums

id x \(\bar x\) \(x- \bar x\) \((x- \bar x)^2\)
a1 4 4.75 -0.75 0.5625
a2 4 4.75 -0.75 0.5625
a3 4 4.75 -0.75 0.5625
a4 5 4.75 0.25 0.0625
a5 5 4.75 0.25 0.0625
a6 5 4.75 0.25 0.0625
a7 5 4.75 0.25 0.0625
a8 6 4.75 1.25 1.5625
\(\sum x\) 38 \(\sum dev^2\) 3.5

Step 2: Calculating Std Dev

\[\eqalign{ \sum x &=& 38\\ n&=& 8\\ average&=& \left(\sum{x}\right)/n\\ &=& 38/8 = 4.75\\ df &=& n-1 = 7\\ std\ dev &=& \sqrt{\sum (dev^2)/df} \\ &=& \sqrt{3.5/7} = 0.707107\\}\]

1.4 Method 1: Group B: Sums of deviations

Step 1: Calculating sums

id x \(\bar x\) \(x- \bar x\) \((x- \bar x)^2\)
b1 4 6 -2 4
b2 5 6 -1 1
b3 5 6 -1 1
b4 6 6 0 0
b5 6 6 0 0
b6 7 6 1 1
b7 7 6 1 1
b8 8 6 2 4
\(\sum x\) 48 \(\sum dev^2\) 12

Step 2: Calculating Std Dev

\[\eqalign{ \sum x &=& 48\\ n&=& 8\\ average&=& \left(\sum{x}\right)/n\\ &=& 48/8 = 6 \\ df &=& n-1 = 7\\ std\ dev &=& \sqrt{\sum (dev^2)/df} \\ &=& \sqrt{12/7} = 1.309307\\}\]

1.5 Method 2: Group A: Sum of squares

Step 1: Calculating sums

id x \(x^2\)
a1 4 16
a2 4 16
a3 4 16
a4 5 25
a5 5 25
a6 5 25
a7 5 25
a8 6 36
sum 38 184

Step 2: Calculating Std Dev

\[\eqalign{ \sum x &=& 38\\ n&=& 8\\ \sum x^2 &=& 184\\ \left(\sum x\right)^2/n&=& 38^2/8 = 180.5\\ difference&=& 184-180.5 =3.5\\ df &=& n-1 = 8-1 = 7\\ std\ dev &=& \sqrt{difference/df} \\ &=& \sqrt{3.5/7} = 0.707107\\}\]

1.6 Method 2: Group B: Sum of squares

Step 1: Calculating sums

id x \(x^2\)
b1 4 16
b2 5 25
b3 5 25
b4 6 36
b5 6 36
b6 7 49
b7 7 49
b8 8 64
sum 48 300

Step 2: Calculating Std Dev

\[\eqalign{ \sum x &=& 48\\ n&=& 8\\ \sum x^2 &=& 300 \\ \left(\sum x\right)^2/n&=& 48^2/8= 2304/8 = 288\\ difference&=& 300 - 288 = 12\\ df &=& n-1 = 8-1 = 7\\ std\ dev &=& \sqrt{difference/df} \\ &=& \sqrt{12/7} = 1.309307\\}\]

1.7 By computer function

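# Group A: mean and sample standard deviation
a <- c(4,4,4,5,5,5,5,6)
cat("a =", mean(a), "+/-", sd(a), "\n")
# a = 4.75 +/- 0.7071068

# Group B: mean and sample standard deviation
b <- c(4,5,5,6,6,7,7,8)
cat("b =", mean(b), "+/-", sd(b), "\n")
# b = 6 +/- 1.309307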
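
As a cross-check, Method 1 (sum of squared deviations) and Method 2 (sum of squares) can be written out by hand and compared with the built-in `sd()`. This is a minimal sketch; the helper names `sd_deviations` and `sd_sum_squares` are made up for illustration.

```r
# Method 1: sum of squared deviations from the mean, divided by n - 1
sd_deviations <- function(x) {
  sqrt(sum((x - mean(x))^2) / (length(x) - 1))
}

# Method 2: sum of squares minus (sum x)^2 / n, divided by n - 1
sd_sum_squares <- function(x) {
  n <- length(x)
  sqrt((sum(x^2) - sum(x)^2 / n) / (n - 1))
}

a <- c(4, 4, 4, 5, 5, 5, 5, 6)
b <- c(4, 5, 5, 6, 6, 7, 7, 8)

# Each row prints: Method 1, Method 2, built-in sd()
for (x in list(a, b)) {
  cat(sd_deviations(x), sd_sum_squares(x), sd(x), "\n")
}
```

All three columns should agree for each group, since the two formulas are algebraically equivalent.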

1.8 Data points vs Population modelling

1.9 Normal distribution

\[f\left(x\right) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{\left(x-\mu\right)^2}{2\sigma^2}}\]
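
A short sketch of the normal density, with normalising constant \(1/(\sigma\sqrt{2\pi})\), checked against R's built-in `dnorm()`. The function name `normal_pdf` and the values of `mu` and `sigma` are assumptions chosen only for illustration.

```r
# Normal density written out term by term
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(3, 5, 6, 7.5)   # illustrative evaluation points

# TRUE if the hand-written density matches dnorm() at every point
ok <- all.equal(normal_pdf(x, mu = 6, sigma = 1.3),
                dnorm(x, mean = 6, sd = 1.3))
print(ok)
```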

2 Standard error of the mean

\[se = \frac{std\ dev}{\sqrt{n}}\]

se a̅  = 0.7071068 / 2.828427 = 0.25 
se b̅  = 1.309307 / 2.828427 = 0.46291
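
The same standard errors can be computed directly from the data. A minimal sketch follows; the helper name `se` is an assumption, since base R has no built-in standard-error function.

```r
# Standard error of the mean: sample standard deviation over sqrt(n)
se <- function(x) sd(x) / sqrt(length(x))

a <- c(4, 4, 4, 5, 5, 5, 5, 6)
b <- c(4, 5, 5, 6, 6, 7, 7, 8)

cat("se(a) =", se(a), "\n")
cat("se(b) =", se(b), "\n")
```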

2.1 SE vs Std Dev

2.2 Standard deviations

2.3 Sample size

3 t-test

\[\displaystyle{t=\frac{difference}{standard\ error}}\]

3.1 t test - One sample

\[{\displaystyle t={\frac {{\bar {x}}-\mu _{0}}{ \frac{s}{\sqrt {n}}}} = \frac{\Delta x}{\frac{\sigma}{\sqrt{n}}}}\]

\[df = n - 1\]
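
A sketch of the one-sample test computed by hand and compared with `t.test()`. It reuses group A from the descriptive statistics section; the reference mean `mu0 = 5` is a hypothetical value chosen only for illustration.

```r
a   <- c(4, 4, 4, 5, 5, 5, 5, 6)
mu0 <- 5                       # hypothetical reference mean (assumption)

n       <- length(a)
t_stat  <- (mean(a) - mu0) / (sd(a) / sqrt(n))   # difference / standard error
df      <- n - 1
p_value <- 2 * pt(-abs(t_stat), df)              # two-sided p-value

res <- t.test(a, mu = mu0)     # built-in one-sample t-test

cat("by hand:", t_stat, df, p_value, "\n")
cat("t.test :", res$statistic, res$parameter, res$p.value, "\n")
```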

3.2 Two-sample t-test: equal sample size and equal variance

\[{\displaystyle t={\frac {{\bar {X}}_{1}-{\bar {X}}_{2}}{s_{p}{\sqrt {\frac {2}{n}}}}}}={\frac {{\bar {X}}_{1}-{\bar {X}}_{2}}{\sqrt {\frac {s_{1}^{2}+s_{2}^{2}}{n}}}}\]

\[{\displaystyle s_{p}={\sqrt {\frac {s_{1}^{2}+s_{2}^{2}}{2}}}}\] \[df= 2n - 2\]

3.3 Two-sample t-test: equal variance, unequal sample sizes

\[t = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 +(n_2-1)s_2^2}{n_1+n_2-2}}\]

\[df = n_1 + n_2 -2\]
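
The pooled formulas can be sketched in R and compared with `t.test(..., var.equal = TRUE)`. Groups A and B from the descriptive statistics section are reused here as example data; applying the pooled test to them is an illustration, not a claim that their variances are equal.

```r
a <- c(4, 4, 4, 5, 5, 5, 5, 6)
b <- c(4, 5, 5, 6, 6, 7, 7, 8)

n1 <- length(a)
n2 <- length(b)

# Pooled standard deviation: variances weighted by their degrees of freedom
sp <- sqrt(((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2))

t_stat  <- (mean(a) - mean(b)) / (sp * sqrt(1 / n1 + 1 / n2))
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)              # two-sided p-value

res <- t.test(a, b, var.equal = TRUE)            # built-in pooled t-test

cat("by hand:", t_stat, df, p_value, "\n")
cat("t.test :", res$statistic, res$parameter, res$p.value, "\n")
```

Because both groups have n = 8, the equal-size formula gives the same t value here.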

3.4 Welch's t-test

\[t = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

\[{\displaystyle \mathrm {d.f.} ={\frac {\left({\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}\right)^{2}}{{\frac {\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}}+{\frac {\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}}}}} \]
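
A sketch of Welch's statistic and its degrees of freedom computed term by term, using the male and female height data listed under the exam questions. The variable names `v1` and `v2` (for \(s^2/n\) of each group) are assumptions introduced for readability.

```r
m <- c(172, 175, 175, 180, 180, 180, 185, 195)            # male heights
f <- c(147, 158, 160, 160, 160, 164, 165, 171, 173, 173)  # female heights

n1 <- length(m)
n2 <- length(f)
v1 <- var(m) / n1      # s1^2 / n1
v2 <- var(f) / n2      # s2^2 / n2

t_stat <- (mean(m) - mean(f)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (generally not an integer)
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

cat("t =", t_stat, " df =", df, " p =", p_value, "\n")
```

The result should agree with the Welch output reported for the height data (t = 4.7748, df = 15.725).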

4 Minimum sample size

\[\displaystyle \left[t = \frac{\Delta x}{\frac{\sigma}{\sqrt{n}}}\right] \quad \longrightarrow\quad \left[n = \left(\frac{t \sigma}{\Delta x}\right)^2\right]\] \[n = \left(\frac{t\sigma}{M}\right)^2\]

4.1 Simple experimental design

To detect a separation of only 0.5 \(\sigma\) with 95% confidence:

\[n = t^2\left(\frac{\sigma}{\Delta x}\right)^2=1.96^2\left(\frac{\sigma}{0.5\sigma}\right)^2=15.3664 \;\Rightarrow\; n = 16\]
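
The calculation can be sketched as a small function that rounds up to the next whole observation. The name `min_n` and its arguments are assumptions; the critical value comes from `qnorm()`, which gives about 1.96 for 95% confidence.

```r
# Minimum sample size to resolve a difference `delta` given standard deviation `sigma`
min_n <- function(delta, sigma, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)      # two-sided critical value
  ceiling((z * sigma / delta)^2)      # always round up
}

# Separation of 0.5 sigma: only the ratio matters, so sigma = 1 is used
n_half_sigma <- min_n(delta = 0.5, sigma = 1)
print(n_half_sigma)
```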

4.2 Example

120 people work at Company Q, 85 of whom drink coffee daily. Find the 99% confidence interval for the true proportion of people at Company Q who drink coffee daily.

\[\eqalign{CI &=& \hat p \pm z \times \sqrt{\frac{\hat p(1-\hat p)}{n}}\\ &=&\frac{85}{120} \pm 2.58 \times \sqrt{\frac{\frac{85}{120}\left(1 - \frac{85}{120}\right)}{120}}\\ &=& 0.708 \pm 0.107 = \left[0.601,\ 0.815\right]\\}\]
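
A sketch of the same interval in R. The result is on the proportion scale (between 0 and 1); `qnorm(0.995)` supplies the 99% critical value, about 2.576, so the last digit can differ slightly from a hand calculation with 2.58.

```r
x <- 85     # coffee drinkers
n <- 120    # people at Company Q

p_hat  <- x / n
z      <- qnorm(0.995)                        # 99% two-sided critical value
margin <- z * sqrt(p_hat * (1 - p_hat) / n)   # margin of error

ci <- c(lower = p_hat - margin, upper = p_hat + margin)
print(round(ci, 3))
```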

4.3 Unlimited population

Find the sample size needed to estimate the proportion of people in a supermarket who identify as vegan with 95% confidence and a margin of error of 5%. Assume a population proportion of 0.5 and an unlimited population size.

\[n=\frac{z^2 \times \hat p(1- \hat p)}{\epsilon^2}=1.96^2 \times \frac{0.5(1-0.5)}{0.05^2}=384.16 \;\Rightarrow\; n = 385\]

4.4 Finite population

Find the sample size needed to estimate the proportion of German speakers in a college with 95% confidence and a margin of error of 5%. Assume a population proportion of 0.2 and a population of 1,200 students.

\[\eqalign{n &=& \frac{\frac{z^2 \times \hat p(1- \hat p)}{\epsilon^2}}{1+\frac{z^2 \times \hat p(1- \hat p)}{\epsilon^2 N}} = \frac{\frac{1.96^2 \times 0.2(1- 0.2)}{0.05^2}}{1+\frac{1.96^2 \times 0.2(1- 0.2)}{0.05^2 \times 1200}}\\ &=&\frac{\frac{3.8416\times 0.16}{0.0025}}{1+\frac{3.8416\times 0.16}{0.0025\times 1200}} = \frac{245.86}{1.2049} = 204.05 \;\Rightarrow\; n = 205\\ }\]

4.5 Variables

  • z is the z score
  • ε is the margin of error
  • N is the population size
  • p̂ is the assumed (estimated) population proportion
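
Both sample-size formulas above can be sketched as one function, because the finite-population version reduces to the unlimited one as N grows. The function name `sample_size_prop` and its argument names are assumptions; results are rounded up to whole people.

```r
# Sample size for estimating a proportion.
#   p   : assumed population proportion
#   eps : margin of error
#   N   : population size (Inf means unlimited)
sample_size_prop <- function(p, eps, conf = 0.95, N = Inf) {
  z  <- qnorm(1 - (1 - conf) / 2)      # about 1.96 for 95% confidence
  n0 <- z^2 * p * (1 - p) / eps^2      # unlimited-population size
  n  <- n0 / (1 + n0 / N)              # n0 / Inf is 0, so N = Inf leaves n0 unchanged
  ceiling(n)
}

n_vegan  <- sample_size_prop(p = 0.5, eps = 0.05)            # supermarket example
n_german <- sample_size_prop(p = 0.2, eps = 0.05, N = 1200)  # college example
cat("unlimited:", n_vegan, " finite:", n_german, "\n")
```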

4.6 Yamane formula

The variables in this formula are:

\[\eqalign{n &=& sample\ size\\ N &=& population\ of\ the\ study\\ e &=& margin\ of\ error\ in\ the\ calculation\\ }\]

\[n=\frac{N}{(1 + N e^2)}\]
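
A one-line sketch of the Yamane formula. The function name `yamane` is an assumption, and the inputs reuse the college example (N = 1,200, e = 0.05) purely as an illustration.

```r
# Yamane sample size: N / (1 + N * e^2), rounded up to a whole unit
yamane <- function(N, e) ceiling(N / (1 + N * e^2))

n_yamane <- yamane(N = 1200, e = 0.05)
print(n_yamane)
```

Unlike the proportion-based formula, this one takes no assumed proportion or z score as input, so the two approaches can give different sample sizes for the same population.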

5 Chi-square test

\[\chi^2 = {\Large\sum} \left(\frac{(observed-expected)^2}{expected}\right)\]

5.1 One dimension

  • Observed: Heads: 3, Tails: 7
  • Expected: Heads: 5, Tails: 5

\[\chi^2 = \frac{(3-5)^2}{5} + \frac{(7-5)^2}{5} = \frac{4}{5} + \frac{4}{5} = 1.6\]

df = 1 ; p = 0.2059032
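
A sketch of the coin example in R. The p-value of a chi-square test is the upper-tail probability, hence `lower.tail = FALSE`; the lower tail would give its complement instead.

```r
observed <- c(Heads = 3, Tails = 7)
expected <- c(Heads = 5, Tails = 5)

chi_sq  <- sum((observed - expected)^2 / expected)
df      <- length(observed) - 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)   # upper tail

cat("chi-square =", chi_sq, " df =", df, " p =", p_value, "\n")

# Built-in goodness-of-fit test; equal expected probabilities are the default
res <- chisq.test(observed)
print(res)
```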

5.2 Contingency Table with rows and columns

\[\chi ^{2}=\sum _{{i=1}}^{{r}}\sum _{{j=1}}^{{c}}{(O_{{i,j}}-E_{{i,j}})^{2} \over E_{{i,j}}}\]

\[df = (r - 1)(c - 1)\]

6 Exam Questions: Height

male: 172 175 175 180 180 180 185 195 
female: 147 158 160 160 160 164 165 171 173 173 

    Welch Two Sample t-test

data:  m and f
t = 4.7748, df = 15.725, p-value = 0.0002165
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  9.524862 24.775138
sample estimates:
mean of x mean of y 
   180.25    163.10 

6.1 Shoesize

male: 41 42 42 44 42 42 42 46 
female: 36 37 40 38 38.5 39 37 40 38 40 

    Welch Two Sample t-test

data:  ms and fs
t = 5.9314, df = 14.194, p-value = 3.461e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 2.731133 5.818867
sample estimates:
mean of x mean of y 
   42.625    38.350 
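
The two Welch outputs above can be reproduced with `t.test()`, which performs Welch's test by default. The variable names `m`, `f`, `ms` and `fs` follow the `data:` lines of the output; `height` and `shoe` are names introduced here for the results.

```r
m  <- c(172, 175, 175, 180, 180, 180, 185, 195)            # male height
f  <- c(147, 158, 160, 160, 160, 164, 165, 171, 173, 173)  # female height
ms <- c(41, 42, 42, 44, 42, 42, 42, 46)                    # male shoe size
fs <- c(36, 37, 40, 38, 38.5, 39, 37, 40, 38, 40)          # female shoe size

height <- t.test(m, f)     # var.equal = FALSE by default, i.e. Welch
shoe   <- t.test(ms, fs)

print(height)
print(shoe)
```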

6.2 Dating Behaviour

       Phone EMail F2F
Male      27    52  95
Female    35    48  65

    Pearson's Chi-squared test

data:  dta
X-squared = 4.7488, df = 2, p-value = 0.09307

6.3 Raw counts

Observed
       Phone EMail F2F
Male      27    52  95
Female    35    48  65

Expected
          Phone    EMail      F2F
Male   33.50311 54.03727 86.45963
Female 28.49689 45.96273 73.54037
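
A sketch of how the expected counts and the chi-square statistic for the dating behaviour table can be computed by hand and checked against `chisq.test()`. The matrix name `dta` follows the `data:` line of the output; the other names are assumptions.

```r
dta <- matrix(c(27, 52, 95,
                35, 48, 65),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Male", "Female"),
                              c("Phone", "EMail", "F2F")))

# Expected count = row total * column total / grand total
expected <- outer(rowSums(dta), colSums(dta)) / sum(dta)

chi_sq  <- sum((dta - expected)^2 / expected)
df      <- (nrow(dta) - 1) * (ncol(dta) - 1)
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

print(expected)
cat("X-squared =", chi_sq, " df =", df, " p-value =", p_value, "\n")

res <- chisq.test(dta)     # built-in Pearson chi-squared test
```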

6.4 Relative as Percentages

Observed
           Phone    EMail      F2F
Male    8.385093 16.14907 29.50311
Female 10.869565 14.90683 20.18634

Expected
           Phone    EMail      F2F     Total
Male   10.404691 16.78176 26.85082  54.03727
Female  8.849967 14.27414 22.83863  45.96273
Total  19.254658 31.05590 49.68944 100.00000

6.5 Percentage (by row)

               Phone    EMail      F2F
Male: Obs   15.51724 29.88506 54.59770
Female: Obs 23.64865 32.43243 43.91892
Expected    19.25466 31.05590 49.68944

6.6

\[ sd = \sqrt{\frac{\sum x^2 -\frac{(\sum{x})^2}{n}}{n-1}} =\sqrt{\frac{75-\frac{19^2}{5}}{4}} =0.84 \]

\[sd = \sqrt{\frac{\sum x^2 -\frac{(\sum{x})^2}{n}}{n-1}} = \sqrt{\frac{142 - \frac{24^2}{5}}{4}}=2.59\]

\[t = \frac{4.8-3.8}{\sqrt{\frac{0.84^2+2.59^2}{5}}} = \frac{1.0}{1.218} = 0.82, \quad df = 2n - 2 = 8\]

7 PYU IC ITCLUB - Upcoming talk

AI Applications Learning via Service Robot Development

Dr. Jerry Tan,
Business Director

Lattel Robotics: A company promoting artificial intelligence-focused robotics education

  • Date and venue: 20th March 2023, 3PM-4PM, PC301

7.1 IT Club Calendar

7.2 Post Prison therapy

Result                    No Therapy  Group Therapy  Individual Therapy  Total
Sent back to prison <1yr          24             10                  41     75
Back in prison 1-10yrs            25             13                  32     70
Never back in prison              12             20                   9     41
Total                             61             43                  82    186
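
One way to analyse this table is the contingency-table chi-square test from the chi-square section. The sketch below only sets up the data and runs the test; the matrix name `therapy` and the shortened row and column labels are assumptions.

```r
therapy <- matrix(c(24, 10, 41,
                    25, 13, 32,
                    12, 20,  9),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(
                    Result  = c("Back <1yr", "Back 1-10yrs", "Never back"),
                    Therapy = c("None", "Group", "Individual")))

# Row and column totals; these should reproduce the Total margins of the table
print(addmargins(therapy))

res <- chisq.test(therapy)   # Pearson chi-squared test, df = (3 - 1) * (3 - 1)
print(res$expected)          # expected counts under independence
print(res)
```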