GE193 - Course Introduction

Review Session

Dr Robert Batzinger
Instructor Emeritus

8/15/22

1 Quantitative decision making

  • Numbers are used to identify a goal:

    • Maximum
    • Minimum
    • Significant difference (using t-test statistics)

2 Backpack (Knapsack) Problem

  • Objective: maximize the total value packed without exceeding the backpack's capacity
  • Algorithm (greedy): load the highest-value items first, skipping any item that no longer fits
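The loading rule above can be sketched in Python as follows. The item names, values, weights, and capacity are hypothetical, and `greedy_backpack` is an illustrative helper, not a library function.

```python
def greedy_backpack(items, capacity):
    """Greedy heuristic: consider items from highest to lowest value and
    pack each one that still fits under the weight capacity.

    items:    list of (name, value, weight) tuples (hypothetical data)
    capacity: maximum total weight the backpack can hold
    Returns (names of packed items, total value, total weight).
    """
    packed, total_value, total_weight = [], 0, 0
    # Highest value first; ties keep their original order (sorting is stable)
    for name, value, weight in sorted(items, key=lambda item: item[1], reverse=True):
        if total_weight + weight <= capacity:
            packed.append(name)
            total_value += value
            total_weight += weight
    return packed, total_value, total_weight


# Hypothetical items: (name, value, weight in kg)
items = [
    ("laptop", 500, 3),
    ("camera", 300, 2),
    ("books", 100, 4),
    ("snacks", 40, 1),
]
print(greedy_backpack(items, capacity=5))  # (['laptop', 'camera'], 800, 5)
```

Loading the most valuable items first is fast but is not guaranteed to find the best packing. With a capacity of 4 and items A (value 10, weight 4), B (value 7, weight 2) and C (value 7, weight 2), the rule packs only A for a value of 10, while B and C together are worth 14.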

2.1 Search for maximum

2.2 Numerical representation

  • A measurement
  • Recognized value system, such as monetary value
  • Relative preference, such as a 5-point Likert scale
  • Ranking
  • Composite score
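One way to build a composite score from measurements on different scales is to rescale each criterion to a shared 0 to 1 range and take a weighted average. This is a sketch under assumptions: min-max rescaling is only one possible normalization, and the options, criteria, and weights below are hypothetical.

```python
def rescale(values, higher_is_better=True):
    """Min-max rescale to 0..1 so that 1 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # every option scores the same
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def composite_scores(table, weights, higher_is_better):
    """table maps criterion -> list of raw values (one per option).
    Returns the weighted average of the rescaled values for each option."""
    n_options = len(next(iter(table.values())))
    total_weight = sum(weights.values())
    scores = [0.0] * n_options
    for criterion, values in table.items():
        for i, s in enumerate(rescale(values, higher_is_better[criterion])):
            scores[i] += weights[criterion] * s / total_weight
    return scores


# Three hypothetical options scored on different kinds of scales
table = {
    "rent_baht": [8000, 6500, 9500],   # money value (lower is better)
    "rating": [4, 3, 5],               # 5-point Likert rating (higher is better)
    "distance_km": [2.0, 5.0, 1.0],    # measurement (lower is better)
}
weights = {"rent_baht": 0.5, "rating": 0.3, "distance_km": 0.2}
higher_is_better = {"rent_baht": False, "rating": True, "distance_km": False}

print([round(s, 2) for s in composite_scores(table, weights, higher_is_better)])
```

With these weights the first option scores highest; raising the weight on the rating far enough would favour the third option instead, which is how weights change the emphasis of a decision.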

2.3 Common Scales Used for Modeling

2.4 Descriptive statistics

  • summarizes and organizes characteristics of a data set

  • provides insight into the distribution, central tendency, and variability of the data

  • uses tables or graphs to visualize the data

  • Additional details:

    • https://www.scribbr.com/statistics/descriptive-statistics/

    • https://www.investopedia.com/terms/d/descriptive_statistics.asp

2.5 Dataset

  • A collection of data that:
    • is a random sample
    • accurately represents the population being studied

Mercury content of fish from Florida rivers and lakes (mg/kg)

\[\begin{matrix} 1.230 &1.330 &0.040 &0.044 &1.200 &0.270 \\ 0.490 &0.190 &0.830 &0.810 &0.710 &0.500 \\ 0.490 &1.160 &0.050 &0.150 &0.190 &0.770 \\ 1.080 &0.980 &0.630 &0.560 &0.410 &0.730 \\ 0.590 &0.340 &0.750 &0.870 &0.560 &0.170 \\ 0.180 &0.190 &0.040 &0.490 &1.100 &0.160 \\ 0.100 &0.210 &0.860 &0.520 &0.650 &0.270 \\ 0.940 &0.400 &0.430 &0.250 &0.270 &\\ \end{matrix}\]
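A minimal sketch of the central-tendency measures using Python's standard `statistics` module, with the 47 values copied from the table above (`multimode` needs Python 3.8 or later):

```python
import statistics

# Mercury content of fish from Florida rivers and lakes (mg/kg), copied from the table
mercury = [
    1.230, 1.330, 0.040, 0.044, 1.200, 0.270,
    0.490, 0.190, 0.830, 0.810, 0.710, 0.500,
    0.490, 1.160, 0.050, 0.150, 0.190, 0.770,
    1.080, 0.980, 0.630, 0.560, 0.410, 0.730,
    0.590, 0.340, 0.750, 0.870, 0.560, 0.170,
    0.180, 0.190, 0.040, 0.490, 1.100, 0.160,
    0.100, 0.210, 0.860, 0.520, 0.650, 0.270,
    0.940, 0.400, 0.430, 0.250, 0.270,
]

n = len(mercury)                       # number of fish sampled
mean = statistics.mean(mercury)        # arithmetic average
median = statistics.median(mercury)    # middle value of the sorted data
modes = statistics.multimode(mercury)  # every value tied for most frequent

print(f"n = {n}, mean = {mean:.3f} mg/kg, median = {median:.3f} mg/kg")
print("modes:", sorted(modes))
```

The mean (about 0.54 mg/kg) sits above the median (0.49 mg/kg), a hint of slight right skew, and three values (0.19, 0.27 and 0.49 mg/kg) tie as modes.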

2.6 Frequency distribution:

  • summarizes the frequency of each value or category of a variable.
  • Works for categorical (qualitative) and numerical (quantitative) variables.
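For a numerical variable such as the mercury data, a frequency distribution counts how many values fall into each interval (bin). A minimal sketch, assuming bins 0.25 mg/kg wide; the bin width and the text-bar display are illustrative choices:

```python
import math
from collections import Counter

# Mercury content of fish (mg/kg), copied from the dataset table
mercury = [
    1.230, 1.330, 0.040, 0.044, 1.200, 0.270,
    0.490, 0.190, 0.830, 0.810, 0.710, 0.500,
    0.490, 1.160, 0.050, 0.150, 0.190, 0.770,
    1.080, 0.980, 0.630, 0.560, 0.410, 0.730,
    0.590, 0.340, 0.750, 0.870, 0.560, 0.170,
    0.180, 0.190, 0.040, 0.490, 1.100, 0.160,
    0.100, 0.210, 0.860, 0.520, 0.650, 0.270,
    0.940, 0.400, 0.430, 0.250, 0.270,
]


def frequency_table(data, width):
    """Return (lower, upper, count) for each bin [lower, upper) of the given width."""
    counts = Counter(math.floor(x / width) for x in data)
    # Include empty bins between the smallest and largest occupied bin
    return [(k * width, (k + 1) * width, counts[k])
            for k in range(min(counts), max(counts) + 1)]


# Print each bin with a text bar, a rough histogram
for lower, upper, count in frequency_table(mercury, 0.25):
    print(f"{lower:.2f}-{upper:.2f} mg/kg | {'#' * count} ({count})")
```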

2.7 Visualization

Bar chart

Pie chart

Histogram

2.8 Visualization

Box Plot

Violin Plot

2.9 Basic description

Quantitative Stats

  • Mean, Median, Mode
  • Distribution

Qualitative Stats

  • Distribution / clusters
  • Tabulation

\[\small\begin{array}{lccc} \text{Factors} & \text{Heavy smoker} & \text{Light smoker} & \text{Non-smoker} \\ \hline \text{Cancer} & 20 & 9 & 5 \\ \text{Cancer-free} & 40 & 30 & 60 \end{array}\]

2.10 Basic description

2.11 Measures of variability:

  • Range: minimum to maximum

  • Quartile values (of the sorted data): \[x[n/4],\ x[n/2],\ x[3n/4]\]

  • Standard deviation: \[\sigma = \sqrt{\frac{\sum (x_i - \bar x)^2}{n - 1}}\]

  • Standard error of the mean: \[SE = \sigma/\sqrt{n}\]
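The measures of variability above can be computed for the mercury data as follows. The quartiles use the simple sorted-index rule from the slide, read with 0-based indexing and integer division (an assumption); statistical packages usually interpolate between neighbouring values, so their quartiles can differ slightly.

```python
import math
import statistics

# Mercury content of fish (mg/kg), copied from the dataset table
mercury = [
    1.230, 1.330, 0.040, 0.044, 1.200, 0.270,
    0.490, 0.190, 0.830, 0.810, 0.710, 0.500,
    0.490, 1.160, 0.050, 0.150, 0.190, 0.770,
    1.080, 0.980, 0.630, 0.560, 0.410, 0.730,
    0.590, 0.340, 0.750, 0.870, 0.560, 0.170,
    0.180, 0.190, 0.040, 0.490, 1.100, 0.160,
    0.100, 0.210, 0.860, 0.520, 0.650, 0.270,
    0.940, 0.400, 0.430, 0.250, 0.270,
]

data = sorted(mercury)
n = len(data)

low, high = data[0], data[-1]                              # range: minimum to maximum
q1, q2, q3 = data[n // 4], data[n // 2], data[3 * n // 4]  # index-rule quartiles (0-based)
sd = statistics.stdev(data)                                # sample SD, divides by n - 1
se = sd / math.sqrt(n)                                     # standard error of the mean

print(f"range {low:.3f} to {high:.3f} mg/kg")
print(f"quartiles {q1:.3f}, {q2:.3f}, {q3:.3f} mg/kg")
print(f"sd = {sd:.3f} mg/kg, SE = {se:.4f} mg/kg")
```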

2.12 Comparisons of subpopulations

Quantitative

  • t-test:

\[t = \frac{diff}{SE}\]

Qualitative

  • Chi-square test:

\[\chi^2 = \sum\frac{(obs - exp)^2}{exp}\]
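As a worked example, the chi-square statistic can be computed for the smoking and cancer table in section 2.9, taking each expected count as row total × column total / grand total (the usual test of independence). The 5.991 cutoff is the tabled chi-square value for 2 degrees of freedom at the 5% level.

```python
# Observed counts from the smoking/cancer table
# rows: cancer, cancer-free; columns: heavy smoker, light smoker, non-smoker
observed = [
    [20, 9, 5],
    [40, 30, 60],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand_total  # expected count if independent
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5_PERCENT_DF2 = 5.991  # chi-square table value, df = 2, alpha = 0.05

print(f"chi2 = {chi2:.2f}, df = {df}")
print("significant at 5%" if chi2 > CRITICAL_5_PERCENT_DF2 else "not significant at 5%")
```

Here χ² is about 12.65 on 2 degrees of freedom, well above 5.991, so the sketch would conclude that cancer status and smoking category are associated (an association, not proof of causation).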

3 t-test

\[\displaystyle{t=\frac{difference}{standard\ error}}\]

3.1 One-sample t-test

\[{\displaystyle t={\frac {{\bar {x}}-\mu _{0}}{ \frac{s}{\sqrt {n}}}} = \frac{\Delta x}{\frac{\sigma}{\sqrt{n}}}}\]

\[df = n - 1\]
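A sketch of the one-sample t statistic for the mercury data. The reference value μ₀ = 0.5 mg/kg is a hypothetical benchmark chosen only for illustration; the resulting t is compared with a t table at df = n − 1.

```python
import math
import statistics

# Mercury content of fish (mg/kg), copied from the dataset table
mercury = [
    1.230, 1.330, 0.040, 0.044, 1.200, 0.270,
    0.490, 0.190, 0.830, 0.810, 0.710, 0.500,
    0.490, 1.160, 0.050, 0.150, 0.190, 0.770,
    1.080, 0.980, 0.630, 0.560, 0.410, 0.730,
    0.590, 0.340, 0.750, 0.870, 0.560, 0.170,
    0.180, 0.190, 0.040, 0.490, 1.100, 0.160,
    0.100, 0.210, 0.860, 0.520, 0.650, 0.270,
    0.940, 0.400, 0.430, 0.250, 0.270,
]

MU_0 = 0.5  # hypothetical reference mean (mg/kg), for illustration only

n = len(mercury)
x_bar = statistics.mean(mercury)
s = statistics.stdev(mercury)            # sample standard deviation
t = (x_bar - MU_0) / (s / math.sqrt(n))  # difference / standard error
df = n - 1

print(f"t = {t:.3f} with df = {df}")
```

With these data t comes out at roughly 0.68 on 46 degrees of freedom, far below the two-sided 5% cutoff of about 2.01, so the sample mean would not be judged significantly different from 0.5 mg/kg.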

3.2 Two-sample t-test: equal sample sizes and equal variances

\[{\displaystyle t={\frac {{\bar {X}}_{1}-{\bar {X}}_{2}}{s_{p}{\sqrt {\frac {2}{n}}}}}}={\frac {{\bar {X}}_{1}-{\bar {X}}_{2}}{\sqrt {\frac {s_{1}^{2}+s_{2}^{2}}{n}}}}\]

\[{\displaystyle s_{p}={\sqrt {\frac {s_{1}^{2}+s_{2}^{2}}{2}}}}\] \[df= 2n - 2\]

3.3 Two-sample t-test: equal variances, any sample sizes

\[t = \frac{\bar x_1 - \bar x_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\]

\[s_p = \sqrt{\frac{(n_1-1)s_1^2 +(n_2-1)s_2^2}{n_1+n_2-2}}\]

\[df = n_1 + n_2 -2\]
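The pooled-variance formulas above translate directly into code. Both groups below are hypothetical measurements, and `pooled_t` is an illustrative helper name.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled s_p)."""
    n1, n2 = len(a), len(b)
    s1_sq, s2_sq = statistics.variance(a), statistics.variance(b)  # sample variances
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t = (statistics.mean(a) - statistics.mean(b)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


group_a = [12, 14, 11, 13, 15]       # hypothetical data, mean 13
group_b = [10, 9, 12, 11, 8, 10]     # hypothetical data, mean 10
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")
```

For these made-up groups t is about 3.32 on 9 degrees of freedom. When both groups have the same size n, the denominator reduces to the equal-size form in section 3.2.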

3.4 Welch’s t-test (unequal variances)

\[t = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

\[{\displaystyle \mathrm {d.f.} ={\frac {\left({\frac {s_{1}^{2}}{n_{1}}}+{\frac {s_{2}^{2}}{n_{2}}}\right)^{2}}{{\frac {\left(s_{1}^{2}/n_{1}\right)^{2}}{n_{1}-1}}+{\frac {\left(s_{2}^{2}/n_{2}\right)^{2}}{n_{2}-1}}}}} \]
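A sketch of Welch's statistic and its degrees of freedom, applied to the same kind of hypothetical groups; `welch_t` is an illustrative helper name.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    v1 = statistics.variance(a) / len(a)   # s1^2 / n1
    v2 = statistics.variance(b) / len(b)   # s2^2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df


group_a = [12, 14, 11, 13, 15]       # hypothetical data
group_b = [10, 9, 12, 11, 8, 10]     # hypothetical data
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

Welch's degrees of freedom are usually fractional (about 8.2 here, against 9 for the pooled test); reading the t table at the next lower whole number is a common conservative choice.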

4 Minimum sample size

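\[\displaystyle \left[t = \frac{\Delta x}{\frac{\sigma}{\sqrt{n}}}\right] \quad \longrightarrow\quad \left[n = \left(\frac{t \sigma}{\Delta x}\right)^2\right]\]

Replacing Δx with the acceptable margin of error M gives the minimum sample size, rounded up to the next whole number:

\[n = \left(\frac{t\sigma}{M}\right)^2\]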
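A minimal sketch of the sample-size formula. It assumes t ≈ 1.96, the large-sample value for 95% confidence; strictly, t depends on df = n − 1, so for small samples this slightly underestimates n. The σ = 15 and M = 5 inputs are hypothetical.

```python
import math


def min_sample_size(sigma, margin, t=1.96):
    """Smallest whole n with n >= (t * sigma / margin)^2.

    t = 1.96 is the large-sample (normal) value for 95% confidence.
    Always round up, since a fractional sample size is not possible.
    """
    return math.ceil((t * sigma / margin) ** 2)


print(min_sample_size(sigma=15, margin=5))   # 35 (hypothetical sigma and margin)
```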

5 Statistical issues

Correlation does not imply causation

Type I and Type II error

5.1 Decision making:

  • Comparison of values on a shared standard
  • Statistically significant differences
  • Weighted scores to change emphasis
  • Backpack algorithm

5.2 Shortcomings of datasets

  • Biased data: usually due to sampling methods
  • Biased weighting: choosing weights to force an outcome
  • Shifting values over time: grading systems are not standardized or permanent
  • Incomplete data about characteristics: missing key aspects of the problem
  • Incomplete understanding of the relationships between factors: factors are not independent and influence each other

5.3 Decision mechanisms:

  • Letting others decide: often a good place to start in an unfamiliar field
  • Micromanaged, structured decision tree: allows for successful completion of a difficult process
  • Random search: helps to discover new solutions
  • Goal seeking: each decision moves the process toward an objective
  • Information-based decision making: iterative development process trying new things and evaluating the outcomes

5.4 Usefulness of statistics

6 Correlation vs Regression

  • Correlation coefficient (R value)
  • Goodness of fit
  • Types of relationships
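The correlation coefficient R measures how closely paired data follow a straight line, and R² is often quoted as a goodness-of-fit measure. A sketch with hypothetical data (Python 3.10 and later also provide `statistics.correlation` for the same quantity):

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"R = {r:.3f}, R^2 = {r * r:.3f}")
```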

6.1 Correlation

7 Gas prices in Thailand

8 Linear Regression


Call:
lm(formula = evo95 ~ wks)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.88969 -0.35508 -0.08585  0.39108  1.22185 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 31.45700    0.21288  147.77   <2e-16 ***
wks          0.35577    0.01432   24.84   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5163 on 23 degrees of freedom
Multiple R-squared:  0.9641,    Adjusted R-squared:  0.9625 
F-statistic: 617.3 on 1 and 23 DF,  p-value: < 2.2e-16
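The `lm(formula = evo95 ~ wks)` output above comes from an ordinary least-squares line fit in R. The underlying evo95 and wks values are not listed here, so this Python sketch reproduces only the method (intercept, slope, R-squared, residual standard error) on hypothetical points.

```python
import math


def least_squares(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)       # residual sum of squares
    sst = sum((yi - my) ** 2 for yi in y)      # total sum of squares
    r_squared = 1 - sse / sst
    rse = math.sqrt(sse / (n - 2))             # residual standard error, n - 2 df
    return intercept, slope, r_squared, rse


weeks = [0, 1, 2, 3, 4]        # hypothetical predictor
price = [1, 3, 4, 6, 6]        # hypothetical response
intercept, slope, r2, rse = least_squares(weeks, price)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, R^2 = {r2:.3f}, RSE = {rse:.3f}")
```

In R's summary the same quantities appear as the Estimate column, Multiple R-squared, and Residual standard error, reported on n − 2 degrees of freedom (23 in the output above).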

9 Normal Population: x = 50 +/- 15

10 Uniform Population from 0 to 100

11 Pareto Distribution
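Sections 9 to 11 compare population shapes. The sketch below draws large random samples from a normal population (reading "50 +/- 15" as mean 50 and standard deviation 15, an assumption), a uniform population on 0 to 100, and a Pareto population with a hypothetical shape parameter α = 3, then compares mean and median. In the skewed Pareto sample the long right tail pulls the mean above the median.

```python
import random
import statistics

rng = random.Random(193)   # fixed seed so runs are repeatable
N = 10_000                 # sample size drawn from each population

samples = {
    "normal(50, 15)": [rng.gauss(50, 15) for _ in range(N)],
    "uniform(0, 100)": [rng.uniform(0, 100) for _ in range(N)],
    "pareto(alpha=3)": [rng.paretovariate(3) for _ in range(N)],  # values >= 1
}

for name, values in samples.items():
    print(f"{name:16} mean = {statistics.mean(values):7.3f}  "
          f"median = {statistics.median(values):7.3f}")
```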

11.1 Exam

  • GE193: 12 May 2023: 13:00-16:00, PC401
