GE193 - Course Introduction

Spreadsheets and Data collection

Dr Robert Batzinger
Instructor Emeritus

1/12/23

0.1 GE193 Modules

  1. Course orientation
  2. Practical Mathematics for tidying data
  3. Statistical analysis for decision making
  4. Logical Data Analysis
  5. Midterm
  6. Probability for making Decision
  7. Decision making
  8. Final exam

1 Normal Distribution, n = 10000

\[x = 20 \pm 4\] \[\small\begin{matrix} std\ dev & difference & count \\ 1 & 0\leq x< 4 & 6757 \\ 2 & 4\leq x< 8 & 2739 \\ 3 & 8\leq x< 12 & 467 \\ 4 & 12\leq x< 16 & 37 \\ 5+ & 16\leq x & 0 \\ \end{matrix}\]

1.1 Normal Distribution, n=100

\[x = 20 \pm 4\] \[\small\begin{matrix} std\ dev & difference & count \\ 1 & 0\leq x< 4 & 73 \\ 2 & 4\leq x< 8 & 23 \\ 3 & 8\leq x< 12 & 4 \\ 4 & 12\leq x< 16 & 0 \\ 5+ & 16\leq x & 0 \\ \end{matrix}\]

2 Precision vs accurancy

\[20 \pm 8\]

\[25\pm 2\]

3 Measurements

3.1

3.2

3.3

3.4 Data measurements

3.5 Linear data

3.6 Mean

\[\eqalign{ \bar x &=& \sum_{i=1}^n \ x_i\\ data &=& [47, 40, 47, 59, 40] \\ sum &=& 233 \\ mean &=& \frac{233}{5} = 46.6\\ }\]

3.7 Standard Deviation

\[\eqalign{ std\ dev &=& \sqrt{\sum_{i=1}^n \frac{(x_i - \bar x)^2}{n-1}}=\sqrt{\frac{\displaystyle\sum_{i=1}^n x^2_i\ - n \bar x^2}{n-1}}\\ }\]

3.8 Standard Deviation

$$

\[\begin{matrix} (47 -46.6)^2 & 0.4^2 & 0.16\\ (40-46.6)^2 & 6.6^2 & 43.56\\ (47 -46.6)^2 & 0.4^2 & 0.16\\ (59 -46.6)^2 & 12.4^2 & 153.76\\ (40 -46.6)^2 & 6.6^2 & \underline{43.56}\\ & & 241.20 &\rightarrow & \sqrt{\frac{241.20}{4}} &=& 7.765\\ \end{matrix}\]

$$

4 Alternative method

\[\begin{matrix} 47 & 47^2 & 2209\\ 40 & 40^2 & 1600\\ 47 & 47^2 & 2209\\ 59 & 59^2 & 3481 \\ \underline{40} & 40^2 & \underline{1600}\\ 233 & & 11099 & 11099.0\\ 46.6& 46.6^2& 5(2171.56) & 10857.8 &\sqrt{\frac{241.2}{4}} &=& 7.765\\ \end{matrix}\]

5 Spreadsheet

https://www.libreoffice.org

6 Spreadsheet anatomy

LibreOffice

6.1 Comparing distribution types

6.2 Population size

6.3 Data range

6.4 Population size

6.5 Change in Variation

6.6 Functions of importance

  • AVERAGE(), AVERAGEIF(), AVERAGEIFS()
  • STDDEV
  • COUNT(), COUNTIF(), COUNTIFS()
  • VLOOKUP(), HLOOKUP()

7 Depression among Medical Students

Carrard, Valerie, Bourquin, Céline, Berney, Sylvie, Schlegel, Katja, Gaume, Jacques, Bart, Pierre-Alexandre, Preisig, Martin, Schmid Mast, Marianne, & Berney, Alexandre. (2022). Dataset for the paper “The relationship between medical students’ empathy, mental health, and burnout: A cross-sectional study” published in Medical Teacher (2022) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5702895

7.1 Data

'data.frame':   886 obs. of  20 variables:
 $ id       : int  2 4 9 10 13 14 17 21 23 24 ...
 $ age      : int  18 26 21 21 21 26 23 23 23 22 ...
 $ year     : int  1 4 3 2 3 5 5 4 4 2 ...
 $ sex      : int  1 1 2 2 1 2 2 1 2 2 ...
 $ glang    : int  120 1 1 1 1 1 1 1 1 1 ...
 $ part     : int  1 1 0 0 1 1 1 1 1 1 ...
 $ job      : int  0 0 0 1 0 1 0 1 1 0 ...
 $ stud_h   : int  56 20 36 51 22 10 15 8 20 20 ...
 $ health   : int  3 4 3 5 4 2 3 4 2 5 ...
 $ psyt     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ jspe     : int  88 109 106 101 102 102 117 118 118 108 ...
 $ qcae_cog : int  62 55 64 52 58 48 58 65 69 56 ...
 $ qcae_aff : int  27 37 39 33 28 37 38 40 46 36 ...
 $ amsp     : int  17 22 17 18 21 17 23 32 23 22 ...
 $ erec_mean: num  0.738 0.69 0.69 0.833 0.69 ...
 $ cesd     : int  34 7 25 17 14 14 45 6 43 11 ...
 $ stai_t   : int  61 33 73 48 46 56 56 36 43 43 ...
 $ mbi_ex   : int  17 14 24 16 22 18 28 11 26 18 ...
 $ mbi_cy   : int  13 11 7 10 14 15 17 10 21 6 ...
 $ mbi_ea   : int  20 26 23 21 23 18 16 27 22 23 ...

7.2 Comparison of psychological tests


Call:
glm(formula = cesd ~ mbi_ex + mbi_cy + mbi_ea, data = dta)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-29.828   -5.827   -1.205    5.135   34.262  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.97903    2.83422   3.521 0.000452 ***
mbi_ex       1.06435    0.06846  15.547  < 2e-16 ***
mbi_cy       0.13763    0.08333   1.652 0.098968 .  
mbi_ea      -0.46595    0.08129  -5.732 1.36e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 78.93939)

    Null deviance: 116609  on 885  degrees of freedom
Residual deviance:  69625  on 882  degrees of freedom
AIC: 6391

Number of Fisher Scoring iterations: 2

8 Data Table

age year sex glang part job stud_h health psyt jspe qcae_cog qcae_aff amsp erec_mean cesd stai_t mbi_ex mbi_cy mbi_ea
age 1.000 0.593 -0.073 0.030 0.187 0.226 -0.294 -0.030 0.015 0.223 0.059 -0.008 0.000 -0.019 -0.139 -0.082 -0.183 -0.002 0.046
year 0.593 1.000 -0.057 -0.101 0.147 0.217 -0.521 0.084 0.001 0.301 0.077 0.061 -0.014 0.080 -0.242 -0.165 -0.244 0.077 -0.010
sex -0.073 -0.057 1.000 0.006 0.000 0.021 -0.014 -0.088 0.158 0.126 0.116 0.359 -0.130 0.211 0.229 0.251 0.156 0.019 -0.044
glang 0.030 -0.101 0.006 1.000 -0.050 -0.051 0.063 -0.014 -0.043 -0.095 -0.071 -0.088 -0.044 -0.099 0.098 0.092 0.038 0.037 -0.002
part 0.187 0.147 0.000 -0.050 1.000 0.048 -0.104 0.077 0.027 0.122 0.040 0.010 0.062 0.030 -0.106 -0.072 -0.012 -0.019 0.048
job 0.226 0.217 0.021 -0.051 0.048 1.000 -0.202 -0.023 0.060 0.078 0.025 0.000 0.028 0.015 -0.060 -0.063 -0.066 0.010 -0.018
stud_h -0.294 -0.521 -0.014 0.063 -0.104 -0.202 1.000 -0.074 -0.033 -0.135 -0.023 -0.032 -0.002 -0.056 0.174 0.152 0.186 -0.087 0.102
health -0.030 0.084 -0.088 -0.014 0.077 -0.023 -0.074 1.000 -0.137 -0.004 -0.027 -0.063 0.027 0.024 -0.358 -0.305 -0.286 -0.189 0.224
psyt 0.015 0.001 0.158 -0.043 0.027 0.060 -0.033 -0.137 1.000 0.048 0.046 0.123 -0.073 0.003 0.268 0.293 0.177 0.146 -0.163
jspe 0.223 0.301 0.126 -0.095 0.122 0.078 -0.135 -0.004 0.048 1.000 0.343 0.263 0.099 0.097 -0.080 -0.075 -0.041 -0.007 0.083
qcae_cog 0.059 0.077 0.116 -0.071 0.040 0.025 -0.023 -0.027 0.046 0.343 1.000 0.259 0.387 0.074 -0.034 -0.078 -0.024 -0.025 0.184
qcae_aff -0.008 0.061 0.359 -0.088 0.010 0.000 -0.032 -0.063 0.123 0.263 0.259 1.000 -0.071 0.141 0.251 0.331 0.216 0.128 -0.114
amsp 0.000 -0.014 -0.130 -0.044 0.062 0.028 -0.002 0.027 -0.073 0.099 0.387 -0.071 1.000 0.003 -0.152 -0.249 -0.073 -0.029 0.221
erec_mean -0.019 0.080 0.211 -0.099 0.030 0.015 -0.056 0.024 0.003 0.097 0.074 0.141 0.003 1.000 0.030 0.038 0.015 0.062 -0.035
cesd -0.139 -0.242 0.229 0.098 -0.106 -0.060 0.174 -0.358 0.268 -0.080 -0.034 0.251 -0.152 0.030 1.000 0.716 0.606 0.408 -0.454
stai_t -0.082 -0.165 0.251 0.092 -0.072 -0.063 0.152 -0.305 0.293 -0.075 -0.078 0.331 -0.249 0.038 0.716 1.000 0.530 0.332 -0.463
mbi_ex -0.183 -0.244 0.156 0.038 -0.012 -0.066 0.186 -0.286 0.177 -0.041 -0.024 0.216 -0.073 0.015 0.606 0.530 1.000 0.505 -0.481
mbi_cy -0.002 0.077 0.019 0.037 -0.019 0.010 -0.087 -0.189 0.146 -0.007 -0.025 0.128 -0.029 0.062 0.408 0.332 0.505 1.000 -0.566
mbi_ea 0.046 -0.010 -0.044 -0.002 0.048 -0.018 0.102 0.224 -0.163 0.083 0.184 -0.114 0.221 -0.035 -0.454 -0.463 -0.481 -0.566 1.000

8.1 Example of a linear application

Manufacturing one salami (x) requires 12 oz of beef and 4 oz of pork. Manufacturing one bologna (y) requires 10 oz of beef and 3 oz of pork. There are 200 kg of beef and 50 kg of pork available. Beef sells at 180 bht/kg and pork at 150 bht/kg. Hot dogs are sold at 110 bht per package and Bologna at 50 bht per package. Determine the optimal units of hot dogs and bologna for maximum profit.

8.2 Line

\[\eqalign{ x& y& mx& y2 &dy^2& var & value \\ \hline 1& 8& 2.89999& 7.19999& 0.6400 & m& 2.90 \\ 2& 10& 5.79999& 10.09999& 0.0100& b &4.30 \\ 3& 12& 8.69999& 12.99999& 1.0000& &&\\ 4& 15& 11.59999& 15.89999& 0.8100& &&\\ 5& 20& 14.49999& 18.79999& 1.4400& SumErr& 3.9\\ }\]

8.3 Line

Residual standard error: 0.9661 on 3 degrees of freedom
Multiple R-squared:  0.9346,    Adjusted R-squared:  0.9128 
F-statistic: 42.86 on 1 and 3 DF,  p-value: 0.007246

8.4 Line

Deviance Residuals: 
  1     2     3     4     5  
-0.2   0.8  -0.2  -1.2   0.8  

Coefficients:
        Estimate Std. Error t value Pr(>|t|)   
(Intercept)   5.2000     1.0132   5.132  0.01433 * 
x             2.0000     0.3055   6.547  0.00725 **  

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 
                0.05 ‘.’ 0.1 ‘ ’ 1

8.5 Solver

8.6 Linear problem

\[\tiny\begin{matrix} Criteria & & Hot\ dogs & Bologna &Totals & Relat & Limit & Notes\\ \hline Amt & used& kg/unit & kg/unit & kg & & kg & Amt\ left \\ Beef & & 0.4 & 0.3 & 200 & <= & \color{green}{200} & 0 \\ Pork & & 0.1 & 0.2 & 50 & <= & \color{green}{100} & 50 \\ \hline Units& made & \color{yellow}{500} & \color{yellow}{0} & & & & & \\ \hline Costs: & bht/kg & & & & & & & \\ Beef & 180 & 72.00 & 54.00 & 36000.00 & & & \\ Pork & 150 & 15.00 & 0.85 & 7500.00 & & & & \\ \hline Price & bht/piece & 110 & 50 & 55000.00 & & & \\ \hline Profit & & & & \color{red}{11500.00} & & & \\ \hline \end{matrix}\]

8.7 Mental Health -1

  • id: Participants ID number (string)
  • age: age at questionnaire 20-21 (numeric)
  • year: CURICULUM YEAR : In which curriculum year are you? (1=Bmed1, 2=Bmed2, 3=Bmed3, 4=Mmed1, 5=Mmed2, 6=Mmed3)
  • sex: GENDER : To which gender do you identify the most ? (1=Man, 2=Woman, 3=Non-binary)

8.8 Mental Health - 2

  • glang: MOTHER TONGUE: What is your mother tongue? (1=French, 15=German, 20=English, 37=Arab, 51=Basque, 52=Bulgarian, 53=Catalan, 54=Chinese, 59=Korean, 60=Croatian, 62=Danish, 63=Spanish, 82=Estonian, 83=Finnish, 84=Galician, 85=Greek, 86=Hebrew, 87=Hindi, 88=Hungarian, 89=Indonesian, 90=Italian, 92=Japanese, 93=Kazakh, 94=Latvian, 95=Lithuanian, 96=Malay, 98=Dutch, 100=Norwegian, 101=Polish, 102=Portuguese, 104=Romanian, 106=Russian, 108=Serbian, 112=Slovak, 113=Slovenian, 114=Swedish, 116=Czech, 117=Thai, 118=Turkish, 119=Ukrainian, 120=Vietnamese, 121=Other)

8.9 Mental Health - 3

  • part: PARTNERSHIP STATUS : Do you have a partner? (0=No, 1=Yes)
  • job: HAVING A JOB : Do you have a paid job? (0=No, 1=Yes)
  • stud_h: HOURS OF STUDY PER WEEK : On average how many hours per week do you study on top of courses?)
  • health: SATISFACTION WITH HEALTH : How satisfied are you with your health? (1=Verydissatisfied, 2=Dissatisfied, 3=Neithersatisfiednordissatisfied, 4=Satisfied, 5=Verysatisfied)

8.10 Mental Health - 4

  • psyt: PSYCHOTHERAPY LAST YEAR : During the last 12 months have you ever consulted a psychotherapist or a psychiatrist for your health? (0=No, 1=Yes)
  • jspe: JSPE total empathy score (numeric)
  • qcae_cog: QCAE Cognitive empathy score (numeric)
  • qcae_aff: QCAE Affective empathy score (numeric)
  • amsp: AMSP total score (numeric)

8.11 Mental Health - 5

  • erec_mean: GERT : mean value of correct responses (numeric)
  • cesd: CES-D total score (numeric)
  • stai_t: STAI score (numeric)
  • mbi_ex: MBI Emotional Exhaustion (numeric)
  • mbi_cy: MBI Cynicism (numeric)
  • mbi_ea: MBI Academic Efficacy (numeric)

8.12 Data categories

Demographic info Personal Situation Mental Health factors Empathy Mental Stability
id, age, year, sex, glang part, job, stud_h, health, psyt jspe, qcae_cog, qcae_aff, amsp, erec_mean GERT, cesd, stai_t mbi_ex, mbi_cy, mbi_ea