dr.sc. Luka Šikić
2 March 2020
# Load the package
library(lsr)
# Load the data into the workspace
load("./PODATCI/aflsmall.Rdata")
who() # Inspect the loaded data
## -- Name -- -- Class -- -- Size --
## afl.finalists factor 400
## afl.margins numeric 176
## [1] 56 31 56 8 32 14 36 56 19 1 3
## [1] Hawthorn Melbourne Carlton Melbourne Hawthorn
## 17 Levels: Adelaide Brisbane Carlton Collingwood Essendon Fitzroy ... Western Bulldogs
Histogram of winning margins (afl.margins) from the 2010 season of the Australian Football League (AFL).
Shorthand notation \[ \bar{X} = \frac{1}{N} \sum_{i=1}^N X_i \]
Calculator
## [1] 36.6
## [1] 36.6
## [1] 30.5
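The mean formula can be checked by hand in R. The sketch below uses only the first five winning margins printed earlier (56, 31, 56, 8, 32); the variable names `x`, `mean_by_hand` and `mean_builtin` are illustrative and not part of the original slides.

```r
# First five winning margins, as printed earlier in the slides
x <- c(56, 31, 56, 8, 32)

# Mean from the definition: sum of the observations divided by N
mean_by_hand <- sum(x) / length(x)

# Built-in equivalent
mean_builtin <- mean(x)

print(mean_by_hand)
print(mean_builtin)
```

Both expressions should agree, which is the point of the shorthand notation: `mean()` is the same sum-and-divide calculation.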
# Define a vector of 10 numbers
vektor_10 <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)
mean(x = vektor_10) # Compute the arithmetic mean
## [1] 4.1
## [1] 5.5
## [1] 5.5
## [1] 33.75
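A minimal sketch of why the outlier matters, reusing `vektor_10` from above. It assumes the 5.5 values shown are the 10% trimmed mean and the median; the names `plain_mean`, `trimmed_mean` and `med` are illustrative.

```r
vektor_10 <- c(-15, 2, 3, 4, 5, 6, 7, 8, 9, 12)

# Ordinary mean: pulled down by the single extreme value -15
plain_mean <- mean(vektor_10)

# 10% trimmed mean: drops the smallest and the largest observation
# (floor(10 * 0.1) = 1 value from each end) before averaging
trimmed_mean <- mean(vektor_10, trim = 0.1)

# Median: the average of the two middle values, 5 and 6
med <- median(vektor_10)

print(c(mean = plain_mean, trimmed = trimmed_mean, median = med))
```

The trimmed mean and the median coincide here because removing one value from each tail leaves a symmetric set, 2 through 9.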
## afl.finalists
## Adelaide Brisbane Carlton Collingwood
## 26 25 26 28
## Essendon Fitzroy Fremantle Geelong
## 32 0 6 39
## Hawthorn Melbourne North Melbourne Port Adelaide
## 27 28 28 17
## Richmond St Kilda Sydney West Coast
## 6 24 26 38
## Western Bulldogs
## 24
## [1] "Geelong"
## [1] 39
## [1] 3
## [1] 8
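The mode and its frequency can also be read off a frequency table with base R alone. The sketch below re-enters the team counts from the table above as a named vector; `counts`, `mode_team` and `mode_freq` are illustrative names, and this is one possible approach rather than necessarily the function used on the slides.

```r
# Frequencies copied from the afl.finalists table above
counts <- c(
  Adelaide = 26, Brisbane = 25, Carlton = 26, Collingwood = 28,
  Essendon = 32, Fitzroy = 0, Fremantle = 6, Geelong = 39,
  Hawthorn = 27, Melbourne = 28, `North Melbourne` = 28, `Port Adelaide` = 17,
  Richmond = 6, `St Kilda` = 24, Sydney = 26, `West Coast` = 38,
  `Western Bulldogs` = 24
)

# Mode: the category with the highest frequency
mode_team <- names(counts)[which.max(counts)]

# Modal frequency: how often that category occurs
mode_freq <- max(counts)

print(mode_team)
print(mode_freq)
```

Note that `which.max()` returns only the first maximum, so ties would need separate handling.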
## [1] 116
## [1] 0
## [1] 0 116
## 50%
## 30.5
## 25% 75%
## 12.75 50.50
## [1] 37.75
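The range, quartiles and interquartile range can be sketched on a small sample. The code below uses only the eleven margins printed at the start of the slides, so its numbers differ from the full-data output above; `margins` and the other names are illustrative.

```r
# The eleven winning margins printed earlier (not the full data set)
margins <- c(56, 31, 56, 8, 32, 14, 36, 56, 19, 1, 3)

# Range: smallest and largest observation
rng <- range(margins)

# Quartiles (R's default quantile definition, type 7)
q <- quantile(margins, probs = c(0.25, 0.50, 0.75))

# Interquartile range: 75th percentile minus 25th percentile
iqr_by_hand <- unname(q["75%"] - q["25%"])
iqr_builtin <- IQR(margins)

print(rng)
print(q)
print(iqr_builtin)
```

`IQR()` is simply the difference between the two outer quartiles, so the hand calculation and the built-in agree.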
\[ \mbox{Var}(X) = \frac{1}{N} \sum_{i=1}^N \left( X_i - \bar{X} \right)^2 \]
\[\mbox{Var}(X) = \frac{\sum_{i=1}^N \left( X_i - \bar{X} \right)^2}{N}\]
| \(i\) | \(X_i\) | \(X_i - \bar{X}\) | \((X_i - \bar{X})^2\) |
|---|---|---|---|
| 1 | 56 | 19.4 | 376.36 |
| 2 | 31 | -5.6 | 31.36 |
| 3 | 56 | 19.4 | 376.36 |
| 4 | 8 | -28.6 | 817.96 |
| 5 | 32 | -4.6 | 21.16 |
## [1] 324.64
## [1] 675.9718
## [1] 679.8345
\[ s = \sqrt{ \frac{1}{N} \sum_{i=1}^N \left( X_i - \bar{X} \right)^2 } \]
\[ \hat\sigma = \sqrt{ \frac{1}{N-1} \sum_{i=1}^N \left( X_i - \bar{X} \right)^2 } \]
## [1] 26.07364
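The two divisors, N and N - 1, can be compared directly. This sketch reuses the five observations from the worked table above; the names `var_pop`, `var_sample`, `s` and `sigma_hat` are illustrative.

```r
# The five observations from the worked table
x <- c(56, 31, 56, 8, 32)
n <- length(x)

# Sum of squared deviations from the mean
ss <- sum((x - mean(x))^2)

# Variance with divisor N (the formula shown for Var(X))
var_pop <- ss / n

# Variance with divisor N - 1; this is what R's var() computes
var_sample <- ss / (n - 1)

# Corresponding standard deviations
s <- sqrt(var_pop)            # divisor N
sigma_hat <- sqrt(var_sample) # divisor N - 1, same as sd(x)

print(c(var_pop = var_pop, var_sample = var_sample))
print(c(s = s, sigma_hat = sigma_hat))
```

With only five observations the two versions differ noticeably; for the full data set of 176 games the gap is much smaller, as the two variance outputs above show.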
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 12.75 30.50 35.30 50.50 116.00
# Inspect a logical variable
ekstremi <- afl.margins > 50 # Create a logical variable
head(ekstremi, 5) # Look at the data
## [1] TRUE FALSE TRUE FALSE FALSE
## Mode FALSE TRUE
## logical 132 44
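Because `TRUE` counts as 1 and `FALSE` as 0, a logical vector can be summarised with `sum()` and `mean()`. A small sketch on the eleven margins printed earlier (not the full data set); `margins`, `big_win`, `n_big` and `share_big` are illustrative names.

```r
# The eleven winning margins printed earlier (not the full data set)
margins <- c(56, 31, 56, 8, 32, 14, 36, 56, 19, 1, 3)

# Logical variable: TRUE where the margin exceeds 50 points
big_win <- margins > 50

# Number of TRUE values
n_big <- sum(big_win)

# Proportion of TRUE values
share_big <- mean(big_win)

print(n_big)
print(share_big)
```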
## Adelaide Brisbane Carlton Collingwood
## 26 25 26 28
## Essendon Fitzroy Fremantle Geelong
## 32 0 6 39
## Hawthorn Melbourne North Melbourne Port Adelaide
## 27 28 28 17
## Richmond St Kilda Sydney West Coast
## 6 24 26 38
## Western Bulldogs
## 24
# Inspect a character variable
txt <- as.character(afl.finalists) # Create a character variable
summary(object = txt) # Descriptive statistics
## Length Class Mode
## 400 character character
rm(list = ls()) # Clear the workspace
load("./PODATCI/clinicaltrial.Rdata") # Load the data
who(TRUE) # Inspect the data
## -- Name -- -- Class -- -- Size --
## clin.trial data.frame 18 x 3
## $drug factor 18
## $therapy factor 18
## $mood.gain numeric 18
## drug therapy mood.gain
## 1 placebo no.therapy 0.5
## 2 placebo no.therapy 0.3
## 3 placebo no.therapy 0.1
## 4 anxifree no.therapy 0.6
## 5 anxifree no.therapy 0.4
## drug therapy mood.gain
## placebo :6 no.therapy:9 Min. :0.1000
## anxifree:6 CBT :9 1st Qu.:0.4250
## joyzepam:6 Median :0.8500
## Mean :0.8833
## 3rd Qu.:1.3000
## Max. :1.8000
# Descriptive statistics for a data frame
describe(clin.trial) # Alternative summary function; the output format matches Hmisc::describe(), so Hmisc is presumably attached
## clin.trial
##
## 3 Variables 18 Observations
## --------------------------------------------------------------------------------
## drug
## n missing distinct
## 18 0 3
##
## Value placebo anxifree joyzepam
## Frequency 6 6 6
## Proportion 0.333 0.333 0.333
## --------------------------------------------------------------------------------
## therapy
## n missing distinct
## 18 0 2
##
## Value no.therapy CBT
## Frequency 9 9
## Proportion 0.5 0.5
## --------------------------------------------------------------------------------
## mood.gain
## n missing distinct Info Mean Gmd .05 .10
## 18 0 17 0.996 0.8833 0.6281 0.185 0.270
## .25 .50 .75 .90 .95
## 0.425 0.850 1.300 1.490 1.715
##
## lowest : 0.1 0.2 0.3 0.3 0.4, highest: 1.3 1.4 1.4 1.7 1.8
##
## Value 0.1 0.2 0.3 0.4 0.5 0.6 0.8 0.9 1.1 1.2 1.3
## Frequency 1 1 2 1 1 2 1 1 1 1 2
## Proportion 0.056 0.056 0.111 0.056 0.056 0.111 0.056 0.056 0.056 0.056 0.111
##
## Value 1.4 1.7 1.8
## Frequency 2 1 1
## Proportion 0.111 0.056 0.056
## --------------------------------------------------------------------------------
# Summarise by therapy group
by(data = clin.trial, # Data source
  INDICES = clin.trial$therapy, # Grouping variable
  FUN = summary) # Function to apply
## clin.trial$therapy: no.therapy
## drug therapy mood.gain
## placebo :3 no.therapy:9 Min. :0.1000
## anxifree:3 CBT :0 1st Qu.:0.3000
## joyzepam:3 Median :0.5000
## Mean :0.7222
## 3rd Qu.:1.3000
## Max. :1.7000
## ------------------------------------------------------------
## clin.trial$therapy: CBT
## drug therapy mood.gain
## placebo :3 no.therapy:0 Min. :0.300
## anxifree:3 CBT :9 1st Qu.:0.800
## joyzepam:3 Median :1.100
## Mean :1.044
## 3rd Qu.:1.300
## Max. :1.800
# Mean mood gain grouped by drug and therapy
aggregate(mood.gain ~ drug + therapy, # Formula, passed positionally (named `formula` before R 4.2.0, `x` since)
  data = clin.trial, # Data
  FUN = mean) # Arithmetic mean
## drug therapy mood.gain
## 1 placebo no.therapy 0.300000
## 2 anxifree no.therapy 0.400000
## 3 joyzepam no.therapy 1.466667
## 4 placebo CBT 0.600000
## 5 anxifree CBT 1.033333
## 6 joyzepam CBT 1.500000
# Standard deviation of mood gain grouped by drug and therapy
aggregate(mood.gain ~ drug + therapy, # Formula
  clin.trial, # Data
  sd) # Standard deviation
## drug therapy mood.gain
## 1 placebo no.therapy 0.2000000
## 2 anxifree no.therapy 0.2000000
## 3 joyzepam no.therapy 0.2081666
## 4 placebo CBT 0.3000000
## 5 anxifree CBT 0.2081666
## 6 joyzepam CBT 0.2645751
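The grouped summaries above can be reproduced on a small scale without the data file. The sketch below re-enters only the first five rows of `clin.trial` shown earlier, so the anxifree group is incomplete and its mean differs from the full-data result; `first_rows` and `group_means` are illustrative names.

```r
# First five rows of clin.trial, as printed earlier (not the full data set)
first_rows <- data.frame(
  drug = c("placebo", "placebo", "placebo", "anxifree", "anxifree"),
  therapy = rep("no.therapy", 5),
  mood.gain = c(0.5, 0.3, 0.1, 0.6, 0.4)
)

# Mean mood gain per drug; the formula is passed positionally so the call
# does not depend on the name of the first argument
group_means <- aggregate(mood.gain ~ drug, data = first_rows, FUN = mean)

print(group_means)
```

The placebo mean already matches the full-data value for the no-therapy group, because all three placebo observations without therapy are among these rows.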
\[ \mbox{standard score} = \frac{\mbox{observed value} - \mbox{mean}}{\mbox{standard deviation}} \]
\[ z_i = \frac{X_i - \bar{X}}{\hat\sigma} \]
\[ z = \frac{35 - 17}{5} = 3.6 \]
## [1] 0.9998409
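The worked example can be reproduced in two lines. The sketch assumes, as the output above suggests, that the probability is the standard normal cumulative distribution evaluated at the z-score; `raw`, `avg`, `sdev`, `z` and `p_below` are illustrative names.

```r
# Values from the worked example: observed value 35, mean 17, standard deviation 5
raw <- 35
avg <- 17
sdev <- 5

# Standard score: distance from the mean in standard deviation units
z <- (raw - avg) / sdev

# Share of a standard normal distribution lying below z
p_below <- pnorm(z)

print(z)
print(p_below)
```

For a whole vector, `scale()` or `(x - mean(x)) / sd(x)` applies the same transformation to every observation.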
rm(list = ls()) # Clear the workspace
# Load the data
load("./PODATCI/parenthood.Rdata")
who(TRUE) # Inspect the data
## -- Name -- -- Class -- -- Size --
## parenthood data.frame 100 x 4
## $dan.sleep numeric 100
## $baby.sleep numeric 100
## $dan.grump numeric 100
## $day integer 100
## dan.sleep baby.sleep dan.grump day
## 1 7.59 10.18 56 1
## 2 7.91 11.66 60 2
## 3 5.14 7.92 82 3
## 4 7.71 9.61 55 4
## 5 6.68 9.75 67 5
## 6 5.99 5.04 72 6
## 7 8.19 10.45 53 7
## parenthood
##
## 4 Variables 100 Observations
## --------------------------------------------------------------------------------
## dan.sleep
## n missing distinct Info Mean Gmd .05 .10
## 100 0 90 1 6.965 1.164 5.138 5.434
## .25 .50 .75 .90 .95
## 6.292 7.030 7.740 8.172 8.473
##
## lowest : 4.84 4.86 4.91 4.98 5.09, highest: 8.47 8.52 8.66 8.72 9.00
## --------------------------------------------------------------------------------
## baby.sleep
## n missing distinct Info Mean Gmd .05 .10
## 100 0 88 1 8.049 2.381 4.698 5.591
## .25 .50 .75 .90 .95
## 6.425 7.950 9.635 11.083 11.612
##
## lowest : 3.25 3.46 4.17 4.18 4.66, highest: 11.66 11.68 11.75 11.78 12.07
## --------------------------------------------------------------------------------
## dan.grump
## n missing distinct Info Mean Gmd .05 .10
## 100 0 37 0.998 63.71 11.33 50.0 52.9
## .25 .50 .75 .90 .95
## 57.0 62.0 71.0 78.1 82.0
##
## lowest : 41 44 46 48 50, highest: 80 82 86 89 91
## --------------------------------------------------------------------------------
## day
## n missing distinct Info Mean Gmd .05 .10
## 100 0 100 1 50.5 33.67 5.95 10.90
## .25 .50 .75 .90 .95
## 25.75 50.50 75.25 90.10 95.05
##
## lowest : 1 2 3 4 5, highest: 96 97 98 99 100
## --------------------------------------------------------------------------------
Graphical display of the variables in the parenthood data set.
Scatterplot of the variables Hours of sleep (parent) and Mood.
Scatterplot of the variables Hours of sleep (child) and Mood.
Scatterplot of the variables Hours of sleep (child) and Hours of sleep (parent).
Correlations of differing direction and strength.
# Compute the correlation between sleep and mood
cor(x = parenthood$dan.sleep, y = parenthood$dan.grump)
## [1] -0.903384
## dan.sleep baby.sleep dan.grump day
## dan.sleep 1.00000000 0.62794934 -0.90338404 -0.09840768
## baby.sleep 0.62794934 1.00000000 -0.56596373 -0.01043394
## dan.grump -0.90338404 -0.56596373 1.00000000 0.07647926
## day -0.09840768 -0.01043394 0.07647926 1.00000000
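The Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks this against `cor()` using only the first seven rows of `parenthood` printed above, so the value differs from the full-data result; `sleep`, `grump`, `r_by_hand` and `r_builtin` are illustrative names.

```r
# First seven rows of parenthood, as printed earlier (not the full data set)
sleep <- c(7.59, 7.91, 5.14, 7.71, 6.68, 5.99, 8.19)
grump <- c(56, 60, 82, 55, 67, 72, 53)
n <- length(sleep)

# Covariance with divisor N - 1, written out from the definition
cov_xy <- sum((sleep - mean(sleep)) * (grump - mean(grump))) / (n - 1)

# Pearson correlation: covariance standardised by both standard deviations
r_by_hand <- cov_xy / (sd(sleep) * sd(grump))

# Built-in equivalent
r_builtin <- cor(sleep, grump)

print(r_by_hand)
print(r_builtin)
```

Dividing by the standard deviations removes the units, which is why the coefficient always lies between -1 and 1.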
| Correlation | Strength | Direction |
|---|---|---|
| -1.0 to -0.9 | Very strong | Negative |
| -0.9 to -0.7 | Strong | Negative |
| -0.7 to -0.4 | Moderate | Negative |
| -0.4 to -0.2 | Weak | Negative |
| -0.2 to 0 | Negligible | Negative |
| 0 to 0.2 | Negligible | Positive |
| 0.2 to 0.4 | Weak | Positive |
| 0.4 to 0.7 | Moderate | Positive |
| 0.7 to 0.9 | Strong | Positive |
| 0.9 to 1.0 | Very strong | Positive |
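The table can be turned into a small lookup helper. `describe_cor()` is a hypothetical function written for this illustration, not part of any package. The table does not say which band a boundary value such as 0.7 belongs to; the sketch assumes it falls into the lower band, and it labels an exact zero as "Positive".

```r
# Hypothetical helper: label correlations using the bands from the table
describe_cor <- function(r) {
  stopifnot(is.numeric(r), all(abs(r) <= 1))
  # Strength depends only on the absolute value; intervals are closed on the
  # right, so a boundary value falls into the lower band (an assumption)
  strength <- cut(abs(r),
    breaks = c(0, 0.2, 0.4, 0.7, 0.9, 1),
    labels = c("Negligible", "Weak", "Moderate", "Strong", "Very strong"),
    include.lowest = TRUE
  )
  # Direction is the sign of the coefficient
  direction <- ifelse(r < 0, "Negative", "Positive")
  data.frame(r = r, strength = as.character(strength), direction = direction)
}

# Three coefficients taken from the correlation matrix above
res <- describe_cor(c(-0.903384, 0.62794934, -0.09840768))
print(res)
```

Such labels are rules of thumb; what counts as a strong correlation depends on the field and the question.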
rm(list = ls()) # Clear the workspace
load("./PODATCI/effort.Rdata") # Load the data
lsr::who(TRUE) # Inspect the data
## -- Name -- -- Class -- -- Size --
## effort data.frame 10 x 2
## $hours numeric 10
## $grade numeric 10
## hours grade
## 1 2 13
## 2 76 91
## 3 40 79
## 4 6 14
## 5 16 21
## effort
##
## 2 Variables 10 Observations
## --------------------------------------------------------------------------------
## hours
## n missing distinct Info Mean Gmd .05 .10
## 10 0 10 1 36.8 30.76 3.80 5.60
## .25 .50 .75 .90 .95
## 18.75 34.00 55.75 68.80 72.40
##
## lowest : 2 6 16 27 28, highest: 40 46 59 68 76
##
## Value 2 6 16 27 28 40 46 59 68 76
## Frequency 1 1 1 1 1 1 1 1 1 1
## Proportion 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
## --------------------------------------------------------------------------------
## grade
## n missing distinct Info Mean Gmd .05 .10
## 10 0 10 1 59.6 36.8 13.45 13.90
## .25 .50 .75 .90 .95
## 27.50 76.50 84.75 88.30 89.65
##
## lowest : 13 14 21 47 74, highest: 79 84 85 88 91
##
## Value 13 14 21 47 74 79 84 85 88 91
## Frequency 1 1 1 1 1 1 1 1 1 1
## Proportion 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
## --------------------------------------------------------------------------------
## [1] 0.909402
Relationship between hours of study and grade (each point represents one student). The dashed line shows the linear relationship. The correlation between the two variables is high, \(r = .91\). Note that more hours of study always lead to a higher grade, which is reflected in a Spearman correlation coefficient of \(\rho = 1\).
| | Rank of hours worked | Rank of grade |
|---|---|---|
| student 1 | 1 | 1 |
| student 2 | 10 | 10 |
| student 3 | 6 | 6 |
| student 4 | 2 | 2 |
| student 5 | 3 | 3 |
| student 6 | 5 | 5 |
| student 7 | 4 | 4 |
| student 8 | 8 | 8 |
| student 9 | 7 | 7 |
| student 10 | 9 | 9 |
## [1] 1
## [1] 1
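Spearman's coefficient is the Pearson correlation computed on ranks instead of raw values. The sketch below shows this on the first five students of `effort` printed above (not the full data set); `hours`, `grade` and the result names are illustrative.

```r
# First five rows of effort, as printed earlier (not the full data set)
hours <- c(2, 76, 40, 6, 16)
grade <- c(13, 91, 79, 14, 21)

# Pearson correlation on the raw values: high, but below 1 because the
# relationship is not a straight line
r_pearson <- cor(hours, grade)

# Spearman by hand: rank both variables, then apply Pearson to the ranks
rho_by_ranks <- cor(rank(hours), rank(grade))

# Built-in equivalent
rho_builtin <- cor(hours, grade, method = "spearman")

print(r_pearson)
print(rho_by_ranks)
print(rho_builtin)
```

Because every additional hour of study goes with a higher grade, the two rankings are identical and the rank correlation is perfect even though the linear one is not.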