MSc Quarto second attempt

Let’s write about Iron Maiden

Band Members

My favourite band members are

  • Adrian Smith

  • Dave Murray

My favourite Albums

  1. Piece of Mind

  2. Powerslave

Lyrics

Lyrics are not their strong point, but here is a quote from a decent song, Wasted Years:

Don’t waste your time always searching for those wasted years

Spotify data
I got these data using the spotifyr package
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
ggplot(diamonds, aes(x = depth, y = price)) +
  geom_point()

This scatter plot shows diamond price against depth.

library(tidyverse)
library(modeldata)   # provides the crickets dataset
?ggplot

?crickets
View(crickets)

# The basics

ggplot(crickets, aes(x = temp, 
                     y = rate)) + 
  geom_point() +
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)")

ggplot(crickets, aes(x = temp, 
                     y = rate,
                     color = species)) + 
  geom_point() +
  labs(x = "Temperature",
       y = "Chirp rate",
       color = "Species",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)") +
  scale_color_brewer(palette = "Dark2")

# Modifying basic properties of the plot

ggplot(crickets, aes(x = temp, 
                     y = rate)) + 
  geom_point(color = "red",
             size = 2,
             alpha = .3,
             shape = "square") +
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)")

# Learn more about the options for geom_point()
# with ?geom_point

# Adding another layer


ggplot(crickets, aes(x = temp, 
                     y = rate)) + 
  geom_point() +
  geom_smooth(method = "lm",
              se = FALSE) +
  labs(x = "Temperature",
       y = "Chirp rate",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)")
`geom_smooth()` using formula = 'y ~ x'

ggplot(crickets, aes(x = temp, 
                     y = rate,
                     color = species)) + 
  geom_point() +
  geom_smooth(method = "lm",
              se = FALSE) +
  labs(x = "Temperature",
       y = "Chirp rate",
       color = "Species",
       title = "Cricket chirps",
       caption = "Source: McDonald (2009)") +
  scale_color_brewer(palette = "Dark2") 
`geom_smooth()` using formula = 'y ~ x'

# Other plots

ggplot(crickets, aes(x = rate)) + 
  geom_histogram(bins = 15) # one quantitative variable

ggplot(crickets, aes(x = rate)) + 
  geom_freqpoly(bins = 15)

ggplot(crickets, aes(x = species)) + 
  geom_bar(color = "black",
           fill = "lightblue")

ggplot(crickets, aes(x = species, 
                     fill = species)) + 
  geom_bar(show.legend = FALSE) +
  scale_fill_brewer(palette = "Dark2")

ggplot(crickets, aes(x = species, 
                     y = rate,
                     color = species)) + 
  geom_boxplot(show.legend = FALSE) +
  scale_color_brewer(palette = "Dark2") +
  theme_minimal()

?theme_minimal

# faceting

# not great: the species are stacked in one histogram, so they are hard to compare
ggplot(crickets, aes(x = rate, 
                     fill = species)) + 
  geom_histogram(bins = 15) +
  scale_fill_brewer(palette = "Dark2")

ggplot(crickets, aes(x = rate,
                     fill = species)) + 
  geom_histogram(bins = 15,
                 show.legend = FALSE) + 
  facet_wrap(~species) +
  scale_fill_brewer(palette = "Dark2")

?facet_wrap

ggplot(crickets, aes(x = rate,
                     fill = species)) + 
  geom_histogram(bins = 15,
                 show.legend = FALSE) + 
  facet_wrap(~species,
             ncol = 1) +
  scale_fill_brewer(palette = "Dark2") + 
  theme_minimal()

freq <- c(18,5,11,3,0)
freq
[1] 18  5 11  3  0
species <- c("buzzard", "hobby", "kestrel", "merlin", "redkite")
species
[1] "buzzard" "hobby"   "kestrel" "merlin"  "redkite"

spec_freq <- data.frame(species, freq)
spec_freq
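Because these counts are already tallied, each row of `spec_freq` holds a count rather than one observation. A minimal sketch, assuming ggplot2 is installed: `geom_bar()` (used for the crickets above) counts rows itself, whereas `geom_col()` uses a height that is already in the data. The object name `bird_plot` is illustrative only.

```r
library(ggplot2)

# Same vectors as above: pre-tallied counts per species
freq <- c(18, 5, 11, 3, 0)
species <- c("buzzard", "hobby", "kestrel", "merlin", "redkite")
spec_freq <- data.frame(species, freq)

# geom_col() uses freq as the bar height; geom_bar() would count rows instead
bird_plot <- ggplot(spec_freq, aes(x = species, y = freq)) +
  geom_col(fill = "lightblue", color = "black") +
  labs(x = "Species", y = "Frequency")

bird_plot
```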

Weeks 1-4 restart

Using Quarto with R and Python for reports, slides and web publishing

Wild dog image

The why of R - fresh start
library(tidyverse)
# install.packages("ggplot2")   # run once to install ggplot2
# (installed.packages() lists installed packages; its first argument is a library path, not a package name)
library(ggplot2)
data("diamonds")
head(diamonds)
# A tibble: 6 × 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
#This is a comment!

## Problem A (also a comment)
1 + 2 + 3 + 4
[1] 10
## Problem B (hit 'Enter' after the plus sign before '4')
1 + 2 + 3 +
  4
[1] 10
## Problem C
9 * 10 ^ 2
[1] 900
# No breaks
(1+2+3) + (4+5+6) + (7+8+9) + (10+11+12) + (13+14+15) + (16+17+18)
[1] 171
#Include breaks
(1+2+3)+
  (4+5+6)+ ##notice the indentation after the first line
  (7+8+9) +
  (10+11+12) +
  (13+14+15) +
  (16+17+18)
[1] 171
#This is a comment
#Comments don't get evaluated as code by R

##Problem #1
1+2+3 # sum of 1, 2 and 3
[1] 6
##Problem #2
1-2-3 # subtract 2, then 3, from 1
[1] -4
## Problem #3
"hello"
[1] "hello"
##Problem #4
#4+5+7

##Problem #5
#hello
#creating an object named "student.names" that contains 4 names
student.names <- c("matt", "remi", "wendy", "craig")

#creating an object named "fav.color"
fav.color <- c("red", "purple", "teal", "yellow")

#creating an object named "age"
age <- c(29,2,24,15)

#height in centimeters
height <- c(117.8, 38.1, 170.18, 30.48)

#human or cat?
species <- c("human", "cat", "human", "cat")

data <- tibble(student.names, fav.color, age, height, species)
#calculating the mean of numbers 1 through 5
(1+2+3+4+5)/5
[1] 3
#method 1
mean(1:5)
[1] 3
#the colon indicates 'through' as in 1 through 5

#method2
mean(1,2,3,4,5)
[1] 1
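#typing the numbers as separate arguments does NOT work: mean() only averages
#its first argument (x), so this returns 1. Wrap the numbers in c() instead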

#method 3
Numbers <- c(1,2,3,4,5)
mean(Numbers)
[1] 3
numbers <- c(1,2,3,4,5)
numbers2 <- c(1,2,3,4,NA)

mean(x=numbers)
[1] 3
mean(x=numbers2)
[1] NA
mean(x=numbers, na.rm = TRUE)
[1] 3
mean(x=numbers2, na.rm = TRUE)
[1] 2.5
#there are two major camps of coding syntax: tidyverse and base R
#a dataset is a group of related variables: each variable is a column
#and each row is a unique observation
#base R datasets are called data frames (df)
#tidyverse datasets are called tibbles (tbl); they have all the properties of
#data frames and more
#Pipes (%>%, inserted in RStudio with Ctrl + Shift + M) are a shortcut the
#tidyverse uses for more efficient coding. Hitting Enter after a pipe will
#auto-indent. The pipe comes from the magrittr package, but the tidyverse loads it
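A short sketch of the data frame / tibble distinction and the pipe, using the built-in `mtcars` data frame. It assumes dplyr (part of the tidyverse, which re-exports `%>%`) and tibble are installed; `cars_tbl` is an illustrative name.

```r
library(dplyr)   # re-exports the magrittr pipe %>%

# mtcars is a base R data frame
class(mtcars)

# Converting it to a tibble keeps the data.frame class and adds tbl_df/tbl,
# matching the "(S3: tbl_df/tbl/data.frame)" line that str(diamonds) prints
cars_tbl <- tibble::as_tibble(mtcars)
class(cars_tbl)

# The pipe passes the left-hand side in as the first argument,
# so these two lines give the same result
head(cars_tbl, 3)
cars_tbl %>% head(3)
```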

#there are many structural types of data, e.g. numbers, letters (characters),
#boolean (TRUE/FALSE) and categorical. We can also have datasets

#VECTOR: a data structure that contains values of a single type, e.g. all
#characters. If you combine multiple vectors of the same length you get a dataset.
#Each column within a dataset is a vector, e.g. name, hair colour, age, human TRUE/FALSE

##defining 4 objects, all of which are vectors
#a vector containing names
Names <- c("Sam", "Tina", "Alex")

#a vector containing character values for hair colour
Hair <- c("brown", "black", "blonde")

#a vector containing numeric values for age
Age <- c(24,41,2)

#a vector containing TRUE/FALSE to denote human
Human <- c(TRUE, FALSE, TRUE)

##executing four lines of code to view these objects' definitions in the console
Names
[1] "Sam"  "Tina" "Alex"
Hair
[1] "brown"  "black"  "blonde"
Age
[1] 24 41  2
Human
[1]  TRUE FALSE  TRUE
##defining an object named mydataset
#this dataset combines the four vectors previously defined
mydataset <- tibble(Names, Hair, Age, Human)

##viewing the dataset in the console
mydataset
# A tibble: 3 × 4
  Names Hair     Age Human
  <chr> <chr>  <dbl> <lgl>
1 Sam   brown     24 TRUE 
2 Tina  black     41 FALSE
3 Alex  blonde     2 TRUE 
#atomic vector types: character (strings, e.g. letters or numbers in quotes), numeric,
#integer, logical and complex. A vector mixing types is coerced to the most general type
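A minimal base R sketch of the atomic types listed above and of coercion when types are mixed in `c()`. The values and the names `mixed_num` and `mixed_chr` are illustrative only.

```r
# One value of each atomic type listed above
typeof("hair")    # "character"
typeof(24)        # "double" (R's default numeric type)
typeof(24L)       # "integer" (the L suffix makes an integer)
typeof(TRUE)      # "logical"
typeof(2 + 3i)    # "complex"

# Mixing types in c() coerces everything to the most general type
mixed_num <- c(1L, 2.5)           # integer + double -> double
mixed_chr <- c(24, "Sam", TRUE)   # anything + character -> character
typeof(mixed_num)
typeof(mixed_chr)
mixed_chr                         # "24" "Sam" "TRUE"
```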
str(mtcars)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
str(npk)
'data.frame':   24 obs. of  5 variables:
 $ block: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
 $ N    : Factor w/ 2 levels "0","1": 1 2 1 2 2 2 1 1 1 2 ...
 $ P    : Factor w/ 2 levels "0","1": 2 2 1 1 1 2 1 2 2 2 ...
 $ K    : Factor w/ 2 levels "0","1": 2 1 1 2 1 2 2 1 1 2 ...
 $ yield: num  49.5 62.8 46.8 57 59.8 58.5 55.5 56 62.8 55.8 ...

Common mistakes

  1. Capitalisation
  2. Misspelling
  3. Closing Punctuation
  4. Continuing Punctuation
  5. Conflicting code
  6. Libraries are not loaded
  7. The unsaved object
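A small sketch of the first two mistakes, using `tryCatch()` to capture the error messages instead of stopping. The object `my_ages` and the misspelt `maen` are hypothetical names chosen to trigger the errors.

```r
my_ages <- c(24, 41, 2)

# 1. Capitalisation: R is case-sensitive, so My_ages is a different, undefined object
capital_err <- tryCatch(mean(My_ages), error = function(e) conditionMessage(e))
capital_err   # "object 'My_ages' not found"

# 2. Misspelling a function name
spell_err <- tryCatch(maen(my_ages), error = function(e) conditionMessage(e))
spell_err     # 'could not find function "maen"'

# Corrected version
mean(my_ages)
```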
library(tidyverse)

##This is example code: mean price for each clarity grade
diamonds %>%
  group_by(clarity) %>%
  summarise(m = mean(price)) %>%
  ungroup()
# A tibble: 8 × 2
  clarity     m
  <ord>   <dbl>
1 I1      3924.
2 SI2     5063.
3 SI1     3996.
4 VS2     3925.
5 VS1     3839.
6 VVS2    3284.
7 VVS1    2523.
8 IF      2865.
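For comparison with the base R camp, a sketch of the same per-clarity mean price using `tapply()` and `aggregate()`. This is an alternative, not how these notes compute it; it assumes ggplot2 is installed for the `diamonds` data, and `mean_price_by_clarity` is an illustrative name.

```r
library(ggplot2)  # provides the diamonds dataset

# tapply(values, groups, function): apply mean() to price within each clarity level
mean_price_by_clarity <- tapply(diamonds$price, diamonds$clarity, mean)
mean_price_by_clarity

# The same calculation with aggregate(), which returns a data frame
aggregate(price ~ clarity, data = diamonds, FUN = mean)
```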
library(tidyverse)
View(diamonds)
#str() lets us look at the structure; see 3.3.10 of the Week 3 Wendy book regarding structures
str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
names(diamonds)
 [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
 [8] "x"       "y"       "z"      
#ordered factors are cut, color and clarity. 6 variables are numeric: carat, depth, table, x, y, z
#1 variable has an integer structure: price
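The column types listed above can be checked in code rather than read off `str()`. A minimal sketch, assuming ggplot2 is installed for `diamonds`:

```r
library(ggplot2)

# First class of every column (ordered factors report c("ordered", "factor"))
sapply(diamonds, function(col) class(col)[1])

# Count the column types described above
sum(sapply(diamonds, is.ordered))   # ordered factors: cut, color, clarity
sum(sapply(diamonds, is.double))    # numeric (double): carat, depth, table, x, y, z
sum(sapply(diamonds, is.integer))   # integer: price
```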

#putting a ? in front of a built-in dataset's name brings up its help page

?diamonds

Chapter 6 Wendy Huynh

Basic Data Management

#mutate() can be used to create new variables based on existing variables
diamonds %>%
  mutate(JustOne = 1,
         Values = "something",
         Simple = TRUE)
# A tibble: 53,940 × 13
   carat cut    color clarity depth table price     x     y     z JustOne Values
   <dbl> <ord>  <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>   <dbl> <chr> 
 1  0.23 Ideal  E     SI2      61.5    55   326  3.95  3.98  2.43       1 somet…
 2  0.21 Premi… E     SI1      59.8    61   326  3.89  3.84  2.31       1 somet…
 3  0.23 Good   E     VS1      56.9    65   327  4.05  4.07  2.31       1 somet…
 4  0.29 Premi… I     VS2      62.4    58   334  4.2   4.23  2.63       1 somet…
 5  0.31 Good   J     SI2      63.3    58   335  4.34  4.35  2.75       1 somet…
 6  0.24 Very … J     VVS2     62.8    57   336  3.94  3.96  2.48       1 somet…
 7  0.24 Very … I     VVS1     62.3    57   336  3.95  3.98  2.47       1 somet…
 8  0.26 Very … H     SI1      61.9    55   337  4.07  4.11  2.53       1 somet…
 9  0.22 Fair   E     VS2      65.1    61   337  3.87  3.78  2.49       1 somet…
10  0.23 Very … H     VS1      59.4    61   338  4     4.05  2.39       1 somet…
# ℹ 53,930 more rows
# ℹ 1 more variable: Simple <lgl>
diamonds %>%
  mutate(price200 = price - 200)
# A tibble: 53,940 × 11
   carat cut       color clarity depth table price     x     y     z price200
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>    <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43      126
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31      126
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31      127
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63      134
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75      135
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48      136
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47      136
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53      137
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49      137
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39      138
# ℹ 53,930 more rows
#if saving this as a dataset, give it a new name like diamonds.new so you don't overwrite the original data
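A minimal sketch of saving the result under a new name, assuming ggplot2 and dplyr are installed. `mutate()` on its own only prints the result; assigning with `<-` keeps it, and the original `diamonds` stays unchanged.

```r
library(ggplot2)
library(dplyr)

# Assign the result to a new name so the original diamonds data is untouched
diamonds.new <- diamonds %>%
  mutate(price200 = price - 200)

ncol(diamonds)      # original still has 10 columns
ncol(diamonds.new)  # new copy has 11, including price200
```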

Nesting Functions

#we can use other functions inside mutate() to create new variables; this is nesting,
#where a function such as mean() sits inside another function such as mutate()

library(tidyverse)
diamonds %>%
  mutate(
    m   = mean(price),    # calc mean price
    sd  = sd(price),      # calc standard deviation
    med = median(price)   # calc median
  )
# A tibble: 53,940 × 13
   carat cut       color clarity depth table price     x     y     z     m    sd
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 3933. 3989.
 2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 3933. 3989.
 3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31 3933. 3989.
 4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 3933. 3989.
 5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75 3933. 3989.
 6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 3933. 3989.
 7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47 3933. 3989.
 8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53 3933. 3989.
 9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49 3933. 3989.
10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39 3933. 3989.
# ℹ 53,930 more rows
# ℹ 1 more variable: med <dbl>
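Because `mean()`, `sd()` and `median()` each return a single value, `mutate()` repeats that value on every row. A sketch of the alternative, assuming dplyr's `summarise()` (used earlier in these notes with `group_by()`): it collapses the data to one row of summary statistics. The `round()` call is an extra example of nesting one function inside another; `price_summary` is an illustrative name.

```r
library(ggplot2)
library(dplyr)

price_summary <- diamonds %>%
  summarise(
    m   = round(mean(price), 1),  # mean() nested inside round()
    sd  = round(sd(price), 1),    # standard deviation, rounded
    med = median(price)           # median price
  )

price_summary   # one row, three columns
```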

6.1.1.0.1

library(tidyverse)
view(midwest)

midwest %>%
  mutate(avg.pop.den = mean(popdensity))
# A tibble: 437 × 29
     PID county  state  area poptotal popdensity popwhite popblack popamerindian
   <int> <chr>   <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>
 1   561 ADAMS   IL    0.052    66090      1271.    63917     1702            98
 2   562 ALEXAN… IL    0.014    10626       759      7054     3496            19
 3   563 BOND    IL    0.022    14991       681.    14477      429            35
 4   564 BOONE   IL    0.017    30806      1812.    29344      127            46
 5   565 BROWN   IL    0.018     5836       324.     5264      547            14
 6   566 BUREAU  IL    0.05     35688       714.    35157       50            65
 7   567 CALHOUN IL    0.017     5322       313.     5298        1             8
 8   568 CARROLL IL    0.027    16805       622.    16519      111            30
 9   569 CASS    IL    0.024    13437       560.    13384       16             8
10   570 CHAMPA… IL    0.058   173025      2983.   146506    16559           331
# ℹ 427 more rows
# ℹ 20 more variables: popasian <int>, popother <int>, percwhite <dbl>,
#   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
#   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
#   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
#   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, category <chr>, …
#mutate() is used to create the new column avg.pop.den
#mean(popdensity) is calculated from the entire dataset,
#so every row gets the same value
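To get a mean per group instead of one value for the whole dataset, `group_by()` can go before `mutate()`. A sketch assuming ggplot2 (for `midwest`) and dplyr are installed; the column `state.avg.pop.den` is a hypothetical name for illustration.

```r
library(ggplot2)  # provides the midwest dataset
library(dplyr)

midwest_means <- midwest %>%
  mutate(avg.pop.den = mean(popdensity)) %>%        # one value for the whole dataset
  group_by(state) %>%
  mutate(state.avg.pop.den = mean(popdensity)) %>%  # one value per state
  ungroup() %>%
  select(county, state, popdensity, avg.pop.den, state.avg.pop.den)

midwest_means
```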

?midwest

#massive drum roll: I did the line below all by myself
midwest %>%
  mutate(avg.area = mean(area))
# A tibble: 437 × 29
     PID county  state  area poptotal popdensity popwhite popblack popamerindian
   <int> <chr>   <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>
 1   561 ADAMS   IL    0.052    66090      1271.    63917     1702            98
 2   562 ALEXAN… IL    0.014    10626       759      7054     3496            19
 3   563 BOND    IL    0.022    14991       681.    14477      429            35
 4   564 BOONE   IL    0.017    30806      1812.    29344      127            46
 5   565 BROWN   IL    0.018     5836       324.     5264      547            14
 6   566 BUREAU  IL    0.05     35688       714.    35157       50            65
 7   567 CALHOUN IL    0.017     5322       313.     5298        1             8
 8   568 CARROLL IL    0.027    16805       622.    16519      111            30
 9   569 CASS    IL    0.024    13437       560.    13384       16             8
10   570 CHAMPA… IL    0.058   173025      2983.   146506    16559           331
# ℹ 427 more rows
# ℹ 20 more variables: popasian <int>, popother <int>, percwhite <dbl>,
#   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
#   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
#   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
#   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, category <chr>, avg.area <dbl>

How to write a research question

In the Week 3 pre-sessions

https://research.com/research/how-to-write-a-research-question

Week 5: How to choose the correct analyses

Need to be able to:

  • Identify the nature of variables

  • Foresee type of analyses

  • Draw graphs of predicted results

Types of variable:

  • Categorical: ordinal, nominal or binary

  • Numerical (quantitative): discrete or continuous
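One way these variable types map onto R data types, as a base R sketch. All values and object names here are hypothetical examples, not course data.

```r
# Hypothetical example values for each kind of variable
nominal    <- factor(c("red", "blue", "red"))             # categories, no order
ordinal    <- factor(c("low", "high", "medium"),
                     levels = c("low", "medium", "high"),
                     ordered = TRUE)                       # categories with an order
binary     <- c(TRUE, FALSE, TRUE)                         # two possible values
discrete   <- c(3L, 0L, 5L)                                # counts (whole numbers)
continuous <- c(39.1, 46.5, 50.0)                          # measurements

is.ordered(nominal)   # FALSE
is.ordered(ordinal)   # TRUE
ordinal < "high"      # ordered factors can be compared: TRUE FALSE TRUE
```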

Frequency tests: powerful for testing associations between categorical variables

  • Chi-square tests

  • G-tests (log-likelihood ratio alternative to chi-square)

  • Contingency tables

  • Log-linear models
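A sketch of a chi-square test and a G-test on a contingency table, using base R's built-in `HairEyeColor` data as a stand-in example (not course data). The G statistic is computed by hand from the formula G = 2 Σ O ln(O/E), using the same expected counts as the chi-square test, rather than with an add-on package; object names are illustrative.

```r
# HairEyeColor is a 3-way table (Hair x Eye x Sex) in R's datasets package;
# collapse over Sex to get a 4 x 4 Hair x Eye contingency table
hair_eye <- margin.table(HairEyeColor, c(1, 2))
hair_eye

# Chi-square test of association between hair and eye colour
chi <- chisq.test(hair_eye)
chi

# G-test: G = 2 * sum(observed * log(observed / expected)), skipping zero cells
observed <- as.vector(hair_eye)
expected <- as.vector(chi$expected)
keep <- observed > 0
G <- 2 * sum(observed[keep] * log(observed[keep] / expected[keep]))
g_df <- unname(chi$parameter)              # (rows - 1) * (cols - 1) = 9
p_G <- pchisq(G, df = g_df, lower.tail = FALSE)
c(G = G, df = g_df, p = p_G)
```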

Types of hypotheses

  • Scientific: statements that explain an observed phenomenon and are meant to generate logical predictions; working guidelines

  • Statistical: logical predictions that can be tested with statistics and drawn as a graph

?penguins
view(penguins)