Tasha’s workbook

overview 1

Data Analysis: focusing on the basics. Covering how to work with data, on the principle that less is more in statistics.

Research methods: covering the theoretical and philosophical aspects of doing science. Making sense of science and working on writing and reading skills.

test

Sample data of penguins

library(palmerpenguins)
data(package = 'palmerpenguins')
head(penguins)

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>

## histograms

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(palmerpenguins)

data("penguins")
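# group_by() has no effect on a ggplot, so the data is piped in directly;
# color and fill already split the bars by species
penguins %>% 
  ggplot(aes(x=bill_length_mm, color=species, fill=species))+
  geom_histogram()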

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_bin()`).
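
Both messages above can be addressed directly. The following is a minimal sketch, not part of the original workbook: the bin width of 1 mm and the object names `penguins_complete` and `bill_hist` are assumptions chosen for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Drop the rows with a missing bill length up front,
# so ggplot has nothing to remove and raises no warning
penguins_complete <- penguins %>%
  filter(!is.na(bill_length_mm))

# Set binwidth explicitly (1 mm per bar is an illustrative choice),
# which silences the "Pick better value with `binwidth`" message
bill_hist <- penguins_complete %>%
  ggplot(aes(x = bill_length_mm, color = species, fill = species)) +
  geom_histogram(binwidth = 1, alpha = 0.5)

bill_hist
```

Trying a few different bin widths is worthwhile, because the apparent shape of a histogram can change with the choice.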

boxplots

library(tidyverse)
library(palmerpenguins)

data("penguins")
# group_by() is not needed here: the x aesthetic already gives one box per species
penguins %>% 
  ggplot(aes(x=species, 
             y=bill_length_mm, 
             color=species, 
             fill=species))+
  geom_boxplot(alpha=0.5)+
  theme(axis.text=element_text(size=16),
        axis.title=element_text(size=16))

Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

species of penguins

library(tidyverse)
library(palmerpenguins)

penguins %>% 
  ggplot(aes(x=species,
             color=species, 
             fill=species))+
  geom_bar(alpha=0.5)+
  theme(axis.text=element_text(size=16),
        axis.title=element_text(size=16))

## visualising correlations

penguins %>% 
  ggplot(aes(x=bill_length_mm, 
             y = bill_depth_mm))+
  geom_point()+
  theme(axis.text=element_text(size=16),
        axis.title=element_text(size=16))

Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
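
The scatterplot can be paired with a number. As a rough sketch (the object names `r_all` and `r_by_species` are made up for this example), the Pearson correlation can be computed for all penguins together and then separately per species, since pooling groups can hide or even reverse a within-group relationship.

```r
library(tidyverse)
library(palmerpenguins)

# Pearson correlation across all penguins;
# use = "complete.obs" skips the rows that contain NA
r_all <- cor(penguins$bill_length_mm,
             penguins$bill_depth_mm,
             use = "complete.obs")
r_all

# The same correlation computed within each species
r_by_species <- penguins %>%
  group_by(species) %>%
  summarise(r = cor(bill_length_mm, bill_depth_mm, use = "complete.obs"))
r_by_species
```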

moments of centrality

mean, median and mode

## moments of dispersion

variance, standard deviation, standard error, range and quartiles
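
As an illustration, these quantities can be computed for penguin bill length. This is a sketch under a few assumptions: base R has no built-in function for the statistical mode, so `stat_mode()` is a hypothetical helper written for this example, and the standard error is taken as the standard deviation divided by the square root of the sample size.

```r
library(palmerpenguins)

# Bill lengths with the missing values removed
x <- penguins$bill_length_mm
x <- x[!is.na(x)]

# Hypothetical helper: the most frequent value
# (returns the first one if several values tie)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Centrality
mean(x)
median(x)
stat_mode(x)

# Dispersion
var(x)                                    # variance
sd(x)                                     # standard deviation
se <- sd(x) / sqrt(length(x))             # standard error of the mean
se
range(x)                                  # minimum and maximum
quantile(x, probs = c(0.25, 0.5, 0.75))   # quartiles
```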

## checking via histograms

set.seed(999)
normal<-rnorm(100)
normal %>% 
  as_tibble() %>%   # as_tibble() replaces the deprecated as.tibble()
  ggplot(aes(value))+
  geom_histogram(color="#DD4A48", fill="#DD4A48")+
  # dashed lines at the mean and at one standard deviation either side
  geom_vline(xintercept=c(mean(normal), (mean(normal)+sd(normal)),mean(normal)-sd(normal)), 
             linetype="dashed")

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## checking via boxplot

set.seed(999)
normal<-rnorm(100)
normal %>% 
  as_tibble() %>% 
  ggplot(aes(value))+
  geom_boxplot(fill="#DD4A48",alpha=0.7)

## types of variables

Categorical:

Ordinal: categories that maintain an order.

Nominal: categories that have no ranking order.

Binary: nominal variables with two categories.

Numerical:

Discrete: numeric values that can only take certain values.

Continuous: measured numeric values that can be any number within a particular range.
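
A rough sketch of how these types can show up in R, using the penguins data. The mapping is an illustration, not a rule: the `size` vector and its labels are invented for this example, because the penguins data has no ordinal column.

```r
library(palmerpenguins)

# Nominal: an unordered factor
class(penguins$species)
levels(penguins$species)

# Binary: a nominal factor with exactly two levels
nlevels(penguins$sex)

# Ordinal: an ordered factor (hypothetical size labels)
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
size[1] < size[2]   # comparisons work because the levels are ordered

# Discrete: whole-number values, stored as integer
is.integer(penguins$year)

# Continuous: measured values, stored as double
is.double(penguins$bill_length_mm)
```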

## Inductive vs deductive reasoning

They are opposite approaches to reasoning that differ in where they start and what they use to reach a conclusion.

Inductive: observation → pattern → hypothesis → theory

Deductive: theory → hypothesis → observation → confirmation

## types of good and bad questions

Bad questions: 1. Is there any difference between A and B? 2. Is A bigger than B? 3. Can X influence Y?

Good questions: 1. What explains the differences between A and B? 2. What makes A bigger than B? 3. How can X influence Y?

## diamonds

diamonds %>%                             # uses the diamonds dataset (ships with ggplot2)
  group_by(color, clarity) %>%           # groups data by the color and clarity variables
  mutate(price200 = mean(price)) %>%     # creates a new variable (average price by group)
  ungroup() %>%                          # data no longer grouped by color and clarity
  mutate(random10 = 10 + price) %>%      # new variable: original price + $10 (named random10 to match select())
  select(cut, color, clarity, price, price200, random10) %>%  # retain only these listed columns
  arrange(color) %>%                     # order the data by color
  group_by(cut) %>%                      # group data by cut
  mutate(dis = n_distinct(price),        # counts the number of unique price values per cut
         rowID = row_number()) %>%       # numbers each row consecutively within each cut
  ungroup()                              # final ungrouping of data
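
The effect of `group_by()` and `ungroup()` on `mutate()` is easier to see on a tiny table. This is a minimal sketch: the `toy` data and its column names are made up for illustration and do not come from the diamonds dataset.

```r
library(dplyr)

# Hypothetical three-row table: two rows in group "a", one in group "b"
toy <- tibble(
  group = c("a", "a", "b"),
  price = c(10, 20, 40)
)

toy_result <- toy %>%
  group_by(group) %>%
  mutate(group_mean = mean(price),     # mean computed within each group
         row_id = row_number()) %>%    # numbering restarts in each group
  ungroup() %>%
  mutate(overall_mean = mean(price))   # after ungroup(): mean over all rows

toy_result
```

While the data is grouped, `mutate()` keeps every row and fills in a per-group value. After `ungroup()` the same call works over the whole table.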