Lesson 2 | 31 Jan 2020

Reading and exploring the data (with style)

For example, to add striped lines (alternating row colors) to your table and highlight the hovered row, you can simply pass bootstrap_options = c("striped", "hover") to kable_styling().

The option condensed can also be handy in many cases when you don’t want your table to be too large. It has slightly shorter row height.

For small tables with only a few columns, a page-wide table looks awful. To avoid this, you can specify in kable_styling whether you want the table to have full_width or not. By default, full_width is set to TRUE for HTML tables. (Note that for LaTeX the default is FALSE, since the package author did not want to change the “common” look unless you ask for it.)

What is the difference between the tidy() and kable() functions for table styling?

What I am noticing so far is that tidy() accepts matrices and vectors. For example, if you convert your data with as.matrix(), you can then pass the result to tidy(). Note, however, that the call below warns that tidy.matrix is deprecated.

On the other hand, kable() seems to be much more flexible when it comes to its arguments. It can take in matrices, data frames, and vectors.

morphology_data <- read_csv("mate_trials_summer_2019.csv")
Parsed with column specification:
cols(
  ID_num = col_double(),
  TgroupID = col_character(),
  GgroupID = col_character(),
  sex = col_character(),
  beak = col_double(),
  thorax = col_double(),
  wing = col_double(),
  body = col_double(),
  w_morph = col_character(),
  recorder = col_character(),
  computer = col_character(),
  date_recorded_by_hand = col_character(),
  data_recorded_on_excel = col_character(),
  notes = col_character()
)
# Using tidy() to display the table 
morph_matrix <- as.matrix(morphology_data)
tidy(head(morph_matrix))
Warning: 'tidy.matrix' is deprecated.
See help("Deprecated")
# Using kable() and kable_styling() to display a styled table
kable(head(morphology_data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F)
ID_num  TgroupID  GgroupID  sex  beak  thorax  wing   body   w_morph  recorder  computer  date_recorded_by_hand  data_recorded_on_excel  notes
475     NA        T1        F    6.29  3.69    7.81   11.22  S        A         yes       09.11.19               09.11.19                NA
268     NA        T2        F    8.44  3.73    10.20  14.00  L        A         yes       09.11.19               09.11.19                NA
261     NA        T3        F    8.44  3.68    9.57   13.21  S        A         yes       09.11.19               09.11.19                too big for microscope
261     NA        T3        F    8.55  3.56    9.49   13.13  S        A         no        09.17.19               09.17.19                NA
284     NA        T4        F    8.42  3.83    9.54   13.15  L        A         yes       09.11.19               09.11.19                NA
327     NA        T5        F    8.82  3.79    10.20  13.90  L        A         yes       09.11.19               09.11.19                NA

What we learned together last time

The differences between vectors and lists in R.

list1 = c(1,2,3) # this is an atomic vector, NOT a list

list2 = list(1,2,3) # this is a list, NOT an atomic vector
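One way to confirm the difference is to inspect the two objects directly. The sketch below uses only base R. The names v and lst are made up for illustration and are not part of the lesson data.

```r
v   <- c(1, 2, 3)     # atomic vector: every element has the same type
lst <- list(1, 2, 3)  # list: each element is stored as its own object

is.list(v)       # FALSE
is.list(lst)     # TRUE
is.atomic(v)     # TRUE

# Arithmetic is vectorised over atomic vectors
v * 2            # 2 4 6

# A list has to be flattened (or looped over) before doing arithmetic
unlist(lst) * 2  # 2 4 6
```

Most of the summary functions used in this lesson, such as mean() and sd(), expect an atomic vector.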

Updates from the last lesson.

  1. When using select(), you do not need to pass the column names as a vector of quoted strings. You can list the bare column names inside the parentheses, separated by commas.

  2. ‘$’ extracts a single column from a specific data frame. For example, female_data$beak returns the beak column of female_data as a vector.

female_data <- morphology_data %>%
  select(sex, beak) %>%
  filter(sex =="F")

kable(head(female_data)) %>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                  full_width = F, position = "left")
sex beak
F 6.29
F 8.44
F 8.44
F 8.55
F 8.42
F 8.82
f_mean <- mean(female_data$beak)
f_sd <- sd(female_data$beak)
f_max <- max(female_data$beak)
f_min <- min(female_data$beak)

# Get rid of rows with NA values

male_data <- morphology_data %>%
  select(sex, beak) %>%
  filter(sex == "M", !is.na(beak))

kable(head(male_data)) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = F, position = "left")
sex beak
M 6.00
M 6.40
M 5.82
M 6.35
M 5.45
M 5.97
m_mean <- mean(male_data$beak)
m_sd <- sd(male_data$beak)
m_max <- max(male_data$beak)
m_min <- min(male_data$beak)
m_min2 <- min(male_data$beak)
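
Missing values are best tested with is.na() rather than with a comparison, because any comparison against NA returns NA. Here is a minimal sketch in base R, using a made-up vector with one missing measurement:

```r
beak <- c(6.00, NA, 5.82)   # hypothetical values, one of them missing

beak == NA                  # NA NA NA: a comparison with NA is never TRUE or FALSE
is.na(beak)                 # FALSE TRUE FALSE

beak[!is.na(beak)]          # 6.00 5.82

mean(beak)                  # NA, because one value is missing
mean(beak, na.rm = TRUE)    # 5.91
```

Dropping the rows first, as done for male_data, or passing na.rm = TRUE to the summary function both avoid an NA result.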

# Let's build a summary table (a matrix) from scratch.

summary_table <- matrix(c(f_mean, f_sd, f_max, f_min,
                          m_mean, m_sd, m_max, m_min),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("female", "male"),
                                        c("mean", "sd", "max", "min")))
kable(summary_table) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), 
                full_width = F, position = "left")
         mean      sd   max   min
female  7.641  0.9773  9.34  5.77
male    5.781  0.4585  6.91  4.69
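
The byrow argument decides how the values fill the matrix, which is easy to get wrong. The sketch below uses hypothetical numbers (1 to 8) and group names "a" and "b" to show the difference, and then converts the matrix to a data frame in case a real data frame is wanted.

```r
stats <- c(1, 2, 3, 4,   # hypothetical statistics for group "a"
           5, 6, 7, 8)   # hypothetical statistics for group "b"

# byrow = TRUE fills one row at a time, so each group keeps its own row
by_row <- matrix(stats, nrow = 2, byrow = TRUE,
                 dimnames = list(c("a", "b"),
                                 c("mean", "sd", "max", "min")))

# The default (byrow = FALSE) fills one column at a time instead
by_col <- matrix(stats, nrow = 2)

by_row["a", ]    # 1 2 3 4
by_col[1, ]      # 1 3 5 7

# as.data.frame() turns the matrix into a data frame with the same names
summary_df <- as.data.frame(by_row)
summary_df$mean  # 1 5
```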

Data Visualization: Graphing the data.

To help visualize the data, we can draw histograms and boxplots, which show how the values are distributed. Base R's generic plotting functions make this quick:

  • hist()
  • boxplot()
  • plot()
female_hist <- hist(female_data$beak)

male_hist <- hist(male_data$beak)

female_bp <- boxplot(female_data$beak)

male_bp <- boxplot(male_data$beak)

But the ggformula package has specific functions that can make data visualization more flexible:

  • gf_histogram() - count and density histograms in ggformula
  • gf_boxplot() - displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all “outlying” points individually.
  • gf_point() - scatterplots in ggformula

We can then use grid.arrange() from the gridExtra package to control how the graphs are laid out when we knit to a PDF. grid.arrange() arranges multiple grobs on a page. What is a grob? A grid graphical object (“grob”) is a description of a graphical item, such as a plot. The basic grob classes provide default behavior for validating, drawing, and modifying graphical objects.

R Colors! https://status.rstudio.com

Generic argument input into the ggformula functions:

gf_( y ~ x, data = _____, )

  • y can be left out (leave the left-hand side of the formula empty), or it can be your y/response variable. A y variable is usually not needed for histograms or boxplots but is needed for ‘point’ graphs.
  • x is your x variable
  • data can be your full data, or subsets of the data
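
The y ~ x notation is an ordinary R formula, so its two forms can be inspected in base R without loading ggformula. This is only an illustration of the notation. The variable names beak and thorax are borrowed from the data set, and nothing is plotted.

```r
one_sided <- ~ beak          # no y: the form used for histograms
two_sided <- beak ~ thorax   # y ~ x: the form used for scatterplots

class(one_sided)      # "formula"
length(one_sided)     # 2 (the tilde and the x term)
length(two_sided)     # 3 (the tilde, the y term and the x term)

all.vars(one_sided)   # "beak"
all.vars(two_sided)   # "beak" "thorax"
```

Creating a formula does not look up the variables, which is why the column names can be written without quotes and without the data frame prefix.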
# How does beak length differ by sex?
# (binwidth overrides bins, so only binwidth is given here)
p0 <- gf_histogram(~ beak, data = morphology_data, binwidth = 1, color = ~sex, fill = ~sex,
                   title="Beak length distribution by sex",
                   xlab= "Beak length (mm)",
                   ylab= "Number of soapberry bugs")

# How does thorax length differ by wing morph?
p5 <- gf_boxplot( ~thorax, data=morphology_data, color= ~w_morph)

# What is the relationship between beak length and thorax?
p6 <- gf_point(beak ~ thorax, data=morphology_data, color= ~w_morph)

# Let's arrange the graphs using grid.arrange()
grid.arrange(p0, p5, p6, ncol = 2)
Warning: Removed 1 rows containing non-finite values (stat_bin).
Warning: Removed 1 rows containing non-finite values (stat_boxplot).
Warning: Removed 1 rows containing missing values (geom_point).

# More graphs we can make to see different relationships:
p1 <- gf_histogram(~beak, data = female_data, color = "white", bins = 15, 
                   title = "Female Beak Lengths", xlab = "beak lengths (mm)")
p2 <- gf_histogram(~beak, data=male_data, color = "white", bins = 15, 
                   title = "Male Beak Lengths", xlab = "beak lengths (mm)")
p3 <- gf_boxplot(~beak, data= female_data)
p4 <- gf_boxplot(~beak, data=male_data)

grid.arrange(grobs = list(p1, p2, p3, p4), ncol=2)

Data visualization: Plotting the data.

Using the generic plot() function, we can plot our data and take advantage of the many arguments the function can take in order to produce a presentable graph.

Arguments:

Font size can be modified with the graphical parameter cex. The default value is 1. A cex value below 1 shrinks the text, and a value above 1 enlarges it. The following arguments change the font size of specific elements:

  • cex.main: text size for the main title
  • cex.lab: text size for the axis titles
  • cex.sub: text size for the sub-title
plot(female_data$beak)

plot(male_data$beak, main="male beak lengths",
     xlab = "Index",
     ylab = "beak lengths (mm)",
     sub = "Figure 2. Male beak lengths....",
     col= 'blue')
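
The cex arguments listed above can be passed straight to plot(). The sketch below reuses the six male beak lengths from the table printed earlier. The specific cex values and the label text are arbitrary choices for illustration.

```r
# First six male beak lengths, copied from the male_data table above
male_beak <- c(6.00, 6.40, 5.82, 6.35, 5.45, 5.97)

plot(male_beak,
     main = "Male beak lengths",
     sub  = "First six males only",
     xlab = "Index",
     ylab = "Beak length (mm)",
     col  = "blue",
     cex.main = 1.5,   # main title 50% larger than the default
     cex.lab  = 1.2,   # axis titles 20% larger
     cex.sub  = 0.8)   # sub-title 20% smaller
```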

R Fundamentals

Data Types in R:

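  • Numbers (integers and floating-point values, e.g. 1L and 6.88). Note that a plain 1 is stored as a floating-point value (a double) unless it is written as 1L.

  • Strings (anything in quotation marks). In R, a string is a character vector of length one.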
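
The type of any value can be checked with typeof(). This short base R sketch covers the cases above. The string "beak" is just an example value.

```r
typeof(6.88)      # "double"
typeof(1)         # "double"  (a plain number is a double)
typeof(1L)        # "integer" (the L suffix makes it an integer)
typeof("beak")    # "character"

is.numeric(1L)    # TRUE: integers and doubles both count as numeric
length("beak")    # 1: one string is a character vector of length one
nchar("beak")     # 4: the number of characters in the string
```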

String Specific Functions:

  • print()
  • cat() - The cat function can be used to concatenate strings and then output them.
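
The two functions print differently, which matters inside loops. In this minimal sketch the values of sex and beak are taken from the first row of the female table above. paste() is included as a related base R function that returns the combined string instead of printing it.

```r
sex  <- "F"
beak <- 6.29

print(sex)   # prints [1] "F" with the index and the quotes

# cat() joins its arguments with spaces and prints them without quotes.
# It does not add a line break, so "\n" is supplied explicitly.
cat("Sex:", sex, "Beak:", beak, "\n")

# paste() builds the string and returns it so that it can be stored
msg <- paste("Sex:", sex)
msg          # "Sex: F"
```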

Basic operations that can be used in R

+ or - addition or subtraction

* multiplication

/ division

%% remainder

^ or ** exponent
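
Each operator can be tried directly at the console. The numbers below are arbitrary.

```r
7 + 2     # 9
7 - 2     # 5
7 * 2     # 14
7 / 2     # 3.5
7 %% 2    # 1 (remainder after dividing 7 by 2)
2 ^ 3     # 8
2 ** 3    # 8 (R reads ** as ^)

# The remainder operator gives a simple even/odd test
8 %% 2 == 0   # TRUE, so 8 is even
```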

What are some if statements that you can do with these operations?

l <- c(1, 2, 3, 4, 5)
for (i in l) {
  if (i %% 2 == 0) {
    print(i)
  }
}
[1] 2
[1] 4

For loops, making counts, and if statements.

i <- 0                                  # count of beaks shorter than 6
l <- vector(mode = "list", length = 0)  # empty list to collect those beaks
for (b in female_data$beak) {
  if (b < 6.00) {
    i <- i + 1
    l <- c(l, b)
    cat("Beak length that's smaller than 6: ", b, "\n")
  }
}
Beak length that's smaller than 6:  5.77
print(i)
[1] 1
print(l)
[[1]]
[1] 5.77
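
The same count can also be obtained without a loop, because comparisons in R are vectorised. The sketch below is an alternative to the loop above, not a replacement for learning it. The vector holds the six female beak lengths shown earlier plus the minimum of 5.77, standing in for the full female_data$beak column.

```r
# Stand-in for female_data$beak (values copied from the output above)
beak <- c(6.29, 8.44, 8.44, 8.55, 8.42, 8.82, 5.77)

is_small <- beak < 6      # one TRUE or FALSE per beak length
n_small  <- sum(is_small) # TRUE counts as 1, so the sum is the count
small    <- beak[is_small]

n_small   # 1
small     # 5.77
```

Here small is an atomic vector rather than a list, so it can be passed directly to functions such as mean().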

Questions