R Markdown

What does R Markdown do?

R Markdown is like a magic notebook for data work. You can write text, add code, and see the results of that code all in one place. When you’re done, you can turn your work into a report, presentation, or even a website. It’s a way to share your code, results, and explanations in a single document.

Header and formatting

Text formatting

Bold

Meow Meow

Italic

Meow Meow

Strikethrough

Meow

Blockquote

Why did the cat sit on the computer?

> Because it wanted to keep an eye on the mouse!

Endash

2023–08–20

Emdash

Cat—Felis catus

Superscript

x²

Subscript

H₂O

Horizontal lines

Use three or more dashes (---), asterisks (***), or underscores (___).

Lists

Domestic Cats: The ones you might have at home!

  • Domestic Cats
    • Devon Rex
    • Siamese
    • Tuxedo
  1. Domestic Cats
    • Devon Rex
    • Siamese
    • Tuxedo

Unordered list

  • item 2
    • sub-item 1
    • sub-item 2

Ordered list

  1. item 1
  2. item 2
    • sub-item 1
    • sub-item 2

Table

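| Header 1 | Header 2 |
|----------|----------|
| Row1Col1 | Row1Col2 |
| Row2Col1 | Row2Col2 |

| Cat breed          | Average weight (lbs) |
|--------------------|----------------------|
| Devon Rex          | 5-10                 |
| Ragdoll            | 8-20                 |
| American Shorthair | 10-15                |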

Others

Insert Image

[Image: Orca]

Code Chunks

# This is a code chunk in R Markdown
data <- c(1, 2, 3, 4, 5)
mean(data)
## [1] 3

Output Formats

R Markdown can be rendered into various formats:

  • HTML
  • PDF
  • Word
  • Slides
  • And more…
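
Rendering can also be started from the R console instead of the Knit button. The sketch below assumes the rmarkdown package is installed and uses a hypothetical file name, report.Rmd. The helper render_as() and its short format labels are made up for illustration; only the output format names on the right-hand side come from rmarkdown.

```r
# Short, made-up labels mapped to rmarkdown output format names
output_formats <- c(
  html   = "html_document",
  pdf    = "pdf_document",
  word   = "word_document",
  slides = "ioslides_presentation"
)

# Hypothetical helper: render an .Rmd file to one of the formats above
render_as <- function(input, format = "html") {
  stopifnot(format %in% names(output_formats))
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is required for rendering.")
  }
  rmarkdown::render(input, output_format = output_formats[[format]])
}

# Example call (assumes report.Rmd exists in the working directory):
# render_as("report.Rmd", "word")
```

PDF output additionally needs a LaTeX installation, so HTML or Word is usually the easier format to try first.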


R Project

An RStudio Project automatically sets the working directory to the project’s root folder.

  • How to create an RStudio Project:
    • Open RStudio
    • Start a new project
    • Choose a directory and a name for your project
    • Then create an R Markdown file (or any other file); by default it is saved in the folder where you created the project
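
To see what the working directory means in practice, here is a minimal sketch. The folder name data and the file name cats.csv are hypothetical and only illustrate a path written relative to the project root.

```r
# The working directory R is currently using; inside an RStudio Project
# this is the project's root folder
project_root <- getwd()
print(project_root)

# Build a path relative to the project root (hypothetical folder and file)
data_path <- file.path("data", "cats.csv")
print(data_path)

# Check whether the file exists before trying to read it
if (file.exists(data_path)) {
  cats <- read.csv(data_path)
} else {
  message("No file found at ", data_path)
}
```

Because the path is relative, the same code keeps working if the whole project folder is moved or shared with someone else.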

NA

NA stands for “not available”: a missing or undefined value in a dataset, such as a blank cell.
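
One reason NA needs special handling is that it propagates through most calculations. The short sketch below uses a made-up vector, weights, to show this.

```r
# Hypothetical weights with one missing value
weights <- c(4.5, NA, 3.8)

# Any sum that includes NA is itself NA
total <- sum(weights)
print(total)

# na.rm = TRUE drops the missing value before summing
total_known <- sum(weights, na.rm = TRUE)
print(total_known)

# Comparing with == does not detect NA; the result is NA, not TRUE
comparison <- NA == NA
print(comparison)

# is.na() is the reliable way to test for missing values
missing_flags <- is.na(weights)
print(missing_flags)
```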

# Create a vector of cat names
cat_name <- c("Orca", "Charlie", "Luna", "Paws", "Cookie")
# Create a vector of recorded cat weights
cat_weight <- c(4.5, 6.2, NA, 3.8, NA)
# Create a vector of detected cat potty activity
Cat_potty <- c(TRUE, FALSE, NA, FALSE, TRUE)

Data cleaning options for NA

In a vector
# Check which values in the vector are NA
na_check_cat_weight <- is.na(cat_weight)
na_check_cat_weight
## [1] FALSE FALSE  TRUE FALSE  TRUE
# Count the number of NA values in the vector
sum(is.na(cat_weight))
## [1] 2
# Remove NA values from the vector
na_removal_cat_weight <- na.omit(cat_weight)
na_removal_cat_weight
## [1] 4.5 6.2 3.8
## attr(,"na.action")
## [1] 3 5
## attr(,"class")
## [1] "omit"
# Replace NA with another value (in this case, 0)
na_replace_cat_weight <- cat_weight
na_replace_cat_weight[is.na(cat_weight)] <- 0
na_replace_cat_weight
## [1] 4.5 6.2 0.0 3.8 0.0
Combine the vectors into a data frame
cat_health <- data.frame(name = cat_name, weight = cat_weight, potty_activity = Cat_potty)
cat_health
##      name weight potty_activity
## 1    Orca    4.5           TRUE
## 2 Charlie    6.2          FALSE
## 3    Luna     NA             NA
## 4    Paws    3.8          FALSE
## 5  Cookie     NA           TRUE
Operations with NA in the data
# Use na.rm = TRUE to ignore NA values if any exist
mean_weight <- mean(cat_health$weight, na.rm = TRUE)
mean_weight
## [1] 4.833333
# Use complete.cases() to flag rows with no NA values (FALSE means the row contains at least one NA)
luna_row <- cat_health[cat_health$name == "Luna", ]  # Subsetting the dataframe for Luna's row
luna_completion <- complete.cases(luna_row)  # Checking completeness for Luna's row
luna_completion
## [1] FALSE
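
The same check can be applied to the whole data frame at once. This sketch rebuilds cat_health so it runs on its own, then keeps only the rows without any NA; the name cat_health_complete is an assumption made for the example.

```r
# Rebuild the data frame from the section above
cat_health <- data.frame(
  name = c("Orca", "Charlie", "Luna", "Paws", "Cookie"),
  weight = c(4.5, 6.2, NA, 3.8, NA),
  potty_activity = c(TRUE, FALSE, NA, FALSE, TRUE)
)

# One TRUE/FALSE per row: TRUE when the row has no NA in any column
complete_rows <- complete.cases(cat_health)
print(complete_rows)

# Keep only the complete rows
cat_health_complete <- cat_health[complete_rows, ]
print(cat_health_complete)
```

Unlike replacing NA with 0, this drops incomplete rows entirely, so a cat with a missing weight no longer appears in the result.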

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

Assignment demo

Stats Team

  • For your assigned function:
    • Create a data frame
    • Run the function on your data frame
    • Write notes on the input and output of the function

Function: head()

# Create a data frame about anything you are interested in; you can also make it up (minimum 7 rows)
disney_movies <- data.frame(
  Movie = c("The Lion King", "Frozen", "Aladdin", "Mulan", "Beauty and the Beast", "Tangled", "Pocahontas"),
  Release_Year = c(1994, 2013, 1992, 1998, 1991, 2010, 1995),
  IMDb_Rating = c(8.5, 7.4, 8.0, 7.6, 8.0, 7.7, 6.7),
  Protagonist = c("Simba", "Elsa & Anna", "Aladdin", "Mulan", "Belle", "Rapunzel", "Pocahontas"),
  Antagonist = c("Scar", "Duke of Weselton", "Jafar", "Shan Yu", "Gaston", "Mother Gothel", "Governor Ratcliffe"),
  Main_Song = c("Circle of Life", "Let It Go", "A Whole New World", "Reflection", "Beauty and the Beast", "I See the Light", "Colors of the Wind")
)

# Display your data frame
disney_movies
##                  Movie Release_Year IMDb_Rating Protagonist         Antagonist
## 1        The Lion King         1994         8.5       Simba               Scar
## 2               Frozen         2013         7.4 Elsa & Anna   Duke of Weselton
## 3              Aladdin         1992         8.0     Aladdin              Jafar
## 4                Mulan         1998         7.6       Mulan            Shan Yu
## 5 Beauty and the Beast         1991         8.0       Belle             Gaston
## 6              Tangled         2010         7.7    Rapunzel      Mother Gothel
## 7           Pocahontas         1995         6.7  Pocahontas Governor Ratcliffe
##              Main_Song
## 1       Circle of Life
## 2            Let It Go
## 3    A Whole New World
## 4           Reflection
## 5 Beauty and the Beast
## 6      I See the Light
## 7   Colors of the Wind
# Run your function here
head(disney_movies)
##                  Movie Release_Year IMDb_Rating Protagonist       Antagonist
## 1        The Lion King         1994         8.5       Simba             Scar
## 2               Frozen         2013         7.4 Elsa & Anna Duke of Weselton
## 3              Aladdin         1992         8.0     Aladdin            Jafar
## 4                Mulan         1998         7.6       Mulan          Shan Yu
## 5 Beauty and the Beast         1991         8.0       Belle           Gaston
## 6              Tangled         2010         7.7    Rapunzel    Mother Gothel
##              Main_Song
## 1       Circle of Life
## 2            Let It Go
## 3    A Whole New World
## 4           Reflection
## 5 Beauty and the Beast
## 6      I See the Light
head(disney_movies,2)
##           Movie Release_Year IMDb_Rating Protagonist       Antagonist
## 1 The Lion King         1994         8.5       Simba             Scar
## 2        Frozen         2013         7.4 Elsa & Anna Duke of Weselton
##        Main_Song
## 1 Circle of Life
## 2      Let It Go

Notes: (describe the input, what the function does to the data frame, and the output)

Input: head(data_frame) or head(data_frame, n)
Output:

  • By default, head() returns the first 6 rows of a data frame.
  • head(data_frame, n) returns the first n rows of a data frame.

| Name               | Assigned Function |
|--------------------|-------------------|
| Christina Nguyen   | head()            |
| Marina Zhao        | tail()            |
| Gardenia Chang     | str()             |
| Kelly Zhen         | summary()         |
| Jack Oberdorfer    | length()          |
| Emily Zhu          | dim()             |
| Wesley Martinez    | class()           |
| Francesca Casimiro | names()           |
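
As a rough starting point for the other assigned functions, the sketch below runs each one on a small data frame. The data frame movies is a made-up, two-column cut of the Disney example, used here only so the block runs on its own.

```r
# Small example data frame (7 rows, 2 columns)
movies <- data.frame(
  Movie = c("The Lion King", "Frozen", "Aladdin", "Mulan",
            "Beauty and the Beast", "Tangled", "Pocahontas"),
  Release_Year = c(1994, 2013, 1992, 1998, 1991, 2010, 1995)
)

tail(movies, 2)   # the last 2 rows
str(movies)       # compact overview: column names, types, first values
summary(movies)   # per-column summary statistics
length(movies)    # for a data frame, the number of columns
dim(movies)       # number of rows, then number of columns
class(movies)     # the object's class
names(movies)     # the column names
```

Note that length() counts columns, not rows, when given a data frame; nrow() or dim() gives the row count.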

Theory team

  • Include everything in the list:

  • Title

  • Subtitle (name, date, etc)

  • Text formatting

    • Bold
    • Italic
    • Strikethrough
    • Blockquote
    • Endash
  • List

  • Table

  • In the code chunks:

    • A string
    • A variable
    • A math equation of your choice
  • In a separate CSV file:

    • Create a table of at least 3 columns and 5 rows
    • Import the CSV into RStudio
  • Next, in a code chunk, create:

    • 5 vectors (matching the information in your table)
    • A data frame that combines your vectors (it should look the same as your table in the CSV)
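
The code-chunk items above could look roughly like the sketch below. All values and names are hypothetical. To stay self-contained, it writes the table to a temporary CSV file instead of relying on a file you created by hand, and it uses three column vectors of five values each for brevity.

```r
# A string, a variable, and a math expression
greeting <- "Hello, cats!"
n_cats <- 5
circle_area <- pi * 2^2

# One vector per table column (hypothetical values)
name   <- c("Orca", "Charlie", "Luna", "Paws", "Cookie")
weight <- c(4.5, 6.2, 5.1, 3.8, 4.9)
indoor <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Combine the vectors into a data frame
cats_from_vectors <- data.frame(name = name, weight = weight, indoor = indoor)

# Stand-in for the separate CSV file: write it to a temporary location,
# then import it the same way you would import your own file
csv_path <- tempfile(fileext = ".csv")
write.csv(cats_from_vectors, csv_path, row.names = FALSE)
cats_from_csv <- read.csv(csv_path)

# Compare the imported table with the one built from vectors
all.equal(cats_from_csv, cats_from_vectors)
```

With your own file, replace csv_path with the file's path relative to the project folder.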

What to upload in the end: