R project

An RStudio Project is like your personal workspace. It sets the working directory to the project’s root folder, making it easier to manage files and run code.
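Here is a small sketch of what that buys you. The folder name `data` and the file name `cats.csv` are hypothetical, made up only for illustration; the point is that a relative path is resolved from the working directory, which an RStudio Project sets to its root folder.

```r
# Show the current working directory. Inside an RStudio Project this is
# the project's root folder.
getwd()

# Build a path relative to the working directory.
# "data" and "cats.csv" are hypothetical names used only for illustration.
cats_path <- file.path("data", "cats.csv")
cats_path

# Check whether such a file exists before trying to read it.
file.exists(cats_path)
```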

R Markdown

What does R Markdown do

R Markdown is like a magic notebook for data work. You can write text, add code, and see the results of that code all in one place. When you’re done, you can turn your work into a report, presentation, or even a website. It’s a way to share your code, results, and explanations in a single document.

Header and formatting

Text formatting

Okay, let’s dive into some basic text formatting.

For Bold, type: “Double Asterisk, Meow, Double Asterisk” (**Meow**) or “Double Underscore, Meow, Double Underscore” (__Meow__). (Pause and demonstrate)

Bold

Meow Meow

To make your text italic, it’s almost the same, but you’ll use just one asterisk or one underscore on each side of your text.

Italic

Meow Meow

There’s more! You can even do a strikethrough by wrapping your text with double tildes.

For Strikethrough, type: “Double Tilde, Meow, Double Tilde” (Pause and demonstrate)

Strikethrough

Meow

Want to create a quote? Use the “Greater Than” symbol to make a blockquote. For nested quotes, just add another “Greater Than” symbol.

Blockquote

> Why did the cat sit on the computer?
> > Because it wanted to keep an eye on the mouse!

Endash

2023–08–20

2023-08-20

Emdash

Cat_Felis catus Cat______Felis catus


Lists

Lists are super easy. For bullet points, you use an asterisk followed by a space. To nest items, simply indent them and use a dash followed by a space.

For Bullet List, type: “Asterisk, Space, Your Item”

For Nested Items, type: “Indent, Dash, Space, Your Sub-item”

Domestic Cats: The ones you might have at home!

  • Domestic Cats
    • Devon Rex
    • Siamese
    • Tuxedo

For numbered lists, you use the number, a dot, and a space.

  1. Domestic Cats
    • Devon Rex
    • Siamese
    • Tuxedo

Unordered list

  • item 2
    • sub-item 1
    • sub-item 2

Ordered list

  1. item 1
  2. item 2
    • sub-item 1
    • sub-item 2

Tables

To make tables, you’ll use vertical bars and dashes. I won’t get too into the weeds here, but here’s a simple example you can follow:

For Simple Table, type the header row: “Vertical Bar, Header 1, Vertical Bar, Header 2, Vertical Bar”, press Enter, then type the separator row: “Vertical Bar, Dashes, Vertical Bar, Dashes, Vertical Bar”. Each data row after that follows the same pattern as the header row.

Table

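| Header 1 | Header 2 |
|----------|----------|
| Row1Col1 | Row1Col2 |
| Row2Col1 | Row2Col2 |

| Cat breed          | Average weight (lbs) |
|--------------------|----------------------|
| Devon Rex          | 5-10                 |
| Ragdoll            | 8-20                 |
| American Shorthair | 10-15                |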

Others

Insert Image

[Image: Orca]

Code Chunks

# This is a code chunk in R Markdown
data <- c(1, 2, 3, 4, 5)
mean(data)
## [1] 3

Output Formats

R Markdown can be rendered into various formats:

  • HTML
  • PDF
  • Word
  • Slides
  • And more…


NA

Alright, folks, let’s move on to a critical topic in data analysis—handling missing or undefined data. In R, this is often represented by the term ‘NA,’ which stands for “Not Available.”

Why does this matter? Well, missing data can significantly impact your analyses and conclusions. So it’s essential to know how to deal with it.

‘NA’ stands for “Not Available”: missing or undefined data, like blanks in a dataset.

First, let’s create some sample data to work with. I’m going to create three vectors (I’ll loosely call them lists): cat names, their weights, and their potty activities. Note that I’ll intentionally include some ‘NA’ values.

# Create a vector of cat names
cat_name <- c("Orca", "Charlie", "Luna", "Paws", "Cookie")
# Create a vector of recorded cat weights (NA = weight not recorded)
cat_weight <- c(4.5, 6.2, NA, 3.8, NA)
# Create a vector of detected potty activity (NA = nothing recorded)
Cat_potty <- c(TRUE, FALSE, NA, FALSE, TRUE)

Data cleaning options for NA

In a list

Checking for ‘NA’ in Lists

Now, what if you want to know which values in our weight list are missing? R provides a straightforward function called is.na() for that.

# Check whether each value in the list is NA
na_check_cat_weight <- is.na(cat_weight)
na_check_cat_weight
## [1] FALSE FALSE  TRUE FALSE  TRUE
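If you would rather see where the gaps are, or whose data is missing, the same logical vector can be reused. This is just a small sketch that rebuilds the sample vectors so it runs on its own; `missing_positions` and `cats_missing_weight` are names I made up for the example.

```r
# Same sample data as above, repeated so this chunk runs on its own
cat_name   <- c("Orca", "Charlie", "Luna", "Paws", "Cookie")
cat_weight <- c(4.5, 6.2, NA, 3.8, NA)

# which() turns the TRUE/FALSE vector into the positions of the TRUE values
missing_positions <- which(is.na(cat_weight))
missing_positions

# Use the logical vector as an index to get the matching cat names
cats_missing_weight <- cat_name[is.na(cat_weight)]
cats_missing_weight
```

With this data, the positions are 3 and 5, which belong to Luna and Cookie.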

See the output? The function returns a logical vector, indicating ‘TRUE’ where the data is ‘NA’ and ‘FALSE’ otherwise.

Next, you might want to know how many ‘NA’ values there are. For that, we can use sum() along with is.na(). This works because R counts each TRUE as 1 when summing.

#Count the number of NA in the list
sum(is.na(cat_weight))
## [1] 2

As you see, it shows the total count of ‘NA’ values in our list.

Now, what if you want to get rid of these pesky ‘NA’ values? Simple, you can use the na.omit() function.

#Remove NA from the list
na_removal_cat_weight <- na.omit(cat_weight)
na_removal_cat_weight
## [1] 4.5 6.2 3.8
## attr(,"na.action")
## [1] 3 5
## attr(,"class")
## [1] "omit"

Notice how the ‘NA’ values are removed from the list. The extra na.action attribute in the output records the positions (3 and 5) that were dropped.
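If you only want the remaining numbers without those extra attributes, one common alternative is to index with the negation of is.na(). This is a sketch of that option, not a replacement for na.omit(); `weights_only` is a name chosen for this example.

```r
# Same sample weights as above
cat_weight <- c(4.5, 6.2, NA, 3.8, NA)

# Keep only the positions where the value is NOT NA
weights_only <- cat_weight[!is.na(cat_weight)]
weights_only

# The result is a plain numeric vector with no attributes attached
attributes(weights_only)
```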

Replacing ‘NA’ Values

Alternatively, you might want to replace ‘NA’ with a specific value. Let’s say, zero.

#replace NA with another value, in this case, 0
na_replace_cat_weight <- cat_weight
na_replace_cat_weight[is.na(cat_weight)] <- 0
na_replace_cat_weight
## [1] 4.5 6.2 0.0 3.8 0.0
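One thing to keep in mind is that the replacement value changes your statistics. The sketch below compares filling with zero against filling with the mean of the recorded weights. Mean filling is shown only as one possible alternative, not as a recommendation, and `zero_filled` and `mean_filled` are names made up for this example.

```r
# Same sample weights as above
cat_weight <- c(4.5, 6.2, NA, 3.8, NA)

# Option 1: fill the gaps with 0, as in the example above
zero_filled <- cat_weight
zero_filled[is.na(zero_filled)] <- 0

# Option 2 (an alternative): fill the gaps with the mean of the recorded weights
mean_filled <- cat_weight
mean_filled[is.na(mean_filled)] <- mean(cat_weight, na.rm = TRUE)

# The two zeros pull the average down
mean(zero_filled)

# Mean filling leaves the average equal to the mean of the recorded weights
mean(mean_filled)
```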

Combining Lists into a Data Frame

Now, let’s combine our lists into a data frame, which is a more complex data structure.

# Combine the vectors into a data frame
cat_health <- data.frame(name = cat_name, weight = cat_weight, potty_activity = Cat_potty)
cat_health
##      name weight potty_activity
## 1    Orca    4.5           TRUE
## 2 Charlie    6.2          FALSE
## 3    Luna     NA             NA
## 4    Paws    3.8          FALSE
## 5  Cookie     NA           TRUE
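The counting trick from earlier, sum() with is.na(), also extends to a whole data frame. As a sketch, colSums() applied to is.na() gives one count per column. The data frame is rebuilt here so the chunk runs on its own, and `na_per_column` is a name chosen for this example.

```r
# Rebuild the sample data frame so this chunk is self-contained
cat_health <- data.frame(
  name           = c("Orca", "Charlie", "Luna", "Paws", "Cookie"),
  weight         = c(4.5, 6.2, NA, 3.8, NA),
  potty_activity = c(TRUE, FALSE, NA, FALSE, TRUE)
)

# is.na() on a data frame returns a TRUE/FALSE matrix of the same shape;
# colSums() then counts the TRUE values in each column
na_per_column <- colSums(is.na(cat_health))
na_per_column
```

Here the name column has no missing values, weight has two, and potty_activity has one.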

Let’s say you want to calculate the mean weight of these cats, ignoring the ‘NA’ values. You can do this by passing the argument na.rm = TRUE to the mean() function.

Operation with ‘NA’ in the data
# Use na.rm = TRUE to ignore NA values if any exist
mean_weight <- mean(cat_health$weight, na.rm = TRUE)
mean_weight
## [1] 4.833333
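To see why na.rm matters, here is a short sketch of what mean() does when you leave it out. By default a single ‘NA’ makes the whole result ‘NA’. The variable names `mean_default` and `mean_ignoring_na` are made up for this example.

```r
# Same sample weights as above
cat_weight <- c(4.5, 6.2, NA, 3.8, NA)

# Default behaviour: any NA in the input makes the result NA
mean_default <- mean(cat_weight)
mean_default

# With na.rm = TRUE the NA values are dropped before averaging
mean_ignoring_na <- mean(cat_weight, na.rm = TRUE)
mean_ignoring_na
```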

Finally, you can use complete.cases() to check rows for ‘NA’ values. It returns TRUE for a row with no missing values and FALSE for a row that contains at least one ‘NA’.

# Use complete.cases() to check whether a row of a data frame contains any NA
luna_row <- cat_health[cat_health$name == "Luna", ]  # Get Luna's row from the data frame
luna_completion <- complete.cases(luna_row)  # TRUE only if Luna's row has no NA
luna_completion
## [1] FALSE
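The same function can check every row at once, not just Luna’s. The sketch below rebuilds the data frame so it runs on its own, then keeps only the rows with no missing values. `row_is_complete` and `complete_rows` are names chosen for this example.

```r
# Rebuild the sample data frame so this chunk is self-contained
cat_health <- data.frame(
  name           = c("Orca", "Charlie", "Luna", "Paws", "Cookie"),
  weight         = c(4.5, 6.2, NA, 3.8, NA),
  potty_activity = c(TRUE, FALSE, NA, FALSE, TRUE)
)

# One TRUE/FALSE per row: TRUE means the row has no NA in any column
row_is_complete <- complete.cases(cat_health)
row_is_complete

# Keep only the complete rows
complete_rows <- cat_health[row_is_complete, ]
complete_rows
```

With this data, Luna and Cookie are dropped because each has at least one ‘NA’, leaving Orca, Charlie, and Paws.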

When you click the Knit button, RStudio generates a document that includes both the content and the output of any embedded R code chunks.