Lab1ReadData Plot

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

cereal: This is a data frame containing information about different cereals. Each row represents one cereal and each column represents one attribute of that cereal.

cereal[,1]: This part of the code selects the entire first column of the cereal data frame. The comma , is used to separate row and column indices. By leaving the row index empty before the comma, it selects all rows. The 1 after the comma indicates that only the first column should be selected.

rownames(cereal): This part of the code accesses the row names of the cereal data frame. In R, data frames have row names that can be used to label each row. By default, row names are set to the row numbers, but they can be changed to more meaningful values.

rownames(cereal) = cereal[,1]: This line of code sets the row names of the cereal data frame to the values found in its first column. This can be useful when the first column of a data frame contains unique identifiers for each row, such as cereal names, and you want to use these identifiers as row names for easier access and readability.
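The indexing and row-name steps described above can be sketched on a small made-up data frame. The `toy` data frame and its values are illustrative assumptions, not rows from Cereal.csv.

```r
# Hypothetical toy data standing in for the cereal data frame
toy <- data.frame(
  name   = c("A_Flakes", "B_Puffs", "C_Bran"),
  sugars = c(6, 8, 5),
  stringsAsFactors = FALSE
)

# Empty row index selects all rows; 1 selects the first column
first_col <- toy[, 1]

# Default row names are the row numbers, stored as character strings
default_names <- rownames(toy)

# Use the unique identifiers in the first column as row names
rownames(toy) <- toy[, 1]

# Rows can now be looked up by name instead of by position
toy["B_Puffs", "sugars"]
```

Once the row names are set, `toy["B_Puffs", ]` and `toy[2, ]` refer to the same row.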

summary(cereal[,"sugars"]): The summary() function computes summary statistics for the “sugars” column of the cereal data frame. It returns the minimum, 1st quartile, median, mean, 3rd quartile, and maximum values for this column.

cereal[cereal==-1] = NA: This line of code replaces all “-1” values in the cereal data frame with NA (missing values). The expression cereal==-1 creates a logical matrix with the same dimensions as cereal, where each element is TRUE if the corresponding element in cereal is equal to -1 and FALSE otherwise. By using this logical matrix to index cereal, we can selectively assign NA values to all elements of cereal that are equal to -1.

summary(cereal[,"sugars"]): After redefining the “-1” values as NA, the summary() function is called again to generate the summary statistics for the “sugars” column of the cereal data frame. The statistics now exclude the values that were previously -1, because they are treated as missing, and the output gains an NA's entry that counts them.
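As a minimal sketch of the recoding step, assuming a made-up data frame in which -1 marks a missing measurement (the `toy` values are hypothetical):

```r
# Hypothetical toy data: -1 is used as a placeholder for "not recorded"
toy <- data.frame(
  name   = c("A_Flakes", "B_Puffs", "C_Bran"),
  sugars = c(6, -1, 5),
  potass = c(280, 135, -1),
  stringsAsFactors = FALSE
)

# Logical matrix with the same dimensions as toy: TRUE where a cell equals -1
is_missing <- toy == -1

# Assign NA to every flagged cell in one step
toy[is_missing] <- NA

# summary() now reports the missing value separately under NA's
summary(toy[, "sugars"])

# Functions such as mean() need na.rm = TRUE to skip the NA
mean_sugars <- mean(toy[, "sugars"], na.rm = TRUE)
```

Without the recoding, the placeholder -1 would be averaged in as if it were a real measurement and pull the mean down.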

library(ggplot2)
library(tidyr)
## Reading in the data
cereal = read.csv("https://wimr-genomics.vip.sydney.edu.au/AMED3002/data/Cereal.csv")
head(cereal)

##                        name mfr type calories protein fat sodium fiber carbo
## 1                 100%_Bran   N    C       70       4   1    130  10.0   5.0
## 2         100%_Natural_Bran   Q    C      120       3   5     15   2.0   8.0
## 3                  All-Bran   K    C       70       4   1    260   9.0   7.0
## 4 All-Bran_with_Extra_Fiber   K    C       50       4   0    140  14.0   8.0
## 5            Almond_Delight   R    C      110       2   2    200   1.0  14.0
## 6   Apple_Cinnamon_Cheerios   G    C      110       2   2    180   1.5  10.5
##   sugars potass vitamins shelf weight cups   rating
## 1      6    280       25     3      1 0.33 68.40297
## 2      8    135        0     3      1 1.00 33.98368
## 3      5    320       25     3      1 0.33 59.42551
## 4      0    330       25     3      1 0.50 93.70491
## 5      8     -1       25     3      1 0.75 34.38484
## 6     10     70       25     1      1 0.75 29.50954

#summary(cereal)
rownames(cereal)= cereal[, 1]
print(rownames(cereal))

##  [1] "100%_Bran"                             
##  [2] "100%_Natural_Bran"                     
##  [3] "All-Bran"                              
##  [4] "All-Bran_with_Extra_Fiber"             
##  [5] "Almond_Delight"                        
##  [6] "Apple_Cinnamon_Cheerios"               
##  [7] "Apple_Jacks"                           
##  [8] "Basic_4"                               
##  [9] "Bran_Chex"                             
## [10] "Bran_Flakes"                           
## [11] "Cap'n'Crunch"                          
## [12] "Cheerios"                              
## [13] "Cinnamon_Toast_Crunch"                 
## [14] "Clusters"                              
## [15] "Cocoa_Puffs"                           
## [16] "Corn_Chex"                             
## [17] "Corn_Flakes"                           
## [18] "Corn_Pops"                             
## [19] "Count_Chocula"                         
## [20] "Cracklin'_Oat_Bran"                    
## [21] "Cream_of_Wheat_(Quick)"                
## [22] "Crispix"                               
## [23] "Crispy_Wheat_&_Raisins"                
## [24] "Double_Chex"                           
## [25] "Froot_Loops"                           
## [26] "Frosted_Flakes"                        
## [27] "Frosted_Mini-Wheats"                   
## [28] "Fruit_&_Fibre_Dates,_Walnuts,_and_Oats"
## [29] "Fruitful_Bran"                         
## [30] "Fruity_Pebbles"                        
## [31] "Golden_Crisp"                          
## [32] "Golden_Grahams"                        
## [33] "Grape_Nuts_Flakes"                     
## [34] "Grape-Nuts"                            
## [35] "Great_Grains_Pecan"                    
## [36] "Honey_Graham_Ohs"                      
## [37] "Honey_Nut_Cheerios"                    
## [38] "Honey-comb"                            
## [39] "Just_Right_Crunchy__Nuggets"           
## [40] "Just_Right_Fruit_&_Nut"                
## [41] "Kix"                                   
## [42] "Life"                                  
## [43] "Lucky_Charms"                          
## [44] "Maypo"                                 
## [45] "Muesli_Raisins,_Dates,_&_Almonds"      
## [46] "Muesli_Raisins,_Peaches,_&_Pecans"     
## [47] "Mueslix_Crispy_Blend"                  
## [48] "Multi-Grain_Cheerios"                  
## [49] "Nut&Honey_Crunch"                      
## [50] "Nutri-Grain_Almond-Raisin"             
## [51] "Nutri-grain_Wheat"                     
## [52] "Oatmeal_Raisin_Crisp"                  
## [53] "Post_Nat._Raisin_Bran"                 
## [54] "Product_19"                            
## [55] "Puffed_Rice"                           
## [56] "Puffed_Wheat"                          
## [57] "Quaker_Oat_Squares"                    
## [58] "Quaker_Oatmeal"                        
## [59] "Raisin_Bran"                           
## [60] "Raisin_Nut_Bran"                       
## [61] "Raisin_Squares"                        
## [62] "Rice_Chex"                             
## [63] "Rice_Krispies"                         
## [64] "Shredded_Wheat"                        
## [65] "Shredded_Wheat_'n'Bran"                
## [66] "Shredded_Wheat_spoon_size"             
## [67] "Smacks"                                
## [68] "Special_K"                             
## [69] "Strawberry_Fruit_Wheats"               
## [70] "Total_Corn_Flakes"                     
## [71] "Total_Raisin_Bran"                     
## [72] "Total_Whole_Grain"                     
## [73] "Triples"                               
## [74] "Trix"                                  
## [75] "Wheat_Chex"                            
## [76] "Wheaties"                              
## [77] "Wheaties_Honey_Gold"

colnames(cereal)

##  [1] "name"     "mfr"      "type"     "calories" "protein"  "fat"     
##  [7] "sodium"   "fiber"    "carbo"    "sugars"   "potass"   "vitamins"
## [13] "shelf"    "weight"   "cups"     "rating"

summary(cereal[,"sugars"])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -1.000   3.000   7.000   6.922  11.000  15.000

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.1     ✔ purrr     1.0.1
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

cereal %>%
  filter(carbo==min(carbo, na.rm = TRUE)) %>%
  select(name)

##                          name
## Quaker_Oatmeal Quaker_Oatmeal

## redefine "-1" as NA
cereal[cereal==-1] = NA

summary(cereal[,"sugars"])

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.000   7.000   7.026  11.000  15.000       1

ggplot(cereal, aes(x=mfr, y = carbo, fill=mfr)) + geom_boxplot()

## Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).

## We can do this by grouping by manufacturer and then summarising the sugars variable with mean.

cereal %>%
  group_by(mfr) %>%
  summarise(meanSugars = mean(sugars, na.rm = TRUE))

## # A tibble: 7 × 2
##   mfr   meanSugars
##   <chr>      <dbl>
## 1 A           3   
## 2 G           7.95
## 3 K           7.57
## 4 N           1.83
## 5 P           8.78
## 6 Q           6.14
## 7 R           6.12

cereal %>%
  group_by(mfr) %>%
  summarise(meanSugars = mean(sugars, na.rm = TRUE), medianSugars = median(sugars, na.rm = TRUE)) %>%
  pivot_longer(cols=c("meanSugars", "medianSugars"), names_to="summary", values_to="sugars") %>%
  ggplot(aes(x=mfr, y=sugars, colour=summary, group=summary)) + geom_point() + geom_line()

cereal %>%
  filter(shelf == 2) %>%
  select(name)

##                                            name
## Apple_Jacks                         Apple_Jacks
## Cap'n'Crunch                       Cap'n'Crunch
## Cinnamon_Toast_Crunch     Cinnamon_Toast_Crunch
## Cocoa_Puffs                         Cocoa_Puffs
## Corn_Pops                             Corn_Pops
## Count_Chocula                     Count_Chocula
## Cream_of_Wheat_(Quick)   Cream_of_Wheat_(Quick)
## Froot_Loops                         Froot_Loops
## Frosted_Mini-Wheats         Frosted_Mini-Wheats
## Fruity_Pebbles                   Fruity_Pebbles
## Golden_Grahams                   Golden_Grahams
## Honey_Graham_Ohs               Honey_Graham_Ohs
## Kix                                         Kix
## Life                                       Life
## Lucky_Charms                       Lucky_Charms
## Maypo                                     Maypo
## Nut&Honey_Crunch               Nut&Honey_Crunch
## Raisin_Bran                         Raisin_Bran
## Smacks                                   Smacks
## Strawberry_Fruit_Wheats Strawberry_Fruit_Wheats
## Trix                                       Trix

respiratory <- read.delim("https://wimr-genomics.vip.sydney.edu.au/AMED3002/data/respiratory.txt", sep = "\t")

tab <- table(respiratory$treatment, respiratory$status)
tab

##            
##             good poor
##   placebo    127  158
##   treatment  172   98

chisq.test(tab)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab
## X-squared = 19.682, df = 1, p-value = 9.148e-06

test = chisq.test(tab)
test$expected >= 5

##            
##             good poor
##   placebo   TRUE TRUE
##   treatment TRUE TRUE
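The check on the expected counts and the reported test statistic can be reproduced by hand. This is a sketch that rebuilds the table from the counts printed above rather than re-reading the file; the object name `obs` is an assumption introduced here.

```r
# Observed counts copied from the contingency table printed above
obs <- matrix(
  c(127, 158,
    172,  98),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("placebo", "treatment"), c("good", "poor"))
)

# Expected count under independence = (row total * column total) / grand total
n <- sum(obs)
expected <- outer(rowSums(obs), colSums(obs)) / n

# The chi-squared approximation is usually considered acceptable
# when every expected count is at least 5
all_large_enough <- all(expected >= 5)

# Yates' continuity correction subtracts 0.5 from each |observed - expected|
x_squared <- sum((abs(obs - expected) - 0.5)^2 / expected)

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)
```

The value of `x_squared` agrees with the X-squared = 19.682 reported by chisq.test(tab).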

OR <- (tab[1,1]*tab[2,2])/(tab[2,1]*tab[1,2])
OR

## [1] 0.4579776
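The cross-product formula is equivalent to dividing the odds of a good status in the placebo group by the odds in the treatment group. A short sketch, again rebuilding the table from the printed counts (the names `obs`, `odds_placebo` and `odds_treatment` are introduced here for illustration):

```r
# Observed counts copied from the contingency table printed above
obs <- matrix(
  c(127, 158,
    172,  98),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("placebo", "treatment"), c("good", "poor"))
)

# Odds of a "good" status within each group
odds_placebo   <- obs["placebo", "good"]   / obs["placebo", "poor"]
odds_treatment <- obs["treatment", "good"] / obs["treatment", "poor"]

# Placebo relative to treatment: same value as the cross-product formula
or_placebo_vs_treatment <- odds_placebo / odds_treatment

# Treatment relative to placebo is simply the reciprocal
or_treatment_vs_placebo <- 1 / or_placebo_vs_treatment
```

Because the placebo row comes first in the table, the value of about 0.46 says the odds of a good status on placebo are a little under half those on treatment. Read the other way round, the odds on treatment are roughly 2.2 times those on placebo.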

Including Plots

You can also embed plots, for example:

library(tidyverse): This line loads the tidyverse library, which is a collection of R packages designed for data manipulation and visualization. It includes packages such as dplyr, ggplot2, tidyr, and readr.

cereal %>%: The %>% operator is a pipe from the magrittr package, which is also part of the tidyverse. It is used to chain together multiple functions. In this case, it takes the cereal data frame as input and passes it as the first argument to the next function, filter().

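filter(carbo==min(carbo, na.rm = TRUE)): The filter() function from the dplyr package is used to subset the rows of the cereal data frame based on a condition. In this case, the condition is carbo == min(carbo, na.rm = TRUE), which selects the rows where the “carbo” value is equal to the minimum “carbo” value in the data frame. The na.rm = TRUE argument within the min() function is used to remove NA values when calculating the minimum. Note that in this document the chunk runs before the -1 values are recoded as NA, so the minimum it finds is most likely the placeholder -1 rather than a real measurement. Running it after the recoding step would return the cereal with the lowest recorded carbohydrate value.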

select(name): The select() function from the dplyr package is used to choose a specific column from the filtered data frame. In this case, the “name” column is selected, which contains the names of the cereals.
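A minimal sketch of the filter-then-select pattern, assuming dplyr is installed and using a made-up data frame (the `toy` values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data with one missing carbo value
toy <- data.frame(
  name  = c("A_Flakes", "B_Puffs", "C_Bran"),
  carbo = c(14, NA, 5),
  stringsAsFactors = FALSE
)

lowest <- toy %>%
  # Keep rows whose carbo equals the smallest non-missing carbo value;
  # rows where the comparison is NA are dropped by filter()
  filter(carbo == min(carbo, na.rm = TRUE)) %>%
  # Keep only the name column of the remaining rows
  select(name)

lowest
```

If several rows tie for the minimum, filter() keeps all of them.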

cereal %>%: The %>% operator is used to pass the cereal data frame as input to the following functions.

group_by(mfr): The group_by() function is used to group the data frame by the “mfr” (manufacturer) column.

summarise(meanSugars = mean(sugars, na.rm = TRUE), medianSugars = median(sugars, na.rm = TRUE)): The summarise() function is used to calculate summary statistics for each group (manufacturer). Here, the mean and median of the “sugars” column are calculated, ignoring NA values with na.rm = TRUE.
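The grouping and summarising steps can be tried on a small made-up data frame. This sketch assumes dplyr is installed, and the `toy` values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: two manufacturers, one missing sugars value
toy <- data.frame(
  mfr    = c("A", "A", "B", "B"),
  sugars = c(2, 4, NA, 9),
  stringsAsFactors = FALSE
)

by_mfr <- toy %>%
  # One group per manufacturer
  group_by(mfr) %>%
  # One output row per group; na.rm = TRUE skips the missing value in group B
  summarise(
    meanSugars   = mean(sugars, na.rm = TRUE),
    medianSugars = median(sugars, na.rm = TRUE)
  )

by_mfr
```

Leaving out na.rm = TRUE would make both summaries NA for manufacturer B.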

pivot_longer(cols=c("meanSugars", "medianSugars"), names_to="summary", values_to="sugars"): The pivot_longer() function is used to reshape the data from wide format to long format, which is suitable for plotting with ggplot2. The “meanSugars” and “medianSugars” columns are converted to a single “sugars” column, and a new “summary” column is created to indicate whether the value is a mean or median.
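To see what the reshaping does, here is a sketch on a two-row summary table. It assumes tidyr is installed; the `wide` data frame and its numbers are made up for illustration.

```r
library(tidyr)

# Hypothetical wide table: one row per manufacturer, one column per statistic
wide <- data.frame(
  mfr          = c("A", "B"),
  meanSugars   = c(3, 8),
  medianSugars = c(3, 7.5),
  stringsAsFactors = FALSE
)

# Stack the two statistic columns: their names go into "summary",
# their values go into "sugars"
long <- pivot_longer(
  wide,
  cols      = c("meanSugars", "medianSugars"),
  names_to  = "summary",
  values_to = "sugars"
)

long
```

The long table has one row per manufacturer and statistic, which is the layout ggplot2 needs in order to map `summary` to colour.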

ggplot(aes(x=mfr, y=sugars, colour=summary, group=summary)): The ggplot() function initializes a new ggplot object with the specified aesthetics (x, y, color, and group). The x-axis represents the manufacturer, the y-axis represents the sugar values, the color represents the summary statistic (mean or median), and the group aesthetic is used to draw lines connecting points with the same summary statistic.

geom_point() + geom_line(): The geom_point() function adds a scatterplot layer, and the geom_line() function adds lines connecting the points. The resulting plot displays the mean and median sugar values for each manufacturer, with distinct colors for each summary statistic and lines connecting points of the same statistic.

respiratory <- read.delim("https://wimr-genomics.vip.sydney.edu.au/AMED3002/data/respiratory.txt", sep = "\t"): The read.delim() function is used to read a tab-separated file from the provided URL. The sep = "\t" argument specifies that the delimiter between values in the file is a tab character. The resulting data frame is assigned to the variable respiratory.

tab <- table(respiratory$treatment, respiratory$status): The table() function is used to create a contingency table based on two categorical variables, “treatment” and “status”, in the respiratory data frame. The resulting table shows the frequency distribution of the “status” variable across different “treatment” groups.

tab: This line of code prints the contingency table to the console.
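A minimal sketch of how table() cross-tabulates two categorical columns, using a made-up five-row data frame in place of the respiratory data (all values are hypothetical):

```r
# Hypothetical toy data with the same two column names as the respiratory data
toy <- data.frame(
  treatment = c("placebo", "placebo", "treatment", "treatment", "treatment"),
  status    = c("good", "poor", "good", "good", "poor"),
  stringsAsFactors = FALSE
)

# Rows come from the first argument, columns from the second
tab_toy <- table(toy$treatment, toy$status)
tab_toy

# Individual cells can be read by row and column label
tab_toy["treatment", "good"]
```

Each cell counts the rows that share that combination of treatment and status, so the cells sum to the number of rows in the data frame.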

Wdenhong Ma

2023-04-15
