October 25, 2023

Sponsored scholarship by DOST-PCIEERD and MOOCSX Philippines

Approval of Central Mindanao University to join the training program of the DOST-PCIEERD Scholarship and MOOCSX Philippines through Coursera

Objectives

  • Provide a history and overview of R
  • Introduce basic commands in R
  • Introduce R Script and R Markdown
  • Install some R packages
  • Illustrate: generating data in R, working with data in R, and importing Excel data into R

History and Overview of R

  • 1976: The origins of R can be traced to a programming language called “S,” which John Chambers and his colleagues began developing at Bell Laboratories. S was designed for data analysis and graphics.

  • Early 1990s: Ross Ihaka and Robert Gentleman, both statisticians at the University of Auckland in New Zealand, began developing R as an implementation of the S language. In 1995, R was released as free software under the GNU General Public License. Their goal was to create a free, accessible, and extensible statistical software tool.

  • 1997: R version 0.50 was released, marking the first public release of the R language. It included basic functionality for data manipulation, statistical modeling, and graphics.

  • 2000: R version 1.0.0 was released, the first release considered stable enough for production use. The R community started to grow, and contributions from developers worldwide began to enhance the language’s capabilities and package ecosystem.

  • 2004: R version 2.0.0 was released, introducing significant improvements and new features. This release marked a major milestone in the development of R.

  • 2009: The Comprehensive R Archive Network (CRAN), established in 1997, was by this time the primary repository for R packages. CRAN provides a centralized platform for developers to share and distribute their R packages.

  • 2011: The RStudio Integrated Development Environment (IDE) was released. RStudio offers a user-friendly interface, code editing features, debugging tools, and enhanced data visualization capabilities, making it a popular choice among R users.

  • 2016: R ranked as the top programming language for data science in the annual “Kaggle Data Science Survey,” solidifying its position as a leading tool in the field.

  • Present: R continues to evolve and thrive, with regular updates and new releases. The R community remains active, contributing to the development of new packages, improving performance, and expanding the language’s capabilities.

The R Installation

  • Download the R installer for your operating system from CRAN: http://cran.r-project.org/
  • At the time of this session (October 2023), the latest version of R was 4.3.1

  • Once the installation is done, start R by clicking the Desktop icon for R

The R Console

  • When you open an R session (i.e. start the R program), the R console opens.
  • Along the top of the window is a limited set of menus for tasks such as opening, loading, and saving script windows, loading and saving your workspace, and installing packages.

The RStudio

  • RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

  • RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).

Basic R commands

R can be used as an interactive calculator.

Addition

5+7
## [1] 12

Subtraction

10-5
## [1] 5

Storing a result in a variable

x <- 5 + 7

Printing the variable x

x
## [1] 12
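
The same calculator-style usage extends to the other arithmetic operators. The following is a minimal sketch using only base R; the variable names `y` and `z` are illustrative and do not come from the original slides.

```r
# Multiplication, division, and exponentiation
6 * 7      # 42
20 / 4     # 5
2^3        # 8

# Integer division and remainder
17 %/% 5   # 3
17 %% 5    # 2

# Store a result in a variable and reuse it in a later computation
y <- 5 + 7
z <- y * 2
z          # 24
```

Typing a variable name on its own line prints its current value, which is how `z` is displayed in the last step.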

Introduction to R Script and R Markdown

R Script

R Markdown

Installing packages
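
One possible way to install the packages used in the examples on these slides is sketched below. It assumes an internet connection and the default CRAN mirror. The vector name `pkgs` and the choice to install only missing packages are illustrative, not part of the original session; `ggpubr` is listed because it provides `ggboxplot()`.

```r
# Packages used in the examples on these slides
pkgs <- c("readxl", "dplyr", "ggpubr")

# Keep only the packages that are not yet installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Install from CRAN only when something is missing and R is running interactively
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}

# A package is installed once, but must be loaded in every new session, e.g.:
# library(readxl)
```

In RStudio the same result can usually be reached through the Packages pane, but the scripted form is easier to rerun on another computer.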

Working with data in R: generated data and Excel data

library(readxl)
## Warning: package 'readxl' was built under R version 4.2.3
Sample <- read_excel("D:/PSQ 2023/CMU Webinar/Data.xlsx")
Sample
## # A tibble: 15 × 2
##    Time    Accuracy
##    <chr>      <dbl>
##  1 Ten           95
##  2 Ten           90
##  3 Ten           65
##  4 Ten           95
##  5 Ten           85
##  6 Fifteen       70
##  7 Fifteen       65
##  8 Fifteen       50
##  9 Fifteen       55
## 10 Fifteen       70
## 11 Twenty        45
## 12 Twenty        55
## 13 Twenty        45
## 14 Twenty        40
## 15 Twenty        70
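
The same 15 rows can also be generated directly in R, which is useful when the Excel file is not available. This is a minimal sketch in base R that types in the values printed above; the object name `Sample_df` is illustrative and is chosen so that it does not overwrite `Sample`.

```r
# Re-create the 15 rows printed above as a base R data frame
Sample_df <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),
  Accuracy = c(95, 90, 65, 95, 85,   # Ten
               70, 65, 50, 55, 70,   # Fifteen
               45, 55, 45, 40, 70)   # Twenty
)

dim(Sample_df)       # 15 rows and 2 columns
head(Sample_df, 3)   # first three rows
str(Sample_df)       # column types: character and numeric
```

Because `read_excel()` returns a tibble and `data.frame()` returns a plain data frame, the printed layout differs slightly, but the values are the same.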

Some Statistics

Computation for Mean

mean(Sample$Accuracy)
## [1] 66.33333

Computation for Standard Deviation

sd(Sample$Accuracy)
## [1] 18.36793

Other Summary Statistics

summary(Sample$Accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   40.00   52.50   65.00   66.33   77.50   95.00

Mean and standard deviation by group

(aggregate(Sample$Accuracy, list(Sample$Time), mean))
##   Group.1  x
## 1 Fifteen 62
## 2     Ten 86
## 3  Twenty 51
(aggregate(Sample$Accuracy, list(Sample$Time), sd))
##   Group.1         x
## 1 Fifteen  9.082951
## 2     Ten 12.449900
## 3  Twenty 11.937336
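
The default column names `Group.1` and `x` in the output above are not very descriptive. As a sketch of one alternative, the formula interface of `aggregate()` keeps the original variable names. The data frame `acc` below is an illustrative re-creation of the values printed earlier, so that the example runs without the Excel file.

```r
# Illustrative copy of the Time/Accuracy data shown earlier
acc <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),
  Accuracy = c(95, 90, 65, 95, 85,
               70, 65, 50, 55, 70,
               45, 55, 45, 40, 70)
)

# Formula interface: "Accuracy summarized by Time"
means <- aggregate(Accuracy ~ Time, data = acc, FUN = mean)
sds   <- aggregate(Accuracy ~ Time, data = acc, FUN = sd)

means   # columns are named Time and Accuracy
sds
```

The group means and standard deviations are the same as those computed with `list(Sample$Time)`; only the column labels change.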

Example of Graphical Presentation

library(ggpubr)  # ggboxplot() comes from the ggpubr package
ggboxplot(Sample, x = "Time", y = "Accuracy", fill = "Time")

Using the dplyr package

library(dplyr)
Sample %>%
  group_by(Time) %>%
  summarize(Mean = mean(Accuracy), SD = sd(Accuracy))
## # A tibble: 3 × 3
##   Time     Mean    SD
##   <chr>   <dbl> <dbl>
## 1 Fifteen    62  9.08
## 2 Ten        86 12.4 
## 3 Twenty     51 11.9 

Additional Example

library(readxl)
Sample2 <- read_excel("D:/PSQ 2023/CMU Webinar/Data1.xlsx")
## New names:
## • `Relax3` -> `Relax3...40`
## • `Relax3` -> `Relax3...41`
## • `Education` -> `Education...50`
## • `` -> `...51`
## • `Income` -> `Income...52`
## • `` -> `...71`
## • `Education` -> `Education...72`
## • `` -> `...73`
## • `` -> `...74`
## • `` -> `...75`
## • `Income` -> `Income...76`
## • `` -> `...77`
## • `` -> `...79`
## • `` -> `...80`
## • `` -> `...81`
## • `` -> `...82`
Sample2
## # A tibble: 145 × 82
##      No. Gender   age Reappraisal1 Reappraisal2 Reappraisal3 Reappraisal4
##    <dbl> <chr>  <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
##  1     1 male      43            4            2            2            2
##  2     2 male      40            3            2            4            1
##  3     3 male      60            3            2            2            2
##  4     4 male      50            2            2            4            2
##  5     5 male      42            4            4            2            4
##  6     6 female    42            2            4            4            3
##  7     7 male      54            4            2            3            3
##  8     8 male      40            2            2            2            2
##  9     9 male      56            2            2            3            2
## 10    10 male      43            4            2            4            4
## # ℹ 135 more rows
## # ℹ 75 more variables: Reappraisal5 <dbl>, ReappraisalMean <dbl>,
## #   SocialSupport1 <dbl>, SocialSupport2 <dbl>, SocialSupport3 <dbl>,
## #   SocialSupportMean <dbl>, ProbSolving1 <dbl>, ProbSolving2 <dbl>,
## #   ProbSolving3 <dbl>, ProbSolving4 <dbl>, ProbSolvingMean <dbl>, Rel1 <dbl>,
## #   Rel2 <dbl>, Rel3 <dbl>, Rel4 <dbl>, RelMean <dbl>, Tol1 <dbl>, Tol2 <dbl>,
## #   TolMean <dbl>, Emo1 <dbl>, Emo2 <dbl>, Emo3 <dbl>, Emo4 <dbl>, …

Number of rows and columns in a dataset

dim(Sample2)
## [1] 145  82

Mean and Standard Deviation of Age

library(dplyr)
Sample2 %>% 
    summarize(`Mean Age` = mean(age), `SD of Age` = sd(age))
## # A tibble: 1 × 2
##   `Mean Age` `SD of Age`
##        <dbl>       <dbl>
## 1       49.3        8.39

Age classified by Gender

library(dplyr)
Sample2 %>% 
    group_by(Gender)%>%
    summarize(`Mean Age` = mean(age), `SD of Age` = sd(age))
## # A tibble: 2 × 3
##   Gender `Mean Age` `SD of Age`
##   <chr>       <dbl>       <dbl>
## 1 female       46.0        8.82
## 2 male         50.5        7.92

Distribution of Age

library(dplyr)
Sample2%>%
  mutate(Agecode=ifelse(age<=50, "at most 50 years old", "More than 50 years old"))%>%
  group_by(Agecode)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 2 × 3
##   Agecode                count Percentage
##   <chr>                  <int>      <dbl>
## 1 More than 50 years old    69       47.6
## 2 at most 50 years old      76       52.4
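
The same two-way age classification can be sketched in base R with `cut()`. The example below uses only the ten ages visible in the first rows printed earlier, not the full 145-row dataset, so the counts differ from the output above. The object names are illustrative.

```r
# Ages from the first ten rows printed earlier (not the full dataset)
age <- c(43, 40, 60, 50, 42, 42, 54, 40, 56, 43)

# cut() uses right-closed intervals by default, so 50 falls in "at most 50 years old"
Agecode <- cut(age,
               breaks = c(-Inf, 50, Inf),
               labels = c("at most 50 years old", "More than 50 years old"))

counts <- table(Agecode)
percentage <- round(100 * prop.table(counts), 2)

data.frame(Agecode = names(counts),
           count = as.vector(counts),
           Percentage = as.vector(percentage))
```

With more than two age bands, adding further break points to `breaks` is usually easier to read than nesting several `ifelse()` calls.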

Socio-Demographic Profile

Gender

library(dplyr)
Sample2%>%
  group_by(Gender)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 2 × 3
##   Gender count Percentage
##   <chr>  <int>      <dbl>
## 1 female    40       27.6
## 2 male     105       72.4

Education

library(dplyr)
Sample2%>%
  group_by(Education)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 14 × 3
##    Education             count Percentage
##    <chr>                 <int>      <dbl>
##  1 Colege graduate           1       0.69
##  2 College graduate         21      14.5 
##  3 College level            16      11.0 
##  4 Elementary graduate      17      11.7 
##  5 Elementary level         21      14.5 
##  6 Ementary level            1       0.69
##  7 High School graduate      4       2.76
##  8 High chool graduate       1       0.69
##  9 High schoo graduate       1       0.69
## 10 High school graduate     22      15.2 
## 11 High school level        37      25.5 
## 12 High scool level          1       0.69
## 13 High sschool graduate     1       0.69
## 14 Highschool level          1       0.69

Education

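# Correct the misspelled education labels, then order the levels from lowest to highest
Sample3 <- Sample2 %>%
  mutate(Educationcode = recode(Education,
                                "Colege graduate" = "College graduate",
                                "Ementary level" = "Elementary level",
                                "High schoo graduate" = "High school graduate",
                                "High School graduate" = "High school graduate",
                                "High sschool graduate" = "High school graduate",
                                "High chool graduate" = "High school graduate",
                                "High scool level" = "High school level",
                                "Highschool level " = "High school level",
                                "Highschool level" = "High school level"),
         # An explicit factor matches the <fct> column type and row order in the output that follows
         Educationcode = factor(Educationcode,
                                levels = c("Elementary level", "Elementary graduate",
                                           "High school level", "High school graduate",
                                           "College level", "College graduate")))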

Education

library(dplyr)
Sample3%>%
  group_by(Educationcode)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 6 × 3
##   Educationcode        count Percentage
##   <fct>                <int>      <dbl>
## 1 Elementary level        22       15.2
## 2 Elementary graduate     17       11.7
## 3 High school level       39       26.9
## 4 High school graduate    29       20  
## 5 College level           16       11.0
## 6 College graduate        22       15.2
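
The recoding step can also be written as a lookup table in base R, which keeps all corrections in one place. This is a sketch under stated assumptions: the names `fixes`, `clean_education`, and `raw` are hypothetical, and the misspellings are the ones that appear in the first Education frequency table.

```r
# Misspelled label -> corrected label, taken from the Education frequency table
fixes <- c(
  "Colege graduate"       = "College graduate",
  "Ementary level"        = "Elementary level",
  "High School graduate"  = "High school graduate",
  "High chool graduate"   = "High school graduate",
  "High schoo graduate"   = "High school graduate",
  "High sschool graduate" = "High school graduate",
  "High scool level"      = "High school level",
  "Highschool level"      = "High school level"
)

clean_education <- function(x) {
  x <- trimws(x)                    # drop stray leading and trailing spaces
  hit <- x %in% names(fixes)        # which values are known misspellings
  x[hit] <- unname(fixes[x[hit]])   # replace only those values
  x
}

# Small illustrative input, including a label with a trailing space
raw <- c("Colege graduate", "High School graduate", "Highschool level ", "College level")
clean_education(raw)
```

Calling `trimws()` first means that labels differing only by surrounding spaces do not need their own entries in the table.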

Thank you and God bless