July 25, 2024

Talk outline

  • What Others Say about R
  • My Journey in Learning R
  • History and Overview of R
  • Why You Should Learn the R Programming Language (Pros & Cons)
  • Basic commands in R
  • R Script and R Markdown
  • Installing R packages
  • Illustrations: generating data in R, working with data in R, and importing Excel data into R

What others say about R

My Journey in Learning R

History and Overview of R

  • R is a programming language and software that is becoming increasingly popular in the disciplines of statistics and data science.

History and Overview of R

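  • R is a dialect of the S programming language. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland in the early 1990s and became free software under the GNU General Public License in 1995. The first stable version (1.0.0) was released in 2000.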

  • 2016: R ranked as the top programming language for data science in the annual “Kaggle Data Science Survey,” solidifying its position as a leading tool in the field.

Why You Should Learn the R Programming Language (Pros & Cons)

The pros:

  • R is free
  • R’s popularity is growing – More and more people will use it
  • Almost all statistical methods are available in R
  • New methods are implemented in add-on packages quickly
  • Algorithms for packages and functions are publicly available (transparency and reproducibility)

Why You Should Learn the R Programming Language (Pros & Cons)

The pros:

  • R provides a huge variety of graphical outputs
  • R is very flexible – Essentially everything can be modified for your personal needs
  • R runs on all major operating systems (e.g. Windows, macOS, or Linux)
  • R has a huge community that is organized in forums to help each other (e.g. Stack Overflow)
  • R is fun

Why You Should Learn the R Programming Language (Pros & Cons)

The cons:

  • Relatively high learning burden at the beginning (even though it’s worth it)
  • No systematic validation of new packages and functions
  • No company in the background that takes responsibility for errors in the code (this is especially important for public institutions)
  • R is almost exclusively based on programming (no extensive drop-down menus such as in SPSS)
  • R can have problems with computationally intensive tasks (only important for advanced users)

The R Installation

  • Download the R installer from CRAN (the Comprehensive R Archive Network) or from another dependable source. The URL is http://cran.r-project.org/
  • The latest version of R was 4.3.1 when this talk was prepared; check CRAN for the current release

The R Installation

  • Once the installation is done, start R by clicking the Desktop icon for R

The R Console

  • When you open an R session (i.e. start the R program), the R console opens.
  • Along the top of the window is a limited set of menus, which can be used for tasks such as opening, loading and saving script windows; loading and saving your workspace; and installing packages.

The R Logo

RStudio

  • RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

  • RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).

Basic R commands

R can be used as an interactive calculator

Addition

5+7
## [1] 12

Subtraction

10-5
## [1] 5

Storing a result in a variable

x <- 5 + 7

Printing the value of the variable x

x
## [1] 12
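
The other arithmetic operators behave the same way at the console. The lines below are a small sketch of a few of them, plus a vector built with c(); the scores vector is a hypothetical example, not data from this talk.

```r
# Other arithmetic operators work the same way at the console
6 * 7        # multiplication
20 / 4       # division
2^3          # exponentiation
sqrt(16)     # built-in square-root function

# c() combines values into a vector; arithmetic is applied element-wise
scores <- c(95, 90, 65)   # hypothetical values, for illustration only
scores + 5                # adds 5 to every element
mean(scores)              # functions can take a whole vector as input
```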

Introduction on R Script and R Markdown

R Script

R Markdown

Installing packages
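
Packages can also be installed from the console with install.packages(). The sketch below wraps that call in a small helper, install_if_missing(), which is a hypothetical name introduced here for illustration; it installs only the packages that are not yet available. The package names in the usage comment are the ones loaded later in this talk.

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when a package is already installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)   # downloads from CRAN; needs internet access
  }
  invisible(missing_pkgs)            # return the names that had to be installed
}

# Usage (run once per computer):
# install_if_missing(c("readxl", "dplyr", "ggpubr"))
```

Once installed, a package still has to be loaded with library() in every new R session before its functions can be used.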

Working with data in R: generated data and Excel data

library(readxl)
Sample <- read_excel("D:/Seminar about R/Data.xlsx")
Sample
## # A tibble: 15 × 2
##    Time    Accuracy
##    <chr>      <dbl>
##  1 Ten           95
##  2 Ten           90
##  3 Ten           65
##  4 Ten           95
##  5 Ten           85
##  6 Fifteen       70
##  7 Fifteen       65
##  8 Fifteen       50
##  9 Fifteen       55
## 10 Fifteen       70
## 11 Twenty        45
## 12 Twenty        55
## 13 Twenty        45
## 14 Twenty        40
## 15 Twenty        70
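
If the Excel file is not available, the same 15 rows can be generated directly in R. This is a sketch that types in the values printed above; it produces a base R data.frame rather than the tibble that read_excel() returns, which makes no difference for the computations that follow.

```r
# Recreate the printed data by hand (values copied from the output above)
Sample <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),   # 5 rows per group
  Accuracy = c(95, 90, 65, 95, 85,    # Ten
               70, 65, 50, 55, 70,    # Fifteen
               45, 55, 45, 40, 70)    # Twenty
)

str(Sample)    # check the structure: 15 observations of 2 variables
head(Sample)   # show the first six rows
```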

Computation for Mean

mean(Sample$Accuracy)
## [1] 66.33333

Computation for Standard Deviation

sd(Sample$Accuracy)
## [1] 18.36793

Summary Statistics

summary(Sample$Accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   40.00   52.50   65.00   66.33   77.50   95.00

Computation of the mean by group

(aggregate(Sample$Accuracy, list(Sample$Time), mean))
##   Group.1  x
## 1 Fifteen 62
## 2     Ten 86
## 3  Twenty 51

Computation of the standard deviation by group

(aggregate(Sample$Accuracy, list(Sample$Time), sd))
##   Group.1         x
## 1 Fifteen  9.082951
## 2     Ten 12.449900
## 3  Twenty 11.937336
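
aggregate() also accepts a formula, which keeps the original column names in the result instead of Group.1 and x. The sketch below recreates the data by hand so that it runs on its own; group_means and group_sds are names chosen here for illustration.

```r
# Data typed in from the printed table so the example is self-contained
Sample <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),
  Accuracy = c(95, 90, 65, 95, 85, 70, 65, 50, 55, 70, 45, 55, 45, 40, 70)
)

# Formula interface: "Accuracy by Time"
group_means <- aggregate(Accuracy ~ Time, data = Sample, FUN = mean)
group_sds   <- aggregate(Accuracy ~ Time, data = Sample, FUN = sd)

group_means   # columns are named Time and Accuracy
group_sds
```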

The previous computations using the dplyr package

library(dplyr)
Sample %>%
  summarize(Mean = mean(Accuracy), SD = sd(Accuracy))
## # A tibble: 1 × 2
##    Mean    SD
##   <dbl> <dbl>
## 1  66.3  18.4

The previous computations by group using the dplyr package

library(dplyr)
Sample %>%
  group_by(Time) %>%
  summarize(Mean = mean(Accuracy), SD = sd(Accuracy))
## # A tibble: 3 × 3
##   Time     Mean    SD
##   <chr>   <dbl> <dbl>
## 1 Fifteen    62  9.08
## 2 Ten        86 12.4
## 3 Twenty     51 11.9
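
The groups are listed alphabetically because Time is stored as text. One way to get them in their natural order is to convert Time to a factor with explicit levels before grouping. This is a sketch with the data typed in by hand; ordered_summary is a name chosen here for illustration.

```r
library(dplyr)

# Data typed in from the printed table so the example is self-contained
Sample <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),
  Accuracy = c(95, 90, 65, 95, 85, 70, 65, 50, 55, 70, 45, 55, 45, 40, 70)
)

ordered_summary <- Sample %>%
  # the order of the levels decides the order of the groups in the output
  mutate(Time = factor(Time, levels = c("Ten", "Fifteen", "Twenty"))) %>%
  group_by(Time) %>%
  summarize(Mean = mean(Accuracy), SD = sd(Accuracy))

ordered_summary
```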

Additional Example

library(readxl)
Sample2 <- read_excel("D:/Seminar About R/Data1.xlsx")
## New names:
## • `Relax3` -> `Relax3...40`
## • `Relax3` -> `Relax3...41`
## • `Education` -> `Education...50`
## • `` -> `...51`
## • `Income` -> `Income...52`
## • `` -> `...71`
## • `Education` -> `Education...72`
## • `` -> `...73`
## • `` -> `...74`
## • `` -> `...75`
## • `Income` -> `Income...76`
## • `` -> `...77`
## • `` -> `...79`
## • `` -> `...80`
## • `` -> `...81`
## • `` -> `...82`

Number of rows and columns in a dataset

dim(Sample2)
## [1] 145  82

Mean and Standard Deviation of Age

library(dplyr)
Sample2 %>% 
    summarize(`Mean Age` = mean(age), `SD of Age` = sd(age))
## # A tibble: 1 × 2
##   `Mean Age` `SD of Age`
##        <dbl>       <dbl>
## 1       49.3        8.39

Recoding Age into Two Categories

library(dplyr)
Sample3<-Sample2%>%
  mutate(Agecode=ifelse(age<=50, "at most 50 years old", "More than 50 years old"))
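
ifelse() is convenient for two categories. The base R function cut() gives the same split and extends to more categories by adding break points. The sketch below uses a short hypothetical age vector, not the seminar data.

```r
# Hypothetical ages, for illustration only
age <- c(35, 50, 51, 62)

# Intervals are closed on the right by default, so 50 falls in the first group
Agecode <- cut(
  age,
  breaks = c(-Inf, 50, Inf),
  labels = c("at most 50 years old", "More than 50 years old")
)

data.frame(age, Agecode)   # show each age next to its category
table(Agecode)             # count the ages in each category
```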

Distribution of Age

library(dplyr)
Sample3%>%
  group_by(Agecode)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 2 × 3
##   Agecode                count Percentage
##   <chr>                  <int>      <dbl>
## 1 More than 50 years old    69       47.6
## 2 at most 50 years old      76       52.4

Demographic profile: Gender

library(dplyr)
Sample2%>%
  group_by(Gender)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 2 × 3
##   Gender count Percentage
##   <chr>  <int>      <dbl>
## 1 female    40       27.6
## 2 male     105       72.4

Demographic profile: Gender and Age

library(dplyr)
Sample3%>%
  group_by(Gender, Agecode)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## `summarise()` has grouped output by 'Gender'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 4
## # Groups:   Gender [2]
##   Gender Agecode                count Percentage
##   <chr>  <chr>                  <int>      <dbl>
## 1 female More than 50 years old    12       30  
## 2 female at most 50 years old      28       70  
## 3 male   More than 50 years old    57       54.3
## 4 male   at most 50 years old      48       45.7

Education

Sample2$Education<-Sample2$Education...72
table(Sample2$Education...72)
## 
##       Colege graduate      College graduate         College level 
##                     1                    21                    16 
##   Elementary graduate      Elementary level        Ementary level 
##                    17                    21                     1 
##   High chool graduate   High schoo graduate  High school graduate 
##                     1                     1                    22 
##  High School graduate     High school level      High scool level 
##                     4                    37                     1 
## High sschool graduate      Highschool level 
##                     1                     1
library(dplyr)
Sample2%>%
  group_by(Education)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 14 × 3
##    Education             count Percentage
##    <chr>                 <int>      <dbl>
##  1 Colege graduate           1       0.69
##  2 College graduate         21      14.5 
##  3 College level            16      11.0 
##  4 Elementary graduate      17      11.7 
##  5 Elementary level         21      14.5 
##  6 Ementary level            1       0.69
##  7 High School graduate      4       2.76
##  8 High chool graduate       1       0.69
##  9 High schoo graduate       1       0.69
## 10 High school graduate     22      15.2 
## 11 High school level        37      25.5 
## 12 High scool level          1       0.69
## 13 High sschool graduate     1       0.69
## 14 Highschool level          1       0.69

Education

Sample3 <- Sample2 %>%
  mutate(Educationcode = recode(Education,
    "Colege graduate" = "College graduate",
    "Ementary level" = "Elementary level",
    "High chool graduate" = "High school graduate",
    "High schoo graduate" = "High school graduate",
    "High School graduate" = "High school graduate",
    "High sschool graduate" = "High school graduate",
    "High scool level" = "High school level",
    "Highschool level" = "High school level",
    "Highschool level " = "High school level"))

Education

library(dplyr)
Sample3%>%
  group_by(Educationcode)%>%
  summarise(count=n())%>%
  mutate(Percentage =round((count/sum(count)*100),2))
## # A tibble: 6 × 3
##   Educationcode        count Percentage
##   <fct>                <int>      <dbl>
## 1 Elementary level        22       15.2
## 2 Elementary graduate     17       11.7
## 3 High school level       39       26.9
## 4 High school graduate    29       20  
## 5 College level           16       11.0
## 6 College graduate        22       15.2
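
The <fct> column type and the order of the rows (from Elementary level up to College graduate) suggest that Educationcode was converted to a factor with explicit levels, although that step is not shown in the code above. A minimal sketch of such a conversion, assuming that is what happened and using a short hypothetical vector:

```r
# Hypothetical cleaned values, for illustration only
education <- c("High school level", "College graduate",
               "Elementary level", "High school level")

# Levels listed from lowest to highest attainment (order taken from the output above)
education_levels <- c("Elementary level", "Elementary graduate",
                      "High school level", "High school graduate",
                      "College level", "College graduate")

education_fct <- factor(education, levels = education_levels)

# Tables and grouped summaries now follow the level order, not the alphabet
table(education_fct)
```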

Example of Graphical Presentation

library(ggpubr)
ggboxplot(Sample, x = "Time", y = "Accuracy", fill = "Time")

Thank you and God bless