October 25, 2023
1995: Ross Ihaka and Robert Gentleman, both statisticians at the University of Auckland in New Zealand, released R as free software under the GNU General Public License. They had begun developing it in the early 1990s as an implementation of the S language, with the goal of creating a free, accessible, and extensible statistical software tool.
1997: R version 0.50 was released, one of the early public versions of the R language. It included basic functionality for data manipulation, statistical modeling, and graphics.
2000: R version 1.0.0 was released, the first version its developers considered stable enough for production use. The R community continued to grow, and contributions from developers worldwide began to enhance the language’s capabilities and package ecosystem.
2004: R version 2.0.0 was released, introducing significant improvements and new features. This release marked a major milestone in the development of R.
2009: The Comprehensive R Archive Network (CRAN), which has hosted R and its contributed packages since 1997, was by this point the primary repository for R packages. CRAN provides a centralized platform for developers to share and distribute their R packages.
2011: The RStudio Integrated Development Environment (IDE) was released. RStudio offers a user-friendly interface, code editing features, debugging tools, and enhanced data visualization capabilities, making it a popular choice among R users.
2016: R ranked as the top programming language for data science in the annual “Kaggle Data Science Survey,” solidifying its position as a leading tool in the field.
Present: R continues to evolve and thrive, with regular updates and new releases. The R community remains active, contributing to the development of new packages, improving performance, and expanding the language’s capabilities.
RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management.
RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).
5+7
## [1] 12
10-5
## [1] 5
x<-5+7
x
## [1] 12
library(readxl)
## Warning: package 'readxl' was built under R version 4.2.3
Sample <- read_excel("D:/PSQ 2023/CMU Webinar/Data.xlsx")
Sample
## # A tibble: 15 × 2
##    Time    Accuracy
##    <chr>      <dbl>
##  1 Ten           95
##  2 Ten           90
##  3 Ten           65
##  4 Ten           95
##  5 Ten           85
##  6 Fifteen       70
##  7 Fifteen       65
##  8 Fifteen       50
##  9 Fifteen       55
## 10 Fifteen       70
## 11 Twenty        45
## 12 Twenty        55
## 13 Twenty        45
## 14 Twenty        40
## 15 Twenty        70
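Readers who do not have `Data.xlsx` can rebuild the same 15-row table by hand. The sketch below is a stand-in for the import, not the original workflow: it assumes the values are exactly those printed above and stores them in a base R data frame (rather than a tibble) named `Sample`, so that commands using `Sample$Accuracy` and `Sample$Time` can be tried without the Excel file.

```r
# Stand-in for read_excel(): retype the 15 observations printed above.
Sample <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),
  Accuracy = c(95, 90, 65, 95, 85,   # Ten
               70, 65, 50, 55, 70,   # Fifteen
               45, 55, 45, 40, 70),  # Twenty
  stringsAsFactors = FALSE
)

str(Sample)            # 15 observations of 2 variables
mean(Sample$Accuracy)  # 66.33333
```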
mean(Sample$Accuracy)
## [1] 66.33333
sd(Sample$Accuracy)
## [1] 18.36793
summary(Sample$Accuracy)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   40.00   52.50   65.00   66.33   77.50   95.00
(aggregate(Sample$Accuracy, list(Sample$Time), mean))
##   Group.1  x
## 1 Fifteen 62
## 2     Ten 86
## 3  Twenty 51
(aggregate(Sample$Accuracy, list(Sample$Time), sd))
##   Group.1         x
## 1 Fifteen  9.082951
## 2     Ten 12.449900
## 3  Twenty 11.937336
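The grouped output above is sorted alphabetically (Fifteen, Ten, Twenty) and carries the generic column names `Group.1` and `x`. As one possible alternative, the sketch below uses the formula interface of `aggregate()` and converts `Time` to a factor with an explicit level order, so the rows come out as Ten, Fifteen, Twenty with readable column names. The data frame is retyped from the printed table so the block runs by itself. The level order and the name `group_stats` are assumptions made for illustration.

```r
# Retype the printed data so this block is self-contained.
Sample <- data.frame(
  Time = rep(c("Ten", "Fifteen", "Twenty"), each = 5),
  Accuracy = c(95, 90, 65, 95, 85, 70, 65, 50, 55, 70, 45, 55, 45, 40, 70)
)

# Assumed ordering of the conditions; without this, groups sort alphabetically.
Sample$Time <- factor(Sample$Time, levels = c("Ten", "Fifteen", "Twenty"))

# Formula interface: summarize Accuracy within each level of Time.
group_means <- aggregate(Accuracy ~ Time, data = Sample, FUN = mean)
group_sds   <- aggregate(Accuracy ~ Time, data = Sample, FUN = sd)

# Both results follow the factor level order, so they can be combined by row.
group_stats <- data.frame(
  Time = group_means$Time,
  Mean = group_means$Accuracy,
  SD   = round(group_sds$Accuracy, 2)
)
print(group_stats)
```

Setting the level order once on the data frame also affects later steps that respect factor levels, such as the order of boxes in a plot.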
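library(ggpubr)  # provides ggboxplot()
ggboxplot(Sample, x = "Time", y = "Accuracy", fill = "Time")
library(dplyr)   # provides %>%, group_by(), and summarize()
Sample %>%
  group_by(Time) %>%
  summarize(Mean = mean(Accuracy), SD = sd(Accuracy))
## # A tibble: 3 × 3
##   Time     Mean    SD
##   <chr>   <dbl> <dbl>
## 1 Fifteen    62  9.08
## 2 Ten        86 12.4
## 3 Twenty     51 11.9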
library(readxl)
Sample2 <- read_excel("D:/PSQ 2023/CMU Webinar/Data1.xlsx")
## New names:
## • `Relax3` -> `Relax3...40`
## • `Relax3` -> `Relax3...41`
## • `Education` -> `Education...50`
## • `` -> `...51`
## • `Income` -> `Income...52`
## • `` -> `...71`
## • `Education` -> `Education...72`
## • `` -> `...73`
## • `` -> `...74`
## • `` -> `...75`
## • `Income` -> `Income...76`
## • `` -> `...77`
## • `` -> `...79`
## • `` -> `...80`
## • `` -> `...81`
## • `` -> `...82`
Sample2
## # A tibble: 145 × 82
##      No. Gender   age Reappraisal1 Reappraisal2 Reappraisal3 Reappraisal4
##    <dbl> <chr>  <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
##  1     1 male      43            4            2            2            2
##  2     2 male      40            3            2            4            1
##  3     3 male      60            3            2            2            2
##  4     4 male      50            2            2            4            2
##  5     5 male      42            4            4            2            4
##  6     6 female    42            2            4            4            3
##  7     7 male      54            4            2            3            3
##  8     8 male      40            2            2            2            2
##  9     9 male      56            2            2            3            2
## 10    10 male      43            4            2            4            4
## # ℹ 135 more rows
## # ℹ 75 more variables: Reappraisal5 <dbl>, ReappraisalMean <dbl>,
## #   SocialSupport1 <dbl>, SocialSupport2 <dbl>, SocialSupport3 <dbl>,
## #   SocialSupportMean <dbl>, ProbSolving1 <dbl>, ProbSolving2 <dbl>,
## #   ProbSolving3 <dbl>, ProbSolving4 <dbl>, ProbSolvingMean <dbl>, Rel1 <dbl>,
## #   Rel2 <dbl>, Rel3 <dbl>, Rel4 <dbl>, RelMean <dbl>, Tol1 <dbl>, Tol2 <dbl>,
## #   TolMean <dbl>, Emo1 <dbl>, Emo2 <dbl>, Emo3 <dbl>, Emo4 <dbl>, …
dim(Sample2)
## [1] 145 82
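The `New names:` message printed when `Data1.xlsx` was read signals two kinds of header problems in the spreadsheet: column names that occur more than once (for example `Relax3`) and header cells that are empty. The sketch below shows one way such problems could be located with base R. The vectors `headers` and `repaired` are hypothetical, shortened examples rather than the actual 82 column names. On the real data, `names(Sample2)` would take the place of `repaired`.

```r
# Hypothetical, shortened header row with the same kinds of problems
# (repeated names and empty header cells) reported in the message.
headers <- c("No.", "Gender", "Relax3", "Relax3", "Education", "", "Income", "Education")

# Names that occur more than once, ignoring empty cells.
dup_names <- unique(headers[duplicated(headers) & headers != ""])

# Positions of columns whose header cell is empty.
blank_cols <- which(headers == "")

dup_names   # "Relax3" "Education"
blank_cols  # 6

# After import, automatically named columns such as `...51` can be
# spotted by their pattern: three dots followed by the column position.
repaired <- c("No.", "Gender", "Relax3...3", "Relax3...4",
              "Education...5", "...6", "Income", "Education...8")
auto_named <- grepl("^\\.\\.\\.[0-9]+$", repaired)
repaired[auto_named]  # "...6"
```

Columns flagged this way are often empty spacer columns or leftover notes in the spreadsheet, so it is worth inspecting them before deciding whether to drop them.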
library(dplyr)
Sample2 %>%
  summarize(`Mean Age` = mean(age), `SD of Age` = sd(age))
## # A tibble: 1 × 2
##   `Mean Age` `SD of Age`
##        <dbl>       <dbl>
## 1       49.3        8.39
library(dplyr)
Sample2 %>%
  group_by(Gender) %>%
  summarize(`Mean Age` = mean(age), `SD of Age` = sd(age))
## # A tibble: 2 × 3
##   Gender `Mean Age` `SD of Age`
##   <chr>       <dbl>       <dbl>
## 1 female       46.0        8.82
## 2 male         50.5        7.92
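The summaries above work because `age` has no missing values. If a column contains `NA`, `mean()` and `sd()` return `NA` unless `na.rm = TRUE` is supplied. The sketch below illustrates this on a hypothetical five-row data frame `d` (not the webinar data), using base R's `tapply()` for the grouped version.

```r
# Hypothetical mini data set with one missing age (not the webinar data).
d <- data.frame(
  Gender = c("female", "female", "male", "male", "male"),
  age    = c(42, NA, 43, 40, 60)
)

overall_raw   <- mean(d$age)                # NA: one missing value propagates
overall_clean <- mean(d$age, na.rm = TRUE)  # 46.25, computed from the 4 known ages

# Grouped means, dropping missing values within each group.
by_gender <- tapply(d$age, d$Gender, mean, na.rm = TRUE)
by_gender
```

The same argument can be passed inside `summarize()`, for example `mean(age, na.rm = TRUE)`.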
library(dplyr)
Sample2 %>%
  mutate(Agecode = ifelse(age <= 50, "at most 50 years old", "More than 50 years old")) %>%
  group_by(Agecode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count) * 100), 2))
## # A tibble: 2 × 3
##   Agecode                count Percentage
##   <chr>                  <int>      <dbl>
## 1 More than 50 years old    69       47.6
## 2 at most 50 years old      76       52.4
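The `ifelse()` call above splits age into two groups. With more than two groups, nested `ifelse()` calls become hard to read, and `cut()` is a common alternative. The sketch below reproduces the same two-group coding with `cut()` in base R, using only the first ten ages printed earlier as illustrative input. The object names `age_table`, `counts`, and `percentages` are assumptions made for this example.

```r
# First ten ages from the printed data, used here for illustration only.
age <- c(43, 40, 60, 50, 42, 42, 54, 40, 56, 43)

# right = TRUE closes each interval on the right, so age 50 is "at most 50".
Agecode <- cut(age,
               breaks = c(-Inf, 50, Inf),
               labels = c("at most 50 years old", "More than 50 years old"),
               right  = TRUE)

counts      <- table(Agecode)
percentages <- round(100 * prop.table(counts), 2)

age_table <- data.frame(
  Agecode    = names(counts),
  count      = as.vector(counts),
  Percentage = as.vector(percentages)
)
print(age_table)
```

Adding further age bands only requires extra entries in `breaks` and `labels`. Because `cut()` returns a factor, the groups also print in the order given rather than alphabetically.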
library(dplyr)
Sample2 %>%
  group_by(Gender) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count) * 100), 2))
## # A tibble: 2 × 3
##   Gender count Percentage
##   <chr>  <int>      <dbl>
## 1 female    40       27.6
## 2 male     105       72.4
library(dplyr)
Sample2 %>%
  group_by(Education) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count) * 100), 2))
## # A tibble: 14 × 3
##    Education             count Percentage
##    <chr>                 <int>      <dbl>
##  1 Colege graduate           1       0.69
##  2 College graduate         21      14.5
##  3 College level            16      11.0
##  4 Elementary graduate      17      11.7
##  5 Elementary level         21      14.5
##  6 Ementary level            1       0.69
##  7 High School graduate      4       2.76
##  8 High chool graduate       1       0.69
##  9 High schoo graduate       1       0.69
## 10 High school graduate     22      15.2
## 11 High school level        37      25.5
## 12 High scool level          1       0.69
## 13 High sschool graduate     1       0.69
## 14 Highschool level          1       0.69
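# Map each misspelled label to its intended category.
Sample3 <- Sample2 %>%
  mutate(Educationcode = recode(Education,
    "Colege graduate" = "College graduate",
    "Ementary level" = "Elementary level",
    "High schoo graduate" = "High school graduate",
    "High School graduate" = "High school graduate",
    "High sschool graduate" = "High school graduate",
    "High chool graduate" = "High school graduate",
    "High scool level" = "High school level",
    "Highschool level " = "High school level",
    "Highschool level" = "High school level"),
    # Convert to a factor so the categories are listed in educational order.
    Educationcode = factor(Educationcode, levels = c(
      "Elementary level", "Elementary graduate",
      "High school level", "High school graduate",
      "College level", "College graduate")))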
library(dplyr)
Sample3 %>%
  group_by(Educationcode) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count) * 100), 2))
## # A tibble: 6 × 3
##   Educationcode        count Percentage
##   <fct>                <int>      <dbl>
## 1 Elementary level        22       15.2
## 2 Elementary graduate     17       11.7
## 3 High school level       39       26.9
## 4 High school graduate    29       20
## 5 College level           16       11.0
## 6 College graduate        22       15.2
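Listing every misspelling by hand works for a small data set but grows quickly. A complementary approach, sketched below in base R, first applies generic cleanup (trimming spaces and standardizing capitalization) and only then uses a lookup table for genuine spelling errors. The vectors `education` and `fixes` are illustrative assumptions that hold a handful of the labels seen above, not the full column.

```r
# A few of the problem labels from the frequency table, for illustration.
education <- c("Colege graduate", "High School graduate", "Highschool level ",
               "High sschool graduate", "College graduate")

# Step 1: generic cleanup. Remove stray spaces, then use sentence case
# (first letter upper case, the rest lower case).
clean <- trimws(education)
clean <- paste0(toupper(substr(clean, 1, 1)), tolower(substring(clean, 2)))

# Step 2: a lookup table for the spelling errors that remain.
fixes <- c("Colege graduate"       = "College graduate",
           "Highschool level"      = "High school level",
           "High sschool graduate" = "High school graduate")
hit <- clean %in% names(fixes)
clean[hit] <- unname(fixes[clean[hit]])

table(clean)
```

Here the capitalization variant `High School graduate` and the trailing space in `Highschool level ` are handled by step 1, so the lookup table only needs entries for true misspellings.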