Data Exploitation with R

Nguyen Ngoc Hien

September 19, 2021

Where is R?

Top 10 data-science software

Why R?

  • One of the most comprehensive statistical analysis packages
  • Completely free and open-source, so you can run R anywhere, at any time
  • Used by top-tier companies around the world (e.g., Facebook, Google, Twitter, Ford, Microsoft)
  • Cross-platform: runs on many operating systems (best on GNU/Linux and Microsoft Windows)
  • Everyone is welcome to contribute bug fixes, code enhancements, and new packages (contributors are typically scientists, statisticians, and mathematicians)

First impression

What does it look like?

  • summary: a function that shows the basic statistics of an object
  • cars: the name of a built-in dataset object
  • Type the code below in the R console and press Enter to show the dataset's basic statistics
summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
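To see where these numbers come from, the same statistics can be recomputed one at a time with base R functions. This is only a small sketch: it relies on `cars` being the data frame that ships with R's `datasets` package, and the choice of which columns to inspect is arbitrary.

```r
# cars is a built-in data frame with two numeric columns: speed and dist
str(cars)

# Recompute the pieces of summary(cars) for the speed column
mean(cars$speed)       # arithmetic mean
median(cars$speed)     # middle value
quantile(cars$speed)   # minimum, quartiles, and maximum

# Smallest and largest value of the dist column
range(cars$dist)
```

Each call returns one part of what `summary()` prints in a single table.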

Interactive data visualization

COVID-19 infections by gender and age in Viet Nam

Statistical analysis

Does she (Covid) prefer men or women in the 30-39 age group?

prop.test(x = c(51520, 50900), n = c(219617, 233599),
          alternative = "two.sided",
          conf.level = 0.95)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(51520, 50900) out of c(219617, 233599)
## X-squared = 180.29, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.01425316 0.01913774
## sample estimates:
##    prop 1    prop 2 
## 0.2345902 0.2178948
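The test result can also be stored in a variable and inspected piece by piece. The sketch below reuses the counts from the slide; the variable names `x`, `n`, and `res` are illustrative, and the slide does not state which of the two groups is men and which is women, so the code refers to them only as group 1 and group 2.

```r
x <- c(51520, 50900)     # counts from the slide (group 1, group 2)
n <- c(219617, 233599)   # totals from the slide (group 1, group 2)

res <- prop.test(x = x, n = n,
                 alternative = "two.sided",
                 conf.level = 0.95)

x / n          # sample proportions, the same values as res$estimate
res$p.value    # p-value of the two-sided test
res$conf.int   # 95% confidence interval for (prop 1 - prop 2)

# Point estimate of the difference between the two proportions
unname(res$estimate[1] - res$estimate[2])
```

Because the whole confidence interval lies above zero, the data suggest that the proportion in group 1 is higher than in group 2, by roughly 1.4 to 1.9 percentage points.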

Statistical applications (Six Sigma)

List of 15
 $ call        : language qcc(data = data.subGroup, type = "S", newdata = newVector, newsizes = 3, newlabels = c("New.1", "New.2", "Ne| truncated …
 $ type        : chr "S"
 $ data.name   : chr "data.subGroup"
 $ data        : num [1:13, 1:3] 8.24 6.3 8.45 12.36 8.43 …
  ..- attr(*, "dimnames")=List of 2
 $ statistics  : Named num [1:13] 3.03 3.18 3.8 2.98 2.16 …
  ..- attr(*, "names")= chr [1:13] "1" "2" "3" "4" …
 $ sizes       : Named int [1:13] 3 3 3 3 3 3 3 3 3 3 …
  ..- attr(*, "names")= chr [1:13] "1" "2" "3" "4" …
 $ center      : num 2.92
 $ std.dev     : num 3.29
 $ newstats    : Named num [1:9] 8.37 4.76 11.02 7.74 9.15 …
  ..- attr(*, "names")= chr [1:9] "New.1" "New.2" "New.3" NA …
 $ newdata     : num [1:9, 1] 8.37 4.76 11.02 7.74 9.15 …
 $ newsizes    : num [1:9] 3 3 3 3 3 3 3 3 3
 $ newdata.name: chr "newVector"
 $ nsigmas     : num 3
 $ limits      : num [1, 1:2] 0 7.5
  ..- attr(*, "dimnames")=List of 2
 $ violations  :List of 2
 - attr(*, "class")= chr "qcc"
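The control limits reported for this S chart can be checked by hand. The sketch below assumes the standard textbook S-chart formulas with the unbiasing constant c4 and 3-sigma limits; it is not necessarily how the qcc package computes them internally. The center value 2.92 is the rounded figure from the output, and the variable names are illustrative.

```r
n     <- 3      # subgroup size (the sizes field)
s_bar <- 2.92   # center line (the center field, rounded)
k     <- 3      # number of sigmas (the nsigmas field)

# c4: unbiasing constant for the sample standard deviation
c4 <- sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

# Width of the limits relative to the center line
half_width <- k * sqrt(1 - c4^2) / c4

lcl <- max(0, 1 - half_width) * s_bar   # lower limit cannot go below zero
ucl <- (1 + half_width) * s_bar         # upper limit

round(c(LCL = lcl, UCL = ucl), 2)
```

With these inputs the lower limit is clipped at 0 and the upper limit comes out at about 7.5, matching the limits field of the object.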

Text Mining

The 10 most frequently used words in the document:

Hien, N. N. (2016). Implementation of lean production systems for small-medium sized enterprises. Unpublished M.Sc. thesis, Vietnamese-German University - Technical University of Berlin, Germany
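Counting the most used words comes down to splitting the text into words and tabulating them. The sketch below uses only base R and a short made-up sentence (`sample_text` is hypothetical, not taken from the thesis); a real analysis would normally also remove stop words and may use a dedicated text-mining package.

```r
# Hypothetical sample text, for illustration only
sample_text <- "Lean production reduces waste; lean tools support production flow"

# Lower-case the text and split it on anything that is not a letter
words <- strsplit(tolower(sample_text), "[^a-z]+")[[1]]
words <- words[nchar(words) > 0]   # drop empty strings, if any

# Count each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)

head(freq, 10)   # up to 10 most frequent words
```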

So many applications

Available R packages (applications)

18,150

Update date: 2021-09-17

Data Exploitation with R (Book)

Thanh Nien Publisher - ISBN: 978-604-334-956-6

Data Exploitation with R (Khai Thác Dữ Liệu với R):

  • Chapter 1: R introduction
  • Chapter 2: Objects and Functions
  • Chapter 3: Basic statistics
  • Chapter 4: Random variables & probability distributions
  • Chapter 5: Graphs
  • Chapter 6: Statistical hypothesis tests
  • Chapter 7: Analysis of variance
  • Chapter 8: Regression analysis
  • Chapter 9: Six Sigma methods
  • Chapter 10: Text mining

Keep in Contact

Book on Tiki!

Our community for data mining, Lean, and Six Sigma!

  • Sharing jobs and careers
  • Sharing knowledge & building networks
  • Training courses & certifications, books, apps…

Nguyen Ngoc Hien Contact us: +87 978-325-123