CS 424 Big Data Analytics

Session 5: Basics of R

Instructor: Dr. Bob Batzinger
Academic year: 2021/2022
Semester: 1

Begins June 2021

RStudio Interface

Scalar numbers

## a = 14.92        Output: 14.92 
##  round(a,1)      Output: 14.9 
##  as.integer(a)   Output: 14 
##  is.numeric(a)   Output: TRUE 
##  is.integer(a)   Output: FALSE 
##    a * 2         Output: 29.84 
##    a / 2         Output: 7.46 
##    a + 2         Output: 16.92 
##    a - 2         Output: 12.92 
##    a ^ 2         Output: 222.6064 
##    a %% 2        Output: 0.92
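
The operations above can be extended with integer division and the integer type. The following is a minimal sketch, assuming the same value `a = 14.92`; the `%/%` operator and the `L` suffix are standard R but do not appear on the original slide.

```r
a <- 14.92

# Integer quotient and remainder recombine to the original value
q <- a %/% 2          # integer division: how many whole 2s fit in a
r <- a %% 2           # remainder left after removing those 2s
print(q)
print(r)
print(all.equal(q * 2 + r, a))

# as.integer() truncates toward zero rather than rounding
print(as.integer(a))
print(as.integer(-a))

# A numeric literal is stored as a double unless it carries the L suffix
print(is.integer(14))
print(is.integer(14L))
```

The last two lines explain why `is.integer(a)` is FALSE even for whole numbers typed without the `L` suffix.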

Strings

## s = "PYU CS Dept"        Output: PYU CS Dept 
##  substr(s,start=5,stop=6)    Output: CS 
##  grep("PYU",s)           Output: 1 
##  gsub("PYU","Payap",s)       Output: Payap CS Dept 
##  strsplit(s, " ")        Output: PYU CS Dept 
##  paste("a","=",a)        Output: a = 14.92 
##  toupper(s)              Output: PYU CS DEPT 
##  tolower(s)              Output: pyu cs dept 
##  nchar(s)                Output: 11
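
Two of the string functions return something other than a plain string, which is easy to miss in a one-line summary. The sketch below, assuming the same string `s`, shows what `grep()` and `strsplit()` actually return; `grepl()` is a standard companion function that is not on the original slide.

```r
s <- "PYU CS Dept"

# grep() returns the index of each matching element; grepl() returns TRUE/FALSE
hit_index <- grep("PYU", s)
hit_flag  <- grepl("PYU", s)

# Slashes are ordinary characters in an R pattern, so this matches nothing
no_hit <- grep("/PYU/", s)

# strsplit() returns a list holding one character vector per input string
words <- strsplit(s, " ")[[1]]

print(hit_index)
print(hit_flag)
print(length(no_hit))
print(words)
print(paste(rev(words), collapse = " "))
```

Unlike Perl or JavaScript, R patterns are written without surrounding delimiters, so the pattern is just the text between the quotes.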

Vectors

## vector:              x            2 13 5 17 11 3 
##  reversed:           rev(x)       3 11 17 5 13 2 
##  selected entries:   x[c(1,3,5)]  2 5 11 
##  number of elements: length(x)    6 
##  order map:          order(x)     1 6 3 5 2 4 
##  sorted list:        x[order(x)]  2 3 5 11 13 17 
##  scaled value:       3 * x        6 39 15 51 33 9 
##  offset value:       3 + x        5 16 8 20 14 6 
##  squared values:     x * x        4 169 25 289 121 9 
##  sequence:           5:10         5 6 7 8 9 10 
##  average:            mean(x)      8.5 
##  std dev:            sd(x)        6.058052 
##  sum:                sum(x)       51 
##  sequence:           seq(0,3,.5)  0 0.5 1 1.5 2 2.5 3 
##  repeated num:       rep(a,3)     14.92 14.92 14.92 
##  vector of Labels:   LETTERS[1:8]     A B C D E F G H 
##  vector of labels:   letters[1:8]     a b c d e f g h
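
The table above uses `x` without showing how it was created. A minimal sketch, assuming `x` was built with `c()` from the six values listed, is given here together with two standard operations that are not on the slide: `sort()` and logical indexing.

```r
x <- c(2, 13, 5, 17, 11, 3)

# order() gives the positions that would sort x; sort() applies them directly
idx <- order(x)
print(idx)
print(identical(x[idx], sort(x)))

# Arithmetic is element-wise, so x * x squares each entry
print(x * x)

# Logical indexing keeps the entries where the condition is TRUE
print(x[x > 10])

# Descending order
print(sort(x, decreasing = TRUE))
```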

Matrix

## mtxa=rbind(c(1,2,3),
##  c(6,5,4),c(7,9,8))

## mtxa -------------

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    6    5    4
## [3,]    7    9    8

## t(mtxa) ----------

##      [,1] [,2] [,3]
## [1,]    1    6    7
## [2,]    2    5    9
## [3,]    3    4    8

## colMeans(mtxa) ------

## [1] 4.666667 5.333333 5.000000

## rowSums(mtxa) ------

## [1]  6 15 24

## diag(3) ---------

##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

## det(mtxa) -------

## [1] 21

## mtxa * mtxa -----

##      [,1] [,2] [,3]
## [1,]    1    4    9
## [2,]   36   25   16
## [3,]   49   81   64

Matrix multiplication

\[\begin{pmatrix} 1 & 2 & 3\\ 6 & 5 & 4\\ 7 & 9 & 8\\ \end{pmatrix}\begin{pmatrix} 10\\ 12\\ 20\\ \end{pmatrix} = \begin{pmatrix} 1 \cdot 10 + 2 \cdot 12 + 3 \cdot 20\\ 6 \cdot 10 + 5 \cdot 12 + 4 \cdot 20\\ 7 \cdot 10 + 9 \cdot 12 + 8 \cdot 20\\ \end{pmatrix} = \begin{pmatrix} 94\\ 200\\ 338\\ \end{pmatrix}\]

## mtxa = rbind(c(1,2,3),c(6,5,4),c(7,9,8))

## mtxb = c(10,12,20)

## mmprod = mtxa %*% mtxb

##      [,1]
## [1,]   94
## [2,]  200
## [3,]  338
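
The row-by-row arithmetic behind `%*%` can be checked directly. The sketch below, assuming the same `mtxa` and `mtxb`, recomputes each entry as a sum of products and contrasts it with the element-wise `*` operator; the name `by_rows` is introduced here for illustration only.

```r
mtxa <- rbind(c(1, 2, 3), c(6, 5, 4), c(7, 9, 8))
mtxb <- c(10, 12, 20)

# %*% treats the plain vector mtxb as a 3 x 1 column here
mmprod <- mtxa %*% mtxb
print(dim(mmprod))

# Each entry is the sum of one row of mtxa multiplied by mtxb, term by term
by_rows <- apply(mtxa, 1, function(row) sum(row * mtxb))
print(by_rows)
print(all(by_rows == mmprod[, 1]))

# By contrast, * recycles mtxb down each column (element-wise, not a matrix product)
print(mtxa * mtxb)
```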

Algebraic Solutions

Algebraic Equations

\[\begin{aligned} x + 2y + 3z &= 10\\ 6x + 5y + 4z &= 12\\ 7x + 9y + 8z &= 20\\ \end{aligned}\]

Rendering in R

## result = solve(mtxa,mtxb)

## [1]  1.523810 -3.619048  5.238095

Matrix version

\[\begin{pmatrix} 1 & 2 & 3 \\ 6 & 5 & 4 \\ 7 & 9 & 8 \\ \end{pmatrix}\begin{pmatrix} x\\ y\\ z\\ \end{pmatrix} = \begin{pmatrix} 10\\ 12\\ 20\\ \end{pmatrix}\]

\[\begin{pmatrix} 1 & 2 & 3\\ 0 & 1 & 2\\ 0 & 0 & 1\\ \end{pmatrix}\begin{pmatrix} x\\ y\\ z\\ \end{pmatrix} = \begin{pmatrix} 10.000000\\ 6.857143\\ 5.238095\\ \end{pmatrix}\]

\[\begin{pmatrix} x\\ y\\ z\\ \end{pmatrix} = \begin{pmatrix} 1.523810\\ -3.619048\\ 5.238095\\ \end{pmatrix} \]
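
The solution can be verified in two ways: by substituting it back into the original system, and by back-substitution from the row-reduced form. The sketch below assumes the same `mtxa` and `mtxb`; the fractions 48/7 and 110/21 are the exact right-hand sides obtained by eliminating x from rows 2 and 3 by hand and are not stated on the slide.

```r
mtxa <- rbind(c(1, 2, 3), c(6, 5, 4), c(7, 9, 8))
mtxb <- c(10, 12, 20)

result <- solve(mtxa, mtxb)

# Substituting the solution back should reproduce the right-hand side
check <- as.vector(mtxa %*% result)
print(all.equal(check, mtxb))

# Back-substitution from the row-echelon form, bottom row first
z <- 110 / 21            # row 3: z = 5.238095
y <- 48 / 7 - 2 * z      # row 2: y + 2z = 6.857143
x <- 10 - 2 * y - 3 * z  # row 1: x + 2y + 3z = 10
print(round(c(x, y, z), 6))
print(round(result, 6))
```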

Markov Chains

Transition map

\[\begin{matrix} From/To & Fresh & Soph & Jr & Sr & Grad & Resign\\ Fresh & 0.08 & 0.72 & 0 & 0 & 0 & 0.2\\ Soph & 0 & 0.08 & 0.72 & 0.1 & 0 & 0.1\\ Jr & 0 & 0 & 0.08 & 0.82 & 0 & 0.1\\ Sr & 0 & 0 & 0 & 0.1 & 0.85 & 0.05\\ Grad & 0 & 0 & 0 & 0 & 1 & 0\\ Resign & 0 & 0 & 0 & 0 & 0 & 1\\ \end{matrix}\]

Markov Chain in R

Fresh =c(0.08,0.72, 0,0,0,0.20)
Soph = c(0,0.08,0.72,0.1,0,0.1)
Jr = c(0,0,0.08,0.82,0,0.1)
Sr = c(0,0,0,0.1,0.85,0.05)
Grad = c(0,0,0,0,1,0)
Resign = c(0,0,0,0,0,1)
markov=rbind(Fresh,Soph,Jr,Sr,
             Grad,Resign)
students = c(1000,0,0,0,0,0)

students = c(1000,0,0,0,0,0)
results = rbind(students)
for (i in 1:8){
  students = students %*% markov
  results = rbind(results,students)
}
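
Two properties of the chain are worth checking: every row of the transition matrix sums to 1, and the cohort of 1000 students is conserved at each step. The sketch below repeats the setup so that it runs on its own; the row labels `step0` to `step8` are an assumption added for readability (the slide does not say what time unit one step represents).

```r
Fresh  <- c(0.08, 0.72, 0,    0,    0,    0.20)
Soph   <- c(0,    0.08, 0.72, 0.10, 0,    0.10)
Jr     <- c(0,    0,    0.08, 0.82, 0,    0.10)
Sr     <- c(0,    0,    0,    0.10, 0.85, 0.05)
Grad   <- c(0,    0,    0,    0,    1,    0)
Resign <- c(0,    0,    0,    0,    0,    1)
markov <- rbind(Fresh, Soph, Jr, Sr, Grad, Resign)
colnames(markov) <- rownames(markov)

# Every row of a transition matrix should sum to 1
print(all.equal(unname(rowSums(markov)), rep(1, 6)))

students <- c(1000, 0, 0, 0, 0, 0)
results <- rbind(students)
for (i in 1:8) {
  students <- students %*% markov   # row vector times transition matrix
  results <- rbind(results, students)
}
rownames(results) <- paste0("step", 0:8)  # hypothetical labels

# The total number of students is the same at every step
print(all.equal(unname(rowSums(results)), rep(1000, 9)))
print(round(results, 1))
```

Because `Grad` and `Resign` are absorbing states (each maps only to itself), the counts in the first four columns shrink toward zero as the steps progress.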

Results of Markov Chain

Converting rates

dat = data.frame(rbind(
c(14, 2963,6.6),
c(14,10110,6.5),
c(11.1,1402112,7.0),
c(13,3214,2.2),
c(17.4,1380004,3.7),
c(6,51780,2.7)))
rownames(dat) = c("AR","AZ","CN",
                  "GE","IN","KR")
colnames(dat) = c('birthrate',
      'population', 'diff.100')
unmatched = round(
  (dat[,1] * dat[,2] * dat[,3]) /
    10000, 3)
dat = cbind(dat,unmatched)

	birthrate	population	diff.100	unmatched
AR	14.0	2963	6.6	27.378
AZ	14.0	10110	6.5	92.001
CN	11.1	1402112	7.0	10894.410
GE	13.0	3214	2.2	9.192
IN	17.4	1380004	3.7	8884.466
KR	6.0	51780	2.7	83.884
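
The same table can be built with named columns, which avoids referring to columns by position. This is an alternative sketch, not the construction used on the slide; it assumes the same values and column names.

```r
dat <- data.frame(
  birthrate  = c(14, 14, 11.1, 13, 17.4, 6),
  population = c(2963, 10110, 1402112, 3214, 1380004, 51780),
  diff.100   = c(6.6, 6.5, 7.0, 2.2, 3.7, 2.7),
  row.names  = c("AR", "AZ", "CN", "GE", "IN", "KR")
)

# Columns are referenced by name instead of by position
dat$unmatched <- round(dat$birthrate * dat$population * dat$diff.100 / 10000, 3)

print(dat)
print(dat["CN", "unmatched"])
```

Building the frame column by column also keeps each column's type independent, whereas `rbind()` of numeric vectors first creates a single numeric matrix.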

Data frame

## dim(dat) ========

## [1] 6 4

## colnames(dat) ====

## [1] "birthrate"  "population" "diff.100"   "unmatched"

## rownames(dat) ====

## [1] "AR" "AZ" "CN" "GE" "IN" "KR"

## head(dat,2) ======

##    birthrate population diff.100 unmatched
## AR        14       2963      6.6    27.378
## AZ        14      10110      6.5    92.001

## tail(dat,2) ======

##    birthrate population diff.100 unmatched
## IN      17.4    1380004      3.7  8884.466
## KR       6.0      51780      2.7    83.884

Data frame inspection

## str(dat) =======

## 'data.frame':    6 obs. of  4 variables:
##  $ birthrate : num  14 14 11.1 13 17.4 6
##  $ population: num  2963 10110 1402112 3214 1380004 ...
##  $ diff.100  : num  6.6 6.5 7 2.2 3.7 2.7
##  $ unmatched : num  27.38 92 10894.41 9.19 8884.47 ...

## 
## summary(dat) =====

##    birthrate       population         diff.100       unmatched        
##  Min.   : 6.00   Min.   :   2963   Min.   :2.200   Min.   :    9.192  
##  1st Qu.:11.57   1st Qu.:   4938   1st Qu.:2.950   1st Qu.:   41.505  
##  Median :13.50   Median :  30945   Median :5.100   Median :   87.942  
##  Mean   :12.58   Mean   : 475030   Mean   :4.783   Mean   : 3331.889  
##  3rd Qu.:14.00   3rd Qu.:1047948   3rd Qu.:6.575   3rd Qu.: 6686.350  
##  Max.   :17.40   Max.   :1402112   Max.   :7.000   Max.   :10894.410
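
Individual figures from `summary()` can be reproduced by applying a function to every column, and the rows can be reordered or filtered by column values. The sketch below rebuilds the first three columns so that it runs on its own; the names `medians`, `by_pop` and `high_birth` are introduced here for illustration.

```r
dat <- data.frame(
  birthrate  = c(14, 14, 11.1, 13, 17.4, 6),
  population = c(2963, 10110, 1402112, 3214, 1380004, 51780),
  diff.100   = c(6.6, 6.5, 7.0, 2.2, 3.7, 2.7),
  row.names  = c("AR", "AZ", "CN", "GE", "IN", "KR")
)

# sapply() applies a function to each column and returns a named vector
medians <- sapply(dat, median)
print(medians)

# Rows sorted by population, largest first
by_pop <- dat[order(dat$population, decreasing = TRUE), ]
print(by_pop)

# Row names of the entries whose birthrate is above the median
high_birth <- rownames(dat)[dat$birthrate > median(dat$birthrate)]
print(high_birth)
```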

Mean and Std Dev

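doStat <- function(n=10000) {
    # Accumulate the sum, sum of squares, and count in an explicit loop
    total = 0; sq = 0; cnt = 0; x = 1:n
    for (i in x) {total = total + i
      sq = sq + i * i; cnt = cnt + 1
    }
    mn = round(total / n,5); msq = mn * mn
    # Sample std dev: sqrt((sum of squares - n * mean^2) / (n - 1))
    # The value of the last expression is the function's return value
    paste("mean=",mn,", std.dev=",
        round(sqrt((sq - msq*cnt)/(cnt-1)),5),"\n")
}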

doStat2 <- function(n=10000) { x = 1:n
  # Same statistics using the built-in vectorized functions
  paste("mean=",mean(x),", std.dev=",sd(x),"\n")
}

Comparison of Times

doStat()

## time= 0.03611 +/- 0.00215 
##  mean= 5000.5 , std.dev= 2886.89568 
##

doStat2(), using mean() and sd()

## time= 0.00289 +/- 0.00127 
##  mean= 5000.5 , std.dev= 2886.89567990717 
##
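
The timing harness that produced the `time=` lines is not shown on the slides. One way such a comparison could be made is sketched below, assuming repeated calls timed with `system.time()`; the helper `time_it()` and the function names `loop_stat()` and `vec_stat()` are hypothetical, and the measured times will vary by machine.

```r
# Loop-based version, written out so this block runs on its own
loop_stat <- function(n = 10000) {
  total <- 0; sq <- 0
  # Iterate over doubles: i * i overflows R's integer type once i exceeds 46340
  for (i in as.numeric(1:n)) {
    total <- total + i
    sq <- sq + i * i
  }
  mn <- total / n
  c(mean = mn, sd = sqrt((sq - n * mn^2) / (n - 1)))
}

# Vectorized version using the built-in functions
vec_stat <- function(n = 10000) {
  x <- 1:n
  c(mean = mean(x), sd = sd(x))
}

# Hypothetical helper: call f() several times and summarize the elapsed seconds
time_it <- function(f, reps = 20) {
  secs <- replicate(reps, system.time(f())[["elapsed"]])
  c(mean = mean(secs), sd = sd(secs))
}

print(all.equal(loop_stat(), vec_stat()))
print(time_it(loop_stat))
print(time_it(vec_stat))
```

Both versions give the same statistics; the difference lies in where the loop runs. The built-in functions iterate in compiled code, while the explicit `for` loop is interpreted one iteration at a time.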

Register for juliacon2021

JuliaCon is an international annual event of over 40,000 participants.
It brings together developers, users and instructors of the Julia language from around the world.
Normally it costs US$2000 to attend but this year it is online and free.
You must register at https://juliacon.org to be able to attend.
Register soon! The event is capped at 80K participants and 50K have already registered.

juliacon2021 Assignment

Register at https://juliacon.org for the upcoming juliacon2021 online conference.
When the schedule is posted, pick the 5 sessions you will attend and report on. Your report must include each of the following session types:
- A talk (pick 1 out of the 75 offered)
- A lightning talk (pick 1 out of the 116 offered)
- A birds of a feather session (pick one out of the 6 offered)
- An experience report (pick one out of the 21 offered)
- A workshop (pick one out of the 16 offered)

Answer the following questions for each of the sessions you attended:
- What is the main speaker’s name, organization, position and email or twitter address?
- What was the purpose of their application of Julia?
- Why was Julia a good fit for this project?
- What was the product of their research and development with Julia?
- Describe one interesting thing this session taught you about programming in Julia.

Juliacon2021: 116 Lightning Talks

Titles A-N
- 3.6x speedup on A64FX by squeezing ShallowWaters.jl into Float16
- A Tour of the differentiable programming landscape with Flux.jl
- Actors.jl: Concurrent Computing with the Actor Model
- Airborne Magnetic Navigation Enhanced with Neural Networks
- AlgebraicDynamics: Compositional dynamical systems
- An individual-based model to simulate Coffee Leaf Rust epidemics
- Analyzing Human Scalp EEG in Julia
- AnyMOD.jl: A Julia package for creating energy system models
- Atomic fields: the new primitives on the block
- AtomicSets.jl
- Automatic dualization with Dualization.jl
- BPFnative.jl: eBPF programming in Julia
- Bias Audit and Mitigation in Julia
- Calculating a million stationary points in a second on the GPU
- Catwalk.jl: A profile guided dispatch optimizer
- Cerberus: A solver for mixed-integer programs with disjunctions
- Chaotic time series predictions with ReservoirComputing.jl
- Clapeyron.jl: An Extensible Implementation of Equations of State
- Clearing the Pipeline Jungle with FeatureTransforms.jl
- ClimaCore.jl: Tools for building spatial discretizations
- ClimateModels.jl – A Simple Interface To Climate Models
- Composable Bayesian Modeling with Soss.jl
- CompositionalNetworks.jl: a scaling glass-box neural network
- Creating a Shared Library Bundle with Package Compiler
- Data driven insight into fish behaviour for aquaculture
- DataSets.jl: A bridge between code and data
- DeconvOptim.jl: Microscopy Image Deconvolution
- Designing Spacecraft Trajectories with Julia
- Designing ecologically optimized vaccines
- Dissemination of Julia in the French-speaking Community
- Diversity and Inclusion in the Julia community
- Easy and Customizable PINN PDE Solving with NeuralPDE.jl
- Effects.jl: Effectively Understand Effects in Regression Models
- Enhanced Sampling in Molecular Dynamics Simulations with Julia
- Exploiting Structure in Kernel Matrices
- ExprTools: Metaprogramming from reflection
- Flexible set projections with MathOptInterface
- FlowAtlas.jl: interactive exploration of phenotypes in cytometry
- FourierTools.jl | Working with the Frequency Space
- Generative Models with Latent Differential Equations in Julia
- Generic programming applied to a Finite Volume solver
- Genify.jl: Transforming Julia into Gen for Bayesian inference
- Global Sensitivity Analysis for SciML models in Julia
- Going to Jupiter with Julia
- HiGHS
- HighFrequencyCovariance: Estimating Covariance Matrices in Julia
- HypertextLiteral : performant string interpolation for HTML/SVG
- Improving Gender Diversity in the Julia Community
- In-Situ Data Analysis with Julia for E3SM at Large Scale
- Intercepts in pairs of geographical tracks from TrackMatcher
- Julia Admittance: A Toolbox for Admittance Extraction
- Julia and deploying complex graphical applications for laypeople
- Julia for data analysis in High Energy Physics
- Julia in the Windows Store
- Julog.jl: Prolog-like Logic Programming in Julia
- Lattice Reduction using LLLplus.jl
- LatticeQCD.jl: Simulation of quantum gauge fields
- Learning to align with differentiable dynamic programming
- MadNLP.jl: A Mad Nonlinear Programming Solver.
- Matlab to Julia: Hours to Minutes for MRI Image Analysis
- Modelling Australia’s National Electricity Market with JuMP
- Modelling cryptographic side-channels with Julia types
- Modia – Modeling Multidomain Engineering Systems with Julia
- Multilingual Natural Language Processing using Julia
- MutableArithmetics: An API for mutable operations
- NOMAD.jl
- New tools to solve PDEs in Julia with Gridap.jl
- Non-linear SDE mechanical simulations
- Nonconvex.jl

Titles O - Z
- Optical simulation with the OpticSim.jl package
- PGFPlotsX.jl - Plotting with LaTeX, directly from Julia
- POMDPs.jl and Interactive Assignments in Julia
- PRS.jl: Fast Polygenic Risk Scores
- Partitions and chains: enabling batch processing for your data
- PhyloNetworks: a Julia package for phylogenetic networks
- Physics-Informed ML Simulator for Wildfire Propagation
- Pluto in Production
- Pluto.jl Notebooks are Web APIs!
- Power Market Tool (POMATO)
- PowerModelsDistributionStateEstimation.jl
- PrettyPrinting: optimal layout for code and data
- Probabilistic Model Checking using POMDPModelChecking.jl
- ReTest.jl - more productive testing
- ReactiveMP.jl: Reactive Message Passing-based Bayesian Inference
- Rewriting Pieces of a Python Codebase in Julia
- Running Programs Forwards, Backwards, and Everything In Between
- Runtime-switchable BLAS/LAPACK backends via libblastrampoline
- Scaling of Oceananigans.jl on multi GPU and CPU systems
- SciML for Structures: Predicting Bridge Behavior
- Semantically Releasing Julia Packages
- Simulating a public transportation system with OpenStreetMapX.jl
- Single-cell resolved cell-cell communication modeling in Julia
- Solving Pokemon Go Battles using Julia
- Solving discrete problems via Boolean satisfiability with Julia
- Solving optimization problems at Fonterra
- Space Engineering in Julia
- Sparse Matrix Decomposition and Completion with Chordal.jl
- SpeedMapping.jl: Implementing Alternating cyclic extrapolations
- Structural lambdas for generic code and delayed evaluation
- SuiteSparseGraphBLAS.jl
- Systems Biology in ModelingToolkit
- TSSOS.jl: exploiting sparsity in polynomial optimization
- The OSCAR Computer Algebra System
- TiledViews.jl
- Towards a symbolic integrator with Rubin.jl
- Types from JSON
- Unbalanced Power Flow Optimization with PowerModelsDistribution
- Using Julia in microscope image processing
- Using Julia to study economic inequality and taxation
- Using optimization to make good guesses for test cases
- WaterLily.jl: Real-time fluid simulation in pure Julia
- Web application for atmospheric dispersion modeling.
- ZXCalculus.jl: A Julia package for the ZX-calculus
- hPF-MD.jl: Hybrid Particle-Field Molecular-Dynamics Simulation
- kubernetes-native julia development
- vOptSolver: an ecosystem for multi-objective linear optimization

Juliacon2021: 75 Talks

Titles: A-I
- A Brief Introduction to InfrastructureModels
- A Derivative-Free Local Optimizer for Multi-Objective Problems
- A Short History of AstroTime.jl
- A deep dive into MakieLayout
- Adaptive and extendable numerical simulations with Trixi.jl
- Agents.jl and the next chapter in agent based modelling
- Alpine and Juniper for Global Optimization in Julia
- Applied Measure Theory for Probabilistic Modeling
- Bayesian Neural Ordinary Differential Equations
- BifurcationKit.jl: bifurcation analysis of large scale systems
- Building High Performance Digital Twins with Julia
- Building on AlphaZero with Julia
- CUDA.jl 3.0
- Calibration analysis of probabilistic models in Julia
- Changing Physics education with Julia
- Code, docs, and tests: what’s in the General registry?
- Conic optimization example problems in Hypatia’s examples folder
- ConstraintProgrammingExtensions.jl
- ConstraintSolver.jl - First constraint solver written in Julia
- Deep Dive: Creating Shared Libraries with PackageCompiler.jl
- Dictionaries.jl - for improved productivity and performance
- Easy, Featureful Parallelism with Dagger.jl
- Efficient graph data structures: What we have and what could be
- Enabling Rapid Microservice Development with a Julia SDK
- Enzyme.jl – Reverse mode differentiation on LLVM IR for Julia
- Everything you need to know about ChainRules 1.0
- ExaTron.jl: a scalable GPU-MPI-based batch solver for small NLPs
- Finding an Effective Strategy for AutoML Pipeline Optimization
- FrankWolfe.jl: scalable constrained optimization
- FunSQL: a library for compositional construction of SQL queries
- Geostatistical Learning
- Global constrained nonlinear optimisation with interval methods
- Hierarchical Multiple Instance Learning
- Hybrid Strategies using Piecewise-Linear Decision Rules
- Infinite-Dimensional Optimization with InfiniteOpt.jl
- Introducing Chemellia: Machine Learning, with Atoms!
- InvertibleNetworks.jl - Memory efficient deep learning in Julia

Titles: J-Z
- JET.jl: The next generation of code checker for Julia
- Javis.jl - Julia Animations and Visualizations
- Julia and Quantum Chemistry: A Love Story
- Julia in VS Code - What’s New
- Julia@Beacon: ~100TBs of EEG vs. 1 JIT Compiler (+ K8s + Arrow)
- JuliaSPICE: A Composable ML Accelerated Analog Circuit Simulator
- JuliaSim: Machine Learning Accelerated Modeling and Simulation
- Linear programming by first-order methods
- Linearly Constrained Separable Optimization
- Modeling Bilevel optimization problems with BilevelJuMP.jl
- Modeling the Economy During the Pandemic
- Monads 2.0, aka Algebraic Effects: ExtensibleEffects.jl
- NExOS.jl for Nonconvex Exterior-point Operator Splitting
- Nonlinear programming on the GPU
- Open and interactive Computational Thinking with Julia and Pluto
- Package latency and what developers can do to reduce it
- Pluto.jl — one year later
- Put some constraints into your life with JuliaCon(straints)
- Query Compilation, Vectorization, and Julia — a Trio Infernal?
- Redwood: A framework for clusterless supercomputing in the cloud
- Release management - lessons learned in JuliaData ecosystem
- Roadmap to Julia BLAS and LinearAlgebra
- Scalable Power System Modeling and Analysis
- Shaped Data with Acsets
- Simulating Chemical Kinetics with ReactionMechanismSimulator.jl
- SmartTensors: Unsupervised Machine Learning
- Symbolics.jl - fast and flexible symbolic programming
- Symmetry reduction for Sum-of-Squares programming
- The Design of the MiniZinc Modelling Language
- The state of JuMP
- Tomographic Image Reconstruction with Julia
- TopOpt.jl: topology optimization software done right!
- UnitCommitment.jl: Security-Constrained Unit Commitment in JuMP
- Unleashing Algebraic Metaprogramming in Julia with Metatheory.jl
- What’s new in COSMO?
- What’s new in ITensors.jl
- Writing fast sequential Julia Code
- ZigZagBoomerang.jl - parallel inference and variable selection

Juliacon2021: 21 Reports of Experience

Titles: A-J
- Applications of Julia for Network Science Text Analysis
- Awesome Computer Vision Done Quick
- Bootstrapping Data Science and Diversity
- Case Study: Server Side Julia for COVID-19 Patient Workflows
- Disrupting Esoteric Language Microbenchmarks with an 80-line JIT
- Fitting Plate-reader Curves with Julia
- High Performance Tsunami Forecasting
- Julia & Healthcare Technology Assessment Analytics
- Julia as a framework for a Theoretical Physics PhD
- Julia for end to end financial analysis
- Jumping into the Julia Community via Advent Of Code

Titles: K-Z
- Learning during the pandemic
- Musical Julia
- Probabilistic K-Nearest Neighbours
- Processing Light-Sheet Microscopy Data Using Julia
- Sonification: Exploring streaming data using live music coding
- Speeding up cosmological data analysis with Julia
- Strengths and Challenges of Julia for learning Linear Algebra
- The wonderfully helpful Julia community
- Theory is (nearly) implementation with julia types
- Yawipa: a comprehensive and extensible Wiktionary parser

Juliacon2021: 6 Birds of a Feather Sessions

- Building a Chemistry and Materials Science Ecosystem in Julia
- Discussing Gender Diversity in the Julia Community
- Fancy Arrays BoF 2
- Julia for Computational Biology
- Julia in Private Organizations
- Live Coding: Outreach and Beyond

Juliacon2021: 16 Workshops

Titles: A-M
- A mathematical look at electronic structure theory
- DataFrames.jl 1.0 tutorial
- Diffractor: Next-Gen AD for Julia
- GPU programming in Julia
- Game development in Julia with GameZero.jl
- Introduction to Bayesian Data Analysis
- Introduction to metaprogramming in Julia
- It’s all Set: A hands-on introduction to JuliaReach
- Modeling Marine Ecosystems At Multiple Scales Using Julia

Titles: P-Z
- Package development in VSCode
- Package development: improving engineering quality & latency
- Parse and broker (log) messages with CombinedParsers(.EBNF)
- Quantum Computing with Julia
- Simulating Big Models in Julia with ModelingToolkit
- Solving differential equations in parallel on GPUs
- Statistics with Julia from the ground up