Rowen Remis R. Iral
7/17/2020
Joint Meetup of R Users Group Philippines and AWS Users Group Philippines
2020 July 22
Rowen Remis R. Iral
IT Engineer / Data Science Architect
ZankPOS Enterprises POSNUC
B&B Capital (Bull and Bear Capital)
The cloud is a useful environment for processing and storing data.
AWS (Amazon Web Services)
Feather Data Format
A lightweight, fast format optimized for storing data.
Built on Apache Arrow
We can use a data store and process the data in parallel with R.
We can use the AWS cloud's multiprocessor instances or set up a cluster.
Parallel Programming in R
There is a snow library that is not the one for parallel computing in R but for game effects. Snow, rain, and other effects are available to game developers for integration with Twitch and AWS.
Divide and conquer has a limit that depends on the process or computation used.
You can parallelize some steps but not all of them, so every job has a minimum execution time set by its serial part. More resources mean faster processing, up to that limit.
The theoretical speedup in latency of the execution of a task, at fixed execution time, that can be expected of a system whose resources are improved. More resources allow more detail, as you can process objects as components of a larger object.
We can parallelize, but there are certain serial processes that cannot be run in parallel.
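The two views above, more resources for faster processing and more resources for more detail, are usually written as Amdahl's law (fixed workload) and Gustafson's law (fixed execution time). The symbols are notation assumed here, not taken from the slides: p is the fraction of the work that can run in parallel and N is the number of processors.

```latex
% Amdahl's law: fixed workload, speedup is capped by the serial fraction (1 - p)
S_{\text{Amdahl}}(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S_{\text{Amdahl}}(N) = \frac{1}{1 - p}

% Gustafson's law: fixed execution time, the problem size grows with N
S_{\text{Gustafson}}(N) = (1 - p) + p\,N
```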
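To put numbers on the serial limit, here is a minimal sketch assuming the standard Amdahl form, speedup = 1 / ((1 - p) + p / n). It is written in Python only to keep it short; the function name and the 90% parallel fraction are illustrative assumptions, and the same arithmetic carries over to R unchanged.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup at fixed workload when only part of the job can run in parallel.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    workers: number of processors or cores used for the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always runs on one worker; only the rest is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical job in which 90% of the runtime can be parallelized.
    for workers in (1, 2, 4, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With a 10% serial part the speedup stays below 10x no matter how many cores are added, which is the minimum execution time the serial steps impose.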
Microsoft R
R has its own package repository called CRAN (Comprehensive R Archive Network)
There is also a package, devtools, that installs packages hosted on GitHub.
Galaxy Server – server for Genomics
Bioconductor
rOpenSci
New cloud packages for R
cloudyr
SSH Tunneling
You expose only SSH port 22 publicly and tunnel through it. Port 8787 is not exposed publicly; it is reachable only through the SSH tunnel on port 22, which secures the HTTP traffic when HTTPS is not available.
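The tunnel described above corresponds to OpenSSH local port forwarding (`ssh -L`). The following is a minimal sketch that only builds the command; the function name and the host `user@ec2-host.example.com` are hypothetical placeholders, and it assumes the service on port 8787 listens on the server's own localhost.

```python
import shlex
from typing import List


def ssh_tunnel_command(remote: str, service_port: int = 8787,
                       local_port: int = 8787) -> List[str]:
    """Build an OpenSSH command that forwards a local port to a remote service.

    remote: SSH destination such as "user@host" (hypothetical placeholder).
    service_port: port the service listens on at the server (8787 in the slides).
    local_port: port opened on your own machine for the browser to use.
    """
    # -N: do not run a remote command, only forward ports.
    # -L: local forwarding in the form local_port:target_host:target_port.
    forward = f"{local_port}:localhost:{service_port}"
    return ["ssh", "-N", "-L", forward, remote]


if __name__ == "__main__":
    cmd = ssh_tunnel_command("user@ec2-host.example.com")
    # Print the command instead of running it, so nothing connects by accident.
    print(shlex.join(cmd))
```

Running the printed command in a terminal and then opening http://localhost:8787 sends the HTTP traffic through the encrypted SSH connection, so only port 22 needs to be open in the security group.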
VPN (Virtual Private Network): this also connects remote workers.
VPN keys are needed
The publicly open ports are SSH (22) and VPN (UDP 1194)
Direct access: no web server access
Can connect to Web Server through VPN
Web Access through VPN