R on Cloud Architectures

Rowen Remis R. Iral

7/17/2020

Joint Meetup of R Users Group Philippines and AWS Users Group Philippines

2020 July 22

What Am I Currently Doing?

Rowen Remis R. Iral

IT Engineer / Data Science Architect

Architectures

Peritus Knowledge Services Corporation

https://www.peritus-services.com

Tech Advisor / Digital Marketing

ZankPOS Enterprises / POSNUC

https://www.zankpos.com

https://www.posnuc.com

Tech Advisor

B&B Capital (Bull and Bear Capital)

https://www.facebook.com/BB-Capital-111214880680174/

R on Cloud aRchitectures

The cloud is a useful environment for processing and storing data.

AWS Cloud

AWS

Amazon Web Services

https://aws.amazon.com

Faster Data Processing

Feather Data Format

A lightweight, fast file format optimized for storing data.

Apache Arrow

List of Parallel Processing Packages in R

We can use a data store and process the data in parallel with R.

Multi-core and Parallel Programming in R

We can use AWS multiprocessor instances or set up clusters.

Parallel Programming in R

AWS Lumberyard

AWS Lumberyard has a snow library, but it is for game effects, not for parallel computing in R (unlike the R snow package). Snow, rain, and other effects are available to game developers, with integration with Twitch and AWS.

Note on Multiple CPUs and Their Limitations

Divide and conquer has a certain limit depending on the process or computation used.

Amdahl’s Law

You can parallelize some of the steps, but not all of them; the serial steps set a minimum time to execute a job. The focus is on more resources for faster processing of a fixed workload.
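As a rough numeric illustration, Amdahl's law is commonly written as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the job that can run in parallel and n is the number of workers. The sketch below is written in Python only to show the arithmetic; the function names and the example value p = 0.9 are assumptions for illustration, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup of a fixed-size job: S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% of the job can run in parallel
    for n in (1, 2, 4, 8, 64, 1024):
        print(f"workers={n:5d}  speedup={amdahl_speedup(p, n):.2f}")
    print(f"limit for p={p}: {amdahl_limit(p):.2f}")
```

With 90% of the job parallelizable, the speedup approaches but never exceeds 10, no matter how many cores are added. That ceiling is the "certain minimum time" set by the serial steps.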

Gustafson’s Law

Gives the theoretical speedup in latency of the execution of a task, at fixed execution time, that can be expected of a system whose resources are improved. Here more resources buy more detail: the problem grows with the resources, since you can process objects as components of a larger object.
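As a rough numeric illustration, Gustafson's law is commonly written as S(n) = (1 - p) + p * n, where p is the parallel fraction of the run time on the parallel system and n is the number of workers. The sketch below is written in Python only to show the arithmetic; the function name and the example value p = 0.9 are assumptions for illustration.

```python
def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Scaled speedup at fixed execution time: S(n) = (1 - p) + p * n."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% of the run time is parallel work
    for n in (1, 2, 4, 8, 64):
        print(f"workers={n:3d}  scaled speedup={gustafson_speedup(p, n):.1f}")
```

Because the workload is allowed to grow with the number of workers, the scaled speedup keeps rising roughly linearly instead of levelling off.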

We can parallelize, but there are certain serial processes that cannot be run in parallel.

R and Microsoft R Server (formerly Revolution Analytics R)

Microsoft R

ContRolling your Server on the Cloud

R has its own package repository called CRAN (Comprehensive R Archive Network).

There is also a package, devtools, that installs packages from GitHub.

devtools

Genomics and bioinformatics packages

Galaxy Server – server for Genomics

Bioconductor

R Open Science

rOpenSci

Cloud Packages

New cloud packages for R

cloudyr

cloudyr Packages for AWS

Securing your RStudio Server on AWS

SSH Tunneling

Expose only SSH port 22 publicly and tunnel through it. Port 8787 (RStudio Server) is not exposed publicly; it is reachable only through the SSH tunnel on port 22, which secures the HTTP traffic when HTTPS is not available.
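A minimal sketch of the tunnel described above, using OpenSSH local port forwarding (`ssh -L`). It is written in Python only to build and print the command. The helper name, the user `ubuntu`, and the host `ec2-host.example.com` are hypothetical placeholders, and the sketch assumes RStudio Server listens on port 8787 on the remote machine.

```python
import shlex


def ssh_tunnel_command(user, host, local_port=8787, remote_port=8787, ssh_port=22):
    """Build an OpenSSH command that forwards a local port to the server.

    -N  do not run a remote command, only forward ports
    -L  local_port:localhost:remote_port, where "localhost" is resolved
        on the remote side, so the remote port never has to be public
    """
    return [
        "ssh",
        "-N",
        "-p", str(ssh_port),
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical user and host; replace with your own instance details.
    command = ssh_tunnel_command("ubuntu", "ec2-host.example.com")
    print(shlex.join(command))
```

While the printed command is running in a terminal, the server would be reachable in a browser at http://localhost:8787, with the traffic carried inside the SSH connection.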

RStudio Server Security with VPN

VPN (Virtual Private Network): this also connects remote workers.

VPN keys are needed to connect to the VPN server.

Direct Access

The publicly open ports are SSH (TCP 22) and VPN (UDP 1194).

Direct Access no Web Server Access

With VPN Security through an AWS EC2 Instance

Can connect to Web Server through VPN

Web Access through VPN

Thank you

Rowen Remis R. Iral

http://wenup.wordpress.com