Article Summary: Using Cloud Platforms in Data Science


Date the article was published: February 3, 2022

Find the article here.

Summary of Article

The article, entitled ‘How to use cloud platforms for your data science projects’, introduces some of the most common cloud-based platforms for data science and explains how you can use them in your own projects. The cloud-based platforms discussed in the article include:

Cloud Based Platforms
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
IBM Watson
Microsoft Azure

Data scientists deal with a myriad of complex problems and solve them by building models, deploying algorithms, and so on. Using the right tools is therefore essential to working effectively and efficiently. Advantages of taking a project to the cloud include the ability to scale, access to the latest tools in one place, and less maintenance for the user. Different cloud-based tools are advantageous for different stages of a project pipeline.

Cloud Based Platforms: The Specifics

IBM
- IBM has tools for machine learning and automation at every step of the pipeline, including strong tools for preparing data and monitoring models.
- IBM Cloud Pak for Data: helps collect, explore, and analyze data within a modern data warehouse.
- IBM Watson Studio: builds, runs, and analyzes models.
- IBM SPSS Modeler: a visual solution for data preparation and analysis.
Google Cloud
- Cited in the article as one “of the best cloud based platforms for data science”, with highly compatible tools at every step of the pipeline.
- Data ingestion and pre-processing: Dataflow, Cloud Pub/Sub, BigQuery, and Cloud Storage.
- Data exploration and insights: Vertex AI Workbench and BigQuery ML.
Microsoft
- Azure Machine Learning: MLOps with single-click deployment.
AWS
- Training: bias detection with Amazon SageMaker Clarify.
- Monitoring models: Amazon SageMaker Model Monitor detects model drift.

Summary of Applications and Tools

The tables below list each platform and the services discussed in the article:

Google Cloud
Dataflow
Cloud Pub/Sub
BigQuery
BigQuery Data Transfer Service
Cloud Storage
Storage Transfer Service
Vertex AI Workbench
BigQuery ML

Microsoft
Azure AI
Azure

IBM
IBM Cloud Pak for Data
IBM Watson Studio
IBM SPSS Modeler
Microsoft Azure

Amazon Web Services (AWS)
SageMaker Data Wrangler
Amazon Athena
Amazon Redshift
AWS Lake Formation
Amazon S3
Amazon SageMaker Feature Store
Amazon EMR
SageMaker Studio Notebooks
SageMaker Clarify
SageMaker Model Monitor

About the Author

The article is by Sreejani Bhattacharyya, an author at Analytics India Magazine (AIM).

About the Website

The article is from Analytics India Magazine (AIM), a company devoted to sharing the newest technologies and their impacts. Their vision is to shape the future of AI and “to bring about better-informed and more conscious decisions about technology through authoritative, influential, and trustworthy journalism.”


To read more about AIM and their mission, look at their web page here.

Random Plots and Tables

US Economic Time Series

pce = personal consumption expenditures, in billions of dollars; pop = total population, in thousands; psavert = personal savings rate; uempmed = median duration of unemployment, in weeks; unemploy = number of unemployed, in thousands.

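library(ggplot2)  # provides ggplot() and the economics_long data set
library(plotly)   # provides ggplotly()
econdat <- ggplot(data = economics_long, aes(x = date, y = value01, color = variable)) + geom_line()
ggplotly(econdat)  # TODO: add a title and relabel the axes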

Interactive Table

library(magrittr)  # provides %>%
library(knitr)     # provides kable()
tab <- with(economics_long, table(variable)) %>% prop.table()
kable(tab, col.names = c('Economic metric', 'Proportion of observations'))

Economic metric	Proportion of observations
pce	0.2
pop	0.2
psavert	0.2
uempmed	0.2
unemploy	0.2

Diamonds Scatterplot

ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, aes(color = cut))

Interactive Table

tab2 <- rbind(fivenum(diamonds$carat), fivenum(diamonds$price))
kable(tab2, col.names = c('Min', '25%', 'Median', '75%', 'Max'), caption = '5 number summary for carat and price of diamonds')

5 number summary for carat and price of diamonds
Min	25%	Median	75%	Max
0.2	0.4	0.7	1.04	5.01
326.0	950.0	2401.0	5324.50	18823.00

Diamonds Marginal Plot

library(ggExtra)  # provides ggMarginal()

# base scatterplot with the legend hidden
scat <- ggplot(diamonds, aes(x = depth, y = table, color = cut, size = carat)) +
  geom_point() +
  theme(legend.position = "none")

# add marginal histograms for depth and table
final <- ggMarginal(scat, type = "histogram")
final

Article Summary: Using Cloud Platforms in Data Science

Maura Toner

2/3/2022
