Date the article was published: February 3, 2022
Find the article here.
The article, entitled ‘How to use cloud platforms for your data science projects’, introduces some of the most common cloud-based platforms for data science and explains how you can use them in your own projects. The cloud-based platforms discussed in the article are:
| Cloud-Based Platforms |
|---|
| Amazon Web Services (AWS) |
| Google Cloud Platform (GCP) |
| IBM Watson |
| Microsoft Azure |
Data scientists deal with a myriad of complex problems and solve them by building models, deploying algorithms, and so on. Using the right tools is therefore essential to working effectively and efficiently. The advantages of taking a project to the cloud include the ability to scale, access to all of the latest tools in one place, and less maintenance for the user. Different cloud-based tools are advantageous for different stages of a project pipeline.
Below are tables listing, for each platform, the services discussed in the article:
| Google Cloud |
|---|
| Dataflow |
| Cloud Pub/Sub |
| BigQuery |
| BigQuery Data Transfer Service |
| Cloud Storage |
| Storage Transfer Service |
| Vertex AI Workbench |
| BigQuery ML |

| Microsoft |
|---|
| Azure AI |
| Azure |

| IBM |
|---|
| IBM Cloud Pak for Data |
| IBM Watson Studio |
| IBM SPSS Modeler |
| Microsoft Azure |

| Amazon Web Services (AWS) |
|---|
| SageMaker Data Wrangler |
| Amazon Athena |
| Amazon Redshift |
| AWS Lake Formation |
| Amazon S3 |
| Amazon SageMaker Feature Store |
| Amazon EMR |
| SageMaker Studio Notebooks |
| SageMaker Clarify |
| SageMaker Model Monitor |

The article is from Analytics India Magazine (AIM), a company devoted to sharing the newest technologies and their impacts. Their vision is to shape the future of AI and “to bring about better-informed and more conscious decisions about technology through authoritative, influential, and trustworthy journalism.”
To read more about AIM and their mission, look at their web page here.
Variables in the economics_long data: pce = personal consumption expenditures, in billions of dollars; pop = total population, in thousands; psavert = personal savings rate; uempmed = median duration of unemployment, in weeks; unemploy = number of unemployed, in thousands.
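library(ggplot2) # provides ggplot() and the economics_long data
library(plotly)  # provides ggplotly()
econdat <- ggplot(data = economics_long, aes(x = date, y = value01, color = variable)) + geom_line()
ggplotly(econdat) # add title and change axes, etc.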
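The plotting code above carries a note to add a title and change the axes. One way this might be done is with ggplot2's labs() before the plot is handed to ggplotly(). This is only a sketch: the title and label text are illustrative assumptions, not wording from the original page, and econ_interactive is a name introduced here for the example.

```r
library(ggplot2)
library(plotly)

# Static line plot with a title and readable axis labels.
# All label text below is illustrative, not from the original page.
econdat <- ggplot(economics_long, aes(x = date, y = value01, color = variable)) +
  geom_line() +
  labs(
    title = "US economic indicators over time",
    x = "Date",
    y = "Value rescaled to [0, 1]",
    color = "Indicator"
  )

# Convert the static plot to an interactive plotly widget
econ_interactive <- ggplotly(econdat)
econ_interactive
```

Setting the labels on the ggplot object, rather than on the plotly widget afterwards, keeps the static and interactive versions of the plot consistent.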
Interactive Table
library(magrittr) # provides %>%
library(knitr)    # provides kable()
tab <- with(economics_long, table(variable)) %>% prop.table()
kable(tab, col.names = c('Proportions of each economic metric in the data', 'value'))
| Proportions of each economic metric in the data | value |
|---|---|
| pce | 0.2 |
| pop | 0.2 |
| psavert | 0.2 |
| uempmed | 0.2 |
| unemploy | 0.2 |
ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +
  geom_point(alpha = 0.1, aes(color = cut))
Interactive Table
# First row: carat; second row: price
tab2 <- rbind(fivenum(diamonds$carat), fivenum(diamonds$price))
kable(tab2, col.names = c('Min', '25%', 'Median', '75%', 'Max'), caption = '5 number summary for carat and price of diamonds')
| Min | 25% | Median | 75% | Max |
|---|---|---|---|---|
| 0.2 | 0.4 | 0.7 | 1.04 | 5.01 |
| 326.0 | 950.0 | 2401.0 | 5324.50 | 18823.00 |
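The summary table above does not say which row belongs to which variable. A minimal sketch of one possible fix is to build the summary as a data frame with an explicit label column. The column name variable and the header "Variable" are assumptions added for illustration.

```r
library(ggplot2) # provides the diamonds data
library(knitr)   # provides kable()

# Five-number summaries with an explicit label column.
# The "variable" column is a hypothetical addition for readability.
tab2 <- data.frame(
  variable = c("carat", "price"),
  rbind(fivenum(diamonds$carat), fivenum(diamonds$price))
)

kable(
  tab2,
  col.names = c("Variable", "Min", "25%", "Median", "75%", "Max"),
  caption = "5 number summary for carat and price of diamonds"
)
```

The numbers are the same as in the original table; only the first column is new.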
library(ggExtra) # provides ggMarginal()
# classic scatter plot
scat <- ggplot(diamonds, aes(x = depth, y = table, color = cut, size = carat)) +
  geom_point() +
  theme(legend.position = "none")
# with marginal histograms
final <- ggMarginal(scat, type = "histogram")
final