11/24/22
you don't know how good you have it being an R programmer until you have to do something in PYTHON. IM STRAIGHT UP NOT HAVING A GOOD TIME
— Asmae Toumi (@asmae_toumi) October 18, 2022
RStudio is like buying a new car.
Python is like buying a new car, but the car is completely disassembled, the parts are all from different manufacturers, and you don't know if any of it is right until you try driving on the freeway
— Shane'); DROP TABLE User;– (@ShaneofMN) October 18, 2022
Have to install Python on a new machine, goodbye forever
— Vicki (@vicki@jawns.club), “official” (@vboykis) August 10, 2022
I'm ready to confess that I now prefer the tidyverse (dplyr in particular) over pandas when working with flat data.
Please respect my privacy during this difficult time.
— Ethan Douglas (@EthanCDouglas) June 3, 2021
This is just one example of how supportive and helpful I've found the #rstats community. The community is one of the main reasons I choose R (vs Python) and recommend it to DS beginners. So, thank you to everyone who makes it that way. Fin.
— Emily Robinson (@robinson_es) November 15, 2017
And yet … from early 2021 to mid-2022, I essentially only programmed in Python!
Companies may offer packages for using their services for Python but not R (e.g. Cloud Services like AWS and Google Cloud)
You need to use a tool like Airflow which uses Python
putrinprod.com
While you can mix R and Python in one project (spoiler alert), you might prefer to just use one
Maybe you’re on a bilingual team and take over or help out on a Python project
Maybe you want to switch from the analytics team to the Machine Learning team, and they all use Python
There were 470 “data scientist for R” jobs listed on LinkedIn in Spain vs. 12,000 “data scientist Python” jobs
That’s 25x as many
19,000 vs. 324,000 in the United States
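The ratios behind those listing counts can be checked with a few lines of Python. This is only a sketch of the arithmetic: the counts are the ones quoted above, and the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Job-listing counts quoted above (LinkedIn searches for each language).
listings = {
    "Spain": {"r": 470, "python": 12_000},
    "United States": {"r": 19_000, "python": 324_000},
}

# Ratio of Python listings to R listings in each country.
ratios = {
    country: counts["python"] / counts["r"]
    for country, counts in listings.items()
}

for country, ratio in ratios.items():
    print(f"{country}: about {ratio:.1f}x as many Python listings")
```

The Spanish figure works out to a little over 25x and the US figure to roughly 17x.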
```{python}
#| code-line-numbers: "4-5|7-12|14|15|4-15"
import os
import boto3

s3 = boto3.client(
    's3',
    aws_secret_access_key =
        os.getenv('AWS_SECRET_ACCESS_KEY'),
    aws_access_key_id =
        os.getenv('AWS_ACCESS_KEY_ID'))

buckets = s3.list_buckets()
buckets['Buckets'][0]
```
{'Name': 'dogrates', 'CreationDate': datetime.datetime(2021, 10, 6, 1, 1, 50, tzinfo=tzutc())}
```{r}
# reticulate provides import(), which loads a Python module into R
library(reticulate)

boto3 <- import('boto3')

s3 <- boto3$client(
  's3',
  aws_secret_access_key =
    Sys.getenv('AWS_SECRET_ACCESS_KEY'),
  aws_access_key_id =
    Sys.getenv('AWS_ACCESS_KEY_ID'))

buckets <- s3$list_buckets()
buckets$Buckets[[1]]
```
$Name
[1] "dogrates"
$CreationDate
datetime.datetime(2021, 10, 6, 1, 1, 50, tzinfo=tzutc())
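Both chunks above get back the same `list_buckets()` response: a dictionary whose `Buckets` entry holds one record per bucket. As a minimal sketch of working with that shape, the example below pulls out the bucket names. It uses a hand-built dictionary instead of a live AWS call so that it runs without credentials. The `bucket_names` helper and the second bucket are hypothetical, and `timezone.utc` stands in for the `tzutc()` shown in the real output.

```python
from datetime import datetime, timezone

# Hand-built stand-in for the value returned by s3.list_buckets().
# Only the fields shown in the output above are included; a real
# response may carry additional keys.
response = {
    "Buckets": [
        {
            "Name": "dogrates",
            "CreationDate": datetime(2021, 10, 6, 1, 1, 50, tzinfo=timezone.utc),
        },
        {
            "Name": "example-bucket",  # hypothetical, added so the loop is visible
            "CreationDate": datetime(2022, 1, 1, tzinfo=timezone.utc),
        },
    ]
}


def bucket_names(resp):
    """Return the bucket names from a list_buckets-style response dict."""
    return [bucket["Name"] for bucket in resp.get("Buckets", [])]


names = bucket_names(response)
print(names)
```

Against a real client, the same helper would presumably be called as `bucket_names(s3.list_buckets())`.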
```{python}
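from sklearn import linear_model
from sklearn.model_selection import train_test_split

# r.penguins_non_missing is an R data frame, reached through reticulate's `r` object
x = r.penguins_non_missing[['bill_length_mm', 'body_mass_g',
                            'flipper_length_mm', 'bill_depth_mm']]
y = r.penguins_non_missing['sex']

# Hold out part of the data for testing (scikit-learn's default split)
x_train, x_test, y_train, y_test = train_test_split(x, y)

# Fit a logistic regression and predict sex for the first five test rows
logisticRegr = linear_model.LogisticRegression()
logisticRegr = logisticRegr.fit(x_train, y_train)
logisticRegr.predict(x_test)[:5]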
```
array(['female', 'female', 'male', 'female', 'female'], dtype=object)
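The chunk above stops at the predictions. A natural next step is to compare them with the held-out labels. The sketch below shows that calculation in plain Python. The `predicted` list copies the five labels printed above, while the `actual` list and the `accuracy` helper are hypothetical stand-ins for `y_test` and for whatever metric you would really use.

```python
# Stand-ins for logisticRegr.predict(x_test)[:5] and the matching y_test values.
# `predicted` mirrors the output above; `actual` is made up for illustration.
predicted = ["female", "female", "male", "female", "female"]
actual = ["female", "male", "male", "female", "female"]


def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one label")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


acc = accuracy(predicted, actual)
print(f"accuracy: {acc:.2f}")
```

With the fitted model itself, `logisticRegr.score(x_test, y_test)` reports the same kind of mean accuracy without a hand-written helper.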
“Just in time learning” - you should be seeing an immediate payoff that offsets the struggles
Become a better, more empathetic teacher/mentor by remembering what it’s like to be a beginner.
Here's a recipe to clean up the Ames housing dataset. pic.twitter.com/Hl40kqXeqm
— Matt Harrison (@__mharrison__) October 18, 2022
Nobody in their right mind thinks this code is good? Right?
— DANISHDATTY (@danishdatty) October 18, 2022
I love how controversial this code is, and I love @__mharrison__'s point of view in his interview with @somacdivad.
Every bit of code you write is a tradeoff between understandability, flexibility, performance, and many other traits. pic.twitter.com/13iGzYf6vl
— Chris May (@_ChrisMay) November 2, 2022
Takes care of a lot of the environment issues
Basic Google Colab (which is Python only) and Saturn Cloud are free; AWS SageMaker notebooks have a free tier for the first two months
“For the first couple years you make stuff, it’s just not that good… But your taste, the thing that got you into the game, is still killer. And your taste is why your work disappoints you … It is only by going through a volume of work that you will close that gap … It’s normal to take awhile.” -Ira Glass
The only way to write good code is to write tons of shitty code first. Feeling shame about bad code stops you from getting to good code
— Hadley Wickham (@hadleywickham) April 17, 2015
Effective Pandas: Patterns for Data Manipulation by Matt Harrison
Effective Python: 90 Specific Ways to Write Better Python, 2nd edition by Brett Slatkin
Many domain specific books, like Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron and Deep Learning with Python by François Chollet
Blog: hookedondata.org
Book: datascicareer.com
Podcast: podcast.bestbook.cool
Social: @robinsones@fosstodon.org on Mastodon
@robinson_es on Twitter
robinsones on LinkedIn
Slides: https://rpubs.com/robinson_es/python_with_r