import pandas as pd
import numpy as np
cc = pd.read_csv("AER_credit_card_data.csv")
cc.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 1319 entries, 0 to 1318
## Data columns (total 12 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 card 1319 non-null object
## 1 reports 1319 non-null int64
## 2 age 1319 non-null float64
## 3 income 1319 non-null float64
## 4 share 1319 non-null float64
## 5 expenditure 1319 non-null float64
## 6 owner 1319 non-null object
## 7 selfemp 1319 non-null object
## 8 dependents 1319 non-null int64
## 9 months 1319 non-null int64
## 10 majorcards 1319 non-null int64
## 11 active 1319 non-null int64
## dtypes: float64(4), int64(5), object(3)
## memory usage: 123.8+ KB
cc.head()
## card reports age income ... dependents months majorcards active
## 0 yes 0 37.66667 4.5200 ... 3 54 1 12
## 1 yes 0 33.25000 2.4200 ... 3 34 1 13
## 2 yes 0 33.66667 4.5000 ... 4 58 1 5
## 3 yes 0 30.50000 2.5400 ... 0 25 1 7
## 4 yes 0 32.16667 9.7867 ... 2 64 1 5
##
## [5 rows x 12 columns]
Install Pipenv
What’s the version of pipenv you installed?
Use --version to find out
What’s the first hash for scikit-learn you get in Pipfile.lock?
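Since Pipfile.lock is a JSON file, the hash can also be read programmatically instead of by eye. The following is a minimal sketch; the helper name `first_hash` is made up for illustration, and it assumes scikit-learn was installed as a regular (non-dev) dependency, so that it sits under the `default` section of the lock file.

```python
import json


def first_hash(lock_path, package="scikit-learn", section="default"):
    """Return the first hash listed for `package` in a Pipfile.lock file.

    Hypothetical helper: assumes the package is locked under `section`
    ("default" for regular dependencies, "develop" for dev dependencies).
    """
    with open(lock_path, encoding="utf-8") as f_in:
        lock = json.load(f_in)
    # Each locked package entry carries a "hashes" list of "sha256:..." strings.
    return lock[section][package]["hashes"][0]
```

Calling `first_hash("Pipfile.lock")` from the project directory should return the same string that appears first under scikit-learn in the file.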
We’ve prepared a dictionary vectorizer and a model.
They were trained (roughly) using this code:
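from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# df is the credit card DataFrame and y is the target; neither is defined here
features = ['reports', 'share', 'expenditure', 'owner']
dicts = df[features].to_dict(orient='records')
dv = DictVectorizer(sparse=False)
X = dv.fit_transform(dicts)
model = LogisticRegression(solver='liblinear').fit(X, y)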
Note: You don’t need to train the model. This code is just for your reference.
And then saved with Pickle. Download them:
With wget:
PREFIX=https://raw.githubusercontent.com/alexeygrigorev/mlbookcamp-code/master/course-zoomcamp/cohorts/2022/05-deployment/homework
wget $PREFIX/model1.bin
wget $PREFIX/dv.bin
Write a script for loading these models with pickle
Score this client:
{"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
What’s the probability that this client will get a credit card?
0.162
0.391
0.601
0.993
If you’re getting errors when unpickling the files, check their checksum.
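One way to check a downloaded file's checksum from Python is sketched below. The expected checksum values are not given in this write-up, and the hash algorithm is an assumption (SHA-256 by default here), so the digest should be compared against whatever the course materials publish, using the same algorithm. The helper name `file_checksum` is made up for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    The algorithm is an assumption; pass the one used by the published
    checksums (for example "md5") if it differs.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f_in:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `file_checksum("model1.bin")` and `file_checksum("dv.bin")` give the digests of the two downloaded files.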
import pickle
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
dv_file = "dv.bin"
with open(dv_file, "rb") as f_in:  # "rb": read in binary mode
    dv = pickle.load(f_in)
## C:\Users\husad\CONDA~1\envs\ML-ZOO~1\lib\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator DictVectorizer from version 1.0.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
## https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
## warnings.warn(
model_file = "model1.bin"
with open(model_file, "rb") as f_in:  # "rb": read in binary mode
    model = pickle.load(f_in)
## C:\Users\husad\CONDA~1\envs\ML-ZOO~1\lib\site-packages\sklearn\base.py:329: UserWarning: Trying to unpickle estimator LogisticRegression from version 1.0.2 when using version 1.1.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
## https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
## warnings.warn(
customer = {"reports": 0, "share": 0.001694, "expenditure": 0.12, "owner": "yes"}
X = dv.transform([customer])
y_pred = model.predict_proba(X)[0, 1]
y_pred
## 0.16213414434326598
Now let’s serve this model as a web service
Install Flask and gunicorn (or waitress, if you're on Windows)
Write Flask code for serving the model
Now score this client using requests:
import requests
url = "YOUR_URL"
client = {"reports": 0, "share": 0.245, "expenditure": 3.438, "owner": "yes"}
requests.post(url, json=client).json()
What’s the probability that this client will get a credit card?
0.274
0.484
0.698
0.928
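The Flask service asked for in this question could look roughly like the sketch below. It is not the course's reference solution: the factory function `create_app`, the `/predict` route, and the `card_probability` response key are all illustrative choices. The app takes the already unpickled `dv` and `model` objects as arguments, so the scoring logic is the same `transform` plus `predict_proba` pair used earlier.

```python
from flask import Flask, jsonify, request


def create_app(dv, model):
    """Build a Flask app that scores one client per POST request.

    `dv` and `model` are expected to be the unpickled DictVectorizer and
    LogisticRegression. Route name and response key are assumptions.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        client = request.get_json()  # the client dict sent as JSON
        X = dv.transform([client])
        # Column 1 holds the probability of the positive class.
        probability = float(model.predict_proba(X)[0][1])
        return jsonify({"card_probability": probability})

    return app
```

After loading `dv.bin` and `model1.bin` with pickle, `app = create_app(dv, model)` gives an object that gunicorn or waitress can serve, and the `requests.post` call from the question can then point at the `/predict` route of that server.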
---
So that’s that. I have yet to understand Docker, so I am skipping questions 5 and 6.