Introduction

A quick look at the kinds of questions posted on the Data Science StackExchange.

Get the data

Get all open questions:

Query used:

Let’s look at the data

 [1] "Id"                    "PostTypeId"            "AcceptedAnswerId"     
 [4] "ParentId"              "CreationDate"          "DeletionDate"         
 [7] "Score"                 "ViewCount"             "Body"                 
[10] "OwnerUserId"           "OwnerDisplayName"      "LastEditorUserId"     
[13] "LastEditorDisplayName" "LastEditDate"          "LastActivityDate"     
[16] "Title"                 "Tags"                  "AnswerCount"          
[19] "CommentCount"          "FavoriteCount"         "ClosedDate"           
[22] "CommunityOwnedDate"   

| Title | Tags | CreationDate |
|---|---|---|
| Difference between Convolution and Pooling? | <neural-network><deep-learning> | 2020-02-08 07:41:53 |
| Does it make sense to use train_test_split and cross-validation when using GridSearchCV to play with hyperparameters? | <cross-validation><model-selection> | 2020-02-08 12:04:19 |
| Use Python sklearn in Matlab, MLPRegressor | <python><scikit-learn><matlab><mlp> | 2020-02-08 13:02:31 |
| Predict a label based on multiple rows each one case? | <classification> | 2020-02-08 16:00:37 |
| How to summarize multiple time series like dataset | <pandas><graphs> | 2020-02-08 16:32:43 |
| Keras 1x1 convolution network | <keras> | 2020-02-08 18:06:21 |
| How to select checkpoint for model evaluation? | <neural-network><convolution><evaluation><overfitting> | 2020-02-08 18:41:32 |
| Data Labeling domain specific | <machine-learning><python><deep-learning><data-cleaning><data-science-model> | 2020-02-08 20:32:22 |
| How to optimize client’s portafolio with analytical models? | <optimization> | 2020-02-08 21:13:49 |
| Stylegan train.py Assertion Error | <python><gan><nvidia> | 2020-02-08 21:54:39 |

Some quick EDA

[1] 20663    22

Prepare the Tags data

[1] "machine-learning, python, scikit-learn, clustering, unsupervised-learning"
[2] "neural-network, deep-learning, tensorflow, lstm"                          
[3] "preprocessing"                                                            
[4] "machine-learning, python, cnn, image-classification"                      
[5] "python, scikit-learn, anomaly-detection, outlier, data-imputation"        
[6] "machine-learning, data"                                                   
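
The code for this step is not shown. The raw Tags column uses the form `<tag1><tag2>`, while the output above lists the tags separated by commas. As a minimal sketch of that conversion (in Python, for illustration only; the helper name `parse_tags` is hypothetical and not part of the original analysis):

```python
import re


def parse_tags(raw: str) -> list[str]:
    """Split a tag string such as '<python><keras>' into ['python', 'keras']."""
    # Each tag is the text between one '<' and the next '>'.
    return re.findall(r"<([^<>]+)>", raw)


# Example taken from the first question in the sample table.
tags = parse_tags("<neural-network><deep-learning>")
print(", ".join(tags))
```

Each parsed list can then be treated as one transaction (a set of items) for association rule mining.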

Association rule mining

Create and filter association rules

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen
        0.3    0.1    1 none FALSE            TRUE       5   0.001      2
 maxlen target   ext
      5  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 20 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[568 item(s), 20664 transaction(s)] done [0.01s].
sorting and recoding items ... [264 item(s)] done [0.00s].
creating transaction tree ... done [0.01s].
checking subsets of size 1 2 3 4 5 done [0.00s].
writing ... [411 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].

| LHS | RHS | support | confidence | lift | count |
|---|---|---|---|---|---|
| scikit-learn | machine-learning | 0.0207607 | 0.3341121 | 1.040714 | 429 |
| regression | machine-learning | 0.0150019 | 0.3634232 | 1.132014 | 310 |
| cnn | machine-learning | 0.0126307 | 0.3182927 | 0.991438 | 261 |
| statistics | machine-learning | 0.0114692 | 0.3885246 | 1.210201 | 237 |
| python,scikit-learn | machine-learning | 0.0105014 | 0.3604651 | 1.122799 | 217 |
| decision-trees | machine-learning | 0.0079849 | 0.3900709 | 1.215017 | 165 |
| random-forest | machine-learning | 0.0078397 | 0.3537118 | 1.101764 | 162 |
| logistic-regression | machine-learning | 0.0072590 | 0.3807107 | 1.185861 | 150 |
| svm | machine-learning | 0.0072106 | 0.3941799 | 1.227816 | 149 |
| linear-regression | machine-learning | 0.0071622 | 0.3557692 | 1.108172 | 148 |
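
To make the support, confidence, and lift columns concrete, here is a minimal Python sketch that computes them using the standard definitions. It is an illustration only, not the code that produced the rules above. The toy transactions are the six tag lists printed in the "Prepare the Tags data" section, and the function names `support` and `rule_metrics` are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    both = support(set(lhs) | set(rhs), transactions)
    confidence = both / support(lhs, transactions)
    lift = confidence / support(rhs, transactions)
    return both, confidence, lift


# Toy data: the six tag lists shown earlier, one set per question.
transactions = [
    {"machine-learning", "python", "scikit-learn", "clustering", "unsupervised-learning"},
    {"neural-network", "deep-learning", "tensorflow", "lstm"},
    {"preprocessing"},
    {"machine-learning", "python", "cnn", "image-classification"},
    {"python", "scikit-learn", "anomaly-detection", "outlier", "data-imputation"},
    {"machine-learning", "data"},
]

s, c, l = rule_metrics({"scikit-learn"}, {"machine-learning"}, transactions)
print(f"support={s:.4f} confidence={c:.4f} lift={l:.4f}")

# Cross-check against the first table row: support = count / transactions.
table_support = 429 / 20664
print(f"{table_support:.7f}")
```

On the full data, the cross-check reproduces the reported support for the scikit-learn rule: 429 of the 20664 transactions. A lift near 1, as in most rows of the table, means the LHS tag changes the chance of seeing machine-learning only slightly relative to its overall frequency.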

2020-02-12