Mateusz Tomczak
432561
The goal of this project is to understand which sets of programming languages are used together by developers in different fields of software development, using association rule algorithms. Knowing which languages are often used together can be helpful, especially for aspiring developers who feel overwhelmed by the learning possibilities in this field. It can also provide insight into the programming field and show relationships between languages used in front-end and back-end development.
The data used for this project comes from the public Stack Overflow Developer Survey Results 2023. This dataset provides a broad overview of Stack Overflow users and gathers basic information such as education and position at work, as well as more recent topics such as AI tools and their use in the workplace. For the task presented in this project one column was selected, ‘LanguagesHaveWorkedWith’, which contains the programming languages that the developer has worked with. It needs to be pointed out that the survey is open to every Stack Overflow user, so the data reflects both professional and amateur developers. However, even in the case of amateur programmers the data can point to languages that are closely used together, or show a natural progression of a programmer's skill set.
The analysis will mainly use the Apriori algorithm, but the ECLAT algorithm will also be used.
library(arules)
library(arulesViz)
data <- read.transactions("data/stack-overflow-developer-survey-2023/data.csv", sep=',')
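The preprocessing that produced the basket file is not shown in this report. Below is a minimal sketch of one possible approach, assuming the ‘LanguagesHaveWorkedWith’ answers are stored as semicolon-separated strings. The sample values and the helper name `to_basket_lines` are illustrative only, not the actual preprocessing used here.

```r
# Hypothetical excerpt of the survey column; NA marks a skipped question.
languages <- c(
  "HTML/CSS;JavaScript;Python",
  "Bash/Shell (all shells);Go",
  NA
)

# Drop missing or empty answers and switch the separator to a comma,
# so that each line becomes one transaction in basket format.
to_basket_lines <- function(x) {
  x <- x[!is.na(x) & nzchar(x)]
  gsub(";", ",", x, fixed = TRUE)
}

basket_lines <- to_basket_lines(languages)
print(basket_lines)
```

The resulting lines could then be written to disk with `writeLines()` and read with `read.transactions(..., sep = ",")` as above.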
We can see that the most popular programming language among Stack Overflow users is JavaScript, with HTML/CSS in second place and Python in third.
options(width = 100)
summary(data)
## transactions as itemMatrix in sparse format with
## 87140 rows (elements/itemsets/transactions) and
## 51 columns (items) and a density of 0.1048925
##
## most frequent items:
## JavaScript HTML/CSS Python SQL TypeScript (Other)
## 55711 46396 43158 42623 34041 244228
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 5361 8753 12229 13157 12424 10182 7647 5548 3807 2558 1827 1192 775 562 361 232
## 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
## 118 120 65 52 30 24 20 12 14 5 4 2 5 2 3 1
## 33 34 35 36 37 38 41 42 43 44 45 46 47 48 49 50
## 4 1 2 1 1 1 2 1 1 1 1 1 1 1 1 6
## 51
## 22
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.00 5.00 5.35 7.00 51.00
##
## includes extended item information - examples:
## labels
## 1 Ada
## 2 Apex
## 3 APL
options(width = 100, max.print = 1000)
size(data)
## [1] 3 2 7 3 6 15 7 5 6 5 8 7 4 3 11 2 4 1 3 4 7 3 4 2 2 7 14 2 6 3 7
## [32] 6 1 3 9 5 8 7 3 3 4 1 4 3 4 7 4 6 4 3 1 6 6 5 5 4 4 6 2 2 2 5
## [63] 1 6 4 3 3 5 3 5 1 9 2 2 2 4 2 1 6 8 1 5 5 6 9 6 5 14 4 2 5 6 2
## [94] 9 4 2 9 5 3 7 2 6 3 3 5 7 6 4 4 4 1 13 6 6 12 4 10 2 6 6 1 3 1 3
## [125] 4 8 1 4 6 8 10 6 7 5 3 5 5 8 4 2 3 8 1 5 16 6 8 5 1 6 8 5 6 1 5
## [156] 5 9 10 4 4 6 6 10 6 7 18 5 9 3 3 8 3 4 5 1 7 2 4 3 10 4 10 3 4 1 3
## [187] 7 6 7 5 4 19 3 5 4 3 3 10 3 7 5 14 4 7 2 4 20 6 5 7 4 10 5 3 3 2 5
## [218] 1 12 3 3 3 4 2 8 3 13 5 6 11 11 6 7 2 6 12 5 4 10 1 2 3 3 2 1 8 3 2
## [249] 4 7 5 1 3 7 7 4 3 10 3 2 4 4 4 9 6 5 4 6 9 3 4 5 6 4 3 6 6 8 3
## [280] 4 2 6 6 5 1 3 6 3 6 2 8 4 4 5 8 5 9 5 11 4 2 2 6 1 8 4 7 6 8 3
## [311] 4 13 6 6 14 7 3 5 7 7 7 6 11 1 1 1 3 2 5 5 5 1 5 6 6 1 3 5 10 1 4
## [342] 9 9 7 11 8 1 1 7 8 4 8 5 4 3 4 8 3 5 3 17 8 4 4 3 1 9 7 4 6 4 4
## [373] 7 2 5 3 9 3 3 4 9 6 3 5 5 4 4 3 10 3 9 14 8 2 7 8 4 2 4 2 3 8 15
## [404] 5 6 2 8 13 5 5 5 5 1 5 11 6 4 8 9 5 7 4 7 6 1 3 8 6 5 4 24 1 7 3
## [435] 6 1 4 4 5 5 6 7 6 6 14 1 4 9 2 8 4 1 3 2 3 4 10 5 6 8 3 8 4 8 4
## [466] 7 2 2 5 11 4 2 5 1 4 5 3 3 16 5 6 6 9 4 8 3 4 6 3 2 2 4 11 8 7 3
## [497] 2 6 4 7 4 6 6 3 7 7 6 8 8 4 11 4 8 1 8 3 6 10 4 3 6 2 2 10 6 4 6
## [528] 4 3 8 7 6 5 6 3 5 5 2 5 7 4 4 10 6 4 9 4 7 5 11 2 12 12 7 10 5 5 4
## [559] 5 4 4 6 5 4 2 4 4 11 5 7 14 3 3 11 12 3 7 1 4 4 9 7 8 2 4 12 2 4 5
## [590] 4 3 3 4 5 5 6 3 6 12 5 3 12 5 2 6 5 5 3 5 7 5 2 7 4 3 4 4 4 9 6
## [621] 3 6 5 6 10 8 11 6 7 4 3 5 8 13 6 5 5 9 3 8 4 6 7 6 2 8 6 13 4 6 2
## [652] 7 5 1 5 4 3 5 7 5 19 6 5 3 3 7 4 3 4 7 3 2 3 7 2 4 4 5 4 2 5 5
## [683] 5 3 2 2 3 6 6 5 6 6 10 6 5 7 6 3 4 3 2 2 1 2 11 2 3 3 3 3 3 8 3
## [714] 1 6 6 10 2 1 3 6 3 4 6 3 1 5 7 4 2 6 3 4 4 6 2 10 2 7 8 6 5 2 4
## [745] 6 7 6 6 9 10 3 3 2 4 2 8 3 3 6 2 8 7 13 2 4 5 2 1 3 4 3 6 10 2 8
## [776] 4 3 10 4 6 5 2 3 6 4 6 1 9 1 6 4 5 8 6 11 3 4 11 3 5 5 5 5 1 4 4
## [807] 5 6 6 2 5 4 10 3 6 3 3 6 11 4 3 6 7 5 10 5 2 22 2 6 6 2 3 3 6 6 7
## [838] 6 4 5 8 5 6 5 8 1 5 3 6 4 4 5 6 2 4 8 5 9 2 3 2 4 4 8 5 4 2 6
## [869] 4 4 13 5 7 4 8 3 2 4 6 7 15 7 7 2 7 7 3 3 4 5 4 7 9 9 1 5 6 7 4
## [900] 7 8 3 7 6 9 6 3 3 5 6 2 5 4 4 6 8 1 3 7 8 4 7 2 3 5 1 4 9 7 3
## [931] 1 2 5 6 5 3 11 4 7 2 6 3 4 7 2 3 3 7 5 10 7 5 10 8 7 6 4 1 6 6 8
## [962] 5 3 8 2 8 11 2 5 5 5 10 11 2 4 6 4 2 4 4 2 4 8 3 7 2 7 5 18 5 7 10
## [993] 3 4 6 3 2 4 5 11
## [ reached getOption("max.print") -- omitted 86140 entries ]
median(size(data))
## [1] 5
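As we can see, the median observation, or ‘transaction’, contains 5 programming languages, i.e. half of the respondents listed 5 or fewer. The most common transaction size is 4 (13157 transactions).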
length(data)
## [1] 87140
The total number of observations is equal to 87140.
Relative frequency of the programming languages in the dataset:
options(width = 100)
itemFrequency(data, type="relative")
## Ada Apex APL Assembly
## 0.007769107 0.006644480 0.002582052 0.054544411
## Bash/Shell (all shells) C C# C++
## 0.325350011 0.194399816 0.277633693 0.225315584
## Clojure Cobol Crystal Dart
## 0.012680744 0.006610053 0.004464081 0.060511820
## Delphi Elixir Erlang F#
## 0.032487950 0.023272894 0.009960982 0.009742942
## Flow Fortran GDScript Go
## 0.002455818 0.009559330 0.017156300 0.133027312
## Groovy Haskell HTML/CSS Java
## 0.034151939 0.020989213 0.532430571 0.307057608
## JavaScript Julia Kotlin Lisp
## 0.639327519 0.011590544 0.091060363 0.015400505
## Lua MATLAB Nim Objective-C
## 0.061234795 0.038317650 0.003798485 0.023169612
## OCaml Perl PHP PowerShell
## 0.007046133 0.024684416 0.186756943 0.136584806
## Prolog Python R Raku
## 0.008905210 0.495271976 0.042483360 0.001790223
## Ruby Rust SAS Scala
## 0.062588937 0.131133808 0.004900161 0.027794354
## Solidity SQL Swift TypeScript
## 0.013403718 0.489132431 0.046729401 0.390647234
## VBA Visual Basic (.Net) Zig
## 0.035655267 0.040945605 0.008365848
Absolute frequencies:
options(width = 100)
itemFrequency(data, type="absolute")
## Ada Apex APL Assembly
## 677 579 225 4753
## Bash/Shell (all shells) C C# C++
## 28351 16940 24193 19634
## Clojure Cobol Crystal Dart
## 1105 576 389 5273
## Delphi Elixir Erlang F#
## 2831 2028 868 849
## Flow Fortran GDScript Go
## 214 833 1495 11592
## Groovy Haskell HTML/CSS Java
## 2976 1829 46396 26757
## JavaScript Julia Kotlin Lisp
## 55711 1010 7935 1342
## Lua MATLAB Nim Objective-C
## 5336 3339 331 2019
## OCaml Perl PHP PowerShell
## 614 2151 16274 11902
## Prolog Python R Raku
## 776 43158 3702 156
## Ruby Rust SAS Scala
## 5454 11427 427 2422
## Solidity SQL Swift TypeScript
## 1168 42623 4072 34041
## VBA Visual Basic (.Net) Zig
## 3107 3568 729
First ten observations:
options(width = 100)
inspect(data[1:10])
## items
## [1] {HTML/CSS,
## JavaScript,
## Python}
## [2] {Bash/Shell (all shells),
## Go}
## [3] {Bash/Shell (all shells),
## HTML/CSS,
## JavaScript,
## PHP,
## Ruby,
## SQL,
## TypeScript}
## [4] {HTML/CSS,
## JavaScript,
## TypeScript}
## [5] {Bash/Shell (all shells),
## HTML/CSS,
## JavaScript,
## Ruby,
## SQL,
## TypeScript}
## [6] {Ada,
## Clojure,
## Elixir,
## Go,
## HTML/CSS,
## Java,
## JavaScript,
## Lisp,
## OCaml,
## Raku,
## Ruby,
## Scala,
## Swift,
## TypeScript,
## Zig}
## [7] {Go,
## HTML/CSS,
## JavaScript,
## Python,
## Rust,
## SQL,
## TypeScript}
## [8] {C#,
## JavaScript,
## PowerShell,
## Ruby,
## TypeScript}
## [9] {HTML/CSS,
## Java,
## JavaScript,
## Python,
## SQL,
## TypeScript}
## [10] {C#,
## C++,
## HTML/CSS,
## JavaScript,
## Python}
Item frequency plot for support above 0.1:
itemFrequencyPlot(data, support = 0.1)
Item frequency plot for top 15 programming languages:
itemFrequencyPlot(data, topN = 15)
Here we can better see the top 15 programming languages among Stack Overflow users. JavaScript is clearly the most popular (about 64% of respondents). HTML/CSS (53%), Python (50%) and SQL (49%) follow, with similar numbers of users.
Graphical representation of 100 random samples from the dataset
set.seed(42)
image(sample(data, 100))
options(width = 100)
ctab<-crossTable(data, measure="count", sort=TRUE)
ctab
## JavaScript HTML/CSS Python SQL TypeScript Bash/Shell (all shells) Java
## JavaScript 55711 41117 27339 31870 29947 19381 18830
## HTML/CSS 41117 46396 23174 27992 24428 17004 15386
## Python 27339 23174 43158 22557 15996 18545 14967
## SQL 31870 27992 22557 42623 18919 16344 15538
## TypeScript 29947 24428 15996 18919 34041 12151 11385
## Bash/Shell (all shells) 19381 17004 18545 16344 12151 28351 10312
## Java 18830 15386 14967 15538 11385 10312 26757
## C# 16957 14834 10614 15048 11186 6876 7553
## C++ 11880 10302 13363 9013 6459 8169 8160
## C 10573 9366 11867 8306 5614 8139 7639
## PHP 14299 12870 7893 11779 7122 5955 5845
## PowerShell 9162 8409 6836 8601 5985 6157 4252
## Go 8042 6185 7401 6375 6000 5909 4402
## Rust 7323 5963 7549 5215 5947 5424 3949
## Kotlin 5243 4242 4283 4173 3926 3074 5553
## Ruby 4227 3430 2749 3252 2529 2562 1673
## Lua 3954 3473 3768 2828 2722 3063 2115
## Dart 3894 3305 2971 2852 2734 1726 2460
## Assembly 3169 2865 3471 2570 1696 3130 2444
## C# C++ C PHP PowerShell Go Rust Kotlin Ruby Lua Dart
## JavaScript 16957 11880 10573 14299 9162 8042 7323 5243 4227 3954 3894
## HTML/CSS 14834 10302 9366 12870 8409 6185 5963 4242 3430 3473 3305
## Python 10614 13363 11867 7893 6836 7401 7549 4283 2749 3768 2971
## SQL 15048 9013 8306 11779 8601 6375 5215 4173 3252 2828 2852
## TypeScript 11186 6459 5614 7122 5985 6000 5947 3926 2529 2722 2734
## Bash/Shell (all shells) 6876 8169 8139 5955 6157 5909 5424 3074 2562 3063 1726
## Java 7553 8160 7639 5845 4252 4402 3949 5553 1673 2115 2460
## C# 24193 7009 5425 4670 7036 2669 2903 2291 1105 1776 1780
## C++ 7009 19634 11340 4190 3467 3173 4269 2232 1162 2245 1633
## C 5425 11340 16940 3957 2910 3153 4253 1975 1181 2289 1497
## PHP 4670 4190 3957 16274 2941 2432 1696 1693 1242 1281 1455
## PowerShell 7036 3467 2910 2941 11902 1710 1618 1200 732 1138 781
## Go 2669 3173 3153 2432 1710 11592 3339 1691 1361 1560 1205
## Rust 2903 4269 4253 1696 1618 3339 11427 1631 971 2011 1014
## Kotlin 2291 2232 1975 1693 1200 1691 1631 7935 684 773 1428
## Ruby 1105 1162 1181 1242 732 1361 971 684 5454 587 414
## Lua 1776 2245 2289 1281 1138 1560 2011 773 587 5336 503
## Dart 1780 1633 1497 1455 781 1205 1014 1428 414 503 5273
## Assembly 1752 3322 3828 1308 1102 947 1453 669 422 862 449
## Assembly Swift R Visual Basic (.Net) MATLAB VBA Groovy Delphi Scala
## JavaScript 3169 2679 2127 2741 2063 2123 2074 1288 1302
## HTML/CSS 2865 2199 1989 2575 1927 2016 1657 1126 986
## Python 3471 1999 3102 1712 2778 1772 1798 838 1504
## SQL 2570 1826 2437 2793 1825 2385 1833 1739 1359
## TypeScript 1696 1809 1063 1396 983 1001 1362 506 1021
## Bash/Shell (all shells) 3130 1475 1662 1132 1486 1143 1831 595 1172
## Java 2444 1677 1444 1560 1626 1208 2248 706 1491
## C# 1752 1163 948 2682 1086 1609 703 1031 486
## C++ 3322 1209 1295 1352 2005 1021 742 912 538
## C 3828 1110 1236 1162 1926 936 664 800 538
## PHP 1308 963 795 1348 908 1096 548 832 368
## PowerShell 1102 555 795 1347 802 1165 771 490 336
## Go 947 769 530 377 432 297 759 225 564
## Rust 1453 704 606 284 521 266 449 146 550
## Kotlin 669 1316 370 379 434 285 871 168 522
## Ruby 422 561 319 273 236 232 319 170 285
## Lua 862 343 343 314 368 262 307 175 230
## Dart 449 744 274 293 326 194 254 148 170
## Assembly 4753 390 469 516 740 459 237 435 235
## Perl Elixir Objective-C Haskell GDScript Lisp Solidity Clojure Julia Erlang
## JavaScript 1483 1381 1431 1242 1131 856 1030 703 503 583
## HTML/CSS 1288 1087 1159 1075 1048 779 843 512 456 451
## Python 1464 914 1066 1351 1086 946 758 557 825 476
## SQL 1479 1055 1041 920 693 724 689 511 425 471
## TypeScript 645 962 911 905 784 473 823 438 310 358
## Bash/Shell (all shells) 1423 842 836 1024 682 855 510 531 488 486
## Java 968 482 1038 899 568 589 466 494 330 312
## C# 569 341 772 530 678 352 324 223 231 189
## C++ 904 379 966 903 640 681 368 260 490 293
## C 1027 437 915 1007 524 789 342 281 458 352
## PHP 934 331 648 386 354 321 333 184 169 196
## PowerShell 550 185 415 343 279 264 220 136 194 151
## Go 513 557 383 510 373 367 343 289 253 270
## Rust 366 594 317 871 586 472 325 271 381 300
## Kotlin 253 254 619 334 277 198 188 181 161 152
## Ruby 386 559 350 225 170 252 140 177 123 221
## Lua 332 254 231 406 382 342 153 159 203 169
## Dart 146 197 309 216 230 126 188 131 131 114
## Assembly 399 164 339 487 235 399 166 131 174 144
## F# Fortran Prolog Zig Ada OCaml Apex Cobol SAS Crystal Nim APL Flow Raku
## JavaScript 557 444 588 468 286 371 442 382 312 290 226 158 186 100
## HTML/CSS 468 429 544 396 256 313 380 368 299 257 209 136 159 87
## Python 416 599 606 483 338 422 255 303 274 197 262 133 121 102
## SQL 529 453 517 297 270 287 394 428 332 256 161 123 139 91
## TypeScript 492 184 353 387 166 284 212 202 171 182 172 99 146 68
## Bash/Shell (all shells) 326 433 436 412 254 336 217 246 170 193 200 145 112 99
## Java 268 360 535 261 275 282 251 336 168 149 145 124 99 71
## C# 690 272 308 244 188 172 201 269 150 192 129 111 94 62
## C++ 280 533 469 395 295 300 129 286 144 156 169 140 76 79
## C 269 540 497 479 329 385 133 307 143 167 183 150 71 89
## PHP 170 266 279 136 143 148 167 248 174 136 94 84 93 60
## PowerShell 354 210 235 159 129 99 144 200 143 120 101 72 72 61
## Go 201 165 198 349 117 162 90 123 85 138 151 82 72 67
## Rust 265 153 235 489 117 283 70 95 70 136 178 95 73 71
## Kotlin 151 122 175 149 97 114 76 105 68 95 101 68 74 50
## Ruby 106 117 120 117 91 101 79 105 77 171 87 68 72 68
## Lua 143 145 140 255 76 144 57 97 60 93 118 78 51 61
## Dart 109 89 130 129 77 92 66 79 63 90 92 57 54 42
## Assembly 142 311 269 222 191 191 90 203 83 90 99 120 57 59
## [ reached getOption("max.print") -- omitted 32 rows ]
Association rules describe the relationship between the occurrences of two or more items, which makes it possible to discover co-occurrence patterns in the data. There are three main measures we will use to evaluate the obtained rules:
Support: shows how frequently an itemset or a rule occurs in the data. In other words, support is equal to the relative frequency of the itemset or rule.
Confidence: the proportion of transactions containing a given item (or itemset) that also contain another item (or itemset). A higher confidence value indicates a stronger rule.
Lift: the ratio of the probability of finding item X in a transaction given that item Y is present to the probability of finding item X without any knowledge about Y. Values greater than 1 indicate a positive relationship between X and Y. A lift of around 1 implies that the items are independent. Values below 1 imply a negative association between X and Y.
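As a worked example, the three measures can be computed by hand for the rule {Kotlin} => {Java}. This is a sketch in base R; the counts are copied from the absolute frequencies and the cross table shown earlier, and the variable names are illustrative.

```r
# Counts taken from the frequency tables in this report.
n_total  <- 87140  # number of transactions
n_kotlin <- 7935   # transactions containing Kotlin (LHS)
n_java   <- 26757  # transactions containing Java (RHS)
n_both   <- 5553   # transactions containing both Kotlin and Java

support    <- n_both / n_total                 # P(Kotlin and Java)
confidence <- n_both / n_kotlin                # P(Java | Kotlin)
lift       <- confidence / (n_java / n_total)  # P(Java | Kotlin) / P(Java)

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

A lift well above 1 here means that Kotlin users report Java far more often than respondents in general.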
The Apriori algorithm uses prior knowledge about frequent itemsets. It reduces the number of rules obtained for the analysis by allowing a minimum support level to be defined at the beginning. The algorithm relies on the property that all nonempty subsets of a frequent itemset must also be frequent. Conversely, any superset of an infrequent itemset must be infrequent, so such candidates can be skipped. This allows for a manageably sized output.
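The pruning idea can be illustrated on a toy example. This is a base R sketch on five made-up transactions with a made-up minimum count of 2. It is not how `arules` implements the search internally, and the helper name `support_count` is hypothetical.

```r
# Toy transactions (hypothetical, not taken from the survey).
transactions <- list(
  c("JavaScript", "HTML/CSS", "SQL"),
  c("JavaScript", "HTML/CSS"),
  c("Python", "SQL"),
  c("JavaScript", "Python", "SQL"),
  c("C", "C++")
)
min_count <- 2

# Number of transactions that contain every item of the itemset.
support_count <- function(itemset, transactions) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Pass 1: keep only the frequent single items.
items <- sort(unique(unlist(transactions)))
item_counts <- vapply(items, support_count, integer(1), transactions = transactions)
frequent_items <- items[item_counts >= min_count]

# Pass 2: build pair candidates from frequent items only.
# Pairs involving C or C++ are never generated, because an itemset
# with an infrequent subset cannot be frequent.
candidates <- combn(frequent_items, 2, simplify = FALSE)
pair_counts <- vapply(candidates, support_count, integer(1), transactions = transactions)
frequent_pairs <- candidates[pair_counts >= min_count]

print(frequent_items)
print(frequent_pairs)
```

With 6 distinct items there are 15 possible pairs, but only 6 candidates are counted in the second pass.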
We want to find rules that meet minimum thresholds for support, confidence and length. We will search for rules with at least 5% support and 25% confidence. The minimum length will be set to 2 to avoid obtaining rules with an empty set on one side.
apriori_rules <- apriori(data, parameter = list(support = 0.05, confidence = 0.25, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
## 0.25 0.1 1 none FALSE TRUE 5 0.05 2 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4357
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[51 item(s), 87140 transaction(s)] done [0.02s].
## sorting and recoding items ... [19 item(s)] done [0.00s].
## creating transaction tree ... done [0.02s].
## checking subsets of size 1 2 3 4 5 done [0.01s].
## writing ... [860 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
We can see that these parameters produce 860 rules.
Inspecting first 10 rules
inspect(apriori_rules[1:10])
## lhs rhs support confidence coverage lift count
## [1] {Kotlin} => {Java} 0.06372504 0.6998110 0.09106036 2.2790869 5553
## [2] {Kotlin} => {JavaScript} 0.06016755 0.6607435 0.09106036 1.0334977 5243
## [3] {Rust} => {Bash/Shell (all shells)} 0.06224466 0.4746653 0.13113381 1.4589373 5424
## [4] {Rust} => {TypeScript} 0.06824650 0.5204341 0.13113381 1.3322354 5947
## [5] {Rust} => {Python} 0.08663071 0.6606283 0.13113381 1.3338698 7549
## [6] {Rust} => {SQL} 0.05984622 0.4563753 0.13113381 0.9330300 5215
## [7] {Rust} => {HTML/CSS} 0.06843011 0.5218343 0.13113381 0.9800982 5963
## [8] {Rust} => {JavaScript} 0.08403718 0.6408506 0.13113381 1.0023823 7323
## [9] {Go} => {Java} 0.05051641 0.3797447 0.13302731 1.2367212 4402
## [10] {Go} => {Bash/Shell (all shells)} 0.06781042 0.5097481 0.13302731 1.5667684 5909
Inspecting first 10 rules with the highest lift
inspect(sort(apriori_rules, by = "lift")[1:10])
## lhs rhs support confidence coverage lift count
## [1] {Bash/Shell (all shells), C++, Python} => {C} 0.05137709 0.6924981 0.07419096 3.562236 4477
## [2] {Bash/Shell (all shells), C++} => {C} 0.06365619 0.6790305 0.09374570 3.492958 5547
## [3] {C++, Java} => {C} 0.06235942 0.6659314 0.09364241 3.425576 5434
## [4] {C++, HTML/CSS, JavaScript, Python} => {C} 0.05027542 0.6618825 0.07595823 3.404748 4381
## [5] {C++, HTML/CSS, Python} => {C} 0.05612807 0.6557179 0.08559789 3.373038 4891
## [6] {C++, JavaScript, Python} => {C} 0.06296764 0.6478158 0.09719991 3.332389 5487
## [7] {C++, JavaScript, SQL} => {C} 0.05220335 0.6470839 0.08067478 3.328624 4549
## [8] {C++, SQL} => {C} 0.06484967 0.6269832 0.10343126 3.225226 5651
## [9] {C++, HTML/CSS, JavaScript} => {C} 0.06489557 0.6237591 0.10403948 3.208640 5655
## [10] {C++, HTML/CSS} => {C} 0.07298600 0.6173559 0.11822355 3.175702 6360
Inspecting first 10 rules with the highest confidence
inspect(sort(apriori_rules, by = "confidence")[1:10])
## lhs rhs support confidence
## [1] {HTML/CSS, PHP, SQL, TypeScript} => {JavaScript} 0.05277714 0.9780944
## [2] {HTML/CSS, PHP, TypeScript} => {JavaScript} 0.06744319 0.9736581
## [3] {PHP, SQL, TypeScript} => {JavaScript} 0.05958228 0.9652352
## [4] {HTML/CSS, Java, Python, TypeScript} => {JavaScript} 0.05632316 0.9600939
## [5] {PHP, TypeScript} => {JavaScript} 0.07835667 0.9587195
## [6] {HTML/CSS, Python, SQL, TypeScript} => {JavaScript} 0.08742254 0.9573960
## [7] {HTML/CSS, PHP, Python, SQL} => {JavaScript} 0.05678219 0.9555813
## [8] {Bash/Shell (all shells), HTML/CSS, SQL, TypeScript} => {JavaScript} 0.07095479 0.9551985
## [9] {HTML/CSS, PowerShell, TypeScript} => {JavaScript} 0.05407390 0.9548126
## [10] {HTML/CSS, Java, SQL, TypeScript} => {JavaScript} 0.06433326 0.9538880
## coverage lift count
## [1] 0.05395915 1.529880 4599
## [2] 0.06926784 1.522941 5877
## [3] 0.06172825 1.509766 5192
## [4] 0.05866422 1.501725 4908
## [5] 0.08173055 1.499575 6828
## [6] 0.09131283 1.497505 7618
## [7] 0.05942162 1.494666 4948
## [8] 0.07428276 1.494068 6183
## [9] 0.05663300 1.493464 4712
## [10] 0.06744319 1.492018 5606
Here we can see that users who have worked with web development languages (HTML/CSS, PHP, TypeScript) are very likely to have also worked with JavaScript.
Scatterplot of rules’ confidence vs support
plot(apriori_rules)
We can see that most of the rules have low support, while confidence varies widely. Lift values remain, in most cases, above 1.
Graph of the first 100 rules defined by support and lift values.
plot(apriori_rules, method='graph', measure = "support", shading = "lift", engine='html')
We can clearly see that Kotlin, Go and Rust appear in few rules, which is consistent with their lower support (about 9%, 13% and 13% of respondents, respectively) compared with the most popular languages.
Matrix of rules
plot(apriori_rules, method="matrix", measure="lift")
Scatterplot of lift vs support
plot(apriori_rules, measure=c("support","lift"), shading="confidence")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
plot(apriori_rules, shading="order")
The Jaccard index is used to visualize the differences between programming languages. For two items, the Jaccard coefficient is the share of transactions containing both items among the transactions containing at least one of them. In our case we will use dissimilarity, which equals the Jaccard distance (1 - Jaccard coefficient). So, for example, a value of 0.9 means that, among the observations containing at least one of the two items, 90% do not contain both.
options(width = 100)
data_sel<-data[,itemFrequency(data)>0.05]
jaccard_idx<-dissimilarity(data_sel, which="items")
round(jaccard_idx,2)
## Assembly Bash/Shell (all shells) C C# C++ Dart Go HTML/CSS Java
## Bash/Shell (all shells) 0.90
## C 0.79 0.78
## C# 0.94 0.85 0.85
## C++ 0.84 0.79 0.55 0.81
## Dart 0.95 0.95 0.93 0.94 0.93
## Go 0.94 0.83 0.88 0.92 0.89 0.92
## HTML/CSS 0.94 0.71 0.83 0.73 0.82 0.93 0.88
## Java 0.92 0.77 0.79 0.83 0.79 0.92 0.87 0.73
## JavaScript 0.94 0.70 0.83 0.73 0.81 0.93 0.86 0.33 0.70
## Kotlin 0.94 0.91 0.91 0.92 0.91 0.88 0.91 0.92 0.81
## Lua 0.91 0.90 0.89 0.94 0.90 0.95 0.90 0.93 0.93
## PHP 0.93 0.85 0.86 0.87 0.87 0.93 0.90 0.74 0.84
## PowerShell 0.93 0.82 0.89 0.76 0.88 0.95 0.92 0.83 0.88
## Python 0.92 0.65 0.75 0.81 0.73 0.93 0.84 0.65 0.73
## Ruby 0.96 0.92 0.94 0.96 0.95 0.96 0.91 0.93 0.95
## Rust 0.90 0.84 0.82 0.91 0.84 0.94 0.83 0.89 0.88
## SQL 0.94 0.70 0.84 0.71 0.83 0.94 0.87 0.54 0.71
## TypeScript 0.95 0.76 0.88 0.76 0.86 0.93 0.85 0.56 0.77
## JavaScript Kotlin Lua PHP PowerShell Python Ruby Rust SQL
## Bash/Shell (all shells)
## C
## C#
## C++
## Dart
## Go
## HTML/CSS
## Java
## JavaScript
## Kotlin 0.91
## Lua 0.93 0.94
## PHP 0.75 0.92 0.94
## PowerShell 0.84 0.94 0.93 0.88
## Python 0.62 0.91 0.92 0.85 0.86
## Ruby 0.93 0.95 0.94 0.94 0.96 0.94
## Rust 0.88 0.91 0.86 0.93 0.93 0.84 0.94
## SQL 0.52 0.91 0.94 0.75 0.81 0.64 0.93 0.89
## TypeScript 0.50 0.90 0.93 0.84 0.85 0.74 0.93 0.85 0.67
We can see that most pairs of programming languages overlap little, with some expected exceptions such as JavaScript and HTML/CSS (0.33), which are both necessary for Front-End web development.
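The value for JavaScript and HTML/CSS can be checked by hand. This is a base R sketch using the counts from the cross table shown earlier, with illustrative variable names.

```r
# Counts taken from the cross table in this report.
n_js   <- 55711  # transactions containing JavaScript
n_html <- 46396  # transactions containing HTML/CSS
n_both <- 41117  # transactions containing both

# Transactions containing at least one of the two items.
n_union <- n_js + n_html - n_both

jaccard_coefficient <- n_both / n_union
jaccard_distance    <- 1 - jaccard_coefficient

print(round(jaccard_distance, 2))
```

Rounded to two decimals, the distance agrees with the 0.33 reported in the dissimilarity matrix.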
Relationships’ Left Hand Side (LHS) groups and Right Hand Side (RHS)
plot(apriori_rules, method="grouped")
Now we will examine association rules with the item on one side fixed before the algorithm extracts the relationships.
In this case we set the Python programming language on the LHS of the rules.
rules_python <- apriori(data = data, parameter = list(supp = 0.01, conf = 0.005, minlen = 2), appearance = list(default = "rhs", lhs = "Python"), control = list(verbose = FALSE))
First 6 rules based on confidence
inspect(head(sort(rules_python, by="confidence", decreasing=TRUE), 6))
## lhs rhs support confidence coverage lift count
## [1] {Python} => {JavaScript} 0.3137365 0.6334631 0.495272 0.9908272 27339
## [2] {Python} => {HTML/CSS} 0.2659399 0.5369572 0.495272 1.0085019 23174
## [3] {Python} => {SQL} 0.2588593 0.5226609 0.495272 1.0685469 22557
## [4] {Python} => {Bash/Shell (all shells)} 0.2128185 0.4297002 0.495272 1.3207320 18545
## [5] {Python} => {TypeScript} 0.1835667 0.3706381 0.495272 0.9487796 15996
## [6] {Python} => {Java} 0.1717581 0.3467955 0.495272 1.1294151 14967
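We can see that developers using Python are also likely to use languages such as JavaScript, HTML/CSS and SQL. The first two may suggest a connection between Front-End (JS, HTML/CSS) and Back-End (JS, Python) development, and SQL may suggest more data-related Python programming. However, the lift values for these three rules are close to 1 (0.99, 1.01 and 1.07), so the high confidence mostly reflects the overall popularity of these languages rather than a specific association with Python.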
Graph visualising the obtained relationships
plot(rules_python, method='graph', measure = "support", shading = "lift")
plot(rules_python, method="paracoord", control=list(reorder=TRUE))
Significance of the rules tested with Fisher’s exact test
options(width = 100)
is.significant(rules_python, data)
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
## [17] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE
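To make the test more concrete, the 2x2 contingency table behind a single rule, {Python} => {SQL}, can be built by hand and passed to `fisher.test()` from base R. This is a sketch only: the one-sided alternative and the 0.01 threshold are assumptions about how `is.significant()` is configured and may differ between `arules` versions.

```r
# Counts taken from the cross table in this report.
n_total  <- 87140
n_python <- 43158
n_sql    <- 42623
n_both   <- 22557

# Rows: Python yes/no, columns: SQL yes/no.
contingency <- matrix(
  c(n_both,         n_python - n_both,
    n_sql - n_both, n_total - n_python - n_sql + n_both),
  nrow = 2, byrow = TRUE,
  dimnames = list(Python = c("yes", "no"), SQL = c("yes", "no"))
)

# One-sided test for a positive association (assumed alternative).
fisher_result <- fisher.test(contingency, alternative = "greater")

print(contingency)
print(fisher_result$p.value < 0.01)
```

With this many observations, even a modest lift such as 1.07 gives a very small p-value, so significance alone says little about the strength of a rule.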
Checking whether the itemsets are maximal. An itemset is maximal if it has no proper superset among the itemsets being considered, that is, no other itemset in the collection contains all of its items plus at least one more.
options(width = 100)
is.maximal(rules_python)
## {GDScript,Python} {Lisp,Python} {Elixir,Python}
## TRUE TRUE TRUE
## {Python,Scala} {Haskell,Python} {Objective-C,Python}
## TRUE TRUE TRUE
## {Perl,Python} {Groovy,Python} {Python,VBA}
## TRUE TRUE TRUE
## {Python,R} {MATLAB,Python} {Python,Swift}
## TRUE TRUE TRUE
## {Python,Visual Basic (.Net)} {Python,Ruby} {Dart,Python}
## TRUE TRUE TRUE
## {Assembly,Python} {Lua,Python} {Kotlin,Python}
## TRUE TRUE TRUE
## {Python,Rust} {Go,Python} {PowerShell,Python}
## TRUE TRUE TRUE
## {PHP,Python} {C,Python} {C++,Python}
## TRUE TRUE TRUE
## {C#,Python} {Java,Python} {Bash/Shell (all shells),Python}
## TRUE TRUE TRUE
## {Python,TypeScript} {Python,SQL} {HTML/CSS,Python}
## TRUE TRUE TRUE
## {JavaScript,Python}
## TRUE
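The maximality check can be sketched in base R on a small collection of itemsets. The three itemsets and the helper name `is_maximal_set` are illustrative, and this is not the `arules` implementation.

```r
# Three illustrative itemsets (hypothetical collection).
itemsets <- list(
  c("C", "R"),
  c("C", "C++", "R"),
  c("C#", "R")
)

# TRUE if no other itemset in the collection is a proper superset of itemset i.
is_maximal_set <- function(i, sets) {
  others <- sets[-i]
  has_proper_superset <- vapply(
    others,
    function(other) length(other) > length(sets[[i]]) && all(sets[[i]] %in% other),
    logical(1)
  )
  !any(has_proper_superset)
}

maximal <- vapply(seq_along(itemsets), is_maximal_set, logical(1), sets = itemsets)
print(maximal)
```

Here {C, R} is not maximal because {C, C++, R} contains it, while the other two itemsets have no proper superset in the collection.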
Checking whether a rule is redundant, meaning that there is a more general rule (same RHS, fewer items on the LHS) with the same or a higher confidence value
options(width = 100)
is.redundant(rules_python)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Here we again fix the LHS of the rules, this time to the JavaScript programming language.
rules_js <- apriori(data = data, parameter = list(supp = 0.01, conf = 0.005, minlen = 2), appearance = list(default = "rhs", lhs = "JavaScript"), control = list(verbose = FALSE))
inspect(head(sort(rules_js, by="confidence", decreasing=TRUE),10))
## lhs rhs support confidence coverage lift count
## [1] {JavaScript} => {HTML/CSS} 0.4718499 0.7380410 0.6393275 1.3861731 41117
## [2] {JavaScript} => {SQL} 0.3657333 0.5720594 0.6393275 1.1695388 31870
## [3] {JavaScript} => {TypeScript} 0.3436654 0.5375420 0.6393275 1.3760291 29947
## [4] {JavaScript} => {Python} 0.3137365 0.4907289 0.6393275 0.9908272 27339
## [5] {JavaScript} => {Bash/Shell (all shells)} 0.2224122 0.3478846 0.6393275 1.0692627 19381
## [6] {JavaScript} => {Java} 0.2160891 0.3379943 0.6393275 1.1007521 18830
## [7] {JavaScript} => {C#} 0.1945949 0.3043744 0.6393275 1.0963164 16957
## [8] {JavaScript} => {PHP} 0.1640923 0.2566639 0.6393275 1.3743203 14299
## [9] {JavaScript} => {C++} 0.1363323 0.2132433 0.6393275 0.9464208 11880
## [10] {JavaScript} => {C} 0.1213335 0.1897830 0.6393275 0.9762509 10573
As expected, based on confidence, the association of JavaScript with HTML/CSS takes first place, with a high confidence value (0.74). SQL in second place suggests developers who also work with data. TypeScript in third place is also expected, as it was created as a typed superset of JavaScript.
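Comparing {JavaScript} => {Python} with the earlier {Python} => {JavaScript} rule shows that confidence depends on the direction of the rule, while lift does not. Here is a short base R check using counts from the cross table, with illustrative variable names.

```r
# Counts taken from the cross table in this report.
n_total  <- 87140
n_python <- 43158
n_js     <- 55711
n_both   <- 27339

# Confidence is directional: it is normalised by the LHS count.
conf_python_to_js <- n_both / n_python
conf_js_to_python <- n_both / n_js

# Lift divides confidence by the relative frequency of the RHS,
# which gives the same value in both directions.
lift_python_to_js <- conf_python_to_js / (n_js / n_total)
lift_js_to_python <- conf_js_to_python / (n_python / n_total)

print(round(c(conf_python_to_js, conf_js_to_python), 4))
print(round(c(lift_python_to_js, lift_js_to_python), 4))
```

This is why the same pair of languages can rank high by confidence in one direction and lower in the other.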
Graph representation of obtained JavaScript relationships
set.seed(42)
plot(rules_js, method='graph', measure = "support", shading = "lift", max=20)
## Warning: Too many rules supplied. Only plotting the best 20 using 'lift' (change control parameter
## max if needed).
plot(rules_js, method="paracoord", control=list(reorder=TRUE))
Significance
options(width = 100)
is.significant(rules_js, data)
## [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
## [17] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE TRUE
Maximal sets
options(width = 100)
is.maximal(rules_js)
## {JavaScript,Solidity} {GDScript,JavaScript}
## TRUE TRUE
## {Elixir,JavaScript} {Delphi,JavaScript}
## TRUE TRUE
## {JavaScript,Scala} {Haskell,JavaScript}
## TRUE TRUE
## {JavaScript,Objective-C} {JavaScript,Perl}
## TRUE TRUE
## {Groovy,JavaScript} {JavaScript,VBA}
## TRUE TRUE
## {JavaScript,R} {JavaScript,MATLAB}
## TRUE TRUE
## {JavaScript,Swift} {JavaScript,Visual Basic (.Net)}
## TRUE TRUE
## {JavaScript,Ruby} {Dart,JavaScript}
## TRUE TRUE
## {Assembly,JavaScript} {JavaScript,Lua}
## TRUE TRUE
## {JavaScript,Kotlin} {JavaScript,Rust}
## TRUE TRUE
## {Go,JavaScript} {JavaScript,PowerShell}
## TRUE TRUE
## {JavaScript,PHP} {C,JavaScript}
## TRUE TRUE
## {C++,JavaScript} {C#,JavaScript}
## TRUE TRUE
## {Java,JavaScript} {Bash/Shell (all shells),JavaScript}
## TRUE TRUE
## {JavaScript,TypeScript} {JavaScript,Python}
## TRUE TRUE
## {JavaScript,SQL} {HTML/CSS,JavaScript}
## TRUE TRUE
Redundant sets
options(width = 100)
is.redundant(rules_js)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Here we set the R programming language on the RHS of the rules.
rules_r <- apriori(data = data, parameter = list(supp = 0.01, conf = 0.005, minlen = 2), appearance = list(default = "lhs", rhs = "R"), control = list(verbose = FALSE))
inspect(head(sort(rules_r, by="confidence", decreasing=TRUE),10))
## lhs rhs support confidence coverage lift
## [1] {Java, Python, SQL} => {R} 0.01128070 0.10506627 0.1073675 2.473116
## [2] {C++, SQL} => {R} 0.01023640 0.09896816 0.1034313 2.329575
## [3] {Bash/Shell (all shells), Python, SQL} => {R} 0.01225614 0.09791877 0.1251664 2.304873
## [4] {HTML/CSS, Java, Python} => {R} 0.01048887 0.09679127 0.1083658 2.278334
## [5] {C, Python} => {R} 0.01269222 0.09319963 0.1361832 2.193791
## [6] {Python, SQL} => {R} 0.02407620 0.09300882 0.2588593 2.189300
## [7] {Java, JavaScript, Python} => {R} 0.01168235 0.09012838 0.1296190 2.121498
## [8] {HTML/CSS, Python, SQL} => {R} 0.01527427 0.08977472 0.1701400 2.113174
## [9] {Bash/Shell (all shells), HTML/CSS, Python} => {R} 0.01113151 0.08837464 0.1259582 2.080218
## [10] {HTML/CSS, JavaScript, Python, SQL} => {R} 0.01339224 0.08668202 0.1544985 2.040376
## count
## [1] 983
## [2] 892
## [3] 1068
## [4] 914
## [5] 1106
## [6] 2098
## [7] 1018
## [8] 1331
## [9] 970
## [10] 1167
Here we can see a wide variety of languages that can suggest also working with R. Java and C++ on the one hand, and Python, SQL and Bash on the other, are very different languages. Together, however, they can represent the skill set of a professional data scientist, so R can be seen as an important language for this field. Note, however, that the confidence values for these rules are rather low (at most about 0.11), even though the lift values are above 2.
Graph representing obtained relationships
plot(rules_r, method='graph', measure = "support", shading = "lift", max=20)
plot(rules_r, method="paracoord", control=list(reorder=TRUE))
Significance
options(width = 100)
is.significant(rules_r, data)
## [1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [17] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
## [33] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Maximal sets
options(width = 100)
is.maximal(rules_r)
## {C,R} {C++,R}
## FALSE FALSE
## {C#,R} {Java,R}
## TRUE FALSE
## {Bash/Shell (all shells),R} {R,TypeScript}
## FALSE FALSE
## {Python,R} {R,SQL}
## FALSE FALSE
## {HTML/CSS,R} {JavaScript,R}
## FALSE FALSE
## {C,C++,R} {C,Python,R}
## TRUE TRUE
## {C,JavaScript,R} {C++,Python,R}
## TRUE TRUE
## {C++,R,SQL} {C++,JavaScript,R}
## TRUE TRUE
## {Java,Python,R} {Java,R,SQL}
## FALSE FALSE
## {HTML/CSS,Java,R} {Java,JavaScript,R}
## FALSE FALSE
## {Bash/Shell (all shells),Python,R} {Bash/Shell (all shells),R,SQL}
## FALSE FALSE
## {Bash/Shell (all shells),HTML/CSS,R} {Bash/Shell (all shells),JavaScript,R}
## FALSE FALSE
## {Python,R,TypeScript} {JavaScript,R,TypeScript}
## FALSE FALSE
## {Python,R,SQL} {HTML/CSS,Python,R}
## FALSE FALSE
## {JavaScript,Python,R} {HTML/CSS,R,SQL}
## FALSE FALSE
## {JavaScript,R,SQL} {HTML/CSS,JavaScript,R}
## FALSE FALSE
## {Java,Python,R,SQL} {HTML/CSS,Java,Python,R}
## TRUE TRUE
## {Java,JavaScript,Python,R} {Java,JavaScript,R,SQL}
## TRUE TRUE
## {HTML/CSS,Java,JavaScript,R} {Bash/Shell (all shells),Python,R,SQL}
## TRUE TRUE
## {Bash/Shell (all shells),HTML/CSS,Python,R} {Bash/Shell (all shells),JavaScript,Python,R}
## TRUE TRUE
## {Bash/Shell (all shells),JavaScript,R,SQL} {Bash/Shell (all shells),HTML/CSS,JavaScript,R}
## TRUE TRUE
## {JavaScript,Python,R,TypeScript} {HTML/CSS,Python,R,SQL}
## TRUE FALSE
## {JavaScript,Python,R,SQL} {HTML/CSS,JavaScript,Python,R}
## FALSE FALSE
## {HTML/CSS,JavaScript,R,SQL} {HTML/CSS,JavaScript,Python,R,SQL}
## FALSE TRUE
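For rules, `is.maximal()` is evaluated on the itemset that generates each rule: an itemset is maximal when no proper superset of it appears in the same collection. The base-R sketch below illustrates that definition on four itemsets taken from the output above. The helper `is_maximal_set` is a hypothetical name and not part of arules.

```r
# Four of the rule-generating itemsets listed above
itemsets <- list(
  c("C", "R"),
  c("C#", "R"),
  c("C", "C++", "R"),
  c("C", "Python", "R")
)

# Hypothetical helper: TRUE when no other itemset is a proper superset
is_maximal_set <- function(i, sets) {
  current <- sets[[i]]
  has_superset <- vapply(sets, function(other) {
    length(other) > length(current) && all(current %in% other)
  }, logical(1))
  !any(has_superset)
}

maximal <- vapply(seq_along(itemsets), is_maximal_set, logical(1), sets = itemsets)
names(maximal) <- vapply(itemsets, paste, character(1), collapse = ",")
print(maximal)
```

On this subset the sketch agrees with the flags reported above: {C,R} is not maximal because {C,C++,R} contains it, while {C#,R} has no superset among the listed itemsets.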
Redundant rules
options(width = 100)
is.redundant(rules_r)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE
## [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
VBA is likewise placed on the right-hand side (RHS) of the rules.
rules_vba <- apriori(data = data, parameter = list(supp = 0.01, conf = 0.005, minlen = 2), appearance = list(default = "lhs", rhs = "VBA"), control = list(verbose = FALSE))
First 10 rules by confidence
inspect(head(sort(rules_vba, by="confidence", decreasing=TRUE),10))
## lhs rhs support confidence coverage lift count
## [1] {SQL, Visual Basic (.Net)} => {VBA} 0.01101675 0.34371643 0.03205187 9.639990 960
## [2] {JavaScript, Visual Basic (.Net)} => {VBA} 0.01028230 0.32688800 0.03145513 9.168014 896
## [3] {Visual Basic (.Net)} => {VBA} 0.01296764 0.31670404 0.04094560 8.882391 1130
## [4] {PowerShell, SQL} => {VBA} 0.01156759 0.11719567 0.09870324 3.286911 1008
## [5] {HTML/CSS, PowerShell} => {VBA} 0.01021345 0.10583898 0.09649989 2.968397 890
## [6] {JavaScript, PowerShell} => {VBA} 0.01079871 0.10270683 0.10514115 2.880551 941
## [7] {PowerShell} => {VBA} 0.01336929 0.09788271 0.13658481 2.745252 1165
## [8] {C#, HTML/CSS, JavaScript, SQL} => {VBA} 0.01086757 0.09734786 0.11163645 2.730252 947
## [9] {C#, HTML/CSS, SQL} => {VBA} 0.01183154 0.09610365 0.12311223 2.695356 1031
## [10] {C#, JavaScript, SQL} => {VBA} 0.01253156 0.09208196 0.13609135 2.582563 1092
VBA is most commonly associated with Microsoft Excel, which helps explain some of the rules obtained. SQL, as a database query language, complements VBA in data-related tasks. Visual Basic (.Net), another Microsoft language, and PowerShell indicate that developers who work with data and/or in the Windows environment are more likely to also know VBA.
Graph of the relationships
plot(rules_vba, method='graph', measure = "support", shading = "lift")
plot(rules_vba, method="paracoord", control=list(reorder=TRUE))
Significance
options(width = 100)
is.significant(rules_vba, data)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [17] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
## [33] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [49] TRUE
Maximal sets
options(width = 100)
is.maximal(rules_vba)
## {VBA,Visual Basic (.Net)} {PowerShell,VBA}
## FALSE FALSE
## {PHP,VBA} {C,VBA}
## FALSE TRUE
## {C++,VBA} {C#,VBA}
## TRUE FALSE
## {Java,VBA} {Bash/Shell (all shells),VBA}
## FALSE FALSE
## {TypeScript,VBA} {Python,VBA}
## FALSE FALSE
## {SQL,VBA} {HTML/CSS,VBA}
## FALSE FALSE
## {JavaScript,VBA} {SQL,VBA,Visual Basic (.Net)}
## FALSE TRUE
## {JavaScript,VBA,Visual Basic (.Net)} {PowerShell,SQL,VBA}
## TRUE TRUE
## {HTML/CSS,PowerShell,VBA} {JavaScript,PowerShell,VBA}
## TRUE TRUE
## {PHP,SQL,VBA} {HTML/CSS,PHP,VBA}
## FALSE FALSE
## {JavaScript,PHP,VBA} {C#,SQL,VBA}
## FALSE FALSE
## {C#,HTML/CSS,VBA} {C#,JavaScript,VBA}
## FALSE FALSE
## {Java,SQL,VBA} {HTML/CSS,Java,VBA}
## FALSE TRUE
## {Java,JavaScript,VBA} {Bash/Shell (all shells),SQL,VBA}
## FALSE TRUE
## {Bash/Shell (all shells),HTML/CSS,VBA} {Bash/Shell (all shells),JavaScript,VBA}
## TRUE TRUE
## {JavaScript,TypeScript,VBA} {Python,SQL,VBA}
## TRUE FALSE
## {HTML/CSS,Python,VBA} {JavaScript,Python,VBA}
## FALSE FALSE
## {HTML/CSS,SQL,VBA} {JavaScript,SQL,VBA}
## FALSE FALSE
## {HTML/CSS,JavaScript,VBA} {JavaScript,PHP,SQL,VBA}
## FALSE TRUE
## {HTML/CSS,JavaScript,PHP,VBA} {C#,HTML/CSS,SQL,VBA}
## TRUE FALSE
## {C#,JavaScript,SQL,VBA} {C#,HTML/CSS,JavaScript,VBA}
## FALSE FALSE
## {Java,JavaScript,SQL,VBA} {HTML/CSS,Python,SQL,VBA}
## TRUE FALSE
## {JavaScript,Python,SQL,VBA} {HTML/CSS,JavaScript,Python,VBA}
## FALSE FALSE
## {HTML/CSS,JavaScript,SQL,VBA} {C#,HTML/CSS,JavaScript,SQL,VBA}
## FALSE TRUE
## {HTML/CSS,JavaScript,Python,SQL,VBA}
## TRUE
Redundant rules
options(width = 100)
is.redundant(rules_vba)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [33] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE
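The Equivalence Class Clustering and bottom-up Lattice Traversal (ECLAT) algorithm can also be used to mine frequent itemsets. Whereas Apriori works on a horizontal data layout (each transaction lists its items) and explores candidates breadth-first, ECLAT uses a vertical layout (each item lists the transactions that contain it) and traverses the itemset lattice depth-first, much like a depth-first search of a graph. It can be run with a top-down, bottom-up or hybrid search strategy and is typically fast, because supports are obtained by intersecting transaction-ID lists.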
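To make the vertical layout concrete, here is a minimal base-R sketch on five made-up transactions (the data and all names are illustrative, not taken from the survey). Each item is mapped to the IDs of the transactions that contain it, and the support of an itemset is obtained by intersecting those ID lists. It shows only the core idea, not the implementation used by `eclat()` in arules.

```r
# Toy transactions (hypothetical data)
transactions <- list(
  t1 = c("JavaScript", "HTML/CSS", "SQL"),
  t2 = c("JavaScript", "HTML/CSS"),
  t3 = c("Python", "SQL"),
  t4 = c("JavaScript", "Python", "SQL"),
  t5 = c("JavaScript", "HTML/CSS", "Python")
)

# Vertical layout: for every item, the IDs of the transactions containing it
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(items, function(item) {
  which(vapply(transactions, function(tx) item %in% tx, logical(1)))
})
names(tid_lists) <- items

# Support of an itemset = size of the intersection of its tid-lists / n
itemset_support <- function(itemset) {
  tids <- Reduce(intersect, tid_lists[itemset])
  length(tids) / length(transactions)
}

print(itemset_support(c("JavaScript", "HTML/CSS")))
print(itemset_support(c("JavaScript", "Python", "SQL")))
```

Because the ID list of a longer itemset is computed from the lists of its prefixes, a depth-first traversal can extend an itemset without rescanning the transactions.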
eclat_items <- eclat(data, parameter=list(supp=0.05, maxlen=15))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.05 1 15 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 4357
##
## create itemset ...
## set transactions ...[51 item(s), 87140 transaction(s)] done [0.02s].
## sorting and recoding items ... [19 item(s)] done [0.00s].
## creating bit matrix ... [19 row(s), 87140 column(s)] done [0.00s].
## writing ... [318 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(eclat_items[1:10])
## items support count
## [1] {JavaScript, Kotlin} 0.06016755 5243
## [2] {Java, Kotlin} 0.06372504 5553
## [3] {JavaScript, Rust, TypeScript} 0.05859536 5106
## [4] {JavaScript, Python, Rust} 0.05781501 5038
## [5] {HTML/CSS, JavaScript, Rust} 0.06021345 5247
## [6] {JavaScript, Rust} 0.08403718 7323
## [7] {HTML/CSS, Rust} 0.06843011 5963
## [8] {Rust, SQL} 0.05984622 5215
## [9] {Python, Rust} 0.08663071 7549
## [10] {Rust, TypeScript} 0.06824650 5947
With a minimum confidence of 50%, rule induction on these itemsets yields a set of 574 rules.
eclat_rules<-ruleInduction(eclat_items, data, confidence=0.5)
eclat_rules
## set of 574 rules
inspect(eclat_rules[1:10])
## lhs rhs support confidence lift itemset
## [1] {Kotlin} => {JavaScript} 0.06016755 0.6607435 1.0334977 1
## [2] {Kotlin} => {Java} 0.06372504 0.6998110 2.2790869 2
## [3] {Rust, TypeScript} => {JavaScript} 0.05859536 0.8585842 1.3429489 3
## [4] {JavaScript, Rust} => {TypeScript} 0.05859536 0.6972552 1.7848718 3
## [5] {Python, Rust} => {JavaScript} 0.05781501 0.6673732 1.0438674 4
## [6] {JavaScript, Rust} => {Python} 0.05781501 0.6879694 1.3890740 4
## [7] {JavaScript, Rust} => {HTML/CSS} 0.06021345 0.7165096 1.3457334 5
## [8] {HTML/CSS, Rust} => {JavaScript} 0.06021345 0.8799262 1.3763309 5
## [9] {Rust} => {JavaScript} 0.08403718 0.6408506 1.0023823 6
## [10] {Rust} => {HTML/CSS} 0.06843011 0.5218343 0.9800982 7
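Rule induction turns each frequent itemset into candidate rules and keeps those whose confidence reaches the threshold. The sketch below does this in base R for a single two-item itemset, using made-up support values. `induce_rules` and the numbers are hypothetical and only illustrate the confidence filter, not the internals of `ruleInduction()`.

```r
# Hypothetical supports of an itemset and its subsets
support <- c(
  "A"   = 0.30,
  "B"   = 0.08,
  "A,B" = 0.06
)

# Hypothetical helper: build one-item-consequent rules from a 2-itemset
induce_rules <- function(x, y, support, min_confidence) {
  joint <- support[[paste(x, y, sep = ",")]]
  rules <- data.frame(
    lhs = c(x, y),
    rhs = c(y, x),
    support = joint,
    confidence = c(joint / support[[x]], joint / support[[y]])
  )
  # lift = confidence / support(RHS)
  rules$lift <- rules$confidence / c(support[[y]], support[[x]])
  rules[rules$confidence >= min_confidence, ]
}

kept <- induce_rules("A", "B", support, min_confidence = 0.5)
print(kept)
```

The same asymmetry explains why a frequent itemset does not necessarily yield a rule in both directions: the rule starting from the rarer item reaches the threshold more easily.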
First 10 rules by confidence
inspect(head(sort(eclat_rules, by="confidence", decreasing=TRUE),10))
## lhs rhs support confidence
## [1] {HTML/CSS, PHP, SQL, TypeScript} => {JavaScript} 0.05277714 0.9780944
## [2] {HTML/CSS, PHP, TypeScript} => {JavaScript} 0.06744319 0.9736581
## [3] {PHP, SQL, TypeScript} => {JavaScript} 0.05958228 0.9652352
## [4] {HTML/CSS, Java, Python, TypeScript} => {JavaScript} 0.05632316 0.9600939
## [5] {PHP, TypeScript} => {JavaScript} 0.07835667 0.9587195
## [6] {HTML/CSS, Python, SQL, TypeScript} => {JavaScript} 0.08742254 0.9573960
## [7] {HTML/CSS, PHP, Python, SQL} => {JavaScript} 0.05678219 0.9555813
## [8] {Bash/Shell (all shells), HTML/CSS, SQL, TypeScript} => {JavaScript} 0.07095479 0.9551985
## [9] {HTML/CSS, PowerShell, TypeScript} => {JavaScript} 0.05407390 0.9548126
## [10] {HTML/CSS, Java, SQL, TypeScript} => {JavaScript} 0.06433326 0.9538880
## lift itemset
## [1] 1.529880 59
## [2] 1.522941 62
## [3] 1.509766 60
## [4] 1.501725 216
## [5] 1.499575 63
## [6] 1.497505 274
## [7] 1.494666 66
## [8] 1.494068 250
## [9] 1.493464 32
## [10] 1.492018 220
Using the ECLAT algorithm we obtain results similar to those of Apriori: in the top 10 rules ordered by confidence, JavaScript and other web development languages again dominate.
First 10 rules by support
inspect(head(sort(eclat_rules, by="support", decreasing=TRUE),10))
## lhs rhs support confidence lift itemset
## [1] {JavaScript} => {HTML/CSS} 0.4718499 0.7380410 1.3861731 299
## [2] {HTML/CSS} => {JavaScript} 0.4718499 0.8862186 1.3861731 299
## [3] {SQL} => {JavaScript} 0.3657333 0.7477184 1.1695388 297
## [4] {JavaScript} => {SQL} 0.3657333 0.5720594 1.1695388 297
## [5] {TypeScript} => {JavaScript} 0.3436654 0.8797333 1.3760291 285
## [6] {JavaScript} => {TypeScript} 0.3436654 0.5375420 1.3760291 285
## [7] {SQL} => {HTML/CSS} 0.3212302 0.6567346 1.2334653 298
## [8] {HTML/CSS} => {SQL} 0.3212302 0.6033279 1.2334653 298
## [9] {Python} => {JavaScript} 0.3137365 0.6334631 0.9908272 293
## [10] {JavaScript, SQL} => {HTML/CSS} 0.2907276 0.7949168 1.4929963 296
For the top associations by support, most rules again describe web development relationships.
First 10 rules by lift
inspect(head(sort(eclat_rules, by="lift", decreasing=TRUE),10))
## lhs rhs support confidence lift itemset
## [1] {Bash/Shell (all shells), C++, Python} => {C} 0.05137709 0.6924981 3.562236 85
## [2] {Bash/Shell (all shells), C++} => {C} 0.06365619 0.6790305 3.492958 95
## [3] {C++, Java} => {C} 0.06235942 0.6659314 3.425576 96
## [4] {C++, HTML/CSS, JavaScript, Python} => {C} 0.05027542 0.6618825 3.404748 86
## [5] {C++, HTML/CSS, Python} => {C} 0.05612807 0.6557179 3.373038 88
## [6] {C++, JavaScript, Python} => {C} 0.06296764 0.6478158 3.332389 87
## [7] {C++, JavaScript, SQL} => {C} 0.05220335 0.6470839 3.328624 89
## [8] {C++, SQL} => {C} 0.06484967 0.6269832 3.225226 93
## [9] {C++, HTML/CSS, JavaScript} => {C} 0.06489557 0.6237591 3.208640 90
## [10] {C++, HTML/CSS} => {C} 0.07298600 0.6173559 3.175702 92
Ordering by lift brings out rules among general-purpose compiled languages (C, C++, Java), together with Bash/Shell, which is commonly used to script the building of executables from source files in those languages. All ten rules have C on the right-hand side and C++ on the left-hand side.
Scatterplot of rules’ confidence vs support for ECLAT Algorithm
plot(eclat_rules)
Significance
options(width = 100)
is.significant(eclat_rules, data)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
## [121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
## [166] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [196] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [211] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [226] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [241] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [256] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
## [271] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE
## [286] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [301] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [316] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [331] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [346] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [361] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [376] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [391] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [406] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [421] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [436] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [451] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [466] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [481] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [496] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [511] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [526] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [541] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [556] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [571] TRUE TRUE TRUE TRUE
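As a rough illustration of what such a test measures, the sketch below computes a one-sided Fisher exact p-value with base R's `phyper()` for two of the rules printed earlier. The item counts are back-calculated from the rounded support, confidence and lift values shown above, so they are approximations, and the 0.01 threshold is an assumption about the default used by `is.significant()`. The result is not guaranteed to match arules exactly.

```r
n <- 87140  # number of transactions

# Hypothetical helper: P(X >= n_xy) under independence (hypergeometric tail),
# i.e. the one-sided Fisher exact p-value for a 2x2 table
fisher_p <- function(n_xy, n_x, n_y, n) {
  phyper(n_xy - 1, n_y, n - n_y, n_x, lower.tail = FALSE)
}

# {Rust} => {JavaScript}: count 7323, confidence 0.6408506, lift 1.0023823
n_rust <- round(7323 / 0.6408506)           # transactions containing Rust
n_js   <- round(0.6408506 / 1.0023823 * n)  # transactions containing JavaScript
p_rust_js <- fisher_p(7323, n_rust, n_js, n)

# {Kotlin} => {Java}: count 5553, confidence 0.6998110, lift 2.2790869
n_kotlin <- round(5553 / 0.6998110)
n_java   <- round(0.6998110 / 2.2790869 * n)
p_kotlin_java <- fisher_p(5553, n_kotlin, n_java, n)

print(c(rust_js = p_rust_js, kotlin_java = p_kotlin_java))
```

With these approximate counts the first p-value comes out well above 0.01 and the second far below it, which agrees with rule [9] being flagged FALSE and rule [2] TRUE in the output above.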
Redundant rules
options(width = 100)
is.redundant(eclat_rules)
## [1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [16] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [31] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [46] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [76] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [106] FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [136] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [151] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [166] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE
## [196] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [211] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [226] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
## [256] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [271] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [286] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [301] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [316] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [331] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [346] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [361] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [376] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [391] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [406] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [421] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [436] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
## [451] TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [466] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
## [481] TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [496] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [511] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [526] FALSE TRUE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [541] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [556] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [571] FALSE FALSE FALSE FALSE
Redundant relationships:
options(width = 100)
redundant <- eclat_rules[is.redundant(eclat_rules)]
inspect(redundant)
## lhs rhs support confidence lift itemset
## [1] {Rust,
## TypeScript} => {JavaScript} 0.05859536 0.8585842 1.342949 3
## [2] {JavaScript,
## Rust} => {HTML/CSS} 0.06021345 0.7165096 1.345733 5
## [3] {HTML/CSS,
## Rust} => {JavaScript} 0.06021345 0.8799262 1.376331 5
## [4] {Go,
## JavaScript} => {HTML/CSS} 0.06445949 0.6984581 1.311829 16
## [5] {HTML/CSS,
## JavaScript,
## PowerShell} => {C#} 0.05341978 0.6073059 2.187436 25
## [6] {C#,
## JavaScript,
## PowerShell} => {HTML/CSS} 0.05341978 0.8361775 1.570491 25
## [7] {PowerShell,
## SQL} => {Python} 0.05601331 0.5674922 1.145819 39
## [8] {PowerShell,
## Python} => {SQL} 0.05601331 0.7140140 1.459756 39
## [9] {HTML/CSS,
## JavaScript,
## PHP,
## SQL} => {Python} 0.05678219 0.5310722 1.072284 66
## [10] {JavaScript,
## PHP,
## SQL} => {Python} 0.06443654 0.5256998 1.061437 67
## [11] {HTML/CSS,
## PHP,
## SQL} => {Python} 0.05942162 0.5278826 1.065844 68
## [12] {PHP,
## SQL} => {Python} 0.07028919 0.5199932 1.049914 72
## [13] {Bash/Shell (all shells),
## C,
## HTML/CSS} => {JavaScript} 0.05105577 0.8898000 1.391775 104
## [14] {Bash/Shell (all shells),
## C} => {JavaScript} 0.06273812 0.6717041 1.050642 105
## [15] {Bash/Shell (all shells),
## C} => {SQL} 0.05033280 0.5388868 1.101720 107
## [16] {C,
## Python} => {SQL} 0.07085150 0.5202663 1.063651 115
## [17] {C,
## HTML/CSS} => {JavaScript} 0.09439982 0.8782832 1.373761 119
## [18] {C#,
## C++} => {Python} 0.05130824 0.6378941 1.287967 131
## [19] {Bash/Shell (all shells),
## C++,
## Python} => {JavaScript} 0.05005738 0.6747100 1.055343 138
## [20] {Bash/Shell (all shells),
## C++} => {JavaScript} 0.06203810 0.6617701 1.035103 139
## [21] {Bash/Shell (all shells),
## C++} => {HTML/CSS} 0.05610512 0.5984821 1.124057 140
## [22] {C++,
## HTML/CSS,
## JavaScript} => {TypeScript} 0.05335093 0.5127951 1.312681 142
## [23] {C++,
## JavaScript,
## Python} => {SQL} 0.05911177 0.6081464 1.243316 146
## [24] {C++,
## HTML/CSS,
## Python} => {SQL} 0.05400505 0.6309157 1.289867 147
## [25] {C++,
## HTML/CSS} => {JavaScript} 0.10403948 0.8800233 1.376483 155
## [26] {C#,
## HTML/CSS,
## SQL,
## TypeScript} => {JavaScript} 0.06699564 0.9422208 1.473769 175
## [27] {C#,
## SQL,
## TypeScript} => {JavaScript} 0.08006656 0.9132199 1.428407 176
## [28] {C#,
## HTML/CSS,
## TypeScript} => {JavaScript} 0.09055543 0.9272620 1.450371 178
## [29] {C#,
## HTML/CSS,
## Python} => {SQL} 0.05849208 0.7230813 1.478293 185
## [30] {Bash/Shell (all shells),
## Java,
## JavaScript,
## SQL} => {HTML/CSS} 0.05300666 0.8202806 1.540634 206
## [31] {Bash/Shell (all shells),
## Java,
## JavaScript} => {HTML/CSS} 0.07074822 0.7802810 1.465507 209
## [32] {HTML/CSS,
## Java,
## JavaScript,
## TypeScript} => {Python} 0.05632316 0.6245069 1.260937 216
## [33] {HTML/CSS,
## Java,
## JavaScript,
## Python} => {TypeScript} 0.05632316 0.5681213 1.454308 216
## [34] {Java,
## JavaScript,
## SQL,
## TypeScript} => {HTML/CSS} 0.06433326 0.8114054 1.523965 220
## [35] {Java,
## JavaScript,
## SQL} => {HTML/CSS} 0.11131513 0.7892596 1.482371 235
## [36] {Java,
## JavaScript} => {HTML/CSS} 0.15865274 0.7342007 1.378961 238
## [37] {Bash/Shell (all shells),
## JavaScript,
## SQL,
## TypeScript} => {Python} 0.05584118 0.6619508 1.336540 245
## [38] {Bash/Shell (all shells),
## JavaScript,
## Python,
## SQL} => {TypeScript} 0.05584118 0.5778411 1.479189 245
## [39] {Bash/Shell (all shells),
## HTML/CSS,
## JavaScript,
## TypeScript} => {Python} 0.06457425 0.6341711 1.280450 246
## [40] {Bash/Shell (all shells),
## HTML/CSS,
## JavaScript,
## Python} => {TypeScript} 0.06457425 0.5770690 1.477213 246
## [41] {Bash/Shell (all shells),
## JavaScript,
## TypeScript} => {Python} 0.07925178 0.6277041 1.267393 247
## [42] {Bash/Shell (all shells),
## JavaScript,
## Python} => {TypeScript} 0.07925178 0.5555913 1.422233 247
## [43] {Bash/Shell (all shells),
## HTML/CSS,
## TypeScript} => {Python} 0.06800551 0.6306939 1.273429 248
## [44] {Bash/Shell (all shells),
## HTML/CSS,
## Python} => {TypeScript} 0.06800551 0.5399052 1.382079 248
## [45] {Bash/Shell (all shells),
## SQL,
## TypeScript} => {Python} 0.05978885 0.6590765 1.330737 249
## [46] {Bash/Shell (all shells),
## TypeScript} => {Python} 0.08689465 0.6231586 1.258215 257
## [47] {Bash/Shell (all shells),
## JavaScript,
## Python,
## SQL} => {HTML/CSS} 0.07957310 0.8234176 1.546526 258
## [48] {Bash/Shell (all shells),
## HTML/CSS,
## JavaScript,
## SQL} => {Python} 0.07957310 0.6622732 1.337191 258
## [49] {Bash/Shell (all shells),
## Python,
## SQL} => {JavaScript} 0.09663759 0.7720730 1.207633 259
## [50] {Bash/Shell (all shells),
## JavaScript,
## SQL} => {Python} 0.09663759 0.6623407 1.337327 259
## [51] {Bash/Shell (all shells),
## Python,
## SQL} => {HTML/CSS} 0.08703236 0.6953333 1.305960 260
## [52] {Bash/Shell (all shells),
## HTML/CSS,
## SQL} => {Python} 0.08703236 0.6616069 1.335846 260
## [53] {Bash/Shell (all shells),
## HTML/CSS,
## Python} => {JavaScript} 0.11190039 0.8883929 1.389574 261
## [54] {Bash/Shell (all shells),
## HTML/CSS,
## JavaScript} => {Python} 0.11190039 0.6429513 1.298178 261
## [55] {Bash/Shell (all shells),
## Python} => {JavaScript} 0.14264402 0.6702615 1.048385 262
## [56] {Bash/Shell (all shells),
## JavaScript} => {Python} 0.14264402 0.6413498 1.294945 262
## [57] {Bash/Shell (all shells),
## Python} => {HTML/CSS} 0.12595823 0.5918576 1.111615 263
## [58] {Bash/Shell (all shells),
## HTML/CSS} => {Python} 0.12595823 0.6454952 1.303315 263
## [59] {HTML/CSS,
## JavaScript,
## Python} => {TypeScript} 0.12836814 0.5454723 1.396330 277
## [60] {JavaScript,
## Python} => {TypeScript} 0.16470048 0.5249643 1.343832 278
## [61] {HTML/CSS,
## Python} => {TypeScript} 0.13547165 0.5094071 1.304008 279
## [62] {SQL,
## TypeScript} => {Python} 0.11483819 0.5289392 1.067977 280
## [63] {Python,
## SQL} => {JavaScript} 0.19318338 0.7462872 1.167300 290
## [64] {JavaScript,
## SQL} => {Python} 0.19318338 0.5282083 1.066502 290
## [65] {HTML/CSS,
## Python} => {JavaScript} 0.23533395 0.8849141 1.384133 292
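A rule is usually called redundant when a more general rule, with the same right-hand side and a subset of the left-hand side, has at least the same confidence. The base-R sketch below checks this for the first redundant rule against the more general rule {TypeScript} => {JavaScript} from the support-ordered list above. The helper name `is_redundant_rule` is hypothetical, and the criterion is an assumption about the default behaviour of `is.redundant()`.

```r
# Confidence values copied from the outputs above
specific <- list(lhs = c("Rust", "TypeScript"), rhs = "JavaScript",
                 confidence = 0.8585842)
general  <- list(lhs = "TypeScript", rhs = "JavaScript",
                 confidence = 0.8797333)

# Hypothetical helper: TRUE when `general` makes `specific` redundant
is_redundant_rule <- function(specific, general) {
  same_rhs   <- identical(specific$rhs, general$rhs)
  proper_sub <- all(general$lhs %in% specific$lhs) &&
    length(general$lhs) < length(specific$lhs)
  same_rhs && proper_sub && general$confidence >= specific$confidence
}

print(is_redundant_rule(specific, general))
```

Adding Rust to the left-hand side lowers the confidence from about 0.88 to about 0.86, so the longer rule carries no extra information about JavaScript.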
Insignificant relationships:
options(width = 100)
insignificant <- eclat_rules[!is.significant(eclat_rules, data)]
inspect(insignificant)
According to the is.significant() output above, 11 of the 574 rules are flagged as not significant, among them rules [9] {Rust} => {JavaScript} and [10] {Rust} => {HTML/CSS}, whose lift is close to 1.
As we can see, association rules capture the associations between programming languages used by developers reasonably well. Both the Apriori and ECLAT algorithms identified relationships between languages. Web development relationships were the most visible, which can be attributed to the popularity of JavaScript, but some data-related and compiled-language (C, C++, Java) relationships also appeared. The rules obtained are a good source of information about which programming languages are known together and the groups they form, depending on the main area of work of different developers.
However, we need to remember that the data was collected through a public survey, so some observations may not be representative. In addition, the survey did not record the level of knowledge or experience for each language, so respondents may have selected languages that they did not in fact know well.
Nevertheless, we can conclude that association rules can be used effectively to analyse the relationships between programming languages in different sectors of software development.