The purpose of this case study is to explore common passwords and look for trends in those passwords. Specifically, I look for both structural and sentiment trends in the passwords. I have two research questions for this case study:
Research Questions: What are common patterns in the passwords? What sentiments can be seen in common passwords?
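My data set contains 10,000 of the most common passwords. I found this data set on Kaggle, and the data was collected by SecLists (Bansal, 2022). The data was collected as part of an effort to test security measures. The key variables are the password itself, its length, and its counts of letters (num_chars), digits (num_digits), uppercase letters (num_upper), lowercase letters (num_lower), special characters (num_special), vowels (num_vowels), and syllables (num_syllables).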
My target audience could include anyone interested in cyber-security or password strength. Another possible target audience could be people who are looking to do similar analyses in the future.
My first step in this case study is to load the appropriate libraries for my analysis. After loading these libraries, I can move on to the data wrangling step.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidytext)
library(dplyr)
library(readr)
library(tidyr)
library(textdata)
library(ggplot2)
library(igraph)
##
## Attaching package: 'igraph'
##
## The following objects are masked from 'package:lubridate':
##
## %--%, union
##
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
##
## The following objects are masked from 'package:purrr':
##
## compose, simplify
##
## The following object is masked from 'package:tidyr':
##
## crossing
##
## The following object is masked from 'package:tibble':
##
## as_data_frame
##
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
##
## The following object is masked from 'package:base':
##
## union
library(ggraph)
library(vroom)
library(knitr)
I began by loading the data set that I downloaded from Kaggle.
common_passwords <- read.csv("common_passwords.csv")
The data wrangling required for this particular data set and these research questions is not extensive. Since I am working with passwords instead of sentences or paragraphs, there is no need to tokenize, stem, or remove stop words from my data set. These typical text preprocessing steps would be detrimental to my analysis because the structure of the passwords is central to it. Additionally, these steps would be difficult because many passwords are not real words and instead use keyboard patterns, compound words, and mixtures of letters and other symbols.
I did not end up creating any additional features since the original data set gave me the information I needed for my analysis. However, I removed the “num_syllables” feature since it was not a reliable feature and the values were confusing at times. Also, I simply did not need that feature for my analysis.
common_passwords <- common_passwords |>
  select(!num_syllables)
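The structural features that ship with the data set can be spot-checked by recomputing them from the raw strings. The following is a minimal sketch in base R, not the method used to build the original data set: `count_matches()` is a hypothetical helper, the toy passwords are invented for illustration, and treating every non-alphanumeric character as "special" is an assumption.

```r
# Hypothetical helper: count regex matches in each element of a character vector.
# perl = TRUE makes ranges such as [A-Z] independent of the locale's collation.
count_matches <- function(x, pattern) {
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

# Toy passwords for illustration only; they are not rows taken from the data set.
toy <- data.frame(
  password = c("qwerty", "abc123", "Pass!1"),
  stringsAsFactors = FALSE
)

toy$length      <- nchar(toy$password)
toy$num_chars   <- count_matches(toy$password, "[A-Za-z]")     # letters
toy$num_digits  <- count_matches(toy$password, "[0-9]")
toy$num_upper   <- count_matches(toy$password, "[A-Z]")
toy$num_lower   <- count_matches(toy$password, "[a-z]")
toy$num_special <- count_matches(toy$password, "[^A-Za-z0-9]") # assumed definition
toy$num_vowels  <- count_matches(toy$password, "[aeiouAEIOU]") # "y" is not counted

toy
```

Comparing columns recomputed this way against the supplied ones is a quick way to confirm what each feature measures, for example that `num_chars` counts letters rather than all characters.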
For my analysis, I first looked at averages and counts to find trends in password composition. Next, I performed a sentiment analysis using both the Bing and AFINN sentiment dictionaries.
First, I took a look at the average values for each of the password features:
On average, the passwords are about 6 to 7 characters long, consist mostly of lowercase letters, and contain 1 to 2 vowels and 1 to 2 digits. Uppercase letters and special characters are rare.
mean(common_passwords$length)
## [1] 6.6513
mean(common_passwords$num_chars)
## [1] 5.0303
mean(common_passwords$num_digits)
## [1] 1.6176
mean(common_passwords$num_upper)
## [1] 0.0253
mean(common_passwords$num_lower)
## [1] 5.005
mean(common_passwords$num_special)
## [1] 0.0034
mean(common_passwords$num_vowels)
## [1] 1.8059
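The per-column means can also be computed in a single call instead of one `mean()` per feature. This is only a sketch under stated assumptions: `toy` is a made-up stand-in for `common_passwords`, and its values are not taken from the data set.

```r
# Toy stand-in for common_passwords; the values are invented for illustration.
toy <- data.frame(
  password   = c("qwerty", "abc123", "111111"),
  length     = c(6, 6, 6),
  num_digits = c(0, 3, 6),
  num_vowels = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns, then average each one in a single call.
numeric_cols  <- vapply(toy, is.numeric, logical(1))
feature_means <- colMeans(toy[, numeric_cols, drop = FALSE])

round(feature_means, 4)
```

The result is a named numeric vector, so every average sits in one object that is easy to print as a table or pass to a plot.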
Second, I took a look at the actual counts of passwords with different compositions. I broke my data into groups based on the presence or absence of numbers, letters, and special characters, which gives seven groups, one for each possible combination of these three character types.
The three most common styles of passwords in the data set are letters only, numbers only, and letters combined with numbers. This suggests that including special characters, alone or with letters and/or numbers, creates a safer password. The least common combination in the data set includes letters, numbers, and special characters together, which supports the common advice for creating strong passwords.
# Numbers only
passwords_num <- common_passwords |>
  filter(num_digits > 0) |>
  filter(num_special == 0) |>
  filter(num_chars == 0)
# Numbers and letters
passwords_num_char <- common_passwords |>
  filter(num_digits > 0) |>
  filter(num_special == 0) |>
  filter(num_chars > 0)
# Letters only
passwords_char <- common_passwords |>
  filter(num_digits == 0) |>
  filter(num_special == 0)
# Special characters only
passwords_spec <- common_passwords |>
  filter(num_special > 0) |>
  filter(num_digits == 0) |>
  filter(num_chars == 0)
# Special characters and letters
passwords_spec_char <- common_passwords |>
  filter(num_special > 0) |>
  filter(num_digits == 0) |>
  filter(num_chars > 0)
# Special characters and numbers
passwords_spec_num <- common_passwords |>
  filter(num_special > 0) |>
  filter(num_digits > 0) |>
  filter(num_chars == 0)
# Letters, numbers, and special characters
passwords_num_char_spec <- common_passwords |>
  filter(num_special > 0) |>
  filter(num_digits > 0) |>
  filter(num_chars > 0)
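The same grouping can be expressed as a single label per password, which makes the group sizes available from one table. The sketch below uses base R so that it runs without extra packages. `label_composition()` is a hypothetical helper and the toy rows are invented, so this is not the code behind the counts reported here.

```r
# Toy rows for illustration only; they are not taken from the data set.
toy <- data.frame(
  password    = c("qwerty", "123456", "abc123", "pass!", "p@ss1"),
  num_chars   = c(6, 0, 3, 4, 3),
  num_digits  = c(0, 6, 3, 0, 1),
  num_special = c(0, 0, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a label such as "letters + digits" from the
# presence or absence of each character type.
label_composition <- function(num_chars, num_digits, num_special) {
  parts <- cbind(
    letters = num_chars > 0,
    digits  = num_digits > 0,
    special = num_special > 0
  )
  apply(parts, 1, function(row) paste(colnames(parts)[row], collapse = " + "))
}

toy$composition <- label_composition(toy$num_chars, toy$num_digits, toy$num_special)

# One table holds the size of every group that occurs in the data.
composition_counts <- table(toy$composition)
composition_counts
```

A tidyverse version of the same idea would pair `dplyr::case_when()` with `count()`. Either way, every row receives exactly one label, so the group sizes always sum to the number of passwords.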
Third, I conducted a sentiment analysis of the passwords. Sentiment analysis only works with real words, so this part of the analysis ends up excluding many of the passwords in the data set. My sentiment analyses only record the sentiment of a small fraction of the 7177 passwords that contain only letters; however, it is important to keep in mind that some of the letters-only instances are keyboard patterns and sequences (e.g., qwerty, abcdef). Instances following pattern and sequence conventions would not convey sentiment at all.
Additionally, some passwords are compound words (ex. wrongpassword) and therefore cannot be accurately represented in this part of the analysis. While the passwords made of compound words can convey sentiments, there was not a timely way to separate out all of these words. This same idea is true for the passwords that are made of both words and other symbols.
The Bing sentiment analysis shows the sentiment values for 402 instances in my data set out of the 7177 instances that contain only letters. First, I retrieved the Bing sentiment dictionary (Mohammad & Turney, 2013) and then I used an inner join to get the sentiments of the words appearing in that dictionary.
bing <- get_sentiments("bing")
password_sentiment <- passwords_char |>
  rename(word = password)
password_sentiment_bing <- inner_join(password_sentiment, bing, by = "word")
password_sentiment_bing
## word length num_chars num_digits num_upper num_lower num_special
## 1 master 6 6 0 0 6 0
## 2 killer 6 6 0 0 6 0
## 3 easy 4 4 0 0 4 0
## 4 freedom 7 7 0 0 7 0
## 5 fuck 4 4 0 0 4 0
## 6 love 4 4 0 0 4 0
## 7 sexy 4 4 0 0 4 0
## 8 welcome 7 7 0 0 7 0
## 9 monster 7 7 0 0 7 0
## 10 angel 5 5 0 0 5 0
## 11 winner 6 6 0 0 6 0
## 12 golden 6 6 0 0 6 0
## 13 bitch 5 5 0 0 5 0
## 14 dick 4 4 0 0 4 0
## 15 stupid 6 6 0 0 6 0
## 16 success 7 7 0 0 7 0
## 17 lucky 5 5 0 0 5 0
## 18 creative 8 8 0 0 8 0
## 19 trouble 7 7 0 0 7 0
## 20 happy 5 5 0 0 5 0
## 21 fucking 7 7 0 0 7 0
## 22 bullshit 8 8 0 0 8 0
## 23 magic 5 5 0 0 5 0
## 24 darkness 8 8 0 0 8 0
## 25 heaven 6 6 0 0 6 0
## 26 lover 5 5 0 0 5 0
## 27 destiny 7 7 0 0 7 0
## 28 lovely 6 6 0 0 6 0
## 29 paradise 8 8 0 0 8 0
## 30 genius 6 6 0 0 6 0
## 31 cool 4 4 0 0 4 0
## 32 speedy 6 6 0 0 6 0
## 33 viper 5 5 0 0 5 0
## 34 champion 8 8 0 0 8 0
## 35 spooky 6 6 0 0 6 0
## 36 free 4 4 0 0 4 0
## 37 death 5 5 0 0 5 0
## 38 liberty 7 7 0 0 7 0
## 39 sucker 6 6 0 0 6 0
## 40 rocky 5 5 0 0 5 0
## 41 passion 7 7 0 0 7 0
## 42 bastard 7 7 0 0 7 0
## 43 rascal 6 6 0 0 6 0
## 44 shit 4 4 0 0 4 0
## 45 cunt 4 4 0 0 4 0
## 46 galore 6 6 0 0 6 0
## 47 sweet 5 5 0 0 5 0
## 48 super 5 5 0 0 5 0
## 49 naughty 7 7 0 0 7 0
## 50 slut 4 4 0 0 4 0
## 51 zombie 6 6 0 0 6 0
## 52 smooth 6 6 0 0 6 0
## 53 awesome 7 7 0 0 7 0
## 54 danger 6 6 0 0 6 0
## 55 nemesis 7 7 0 0 7 0
## 56 sublime 7 7 0 0 7 0
## 57 victory 7 7 0 0 7 0
## 58 classic 7 7 0 0 7 0
## 59 crazy 5 5 0 0 5 0
## 60 hard 4 4 0 0 4 0
## 61 pretty 6 6 0 0 6 0
## 62 good 4 4 0 0 4 0
## 63 chronic 7 7 0 0 7 0
## 64 bondage 7 7 0 0 7 0
## 65 perfect 7 7 0 0 7 0
## 66 serenity 8 8 0 0 8 0
## 67 outlaw 6 6 0 0 6 0
## 68 strike 6 6 0 0 6 0
## 69 rockstar 8 8 0 0 8 0
## 70 defender 8 8 0 0 8 0
## 71 cancer 6 6 0 0 6 0
## 72 beauty 6 6 0 0 6 0
## 73 savage 6 6 0 0 6 0
## 74 precious 8 8 0 0 8 0
## 75 pervert 7 7 0 0 7 0
## 76 enjoy 5 5 0 0 5 0
## 77 madman 6 6 0 0 6 0
## 78 assassin 8 8 0 0 8 0
## 79 joker 5 5 0 0 5 0
## 80 tickle 6 6 0 0 6 0
## 81 mistress 8 8 0 0 8 0
## 82 wicked 6 6 0 0 6 0
## 83 great 5 5 0 0 5 0
## 84 trumpet 7 7 0 0 7 0
## 85 insane 6 6 0 0 6 0
## 86 wonder 6 6 0 0 6 0
## 87 stormy 6 6 0 0 6 0
## 88 rusty 5 5 0 0 5 0
## 89 triumph 7 7 0 0 7 0
## 90 dirty 5 5 0 0 5 0
## 91 smoke 5 5 0 0 5 0
## 92 poison 6 6 0 0 6 0
## 93 weed 4 4 0 0 4 0
## 94 loser 5 5 0 0 5 0
## 95 fatcat 6 6 0 0 6 0
## 96 whore 5 5 0 0 5 0
## 97 madness 7 7 0 0 7 0
## 98 pleasure 8 8 0 0 8 0
## 99 smile 5 5 0 0 5 0
## 100 sluts 5 5 0 0 5 0
## 101 suck 4 4 0 0 4 0
## 102 smiles 6 6 0 0 6 0
## 103 sticky 6 6 0 0 6 0
## 104 peace 5 5 0 0 5 0
## 105 unreal 6 6 0 0 6 0
## 106 faster 6 6 0 0 6 0
## 107 freak 5 5 0 0 5 0
## 108 maniac 6 6 0 0 6 0
## 109 patriot 7 7 0 0 7 0
## 110 nasty 5 5 0 0 5 0
## 111 grace 5 5 0 0 5 0
## 112 support 7 7 0 0 7 0
## 113 chaos 5 5 0 0 5 0
## 114 mirage 6 6 0 0 6 0
## 115 harmony 7 7 0 0 7 0
## 116 gold 4 4 0 0 4 0
## 117 sucks 5 5 0 0 5 0
## 118 slick 5 5 0 0 5 0
## 119 gangster 8 8 0 0 8 0
## 120 shark 5 5 0 0 5 0
## 121 stranger 8 8 0 0 8 0
## 122 wisdom 6 6 0 0 6 0
## 123 masters 7 7 0 0 7 0
## 124 beautiful 9 9 0 0 9 0
## 125 hell 4 4 0 0 4 0
## 126 devil 5 5 0 0 5 0
## 127 shemale 7 7 0 0 7 0
## 128 dark 4 4 0 0 4 0
## 129 loving 6 6 0 0 6 0
## 130 tempest 7 7 0 0 7 0
## 131 prodigy 7 7 0 0 7 0
## 132 mystery 7 7 0 0 7 0
## 133 defiant 7 7 0 0 7 0
## 134 garbage 7 7 0 0 7 0
## 135 dusty 5 5 0 0 5 0
## 136 broken 6 6 0 0 6 0
## 137 marvel 6 6 0 0 6 0
## 138 grateful 8 8 0 0 8 0
## 139 lonely 6 6 0 0 6 0
## 140 smudge 6 6 0 0 6 0
## 141 faith 5 5 0 0 5 0
## 142 dead 4 4 0 0 4 0
## 143 scream 6 6 0 0 6 0
## 144 mature 6 6 0 0 6 0
## 145 darling 7 7 0 0 7 0
## 146 desert 6 6 0 0 6 0
## 147 fallen 6 6 0 0 6 0
## 148 terror 6 6 0 0 6 0
## 149 strong 6 6 0 0 6 0
## 150 anarchy 7 7 0 0 7 0
## 151 tricky 6 6 0 0 6 0
## 152 funny 5 5 0 0 5 0
## 153 crash 5 5 0 0 5 0
## 154 breeze 6 6 0 0 6 0
## 155 skinny 6 6 0 0 6 0
## 156 ready 5 5 0 0 5 0
## 157 freaks 6 6 0 0 6 0
## 158 unknown 7 7 0 0 7 0
## 159 blow 4 4 0 0 4 0
## 160 smut 4 4 0 0 4 0
## 161 mighty 6 6 0 0 6 0
## 162 madden 6 6 0 0 6 0
## 163 hustler 7 7 0 0 7 0
## 164 handsome 8 8 0 0 8 0
## 165 demon 5 5 0 0 5 0
## 166 saint 5 5 0 0 5 0
## 167 retard 6 6 0 0 6 0
## 168 miracle 7 7 0 0 7 0
## 169 rotten 6 6 0 0 6 0
## 170 fast 4 4 0 0 4 0
## 171 grumpy 6 6 0 0 6 0
## 172 sparkle 7 7 0 0 7 0
## 173 treasure 8 8 0 0 8 0
## 174 fright 6 6 0 0 6 0
## 175 attack 6 6 0 0 6 0
## 176 zenith 6 6 0 0 6 0
## 177 revenge 7 7 0 0 7 0
## 178 bloody 6 6 0 0 6 0
## 179 aspire 6 6 0 0 6 0
## 180 stolen 6 6 0 0 6 0
## 181 positive 8 8 0 0 8 0
## 182 easy 4 4 0 0 4 0
## 183 dragoon 7 7 0 0 7 0
## 184 cunts 5 5 0 0 5 0
## 185 slave 5 5 0 0 5 0
## 186 rich 4 4 0 0 4 0
## 187 murder 6 6 0 0 6 0
## 188 midget 6 6 0 0 6 0
## 189 ghetto 6 6 0 0 6 0
## 190 fortune 7 7 0 0 7 0
## 191 ding 4 4 0 0 4 0
## 192 work 4 4 0 0 4 0
## 193 divine 6 6 0 0 6 0
## 194 twisted 7 7 0 0 7 0
## 195 majestic 8 8 0 0 8 0
## 196 better 6 6 0 0 6 0
## 197 strange 7 7 0 0 7 0
## 198 spank 5 5 0 0 5 0
## 199 smelly 6 6 0 0 6 0
## 200 underdog 8 8 0 0 8 0
## 201 damage 6 6 0 0 6 0
## 202 static 6 6 0 0 6 0
## 203 scratch 7 7 0 0 7 0
## 204 misfit 6 6 0 0 6 0
## 205 hang 4 4 0 0 4 0
## 206 frozen 6 6 0 0 6 0
## 207 kill 4 4 0 0 4 0
## 208 flounder 8 8 0 0 8 0
## 209 losers 6 6 0 0 6 0
## 210 fallout 7 7 0 0 7 0
## 211 dawn 4 4 0 0 4 0
## 212 bonkers 7 7 0 0 7 0
## 213 fresh 5 5 0 0 5 0
## 214 respect 7 7 0 0 7 0
## 215 freeze 6 6 0 0 6 0
## 216 champ 5 5 0 0 5 0
## 217 premier 7 7 0 0 7 0
## 218 shun 4 4 0 0 4 0
## 219 goofy 5 5 0 0 5 0
## 220 oasis 5 5 0 0 5 0
## 221 filthy 6 6 0 0 6 0
## 222 sting 5 5 0 0 5 0
## 223 blossom 7 7 0 0 7 0
## 224 secure 6 6 0 0 6 0
## 225 overkill 8 8 0 0 8 0
## 226 illusion 8 8 0 0 8 0
## 227 confused 8 8 0 0 8 0
## 228 zippy 5 5 0 0 5 0
## 229 amazing 7 7 0 0 7 0
## 230 nice 4 4 0 0 4 0
## 231 idiot 5 5 0 0 5 0
## 232 bravo 5 5 0 0 5 0
## 233 blessing 8 8 0 0 8 0
## 234 wild 4 4 0 0 4 0
## 235 peach 5 5 0 0 5 0
## 236 motley 6 6 0 0 6 0
## 237 glory 5 5 0 0 5 0
## 238 cloud 5 5 0 0 5 0
## 239 squash 6 6 0 0 6 0
## 240 menace 6 6 0 0 6 0
## 241 famous 6 6 0 0 6 0
## 242 sinister 8 8 0 0 8 0
## 243 misery 6 6 0 0 6 0
## 244 chunky 6 6 0 0 6 0
## 245 supreme 7 7 0 0 7 0
## 246 destroy 7 7 0 0 7 0
## 247 cheater 7 7 0 0 7 0
## 248 wrestle 7 7 0 0 7 0
## 249 courage 7 7 0 0 7 0
## 250 recovery 8 8 0 0 8 0
## 251 lighter 7 7 0 0 7 0
## 252 fuzzy 5 5 0 0 5 0
## 253 flexible 8 8 0 0 8 0
## 254 bright 6 6 0 0 6 0
## 255 warning 7 7 0 0 7 0
## 256 magical 7 7 0 0 7 0
## 257 silly 5 5 0 0 5 0
## 258 elite 5 5 0 0 5 0
## 259 lost 4 4 0 0 4 0
## 260 crack 5 5 0 0 5 0
## 261 lemon 5 5 0 0 5 0
## 262 friendly 8 8 0 0 8 0
## 263 criminal 8 8 0 0 8 0
## 264 tank 4 4 0 0 4 0
## 265 silent 6 6 0 0 6 0
## 266 hack 4 4 0 0 4 0
## 267 abnormal 8 8 0 0 8 0
## 268 hollow 6 6 0 0 6 0
## 269 boom 4 4 0 0 4 0
## 270 promise 7 7 0 0 7 0
## 271 patience 8 8 0 0 8 0
## 272 paranoid 8 8 0 0 8 0
## 273 hopeless 8 8 0 0 8 0
## 274 butcher 7 7 0 0 7 0
## 275 shock 5 5 0 0 5 0
## 276 weird 5 5 0 0 5 0
## 277 suicide 7 7 0 0 7 0
## 278 hopeful 7 7 0 0 7 0
## 279 sporty 6 6 0 0 6 0
## 280 reckless 8 8 0 0 8 0
## 281 nightmare 9 9 0 0 9 0
## 282 doomsday 8 8 0 0 8 0
## 283 superb 6 6 0 0 6 0
## 284 insanity 8 8 0 0 8 0
## 285 snappy 6 6 0 0 6 0
## 286 killjoy 7 7 0 0 7 0
## 287 happiness 9 9 0 0 9 0
## 288 faithful 8 8 0 0 8 0
## 289 stress 6 6 0 0 6 0
## 290 radical 7 7 0 0 7 0
## 291 loveless 8 8 0 0 8 0
## 292 hooligan 8 8 0 0 8 0
## 293 chilly 6 6 0 0 6 0
## 294 carnage 7 7 0 0 7 0
## 295 praise 6 6 0 0 6 0
## 296 forsaken 8 8 0 0 8 0
## 297 fiction 7 7 0 0 7 0
## 298 delight 7 7 0 0 7 0
## 299 puppet 6 6 0 0 6 0
## 300 gorgeous 8 8 0 0 8 0
## 301 frost 5 5 0 0 5 0
## 302 bitter 6 6 0 0 6 0
## 303 winners 7 7 0 0 7 0
## 304 sharp 5 5 0 0 5 0
## 305 punk 4 4 0 0 4 0
## 306 wasted 6 6 0 0 6 0
## 307 trust 5 5 0 0 5 0
## 308 loud 4 4 0 0 4 0
## 309 smart 5 5 0 0 5 0
## 310 fatty 5 5 0 0 5 0
## 311 comfort 7 7 0 0 7 0
## 312 sweetness 9 9 0 0 9 0
## 313 sneaky 6 6 0 0 6 0
## 314 survivor 8 8 0 0 8 0
## 315 spotty 6 6 0 0 6 0
## 316 runaway 7 7 0 0 7 0
## 317 knock 5 5 0 0 5 0
## 318 deadly 6 6 0 0 6 0
## 319 dungeon 7 7 0 0 7 0
## 320 superior 8 8 0 0 8 0
## 321 mischief 8 8 0 0 8 0
## 322 grand 5 5 0 0 5 0
## 323 bitchy 6 6 0 0 6 0
## 324 fearless 8 8 0 0 8 0
## 325 evil 4 4 0 0 4 0
## 326 rampage 7 7 0 0 7 0
## 327 luck 4 4 0 0 4 0
## 328 hate 4 4 0 0 4 0
## 329 cold 4 4 0 0 4 0
## 330 choke 5 5 0 0 5 0
## 331 addict 6 6 0 0 6 0
## 332 right 5 5 0 0 5 0
## 333 limited 7 7 0 0 7 0
## 334 godlike 7 7 0 0 7 0
## 335 glow 4 4 0 0 4 0
## 336 excite 6 6 0 0 6 0
## 337 virus 5 5 0 0 5 0
## 338 squeak 6 6 0 0 6 0
## 339 rogue 5 5 0 0 5 0
## 340 punch 5 5 0 0 5 0
## 341 pain 4 4 0 0 4 0
## 342 modern 6 6 0 0 6 0
## 343 best 4 4 0 0 4 0
## 344 rough 5 5 0 0 5 0
## 345 moron 5 5 0 0 5 0
## 346 malice 6 6 0 0 6 0
## 347 sloppy 6 6 0 0 6 0
## 348 progress 8 8 0 0 8 0
## 349 hottest 7 7 0 0 7 0
## 350 greedy 6 6 0 0 6 0
## 351 awful 5 5 0 0 5 0
## 352 junk 4 4 0 0 4 0
## 353 hung 4 4 0 0 4 0
## 354 hardball 8 8 0 0 8 0
## 355 funky 5 5 0 0 5 0
## 356 bomb 4 4 0 0 4 0
## 357 angelic 7 7 0 0 7 0
## 358 tender 6 6 0 0 6 0
## 359 mad 3 3 0 0 3 0
## 360 grin 4 4 0 0 4 0
## 361 burns 5 5 0 0 5 0
## 362 bugs 4 4 0 0 4 0
## 363 beloved 7 7 0 0 7 0
## 364 bastards 8 8 0 0 8 0
## 365 lonesome 8 8 0 0 8 0
## 366 hatred 6 6 0 0 6 0
## 367 foolish 7 7 0 0 7 0
## 368 burn 4 4 0 0 4 0
## 369 wonderful 9 9 0 0 9 0
## 370 rage 4 4 0 0 4 0
## 371 prosper 7 7 0 0 7 0
## 372 honor 5 5 0 0 5 0
## 373 errors 6 6 0 0 6 0
## 374 bungle 6 6 0 0 6 0
## 375 worthy 6 6 0 0 6 0
## 376 venom 5 5 0 0 5 0
## 377 romantic 8 8 0 0 8 0
## 378 protect 7 7 0 0 7 0
## 379 pinnacle 8 8 0 0 8 0
## 380 pathetic 8 8 0 0 8 0
## 381 loose 5 5 0 0 5 0
## 382 dirt 4 4 0 0 4 0
## 383 cannibal 8 8 0 0 8 0
## 384 savior 6 6 0 0 6 0
## 385 jabber 6 6 0 0 6 0
## 386 compact 7 7 0 0 7 0
## 387 chill 5 5 0 0 5 0
## 388 trauma 6 6 0 0 6 0
## 389 stellar 7 7 0 0 7 0
## 390 spoiled 7 7 0 0 7 0
## 391 sick 4 4 0 0 4 0
## 392 jerk 4 4 0 0 4 0
## 393 crappy 6 6 0 0 6 0
## 394 cloudy 6 6 0 0 6 0
## 395 burning 7 7 0 0 7 0
## 396 sucked 6 6 0 0 6 0
## 397 shady 5 5 0 0 5 0
## 398 screwy 6 6 0 0 6 0
## 399 grimace 7 7 0 0 7 0
## 400 downer 6 6 0 0 6 0
## 401 doom 4 4 0 0 4 0
## 402 charisma 8 8 0 0 8 0
## num_vowels sentiment
## 1 2 positive
## 2 2 negative
## 3 2 positive
## 4 3 positive
## 5 1 negative
## 6 2 positive
## 7 1 positive
## 8 3 positive
## 9 2 negative
## 10 2 positive
## 11 2 positive
## 12 2 positive
## 13 1 negative
## 14 1 negative
## 15 2 negative
## 16 2 positive
## 17 1 positive
## 18 4 positive
## 19 3 negative
## 20 1 positive
## 21 2 negative
## 22 2 negative
## 23 2 positive
## 24 2 negative
## 25 3 positive
## 26 2 positive
## 27 2 positive
## 28 2 positive
## 29 4 positive
## 30 3 positive
## 31 2 positive
## 32 2 positive
## 33 2 negative
## 34 3 positive
## 35 2 negative
## 36 2 positive
## 37 2 negative
## 38 2 positive
## 39 2 negative
## 40 1 negative
## 41 3 positive
## 42 2 negative
## 43 2 negative
## 44 1 negative
## 45 1 negative
## 46 3 positive
## 47 2 positive
## 48 2 positive
## 49 2 negative
## 50 1 negative
## 51 3 negative
## 52 2 positive
## 53 4 positive
## 54 2 negative
## 55 3 negative
## 56 3 positive
## 57 2 positive
## 58 2 positive
## 59 1 negative
## 60 1 negative
## 61 1 positive
## 62 2 positive
## 63 2 negative
## 64 3 negative
## 65 2 positive
## 66 3 positive
## 67 3 negative
## 68 2 negative
## 69 2 positive
## 70 3 positive
## 71 2 negative
## 72 3 positive
## 73 3 negative
## 74 4 positive
## 75 2 negative
## 76 2 positive
## 77 2 negative
## 78 3 negative
## 79 2 negative
## 80 2 positive
## 81 2 negative
## 82 2 negative
## 83 2 positive
## 84 2 positive
## 85 3 negative
## 86 2 positive
## 87 1 negative
## 88 1 negative
## 89 2 positive
## 90 1 negative
## 91 2 negative
## 92 3 negative
## 93 2 negative
## 94 2 negative
## 95 2 negative
## 96 2 negative
## 97 2 negative
## 98 4 positive
## 99 2 positive
## 100 1 negative
## 101 1 negative
## 102 2 positive
## 103 1 negative
## 104 3 positive
## 105 3 positive
## 106 2 positive
## 107 2 negative
## 108 3 negative
## 109 3 positive
## 110 1 negative
## 111 2 positive
## 112 2 positive
## 113 2 negative
## 114 3 negative
## 115 2 positive
## 116 1 positive
## 117 1 negative
## 118 1 positive
## 119 2 negative
## 120 1 negative
## 121 2 negative
## 122 2 positive
## 123 2 positive
## 124 5 positive
## 125 1 negative
## 126 2 negative
## 127 3 negative
## 128 1 negative
## 129 2 positive
## 130 2 negative
## 131 2 positive
## 132 1 negative
## 133 3 negative
## 134 3 negative
## 135 1 negative
## 136 2 negative
## 137 2 positive
## 138 3 positive
## 139 2 negative
## 140 2 negative
## 141 2 positive
## 142 2 negative
## 143 2 negative
## 144 3 positive
## 145 2 positive
## 146 2 negative
## 147 2 negative
## 148 2 negative
## 149 1 positive
## 150 2 negative
## 151 1 negative
## 152 1 negative
## 153 1 negative
## 154 3 positive
## 155 1 negative
## 156 2 positive
## 157 2 negative
## 158 2 negative
## 159 1 negative
## 160 1 negative
## 161 1 positive
## 162 2 negative
## 163 2 negative
## 164 3 positive
## 165 2 negative
## 166 2 positive
## 167 2 negative
## 168 3 positive
## 169 2 negative
## 170 1 positive
## 171 1 negative
## 172 2 positive
## 173 4 positive
## 174 1 negative
## 175 2 negative
## 176 2 positive
## 177 3 negative
## 178 2 negative
## 179 3 positive
## 180 2 negative
## 181 4 positive
## 182 2 positive
## 183 3 negative
## 184 1 negative
## 185 2 negative
## 186 1 positive
## 187 2 negative
## 188 2 negative
## 189 2 negative
## 190 3 positive
## 191 1 negative
## 192 1 positive
## 193 3 positive
## 194 2 negative
## 195 3 positive
## 196 2 positive
## 197 2 negative
## 198 1 negative
## 199 1 negative
## 200 3 negative
## 201 3 negative
## 202 2 negative
## 203 1 negative
## 204 2 negative
## 205 1 negative
## 206 2 negative
## 207 1 negative
## 208 3 negative
## 209 2 negative
## 210 3 negative
## 211 1 positive
## 212 2 negative
## 213 1 positive
## 214 2 positive
## 215 3 negative
## 216 1 positive
## 217 3 positive
## 218 1 negative
## 219 2 negative
## 220 3 positive
## 221 1 negative
## 222 1 negative
## 223 2 positive
## 224 3 positive
## 225 3 negative
## 226 4 negative
## 227 3 negative
## 228 1 positive
## 229 3 positive
## 230 2 positive
## 231 3 negative
## 232 2 positive
## 233 2 positive
## 234 1 negative
## 235 2 positive
## 236 2 negative
## 237 1 positive
## 238 2 negative
## 239 2 negative
## 240 3 negative
## 241 3 positive
## 242 3 negative
## 243 2 negative
## 244 1 negative
## 245 3 positive
## 246 2 negative
## 247 3 negative
## 248 2 negative
## 249 4 positive
## 250 3 positive
## 251 2 positive
## 252 1 negative
## 253 3 positive
## 254 1 positive
## 255 2 negative
## 256 3 positive
## 257 1 negative
## 258 3 positive
## 259 1 negative
## 260 1 negative
## 261 2 negative
## 262 2 positive
## 263 3 negative
## 264 1 negative
## 265 2 positive
## 266 1 negative
## 267 3 negative
## 268 2 negative
## 269 2 positive
## 270 3 positive
## 271 4 positive
## 272 4 negative
## 273 3 negative
## 274 2 negative
## 275 1 negative
## 276 2 negative
## 277 4 negative
## 278 3 positive
## 279 1 positive
## 280 2 negative
## 281 3 negative
## 282 3 negative
## 283 2 positive
## 284 3 negative
## 285 1 positive
## 286 2 negative
## 287 3 positive
## 288 3 positive
## 289 1 negative
## 290 3 negative
## 291 3 negative
## 292 4 negative
## 293 1 negative
## 294 3 negative
## 295 3 positive
## 296 3 negative
## 297 3 negative
## 298 2 positive
## 299 2 negative
## 300 4 positive
## 301 1 negative
## 302 2 negative
## 303 2 positive
## 304 1 positive
## 305 1 negative
## 306 2 negative
## 307 1 positive
## 308 2 negative
## 309 1 positive
## 310 1 negative
## 311 2 positive
## 312 3 positive
## 313 2 negative
## 314 3 positive
## 315 1 negative
## 316 3 negative
## 317 1 negative
## 318 2 negative
## 319 3 negative
## 320 4 positive
## 321 3 negative
## 322 1 positive
## 323 1 negative
## 324 3 positive
## 325 2 negative
## 326 3 negative
## 327 1 positive
## 328 2 negative
## 329 1 negative
## 330 2 negative
## 331 2 negative
## 332 1 positive
## 333 3 negative
## 334 3 positive
## 335 1 positive
## 336 3 positive
## 337 2 negative
## 338 3 negative
## 339 3 negative
## 340 1 negative
## 341 2 negative
## 342 2 positive
## 343 1 positive
## 344 2 negative
## 345 2 negative
## 346 3 negative
## 347 1 negative
## 348 2 positive
## 349 2 positive
## 350 2 negative
## 351 2 negative
## 352 1 negative
## 353 1 negative
## 354 2 negative
## 355 1 negative
## 356 1 negative
## 357 3 positive
## 358 2 positive
## 359 1 negative
## 360 1 positive
## 361 1 negative
## 362 1 negative
## 363 3 positive
## 364 2 negative
## 365 4 negative
## 366 2 negative
## 367 3 negative
## 368 1 negative
## 369 3 positive
## 370 2 negative
## 371 2 positive
## 372 2 positive
## 373 2 negative
## 374 2 negative
## 375 1 positive
## 376 2 negative
## 377 3 positive
## 378 2 positive
## 379 3 positive
## 380 3 negative
## 381 3 negative
## 382 1 negative
## 383 3 negative
## 384 3 positive
## 385 2 negative
## 386 2 positive
## 387 1 negative
## 388 3 negative
## 389 2 positive
## 390 3 negative
## 391 1 negative
## 392 1 negative
## 393 1 negative
## 394 2 negative
## 395 2 negative
## 396 2 negative
## 397 1 negative
## 398 1 negative
## 399 3 negative
## 400 2 negative
## 401 2 negative
## 402 3 positive
The Bing sentiment dictionary assigns each word one of two sentiment categories: positive or negative. I counted the total number of positive and negative words among the matched passwords, resulting in 240 negative and 162 positive matches.
This indicates that, according to the Bing sentiment dictionary, common passwords lean toward language with negative sentiments more than language with positive sentiments.
count(password_sentiment_bing, sentiment)
## sentiment n
## 1 negative 240
## 2 positive 162
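To make the join-and-count step concrete, here is a small sketch on invented data. It assumes a hypothetical four-word lexicon in place of the real Bing dictionary and uses base R's `merge()`, which performs an inner join by default, as a stand-in for `inner_join()`.

```r
# Hypothetical mini-lexicon standing in for get_sentiments("bing").
toy_lexicon <- data.frame(
  word      = c("love", "happy", "killer", "stupid"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Toy letters-only passwords; "qwerty" and "iloveyou" are not lexicon entries.
toy_passwords <- data.frame(
  word = c("love", "qwerty", "killer", "iloveyou", "stupid"),
  stringsAsFactors = FALSE
)

# merge() keeps only rows whose `word` appears in both tables (an inner join),
# so patterns and compound words silently drop out.
matched <- merge(toy_passwords, toy_lexicon, by = "word")

# Counts and shares of each sentiment among the matched passwords.
sentiment_counts <- table(matched$sentiment)
sentiment_share  <- prop.table(sentiment_counts)

# Coverage: share of passwords that received a sentiment label at all.
coverage <- nrow(matched) / nrow(toy_passwords)
```

Applied to the counts reported above, the same arithmetic gives 240 / 402, or about 59.7% negative among matched passwords, and 402 / 7177, or about 5.6% coverage of the letters-only passwords.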
The AFINN sentiment analysis shows the sentiment values for 260 instances in my data set out of the 7177 instances that contain only letters. First, I retrieved the AFINN sentiment dictionary and then I used an inner join to get the sentiments of the words appearing in that dictionary.
afinn <- get_sentiments("afinn")
password_sentiment_afinn <- inner_join(password_sentiment, afinn, by = "word")
password_sentiment_afinn
## word length num_chars num_digits num_upper num_lower num_special
## 1 sunshine 8 8 0 0 8 0
## 2 asshole 7 7 0 0 7 0
## 3 easy 4 4 0 0 4 0
## 4 freedom 7 7 0 0 7 0
## 5 fuck 4 4 0 0 4 0
## 6 love 4 4 0 0 4 0
## 7 diamond 7 7 0 0 7 0
## 8 sexy 4 4 0 0 4 0
## 9 welcome 7 7 0 0 7 0
## 10 please 6 6 0 0 6 0
## 11 winner 6 6 0 0 6 0
## 12 bitch 5 5 0 0 5 0
## 13 dick 4 4 0 0 4 0
## 14 united 6 6 0 0 6 0
## 15 stupid 6 6 0 0 6 0
## 16 jackass 7 7 0 0 7 0
## 17 success 7 7 0 0 7 0
## 18 lucky 5 5 0 0 5 0
## 19 shithead 8 8 0 0 8 0
## 20 creative 8 8 0 0 8 0
## 21 trouble 7 7 0 0 7 0
## 22 happy 5 5 0 0 5 0
## 23 fucking 7 7 0 0 7 0
## 24 bullshit 8 8 0 0 8 0
## 25 darkness 8 8 0 0 8 0
## 26 heaven 6 6 0 0 6 0
## 27 tits 4 4 0 0 4 0
## 28 chance 6 6 0 0 6 0
## 29 fire 4 4 0 0 4 0
## 30 dickhead 8 8 0 0 8 0
## 31 lovely 6 6 0 0 6 0
## 32 paradise 8 8 0 0 8 0
## 33 dreams 6 6 0 0 6 0
## 34 cock 4 4 0 0 4 0
## 35 cool 4 4 0 0 4 0
## 36 hahaha 6 6 0 0 6 0
## 37 bitches 7 7 0 0 7 0
## 38 badass 6 6 0 0 6 0
## 39 free 4 4 0 0 4 0
## 40 death 5 5 0 0 5 0
## 41 spirit 6 6 0 0 6 0
## 42 bastard 7 7 0 0 7 0
## 43 shit 4 4 0 0 4 0
## 44 nigger 6 6 0 0 6 0
## 45 cunt 4 4 0 0 4 0
## 46 sweet 5 5 0 0 5 0
## 47 super 5 5 0 0 5 0
## 48 justice 7 7 0 0 7 0
## 49 slut 4 4 0 0 4 0
## 50 vision 6 6 0 0 6 0
## 51 jesus 5 5 0 0 5 0
## 52 awesome 7 7 0 0 7 0
## 53 danger 6 6 0 0 6 0
## 54 wanker 6 6 0 0 6 0
## 55 crazy 5 5 0 0 5 0
## 56 hard 4 4 0 0 4 0
## 57 pretty 6 6 0 0 6 0
## 58 good 4 4 0 0 4 0
## 59 perfect 7 7 0 0 7 0
## 60 strike 6 6 0 0 6 0
## 61 defender 8 8 0 0 8 0
## 62 cancer 6 6 0 0 6 0
## 63 enjoy 5 5 0 0 5 0
## 64 fucked 6 6 0 0 6 0
## 65 wicked 6 6 0 0 6 0
## 66 great 5 5 0 0 5 0
## 67 insane 6 6 0 0 6 0
## 68 triumph 7 7 0 0 7 0
## 69 dirty 5 5 0 0 5 0
## 70 poison 6 6 0 0 6 0
## 71 loser 5 5 0 0 5 0
## 72 whore 5 5 0 0 5 0
## 73 help 4 4 0 0 4 0
## 74 forget 6 6 0 0 6 0
## 75 madness 7 7 0 0 7 0
## 76 pleasure 8 8 0 0 8 0
## 77 smile 5 5 0 0 5 0
## 78 suck 4 4 0 0 4 0
## 79 desire 6 6 0 0 6 0
## 80 smiles 6 6 0 0 6 0
## 81 peace 5 5 0 0 5 0
## 82 dream 5 5 0 0 5 0
## 83 immortal 8 8 0 0 8 0
## 84 curious 7 7 0 0 7 0
## 85 nasty 5 5 0 0 5 0
## 86 ghost 5 5 0 0 5 0
## 87 grace 5 5 0 0 5 0
## 88 support 7 7 0 0 7 0
## 89 safety 6 6 0 0 6 0
## 90 thanks 6 6 0 0 6 0
## 91 chaos 5 5 0 0 5 0
## 92 sucks 5 5 0 0 5 0
## 93 slick 5 5 0 0 5 0
## 94 battle 6 6 0 0 6 0
## 95 paradox 7 7 0 0 7 0
## 96 dumbass 7 7 0 0 7 0
## 97 beautiful 9 9 0 0 9 0
## 98 hell 4 4 0 0 4 0
## 99 fuckface 8 8 0 0 8 0
## 100 loving 6 6 0 0 6 0
## 101 kiss 4 4 0 0 4 0
## 102 defiant 7 7 0 0 7 0
## 103 broken 6 6 0 0 6 0
## 104 rescue 6 6 0 0 6 0
## 105 marvel 6 6 0 0 6 0
## 106 grateful 8 8 0 0 8 0
## 107 lonely 6 6 0 0 6 0
## 108 active 6 6 0 0 6 0
## 109 faith 5 5 0 0 5 0
## 110 shitty 6 6 0 0 6 0
## 111 dead 4 4 0 0 4 0
## 112 fuckers 7 7 0 0 7 0
## 113 scream 6 6 0 0 6 0
## 114 mature 6 6 0 0 6 0
## 115 fallen 6 6 0 0 6 0
## 116 terror 6 6 0 0 6 0
## 117 strong 6 6 0 0 6 0
## 118 fitness 7 7 0 0 7 0
## 119 funny 5 5 0 0 5 0
## 120 crash 5 5 0 0 5 0
## 121 engage 6 6 0 0 6 0
## 122 escape 6 6 0 0 6 0
## 123 cheers 6 6 0 0 6 0
## 124 cancel 6 6 0 0 6 0
## 125 retard 6 6 0 0 6 0
## 126 miracle 7 7 0 0 7 0
## 127 sparkle 7 7 0 0 7 0
## 128 treasure 8 8 0 0 8 0
## 129 natural 7 7 0 0 7 0
## 130 hope 4 4 0 0 4 0
## 131 fright 6 6 0 0 6 0
## 132 attack 6 6 0 0 6 0
## 133 revenge 7 7 0 0 7 0
## 134 bloody 6 6 0 0 6 0
## 135 stolen 6 6 0 0 6 0
## 136 positive 8 8 0 0 8 0
## 137 faggot 6 6 0 0 6 0
## 138 easy 4 4 0 0 4 0
## 139 damnit 6 6 0 0 6 0
## 140 romance 7 7 0 0 7 0
## 141 rich 4 4 0 0 4 0
## 142 murder 6 6 0 0 6 0
## 143 better 6 6 0 0 6 0
## 144 strange 7 7 0 0 7 0
## 145 damage 6 6 0 0 6 0
## 146 kill 4 4 0 0 4 0
## 147 haha 4 4 0 0 4 0
## 148 droopy 6 6 0 0 6 0
## 149 fresh 5 5 0 0 5 0
## 150 solution 8 8 0 0 8 0
## 151 grant 5 5 0 0 5 0
## 152 hacked 6 6 0 0 6 0
## 153 jewels 6 6 0 0 6 0
## 154 woohoo 6 6 0 0 6 0
## 155 combat 6 6 0 0 6 0
## 156 secure 6 6 0 0 6 0
## 157 pissing 7 7 0 0 7 0
## 158 confused 8 8 0 0 8 0
## 159 amazing 7 7 0 0 7 0
## 160 nuts 4 4 0 0 4 0
## 161 nice 4 4 0 0 4 0
## 162 idiot 5 5 0 0 5 0
## 163 blessing 8 8 0 0 8 0
## 164 glory 5 5 0 0 5 0
## 165 menace 6 6 0 0 6 0
## 166 frisky 6 6 0 0 6 0
## 167 motherfucker 12 12 0 0 12 0
## 168 misery 6 6 0 0 6 0
## 169 destroy 7 7 0 0 7 0
## 170 cheater 7 7 0 0 7 0
## 171 courage 7 7 0 0 7 0
## 172 strength 8 8 0 0 8 0
## 173 bright 6 6 0 0 6 0
## 174 warning 7 7 0 0 7 0
## 175 silly 5 5 0 0 5 0
## 176 lost 4 4 0 0 4 0
## 177 friendly 8 8 0 0 8 0
## 178 criminal 8 8 0 0 8 0
## 179 yummy 5 5 0 0 5 0
## 180 kinder 6 6 0 0 6 0
## 181 fight 5 5 0 0 5 0
## 182 dipshit 7 7 0 0 7 0
## 183 promise 7 7 0 0 7 0
## 184 hopeless 8 8 0 0 8 0
## 185 shock 5 5 0 0 5 0
## 186 weird 5 5 0 0 5 0
## 187 suicide 7 7 0 0 7 0
## 188 hopeful 7 7 0 0 7 0
## 189 reckless 8 8 0 0 8 0
## 190 superb 6 6 0 0 6 0
## 191 insanity 8 8 0 0 8 0
## 192 bummer 6 6 0 0 6 0
## 193 happiness 9 9 0 0 9 0
## 194 faithful 8 8 0 0 8 0
## 195 hooligan 8 8 0 0 8 0
## 196 stop 4 4 0 0 4 0
## 197 praise 6 6 0 0 6 0
## 198 delight 7 7 0 0 7 0
## 199 alone 5 5 0 0 5 0
## 200 bitter 6 6 0 0 6 0
## 201 wasted 6 6 0 0 6 0
## 202 trust 5 5 0 0 5 0
## 203 smart 5 5 0 0 5 0
## 204 comfort 7 7 0 0 7 0
## 205 comedy 6 6 0 0 6 0
## 206 spam 4 4 0 0 4 0
## 207 sneaky 6 6 0 0 6 0
## 208 heroes 6 6 0 0 6 0
## 209 survivor 8 8 0 0 8 0
## 210 straight 8 8 0 0 8 0
## 211 superior 8 8 0 0 8 0
## 212 mischief 8 8 0 0 8 0
## 213 jewel 5 5 0 0 5 0
## 214 grand 5 5 0 0 5 0
## 215 fuckhead 8 8 0 0 8 0
## 216 yeah 4 4 0 0 4 0
## 217 vitamin 7 7 0 0 7 0
## 218 huge 4 4 0 0 4 0
## 219 fearless 8 8 0 0 8 0
## 220 evil 4 4 0 0 4 0
## 221 luck 4 4 0 0 4 0
## 222 hate 4 4 0 0 4 0
## 223 choke 5 5 0 0 5 0
## 224 wealth 6 6 0 0 6 0
## 225 limited 7 7 0 0 7 0
## 226 excite 6 6 0 0 6 0
## 227 pain 4 4 0 0 4 0
## 228 gagged 6 6 0 0 6 0
## 229 best 4 4 0 0 4 0
## 230 moron 5 5 0 0 5 0
## 231 progress 8 8 0 0 8 0
## 232 greedy 6 6 0 0 6 0
## 233 awful 5 5 0 0 5 0
## 234 funky 5 5 0 0 5 0
## 235 bomb 4 4 0 0 4 0
## 236 tender 6 6 0 0 6 0
## 237 prospect 8 8 0 0 8 0
## 238 mad 3 3 0 0 3 0
## 239 beloved 7 7 0 0 7 0
## 240 bastards 8 8 0 0 8 0
## 241 lonesome 8 8 0 0 8 0
## 242 foolish 7 7 0 0 7 0
## 243 wonderful 9 9 0 0 9 0
## 244 cocksucker 10 10 0 0 10 0
## 245 rage 4 4 0 0 4 0
## 246 honor 5 5 0 0 5 0
## 247 errors 6 6 0 0 6 0
## 248 worthy 6 6 0 0 6 0
## 249 protect 7 7 0 0 7 0
## 250 pathetic 8 8 0 0 8 0
## 251 loose 5 5 0 0 5 0
## 252 dirt 4 4 0 0 4 0
## 253 sparkles 8 8 0 0 8 0
## 254 pressure 8 8 0 0 8 0
## 255 alive 5 5 0 0 5 0
## 256 trauma 6 6 0 0 6 0
## 257 sick 4 4 0 0 4 0
## 258 jerk 4 4 0 0 4 0
## 259 douche 6 6 0 0 6 0
## 260 doom 4 4 0 0 4 0
## num_vowels value
## 1 3 2
## 2 3 -4
## 3 2 1
## 4 3 2
## 5 1 -4
## 6 2 3
## 7 3 1
## 8 1 3
## 9 3 2
## 10 3 1
## 11 2 4
## 12 1 -5
## 13 1 -4
## 14 3 1
## 15 2 -2
## 16 2 -4
## 17 2 2
## 18 1 3
## 19 3 -4
## 20 4 2
## 21 3 -2
## 22 1 3
## 23 2 -4
## 24 2 -4
## 25 2 -1
## 26 3 2
## 27 1 -2
## 28 2 2
## 29 2 -2
## 30 3 -4
## 31 2 3
## 32 4 3
## 33 2 1
## 34 1 -5
## 35 2 1
## 36 3 3
## 37 2 -5
## 38 2 -3
## 39 2 1
## 40 2 -2
## 41 2 1
## 42 2 -5
## 43 1 -4
## 44 2 -5
## 45 1 -5
## 46 2 2
## 47 2 3
## 48 3 2
## 49 1 -5
## 50 3 1
## 51 2 1
## 52 4 4
## 53 2 -2
## 54 2 -3
## 55 1 -2
## 56 1 -1
## 57 1 1
## 58 2 3
## 59 2 3
## 60 2 -1
## 61 3 2
## 62 2 -1
## 63 2 2
## 64 2 -4
## 65 2 -2
## 66 2 3
## 67 3 -2
## 68 2 4
## 69 1 -2
## 70 3 -2
## 71 2 -3
## 72 2 -4
## 73 1 2
## 74 2 -1
## 75 2 -3
## 76 4 3
## 77 2 2
## 78 1 -3
## 79 3 1
## 80 2 2
## 81 3 2
## 82 2 1
## 83 3 2
## 84 4 1
## 85 1 -3
## 86 1 -1
## 87 2 1
## 88 2 2
## 89 2 1
## 90 1 2
## 91 2 -2
## 92 1 -3
## 93 1 2
## 94 2 -1
## 95 3 -1
## 96 2 -3
## 97 5 3
## 98 1 -4
## 99 3 -4
## 100 2 2
## 101 1 2
## 102 3 -1
## 103 2 -1
## 104 3 2
## 105 2 3
## 106 3 3
## 107 2 -2
## 108 3 1
## 109 2 1
## 110 1 -3
## 111 2 -3
## 112 2 -4
## 113 2 -2
## 114 3 2
## 115 2 -2
## 116 2 -3
## 117 1 2
## 118 2 1
## 119 1 4
## 120 1 -2
## 121 3 1
## 122 3 -1
## 123 2 2
## 124 2 -1
## 125 2 -2
## 126 3 4
## 127 2 3
## 128 4 2
## 129 3 1
## 130 2 2
## 131 1 -2
## 132 2 -1
## 133 3 -2
## 134 2 -3
## 135 2 -2
## 136 4 2
## 137 2 -3
## 138 2 1
## 139 2 -4
## 140 3 2
## 141 1 2
## 142 2 -2
## 143 2 2
## 144 2 -1
## 145 3 -3
## 146 1 -3
## 147 2 3
## 148 2 -2
## 149 1 1
## 150 4 1
## 151 1 1
## 152 2 -1
## 153 2 1
## 154 4 3
## 155 2 -1
## 156 3 2
## 157 2 -3
## 158 3 -2
## 159 3 4
## 160 1 -3
## 161 2 3
## 162 3 -3
## 163 2 3
## 164 1 2
## 165 3 -2
## 166 1 2
## 167 4 -5
## 168 2 -2
## 169 2 -3
## 170 3 -3
## 171 4 2
## 172 1 2
## 173 1 1
## 174 2 -3
## 175 1 -1
## 176 1 -3
## 177 2 2
## 178 3 -3
## 179 1 3
## 180 2 2
## 181 1 -1
## 182 2 -3
## 183 3 1
## 184 3 -2
## 185 1 -2
## 186 2 -2
## 187 4 -2
## 188 3 2
## 189 2 -2
## 190 2 5
## 191 3 -2
## 192 2 -2
## 193 3 3
## 194 3 3
## 195 4 -2
## 196 1 -1
## 197 3 3
## 198 2 3
## 199 3 -2
## 200 2 -2
## 201 2 -2
## 202 1 1
## 203 1 1
## 204 2 2
## 205 2 1
## 206 1 -2
## 207 2 -1
## 208 3 2
## 209 3 2
## 210 2 1
## 211 4 2
## 212 3 -1
## 213 2 1
## 214 1 3
## 215 3 -4
## 216 2 1
## 217 3 1
## 218 2 1
## 219 3 2
## 220 2 -3
## 221 1 3
## 222 2 -3
## 223 2 -2
## 224 2 3
## 225 3 -1
## 226 3 3
## 227 2 -2
## 228 2 -2
## 229 1 3
## 230 2 -3
## 231 2 2
## 232 2 -2
## 233 2 -3
## 234 1 2
## 235 1 -1
## 236 2 2
## 237 2 1
## 238 1 -3
## 239 3 3
## 240 2 -5
## 241 4 -2
## 242 3 -2
## 243 3 4
## 244 3 -5
## 245 2 -2
## 246 2 2
## 247 2 -2
## 248 1 2
## 249 2 1
## 250 3 -2
## 251 3 -3
## 252 1 -2
## 253 2 3
## 254 3 -1
## 255 3 1
## 256 3 -3
## 257 1 -2
## 258 1 -3
## 259 3 -3
## 260 2 -2
The AFINN sentiment dictionary assigns each word an integer sentiment value from -5 (most negative) to 5 (most positive); no matched password has a value of 0. Again, I counted the number of passwords at each value.
The counts below indicate that, according to the AFINN sentiment dictionary, common passwords contain a fairly even number of positive and negative sentiment words.
count(password_sentiment_afinn, value)
## value n
## 1 -5 10
## 2 -4 16
## 3 -3 33
## 4 -2 49
## 5 -1 24
## 6 1 39
## 7 2 49
## 8 3 32
## 9 4 7
## 10 5 1
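The near-even split can be checked by collapsing the counts above by sign. The following is a minimal sketch that re-enters the printed counts by hand; in the analysis itself the same totals would come from `password_sentiment_afinn`. The object names `afinn_counts` and `sign_totals` are illustrative and do not appear elsewhere in this case study.

```r
# Counts copied from the count() output above.
afinn_counts <- data.frame(
  value = c(-5, -4, -3, -2, -1, 1, 2, 3, 4, 5),
  n     = c(10, 16, 33, 49, 24, 39, 49, 32, 7, 1)
)

# Label each value by its sign, then total the counts per label.
afinn_counts$direction <- ifelse(afinn_counts$value < 0, "negative", "positive")
sign_totals <- tapply(afinn_counts$n, afinn_counts$direction, sum)

sign_totals
# negative positive
#      132      128
```

The two totals add up to the 260 passwords that matched the AFINN dictionary.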
The next part of my analysis was visualizing the counts and sentiment results from the previous steps. Each result is shown as a bar chart so that the distributions are easy to read.
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = length)) +
  labs(x = "Length of Password",
       y = "Number of Instances")
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = num_chars)) +
  labs(x = "Number of Letters in Password",
       y = "Number of Instances")
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = num_digits)) +
  labs(x = "Number of Numbers in Password",
       y = "Number of Instances")
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = num_special)) +
  labs(x = "Number of Special Characters in Password",
       y = "Number of Instances")
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = num_upper)) +
  labs(x = "Number of Uppercase Letters in Password",
       y = "Number of Instances")
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = num_lower)) +
  labs(x = "Number of Lowercase Letters in Password",
       y = "Number of Instances")
ggplot(data = common_passwords) +
  geom_bar(mapping = aes(x = num_vowels)) +
  labs(x = "Number of Vowels in Password",
       y = "Number of Instances")
ggplot(data = password_sentiment_bing) +
  geom_bar(mapping = aes(x = sentiment, fill = sentiment)) +
  labs(x = "Sentiment", y = "Number of Instances",
       caption = "Only plots the 402 passwords that were a real word and appeared in the Bing sentiment dictionary") +
  scale_fill_discrete(name = "Sentiment",
                      labels = c("Negative", "Positive"))
ggplot(data = password_sentiment_afinn) +
  geom_bar(mapping = aes(x = value)) +
  labs(x = "Sentiment Values", y = "Number of Instances",
       caption = "Only plots the 260 passwords that were a real word and appeared in the AFINN sentiment dictionary")
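The seven structural bar charts above share one template, so they could also be drawn as a single faceted figure. The sketch below is one possible way to do that, not the method used for the figures in this case study. It assumes the count columns named in the plots above; `toy_passwords` is a hypothetical three-row stand-in for `common_passwords`, and `long_counts` and `feature_plot` are illustrative names.

```r
library(ggplot2)
library(tidyr)

# Hypothetical stand-in for common_passwords, using the same column names.
toy_passwords <- data.frame(
  length      = c(6, 8, 6),
  num_chars   = c(6, 4, 0),
  num_digits  = c(0, 4, 6),
  num_special = c(0, 0, 0),
  num_upper   = c(0, 1, 0),
  num_lower   = c(6, 3, 0),
  num_vowels  = c(2, 1, 0)
)

# Stack the count columns so each row holds one (feature, count) pair.
long_counts <- pivot_longer(
  toy_passwords,
  cols = c(length, num_chars, num_digits, num_special,
           num_upper, num_lower, num_vowels),
  names_to = "feature",
  values_to = "count"
)

# Draw one bar-chart panel per feature, each with its own axis ranges.
feature_plot <- ggplot(data = long_counts) +
  geom_bar(mapping = aes(x = count)) +
  facet_wrap(~ feature, scales = "free") +
  labs(x = "Count per Password", y = "Number of Instances")

feature_plot
```

Separate plots, as used above, allow a descriptive axis label for each feature; the faceted version trades that for a more compact side-by-side comparison.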
My research questions for this analysis were: What are common patterns in the passwords? What sentiments can be seen in common passwords?
The common structural patterns were that most passwords consisted only of letters (7,177 of 10,000) and used lowercase letters far more than uppercase letters (averages of 5.005 vs. 0.0253 per password). Special characters were very rare (12 of 10,000 passwords; average of 0.0034 per password). Lastly, the passwords averaged 6.6513 characters in length, counting letters, digits, and special characters.
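Summary figures of this kind can be produced with a single `summarise()` call. The sketch below is a hedged illustration rather than the exact code behind the numbers above: `toy_passwords` is a hypothetical four-row stand-in for `common_passwords`, and treating a password as letters-only when `num_chars` equals `length` is an assumption about how that count could be derived.

```r
library(dplyr)

# Hypothetical stand-in for common_passwords (four made-up rows).
toy_passwords <- data.frame(
  password    = c("dragon", "Pass1234", "123456", "qwerty!"),
  length      = c(6, 8, 6, 7),
  num_chars   = c(6, 4, 0, 6),
  num_digits  = c(0, 4, 6, 0),
  num_special = c(0, 0, 0, 1),
  num_upper   = c(0, 1, 0, 0),
  num_lower   = c(6, 3, 0, 6)
)

structure_summary <- toy_passwords %>%
  summarise(
    letters_only = sum(num_chars == length),  # assumption: every character is a letter
    with_special = sum(num_special > 0),      # passwords with any special character
    avg_length   = mean(length),
    avg_lower    = mean(num_lower),
    avg_upper    = mean(num_upper)
  )

structure_summary
```

On the toy data this reports one letters-only password, one password with a special character, and an average length of 6.75.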
For the sentiment analyses, the Bing analysis leaned negative (240 negative vs. 162 positive words, about 60% negative), while the AFINN analysis was roughly balanced (132 negative vs. 128 positive). Because the two dictionaries disagree and each matched only a small fraction of the 10,000 passwords (402 and 260, respectively), I consider the sentiment analysis inconclusive and in need of further study.
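One way to gauge whether the Bing imbalance could plausibly be chance is an exact binomial test against an even split. This is only a sketch of a possible follow-up, not part of the original analysis; the names `bing_negative`, `bing_total`, `negative_share`, and `bing_test` are illustrative, and `binom.test()` comes from base R's stats package.

```r
# Bing counts reported above: 240 negative and 162 positive matches.
bing_negative <- 240
bing_total <- 240 + 162

# Share of matched passwords that carry a negative label.
negative_share <- bing_negative / bing_total

# Exact binomial test of the negative count against a 50/50 split.
bing_test <- binom.test(bing_negative, bing_total, p = 0.5)

round(negative_share, 3)
bing_test$p.value
```

Whatever the p-value, it would describe only the 402 passwords that matched the dictionary, so it would not by itself settle the sentiment of the full list.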
The sentiment analysis is the part of this study that could be improved most in future work. A stronger sentiment analysis would provide an interesting starting point for future research into cyber-security and passwords.
My biggest limitation was that I could not efficiently separate the individual words within each password to build a larger bag of words for the sentiment analyses. I could not find a way to do this in R, because a password can be split in many places (e.g., at any number or special character, or between words that were joined without a non-letter separator). Going through all 10,000 passwords by hand to build the bag of words was not feasible in the time I had at the end of the semester.
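Part of this limitation could be addressed by treating digits and special characters as separators. The sketch below is a partial approach under that assumption, not a full solution: it uses `str_extract_all()` from stringr to pull out runs of letters, and the passwords in `example_passwords` are hypothetical examples rather than rows from the data set.

```r
library(stringr)

# Hypothetical example passwords, not taken from the data set.
example_passwords <- c("love2hate", "happy_day!", "123456", "iloveyou")

# Extract every maximal run of letters, so digits and special
# characters act as separators between candidate words.
letter_runs <- str_extract_all(str_to_lower(example_passwords), "[a-z]+")
names(letter_runs) <- example_passwords

letter_runs
```

Here "love2hate" yields "love" and "hate", while "iloveyou" comes back as one piece, so words joined without a separator would still need a dictionary-based segmentation step. The extracted pieces could then be matched against a sentiment dictionary in the same way as the whole passwords.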
As for legal and ethical concerns, I have few for this analysis. Working with passwords does raise some ethical considerations; however, these passwords are not connected to any particular person and are simply a list of common passwords. The data set is open access, so I do not expect any legal issues.
Bansal, S. (2022). 10000 most common passwords. [Data set]. Kaggle. https://www.kaggle.com/datasets/shivamb/10000-most-common-passwords
Mohammad, S. M., & Turney, P. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), 436-465.