Reading our dataset in after loading the necessary libraries.
library(readr)
library(stringr)
users <- read_csv("users.csv")
## Rows: 982 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): user_name, password
## dbl (1): id
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nrow(users)
## [1] 982
head(users, n = 12)
## # A tibble: 12 × 3
## id user_name password
## <dbl> <chr> <chr>
## 1 1 vance.jennings joobheco
## 2 2 consuelo.eaton 0869347314
## 3 3 mitchel.perkins fabypotter
## 4 4 odessa.vaughan aharney88
## 5 5 araceli.wilder acecdn3000
## 6 6 shawn.harrington 5278049
## 7 7 evelyn.gay master
## 8 8 noreen.hale murphy
## 9 9 gladys.ward lwsves2
## 10 10 brant.zimmerman 1190KAREN5572497
## 11 11 leanna.abbott aivlys24
## 12 12 milford.hubbard hubbard
# Passwords should not be too short:
# NIST SP 800-63B requires at least 8 characters
# Calculating the length of each user's password
users$length <- str_length(users$password)
# Flagging the users with too-short passwords
users$too_short <- users$length < 8
# Counting the users with too-short passwords
sum(users$too_short)
## [1] 376
head(users)
## # A tibble: 6 × 5
## id user_name password length too_short
## <dbl> <chr> <chr> <int> <lgl>
## 1 1 vance.jennings joobheco 8 FALSE
## 2 2 consuelo.eaton 0869347314 10 FALSE
## 3 3 mitchel.perkins fabypotter 10 FALSE
## 4 4 odessa.vaughan aharney88 9 FALSE
## 5 5 araceli.wilder acecdn3000 10 FALSE
## 6 6 shawn.harrington 5278049 7 TRUE
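The length rule needs nothing more than a character count. Below is a minimal base-R sketch that uses `nchar()` in place of stringr's `str_length()` and a hypothetical helper, `is_too_short()`, applied to three passwords from the table above. By default `nchar()` counts characters rather than bytes, which is in the spirit of NIST's rule that each Unicode code point counts as a single character.

```r
# Hypothetical helper: TRUE when a password is shorter than the
# 8-character minimum from NIST SP 800-63B
is_too_short <- function(passwords, min_length = 8) {
  nchar(passwords, type = "chars") < min_length
}

# Three passwords from the first rows of `users`
sample_passwords <- c("joobheco", "5278049", "master")
short_flags <- is_too_short(sample_passwords)
short_flags
```

Keeping the threshold as a parameter makes it easy to try a stricter minimum without touching the rest of the code.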
# Let's look at the most common passwords people use
# Reading in the 10,000 most common passwords
common_passwords <- read_lines("10_million_password_list_top_10000.txt")
Already among the first 12 users we can spot a few problematic passwords. According to NIST Special Publication 800-63B, verifiers must compare new passwords against a list of values known to be commonly used, expected, or compromised.
This list includes passwords from previous data breaches, dictionary words, repetitive or sequential characters (e.g., ‘aaaaaa’, ‘1234abcd’), and context-specific words such as the name of the service or the username. Our first step is to check passwords against a list of common passwords. Such lists are compiled from passwords leaked by websites that did not properly encrypt them.
head(common_passwords, 50)
## [1] "123456" "password" "12345678" "qwerty" "123456789"
## [6] "12345" "1234" "111111" "1234567" "dragon"
## [11] "123123" "baseball" "abc123" "football" "monkey"
## [16] "letmein" "696969" "shadow" "master" "666666"
## [21] "qwertyuiop" "123321" "mustang" "1234567890" "michael"
## [26] "654321" "pussy" "superman" "1qaz2wsx" "7777777"
## [31] "fuckyou" "121212" "000000" "qazwsx" "123qwe"
## [36] "killer" "trustno1" "jordan" "jennifer" "zxcvbnm"
## [41] "asdfgh" "hunter" "buster" "soccer" "harley"
## [46] "batman" "andrew" "tigger" "sunshine" "iloveyou"
The list is ordered with the most common passwords at the top, so it’s no surprise to find “123456” and “qwerty” among the first entries. Since hackers have access to the same lists, none of our users should be using these passwords. Let’s flag every password in our user database that appears among the top 10,000 most common passwords.
# Flagging the users with passwords that are common passwords
users$common_password <- users$password %in% common_passwords
sum(users$common_password)
## [1] 129
# Taking a look at the first 10 rows
head(users, 10)
## # A tibble: 10 × 6
## id user_name password length too_short common_password
## <dbl> <chr> <chr> <int> <lgl> <lgl>
## 1 1 vance.jennings joobheco 8 FALSE FALSE
## 2 2 consuelo.eaton 0869347314 10 FALSE FALSE
## 3 3 mitchel.perkins fabypotter 10 FALSE FALSE
## 4 4 odessa.vaughan aharney88 9 FALSE FALSE
## 5 5 araceli.wilder acecdn3000 10 FALSE FALSE
## 6 6 shawn.harrington 5278049 7 TRUE FALSE
## 7 7 evelyn.gay master 6 TRUE TRUE
## 8 8 noreen.hale murphy 6 TRUE TRUE
## 9 9 gladys.ward lwsves2 7 TRUE FALSE
## 10 10 brant.zimmerman 1190KAREN5572497 16 FALSE FALSE
Many of our users use common passwords: among the first ten, users 7 and 8 (“master” and “murphy”) are already flagged, although both were also caught by the length check. NIST also lists dictionary words among the values that new passwords should be checked against, so next we flag passwords that appear in a list of the 10,000 most common English words. Each password is converted to lowercase before the lookup, so capitalized words are caught too.
# Reading in the 10,000 most common English words
words <- read_lines("google-10000-english.txt")
# Flagging the users with passwords that are common words
users$common_word <- str_to_lower(users$password) %in% words
# Counting the number of users using common words as passwords
sum(users$common_word)
## [1] 137
head(users, 12)
## # A tibble: 12 × 7
## id user_name password length too_short common_password common_word
## <dbl> <chr> <chr> <int> <lgl> <lgl> <lgl>
## 1 1 vance.jennings joobheco 8 FALSE FALSE FALSE
## 2 2 consuelo.eaton 08693473… 10 FALSE FALSE FALSE
## 3 3 mitchel.perkins fabypott… 10 FALSE FALSE FALSE
## 4 4 odessa.vaughan aharney88 9 FALSE FALSE FALSE
## 5 5 araceli.wilder acecdn30… 10 FALSE FALSE FALSE
## 6 6 shawn.harrington 5278049 7 TRUE FALSE FALSE
## 7 7 evelyn.gay master 6 TRUE TRUE TRUE
## 8 8 noreen.hale murphy 6 TRUE TRUE TRUE
## 9 9 gladys.ward lwsves2 7 TRUE FALSE FALSE
## 10 10 brant.zimmerman 1190KARE… 16 FALSE FALSE FALSE
## 11 11 leanna.abbott aivlys24 8 FALSE FALSE FALSE
## 12 12 milford.hubbard hubbard 7 TRUE FALSE FALSE
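Lowercasing matters because `%in%` compares strings exactly. The base-R sketch below uses `tolower()` in place of `str_to_lower()`; `toy_words` is a made-up stand-in for the Google word list, which we assume is lowercase, as the lowercasing in the code above implies.

```r
# Toy stand-in for google-10000-english.txt (lowercase words)
toy_words <- c("master", "murphy", "house")
passwords <- c("Master", "MURPHY", "house1", "joobheco")

# Exact comparison: capitalized words slip through
exact <- passwords %in% toy_words
# Lowercasing first, as the notebook does with str_to_lower()
folded <- tolower(passwords) %in% toy_words

exact
folded
```

Note that “house1” passes both versions: an exact-match lookup does not catch a dictionary word with a digit appended.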
Passwords should not be your name either. Our users’ usernames consist of their first and last names separated by a dot, and NIST counts the username and words derived from it among the context-specific words to check against. For now, let’s just flag passwords that are the same as either the user’s first or last name.
# Extracting first and last names into their own columns
users$first_name <- str_extract(users$user_name, "^\\w+")
users$last_name <- str_extract(users$user_name, "\\w+$")
# Flagging the users with passwords that match their names
users$uses_name <- users$password == users$first_name |
users$password == users$last_name
# Counting the number of users using names as passwords
sum(users$uses_name)
## [1] 50
head(users, 12)
## # A tibble: 12 × 10
## id user_name password length too_short common_password common_word
## <dbl> <chr> <chr> <int> <lgl> <lgl> <lgl>
## 1 1 vance.jennings joobheco 8 FALSE FALSE FALSE
## 2 2 consuelo.eaton 08693473… 10 FALSE FALSE FALSE
## 3 3 mitchel.perkins fabypott… 10 FALSE FALSE FALSE
## 4 4 odessa.vaughan aharney88 9 FALSE FALSE FALSE
## 5 5 araceli.wilder acecdn30… 10 FALSE FALSE FALSE
## 6 6 shawn.harrington 5278049 7 TRUE FALSE FALSE
## 7 7 evelyn.gay master 6 TRUE TRUE TRUE
## 8 8 noreen.hale murphy 6 TRUE TRUE TRUE
## 9 9 gladys.ward lwsves2 7 TRUE FALSE FALSE
## 10 10 brant.zimmerman 1190KARE… 16 FALSE FALSE FALSE
## 11 11 leanna.abbott aivlys24 8 FALSE FALSE FALSE
## 12 12 milford.hubbard hubbard 7 TRUE FALSE FALSE
## # ℹ 3 more variables: first_name <chr>, last_name <chr>, uses_name <lgl>
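The name check above compares case-sensitively, so a password such as “Hubbard” would not be flagged. A hypothetical base-R variant, `uses_name()`, could split the username on the dot with `sub()` and compare in lowercase. It also keeps hyphenated names whole, whereas the `^\\w+` pattern above stops at a hyphen (“mary-jane.smith” would give “mary”).

```r
# Hypothetical helper: does the password equal the first or last name
# in a "first.last" user name? Case-insensitive, unlike the == above.
uses_name <- function(user_name, password) {
  first <- sub("\\..*$", "", user_name)  # everything before the first dot
  last  <- sub("^.*\\.", "", user_name)  # everything after the last dot
  pw <- tolower(password)
  pw == tolower(first) | pw == tolower(last)
}

name_flags <- uses_name(
  c("milford.hubbard", "milford.hubbard", "vance.jennings"),
  c("hubbard", "Milford", "joobheco")
)
name_flags
```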
Milford Hubbard (user number 12) simply uses his last name as his password, which the check above now catches. The last check is trickier: NIST also lists repetitive or sequential characters, such as ‘aaaaaa’ or ‘1234abcd’, among the values that new passwords should be checked against.
Deciding what counts as repetitive or sequential is not straightforward. ‘11111’ is clearly repetitive and ‘12345’ is clearly sequential, but what about ‘13579’? Rather than settle this, we’ll keep it simple for now and flag all passwords that contain 4 or more consecutive identical characters.
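The notebook only checks repeated characters, but the distinction between ‘12345’ and ‘13579’ can be made concrete. One possible rule, offered purely as an illustration and not taken from the notebook or from NIST: treat a run of characters whose code points rise or fall by exactly `step` as sequential. The hypothetical helper `longest_sequence()` below does this with `utf8ToInt()`, `diff()` and `rle()`.

```r
# Hypothetical helper: length of the longest run of characters whose
# code points go up or down by exactly `step` (e.g. "1234", "dcba")
longest_sequence <- function(password, step = 1L) {
  codes <- utf8ToInt(password)
  if (length(codes) < 2) return(length(codes))
  runs <- rle(diff(codes))              # runs of identical differences
  in_sequence <- abs(runs$values) == step
  if (!any(in_sequence)) return(1L)
  max(runs$lengths[in_sequence]) + 1L   # k equal steps span k + 1 characters
}

seq_12345    <- longest_sequence("12345")
seq_1234abcd <- longest_sequence("1234abcd")
seq_13579    <- longest_sequence("13579")
seq_13579_2  <- longest_sequence("13579", step = 2L)
```

Under this rule ‘12345’ scores 5 but ‘13579’ scores only 1 unless a step of 2 is also checked, which shows how quickly the design choices multiply.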
# Passwords should not be repetitive
# Splitting the passwords into vectors of single characters
split_passwords <- str_split(users$password, "")
# Picking out the max number of repeat characters for each password
users$max_repeats <- sapply(split_passwords, function(split_password) {
max(rle(split_password)$lengths)
})
# Flagging the passwords with >= 4 repeats
users$too_many_repeats <- users$max_repeats >= 4
# Taking a look at the users with too many repeats
users[users$too_many_repeats,]
## # A tibble: 6 × 12
## id user_name password length too_short common_password common_word
## <dbl> <chr> <chr> <int> <lgl> <lgl> <lgl>
## 1 147 patti.dixon 555555 6 TRUE TRUE FALSE
## 2 573 cornelia.bradley 555555 6 TRUE TRUE FALSE
## 3 645 essie.lopez 11111 5 TRUE TRUE FALSE
## 4 799 charley.key 888888 6 TRUE TRUE FALSE
## 5 808 thurman.osborne rinnnng0 8 FALSE FALSE FALSE
## 6 942 mitch.ferguson aaaaaa 6 TRUE TRUE FALSE
## # ℹ 5 more variables: first_name <chr>, last_name <chr>, uses_name <lgl>,
## # max_repeats <int>, too_many_repeats <lgl>
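The repeat check relies on `rle()` (run-length encoding), which collapses consecutive identical values into runs, so the longest run is simply the maximum of its `lengths`. A slightly more defensive sketch of the same idea, as a hypothetical `max_repeats()` helper, uses `vapply()` to guarantee an integer result and returns 0 for an empty password, where `max()` over an empty vector would give `-Inf` with a warning.

```r
# Hypothetical helper: longest run of one repeated character per password
max_repeats <- function(passwords) {
  vapply(passwords, function(pw) {
    if (!nzchar(pw)) return(0L)               # empty password: no runs
    max(rle(strsplit(pw, "")[[1]])$lengths)   # longest run of equal characters
  }, integer(1), USE.NAMES = FALSE)
}

# rle() on "rinnnng0": runs r, i, nnnn, g, 0
runs_example  <- rle(strsplit("rinnnng0", "")[[1]])$lengths
repeat_counts <- max_repeats(c("rinnnng0", "hubbard", "aaaaaa", ""))
```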
We have now implemented all the basic checks for bad passwords suggested by NIST Special Publication 800-63B! All that’s left is to flag every bad password and perhaps send these users an e-mail strongly suggesting that they change their password.
# Flagging all passwords that are bad
users$bad_password <- users$too_short | users$common_password |
users$common_word | users$uses_name | users$too_many_repeats
# Counting the number of bad passwords
sum(users$bad_password)
## [1] 424
# Looking at the bad-password flags of the first 100 users
head(users$bad_password, 100)
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE
## [13] FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE
## [25] FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [37] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE
## [49] TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE TRUE TRUE
## [61] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
## [85] TRUE FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE
## [97] FALSE TRUE FALSE FALSE
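Chaining five columns with `|` works, but the same combination can be written with `Reduce()`, and the individual flags can be turned into a short list of reasons, which could go into the e-mail to each affected user. The sketch below uses a toy data frame, `flags`, modelled on users 7, 10 and 12 above; the column names match the notebook, but the data frame itself is illustrative.

```r
# Toy data modelled on users 7, 10 and 12 of `users`
flags <- data.frame(
  password         = c("master", "1190KAREN5572497", "hubbard"),
  too_short        = c(TRUE, FALSE, TRUE),
  common_password  = c(TRUE, FALSE, FALSE),
  common_word      = c(TRUE, FALSE, FALSE),
  uses_name        = c(FALSE, FALSE, TRUE),
  too_many_repeats = c(FALSE, FALSE, FALSE)
)
check_cols <- c("too_short", "common_password", "common_word",
                "uses_name", "too_many_repeats")

# Same result as chaining the columns with |
flags$bad_password <- Reduce(`|`, flags[check_cols])

# Names of the checks each password failed, as one string per row
flags$reasons <- apply(flags[check_cols], 1, function(row) {
  paste(check_cols[row], collapse = ", ")
})
flags[, c("password", "bad_password", "reasons")]
```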
# Enter a password that passes the NIST requirements
new_password <- "B@33lerS&"
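To check a single new password such as the one above, the five checks can be bundled into one function. This is a hypothetical sketch: `check_password()`, `toy_common` and `toy_words` are our own names, the two toy vectors stand in for the lists read from file, and the name check splits the username on the dot and compares case-sensitively, as the notebook does.

```r
# Hypothetical all-in-one checker for a single password.
# Returns a named logical vector; all FALSE means every check passed.
check_password <- function(password, user_name, common, words,
                           min_length = 8, max_run = 4) {
  chars <- strsplit(password, "")[[1]]
  longest_run <- if (length(chars) == 0) 0L else max(rle(chars)$lengths)
  names_in_user <- strsplit(user_name, ".", fixed = TRUE)[[1]]
  c(
    too_short        = nchar(password) < min_length,
    common_password  = password %in% common,
    common_word      = tolower(password) %in% words,
    uses_name        = password %in% names_in_user,
    too_many_repeats = longest_run >= max_run
  )
}

# Toy stand-ins for the two lists read from file
toy_common <- c("123456", "password", "master")
toy_words  <- c("master", "house", "hubbard")

ok_result  <- check_password("B@33lerS&", "milford.hubbard", toy_common, toy_words)
bad_result <- check_password("hubbard", "milford.hubbard", toy_common, toy_words)
ok_result
bad_result
```

Returning every flag rather than a single TRUE/FALSE makes it possible to tell the user which rule a rejected password broke.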
In this notebook, we’ve implemented the password checks recommended by NIST Special Publication 800-63B. These checks could certainly be improved, for example by using a more extensive list of common passwords. It’s also important to note that passing them doesn’t mean a password is strong; it only means the password isn’t obviously weak.
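In addition to the checks we’ve covered, NIST also advises against imposing certain password rules: verifiers should not impose composition rules, such as requiring a mix of different character types, and should not require passwords to be changed arbitrarily, for example periodically.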
So, the next time a website or app insists that your password must include a number, a symbol, and both uppercase and lowercase characters, consider pointing them to NIST Special Publication 800-63B!