Bad passwords and the NIST guidelines

Check which passwords in our user database fail to conform to the password guidelines of the National Institute of Standards and Technology (NIST).

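Loading the necessary libraries and then reading in our dataset.

library(readr)    # read_csv(), read_lines()
library(stringr)  # str_length(), str_to_lower(), str_extract(), str_split()
users <- read_csv("users.csv")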
## Rows: 982 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): user_name, password
## dbl (1): id
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nrow(users)
## [1] 982
head(users, n = 12)
## # A tibble: 12 × 3
##       id user_name        password        
##    <dbl> <chr>            <chr>           
##  1     1 vance.jennings   joobheco        
##  2     2 consuelo.eaton   0869347314      
##  3     3 mitchel.perkins  fabypotter      
##  4     4 odessa.vaughan   aharney88       
##  5     5 araceli.wilder   acecdn3000      
##  6     6 shawn.harrington 5278049         
##  7     7 evelyn.gay       master          
##  8     8 noreen.hale      murphy          
##  9     9 gladys.ward      lwsves2         
## 10    10 brant.zimmerman  1190KAREN5572497
## 11    11 leanna.abbott    aivlys24        
## 12    12 milford.hubbard  hubbard
# Passwords should not be too short
# Calculating the lengths of users' passwords
users$length <- str_length(users$password)

# Flagging the users with too short passwords
users$too_short <- users$length < 8

# Counting the number of users whose passwords are too short
sum(users$too_short)
## [1] 376
head(users)
## # A tibble: 6 × 5
##      id user_name        password   length too_short
##   <dbl> <chr>            <chr>       <int> <lgl>    
## 1     1 vance.jennings   joobheco        8 FALSE    
## 2     2 consuelo.eaton   0869347314     10 FALSE    
## 3     3 mitchel.perkins  fabypotter     10 FALSE    
## 4     4 odessa.vaughan   aharney88       9 FALSE    
## 5     5 araceli.wilder   acecdn3000     10 FALSE    
## 6     6 shawn.harrington 5278049         7 TRUE
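NIST SP 800-63B sets a minimum length of 8 characters for passwords chosen by the user, which is where the `< 8` threshold comes from. As a minimal sketch, the same check can be written in base R with `nchar()`. The helper name `is_too_short()` is our own hypothetical choice. Note the boundary: a password of exactly 8 characters passes.

```r
# Hypothetical helper: flags passwords shorter than the NIST minimum of 8 characters.
# type = "chars" counts characters rather than bytes, so non-ASCII passwords are measured correctly.
is_too_short <- function(passwords, min_length = 8) {
  nchar(passwords, type = "chars") < min_length
}

# Toy examples taken from the first rows of the users table (7, 8 and 16 characters)
examples <- c("5278049", "joobheco", "1190KAREN5572497")
is_too_short(examples)
# [1]  TRUE FALSE FALSE
```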
# Let's look at the most common passwords people use
common_passwords <- read_lines("10_million_password_list_top_10000.txt")

Looking at the first 12 users, we can already spot a few problematic passwords. According to NIST Special Publication 800-63B, verifiers must compare new passwords against a list of values known to be commonly used, expected, or compromised.

This list may include passwords from previous data breaches, dictionary words, repetitive or sequential characters (e.g., ‘aaaaaa’, ‘1234abcd’), and context-specific words such as the name of the service or the username. We start by checking passwords against a list of known compromised passwords. This list was built from leaks of websites that did not store their users’ passwords securely.

head(common_passwords, 50)
##  [1] "123456"     "password"   "12345678"   "qwerty"     "123456789" 
##  [6] "12345"      "1234"       "111111"     "1234567"    "dragon"    
## [11] "123123"     "baseball"   "abc123"     "football"   "monkey"    
## [16] "letmein"    "696969"     "shadow"     "master"     "666666"    
## [21] "qwertyuiop" "123321"     "mustang"    "1234567890" "michael"   
## [26] "654321"     "pussy"      "superman"   "1qaz2wsx"   "7777777"   
## [31] "fuckyou"    "121212"     "000000"     "qazwsx"     "123qwe"    
## [36] "killer"     "trustno1"   "jordan"     "jennifer"   "zxcvbnm"   
## [41] "asdfgh"     "hunter"     "buster"     "soccer"     "harley"    
## [46] "batman"     "andrew"     "tigger"     "sunshine"   "iloveyou"

The list is ordered from most to least common, so it’s no surprise to find passwords like “123456” and “qwerty” at the top. Attackers have access to these lists too, so it’s crucial that none of our users use them. Let’s flag every password in our user database that appears among the 10,000 most common passwords.

# Flagging the users with passwords that are common passwords
users$common_password <- users$password %in% common_passwords
sum(users$common_password)
## [1] 129
# Taking a look at the first 10 rows
head(users, 10)
## # A tibble: 10 × 6
##       id user_name        password         length too_short common_password
##    <dbl> <chr>            <chr>             <int> <lgl>     <lgl>          
##  1     1 vance.jennings   joobheco              8 FALSE     FALSE          
##  2     2 consuelo.eaton   0869347314           10 FALSE     FALSE          
##  3     3 mitchel.perkins  fabypotter           10 FALSE     FALSE          
##  4     4 odessa.vaughan   aharney88             9 FALSE     FALSE          
##  5     5 araceli.wilder   acecdn3000           10 FALSE     FALSE          
##  6     6 shawn.harrington 5278049               7 TRUE      FALSE          
##  7     7 evelyn.gay       master                6 TRUE      TRUE           
##  8     8 noreen.hale      murphy                6 TRUE      TRUE           
##  9     9 gladys.ward      lwsves2               7 TRUE      FALSE          
## 10    10 brant.zimmerman  1190KAREN5572497     16 FALSE     FALSE
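The list is ordered from most to least common, so base R’s `match()` can report how popular a password is, not just whether it is on the list. Below is a minimal sketch on toy data. The five-entry `common_top` vector holds the first five entries printed above and stands in for the real file. `%in%` reproduces the notebook’s TRUE/FALSE flag. `match()` returns each password’s position in the list, or `NA` when it is not listed. Both comparisons are case-sensitive, so “Qwerty” is not flagged.

```r
# Toy stand-in: the first five entries of the top-10,000 list shown above
common_top <- c("123456", "password", "12345678", "qwerty", "123456789")

candidates <- c("qwerty", "joobheco", "password", "Qwerty")

# %in% gives a TRUE/FALSE flag, as in the notebook
candidates %in% common_top
# [1]  TRUE FALSE  TRUE FALSE

# match() returns the position in the list, which doubles as a popularity rank
# because the list is ordered from most to least common; NA means "not in the list"
match(candidates, common_top)
# [1]  4 NA  2 NA
```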

Many of our users’ passwords also turned out to be common English words. This is another concern, since NIST names dictionary words among the values to check new passwords against. Let’s flag every password that, once converted to lowercase, appears in a list of 10,000 common English words.

words <- read_lines("google-10000-english.txt")

# Flagging the users with passwords that are common words
users$common_word <- str_to_lower(users$password) %in% words

# Counting the number of users using common words as passwords
sum(users$common_word)
## [1] 137
head(users, 12)
## # A tibble: 12 × 7
##       id user_name        password  length too_short common_password common_word
##    <dbl> <chr>            <chr>      <int> <lgl>     <lgl>           <lgl>      
##  1     1 vance.jennings   joobheco       8 FALSE     FALSE           FALSE      
##  2     2 consuelo.eaton   08693473…     10 FALSE     FALSE           FALSE      
##  3     3 mitchel.perkins  fabypott…     10 FALSE     FALSE           FALSE      
##  4     4 odessa.vaughan   aharney88      9 FALSE     FALSE           FALSE      
##  5     5 araceli.wilder   acecdn30…     10 FALSE     FALSE           FALSE      
##  6     6 shawn.harrington 5278049        7 TRUE      FALSE           FALSE      
##  7     7 evelyn.gay       master         6 TRUE      TRUE            TRUE       
##  8     8 noreen.hale      murphy         6 TRUE      TRUE            TRUE       
##  9     9 gladys.ward      lwsves2        7 TRUE      FALSE           FALSE      
## 10    10 brant.zimmerman  1190KARE…     16 FALSE     FALSE           FALSE      
## 11    11 leanna.abbott    aivlys24       8 FALSE     FALSE           FALSE      
## 12    12 milford.hubbard  hubbard        7 TRUE      FALSE           FALSE
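The lowercasing step matters because `%in%` is an exact, case-sensitive comparison. The small sketch below uses a hypothetical four-word stand-in for google-10000-english.txt. It assumes the word list is lowercase, as the notebook’s use of `str_to_lower()` implies. Base R’s `tolower()` plays the role of `str_to_lower()`.

```r
# Toy stand-in for google-10000-english.txt (assumed to be all lowercase)
toy_words <- c("master", "murphy", "house", "music")

passwords <- c("Master", "MUSIC", "murphy", "fabypotter")

# An exact comparison misses capitalised variants
passwords %in% toy_words
# [1] FALSE FALSE  TRUE FALSE

# Lowercasing first, as the notebook does, catches them
tolower(passwords) %in% toy_words
# [1]  TRUE  TRUE  TRUE FALSE
```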

NIST also names context-specific words, such as the name of the service or the username, as values to check new passwords against. Our users’ usernames consist of their first and last names separated by a dot. For now, let’s just flag passwords that are identical to either the user’s first or last name.

Passwords should not be your name either.

# Extracting first and last names into their own columns
users$first_name <- str_extract(users$user_name, "^\\w+")
users$last_name <- str_extract(users$user_name, "\\w+$")

# Flagging the users with passwords that match their names
users$uses_name <- users$password == users$first_name | 
  users$password == users$last_name

# Counting the number of users using names as passwords
sum(users$uses_name)
## [1] 50
head(users, 12)
## # A tibble: 12 × 10
##       id user_name        password  length too_short common_password common_word
##    <dbl> <chr>            <chr>      <int> <lgl>     <lgl>           <lgl>      
##  1     1 vance.jennings   joobheco       8 FALSE     FALSE           FALSE      
##  2     2 consuelo.eaton   08693473…     10 FALSE     FALSE           FALSE      
##  3     3 mitchel.perkins  fabypott…     10 FALSE     FALSE           FALSE      
##  4     4 odessa.vaughan   aharney88      9 FALSE     FALSE           FALSE      
##  5     5 araceli.wilder   acecdn30…     10 FALSE     FALSE           FALSE      
##  6     6 shawn.harrington 5278049        7 TRUE      FALSE           FALSE      
##  7     7 evelyn.gay       master         6 TRUE      TRUE            TRUE       
##  8     8 noreen.hale      murphy         6 TRUE      TRUE            TRUE       
##  9     9 gladys.ward      lwsves2        7 TRUE      FALSE           FALSE      
## 10    10 brant.zimmerman  1190KARE…     16 FALSE     FALSE           FALSE      
## 11    11 leanna.abbott    aivlys24       8 FALSE     FALSE           FALSE      
## 12    12 milford.hubbard  hubbard        7 TRUE      FALSE           FALSE      
## # ℹ 3 more variables: first_name <chr>, last_name <chr>, uses_name <lgl>
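For readers without stringr, the name extraction can be sketched in base R with `sub()`. This only approximates the `\w+` patterns above. It takes everything before the first dot and everything after the last dot, so it would keep, for example, a hyphen inside a name. The milford.hubbard row comes from the table above; the other two usernames and passwords are hypothetical. The case-insensitive variant at the end is an assumption and is not part of the notebook’s check.

```r
# Toy data: milford.hubbard/hubbard is from the table above, the other rows are hypothetical
user_names <- c("milford.hubbard", "jane.doe", "john.smith")
passwords  <- c("hubbard", "B@33lerS&", "Smith")

# Everything before the first dot, and everything after the last dot
first_name <- sub("\\..*$", "", user_names)
last_name  <- sub("^.*\\.", "", user_names)

# Exact, case-sensitive comparison, mirroring the notebook's check
uses_name <- passwords == first_name | passwords == last_name
uses_name
# [1]  TRUE FALSE FALSE

# Case-insensitive variant (an assumption; the notebook compares exactly)
pw_lower <- tolower(passwords)
uses_name_ci <- pw_lower == tolower(first_name) | pw_lower == tolower(last_name)
uses_name_ci
# [1]  TRUE FALSE  TRUE
```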

Milford Hubbard (user number 12) is one example: his password is simply his last name. The last check we’ll implement covers another category from the NIST list: repetitive or sequential characters, such as ‘aaaaaa’ or ‘1234abcd’.

Deciding what counts as repetitive is tricky. ‘11111’ is clearly repetitive and ‘12345’ is sequential, but what about ‘13579’? To keep things simple, we will flag all passwords that contain 4 or more consecutive identical characters.

# Passwords should not be repetitive
# Splitting the passwords into vectors of single characters
split_passwords <- str_split(users$password, "")

# Picking out the max number of repeat characters for each password
users$max_repeats <- sapply(split_passwords, function(split_password) {
    max(rle(split_password)$lengths)
})
# Flagging the passwords with >= 4 repeats
users$too_many_repeats <- users$max_repeats >= 4

# Taking a look at the users with too many repeats
users[users$too_many_repeats,]
## # A tibble: 6 × 12
##      id user_name        password length too_short common_password common_word
##   <dbl> <chr>            <chr>     <int> <lgl>     <lgl>           <lgl>      
## 1   147 patti.dixon      555555        6 TRUE      TRUE            FALSE      
## 2   573 cornelia.bradley 555555        6 TRUE      TRUE            FALSE      
## 3   645 essie.lopez      11111         5 TRUE      TRUE            FALSE      
## 4   799 charley.key      888888        6 TRUE      TRUE            FALSE      
## 5   808 thurman.osborne  rinnnng0      8 FALSE     FALSE           FALSE      
## 6   942 mitch.ferguson   aaaaaa        6 TRUE      TRUE            FALSE      
## # ℹ 5 more variables: first_name <chr>, last_name <chr>, uses_name <lgl>,
## #   max_repeats <int>, too_many_repeats <lgl>
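`rle()` (run-length encoding) collapses a vector into runs of identical values and records the length of each run. The longest run of repeated characters is therefore just the maximum of `$lengths`. Below is a base-R sketch of that idea as a standalone helper. It also includes a hypothetical extension for sequential runs such as ‘1234’, which the notebook deliberately leaves out. That extension assumes “sequential” means consecutive Unicode code points, so ‘13579’ does not count. Both helper names are our own.

```r
# Longest run of identical consecutive characters, e.g. 4 for "rinnnng0"
max_repeat_run <- function(pw) {
  if (nchar(pw) == 0) return(0L)
  chars <- strsplit(pw, "")[[1]]
  max(rle(chars)$lengths)
}

# Hypothetical extension (not used in this notebook): longest run of characters
# whose code points increase by exactly 1, e.g. "1234" or "abcd"
max_sequential_run <- function(pw) {
  codes <- utf8ToInt(pw)
  if (length(codes) < 2) return(length(codes))
  steps <- diff(codes) == 1            # TRUE where the next character is "one up"
  runs <- rle(steps)
  max(c(0L, runs$lengths[runs$values])) + 1L
}

examples <- c("rinnnng0", "aaaaaa", "1234abcd", "13579")
unname(sapply(examples, max_repeat_run))
# [1] 4 6 1 1
unname(sapply(examples, max_sequential_run))
# [1] 1 1 4 1
```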

Now we have implemented all the basic tests for bad passwords suggested by NIST Special Publication 800-63B! What’s left is just to flag all bad passwords and maybe to send these users an e-mail that strongly suggests they change their password.

# Flagging all passwords that are bad
users$bad_password <- users$too_short | users$common_password |
  users$common_word | users$uses_name | users$too_many_repeats

# Counting the number of bad passwords
sum(users$bad_password)
## [1] 424
# Looking at the bad-password flags of the first 100 users
head(users$bad_password, 100)
##   [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
##  [13] FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE
##  [25] FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE
##  [37] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
##  [49]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE
##  [61] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
##  [85]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE
##  [97] FALSE  TRUE FALSE FALSE
# A new password intended to pass the NIST checks implemented above
new_password <- "B@33lerS&"
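The notebook stores a candidate in `new_password` but never runs it through the checks. Below is a minimal, hypothetical `check_password()` that bundles the five tests for a single password in base R. The two short vectors are toy stand-ins for the real common-password and word lists, and jane.doe is an invented username. The result therefore only shows how the function works, not how the real lists would judge this password.

```r
# Hypothetical helper bundling the notebook's five checks for one password.
# common_passwords and words are passed in so the real lists can be swapped in later.
check_password <- function(password, user_name, common_passwords, words) {
  name_parts <- strsplit(user_name, ".", fixed = TRUE)[[1]]  # e.g. c("jane", "doe")
  chars <- strsplit(password, "")[[1]]
  c(
    too_short        = nchar(password) < 8,
    common_password  = password %in% common_passwords,
    common_word      = tolower(password) %in% words,
    uses_name        = password %in% name_parts,
    too_many_repeats = length(chars) > 0 && max(rle(chars)$lengths) >= 4
  )
}

# Toy stand-ins for the real lists
toy_common <- c("123456", "password", "qwerty", "master")
toy_words  <- c("master", "murphy", "house")

result <- check_password("B@33lerS&", "jane.doe", toy_common, toy_words)
result
any(result)  # FALSE: none of the five checks flags this password against the toy lists
```

With the real data, the same call could be made with `new_password`, `common_passwords` and `words` as arguments.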

In this notebook, we’ve implemented the password checks recommended by NIST Special Publication 800-63B. These checks help identify weak passwords, but there is room for improvement, such as using a more extensive list of common passwords. Note also that passing these checks does not make a password strong; it only ensures that the password is not obviously weak.

In addition to the checks we’ve covered, NIST also advises against imposing certain password rules:

  1. Verifiers should not impose other composition rules (e.g., requiring a mix of different character types or prohibiting consecutive repeated characters).
  2. Verifiers should not require passwords to be changed arbitrarily (e.g., periodically) when there is no evidence that they have been compromised.

So, the next time a website or app insists that your password must include a number, symbol, and both upper and lower case characters, consider pointing them to NIST Special Publication 800-63B!