The NIST Special Publication 800-63B

If you needed to come up with a secret password 50 years ago, you were probably part of a secret espionage organization or (more likely) a kid pretending to be a spy. Today, many of us are forced to come up with new passwords all the time when signing into sites and apps. As a password inventor, it is your responsibility to come up with good, hard-to-crack passwords. But it is also in the interest of sites and apps to make sure that you use good passwords. The problem is that it’s really hard to define what makes a good password. However, the National Institute of Standards and Technology (NIST) knows what the second best thing is: to make sure you’re at least not using a bad password. In this notebook, we will go through the rules in NIST Special Publication 800-63B, which details the checks a verifier (what NIST calls a second party responsible for storing and verifying passwords) should perform to make sure users don’t pick bad passwords. We will go through the passwords of users from a fictional company and use R to flag the users with bad passwords. But the fact that we can do this at all already means the fictional company is breaking one of the rules of 800-63B:

Verifiers SHALL store memorized secrets in a form that is resistant to offline attacks. Memorized secrets SHALL be salted and hashed using a suitable one-way key derivation function.

That is, never save users’ passwords in plaintext; always store them salted and hashed! Keeping this in mind for the next time we’re building a password management system, let’s load in the data.

Warning: The list of passwords and the fictional user database both contain real passwords leaked from real websites. These passwords have not been filtered in any way and include words that are explicit, derogatory and offensive.

Setup

knitr::opts_chunk$set(cache=TRUE)
options(scipen = 9999)
rm(list=ls())

# Importing the tidyverse library
library(tidyverse)

# Loading in datasets/users.csv 
users <- read_csv("datasets/users.csv")

## Parsed with column specification:
## cols(
##   id = col_integer(),
##   user_name = col_character(),
##   password = col_character()
## )

# Counting how many users we've got
glimpse(users)

## Observations: 982
## Variables: 3
## $ id        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1...
## $ user_name <chr> "vance.jennings", "consuelo.eaton", "mitchel.perkins...
## $ password  <chr> "joobheco", "0869347314", "fabypotter", "aharney88",...

length(users$user_name)

## [1] 982

# Taking a look at the 12 first users
head(users, 12)

## # A tibble: 12 x 3
##       id user_name        password        
##    <int> <chr>            <chr>           
##  1     1 vance.jennings   joobheco        
##  2     2 consuelo.eaton   0869347314      
##  3     3 mitchel.perkins  fabypotter      
##  4     4 odessa.vaughan   aharney88       
##  5     5 araceli.wilder   acecdn3000      
##  6     6 shawn.harrington 5278049         
##  7     7 evelyn.gay       master          
##  8     8 noreen.hale      murphy          
##  9     9 gladys.ward      lwsves2         
## 10    10 brant.zimmerman  1190KAREN5572497
## 11    11 leanna.abbott    aivlys24        
## 12    12 milford.hubbard  hubbard

Passwords should not be too short

If we take a look at the first 12 users above we already see some bad passwords. But let’s not get ahead of ourselves and start flagging passwords manually. What is the first thing we should check according to the NIST Special Publication 800-63B?

Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length.

Ok, so the passwords of our users shouldn’t be too short. Let’s start by checking that!

# Calculating the lengths of users' passwords
users$length <- str_length(users$password)

# Flagging the users with too short passwords
users$too_short <- users$length < 8

# Counting the number of users with too short passwords
sum(users$too_short)

## [1] 376

# Taking a look at the 12 first rows
head(users, 12)

## # A tibble: 12 x 5
##       id user_name        password         length too_short
##    <int> <chr>            <chr>             <int> <lgl>    
##  1     1 vance.jennings   joobheco              8 FALSE    
##  2     2 consuelo.eaton   0869347314           10 FALSE    
##  3     3 mitchel.perkins  fabypotter           10 FALSE    
##  4     4 odessa.vaughan   aharney88             9 FALSE    
##  5     5 araceli.wilder   acecdn3000           10 FALSE    
##  6     6 shawn.harrington 5278049               7 TRUE     
##  7     7 evelyn.gay       master                6 TRUE     
##  8     8 noreen.hale      murphy                6 TRUE     
##  9     9 gladys.ward      lwsves2               7 TRUE     
## 10    10 brant.zimmerman  1190KAREN5572497     16 FALSE    
## 11    11 leanna.abbott    aivlys24              8 FALSE    
## 12    12 milford.hubbard  hubbard               7 TRUE
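
The same length check can be wrapped in a small reusable function. This is only a sketch in base R; the name `is_too_short` and the `min_length` argument are illustrative and do not come from the notebook or from NIST.

```r
# Illustrative helper (hypothetical name): flag passwords shorter than
# min_length characters.
is_too_short <- function(password, min_length = 8) {
  # type = "chars" counts characters rather than bytes, which matters
  # for passwords that contain non-ASCII characters.
  nchar(password, type = "chars") < min_length
}

# Three passwords taken from the first 12 users above
is_too_short(c("master", "joobheco", "hubbard"))
## [1]  TRUE FALSE  TRUE
```

Keeping the threshold as an argument makes it easy to try a stricter minimum without touching the rest of the analysis.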

Common passwords people use

Already this simple rule flagged a couple of offenders among the first 12 users. Next up in Special Publication 800-63B is the rule that:

verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised.

Passwords obtained from previous breach corpuses. Dictionary words. Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’). Context-specific words, such as the name of the service, the username, and derivatives thereof.

We’re going to check these in order and start with passwords obtained from previous breach corpuses, that is, from websites whose users’ passwords have been leaked by hackers. Because many websites don’t follow the NIST guidelines and store passwords unprotected, large lists of the most popular passwords now exist. Let’s start by loading in the 10,000 most common passwords which I’ve taken from here.

# Reading in the top 10000 passwords
common_passwords <- read_lines("datasets/10_million_password_list_top_10000.txt")

# Taking a look at the top 100
head(common_passwords, 100)

##   [1] "123456"     "password"   "12345678"   "qwerty"     "123456789" 
##   [6] "12345"      "1234"       "111111"     "1234567"    "dragon"    
##  [11] "123123"     "baseball"   "abc123"     "football"   "monkey"    
##  [16] "letmein"    "696969"     "shadow"     "master"     "666666"    
##  [21] "qwertyuiop" "123321"     "mustang"    "1234567890" "michael"   
##  [26] "654321"     "pussy"      "superman"   "1qaz2wsx"   "7777777"   
##  [31] "fuckyou"    "121212"     "000000"     "qazwsx"     "123qwe"    
##  [36] "killer"     "trustno1"   "jordan"     "jennifer"   "zxcvbnm"   
##  [41] "asdfgh"     "hunter"     "buster"     "soccer"     "harley"    
##  [46] "batman"     "andrew"     "tigger"     "sunshine"   "iloveyou"  
##  [51] "fuckme"     "2000"       "charlie"    "robert"     "thomas"    
##  [56] "hockey"     "ranger"     "daniel"     "starwars"   "klaster"   
##  [61] "112233"     "george"     "asshole"    "computer"   "michelle"  
##  [66] "jessica"    "pepper"     "1111"       "zxcvbn"     "555555"    
##  [71] "11111111"   "131313"     "freedom"    "777777"     "pass"      
##  [76] "fuck"       "maggie"     "159753"     "aaaaaa"     "ginger"    
##  [81] "princess"   "joshua"     "cheese"     "amanda"     "summer"    
##  [86] "love"       "ashley"     "6969"       "nicole"     "chelsea"   
##  [91] "biteme"     "matthew"    "access"     "yankees"    "987654321" 
##  [96] "dallas"     "austin"     "thunder"    "taylor"     "matrix"

LOL, 123456? qwerty? Classic!

Passwords should not be common passwords

The list of passwords was ordered, with the most common passwords first, and so we shouldn’t be surprised to see passwords like 123456 and qwerty above. As hackers also have access to this list of common passwords, it’s important that none of our users use these passwords!

Let’s flag all the passwords in our user database that are among the top 10,000 used passwords.

# Flagging the users with passwords that are common passwords
users$common_password <- users$password %in% common_passwords

# Counting the number of users using common passwords
sum(users$common_password)

## [1] 129

# Taking a look at the 12 first rows
head(users, 12)

## # A tibble: 12 x 6
##       id user_name        password         length too_short common_password
##    <int> <chr>            <chr>             <int> <lgl>     <lgl>          
##  1     1 vance.jennings   joobheco              8 FALSE     FALSE          
##  2     2 consuelo.eaton   0869347314           10 FALSE     FALSE          
##  3     3 mitchel.perkins  fabypotter           10 FALSE     FALSE          
##  4     4 odessa.vaughan   aharney88             9 FALSE     FALSE          
##  5     5 araceli.wilder   acecdn3000           10 FALSE     FALSE          
##  6     6 shawn.harrington 5278049               7 TRUE      FALSE          
##  7     7 evelyn.gay       master                6 TRUE      TRUE           
##  8     8 noreen.hale      murphy                6 TRUE      TRUE           
##  9     9 gladys.ward      lwsves2               7 TRUE      FALSE          
## 10    10 brant.zimmerman  1190KAREN5572497     16 FALSE     FALSE          
## 11    11 leanna.abbott    aivlys24              8 FALSE     FALSE          
## 12    12 milford.hubbard  hubbard               7 TRUE      FALSE
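
Note that `%in%` is an exact, case-sensitive comparison, so a password like `Master` would slip past the check above. A minimal base R sketch of a case-insensitive variant follows; the helper `is_common_password` and the four-entry list are made up for illustration, and lower-casing is an assumption of this sketch rather than something 800-63B prescribes.

```r
# Hypothetical stand-in for the full list of 10,000 common passwords
example_common <- c("123456", "password", "qwerty", "master")

# Illustrative helper: compare in lower case so that "Master" and
# "QWERTY" are caught as well.
is_common_password <- function(password, common_list) {
  tolower(password) %in% tolower(common_list)
}

is_common_password(c("Master", "joobheco", "QWERTY"), example_common)
## [1]  TRUE FALSE  TRUE
```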

Passwords should not be common words

Ay ay ay! It turns out many of our users use common passwords, and of the first 12 users there are already two. However, as most common passwords also tend to be short, they were already flagged as being too short. What is the next thing we should check?

Verifiers SHALL compare the prospective secrets against a list that contains […] dictionary words.

This follows the same logic as before: It is easy for hackers to check users’ passwords against common English words and therefore common English words make bad passwords. Let’s check our users’ passwords against the top 10,000 English words from Google’s Trillion Word Corpus.

# Reading in a list of the 10000 most common words
words <- read_lines("datasets/google-10000-english.txt")

# Flagging the users with passwords that are common words
users$common_word <- users$password %in% words

# Counting the number of users using common words as passwords
sum(users$common_word)

## [1] 136

# Taking a look at the 12 first rows
head(users, 12)

## # A tibble: 12 x 7
##       id user_name  password  length too_short common_password common_word
##    <int> <chr>      <chr>      <int> <lgl>     <lgl>           <lgl>      
##  1     1 vance.jen… joobheco       8 FALSE     FALSE           FALSE      
##  2     2 consuelo.… 08693473…     10 FALSE     FALSE           FALSE      
##  3     3 mitchel.p… fabypott…     10 FALSE     FALSE           FALSE      
##  4     4 odessa.va… aharney88      9 FALSE     FALSE           FALSE      
##  5     5 araceli.w… acecdn30…     10 FALSE     FALSE           FALSE      
##  6     6 shawn.har… 5278049        7 TRUE      FALSE           FALSE      
##  7     7 evelyn.gay master         6 TRUE      TRUE            TRUE       
##  8     8 noreen.ha… murphy         6 TRUE      TRUE            TRUE       
##  9     9 gladys.wa… lwsves2        7 TRUE      FALSE           FALSE      
## 10    10 brant.zim… 1190KARE…     16 FALSE     FALSE           FALSE      
## 11    11 leanna.ab… aivlys24       8 FALSE     FALSE           FALSE      
## 12    12 milford.h… hubbard        7 TRUE      FALSE           FALSE

Passwords should not be your name

It turns out many of our passwords were common English words too! Next up on the NIST list:

Verifiers SHALL compare the prospective secrets against a list that contains […] context-specific words, such as the name of the service, the username, and derivatives thereof.

Ok, so there are many things we could check here. One thing to notice is that our users’ usernames consist of their first names and last names separated by a dot. For now, let’s just flag passwords that contain either the user’s first or last name (this includes passwords that are exactly the name).

# Extracting first and last names into their own columns
users$first_name <- str_extract(users$user_name, "^\\w+")
users$last_name <- str_extract(users$user_name, "\\w+$")

# Flagging the users with passwords that contain their names
users$uses_name <- str_detect(users$password, fixed(users$first_name)) | 
                   str_detect(users$password, fixed(users$last_name))

users[users$uses_name == TRUE,]

## # A tibble: 50 x 10
##       id user_name password length too_short common_password common_word
##    <int> <chr>     <chr>     <int> <lgl>     <lgl>           <lgl>      
##  1    12 milford.… hubbard       7 TRUE      FALSE           FALSE      
##  2    23 jenny.wo… woodard       7 TRUE      FALSE           FALSE      
##  3    31 rosanna.… reid          4 TRUE      FALSE           TRUE       
##  4    85 dorian.c… dorian        6 TRUE      TRUE            FALSE      
##  5    90 jordan.h… hurley        6 TRUE      TRUE            FALSE      
##  6   113 juliette… juliette      8 FALSE     TRUE            FALSE      
##  7   123 caleb.po… caleb         5 TRUE      FALSE           FALSE      
##  8   137 rhea.ware ware          4 TRUE      FALSE           TRUE       
##  9   148 lara.bri… lara          4 TRUE      TRUE            FALSE      
## 10   152 roland.w… roland        6 TRUE      TRUE            TRUE       
## # ... with 40 more rows, and 3 more variables: first_name <chr>,
## #   last_name <chr>, uses_name <lgl>

# Counting the number of users using names as passwords
sum(users$uses_name)

## [1] 50

# Taking a look at the 12 first rows
head(users, 12)

## # A tibble: 12 x 10
##       id user_name password length too_short common_password common_word
##    <int> <chr>     <chr>     <int> <lgl>     <lgl>           <lgl>      
##  1     1 vance.je… joobheco      8 FALSE     FALSE           FALSE      
##  2     2 consuelo… 0869347…     10 FALSE     FALSE           FALSE      
##  3     3 mitchel.… fabypot…     10 FALSE     FALSE           FALSE      
##  4     4 odessa.v… aharney…      9 FALSE     FALSE           FALSE      
##  5     5 araceli.… acecdn3…     10 FALSE     FALSE           FALSE      
##  6     6 shawn.ha… 5278049       7 TRUE      FALSE           FALSE      
##  7     7 evelyn.g… master        6 TRUE      TRUE            TRUE       
##  8     8 noreen.h… murphy        6 TRUE      TRUE            TRUE       
##  9     9 gladys.w… lwsves2       7 TRUE      FALSE           FALSE      
## 10    10 brant.zi… 1190KAR…     16 FALSE     FALSE           FALSE      
## 11    11 leanna.a… aivlys24      8 FALSE     FALSE           FALSE      
## 12    12 milford.… hubbard       7 TRUE      FALSE           FALSE      
## # ... with 3 more variables: first_name <chr>, last_name <chr>,
## #   uses_name <lgl>
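
Whether a password has to equal the name or only contain it makes a difference to who gets flagged. The sketch below contrasts the two checks in base R on three made-up rows; `jane.doe` and her password are hypothetical and not part of the dataset.

```r
# Hypothetical example rows in the same "first.last" format as the data
user_name <- c("milford.hubbard", "odessa.vaughan", "jane.doe")
password  <- c("hubbard", "aharney88", "doe1234x")

# Everything before the first dot / after the last dot
first_name <- sub("\\..*$", "", user_name)
last_name  <- sub("^.*\\.", "", user_name)

# Exact match: the password is the first or last name
exact_match <- password == first_name | password == last_name

# Substring match: the password contains the first or last name.
# mapply pairs each name with the password on the same row.
contains_one  <- function(name, pw) grepl(name, pw, fixed = TRUE)
contains_name <- mapply(contains_one, first_name, password, USE.NAMES = FALSE) |
                 mapply(contains_one, last_name, password, USE.NAMES = FALSE)

exact_match
## [1]  TRUE FALSE FALSE
contains_name
## [1]  TRUE FALSE  TRUE
```

The substring version also catches simple derivatives such as a name followed by digits, at the cost of occasionally flagging a password that only contains a short name by coincidence.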

Passwords should not be repetitive

Milford Hubbard (user number 12 above), what were you thinking!? Ok, so the last thing we are going to check is a bit tricky:

verifiers SHALL compare the prospective secrets [so that they don’t contain] repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).

This is tricky to check because what is repetitive is hard to define. Is 11111 repetitive? Yes! Is 12345 repetitive? Well, kind of. Is 13579 repetitive? Maybe not..? Checking for repetitiveness can be arbitrarily complex, but here we’re only going to do something simple. We’re going to flag all passwords that contain the same character 4 or more times in a row.

# Splitting the passwords into vectors of single characters
split_passwords <- str_split(users$password, "")

# Picking out the max number of repeat characters for each password
users$max_repeats <- sapply(split_passwords, function(split_password) {
    rle_password <- rle(split_password)
    max(rle_password$lengths)
})

# Flagging the passwords with >= 4 repeats
users$too_many_repeats <- users$max_repeats >= 4

# Taking a look at the users with too many repeats
users[(users$too_many_repeats == TRUE), ]

## # A tibble: 6 x 12
##      id user_name password length too_short common_password common_word
##   <int> <chr>     <chr>     <int> <lgl>     <lgl>           <lgl>      
## 1   147 patti.di… 555555        6 TRUE      TRUE            FALSE      
## 2   573 cornelia… 555555        6 TRUE      TRUE            FALSE      
## 3   645 essie.lo… 11111         5 TRUE      TRUE            FALSE      
## 4   799 charley.… 888888        6 TRUE      TRUE            FALSE      
## 5   808 thurman.… rinnnng0      8 FALSE     FALSE           FALSE      
## 6   942 mitch.fe… aaaaaa        6 TRUE      TRUE            FALSE      
## # ... with 5 more variables: first_name <chr>, last_name <chr>,
## #   uses_name <lgl>, max_repeats <int>, too_many_repeats <lgl>
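
The check above only catches repeated characters, not sequential ones such as 1234abcd. One possible way to extend it, sketched here in base R, is to look for runs of characters whose code points increase by exactly one. The function name `max_sequential_run` is hypothetical, and the sketch only considers ascending runs.

```r
# Illustrative helper: length of the longest run of characters whose
# code points go up by exactly 1 (for example "1234" or "abcd").
max_sequential_run <- function(password) {
  codes <- utf8ToInt(password)
  # Zero or one character: the run is simply the password length
  if (length(codes) < 2) return(length(codes))
  # TRUE wherever a character is the direct successor of the previous one
  runs <- rle(diff(codes) == 1)
  ascending <- runs$lengths[runs$values]
  # No ascending step at all: every run has length 1
  if (length(ascending) == 0) return(1L)
  # n consecutive ascending steps span n + 1 characters
  max(ascending) + 1L
}

sapply(c("1234abcd", "13579", "aaaaaa"), max_sequential_run, USE.NAMES = FALSE)
## [1] 4 1 1
```

A threshold of 4, mirroring the repeat check, would flag 1234abcd but leave 13579 alone.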

All together now!

Now we have implemented all the basic tests for bad passwords suggested by NIST Special Publication 800-63B! What’s left is just to flag all bad passwords and maybe to send these users an e-mail that strongly suggests they change their password.

# Flagging all passwords that are bad
users$bad_password <- users$too_short |
                    users$common_password |
                    users$common_word |
                    users$uses_name |
                    users$too_many_repeats

# Counting the number of bad passwords
sum(users$bad_password)

## [1] 424

# Looking at the first 100 bad passwords
head(users[(users$bad_password == TRUE), ], 100)

## # A tibble: 100 x 13
##       id user_name password length too_short common_password common_word
##    <int> <chr>     <chr>     <int> <lgl>     <lgl>           <lgl>      
##  1     6 shawn.ha… 5278049       7 TRUE      FALSE           FALSE      
##  2     7 evelyn.g… master        6 TRUE      TRUE            TRUE       
##  3     8 noreen.h… murphy        6 TRUE      TRUE            TRUE       
##  4     9 gladys.w… lwsves2       7 TRUE      FALSE           FALSE      
##  5    12 milford.… hubbard       7 TRUE      FALSE           FALSE      
##  6    14 jamie.co… 310356        6 TRUE      FALSE           FALSE      
##  7    16 lorrie.g… oZ4k0QE       7 TRUE      FALSE           FALSE      
##  8    17 domingo.… chelsea       7 TRUE      TRUE            TRUE       
##  9    18 martin.p… zvc1939       7 TRUE      FALSE           FALSE      
## 10    19 shelby.m… nickgd        6 TRUE      FALSE           FALSE      
## # ... with 90 more rows, and 6 more variables: first_name <chr>,
## #   last_name <chr>, uses_name <lgl>, max_repeats <int>,
## #   too_many_repeats <lgl>, bad_password <lgl>

Otherwise, the password should be up to the user!

The password checks recommended by the NIST Special Publication 800-63B have now been implemented. It’s certainly possible to implement these checks better, for example, by using a longer list of common passwords. Also note that the NIST checks in no way guarantee that a chosen password is good, just that it’s not obviously bad.

Apart from the checks we’ve implemented above, the NIST is also clear about which password rules should not be imposed:

Verifiers SHOULD NOT impose other composition rules (e.g., requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets. Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically).

So the next time a website or app tells you to “include both a number, symbol and an upper and lower case character in your password” you should send them a copy of NIST Special Publication 800-63B.

# Let's enter a password that passes the NIST requirements
# PLEASE DO NOT USE AN EXISTING PASSWORD HERE
new_password <- "AdventureTimewithFinnandJake"

Well, it’s just me being a nerd and using my favorite cartoon as an imaginary password.
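
To see whether a candidate like this one would be flagged, the checks from this notebook can be applied to a single string. The following is a self-contained base R sketch; the two short lists are hypothetical stand-ins for the full password and word lists loaded earlier, so a pass here says nothing about the full lists.

```r
new_password <- "AdventureTimewithFinnandJake"

# Hypothetical stand-ins for the full lists used in the notebook
example_common <- c("123456", "password", "qwerty")
example_words  <- c("adventure", "time", "password")

# Longest run of one repeated character, as in the repetitiveness check
longest_repeat <- max(rle(strsplit(new_password, "")[[1]])$lengths)

checks <- c(
  too_short        = nchar(new_password) < 8,
  common_password  = new_password %in% example_common,
  common_word      = new_password %in% example_words,
  too_many_repeats = longest_repeat >= 4
)

# TRUE would mean at least one check flagged the password
any(checks)
## [1] FALSE
```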

The Good, The Bad and The Ugly (Password)