Lecture 12

Counting Occurrences

There is a basic pattern for counting using a dictionary. Let’s use the string ‘Mississippi’ to demonstrate.

s = 'Mississippi'
count_dict = {}
for c in s:
    if c in count_dict:
        count_dict[c] = count_dict[c] + 1
    else:
        count_dict[c] = 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k])

## M 1
## i 4
## s 4
## p 2

A Simplified Version

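The dictionary get method, called with a second argument that supplies a default value, simplifies this code. The call count_dict.get(c,0) returns the current count for c, or 0 if c has not been seen yet.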

s = 'Mississippi'
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k])

## M 1
## i 4
## s 4
## p 2
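
To see why the one-line loop above works, it may help to look at get on its own. The sketch below uses a small hypothetical dictionary, demo, that is not part of the lecture code. It illustrates that get returns the stored value when the key is present, returns the second argument when the key is absent, and does not add the missing key to the dictionary.

```python
# demo is a hypothetical dictionary used only for illustration
demo = {'i': 4, 's': 4}

present = demo.get('i', 0)   # key exists, so the stored count is returned
missing = demo.get('p', 0)   # key is absent, so the default 0 is returned

print(present, missing)
print('p' in demo)           # get does not insert the missing key

## 4 0
## False
```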

Exercise

Use the simplified version of the counting code above to count the digits in the number 2727130252053142514510171943. Use the str() function to convert the number to a string.

Answer

s = str(2727130252053142514510171943)
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k])

## 2 5
## 7 3
## 1 6
## 3 3
## 0 3
## 5 4
## 4 3
## 9 1

Exercise

Modify the code in the previous exercise so that each line of the output shows the digit, its count, and the proportion of all digits that it accounts for.

Answer

s = str(2727130252053142514510171943)
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(s))

## 2 5 0.17857142857142858
## 7 3 0.10714285714285714
## 1 6 0.21428571428571427
## 3 3 0.10714285714285714
## 0 3 0.10714285714285714
## 5 4 0.14285714285714285
## 4 3 0.10714285714285714
## 9 1 0.03571428571428571

Exercise

Let’s apply the code to calculate counts and relative frequencies for the items in a list rather than the characters in a string. To create a list without a lot of typing, start with a string, s, and use list() to convert it to a list, l. The point is only to show that the pattern is general. Use the string ‘This is just a proof of generality.’

Answer

s = 'This is just a proof of generality.' 
l = list(s)
count_dict = {}
for c in l:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(l))

## T 1 0.02857142857142857
## h 1 0.02857142857142857
## i 3 0.08571428571428572
## s 3 0.08571428571428572
##   6 0.17142857142857143
## j 1 0.02857142857142857
## u 1 0.02857142857142857
## t 2 0.05714285714285714
## a 2 0.05714285714285714
## p 1 0.02857142857142857
## r 2 0.05714285714285714
## o 3 0.08571428571428572
## f 2 0.05714285714285714
## g 1 0.02857142857142857
## e 2 0.05714285714285714
## n 1 0.02857142857142857
## l 1 0.02857142857142857
## y 1 0.02857142857142857
## . 1 0.02857142857142857
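
Since the loop only needs something it can iterate over, the counting pattern can be wrapped in a function and reused. The following is a minimal sketch; the function name count_items and its parameter are assumptions made for illustration and do not appear in the lecture code.

```python
def count_items(items):
    """Return a dictionary mapping each item to its number of occurrences."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

# The same function handles a string, a list, and a tuple
print(count_items('aab'))
print(count_items(['x', 'y', 'x']))
print(count_items((1, 1, 2)))

## {'a': 2, 'b': 1}
## {'x': 2, 'y': 1}
## {1: 2, 2: 1}
```

The items must be usable as dictionary keys, so a list of lists would not work with this sketch.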

Most Frequent Items

Sometimes we want to know which value occurs most frequently. In the previous example the blank space was the most common character, with 6 occurrences. A tie is always possible; in the extreme case, every value occurs equally often. The answer may therefore be a list of values rather than a single value. The following code finds the maximum count and then collects every item that reaches it.

s = 'Mississippi'
count_dict = {}
for c in s:
    if c in count_dict:
        count_dict[c] = count_dict[c] + 1
    else:
        count_dict[c] = 1

# Let's look at the results

print('Number of Occurrences for each Item')
for k in count_dict.keys():
    print(k,count_dict[k])
    
# What is the maximum count?
maxcount = max(count_dict.values())
print(' ')
print('Maximum Count')
print(maxcount)

# Collect every item whose count equals the maximum (there may be ties)
Win_list = []
for i in count_dict.keys():
    if count_dict[i] == maxcount:
        Win_list.append(i)
print(' ')
print("Most Frequent Items")
print(Win_list)

## Number of Occurrences for each Item
## M 1
## i 4
## s 4
## p 2
##  
## Maximum Count
## 4
##  
## Most Frequent Items
## ['i', 's']
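
The search for the most frequent items can also be written more compactly. The sketch below is one possible variant, not the form used above: the function name most_frequent is an assumption, and it uses a list comprehension in place of the explicit loop. It also returns an empty list for an empty dictionary, a case in which max() would otherwise raise an error.

```python
def most_frequent(counts):
    """Return a list of all keys whose count equals the maximum count."""
    if not counts:          # an empty dictionary has no winner
        return []
    top = max(counts.values())
    return [key for key, value in counts.items() if value == top]

counts = {'M': 1, 'i': 4, 's': 4, 'p': 2}   # the counts for 'Mississippi'
print(most_frequent(counts))

## ['i', 's']
```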

Exercise

Create a list, numlist, of the integers from 1 to 100 as strings. Print the list to verify correctness.

Answer

numlist = []
for i in range(1,101):
    s = str(i)
    numlist.append(s)
print(numlist)

## ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100']
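
The same list can also be built in a single line with a list comprehension. This is an alternative sketch rather than the form used in the answer above; it prints only the length and the two end items as a quick check.

```python
# Build the strings '1' through '100' in one expression
numlist = [str(i) for i in range(1, 101)]
print(len(numlist), numlist[0], numlist[-1])

## 100 1 100
```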

Exercise

Drop the print from the previous exercise. Extend the code to create a list, fdlist, of the first digit in each item in numlist. Print this list to verify correctness.

Answer

numlist = []
fdlist = []
for i in range(1,101):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])
print(fdlist)

## ['1', '2', '3', '4', '5', '6', '7', '8', '9', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3', '3', '3', '3', '3', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '7', '7', '7', '7', '7', '7', '7', '7', '7', '7', '8', '8', '8', '8', '8', '8', '8', '8', '8', '8', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '1']

Exercise

Drop the print from the previous exercise. Extend the code to get counts and relative frequencies of the items in fdlist.

Answer

numlist = []
fdlist = []
for i in range(1,101):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])

count_dict = {}
for c in fdlist:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(fdlist))

## 1 12 0.12
## 2 11 0.11
## 3 11 0.11
## 4 11 0.11
## 5 11 0.11
## 6 11 0.11
## 7 11 0.11
## 8 11 0.11
## 9 11 0.11

Exercise

Change the 100 in the previous answer to 1,000. What do you notice?

Answer

numlist = []
fdlist = []
for i in range(1,1001):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])

count_dict = {}
for c in fdlist:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(fdlist))

## 1 112 0.112
## 2 111 0.111
## 3 111 0.111
## 4 111 0.111
## 5 111 0.111
## 6 111 0.111
## 7 111 0.111
## 8 111 0.111
## 9 111 0.111

Exercise

Change the 1,000 in the previous exercise to 1,500. What do you see?

Answer

numlist = []
fdlist = []
for i in range(1,1501):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])

count_dict = {}
for c in fdlist:
    count_dict[c] = count_dict.get(c,0) + 1

# Let's look at the results

for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(fdlist))

## 1 612 0.408
## 2 111 0.074
## 3 111 0.074
## 4 111 0.074
## 5 111 0.074
## 6 111 0.074
## 7 111 0.074
## 8 111 0.074
## 9 111 0.074
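
The last three exercises differ only in the upper limit, so the comparison can be made in one run. The sketch below is a hypothetical helper, leading_one_share, which is not part of the lecture code; it reports the proportion of numbers whose first digit is 1 for each limit. The jump at 1,500 arises because every number from 1,000 to 1,500 starts with 1.

```python
def leading_one_share(upper):
    """Proportion of the integers 1..upper whose first digit is '1'."""
    first_digits = [str(i)[0] for i in range(1, upper + 1)]
    return first_digits.count('1') / len(first_digits)

for upper in (100, 1000, 1500):
    print(upper, leading_one_share(upper))

## 100 0.12
## 1000 0.112
## 1500 0.408
```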