Harold Nelson
9/30/2020
There is a basic pattern for counting using a dictionary. Let’s use the string ‘Mississippi’ to demonstrate.
s = 'Mississippi'
count_dict = {}
for c in s:
    if c in count_dict:
        count_dict[c] = count_dict[c] + 1
    else:
        count_dict[c] = 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k])
## M 1
## i 4
## s 4
## p 2
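The dictionary get method, called with a default value as its second argument, simplifies this code: count_dict.get(c,0) returns the current count for c, or 0 if c has not been seen yet.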
s = 'Mississippi'
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c,0) + 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k])
## M 1
## i 4
## s 4
## p 2
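To see what get is doing, it can help to call it on its own. The sketch below is not part of the original exercise. It also shows collections.Counter from the standard library, which packages the same counting pattern. The names counts and counter are chosen here only for illustration.

```python
from collections import Counter

s = 'Mississippi'

# get returns the stored value when the key exists,
# and the second argument (the default) when it does not.
counts = {'M': 1}
print(counts.get('M', 0))   # key present: the stored count
print(counts.get('i', 0))   # key absent: the default

# Counter builds the same kind of dictionary in one step.
counter = Counter(s)
print(dict(counter))
```

Writing the loop by hand is still worth practicing, since the same pattern works for totals and other accumulations that Counter does not cover.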
Use the simplified version of the counting code above to count the digits in the number 2727130252053142514510171943. You can use the str() function to create a string.
s = str(2727130252053142514510171943)
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c,0) + 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k])
## 2 5
## 7 3
## 1 6
## 3 3
## 0 3
## 5 4
## 4 3
## 9 1
Modify the code in the previous exercise so that in each line of the output the count of occurrences of a digit is followed by the proportion of times the digit occurs.
s = str(2727130252053142514510171943)
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c,0) + 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(s))
## 2 5 0.17857142857142858
## 7 3 0.10714285714285714
## 1 6 0.21428571428571427
## 3 3 0.10714285714285714
## 0 3 0.10714285714285714
## 5 4 0.14285714285714285
## 4 3 0.10714285714285714
## 9 1 0.03571428571428571
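A quick sanity check on relative frequencies is that they should add up to 1, because the counts add up to the number of digits. The following sketch, which is an optional variation and not part of the exercise, also sorts the keys and rounds the proportions for easier reading. The names total and prop_sum are introduced here for illustration.

```python
s = str(2727130252053142514510171943)
count_dict = {}
for c in s:
    count_dict[c] = count_dict.get(c, 0) + 1

total = len(s)
# Print the digits in sorted order with rounded proportions
for k in sorted(count_dict):
    proportion = count_dict[k] / total
    print(k, count_dict[k], round(proportion, 3))

# The proportions should sum to 1, up to floating-point rounding
prop_sum = sum(v / total for v in count_dict.values())
print(prop_sum)
```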
Let’s apply the same code to calculate counts and relative frequencies of items in a list rather than a string, to show that the pattern is general. To create a list without a lot of typing, we can start with a string, s, and use list() to convert it to a list, l. Use the string ‘This is just a proof of generality.’
s = 'This is just a proof of generality.'
l = list(s)
count_dict = {}
for c in l:
    count_dict[c] = count_dict.get(c,0) + 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(l))
## T 1 0.02857142857142857
## h 1 0.02857142857142857
## i 3 0.08571428571428572
## s 3 0.08571428571428572
##   6 0.17142857142857143
## j 1 0.02857142857142857
## u 1 0.02857142857142857
## t 2 0.05714285714285714
## a 2 0.05714285714285714
## p 1 0.02857142857142857
## r 2 0.05714285714285714
## o 3 0.08571428571428572
## f 2 0.05714285714285714
## g 1 0.02857142857142857
## e 2 0.05714285714285714
## n 1 0.02857142857142857
## l 1 0.02857142857142857
## y 1 0.02857142857142857
## . 1 0.02857142857142857
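Since the loop only needs to visit items one at a time, the same code can be wrapped in a function that accepts any sequence. This is a minimal sketch; the function name count_items is hypothetical and does not appear in the exercises.

```python
def count_items(items):
    """Return a dictionary mapping each distinct item to its count."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts

s = 'This is just a proof of generality.'
from_string = count_items(s)        # iterate over the characters of a string
from_list = count_items(list(s))    # iterate over the elements of a list
print(from_string == from_list)

# The items do not have to be characters
print(count_items([3, 1, 3, 2, 3]))
```

The only requirement is that the items can be used as dictionary keys, which holds for strings, numbers, and tuples.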
Sometimes we want to know which value occurs most frequently. In the previous example the blank space was the most common character. A tie is always possible; in an extreme case, all values could occur with equal frequency. The answer should therefore be a list rather than a single value. The following code solves this problem.
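s = 'Mississippi'
count_dict = {}
for c in s:
    if c in count_dict:
        count_dict[c] = count_dict[c] + 1
    else:
        count_dict[c] = 1
# Let's look at the results
print('Number of Occurrences for each Item')
## Number of Occurrences for each Item
for k in count_dict.keys():
    print(k,count_dict[k])
## M 1
## i 4
## s 4
## p 2
# Largest count, computed here with max() over the dictionary's values
maxcount = max(count_dict.values())
print('Maximum Count')
## Maximum Count
print(maxcount)
## 4
# Collect every item whose count reaches the maximum (there may be ties)
Win_list = []
for i in count_dict.keys():
    if count_dict[i] >= maxcount:
        Win_list.append(i)
print(' ')
print('Most Frequent Items')
## Most Frequent Items
print(Win_list)
## ['i', 's']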
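The tie-handling logic can be packaged as a function so it is easy to try on other inputs, including the extreme case where every value is tied. This is a sketch under the assumption that an empty input should give an empty list; the name most_frequent is hypothetical.

```python
def most_frequent(items):
    """Return (winners, maxcount); winners lists every item tied for the top count."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    if not counts:
        # Assumption: nothing to count means no winners
        return [], 0
    maxcount = max(counts.values())
    winners = [k for k in counts if counts[k] == maxcount]
    return winners, maxcount

print(most_frequent('Mississippi'))                          # two-way tie
print(most_frequent('abc'))                                  # every item tied
print(most_frequent('This is just a proof of generality.'))  # single winner
```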
Create a list, numlist, of the numbers between 1 and 100 as strings. Print the list to verify correctness.
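One possible answer, sketched to match the pattern that the following exercises build on. Note that range(1, 101) is needed to include 100, because range stops before its second argument.

```python
numlist = []
for i in range(1, 101):
    # Convert each integer to a string before storing it
    numlist.append(str(i))
print(numlist)
```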
## ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100']
Drop the print from the previous exercise. Extend the code to create a list, fdlist, of the first digit in each item in numlist. Print this list to verify correctness.
numlist = []
fdlist = []
for i in range(1,101):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])
print(fdlist)
## ['1', '2', '3', '4', '5', '6', '7', '8', '9', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3', '3', '3', '3', '3', '3', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '5', '5', '5', '5', '5', '5', '5', '5', '5', '5', '6', '6', '6', '6', '6', '6', '6', '6', '6', '6', '7', '7', '7', '7', '7', '7', '7', '7', '7', '7', '8', '8', '8', '8', '8', '8', '8', '8', '8', '8', '9', '9', '9', '9', '9', '9', '9', '9', '9', '9', '1']
Drop the print from the previous exercise. Extend the code to get counts and relative frequencies of the items in fdlist.
numlist = []
fdlist = []
for i in range(1,101):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])
count_dict = {}
for c in fdlist:
    count_dict[c] = count_dict.get(c,0) + 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(fdlist))
## 1 12 0.12
## 2 11 0.11
## 3 11 0.11
## 4 11 0.11
## 5 11 0.11
## 6 11 0.11
## 7 11 0.11
## 8 11 0.11
## 9 11 0.11
Change the 100 in the previous answer to 200. What do you notice?
numlist = []
fdlist = []
for i in range(1,201):
    s = str(i)
    numlist.append(s)
    fdlist.append(s[0])
count_dict = {}
for c in fdlist:
    count_dict[c] = count_dict.get(c,0) + 1
# Let's look at the results
for k in count_dict.keys():
    print(k,count_dict[k],count_dict[k]/len(fdlist))
## 1 111 0.555
## 2 12 0.06
## 3 11 0.055
## 4 11 0.055
## 5 11 0.055
## 6 11 0.055
## 7 11 0.055
## 8 11 0.055
## 9 11 0.055
Google for “Benford’s law”. What do you find?
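As a starting point for comparison: Benford's law says that in many naturally occurring collections of numbers, the leading digit d appears with probability log10(1 + 1/d). The sketch below prints those expected proportions next to the observed ones for the integers 1 through 200. The integers in a fixed range are not the kind of data the law describes, so a close match should not be expected. The function name first_digit_proportions is hypothetical.

```python
import math

def first_digit_proportions(n):
    """Proportion of each leading digit among the integers 1..n."""
    counts = {}
    for i in range(1, n + 1):
        d = str(i)[0]
        counts[d] = counts.get(d, 0) + 1
    return {d: counts[d] / n for d in sorted(counts)}

# Benford's law: P(d) = log10(1 + 1/d) for d = 1..9
benford = {str(d): math.log10(1 + 1 / d) for d in range(1, 10)}

observed = first_digit_proportions(200)
# Columns: digit, observed proportion, Benford proportion
for d in benford:
    print(d, round(observed.get(d, 0), 3), round(benford[d], 3))
```

Changing the 200 to other values shows how strongly the observed proportions depend on where the range stops.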