Data Set

The “worldfloras.csv” file contains information about the countries of the world. The file is in comma-separated (CSV) format, and we read it into a single string.

with open('worldfloras.csv', 'r') as f:
  world = f.read()
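As an alternative to searching the raw text, the names can be pulled out with the standard `csv` module first. This is only a sketch: it assumes the country name is the first field and that the file starts with a header row, and the `sample` string is made-up data standing in for the real file. The helper name `country_names` is hypothetical.

```python
import csv
import io

# Made-up sample standing in for worldfloras.csv; the real columns may differ.
sample = "Country,Continent\nDenmark,Europe\nEl Salvador,C.America\nNew Zealand,Oceania\n"

def country_names(handle):
    """Return the first field of every row after the header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed to be present)
    return [row[0] for row in reader if row]

names = country_names(io.StringIO(sample))
print(names)
```

With the real file, `country_names(open('worldfloras.csv', newline=''))` would be the equivalent call, provided the layout assumptions hold.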

Regular Expressions

We will use Python’s built-in ‘re’ module for regular expressions.

  1. Display the country names that start with characters ‘D’ or ‘E’.
import re
# ^ with re.MULTILINE anchors each match at the start of a line, so the
# newline before the name is not captured and printed as a blank line.
pattern = re.compile(r'^[DE]\w+(?:\s[A-Za-z]+)?', re.MULTILINE)
matches = pattern.findall(world)
for match in matches:
  print(match)
## Denmark
## Dominican Republic
## Ecuador
## Egypt
## El Salvador
## Ethiopia
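A pattern that opens with a negated class such as `[^ ,]` consumes the character in front of the name, which in a multi-line string is usually the newline. The sketch below compares that with a `^` anchor under `re.MULTILINE`. The `sample` string is made-up data, not the real file.

```python
import re

# Made-up sample; only the shape (name first, comma-separated) matters here.
sample = "Country,Continent\nDenmark,Europe\nEcuador,S.America\nSweden,Europe\n"

# The negated class matches the newline that precedes each name.
consuming = re.findall(r'[^ ,][DE]\w+', sample)

# The ^ anchor is zero-width, so only the name itself is returned.
anchored = re.findall(r'^[DE]\w+', sample, flags=re.MULTILINE)

print(consuming)
print(anchored)
```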
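  2. Display the country names that start with the word ‘New’.
# ^ with re.MULTILINE requires 'New' at the start of a line, so names
# that only contain the word further along are skipped.
pattern = re.compile(r'^New\s\w+', re.MULTILINE)
matches = pattern.findall(world)
for match in matches:
  print(match)
## New Caledonia
## New Zealand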
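When the names are already available as a list, "starts with" and "contains" map onto `re.match` and `re.search` respectively. A small sketch with a hand-written list (the list is illustrative, not read from the file):

```python
import re

# Illustrative names; in practice these would come from the data file.
names = ["New Caledonia", "New Zealand", "Papua New Guinea", "Norway"]

# re.match only succeeds at the beginning of the string.
starts_with_new = [n for n in names if re.match(r'New\s', n)]

# re.search succeeds anywhere in the string.
contains_new = [n for n in names if re.search(r'New\s', n)]

print(starts_with_new)
print(contains_new)
```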
  3. Display the country names that have ‘y’ as their 2nd character (counting from 1).
# One uppercase letter, then 'y', then the rest of the word
pattern = re.compile(r'[A-Z]y\w+')
matches = pattern.findall(world)
for match in matches:
  print(match)
## Cyprus
## Syria
  4. Display the country names that have ‘y’ as their 6th character (counting from 1).
# One uppercase letter, four lowercase letters, then 'y' in 6th position
pattern = re.compile(r'[A-Z][a-z]{4}y\w*')
matches = pattern.findall(world)
for match in matches:
  print(match)
## Norway
## Sicily
## Turkey
  5. Display the country names that have ‘c’ as their 4th character (counting from 1).
# One uppercase letter, two lowercase letters, then 'c' in 4th position
pattern = re.compile(r'[A-Z][a-z]{2}c\w*')
matches = pattern.findall(world)
for match in matches:
  print(match)
## Czechoslovakia
## Liechtenstein
## Seychelles
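The last three exercises share one idea: skip a fixed number of characters, then require a specific one. A possible generalisation is sketched below. The helper `has_char_at` is a hypothetical name, and it works on individual names (taken here from the outputs above) rather than on the raw file text.

```python
import re

def has_char_at(name, char, position):
    """True if `char` is the character at 1-based `position` in `name`."""
    # Skip position - 1 arbitrary characters, then require `char`.
    pattern = rf'.{{{position - 1}}}{re.escape(char)}'
    return re.match(pattern, name) is not None

names = ["Cyprus", "Syria", "Norway", "Sicily", "Turkey",
         "Czechoslovakia", "Liechtenstein", "Seychelles"]

y_second = [n for n in names if has_char_at(n, 'y', 2)]
y_sixth = [n for n in names if has_char_at(n, 'y', 6)]
c_fourth = [n for n in names if has_char_at(n, 'c', 4)]

print(y_second)
print(y_sixth)
print(c_fourth)
```

Because `.` matches any character except a newline, this version also counts spaces as positions, unlike the `[a-z]{n}` patterns used in the exercises.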