NLP in Python: Regular Expressions

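We will use Python’s built-in ‘re’ module for regular expressions. The snippets below search a string named world that holds the country names; its definition is not shown here.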
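Because world is never defined in this section, here is a minimal sketch of one way it could be set up, assuming the country names are stored one per line. The names in sample_names are a small hypothetical subset chosen for illustration, not the original data.

```python
import re

# Hypothetical stand-in for the real data: a few country names, one per line.
sample_names = ["Cyprus", "Denmark", "Egypt", "New Zealand", "Norway", "Syria"]
world = "\n".join(sample_names)

# findall returns every non-overlapping match as a list of strings.
print(re.findall(r'New\s\w+', world))
```

With the real data in place of sample_names, the remaining snippets can be run unchanged.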

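Display the country names that start with ‘D’ or ‘E’.

import re
# [^ ,] consumes the character just before the name (a line break in this data),
# so strip() removes it before printing.
pattern = re.compile(r'[^ ,][DE]\w+\s[A-Za-z]+|[^ ,][DE]\w+')
matches = pattern.findall(world)
for match in matches:
  print(match.strip())

## Denmark
## Dominican Republic
## Ecuador
## Egypt
## El Salvador
## Ethiopia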
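The leading [^ ,] in the pattern above consumes whatever character precedes the name, so that character ends up inside each match. As a sketch of an alternative, assuming each name sits on its own line, the pattern can be anchored with ^ under re.MULTILINE so that nothing before the name is captured. The sample string below is hypothetical.

```python
import re

# Hypothetical sample, one country name per line (an assumption about the data layout).
sample = "Cyprus\nDenmark\nDominican Republic\nEgypt\nEl Salvador\nFiji"

# ^ matches at the start of every line under re.MULTILINE.
# [ ] (a literal space) instead of \s keeps a match from running across a newline.
pattern = re.compile(r'^[DE]\w+(?:[ ][A-Za-z]+)?', re.MULTILINE)

for match in pattern.findall(sample):
    print(match)
```

The optional group is non-capturing, so findall still returns the whole match rather than only the second word.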

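Display the country names that contain ‘New’ followed by a second word.

# 'New', one whitespace character, then the following word
pattern = re.compile(r'New\s\w+')
matches = pattern.findall(world)
for match in matches:
  print(match)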

## New Caledonia
## New Zealand
## New Guinea

Display the country names whose 2nd character is ‘y’ (counting from 1).

# a capital letter, then 'y' in position 2, then the rest of the word
pattern = re.compile(r'[A-Z]y\w+')
matches = pattern.findall(world)
for match in matches:
  print(match)

## Cyprus
## Syria

Display the country names whose 6th character is ‘y’ (counting from 1).

# a capital letter, four lowercase letters (positions 2-5), then 'y' in position 6
pattern = re.compile(r'[A-Z][a-z]{4}y\w*')
matches = pattern.findall(world)
for match in matches:
  print(match)

## Norway
## Sicily
## Turkey

Display the country names whose 4th character is ‘c’ (counting from 1).

# a capital letter, two lowercase letters (positions 2-3), then 'c' in position 4
pattern = re.compile(r'[A-Z][a-z]{2}c\w*')
matches = pattern.findall(world)
for match in matches:
  print(match)

## Czechoslovakia
## Liechtenstein
## Seychelles
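The three exercises above share one template: a capital letter, then n - 2 lowercase letters, then the target character in position n. A minimal sketch of a helper that builds the pattern from n follows. The function name names_with_char_at and the sample string are hypothetical, and the helper assumes n is at least 2 and that names are capitalised.

```python
import re

def names_with_char_at(text, char, n):
    """Return words in text whose n-th character (counting from 1) is char.

    Hypothetical helper; assumes n >= 2 and capitalised names.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    # \b keeps the match from starting in the middle of a word.
    pattern = re.compile(r'\b[A-Z][a-z]{%d}%s\w*' % (n - 2, re.escape(char)))
    return pattern.findall(text)

# Hypothetical sample, one name per line.
sample = "Cyprus\nNorway\nSeychelles\nSyria\nTurkey"

print(names_with_char_at(sample, 'y', 2))
print(names_with_char_at(sample, 'y', 6))
print(names_with_char_at(sample, 'c', 4))
```

Passing the character through re.escape means a target such as ‘.’ is matched literally instead of being read as a regex metacharacter.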