Regular expressions

A summary of the main regular expression metacharacters

- ^ -

represents the start of a line

^i think

- will match:   

i think it's better to go and have some rest   
i think you might go with her    

- $ -

represents the end of a line

tomorrow$   

- will match:

we will finish the project report tomorrow   
I'll go to pick him up tomorrow
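
The two anchors can be tried out with a minimal sketch in Python's `re` module. Python is an assumption here (the notes do not name a tool), and the helper `matching_lines` and the third sample line are made up for illustration.

```python
import re

def matching_lines(pattern, lines):
    """Return the lines in which `pattern` matches somewhere (hypothetical helper)."""
    return [line for line in lines if re.search(pattern, line)]

lines = [
    "i think it's better to go and have some rest",
    "we will finish the project report tomorrow",
    "tomorrow i think we can rest",  # extra sample: neither anchor matches here
]

# '^' ties the pattern to the start of the line
starts_with = matching_lines(r"^i think", lines)

# '$' ties the pattern to the end of the line
ends_with = matching_lines(r"tomorrow$", lines)
```

The third line contains both 'i think' and 'tomorrow', but not in the anchored positions, so it is selected by neither pattern.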

- [Aa] -

matches either a lowercase or an uppercase 'a'

- Several characters inside brackets match any single one of those characters. The order of the characters inside the brackets does not matter, and they do not all have to appear.

[Tt][Hh][Ee] 

- Will match 'the' where every letter can be written in either lowercase or uppercase.      
- As the position within the line is not indicated, it will match wherever the 'the' is placed   

THe main building is a hospital   
I already ask him for thE report   
HE IS FINALLY NOT LENDING US THE VAN   
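
As a rough sketch (Python's `re` module is an assumption, and `first_match` is a hypothetical helper), the following shows which piece of each line the bracket expression picks up. The fourth sample line is added for illustration.

```python
import re

pattern = re.compile(r"[Tt][Hh][Ee]")

def first_match(line):
    """Return the first piece of text matched in `line`, or None (hypothetical helper)."""
    m = pattern.search(line)
    return m.group() if m else None

lines = [
    "THe main building is a hospital",
    "I already ask him for thE report",
    "HE IS FINALLY NOT LENDING US THE VAN",
    "no match in this sentence",  # extra sample without any 'the'
]

found = [first_match(line) for line in lines]
```

In Python the same effect could also be obtained by compiling the plain pattern `the` with the `re.IGNORECASE` flag.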

- [a-z] -

matches any single lowercase letter from 'a' to 'z'.

- [a-zA-Z] -

matches any single letter, either lowercase or uppercase (two ranges combined in the same brackets).

- [0-9] -

matches any single digit from 0 to 9.
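
A small sketch of the three ranges, assuming Python's `re` module; the sample text is made up for illustration. `re.findall` returns every non-overlapping match, so each range yields one character per match.

```python
import re

text = "Room 42b, Floor 3"  # made-up sample text

lowercase = re.findall(r"[a-z]", text)    # single lowercase letters
letters = re.findall(r"[a-zA-Z]", text)   # single letters of either case
digits = re.findall(r"[0-9]", text)       # single digits
```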

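- [^] -

a caret at the start of a bracket expression negates it: the expression matches any single character that is NOT listed inside the brackets

`[^?]$`

- matches any line whose last character is not a question mark.   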

He is supposed to come at three   
he will come anyway!   
when is he coming? I really do not know ##the question mark is not at the end of the line.   
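
A minimal sketch of the negated bracket expression, assuming Python's `re` module. The last two sample lines are added for illustration: one ends with a question mark, the other is empty.

```python
import re

pattern = re.compile(r"[^?]$")

lines = [
    "He is supposed to come at three",
    "he will come anyway!",
    "when is he coming? I really do not know",
    "is he coming?",  # ends with '?', so it is rejected
    "",               # empty: there is no last character for [^?] to match
]

kept = [line for line in lines if pattern.search(line)]
```

Note that `[^?]` still has to match one character, so an empty line is not selected even though it does not end with a question mark.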

- "." -

matches any single character (a letter, a number, a symbol… anything).

a.b   

matches the lines   

the format has to be aa:bb
it seems like she doesn't know the alphabet, she says acb   
it is necessary to enumerate the letters like this: a,b,c...   

but it does not match the line

they wanted to do an ab test   ##the period `"."` must match exactly one character, it cannot match nothing   
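
The behaviour of the period can be checked with a short sketch, assuming Python's `re` module (`first_match` is a hypothetical helper). The last sample shows that 'ab' alone is not enough, because the period has to consume exactly one character between the 'a' and the 'b'.

```python
import re

pattern = re.compile(r"a.b")

def first_match(line):
    """Return the first piece of text matched in `line`, or None (hypothetical helper)."""
    m = pattern.search(line)
    return m.group() if m else None

lines = [
    "the format has to be aa:bb",
    "it seems like she doesn't know the alphabet, she says acb",
    "it is necessary to enumerate the letters like this: a,b,c...",
    "they wanted to do an ab test",
]

found = [first_match(line) for line in lines]
```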

- | -

The 'pipe' metacharacter is used to indicate alternatives

yesterday|today|tomorrow   

- will match the lines that contain 'yesterday', 'today' or 'tomorrow'   

Patty came yesterday   
tomorrow is Luke's turn   
I do not know who is coming today   

^[Gg]ood|[Bb]ad   

It will match 'good' (starting with a capital or lowercase letter) only at the beginning of the line, or 'bad' (starting with a capital or lowercase letter) anywhere in the line, because the '^' applies only to the first alternative

- Parentheses can be used to group subexpressions, as they are used in maths

^([Gg]ood|[Bb]ad)   

The '^' applies to everything inside the parentheses. So, it will match 'good' or 'bad' only at the beginning of the line
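
The difference between the ungrouped and the grouped alternation can be seen in a minimal sketch, assuming Python's `re` module; the sample lines are made up for illustration.

```python
import re

lines = [
    "Good morning",
    "too bad for them",
    "that was good",
    "bad luck",
]

# '^' binds only to the first alternative: 'good' at the start, 'bad' anywhere
ungrouped = [l for l in lines if re.search(r"^[Gg]ood|[Bb]ad", l)]

# '^' binds to the whole group: both alternatives must be at the start
grouped = [l for l in lines if re.search(r"^([Gg]ood|[Bb]ad)", l)]
```

'too bad for them' is selected only by the ungrouped pattern, and 'that was good' by neither.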

- ? -

The question mark indicates that the item that precedes it (here, a parenthesized subexpression) is optional

[Gg]eorge( [Ww]\.)? [Bb]ush  

the ' w.' part (a space, a 'w' in either lowercase or uppercase, and a literal period) is optional.  
this regular expression will match  

BBC reported that George W. Bush claimed God told him to invade   
yesterday george bush was on tv   
it's said that george w. bush is coming back to politics   
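
A short sketch of the optional group, assuming Python's `re` module (`first_match` is a hypothetical helper). The last sample line is added to show a case where neither the full nor the shortened form is present.

```python
import re

pattern = re.compile(r"[Gg]eorge( [Ww]\.)? [Bb]ush")

def first_match(line):
    """Return the first piece of text matched in `line`, or None (hypothetical helper)."""
    m = pattern.search(line)
    return m.group() if m else None

lines = [
    "BBC reported that George W. Bush claimed God told him to invade",
    "yesterday george bush was on tv",
    "it's said that george w. bush is coming back to politics",
    "george washington bush",  # extra sample: ' w' is not followed by a period
]

found = [first_match(line) for line in lines]
```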

- \ -

we use the backslash '\' when we want to use metacharacters (e.g. '.') as literals, to escape the metacharacter's special meaning
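
A minimal sketch of escaping, assuming Python's `re` module; the sample text is made up for illustration. Without the backslash the period matches any character, with it only a literal period.

```python
import re

text = "version 3.14 or 3x14"  # made-up sample text

unescaped = re.findall(r"3.14", text)   # '.' is a metacharacter here
escaped = re.findall(r"3\.14", text)    # '\.' is a literal period

# re.escape adds the backslashes for us when a literal string is needed
auto_escaped = re.escape("a.b")
```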

- * - and - + -

are metacharacters used to indicate repetition of the pattern that precedes the * or the +

- * - means: repeat whatever precedes the star any number of times, including none

`\(.*\)`   

it looks for an opening parenthesis, followed by a dot (any character) that is repeated any number of times, including no times, followed by a closing parenthesis. The parentheses are escaped with '\' so that they are matched literally instead of being treated as a group.   
so, the expression __`.*`__ basically means any string of characters   
in particular, this regular expression will match:   

I told him to come this afternoon (at 3 pm)   
It's easy to find several ways (train, metro, by car...) to get there   
() ##just the parentheses with nothing inside them   

- _the `.*` always matches the longest possible string that satisfies the regular expression_
- to turn this behaviour off and make it take the shortest possible string that satisfies the regular expression, we add a question mark after the star (`.*?`)
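
The longest versus shortest behaviour can be compared in a small sketch, assuming Python's `re` module; the sample text is made up. The parentheses are escaped here so that Python matches them literally.

```python
import re

text = "ways (train, metro) and times (at 3 pm)"  # made-up sample text

# greedy: runs from the first '(' to the LAST ')'
greedy = re.search(r"\(.*\)", text).group()

# lazy: stops at the first ')' it can reach
lazy = re.search(r"\(.*?\)", text).group()

# the star also accepts zero repetitions
empty = re.search(r"\(.*\)", "()").group()
```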

- + - means: repeat whatever precedes the plus sign one or more times (that is, what precedes the plus sign must appear at least once)

[0-9]+   

matches those lines which have at least one digit in them   

this car has 220 cv   
I told him to come this afternoon (at 3 pm)  
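
A brief sketch of the plus sign, assuming Python's `re` module. Each run of consecutive digits comes back as one match, and the third sample line (added for illustration) has none.

```python
import re

lines = [
    "this car has 220 cv",
    "I told him to come this afternoon (at 3 pm)",
    "no digits here",  # extra sample without digits
]

# one list of digit runs per line
numbers = [re.findall(r"[0-9]+", line) for line in lines]
```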

- {} -

lets us specify the minimum and the maximum number of times that the item preceding the curly braces (here, a parenthesized subexpression) can be repeated.

[0-9] ([a-d]){1,3}   

it looks for a digit, followed by one space, followed by 1 to 3 letters between a and d   

He lives in the 9 d apartment   
The password is 7 dac   
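
A minimal sketch of the curly braces, assuming Python's `re` module (`first_match` is a hypothetical helper). The last two sample lines are added to show the lower and upper bounds at work.

```python
import re

pattern = re.compile(r"[0-9] ([a-d]){1,3}")

def first_match(line):
    """Return the first piece of text matched in `line`, or None (hypothetical helper)."""
    m = pattern.search(line)
    return m.group() if m else None

lines = [
    "He lives in the 9 d apartment",
    "The password is 7 dac",
    "The password is 7 e",     # extra sample: 'e' is outside a-d, minimum not met
    "The password is 7 abcd",  # extra sample: only the first three letters are taken
]

found = [first_match(line) for line in lines]
```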

- \n -

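where 'n' is a number

this expression lets us refer back to the exact text matched by the n-th parenthesized subexpression: '\1' is the text matched by the first group, '\2' the text matched by the second, and so on. The number selects the group, not how many times it is repeated.

 +([a-zA-Z]+) +\1   

Matches when the text found by the subexpression '([a-zA-Z]+)' appears again right after it, separated by one or more spaces   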

blah blah blah blah   
time for bed, night night twitter
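
A short sketch of the back-reference, assuming Python's `re` module (`repeated_word` is a hypothetical helper). It returns the text captured by the first group, which is the word that gets repeated. The third sample line is added for illustration.

```python
import re

pattern = re.compile(r" +([a-zA-Z]+) +\1")

def repeated_word(line):
    """Return the word captured by group 1, or None if nothing repeats (hypothetical helper)."""
    m = pattern.search(line)
    return m.group(1) if m else None

lines = [
    "blah blah blah blah",
    "time for bed, night night twitter",
    "nothing repeated here",  # extra sample without a repeated word
]

words = [repeated_word(line) for line in lines]
```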