Lecture 12.11.2018


  • use GitHub to create a repository and to download, upload and update code
  • basic Git commands
  • various Linux commands
  • grep, agrep, egrep


Part 1

Create a bash script that will

  1. Download the fasta file.
  2. Count how many sequences have at least one indel (i.e. at least a single ‘-’)
  3. Count how many sequences have at least two consecutive indels (i.e. ‘--’)
  4. Count how many sequences have a pattern of the form N_N_N, where N can be any nucleotide.
  5. Count how many sequences have no ‘-’
  6. Extract all motifs that consist of three ‘G’s followed by a pyrimidine (C or T) and then a purine (A or G).
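
The counting steps above can be sketched with grep as follows. This is one possible approach, not a reference solution. No download URL is given here, so step 1 is skipped and a small made-up alignment stands in for the real fasta file. The sketch also assumes that each sequence sits on a single line, that nucleotides are upper-case, and that the ‘_’ in N_N_N denotes an indel (‘-’).

```shell
#!/usr/bin/env bash
# Step 1 would normally be something like: wget -O "$fasta" "$URL"
# (the URL is not given in the exercise). A made-up sample is used instead.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>hg19.chr1
ACGT-ACGGGTA
>mm10.chr1
AC--TTGGGCGA
>hg19.chr2
ACGTACGTACGT
>panTro4.chr1
A-C-GTTGGGAA
EOF

# Sequence lines are all lines that do not start with '>'.
seqs() { grep -v '^>' "$fasta"; }

# -e marks the next argument as the pattern, even if it starts with '-'.
with_indel=$(seqs | grep -c -e '-')        # step 2: at least one '-'
with_double=$(seqs | grep -c -e '--')      # step 3: at least one '--'
with_nnn=$(seqs | grep -c -E '[ACGT]-[ACGT]-[ACGT]')  # step 4: N-N-N (assumed meaning)
without_indel=$(seqs | grep -v -c -e '-')  # step 5: no '-' at all
motifs=$(seqs | grep -o -E 'GGG[CT][AG]')  # step 6: GGG, pyrimidine, purine

echo "with indel:       $with_indel"
echo "with '--':        $with_double"
echo "with N-N-N:       $with_nnn"
echo "without indel:    $without_indel"
echo "motifs:"
printf '%s\n' "$motifs"
```

grep -c counts matching lines, not matches, which is what "how many sequences" asks for. grep -o prints each matched motif on its own line.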

Part 2

Create a GitHub repository named “linux exercise 2” (GitHub replaces the spaces in a repository name with hyphens). Link this remote (GitHub) repository to your local one and upload the script from Part 1. The script should be named pattern.sh.
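
A minimal sketch of the linking and upload step is given below. So that it runs without a GitHub account, a local bare repository stands in for the remote. On GitHub the repository would be created on the website and its URL used in place of the local path. The user name, e-mail address, branch name main and the one-line pattern.sh are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)

# Stand-in for the GitHub repository (assumption: with GitHub, use the URL
# shown on the repository page instead of "$work/remote.git").
git init --bare -q "$work/remote.git"

# Local repository that holds the script from Part 1.
mkdir "$work/linux-exercise-2"
cd "$work/linux-exercise-2" || exit 1
git init -q
git config user.name  "Student Name"          # hypothetical identity
git config user.email "student@example.org"   # hypothetical identity
printf '#!/usr/bin/env bash\n' > pattern.sh    # placeholder for the real script

# Record the script in the local history.
git add pattern.sh
git commit -q -m "Add pattern.sh"
git branch -M main                             # use 'main' as the branch name

# Link the local repository to the remote and upload.
git remote add origin "$work/remote.git"
git push -q -u origin main

# Show where 'origin' points and confirm both sides hold the same commit.
remote_url=$(git config --get remote.origin.url)
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/remote.git" rev-parse main)
echo "origin: $remote_url"
echo "local:  $local_head"
echo "remote: $remote_head"
```

The -u flag on the first push sets the upstream branch, so later uploads need only git push.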

Part 3

Show (demonstrate) all commands needed to add the following functionality to your script and synchronize it with the remote GitHub repository.

In your local repository, add the following option to the script:

  • export only the sequence names and find out how many sequences are from human (hg19).
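
One way the whole Part 3 workflow might look is sketched below. The setup section only exists so the example runs anywhere: a bare repository plays the role of GitHub, and pattern.sh starts as a two-line placeholder that takes the fasta file as its first argument. The sample headers, the identity settings, the file name names.txt and the assumption that human sequences carry hg19 in their name are all illustrative.

```shell
#!/usr/bin/env bash
work=$(mktemp -d)

# --- Throwaway setup (stands in for the repository from Part 2) ---
git init --bare -q "$work/remote.git"
mkdir "$work/local"
cd "$work/local" || exit 1
git init -q
git config user.name  "Student Name"          # hypothetical identity
git config user.email "student@example.org"   # hypothetical identity
printf '#!/usr/bin/env bash\nfasta=$1\n' > pattern.sh
git add pattern.sh
git commit -q -m "Add pattern.sh"
git branch -M main
git remote add origin "$work/remote.git"
git push -q -u origin main

# --- Part 3: add the new option to the script ---
cat >> pattern.sh <<'EOF'
# Export only the sequence names (header lines without the leading '>').
grep '^>' "$fasta" | sed 's/^>//' > names.txt
# Count the names that mention hg19 (human).
grep -c 'hg19' names.txt
EOF

# --- Part 3: review, stage, commit and push the change ---
git status --short
git --no-pager diff
git add pattern.sh
git commit -q -m "Export sequence names and count hg19 sequences"
git push -q origin main

# --- Try the updated script on made-up sample data ---
cat > sample.fa <<'EOF'
>hg19.chr1
ACGT-ACGGGTA
>mm10.chr1
AC--TTGGGCGA
>hg19.chr2
ACGTACGTACGT
EOF
human=$(bash pattern.sh sample.fa)
echo "human sequences: $human"
```

Only pattern.sh is staged, so sample.fa and names.txt stay out of the repository.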