Source file ⇒ 2017-lec20.Rmd

Announcements

  1. Have an awesome break!

Today

  1. Secure Copy (SCP) from your local computer to the remote Berkeley SCF server
  2. Command line tools useful for data cleaning
  3. Converting one-liners at the command line into a shell script

0. Secure Copy (SCP) from your local computer to the remote Berkeley SCF server

If you had lab already this week then you already did this. I am reposting the instructions for your convenience.

We can’t upload files to the server directly. Instead, we need to use a command called scp (secure copy). Here’s how it works; you will have an opportunity to practice this in lab this week.

The instructions here are different for Mac and PC users:

MAC USERS

  1. Open a terminal window by typing Terminal in the Spotlight search in the top right of your screen.

  2. In the terminal, use the following command to upload your file to the 'radagast' Berkeley statistics server.

scp pathToFileOnYourComputer/file.extension username@server:/PathToCopyFileInto

PC USERS

You will need to download and install WinSCP:

Here is a link to download the latest version of WinSCP:

WinSCP

Here is a YouTube video on how to use WinSCP:

YouTube video

Find your file in the left side of WinSCP and drag it to the Documents directory on the right side of WinSCP.

1. Command line tools useful for data cleaning

Here are some useful command line tools:

wget - download a file from the web
egrep - print lines matching a pattern (regex)
cut - extract columns of data from a field-delimited file

1a. egrep and cut

EXAMPLE:

Here is a tab-delimited data set about potatoes:

wget -O potatoes.txt http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/potatoes.txt
head potatoes.txt
## --2017-04-01 11:04:53--  http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/potatoes.txt
## Resolving s3.amazonaws.com... 52.216.227.75
## Connecting to s3.amazonaws.com|52.216.227.75|:80... connected.
## HTTP request sent, awaiting response... 200 OK
## Length: 3575 (3.5K) [text/plain]
## Saving to: 'potatoes.txt'
## 
##      0K ...                                                   100%  136M=0s
## 
## 2017-04-01 11:04:54 (136 MB/s) - 'potatoes.txt' saved [3575/3575]
## 
## area temp    size    storage method  texture flavor  moistness
## 1    1   1   1   1   2.9 3.2 3.0
## 1    1   1   1   2   2.3 2.5 2.6
## 1    1   1   1   3   2.5 2.8 2.8
## 1    1   1   1   4   2.1 2.9 2.4
## 1    1   1   1   5   1.9 2.8 2.2
## 1    1   1   2   1   1.8 3.0 1.7
## 1    1   1   2   2   2.6 3.1 2.4
## 1    1   1   2   3   3.0 3.0 2.9
## 1    1   1   2   4   2.2 3.2 2.5

Let's cut out the first, second, and third columns and save them to a file called small_potatoes:

cat potatoes.txt | cut -f 1-3 > small_potatoes
head small_potatoes
## area temp    size
## 1    1   1
## 1    1   1
## 1    1   1
## 1    1   1
## 1    1   1
## 1    1   1
## 1    1   1
## 1    1   1
## 1    1   1
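
To see how the field list passed to cut -f behaves, here is a minimal sketch on a made-up tab-delimited file. The file name demo.txt and its values are hypothetical, not the potatoes data. It relies on the fact that cut splits on tabs by default.

```shell
# Build a tiny tab-delimited file; the values are invented for illustration
printf 'area\ttemp\tsize\tstorage\n1\t1\t1\t1\n1\t2\t1\t2\n' > demo.txt

# A range: columns 1 through 3
cut -f 1-3 demo.txt

# A list: columns 1 and 4 only
cut -f 1,4 demo.txt

# An open-ended range: column 3 through the last column
cut -f 3- demo.txt
```

A range (1-3), a comma-separated list (1,4), and an open-ended range (3-) can all be given to -f.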

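Let's keep only those small potatoes whose second column (temp) is equal to 2. The regex below matches one character at the start of the line, then whitespace, then a 2, so it tests temp rather than size, despite the output file name: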

cat potatoes.txt | cut -f 1-3 | egrep "^.[[:space:]]2" > size2_small_potatoes
head size2_small_potatoes
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
## 1    2   1
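
The pattern above is anchored at the start of the line, so it tests the second column and only works when the first column is a single character. Here is a sketch of a more general pattern, assuming tab-delimited input with exactly three columns. It uses grep -E, which is equivalent to egrep; demo3.txt and its values are hypothetical.

```shell
# Hypothetical three-column file; note the two-character value 12 in column 1
printf 'area\ttemp\tsize\n1\t1\t1\n1\t2\t1\n1\t1\t2\n12\t2\t2\n' > demo3.txt

# Second column equals 2, whatever the width of the first column
grep -E '^[^[:space:]]+[[:space:]]2[[:space:]]' demo3.txt

# Third (last) column equals 2
grep -E '^[^[:space:]]+[[:space:]][^[:space:]]+[[:space:]]2$' demo3.txt
```

Here [^[:space:]]+ stands for "one whole field", so each extra copy of it moves the test one column to the right.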

EXAMPLE:

  1. download the following csv file to your computer

swimming_pools.csv

  2. scp swimming_pools.csv to your SCF account

For example:

scp ~/Desktop/swimming_pools.csv alucas@radagast.berkeley.edu:~/.

  3. In terminal, find all the swimming pools that have Centre in the name

cat swimming_pools.csv | egrep Centre

  4. cut out the name and address of those swimming pools

cat swimming_pools.csv | egrep Centre | cut -d "," -f 1-2

  5. save the results in a file called center_pools.csv

# unix command
cat swimming_pools.csv | egrep Centre | cut -d "," -f 1-2 > center_pools.csv

  6. transfer center_pools.csv to your desktop using SCP or WinSCP as instructed above. On a Mac, type the following in your computer’s terminal (not on the server):

scp alucas@radagast.berkeley.edu:~/center_pools.csv ~/Desktop/.
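
If you want to try the egrep and cut steps without the real file, here is a minimal sketch on a hypothetical CSV. The file pools_demo.csv, its column names, and its rows are invented for illustration and are not the contents of swimming_pools.csv. It uses grep -E, which is equivalent to egrep.

```shell
# Hypothetical CSV; the names and addresses are invented
cat > pools_demo.csv <<'EOF'
Name,Address,Latitude
Main Street Centre,12 Main Street,-27.5
Hilltop Pool,99 Hill Road,-27.6
Riverside Aquatic Centre,5 River Lane,-27.4
EOF

# Keep rows that mention Centre, then keep only fields 1-2 (comma-delimited)
grep -E Centre pools_demo.csv | cut -d "," -f 1-2 > centre_demo.csv
cat centre_demo.csv
```

One caveat: cut splits on every comma, so a quoted field that itself contains a comma would be cut in the wrong place.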

In Class exercise

Do example 1a in Star wars.

https://scf.berkeley.edu:3838/shiny/alucas/Lecture-20-collection/

1b. The unix command sed

Sed has many uses, but we will focus on using sed for substitution.

syntax: sed s/regex/replacement/FLAG file

OR

cat file | sed s/regex/replacement/FLAG

FLAG can be any of the following:

  • nothing: replace only the first instance of regex on each line with replacement
  • g: replace all instances of regex with replacement
  • n: where n is a number, replace only the nth instance of regex on each line with replacement
  • i: match regex in a case-insensitive manner (a GNU sed extension)

EXAMPLE:

echo one two three, three two one, one one hundred > file
cat file | sed s/one/ONE/g  
## ONE two three, three two ONE, ONE ONE hundred

EXAMPLE:

echo day sunday | sed s/day/night/
## night sunday
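
The examples above cover the g flag and the no-flag case. Here is a small sketch that runs all four flag choices on the same kind of input. The input strings are made up, and the i flag is assumed to be available, which is the case for GNU sed.

```shell
line="one two one two one"

# No flag: only the first match is replaced
echo "$line" | sed 's/one/ONE/'
## ONE two one two one

# g: every match is replaced
echo "$line" | sed 's/one/ONE/g'
## ONE two ONE two ONE

# A number: only that occurrence is replaced (here the 2nd)
echo "$line" | sed 's/one/ONE/2'
## one two ONE two one

# i: case-insensitive match (GNU sed), so the capitalized One is matched first
echo "One two one" | sed 's/one/1/i'
## 1 two one
```

Putting the sed expression in single quotes is optional for these simple patterns, but it keeps the shell from interpreting characters such as spaces or $ inside the expression.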

2. Converting one-liners at the command line into a shell script

Earlier we used commands like the following to make a file called small_potatoes:

wget -O potatoes.txt http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/potatoes.txt
cat potatoes.txt | cut -f 1-2 > small_potatoes
head -5 small_potatoes

Suppose we would like to actually make this into a script that we can reuse.

Steps:

  1. type nano potatoes.sh
  2. write the shebang line at the top of the script: #!/usr/bin/env bash
  3. write your script in nano with $1, $2, … as parameters

#!/usr/bin/env bash
# $1 and $2 are the first and last columns to keep; $3 is the number of lines to show
wget -O potatoes.txt http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/potatoes.txt
cat potatoes.txt | cut -f "$1-$2" > small_potatoes
head -n "$3" small_potatoes

  4. exit nano and save potatoes.sh

Files and directories in Unix may have three types of permissions: read (r), write (w), and execute (x). Each permission may be on or off for each of three categories of users: the file or directory owner; other people in the same group as the owner; and all others.

  5. in terminal, add permission to execute (chmod u+x potatoes.sh)

  6. run the script with arguments (./potatoes.sh 1 2 5); here $1 is 1, $2 is 2, and $3 is 5
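
The steps above can be tried end to end without nano or a download. This is a minimal sketch under some assumptions: the script name cut_cols.sh and the file demo_potatoes.txt are hypothetical, the data is invented, and the script reads a local file instead of calling wget. It also adds an argument-count check that is not part of the lecture script.

```shell
# Create a small tab-delimited demo file (invented data, not the real potatoes.txt)
printf 'area\ttemp\tsize\n1\t1\t1\n1\t2\t1\n1\t1\t2\n' > demo_potatoes.txt

# Write the script; the quoted 'EOF' keeps $1, $2, $3 literal inside the file
cat > cut_cols.sh <<'EOF'
#!/usr/bin/env bash
# Usage: ./cut_cols.sh FIRST_COL LAST_COL N_LINES
if [ "$#" -ne 3 ]; then
  echo "usage: $0 FIRST_COL LAST_COL N_LINES" >&2
  exit 1
fi
# Keep columns $1 through $2 and show the first $3 lines
cut -f "$1-$2" demo_potatoes.txt | head -n "$3"
EOF

# Give the owner (u) execute permission (x), then run with three arguments
chmod u+x cut_cols.sh
./cut_cols.sh 1 2 2
```

Inside the script, $1, $2, and $3 are replaced by the first, second, and third arguments, so this call keeps columns 1-2 and prints the first 2 lines.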

In Class exercise

Do examples 2a and 2b in sed and scripts

https://scf.berkeley.edu:3838/shiny/alucas/Lecture-20-collection/