R vs. Python vs. SQL

Lynna Jirpongopas

Class Introductions

  • Name
  • What do you do for work?
  • Which programming language do you already know?
  • Why take this class?
  • Fun fact!

Class Objectives

  • Build a basic understanding of the purpose of each of the 3 tools
  • Remove barriers & frustration
  • Get a taste for R, Python, and SQL
  • Name 2-3 similarities among the tools
  • Name 2-3 differences

Schedule

Time            Topics
9:00 - 9:30     Intro, Tools Overview & Syntax
9:30 - 9:40     Exercise
9:40 - 9:55     Loading a .csv File Demo
9:55 - 10:10    Exercise
10:10 - 10:15   5-minute break
10:15 - 10:30   Defining Data Types
10:30 - 10:45   Exercise
10:45 - 10:50   5-minute break
10:50 - 11:10   Simple Statistics
11:10 - 11:30   Exercise
11:30 - 11:50   Aggregate Functions
11:50 - 12:15   Last Exercise
12:15 - 12:30   Wrap up

Background History

SQL has been around since the early 1970s

  • Initially developed by IBM
  • Structured Query Language for manipulating and retrieving data

Python was invented in the late 1980s

  • Guido van Rossum at a Dutch Math & Computer Sci research center
  • General-purpose, high-level programming language

R 0.16 was released in 1997

  • Invented at University of Auckland, New Zealand
  • Statistical computing

Tools For Tools

R : RStudio, RGui
Packages : ggplot2, reshape

Python : Jupyter Notebook (IPython), Mac Terminal
Libraries : pandas, NumPy

SQL : PostgreSQL, SQLite, Toad for Oracle, and many more…

Syntax Differences Samples

R vs. Python Print

#This is R
print("Hello World")
#This is Python 3 (Python 2 also accepted: print "Hello World")
print("Hello World")

R vs. Python Math Operations 1/2

Same arithmetic symbols for addition, subtraction, multiplication, and division

+ - * /

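Note: in Python 2, the output number type matches the input type, so dividing two integers gives an integer. In Python 3, / always returns a float and // does integer division.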
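
A quick sketch of the operators in Python, with made-up operands. It assumes Python 3, and the values 7 and 2 are illustrative only.

```python
# Illustrative operands (not from the class data)
a = 7
b = 2

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # true division: always a float in Python 3
floored = a // b     # floor division: stays an integer for integer inputs

print(total, difference, product, quotient, floored)
```

If a whole-number result is wanted from a division, `//` is the operator to reach for; `/` keeps the decimal part.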

R vs. Python Math Operations 2/2

x to the power of y

#This is R
x^y
#This is Python
pow(x, y)

x ** y
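
A short sketch showing that the two Python forms agree, using made-up values for x and y. It also shows a common trap for people coming from R: `^` is valid Python, but it means bitwise XOR, not power.

```python
# Illustrative values (hypothetical, not from the exercises)
x = 2
y = 10

with_function = pow(x, y)   # built-in function form
with_operator = x ** y      # operator form

print(with_function, with_operator)

# Careful: ^ runs without error in Python but is bitwise XOR, not power
xor_result = x ^ y
print(xor_result)
```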

Exercise Problems

Find a partner. Take turns writing in R & Python.

1) Print Hello World

2) 6 to the power of 4

3) Make x=3, m=8, b=6, and y=mx+b.

What is y?

4) What is 8 divided by 3?

The answer should be in decimal form.

Housekeeping before loading data

Know the path to your data
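
One way to check a path from Python before loading anything, sketched with the standard-library `pathlib` module. The folder names reuse the ones from the loading slides and are assumptions about your machine; replace them with your own.

```python
from pathlib import Path

# Hypothetical location: adjust to wherever you saved the class data
data_file = Path.home() / "Documents" / "girldevelopit" / "fun_data" / "vdeOilData.csv"

print(data_file)            # full path that would be handed to the loader
print(data_file.exists())   # True only if the file is really there
```

Building the path piece by piece avoids having to think about forward slashes versus escaped backslashes.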

.csv Load R

#define path to file (keep the whole path on one line)
dataFile <- "C:/Users/lynnaj/Documents/girldevelopit/fun_data/vdeOilData.csv"

#load data
vdeOilDF <- read.csv(dataFile)

#view the first 6 rows of the data (head() shows 6 by default)
head(vdeOilDF)

#https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

#You can also double-click the data frame in the Environment tab

.csv Load Python

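import pandas as pd

#define path to file (backslashes in a Windows path must be doubled)
datafile = 'C:\\Users\\lynnaj\\Documents\\girldevelopit\\fun_data\\vdeOilData.csv'

#load data
vdeOilDF = pd.read_csv(datafile)

#view the first 5 rows of the data
vdeOilDF.head()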

.csv Load PostgreSQL

Create an empty table:

CREATE TABLE VDEandOil (vde_prices double precision, week_number double precision, month varchar, year double precision, oil_prices double precision);

Stuff the data into the table:

COPY VDEandOil FROM '/path/to/VDEandOil.txt' DELIMITER ',' CSV;

http://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table
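
If PostgreSQL is not installed yet, the same two steps (create an empty table, then fill it) can be tried from Python with the standard-library `sqlite3` module. This is only a sketch that swaps SQLite in for PostgreSQL's COPY; the two sample rows are made up for illustration, and the column names simply mirror the CREATE TABLE statement above.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for the real file (no header line)
sample_csv = io.StringIO(
    "23.5,1,January,2015,48.2\n"
    "24.1,2,January,2015,47.9\n"
)

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Step 1: create an empty table
conn.execute(
    "CREATE TABLE VDEandOil ("
    "vde_prices REAL, week_number REAL, month TEXT, "
    "year REAL, oil_prices REAL)"
)

# Step 2: insert every CSV row into the table
rows = list(csv.reader(sample_csv))
conn.executemany("INSERT INTO VDEandOil VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Confirm how many rows were loaded
row_count = conn.execute("SELECT COUNT(*) FROM VDEandOil").fetchone()[0]
print(row_count)
```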

Exercise

1) Read vdeOilData.csv or sensorsData.csv into RStudio

2) Read the same file into your IPython Notebook

3) Import the file into your PostgreSQL DB

Answers

vdeOilDF$Date <- as.Date(vdeOilDF$Date, format="%Y-%m-%d")
vdeOilDF$weekNumber <- as.factor(vdeOilDF$weekNumber)
vdeOilDF$vdeClose <- as.numeric(vdeOilDF$vdeClose)

R Data Types

Python Data Types
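
A small sketch of checking and converting types in plain Python, using made-up values. Values read with the standard `csv` module arrive as strings, so conversion is a common first step; for a pandas column the equivalent is `astype`, which is not shown here.

```python
# Made-up values, as they might arrive from a text file
week_number = "12"
price_per_barrel = "48.25"

print(type(week_number))   # a string at this point

week_number = int(week_number)              # string -> integer
price_per_barrel = float(price_per_barrel)  # string -> float
label = str(week_number)                    # integer -> string

print(type(week_number), type(price_per_barrel), type(label))
```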

SQL Data Types

Exercise

Answers

vdeOilDF$pricePerBarrel <- as.numeric(vdeOilDF$pricePerBarrel)
vdeOilDF$year <- as.numeric(vdeOilDF$year)
vdeOilDF$month <- as.factor(vdeOilDF$month)

R Statistics Summary

summary(vdeOilDF)

mean(vdeOilDF$pricePerBarrel)

max(vdeOilDF$pricePerBarrel)
min(vdeOilDF$pricePerBarrel)
sd(vdeOilDF$pricePerBarrel)

Python Statistics Summary
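
A minimal sketch of the same summary numbers in Python, using only the standard-library `statistics` module on a made-up list of prices. With a pandas data frame already loaded as `vdeOilDF`, `vdeOilDF.describe()` would be the closer analogue of R's `summary()`.

```python
import statistics

# Made-up prices for illustration (not the class data)
prices = [40.0, 45.0, 50.0, 55.0, 60.0]

average = statistics.mean(prices)
highest = max(prices)
lowest = min(prices)
spread = statistics.stdev(prices)   # sample standard deviation, like R's sd()

print(average, highest, lowest, round(spread, 2))
```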

SQL Statistics Summary

Exercise

R Aggregate Functions

Python Aggregate Functions
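
A sketch of a grouped average in plain Python, assuming made-up (month, price) rows. With pandas the usual route would be `groupby`; the standard-library version below just makes the two steps (group, then summarize) visible.

```python
from collections import defaultdict
from statistics import mean

# Made-up (month, price) rows for illustration
rows = [("January", 40.0), ("January", 50.0), ("February", 60.0)]

# Step 1: collect the prices for each month
by_month = defaultdict(list)
for month, price in rows:
    by_month[month].append(price)

# Step 2: reduce each group to a single number
monthly_mean = {month: mean(prices) for month, prices in by_month.items()}

print(monthly_mean)
```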

SQL Aggregate Functions
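
A sketch of a SQL aggregate query, run from Python through the standard-library `sqlite3` module so it works without a database server. The table name, columns, and rows are all hypothetical; the GROUP BY, AVG, and COUNT syntax is standard SQL and should carry over to PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical table with made-up rows
conn.execute("CREATE TABLE oil (month TEXT, oil_prices REAL)")
conn.executemany(
    "INSERT INTO oil VALUES (?, ?)",
    [("January", 40.0), ("January", 50.0), ("February", 60.0)],
)

# One output row per month: average price and number of rows
query = """
    SELECT month, AVG(oil_prices), COUNT(*)
    FROM oil
    GROUP BY month
    ORDER BY month
"""
results = conn.execute(query).fetchall()

for month, avg_price, n_rows in results:
    print(month, avg_price, n_rows)
```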

Exercise

R Visualization

Python Visualization

Wrap up

Online Resources