D) Probability Distributions and Generating Random Variables

Some examples:

Roll a die 10 times

sample(1:6,10,replace=TRUE)

##  [1] 3 1 1 5 5 1 1 5 5 2
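Ten rolls are too few to say much about the die. As a rough sketch, many rolls can be simulated and the relative frequency of each face compared with the theoretical probability 1/6. The number of rolls (10,000) and the seed are arbitrary choices made for illustration.

```r
## Simulate 10,000 rolls of a fair die (count and seed chosen arbitrarily)
set.seed(1)
rolls = sample(1:6,10000,replace=TRUE)
## Relative frequency of each face; each should be close to 1/6 (about 0.167)
freq = table(rolls)/length(rolls)
freq
```

With more rolls the frequencies settle closer to 1/6, which is the usual way to check a simulated distribution against its theoretical one.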

Toss a coin 10 times

sample(c('H','T'),10,replace=TRUE)

##  [1] "T" "T" "T" "T" "T" "T" "H" "H" "T" "T"
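sample() also accepts a prob argument giving the probability of drawing each element, so unequal outcomes can be simulated too. In this sketch of a biased coin, the 70% chance of heads is an assumed value for illustration only.

```r
## prob gives the chance of each element: an assumed 70% heads, 30% tails
set.seed(2)
flips = sample(c('H','T'),1000,replace=TRUE,prob=c(0.7,0.3))
## Proportion of heads in the 1000 tosses; should be near 0.7
mean(flips == 'H')
```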

Lottery: pick 6 from 1-54

## Default is without replacement
sample(1:54,6)

## [1] 39 11 45  6 21 28
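Because the draw is without replacement and the order of the numbers does not matter, the number of possible tickets is the binomial coefficient \(\binom{54}{6}\), which choose() computes. The sketch below assumes that a ticket wins only by matching all six numbers. That rule is an assumption for illustration, not something stated above.

```r
## Number of distinct 6-number tickets from 1-54 (order does not matter)
n.tickets = choose(54,6)
n.tickets
## [1] 25827165
## Chance that one ticket matches all six numbers (assumed winning rule)
1/n.tickets
## Sampling without replacement never repeats a number within a draw
draw = sample(1:54,6)
anyDuplicated(draw) == 0
```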

Deal five cards

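## 13 ranks repeated 4 times; the 4 suits are recycled along them, and since
## 13 and 4 have no common factor, every rank-suit pair appears exactly once
cards = paste(rep(c('A',2:10,'J','Q','K'),4),c('H','D','S','C'))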
sample(cards,5)

## [1] "A C" "9 S" "7 C" "J C" "A H"
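The one-line construction of cards relies on R recycling the shorter vector of suits. As a minimal sketch, a quick sanity check confirms that the deck really holds 52 distinct cards with 13 of each suit. It then counts the hearts in a random hand. Taking the last character as the suit is an assumption that holds only for this particular labelling.

```r
cards = paste(rep(c('A',2:10,'J','Q','K'),4),c('H','D','S','C'))
## 52 cards, and no card appears twice
length(cards)
length(unique(cards))
## Count the cards of each suit (the suit is the last character of each label)
table(substr(cards,nchar(cards),nchar(cards)))
## Number of hearts in a random five-card hand
hand = sample(cards,5)
sum(endsWith(hand,'H'))
```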

E) Setting a Seed

Every sequence of pseudo-random numbers starts from a seed. If R picks the seed, you will get a different sequence of numbers each time you run the same random number generating code. Feel free to run sample(cards,5) or any of the other examples again and you will see the results change from one run to the next. If you set the seed yourself, you will get the same sequence every time (this comes in handy when debugging code!).

## Without seed
rnorm(2)

## [1]  0.8090746 -1.3322895

rnorm(2)

## [1] -1.5696463 -0.1606194

## With seed (any integer will do)
set.seed(318)
rnorm(2)

## [1] -1.2718108 -0.7809122

## And again
set.seed(318)
rnorm(2)

## [1] -1.2718108 -0.7809122
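The same reproducibility applies to every random generator in R, including sample(). Here is a minimal check with the card deck, reusing the seed from the example above. Note that R 3.6.0 changed the default method sample() uses to turn the seed into draws, so the same seed can give a different hand under older versions of R.

```r
cards = paste(rep(c('A',2:10,'J','Q','K'),4),c('H','D','S','C'))
set.seed(318)
hand1 = sample(cards,5)
## Resetting the seed restarts the same pseudo-random sequence
set.seed(318)
hand2 = sample(cards,5)
identical(hand1,hand2)
## [1] TRUE
```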

F) Computation Time

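Sometimes it is useful to time your code: how long does it take to execute? The function proc.time() reports the CPU and elapsed time used by the R session so far. Calling it before and after a piece of code and subtracting the two results gives the time that code took.

The elapsed component of that difference is the actual (wall-clock) time your computer spends on the whole endeavor. Let's look at the following example.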

start = proc.time()
RandNorm = rnorm(10000000)
elapsed = as.numeric(proc.time()-start)
elapsed

## [1] 0.89 0.00 0.97   NA   NA

The object returned by proc.time() has class "proc_time" and is a numeric vector of length 5, containing:

The user CPU time - the CPU time spent by the current process (i.e., the current R session)
The system time - the CPU time spent by the kernel (the operating system) on behalf of the current process. The operating system is used for things like opening files, doing input or output, starting other processes, and looking at the system clock: operations that involve resources that many processes must share.
The total elapsed (wall-clock) time for the currently running R process
The cumulative user and system times of any child processes spawned by it on which it has waited. These are the 4th and 5th entries; on Windows they are not available and are reported as NA. (The print method uses the summary method to combine the child times with those of the main process.)
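Subtracting two proc.time() results keeps the component names. So instead of converting the difference to a bare numeric vector with as.numeric(), one can pull out a single component by name. Base R's system.time() does the same start/stop bookkeeping around a single expression. This is a sketch; the vector size 1e6 is arbitrary.

```r
start = proc.time()
RandNorm = rnorm(1e6)
timing = proc.time() - start
## Component names: user.self, sys.self, elapsed, user.child, sys.child
names(timing)
## Wall-clock seconds only
timing[["elapsed"]]
## system.time() times one expression and returns the same kind of object
st = system.time(rnorm(1e6))
st[["elapsed"]]
```

system.time() is usually the more convenient choice for timing one expression, while the proc.time() difference is handy when the code to be timed spans several lines.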

If you would like to learn more about the differences between user time and system time, see this Stack Overflow question: https://stackoverflow.com/questions/5688949/what-are-user-and-system-times-measuring-in-r-system-timeexp-output. On modern systems the times are accurate to the millisecond. On older systems they might only be accurate to \(\frac{1}{100}\) or \(\frac{1}{60}\) of a second, which is still pretty accurate.

ASSIGNMENT 10 SHOULD BE POSTED!

Data Management

An extremely handy function is setwd() (set working directory). Its input should be the full file path to the folder you would like to work from. Alternatively, in RStudio you can click Session -> Set Working Directory -> Choose Directory and select the folder manually (I think this is easier). Setting the working directory saves you from typing out the entire path every time you load or create a file, because relative file names are then resolved against that folder.
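Here is a minimal sketch of setwd() and its companion getwd(). To keep the example runnable on any machine, it switches to R's per-session temporary directory, tempdir(), rather than a real project folder. The file name hello.txt is hypothetical. setwd() invisibly returns the previous directory, which makes it easy to switch back.

```r
## Current working directory
getwd()
## Switch to the session's temporary directory, remembering where we were
old = setwd(tempdir())
inside = getwd()
## A relative file name (hypothetical hello.txt) now lands in the new directory
cat("hello\n",file="hello.txt")
file.exists(file.path(tempdir(),"hello.txt"))
## Return to the original working directory
setwd(old)
```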

A) Data Export

cat(): outputs the objects passed to it, concatenating their representations. cat() performs much less conversion than print(): it converts its arguments to character vectors, concatenates them into a single character vector, separates the elements with the sep= string(s) (a single space by default), and then outputs them.

cat("Good Morning! \nHi!")

## Good Morning! 
## Hi!
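Here is a short sketch of how sep changes what cat() prints, and how cat() differs from print() on the same character vector.

```r
words = c("a","b","c")
## sep is placed between the elements; no newline is added at the end
cat(words,sep="-")
cat("\n")
## Numbers are converted to character and separated by the default sep=" "
cat("x =",3.5,"\n")
## print() instead shows the [1] index and the quotes around strings
print(words)
```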

## This will create a text file in your working directory
#cat(file="test.txt","123456","987654",sep='\n')
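The commented-out line writes test.txt into the working directory. The sketch below writes to a throwaway path from tempfile() instead, so nothing is left behind. It then reads the lines back with readLines() to confirm what was written.

```r
## A temporary file path stands in for "test.txt"
f = tempfile(fileext=".txt")
cat("123456","987654",file=f,sep='\n')
## warn=FALSE silences the warning readLines() gives if the last line has no newline
txt = readLines(f,warn=FALSE)
txt
## [1] "123456" "987654"
```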

A matrix or data frame is typically written as a rectangular grid of data, possibly with row and column labels. write.table() converts its first argument to a data frame (if it is not one already) and writes it as text: to a file if one is specified, otherwise to the console. write.csv() is a convenience version of write.table() that creates a comma-separated (csv) file. write() is meant for vectors and matrices. Let's create a data frame to write out to our working directory.

age = 18:29
height = c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
village = data.frame(Age=age,Height=height)
## Write this data frame to a file in the working directory
#write.table(village,file='village.txt',sep="\t",col.names=NA,quote=FALSE)
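To see what the write.table() call above produces without adding a file to the working directory, one can write to a temporary file (standing in for village.txt) and look at the raw lines. The sketch also shows write.csv(), with read.csv() reading the data back in. row.names=FALSE is an illustrative choice here, so that the csv holds only the two data columns.

```r
age = 18:29
height = c(76.1,77,78.1,78.2,78.8,79.7,79.9,81.1,81.2,81.8,82.8,83.5)
village = data.frame(Age=age,Height=height)
## Tab-separated file with row names; col.names=NA leaves the first header field blank
f = tempfile(fileext=".txt")
write.table(village,file=f,sep="\t",col.names=NA,quote=FALSE)
head(readLines(f),3)
## CSV version without row names, read back into a data frame
g = tempfile(fileext=".csv")
write.csv(village,file=g,row.names=FALSE)
back = read.csv(g)
str(back)
```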

Try adding these arguments one at a time to see their effect. sep="\t" separates the columns with a tab. col.names=NA adds a blank column name above the row names, so the column titles line up with their columns when row names are exported. quote=FALSE suppresses the quotes around character values and around the row and column names (village has no character columns, so here it only affects the names). We can also write matrix data to a file.

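x = matrix(1,20,20)
## By default write() puts 5 values on each line
#write(x,file='matrix.txt')
## ncolumns=20 puts 20 values per line; values are written column by column,
## so use write(t(x),...) if each line should match a row of x
#write(x,file='matrix.txt',ncolumns=20)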
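Because R stores matrices column by column, write() outputs a matrix in column order unless it is transposed first. Since x above is all ones, the difference is invisible there. This sketch uses a small non-constant matrix, written to a temporary file, so the effect of t() shows up.

```r
m = matrix(1:6,nrow=2)
m
f = tempfile(fileext=".txt")
## Without t(): values come out in storage (column) order, 3 per line
write(m,file=f,ncolumns=3)
by.col = readLines(f,warn=FALSE)
by.col
## With t(): each line of the file is one row of m
write(t(m),file=f,ncolumns=ncol(m))
by.row = readLines(f,warn=FALSE)
by.row
```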

Probability Distributions Continued, Random Number Generation, and Data Management