Data 604 Discussion02 Farhana Zahir

Random Seed

What is a random seed in simulation, and why should we care?

A random seed is used to ensure that results are reproducible. In other words, using this parameter makes sure that anyone who re-runs the code will get the exact same outputs. Reproducibility is an extremely important concept in data science and other fields.

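How does the seed function work?

A pseudo-random number generator produces its numbers deterministically from an internal state. The seed function sets that starting state, so the same seed value yields the same sequence of random numbers on every execution of the code, on the same machine or on different machines (given the same generator and library version). After seeding, each new number is computed from the state left by the previous draw. If no seed is supplied, Python's random module initializes the state from operating-system randomness when available and falls back to the current system time otherwise.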

There are two very common tasks where random seeds are used:

Splitting data into training/validation/test sets: random seeds ensure that the data is divided the same way every time the code is run
Model training: algorithms such as random forest and gradient boosting are non-deterministic (for a given input, the output is not always the same) and so require a random seed argument for reproducible results
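
A minimal sketch of the first task, using only the standard library. The `split_indices` helper, the 70/15/15 proportions, and the seed value are assumptions chosen for illustration; libraries such as scikit-learn expose the same idea through a `random_state` argument.

```python
import random

def split_indices(n_rows, seed, train_frac=0.7, valid_frac=0.15):
    """Shuffle row indices with a seeded generator and cut them into three parts."""
    rng = random.Random(seed)      # private generator, leaves the global state untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = int(n_rows * train_frac)
    n_valid = int(n_rows * valid_frac)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

first = split_indices(100, seed=42)
second = split_indices(100, seed=42)
print(first == second)   # True: the same seed reproduces the same split
```

Using a dedicated `random.Random` instance keeps the split independent of any other code that draws from the global generator.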

In addition to reproducibility, random seeds are also important for benchmarking results. If we are testing multiple versions of an algorithm, it’s important that all versions use the same data and are as similar as possible (except for the parameters we are testing).

# example of random.seed
# Running without seed
import random

for i in range(3):
    print(random.randint(1, 1000))

# This gives us three different numbers

# Running the same function with a seed that is reset before every draw

for i in range(3):
    random.seed(1006)
    print(random.randint(1, 1000))  # this gives us the same number all three times

58
235
455
355
355
355
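
The loop above reseeds before every draw, so it repeats a single number. A more typical pattern is to seed once and then replay a whole sequence. The sketch below reuses the seed value 1006 from the example; the variable names are only illustrative.

```python
import random

random.seed(1006)                 # seed once, before drawing
first_run = [random.randint(1, 1000) for _ in range(3)]

random.seed(1006)                 # reseeding rewinds the sequence to its start
second_run = [random.randint(1, 1000) for _ in range(3)]

print(first_run)
print(first_run == second_run)    # True: the whole sequence repeats
```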

Syntax of random seed

random.seed(a=None, version=2)

This function accepts two parameters. Both are optional.

a is the seed value. If a is omitted or None, the current system time is used by default; if the operating system provides randomness sources, they are used instead of the system time. If a is an integer, it is used as is.
version controls how non-integer seeds are handled. With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used.
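
A short sketch of the seed types described above. The `draw_three` helper and the seed values are made up for illustration; the point is only that an int, a str, and a bytes seed are each reproducible.

```python
import random

def draw_three(seed):
    """Seed the global generator and return three draws."""
    random.seed(seed)
    return [random.randint(1, 1000) for _ in range(3)]

print(draw_three(1006) == draw_three(1006))              # int seed, used directly
print(draw_three("data604") == draw_three("data604"))    # str seed, converted to an int
print(draw_three(b"data604") == draw_three(b"data604"))  # bytes seed, also converted to an int
```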

I could not find any source that explains how to trace back the random seed chosen by the system when a=None.
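
One possible workaround, offered as a sketch rather than a way to recover the seed itself: the random module provides `random.getstate()` and `random.setstate()`, which capture and restore the generator's internal state. Restoring a saved state replays the same numbers even when the seed is unknown.

```python
import random

random.seed()                      # a=None: seeded by the system
saved_state = random.getstate()    # snapshot of the generator's internal state

first_run = [random.randint(1, 1000) for _ in range(3)]

random.setstate(saved_state)       # rewind to the snapshot
second_run = [random.randint(1, 1000) for _ in range(3)]

print(first_run == second_run)     # True, even though the seed was never seen
```

The snapshot has to be taken before the draws that should be reproduced.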

# This code generates a seed value by drawing a random integer from the generator
# (which Python seeds from the operating system or the system time), and prints the seed

import random
import sys

# create a seed
seedValue = random.randrange(sys.maxsize)
# save this seed somewhere, so if you like the result you can use this seed to reproduce it

# Now, Seed the random number generator
random.seed(seedValue)
print("Seed was:", seedValue)

num = random.randint(10, 500)
print("Random Number", num)

# About sys.maxsize:
# An integer giving the maximum value a variable of type Py_ssize_t can take. It’s usually 2**31 - 1 on a 32-bit platform
# and 2**63 - 1 on a 64-bit platform.

Seed was: 7109714834504173296
Random Number 469

How was simulation used in the article this week?

The article this week used Monte Carlo simulation to estimate the number of aircraft required to cover demand by day of operation. The study uses repeated sampling and flexible parameters so that alternative scenarios can be investigated. 250 simulation runs were completed. The article shows the results of two scenarios in Fig. 3: scenario 1 uses daily operational flight hours of 6.0 to 8.0 hours, and scenario 2 sets the same parameter to 2.5 to 8.0 hours.
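
To make the idea concrete, here is a toy Monte Carlo sketch. It is not the article's model: the `simulate_aircraft_required` function, the uniform distributions, the demand range of 40 to 60 flight hours per day, and the seed are all assumptions. Only the 250 runs and the two ranges of daily operational flight hours come from the article.

```python
import math
import random
import statistics

def simulate_aircraft_required(min_hours, max_hours, runs=250, seed=604):
    """Toy Monte Carlo: aircraft needed to cover one day's demand, repeated `runs` times."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        demand_hours = rng.uniform(40.0, 60.0)                   # hypothetical daily demand
        hours_per_aircraft = rng.uniform(min_hours, max_hours)   # daily operational flight hours
        results.append(math.ceil(demand_hours / hours_per_aircraft))
    return results

scenario_1 = simulate_aircraft_required(6.0, 8.0)   # hours range of scenario 1
scenario_2 = simulate_aircraft_required(2.5, 8.0)   # hours range of scenario 2

for name, runs in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    print(name, "mean:", statistics.mean(runs), "max:", max(runs))
```

Because each call uses a seeded generator, rerunning the script reproduces the same 250 draws per scenario.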

Share some code from last week's exercises that was particularly rewarding or frustrating. We will comment on each other's works.

# Create a new State object with an additional state variable, t_first_empty, initialized to -1 as a special value to 
#indicate that it has not been set

bikeshare = State(olin=10, wellesley=2, 
                  olin_empty=0, wellesley_empty=0,
                  clock=0, t_first_empty=-1)

# I did not understand why t_first_empty is set to -1 and not to 0, but a classmate explained that if it is set to 0 and
# there is no frustrated customer, we will not be able to tell that apart from a station that ran empty at time 0.

# Code that is really frustrating me right now has to do with the following numpy functions

# What is the difference between np.random.random and np.random.rand?
# I have plotted both below and cannot seem to find what is special about each.

import numpy as np
import matplotlib.pyplot as plt

caserandom=np.random.random(100)
plt.plot(caserandom)
plt.ylim(0,1)

(0, 1)

caserand=np.random.rand(100)
plt.plot(caserand)
plt.ylim(0,1)

(0, 1)
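
As far as I can tell, the two functions draw from the same uniform distribution on [0, 1) and differ only in how the shape is passed: `np.random.random` takes a single size argument (an int or a tuple), while `np.random.rand` takes each dimension as a separate argument. The sketch below checks this by reseeding before each call; the seed value and shape are arbitrary.

```python
import numpy as np

np.random.seed(604)
from_random = np.random.random((2, 3))   # shape passed as one tuple

np.random.seed(604)
from_rand = np.random.rand(2, 3)         # shape passed as separate arguments

print(np.array_equal(from_random, from_rand))   # True: same generator, same values
```

This would explain why the two plots look alike: without a fixed seed they are simply two different samples from the same distribution.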