A stock market is where buyers and sellers trade shares of a company, and is one of the most popular ways for individuals and companies to invest money. The size of the world stock market is now estimated to be in the trillions of dollars. The largest stock market in the world is the New York Stock Exchange (NYSE), located in New York City. About 2,800 companies are listed on the NYSE. In this problem, we’ll look at the monthly stock prices of five of these companies: IBM, General Electric (GE), Procter and Gamble, Coca-Cola, and Boeing. The data used in this problem comes from Infochimps.
Download and read the following files into R, using the read.csv function: IBMStock.csv, GEStock.csv, ProcterGambleStock.csv, CocaColaStock.csv, and BoeingStock.csv. Each data frame has two variables:
Date: the date of the stock price, always given as the first of the month.
StockPrice: the average stock price of the company in the given month.
In this problem, we’ll take a look at how the stock dynamics of these companies have changed over time.
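Right now, the date variable is stored as a factor (or as a character vector in R 4.0 and later, where read.csv no longer converts strings to factors by default). We can convert this to a “Date” object in R by using the following five as.Date commands (one for each data set):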
# kable(), used below to print tables, comes from the knitr package
library(knitr)
# Load the datasets
IBM = read.csv("IBMStock.csv")
GE = read.csv("GEStock.csv")
ProcterGamble = read.csv("ProcterGambleStock.csv")
CocaCola = read.csv("CocaColaStock.csv")
Boeing = read.csv("BoeingStock.csv")
# Convert the factor into a date object
IBM$Date = as.Date(IBM$Date, "%m/%d/%y")
GE$Date = as.Date(GE$Date, "%m/%d/%y")
CocaCola$Date = as.Date(CocaCola$Date, "%m/%d/%y")
ProcterGamble$Date = as.Date(ProcterGamble$Date, "%m/%d/%y")
Boeing$Date = as.Date(Boeing$Date, "%m/%d/%y")
# Display the structure of the data frame
str(IBM)
## 'data.frame': 480 obs. of 2 variables:
## $ Date : Date, format: "1970-01-01" "1970-02-01" "1970-03-01" "1970-04-01" ...
## $ StockPrice: num 360 347 327 320 270 ...
Using the str function, we can see that each data set has 480 observations. We have monthly data for 40 years, so there are 12*40 = 480 observations.
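To make the date conversion and the 480-observation count concrete, here is a minimal sketch. The raw strings are hypothetical examples of the month/day/two-digit-year layout implied by the "%m/%d/%y" format; the actual contents of the CSV files are not shown here.

```r
# Hypothetical raw date strings in month/day/two-digit-year form
raw <- c("1/1/70", "2/1/70", "12/1/09")

# %y maps two-digit years 69-99 to 19xx and 00-68 to 20xx
parsed <- as.Date(raw, format = "%m/%d/%y")
print(parsed)

# Count the months from January 1970 through December 2009
all_months <- seq(as.Date("1970-01-01"), as.Date("2009-12-01"), by = "month")
n_months <- length(all_months)
print(n_months)  # 480, matching 12 * 40
```

Because of how %y resolves the century, this format is only safe while the data stays within 1969 to 2068, which holds for the 1970 to 2009 range used here.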
# Outputs the summary
z = summary(IBM)
kable(z)

| Date | StockPrice |
|---|---|
| Min. :1970-01-01 | Min. : 43.40 |
| 1st Qu.:1979-12-24 | 1st Qu.: 88.34 |
| Median :1989-12-16 | Median :112.11 |
| Mean :1989-12-15 | Mean :144.38 |
| 3rd Qu.:1999-12-08 | 3rd Qu.:165.41 |
| Max. :2009-12-01 | Max. :438.90 |
Using the summary function, the minimum value of the Date variable is January 1, 1970 for any dataset.
Using the summary function, the maximum value of the Date variable is December 1, 2009 for any dataset.
By typing summary(IBM), we can see that the mean value of the IBM StockPrice is 144.38.
# Outputs the summary
z = summary(GE)
kable(z)

| Date | StockPrice |
|---|---|
| Min. :1970-01-01 | Min. : 9.294 |
| 1st Qu.:1979-12-24 | 1st Qu.: 44.214 |
| Median :1989-12-16 | Median : 55.812 |
| Mean :1989-12-15 | Mean : 59.303 |
| 3rd Qu.:1999-12-08 | 3rd Qu.: 72.226 |
| Max. :2009-12-01 | Max. :156.844 |
By typing summary(GE), we can see that the minimum value of the GE StockPrice is 9.294.
# Outputs the summary
z = summary(CocaCola)
kable(z)

| Date | StockPrice |
|---|---|
| Min. :1970-01-01 | Min. : 30.06 |
| 1st Qu.:1979-12-24 | 1st Qu.: 42.76 |
| Median :1989-12-16 | Median : 51.44 |
| Mean :1989-12-15 | Mean : 60.03 |
| 3rd Qu.:1999-12-08 | 3rd Qu.: 69.62 |
| Max. :2009-12-01 | Max. :146.58 |
By typing summary(CocaCola), we can see that the maximum value of the Coca-Cola StockPrice is 146.58.
# Outputs the summary
z = summary(Boeing)
kable(z)

| Date | StockPrice |
|---|---|
| Min. :1970-01-01 | Min. : 12.74 |
| 1st Qu.:1979-12-24 | 1st Qu.: 34.64 |
| Median :1989-12-16 | Median : 44.88 |
| Mean :1989-12-15 | Mean : 46.59 |
| 3rd Qu.:1999-12-08 | 3rd Qu.: 57.21 |
| Max. :2009-12-01 | Max. :107.28 |
By typing summary(Boeing), we can see that the median value of the Boeing StockPrice is 44.88.
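The values read off the summary tables can also be computed directly, which is handy when only one statistic is needed. The sketch below uses a small made-up price vector (not the real stock data) and also shows one way to find the date on which the maximum occurs.

```r
# Toy data: five monthly prices (made-up values, not from the CSV files)
dates  <- seq(as.Date("1970-01-01"), by = "month", length.out = 5)
prices <- c(10, 20, 30, 40, 100)

# Individual statistics that summary() reports together
stats <- c(min = min(prices), median = median(prices),
           mean = mean(prices), max = max(prices))
print(stats)

# Date of the highest price: which.max returns the position of the maximum
peak_date <- dates[which.max(prices)]
print(peak_date)
```

On the real data the same calls would be applied to a column such as Boeing$StockPrice.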
# Calculate the standard deviation
sd(ProcterGamble$StockPrice)
## [1] 18.19414
By typing sd(ProcterGamble$StockPrice), we can see that the standard deviation of the Procter & Gamble StockPrice is 18.19414.
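For reference, R's sd function computes the sample standard deviation, which divides by n - 1 rather than n. The sketch below checks this on a small made-up vector; the numbers are illustrative only and unrelated to the stock data.

```r
# Toy vector (made-up values)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample standard deviation written out by hand (n - 1 denominator)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Built-in equivalent
builtin_sd <- sd(x)

print(c(manual = manual_sd, builtin = builtin_sd))
```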
# Line plot of Coca-Cola's stock price over time
plot(CocaCola$Date, CocaCola$StockPrice, type="l")
# Line plot comparing Coca-Cola and Procter & Gamble
plot(CocaCola$Date, CocaCola$StockPrice, type="l", col="red")
lines(ProcterGamble$Date, ProcterGamble$StockPrice, col="blue")
# Vertical line marking March 2000
abline(v=as.Date(c("2000-03-01")), lwd=2)
# Both series are drawn as solid lines, so the legend uses lty=1 for each
legend("bottomleft",
legend=c("Coca Cola", "ProcterGamble"),
col=c("red", "blue"), lty=1, cex=0.8)
Procter and Gamble.
Coca-Cola.
Coca-Cola.
Let’s take a look at how the stock prices changed from 1995-2005 for all five companies. In your R console, start by typing the following plot command:
# Line plot of all five companies, 1995-2005 (rows 301 to 432)
plot(CocaCola$Date[301:432], CocaCola$StockPrice[301:432], type="l", col="red", ylim=c(0,210), xlab = "Date", ylab = "Stock Price")
lines(ProcterGamble$Date[301:432], ProcterGamble$StockPrice[301:432], col="blue")
lines(IBM$Date[301:432], IBM$StockPrice[301:432], col="green")
lines(GE$Date[301:432], GE$StockPrice[301:432], col="purple")
lines(Boeing$Date[301:432], Boeing$StockPrice[301:432], col="orange")
# Vertical line marking March 2000
abline(v=as.Date(c("2000-03-01")), lwd=2)
# All series are drawn as solid lines, so the legend uses lty=1 for each
legend("topleft",
legend=c("Coca Cola", "ProcterGamble", "IBM", "GE", "Boeing"),
col=c("red", "blue", "green", "purple", "orange"), lty=1, cex=0.8)
General Electric (GE).
IBM.
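The row range 301:432 used in the plot commands corresponds to January 1995 through December 2005. As a sketch, the same indices can be derived from the dates themselves instead of being counted by hand; the date vector below is rebuilt with seq under the assumption, stated earlier, that each data set holds one row per month from January 1970 to December 2009.

```r
# Monthly dates from January 1970 through December 2009 (480 values)
dates <- seq(as.Date("1970-01-01"), as.Date("2009-12-01"), by = "month")

# Positions of the dates that fall inside the 1995-2005 window
idx <- which(dates >= as.Date("1995-01-01") & dates <= as.Date("2005-12-01"))

print(range(idx))   # 301 432
print(length(idx))  # 132 months = 11 years
```

With the real data, a condition on IBM$Date would play the role of the condition on dates here, which avoids hard-coding row numbers.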
# Line plot of all five companies, 1995-2005 (rows 301 to 432)
plot(CocaCola$Date[301:432], CocaCola$StockPrice[301:432], type="l", col="red", ylim=c(0,210), xlab = "Date", ylab = "Stock Price")
lines(ProcterGamble$Date[301:432], ProcterGamble$StockPrice[301:432], col="blue")
lines(IBM$Date[301:432], IBM$StockPrice[301:432], col="green")
lines(GE$Date[301:432], GE$StockPrice[301:432], col="purple")
lines(Boeing$Date[301:432], Boeing$StockPrice[301:432], col="orange")
# Vertical lines marking September 1997 and November 1997
abline(v=as.Date(c("1997-09-01")), lwd=2)
abline(v=as.Date(c("1997-11-01")), lwd=2)
# All series are drawn as solid lines, so the legend uses lty=1 for each
legend("topleft",
legend=c("Coca Cola", "ProcterGamble", "IBM", "GE", "Boeing"),
col=c("red", "blue", "green", "purple", "orange"), lty=1, cex=0.8)
Two companies had a decreasing trend in stock prices from September 1997 to November 1997: Boeing and Procter & Gamble.
# Line plot of all five companies, 1995-2005 (rows 301 to 432)
plot(CocaCola$Date[301:432], CocaCola$StockPrice[301:432], type="l", col="red", ylim=c(0,210), xlab = "Date", ylab = "Stock Price")
lines(ProcterGamble$Date[301:432], ProcterGamble$StockPrice[301:432], col="blue")
lines(IBM$Date[301:432], IBM$StockPrice[301:432], col="green")
lines(GE$Date[301:432], GE$StockPrice[301:432], col="purple")
lines(Boeing$Date[301:432], Boeing$StockPrice[301:432], col="orange")
# Vertical lines marking January 2004 and January 2006
abline(v=as.Date(c("2004-01-01")), lwd=2)
abline(v=as.Date(c("2006-01-01")), lwd=2)
# All series are drawn as solid lines, so the legend uses lty=1 for each
legend("topleft",
legend=c("Coca Cola", "ProcterGamble", "IBM", "GE", "Boeing"),
col=c("red", "blue", "green", "purple", "orange"), lty=1, cex=0.8)
Boeing's stock price increases steadily from 2004 to the beginning of 2006.
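The three plots above repeat the same commands and differ only in where the vertical lines go. One possible way to avoid the repetition is a small helper function. The sketch below is an illustration only: plot_stocks is a hypothetical name, and it is demonstrated on made-up data rather than the five stock data frames.

```r
# Hypothetical helper: draw several price series on one set of axes.
# `stocks` is a named list of data frames with Date and StockPrice columns.
plot_stocks <- function(stocks, colors, vlines = NULL, ylim = c(0, 210)) {
  first <- stocks[[1]]
  plot(first$Date, first$StockPrice, type = "l", col = colors[1],
       ylim = ylim, xlab = "Date", ylab = "Stock Price")
  # Add the remaining series to the existing plot
  for (i in seq_along(stocks)[-1]) {
    lines(stocks[[i]]$Date, stocks[[i]]$StockPrice, col = colors[i])
  }
  # Optional vertical marker lines at the given dates
  if (!is.null(vlines)) abline(v = as.Date(vlines), lwd = 2)
  legend("topleft", legend = names(stocks), col = colors, lty = 1, cex = 0.8)
  invisible(names(stocks))
}

# Toy data: two made-up series over 24 months
dates <- seq(as.Date("1995-01-01"), by = "month", length.out = 24)
toy <- list(
  A = data.frame(Date = dates, StockPrice = seq(50, 96, by = 2)),
  B = data.frame(Date = dates, StockPrice = seq(150, 104, by = -2))
)

pdf(NULL)  # draw to a null device so no file is written
drawn <- plot_stocks(toy, c("red", "blue"), vlines = "1996-01-01")
invisible(dev.off())
```

With the real data, the list would hold the five data frames subset to rows 301:432, and each of the three plots would become a single call with different vlines.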
Lastly, let’s see if stocks tend to be higher or lower during certain months.
# Mean stock price grouped by month name
z = tapply(IBM$StockPrice, months(IBM$Date), mean)
kable(z)

| Month | Mean StockPrice |
|---|---|
| April | 152.1168 |
| August | 140.1455 |
| December | 140.7593 |
| February | 152.6940 |
| January | 150.2384 |
| July | 139.0670 |
| June | 139.0907 |
| March | 152.4327 |
| May | 151.5022 |
| November | 138.0187 |
| October | 137.3466 |
| September | 139.0885 |
The overall average stock price for IBM is 144.375, which can be computed using the command mean(IBM$StockPrice). Comparing the monthly averages to this, using the command tapply(IBM$StockPrice, months(IBM$Date), mean), we can see that the price has historically been higher than average from January through May, and lower than average during the remaining months.
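The comparison between monthly averages and the overall average can also be done in code rather than by eye. The sketch below uses a tiny made-up data frame. It groups by the two-digit month number from format(Date, "%m") instead of months(), an assumption made here so the result is in calendar order and does not depend on the locale's month names.

```r
# Toy data: two Januaries and two Februaries (made-up prices)
toy <- data.frame(
  Date = as.Date(c("2000-01-01", "2000-02-01", "2001-01-01", "2001-02-01")),
  StockPrice = c(10, 20, 30, 40)
)

# Group by month number ("01".."12") so groups sort in calendar order
month_num <- format(toy$Date, "%m")
monthly <- tapply(toy$StockPrice, month_num, mean)
overall <- mean(toy$StockPrice)

# Months whose average is above the overall average
above <- names(monthly)[monthly > overall]

print(monthly)
print(overall)
print(above)
```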
# Mean stock price grouped by month name
z = tapply(GE$StockPrice, months(GE$Date), mean)
kable(z)

| Month | Mean StockPrice |
|---|---|
| April | 64.48009 |
| August | 56.50315 |
| December | 59.10217 |
| February | 62.52080 |
| January | 62.04511 |
| July | 56.73349 |
| June | 56.46844 |
| March | 63.15055 |
| May | 60.87135 |
| November | 57.28879 |
| October | 56.23897 |
| September | 56.23913 |
z = tapply(CocaCola$StockPrice, months(CocaCola$Date), mean)
kable(z)

| Month | Mean StockPrice |
|---|---|
| April | 62.68888 |
| August | 58.88014 |
| December | 59.73223 |
| February | 60.73475 |
| January | 60.36849 |
| July | 58.98346 |
| June | 60.81208 |
| March | 62.07135 |
| May | 61.44358 |
| November | 59.10268 |
| October | 57.93887 |
| September | 57.60024 |
z = tapply(ProcterGamble$StockPrice, months(ProcterGamble$Date), mean)
kable(z)

| Month | Mean StockPrice |
|---|---|
| April | 77.68671 |
| August | 76.82266 |
| December | 78.29661 |
| February | 79.02575 |
| January | 79.61798 |
| July | 76.64556 |
| June | 77.39275 |
| March | 77.34761 |
| May | 77.85958 |
| November | 78.45610 |
| October | 76.67903 |
| September | 76.62385 |
z = tapply(Boeing$StockPrice, months(Boeing$Date), mean)
kable(z)

| Month | Mean StockPrice |
|---|---|
| April | 47.04686 |
| August | 46.86311 |
| December | 46.17315 |
| February | 46.89223 |
| January | 46.51097 |
| July | 46.55360 |
| June | 47.38525 |
| March | 46.88208 |
| May | 48.13716 |
| November | 45.14990 |
| October | 45.21603 |
| September | 46.30485 |
General Electric has an average stock price of 64.48 in April, which is higher than any other month. Coca-Cola has an average stock price of 62.69 in April, which is higher than any other month.
For all five companies, the average stock price in December is lower than the average in January.
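As a quick check of the December versus January comparison, the sketch below copies the December and January averages from the monthly tables above into two named vectors and compares them element by element. With the data frames loaded, the same values could be pulled from the tapply results instead of being typed in.

```r
# December and January averages, copied from the monthly tables above
dec <- c(IBM = 140.7593, GE = 59.10217, CocaCola = 59.73223,
         ProcterGamble = 78.29661, Boeing = 46.17315)
jan <- c(IBM = 150.2384, GE = 62.04511, CocaCola = 60.36849,
         ProcterGamble = 79.61798, Boeing = 46.51097)

# TRUE for each company whose December average is below its January average
dec_lower <- dec < jan
print(dec_lower)
print(all(dec_lower))  # TRUE
```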