Introduction

This project gives us practice working with different datasets for analysis. I chose three datasets, and this first one is my own.

# Coffee Price by Wilson Chau

I sourced this dataset myself. It focuses on coffee prices, which combines my two favorite topics: coffee and money.

library(knitr)
library(stringr)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
coffee <- read.csv("https://raw.githubusercontent.com/Wilchau/Data607Project2/main/Data_1.csv")
summary(coffee)
##      Date                Open            High            Low       
##  Length:13          Min.   :111.8   Min.   :117.8   Min.   :110.6  
##  Class :character   1st Qu.:116.2   1st Qu.:118.2   1st Qu.:114.8  
##  Mode  :character   Median :117.8   Median :120.5   Median :115.8  
##                     Mean   :118.1   Mean   :120.9   Mean   :115.2  
##                     3rd Qu.:119.2   3rd Qu.:121.4   3rd Qu.:116.7  
##                     Max.   :123.5   Max.   :126.0   Max.   :117.5  
##      Close           Volume        Currency        
##  Min.   :112.5   Min.   : 3717   Length:13         
##  1st Qu.:116.2   1st Qu.: 5184   Class :character  
##  Median :116.8   Median : 6626   Mode  :character  
##  Mean   :116.8   Mean   : 6325                     
##  3rd Qu.:117.8   3rd Qu.: 7364                     
##  Max.   :119.0   Max.   :10115
str(coffee)
## 'data.frame':    13 obs. of  7 variables:
##  $ Date    : chr  "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" ...
##  $ Open    : num  122 116 115 119 117 ...
##  $ High    : num  124 120 121 121 118 ...
##  $ Low     : num  116 116 115 116 114 ...
##  $ Close   : num  116 116 119 117 114 ...
##  $ Volume  : int  6640 5492 6165 5094 6855 7499 7499 3976 5184 3717 ...
##  $ Currency: chr  "USD" "USD" "USD" "USD" ...

# Cleaning the data and observations

This dataset is a daily timeline of coffee prices. I will focus on numeric calculations and compare values across the period from 1/3/00 to 1/19/00. I will also clean up the data so that only the relevant columns remain.

# Drop the Currency column (column 7)
new_coffee <- coffee[-c(7)]
# Show the rows sorted by opening price, highest first
new_coffee %>% arrange(desc(Open))
##          Date   Open   High    Low  Close Volume
## 1  2000-01-10 123.50 126.00 116.70 117.55   7499
## 2  2000-01-10 123.50 126.00 116.70 117.55   7499
## 3  2000-01-03 122.25 124.00 116.10 116.50   6640
## 4  2000-01-13 119.25 120.00 117.50 118.55   3717
## 5  2000-01-06 119.00 121.40 116.50 116.85   5094
## 6  2000-01-12 117.80 120.50 116.90 118.95   5184
## 7  2000-01-14 117.75 120.25 112.25 112.55  10115
## 8  2000-01-07 117.25 117.75 113.80 114.15   6855
## 9  2000-01-19 116.50 118.25 114.75 116.70   6626
## 10 2000-01-04 116.25 120.50 115.75 116.25   5492
## 11 2000-01-11 115.50 118.25 115.50 117.80   3976
## 12 2000-01-05 115.00 121.00 115.00 118.60   6165
## 13 2000-01-18 111.75 118.25 110.60 115.75   7364
summary(new_coffee)
##      Date                Open            High            Low       
##  Length:13          Min.   :111.8   Min.   :117.8   Min.   :110.6  
##  Class :character   1st Qu.:116.2   1st Qu.:118.2   1st Qu.:114.8  
##  Mode  :character   Median :117.8   Median :120.5   Median :115.8  
##                     Mean   :118.1   Mean   :120.9   Mean   :115.2  
##                     3rd Qu.:119.2   3rd Qu.:121.4   3rd Qu.:116.7  
##                     Max.   :123.5   Max.   :126.0   Max.   :117.5  
##      Close           Volume     
##  Min.   :112.5   Min.   : 3717  
##  1st Qu.:116.2   1st Qu.: 5184  
##  Median :116.8   Median : 6626  
##  Mean   :116.8   Mean   : 6325  
##  3rd Qu.:117.8   3rd Qu.: 7364  
##  Max.   :119.0   Max.   :10115

# After cleaning, analyzing the data

I removed the Currency column and sorted the rows by Open price in descending order. The highest opening price is $123.50. I also printed a summary showing the statistics for the Open, High, Low, and Close prices and for Volume.
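
The sorted output above lists 2000-01-10 twice with identical values, and Date is still stored as text. As a minimal sketch of one further cleaning step (not something the original analysis does), the block below drops exact duplicate rows and parses the dates. The data frame is re-typed from the printed output so the example runs on its own, and the name `coffee_dedup` is hypothetical.

```r
# Values re-typed from the printed output; 2000-01-10 appears twice there.
new_coffee <- data.frame(
  Date   = c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
             "2000-01-07", "2000-01-10", "2000-01-10", "2000-01-11",
             "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-18",
             "2000-01-19"),
  Open   = c(122.25, 116.25, 115.00, 119.00, 117.25, 123.50, 123.50,
             115.50, 117.80, 119.25, 117.75, 111.75, 116.50),
  Volume = c(6640, 5492, 6165, 5094, 6855, 7499, 7499, 3976, 5184,
             3717, 10115, 7364, 6626)
)

# Keep only the first copy of any row that is repeated exactly
coffee_dedup <- new_coffee[!duplicated(new_coffee), ]

# Parse the "YYYY-MM-DD" strings so the column sorts and plots as dates
coffee_dedup$Date <- as.Date(coffee_dedup$Date)

nrow(coffee_dedup)
range(coffee_dedup$Date)
```

Whether the repeated row is a data-entry duplicate or two genuine records is an assumption here. If it is a duplicate, the summary statistics would shift slightly once it is removed. With dplyr loaded, `distinct()` would do the same job as the `duplicated()` filter.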

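# Keep the columns needed for the range analysis (Close is dropped)
coffee_analysis <- new_coffee[, c("Date", "Open", "High", "Low", "Volume")]
# Delta is the daily trading range: High minus Low
coffee_analysis$Delta <- coffee_analysis$High - coffee_analysis$Low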
head(coffee_analysis)
##         Date   Open   High    Low Volume Delta
## 1 2000-01-03 122.25 124.00 116.10   6640  7.90
## 2 2000-01-04 116.25 120.50 115.75   5492  4.75
## 3 2000-01-05 115.00 121.00 115.00   6165  6.00
## 4 2000-01-06 119.00 121.40 116.50   5094  4.90
## 5 2000-01-07 117.25 117.75 113.80   6855  3.95
## 6 2000-01-10 123.50 126.00 116.70   7499  9.30

# Statistical analysis

I look at Volume alongside the Open, High, and Low prices. From my accounting background, volume plays a large role in price shocks, whether the price swings high or low. On 1/10/00 the delta (High minus Low) is 9.30, which is the largest daily range in the dataset. The outlier on 1/14/00 has a delta of 8.00, the second largest, together with the highest volume of 10,115. Something unusual may have happened on that date. Overall, these calculations suggest that a bigger delta tends to come with more volume, which fits a supply-and-demand story for coffee, though 13 rows can only hint at that relationship.
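
One way to put a number on the delta and volume relationship is a Pearson correlation. The sketch below is an illustration rather than part of the original analysis. It re-types High, Low, and Volume from the printed output, keeps the repeated 2000-01-10 row exactly as it appears there, and uses base R's `cor()`. The name `delta_volume_cor` is hypothetical.

```r
# Values re-typed from the printed output, in date order (2000-01-10 twice)
coffee_analysis <- data.frame(
  High   = c(124.00, 120.50, 121.00, 121.40, 117.75, 126.00, 126.00,
             118.25, 120.50, 120.00, 120.25, 118.25, 118.25),
  Low    = c(116.10, 115.75, 115.00, 116.50, 113.80, 116.70, 116.70,
             115.50, 116.90, 117.50, 112.25, 110.60, 114.75),
  Volume = c(6640, 5492, 6165, 5094, 6855, 7499, 7499, 3976, 5184,
             3717, 10115, 7364, 6626)
)

# Daily trading range, as in the analysis above
coffee_analysis$Delta <- coffee_analysis$High - coffee_analysis$Low

# Pearson correlation between daily range and volume;
# values near +1 mean wide-range days also tend to be high-volume days
delta_volume_cor <- cor(coffee_analysis$Delta, coffee_analysis$Volume)
delta_volume_cor
```

A positive value is consistent with the pattern described above, but a correlation on so few rows is sensitive to single days such as 1/14/00, so it is best read as descriptive.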

# Conclusion

Looking at the first 20 days of 2000, a period when coffee prices were shifting with demand, the price ranged from a low of $110.60 to a high of $126.00. Volume appears to move together with the price of coffee, and market demand can send the coffee price sharply up or pull it back down.
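
The low and high quoted in the conclusion can be checked directly, along with the dates they fall on. This is a small sketch using values re-typed from the printed output. The names `prices`, `lowest`, and `highest` are hypothetical.

```r
# Date, High, and Low re-typed from the printed output, in date order
prices <- data.frame(
  Date = c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
           "2000-01-07", "2000-01-10", "2000-01-10", "2000-01-11",
           "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-18",
           "2000-01-19"),
  High = c(124.00, 120.50, 121.00, 121.40, 117.75, 126.00, 126.00,
           118.25, 120.50, 120.00, 120.25, 118.25, 118.25),
  Low  = c(116.10, 115.75, 115.00, 116.50, 113.80, 116.70, 116.70,
           115.50, 116.90, 117.50, 112.25, 110.60, 114.75)
)

# which.min / which.max return the index of the first extreme value
lowest  <- prices[which.min(prices$Low), c("Date", "Low")]
highest <- prices[which.max(prices$High), c("Date", "High")]

lowest
highest
```

On these values the lowest Low falls on 2000-01-18 and the highest High on 2000-01-10.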