This project is meant to give us practice using different datasets for analysis work. I chose three datasets; this is the first one.

# Coffee Price by Wilson Chau

I found this dataset myself. It focuses on coffee prices, which combines my two favorite topics: coffee and money.
# Packages loaded for this analysis (dplyr provides %>% and arrange())
library(knitr)
library(stringr)
library(tidyr)
library(dplyr)
library(ggplot2)
# Load the coffee price data from the project's GitHub repository
coffee <- read.csv("https://raw.githubusercontent.com/Wilchau/Data607Project2/main/Data_1.csv")
summary(coffee)
## Date Open High Low
## Length:13 Min. :111.8 Min. :117.8 Min. :110.6
## Class :character 1st Qu.:116.2 1st Qu.:118.2 1st Qu.:114.8
## Mode :character Median :117.8 Median :120.5 Median :115.8
## Mean :118.1 Mean :120.9 Mean :115.2
## 3rd Qu.:119.2 3rd Qu.:121.4 3rd Qu.:116.7
## Max. :123.5 Max. :126.0 Max. :117.5
## Close Volume Currency
## Min. :112.5 Min. : 3717 Length:13
## 1st Qu.:116.2 1st Qu.: 5184 Class :character
## Median :116.8 Median : 6626 Mode :character
## Mean :116.8 Mean : 6325
## 3rd Qu.:117.8 3rd Qu.: 7364
## Max. :119.0 Max. :10115
str(coffee)
## 'data.frame': 13 obs. of 7 variables:
## $ Date : chr "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" ...
## $ Open : num 122 116 115 119 117 ...
## $ High : num 124 120 121 121 118 ...
## $ Low : num 116 116 115 116 114 ...
## $ Close : num 116 116 119 117 114 ...
## $ Volume : int 6640 5492 6165 5094 6855 7499 7499 3976 5184 3717 ...
## $ Currency: chr "USD" "USD" "USD" "USD" ...
# Cleaning the Data and First Observations

This dataset is a timeline of daily coffee prices. I will focus on numeric calculations and compare prices across the period it covers, from January 3 to January 19, 2000. I will also clean up the data a bit so that only the relevant columns remain.
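The `str()` output shows that `Date` is stored as character, so R cannot treat it as a timeline yet. Below is a minimal sketch of converting it to R's `Date` class and checking which period the data covers. To stay self-contained it uses the `Date` strings from the printed output rather than re-downloading the CSV; the names `dates`, `date_range`, and `n_days` are illustrative only.

```r
# Date strings copied from the printed output, in their original order
# (2000-01-10 appears twice in the data)
dates <- c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
           "2000-01-07", "2000-01-10", "2000-01-10", "2000-01-11",
           "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-18",
           "2000-01-19")

# Convert from character to Date so comparisons are chronological
dates <- as.Date(dates)

# First and last day covered by the data
date_range <- range(dates)
print(date_range)

# Number of distinct trading days once the repeated date is collapsed
n_days <- length(unique(dates))
print(n_days)
```

In the report itself, the equivalent step would be `coffee$Date <- as.Date(coffee$Date)`.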
# Drop column 7 (Currency)
new_coffee <- coffee[-c(7)]
# Sort the rows by opening price, highest first
new_coffee %>% arrange(desc(Open))
## Date Open High Low Close Volume
## 1 2000-01-10 123.50 126.00 116.70 117.55 7499
## 2 2000-01-10 123.50 126.00 116.70 117.55 7499
## 3 2000-01-03 122.25 124.00 116.10 116.50 6640
## 4 2000-01-13 119.25 120.00 117.50 118.55 3717
## 5 2000-01-06 119.00 121.40 116.50 116.85 5094
## 6 2000-01-12 117.80 120.50 116.90 118.95 5184
## 7 2000-01-14 117.75 120.25 112.25 112.55 10115
## 8 2000-01-07 117.25 117.75 113.80 114.15 6855
## 9 2000-01-19 116.50 118.25 114.75 116.70 6626
## 10 2000-01-04 116.25 120.50 115.75 116.25 5492
## 11 2000-01-11 115.50 118.25 115.50 117.80 3976
## 12 2000-01-05 115.00 121.00 115.00 118.60 6165
## 13 2000-01-18 111.75 118.25 110.60 115.75 7364
summary(new_coffee)
## Date Open High Low
## Length:13 Min. :111.8 Min. :117.8 Min. :110.6
## Class :character 1st Qu.:116.2 1st Qu.:118.2 1st Qu.:114.8
## Mode :character Median :117.8 Median :120.5 Median :115.8
## Mean :118.1 Mean :120.9 Mean :115.2
## 3rd Qu.:119.2 3rd Qu.:121.4 3rd Qu.:116.7
## Max. :123.5 Max. :126.0 Max. :117.5
## Close Volume
## Min. :112.5 Min. : 3717
## 1st Qu.:116.2 1st Qu.: 5184
## Median :116.8 Median : 6626
## Mean :116.8 Mean : 6325
## 3rd Qu.:117.8 3rd Qu.: 7364
## Max. :119.0 Max. :10115
# After Cleaning: Analyzing the Data

I removed the Currency column and sorted the rows by opening price in descending order. The highest opening price is $123.50, on 2000-01-10. That date appears twice in the sorted output, so the dataset contains one duplicated row. I also printed a summary showing the statistics for the Open, High, Low, and Close prices and for Volume.
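The sorted output lists 2000-01-10 twice with identical values, and `summary()` counts both copies. Here is a minimal sketch of finding and dropping exact duplicate rows with base R's `duplicated()`. The rows are copied from the printed output so the example runs on its own; `dup_rows` and `coffee_unique` are illustrative names. With dplyr loaded, `distinct(new_coffee)` does the same job.

```r
# Rows of new_coffee as printed above, in chronological order
# (values copied from the report's output; Currency already dropped)
new_coffee <- data.frame(
  Date   = c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
             "2000-01-07", "2000-01-10", "2000-01-10", "2000-01-11",
             "2000-01-12", "2000-01-13", "2000-01-14", "2000-01-18",
             "2000-01-19"),
  Open   = c(122.25, 116.25, 115.00, 119.00, 117.25, 123.50, 123.50,
             115.50, 117.80, 119.25, 117.75, 111.75, 116.50),
  High   = c(124.00, 120.50, 121.00, 121.40, 117.75, 126.00, 126.00,
             118.25, 120.50, 120.00, 120.25, 118.25, 118.25),
  Low    = c(116.10, 115.75, 115.00, 116.50, 113.80, 116.70, 116.70,
             115.50, 116.90, 117.50, 112.25, 110.60, 114.75),
  Close  = c(116.50, 116.25, 118.60, 116.85, 114.15, 117.55, 117.55,
             117.80, 118.95, 118.55, 112.55, 115.75, 116.70),
  Volume = c(6640, 5492, 6165, 5094, 6855, 7499, 7499,
             3976, 5184, 3717, 10115, 7364, 6626)
)

# TRUE for every row that exactly repeats an earlier row
dup_rows <- duplicated(new_coffee)
print(new_coffee[dup_rows, ])

# Keep only the first copy of each row
coffee_unique <- new_coffee[!dup_rows, ]
nrow(coffee_unique)

# Mean opening price with and without the duplicated row
mean(new_coffee$Open)
mean(coffee_unique$Open)
```

Dropping the duplicate lowers the mean opening price slightly, because the removed row has the highest Open value.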
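# Keep the date, price, and volume columns (Close is not used here)
coffee_analysis <- new_coffee[,c("Date", "Open", "High", "Low", "Volume")]
# Delta is the daily trading range: the day's high minus its low
coffee_analysis$Delta <- coffee_analysis$High - coffee_analysis$Low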
head(coffee_analysis)
## Date Open High Low Volume Delta
## 1 2000-01-03 122.25 124.00 116.10 6640 7.90
## 2 2000-01-04 116.25 120.50 115.75 5492 4.75
## 3 2000-01-05 115.00 121.00 115.00 6165 6.00
## 4 2000-01-06 119.00 121.40 116.50 5094 4.90
## 5 2000-01-07 117.25 117.75 113.80 6855 3.95
## 6 2000-01-10 123.50 126.00 116.70 7499 9.30
# Statistical Analysis

Here I look at Volume together with the Open, High, and Low prices. From my accounting background, I expect trading volume to play a large role in sharp price moves, whether up or down. On 2000-01-10 the delta (High minus Low) is 9.30, the largest daily range in the data. The 2000-01-14 row stands out as an outlier: it has the second-largest delta, 8.00, and the highest volume, 10,115, so something specific may have happened that day. Overall, days with a larger delta tend to have higher volume, which is consistent with shifts in supply and demand for coffee, although 12 trading days are too few to establish that relationship firmly.
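The claim that a bigger delta goes with more volume can be checked with a correlation coefficient rather than by eye. Below is a sketch that ranks days by delta and computes both Pearson correlation and the rank-based Spearman correlation (swapped in here because it is less sensitive to the single high-volume day). It uses the High, Low, and Volume values from the printed output with the repeated 2000-01-10 row dropped; `ranked`, `r_pearson`, and `r_spearman` are illustrative names. On these 12 days both coefficients come out clearly positive, but a correlation says nothing about whether volume causes the price swings.

```r
# High, Low and Volume for the 12 distinct trading days
# (copied from the report's output; the repeated 2000-01-10 row is dropped)
coffee_analysis <- data.frame(
  Date   = as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
                     "2000-01-07", "2000-01-10", "2000-01-11", "2000-01-12",
                     "2000-01-13", "2000-01-14", "2000-01-18", "2000-01-19")),
  High   = c(124.00, 120.50, 121.00, 121.40, 117.75, 126.00,
             118.25, 120.50, 120.00, 120.25, 118.25, 118.25),
  Low    = c(116.10, 115.75, 115.00, 116.50, 113.80, 116.70,
             115.50, 116.90, 117.50, 112.25, 110.60, 114.75),
  Volume = c(6640, 5492, 6165, 5094, 6855, 7499,
             3976, 5184, 3717, 10115, 7364, 6626)
)

# Daily trading range, defined as in the report
coffee_analysis$Delta <- coffee_analysis$High - coffee_analysis$Low

# Days ordered from largest to smallest range
ranked <- coffee_analysis[order(coffee_analysis$Delta, decreasing = TRUE),
                          c("Date", "Delta", "Volume")]
print(ranked)

# Linear (Pearson) and rank-based (Spearman) correlation of range vs volume
r_pearson  <- cor(coffee_analysis$Delta, coffee_analysis$Volume)
r_spearman <- cor(coffee_analysis$Delta, coffee_analysis$Volume,
                  method = "spearman")
print(c(pearson = r_pearson, spearman = r_spearman))

# Test whether the Pearson correlation differs from zero
cor.test(coffee_analysis$Delta, coffee_analysis$Volume)
```

With only 12 observations, the confidence interval reported by `cor.test()` is wide, which is a useful reminder of how little data the conclusion rests on.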
# Conclusion

Looking at the first trading days of 2000 (January 3 to January 19), coffee prices moved noticeably. The lowest price in the data is $110.60 and the highest is $126.00. Volume appears to move together with the size of daily price changes, which suggests how shifts in market demand can push coffee prices sharply up or down; with such a short window, though, this is an observation rather than a proven effect.
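A short sketch that recovers the lowest and highest prices quoted in the conclusion, the dates on which they occurred, and the size of the swing between them in percent. The values are copied from the printed output (repeated row dropped); `dates`, `high`, `low`, and `swing_pct` are illustrative names.

```r
# Dates, daily highs and daily lows for the 12 distinct trading days
dates <- as.Date(c("2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06",
                   "2000-01-07", "2000-01-10", "2000-01-11", "2000-01-12",
                   "2000-01-13", "2000-01-14", "2000-01-18", "2000-01-19"))
high <- c(124.00, 120.50, 121.00, 121.40, 117.75, 126.00,
          118.25, 120.50, 120.00, 120.25, 118.25, 118.25)
low  <- c(116.10, 115.75, 115.00, 116.50, 113.80, 116.70,
          115.50, 116.90, 117.50, 112.25, 110.60, 114.75)

# Extremes over the whole period and the days they occurred
lowest  <- min(low)
highest <- max(high)
cat("Lowest price: ", lowest, " on ", format(dates[which.min(low)]), "\n", sep = "")
cat("Highest price: ", highest, " on ", format(dates[which.max(high)]), "\n", sep = "")

# Swing from the lowest low to the highest high, as a percentage of the low
swing_pct <- 100 * (highest - lowest) / lowest
round(swing_pct, 1)
```

The two extremes fall on different days (the high on 2000-01-10, the low on 2000-01-18), so the percentage is the full range over the period, not a single day's move.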