This report is my submission for the week-6 assignment. It describes the dataset I will be working with for the BANA-8090 final project, my efforts so far to clean the data, and a plan for the analysis I wish to carry out.
This is a public dataset containing historical data on U.S. imports of crude oil from 1973 to the present. It is a monthly report that provides national import data, including the quantity, value, and unit price of crude oil. It also measures the changes from the previous period, and statistics are reported on a year-to-date basis. The data was downloaded from an online resource that hosts many national datasets on various economic and social topics.
The references can be found here. The publisher of the dataset is the U.S. Census Bureau, Department of Commerce. The license for access and use is U.S. Government Work.
The data is available as a text file and can also be downloaded as a PDF. The PDF version explains what each column denotes. Information on the data source, nonsampling errors, and definitions can be found here.
Month: The month for which the import of crude oil is reported
Quantity: Quantity of crude oil imported (in thousands of barrels)
Change in Quantity: The change in the quantity of crude oil imported compared to the previous period stated
Value: Value of crude oil imported (in thousands of dollars)
Change in Value: The change in the amount spent on crude oil imports compared to the previous period stated
Unit Price: Price of one unit of crude oil imported (in dollars)
Change in Price: The change in the price of one unit of crude oil compared to the previous period stated
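To make the relationship between these variables concrete, here is a minimal sketch on invented numbers. It assumes that Unit Price is Value divided by Quantity and that each "Change" column is a simple difference from the previous month; the source does not say whether the published changes are absolute or percentage figures, so treat both as assumptions. The underscore column names are also hypothetical.

```r
library(dplyr)
library(tibble)

# Toy data with invented values, for illustration only
toy <- tibble(
  Month    = c("January", "February", "March"),
  Quantity = c(100, 120, 110),    # thousands of barrels
  Value    = c(2000, 2520, 2200)  # thousands of dollars
)

derived <- toy %>%
  mutate(
    # Assumption: the "thousands" cancel, leaving dollars per barrel
    Unit_Price         = Value / Quantity,
    # Assumption: change is the difference from the previous row
    Change_in_Quantity = Quantity - lag(Quantity),
    Change_in_Value    = Value - lag(Value),
    Change_in_Price    = Unit_Price - lag(Unit_Price)
  )

derived
```

The first row has no previous period, so its change columns are NA.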
The data is imported from the .txt file using read_table. The raw data has 286 observations and 13 variables.
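# Load packages first; tidyverse already attaches readr
library(tidyverse)
# Source file published by the U.S. Census Bureau
url <- "http://www.census.gov/foreign-trade/statistics/historical/petr.txt"
# Skip the first 12 lines of the file and do not treat any row as column names
dataset <- read_table(url, col_names = FALSE, skip = 12)
# read_table() already returns a tibble, so as_tibble() leaves it unchanged
dataset.tibble <- as_tibble(dataset)
dataset.tibble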
The data looked mostly clean and did not have any missing values. The major challenges I have encountered are:
1. The “Misc” column contains two merged columns, which have to be separated.
2. Once that is done, the last seven columns have to be stacked below the existing rows, which doubles the number of entries.
3. A “Year” column has to be added.
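# Drop every n-th row starting from row 1 (rows 1, n+1, 2n+1, ...)
delrows <- function(df, n) { df[-seq(1, nrow(df), by = n), ] }
# Remove the first row of each 14-row block
dataset1 <- delrows(dataset, 14)
# "Misc" is the merged column; the names after it repeat for the second block of columns
colnames(dataset1) <- c("Month", "Quantity", "Change in Quantity", "Value", "Change in Value", "Unit Price", "Misc", "Quantity", "Change in Quantity", "Value", "Change in Value", "Unit Price", "Change in Price")
dataset1.tibble <- as_tibble(dataset1)
dataset1.tibble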
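The three remaining cleaning steps could be sketched roughly as follows. This is only an outline on a small invented table, not the final cleaning code. It assumes that "Misc" holds the left block's "Change in Price" and the right block's month name separated by whitespace, which I have not yet verified against the raw file. The suffixed column names (such as Quantity_2) and the year values are hypothetical placeholders.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for dataset1 with fewer columns (all values invented)
toy <- tibble(
  Month             = c("January", "February"),
  Quantity          = c(100, 110),
  Unit_Price        = c(20.0, 21.0),
  Misc              = c("0.50 July", "1.00 August"),  # assumed layout
  Quantity_2        = c(120, 130),
  Unit_Price_2      = c(22.0, 23.5),
  Change_in_Price_2 = c(0.25, 1.50)
)

# Step 1: split "Misc" into its two parts
split_toy <- toy %>%
  separate(Misc, into = c("Change_in_Price", "Month_2"),
           sep = "\\s+", convert = TRUE)

# Step 2 and 3: build each block with matching names and a Year column
# (placeholder years; how years map to blocks is still to be worked out)
left_block <- split_toy %>%
  select(Month, Quantity, Unit_Price, Change_in_Price) %>%
  mutate(Year = 1973L)
right_block <- split_toy %>%
  select(Month = Month_2, Quantity = Quantity_2,
         Unit_Price = Unit_Price_2, Change_in_Price = Change_in_Price_2) %>%
  mutate(Year = 1974L)

# Stack the right-hand block below the left-hand block
cleaned <- bind_rows(left_block, right_block)
cleaned
```

Giving the second block distinct names before stacking avoids the duplicate column names, which select() and most other tidyverse verbs refuse to work with.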
After cleaning the data completely, I will analyse it to achieve the following objectives:
-> Visualize the trend in U.S. imports of crude oil over the years.
-> Calculate statistics on the amount spent on crude oil in each individual year, which would give an idea of the consumption trend within a year, that is, in which quarter consumption is the highest.
-> Evaluate which factors the unit price depends on and how this has changed over the years.
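As a rough idea of how the yearly and quarterly statistics might be computed once the data is clean, here is a minimal sketch using dplyr. The table is invented, and the column names (Year, Month, Quantity, Value) assume the cleaned data ends up in that shape.

```r
library(dplyr)
library(tibble)

# Toy cleaned data: two years of monthly figures (invented values)
monthly <- tibble(
  Year     = rep(c(1973L, 1974L), each = 12),
  Month    = rep(month.name, times = 2),
  Quantity = c(seq(100, 210, by = 10), seq(200, 310, by = 10)),
  Value    = c(seq(1000, 2100, by = 100), seq(3000, 4100, by = 100))
)

# Yearly statistics on quantity imported and amount spent
yearly <- monthly %>%
  group_by(Year) %>%
  summarise(Total_Quantity = sum(Quantity),
            Total_Value    = sum(Value),
            Mean_Value     = mean(Value),
            .groups = "drop")

# Map each month name to its quarter (1 to 4) and total the spend
quarterly <- monthly %>%
  mutate(Quarter = (match(Month, month.name) - 1) %/% 3 + 1) %>%
  group_by(Year, Quarter) %>%
  summarise(Total_Value = sum(Value), .groups = "drop")

# Quarter with the highest spend in each year
peak_quarter <- quarterly %>%
  group_by(Year) %>%
  slice_max(Total_Value, n = 1) %>%
  ungroup()

peak_quarter
```

The yearly table could then be passed to a line plot to show the long-run trend in imports.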