Prefecture Data Analysis Procedure

漁業データの数理解析レポート　2019-10-26
- 県　　岩手
- 魚種　ひらめ
- いわて大漁ナビ http://www.suigi.pref.iwate.jp/shikyosearch
0:分析準備・環境設定
0-1 Libraryの読み込み
0-2 Working Directoryの確認
1: データの読み込み
1-1 CSV形式の大漁ナビデータひらめの市況データ
1-2 excel形式の月毎にまとめたひらめ水揚げデータの読み込み
1-3 消費者物価指数 Consumer Price Indexの読み込み
2: 分析用データフレームの作成
2-1 年月日データの整理
2-2 日毎市況データの整理
2-3 とりひきごと市況データの日にちを時間データ
2-4 CPIをつかって現在価値の算出
2-5 データクリーニング　1 負の値とデータ欠損の削除
2-6 日毎市況データの整理テーブル化しデータ数をまとめる
3 Outlier Detection 外れ値の検出
3-1 データクリーニング　一変量による外れ値の検出　univariate detections
3-2 データクリーニング　多変数による外れ値の検出　multivariate detection
3-3 データクリーニング　多変数による外れ値の検出　multivariate detection 　Visualizing kNN distance
2-9 データクリーニング　季節性トレンドをベースにした外れ値の検出　seasonal detection
- 3: Time series analysis

漁業データの数理解析レポート　2019-10-26

県　　岩手

魚種　ひらめ

いわて大漁ナビ http://www.suigi.pref.iwate.jp/shikyosearch

0:分析準備・環境設定

0-1 Libraryの読み込み

library(tidyverse)

# tidyverse 必要なパッケージの読み込み　　
library(readr)　 #  tidyverse のファイル読み込みパッケージ
library(gdata)   # excelファイルの読み込み

library(gmodels)

library(DT)
library(xtable)
library(grid)

library(outliers)

library(ggplot2) #
library(gridExtra) # 複数の図を一括して表示
library(lubridate) # date and time format
library(tibbletime)

# outlier library
library(anomalize)  # time series anomaly detection ｂby tidy package
#https://towardsdatascience.com/tidy-anomaly-detection-using-r-82a0c776d523
library(FNN)

0-2 Working Directoryの確認

#いまのRがどこの場所で作業をしているかの確認
getwd()

[1] "/Users/gakugaku/Dropbox/Drop_REsearch/#_0_Prefecture_Analysis_Procedure/Project_Main"

1: データの読み込み

1-1 CSV形式の大漁ナビデータひらめの市況データ

#  CSV 形式の市況データを読み込む
setwd("/Users/gakugaku/Dropbox/#_0_gakuGroup/gakugroup_Yuya_Share/Current_Working_Data/")

Data_Daily_CSV <- read_csv("hirame_iwate.csv", 
                           col_types = cols(
                  X1 = col_double(),
                  水揚日 = col_date(format = ""),
                  市場 = col_factor(),
                  漁業種 = col_factor(),
                  魚種 = col_factor(),
                  規格 = col_factor(),
                  `数量(kg)` = col_double(),
                  `金額(円)` = col_double(),
                  `高値(円)` = col_double(),
                  `平均値(円)` = col_double(),
                  `安値(円)` = col_double()
) )

glimpse(Data_Daily_CSV)

Observations: 178,679
Variables: 11
$ X1           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
$ 水揚日       <date> 1994-01-05, 1994-01-07, 1994-03-28, 1994-04-01, 1994…
$ 市場         <fct> 釜石, 釜石, 大船渡, 久慈, 久慈, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡, 大…
$ 漁業種       <fct> 定置網, 小延縄, その他, 定置網, 底刺網, その他, その他, その他, 定置網, 定置網, その他…
$ 魚種         <fct> ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラ…
$ 規格         <fct> 生鮮, 生鮮, 山, 活魚, 生鮮, 山, 山, 生鮮, 山, 生鮮, 山, 生鮮, 山, 生鮮, 山,…
$ `数量(kg)`   <dbl> 2.0, 10.5, 1.0, 3.8, 15.8, 0.5, 4.0, 3.2, 2.2, 7.6, …
$ `金額(円)`   <dbl> 10000, 20894, 3300, 25460, 37940, 2000, 6800, 20160, …
$ `高値(円)`   <dbl> 5000, 3839, 0, 6700, 2500, 0, 0, 6800, 0, 6500, 0, 70…
$ `平均値(円)` <dbl> 5000.00, 1989.90, 3300.00, 6700.00, 2401.27, 4000.00, …
$ `安値(円)`   <dbl> 5000, 1389, 0, 6700, 2350, 0, 0, 6000, 0, 5500, 0, 30…

Fish_Market<- unique(Data_Daily_CSV$Fish_Market)
print(Fish_Market)

NULL

unique(Data_Daily_CSV$Fishery_Type)

NULL

unique(Data_Daily_CSV$Product_Type)

NULL

1-2 excel形式の月毎にまとめたひらめ水揚げデータの読み込み

1-3 消費者物価指数 Consumer Price Indexの読み込み

#  excel 形式のConsumer Price Indexの読み込み
#CPI <- read.xls("cpi_2015base_clean.xlsx")


CPI <- read_csv("Data/HIRAME_Aanlysis_CPI2015a.csv",
                col_types = cols(
                        Year_Month = col_date(format = "%Y%m"),
                        All_Items = col_double(),
                        Food = col_double(),
                        Fish_Seafood = col_double(),
                        Fresh_Fish_Seafood = col_double())
                                 )
                        
#> names(CPI)
#[1] "Year_Month"         "All_Items"          "Food"               "Fish_Seafood"       "Fresh_Fish_Seafood"

#  dataの中身を確認
str(CPI)

Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    592 obs. of  5 variables:
 $ Year_Month        : Date, format: "1970-01-01" "1970-02-01" ...
 $ All_Items         : num  30.8 30.9 31.2 31.5 31.3 31.3 31.3 31.2 31.8 32.3 ...
 $ Food              : num  29.8 30 30.4 30.7 30.2 29.9 29.8 29.6 30.4 31.2 ...
 $ Fish_Seafood      : num  22.8 22.4 22.8 22.4 22 21.7 22.5 23.1 23.4 23.5 ...
 $ Fresh_Fish_Seafood: num  24.6 23.8 24.4 23.6 22.7 22.5 23.7 24.7 25.2 25.1 ...
 - attr(*, "spec")=
  .. cols(
  ..   Year_Month = col_date(format = "%Y%m"),
  ..   All_Items = col_double(),
  ..   Food = col_double(),
  ..   Fish_Seafood = col_double(),
  ..   Fresh_Fish_Seafood = col_double()
  .. )

2: 分析用データフレームの作成

2-1 年月日データの整理

str(Data_Daily_CSV)

Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame':    178679 obs. of  11 variables:
 $ X1        : num  1 2 3 4 5 6 7 8 9 10 ...
 $ 水揚日    : Date, format: "1994-01-05" "1994-01-07" ...
 $ 市場      : Factor w/ 14 levels "釜石","大船渡",..: 1 1 2 3 3 2 2 2 2 2 ...
 $ 漁業種    : Factor w/ 22 levels "定置網","小延縄",..: 1 2 3 1 4 3 3 3 1 1 ...
 $ 魚種      : Factor w/ 1 level "ヒラメ": 1 1 1 1 1 1 1 1 1 1 ...
 $ 規格      : Factor w/ 8 levels "生鮮","山","活魚",..: 1 1 2 3 1 2 2 1 2 1 ...
 $ 数量(kg)  : num  2 10.5 1 3.8 15.8 0.5 4 3.2 2.2 7.6 ...
 $ 金額(円)  : num  10000 20894 3300 25460 37940 ...
 $ 高値(円)  : num  5000 3839 0 6700 2500 ...
 $ 平均値(円): num  5000 1990 3300 6700 2401 ...
 $ 安値(円)  : num  5000 1389 0 6700 2350 ...
 - attr(*, "spec")=
  .. cols(
  ..   X1 = col_double(),
  ..   水揚日 = col_date(format = ""),
  ..   市場 = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
  ..   漁業種 = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
  ..   魚種 = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
  ..   規格 = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
  ..   `数量(kg)` = col_double(),
  ..   `金額(円)` = col_double(),
  ..   `高値(円)` = col_double(),
  ..   `平均値(円)` = col_double(),
  ..   `安値(円)` = col_double()
  .. )

#str(Data_Monthly_EXCEL)

2-2 日毎市況データの整理

# 変数名の確認
names(Data_Daily_CSV)

 [1] "X1"         "水揚日"     "市場"       "漁業種"     "魚種"      
 [6] "規格"       "数量(kg)"   "金額(円)"   "高値(円)"   "平均値(円)"
[11] "安値(円)"

# 変数名を扱いやすいように英語に変更
names(Data_Daily_CSV) <- c("Index","Landing_Day","Fish_Market","Fishery_Type",     "Fish_Speces", "Product_Type","Landings_kg",   "Landing_Value", "High_Price",   "Average_Price", "Law_Price"  )

2-3 とりひきごと市況データの日にちを時間データ

参考　https://oku.edu.mie-u.ac.jp/~okumura/stat/timeseries.html

#  date型からより使い易いPOSIXlt型
Temp_1 <- as.Date(Data_Daily_CSV$Landing_Day,format="%Y%m%d")  
Temp_2  <- ymd(CPI$Year_Month)  

#  POSIXlt型から年と月を抽出し新しい変数を作る
Data_Daily_CSV$Year <- year(Temp_1)
Data_Daily_CSV$Month <- month(Temp_1)

CPI$Year <- year(Temp_2)
CPI$Month <- month(Temp_2)

2-4 CPIをつかって現在価値の算出

#  CPIデータのひもづけ
Data_Daily_CSV <- Data_Daily_CSV %>%
                  left_join(CPI, by=c("Year","Month")) %>%
                  mutate(Real_Average_Price = (Average_Price / Fresh_Fish_Seafood) *100,
                         Real_Landing_Value = (Landing_Value / Fresh_Fish_Seafood) *100)

glimpse(Data_Daily_CSV)

Observations: 178,679
Variables: 20
$ Index              <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
$ Landing_Day        <date> 1994-01-05, 1994-01-07, 1994-03-28, 1994-04…
$ Fish_Market        <fct> 釜石, 釜石, 大船渡, 久慈, 久慈, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡…
$ Fishery_Type       <fct> 定置網, 小延縄, その他, 定置網, 底刺網, その他, その他, その他, 定置網,…
$ Fish_Speces        <fct> ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ,…
$ Product_Type       <fct> 生鮮, 生鮮, 山, 活魚, 生鮮, 山, 山, 生鮮, 山, 生鮮, 山, 生鮮, 山…
$ Landings_kg        <dbl> 2.0, 10.5, 1.0, 3.8, 15.8, 0.5, 4.0, 3.2, 2.…
$ Landing_Value      <dbl> 10000, 20894, 3300, 25460, 37940, 2000, 6800…
$ High_Price         <dbl> 5000, 3839, 0, 6700, 2500, 0, 0, 6800, 0, 65…
$ Average_Price      <dbl> 5000.00, 1989.90, 3300.00, 6700.00, 2401.27,…
$ Law_Price          <dbl> 5000, 1389, 0, 6700, 2350, 0, 0, 6000, 0, 55…
$ Year               <dbl> 1994, 1994, 1994, 1994, 1994, 1994, 1994, 19…
$ Month              <dbl> 1, 1, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,…
$ Year_Month         <date> 1994-01-01, 1994-01-01, 1994-03-01, 1994-04…
$ All_Items          <dbl> 97.3, 97.3, 97.7, 97.9, 97.9, 97.9, 97.9, 97…
$ Food               <dbl> 93.3, 93.3, 94.3, 93.2, 93.2, 93.2, 93.2, 93…
$ Fish_Seafood       <dbl> 88.9, 88.9, 88.3, 88.0, 88.0, 88.0, 88.0, 88…
$ Fresh_Fish_Seafood <dbl> 91.3, 91.3, 90.6, 90.0, 90.0, 90.0, 90.0, 90…
$ Real_Average_Price <dbl> 5476.451, 2179.518, 3642.384, 7444.444, 2668…
$ Real_Landing_Value <dbl> 10952.903, 22884.995, 3642.384, 28288.889, 4…

Data_Daily_CSV <- Data_Daily_CSV %>%
                   select(-Year_Month)

2-5 データクリーニング　1 負の値とデータ欠損の削除

Observations: 178,679
Variables: 19
$ Index              <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
$ Landing_Day        <date> 1994-01-05, 1994-01-07, 1994-03-28, 1994-04…
$ Fish_Market        <fct> 釜石, 釜石, 大船渡, 久慈, 久慈, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡…
$ Fishery_Type       <fct> 定置網, 小延縄, その他, 定置網, 底刺網, その他, その他, その他, 定置網,…
$ Fish_Speces        <fct> ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ,…
$ Product_Type       <fct> 生鮮, 生鮮, 山, 活魚, 生鮮, 山, 山, 生鮮, 山, 生鮮, 山, 生鮮, 山…
$ Landings_kg        <dbl> 2.0, 10.5, 1.0, 3.8, 15.8, 0.5, 4.0, 3.2, 2.…
$ Landing_Value      <dbl> 10000, 20894, 3300, 25460, 37940, 2000, 6800…
$ High_Price         <dbl> 5000, 3839, 0, 6700, 2500, 0, 0, 6800, 0, 65…
$ Average_Price      <dbl> 5000.00, 1989.90, 3300.00, 6700.00, 2401.27,…
$ Law_Price          <dbl> 5000, 1389, 0, 6700, 2350, 0, 0, 6000, 0, 55…
$ Year               <dbl> 1994, 1994, 1994, 1994, 1994, 1994, 1994, 19…
$ Month              <dbl> 1, 1, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,…
$ All_Items          <dbl> 97.3, 97.3, 97.7, 97.9, 97.9, 97.9, 97.9, 97…
$ Food               <dbl> 93.3, 93.3, 94.3, 93.2, 93.2, 93.2, 93.2, 93…
$ Fish_Seafood       <dbl> 88.9, 88.9, 88.3, 88.0, 88.0, 88.0, 88.0, 88…
$ Fresh_Fish_Seafood <dbl> 91.3, 91.3, 90.6, 90.0, 90.0, 90.0, 90.0, 90…
$ Real_Average_Price <dbl> 5476.451, 2179.518, 3642.384, 7444.444, 2668…
$ Real_Landing_Value <dbl> 10952.903, 22884.995, 3642.384, 28288.889, 4…

2-6 日毎市況データの整理テーブル化しデータ数をまとめる

3 Outlier Detection 外れ値の検出

3-1 データクリーニング　一変量による外れ値の検出　univariate detections

boxplot による可視化
Grubbs’ test 　正規分布を仮定し、外れ値の検出、季節トレンドなどがある場合は使えない

3-2 データクリーニング　多変数による外れ値の検出　multivariate detection

\[D{_i}=\frac{\sum_{j=1}^{n}\left( \hat{Y}_{j} - \hat{Y}_{j \left(i \right)} \right)^{2}}{p \times MSE}\] Multivariate Model Approach Declaring an observation as an outlier based on a just one (rather unimportant) feature could lead to unrealistic inferences. When you have to decide if an individual entity (represented by row or observation) is an extreme value or not, it better to collectively consider the features (X’s) that matter. Enter Cook’s Distance.

Cooks Distance Cook’s distance is a measure computed with respect to a given regression model and therefore is impacted only by the X variables included in the model. But, what does cook’s distance mean? It computes the influence exerted by each data point (row) on the predicted outcome. where,

\[\hat{Y}_{j}\] j is the value of jth fitted response when all the observations are included. \[\hat{Y}_{j \left(i \right)}\] is the value of jth fitted response, where the fit does not include observation i. MSE is the mean squared error. p is the number of coefficients in the regression model.

Reference; http://r-statistics.co/Outlier-Treatment-With-R.html

3-3 データクリーニング　多変数による外れ値の検出　multivariate detection 　Visualizing kNN distance

Local Outlier factor (LOF) LOF はあるデータの集まりの中から外れ値を見つけ出す外れ値検知アルゴリズム LOFは空間におけるデータの密度に着目します。特に、自身の点から近傍 k 個の点といかに密かであるかを表す局所密度 (Local density) という指標に注目しますメリット:複雑な分布をしているデータにおける外れ値検知にも強い http://hktech.hatenablog.com/entry/2018/09/04/002034

\[ld = 1/{\displaystyle \frac{\sum_{Q \in N_k(P)}d(P, Q)}{k}}\] \[lof(P) =\sum_{Q \in N_k(P)}\frac{ld(Q)}{ld(P)}/k\] 外れ値スコア lof は、大きい値をとるほど外れ値である可能性が高いということを表す。


Call:
lm(formula = Real_Average_Price ~ Log_Landings_kg + Month, data = Temp_Data_Daily_CSV)

Residuals:
    Min      1Q  Median      3Q     Max 
-4064.5 -1174.6  -419.0   680.5  9127.1 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3498.1      139.9  24.996  < 2e-16 ***
Log_Landings_kg   -507.3       63.4  -8.002 1.70e-15 ***
Month2             538.7      188.9   2.851 0.004385 ** 
Month3            1142.4      181.2   6.305 3.27e-10 ***
Month4             672.7      169.9   3.959 7.69e-05 ***
Month5           -1230.0      186.8  -6.583 5.36e-11 ***
Month6           -1934.2      178.6 -10.831  < 2e-16 ***
Month7           -1460.8      177.0  -8.254  < 2e-16 ***
Month8            -228.4      174.2  -1.311 0.190073    
Month9            -636.2      168.2  -3.782 0.000159 ***
Month10           -614.8      166.3  -3.696 0.000223 ***
Month11          -1148.6      167.0  -6.879 7.22e-12 ***
Month12           -983.9      171.5  -5.736 1.06e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1846 on 3206 degrees of freedom
Multiple R-squared:  0.1935,    Adjusted R-squared:  0.1905 
F-statistic: 64.11 on 12 and 3206 DF,  p-value: < 2.2e-16

2-9 データクリーニング　季節性トレンドをベースにした外れ値の検出　seasonal detection

https://towardsdatascience.com/tidy-anomaly-detection-using-r-82a0c776d523 https://blog.exploratory.io/introduction-to-anomaly-detection-in-r-with-exploratory-a0507d40385d

#Data Preprocessing
#For Anomaly Detection using anomalize, we need to have either a tibble or tibbletime object. Hence we have to convert the dataframeinto a tibble object that follows a time series shape .

# To curry this analysis, The problem is that your date variable is not unique.
# have to extract unique dataset Fish_Market　＆　Fishery Type
Fishery_Type_List <- sort(unique(Data_Daily_CSV$Fishery_Type))
Product_Type_List <- sort(unique(Data_Daily_CSV$Product_Type))

##  この分析は同じ日付データがあるとできない市場、製品などを選ぶ必要がある。

Temp_Data_Daily_CSV　<-  Data_Daily_CSV %>%
                         filter(Fish_Market=="大船渡",Product_Type=="生鮮",Fishery_Type=="定置網")

glimpse(Temp_Data_Daily_CSV)

Observations: 5,489
Variables: 19
$ Index              <dbl> 10, 14, 18, 22, 26, 29, 32, 36, 41, 45, 49, …
$ Landing_Day        <date> 1994-04-30, 1994-05-02, 1994-05-06, 1994-05…
$ Fish_Market        <fct> 大船渡, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡, 大船渡,…
$ Fishery_Type       <fct> 定置網, 定置網, 定置網, 定置網, 定置網, 定置網, 定置網, 定置網, 定置網,…
$ Fish_Speces        <fct> ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ, ヒラメ,…
$ Product_Type       <fct> 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, 生鮮, …
$ Landings_kg        <dbl> 7.6, 21.5, 53.8, 28.4, 16.5, 12.0, 4.9, 4.6,…
$ Landing_Value      <dbl> 44100, 130210, 291990, 144730, 80970, 63000,…
$ High_Price         <dbl> 6500, 6800, 6800, 6000, 6000, 5700, 6300, 60…
$ Average_Price      <dbl> 5802.63, 6056.28, 5427.32, 5096.13, 4907.27,…
$ Law_Price          <dbl> 5500, 5500, 2500, 2900, 3500, 4000, 5800, 60…
$ Year               <dbl> 1994, 1994, 1994, 1994, 1994, 1994, 1994, 19…
$ Month              <dbl> 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
$ All_Items          <dbl> 97.9, 98.0, 98.0, 98.0, 98.0, 98.0, 98.0, 98…
$ Food               <dbl> 93.2, 92.8, 92.8, 92.8, 92.8, 92.8, 92.8, 92…
$ Fish_Seafood       <dbl> 88.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87.0, 87…
$ Fresh_Fish_Seafood <dbl> 90.0, 88.4, 88.4, 88.4, 88.4, 88.4, 88.4, 88…
$ Real_Average_Price <dbl> 6447.367, 6850.995, 6139.502, 5764.853, 5551…
$ Real_Landing_Value <dbl> 49000.00, 147296.38, 330305.43, 163721.72, 9…

Landing_Day_ts1 <- Temp_Data_Daily_CSV %>% as.tibble() %>%
                  group_by(Product_Type) %>%
                 select(Real_Average_Price, Landing_Day) %>%
                  ungroup()     # ungroupをいれないとグループ変数を保持する

Landing_Day_ts1 <- Landing_Day_ts1 %>%  
               filter(Product_Type=="生鮮")%>% 
               select(Landing_Day,Real_Average_Price) %>%
               as_tbl_time(index=Landing_Day)
           
#Time Series Decomposition with Anomalies
#One of the important things to do with Time Series data before starting with Time Series forecasting or Modelling is Time Series Decomposition where the Time series data is decomposed into Seasonal, Trend and remainder components. anomalize has got a function time_decompose() to perform the same. Once the components are decomposed, anomalize can detect and flag anomalies in the decomposed data of the reminder component which then could be visualized with plot_anomaly_decomposition() 

Landing_Day_ts1  %>% 
  time_decompose(Real_Average_Price) %>%
  anomalize(remainder) %>%
  time_recompose() %>%
  plot_anomalies(time_recomposed = TRUE, ncol = 3, alpha_dots = 0.5)

Landing_Day_ts1  %>% 
  time_decompose(Real_Average_Price, method = "stl", frequency = "auto", trend = "auto") %>%
  anomalize(remainder, method = "gesd", alpha = 0.05, max_anoms = 0.2) %>%
  plot_anomaly_decomposition()

Anomalies_1 <- Landing_Day_ts1  %>% 
  time_decompose(Real_Average_Price) %>%
  anomalize(remainder) %>%
  time_recompose() %>%
  filter(anomaly == 'Yes') 

glimpse(Anomalies_1)

Observations: 236
Variables: 10
$ Landing_Day   <date> 1994-07-20, 1994-07-25, 1994-07-26, 1994-07-28, …
$ observed      <dbl> 7938.967, 7640.194, 8505.468, 8505.468, 10015.091…
$ season        <dbl> 6.310238, -15.263521, 8.675082, 6.310238, 3.38097…
$ trend         <dbl> 4807.926, 4746.615, 4715.959, 4654.647, 4623.991,…
$ remainder     <dbl> 3124.731, 2908.843, 3780.834, 3844.510, 5387.719,…
$ remainder_l1  <dbl> -2337.612, -2337.612, -2337.612, -2337.612, -2337…
$ remainder_l2  <dbl> 2510.157, 2510.157, 2510.157, 2510.157, 2510.157,…
$ anomaly       <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", …
$ recomposed_l1 <dbl> 2476.6250, 2393.7396, 2387.0224, 2323.3459, 2289.…
$ recomposed_l2 <dbl> 7324.393, 7241.508, 7234.791, 7171.114, 7137.529,…

tsData <- EuStockMarkets[, 1] # ts data
decomposedRes <- decompose(tsData, type="mult") # use type = "additive" for additive components
plot (decomposedRes) # see plot below

stlRes <- stl(tsData, s.window = "periodic")

Prefecture Data Analysis Procedure

Prefecture Data Analysis Procedure

漁業データの数理解析レポート 2019-10-26

県 岩手

魚種 ひらめ

いわて大漁ナビ http://www.suigi.pref.iwate.jp/shikyosearch

0:分析準備・環境設定

0-1 Libraryの読み込み

0-2 Working Directoryの確認

1: データの読み込み

1-1 CSV形式の大漁ナビデータひらめの市況データ

1-2 excel形式の月毎にまとめたひらめ水揚げデータの読み込み

1-3 消費者物価指数 Consumer Price Indexの読み込み

2: 分析用データフレームの作成

2-1 年月日データの整理

2-2 日毎市況データの整理

2-3 とりひきごと市況データの日にちを時間データ

2-4 CPIをつかって現在価値の算出

2-5 データクリーニング 1 負の値とデータ欠損の削除

2-6 日毎市況データの整理 テーブル化しデータ数をまとめる

3 Outlier Detection 外れ値の検出

3-1 データクリーニング 一変量による外れ値の検出 univariate detections

3-2 データクリーニング 多変数による外れ値の検出 multivariate detection

3-3 データクリーニング 多変数による外れ値の検出 multivariate detection Visualizing kNN distance

2-9 データクリーニング 季節性トレンドをベースにした外れ値の検出 seasonal detection

3: Time series analysis

3-1

漁業データの数理解析レポート　2019-10-26

県　　岩手

魚種　ひらめ

2-5 データクリーニング　1 負の値とデータ欠損の削除

2-6 日毎市況データの整理テーブル化しデータ数をまとめる

3-1 データクリーニング　一変量による外れ値の検出　univariate detections

3-2 データクリーニング　多変数による外れ値の検出　multivariate detection

3-3 データクリーニング　多変数による外れ値の検出　multivariate detection 　Visualizing kNN distance

2-9 データクリーニング　季節性トレンドをベースにした外れ値の検出　seasonal detection