This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
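Here is a summary of the wind turbine dataset. It has 44 attributes and 1076 observations; the output below reports 1077 rows because the header row is read as data.
# header = TRUE is not set, so the row of column names is read as data and every column becomes a factor
dataf1 <- read.table("Windtur.csv")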
## Warning in scan(file, what, nmax, sep, dec, quote, skip, nlines,
## na.strings, : number of items read is not a multiple of the number of
## columns
summary(dataf1)
## V1 V2 V3 V4
## 1 :1076 6 :1076 0 :1076 10 : 1
## turbine_id: 1 outage_id: 1 turbine_type: 1 100 : 1
## 1000 : 1
## 10000 : 1
## 10010 : 1
## 10020 : 1
## (Other):1071
## V5 V6 V7 V8
## 0 :1076 1619.8 : 21 386.68 : 5 390.88 : 6
## classification: 1 1620 : 15 387.3 : 5 389.11 : 4
## 1619.9 : 13 387.83 : 5 389.15 : 4
## 1620.2 : 13 386.77 : 4 389.54 : 4
## 1619.7 : 12 386.87 : 4 389.7 : 4
## 1620.1 : 7 387 : 4 390.03 : 4
## (Other):996 (Other):1050 (Other):1051
## V9 V10 V11 V12 V13
## 390.9 : 7 1341.8 : 8 1352.9 : 6 1350.2 : 5 0 : 75
## 393.22 : 7 1342 : 5 6.09 : 6 6.34 : 5 1367 : 20
## 388.36 : 6 5.51 : 5 1353 : 4 6.44 : 5 859.97 : 18
## 389.16 : 5 5.99 : 5 1354.2 : 4 6.52 : 5 1366.9 : 15
## 389.97 : 5 6.02 : 5 1354.8 : 4 7.11 : 5 860.01 : 14
## 390.52 : 5 6.03 : 5 5.81 : 4 5.75 : 4 1366.7 : 13
## (Other):1042 (Other):1044 (Other):1049 (Other):1048 (Other):922
## V14 V15 V16 V17 V18
## 0 :194 82 :302 4.14 : 6 157 : 8 82 :302
## 15.28 : 91 -0.03 : 21 5.81 : 6 341 : 4 -0.03 : 24
## 9.62 : 68 -0.36 : 18 2.96 : 5 123 : 3 0.02 : 24
## 9.61 : 58 -0.41 : 18 2.99 : 5 164.12 : 3 -0.1 : 21
## 0.01 : 24 -0.22 : 16 3.64 : 5 0 : 2 -0.06 : 18
## 0.02 : 16 -0.34 : 16 6.27 : 5 0.96 : 2 -0.08 : 16
## (Other):626 (Other):686 (Other):1045 (Other):1055 (Other):672
## V19 V20 V21 V22 V23
## 82 :239 -0.06 : 8 100.86 : 3 44.64 : 3 36.51 : 5
## 82.01 : 63 -1.71 : 6 26.52 : 3 63.19 : 3 36.96 : 5
## 0.47 : 22 -2.02 : 6 63.75 : 3 100.49 : 2 30.77 : 4
## 0.65 : 16 -0.3 : 5 100.23 : 2 103.25 : 2 36.1 : 4
## 0.3 : 15 -0.56 : 5 101.61 : 2 106.24 : 2 37.45 : 4
## 0.36 : 14 -0.74 : 5 104.09 : 2 111.32 : 2 38.79 : 4
## (Other):708 (Other):1042 (Other):1062 (Other):1063 (Other):1051
## V24 V25 V26 V27
## 44.69 : 5 58.21 : 7 17.41 : 5 36.04 : 4
## 34.76 : 3 59.48 : 5 10.23 : 4 56.89 : 4
## 35.05 : 3 59.65 : 5 10.58 : 4 65.3 : 4
## 35.08 : 3 34.17 : 4 11.12 : 4 66.71 : 4
## 35.09 : 3 42.54 : 4 11.17 : 4 68.02 : 4
## 35.34 : 3 42.58 : 4 12.73 : 4 36.09 : 3
## (Other):1057 (Other):1048 (Other):1052 (Other):1054
## V28 V29 V30 V31 V32
## -0.41 : 6 -5.34 : 9 0 : 76 0 :415 0.29 :263
## -0.99 : 6 -5.09 : 5 1367 : 19 101 : 37 99.99 : 24
## -1.91 : 6 -5.25 : 5 859.97 : 19 100 : 36 -0.09 : 14
## -0.79 : 5 -5.41 : 5 1366.9 : 16 99 : 34 0.98 : 14
## -1.62 : 5 -5.26 : 4 1366.7 : 14 102 : 22 100 : 14
## -2.71 : 5 -5.29 : 4 859.98 : 13 98 : 22 -0.24 : 13
## (Other):1044 (Other):1045 (Other):920 (Other):511 (Other):735
## V33 V34 V35 V36 V37
## 27.7 : 5 38.14 : 3 27.95 : 13 0 : 83 20.33 : 4
## 20.99 : 4 47.91 : 3 30.46 : 7 0.39 : 13 35.25 : 3
## 21.17 : 4 23.46 : 2 30.56 : 7 0.38 : 11 80.18 : 3
## 21.34 : 4 28.53 : 2 27.45 : 6 0.35 : 10 10.41 : 2
## 24.62 : 4 29.83 : 2 27.82 : 6 0.33 : 9 110.35 : 2
## 25.21 : 4 29.89 : 2 27.94 : 6 0.37 : 9 12.27 : 2
## (Other):1052 (Other):1063 (Other):1032 (Other):942 (Other):1061
## V38 V39 V40 V41
## 27.72 : 16 19.52 : 4 32.68 : 5 20.31 : 7
## 27.39 : 11 19.79 : 4 33.12 : 5 20.48 : 6
## 26.92 : 10 23.92 : 4 33.61 : 5 23.49 : 6
## 26.93 : 10 30.67 : 4 34.06 : 5 23.56 : 6
## 27.54 : 10 31 : 4 34.11 : 5 30.13 : 6
## 27.78 : 10 32.5 : 4 30.52 : 4 20.28 : 5
## (Other):1010 (Other):1053 (Other):1048 (Other):1041
## V42 V43 V44
## 20.12 : 7 21.86 : 7 19.6 : 6
## 22.88 : 7 25.58 : 7 29.61 : 6
## 22.96 : 6 29.01 : 7 25.5 : 5
## 30.41 : 6 21.98 : 6 28.31 : 5
## 19.99 : 5 23.41 : 5 21.87 : 4
## 20.13 : 5 23.59 : 5 22.04 : 4
## (Other):1041 (Other):1040 (Other):1047
str(dataf1)
## 'data.frame': 1077 obs. of 44 variables:
## $ V1 : Factor w/ 2 levels "1","turbine_id": 2 1 1 1 1 1 1 1 1 1 ...
## $ V2 : Factor w/ 2 levels "6","outage_id": 2 1 1 1 1 1 1 1 1 1 ...
## $ V3 : Factor w/ 2 levels "0","turbine_type": 2 1 1 1 1 1 1 1 1 1 ...
## $ V4 : Factor w/ 1077 levels "10","100","1000",..: 1077 1 189 300 411 522 633 744 855 966 ...
## $ V5 : Factor w/ 2 levels "0","classification": 2 1 1 1 1 1 1 1 1 1 ...
## $ V6 : Factor w/ 828 levels "-10.29","-10.81",..: 828 511 589 340 607 694 724 755 754 595 ...
## $ V7 : Factor w/ 766 levels "375.11","376.28",..: 766 147 176 124 182 283 327 381 441 397 ...
## $ V8 : Factor w/ 800 levels "377.68","378.86",..: 800 151 176 121 200 314 347 393 424 395 ...
## $ V9 : Factor w/ 728 levels "378.38","379.57",..: 728 191 230 166 231 319 360 411 472 437 ...
## $ V10: Factor w/ 906 levels "1004.5","1006.3",..: 906 326 414 259 444 578 599 700 697 398 ...
## $ V11: Factor w/ 919 levels "10.03","1001.4",..: 919 337 424 272 453 599 619 723 718 405 ...
## $ V12: Factor w/ 925 levels "10.27","10.31",..: 925 344 430 279 461 587 609 712 709 411 ...
## $ V13: Factor w/ 794 levels "0","0.02","0.51",..: 794 603 664 598 725 775 17 47 54 654 ...
## $ V14: Factor w/ 380 levels "0","0.01","0.02",..: 380 347 361 346 68 108 124 142 147 358 ...
## $ V15: Factor w/ 444 levels "-0.01","-0.02",..: 444 11 9 106 55 13 13 18 26 20 ...
## $ V16: Factor w/ 639 levels "1.89","10.02",..: 639 273 336 221 341 430 447 465 406 293 ...
## $ V17: Factor w/ 1027 levels "0","0.07","0.2",..: 1027 655 610 615 588 556 437 440 198 928 ...
## $ V18: Factor w/ 446 levels "-0.01","-0.02",..: 446 28 9 94 35 21 21 7 6 10 ...
## $ V19: Factor w/ 456 levels "0.08","0.09",..: 456 11 14 103 56 41 41 31 20 18 ...
## $ V20: Factor w/ 658 levels "-0.01","-0.03",..: 658 144 204 477 81 123 161 151 262 310 ...
## $ V21: Factor w/ 1007 levels "100.16","100.17",..: 1007 583 591 579 567 599 620 640 682 664 ...
## $ V22: Factor w/ 1026 levels "100.17","100.34",..: 1026 587 595 582 573 600 623 645 691 668 ...
## $ V23: Factor w/ 866 levels "22.79","22.85",..: 866 398 350 330 301 283 266 258 250 248 ...
## $ V24: Factor w/ 947 levels "23.36","23.37",..: 947 534 510 489 468 452 443 431 427 424 ...
## $ V25: Factor w/ 860 levels "34.02","34.03",..: 860 676 674 665 656 655 659 661 671 662 ...
## $ V26: Factor w/ 823 levels "10","10.01","10.03",..: 823 33 26 27 9 816 799 792 788 785 ...
## $ V27: Factor w/ 904 levels "35.99","36.01",..: 904 658 668 610 613 656 695 702 724 664 ...
## $ V28: Factor w/ 668 levels "-0.01","-0.02",..: 668 142 208 489 89 134 166 148 263 325 ...
## $ V29: Factor w/ 938 levels "-0.7","-0.88",..: 938 252 223 237 227 189 134 56 791 11 ...
## $ V30: Factor w/ 792 levels "0","0.25","1.07",..: 792 603 665 599 721 773 14 44 51 655 ...
## $ V31: Factor w/ 107 levels "-2","0","1","10",..: 107 27 25 21 64 70 59 65 60 37 ...
## $ V32: Factor w/ 617 levels "-0.02","-0.03",..: 617 168 241 73 247 350 370 388 400 249 ...
## $ V33: Factor w/ 840 levels "16.21","16.22",..: 840 264 242 222 204 206 209 198 193 170 ...
## $ V34: Factor w/ 1001 levels "100.56","100.58",..: 1001 476 474 463 451 466 487 499 524 522 ...
## $ V35: Factor w/ 650 levels "23.03","23.07",..: 650 484 480 470 464 461 455 444 428 422 ...
## $ V36: Factor w/ 538 levels "-0.01","-0.02",..: 538 145 528 202 113 466 468 486 108 129 ...
## $ V37: Factor w/ 1028 levels "0.36","10.28",..: 1028 1008 995 934 969 972 977 19 112 140 ...
## $ V38: Factor w/ 329 levels "26.27","26.28",..: 329 159 150 146 143 140 140 138 139 137 ...
## $ V39: Factor w/ 835 levels "15.02","15.05",..: 835 181 171 168 159 149 160 160 162 178 ...
## $ V40: Factor w/ 670 levels "27.12","27.13",..: 670 523 498 467 443 423 403 388 369 350 ...
## $ V41: Factor w/ 671 levels "19.49","19.52",..: 671 440 435 429 425 419 411 407 401 399 ...
## $ V42: Factor w/ 684 levels "19.09","19.1",..: 684 430 425 422 416 410 404 399 391 387 ...
## $ V43: Factor w/ 696 levels "","17.41","17.43",..: 696 443 435 430 420 411 403 392 382 375 ...
## $ V44: Factor w/ 718 levels "","16.36","16.54",..: 718 361 359 336 338 350 356 348 338 320 ...
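The str() output above shows every column as a factor, with the column names (for example "turbine_id") appearing among the levels. That is the usual symptom of a header row being read as data. The sketch below illustrates the difference on a tiny hypothetical inline table, not on Windtur.csv itself, whose exact layout is not stated here. It also shows one common way to recover numbers from a factor column that has already been imported.

```r
# Hypothetical three-row sample; the real file's layout may differ.
txt <- "time kw wind_speed
10 260.32 4.63
20 366.99 5.37
30 135.20 3.91"

# Without header = TRUE the name row becomes data, so no column is numeric.
no_header <- read.table(text = txt, stringsAsFactors = TRUE)

# With header = TRUE the first row supplies the column names.
with_header <- read.table(text = txt, header = TRUE)

sapply(no_header, class)    # every column is a factor
sapply(with_header, class)  # integer for time, numeric for the others

# Recovering numbers from an existing factor column: go through character,
# because as.numeric() on a factor returns the level codes, not the values.
recovered <- as.numeric(as.character(no_header$V2[-1]))
recovered
```

Applied to the data here, the same idea would mean re-reading the file with header = TRUE, or converting the columns of dataf1 after dropping its first row.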
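# Choose the CSV file interactively; its first row holds the column names
bigdataf <- read.csv(file.choose(), header = TRUE)
# Keep the four attributes of interest
bigmyvars <- c("time", "torque_actual_value", "temp_shaft_bearing", "hydraulic_pressure")
bignewdata1 <- bigdataf[bigmyvars]
# Keep rows 4 through 1000
bignewdata2 <- bignewdata1[4:1000, ]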
# Look at the first 10 rows
dataf1[1:10,]
## V1 V2 V3 V4 V5 V6
## 1 turbine_id outage_id turbine_type time classification kw
## 2 1 6 0 10 0 260.32
## 3 1 6 0 20 0 366.99
## 4 1 6 0 30 0 135.2
## 5 1 6 0 40 0 394.16
## 6 1 6 0 50 0 601.08
## 7 1 6 0 60 0 664.34
## 8 1 6 0 70 0 756.16
## 9 1 6 0 80 0 755.89
## 10 1 6 0 90 0 374.32
## V7 V8 V9 V10 V11
## 1 volt_phase_a volt_phase_b volt_phase_c current_phase_a current_phase_b
## 2 386.1 388.73 389.69 267.03 266.12
## 3 386.59 389.11 390.27 344.05 342.66
## 4 385.63 388.19 389.35 187.04 187.12
## 5 386.68 389.51 390.3 371.51 370.89
## 6 388.2 391.15 391.63 529.18 528.85
## 7 388.91 391.95 392.23 578.22 578.27
## 8 389.85 392.92 393.05 650.71 651.11
## 9 390.95 393.89 394.11 646 646.23
## 10 390.14 392.95 393.48 329.91 328.83
## V12 V13 V14 V15 V16
## 1 current_phase_c gen_rpm rotor_rpm actual_angle_blade_1 wind_speed
## 2 270.47 860.01 9.62 -0.12 4.63
## 3 346.78 875.05 9.78 -0.09 5.37
## 4 191.65 859.96 9.61 0.92 3.91
## 5 374.73 906.4 10.13 0.09 5.42
## 6 531.82 975.24 10.9 -0.14 6.43
## 7 580.89 1008.4 11.27 -0.14 6.63
## 8 653.36 1041 11.64 -0.19 6.85
## 9 648.43 1051.4 11.76 -0.27 6.15
## 10 332.5 871.69 9.75 -0.21 4.88
## V17 V18 V19
## 1 nacelle_position actual_angle_blade_2 actual_angle_blade_3
## 2 31.72 0 0.29
## 3 27.04 -0.09 0.32
## 4 27.84 1.01 1.44
## 5 25.54 0.07 0.77
## 6 23.68 -0.22 0.59
## 7 19.41 -0.22 0.59
## 8 19.7 -0.07 0.49
## 9 15.84 -0.06 0.38
## 10 6.36 -0.1 0.36
## V20 V21 V22 V23 V24
## 1 wind_deviation gen_1_temp gen_2_temp bearing_a_temp bearing_b_temp
## 2 -1.69 58.53 58.59 35.61 50.87
## 3 -2.15 59.03 59.04 34.65 49.78
## 4 0.93 58.15 58.01 34.1 48.82
## 5 -0.99 57.73 57.61 33.33 47.89
## 6 -1.45 59.61 59.49 32.82 47.09
## 7 -1.91 61.64 61.57 32.42 46.52
## 8 -1.78 62.81 62.56 32.18 46.05
## 9 -3.03 64.73 64.55 32.02 45.91
## 10 -4.24 63.75 63.41 31.97 45.75
## V25 V26 V27 V28
## 1 tran_temp ambient_temp tran_bearing_a_temp wind_deviation_one_sec
## 2 63.98 10.58 68.18 -1.63
## 3 63.78 10.46 68.36 -2.14
## 4 63.27 10.48 67.36 0.97
## 5 62.87 10.19 67.39 -1.02
## 6 62.85 9.84 68.16 -1.54
## 7 62.95 9.49 68.92 -1.91
## 8 63.12 9.24 69.06 -1.69
## 9 63.52 9.16 69.74 -2.94
## 10 63.13 9.08 68.26 -4.39
## V29 V30 V31 V32
## 1 reactive_power generator_speed_plc torque_actual_value torque_set_value
## 2 -169.41 860.01 27 25.53
## 3 -158.32 875.05 25 35.12
## 4 -162.83 859.97 21 13.25
## 5 -159.98 906.4 60 35.89
## 6 -149.53 975.25 66 51.59
## 7 -138.2 1008.4 56 55.05
## 8 -116.05 1041 61 59.52
## 9 -97.25 1051.4 57 60.12
## 10 -102.1 871.67 36 36.08
## V33 V34 V35
## 1 temp_nacelle temp_generator_cooling_air temp_shaft_bearing
## 2 24.58 50.6 31.22
## 3 24.03 50.46 31.18
## 4 23.53 49.88 31.02
## 5 23.19 49.19 30.92
## 6 23.21 49.97 30.88
## 7 23.27 51.26 30.79
## 8 22.98 51.96 30.64
## 9 22.86 53.33 30.46
## 10 22.44 53.17 30.38
## V36 V37 V38
## 1 high_speed_running_number tower_acceleration hydraulic_pressure
## 2 11.13 96.19 27.91
## 3 9.82 93.72 27.82
## 4 13.46 83.29 27.78
## 5 10.4 90.07 27.75
## 6 9.06 90.41 27.72
## 7 9.08 91.61 27.72
## 8 9.27 102.38 27.7
## 9 10.31 128.21 27.71
## 10 10.8 138.26 27.69
## V39 V40 V41
## 1 temp_main_box temp_top_box temp_battery_box_axis_1
## 2 20.47 38.1 28.4
## 3 20.27 37.69 28.31
## 4 20.21 37.21 28.19
## 5 19.92 36.79 28.08
## 6 19.7 36.44 27.92
## 7 19.95 36.11 27.76
## 8 19.95 35.79 27.68
## 9 20.02 35.49 27.57
## 10 20.41 35.19 27.54
## V42 V43 V44
## 1 temp_battery_box_axis_2 temp_battery_box_axis_3 temp_hub
## 2 27.7 26.43 25.8
## 3 27.62 26.31 25.7
## 4 27.58 26.24 25.34
## 5 27.47 26.09 25.36
## 6 27.37 25.92 25.56
## 7 27.24 25.77 25.64
## 8 27.17 25.64 25.53
## 9 27.07 25.54 25.36
## 10 27.01 25.43 24.99
# Select columns V4 (time), V31 (torque_actual_value), V35 (temp_shaft_bearing) and V38 (hydraulic_pressure)
myvars <- c("V4", "V31", "V35","V38")
newdata1 <- dataf1[myvars]
newdata2 <- newdata1[4:1000,]
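# Convert the first 600 observations to a numeric matrix (the columns are factors, so convert through character)
newdata_arr <- sapply(newdata2[1:600, ], function(x) as.numeric(as.character(x)))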
# Center each column to mean 0 and scale it to unit standard deviation
bignewdata_arr_sc <- scale(bignewdata2)
In this section we perform a visual analysis of four attributes: time, torque_actual_value, temp_shaft_bearing and hydraulic_pressure.
pairs(~V4+V31+V35+V38,data=newdata2,
main="Simple Scatterplot Matrix")
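# Caution: the columns of newdata2 are factors, so these are standard deviations of the level codes, not of the measurements
sapply(newdata2, sd)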
## V4 V31 V35 V38
## 288.87876 34.12354 164.73563 74.42047
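The standard deviations above are computed on columns that str() reported as factors, so they describe the integer level codes rather than the physical readings. A minimal sketch of the effect, using a short hypothetical vector (the values are illustrative only):

```r
# Hypothetical readings stored as a factor, as happens after a mis-parsed import.
f <- factor(c("27.91", "27.82", "27.78", "27.75", "9.5"))

codes  <- as.integer(f)                # level codes, following the sorted level order
values <- as.numeric(as.character(f))  # the actual measurements

sd(codes)   # spread of the codes, unrelated to the units of the data
sd(values)  # spread of the measurements
```

For the numeric columns of bignewdata2, sapply(bignewdata2, sd) would give standard deviations in the original units.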
hist(bignewdata2$torque_actual_value, breaks = 20, xlab = "Torque", main = "Distribution of actual torque")
hist(bignewdata2$temp_shaft_bearing, breaks = 20, xlab = "Temperature", main = "Distribution of shaft bearing temperature")
hist(bignewdata2$hydraulic_pressure, breaks = 20, xlab = "Pressure", main = "Distribution of hydraulic pressure")
The scatterplot matrix above suggests that shaft bearing temperature and hydraulic pressure both vary systematically with time. This pattern could support predictive analysis of turbine failure based on the combined state of these two attributes.
From the histograms, the torque values appear roughly uniformly distributed. The shaft bearing temperature is bimodal, since the histogram shows two peaks. Finally, the hydraulic pressure looks approximately normal but slightly skewed to the left.
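A visual impression of skew can be checked with a number. The sketch below defines a simple sample skewness in base R (the helper name skewness and the test vectors are hypothetical, not part of the original analysis); a negative result indicates a longer left tail.

```r
# Sample skewness: third central moment divided by the 1.5 power of the
# second central moment (both computed by dividing by n).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

symmetric   <- c(1, 2, 3, 4, 5)
left_skewed <- c(1, 8, 9, 9, 10, 10, 10)  # hypothetical values with a long left tail

skewness(symmetric)    # 0 for a symmetric sample
skewness(left_skewed)  # negative
```

Under these assumptions, skewness(bignewdata2$hydraulic_pressure) would be expected to come out negative if the left skew seen in the histogram is real.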
cov(bignewdata2)
## time torque_actual_value temp_shaft_bearing
## time 8291716.6667 -22463.11245 -156.960341
## torque_actual_value -22463.1124 1356.05712 8.637680
## temp_shaft_bearing -156.9603 8.63768 7.032847
## hydraulic_pressure -659.1288 -10.64878 1.373987
## hydraulic_pressure
## time -659.128815
## torque_actual_value -10.648778
## temp_shaft_bearing 1.373987
## hydraulic_pressure 1.053777
# Covariance of the scaled data (this equals the correlation matrix of the unscaled data)
cov(bignewdata_arr_sc)
## time torque_actual_value temp_shaft_bearing
## time 1.00000000 -0.21184036 -0.02055427
## torque_actual_value -0.21184036 1.00000000 0.08844892
## temp_shaft_bearing -0.02055427 0.08844892 1.00000000
## hydraulic_pressure -0.22298400 -0.28169972 0.50471073
## hydraulic_pressure
## time -0.2229840
## torque_actual_value -0.2816997
## temp_shaft_bearing 0.5047107
## hydraulic_pressure 1.0000000
cor(bignewdata_arr_sc, use = "pairwise.complete.obs")
## time torque_actual_value temp_shaft_bearing
## time 1.00000000 -0.21184036 -0.02055427
## torque_actual_value -0.21184036 1.00000000 0.08844892
## temp_shaft_bearing -0.02055427 0.08844892 1.00000000
## hydraulic_pressure -0.22298400 -0.28169972 0.50471073
## hydraulic_pressure
## time -0.2229840
## torque_actual_value -0.2816997
## temp_shaft_bearing 0.5047107
## hydraulic_pressure 1.0000000
There is a moderate positive correlation (about 0.50) between hydraulic pressure and the temperature of the shaft bearing. The remaining pairs are only weakly correlated, with absolute values below 0.3.
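The covariance matrix of the scaled data and the correlation matrix printed above are identical. This is expected: once every column has unit standard deviation, covariance and correlation coincide. A small sketch on hypothetical random data (the seed and column names are arbitrary):

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
x <- data.frame(a = rnorm(50), b = runif(50), c = rexp(50))  # hypothetical data

x_sc <- scale(x)  # center to mean 0, scale to standard deviation 1

all.equal(cov(x_sc), cor(x))  # TRUE: covariance of scaled data is the correlation
all.equal(cor(x_sc), cor(x))  # TRUE: correlation is unchanged by scaling
```

In practice this means cor(bignewdata2) alone would carry the same information as the two scaled-data matrices.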
Given a data series, the Fourier transform (FT) decomposes it into a set of sinusoidal cycles that together describe it. Each cycle has an amplitude (strength), a phase (delay) and a frequency (speed). These cycles are useful because we can compare, modify or simplify them and, if needed, reconstruct the original series. For predictive analysis this can be extremely useful:
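# Fourier transform of the torque series
fft1 <- fft(as.numeric(bignewdata2$torque_actual_value))
# Magnitude of each frequency component
fft2 <- Mod(fft1)
# Plot the first 200 components (index 1 is the zero-frequency term, the sum of the series)
plot(fft2[1:200], type = "b", xlab = "frequency component", ylab = "amplitude")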
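To help read a magnitude plot like the one above, it can be useful to run the same steps on a signal whose content is known. The sketch below uses a hypothetical sine wave with a constant offset; the sample count, cycle count and amplitudes are assumptions chosen for illustration.

```r
n  <- 256                 # number of samples (hypothetical)
tt <- 0:(n - 1)           # sample index
k  <- 8                   # number of full cycles across the record
signal <- 3 + 2 * sin(2 * pi * k * tt / n)  # offset 3, amplitude 2

amp <- Mod(fft(signal)) / n  # magnitudes, normalized by the record length

amp[1]                     # the constant offset (mean of the signal), about 3
2 * amp[k + 1]             # amplitude of the k-cycle component, about 2
which.max(amp[2:(n / 2)])  # 8: the strongest non-constant component
```

The first element holds the zero-frequency term, which often dominates when the series has a non-zero mean. Subtracting the mean before calling fft(), or plotting from index 2 onward, may make the remaining peaks easier to see.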