Abstract
“Recognizing human activity is a technological breakthrough of machine learning paving the way to understanding and making the quality of the lives of people better. This brief analysis is based on Sean Kross’s swirl course Exploratory Data Analysis.”
The main goal of this brief report is to point research towards questions that can actually be answered. One thing it should make clear is that “real-world” research is rarely as neat and well-defined as textbook problems. Before training a machine learning model, we want to answer the question “Is the correlation between the measurements and the activities strong enough to train a machine?”, so that we can later ask “Given a set of 561 measurements, would a trained machine be able to determine which of the 6 activities the person was doing?”
The data we will use is already in a tidy format and comes from an experiment in which 30 volunteers performed activities of daily living while carrying a waist-mounted smartphone with embedded inertial sensors. Each person performed six activities while wearing a Galaxy S II. The sessions were video recorded, so the manually written labels identify the activity of the subject correctly.
ssd <- readRDS("samsung.rda")
The data table has 7352 rows and 563 columns, meaning that we have 7352 observations and 561 features, since the last two columns store the subject id and the activity of the corresponding subject.
dim(ssd)
[1] 7352 563
names(ssd)
[1] "tBodyAcc.mean...X"
[2] "tBodyAcc.mean...Y"
[3] "tBodyAcc.mean...Z"
[4] "tBodyAcc.std...X"
[5] "tBodyAcc.std...Y"
[6] "tBodyAcc.std...Z"
[7] "tBodyAcc.mad...X"
[8] "tBodyAcc.mad...Y"
[9] "tBodyAcc.mad...Z"
[10] "tBodyAcc.max...X"
[11] "tBodyAcc.max...Y"
[12] "tBodyAcc.max...Z"
[13] "tBodyAcc.min...X"
[14] "tBodyAcc.min...Y"
[15] "tBodyAcc.min...Z"
[16] "tBodyAcc.sma.."
[17] "tBodyAcc.energy...X"
[18] "tBodyAcc.energy...Y"
[19] "tBodyAcc.energy...Z"
[20] "tBodyAcc.iqr...X"
[21] "tBodyAcc.iqr...Y"
[22] "tBodyAcc.iqr...Z"
[23] "tBodyAcc.entropy...X"
[24] "tBodyAcc.entropy...Y"
[25] "tBodyAcc.entropy...Z"
[26] "tBodyAcc.arCoeff...X.1"
[27] "tBodyAcc.arCoeff...X.2"
[28] "tBodyAcc.arCoeff...X.3"
[29] "tBodyAcc.arCoeff...X.4"
[30] "tBodyAcc.arCoeff...Y.1"
[31] "tBodyAcc.arCoeff...Y.2"
[32] "tBodyAcc.arCoeff...Y.3"
[33] "tBodyAcc.arCoeff...Y.4"
[34] "tBodyAcc.arCoeff...Z.1"
[35] "tBodyAcc.arCoeff...Z.2"
[36] "tBodyAcc.arCoeff...Z.3"
[37] "tBodyAcc.arCoeff...Z.4"
[38] "tBodyAcc.correlation...X.Y"
[39] "tBodyAcc.correlation...X.Z"
[40] "tBodyAcc.correlation...Y.Z"
[41] "tGravityAcc.mean...X"
[42] "tGravityAcc.mean...Y"
[43] "tGravityAcc.mean...Z"
[44] "tGravityAcc.std...X"
[45] "tGravityAcc.std...Y"
[46] "tGravityAcc.std...Z"
[47] "tGravityAcc.mad...X"
[48] "tGravityAcc.mad...Y"
[49] "tGravityAcc.mad...Z"
[50] "tGravityAcc.max...X"
[51] "tGravityAcc.max...Y"
[52] "tGravityAcc.max...Z"
[53] "tGravityAcc.min...X"
[54] "tGravityAcc.min...Y"
[55] "tGravityAcc.min...Z"
[56] "tGravityAcc.sma.."
[57] "tGravityAcc.energy...X"
[58] "tGravityAcc.energy...Y"
[59] "tGravityAcc.energy...Z"
[60] "tGravityAcc.iqr...X"
[61] "tGravityAcc.iqr...Y"
[62] "tGravityAcc.iqr...Z"
[63] "tGravityAcc.entropy...X"
[64] "tGravityAcc.entropy...Y"
[65] "tGravityAcc.entropy...Z"
[66] "tGravityAcc.arCoeff...X.1"
[67] "tGravityAcc.arCoeff...X.2"
[68] "tGravityAcc.arCoeff...X.3"
[69] "tGravityAcc.arCoeff...X.4"
[70] "tGravityAcc.arCoeff...Y.1"
[71] "tGravityAcc.arCoeff...Y.2"
[72] "tGravityAcc.arCoeff...Y.3"
[73] "tGravityAcc.arCoeff...Y.4"
[74] "tGravityAcc.arCoeff...Z.1"
[75] "tGravityAcc.arCoeff...Z.2"
[76] "tGravityAcc.arCoeff...Z.3"
[77] "tGravityAcc.arCoeff...Z.4"
[78] "tGravityAcc.correlation...X.Y"
[79] "tGravityAcc.correlation...X.Z"
[80] "tGravityAcc.correlation...Y.Z"
[81] "tBodyAccJerk.mean...X"
[82] "tBodyAccJerk.mean...Y"
[83] "tBodyAccJerk.mean...Z"
[84] "tBodyAccJerk.std...X"
[85] "tBodyAccJerk.std...Y"
[86] "tBodyAccJerk.std...Z"
[87] "tBodyAccJerk.mad...X"
[88] "tBodyAccJerk.mad...Y"
[89] "tBodyAccJerk.mad...Z"
[90] "tBodyAccJerk.max...X"
[91] "tBodyAccJerk.max...Y"
[92] "tBodyAccJerk.max...Z"
[93] "tBodyAccJerk.min...X"
[94] "tBodyAccJerk.min...Y"
[95] "tBodyAccJerk.min...Z"
[96] "tBodyAccJerk.sma.."
[97] "tBodyAccJerk.energy...X"
[98] "tBodyAccJerk.energy...Y"
[99] "tBodyAccJerk.energy...Z"
[100] "tBodyAccJerk.iqr...X"
[101] "tBodyAccJerk.iqr...Y"
[102] "tBodyAccJerk.iqr...Z"
[103] "tBodyAccJerk.entropy...X"
[104] "tBodyAccJerk.entropy...Y"
[105] "tBodyAccJerk.entropy...Z"
[106] "tBodyAccJerk.arCoeff...X.1"
[107] "tBodyAccJerk.arCoeff...X.2"
[108] "tBodyAccJerk.arCoeff...X.3"
[109] "tBodyAccJerk.arCoeff...X.4"
[110] "tBodyAccJerk.arCoeff...Y.1"
[111] "tBodyAccJerk.arCoeff...Y.2"
[112] "tBodyAccJerk.arCoeff...Y.3"
[113] "tBodyAccJerk.arCoeff...Y.4"
[114] "tBodyAccJerk.arCoeff...Z.1"
[115] "tBodyAccJerk.arCoeff...Z.2"
[116] "tBodyAccJerk.arCoeff...Z.3"
[117] "tBodyAccJerk.arCoeff...Z.4"
[118] "tBodyAccJerk.correlation...X.Y"
[119] "tBodyAccJerk.correlation...X.Z"
[120] "tBodyAccJerk.correlation...Y.Z"
[121] "tBodyGyro.mean...X"
[122] "tBodyGyro.mean...Y"
[123] "tBodyGyro.mean...Z"
[124] "tBodyGyro.std...X"
[125] "tBodyGyro.std...Y"
[126] "tBodyGyro.std...Z"
[127] "tBodyGyro.mad...X"
[128] "tBodyGyro.mad...Y"
[129] "tBodyGyro.mad...Z"
[130] "tBodyGyro.max...X"
[131] "tBodyGyro.max...Y"
[132] "tBodyGyro.max...Z"
[133] "tBodyGyro.min...X"
[134] "tBodyGyro.min...Y"
[135] "tBodyGyro.min...Z"
[136] "tBodyGyro.sma.."
[137] "tBodyGyro.energy...X"
[138] "tBodyGyro.energy...Y"
[139] "tBodyGyro.energy...Z"
[140] "tBodyGyro.iqr...X"
[141] "tBodyGyro.iqr...Y"
[142] "tBodyGyro.iqr...Z"
[143] "tBodyGyro.entropy...X"
[144] "tBodyGyro.entropy...Y"
[145] "tBodyGyro.entropy...Z"
[146] "tBodyGyro.arCoeff...X.1"
[147] "tBodyGyro.arCoeff...X.2"
[148] "tBodyGyro.arCoeff...X.3"
[149] "tBodyGyro.arCoeff...X.4"
[150] "tBodyGyro.arCoeff...Y.1"
[151] "tBodyGyro.arCoeff...Y.2"
[152] "tBodyGyro.arCoeff...Y.3"
[153] "tBodyGyro.arCoeff...Y.4"
[154] "tBodyGyro.arCoeff...Z.1"
[155] "tBodyGyro.arCoeff...Z.2"
[156] "tBodyGyro.arCoeff...Z.3"
[157] "tBodyGyro.arCoeff...Z.4"
[158] "tBodyGyro.correlation...X.Y"
[159] "tBodyGyro.correlation...X.Z"
[160] "tBodyGyro.correlation...Y.Z"
[161] "tBodyGyroJerk.mean...X"
[162] "tBodyGyroJerk.mean...Y"
[163] "tBodyGyroJerk.mean...Z"
[164] "tBodyGyroJerk.std...X"
[165] "tBodyGyroJerk.std...Y"
[166] "tBodyGyroJerk.std...Z"
[167] "tBodyGyroJerk.mad...X"
[168] "tBodyGyroJerk.mad...Y"
[169] "tBodyGyroJerk.mad...Z"
[170] "tBodyGyroJerk.max...X"
[171] "tBodyGyroJerk.max...Y"
[172] "tBodyGyroJerk.max...Z"
[173] "tBodyGyroJerk.min...X"
[174] "tBodyGyroJerk.min...Y"
[175] "tBodyGyroJerk.min...Z"
[176] "tBodyGyroJerk.sma.."
[177] "tBodyGyroJerk.energy...X"
[178] "tBodyGyroJerk.energy...Y"
[179] "tBodyGyroJerk.energy...Z"
[180] "tBodyGyroJerk.iqr...X"
[181] "tBodyGyroJerk.iqr...Y"
[182] "tBodyGyroJerk.iqr...Z"
[183] "tBodyGyroJerk.entropy...X"
[184] "tBodyGyroJerk.entropy...Y"
[185] "tBodyGyroJerk.entropy...Z"
[186] "tBodyGyroJerk.arCoeff...X.1"
[187] "tBodyGyroJerk.arCoeff...X.2"
[188] "tBodyGyroJerk.arCoeff...X.3"
[189] "tBodyGyroJerk.arCoeff...X.4"
[190] "tBodyGyroJerk.arCoeff...Y.1"
[191] "tBodyGyroJerk.arCoeff...Y.2"
[192] "tBodyGyroJerk.arCoeff...Y.3"
[193] "tBodyGyroJerk.arCoeff...Y.4"
[194] "tBodyGyroJerk.arCoeff...Z.1"
[195] "tBodyGyroJerk.arCoeff...Z.2"
[196] "tBodyGyroJerk.arCoeff...Z.3"
[197] "tBodyGyroJerk.arCoeff...Z.4"
[198] "tBodyGyroJerk.correlation...X.Y"
[199] "tBodyGyroJerk.correlation...X.Z"
[200] "tBodyGyroJerk.correlation...Y.Z"
[201] "tBodyAccMag.mean.."
[202] "tBodyAccMag.std.."
[203] "tBodyAccMag.mad.."
[204] "tBodyAccMag.max.."
[205] "tBodyAccMag.min.."
[206] "tBodyAccMag.sma.."
[207] "tBodyAccMag.energy.."
[208] "tBodyAccMag.iqr.."
[209] "tBodyAccMag.entropy.."
[210] "tBodyAccMag.arCoeff..1"
[211] "tBodyAccMag.arCoeff..2"
[212] "tBodyAccMag.arCoeff..3"
[213] "tBodyAccMag.arCoeff..4"
[214] "tGravityAccMag.mean.."
[215] "tGravityAccMag.std.."
[216] "tGravityAccMag.mad.."
[217] "tGravityAccMag.max.."
[218] "tGravityAccMag.min.."
[219] "tGravityAccMag.sma.."
[220] "tGravityAccMag.energy.."
[221] "tGravityAccMag.iqr.."
[222] "tGravityAccMag.entropy.."
[223] "tGravityAccMag.arCoeff..1"
[224] "tGravityAccMag.arCoeff..2"
[225] "tGravityAccMag.arCoeff..3"
[226] "tGravityAccMag.arCoeff..4"
[227] "tBodyAccJerkMag.mean.."
[228] "tBodyAccJerkMag.std.."
[229] "tBodyAccJerkMag.mad.."
[230] "tBodyAccJerkMag.max.."
[231] "tBodyAccJerkMag.min.."
[232] "tBodyAccJerkMag.sma.."
[233] "tBodyAccJerkMag.energy.."
[234] "tBodyAccJerkMag.iqr.."
[235] "tBodyAccJerkMag.entropy.."
[236] "tBodyAccJerkMag.arCoeff..1"
[237] "tBodyAccJerkMag.arCoeff..2"
[238] "tBodyAccJerkMag.arCoeff..3"
[239] "tBodyAccJerkMag.arCoeff..4"
[240] "tBodyGyroMag.mean.."
[241] "tBodyGyroMag.std.."
[242] "tBodyGyroMag.mad.."
[243] "tBodyGyroMag.max.."
[244] "tBodyGyroMag.min.."
[245] "tBodyGyroMag.sma.."
[246] "tBodyGyroMag.energy.."
[247] "tBodyGyroMag.iqr.."
[248] "tBodyGyroMag.entropy.."
[249] "tBodyGyroMag.arCoeff..1"
[250] "tBodyGyroMag.arCoeff..2"
[251] "tBodyGyroMag.arCoeff..3"
[252] "tBodyGyroMag.arCoeff..4"
[253] "tBodyGyroJerkMag.mean.."
[254] "tBodyGyroJerkMag.std.."
[255] "tBodyGyroJerkMag.mad.."
[256] "tBodyGyroJerkMag.max.."
[257] "tBodyGyroJerkMag.min.."
[258] "tBodyGyroJerkMag.sma.."
[259] "tBodyGyroJerkMag.energy.."
[260] "tBodyGyroJerkMag.iqr.."
[261] "tBodyGyroJerkMag.entropy.."
[262] "tBodyGyroJerkMag.arCoeff..1"
[263] "tBodyGyroJerkMag.arCoeff..2"
[264] "tBodyGyroJerkMag.arCoeff..3"
[265] "tBodyGyroJerkMag.arCoeff..4"
[266] "fBodyAcc.mean...X"
[267] "fBodyAcc.mean...Y"
[268] "fBodyAcc.mean...Z"
[269] "fBodyAcc.std...X"
[270] "fBodyAcc.std...Y"
[271] "fBodyAcc.std...Z"
[272] "fBodyAcc.mad...X"
[273] "fBodyAcc.mad...Y"
[274] "fBodyAcc.mad...Z"
[275] "fBodyAcc.max...X"
[276] "fBodyAcc.max...Y"
[277] "fBodyAcc.max...Z"
[278] "fBodyAcc.min...X"
[279] "fBodyAcc.min...Y"
[280] "fBodyAcc.min...Z"
[281] "fBodyAcc.sma.."
[282] "fBodyAcc.energy...X"
[283] "fBodyAcc.energy...Y"
[284] "fBodyAcc.energy...Z"
[285] "fBodyAcc.iqr...X"
[286] "fBodyAcc.iqr...Y"
[287] "fBodyAcc.iqr...Z"
[288] "fBodyAcc.entropy...X"
[289] "fBodyAcc.entropy...Y"
[290] "fBodyAcc.entropy...Z"
[291] "fBodyAcc.maxInds.X"
[292] "fBodyAcc.maxInds.Y"
[293] "fBodyAcc.maxInds.Z"
[294] "fBodyAcc.meanFreq...X"
[295] "fBodyAcc.meanFreq...Y"
[296] "fBodyAcc.meanFreq...Z"
[297] "fBodyAcc.skewness...X"
[298] "fBodyAcc.kurtosis...X"
[299] "fBodyAcc.skewness...Y"
[300] "fBodyAcc.kurtosis...Y"
[301] "fBodyAcc.skewness...Z"
[302] "fBodyAcc.kurtosis...Z"
[303] "fBodyAcc.bandsEnergy...1.8"
[304] "fBodyAcc.bandsEnergy...9.16"
[305] "fBodyAcc.bandsEnergy...17.24"
[306] "fBodyAcc.bandsEnergy...25.32"
[307] "fBodyAcc.bandsEnergy...33.40"
[308] "fBodyAcc.bandsEnergy...41.48"
[309] "fBodyAcc.bandsEnergy...49.56"
[310] "fBodyAcc.bandsEnergy...57.64"
[311] "fBodyAcc.bandsEnergy...1.16"
[312] "fBodyAcc.bandsEnergy...17.32"
[313] "fBodyAcc.bandsEnergy...33.48"
[314] "fBodyAcc.bandsEnergy...49.64"
[315] "fBodyAcc.bandsEnergy...1.24"
[316] "fBodyAcc.bandsEnergy...25.48"
[317] "fBodyAcc.bandsEnergy...1.8.1"
[318] "fBodyAcc.bandsEnergy...9.16.1"
[319] "fBodyAcc.bandsEnergy...17.24.1"
[320] "fBodyAcc.bandsEnergy...25.32.1"
[321] "fBodyAcc.bandsEnergy...33.40.1"
[322] "fBodyAcc.bandsEnergy...41.48.1"
[323] "fBodyAcc.bandsEnergy...49.56.1"
[324] "fBodyAcc.bandsEnergy...57.64.1"
[325] "fBodyAcc.bandsEnergy...1.16.1"
[326] "fBodyAcc.bandsEnergy...17.32.1"
[327] "fBodyAcc.bandsEnergy...33.48.1"
[328] "fBodyAcc.bandsEnergy...49.64.1"
[329] "fBodyAcc.bandsEnergy...1.24.1"
[330] "fBodyAcc.bandsEnergy...25.48.1"
[331] "fBodyAcc.bandsEnergy...1.8.2"
[332] "fBodyAcc.bandsEnergy...9.16.2"
[333] "fBodyAcc.bandsEnergy...17.24.2"
[334] "fBodyAcc.bandsEnergy...25.32.2"
[335] "fBodyAcc.bandsEnergy...33.40.2"
[336] "fBodyAcc.bandsEnergy...41.48.2"
[337] "fBodyAcc.bandsEnergy...49.56.2"
[338] "fBodyAcc.bandsEnergy...57.64.2"
[339] "fBodyAcc.bandsEnergy...1.16.2"
[340] "fBodyAcc.bandsEnergy...17.32.2"
[341] "fBodyAcc.bandsEnergy...33.48.2"
[342] "fBodyAcc.bandsEnergy...49.64.2"
[343] "fBodyAcc.bandsEnergy...1.24.2"
[344] "fBodyAcc.bandsEnergy...25.48.2"
[345] "fBodyAccJerk.mean...X"
[346] "fBodyAccJerk.mean...Y"
[347] "fBodyAccJerk.mean...Z"
[348] "fBodyAccJerk.std...X"
[349] "fBodyAccJerk.std...Y"
[350] "fBodyAccJerk.std...Z"
[351] "fBodyAccJerk.mad...X"
[352] "fBodyAccJerk.mad...Y"
[353] "fBodyAccJerk.mad...Z"
[354] "fBodyAccJerk.max...X"
[355] "fBodyAccJerk.max...Y"
[356] "fBodyAccJerk.max...Z"
[357] "fBodyAccJerk.min...X"
[358] "fBodyAccJerk.min...Y"
[359] "fBodyAccJerk.min...Z"
[360] "fBodyAccJerk.sma.."
[361] "fBodyAccJerk.energy...X"
[362] "fBodyAccJerk.energy...Y"
[363] "fBodyAccJerk.energy...Z"
[364] "fBodyAccJerk.iqr...X"
[365] "fBodyAccJerk.iqr...Y"
[366] "fBodyAccJerk.iqr...Z"
[367] "fBodyAccJerk.entropy...X"
[368] "fBodyAccJerk.entropy...Y"
[369] "fBodyAccJerk.entropy...Z"
[370] "fBodyAccJerk.maxInds.X"
[371] "fBodyAccJerk.maxInds.Y"
[372] "fBodyAccJerk.maxInds.Z"
[373] "fBodyAccJerk.meanFreq...X"
[374] "fBodyAccJerk.meanFreq...Y"
[375] "fBodyAccJerk.meanFreq...Z"
[376] "fBodyAccJerk.skewness...X"
[377] "fBodyAccJerk.kurtosis...X"
[378] "fBodyAccJerk.skewness...Y"
[379] "fBodyAccJerk.kurtosis...Y"
[380] "fBodyAccJerk.skewness...Z"
[381] "fBodyAccJerk.kurtosis...Z"
[382] "fBodyAccJerk.bandsEnergy...1.8"
[383] "fBodyAccJerk.bandsEnergy...9.16"
[384] "fBodyAccJerk.bandsEnergy...17.24"
[385] "fBodyAccJerk.bandsEnergy...25.32"
[386] "fBodyAccJerk.bandsEnergy...33.40"
[387] "fBodyAccJerk.bandsEnergy...41.48"
[388] "fBodyAccJerk.bandsEnergy...49.56"
[389] "fBodyAccJerk.bandsEnergy...57.64"
[390] "fBodyAccJerk.bandsEnergy...1.16"
[391] "fBodyAccJerk.bandsEnergy...17.32"
[392] "fBodyAccJerk.bandsEnergy...33.48"
[393] "fBodyAccJerk.bandsEnergy...49.64"
[394] "fBodyAccJerk.bandsEnergy...1.24"
[395] "fBodyAccJerk.bandsEnergy...25.48"
[396] "fBodyAccJerk.bandsEnergy...1.8.1"
[397] "fBodyAccJerk.bandsEnergy...9.16.1"
[398] "fBodyAccJerk.bandsEnergy...17.24.1"
[399] "fBodyAccJerk.bandsEnergy...25.32.1"
[400] "fBodyAccJerk.bandsEnergy...33.40.1"
[401] "fBodyAccJerk.bandsEnergy...41.48.1"
[402] "fBodyAccJerk.bandsEnergy...49.56.1"
[403] "fBodyAccJerk.bandsEnergy...57.64.1"
[404] "fBodyAccJerk.bandsEnergy...1.16.1"
[405] "fBodyAccJerk.bandsEnergy...17.32.1"
[406] "fBodyAccJerk.bandsEnergy...33.48.1"
[407] "fBodyAccJerk.bandsEnergy...49.64.1"
[408] "fBodyAccJerk.bandsEnergy...1.24.1"
[409] "fBodyAccJerk.bandsEnergy...25.48.1"
[410] "fBodyAccJerk.bandsEnergy...1.8.2"
[411] "fBodyAccJerk.bandsEnergy...9.16.2"
[412] "fBodyAccJerk.bandsEnergy...17.24.2"
[413] "fBodyAccJerk.bandsEnergy...25.32.2"
[414] "fBodyAccJerk.bandsEnergy...33.40.2"
[415] "fBodyAccJerk.bandsEnergy...41.48.2"
[416] "fBodyAccJerk.bandsEnergy...49.56.2"
[417] "fBodyAccJerk.bandsEnergy...57.64.2"
[418] "fBodyAccJerk.bandsEnergy...1.16.2"
[419] "fBodyAccJerk.bandsEnergy...17.32.2"
[420] "fBodyAccJerk.bandsEnergy...33.48.2"
[421] "fBodyAccJerk.bandsEnergy...49.64.2"
[422] "fBodyAccJerk.bandsEnergy...1.24.2"
[423] "fBodyAccJerk.bandsEnergy...25.48.2"
[424] "fBodyGyro.mean...X"
[425] "fBodyGyro.mean...Y"
[426] "fBodyGyro.mean...Z"
[427] "fBodyGyro.std...X"
[428] "fBodyGyro.std...Y"
[429] "fBodyGyro.std...Z"
[430] "fBodyGyro.mad...X"
[431] "fBodyGyro.mad...Y"
[432] "fBodyGyro.mad...Z"
[433] "fBodyGyro.max...X"
[434] "fBodyGyro.max...Y"
[435] "fBodyGyro.max...Z"
[436] "fBodyGyro.min...X"
[437] "fBodyGyro.min...Y"
[438] "fBodyGyro.min...Z"
[439] "fBodyGyro.sma.."
[440] "fBodyGyro.energy...X"
[441] "fBodyGyro.energy...Y"
[442] "fBodyGyro.energy...Z"
[443] "fBodyGyro.iqr...X"
[444] "fBodyGyro.iqr...Y"
[445] "fBodyGyro.iqr...Z"
[446] "fBodyGyro.entropy...X"
[447] "fBodyGyro.entropy...Y"
[448] "fBodyGyro.entropy...Z"
[449] "fBodyGyro.maxInds.X"
[450] "fBodyGyro.maxInds.Y"
[451] "fBodyGyro.maxInds.Z"
[452] "fBodyGyro.meanFreq...X"
[453] "fBodyGyro.meanFreq...Y"
[454] "fBodyGyro.meanFreq...Z"
[455] "fBodyGyro.skewness...X"
[456] "fBodyGyro.kurtosis...X"
[457] "fBodyGyro.skewness...Y"
[458] "fBodyGyro.kurtosis...Y"
[459] "fBodyGyro.skewness...Z"
[460] "fBodyGyro.kurtosis...Z"
[461] "fBodyGyro.bandsEnergy...1.8"
[462] "fBodyGyro.bandsEnergy...9.16"
[463] "fBodyGyro.bandsEnergy...17.24"
[464] "fBodyGyro.bandsEnergy...25.32"
[465] "fBodyGyro.bandsEnergy...33.40"
[466] "fBodyGyro.bandsEnergy...41.48"
[467] "fBodyGyro.bandsEnergy...49.56"
[468] "fBodyGyro.bandsEnergy...57.64"
[469] "fBodyGyro.bandsEnergy...1.16"
[470] "fBodyGyro.bandsEnergy...17.32"
[471] "fBodyGyro.bandsEnergy...33.48"
[472] "fBodyGyro.bandsEnergy...49.64"
[473] "fBodyGyro.bandsEnergy...1.24"
[474] "fBodyGyro.bandsEnergy...25.48"
[475] "fBodyGyro.bandsEnergy...1.8.1"
[476] "fBodyGyro.bandsEnergy...9.16.1"
[477] "fBodyGyro.bandsEnergy...17.24.1"
[478] "fBodyGyro.bandsEnergy...25.32.1"
[479] "fBodyGyro.bandsEnergy...33.40.1"
[480] "fBodyGyro.bandsEnergy...41.48.1"
[481] "fBodyGyro.bandsEnergy...49.56.1"
[482] "fBodyGyro.bandsEnergy...57.64.1"
[483] "fBodyGyro.bandsEnergy...1.16.1"
[484] "fBodyGyro.bandsEnergy...17.32.1"
[485] "fBodyGyro.bandsEnergy...33.48.1"
[486] "fBodyGyro.bandsEnergy...49.64.1"
[487] "fBodyGyro.bandsEnergy...1.24.1"
[488] "fBodyGyro.bandsEnergy...25.48.1"
[489] "fBodyGyro.bandsEnergy...1.8.2"
[490] "fBodyGyro.bandsEnergy...9.16.2"
[491] "fBodyGyro.bandsEnergy...17.24.2"
[492] "fBodyGyro.bandsEnergy...25.32.2"
[493] "fBodyGyro.bandsEnergy...33.40.2"
[494] "fBodyGyro.bandsEnergy...41.48.2"
[495] "fBodyGyro.bandsEnergy...49.56.2"
[496] "fBodyGyro.bandsEnergy...57.64.2"
[497] "fBodyGyro.bandsEnergy...1.16.2"
[498] "fBodyGyro.bandsEnergy...17.32.2"
[499] "fBodyGyro.bandsEnergy...33.48.2"
[500] "fBodyGyro.bandsEnergy...49.64.2"
[501] "fBodyGyro.bandsEnergy...1.24.2"
[502] "fBodyGyro.bandsEnergy...25.48.2"
[503] "fBodyAccMag.mean.."
[504] "fBodyAccMag.std.."
[505] "fBodyAccMag.mad.."
[506] "fBodyAccMag.max.."
[507] "fBodyAccMag.min.."
[508] "fBodyAccMag.sma.."
[509] "fBodyAccMag.energy.."
[510] "fBodyAccMag.iqr.."
[511] "fBodyAccMag.entropy.."
[512] "fBodyAccMag.maxInds"
[513] "fBodyAccMag.meanFreq.."
[514] "fBodyAccMag.skewness.."
[515] "fBodyAccMag.kurtosis.."
[516] "fBodyBodyAccJerkMag.mean.."
[517] "fBodyBodyAccJerkMag.std.."
[518] "fBodyBodyAccJerkMag.mad.."
[519] "fBodyBodyAccJerkMag.max.."
[520] "fBodyBodyAccJerkMag.min.."
[521] "fBodyBodyAccJerkMag.sma.."
[522] "fBodyBodyAccJerkMag.energy.."
[523] "fBodyBodyAccJerkMag.iqr.."
[524] "fBodyBodyAccJerkMag.entropy.."
[525] "fBodyBodyAccJerkMag.maxInds"
[526] "fBodyBodyAccJerkMag.meanFreq.."
[527] "fBodyBodyAccJerkMag.skewness.."
[528] "fBodyBodyAccJerkMag.kurtosis.."
[529] "fBodyBodyGyroMag.mean.."
[530] "fBodyBodyGyroMag.std.."
[531] "fBodyBodyGyroMag.mad.."
[532] "fBodyBodyGyroMag.max.."
[533] "fBodyBodyGyroMag.min.."
[534] "fBodyBodyGyroMag.sma.."
[535] "fBodyBodyGyroMag.energy.."
[536] "fBodyBodyGyroMag.iqr.."
[537] "fBodyBodyGyroMag.entropy.."
[538] "fBodyBodyGyroMag.maxInds"
[539] "fBodyBodyGyroMag.meanFreq.."
[540] "fBodyBodyGyroMag.skewness.."
[541] "fBodyBodyGyroMag.kurtosis.."
[542] "fBodyBodyGyroJerkMag.mean.."
[543] "fBodyBodyGyroJerkMag.std.."
[544] "fBodyBodyGyroJerkMag.mad.."
[545] "fBodyBodyGyroJerkMag.max.."
[546] "fBodyBodyGyroJerkMag.min.."
[547] "fBodyBodyGyroJerkMag.sma.."
[548] "fBodyBodyGyroJerkMag.energy.."
[549] "fBodyBodyGyroJerkMag.iqr.."
[550] "fBodyBodyGyroJerkMag.entropy.."
[551] "fBodyBodyGyroJerkMag.maxInds"
[552] "fBodyBodyGyroJerkMag.meanFreq.."
[553] "fBodyBodyGyroJerkMag.skewness.."
[554] "fBodyBodyGyroJerkMag.kurtosis.."
[555] "angle.tBodyAccMean.gravity."
[556] "angle.tBodyAccJerkMean..gravityMean."
[557] "angle.tBodyGyroMean.gravityMean."
[558] "angle.tBodyGyroJerkMean.gravityMean."
[559] "angle.X.gravityMean."
[560] "angle.Y.gravityMean."
[561] "angle.Z.gravityMean."
[562] "subject"
[563] "activity"
table(ssd$subject)
  1   3   5   6   7   8  11  14  15  16  17  19  21  22  23  25  26  27  28  29  30
347 341 302 325 308 281 316 323 328 366 368 360 408 321 372 409 392 376 382 344 383
table(ssd$activity)
  laying  sitting standing     walk walkdown   walkup
    1407     1286     1374     1226      986     1073
Sum the tables of observations per subject and activity to verify that they sum up to the number of rows.
sum(table(ssd$subject))
[1] 7352
sum(table(ssd$activity))
[1] 7352
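The same check can be made in one step with a two-way table. The following is a minimal sketch on a small made-up data frame; the `toy` object and its values are hypothetical stand-ins for `ssd`, not part of the original data.

```r
# Hypothetical stand-in for ssd: only the two label columns, made-up values.
toy <- data.frame(
  subject  = c(1, 1, 1, 3, 3, 5),
  activity = factor(c("walk", "laying", "walk", "sitting", "walk", "laying"))
)

# Cross-tabulate subject against activity and append row/column totals.
counts <- table(toy$subject, toy$activity)
with_totals <- addmargins(counts)
print(with_totals)

# Every row belongs to exactly one subject and one activity,
# so the grand total must equal the number of rows.
stopifnot(sum(counts) == nrow(toy))
```

On the real data the equivalent call would be `addmargins(table(ssd$subject, ssd$activity))`, which also shows how many observations each subject contributed per activity.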
For the purpose of this section we will subset the train dataset to limit it to only subject 1,
sub1 <- subset(ssd,subject == 1)
dim(sub1)
[1] 347 563
we observe that the dataset we will analyze has only 347 observations. Let’s review the names of the first 12 features again
names(sub1[1:12])
 [1] "tBodyAcc.mean...X" "tBodyAcc.mean...Y" "tBodyAcc.mean...Z"
 [4] "tBodyAcc.std...X" "tBodyAcc.std...Y" "tBodyAcc.std...Z"
 [7] "tBodyAcc.mad...X" "tBodyAcc.mad...Y" "tBodyAcc.mad...Z"
[10] "tBodyAcc.max...X" "tBodyAcc.max...Y" "tBodyAcc.max...Z"
and focus on the first three which measure the mean of the acceleration in the three dimensions.
par(mfrow=c(2,3), mar = c(5, 4, 1, 1),oma = c(1,1,3,1))
plot(sub1[, 1], col = sub1$activity, ylab = names(sub1)[1])
plot(sub1[, 2], col = sub1$activity, ylab = names(sub1)[2])
plot(sub1[, 3], col = sub1$activity, ylab = names(sub1)[3])
plot(sub1[,1],sub1[,2],col = sub1$activity,xlab = names(sub1)[1],ylab = names(sub1)[2])
plot(sub1[,1],sub1[,3],col = sub1$activity,xlab = names(sub1)[1],ylab = names(sub1)[3])
plot(sub1[,2],sub1[,3],col = sub1$activity,xlab = names(sub1)[2],ylab = names(sub1)[3])
legend("bottomleft",legend=unique(sub1$activity),col=unique(sub1$activity), pch = 19)
mtext("Mean Acceleration Plots By Index And Against Each Other",outer = TRUE)
par(mfrow=c(1,1))
We observe that the active activities show more variability than the passive ones, especially in the X dimension.
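The visual impression of "more variability" can also be checked numerically by computing the standard deviation of a feature within each activity. The sketch below uses simulated values as a stand-in for one feature column; the activity names are borrowed from the data, but the numbers and the `spread` vector are made up for illustration.

```r
set.seed(1)
# Simulated stand-in for one feature column: low spread for passive
# activities, high spread for active ones (all values are made up).
activity <- factor(rep(c("laying", "sitting", "walk", "walkup"), each = 50))
spread   <- c(laying = 0.01, sitting = 0.01, walk = 0.10, walkup = 0.10)
feature  <- rnorm(length(activity), mean = 0.27,
                  sd = spread[as.character(activity)])

# Standard deviation of the feature within each activity.
sd_by_activity <- tapply(feature, activity, sd)
print(round(sd_by_activity, 3))
```

On the real data the same idea would read `tapply(sub1[, 1], sub1$activity, sd)` for the X component.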
We still focus on the three dimensions of the mean acceleration features.
mdist <- dist(sub1[,1:3])
hclustering <- hclust(mdist)
We will use a function written by Eva KF Chan that draws the hierarchical clustering as a dendrogram with colored leaf labels.
myplclust <- function(hclust,lab = hclust$labels,lab.col = rep(1,length(hclust$labels)),
                      hang = 0.1,...){
  ## modification of plclust for plotting hclust objects *in colour*!
  ## Copyright Eva KF Chan 2009
  ## Arguments:
  ## hclust: hclust object
  ## lab: a character vector of labels of the leaves of the tree
  ## lab.col: color for the labels; NA=default device foreground color
  ## hang: as in hclust & plclust
  ## Side effect:
  ## A display of hierarchical cluster with colored leaf labels.
  y <- rep(hclust$height,2)
  x <- as.numeric(hclust$merge)
  y <- y[which(x < 0)]
  x <- x[which(x < 0)]
  x <- abs(x)
  y <- y[order(x)]
  x <- x[order(x)]
  plot(hclust,labels = FALSE,hang = hang,...)
  text(x = x,y = y[hclust$order] - (max(hclust$height) * hang),labels = lab[hclust$order],
       col = lab.col[hclust$order],srt = 90,adj = c(1,0.5),xpd = NA,...)
}
We will plot the hierarchical clustering as a dendrogram, based on the distance matrix we computed,
myplclust(hclustering,lab.col = unclass(sub1$activity),xlab = "activity")
legend("topright",legend=unique(sub1$activity),col=unique(sub1$activity),pch = 19)
we observe that we cannot learn much from the average acceleration: all colors are jumbled together and there is no sign of clustering at all, so we need to explore another triplet of features which might be more useful.
Since the first three features told us little, let’s pick columns 10 through 12, which measure the maximum acceleration of a subject in three dimensions. As in part I, we start by plotting them.
par(mfrow=c(2, 3), mar = c(5, 4, 1, 1),oma = c(1,1,3,1))
plot(sub1[, 10], col = sub1$activity, ylab = names(sub1)[10])
plot(sub1[, 11], col = sub1$activity, ylab = names(sub1)[11])
plot(sub1[, 12], col = sub1$activity, ylab = names(sub1)[12])
plot(sub1[,10],sub1[,11],col = sub1$activity,xlab = names(sub1)[10],ylab = names(sub1)[11])
plot(sub1[,10],sub1[,12],col = sub1$activity,xlab = names(sub1)[10],ylab = names(sub1)[12])
plot(sub1[,11],sub1[,12],col = sub1$activity,xlab = names(sub1)[11],ylab = names(sub1)[12])
legend("topleft",legend=unique(sub1$activity),col=unique(sub1$activity),pch = 19)
mtext("Max Acceleration Plots By Index And Against Each Other",outer = TRUE)
par(mfrow=c(1,1))
We’re seeing something vaguely interesting: passive activities mostly fall below the active ones!
As in part I we will create a distance matrix
mdist <- dist(sub1[,10:12])
hclustering <- hclust(mdist)
and create the hierarchical clustering plot
myplclust(hclustering,lab.col = unclass(sub1$activity),xlab = "activity")
legend("topright",legend=unique(sub1$activity),col=unique(sub1$activity),pch = 19)
we see clearly that the data splits into 2 clusters, active and passive activities. Moreover, the light blue (walking down) is clearly distinct from the other walking activities. The dark blue (walking level) also seems to be somewhat clustered. The passive activities, however, are all jumbled together with no clear pattern visible.
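A split read off a dendrogram can be made concrete by cutting the tree into a fixed number of clusters and tabulating the result against the known labels. The following is a minimal sketch on simulated data; the `feats` matrix and the `group` labels are made-up stand-ins for three feature columns and the active/passive distinction.

```r
set.seed(2)
# Simulated stand-in for three feature columns: two well-separated groups.
group <- factor(rep(c("passive", "active"), each = 40))
feats <- rbind(
  matrix(rnorm(40 * 3, mean = -1, sd = 0.1), ncol = 3),  # passive rows
  matrix(rnorm(40 * 3, mean =  1, sd = 0.1), ncol = 3)   # active rows
)

# Same pipeline as in the text: Euclidean distances, default (complete) linkage.
tree <- hclust(dist(feats))

# Cut the dendrogram into two clusters and compare with the known groups.
two_clusters <- cutree(tree, k = 2)
agreement <- table(two_clusters, group)
print(agreement)
```

On the real data the corresponding check would be `table(cutree(hclustering, k = 2), sub1$activity)`: if the split is really active versus passive, each row of that table should be concentrated in either the three walking columns or the three passive ones.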
We will try SVD on sub1. The argument we pass is the scaled data frame with the last two columns (subject id and activity) removed.
svd1 <- svd(scale(sub1[,-c(562,563)]))
The first two principal components explain 64% of the variance in the data.
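The share of variance attributed to each component comes from the squared singular values divided by their sum. The sketch below shows the computation on simulated data; the matrix `x` and the hidden factor `z` are made up so that one component dominates, and they are not meant to reproduce the 64% figure.

```r
set.seed(3)
# Simulated stand-in for the feature matrix: five columns that all follow
# one hidden factor z plus a little noise (made-up data).
z <- rnorm(100)
x <- sapply(1:5, function(j) z + rnorm(100, sd = 0.3))

# SVD of the centered and scaled matrix, as in the text.
s <- svd(scale(x))

# Share of total variance per component, and the running total.
var_share <- s$d^2 / sum(s$d^2)
cum_share <- cumsum(var_share)
print(round(cum_share, 3))
```

On the real data, `sum((svd1$d^2 / sum(svd1$d^2))[1:2])` would give the share attributed to the first two components.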
barplot(svd1$d[1:10]^2/sum(svd1$d^2),names.arg = paste0("PC",1:10),las = 3,col = 'red')
We will now plot the first two left singular vectors of svd1, which are the first two columns of the \(U\) matrix. Remember that SVD factors the scaled sub1 matrix into \(UDV^{T}\) and that each row of \(U\) corresponds to a row of sub1.
par(mfrow = c(1,2),mar = c(1,1,1,6))
plot(svd1$u[,1],pch = 19,col = unclass(sub1$activity))
legend("right",inset = c(-.45,0),legend=unique(sub1$activity),
col=unique(sub1$activity),pch = 19,xpd = TRUE)
plot(svd1$u[,2],pch = 19,col = unclass(sub1$activity))
Here we’re looking at the 2 left singular vectors of svd1 (the first 2 columns of \(U\)). Each entry of the columns belongs to a particular row with one of the 6 activities assigned to it. We see the activities distinguished by color. Moving from left to right, the first section of rows are green (standing), the second red (sitting), the third black (laying), etc. The first column of \(U\) shows separation of the non-moving activities (black, red, and green) from the walking activities. The second column is harder to interpret. However, the magenta cluster, which represents walking up, seems separate from the others.
We’ll try to figure out why that is. To do that we’ll have to find which of the 500+ measurements (represented by the columns of sub1) contributes most to the variation of that component. Since we’re interested in sub1 columns, we’ll look at the RIGHT singular vectors (the columns of \(V\)), and in particular the second one, since the separation of the magenta cluster stood out in the second column of \(U\). We will plot the second column of \(V\),
plot(svd1$v[,2],col = rgb(0,0,.5,.4),pch = 19)
from the plot we don’t see any pattern or anything else useful. In order to investigate the magenta colored cluster we will find the feature which contributes most to this separation by finding the index of the largest entry in the second column of \(V\).
maxCon <- which.max(svd1$v[,2])
We will now create a distance matrix from columns 10 through 12 of sub1 together with column maxCon.
mdist <- dist(sub1[,c(10:12,maxCon)])
Next we will plot the hierarchical clustering dendrogram,
hclustering <- hclust(mdist)
myplclust(hclustering,lab.col = unclass(sub1$activity))
legend("topright",legend=unique(sub1$activity),col=unique(sub1$activity),pch = 19)
and we also need to find the name of this magenta clustering contributor.
names(sub1[maxCon])
[1] "fBodyAcc.meanFreq...Z"
So the mean frequency of the body acceleration (a frequency-domain feature) in the Z direction is the main contributor to this clustering phenomenon we’re seeing. Let’s move on to k-means clustering to see if this technique can distinguish between the activities.
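One detail of the feature selection above is worth a quick check: `which.max` returns the position of the largest signed entry, while the sign of a singular vector is arbitrary, so a feature with a large negative loading can contribute just as much. The sketch below compares the two choices on a made-up loading vector; the names `featA` to `featD` and their values are hypothetical.

```r
# Made-up loadings standing in for one column of V.
loadings <- c(featA = 0.10, featB = -0.60, featC = 0.35, featD = -0.05)

# Largest signed entry, as used in the text.
top_signed <- which.max(loadings)

# Largest entry by magnitude, ignoring the sign.
top_abs <- which.max(abs(loadings))

print(names(top_signed))  # "featC"
print(names(top_abs))     # "featB"
```

On the real data the comparison would be `which.max(abs(svd1$v[, 2]))` against `maxCon`. If the two agree, the choice of feature does not depend on the sign convention.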
Since there are six labeled activities, we will call the kmeans function with 6 centers on the sub1 matrix with the last two columns removed.
kClust <- kmeans(sub1[,-c(562,563)],centers = 6,nstart = 100)
Let’s take a look at how the activities are clustered around each center,
table(kClust$cluster,sub1$activity)
    laying sitting standing walk walkdown walkup
  1      0       0        0    0       49      0
  2     29       0        0    0        0      0
  3     18      10        2    0        0      0
  4      0       0        0   95        0      0
  5      3       0        0    0        0     53
  6      0      37       51    0        0      0
We see that with 100 random starts, the passive activities tend to cluster together. One of the clusters contains only laying, but in another cluster, standing and sitting group together. We will also check the dimensions of the centers,
dim(kClust$centers)
[1] 6 561
we see that the centers are a 6 by 561 array. Sometimes it is a good idea to look at the features of these centers to see if any dominate. We will do this for the laying activity, whose pure cluster, as seen in the table above, has size 29.
laying <- which(kClust$size == 29)
We will plot the first 12 features of the laying center,
plot(kClust$centers[laying,1:12],pch = 19,ylab = "Laying Center",col = "orange")
from the plot we see that the first three columns dominate the laying cluster, and we also need to find the names of these features
names(sub1[1:3])
[1] "tBodyAcc.mean...X" "tBodyAcc.mean...Y" "tBodyAcc.mean...Z"
so the three directions of mean body acceleration seem to have the biggest effect on laying. We will do the same for the walkdown activity,
walkdown <- which(kClust$size == 49)
we will now plot the first twelve columns of that center
plot(kClust$centers[walkdown,1:12],ylab = "Walkdown Center",pch = 19,col = "orange")
we see an interesting pattern: from left to right we are looking at the twelve acceleration measurements in groups of three, and within each group the points decrease in value. The X direction dominates, then Y, and finally Z, and this pattern repeats.
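The cluster-by-activity table from the k-means step can be condensed into a single agreement score, and it also gives a way to pick a cluster by its dominant label rather than by its size. The following is a minimal sketch on simulated data; the points, the three labels, and the `purity` measure are illustrative assumptions, not part of the original analysis.

```r
set.seed(4)
# Simulated stand-in: three well-separated groups in two dimensions.
label <- factor(rep(c("laying", "walk", "walkdown"), each = 30))
centers_true <- rbind(c(0, 0), c(5, 5), c(10, 0))
pts <- centers_true[as.integer(label), ] +
  matrix(rnorm(90 * 2, sd = 0.3), ncol = 2)

km  <- kmeans(pts, centers = 3, nstart = 20)
tab <- table(km$cluster, label)

# Purity: fraction of points that fall in their cluster's majority label.
purity <- sum(apply(tab, 1, max)) / sum(tab)

# Pick the cluster holding most "laying" points, instead of matching on size.
laying_cluster <- which.max(tab[, "laying"])
print(purity)
```

On the real data this would start from `tab <- table(kClust$cluster, sub1$activity)`. Selecting with `which.max(tab[, "laying"])` keeps working when cluster sizes change between runs or when two clusters happen to have the same size, which matching on `kClust$size == 29` does not guarantee.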