We will use LCFS, EHS, and Ofgem data to derive domestic archetypes for 2019/20.
# clear environment
rm(list = ls())
#set seed for reproducibility
set.seed(123)
# load data processing packages
library(foreign)
library(Hmisc)
library(dplyr)
library(openxlsx)
library(cluster)
library(tidyr)
library(ggplot2)
library(RColorBrewer)
library(hutils)
library(dineq)
library(factoextra)
library(vegan)
library(tidyverse)
library(Hmisc)
library(vtable)
library(randomForest)
library(pROC)
library(caret)
library(gbm)
library(packrat)
library(rsconnect)
The LCFS is an annual survey which collects a variety of information including:
rawhh: household-level data which includes data such as: i) employment information ii) ownership of household durables iii) most regular household payments
rawper: income person-level data such as: i) income from employment, benefits, and assets. These are also combined to household-level
diary person-level data such as various expenditure data
There are also derived databases which we use:
dvhh: household characteristics data
dvper: household member data
In this code we try to:
First, we need to import the data on R
dvhh <- read.spss("data/lcf/lcfs_2019_dvhh_urbanrural_ukanon.sav",
use.value.labels = TRUE,
to.data.frame = TRUE,
reencode = "utf-8")
## re-encoding from utf-8
## Warning in read.spss("data/lcf/lcfs_2019_dvhh_urbanrural_ukanon.sav",
## use.value.labels = TRUE, : Undeclared level(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
## 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
## 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
## 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
## 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
## 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
## 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
## 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
## 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
## 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
## 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187,
## 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
## 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219,
## 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235,
## 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251,
## 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267,
## 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283,
## 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
## 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315,
## 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331,
## 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347,
## 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,
## 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379,
## 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395,
## 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411,
## 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427,
## 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443,
## 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459,
## 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475,
## 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491,
## 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507,
## 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523,
## 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539,
## 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555,
## 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571,
## 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587,
## 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603,
## 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619,
## 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635,
## 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651,
## 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667,
## 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683,
## 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699,
## 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715,
## 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731,
## 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747,
## 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763,
## 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779,
## 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795,
## 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811,
## 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827,
## 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843,
## 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859,
## 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875,
## 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891,
## 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907,
## 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923,
## 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939,
## 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955,
## 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971,
## 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987,
## 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002,
## 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015,
## 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028,
## 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041,
## 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054,
## 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067,
## 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080,
## 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093,
## 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106,
## 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119,
## 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132,
## 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145,
## 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158,
## 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171,
## 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184,
## 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197,
## 1198, 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210,
## 1211, 1212, 1213, 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223,
## 1224, 1225, 1226, 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236,
## 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249,
## 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262,
## 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275,
## 1276, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288,
## 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301,
## 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314,
## 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327,
## 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340,
## 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353,
## 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366,
## 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379,
## 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392,
## 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405,
## 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418,
## 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431,
## 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444,
## 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457,
## 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470,
## 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483,
## 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496,
## 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509,
## 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522,
## 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535,
## 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1
## Warning in read.spss("data/lcf/lcfs_2019_dvhh_urbanrural_ukanon.sav",
## use.value.labels = TRUE, : Duplicated levels in factor A091: Lower professional
## & higher technical (t, Lower professional & higher technical (n
Lets turn all factors into characters to aid later manipulations
dvhh <- dvhh %>%
mutate_if(is.factor,as.character)
Create a single variable for connecting to the internet (A172), by using: - A172aP:Household has direct internet access (proxy calculation) - A172bP:Household has internet access via smart phone (proxy calculation) - A172cP:Household has internet access through smart phone only(proxy calculation) - A172P:Household has internet access(proxy calculation)
dvhh <- dvhh %>%
mutate(A172 = case_when(A172aP == "Internet connection in household" ~ 1,
A172bP == "Internet connection in household" ~ 1,
A172P == "Internet connection in household" ~ 1,
A172cP == "Internet connection in household" ~ 1,
TRUE ~ 2))
#and change formatting a bit.
dvhh <- dvhh %>%
mutate(A172 = as.character(A172))
Create a single variable for having access to a telephone (A101), by using: - A101aP: Household has landline telephone (proxy calculation) - A101bP: Household has mobile telephone (proxy calculation)
dvhh <- dvhh %>%
mutate(A101 = case_when(A101aP == 1 ~ 1,
A101bP == 1 ~ 1,
TRUE ~ 2))
Create a new variable for having a washing machine (A108) by using: - A167: Tumble dryer in Household
dvhh <- dvhh %>%
mutate(A108 = A167)
Essentially rename some variables, using: - B166b: Telephone account – single bill - B1661b: Mobile account – single bill
dvhh <- dvhh %>%
mutate(B166 = B166b,
B1661 = B1661b)
Lets make a small formatting change to aid later work.
dvhh <- dvhh %>%
mutate(case = as.numeric(case))
Now, we make notes for survey-year and a nuw unique id. This was done in original code, but could perhaps not be necessary.
For now, we will keep it.
dvhh <- dvhh %>%
mutate(survey.year = "201920") %>%
mutate(newuniqueid = paste(as.character(survey.year), as.character(case), sep = "-"))
Now there is some reweighting of the annual weight (weighta).
Again, this doesn’t look like it is necessary, but for now, we will keep it.
#First, we will add together all the annual weights for the 2019/20 survey.
sum.of.201920.weights <-
dvhh %>%
filter(survey.year == 201920) %>%
summarise(sum(weighta)) %>%
as.numeric()
#and all the weights from across surveys (whixh since only 2019/20 data, identical weight sums)
sum.of.all.weights <-
dvhh %>%
summarise(sum(weighta)) %>%
as.numeric()
#so the ratio equals 1
ratio <- sum.of.201920.weights/ sum.of.all.weights
#So now, we create this new set of weights (new_weight), which perfectly overlaps the original ones.
dvhh <- dvhh %>%
mutate(new_weight = weighta * ratio)
Now, it’s the time to extract all the variables of interest:
dvhh <- dvhh %>%
select(case,
newuniqueid,
new_weight,
weighta,
case,
OECDSc,
A040,
A041,
A049,
A062,
A071,
A101,
A103,
A108,
A116,
A121,
A122,
A124,
A128,
A129,
A130,
A138,
A149,
A150,
A151,
A152,
A153,
A154,
A155,
A156,
A160,
A1661,
A172,
B010,
B020,
B026p,
B028p,
B029p,
B030,
B038p,
B039p,
B041,
B042,
B040,
B050,
B017,
B018,
B130,
B134,
B150,
B166,
B1661,
B168,
B170,
B1701,
B174,
B1751,
B226,
B227,
B231,
B233,
B234,
B235,
B2331,
B2332,
B254,
B255,
B489,
B490,
B173,
B175,
B178,
B181,
B200,
B2011,
B203,
B204,
G018,
G019,
P204,
P205,
P206p,
P220p,
P220urnt,
P257,
P281p,
P537t,
C45112t,
C45114t,
C45212t,
C45214t,
C45222t,
C45312t,
C45411t,
C45412t,
C45511t,
C53211t,
C53311t,
C53312t,
C53313t,
C61111t,
C61112t,
C61211t,
C61311t,
C61312t,
C61313t,
C62111t,
C62112t,
C62113t,
C62114t,
C62211t,
C62212t,
C62311t,
C62321t,
C62322t,
C62331t,
C63111t,
CK1313t,
CK1314t,
CK1315t,
CK1316t,
CK1411t,
CK4111t,
A054,
A056,
A057,
A058,
A060,
A091,
A093,
A094,
P304,
P332,
P336,
P348,
P425,
SexHRP,
P116t,
P117t,
A131,
A1311,
A190,
A191,
A192,
A193,
A194,
Gorx,
P364p,
P365p,
Year,
A053,
P600t,
P601t,
P602t,
P603t,
P604t,
P605t,
P606t,
P607t,
P608t,
P609t,
P610t,
P611t,
P612t,
P620tp,
P630tp,
C55214w,
C73311w,
C73312w,
C91311w,
C96111w,
C96112w,
p344p,
incanon,
pc96anon,
p493p,
p492p,
a069p,
OAC3D,
OAC2D,
OAC1D,
P389p,
P352p,
EqIncDMp,
EqIncDOp,
P431p,
a117p,
a118p,
a143p,
a162p,
a065p,
a070p,
p396p,
P535CP,
P535TP,
P535p,
P536CP,
P536p,
P536tp,
URGridEWp,
URGridSCp
)
First, we need to import the data on R
rawhh <- read.spss("data/lcf/lcfs_2019_rawhh_ukanon.sav",
use.value.labels = TRUE,
to.data.frame = TRUE,
reencode = "utf-8")
## re-encoding from utf-8
## Warning in read.spss("data/lcf/lcfs_2019_rawhh_ukanon.sav", use.value.labels =
## TRUE, : Undeclared level(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
## 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
## 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
## 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
## 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
## 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
## 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,
## 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,
## 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,
## 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,
## 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,
## 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207,
## 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
## 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239,
## 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255,
## 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271,
## 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287,
## 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303,
## 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319,
## 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335,
## 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351,
## 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367,
## 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383,
## 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399,
## 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415,
## 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431,
## 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447,
## 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463,
## 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479,
## 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495,
## 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511,
## 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527,
## 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543,
## 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559,
## 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575,
## 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591,
## 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607,
## 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623,
## 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639,
## 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655,
## 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671,
## 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687,
## 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703,
## 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719,
## 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735,
## 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751,
## 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767,
## 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783,
## 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795, 796, 797, 798, 799,
## 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813, 814, 815,
## 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827, 828, 829, 830, 831,
## 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846, 847,
## 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863,
## 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879,
## 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895,
## 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907, 908, 909, 910, 911,
## 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923, 924, 925, 926, 927,
## 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941, 942, 943,
## 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959,
## 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975,
## 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991,
## 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005,
## 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018,
## 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031,
## 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044,
## 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057,
## 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070,
## 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083,
## 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096,
## 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109,
## 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122,
## 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135,
## 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145, 1146, 1147, 1148,
## 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158, 1159, 1160, 1161,
## 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171, 1172, 1173, 1174,
## 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1186, 1187,
## 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197, 1198, 1199, 1200,
## 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213,
## 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1226,
## 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236, 1237, 1238, 1239,
## 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249, 1250, 1251, 1252,
## 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262, 1263, 1264, 1265,
## 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275, 1276, 1277, 1278,
## 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288, 1289, 1290, 1291,
## 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301, 1302, 1303, 1304,
## 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314, 1315, 1316, 1317,
## 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330,
## 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340, 1341, 1342, 1343,
## 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356,
## 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366, 1367, 1368, 1369,
## 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382,
## 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395,
## 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406, 1407, 1408,
## 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418, 1419, 1420, 1421,
## 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432, 1433, 1434,
## 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 1447,
## 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457, 1458, 1459, 1460,
## 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 1473,
## 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486,
## 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499,
## 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509, 1510, 1511, 1512,
## 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522, 1523, 1524, 1525,
## 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1538,
## 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1
Lets turn all factors into characters to aid later manipulations
rawhh <- rawhh %>%
mutate_if(is.factor,as.character)
Lets make a small formatting change to make sure later working work well.
rawhh <- rawhh %>%
mutate(case = as.numeric(case))
For some reason, original code created a variable MenPol4, equaling NA.
We will keep this for now.
rawhh <- rawhh %>%
mutate(MenPol4 = NA)
Lets create a variable for whether one has a TV package (TV), by using: - TVAmt1-5: ‘How much was your last payment for this TV service?’. Each number signifies a different package, in case some households have multiple (e.g. Netflix, Disney, etc).
This variable will probably not do much, as there are only 2 occurrences where it is ‘Yes’!
rawhh <- rawhh %>%
mutate(TV = case_when(!is.na(TVAmt) & !is.na(TVAmt2) & !is.na(TVAmt3) &
!is.na(TVAmt4) & !is.na(TVAmt5) ~ "Yes",
is.na(TVAmt) & is.na(TVAmt2) & is.na(TVAmt3) &
is.na(TVAmt4) & is.na(TVAmt5) ~ "No",
TRUE ~ NA_character_))
They now recreate various variables for some reason. For now, lets do the same.
The variables they used are: - ComTVTyp5: TV services included in package - O.TelAmt: Telephone payment before abatement/imputation - DVMEAC: DV for Elecmeter - O_EAcAmt: Electricity amount (ElecAmt) before abatement/imputation - GasAmt: How much did you pay last time for your gas? - GasPer: How often do you pay for your gas? - GERebtAmt: How much was the last rebate you received?
rawhh <- rawhh %>%
mutate( TVType = ComTVTyp5,
Telephon = case_when(!is.na(O_TelAmt) ~ "Yes",
TRUE ~ "No"),
DVMeac = DVMEAC,
EAcAmt = O_EAcAmt,
GAcAmt = GasAmt,
GAcPer = GasPer,
GERbtAmt = GERebtAmt)
Now, we make notes for survey-year and a nuw unique id. This was done in original code, but could perhaps not be necessary.
For now, we will keep it.
rawhh <- rawhh %>%
mutate(survey.year = "201920") %>%
mutate(newuniqueid = paste(as.character(survey.year), as.character(case), sep = "-"))
Now, it’s the time to extract all the variables of interest:
rawhh <- rawhh %>%
select(case,
newuniqueid,
Year,
Accom,
AccOth,
BGasAmt,
BGasBuy,
CAPVALp,
case,
CElecAbmt,
CElecAmt,
CentH,
CGasAbmt,
CGasAmt,
CHFuel,
CHI_LHP,
CHIAmt,
CHIAmt2,
#CHIBAmt,
#CHIBAmt2,
CHIBus,
CHIBus2,
DVPriRnt,
LastPay,
TopUp,
MortLeft,
MortType,
MenPol1,
MenPol2,
MenPol3,
MenPol4,
MortFlex,
O_MortPay,
MortPay,
MortPerc,
MortPayQ,
MortPercQ,
Morusu,
Morusamt,
DVMORT,
CHIDVAM,
CTBand,
CTdisab,
TV,
TVType,
#Freezer,
#WashMach,
Drier,
DishWash,
#MicroWve,
Telephon,
#Mobile,
#DVD,
Computer1,
Computer2,
Computer3,
Computer4,
#Inter,
#Access1,
#Access2,
#Access3,
#Access4,
#Access5,
#Intro,
CHIDVAM2,
#CHIDVBX,
#CHIDVBX2,
CHIFilt,
CHIHow,
CHIHow2,
CHIWhe,
CHM_LHP,
CHMAmt,
CHMAmt2,
CHMBAmt,
CHMBAmt2,
CHMBus,
CHMBus2,
CHMDVAM,
CHMDVAM2,
CHMDVBX,
CHMDVBX2,
CHMFilt,
CHMHow,
CHMHow2,
CHMWhe,
DVbedp,
DVBGas,
DVCElecAmt,
DVCGasAmt,
DVCsize,
DVEAC,
DVERB,
DVGAC,
DVGE,
DVGERB,
DVGRB,
DVHsize,
DVMeac,
DVMGAC,
DVMGE,
DVMGEE,
DVMGEG,
DVnumBU,
DVOIL,
DVrmsp,
DwellNo,
EAcAbmt,
EAcAmt,
#EAcPer,
ElecDays,
ElecMeter,
Elecnum,
ElecPay,
ElecPeriod,
ElecRebt,
ERbtAmt,
ERbtPer,
FltTyp,
GAcAbmt,
GAcAmt,
GAcPer,
GasDays,
GasMeter,
Gasnum,
GasPay,
GasPeriod,
Petrol,
Petrol2,
Petrol3,
GasRebt,
GEAbmt,
GEAmt,
GECom,
GEDays,
GEDoc,
GEMeter,
GEPay,
GEPer,
GEPeriod,
GERbtAmt,
GERbtPer,
GERebt,
GESup,
GFFMU,
GRbtAmt,
GRbtPer,
HPnum,
HseType,
Mortnum,
NimpDry,
NumAdult,
NumCh18,
NumChild,
NumDepCh,
NumHHldr,
O_CElecAmt,
O_CGasAmt,
O_EAcAmt,
O_GAcAmt,
BankFilt,
StOrFilt,
O_GEAmt,
OilAmt,
OilBuy,
OthFunum,
Ratesnum,
Rentnum,
SampQtr,
Nrms1p,
SerElec,
SurvYr,
FinArr1,
FinArr2,
FinArr3,
FinArr4,
HBen,
GETelDet3,
GETelDet4,
GETelDet5,
GETelDet6,
ETelDet3,
ETelDet4,
CombFilt1,
CombFilt2,
CombFilt3,
CombFilt4,
ComServ1,
ComServ2,
ComServ3,
ComServ4,
ComServ5,
ComServ6,
ComServ9,
TelFilt1, #THIS HAS BEST 3474 people = Contract mobile phone and or data plan; 824 = Pay-as-you-go or top-up mobile phone a
TelFilt2,
TelFilt3,
TelServ1,
TelServ2,
TelServ5,
TelServ6,
TelServ9,
TelServ10,
TelServ13,
TelServ17,
TelServ21,
TelServ25,
TelServ29
)
hh <- inner_join(dvhh, rawhh, by = "newuniqueid", suffix = c(".dv", ".raw"))
First, we need to import the data on R
rawper <- read.spss("data/lcf/lcfs_2019_rawper_ukanon_final.sav",
use.value.labels = TRUE,
to.data.frame = TRUE,
reencode = "utf-8")
## re-encoding from utf-8
## Warning in read.spss("data/lcf/lcfs_2019_rawper_ukanon_final.sav",
## use.value.labels = TRUE, : Undeclared level(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
## 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
## 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
## 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
## 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
## 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
## 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
## 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
## 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
## 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
## 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187,
## 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
## 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219,
## 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235,
## 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251,
## 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267,
## 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283,
## 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
## 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315,
## 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331,
## 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347,
## 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,
## 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379,
## 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395,
## 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411,
## 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427,
## 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443,
## 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459,
## 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475,
## 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491,
## 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507,
## 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523,
## 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539,
## 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555,
## 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571,
## 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587,
## 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603,
## 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619,
## 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635,
## 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651,
## 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667,
## 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683,
## 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699,
## 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715,
## 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731,
## 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747,
## 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763,
## 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779,
## 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795,
## 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811,
## 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827,
## 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843,
## 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859,
## 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875,
## 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891,
## 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907,
## 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923,
## 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939,
## 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955,
## 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971,
## 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987,
## 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002,
## 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015,
## 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028,
## 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041,
## 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054,
## 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067,
## 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080,
## 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093,
## 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106,
## 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119,
## 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132,
## 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145,
## 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158,
## 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171,
## 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184,
## 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197,
## 1198, 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210,
## 1211, 1212, 1213, 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223,
## 1224, 1225, 1226, 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236,
## 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249,
## 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262,
## 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275,
## 1276, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288,
## 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301,
## 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314,
## 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327,
## 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340,
## 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353,
## 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366,
## 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379,
## 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392,
## 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405,
## 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418,
## 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431,
## 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444,
## 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457,
## 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470,
## 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483,
## 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496,
## 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509,
## 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522,
## 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535,
## 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1
Lets turn all factors into characters to aid later manipulations
rawper <- rawper %>%
mutate_if(is.factor,as.character)
Lets make a small formatting change to make sure later working work well.
rawper <- rawper %>%
mutate(case = as.numeric(case))
Now, we make notes for survey-year and a nuw unique id. This was done in original code, but could perhaps not be necessary.
For now, we will keep it.
rawper <- rawper %>%
mutate(survey.year = "201920") %>%
mutate(newuniqueid = paste(as.character(survey.year), as.character(case), sep = "-"))
Now, it’s the time to extract all the variables of interest:
rawper <- rawper %>%
select(case,
newuniqueid,
Person,
HRPID,
HRPPrID,
QHealth1,
HealIll,
RedAct,
Durredact,
HighEd1,
Satis,
Worth,
Happy,
Anxious,
dvage_p,
DVWTC,
DVCB,
DVUC,
PComp1,
PComp2,
DVIS)
First, we need to import the data on R
dvper <- read.spss("data/lcf/lcfs_2019_dvper_ukanon201920.sav",
use.value.labels = TRUE,
to.data.frame = TRUE,
reencode = "utf-8")
## re-encoding from utf-8
## Warning in read.spss("data/lcf/lcfs_2019_dvper_ukanon201920.sav",
## use.value.labels = TRUE, : Undeclared level(s) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
## 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
## 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
## 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
## 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
## 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
## 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123,
## 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139,
## 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
## 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
## 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187,
## 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
## 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219,
## 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235,
## 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251,
## 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267,
## 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283,
## 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299,
## 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315,
## 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331,
## 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347,
## 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363,
## 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379,
## 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395,
## 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411,
## 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427,
## 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443,
## 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459,
## 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475,
## 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491,
## 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507,
## 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523,
## 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539,
## 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555,
## 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571,
## 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587,
## 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603,
## 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619,
## 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635,
## 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651,
## 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667,
## 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683,
## 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699,
## 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715,
## 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731,
## 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747,
## 748, 749, 750, 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763,
## 764, 765, 766, 767, 768, 769, 770, 771, 772, 773, 774, 775, 776, 777, 778, 779,
## 780, 781, 782, 783, 784, 785, 786, 787, 788, 789, 790, 791, 792, 793, 794, 795,
## 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808, 809, 810, 811,
## 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827,
## 828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843,
## 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859,
## 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875,
## 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891,
## 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903, 904, 905, 906, 907,
## 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922, 923,
## 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939,
## 940, 941, 942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955,
## 956, 957, 958, 959, 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971,
## 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987,
## 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002,
## 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015,
## 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028,
## 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041,
## 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054,
## 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067,
## 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080,
## 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093,
## 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106,
## 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119,
## 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132,
## 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 1142, 1143, 1144, 1145,
## 1146, 1147, 1148, 1149, 1150, 1151, 1152, 1153, 1154, 1155, 1156, 1157, 1158,
## 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1168, 1169, 1170, 1171,
## 1172, 1173, 1174, 1175, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1183, 1184,
## 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1192, 1193, 1194, 1195, 1196, 1197,
## 1198, 1199, 1200, 1201, 1202, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210,
## 1211, 1212, 1213, 1214, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223,
## 1224, 1225, 1226, 1227, 1228, 1229, 1230, 1231, 1232, 1233, 1234, 1235, 1236,
## 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1247, 1248, 1249,
## 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1261, 1262,
## 1263, 1264, 1265, 1266, 1267, 1268, 1269, 1270, 1271, 1272, 1273, 1274, 1275,
## 1276, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, 1287, 1288,
## 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1300, 1301,
## 1302, 1303, 1304, 1305, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1313, 1314,
## 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327,
## 1328, 1329, 1330, 1331, 1332, 1333, 1334, 1335, 1336, 1337, 1338, 1339, 1340,
## 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, 1349, 1350, 1351, 1352, 1353,
## 1354, 1355, 1356, 1357, 1358, 1359, 1360, 1361, 1362, 1363, 1364, 1365, 1366,
## 1367, 1368, 1369, 1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379,
## 1380, 1381, 1382, 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392,
## 1393, 1394, 1395, 1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405,
## 1406, 1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417, 1418,
## 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1426, 1427, 1428, 1429, 1430, 1431,
## 1432, 1433, 1434, 1435, 1436, 1437, 1438, 1439, 1440, 1441, 1442, 1443, 1444,
## 1445, 1446, 1447, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1457,
## 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470,
## 1471, 1472, 1473, 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1483,
## 1484, 1485, 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496,
## 1497, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507, 1508, 1509,
## 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, 1520, 1521, 1522,
## 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535,
## 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1544, 1545, 1546, 1
## Warning in read.spss("data/lcf/lcfs_2019_dvper_ukanon201920.sav",
## use.value.labels = TRUE, : Undeclared level(s) 0, 1, 8, 10, 11, 12, 13, 14, 15,
## 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
## 36, 37, 39, 42, 43, 44, 47, 56 added in variable: A010
Lets turn all factors into characters to aid later manipulations
dvper <- dvper %>%
mutate_if(is.factor,as.character)
Lets make a small formatting change to make sure later working work well.
dvper <- dvper %>%
mutate(case = as.numeric(case))
Now, we make notes for survey-year and a new unique id. This was done in original code, but could perhaps not be necessary.
For now, we will keep it.
dvper <- dvper %>%
mutate(survey.year = "201920") %>%
mutate(newuniqueid = paste(as.character(survey.year), as.character(case), sep = "-"))
Now, lets select all the variables of interest.
dvper <- dvper %>%
select(case,
newuniqueid,
#survey.number,
survey.year,
`Relationship to Person 1` = A002,
`Household Reference Person` = A003,
`Partner of Household Reference Person` = A0031,
`Sex` = A004,
`Age - anonymised` = a005p,
`Marital status - anonymised` = a006p,
`Ethnic origin of Household Reference Person - anonymised` = a012p,
`Ethnic origin of Partner of Household Reference Person - anonymised` = a013p,
`Incapacity benefit - length of time received` = A2261,
`Incapacity benefit - number weeks received (up to 2 years)` = A2262,
`Incapacity benefit - receiving at present` = A227,
`Income supp - length of time received 1` = A2281,
`Income supp - length of time received 2` = A2282,
`Income support - receiving at present 1` = A229,
`Income support - amount received 1` = B365,
`Income support amount received 2` = P025,
`Statutory sick pay - receiving at present 1` = A234,
`Statutory sick pay - receiving at present 2` = A279,
`Maternity allowance - length of time received` = A2391,
`Maternity allowance - number weeks received (up to 2 years)` = A2392,
`Maternity allowance - receiving at present` = A240,
`Pension credit- receiving at present ` = A241 ,
`JSA (contribution based) - receiving at present` = A301,
`JSA (contribution based) - length of time received` = A3001,
`JSA (contribution based) - no. of weeks received` = A3002,
`JSA (income based) - length of time received` = A3021,
`JSA (income based) - no. of weeks received` = A3022,
`JSA (contribution and income based) - length of time received` = A3025,
`JSA (contribution and income based) - no. of weeks received` = A3026,
`JSA (income based) - receiving at present` = A303,
`JSA (contribution and income based) - receiving at present` = A3035,
`JSA (contribution based) - last amount received` = B510,
`JSA (income based) - amount received.` = B511,
`JSA (contribution and income based) - amount received.` = B512,
`Job Seekers Allowance - current` = P024,
`ESA (contribution based) - length of time received` = A3041,
`ESA (contribution based) - no. of weeks received` = A3042,
`ESA (contribution based) - receiving at present` = A305,
`ESA (income based) - length of time received` = A3061,
`ESA (income based) - no. of weeks received` = A3062,
`ESA (contribution and income based) - length of time received` = A3065,
`ESA (contribution and income based) - no. of weeks received` = A3066,
`ESA (income based) - receiving at present` = A307,
`ESA (contribution and income based) - receiving at present` = A3075,
`ESA (contribution based) - last amount received` = B513,
`ESA (income based) - last amount received` = B514,
`ESA (contribution and income based) - last amount received` = B515,
`Employment and Support Allowance amount received` = P027,
`Universal Credit - length of time received` = A3081,
`Universal Credit - number weeks received (up to 2 years)` = A3082,
`Invalid care allowance - amount received` = B343,
`Pension credit amt received` = B3651,
`Incapacity benefit - amount last received` = B366,
`Incapacity benefit - amount received at present` = P026,
`WTC - receiving at present` = A257,
`WTC - last amount received as a benefit, excludes amounts where CTC and WTC are paid in combination` = B368,
`CTC - last amount received as a benefit, excludes amounts where CTC and WTC are paid in combination` = B369,
`Tax credits received as benefits, includes where CTC/WTC are paid in combination` = B370 ,
`Disability living allowance (self-care)` = B403,
`Disability living allowance (mobility)` = B405,
`PIP (Care component)` = B552,
`PIP (Mobility component)` = B553,
`Attendance allowance - last amount received` = B421,
`Universal Credit - Last amount received` = B550,
`War disability | widows pension - amount received` = B340,
`Indst injury disablement benefit - current` = P016,
`Severe disablement allowance - last amount received` = B418,
`Social security benefits concurrent with earnings` = P028,
`NI contributions paid by non - employees - anonymised` = P029p,
`Social security benefits excld income by 13 week rule` = P030,
`Social security benefits included in income calcs` = P031,
`Social security retirement, old age, widows pensions` = P033
)
First, for some reason, NAs are replaced with zeros for the variable: - DVMORT: DV for MortPay
core.demographics <- hh %>%
mutate(DVMORT = ifelse(is.na(DVMORT), 0, DVMORT))
Then, we create a single variable for the number of children <5 years old (no.chld.below.5), by using: - A040: Number of children children-age under 2 - A041: Number of children children-age 2 but un
NOTE: definition of A041 unclear so it needs to be double-checked.
core.demographics <- core.demographics %>%
mutate(no.chld.below.5 = A040 + A041)
Then, we select the variables of interest
Notice they were all renamed for some reason! For now, we will keep it like that.
core.demographics <- core.demographics %>%
select(newuniqueid,
#case.dv,
SurvYr,
new_weight,
has.gas.elec = A103,
hhldsize = A049,
hhldcomp = A062,
num.adults = G018,
num.children = G019,
num.hhlders = NumHHldr,
hrp.age.rank = a065p,
age.oldest = a070p,
hrp.age.anon = p396p,
SexHRP,
no.chld.below.5 = no.chld.below.5,
#ethnicity, - need from person level data
#marital status - need from person level data
#has.computer = A1661,
#internet = A172,
weekly.rent = P257,
weekly.mortgage = DVMORT,
hhldtype = a069p,
wealthly.hhld = p493p,
hrp.economic.status = A093,
total.expn.anon = P630tp,
hhld.inc.gross.week = p344p,
hhld.inc.net.week = P389p,
hhld.inc.gross.week.current = P352p,
hhld.inc.equiv.mccl = EqIncDMp,
hhld.inc.equiv.oecd = EqIncDOp,
Gasnum,
Elecnum,
Ratesnum,
HPnum,
FinArr1,
FinArr2,
FinArr3,
FinArr4,
OAC1D
)
Firstly, ethnicity.
We will create a variable which copies the following one (hrp.ethnicity): - Ethnic origin of Household Reference Person - anonymised (from a012)
and another one which only splits between White and BAME people (hrp.ethnicity.2x).
As a note - 93% of the sample are ‘White’.
ethnicity <- dvper %>%
filter(`Household Reference Person` == "HRP") %>% #we determine ethnicity based on household reference person
select(newuniqueid,
hrp.ethnicity = `Ethnic origin of Household Reference Person - anonymised`)
#and now we create hrp.ethnicity.3x
ethnicity <- ethnicity %>%
mutate(hrp.ethnicity.2x = case_when(hrp.ethnicity != "White" ~ "BAME",
TRUE ~ "White") )
Next, marital status.
Marital status is assigned based on the household reference person.
We will create a variable which copies the following one (hrp.marital.status): - ‘Marital status - anonymised’ (from a006p)
and another one which (hrp.marital.status.simple), which splits just between: - Married/civil partner/cohabiting - Divorced/separated - Single - Widowed
marital <- dvper %>%
filter(`Household Reference Person` == "HRP") %>%
select(newuniqueid,
hrp.marital.status = `Marital status - anonymised`)
#and now we create hrp.marital.status.simple
marital <- marital %>%
mutate(hrp.marital.status.simple = case_when(hrp.marital.status == "Civil Partner or Former Civil Partner" ~ "Married/civil partner/cohabiting",
hrp.marital.status == "Cohabitee" ~ "Married/civil partner/cohabiting",
hrp.marital.status == "Married and living with your husband/wife" ~ "Married/civil partner/cohabiting",
hrp.marital.status == "Divorced" ~ "Divorced/separated",
hrp.marital.status == "Separated" ~ "Divorced/separated",
hrp.marital.status == "Single" ~ "Single",
hrp.marital.status == "Widowed" ~ "Widowed"))
Here, we collect the education data of interest.
First, we create (highest.qualification.hrp), by just changing a bit the variable: - HighEd1: Highest qualification level
education <- rawper %>%
filter(HRPID == "HRP") %>%
transmute(newuniqueid,
highest.qualification.hrp = ifelse(!is.na(HighEd1), HighEd1, "Missing"))
and now we create a simpler variable (degree.or.higher.hrp), where: - Degree or equivalent = yes - ONC National Level BTEC, A Levels, GCSE etc = no - Missing = missing
education <- education %>%
mutate(degree.or.higher.hrp = case_when(highest.qualification.hrp == "Degree or equivalent" ~ "yes",
highest.qualification.hrp != "Missing" ~ "no",
TRUE ~ "missing"))
Now, for some reason the main demographics df is created, while excluding the education df.
So, as a start, it mergers together the core.demographics, ethnicity, and marital dfs:
demographics <- Reduce(function(x, y) left_join(x = x, y = y, by = "newuniqueid"),
list(core.demographics,
#education,
ethnicity,
marital
))
Then, we create a variable with age bands (hrp.age.banded):
demographics <- demographics %>%
mutate(hrp.age.banded = case_when(hrp.age.anon < 25 ~ "16-24",
hrp.age.anon < 35 ~ "25-34",
hrp.age.anon < 45 ~ "35-44",
hrp.age.anon < 55 ~ "45-54",
hrp.age.anon < 65 ~ "55-64",
hrp.age.anon < 75 ~ "65-74",
hrp.age.anon >= 75 ~ "75+"))
Then we create some more simplified version of the hrp.economic.status.simple variables:
hrp.economic.status.simple: -This links to variable results which dont exist in the original data frame, so identical to hrp.economc.status
demographics <- demographics %>%
mutate(hrp.economic.status.simple = case_when(hrp.economic.status == "Retired/unoccupied and of minimum NI Pension age" ~ "Retired",
hrp.economic.status == "Retired/unoccupied but under minimum NI Pension age" ~ "Retired",
hrp.economic.status == "Work related Government Training Programmes" ~ "Gvmnt training programme",
#hrp.economic.status == "" ~ "",
#hrp.economic.status == "" ~ "",
TRUE ~ hrp.economic.status))
hrp.economic.status.2: -again, the namign conventions are misspecified leading to matching not happening fully.
demographics <- demographics %>%
mutate(hrp.economic.status.2 = case_when(hrp.economic.status == "Ret unoc over min ni age" ~ "Retired",
hrp.economic.status == "Unoc - under min ni age" ~ "Retired",
hrp.economic.status == "Work related govt train prog" ~ "Education/training",
hrp.economic.status == "Fulltime employee" ~ "Full-time employee",
hrp.economic.status == "Pt employee " ~ "Part-time or self employed",
hrp.economic.status == "Self-employed" ~ "Part-time or self employed",
#hrp.economic.status == "" ~ "",
#hrp.economic.status == "" ~ "",
TRUE ~ hrp.economic.status))
hrp.emp.5x: -again, the naming conventions are misspecified leading to matching not happening fully.
demographics <- demographics %>%
mutate(hrp.emp.5x = case_when(hrp.economic.status == "Ret unoc over min ni age" ~ "retired",
hrp.economic.status == "Unoc - under min ni age" ~ "other inactive",
hrp.economic.status == "Work related govt train prog" ~ "full-time education",
hrp.economic.status == "Fulltime employee" ~ "employed (full-time/part-time/self)",
hrp.economic.status == "Pt employee " ~ "employed (full-time/part-time/self)",
hrp.economic.status == "Self-employed" ~ "employed (full-time/part-time/self)",
hrp.economic.status == "Unemployed" ~ "unemployed",
#hrp.economic.status == "" ~ "",
TRUE ~ hrp.economic.status))
Then, we create a variable to highlight the number of children (children), using: - num.children: originally was variable G019: Number of children - hhldcomp: originally was variable A062: Composition of Household
demographics <- demographics %>%
mutate(children = case_when(num.children == 0 ~ "0",
num.children == 1 ~ "1",
num.children >= 2 ~ "2+")) %>%
#here we get some extra data from hhldcomp, as sometimes this includes info on children number.
#the command 'grepl' is a text-identification command
mutate(children = case_when(is.na(children) & grepl("2 children", hhldcomp) ~ "2+",
is.na(children) & grepl("2 or more children", hhldcomp) ~ "2+",
is.na(children) & grepl("1 or more children", hhldcomp) ~ "2+",
TRUE ~ children))
Then, we simplify the household composition (hhldcomp.8x), using: - hhldcomp: originally was variable A062: Composition of Household
This in practice only leads to: i) Others’, ii) single parent (woman), iii) couple, iv) couple and children
demographics <- demographics %>%
mutate(hhldcomp.8x =
case_when(hhldcomp == "One man" ~ "single adult (man)",
hhldcomp == "One man; one child" ~ "single parent (man)",
hhldcomp == "One man and 1 woman" ~ "couple",
hhldcomp == "One man; 2 plus c" ~ "single parent (man)",
hhldcomp == "1 man; 1 woman & 1c" ~ "couple and children",
hhldcomp == "1 man; 1 woman; 2c" ~ "couple and children",
hhldcomp == "1 man; 1 woman; 3c" ~ "couple and children",
hhldcomp == "One woman" ~ "single adult (woman)",
hhldcomp == "One woman; one child" ~ "single parent (woman)",
hhldcomp == "One woman; 2 plus c" ~ "single parent (woman)",
hhldcomp == "Two adults; 4c" ~ "couple and children",
hhldcomp == "Two adults; 5c" ~ "couple and children",
hhldcomp == "Two adults; 6+c" ~ "couple and children",
hhldcomp == "Two men or two women" ~ "couple",
hhldcomp == "2 men or 2 women; 1c" ~ "couple and children",
hhldcomp == "2 men or 2 women; 2c" ~ "couple and children",
hhldcomp == "2 men or 2 women; 3c" ~ "couple and children",
hhldcomp == "Three adults" ~ "multiple adults",
hhldcomp == "Three adults; 1c" ~ "multiple adults with children",
hhldcomp == "Three adults; 2c" ~ "multiple adults with children",
hhldcomp == "Three adults; 3c" ~ "multiple adults with children",
hhldcomp == "Three adults; 4+c" ~ "multiple adults with children",
hhldcomp == "Four adults" ~ "multiple adults",
hhldcomp == "Four adults; 1c" ~ "multiple adults with children",
hhldcomp == "Four adults; 2+c" ~ "multiple adults with children",
hhldcomp == "Five adults" ~ "multiple adults",
hhldcomp == "Five adults; 1+c" ~ "multiple adults with children",
hhldcomp == "Six or more adults" ~ "multiple adults",
hhldcomp == "Other hholds with c" ~ "multiple adults with children",
TRUE ~ "Other"))
Now we calculate the deciles for a variety of variables, being: - Income.deciles: from hhld.inc.net.week, originally P389p = Normal weekly disposable household income - anonymised - Income.deciles.gross: from hhld.inc.gross.week, originally p344p = Gross normal weekly household income - anonymised - Income.deciles.equiv.oecd: from hhld.inc.equiv.oecd, originally EqIncDOp = Equivalised income (OECD Scale) - anonymised - Expenditure.deciles: from total.expn.anon, originally P630tp = Fam Spend: Total Expenditure (anonymised)
Notice that the deciles are derived by using new_weight, which originally was weighta = annual weight. Still unsure how weighta is derived.
demographics <- demographics %>%
mutate(income.deciles = weighted_ntile(hhld.inc.net.week, weights = new_weight, 10),
income.deciles.gross = weighted_ntile(hhld.inc.gross.week, weights = new_weight, 10),
income.deciles.equiv.oecd = weighted_ntile(hhld.inc.equiv.oecd, weights = new_weight, 10),
expenditure.deciles = weighted_ntile(total.expn.anon, weights = new_weight, 10))
We do the same exercise for quintiles:
demographics <- demographics %>%
mutate(income.quintiles = weighted_ntile(hhld.inc.net.week, weights = new_weight, 5),
income.quintiles.equiv.oecd = weighted_ntile(hhld.inc.equiv.oecd, weights = new_weight, 5),
income.quintiles.gross = weighted_ntile(hhld.inc.gross.week, weights = new_weight, 5),
expenditure.quintiles = weighted_ntile(total.expn.anon, weights = new_weight, 5)
)
Now, it is time to start deriving equivalised income.
To do this, we first collect the equivalisation factors, by importing the data from a csv file:
equiv.factor.lup <- read.csv("data/income-equivalisation-factors.csv")
#and selecting the relevant data:
equiv.factor.lup <- equiv.factor.lup %>%
select(num.adults,
num.children,
equiv.factor)
Now, lets merge the equivalisation factors into the demographics data frame:
demographics <- left_join(demographics, equiv.factor.lup, by = c("num.adults", "num.children"))
Ok, now it is time to create some more new variables again.
First, lets create housing.costs, using:
demographics <- demographics %>%
mutate(housing.costs = weekly.rent + weekly.mortgage)
Now, lets create a variable for equivalised net income before housing costs (hhlds.net.bhc.income.equiv), using: - hhld.inc.net.week: originally P389p: Normal weekly disposable household income - anonymised
demographics <- demographics %>%
mutate(hhld.net.bhc.income.equiv = hhld.inc.net.week / equiv.factor)
And now lets create a variable for equivalised net income AFTER housing costs (hhlds.net.ahc.income.equiv):
demographics <- demographics %>%
mutate(hhld.net.ahc.income.equiv = (hhld.inc.net.week - housing.costs) / equiv.factor)
Now, lets do some work incorporating some poverty sort of categorisations.
We define the poverty line as being 60% below the median (does this make sense??).
Lets derive the poverty threshold before housing costs (bhc.pov.threshold):
bhc.pov.threshold = demographics %>%
summarise(median.income = wtd.quantile(hhld.net.bhc.income.equiv, new_weight, probs = 0.5) * 0.6) %>%
as.numeric()
And create a variable to identify those households which are below the poverty line before housing costs:
demographics <- demographics %>%
mutate(hhld.below.pov.line.bhc = case_when(hhld.net.bhc.income.equiv < bhc.pov.threshold ~ "Income below 60% median (BHC)",
TRUE ~ "Not in poverty"))
Now, lets derive the poverty threshold after housing costs (ahc.pov.threshold):
ahc.pov.threshold = demographics %>%
summarise(median.income = wtd.quantile(hhld.net.ahc.income.equiv, new_weight, probs = 0.5) * 0.6) %>%
as.numeric()
And, again identify those households below the poverty line (hhld.below.pov.line.ahc):
demographics <- demographics %>%
mutate(hhld.below.pov.line.ahc = case_when(hhld.net.ahc.income.equiv < bhc.pov.threshold ~ "Income below 60% median (AHC)",
TRUE ~ "Not in poverty"))
Lets create even more new variables… Now, it is annual rent (rent.annual), using: - weekly.rent: from P257 = Rent - gross (inc rates if not paid sep)
demographics <- demographics %>%
mutate(rent.annual = weekly.rent * 365 / 7)
Lets also create annual mortgage. For some reason, NAs are replaced with zeros which is a bit risky. Will keep for now:
demographics <- demographics %>%
mutate(mortgage.annual = ifelse(is.na(weekly.mortgage), 0, weekly.mortgage * 365 / 7))
Lets now derive the ANNUAL after housing costs net income (annual.ahc.net.hhld.income), by using: - hhld.inc.net.week: P389p: Normal weekly disposable household income - anonymised
demographics <- demographics %>%
mutate(annual.ahc.net.hhld.income = (hhld.inc.net.week * 365 / 7) - rent.annual - mortgage.annual)
Now for some reason the original code recalculated ‘income.deciles.equiv.oecd’. This actually leads to a SLIGHLY different result, so kept in. Could perhaps drop in future as it led to different outputs only in 2 cases out of 5,438.
demographics <- demographics %>%
mutate(income.deciles.equiv.oecd = ntiles.wtd(hhld.inc.equiv.oecd, weights = new_weight, 10))
For some reason, the ‘new_weight’ variable is also dropped at this stage:
demographics <- demographics %>%
select(-new_weight)
Finally, a new variable is created (arrears.on.fuel.bills), to capture those who were ever in arrears for gas, electricity, oil, or coal. This uses: - Gasnum: Gas bill arrears in last 12 months - Elecnum: Electricity bill arrears in last 12 months - FinArr1-4: “Has household been in arrears in last 12 months on”
demographics <- demographics %>%
mutate(arrears.on.fuel.bills = case_when(Gasnum == "Once only" | Gasnum == "Twice or more" ~ "In arrears on gas/elec",
Elecnum == "Once only" | Elecnum == "Twice or more" ~ "In arrears on gas/elec",
FinArr1 == "Other fuel bills like oil or coal?" ~ "In arrears on gas/elec",
FinArr2 == "Other fuel bills like oil or coal?" ~ "In arrears on gas/elec",
FinArr3 == "Other fuel bills like oil or coal?" ~ "In arrears on gas/elec",
FinArr4 == "Other fuel bills like oil or coal?" ~ "In arrears on gas/elec",
!is.na(FinArr1) ~ "Not in arrears"))
Lets select from the hh data the variables of interest:
housing <- hh %>%
select(newuniqueid,
hrp.length.residence.years = A131,
hrp.length.residence.months = A1311,
dwelling.type = A116,
HseType,
Accom,
AccOth,
FltTyp,
DwellNo,
DVrmsp,
DVbedp,
tenure.dv = A121,
tenure.harm = A122,
P535CP,
P535TP,
P535p,
P536CP,
P536p,
P536tp,
URGridSCp,
URGridEWp
)
We then replace the ‘others’ values from a variable for dwelling type (dwelltype), using: - HseType: Is it (the house or bungalow): detached; semi-detached; or terraced/ end of terrace? - FltTyp: Is it (the flat/maisonette): a purpose-built block; or a converted house/some other kind of building? - AccOth: Is the accommodation: a caravan; mobile home; homeboat; or some other kind of accommodation? - dwelltype: from A116: Category of dwelling
housing <- housing %>%
mutate(dwelling.type = case_when(dwelling.type == "Others" & HseType == "Detatched" ~ "Whole house,bungalow-detached",
dwelling.type == "Others" & HseType == "Semi-Detatched" ~ "Whole hse,bungalow-semi-dtchd",
dwelling.type == "Others" & HseType == "Terraced/end of terrace" ~ "Whole house,bungalow-terraced",
dwelling.type == "Others" & FltTyp == "A converted house/some other kind of building" ~ "Part of house converted flat",
dwelling.type == "Others" & FltTyp == "A purpose built block" ~ "Purpose-built flat maisonette",
dwelling.type == "Others" & AccOth == "Caravan, mobile home or houseboat" ~ AccOth,
dwelling.type == "Others" ~ "Other accomodation",
TRUE ~ dwelling.type)) %>%
mutate(dwelltype = case_when(dwelling.type == "Whole house,bungalow-detached" ~ "detached",
dwelling.type == "Whole hse,bungalow-semi-dtchd" ~ "semi-detached",
dwelling.type == "Whole house,bungalow-terraced" ~ "terraced",
dwelling.type == "Part of house converted flat" ~ "converted-flat",
dwelling.type == "Purpose-built flat maisonette" ~ "purposebuilt-flat",
dwelling.type == "Caravan, mobile home or houseboat" ~ "caravan-boat",
dwelling.type == "Other accomodation" ~ "other"))
#We then drop the original dwelling.type variable
housing <- housing %>%
select(-dwelling.type)
Then we create a variable in which we rename the types of tenure (tenure), using: - tenure.harm: from A122: Tenure type - harmonised
housing <- housing %>%
mutate(tenure = case_when(tenure.harm == "Hsng Assn (furnished unfrnish)" ~ "housing-association",
tenure.harm == "LA (furnished unfurnished)" ~ "local-authority",
grepl("Priv. rented", tenure.harm) ~ "private-rented",
tenure.harm == "Rent free" ~ "private-rented",
grepl("Owned", tenure.harm) ~ "owner-occupier"
))
Then, we create a variable for the length of residence (length.of.residence), using: - hrp.length.residence.years: from A131: HRP - no of years at address - hrp.length.residence.months: from A1311: HRP - no of months at address
housing <- housing %>%
mutate(length.of.residence = round(hrp.length.residence.years + hrp.length.residence.months / 12, 2))
Then, we create a variable with bands for the amount of years (length.of.residence.banded):
housing <- housing %>%
mutate(length.of.residence.banded = case_when(length.of.residence < 1 ~ "less than 1 year",
#length.of.residence == 1 ~ "1 year",
length.of.residence <= 2 ~ "1-2 years",
length.of.residence <= 5 ~ "3-5 years",
length.of.residence <= 10 ~ "6-10 years",
length.of.residence <= 20 ~ "11-20 years",
length.of.residence <= 30 ~ "21-30 years",
length.of.residence > 30 ~ "more than 30 years",
is.na(hrp.length.residence.years) & hrp.length.residence.months == 0 ~ "less than 1 year"))
Then, we create a variable for the number of rooms (number.of.rooms), using: - DVrmsp: DV for total number of rooms, excluding small kitchens and bathrooms - anonymised
Notice the ‘ten or more rooms’ is simply translated into ten rooms which may need a rethink.
housing <- housing %>%
mutate(number.of.rooms = case_when(grepl("one", DVrmsp, ignore.case = T) ~ 1,
grepl("two", DVrmsp, ignore.case = T) ~ 2,
grepl("three", DVrmsp, ignore.case = T) ~ 3,
grepl("four", DVrmsp, ignore.case = T) ~ 4,
grepl("five", DVrmsp, ignore.case = T) ~ 5,
grepl("six", DVrmsp, ignore.case = T) ~ 6,
grepl("seven", DVrmsp, ignore.case = T) ~ 7,
grepl("eight", DVrmsp, ignore.case = T) ~ 8,
grepl("nine", DVrmsp, ignore.case = T) ~ 9,
grepl("ten", DVrmsp, ignore.case = T) ~ 10))
#and convert to numeric:
housing <- housing %>%
mutate(number.of.rooms = as.integer(number.of.rooms))
Then, lets create a variable for the number of bedrooms (number.of.bedrooms), using: - DVbedp: DV for total number of bedrooms occupied by household - anonymised
Again, ‘ten or more’ is converted to ‘10’.
housing <- housing %>%
mutate(number.of.bedrooms = case_when(grepl("one", DVbedp, ignore.case = T) ~ 1,
grepl("two", DVbedp, ignore.case = T) ~ 2,
grepl("three", DVbedp, ignore.case = T) ~ 3,
grepl("four", DVbedp, ignore.case = T) ~ 4,
grepl("five", DVbedp, ignore.case = T) ~ 5,
grepl("six", DVbedp, ignore.case = T) ~ 6,
grepl("seven", DVbedp, ignore.case = T) ~ 7,
grepl("eight", DVbedp, ignore.case = T) ~ 8,
grepl("nine", DVbedp, ignore.case = T) ~ 9,
grepl("ten", DVbedp, ignore.case = T) ~ 10,
TRUE ~ 0))
#and convert to numeric:
housing <- housing %>%
mutate(number.of.bedrooms = as.integer(number.of.bedrooms))
Then, we create a variable to capture whether it is rural (rurality), by using: - URGridEWp: Rural and Urban Classification 2004 for England & Wales - URGridSCp: Scottish Executive Urban Rural Classification
housing <- housing %>%
#incorporate scottish data when available.
mutate(rurality = case_when(is.na(URGridEWp) ~ URGridSCp, TRUE ~ URGridEWp)) %>%
#replace NAs with 'missing'
mutate(rurality = ifelse(is.na(rurality), "Missing", rurality))
Finally, we select the variables of interest:
housing <- housing %>%
select(newuniqueid,
rurality,
dwelltype,
number.of.rooms,
number.of.bedrooms,
tenure,
length.of.residence,
length.of.residence.banded
)
We will start using the rawper data.
First, lets isolate the data for the head of the household or their partner in (longterm.health), using: - HRPID: Household Reference Person identifier - HRPPrID: Partner of Household Reference Person
longterm.health <- rawper %>%
filter(HRPID == "HRP" | HRPPrID == "Partner of HRP")
We then create another variable to signify if HRP or partner (hrp.or.partner)
longterm.health <- longterm.health %>%
mutate(hrp.or.partner = case_when(HRPID == "HRP" ~ "HRP",
HRPPrID == "Partner of HRP" ~ "Partner",
TRUE ~ "Other"))
Now, lets isolate the variables of interest and rename them as in the original code:
longterm.health <- longterm.health %>%
transmute(newuniqueid,
Person,
hrp.or.partner,
QHealth1 = tolower(QHealth1), #How is your health in general, is it...
HealIll = tolower(HealIll), #Have you physical/mental health condition lasting 12 m+?
RedAct = tolower(RedAct), #Does your condition reduce ability to carry daily activities?
Durredact = tolower(Durredact)) #How long was your ability to carry activity reduced?
Now, they run various code to remove duplicates with peolpe having multiple partners. But inspecting the code this never occurs.
As a result, this part of the original code was removed.
Now, we create a variable to identify if there long-term health conditions (longterm.limiting.health.condition), using: - HealIll: Have you physical/mental health condition lasting 12 m+? - Durredact: How long was your ability to carry activity reduced? - RedAct: Does your condition reduce ability to carry out daily activities?
longterm.health <- longterm.health %>%
mutate(longterm.limiting.health.condition = case_when(HealIll == "yes" & grepl("12 months or more", Durredact) & grepl("yes, a lot", RedAct) ~ "yes, affects daily activities a lot",
HealIll == "yes" & grepl("12 months or more", Durredact) & grepl("yes, a little", RedAct) ~ "yes, affects daily activities a little",
HealIll == "yes" & grepl("12 months or more", Durredact) & grepl("not at all", RedAct) ~ "yes, but no limitations to daily activities",
TRUE ~ "No longterm health condition"))
In original code they also derived a dataframe called longterm.health.final. This didnt seem necessary so dropped.
Then, we create another variable with the total annual health expenditure (health.spend), using: -P606t: COICOP: Total Health expenditure
health.spend <- hh %>%
select(newuniqueid,
total.health.spend = P606t)
And now, lets merge the longterm.health and health.spend dataframes to get (health):
health <- inner_join(longterm.health, health.spend, by = "newuniqueid")
We will be using the ‘hh’ data.
First, lets select the data we are interested in, regarding cars, renaming them:
cars <- hh %>%
transmute(newuniqueid,
case.dv,
new_weight,
car.vans.in.hhld = A124,
cars.owned = A149,
cars.owned.or.used = A160,
new.cars.bought.12mnths = a117p,
snd.hand.cars.bought.12mnths = a118p,
cars.and.vans.owned = a143p,
#cars.and.vans.used.or.owned = a162p,
Petrol,
Petrol2,
Petrol3)
Then we create three variables based on whether people have no car/van or more(car1.type, car2.type,car3.type), using: - Petrol, Petrol2, Petrol3: Does the vehicle run on: petrol; or diesel; or something else? - cars.and.vans.owned: from a143p: Number of cars & vans currently owned - anonymised
#car1.type addition
cars <- cars %>%
mutate(car1.type = case_when(is.na(Petrol) & cars.and.vans.owned == 0 ~ "No cars/vans",
is.na(Petrol) & cars.and.vans.owned > 0 ~ "Missing data",
TRUE ~ Petrol),
#car2.type addition
car2.type = case_when(is.na(Petrol2) & cars.and.vans.owned < 2 ~ "No second car/van",
is.na(Petrol2) & cars.and.vans.owned > 1 ~ "Missing data",
TRUE ~ Petrol2),
#car3.type addition
car3.type = case_when(is.na(Petrol3) & cars.and.vans.owned < 3 ~ "No third car/van",
is.na(Petrol3) & cars.and.vans.owned > 2 ~ "Missing data",
TRUE ~ Petrol3))
Then, some values that didn’t exist replaced. The code didn’t do anything so not replicated.
Now, lets create variables that identify if households have an EV (electric.vehicles), hybrid (hybrid.vehicles), or both (elec.or.hybrid.vehicles).
Notice that the coded formatting is different for the first two variables and the last one which uses Yes and No.
cars <- cars %>%
mutate(electric.vehicles = case_when(car1.type == "Aelectricity,A" ~ 1,
car2.type == "Aelectricity,A" ~ 1,
car3.type == "Aelectricity,A" ~ 1,
TRUE ~ 0),
hybrid.vehicles = case_when(car1.type == "Ahybrid (uses a combination of petrol an" ~ 1,
car2.type == "Ahybrid (uses a combination of petrol an" ~ 1,
car3.type == "Ahybrid (uses a combination of petrol an" ~ 1,
TRUE ~ 0),
elec.or.hybrid.vehicles = case_when(electric.vehicles == 1 ~ "Yes",
hybrid.vehicles == 1 ~ "Yes",
TRUE ~ "No")
)
And lets select the variables of interest:
cars <- cars %>%
select(newuniqueid,
cars.and.vans.owned,
new.cars.bought.12mnths,
snd.hand.cars.bought.12mnths,
car1.type,
car2.type,
car3.type,
electric.vehicles,
hybrid.vehicles,
elec.or.hybrid.vehicles
)
Lets calculate annual air travel data (air.travel) from hh, using: - C73311w: Air fares (within UK) - C73312w: Air fares (international)
air.travel <- hh %>%
transmute(newuniqueid,
ann.air.fares.uk = C73311w * 365 / 7,
ann.air.fares.int = C73312w * 365 / 7)
Lets collect travel costs in weekly and annual form, using: - P607t: COICOP: Total Transport costs
travel.costs <- hh %>%
transmute(newuniqueid,
total.travel.expenditure.week = P607t,
total.travel.expenditure.annual = P607t * 365 / 7)
Lets combine the car, air travel, and travel costs in one data frame (all.travel):
all.travel <- inner_join(travel.costs,
inner_join(air.travel, cars, by = "newuniqueid"), by = "newuniqueid")