“Taste classifies, and it classifies the classifier”, claims Pierre Bourdieu in “Distinction: A Social Critique of the Judgement of Taste” (1984), his revolutionary study of how taste is formed and expressed across distinct social classes. The mechanisms behind taste formation have long been disputed, but only recently did the discourse on taste acquire a political, rather than merely ethical or aesthetic, dimension. Bourdieu (1984) frames taste as a means of symbolic power that reproduces and perpetuates social inequality, deepening the schism between the masses and the elites.
Another dimension of taste is its ability to bind together people who belong to the same sociological fraction, or stratum. Marx (1886) distinguishes social class as “class against capital” and “class for itself”. The former will serve as the main subject of the analysis performed in this project. This definition of class is based on differences in living conditions across distinct social strata. Social class, in this sense, exists as a collectivity of individuals who perceive the similarities between one another within a certain situation. Bourdieu (1984) extends this theory to capture how an individual’s status is conferred through the mode of presenting oneself to the world, that is, through one’s aesthetic dispositions. These strategies serve not only to distinguish oneself as a member of a certain social stratum (class), but also to distance oneself from lower groups. Aesthetic dispositions are imposed on members of the various social classes from their earliest years, so that the entire process of acquiring taste seems seamless and natural. The eye is a product of history reproduced by social upbringing and education.
Not all factors that determine membership of a social class are as ephemeral and intangible as taste. According to Bourdieu (1986), class fractions are determined by a combination of social, economic and cultural capital. However, it is symbolic goods, especially those regarded as attributes of excellence, that facilitate the strategies of distinction and the reproduction of inequality. These attributes are proclaimed excellent by the dominant class, which amplifies and legitimizes social distances.
The aim of this study is to analyze whether such strategies of distinction still hold in today’s society, in which access to cultural goods is no longer restricted in favor of the elites. The democratization of access that accompanied digitization played a crucial role in dismantling the previous hierarchy of cultural goods and the status they conferred on their consumers. This study seeks to uncover hidden relationships between social, economic and cultural capital, and hence cultural preferences and tastes, by applying an association rule mining algorithm.
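To make the idea of an association rule concrete, the following is a minimal sketch in base R of the three measures usually reported for a rule (support, confidence and lift). The data frame `toy`, its column names and the helper `rule_metrics()` are hypothetical and invented purely for illustration; they are not taken from the SPPA data. The text does not state which implementation the study uses; in R, the `arules` package is one common choice, but that is an assumption here.

```r
# Toy data: each row is a hypothetical respondent, each column a TRUE/FALSE item.
# All values are invented for illustration; they are not SPPA data.
toy <- data.frame(
  college_degree = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  high_income    = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  attends_opera  = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Hypothetical helper: metrics for the rule {lhs} => {rhs}.
rule_metrics <- function(data, lhs, rhs) {
  has_lhs <- Reduce(`&`, data[lhs])        # rows containing every antecedent item
  has_rhs <- Reduce(`&`, data[rhs])        # rows containing every consequent item
  support    <- mean(has_lhs & has_rhs)    # share of rows with both sides
  confidence <- support / mean(has_lhs)    # P(rhs | lhs)
  lift       <- confidence / mean(has_rhs) # confidence relative to the base rate of rhs
  c(support = support, confidence = confidence, lift = lift)
}

m <- rule_metrics(toy,
                  lhs = c("college_degree", "high_income"),
                  rhs = "attends_opera")
print(round(m, 3))
```

In this toy example the rule {college_degree, high_income} => {attends_opera} has support 0.375, confidence 1 and lift 2: every respondent with both antecedent items also attends the opera, which is twice the overall rate of 0.5. A lift well above 1 is the kind of pattern that would be read as a link between capital and taste.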
This study is based on the Survey of Public Participation in the Arts (SPPA), which combines two surveys conducted in 2017. The most recent SPPA was conducted in 2022; however, because the USA was still affected by the COVID-19 pandemic in 2022, which strongly disrupted the arts and especially public participation in them, this project uses the 2017 data. The SPPA consists of the Current Population Survey (CPS) from July 2017 and the SPPA supplement, which provides information about public participation in the arts within the United States. The CPS, administered monthly by the U.S. Census Bureau, collects labor force data on the population aged 15 or older living in the USA. It provides information about a respondent’s socio-economic status: age, gender, race, marital status, educational attainment, income, occupation, etc.
In addition to the basic CPS questions, interviewers asked two randomly selected household members aged 18 or older, from about one-half of the sampled CPS households, supplementary questions about arts attendance, venues visited, literary reading and the motivation behind it. This supplement contains questions about the respondent’s participation in various artistic activities over the last year. The 2017 version included five additional modules capturing other types of arts participation and leisure activities, such as training and exposure, frequency of participation, musical and artistic preferences, school-age socialization and the use of electronic devices in art consumption. The modules are separated in the following way:
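# Load the ICPSR 37138 data file; this creates the data frame da37138.0001
load("/Users/gosia/Downloads/ICPSR_37138/DS0001/37138-0001-Data.rda")
sppa <- da37138.0001
# Preview the first six rows
head(sppa)
## CASEID HRHHID HRMONTH HRYEAR4 HURESPLI HUFINAL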
## 1 1 4.220117e+12 7 2017 NA (231) Unoccupied tent or trail
## 2 2 9.517675e+14 7 2017 NA (226) Vacant regular
## 3 3 7.100992e+14 7 2017 NA (226) Vacant regular
## 4 4 6.100091e+14 7 2017 NA (226) Vacant regular
## 5 5 1.108629e+11 7 2017 NA (226) Vacant regular
## 6 6 4.108131e+14 7 2017 1 (201) CAPI Compelete
## HULANGCODE HETENURE
## 1 (0) Unlabeled/not Spanish <NA>
## 2 (0) Unlabeled/not Spanish <NA>
## 3 (0) Unlabeled/not Spanish <NA>
## 4 (0) Unlabeled/not Spanish <NA>
## 5 (0) Unlabeled/not Spanish <NA>
## 6 (0) Unlabeled/not Spanish (1) Owned or being bought by a HH member
## HEHOUSUT HETELHHD HETELAVL
## 1 (10) Unoccupied tent site or trailer site <NA> <NA>
## 2 (01) House, apartment, flat <NA> <NA>
## 3 (01) House, apartment, flat <NA> <NA>
## 4 (01) House, apartment, flat <NA> <NA>
## 5 (01) House, apartment, flat <NA> <NA>
## 6 (01) House, apartment, flat (1) Yes <NA>
## HEPHONEO HEFAMINC HUTYPEA
## 1 (0) Undocumented Code <NA> <NA>
## 2 (0) Undocumented Code <NA> <NA>
## 3 (0) Undocumented Code <NA> <NA>
## 4 (0) Undocumented Code <NA> <NA>
## 5 (0) Undocumented Code <NA> <NA>
## 6 (1) Yes (12) 50,000 TO 59,999 <NA>
## HUTYPB HUTYPC HWHHWGT
## 1 (7) Unoccupied tent site or trailer site <NA> 0
## 2 (1) Vacant regular <NA> 0
## 3 (1) Vacant regular <NA> 0
## 4 (1) Vacant regular <NA> 0
## 5 (1) Vacant regular <NA> 0
## 6 <NA> <NA> 17691109
## HRINTSTA HRNUMHOU
## 1 (3) Type B non-interview 0
## 2 (3) Type B non-interview 0
## 3 (3) Type B non-interview 0
## 4 (3) Type B non-interview 0
## 5 (3) Type B non-interview 0
## 6 (1) Interview 2
## HRHTYPE HRMIS HUINTTYP HUPRSCNT
## 1 (00) Non-interview household 7 <NA> 0
## 2 (00) Non-interview household 2 <NA> 0
## 3 (00) Non-interview household 3 (1) Personal 1
## 4 (00) Non-interview household 3 (1) Personal 1
## 5 (00) Non-interview household 3 (1) Personal 1
## 6 (01) Husband/wife primary family (neither AF) 1 (1) Personal 1
## HRLONGLK HRHHID2 HWHHWTLN HUBUS HUBUSL1 HUBUSL2
## 1 (2) MIS 2-4 OR MIS 6-8 5011 0 <NA> NA NA
## 2 (2) MIS 2-4 OR MIS 6-8 7011 0 <NA> NA NA
## 3 (2) MIS 2-4 OR MIS 6-8 7011 0 <NA> NA NA
## 4 (2) MIS 2-4 OR MIS 6-8 7011 0 <NA> NA NA
## 5 (2) MIS 2-4 OR MIS 6-8 7011 0 <NA> NA NA
## 6 (0) MIS 1 OR REPLACEMENT HH (NO LINK) 7011 1 (2) No NA NA
## HUBUSL3 HUBUSL4 GEREG GEDIV GCFIP GCTCB GCTCO
## 1 NA NA (3) South (6) East South Central (01) AL 33860 0
## 2 NA NA (3) South (6) East South Central (01) AL 19300 3
## 3 NA NA (3) South (6) East South Central (01) AL 19300 3
## 4 NA NA (3) South (6) East South Central (01) AL 19300 3
## 5 NA NA (3) South (6) East South Central (01) AL 19300 3
## 6 NA NA (3) South (6) East South Central (01) AL 13820 0
## GTCBSAST GTMETSTA GTINDVPC GTCBSASZ GCTCS
## 1 (2) Balance (1) Metropolitan 0 (3) 250,000 - 499,999 0
## 2 (4) Not identified (1) Metropolitan 0 (2) 100,000 - 249,999 380
## 3 (4) Not identified (1) Metropolitan 0 (2) 100,000 - 249,999 380
## 4 (4) Not identified (1) Metropolitan 0 (2) 100,000 - 249,999 380
## 5 (4) Not identified (1) Metropolitan 0 (2) 100,000 - 249,999 380
## 6 (2) Balance (1) Metropolitan 0 (5) 1,000,000 - 2,499,999 0
## PERRP PEPARENT PRTAGE PRTFAGE
## 1 <NA> NA NA (0) No top code
## 2 <NA> NA NA (0) No top code
## 3 <NA> NA NA (0) No top code
## 4 <NA> NA NA (0) No top code
## 5 <NA> NA NA (0) No top code
## 6 (01) Reference person w/rels. NA 73 (0) No top code
## PEMARITL PESPOUSE PESEX PEAFEVER PEAFNOW
## 1 <NA> NA <NA> <NA> <NA>
## 2 <NA> NA <NA> <NA> <NA>
## 3 <NA> NA <NA> <NA> <NA>
## 4 <NA> NA <NA> <NA> <NA>
## 5 <NA> NA <NA> <NA> <NA>
## 6 (1) Married - spouse present 2 (2) Female (2) No (2) No
## PEEDUCA PTDTRACE PRDTHSP
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (34) 7th or 8th grade (01) White Only <NA>
## PUCHINHH PULINENO
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
## 6 (9) Change in demographic information 1
## PRFAMNUM PRFAMREL PRFAMTYP
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (01) Primary family member only (1) Reference person (1) Primary family
## PEHSPNON PRMARSTA
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 (2) Non-hispanic (1) Married, civilian spouse present
## PRPERTYP PENATVTY PEMNTVTY
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (2) Adult civilian household member (057) United States (057) United States
## PEFNTVTY PRCITSHP PRCITFLG
## 1 <NA> <NA> NA
## 2 <NA> <NA> NA
## 3 <NA> <NA> NA
## 4 <NA> <NA> NA
## 5 <NA> <NA> NA
## 6 (057) United States (1) Native, born in the United States 0
## PRINUYER PUSLFPRX PEMLR PUWK
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 (00) Not foreign born (1) Self (5) Not in labor force-retired (3) Retired
## PUBUS1 PUBUS2OT PUBUSCK1 PUBUSCK2 PUBUSCK3 PUBUSCK4 PURETOT PUDIS
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> (2) GOTO PURETCK1 <NA> <NA> <NA> <NA> <NA>
## PERET1 PUDIS1 PUDIS2 PUABSOT PULAY PEABSRSN PEABSPDO PEMJOT PEMJNUM PEHRUSL1
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 6 (2) No <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## PEHRUSL2 PEHRFTPT PEHRUSLT PEHRWANT PEHRRSN1 PEHRRSN2 PEHRRSN3 PUHROFF1
## 1 NA <NA> NA <NA> <NA> <NA> <NA> <NA>
## 2 NA <NA> NA <NA> <NA> <NA> <NA> <NA>
## 3 NA <NA> NA <NA> <NA> <NA> <NA> <NA>
## 4 NA <NA> NA <NA> <NA> <NA> <NA> <NA>
## 5 NA <NA> NA <NA> <NA> <NA> <NA> <NA>
## 6 NA <NA> NA <NA> <NA> <NA> <NA> <NA>
## PUHROFF2 PUHROT1 PUHROT2 PEHRACT1 PEHRACT2 PEHRACTT PEHRAVL PUHRCK1 PUHRCK2
## 1 NA <NA> NA NA NA NA <NA> <NA> <NA>
## 2 NA <NA> NA NA NA NA <NA> <NA> <NA>
## 3 NA <NA> NA NA NA NA <NA> <NA> <NA>
## 4 NA <NA> NA NA NA NA <NA> <NA> <NA>
## 5 NA <NA> NA NA NA NA <NA> <NA> <NA>
## 6 NA <NA> NA NA NA NA <NA> <NA> <NA>
## PUHRCK3 PUHRCK4 PUHRCK5 PUHRCK6 PUHRCK7 PUHRCK12 PULAYDT PULAY6M PELAYAVL
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PULAYAVR PELAYLK PELAYDUR PELAYFTO PULAYCK1 PULAYCK2 PULAYCK3 PULK PELKM1
## 1 <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> NA <NA> <NA> <NA> <NA> <NA> <NA>
## PULKM2 PULKM3 PULKM4 PULKM5 PULKM6 PULKDK1 PULKDK2 PULKDK3 PULKDK4 PULKDK5
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> NA
## PULKDK6 PULKPS1 PULKPS2 PULKPS3 PULKPS4 PULKPS5 PULKPS6 PELKAVL PULKAVR
## 1 NA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 NA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 NA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 NA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 NA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 NA <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PELKLL1O PELKLL2O PELKLWO PELKDUR PELKFTO PEDWWNTO PEDWRSN PEDWLKO PEDWWK
## 1 <NA> <NA> <NA> NA <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> NA <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> NA <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> NA <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> NA <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> NA <NA> <NA> <NA> <NA> <NA>
## PEDW4WK PEDWLKWK PEDWAVL PEDWAVR PUDWCK1 PUDWCK2 PUDWCK3 PUDWCK4 PUDWCK5
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEJHWKO PUJHDP1O PEJHRSN PEJHWANT PUJHCK1 PUJHCK2 PRABSREA
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PRCIVLF PRDISC PREMPHRS
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (2) Not in civilian labor force <NA> (00) Unemployed and NILF
## PREMPNOT PREXPLF PRFTLF PRHRUSL PRJOBSEA PRPTHRS
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 (4) Not in labor force (NILF)-other <NA> <NA> <NA> <NA> <NA>
## PRPTREA PRUNEDUR PRUNTYPE PRWKSCH PRWKSTAT
## 1 <NA> NA <NA> <NA> <NA>
## 2 <NA> NA <NA> <NA> <NA>
## 3 <NA> NA <NA> <NA> <NA>
## 4 <NA> NA <NA> <NA> <NA>
## 5 <NA> NA <NA> <NA> <NA>
## 6 <NA> NA <NA> (0) Not in labor force (01) Not in labor force
## PRWNTJOB PUJHCK3 PUJHCK4 PUJHCK5 PUIODP1 PUIODP2 PUIODP3
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 (2) Other not in labor force <NA> <NA> <NA> <NA> <NA> <NA>
## PEIO1COW PUIO1MFG PEIO2COW PUIO2MFG PUIOCK1 PUIOCK2 PUIOCK3
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PRIOELG PRAGNA PRCOW1 PRCOW2 PRCOWPG PRDTCOW1 PRDTCOW2
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 (0) Not eligible for edit <NA> <NA> <NA> <NA> <NA> <NA>
## PRDTIND1 PRDTIND2 PRDTOCC1 PRDTOCC2 PREMP PRMJIND1 PRMJIND2 PRMJOCC1 PRMJOCC2
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PRMJOCGR PRNAGPWS PRNAGWS PRSJMJ PRERELG PEERNUOT PEERNPER
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> (0) Not eligible for edit <NA> <NA>
## PEERNRT PEERNHRY PUERNH1C PEERNH2 PEERNH1O PRERNHLY PTHR PEERNHRO
## 1 <NA> <NA> NA NA NA NA (0) Not topcoded NA
## 2 <NA> <NA> NA NA NA NA (0) Not topcoded NA
## 3 <NA> <NA> NA NA NA NA (0) Not topcoded NA
## 4 <NA> <NA> NA NA NA NA (0) Not topcoded NA
## 5 <NA> <NA> NA NA NA NA (0) Not topcoded NA
## 6 <NA> <NA> NA NA NA NA (0) Not topcoded NA
## PRERNWA PTWK PEERN PUERN2 PTOT PEERNWKP PEERNLAB
## 1 NA (0) Not topcoded NA NA (0) Not topcoded NA <NA>
## 2 NA (0) Not topcoded NA NA (0) Not topcoded NA <NA>
## 3 NA (0) Not topcoded NA NA (0) Not topcoded NA <NA>
## 4 NA (0) Not topcoded NA NA (0) Not topcoded NA <NA>
## 5 NA (0) Not topcoded NA NA (0) Not topcoded NA <NA>
## 6 NA (0) Not topcoded NA NA (0) Not topcoded NA <NA>
## PEERNCOV PENLFJH PENLFRET PENLFACT PUNLFCK1 PUNLFCK2
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> (2) All others goto LBFR-END
## PESCHENR PESCHFT PESCHLVL PRNLFSCH PWFMWGT PWLGWGT PWORWGT PWSSWGT PWVETWGT
## 1 <NA> <NA> <NA> <NA> 0 0 0 0 0
## 2 <NA> <NA> <NA> <NA> 0 0 0 0 0
## 3 <NA> <NA> <NA> <NA> 0 0 0 0 0
## 4 <NA> <NA> <NA> <NA> 0 0 0 0 0
## 5 <NA> <NA> <NA> <NA> 0 0 0 0 0
## 6 <NA> <NA> <NA> <NA> 17691109 0 0 17691109 17888002
## PRCHLD PRNMCHLD PXPDEMP1
## 1 <NA> NA <NA>
## 2 <NA> NA <NA>
## 3 <NA> NA <NA>
## 4 <NA> NA <NA>
## 5 <NA> NA <NA>
## 6 (00) No own children under 18 years of age 0 (00) Value - no change
## PRWERNAL PRHERNAL HXTENURE HXHOUSUT
## 1 <NA> <NA> (01) Blank - no change (00) Value - no change
## 2 <NA> <NA> (01) Blank - no change (00) Value - no change
## 3 <NA> <NA> (01) Blank - no change (00) Value - no change
## 4 <NA> <NA> (01) Blank - no change (00) Value - no change
## 5 <NA> <NA> (01) Blank - no change (00) Value - no change
## 6 <NA> <NA> (00) Value - no change (00) Value - no change
## HXTELHHD HXTELAVL HXPHONEO PXINUSYR
## 1 (01) Blank - no change (01) Blank - no change (00) Value - no change <NA>
## 2 (01) Blank - no change (01) Blank - no change (00) Value - no change <NA>
## 3 (01) Blank - no change (01) Blank - no change (00) Value - no change <NA>
## 4 (01) Blank - no change (01) Blank - no change (00) Value - no change <NA>
## 5 (01) Blank - no change (01) Blank - no change (00) Value - no change <NA>
## 6 (00) Value - no change (01) Blank - no change (00) Value - no change <NA>
## PXRRP PXPARENT PXAGE
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (00) Value - no change (50) Value to blank (00) Value - no change
## PXMARITL PXSPOUSE PXSEX
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change
## PXAFWHN1 PXAFNOW PXEDUCA
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (01) Blank - no change (00) Value - no change (00) Value - no change
## PXRACE1 PXNATVTY PXMNTVTY PXFNTVTY PXNMEMP1
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 (00) Value - no change <NA> <NA> <NA> (00) Value - no change
## PXHSPNON PXMLR PXRET1 PXABSRSN
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change <NA>
## PXABSPDO PXMJOT PXMJNUM PXHRUSL1 PXHRUSL2 PXHRFTPT PXHRUSLT
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> (01) Blank - no change <NA>
## PXHRWANT PXHRRSN1 PXHRRSN2 PXHRACT1 PXHRACT2 PXHRACTT PXHRRSN3 PXHRAVL
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PXLAYAVL PXLAYLK PXLAYDUR PXLAYFTO PXLKM1 PXLKAVL PXLKLL1O PXLKLL2O PXLKLWO
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PXLKDUR PXLKFTO PXDWWNTO PXDWRSN
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 <NA> <NA> (01) Blank - no change (01) Blank - no change
## PXDWLKO PXDWWK PXDW4WK
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (01) Blank - no change (01) Blank - no change (01) Blank - no change
## PXDWLKWK PXDWAVL PXDWAVR
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (01) Blank - no change (01) Blank - no change (01) Blank - no change
## PXJHWKO PXJHRSN PXJHWANT PXIO1COW
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 (01) Blank - no change (01) Blank - no change (01) Blank - no change <NA>
## PXIO1ICD PXIO1OCD PXIO2COW PXIO2ICD PXIO2OCD PXERNUOT PXERNPER PXERNH1O
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PXERNHRO PXERN PXPDEMP2 PXNMEMP2 PXERNWKP PXERNRT
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> (00) Value - no change (00) Value - no change <NA> <NA>
## PXERNHRY PXERNH2 PXERNLAB PXERNCOV PXNLFJH
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> (01) Blank - no change
## PXNLFRET PXNLFACT PXSCHENR PXSCHFT PXSCHLVL
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 (01) Blank - no change (01) Blank - no change <NA> <NA> <NA>
## QSTNUM OCCURNUM PEDIPGED PEHGCOMP PECYC PXDIPGED
## 1 1 1 <NA> <NA> <NA> <NA>
## 2 2 1 <NA> <NA> <NA> <NA>
## 3 3 1 <NA> <NA> <NA> <NA>
## 4 4 1 <NA> <NA> <NA> <NA>
## 5 5 1 <NA> <NA> <NA> <NA>
## 6 6 1 <NA> <NA> <NA> (01) Blank - no change
## PXHGCOMP PXCYC PWCMPWGT PEIO1ICD PEIO1OCD
## 1 <NA> <NA> 0 NA NA
## 2 <NA> <NA> 0 NA NA
## 3 <NA> <NA> 0 NA NA
## 4 <NA> <NA> 0 NA NA
## 5 <NA> <NA> 0 NA NA
## 6 (01) Blank - no change (01) Blank - no change 17928473 NA NA
## PEIO2ICD PEIO2OCD PRIMIND1 PRIMIND2 PEAFWHN1 PEAFWHN2 PEAFWHN3 PEAFWHN4
## 1 NA NA <NA> <NA> <NA> <NA> <NA> <NA>
## 2 NA NA <NA> <NA> <NA> <NA> <NA> <NA>
## 3 NA NA <NA> <NA> <NA> <NA> <NA> <NA>
## 4 NA NA <NA> <NA> <NA> <NA> <NA> <NA>
## 5 NA NA <NA> <NA> <NA> <NA> <NA> <NA>
## 6 NA NA <NA> <NA> <NA> <NA> <NA> <NA>
## PXAFEVER PELNDAD PELNMOM PEDADTYP PEMOMTYP PECOHAB
## 1 <NA> NA NA <NA> <NA> NA
## 2 <NA> NA NA <NA> <NA> NA
## 3 <NA> NA NA <NA> <NA> NA
## 4 <NA> NA NA <NA> <NA> NA
## 5 <NA> NA NA <NA> <NA> NA
## 6 (00) Value - no change NA NA <NA> <NA> NA
## PXLNDAD PXLNMOM PXDADTYP
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (50) Value to blank (50) Value to blank (01) Blank - no change
## PXMOMTYP PXCOHAB PEDISEAR PEDISEYE PEDISREM
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 (01) Blank - no change (01) Blank - no change (2) No (2) No (2) No
## PEDISPHY PEDISDRS PEDISOUT PRDISFLG PXDISEAR
## 1 <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA>
## 6 (2) No (2) No (2) No (2) No (00) Value - no change
## PXDISEYE PXDISREM PXDISPHY
## 1 <NA> <NA> <NA>
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change
## PXDISDRS PXDISOUT HXFAMINC
## 1 <NA> <NA> (01) Blank - no change
## 2 <NA> <NA> (01) Blank - no change
## 3 <NA> <NA> (01) Blank - no change
## 4 <NA> <NA> (01) Blank - no change
## 5 <NA> <NA> (01) Blank - no change
## 6 (00) Value - no change (00) Value - no change (43) Refused to allocated value
## PRDASIAN PEPDEMP1 PTNMEMP1 PEPDEMP2 PTNMEMP2 PECERT1 PECERT2 PECERT3
## 1 <NA> <NA> NA <NA> NA <NA> <NA> <NA>
## 2 <NA> <NA> NA <NA> NA <NA> <NA> <NA>
## 3 <NA> <NA> NA <NA> NA <NA> <NA> <NA>
## 4 <NA> <NA> NA <NA> NA <NA> <NA> <NA>
## 5 <NA> <NA> NA <NA> NA <NA> <NA> <NA>
## 6 <NA> <NA> NA <NA> NA (2) No <NA> <NA>
## PXCERT1 PXCERT2 PXCERT3 PEC1Q1A
## 1 <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change <NA>
## PTC1Q1B PEC1Q2A PTC1Q2B PEC1Q3A PTC1Q3B PEC1Q4A PTC1Q4B PEC1Q5A PTC1Q5B
## 1 NA <NA> NA <NA> NA <NA> NA <NA> NA
## 2 NA <NA> NA <NA> NA <NA> NA <NA> NA
## 3 NA <NA> NA <NA> NA <NA> NA <NA> NA
## 4 NA <NA> NA <NA> NA <NA> NA <NA> NA
## 5 NA <NA> NA <NA> NA <NA> NA <NA> NA
## 6 NA <NA> NA <NA> NA <NA> NA <NA> NA
## PEC1Q6A PTC1Q6B PEC1Q7A PTC1Q7B PEC1Q8A PTC1Q8B PEC1Q9A PEC1Q10A PTC1Q10B
## 1 <NA> NA <NA> NA <NA> NA <NA> <NA> NA
## 2 <NA> NA <NA> NA <NA> NA <NA> <NA> NA
## 3 <NA> NA <NA> NA <NA> NA <NA> <NA> NA
## 4 <NA> NA <NA> NA <NA> NA <NA> <NA> NA
## 5 <NA> NA <NA> NA <NA> NA <NA> <NA> NA
## 6 <NA> NA <NA> NA <NA> NA <NA> <NA> NA
## PEC1Q11A PEC1Q12A PEC1Q13A PEC1Q14A PTC1Q14B PEC1Q15A PEC1Q15B PEC1Q15C
## 1 <NA> <NA> <NA> <NA> NA <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> NA <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> NA <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> NA <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> NA <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> NA <NA> <NA> <NA>
## PEC1Q16A PEC1Q16B PEC1Q16C PEC1Q16D PEC1Q16E PEC1Q17A PEC1Q18A PEC2Q1A
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEC2Q1B PEC2Q1C PEC2Q1D PEC2Q1E PEC2Q1F PEC2Q1G PEC2Q2A PEC2Q2B PEC2Q2C
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEC2Q2D PEC2Q2E PEC2Q2F PEC2Q3A PEC2Q3B PEC2Q3C PEC2Q3D PEC2Q3E PEC2Q3F
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEC2Q3G PEC2Q3H PEC2Q3I PEC2Q4A PEC2Q4B PEC2Q4C PEC2Q4D PEC2Q4E PEC2Q4F
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEC2Q4G PEC2Q4H PEC2Q4I PEMAQ1A PEMAQ1B PEMAQ1C PEMAQ1D PEMAQ1E PEMAQ1F
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMAQ1G PEMAQ1H PEMAQ1I PEMAQ1J PEMAQ2A PEMAQ2B PEMAQ2C PEMAQ2D PEMAQ2E
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMAQ2F PEMAQ2G PEMAQ2H PEMAQ2I PEMAQ2J PEMAQ3A PEMAQ3B PEMAQ4A PEMAQ4B
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMAQ4C PEMAQ4D PEMBQ1A PEMBQ1B PEMBQ1C PEMBQ1AA PEMBQ1BB PEMBQ1CC PEMBQ1D
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMBQ1E PEMBQ1DD PEMBQ1F PEMBQ2A PEMBQ2B PEMBQ2C PEMBQ2CC PEMBQ2D PEMBQ2E
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMBQ3A PEMBQ3B PEMBQ3C PEMBQ3D PEMBQ3E PEMBQ3F PEMBQ3G PEMBQ4A PEMBQ4B
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMBQ4C PEMBQ4D PEMBQ4E PEMBQ4F PEMBQ4G PEMBQ5 PEMBQ6 PEMCQ1A PEMCQ1B PEMCQ1C
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMCQ1D PEMCQ1E PEMCQ1F PEMCQ1G PEMCQ1H PEMCQ1I PEMCQ2A PEMCQ2B PEMCQ2C
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMCQ2D PEMCQ2E PEMCQ2F PEMCQ2G PEMCQ2H PEMCQ2I PEMCQ3A PEMCQ3B PEMCQ3C
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMCQ3D PEMCQ3E PEMCQ3F PEMCQ3G PEMCQ4A PEMCQ4B PEMCQ4C PEMCQ4D PEMCQ4E
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMCQ4F PEMCQ4G PEMCQ5 PEMCQ6 PEMCQ7 PEMCQ8 PEMCQ9A PEMCQ9B PEMCQ9C PEMCQ9D
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMCQ9E PEMCQ9F PEMCQ9G PEMCQ10 PEMCQ11 PEMDQ1A PEMDQ1B PEMDQ1C PEMDQ1D
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMDQ1E PEMDQ1F PEMDQ1G PEMDQ1H PEMDQ1I PEMDQ1J PEMDQ1K PEMDQ2F PEMDQ2G
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMDQ2H PEMDQ2I PEMDQ2J PEMDQ2K PEMDQ3 PEMDQ4 PEMDQ5 PEMDQ6 PEMEQ1A PEMEQ1B
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMEQ1C PEMEQ1D PEMEQ1E PEMEQ1F PEMEQ1G PEMEQ2A PEMEQ2B PEMEQ2C PEMEQ2D
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMEQ2E PEMEQ2F PEMEQ2G PEMEQ3A PEMEQ3B PEMEQ3C PEMEQ3D PEMEQ3E PEMEQ3F
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMEQ3G PEMEQ3AA PEMEQ3BB PEMEQ3CC PEMEQ3DD PEMEQ3EE PEMEQ3FF PEMEQ3GG
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMEQ4A PEMEQ4B PEMEQ4C PEMEQ4D PEMEQ4E PEMEQ4F PEMEQ4G PEMEQ5 PEMEQ6A
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMEQ6B PEMEQ6C PEMEQ7A PEMEQ7B PEMEQ7C PEMEQ7D PEMEQ8 PEMEQ9 PEMEQ10 PEMEQ11
## 1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## PEMEQ12 PUNXTPR3 PRINTFLG HENELGSCR PESPELIG PWSUPWGT AGEGROUP2 EDUGROUP
## 1 <NA> <NA> <NA> NA NA 0 0 0
## 2 <NA> <NA> <NA> NA NA 0 0 0
## 3 <NA> <NA> <NA> NA NA 0 0 0
## 4 <NA> <NA> <NA> NA NA 0 0 0
## 5 <NA> <NA> <NA> NA NA 0 0 0
## 6 <NA> <NA> <NA> NA NA 0 6 1
## RACEHIS2
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 1
The variables used for this analysis are described in the table below. The selection follows Bourdieu’s Distinction: the chosen variables are meant to capture the specific characteristics of respondents’ taste in cultural goods as well as their socio-economic background (CPS variables).
vars_to_keep <- c("PRPERTYP", # person type
"HRINTSTA", # survey type - will filter for only fully completed: code 1
# social class
# cultural capital
"PEEDUCA", # education level
# economic capital
"HEFAMINC", # family income
"PRMJOCC1", # occupation
"HETENURE", # housing tenure - whether the apartment is rented or owned
# demographic variables
"GTMETSTA", # metropolitan status
"PRTAGE", # age
"PESEX", # gender
"PTDTRACE", # race
# highbrow culture
"PEC1Q3A", # classical music
"PEC1Q4A", # opera
"PEC1Q7A", # ballet
"PEC1Q10A", # art museum
"PEC1Q1A", # jazz
"PEC1Q15B", # poetry
# middlebrow and popular culture
"PEC1Q2A", # latin, spanish or salsa music
"PEC1Q8A", # live dance (non-ballet)
"PEC1Q9A", # other live performances
"PEC1Q11A", # crafts fair or visual arts festival
"PEC1Q12A", # festival with artists performing
"PEC1Q14A", # any books
"PEC1Q17A", # audiobooks
"PEC1Q13A", # cultural touristry - sightseeing and parks
"PEC1Q5A", # musical play
"PEC1Q6A" # non musical play
)
# selecting the variables
sppa1 <- sppa %>%
dplyr::select(all_of(vars_to_keep))
# table - variable description
# reference table with codes and categories
vars_structure <- tribble(
~Variable_Code, ~Category,
"PRPERTYP", "Filter",
"HRINTSTA", "Filter",
"PEEDUCA", "Cultural Capital (Education)",
"HEFAMINC", "Economic Capital (Income)",
"PRMJOCC1", "Social Class (Occupation)",
"HETENURE", "Economic Capital (Housing)",
"GTMETSTA", "Demographics",
"PRTAGE", "Demographics",
"PESEX", "Demographics",
"PTDTRACE", "Demographics",
"PEC1Q1A", "Highbrow Culture",
"PEC1Q3A", "Highbrow Culture",
"PEC1Q4A", "Highbrow Culture",
"PEC1Q7A", "Highbrow Culture",
"PEC1Q10A", "Highbrow Culture",
"PEC1Q15B", "Highbrow Culture",
"PEC1Q2A", "Middlebrow/Popular Culture",
"PEC1Q5A", "Middlebrow/Popular Culture",
"PEC1Q6A", "Middlebrow/Popular Culture",
"PEC1Q8A", "Middlebrow/Popular Culture",
"PEC1Q11A", "Middlebrow/Popular Culture",
"PEC1Q12A", "Middlebrow/Popular Culture",
"PEC1Q13A", "Middlebrow/Popular Culture",
"PEC1Q14A", "Middlebrow/Popular Culture",
"PEC1Q17A", "Middlebrow/Popular Culture")
# extracting the official descriptions (Labels) from the loaded dataset
raw_labels <- attr(sppa1, "variable.labels")
variable_table <- vars_structure %>%
rowwise() %>%
mutate(
Question_Description = ifelse(
Variable_Code %in% names(raw_labels),
raw_labels[[Variable_Code]],
NA_character_ # fallback when a code has no label in the dataset
)
) %>%
ungroup()
# table
kable(variable_table,
col.names = c("Variable Code", "Category", "Question Description"),
caption = "SPPA 2017: Selected Variables",
align = "lll")
| Variable Code | Category | Question Description |
|---|---|---|
| PRPERTYP | Filter | Type of person record recode |
| HRINTSTA | Filter | Interview status |
| PEEDUCA | Cultural Capital (Education) | Highest level of school completed or degree received |
| HEFAMINC | Economic Capital (Income) | Family income |
| PRMJOCC1 | Social Class (Occupation) | Major occupation recode - job 1 |
| HETENURE | Economic Capital (Housing) | Are your living quarters… |
| GTMETSTA | Demographics | Metropolitan Status |
| PRTAGE | Demographics | Person’s age |
| PESEX | Demographics | Sex |
| PTDTRACE | Demographics | Race |
| PEC1Q1A | Highbrow Culture | Attended a live jazz performance in the last 12 months |
| PEC1Q3A | Highbrow Culture | Attended a live classical music performance in the last 12 months |
| PEC1Q4A | Highbrow Culture | Attended a live opera performance in the last 12 months |
| PEC1Q7A | Highbrow Culture | Attended a live ballet performance in the last 12 months |
| PEC1Q10A | Highbrow Culture | Visited art museum or gallery last 12 months |
| PEC1Q15B | Highbrow Culture | Read any poetry the last 12 months |
| PEC1Q2A | Middlebrow/Popular Culture | Attended a live Latin, Spanish, or salsa music performance in the last 12 months |
| PEC1Q5A | Middlebrow/Popular Culture | Attended a live musical stage play in the last 12 months |
| PEC1Q6A | Middlebrow/Popular Culture | Attended a live nonmusical stage play in the last 12 months |
| PEC1Q8A | Middlebrow/Popular Culture | Attended a live dance (non-ballet) performance in the last 12 months |
| PEC1Q11A | Middlebrow/Popular Culture | Visited a crafts fair or visual arts festival last 12 months |
| PEC1Q12A | Middlebrow/Popular Culture | Visited an outdoor festival that featured performing artists last 12 months |
| PEC1Q13A | Middlebrow/Popular Culture | Visited a historic park or monument or tour a building/neighborhood for historic design last 12 months |
| PEC1Q14A | Middlebrow/Popular Culture | Read any books during the last 12 months |
| PEC1Q17A | Middlebrow/Popular Culture | Listened to any audiobooks the last 12 months |
First, the dataset is restricted to the selected variables. A quick look at the data structure shows that most variables are factors with multiple levels. This section deals with discretizing them, that is, converting them into 0/1 indicators. For that purpose, multi-level factors are first grouped into broader categories, and each resulting level is then encoded as a separate binary column.
## 'data.frame': 147629 obs. of 26 variables:
## $ PRPERTYP: Factor w/ 3 levels "(1) Child household member",..: NA NA NA NA NA 2 2 2 2 2 ...
## $ HRINTSTA: Factor w/ 4 levels "(1) Interview",..: 3 3 3 3 3 1 1 1 1 1 ...
## $ PEEDUCA : Factor w/ 16 levels "(31) Less than 1st grade",..: NA NA NA NA NA 4 6 4 9 10 ...
## $ HEFAMINC: Factor w/ 16 levels "(01) Less than $5,000",..: NA NA NA NA NA 12 12 6 7 13 ...
## $ PRMJOCC1: Factor w/ 11 levels "(01) Management, business, and financial occupations",..: NA NA NA NA NA NA NA NA NA NA ...
## $ HETENURE: Factor w/ 3 levels "(1) Owned or being bought by a HH member",..: NA NA NA NA NA 1 1 1 1 2 ...
## $ GTMETSTA: Factor w/ 3 levels "(1) Metropolitan",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ PRTAGE : num NA NA NA NA NA 73 85 72 70 22 ...
## ..- attr(*, "value.labels")= Named num(0)
## .. ..- attr(*, "names")= chr(0)
## $ PESEX : Factor w/ 2 levels "(1) Male","(2) Female": NA NA NA NA NA 2 1 2 2 2 ...
## $ PTDTRACE: Factor w/ 26 levels "(01) White Only",..: NA NA NA NA NA 1 1 1 1 1 ...
## $ PEC1Q3A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q4A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q7A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q10A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q1A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q15B: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q2A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q8A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q9A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q11A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q12A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q14A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q17A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q13A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q5A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## $ PEC1Q6A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
## - attr(*, "variable.labels")= Named chr [1:639] "ICPSR Case Identification Number" "Household Identifier" "Month of Interview" "Year of Interview" ...
## ..- attr(*, "names")= chr [1:639] "CASEID" "HRHHID" "HRMONTH" "HRYEAR4" ...
## - attr(*, "codepage")= int 28591
Since the numeric code of each category level is enclosed in parentheses at the beginning of each value, these codes are now extracted with regular expressions. The factor variables are recoded according to the SPPA codebook.
sppa_converted <- sppa1 %>%
# text conversion and filtering for complete surveys of adults
mutate(across(everything(), as.character)) %>%
filter(
str_detect(PRPERTYP, "\\(2\\)"), # person type 2 - adult
str_detect(HRINTSTA, "\\(1\\)"), # survey type 1 - interview
as.numeric(PRTAGE) >= 18) %>%
mutate(
# socio-economic variables
# education
Education = case_when(
str_detect(PEEDUCA, "\\(3[1-8]\\)") ~ "No_Diploma", # < HS
str_detect(PEEDUCA, "\\(39\\)") ~ "HighSchool", # HS
str_detect(PEEDUCA, "\\(4[0-2]\\)") ~ "SomeCollege", # college, no degree
str_detect(PEEDUCA, "\\(43\\)") ~ "Bachelor", # bachelor
str_detect(PEEDUCA, "\\(44\\)") ~ "Master", # master
str_detect(PEEDUCA, "\\(4[5-6]\\)") ~ "PhD_Prof", # phd/prof
TRUE ~ NA_character_),
# family income
Income = case_when(
str_detect(HEFAMINC, "\\(0[1-7]\\)") ~ "Income_Low", # < $25k
str_detect(HEFAMINC, "\\(0[8-9]\\)|\\(1[0-1]\\)") ~ "Income_LowerMid", # $25k-$50k
str_detect(HEFAMINC, "\\(1[2-3]\\)") ~ "Income_Middle", # $50k-$75k
str_detect(HEFAMINC, "\\(1[4-5]\\)") ~ "Income_UpperMid", # $75k-$150k
str_detect(HEFAMINC, "\\(16\\)") ~ "Income_High", # > $150k
TRUE ~ NA_character_),
# occupation
Job = case_when(
str_detect(PRMJOCC1, "\\(01\\)") ~ "Management",
str_detect(PRMJOCC1, "\\(02\\)") ~ "Professional",
str_detect(PRMJOCC1, "\\(03\\)|\\(05\\)") ~ "Service_and_Administartion",
str_detect(PRMJOCC1, "\\(04\\)") ~ "Sales",
str_detect(PRMJOCC1, "\\(0[6-9]\\)|\\(1[0-1]\\)") ~ "Manual",
is.na(PRMJOCC1) ~ "Unemployed",
TRUE ~ "Job_Other"),
# gender
Sex = if_else(str_detect(PESEX, "\\(2\\)"), "Female", "Male"),
# race
Race = case_when(
str_detect(PTDTRACE, "\\(01\\)") ~ "White",
str_detect(PTDTRACE, "\\(02\\)") ~ "Black",
str_detect(PTDTRACE, "\\(04\\)") ~ "Asian",
str_detect(PTDTRACE, "\\(03\\)") ~ "NativeAm",
TRUE ~ "Mixed"),
# housing and location
Housing = if_else(str_detect(HETENURE, "\\(1\\)"), "House_Owner", "House_Renter"),
Location = if_else(str_detect(GTMETSTA, "\\(1\\)"), "Metropolitan", "Rural"),
# splitting numeric age into groups
Age_Group = cut(as.numeric(PRTAGE),
breaks = c(17, 29, 49, 64, 100),
labels = c("18-29", "30-49", "50-64", "65+")),
# cultural variables
Opera = str_detect(PEC1Q4A, "\\(1\\)"),
Classical_Music = str_detect(PEC1Q3A, "\\(1\\)"),
Ballet = str_detect(PEC1Q7A, "\\(1\\)"),
Art_Museum = str_detect(PEC1Q10A, "\\(1\\)"),
Jazz = str_detect(PEC1Q1A, "\\(1\\)"),
Musical = str_detect(PEC1Q5A, "\\(1\\)"),
Theater = str_detect(PEC1Q6A, "\\(1\\)"),
Sightseeing = str_detect(PEC1Q13A, "\\(1\\)"),
Books = str_detect(PEC1Q14A, "\\(1\\)"),
Poetry = str_detect(PEC1Q15B, "\\(1\\)"),
Latin_Music = str_detect(PEC1Q2A, "\\(1\\)"),
Live_Dance = str_detect(PEC1Q8A, "\\(1\\)"),
Crafts_Fair = str_detect(PEC1Q11A, "\\(1\\)"),
Outdoor_Festival= str_detect(PEC1Q12A, "\\(1\\)"),
Audiobook = str_detect(PEC1Q17A, "\\(1\\)"),
) %>%
# selection and cleaning
dplyr::select(Education, Income, Job, Race, Sex, Housing, Location, Age_Group,
Opera, Classical_Music, Ballet, Art_Museum,Jazz, Musical, Theater, Sightseeing,
Books, Poetry, Latin_Music, Live_Dance, Crafts_Fair, Outdoor_Festival, Audiobook)
glimpse(sppa_converted)
## Rows: 97,201
## Columns: 23
## $ Education <chr> "No_Diploma", "No_Diploma", "No_Diploma", "HighSchool…
## $ Income <chr> "Income_Middle", "Income_Middle", "Income_Low", "Inco…
## $ Job <chr> "Unemployed", "Unemployed", "Unemployed", "Unemployed…
## $ Race <chr> "White", "White", "White", "White", "White", "White",…
## $ Sex <chr> "Female", "Male", "Female", "Female", "Female", "Male…
## $ Housing <chr> "House_Owner", "House_Owner", "House_Owner", "House_O…
## $ Location <chr> "Metropolitan", "Metropolitan", "Metropolitan", "Metr…
## $ Age_Group <fct> 65+, 65+, 65+, 65+, 18-29, 18-29, 30-49, 50-64, 30-49…
## $ Opera <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Classical_Music <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Ballet <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Art_Museum <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Jazz <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Musical <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Theater <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Sightseeing <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Books <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Poetry <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Latin_Music <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Live_Dance <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Crafts_Fair <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Outdoor_Festival <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Audiobook <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
A quick glimpse into the converted dataset reveals a myriad of missing values. The share of missing values per variable is analyzed next to decide whether variables or observations should be eliminated.
# missing values analysis
missings <- sppa_converted %>%
summarize(across(everything(), ~ sum(is.na(.)))) %>%
pivot_longer(cols = everything(),
names_to = "variable",
values_to = "missings") %>%
mutate(missings_share = missings/nrow(sppa_converted)) %>%
arrange(desc(missings_share))
print(missings)
## # A tibble: 23 × 3
## variable missings missings_share
## <chr> <int> <dbl>
## 1 Audiobook 88609 0.912
## 2 Poetry 88602 0.912
## 3 Books 88567 0.911
## 4 Sightseeing 88521 0.911
## 5 Outdoor_Festival 88503 0.911
## 6 Crafts_Fair 88497 0.910
## 7 Art_Museum 88490 0.910
## 8 Live_Dance 88465 0.910
## 9 Ballet 88454 0.910
## 10 Theater 88454 0.910
## # ℹ 13 more rows
None of the CPS variables (demographic and socio-economic data) contain missing values, whereas every variable that captures the respondent’s participation in the arts is missing for over 90% of observations. While this might seem alarming, the design of the SPPA explains it. As described previously, the SPPA supplement to the CPS consists of two core modules and five additional modules designed to capture other types of arts participation. Each respondent is randomly assigned to one of the two core modules and, subsequently, to two of the five additional modules, hence the missing values. To alleviate this issue, I focus on the sample of respondents with no missing values in module A, from which most of the variables are derived. The only variable from module B is Poetry; the number of observations it shares with the module A variables is displayed in the Venn diagram below.
venn_diag <- list(
Core_A = which(!is.na(sppa1$PEC1Q4A)),
Poetry_B = which(!is.na(sppa1$PEC1Q15B)))
ggVennDiagram(venn_diag, set_color = "mediumpurple4") +
scale_fill_gradient(low = "lavender", high = "mediumpurple3")
The Venn diagram illustrates that retaining the only variable from module B costs only a minimal number of observations. The remaining variables, whose coverage also varies, all belong to module A, so observations with missing values are eliminated row-wise, as there is no other methodological basis for eliminating variables.
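The row-wise elimination described above can be sketched on a toy data frame. The values and the object names `toy` and `toy_complete` are hypothetical, and base R's `complete.cases()` stands in for whatever step the original analysis actually used.

```r
# Toy stand-in for the converted data (hypothetical values):
# NA marks a question the respondent was never asked
toy <- data.frame(
  Sex    = c("Female", "Male", "Female", "Male"),
  Opera  = c(TRUE, NA, FALSE, FALSE),  # module A item
  Poetry = c(FALSE, NA, NA, TRUE)      # module B item
)

# Keep only respondents who answered every retained question
toy_complete <- toy[complete.cases(toy), ]

nrow(toy_complete)  # 2 rows remain (respondents 1 and 4)
```

Dropping rows rather than columns keeps every cultural variable available for rule mining, at the cost of a smaller sample.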
Subsequently, to prepare the data for association rule mining, each level of every factor variable is turned into its own column and assigned a binary value: 1 if the observation has the characteristic described by that column, 0 otherwise. In other words, this section proceeds with the discretization of the variables. For this purpose, the arules package is used: its as() coercion automatically transforms a data frame into a transactional data matrix.
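A minimal sketch of this coercion, assuming a small hypothetical data frame `toy` rather than the actual survey data. Character columns are converted to factors first, since arules builds one item per factor level and treats a logical column as a single item that is present when the value is TRUE.

```r
library(arules)

# Toy stand-in for the cleaned survey data (hypothetical values)
toy <- data.frame(
  Sex     = c("Female", "Male", "Female"),
  Housing = c("House_Owner", "House_Renter", "House_Owner"),
  Books   = c(TRUE, FALSE, TRUE),
  Opera   = c(FALSE, FALSE, TRUE)
)

# Convert character columns to factors; logical columns stay as they are
toy[] <- lapply(toy, function(x) if (is.character(x)) factor(x) else x)

# Coerce the data frame into a transactions object
toy_t <- as(toy, "transactions")

itemLabels(toy_t)  # items such as "Sex=Female" and "Books"
size(toy_t)        # number of items in each transaction
```

Because FALSE answers produce no item, a respondent who took part in no activity is represented only by the socio-demographic items.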
Prior to the application of association rule mining algorithms, it is essential to understand and further inspect the nature and structure of the transactional data and to ensure its proper transformation and discretization. This section serves that purpose.
The final dataset consists of 8,498 observations, each corresponding to a distinct adult respondent of the CPS with the SPPA supplement, conducted by the United States Census Bureau in collaboration with the National Endowment for the Arts in July 2017, the last available SPPA wave unaffected by the COVID-19 pandemic. The filtered and transformed dataset provides information on the socio-economic and demographic background of the respondents, contained in 8 variables subdivided into factor levels, as well as information on the respondents’ participation in public art and other forms of leisure, encapsulated in 15 binary variables, one per form of participation. The discretization of the dataset resulted in 47 columns capturing the socio-economic and cultural capital of the respondents.
Applying the summary function to the discretized dataset yields further information about the data structure. The density of the data matrix is 0.224, which, compared to standard applications of association rule mining such as basket analysis, indicates a relatively dense dataset. High density suggests that association rule learning algorithms may produce a high number of rules. However, since the most frequent items are dominated by demographic variables, a large number of trivial rules is to be expected. The distribution of transaction lengths shows that the largest share of observations has a length of 8 items. Since every respondent carries exactly one item for each of the 8 socio-demographic variables, this means that 2,422 people did not take part in any of the selected cultural activities. While this finding needs further inspection, it does not render the analysis redundant, because the lack of participation is in itself an interesting phenomenon, potentially dictated by the socio-economic status of the individual. The most frequent items are Race=White (7,043 occurrences), Location=Metropolitan, Housing=House_Owner and Books, which, combined, may suggest that a large share of the individuals in the dataset belong to the wealthier classes.
## labels variables levels
## 1 Education=Bachelor Education Bachelor
## 2 Education=HighSchool Education HighSchool
## 3 Education=Master Education Master
## 4 Education=No_Diploma Education No_Diploma
## 5 Education=PhD_Prof Education PhD_Prof
## 6 Education=SomeCollege Education SomeCollege
## 7 Income=Income_High Income Income_High
## 8 Income=Income_Low Income Income_Low
## 9 Income=Income_LowerMid Income Income_LowerMid
## 10 Income=Income_Middle Income Income_Middle
## 11 Income=Income_UpperMid Income Income_UpperMid
## 12 Job=Management Job Management
## 13 Job=Manual Job Manual
## 14 Job=Professional Job Professional
## 15 Job=Sales Job Sales
## 16 Job=Service_and_Administartion Job Service_and_Administartion
## 17 Job=Unemployed Job Unemployed
## 18 Race=Asian Race Asian
## 19 Race=Black Race Black
## 20 Race=Mixed Race Mixed
## 21 Race=NativeAm Race NativeAm
## 22 Race=White Race White
## 23 Sex=Female Sex Female
## 24 Sex=Male Sex Male
## 25 Housing=House_Owner Housing House_Owner
## 26 Housing=House_Renter Housing House_Renter
## 27 Location=Metropolitan Location Metropolitan
## 28 Location=Rural Location Rural
## 29 Age_Group=18-29 Age_Group 18-29
## 30 Age_Group=30-49 Age_Group 30-49
## 31 Age_Group=50-64 Age_Group 50-64
## 32 Age_Group=65+ Age_Group 65+
## 33 Opera Opera TRUE
## 34 Classical_Music Classical_Music TRUE
## 35 Ballet Ballet TRUE
## 36 Art_Museum Art_Museum TRUE
## 37 Jazz Jazz TRUE
## 38 Musical Musical TRUE
## 39 Theater Theater TRUE
## 40 Sightseeing Sightseeing TRUE
## 41 Books Books TRUE
## 42 Poetry Poetry TRUE
## 43 Latin_Music Latin_Music TRUE
## 44 Live_Dance Live_Dance TRUE
## 45 Crafts_Fair Crafts_Fair TRUE
## 46 Outdoor_Festival Outdoor_Festival TRUE
## 47 Audiobook Audiobook TRUE
## transactions as itemMatrix in sparse format with
## 8498 rows (elements/itemsets/transactions) and
## 47 columns (items) and a density of 0.2242655
##
## most frequent items:
## Race=White Location=Metropolitan Housing=House_Owner
## 7043 6666 5691
## Books Sex=Female (Other)
## 4762 4626 60785
##
## element (itemset/transaction) length distribution:
## sizes
## 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## 2422 1712 1059 837 639 520 403 348 202 141 98 58 39 15 3 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 8.00 10.00 10.54 12.00 23.00
##
## includes extended item information - examples:
## labels variables levels
## 1 Education=Bachelor Education Bachelor
## 2 Education=HighSchool Education HighSchool
## 3 Education=Master Education Master
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 2
## 3 3
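As a sanity check on the summary output, the reported density and mean transaction length can be recomputed from the printed length distribution alone. The sketch below uses only base R; the variable names are illustrative and the counts are copied from the output above.

```r
# Transaction lengths and their counts, copied from the summary output
sizes  <- 8:23
counts <- c(2422, 1712, 1059, 837, 639, 520, 403, 348,
            202, 141, 98, 58, 39, 15, 3, 2)

n_transactions <- sum(counts)          # number of respondents
n_items_total  <- sum(sizes * counts)  # number of filled cells in the matrix
n_columns      <- 47                   # number of distinct items

density     <- n_items_total / (n_transactions * n_columns)
mean_length <- n_items_total / n_transactions
share_none  <- counts[1] / n_transactions  # respondents with no cultural item

round(density, 4)      # 0.2243
round(mean_length, 2)  # 10.54
round(share_none, 3)   # 0.285
```

The density is simply the mean transaction length divided by the number of columns, so about 10.5 of the 47 possible items are present for an average respondent.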
The plot below illustrates the distribution of cultural activity among respondents. The largest single group (2,422 respondents, roughly 28.5%) reported none of the selected activities, and the number of respondents declines steadily as the number of distinct activities rises.
activity_counts <- size(sppa_t) - 8
df_activity <- data.frame(Number_of_Activities = activity_counts)
ggplot(df_activity, aes(x = Number_of_Activities)) +
geom_bar(fill = "mediumpurple", color = "mediumpurple4") +
scale_x_continuous(breaks = 0:15) +
labs(title = "Distribution of the Number of Cultural Activities Participated In",
x = "Number of Activities",
y = "Number of Respondents") +
theme_minimal()
Applying the inspect() function to the first ten observations confirms that the data was correctly transformed into a transactional format. Each transaction is a basket of one respondent’s characteristics. The first occurrence of a cultural activity appears in the third observation: a high-school-educated, unemployed white woman aged 65 or over, living in a home she owns in a metropolitan area, who reads books.
## items transactionID
## [1] {Education=No_Diploma,
## Income=Income_UpperMid,
## Job=Service_and_Administartion,
## Race=White,
## Sex=Female,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=30-49} 1
## [2] {Education=HighSchool,
## Income=Income_Low,
## Job=Unemployed,
## Race=White,
## Sex=Female,
## Housing=House_Owner,
## Location=Rural,
## Age_Group=65+} 2
## [3] {Education=HighSchool,
## Income=Income_Low,
## Job=Unemployed,
## Race=White,
## Sex=Female,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=65+,
## Books} 3
## [4] {Education=PhD_Prof,
## Income=Income_Middle,
## Job=Professional,
## Race=White,
## Sex=Male,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=50-64} 4
## [5] {Education=HighSchool,
## Income=Income_LowerMid,
## Job=Unemployed,
## Race=White,
## Sex=Female,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=50-64,
## Outdoor_Festival} 5
## [6] {Education=HighSchool,
## Income=Income_LowerMid,
## Job=Sales,
## Race=White,
## Sex=Female,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=65+,
## Books} 6
## [7] {Education=HighSchool,
## Income=Income_Low,
## Job=Unemployed,
## Race=White,
## Sex=Male,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=65+} 7
## [8] {Education=HighSchool,
## Income=Income_LowerMid,
## Job=Manual,
## Race=White,
## Sex=Male,
## Housing=House_Renter,
## Location=Metropolitan,
## Age_Group=18-29,
## Outdoor_Festival} 8
## [9] {Education=No_Diploma,
## Income=Income_UpperMid,
## Job=Unemployed,
## Race=White,
## Sex=Female,
## Housing=House_Owner,
## Location=Metropolitan,
## Age_Group=65+,
## Crafts_Fair} 9
## [10] {Education=HighSchool,
## Income=Income_LowerMid,
## Job=Unemployed,
## Race=White,
## Sex=Male,
## Housing=House_Owner,
## Location=Rural,
## Age_Group=50-64,
## Books} 10
The output below lists the relative frequency, or support, of every item in the dataset in decreasing order. The most frequent items are dominated by demographic variables such as race, location and gender. The most frequent cultural activities are reading books (~56%), sightseeing (~30%), attending a crafts fair (~25.7%), visiting an art museum (~25%) and attending an outdoor festival (~24%). The least frequent items are the Native American and mixed race categories (~1% and ~2%), PhD or professional degrees (~3%) and exclusive cultural activities such as opera (2.4%) and ballet (3.5%). Identifying the significant items and their support is crucial for setting the minimum support parameter of the Apriori algorithm. A global support threshold that is too high would eliminate precisely the activities that serve as class signifiers; to retain all items assumed to be significant, the threshold would have to be lowered to 0.03, or even 0.024. Running Apriori with such a low support threshold can be computationally expensive, but given the relatively small size of the dataset (roughly 8,500 rows) this should not pose an issue.
## Race=White Location=Metropolitan
## 0.82878324 0.78441986
## Housing=House_Owner Books
## 0.66968699 0.56036715
## Sex=Female Sex=Male
## 0.54436338 0.45563662
## Job=Unemployed Housing=House_Renter
## 0.37526477 0.33031301
## Age_Group=30-49 Sightseeing
## 0.31842787 0.29912921
## Education=HighSchool Education=SomeCollege
## 0.28747941 0.27841845
## Age_Group=65+ Age_Group=50-64
## 0.27065192 0.26700400
## Crafts_Fair Income=Income_LowerMid
## 0.25700165 0.25347141
## Art_Museum Outdoor_Festival
## 0.24923511 0.24205695
## Income=Income_UpperMid Income=Income_Low
## 0.23487880 0.22923041
## Location=Rural Education=Bachelor
## 0.21558014 0.21299129
## Income=Income_Middle Job=Service_and_Administartion
## 0.18369028 0.17815957
## Musical Audiobook
## 0.17568840 0.17133443
## Job=Professional Age_Group=18-29
## 0.15380089 0.14391622
## Job=Manual Poetry
## 0.12920687 0.12238174
## Job=Management Theater
## 0.10861379 0.10343610
## Race=Black Income=Income_High
## 0.10320075 0.09872911
## Education=No_Diploma Classical_Music
## 0.09578724 0.09413980
## Education=Master Jazz
## 0.09355142 0.09096258
## Live_Dance Job=Sales
## 0.06660391 0.05495411
## Latin_Music Race=Asian
## 0.04871735 0.03730289
## Ballet Education=PhD_Prof
## 0.03506707 0.03177218
## Opera Race=Mixed
## 0.02435867 0.01941633
## Race=NativeAm
## 0.01129678
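To make the choice of a support threshold more tangible, relative supports can be translated into respondent counts. The sketch below uses base R with values copied from the output above; the variable names are illustrative.

```r
n_transactions <- 8498

# Relative supports copied from the item frequency output
support <- c(Opera = 0.02435867, Ballet = 0.03506707, Books = 0.56036715)

# Absolute number of respondents behind each item
item_counts <- round(support * n_transactions)
item_counts  # Opera 207, Ballet 298, Books 4762

# Smallest number of transactions that satisfies each candidate threshold
min_counts <- ceiling(c(0.03, 0.024) * n_transactions)
min_counts   # 255 and 204
```

With a threshold of 0.03 an itemset needs at least 255 supporting respondents, so Opera (207) would be pruned while Ballet (298) survives; at 0.024 the minimum falls to 204 and Opera is retained.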
The plot below provides a graphic illustration of the output above, limited to the top 25 items in the dataset by relative frequency.
itemFrequencyPlot(sppa_t, topN = 25, type = "relative", cex.names = 0.8, main = "Top 25 Items - Frequency Plot", col = "mediumpurple")
To gain a deeper understanding of the data structure, the function image() is used, which draws the binary matrix of the transactional data: each row holds the answers of an individual respondent (a transaction) and each column represents one of the 47 items used in the analysis, describing either the socio-economic background of the respondent or their participation in public arts. The plots below illustrate the structure of the first 25 rows of the data and of a randomly selected subsample of 100 observations. What might catch the reader’s eye is the area of relatively dense vertical strips, which represent the most frequent items (Race=White, Location=Metropolitan, Sex, etc.). This observation has a crucial implication for the application of the Apriori algorithm: it will tend to generate a large number of trivial rules solely because some demographic items appear so often. Outside of these dense strips, however, the matrix is relatively sparse, as shown by the empty white space, which warrants a lower support threshold and implies the need to look for strong connections (high lift and confidence) among rare occurrences (low support).
The distinct rectangular patches of color in the correlation matrix plot below indicate specific relationships within the data, ranging from strongly positive to strongly negative correlations. These relationships motivate the application of association rule mining algorithms.
binary_matrix <- as(sppa_t, "matrix") * 1
M <- cor(binary_matrix)
# correlation plot
corrplot(M,
method = "color",
type = "lower",
order = "hclust",
tl.col = "black",
tl.cex = 0.4,
diag = FALSE,
col = colorRampPalette(c("red", "white", "blue"))(200),
title = "Correlation Matrix",
mar = c(0,0,2,0))
The exploratory data analysis reveals crucial patterns and characteristics of the dataset that inform the selection of the most appropriate association rule mining strategies. After cleaning and transformation, the SPPA 2017 data form a relatively small sample of fewer than 8,500 observations. The structure of the data, characterized both by the sparsity of the crucial cultural variables and by the relative density of the demographic variables, warrants the consideration of multiple association rule learning algorithms and their subsequent comparison.
Association rule mining is an unsupervised learning technique for discovering hidden relationships, patterns and associations within large transactional datasets. While it was devised and is used mainly for market basket analysis, recommendation systems and customer behavior analysis, it has lately been gaining traction in sociological studies of taste (Pan et al. 2019, Gondal 2025). Applying unsupervised learning algorithms to sociological databases extends the possibilities of other forms of analysis, because it builds on the natural structure of the data rather than on the researcher’s intuition and literature review. The key contribution of this technique is that it surfaces relationships that might otherwise have remained unnoticed because they are not intuitive or obvious. Bourdieu’s analysis of habitus relies heavily on the relationship between different forms of capital; however, the relationships among the different manifestations of his three forms of capital are yet to be revealed. This study endeavors to find hidden relationships between social, economic and cultural capital, and hence cultural preferences and tastes, through the application of an association rule mining algorithm.
With the technicalities of the data outlined above, distinct association rule mining algorithms were evaluated to determine the most robust approach for a dataset characterized by a structural duality: sparse cultural items alongside dense demographic items.
Although algorithms for frequent itemset mining are plentiful, the field is dominated by three primary strategies: Apriori (breadth-first search), ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal, a depth-first search) and FP-Growth (Frequent Pattern Growth, a tree-based compression). Given the modest size of the dataset (N≈8,500) but its complex internal structure, a comparative assessment was conducted to balance computational efficiency with interpretability. The table below outlines the methodological trade-offs between the algorithms.
| Comparison Criteria | Apriori | ECLAT | FP-Growth |
|---|---|---|---|
| Search Strategy | Horizontal (Breadth-First). Iterative candidate generation (level-wise). | Vertical (Depth-First). Intersection of Transaction ID (TID) lists. | Tree-based (FP-Tree). Compressed representation. |
| Handling of Dense Data | High computational cost due to ‘candidate explosion’ when connecting frequent items. | Memory intensive for long ID lists (e.g., common demographic traits). | Excellent compression of repeated patterns (e.g., identical demographic profiles). |
| Handling of Sparse Data | Must scan empty space to find rare items (e.g., Opera). Requires low support threshold. | Very fast for rare items (skips empty space via vertical layout). | No candidate generation. Very fast extraction of rare items. |
| Output Type | Association Rules (Directly generates antecedents -> consequents). | Frequent Itemsets (Requires a secondary step to induce rules). | Frequent Itemsets. |
| Computational Cost (N=8,500) | Low / Negligible (< 1 second). | Low / Negligible (< 1 second). | Lowest (Most Efficient). |
The comparative analysis of the three dominant algorithms indicates that, while FP-Growth is theoretically the most efficient thanks to its compressed tree structure, its efficiency gain matters little for a sample this small. Similarly, although ECLAT excels at identifying rare events through vertical scanning, its primary output is frequent itemsets rather than directional rules. Given that the primary objective is to reveal implications of taste, the Apriori algorithm seems to be the optimal choice.
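The difference in output type can be illustrated with the arules package, under the assumption that it is installed. The toy transactions below are hypothetical and unrelated to the SPPA data. eclat() returns frequent itemsets, so rules have to be induced in a second step with ruleInduction(), whereas apriori() returns rules directly.

```r
library(arules)

# Toy transactions; item names are illustrative only.
toy <- as(list(
  c("Opera", "Ballet", "Books"),
  c("Opera", "Ballet"),
  c("Books"),
  c("Books", "Art_Museum"),
  c("Opera", "Ballet", "Books", "Art_Museum")
), "transactions")

# ECLAT: step 1 yields frequent itemsets only.
itemsets <- eclat(toy,
                  parameter = list(support = 0.2, minlen = 2),
                  control = list(verbose = FALSE))

# ECLAT: step 2 induces directional rules from those itemsets.
rules_from_itemsets <- ruleInduction(itemsets, toy, confidence = 0.5)

# Apriori: rules are produced in a single call.
rules_direct <- apriori(toy,
                        parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                        control = list(verbose = FALSE))

inspect(head(sort(rules_direct, by = "lift"), 3))
```

Both routes lead to rules with antecedents and consequents. The Apriori route simply avoids the extra induction step, which is the practical reason given above for preferring it here.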
The analysis of association rules begins with the Apriori algorithm. It operates on a horizontal data layout and follows a breadth-first search strategy: it first finds all frequent itemsets of the minimal size and then moves level by level through the lattice of combinations up to the maximal size, if one is set. The algorithm relies on one specific mathematical property, anti-monotonicity (the Apriori property), which states that all nonempty subsets of a frequent itemset must also be frequent. Equivalently, no superset of an infrequent itemset can be frequent, which lets the algorithm cut off dead-end branches of the search. The algorithm proceeds in the following steps:
Candidate Generation – the algorithm takes the frequent itemsets found at the previous level and joins them with one another to create candidates one item larger (e.g. the frequent itemsets {A,B} and {A,C} are joined to create the candidate {A,B,C}),
Prune the Search Space – the algorithm eliminates bad candidates to avoid wasting processing power: candidates with an infrequent subset are dropped, the database is scanned to count how often the remaining candidates appear, and those that do not reach the minimum support threshold are discarded.
These two steps are then iterated until no more frequent itemsets can be found.
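The steps above can be sketched in base R on a handful of toy transactions. This is a minimal illustration of the level-wise idea under simplifying assumptions (naive joins, no hash tree), not the optimized implementation used by the arules package. The item names and the threshold are hypothetical.

```r
# Toy transactions; item names are illustrative, not taken from the SPPA data.
transactions <- list(
  c("Books", "Opera", "Ballet"),
  c("Books", "Opera"),
  c("Books", "Ballet"),
  c("Books", "Opera", "Ballet"),
  c("Books")
)
min_support <- 0.3  # hypothetical threshold

# Relative support: share of transactions containing every item of the itemset.
support <- function(itemset) {
  mean(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Level 1: frequent single items.
items <- sort(unique(unlist(transactions)))
level <- Filter(function(s) support(s) >= min_support, as.list(items))
frequent <- level

while (length(level) > 1) {
  k <- length(level[[1]]) + 1
  # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
  candidates <- list()
  for (i in seq_len(length(level) - 1)) {
    for (j in (i + 1):length(level)) {
      cand <- sort(union(level[[i]], level[[j]]))
      if (length(cand) == k) candidates <- c(candidates, list(cand))
    }
  }
  candidates <- unique(candidates)
  # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
  keys <- vapply(level, paste, character(1), collapse = ",")
  candidates <- Filter(function(cand) {
    subsets <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subsets, paste, character(1), collapse = ",") %in% keys)
  }, candidates)
  # Support counting: keep the candidates that reach the minimum support.
  level <- Filter(function(s) support(s) >= min_support, candidates)
  frequent <- c(frequent, level)
}

for (s in frequent) cat("{", paste(s, collapse = ", "), "} support =", support(s), "\n")
```

The loop stops as soon as a level yields at most one frequent itemset, because no larger candidate can then be formed by joining.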
The code below runs the Apriori algorithm on the dataset employed in this project. The minimum support threshold is set to 0.005 in order to capture rarely occurring attendance at high-brow cultural events. The minimum confidence (the ratio of the support of the whole itemset to the support of the antecedent) is set to 0.15. The minimal length of a rule is set to 3 and the maximal length to 23, in order to capture the largest possible number of rules containing variables that represent cultural participation. With these parameters the algorithm generated 1,067,650 rules, which is in line with the density of the transaction matrix and the permissive thresholds.
rules <- apriori(sppa_t, parameter = list(support = 0.005,
confidence = 0.15,
minlen = 3,
maxlen = 23))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.15 0.1 1 none FALSE TRUE 5 0.005 3
## maxlen target ext
## 23 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 42
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[47 item(s), 8498 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 11 done [0.14s].
## writing ... [1067650 rule(s)] done [0.05s].
## creating S4 object ... done [0.12s].
The output of the summary function reveals the distribution of rule lengths: the number of rules increases with the number of items, peaks at length 6 and then decreases until the maximal observed rule length (11) is reached. Below the rule length distribution are the summaries of the quality measures; the minimum values of support and confidence comply with the parameters set when running the Apriori algorithm. The mean support does not exceed 1%, which, while alarming in itself, is a common predicament of sociological data, especially data pertaining to culture. The statistics for confidence and lift offer a more optimistic outlook: the mean and median confidence are relatively close to each other (0.63 and 0.67), and the maximum confidence reaches 1. Lift, measured as \(lift(X \rightarrow Y) = confidence(X \rightarrow Y)/support(Y)\), ranges from 0.228 to 12.99, with a median of 1.75 and a mean of 2.17. Such values indicate that the antecedent makes the right-hand side (RHS) of a rule, on average, about twice as likely, and at most about 13 times as likely, as it would be by chance.
## set of 1067650 rules
##
## rule length distribution (lhs + rhs):sizes
## 3 4 5 6 7 8 9 10 11
## 17099 90942 224505 307780 254115 128737 38303 5850 319
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 6.182 7.000 11.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.005060 Min. :0.1500 Min. :0.005060 Min. : 0.2283
## 1st Qu.:0.005766 1st Qu.:0.4088 1st Qu.:0.008473 1st Qu.: 1.1406
## Median :0.007178 Median :0.6667 Median :0.012827 Median : 1.7502
## Mean :0.009526 Mean :0.6265 Mean :0.018796 Mean : 2.1716
## 3rd Qu.:0.010002 3rd Qu.:0.8421 3rd Qu.:0.021417 3rd Qu.: 2.9128
## Max. :0.443516 Max. :1.0000 Max. :0.635914 Max. :12.9939
## count
## Min. : 43.00
## 1st Qu.: 49.00
## Median : 61.00
## Mean : 80.95
## 3rd Qu.: 85.00
## Max. :3769.00
##
## mining info:
## data ntransactions support confidence
## sppa_t 8498 0.005 0.15
## call
## apriori(data = sppa_t, parameter = list(support = 0.005, confidence = 0.15, minlen = 3, maxlen = 23))
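To make the quality measures in the summary concrete, the sketch below computes support, coverage, confidence and lift for a single rule from a small binary matrix in base R. The matrix and the helper rule_quality() are hypothetical and only illustrate the definitions; they are not part of the original analysis.

```r
# Toy binary incidence matrix (rows = respondents, columns = items); values are illustrative.
m <- matrix(
  c(1, 1, 1,
    1, 1, 0,
    1, 0, 0,
    0, 1, 1,
    1, 1, 1,
    0, 0, 0,
    1, 0, 0,
    1, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Theater", "Books", "Opera"))
) == 1

# Hypothetical helper: quality measures of the rule lhs -> rhs.
rule_quality <- function(m, lhs, rhs) {
  has_lhs <- rowSums(m[, lhs, drop = FALSE]) == length(lhs)
  has_rhs <- rowSums(m[, rhs, drop = FALSE]) == length(rhs)
  support    <- mean(has_lhs & has_rhs)   # share of rows containing lhs and rhs
  coverage   <- mean(has_lhs)             # share of rows containing lhs
  confidence <- support / coverage        # support(X,Y) / support(X)
  lift       <- confidence / mean(has_rhs)  # confidence / support(Y)
  c(support = support, coverage = coverage, confidence = confidence, lift = lift)
}

q <- rule_quality(m, lhs = c("Theater", "Books"), rhs = "Opera")
print(round(q, 4))
```

A lift above 1 means the consequent is more frequent among rows matching the antecedent than in the data as a whole; a lift below 1 means the opposite.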
The analysis of the rules generated by the Apriori algorithm begins with an inspection of the rules ordered by their support values. The table below presents an overview of the rules with the highest support. The majority of the displayed rules seem trivial and redundant in the context of this study: most of the variables capturing cultural participation are infrequent and are, hence, filtered out by the criterion of high support. The first three rules are permutations of the same itemset, with a support of 44% and lift close to 1, which shows that the most common profile in the dataset is that of white homeowners living in a metropolitan area. The fourth and fifth rules, likewise permutations of one itemset, contain a variable capturing whether the person has read a book in the last 12 months; these rules indicate that white people living in a metropolitan area read books. While this might seem trivial on the surface, it might reveal a systemic privilege, as reading a book in full is time-consuming, and metropolitan areas, often characterized by a high concentration of culturally engaged people, might promote engagement with culture in the form of book-reading.
## lhs rhs support confidence coverage lift count
## [1] {Housing=House_Owner,
## Location=Metropolitan} => {Race=White} 0.4435161 0.8688336 0.5104731 1.0483242 3769
## [2] {Race=White,
## Housing=House_Owner} => {Location=Metropolitan} 0.4435161 0.7527462 0.5891975 0.9596215 3769
## [3] {Race=White,
## Location=Metropolitan} => {Housing=House_Owner} 0.4435161 0.6974463 0.6359143 1.0414512 3769
## [4] {Location=Metropolitan,
## Books} => {Race=White} 0.3732643 0.8353964 0.4468110 1.0079793 3172
## [5] {Race=White,
## Books} => {Location=Metropolitan} 0.3732643 0.7820513 0.4772888 0.9969805 3172
Subsequently, the Apriori generated rules are examined according to their confidence values. Confidence is measured as:
\[ confidence(X \rightarrow Y) = support(X,Y)/support(X) \]
The rules revealed through sorting by confidence prove to be more interpretable and insightful in the context of cultural capital. The first 22 rules reach the maximal confidence of 1, meaning that every occurrence of the antecedent (left-hand side) is accompanied by the consequent (right-hand side), and they share a common consequent: Location=Metropolitan. This corresponds to a conditional probability of \(P(Y∣X)=1\), suggesting a deterministic relationship within the analyzed subsample. It is crucial, however, to bear in mind the support of such rules: almost all of them fall below 1%, so each rule rests on a few dozen respondents, which might indicate overfitting to small subgroups (most of the 22 rules relate to the subsample of Asian respondents living in a metropolitan area). Sorting by confidence also reveals a larger number of rules pertaining to participation in cultural events, and the recurrence of Location=Metropolitan as a consequent suggests that living in a metropolitan area promotes the consumption of cultural goods. The 6th rule, for instance, indicates that respondents who attended a ballet performance and a Latin music concert live in a metropolitan area. An interesting shift in the consequent happens at the 23rd position: rules 23-26 indicate a deterministic relationship between high education combined with high-brow cultural participation (going to the theater, visiting an art museum, reading poetry) and reading books. The emergence of these rules with a confidence of 1 is consistent with Bourdieu’s concept of institutionalized cultural capital. It suggests that the possession of high educational credentials (a PhD or Professor’s title) acts as a binding predictor of participation in legitimate culture, and that high-brow cultural consumption serves as a structural imperative of class habitus for this specific social stratum.
## lhs rhs support confidence coverage lift count
## [1] {Income=Income_High,
## Race=Asian} => {Location=Metropolitan} 0.006707461 1 0.006707461 1.274827 57
## [2] {Race=Asian,
## Musical} => {Location=Metropolitan} 0.005060014 1 0.005060014 1.274827 43
## [3] {Education=Bachelor,
## Race=Asian} => {Location=Metropolitan} 0.011532125 1 0.011532125 1.274827 98
## [4] {Race=Asian,
## Outdoor_Festival} => {Location=Metropolitan} 0.005060014 1 0.005060014 1.274827 43
## [5] {Race=Asian,
## Sightseeing} => {Location=Metropolitan} 0.009649329 1 0.009649329 1.274827 82
## [6] {Ballet,
## Latin_Music} => {Location=Metropolitan} 0.005177689 1 0.005177689 1.274827 44
## [7] {Income=Income_High,
## Race=Asian,
## Housing=House_Owner} => {Location=Metropolitan} 0.005177689 1 0.005177689 1.274827 44
## [8] {Job=Professional,
## Race=Asian,
## Age_Group=30-49} => {Location=Metropolitan} 0.005766063 1 0.005766063 1.274827 49
## [9] {Job=Professional,
## Race=Asian,
## Housing=House_Owner} => {Location=Metropolitan} 0.006001412 1 0.006001412 1.274827 51
## [10] {Education=Bachelor,
## Race=Asian,
## Age_Group=30-49} => {Location=Metropolitan} 0.005530713 1 0.005530713 1.274827 47
## [11] {Education=Bachelor,
## Race=Asian,
## Sex=Male} => {Location=Metropolitan} 0.005177689 1 0.005177689 1.274827 44
## [12] {Education=Bachelor,
## Race=Asian,
## Sex=Female} => {Location=Metropolitan} 0.006354436 1 0.006354436 1.274827 54
## [13] {Education=Bachelor,
## Race=Asian,
## Books} => {Location=Metropolitan} 0.006354436 1 0.006354436 1.274827 54
## [14] {Education=Bachelor,
## Race=Asian,
## Housing=House_Owner} => {Location=Metropolitan} 0.006825135 1 0.006825135 1.274827 58
## [15] {Income=Income_UpperMid,
## Race=Asian,
## Sex=Male} => {Location=Metropolitan} 0.005413038 1 0.005413038 1.274827 46
## [16] {Race=Asian,
## Age_Group=30-49,
## Sightseeing} => {Location=Metropolitan} 0.005177689 1 0.005177689 1.274827 44
## [17] {Race=Asian,
## Art_Museum,
## Sightseeing} => {Location=Metropolitan} 0.005883737 1 0.005883737 1.274827 50
## [18] {Race=Asian,
## Sex=Female,
## Sightseeing} => {Location=Metropolitan} 0.006472111 1 0.006472111 1.274827 55
## [19] {Race=Asian,
## Sightseeing,
## Books} => {Location=Metropolitan} 0.007884208 1 0.007884208 1.274827 67
## [20] {Race=Asian,
## Housing=House_Owner,
## Sightseeing} => {Location=Metropolitan} 0.005295364 1 0.005295364 1.274827 45
## [21] {Race=Asian,
## Sex=Male,
## Books} => {Location=Metropolitan} 0.006472111 1 0.006472111 1.274827 55
## [22] {Housing=House_Renter,
## Opera,
## Art_Museum} => {Location=Metropolitan} 0.005766063 1 0.005766063 1.274827 49
## [23] {Education=PhD_Prof,
## Musical,
## Theater} => {Books} 0.006236762 1 0.006236762 1.784544 53
## [24] {Education=PhD_Prof,
## Theater,
## Crafts_Fair} => {Books} 0.005413038 1 0.005413038 1.784544 46
## [25] {Education=PhD_Prof,
## Theater,
## Sightseeing} => {Books} 0.007178160 1 0.007178160 1.784544 61
## [26] {Education=PhD_Prof,
## Art_Museum,
## Poetry} => {Books} 0.005177689 1 0.005177689 1.784544 44
## [27] {Education=Bachelor,
## Age_Group=30-49,
## Ballet} => {Location=Metropolitan} 0.005295364 1 0.005295364 1.274827 45
## [28] {Income=Income_Middle,
## Job=Sales,
## Books} => {Race=White} 0.005060014 1 0.005060014 1.206588 43
## [29] {Education=Bachelor,
## Housing=House_Renter,
## Latin_Music} => {Location=Metropolitan} 0.005060014 1 0.005060014 1.274827 43
## [30] {Education=Master,
## Age_Group=65+,
## Outdoor_Festival} => {Race=White} 0.007178160 1 0.007178160 1.206588 61
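As a rough check on how much these deterministic rules add, the baseline frequency of the consequent can be backed out of the reported lift, since lift is confidence divided by the support of the consequent. Taking rule [1] from the table above (confidence 1, lift 1.274827) and the 8,498 transactions reported in the mining output:
\[ support(\{Location=Metropolitan\}) = \frac{confidence}{lift} = \frac{1}{1.274827} \approx 0.784 \]
That is, roughly 78% of respondents (about 6,666 of 8,498) live in a metropolitan area, so a rule with confidence 1 raises the probability of this consequent from about 0.78 to 1. The figure is derived from the rounded values in the table rather than read from the data, so it should be treated as approximate.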
All of the rules displayed above have a confidence of 1, which prompts further inspection of deterministic rules. However, a full analysis is impractical given the sheer number of such rules: 4,427.
## [1] 4427
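The code that produced the sorted tables and the count above is not shown in this section. One way such steps might be written with the arules package is sketched below on hypothetical toy transactions. The object names toy and toy_rules are assumptions, and the numbers it prints have no relation to the SPPA results.

```r
library(arules)

# Toy transactions; item names are illustrative and not drawn from the SPPA data.
toy <- as(list(
  c("Opera", "Ballet", "Books"),
  c("Opera", "Ballet"),
  c("Books"),
  c("Books", "Art_Museum"),
  c("Opera", "Ballet", "Books", "Art_Museum")
), "transactions")

toy_rules <- apriori(toy,
                     parameter = list(support = 0.2, confidence = 0.5, minlen = 2),
                     control = list(verbose = FALSE))

# Rules ordered by a chosen quality measure, highest first.
top_by_confidence <- head(sort(toy_rules, by = "confidence"), 5)
inspect(top_by_confidence)

# Number of rules whose confidence equals 1 (within a small numerical tolerance).
n_deterministic <- sum(abs(quality(toy_rules)$confidence - 1) < 1e-9)
print(n_deterministic)
```

Replacing by = "confidence" with by = "support" or by = "lift" gives the other two orderings discussed in this section.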
As expected, the most relevant results come from inspecting the Apriori rules sorted by lift. The first thirteen rules predict the level of education of a respondent from a set of demographic characteristics and variables indicating participation in the arts, with Education=PhD_Prof as the rule consequent. Combining income level, job classification and cultural participation predicts the possession of high educational credentials up to 13 times better than random guessing. This suggests that the intellectual elite, as a distinct class fraction, is not defined merely by income or educational level but rather by the accumulation of all three forms of capital: economic, social and cultural. Another interesting pattern of cultural consumption is revealed by rules 14, 17, 18, 21, 22 and 24, which, rather than predicting demographic attributes of a respondent, link together distinct forms of participation in the arts. These rules point to a structural homology of taste: rules that link sets of cultural goods such as {Classical_Music, Art_Museum, Books, Poetry} to other types of participation in the arts, such as {Opera} (rule 24), show that high-culture consumers are statistically drawn towards other forms of high-brow culture. This pattern is consistent with the existence of a coherent aesthetic disposition.
## lhs rhs support confidence coverage lift count
## [1] {Income=Income_High,
## Job=Professional,
## Race=White,
## Sightseeing,
## Books} => {Education=PhD_Prof} 0.005295364 0.4128440 0.01282655 12.99388 45
## [2] {Income=Income_High,
## Job=Professional,
## Race=White,
## Location=Metropolitan,
## Sightseeing} => {Education=PhD_Prof} 0.005060014 0.3944954 0.01282655 12.41638 43
## [3] {Income=Income_High,
## Job=Professional,
## Sightseeing,
## Books} => {Education=PhD_Prof} 0.005648388 0.3934426 0.01435632 12.38324 48
## [4] {Income=Income_High,
## Job=Professional,
## Location=Metropolitan,
## Sightseeing,
## Books} => {Education=PhD_Prof} 0.005177689 0.3928571 0.01317957 12.36481 44
## [5] {Income=Income_High,
## Job=Professional,
## Race=White,
## Sightseeing} => {Education=PhD_Prof} 0.005530713 0.3884298 0.01423864 12.22547 47
## [6] {Income=Income_High,
## Job=Professional,
## Art_Museum,
## Books} => {Education=PhD_Prof} 0.005295364 0.3879310 0.01365027 12.20977 45
## [7] {Income=Income_High,
## Job=Professional,
## Sex=Male} => {Education=PhD_Prof} 0.005413038 0.3865546 0.01400329 12.16645 46
## [8] {Income=Income_High,
## Job=Professional,
## Sex=Male,
## Location=Metropolitan} => {Education=PhD_Prof} 0.005060014 0.3839286 0.01317957 12.08380 43
## [9] {Income=Income_High,
## Job=Professional,
## Location=Metropolitan,
## Art_Museum} => {Education=PhD_Prof} 0.005413038 0.3833333 0.01412097 12.06506 46
## [10] {Income=Income_High,
## Job=Professional,
## Art_Museum} => {Education=PhD_Prof} 0.005766063 0.3798450 0.01518004 11.95527 49
## [11] {Income=Income_High,
## Job=Professional,
## Location=Metropolitan,
## Sightseeing} => {Education=PhD_Prof} 0.005530713 0.3790323 0.01459167 11.92969 47
## [12] {Income=Income_High,
## Job=Professional,
## Sightseeing} => {Education=PhD_Prof} 0.006001412 0.3722628 0.01612144 11.71663 51
## [13] {Income=Income_High,
## Job=Professional,
## Race=White,
## Books} => {Education=PhD_Prof} 0.007178160 0.3696970 0.01941633 11.63587 61
## [14] {Location=Metropolitan,
## Ballet,
## Crafts_Fair} => {Opera} 0.005060014 0.2828947 0.01788656 11.61372 43
## [15] {Income=Income_High,
## Job=Professional,
## Race=White,
## Location=Metropolitan,
## Books} => {Education=PhD_Prof} 0.006589786 0.3684211 0.01788656 11.59571 56
## [16] {Income=Income_High,
## Job=Professional,
## Books} => {Education=PhD_Prof} 0.008119558 0.3556701 0.02282890 11.19439 69
## [17] {Location=Metropolitan,
## Opera,
## Crafts_Fair} => {Ballet} 0.005060014 0.3909091 0.01294422 11.14747 43
## [18] {Race=White,
## Location=Metropolitan,
## Classical_Music,
## Art_Museum,
## Books,
## Live_Dance} => {Ballet} 0.005060014 0.3909091 0.01294422 11.14747 43
## [19] {Income=Income_High,
## Job=Professional,
## Race=White,
## Housing=House_Owner,
## Books} => {Education=PhD_Prof} 0.005883737 0.3521127 0.01670981 11.08242 50
## [20] {Income=Income_High,
## Job=Professional,
## Location=Metropolitan,
## Books} => {Education=PhD_Prof} 0.007413509 0.3519553 0.02106378 11.07747 63
## [21] {Location=Metropolitan,
## Opera,
## Musical} => {Ballet} 0.005295364 0.3879310 0.01365027 11.06254 45
## [22] {Location=Metropolitan,
## Ballet,
## Musical} => {Opera} 0.005295364 0.2694611 0.01965168 11.06222 45
## [23] {Income=Income_High,
## Job=Professional,
## Race=White,
## Housing=House_Owner,
## Location=Metropolitan,
## Books} => {Education=PhD_Prof} 0.005413038 0.3511450 0.01541539 11.05196 46
## [24] {Location=Metropolitan,
## Classical_Music,
## Art_Museum,
## Books,
## Poetry} => {Opera} 0.005648388 0.2666667 0.02118145 10.94750 48
## [25] {Income=Income_High,
## Job=Professional,
## Housing=House_Owner,
## Books} => {Education=PhD_Prof} 0.006825135 0.3473054 0.01965168 10.93112 58
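The lift of the top rule can be unpacked with the figures reported in the table above. For rule [1], the coverage of 0.01282655 over 8,498 transactions corresponds to about 109 respondents matching the antecedent, of whom 45 (the count column) hold Education=PhD_Prof:
\[ confidence = \frac{45}{109} \approx 0.413, \qquad support(\{Education=PhD\_Prof\}) = \frac{confidence}{lift} = \frac{0.4128440}{12.99388} \approx 0.0318 \]
In other words, about 41% of respondents with this profile hold the highest credential, against a baseline of roughly 3% in the whole sample (about 270 of 8,498 respondents). The baseline is inferred from the rounded table values, not read from the data, so it is approximate.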
The formal inspection of the generated rules is followed by a visual analysis of their structure and characteristics. It begins with a glimpse at the matrix-based visualization of all 1,067,650 rules, which reveals the relationships between rule antecedents and consequents: the x axis shows unique sets of items in the LHS and the y axis represents rule consequents. The color intensity of the displayed bars represents the values of lift. The presence of horizontal bands indicates that certain consequents are generated by a large number of distinct sets of attributes.
The scatter plot of the generated rules presented below takes the shape of the letter L, which reflects the explosion in the number of rules. This is to be expected given the parameters set in the Apriori algorithm, which were nevertheless necessitated by the specifics of sociological data. The X axis shows support (the frequency of occurrence), while the Y axis shows lift. The upper left corner captures the rare but significant itemsets with extremely high lift, presumably pertaining to elite culture. The lower right corner, by contrast, contains rules with high support but low lift: the trivial platitudes generated mostly by different combinations of demographic variables.
plot(rules, measure=c("support","lift"), shading="confidence", colors = c("mediumpurple", "lavender"))
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
The Two-Key Plot shows the length of rules scattered according to their support (horizontally) and confidence (vertically). The longer rules are concentrated at the low end of the support axis, with a large number of them accumulated in the upper left corner, indicating that greater specialization decreases support but does not drastically reduce confidence. Deterministic rules are mostly highly specific and complex, while simple rules pertaining to a larger share of the population have lower confidence.
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
The exceptionally large number of generated rules, the values of the key quality measures and the banality of the vast majority of rules, caused by the use of demographic data, necessitate a targeted comparative analysis.
In order to obtain more specific insights into the inner workings of class habitus, a targeted Apriori analysis will be implemented. This section is divided according to Bourdieu’s forms of capital (1986), each of which is subdivided into distinct forms of manifestation:
Having outlined the roadmap for the subsequent analysis, I will proceed with the examination of specifically targeted Apriori rules. The code below sets up the further analysis: groups of variables are created to represent the distinct forms of capital.
# defining a set of cultural goods
cultural_goods <- c("Opera", "Ballet", "Art_Museum", "Classical_Music", "Sightseeing", "Crafts_Fair", "Outdoor_Festival", "Latin_Music",
"Jazz", "Musical", "Theater", "Books", "Poetry", "Live_Dance", "Audiobook")
# set of economic capital variables
economic <- grep("Income=|Housing=", itemLabels(sppa_t), value = TRUE)
# institutionalized cultural capital
education <- grep("Education=", itemLabels(sppa_t), value = TRUE)
# set of social capital proxies
social <- grep("Job=|Location=", itemLabels(sppa_t), value = TRUE)
# preferred color :)) for visualizations
my_color <- "mediumpurple"
This section presents the results of the targeted analysis of the SPPA 2017 respondents based on their level of income and home ownership, direct indicators of an individual’s socio-economic status and ease of living. Cultural goods are rightfully considered a type of experience good: their consumption requires time as well as prior possession of a certain taste, which dictates the value a person attributes to a given good. The satisfaction of basic needs and the guarantee of stability may therefore serve as strong drivers of participation in the arts. Bourdieu (1986) describes this phenomenon as distance from necessity, a key characteristic of the upper classes that allows these social strata an appreciation of art which favors form over function. It underlies the drive towards abstract forms of art as well as forms historically established as elite, which are characterized by a high cognitive barrier to entry, since their appreciation is predicated on the acquisition of the specific cultural codes necessary for decoding complex aesthetic structures (opera, ballet).
Therefore, the Apriori rules are inspected in terms of how economic capital, revealed through an individual’s level of income and property ownership, influences the consumption of cultural goods. The table below presents aggregated economic profiles of respondents and the corresponding cultural consumption. The inspection of respondents characterized by high income reveals a wide diversification of cultural consumption, with the largest basket sizes, potentially supporting the theory proposed by Peterson (1992). It states that, following the democratization of access to cultural goods, the tastes that distinguish social strata evolved from the high-brow versus popular divide proposed by Bourdieu (1984) and Gans (1974) into an omnivore-univore distinction. His analysis of the American audience, based on the 1992 edition of the SPPA survey, reveals a correlation between the diversification and volume of cultural consumption and the socio-economic position of an individual. It proclaims that the mode of consumption of the higher classes changed from elitist, in the strict sense of high-brow culture, to omnivorous: an eclectic combination of a variety of cultural goods, be they high-brow or popular. The consumption of the lower classes, meanwhile, remained tethered to a single lane of cultural goods. Upon further inspection of the cultural baskets consumed by respondents with low income, it is visible that, while staying in one lane, they tend to consume popular goods with a lower barrier to entry, such as books, sightseeing, crafts fairs and outdoor festivals.
# rule filtering : lhs - income or housing, rhs - cultural goods; economic capital -> cultural goods
rules_economic <- apriori(sppa_t, parameter = list(supp = 0.005, conf = 0.15, minlen = 2),
appearance = list(lhs = economic,
rhs = cultural_goods,
default = "none"),
control = list(verbose = FALSE))
# data frame of economic rules
rules_df <- data.frame(
lhs = labels(lhs(rules_economic)),
rhs = labels(rhs(rules_economic)),
quality(rules_economic)
)
profile_summary <- rules_df %>%
mutate(
lhs = str_remove_all(lhs, "\\{|\\}"),
rhs = str_remove_all(rhs, "\\{|\\}")
) %>%
group_by(lhs) %>%
summarise(
Basket_Size = n(),
Cultural_Basket = paste0(rhs, " (Lift=", round(lift, 2), ")", collapse = ", "),
Avg_Lift = round(mean(lift), 2),
Max_Confidence = round(max(confidence), 2)
) %>%
arrange(desc(Avg_Lift))
datatable(profile_summary,
options = list(scrollX = TRUE, pageLength = 10),
caption = "Economic Profiles and Cultural Consumption")
The rules targeted in this way are visualized in the interactive plot below. One can inspect the rules pertaining to specific variables, as well as the number and variety of rules that the chosen item is connected to.
# interactive network graph
# edgeCol is not among the control parameters listed for this engine, so only nodeCol is set
plot(rules_economic, method = "graph", engine = "htmlwidget",
control = list(nodeCol = my_color))
The parallel coordinates graph depicts rules as arrows: the intensity of their color represents lift, while the width of their lines corresponds to support. All of the plotted rules pertain to individuals with high income, and the graph shows the diversity of their consumption, ranging from musicals, through museum attendance (highest support) and jazz listening, to classical music.
# parallel coordinates (static)
plot(head(sort(rules_economic, by = "lift"), 10), method = "paracoord",
     control = list(col = my_color))
This section analyzes how the different forms of cultural capital influence one another. Although the relationship may seem endogenous, it is worth verifying whether cultural capital in its institutionalized state, that is, academically acquired and evaluated taste, has the power to shape and perpetuate an individual’s mode of cultural consumption. Verifying the homology of tastes is equally important in the context of taste formation.
Cultural capital in its institutionalized state serves as a key proxy for an individual’s cultural socialization, a process largely shaped by the theories, opinions and competences acquired through education. Education shapes the eye of the beholder of cultural goods and is therefore bound to strongly influence both the volume and the type of an individual’s participation in the arts. The table below presents the aggregated baskets of cultural goods of the respondents by level of education.
# rule filtering: education -> cultural goods
rules_institutionalized <- apriori(sppa_t,
  parameter = list(supp = 0.005, conf = 0.15, minlen = 2),
  appearance = list(lhs = education,
                    rhs = cultural_goods,
                    default = "none"),
  control = list(verbose = FALSE))
# collect the rules and their quality measures in a data frame
rules_df_edu <- data.frame(
  lhs = labels(lhs(rules_institutionalized)),
  rhs = labels(rhs(rules_institutionalized)),
  quality(rules_institutionalized)
)
# aggregate the rules into one cultural basket per education level
profile_summary_edu <- rules_df_edu %>%
  mutate(
    lhs = str_remove_all(lhs, "\\{|\\}"),
    rhs = str_remove_all(rhs, "\\{|\\}")
  ) %>%
  group_by(lhs) %>%
  summarise(
    Basket_Size = n(),
    Cultural_Basket = paste0(rhs, " (Lift=", round(lift, 2), ")", collapse = ", "),
    Avg_Lift = round(mean(lift), 2),
    Max_Confidence = round(max(confidence), 2)
  ) %>%
  arrange(desc(Avg_Lift))
datatable(profile_summary_edu,
          options = list(scrollX = TRUE, pageLength = 10),
          caption = "Education and Cultural Consumption")
Readers who wish to inspect the relationship between the level of education and cultural consumption more closely can use the interactive graph of rules below, filtering the rules by the level of education of interest.
# interactive network graph
plot(rules_institutionalized, method = "graph", engine = "htmlwidget",
     control = list(nodeCol = my_color, edgeCol = my_color))
The parallel coordinates plot of the ten rules with the highest lift illustrates a similarly diversified consumption of cultural goods. The arrows starting at Education=Master have higher lift values but are sparser than those starting at Education=PhD_Prof.
plot(head(sort(rules_institutionalized, by = "lift"), 10), method = "paracoord",
     control = list(col = my_color))
The variables representing cultural goods in the SPPA 2017 dataset cannot be strictly attributed to one form of cultural capital (i.e. the embodied or the objectified state). Participation in culture requires a certain level of savvy, pre-existing knowledge and predispositions, and therefore cannot be reduced merely to the materiality of a public event or an owned good. This is why this section binds these two forms of capital together. What is essential to the embodied state and acquired taste is its homology and coherence: to be fully substantialized, a form of cultural capital must be integrated into the very selfhood and identity of an individual, their habitus, from which the individual is inseparable.
This section analyzes the coherence of tastes by inspecting rules whose left-hand side and right-hand side both contain solely cultural goods. For that purpose, a subset of SPPA is created that contains only the cultural variables and only the respondents who participated in public arts. Subsequently, the apriori algorithm is run with increased confidence and support values. The output below shows the 20 rules with the highest lift. Their analysis, however, does not facilitate a qualification of taste and its homogeneity: most of the printed rules relate multiple cultural baskets to ballet attendance. The fact that so many distinct baskets lead to the same right-hand side points to the heterogeneity of taste rather than its homology, which might validate the findings of Peterson (1992): in the era of democratized access, taste is captured in the eclectic combination of distinct modes of cultural consumption.
# temporary dataset - only cultural variables
sppa_active <- sppa_t[size(sppa_t[, cultural_goods]) > 0]
sppa_culture <- sppa_active[, cultural_goods]
# apriori on subset
rules_homology <- apriori(sppa_culture,
                          parameter = list(supp = 0.005, conf = 0.4, minlen = 2),
                          control = list(verbose = FALSE))
# removing redundant rules
rules_homology <- rules_homology[!is.redundant(rules_homology)]
# inspect
inspect(head(sort(rules_homology, by = "lift"), 20))
##      lhs                                                                                  rhs           support     confidence coverage   lift     count
## [1]  {Opera, Musical, Theater, Books}                                                  => {Ballet}      0.005102041 0.4558824  0.01119157 9.295105 31
## [2]  {Opera, Art_Museum, Crafts_Fair, Musical}                                         => {Ballet}      0.005431205 0.4400000  0.01234365 8.971275 33
## [3]  {Opera, Musical, Theater}                                                         => {Ballet}      0.005266623 0.4383562  0.01201448 8.937759 32
## [4]  {Opera, Sightseeing, Crafts_Fair, Musical}                                        => {Ballet}      0.005266623 0.4266667  0.01234365 8.699418 32
## [5]  {Art_Museum, Crafts_Fair, Jazz, Musical, Books, Live_Dance}                       => {Ballet}      0.005102041 0.4246575  0.01201448 8.658454 31
## [6]  {Opera, Crafts_Fair, Musical}                                                     => {Ballet}      0.006418697 0.4193548  0.01530612 8.550336 39
## [7]  {Art_Museum, Sightseeing, Crafts_Fair, Musical, Theater, Books, Live_Dance}       => {Ballet}      0.005102041 0.4189189  0.01217907 8.541447 31
## [8]  {Art_Museum, Sightseeing, Crafts_Fair, Musical, Theater, Live_Dance}              => {Ballet}      0.005431205 0.4177215  0.01300197 8.517033 33
## [9]  {Classical_Music, Jazz, Musical, Live_Dance}                                      => {Ballet}      0.005102041 0.4133333  0.01234365 8.427562 31
## [10] {Art_Museum, Crafts_Fair, Musical, Theater, Live_Dance}                           => {Ballet}      0.005924951 0.4090909  0.01448321 8.341062 36
## [11] {Opera, Art_Museum, Sightseeing, Musical, Books}                                  => {Ballet}      0.005102041 0.4078947  0.01250823 8.316673 31
## [12] {Opera, Art_Museum, Sightseeing, Musical}                                         => {Ballet}      0.005431205 0.4074074  0.01333114 8.306736 33
## [13] {Art_Museum, Classical_Music, Jazz, Books, Live_Dance}                            => {Ballet}      0.005431205 0.4074074  0.01333114 8.306736 33
## [14] {Art_Museum, Crafts_Fair, Jazz, Musical, Live_Dance}                              => {Ballet}      0.005266623 0.4050633  0.01300197 8.258941 32
## [15] {Opera, Art_Museum, Theater, Books}                                               => {Ballet}      0.005102041 0.4025974  0.01267281 8.208664 31
## [16] {Opera, Theater, Books}                                                           => {Ballet}      0.005760369 0.4022989  0.01431863 8.202577 35
## [17] {Art_Museum, Classical_Music, Sightseeing, Outdoor_Festival, Jazz, Books, Poetry} => {Latin_Music} 0.005431205 0.5156250  0.01053325 7.567482 33
## [18] {Art_Museum, Classical_Music, Sightseeing, Outdoor_Festival, Jazz, Poetry}        => {Latin_Music} 0.005595787 0.5074627  0.01102699 7.447689 34
## [19] {Art_Museum, Sightseeing, Jazz, Poetry, Live_Dance}                               => {Latin_Music} 0.005266623 0.5000000  0.01053325 7.338164 32
## [20] {Art_Museum, Classical_Music, Crafts_Fair, Outdoor_Festival, Jazz, Poetry}        => {Latin_Music} 0.005266623 0.5000000  0.01053325 7.338164 32
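The quality measures printed above are linked by simple identities, which makes them easy to sanity-check. The sketch below, in base R, takes the values reported for rule [1] and recovers the rule's coverage, the baseline share of ballet attendance and the size of the subset. It assumes the standard definitions confidence = support / coverage and lift = confidence / support(rhs); the variable names are illustrative and do not appear in the original analysis.

```r
# Values copied from rule [1] in the inspect() output
support    <- 0.005102041  # share of baskets containing lhs and rhs together
confidence <- 0.4558824    # share of lhs baskets that also contain the rhs
lift       <- 9.295105     # confidence relative to the baseline rhs share
count      <- 31           # number of baskets containing lhs and rhs together

# Subset size implied by count and support (assumes support = count / n)
n_transactions <- round(count / support)

# Coverage is the support of the lhs alone: support / confidence
coverage <- support / confidence

# Baseline support of the rhs ({Ballet}): confidence / lift
rhs_support <- confidence / lift

print(c(n = n_transactions, coverage = coverage, rhs_support = rhs_support))
```

Read this way, a lift of about 9.3 says that respondents holding this basket attend the ballet roughly nine times as often as the subset as a whole.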
The graph below enables further exploration of the combinations of cultural goods that co-occur with others. Items are drawn as rounded blue boxes and rules as red dots; arrows lead from the antecedent items into a rule and from the rule to its consequent item. For example, a path leading from “Opera” to “Ballet” indicates that attending the opera is associated with a higher likelihood of attending the ballet.
rules_top <- head(sort(rules_homology, by = "lift"), 50)
# type, layout, alpha and arrowSize are not control parameters of the
# htmlwidget engine (they raised an "Unknown control parameters" warning);
# the layout is selected through igraphLayout instead
plot(rules_top, method = "graph", engine = "htmlwidget",
     control = list(igraphLayout = "layout_with_fr"))
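When reading the arrows in the graph, it helps to recall how the underlying measures are computed. The following base R sketch uses a small hypothetical attendance table (toy data, not drawn from SPPA) to compute support, confidence and lift for Opera => Ballet by hand. It also shows that lift is symmetric, so an arrow signals co-occurrence above what independence would predict rather than a causal direction.

```r
# Hypothetical attendance records for ten respondents (toy data, not SPPA)
visits <- data.frame(
  Opera  = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  Ballet = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Support: share of respondents whose basket contains the item(s)
supp_opera  <- mean(visits$Opera)
supp_ballet <- mean(visits$Ballet)
supp_both   <- mean(visits$Opera & visits$Ballet)

# Confidence of Opera => Ballet: share of opera-goers who also attend the ballet
conf_opera_ballet <- supp_both / supp_opera

# Lift: confidence relative to the baseline share of ballet attendance
lift_opera_ballet <- conf_opera_ballet / supp_ballet

# The reverse rule Ballet => Opera has the same lift
lift_ballet_opera <- (supp_both / supp_ballet) / supp_opera

print(c(confidence = conf_opera_ballet,
        lift = lift_opera_ballet,
        reverse_lift = lift_ballet_opera))
```

In this toy table two of the three opera-goers also attend the ballet, against three in ten respondents overall, so the lift exceeds 1.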
According to the idea of social distinction, members of distinct classes send out signals meant to embody the values and tastes of their social stratum. This process resembles signalling as outlined by Michael Spence (1973) in the context of the job market. Taste, however, serves not solely as a signal of belonging but also as a means of symbolic power that perpetuates social differences and prejudices. The analysis performed in this project seems to be in line with later studies of taste, such as Peterson (1992), which argue that the democratization of access dismantled the previously functioning hierarchies of taste outlined by Bourdieu (1984) and Veblen (1899), who observed a clear bifurcation between high-brow or luxury culture and popular culture or mass media. Nowadays, the schism formed by taste no longer seems so clear, imposing and separating. The higher classes are characterized by more diversified consumption: rather than treating popular culture with disdain, they actively engage in an eclectic variety of cultural activities, combining high-brow culture with the popular.
This study successfully applies association rule mining to analyze the inner workings of taste and the interaction between the three forms of capital distinguished by Bourdieu (1986): economic, cultural and social. Applying unsupervised learning algorithms to sociological or survey-based data is a powerful tool that enables the researcher to see patterns beyond common intuition. The key contribution of the technique implemented in this study, association rule mining, is the discovery of hidden relationships that might otherwise have remained unnoticed or unthought of. The value of this study lies in offering a nuanced representation of distinct class profiles and their cultural consumption. However, there is considerable room for improvement in future research: while association rule mining in itself provides interesting results, future studies could also incorporate other forms of unsupervised learning to uncover hidden patterns in cultural consumption in relation to social classes.
Bourdieu, P. (1984). Distinction: A social critique of the judgement of taste. In Inequality (pp. 287-318). Routledge.
Bourdieu, P. (1986). The forms of capital. In The sociology of economic life (pp. 78-92). Routledge.
Gans, H. J. (1974). Popular culture and high culture. New York: Basic Books.
Gondal, N. (2025). Rulenet: Mapping the structure of cultural preferences using association-rules and network graphs. Poetics, 110, 101996.
Marx, K. (1886). Pisma pomniejsze (Vol. 1, p. 128). Paris: Librairie Keva.
National Endowment for the Arts, and United States. Bureau of the Census. Survey of Public Participation in the Arts (SPPA), United States, 2017. Inter-university Consortium for Political and Social Research [distributor], 2019-02-04.
Pan, Z., Li, J., Chen, Y., Pacheco, J., Dai, L., & Zhang, J. (2019). Knowledge discovery in sociological databases: An application on general society survey dataset. International Journal of Crowd Science, 3(3), 315-332.
Peterson, R. A. (1992). Understanding audience segmentation: From elite and mass to omnivore and univore. Poetics, 21(4), 243-258.
Spence, M. (1973). Job Market Signaling. The Quarterly Journal of Economics, 87(3), 355–374.
Veblen, T., & Howells, W. D. (1899). The theory of the leisure class: 1899. AM Kelley.
Social Capital
Social capital is approximated here by the classification of the area in which the respondent lives and by their occupation. The area often determines the possibilities of participating in distinct cultural events: metropolitan areas provide a wider variety of options due to a higher concentration of cultural institutions. Occupation, in turn, influences an individual’s inner circle and, above all, their class identity. This section examines the baskets of cultural goods by the respondent’s occupation and living area. The table below presents aggregated rules for all combinations of occupation and location. The sparsest cultural baskets, containing mostly itemsets relating to popular culture, are consumed by sales associates, managers and manual workers living in rural areas, while the most diversified baskets belong to professionals and managers living in metropolitan areas.
The graph below facilitates further socially targeted rule exploration.
The parallel coordinates plot illustrates the first ten rules, all of which relate to people living in metropolitan areas. It displays the inclination of professionals and managers towards the consumption of legitimized high-brow culture.