Distinction Rule Mining: An Application of Apriori Rule Mining to an Analysis of Taste

Introduction

“Taste classifies, and it classifies the classifier,” claims Pierre Bourdieu in his revolutionary study of how taste is formed and expressed across social classes, “Distinction: A Social Critique of the Judgement of Taste” (1984). The mechanisms behind taste formation have been debated since time immemorial, but only recently did the discourse on taste acquire a political, rather than a merely ethical or aesthetic, dimension. Bourdieu (1984) frames taste as an instrument of symbolic power: it reproduces and perpetuates social inequality, steadily deepening the schism between the masses and the elites.

Another dimension of taste is its ability to bind together people who belong to the same sociological fraction, or stratum. Marx (1886) distinguishes between a class “against capital” (often rendered as a “class in itself”) and a class “for itself”. The former serves as the main subject of the analysis performed in this project. This definition of class rests on differences in living conditions across distinct social strata: a social class, in this sense, exists as a collectivity of individuals who perceive the similarities between one another within a certain situation. Bourdieu (1984) extends this theory to capture how an individual’s status is conferred through the way one presents oneself to the world, that is, through one’s aesthetic dispositions. These strategies serve not only to mark oneself as a member of a certain social stratum (class), but also to distance oneself from lower groups. Aesthetic dispositions are instilled in members of the various social classes from their earliest days, so that the entire process of acquiring taste seems seamless and natural. The eye is a product of history reproduced by social upbringing and education.

Not all factors that determine membership in a social class are as ephemeral and intangible as taste: according to Bourdieu (1986), class fractions are determined by a combination of social, economic and cultural capital. However, it is symbolic goods, especially those regarded as attributes of excellence, that facilitate the strategies of distinction and the reproduction of inequality. These attributes are proclaimed excellent by the dominant class, which amplifies and legitimizes social distances.

The aim of this study is to analyze whether such strategies of distinction still gain traction in today’s society, in which access to cultural goods is no longer restricted in favor of the elites. The democratization of access that accompanied digitization played a crucial role in dismantling the former hierarchy of cultural goods and the status they conferred on their consumers. This study seeks hidden relationships between social, economic and cultural capital on the one hand, and cultural preferences and tastes on the other, by applying an association rule mining algorithm (Apriori).
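Association rules of the form lhs => rhs are commonly scored with three measures: support (how often the items occur together), confidence (how often the rhs holds when the lhs does) and lift (confidence relative to what independence would predict). Apriori uses a minimum support to prune candidate itemsets and a minimum confidence to filter rules. The base-R sketch below computes these measures directly; the toy transaction matrix and its item names (`degree`, `opera`, `museum`) are hypothetical and only illustrate the arithmetic, they are not SPPA results.

```r
# Hypothetical toy data: each row is a respondent, each column an item
# (TRUE = the respondent has that attribute or reported that activity).
trans <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, FALSE,
    FALSE, TRUE,  TRUE,
    TRUE,  TRUE,  TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("degree", "opera", "museum"))
)

# Support of an itemset: share of rows that contain every item in it.
support <- function(trans, items) {
  mean(apply(trans[, items, drop = FALSE], 1, all))
}

# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs).
confidence <- function(trans, lhs, rhs) {
  support(trans, c(lhs, rhs)) / support(trans, lhs)
}

# Lift of lhs => rhs: confidence / support(rhs). Values above 1 mean the
# items co-occur more often than expected if they were independent.
lift <- function(trans, lhs, rhs) {
  confidence(trans, lhs, rhs) / support(trans, rhs)
}

support(trans, c("opera", "museum"))
confidence(trans, "opera", "museum")
lift(trans, "opera", "museum")
```

For the rule opera => museum in this toy matrix, support is 0.6, confidence 0.75 and lift 1.25, so the two items co-occur somewhat more often than independence would predict.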

Data Description

This study is based on the Survey of Public Participation in the Arts (SPPA), a combination of two surveys conducted in 2017. The most recent SPPA module was conducted in 2022; however, because the COVID-19 pandemic was still strongly affecting the arts in the USA during 2022, and public participation in them in particular, this project uses the 2017 module. The 2017 SPPA combines the Current Population Survey (CPS) from July 2017 with the SPPA supplement, which provides information about public participation in the arts within the United States. The CPS, administered monthly by the U.S. Census Bureau, collects labor force data on the population aged 15 or older living in the USA and provides information about each respondent’s socio-economic status: age, gender, race, marital status, educational attainment, income, occupation, etc.
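The socio-economic variables listed above are stored as labeled factors whose labels carry a numeric code prefix, as the `head(sppa)` preview in this document shows (for example `(2) Female` in `PESEX` or `(12) 50,000 TO 59,999` in `HEFAMINC`). Transaction-based rule miners expect each respondent to be a set of items, so one possible preparation step is to turn every column into `VARIABLE=label` items. The sketch below assumes that approach; the small data frame is a hypothetical stand-in for `sppa`, and values other than those visible in the preview (such as `(1) Male`) are illustrative.

```r
# Hypothetical stand-in for a few CPS columns; in the real data these are
# factors whose labels start with a "(code) " prefix.
cps <- data.frame(
  PESEX    = factor(c("(2) Female", "(1) Male", NA)),
  PEEDUCA  = factor(c("(34) 7th or 8th grade", NA, NA)),
  HEFAMINC = factor(c("(12) 50,000 TO 59,999", NA, "(12) 50,000 TO 59,999"))
)

# Drop the "(code) " prefix so items read naturally, e.g. "Female".
strip_code <- function(x) sub("^\\([0-9]+\\) ", "", as.character(x))

# Turn each row into a character vector of "VARIABLE=label" items,
# skipping missing values so they do not become spurious items.
to_items <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    vals <- vapply(df, function(col) strip_code(col[i]), character(1))
    keep <- !is.na(vals)
    paste0(names(vals)[keep], "=", vals[keep])
  })
}

items <- to_items(cps)
items[[1]]
```

Assuming the arules package is used for mining, a list in this form can be coerced with `as(items, "transactions")` before calling `apriori()`.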

In addition to the basic CPS questions, in about one-half of the sampled CPS households, two randomly selected household members aged 18 or older were asked supplementary questions about arts attendance, venues visited, literary reading and their motivations for it. The supplement covers the respondent’s participation in various artistic activities over the previous year. The 2017 version included five additional modules capturing other types of arts participation and leisure activities, such as training and exposure, frequency of participation, musical and artistic preferences, school-age socialization and the use of electronic devices in art consumption. The modules are organized as follows:
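Because the file also contains non-interview and vacant household records (see the `HRINTSTA` and `HUFINAL` columns in the `head(sppa)` preview), and only adults received the arts supplement, a first approximate filter could keep persons from interviewed households who are at least 18. The sketch below uses the `HRINTSTA` and `PRTAGE` column names and the `(1) Interview` label seen in the preview; the rows themselves are hypothetical. The variable that flags completion of the supplement itself is not shown in the preview, so this filter is an assumption, not the survey's official selection rule.

```r
# Hypothetical stand-in rows mirroring two columns from the SPPA preview:
# HRINTSTA (household interview status) and PRTAGE (age in years).
people <- data.frame(
  CASEID   = 1:5,
  HRINTSTA = factor(c("(3) Type B non-interview", "(1) Interview",
                      "(1) Interview", "(1) Interview", "(1) Interview")),
  PRTAGE   = c(NA, 73, 16, 18, 45)
)

# Keep persons from interviewed households aged 18 or older, the minimum
# age for the arts supplement. Rows with a missing age are dropped.
adults <- subset(people,
                 HRINTSTA == "(1) Interview" & !is.na(PRTAGE) & PRTAGE >= 18)
adults$CASEID
```

Here the non-interview record and the 16-year-old are excluded, leaving cases 2, 4 and 5.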

Module A: Consuming Art via Electronic Media
Module B: Performing Art
Module C: Creating Visual Art and Writing
Module D: Other Leisure Activities
Module E: Arts Education, and Arts Access and Opportunity

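# Load the ICPSR 37138 R data file; this creates the object da37138.0001
load("/Users/gosia/Downloads/ICPSR_37138/DS0001/37138-0001-Data.rda")
sppa <- da37138.0001  # shorter alias for the survey data frame
head(sppa)            # preview the first six records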

##   CASEID       HRHHID HRMONTH HRYEAR4 HURESPLI                        HUFINAL
## 1      1 4.220117e+12       7    2017       NA (231) Unoccupied tent or trail
## 2      2 9.517675e+14       7    2017       NA           (226) Vacant regular
## 3      3 7.100992e+14       7    2017       NA           (226) Vacant regular
## 4      4 6.100091e+14       7    2017       NA           (226) Vacant regular
## 5      5 1.108629e+11       7    2017       NA           (226) Vacant regular
## 6      6 4.108131e+14       7    2017        1           (201) CAPI Compelete
##                  HULANGCODE                                 HETENURE
## 1 (0) Unlabeled/not Spanish                                     <NA>
## 2 (0) Unlabeled/not Spanish                                     <NA>
## 3 (0) Unlabeled/not Spanish                                     <NA>
## 4 (0) Unlabeled/not Spanish                                     <NA>
## 5 (0) Unlabeled/not Spanish                                     <NA>
## 6 (0) Unlabeled/not Spanish (1) Owned or being bought by a HH member
##                                    HEHOUSUT HETELHHD HETELAVL
## 1 (10) Unoccupied tent site or trailer site     <NA>     <NA>
## 2               (01) House, apartment, flat     <NA>     <NA>
## 3               (01) House, apartment, flat     <NA>     <NA>
## 4               (01) House, apartment, flat     <NA>     <NA>
## 5               (01) House, apartment, flat     <NA>     <NA>
## 6               (01) House, apartment, flat  (1) Yes     <NA>
##                HEPHONEO              HEFAMINC HUTYPEA
## 1 (0) Undocumented Code                  <NA>    <NA>
## 2 (0) Undocumented Code                  <NA>    <NA>
## 3 (0) Undocumented Code                  <NA>    <NA>
## 4 (0) Undocumented Code                  <NA>    <NA>
## 5 (0) Undocumented Code                  <NA>    <NA>
## 6               (1) Yes (12) 50,000 TO 59,999    <NA>
##                                     HUTYPB HUTYPC  HWHHWGT
## 1 (7) Unoccupied tent site or trailer site   <NA>        0
## 2                       (1) Vacant regular   <NA>        0
## 3                       (1) Vacant regular   <NA>        0
## 4                       (1) Vacant regular   <NA>        0
## 5                       (1) Vacant regular   <NA>        0
## 6                                     <NA>   <NA> 17691109
##                   HRINTSTA HRNUMHOU
## 1 (3) Type B non-interview        0
## 2 (3) Type B non-interview        0
## 3 (3) Type B non-interview        0
## 4 (3) Type B non-interview        0
## 5 (3) Type B non-interview        0
## 6            (1) Interview        2
##                                         HRHTYPE HRMIS     HUINTTYP HUPRSCNT
## 1                  (00) Non-interview household     7         <NA>        0
## 2                  (00) Non-interview household     2         <NA>        0
## 3                  (00) Non-interview household     3 (1) Personal        1
## 4                  (00) Non-interview household     3 (1) Personal        1
## 5                  (00) Non-interview household     3 (1) Personal        1
## 6 (01) Husband/wife primary family (neither AF)     1 (1) Personal        1
##                                HRLONGLK HRHHID2 HWHHWTLN  HUBUS HUBUSL1 HUBUSL2
## 1               (2) MIS 2-4 OR MIS 6-8     5011        0   <NA>      NA      NA
## 2               (2) MIS 2-4 OR MIS 6-8     7011        0   <NA>      NA      NA
## 3               (2) MIS 2-4 OR MIS 6-8     7011        0   <NA>      NA      NA
## 4               (2) MIS 2-4 OR MIS 6-8     7011        0   <NA>      NA      NA
## 5               (2) MIS 2-4 OR MIS 6-8     7011        0   <NA>      NA      NA
## 6 (0) MIS 1 OR REPLACEMENT HH (NO LINK)    7011        1 (2) No      NA      NA
##   HUBUSL3 HUBUSL4     GEREG                  GEDIV   GCFIP GCTCB GCTCO
## 1      NA      NA (3) South (6) East South Central (01) AL 33860     0
## 2      NA      NA (3) South (6) East South Central (01) AL 19300     3
## 3      NA      NA (3) South (6) East South Central (01) AL 19300     3
## 4      NA      NA (3) South (6) East South Central (01) AL 19300     3
## 5      NA      NA (3) South (6) East South Central (01) AL 19300     3
## 6      NA      NA (3) South (6) East South Central (01) AL 13820     0
##             GTCBSAST         GTMETSTA GTINDVPC                  GTCBSASZ GCTCS
## 1        (2) Balance (1) Metropolitan        0     (3) 250,000 - 499,999     0
## 2 (4) Not identified (1) Metropolitan        0     (2) 100,000 - 249,999   380
## 3 (4) Not identified (1) Metropolitan        0     (2) 100,000 - 249,999   380
## 4 (4) Not identified (1) Metropolitan        0     (2) 100,000 - 249,999   380
## 5 (4) Not identified (1) Metropolitan        0     (2) 100,000 - 249,999   380
## 6        (2) Balance (1) Metropolitan        0 (5) 1,000,000 - 2,499,999     0
##                           PERRP PEPARENT PRTAGE         PRTFAGE
## 1                          <NA>       NA     NA (0) No top code
## 2                          <NA>       NA     NA (0) No top code
## 3                          <NA>       NA     NA (0) No top code
## 4                          <NA>       NA     NA (0) No top code
## 5                          <NA>       NA     NA (0) No top code
## 6 (01) Reference person w/rels.       NA     73 (0) No top code
##                       PEMARITL PESPOUSE      PESEX PEAFEVER PEAFNOW
## 1                         <NA>       NA       <NA>     <NA>    <NA>
## 2                         <NA>       NA       <NA>     <NA>    <NA>
## 3                         <NA>       NA       <NA>     <NA>    <NA>
## 4                         <NA>       NA       <NA>     <NA>    <NA>
## 5                         <NA>       NA       <NA>     <NA>    <NA>
## 6 (1) Married - spouse present        2 (2) Female   (2) No  (2) No
##                 PEEDUCA        PTDTRACE PRDTHSP
## 1                  <NA>            <NA>    <NA>
## 2                  <NA>            <NA>    <NA>
## 3                  <NA>            <NA>    <NA>
## 4                  <NA>            <NA>    <NA>
## 5                  <NA>            <NA>    <NA>
## 6 (34) 7th or 8th grade (01) White Only    <NA>
##                                PUCHINHH PULINENO
## 1                                  <NA>       NA
## 2                                  <NA>       NA
## 3                                  <NA>       NA
## 4                                  <NA>       NA
## 5                                  <NA>       NA
## 6 (9) Change in demographic information        1
##                          PRFAMNUM             PRFAMREL           PRFAMTYP
## 1                            <NA>                 <NA>               <NA>
## 2                            <NA>                 <NA>               <NA>
## 3                            <NA>                 <NA>               <NA>
## 4                            <NA>                 <NA>               <NA>
## 5                            <NA>                 <NA>               <NA>
## 6 (01) Primary family member only (1) Reference person (1) Primary family
##           PEHSPNON                             PRMARSTA
## 1             <NA>                                 <NA>
## 2             <NA>                                 <NA>
## 3             <NA>                                 <NA>
## 4             <NA>                                 <NA>
## 5             <NA>                                 <NA>
## 6 (2) Non-hispanic (1) Married, civilian spouse present
##                              PRPERTYP            PENATVTY            PEMNTVTY
## 1                                <NA>                <NA>                <NA>
## 2                                <NA>                <NA>                <NA>
## 3                                <NA>                <NA>                <NA>
## 4                                <NA>                <NA>                <NA>
## 5                                <NA>                <NA>                <NA>
## 6 (2) Adult civilian household member (057) United States (057) United States
##              PEFNTVTY                              PRCITSHP PRCITFLG
## 1                <NA>                                  <NA>       NA
## 2                <NA>                                  <NA>       NA
## 3                <NA>                                  <NA>       NA
## 4                <NA>                                  <NA>       NA
## 5                <NA>                                  <NA>       NA
## 6 (057) United States (1) Native, born in the United States        0
##                PRINUYER PUSLFPRX                          PEMLR        PUWK
## 1                  <NA>     <NA>                           <NA>        <NA>
## 2                  <NA>     <NA>                           <NA>        <NA>
## 3                  <NA>     <NA>                           <NA>        <NA>
## 4                  <NA>     <NA>                           <NA>        <NA>
## 5                  <NA>     <NA>                           <NA>        <NA>
## 6 (00) Not foreign born (1) Self (5) Not in labor force-retired (3) Retired
##   PUBUS1 PUBUS2OT          PUBUSCK1 PUBUSCK2 PUBUSCK3 PUBUSCK4 PURETOT PUDIS
## 1   <NA>     <NA>              <NA>     <NA>     <NA>     <NA>    <NA>  <NA>
## 2   <NA>     <NA>              <NA>     <NA>     <NA>     <NA>    <NA>  <NA>
## 3   <NA>     <NA>              <NA>     <NA>     <NA>     <NA>    <NA>  <NA>
## 4   <NA>     <NA>              <NA>     <NA>     <NA>     <NA>    <NA>  <NA>
## 5   <NA>     <NA>              <NA>     <NA>     <NA>     <NA>    <NA>  <NA>
## 6   <NA>     <NA> (2) GOTO PURETCK1     <NA>     <NA>     <NA>    <NA>  <NA>
##   PERET1 PUDIS1 PUDIS2 PUABSOT PULAY PEABSRSN PEABSPDO PEMJOT PEMJNUM PEHRUSL1
## 1   <NA>   <NA>   <NA>    <NA>  <NA>     <NA>     <NA>   <NA>    <NA>       NA
## 2   <NA>   <NA>   <NA>    <NA>  <NA>     <NA>     <NA>   <NA>    <NA>       NA
## 3   <NA>   <NA>   <NA>    <NA>  <NA>     <NA>     <NA>   <NA>    <NA>       NA
## 4   <NA>   <NA>   <NA>    <NA>  <NA>     <NA>     <NA>   <NA>    <NA>       NA
## 5   <NA>   <NA>   <NA>    <NA>  <NA>     <NA>     <NA>   <NA>    <NA>       NA
## 6 (2) No   <NA>   <NA>    <NA>  <NA>     <NA>     <NA>   <NA>    <NA>       NA
##   PEHRUSL2 PEHRFTPT PEHRUSLT PEHRWANT PEHRRSN1 PEHRRSN2 PEHRRSN3 PUHROFF1
## 1       NA     <NA>       NA     <NA>     <NA>     <NA>     <NA>     <NA>
## 2       NA     <NA>       NA     <NA>     <NA>     <NA>     <NA>     <NA>
## 3       NA     <NA>       NA     <NA>     <NA>     <NA>     <NA>     <NA>
## 4       NA     <NA>       NA     <NA>     <NA>     <NA>     <NA>     <NA>
## 5       NA     <NA>       NA     <NA>     <NA>     <NA>     <NA>     <NA>
## 6       NA     <NA>       NA     <NA>     <NA>     <NA>     <NA>     <NA>
##   PUHROFF2 PUHROT1 PUHROT2 PEHRACT1 PEHRACT2 PEHRACTT PEHRAVL PUHRCK1 PUHRCK2
## 1       NA    <NA>      NA       NA       NA       NA    <NA>    <NA>    <NA>
## 2       NA    <NA>      NA       NA       NA       NA    <NA>    <NA>    <NA>
## 3       NA    <NA>      NA       NA       NA       NA    <NA>    <NA>    <NA>
## 4       NA    <NA>      NA       NA       NA       NA    <NA>    <NA>    <NA>
## 5       NA    <NA>      NA       NA       NA       NA    <NA>    <NA>    <NA>
## 6       NA    <NA>      NA       NA       NA       NA    <NA>    <NA>    <NA>
##   PUHRCK3 PUHRCK4 PUHRCK5 PUHRCK6 PUHRCK7 PUHRCK12 PULAYDT PULAY6M PELAYAVL
## 1    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
##   PULAYAVR PELAYLK PELAYDUR PELAYFTO PULAYCK1 PULAYCK2 PULAYCK3 PULK PELKM1
## 1     <NA>    <NA>       NA     <NA>     <NA>     <NA>     <NA> <NA>   <NA>
## 2     <NA>    <NA>       NA     <NA>     <NA>     <NA>     <NA> <NA>   <NA>
## 3     <NA>    <NA>       NA     <NA>     <NA>     <NA>     <NA> <NA>   <NA>
## 4     <NA>    <NA>       NA     <NA>     <NA>     <NA>     <NA> <NA>   <NA>
## 5     <NA>    <NA>       NA     <NA>     <NA>     <NA>     <NA> <NA>   <NA>
## 6     <NA>    <NA>       NA     <NA>     <NA>     <NA>     <NA> <NA>   <NA>
##   PULKM2 PULKM3 PULKM4 PULKM5 PULKM6 PULKDK1 PULKDK2 PULKDK3 PULKDK4 PULKDK5
## 1   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>      NA
## 2   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>      NA
## 3   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>      NA
## 4   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>      NA
## 5   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>      NA
## 6   <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>      NA
##   PULKDK6 PULKPS1 PULKPS2 PULKPS3 PULKPS4 PULKPS5 PULKPS6 PELKAVL PULKAVR
## 1      NA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2      NA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3      NA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4      NA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5      NA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6      NA    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PELKLL1O PELKLL2O PELKLWO PELKDUR PELKFTO PEDWWNTO PEDWRSN PEDWLKO PEDWWK
## 1     <NA>     <NA>    <NA>      NA    <NA>     <NA>    <NA>    <NA>   <NA>
## 2     <NA>     <NA>    <NA>      NA    <NA>     <NA>    <NA>    <NA>   <NA>
## 3     <NA>     <NA>    <NA>      NA    <NA>     <NA>    <NA>    <NA>   <NA>
## 4     <NA>     <NA>    <NA>      NA    <NA>     <NA>    <NA>    <NA>   <NA>
## 5     <NA>     <NA>    <NA>      NA    <NA>     <NA>    <NA>    <NA>   <NA>
## 6     <NA>     <NA>    <NA>      NA    <NA>     <NA>    <NA>    <NA>   <NA>
##   PEDW4WK PEDWLKWK PEDWAVL PEDWAVR PUDWCK1 PUDWCK2 PUDWCK3 PUDWCK4 PUDWCK5
## 1    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEJHWKO PUJHDP1O PEJHRSN PEJHWANT PUJHCK1 PUJHCK2 PRABSREA
## 1    <NA>     <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 2    <NA>     <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 3    <NA>     <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 4    <NA>     <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 5    <NA>     <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
## 6    <NA>     <NA>    <NA>     <NA>    <NA>    <NA>     <NA>
##                           PRCIVLF PRDISC                 PREMPHRS
## 1                            <NA>   <NA>                     <NA>
## 2                            <NA>   <NA>                     <NA>
## 3                            <NA>   <NA>                     <NA>
## 4                            <NA>   <NA>                     <NA>
## 5                            <NA>   <NA>                     <NA>
## 6 (2) Not in civilian labor force   <NA> (00) Unemployed and NILF
##                              PREMPNOT PREXPLF PRFTLF PRHRUSL PRJOBSEA PRPTHRS
## 1                                <NA>    <NA>   <NA>    <NA>     <NA>    <NA>
## 2                                <NA>    <NA>   <NA>    <NA>     <NA>    <NA>
## 3                                <NA>    <NA>   <NA>    <NA>     <NA>    <NA>
## 4                                <NA>    <NA>   <NA>    <NA>     <NA>    <NA>
## 5                                <NA>    <NA>   <NA>    <NA>     <NA>    <NA>
## 6 (4) Not in labor force (NILF)-other    <NA>   <NA>    <NA>     <NA>    <NA>
##   PRPTREA PRUNEDUR PRUNTYPE                PRWKSCH                PRWKSTAT
## 1    <NA>       NA     <NA>                   <NA>                    <NA>
## 2    <NA>       NA     <NA>                   <NA>                    <NA>
## 3    <NA>       NA     <NA>                   <NA>                    <NA>
## 4    <NA>       NA     <NA>                   <NA>                    <NA>
## 5    <NA>       NA     <NA>                   <NA>                    <NA>
## 6    <NA>       NA     <NA> (0) Not in labor force (01) Not in labor force
##                       PRWNTJOB PUJHCK3 PUJHCK4 PUJHCK5 PUIODP1 PUIODP2 PUIODP3
## 1                         <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2                         <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3                         <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4                         <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5                         <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6 (2) Other not in labor force    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEIO1COW PUIO1MFG PEIO2COW PUIO2MFG PUIOCK1 PUIOCK2 PUIOCK3
## 1     <NA>     <NA>     <NA>     <NA>    <NA>    <NA>    <NA>
## 2     <NA>     <NA>     <NA>     <NA>    <NA>    <NA>    <NA>
## 3     <NA>     <NA>     <NA>     <NA>    <NA>    <NA>    <NA>
## 4     <NA>     <NA>     <NA>     <NA>    <NA>    <NA>    <NA>
## 5     <NA>     <NA>     <NA>     <NA>    <NA>    <NA>    <NA>
## 6     <NA>     <NA>     <NA>     <NA>    <NA>    <NA>    <NA>
##                     PRIOELG PRAGNA PRCOW1 PRCOW2 PRCOWPG PRDTCOW1 PRDTCOW2
## 1                      <NA>   <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 2                      <NA>   <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 3                      <NA>   <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 4                      <NA>   <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 5                      <NA>   <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
## 6 (0) Not eligible for edit   <NA>   <NA>   <NA>    <NA>     <NA>     <NA>
##   PRDTIND1 PRDTIND2 PRDTOCC1 PRDTOCC2 PREMP PRMJIND1 PRMJIND2 PRMJOCC1 PRMJOCC2
## 1     <NA>     <NA>     <NA>     <NA>  <NA>     <NA>     <NA>     <NA>     <NA>
## 2     <NA>     <NA>     <NA>     <NA>  <NA>     <NA>     <NA>     <NA>     <NA>
## 3     <NA>     <NA>     <NA>     <NA>  <NA>     <NA>     <NA>     <NA>     <NA>
## 4     <NA>     <NA>     <NA>     <NA>  <NA>     <NA>     <NA>     <NA>     <NA>
## 5     <NA>     <NA>     <NA>     <NA>  <NA>     <NA>     <NA>     <NA>     <NA>
## 6     <NA>     <NA>     <NA>     <NA>  <NA>     <NA>     <NA>     <NA>     <NA>
##   PRMJOCGR PRNAGPWS PRNAGWS PRSJMJ                   PRERELG PEERNUOT PEERNPER
## 1     <NA>     <NA>    <NA>   <NA>                      <NA>     <NA>     <NA>
## 2     <NA>     <NA>    <NA>   <NA>                      <NA>     <NA>     <NA>
## 3     <NA>     <NA>    <NA>   <NA>                      <NA>     <NA>     <NA>
## 4     <NA>     <NA>    <NA>   <NA>                      <NA>     <NA>     <NA>
## 5     <NA>     <NA>    <NA>   <NA>                      <NA>     <NA>     <NA>
## 6     <NA>     <NA>    <NA>   <NA> (0) Not eligible for edit     <NA>     <NA>
##   PEERNRT PEERNHRY PUERNH1C PEERNH2 PEERNH1O PRERNHLY             PTHR PEERNHRO
## 1    <NA>     <NA>       NA      NA       NA       NA (0) Not topcoded       NA
## 2    <NA>     <NA>       NA      NA       NA       NA (0) Not topcoded       NA
## 3    <NA>     <NA>       NA      NA       NA       NA (0) Not topcoded       NA
## 4    <NA>     <NA>       NA      NA       NA       NA (0) Not topcoded       NA
## 5    <NA>     <NA>       NA      NA       NA       NA (0) Not topcoded       NA
## 6    <NA>     <NA>       NA      NA       NA       NA (0) Not topcoded       NA
##   PRERNWA             PTWK PEERN PUERN2             PTOT PEERNWKP PEERNLAB
## 1      NA (0) Not topcoded    NA     NA (0) Not topcoded       NA     <NA>
## 2      NA (0) Not topcoded    NA     NA (0) Not topcoded       NA     <NA>
## 3      NA (0) Not topcoded    NA     NA (0) Not topcoded       NA     <NA>
## 4      NA (0) Not topcoded    NA     NA (0) Not topcoded       NA     <NA>
## 5      NA (0) Not topcoded    NA     NA (0) Not topcoded       NA     <NA>
## 6      NA (0) Not topcoded    NA     NA (0) Not topcoded       NA     <NA>
##   PEERNCOV PENLFJH PENLFRET PENLFACT PUNLFCK1                     PUNLFCK2
## 1     <NA>    <NA>     <NA>     <NA>     <NA>                         <NA>
## 2     <NA>    <NA>     <NA>     <NA>     <NA>                         <NA>
## 3     <NA>    <NA>     <NA>     <NA>     <NA>                         <NA>
## 4     <NA>    <NA>     <NA>     <NA>     <NA>                         <NA>
## 5     <NA>    <NA>     <NA>     <NA>     <NA>                         <NA>
## 6     <NA>    <NA>     <NA>     <NA>     <NA> (2) All others goto LBFR-END
##   PESCHENR PESCHFT PESCHLVL PRNLFSCH  PWFMWGT PWLGWGT PWORWGT  PWSSWGT PWVETWGT
## 1     <NA>    <NA>     <NA>     <NA>        0       0       0        0        0
## 2     <NA>    <NA>     <NA>     <NA>        0       0       0        0        0
## 3     <NA>    <NA>     <NA>     <NA>        0       0       0        0        0
## 4     <NA>    <NA>     <NA>     <NA>        0       0       0        0        0
## 5     <NA>    <NA>     <NA>     <NA>        0       0       0        0        0
## 6     <NA>    <NA>     <NA>     <NA> 17691109       0       0 17691109 17888002
##                                       PRCHLD PRNMCHLD               PXPDEMP1
## 1                                       <NA>       NA                   <NA>
## 2                                       <NA>       NA                   <NA>
## 3                                       <NA>       NA                   <NA>
## 4                                       <NA>       NA                   <NA>
## 5                                       <NA>       NA                   <NA>
## 6 (00) No own children under 18 years of age        0 (00) Value - no change
##   PRWERNAL PRHERNAL               HXTENURE               HXHOUSUT
## 1     <NA>     <NA> (01) Blank - no change (00) Value - no change
## 2     <NA>     <NA> (01) Blank - no change (00) Value - no change
## 3     <NA>     <NA> (01) Blank - no change (00) Value - no change
## 4     <NA>     <NA> (01) Blank - no change (00) Value - no change
## 5     <NA>     <NA> (01) Blank - no change (00) Value - no change
## 6     <NA>     <NA> (00) Value - no change (00) Value - no change
##                 HXTELHHD               HXTELAVL               HXPHONEO PXINUSYR
## 1 (01) Blank - no change (01) Blank - no change (00) Value - no change     <NA>
## 2 (01) Blank - no change (01) Blank - no change (00) Value - no change     <NA>
## 3 (01) Blank - no change (01) Blank - no change (00) Value - no change     <NA>
## 4 (01) Blank - no change (01) Blank - no change (00) Value - no change     <NA>
## 5 (01) Blank - no change (01) Blank - no change (00) Value - no change     <NA>
## 6 (00) Value - no change (01) Blank - no change (00) Value - no change     <NA>
##                    PXRRP            PXPARENT                  PXAGE
## 1                   <NA>                <NA>                   <NA>
## 2                   <NA>                <NA>                   <NA>
## 3                   <NA>                <NA>                   <NA>
## 4                   <NA>                <NA>                   <NA>
## 5                   <NA>                <NA>                   <NA>
## 6 (00) Value - no change (50) Value to blank (00) Value - no change
##                 PXMARITL               PXSPOUSE                  PXSEX
## 1                   <NA>                   <NA>                   <NA>
## 2                   <NA>                   <NA>                   <NA>
## 3                   <NA>                   <NA>                   <NA>
## 4                   <NA>                   <NA>                   <NA>
## 5                   <NA>                   <NA>                   <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change
##                 PXAFWHN1                PXAFNOW                PXEDUCA
## 1                   <NA>                   <NA>                   <NA>
## 2                   <NA>                   <NA>                   <NA>
## 3                   <NA>                   <NA>                   <NA>
## 4                   <NA>                   <NA>                   <NA>
## 5                   <NA>                   <NA>                   <NA>
## 6 (01) Blank - no change (00) Value - no change (00) Value - no change
##                  PXRACE1 PXNATVTY PXMNTVTY PXFNTVTY               PXNMEMP1
## 1                   <NA>     <NA>     <NA>     <NA>                   <NA>
## 2                   <NA>     <NA>     <NA>     <NA>                   <NA>
## 3                   <NA>     <NA>     <NA>     <NA>                   <NA>
## 4                   <NA>     <NA>     <NA>     <NA>                   <NA>
## 5                   <NA>     <NA>     <NA>     <NA>                   <NA>
## 6 (00) Value - no change     <NA>     <NA>     <NA> (00) Value - no change
##                 PXHSPNON                  PXMLR                 PXRET1 PXABSRSN
## 1                   <NA>                   <NA>                   <NA>     <NA>
## 2                   <NA>                   <NA>                   <NA>     <NA>
## 3                   <NA>                   <NA>                   <NA>     <NA>
## 4                   <NA>                   <NA>                   <NA>     <NA>
## 5                   <NA>                   <NA>                   <NA>     <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change     <NA>
##   PXABSPDO PXMJOT PXMJNUM PXHRUSL1 PXHRUSL2               PXHRFTPT PXHRUSLT
## 1     <NA>   <NA>    <NA>     <NA>     <NA>                   <NA>     <NA>
## 2     <NA>   <NA>    <NA>     <NA>     <NA>                   <NA>     <NA>
## 3     <NA>   <NA>    <NA>     <NA>     <NA>                   <NA>     <NA>
## 4     <NA>   <NA>    <NA>     <NA>     <NA>                   <NA>     <NA>
## 5     <NA>   <NA>    <NA>     <NA>     <NA>                   <NA>     <NA>
## 6     <NA>   <NA>    <NA>     <NA>     <NA> (01) Blank - no change     <NA>
##   PXHRWANT PXHRRSN1 PXHRRSN2 PXHRACT1 PXHRACT2 PXHRACTT PXHRRSN3 PXHRAVL
## 1     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 2     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 3     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 4     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 5     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 6     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
##   PXLAYAVL PXLAYLK PXLAYDUR PXLAYFTO PXLKM1 PXLKAVL PXLKLL1O PXLKLL2O PXLKLWO
## 1     <NA>    <NA>     <NA>     <NA>   <NA>    <NA>     <NA>     <NA>    <NA>
## 2     <NA>    <NA>     <NA>     <NA>   <NA>    <NA>     <NA>     <NA>    <NA>
## 3     <NA>    <NA>     <NA>     <NA>   <NA>    <NA>     <NA>     <NA>    <NA>
## 4     <NA>    <NA>     <NA>     <NA>   <NA>    <NA>     <NA>     <NA>    <NA>
## 5     <NA>    <NA>     <NA>     <NA>   <NA>    <NA>     <NA>     <NA>    <NA>
## 6     <NA>    <NA>     <NA>     <NA>   <NA>    <NA>     <NA>     <NA>    <NA>
##   PXLKDUR PXLKFTO               PXDWWNTO                PXDWRSN
## 1    <NA>    <NA>                   <NA>                   <NA>
## 2    <NA>    <NA>                   <NA>                   <NA>
## 3    <NA>    <NA>                   <NA>                   <NA>
## 4    <NA>    <NA>                   <NA>                   <NA>
## 5    <NA>    <NA>                   <NA>                   <NA>
## 6    <NA>    <NA> (01) Blank - no change (01) Blank - no change
##                  PXDWLKO                 PXDWWK                PXDW4WK
## 1                   <NA>                   <NA>                   <NA>
## 2                   <NA>                   <NA>                   <NA>
## 3                   <NA>                   <NA>                   <NA>
## 4                   <NA>                   <NA>                   <NA>
## 5                   <NA>                   <NA>                   <NA>
## 6 (01) Blank - no change (01) Blank - no change (01) Blank - no change
##                 PXDWLKWK                PXDWAVL                PXDWAVR
## 1                   <NA>                   <NA>                   <NA>
## 2                   <NA>                   <NA>                   <NA>
## 3                   <NA>                   <NA>                   <NA>
## 4                   <NA>                   <NA>                   <NA>
## 5                   <NA>                   <NA>                   <NA>
## 6 (01) Blank - no change (01) Blank - no change (01) Blank - no change
##                  PXJHWKO                PXJHRSN               PXJHWANT PXIO1COW
## 1                   <NA>                   <NA>                   <NA>     <NA>
## 2                   <NA>                   <NA>                   <NA>     <NA>
## 3                   <NA>                   <NA>                   <NA>     <NA>
## 4                   <NA>                   <NA>                   <NA>     <NA>
## 5                   <NA>                   <NA>                   <NA>     <NA>
## 6 (01) Blank - no change (01) Blank - no change (01) Blank - no change     <NA>
##   PXIO1ICD PXIO1OCD PXIO2COW PXIO2ICD PXIO2OCD PXERNUOT PXERNPER PXERNH1O
## 1     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 2     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 3     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 4     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 5     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 6     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
##   PXERNHRO PXERN               PXPDEMP2               PXNMEMP2 PXERNWKP PXERNRT
## 1     <NA>  <NA>                   <NA>                   <NA>     <NA>    <NA>
## 2     <NA>  <NA>                   <NA>                   <NA>     <NA>    <NA>
## 3     <NA>  <NA>                   <NA>                   <NA>     <NA>    <NA>
## 4     <NA>  <NA>                   <NA>                   <NA>     <NA>    <NA>
## 5     <NA>  <NA>                   <NA>                   <NA>     <NA>    <NA>
## 6     <NA>  <NA> (00) Value - no change (00) Value - no change     <NA>    <NA>
##   PXERNHRY PXERNH2 PXERNLAB PXERNCOV                PXNLFJH
## 1     <NA>    <NA>     <NA>     <NA>                   <NA>
## 2     <NA>    <NA>     <NA>     <NA>                   <NA>
## 3     <NA>    <NA>     <NA>     <NA>                   <NA>
## 4     <NA>    <NA>     <NA>     <NA>                   <NA>
## 5     <NA>    <NA>     <NA>     <NA>                   <NA>
## 6     <NA>    <NA>     <NA>     <NA> (01) Blank - no change
##                 PXNLFRET               PXNLFACT PXSCHENR PXSCHFT PXSCHLVL
## 1                   <NA>                   <NA>     <NA>    <NA>     <NA>
## 2                   <NA>                   <NA>     <NA>    <NA>     <NA>
## 3                   <NA>                   <NA>     <NA>    <NA>     <NA>
## 4                   <NA>                   <NA>     <NA>    <NA>     <NA>
## 5                   <NA>                   <NA>     <NA>    <NA>     <NA>
## 6 (01) Blank - no change (01) Blank - no change     <NA>    <NA>     <NA>
##   QSTNUM OCCURNUM PEDIPGED PEHGCOMP PECYC               PXDIPGED
## 1      1        1     <NA>     <NA>  <NA>                   <NA>
## 2      2        1     <NA>     <NA>  <NA>                   <NA>
## 3      3        1     <NA>     <NA>  <NA>                   <NA>
## 4      4        1     <NA>     <NA>  <NA>                   <NA>
## 5      5        1     <NA>     <NA>  <NA>                   <NA>
## 6      6        1     <NA>     <NA>  <NA> (01) Blank - no change
##                 PXHGCOMP                  PXCYC PWCMPWGT PEIO1ICD PEIO1OCD
## 1                   <NA>                   <NA>        0       NA       NA
## 2                   <NA>                   <NA>        0       NA       NA
## 3                   <NA>                   <NA>        0       NA       NA
## 4                   <NA>                   <NA>        0       NA       NA
## 5                   <NA>                   <NA>        0       NA       NA
## 6 (01) Blank - no change (01) Blank - no change 17928473       NA       NA
##   PEIO2ICD PEIO2OCD PRIMIND1 PRIMIND2 PEAFWHN1 PEAFWHN2 PEAFWHN3 PEAFWHN4
## 1       NA       NA     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 2       NA       NA     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 3       NA       NA     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 4       NA       NA     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 5       NA       NA     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 6       NA       NA     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
##                 PXAFEVER PELNDAD PELNMOM PEDADTYP PEMOMTYP PECOHAB
## 1                   <NA>      NA      NA     <NA>     <NA>      NA
## 2                   <NA>      NA      NA     <NA>     <NA>      NA
## 3                   <NA>      NA      NA     <NA>     <NA>      NA
## 4                   <NA>      NA      NA     <NA>     <NA>      NA
## 5                   <NA>      NA      NA     <NA>     <NA>      NA
## 6 (00) Value - no change      NA      NA     <NA>     <NA>      NA
##               PXLNDAD             PXLNMOM               PXDADTYP
## 1                <NA>                <NA>                   <NA>
## 2                <NA>                <NA>                   <NA>
## 3                <NA>                <NA>                   <NA>
## 4                <NA>                <NA>                   <NA>
## 5                <NA>                <NA>                   <NA>
## 6 (50) Value to blank (50) Value to blank (01) Blank - no change
##                 PXMOMTYP                PXCOHAB PEDISEAR PEDISEYE PEDISREM
## 1                   <NA>                   <NA>     <NA>     <NA>     <NA>
## 2                   <NA>                   <NA>     <NA>     <NA>     <NA>
## 3                   <NA>                   <NA>     <NA>     <NA>     <NA>
## 4                   <NA>                   <NA>     <NA>     <NA>     <NA>
## 5                   <NA>                   <NA>     <NA>     <NA>     <NA>
## 6 (01) Blank - no change (01) Blank - no change   (2) No   (2) No   (2) No
##   PEDISPHY PEDISDRS PEDISOUT PRDISFLG               PXDISEAR
## 1     <NA>     <NA>     <NA>     <NA>                   <NA>
## 2     <NA>     <NA>     <NA>     <NA>                   <NA>
## 3     <NA>     <NA>     <NA>     <NA>                   <NA>
## 4     <NA>     <NA>     <NA>     <NA>                   <NA>
## 5     <NA>     <NA>     <NA>     <NA>                   <NA>
## 6   (2) No   (2) No   (2) No   (2) No (00) Value - no change
##                 PXDISEYE               PXDISREM               PXDISPHY
## 1                   <NA>                   <NA>                   <NA>
## 2                   <NA>                   <NA>                   <NA>
## 3                   <NA>                   <NA>                   <NA>
## 4                   <NA>                   <NA>                   <NA>
## 5                   <NA>                   <NA>                   <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change
##                 PXDISDRS               PXDISOUT                        HXFAMINC
## 1                   <NA>                   <NA>          (01) Blank - no change
## 2                   <NA>                   <NA>          (01) Blank - no change
## 3                   <NA>                   <NA>          (01) Blank - no change
## 4                   <NA>                   <NA>          (01) Blank - no change
## 5                   <NA>                   <NA>          (01) Blank - no change
## 6 (00) Value - no change (00) Value - no change (43) Refused to allocated value
##   PRDASIAN PEPDEMP1 PTNMEMP1 PEPDEMP2 PTNMEMP2 PECERT1 PECERT2 PECERT3
## 1     <NA>     <NA>       NA     <NA>       NA    <NA>    <NA>    <NA>
## 2     <NA>     <NA>       NA     <NA>       NA    <NA>    <NA>    <NA>
## 3     <NA>     <NA>       NA     <NA>       NA    <NA>    <NA>    <NA>
## 4     <NA>     <NA>       NA     <NA>       NA    <NA>    <NA>    <NA>
## 5     <NA>     <NA>       NA     <NA>       NA    <NA>    <NA>    <NA>
## 6     <NA>     <NA>       NA     <NA>       NA  (2) No    <NA>    <NA>
##                  PXCERT1                PXCERT2                PXCERT3 PEC1Q1A
## 1                   <NA>                   <NA>                   <NA>    <NA>
## 2                   <NA>                   <NA>                   <NA>    <NA>
## 3                   <NA>                   <NA>                   <NA>    <NA>
## 4                   <NA>                   <NA>                   <NA>    <NA>
## 5                   <NA>                   <NA>                   <NA>    <NA>
## 6 (00) Value - no change (00) Value - no change (00) Value - no change    <NA>
##   PTC1Q1B PEC1Q2A PTC1Q2B PEC1Q3A PTC1Q3B PEC1Q4A PTC1Q4B PEC1Q5A PTC1Q5B
## 1      NA    <NA>      NA    <NA>      NA    <NA>      NA    <NA>      NA
## 2      NA    <NA>      NA    <NA>      NA    <NA>      NA    <NA>      NA
## 3      NA    <NA>      NA    <NA>      NA    <NA>      NA    <NA>      NA
## 4      NA    <NA>      NA    <NA>      NA    <NA>      NA    <NA>      NA
## 5      NA    <NA>      NA    <NA>      NA    <NA>      NA    <NA>      NA
## 6      NA    <NA>      NA    <NA>      NA    <NA>      NA    <NA>      NA
##   PEC1Q6A PTC1Q6B PEC1Q7A PTC1Q7B PEC1Q8A PTC1Q8B PEC1Q9A PEC1Q10A PTC1Q10B
## 1    <NA>      NA    <NA>      NA    <NA>      NA    <NA>     <NA>       NA
## 2    <NA>      NA    <NA>      NA    <NA>      NA    <NA>     <NA>       NA
## 3    <NA>      NA    <NA>      NA    <NA>      NA    <NA>     <NA>       NA
## 4    <NA>      NA    <NA>      NA    <NA>      NA    <NA>     <NA>       NA
## 5    <NA>      NA    <NA>      NA    <NA>      NA    <NA>     <NA>       NA
## 6    <NA>      NA    <NA>      NA    <NA>      NA    <NA>     <NA>       NA
##   PEC1Q11A PEC1Q12A PEC1Q13A PEC1Q14A PTC1Q14B PEC1Q15A PEC1Q15B PEC1Q15C
## 1     <NA>     <NA>     <NA>     <NA>       NA     <NA>     <NA>     <NA>
## 2     <NA>     <NA>     <NA>     <NA>       NA     <NA>     <NA>     <NA>
## 3     <NA>     <NA>     <NA>     <NA>       NA     <NA>     <NA>     <NA>
## 4     <NA>     <NA>     <NA>     <NA>       NA     <NA>     <NA>     <NA>
## 5     <NA>     <NA>     <NA>     <NA>       NA     <NA>     <NA>     <NA>
## 6     <NA>     <NA>     <NA>     <NA>       NA     <NA>     <NA>     <NA>
##   PEC1Q16A PEC1Q16B PEC1Q16C PEC1Q16D PEC1Q16E PEC1Q17A PEC1Q18A PEC2Q1A
## 1     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 2     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 3     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 4     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 5     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
## 6     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>    <NA>
##   PEC2Q1B PEC2Q1C PEC2Q1D PEC2Q1E PEC2Q1F PEC2Q1G PEC2Q2A PEC2Q2B PEC2Q2C
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEC2Q2D PEC2Q2E PEC2Q2F PEC2Q3A PEC2Q3B PEC2Q3C PEC2Q3D PEC2Q3E PEC2Q3F
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEC2Q3G PEC2Q3H PEC2Q3I PEC2Q4A PEC2Q4B PEC2Q4C PEC2Q4D PEC2Q4E PEC2Q4F
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEC2Q4G PEC2Q4H PEC2Q4I PEMAQ1A PEMAQ1B PEMAQ1C PEMAQ1D PEMAQ1E PEMAQ1F
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMAQ1G PEMAQ1H PEMAQ1I PEMAQ1J PEMAQ2A PEMAQ2B PEMAQ2C PEMAQ2D PEMAQ2E
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMAQ2F PEMAQ2G PEMAQ2H PEMAQ2I PEMAQ2J PEMAQ3A PEMAQ3B PEMAQ4A PEMAQ4B
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMAQ4C PEMAQ4D PEMBQ1A PEMBQ1B PEMBQ1C PEMBQ1AA PEMBQ1BB PEMBQ1CC PEMBQ1D
## 1    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>     <NA>     <NA>     <NA>    <NA>
##   PEMBQ1E PEMBQ1DD PEMBQ1F PEMBQ2A PEMBQ2B PEMBQ2C PEMBQ2CC PEMBQ2D PEMBQ2E
## 1    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 2    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 3    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 4    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 5    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
## 6    <NA>     <NA>    <NA>    <NA>    <NA>    <NA>     <NA>    <NA>    <NA>
##   PEMBQ3A PEMBQ3B PEMBQ3C PEMBQ3D PEMBQ3E PEMBQ3F PEMBQ3G PEMBQ4A PEMBQ4B
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMBQ4C PEMBQ4D PEMBQ4E PEMBQ4F PEMBQ4G PEMBQ5 PEMBQ6 PEMCQ1A PEMCQ1B PEMCQ1C
## 1    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>    <NA>
##   PEMCQ1D PEMCQ1E PEMCQ1F PEMCQ1G PEMCQ1H PEMCQ1I PEMCQ2A PEMCQ2B PEMCQ2C
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMCQ2D PEMCQ2E PEMCQ2F PEMCQ2G PEMCQ2H PEMCQ2I PEMCQ3A PEMCQ3B PEMCQ3C
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMCQ3D PEMCQ3E PEMCQ3F PEMCQ3G PEMCQ4A PEMCQ4B PEMCQ4C PEMCQ4D PEMCQ4E
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMCQ4F PEMCQ4G PEMCQ5 PEMCQ6 PEMCQ7 PEMCQ8 PEMCQ9A PEMCQ9B PEMCQ9C PEMCQ9D
## 1    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMCQ9E PEMCQ9F PEMCQ9G PEMCQ10 PEMCQ11 PEMDQ1A PEMDQ1B PEMDQ1C PEMDQ1D
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMDQ1E PEMDQ1F PEMDQ1G PEMDQ1H PEMDQ1I PEMDQ1J PEMDQ1K PEMDQ2F PEMDQ2G
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMDQ2H PEMDQ2I PEMDQ2J PEMDQ2K PEMDQ3 PEMDQ4 PEMDQ5 PEMDQ6 PEMEQ1A PEMEQ1B
## 1    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>   <NA>   <NA>    <NA>    <NA>
##   PEMEQ1C PEMEQ1D PEMEQ1E PEMEQ1F PEMEQ1G PEMEQ2A PEMEQ2B PEMEQ2C PEMEQ2D
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMEQ2E PEMEQ2F PEMEQ2G PEMEQ3A PEMEQ3B PEMEQ3C PEMEQ3D PEMEQ3E PEMEQ3F
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
##   PEMEQ3G PEMEQ3AA PEMEQ3BB PEMEQ3CC PEMEQ3DD PEMEQ3EE PEMEQ3FF PEMEQ3GG
## 1    <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 2    <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 3    <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 4    <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 5    <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
## 6    <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
##   PEMEQ4A PEMEQ4B PEMEQ4C PEMEQ4D PEMEQ4E PEMEQ4F PEMEQ4G PEMEQ5 PEMEQ6A
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>    <NA>
##   PEMEQ6B PEMEQ6C PEMEQ7A PEMEQ7B PEMEQ7C PEMEQ7D PEMEQ8 PEMEQ9 PEMEQ10 PEMEQ11
## 1    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>
## 2    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>
## 3    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>
## 4    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>
## 5    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>
## 6    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>   <NA>   <NA>    <NA>    <NA>
##   PEMEQ12 PUNXTPR3 PRINTFLG HENELGSCR PESPELIG PWSUPWGT AGEGROUP2 EDUGROUP
## 1    <NA>     <NA>     <NA>        NA       NA        0         0        0
## 2    <NA>     <NA>     <NA>        NA       NA        0         0        0
## 3    <NA>     <NA>     <NA>        NA       NA        0         0        0
## 4    <NA>     <NA>     <NA>        NA       NA        0         0        0
## 5    <NA>     <NA>     <NA>        NA       NA        0         0        0
## 6    <NA>     <NA>     <NA>        NA       NA        0         6        1
##   RACEHIS2
## 1        0
## 2        0
## 3        0
## 4        0
## 5        0
## 6        1

Variable Selection

The variables used in this analysis are described in the table below. Their selection is grounded in Bourdieu’s Distinction. The chosen variables are meant to capture two things: the specific characteristics of respondents’ taste in cultural goods, and their socio-economic background (CPS variables).

vars_to_keep <- c(
  "PRPERTYP", # person type
  "HRINTSTA", # interview status - later filtered to completed interviews (code 1)
  # social class
  # cultural capital
  "PEEDUCA",  # education level
  # economic capital
  "HEFAMINC", # family income
  "PRMJOCC1", # occupation
  "HETENURE", # housing tenure - whether the living quarters are owned or rented
  # demographic variables
  "GTMETSTA", # metropolitan status
  "PRTAGE",   # age
  "PESEX",    # sex
  "PTDTRACE", # race
  # highbrow culture
  "PEC1Q3A",  # classical music
  "PEC1Q4A",  # opera
  "PEC1Q7A",  # ballet
  "PEC1Q10A", # art museum or gallery
  "PEC1Q1A",  # jazz
  "PEC1Q15B", # poetry
  # middlebrow and popular culture
  "PEC1Q2A",  # Latin, Spanish or salsa music
  "PEC1Q8A",  # live dance (non-ballet)
  "PEC1Q9A",  # other live performances
  "PEC1Q11A", # crafts fair or visual arts festival
  "PEC1Q12A", # outdoor festival with performing artists
  "PEC1Q14A", # any books
  "PEC1Q17A", # audiobooks
  "PEC1Q13A", # cultural tourism - historic parks, monuments and buildings
  "PEC1Q5A",  # musical stage play
  "PEC1Q6A"   # non-musical stage play
)

# selecting the variables
sppa1 <- sppa %>%
  dplyr::select(all_of(vars_to_keep))

# table - variable description

# reference table with codes and categories
vars_structure <- tribble(
  ~Variable_Code, ~Category,
  "PRPERTYP",    "Filter",
  "HRINTSTA",    "Filter",
  "PEEDUCA",     "Cultural Capital (Education)",
  "HEFAMINC",    "Economic Capital (Income)",
  "PRMJOCC1",    "Social Class (Occupation)",
  "HETENURE",    "Economic Capital (Housing)",
  "GTMETSTA",    "Demographics",
  "PRTAGE",      "Demographics",
  "PESEX",       "Demographics",
  "PTDTRACE",    "Demographics",
  "PEC1Q1A",     "Highbrow Culture",
  "PEC1Q3A",     "Highbrow Culture",
  "PEC1Q4A",     "Highbrow Culture",
  "PEC1Q7A",     "Highbrow Culture",
  "PEC1Q10A",    "Highbrow Culture",
  "PEC1Q15B",    "Highbrow Culture",
  "PEC1Q2A",     "Middlebrow/Popular Culture",
  "PEC1Q5A",     "Middlebrow/Popular Culture", 
  "PEC1Q6A",     "Middlebrow/Popular Culture",
  "PEC1Q8A",     "Middlebrow/Popular Culture",
  "PEC1Q11A",    "Middlebrow/Popular Culture",
  "PEC1Q12A",    "Middlebrow/Popular Culture",
  "PEC1Q13A",    "Middlebrow/Popular Culture",
  "PEC1Q14A",    "Middlebrow/Popular Culture",
  "PEC1Q17A",    "Middlebrow/Popular Culture")

# extracting the official descriptions (Labels) from the loaded dataset
raw_labels <- attr(sppa1, "variable.labels")

variable_table <- vars_structure %>%
  mutate(
    # indexing the named label vector by code is vectorised;
    # codes without a label in the dataset come back as NA instead of an error
    Question_Description = unname(raw_labels[Variable_Code])
  )

# table
kable(variable_table, 
      col.names = c("Variable Code", "Category", "Question Description"),
      caption = "SPPA 2017: Selected Variables",
      align = "lll")

SPPA 2017: Selected Variables
Variable Code	Category	Question Description
PRPERTYP	Filter	Type of person record recode
HRINTSTA	Filter	Interview status
PEEDUCA	Cultural Capital (Education)	Highest level of school completed or degree received
HEFAMINC	Economic Capital (Income)	Family income
PRMJOCC1	Social Class (Occupation)	Major occupation recode - job 1
HETENURE	Economic Capital (Housing)	Are your living quarters…
GTMETSTA	Demographics	Metropolitan Status
PRTAGE	Demographics	Person’s age
PESEX	Demographics	Sex
PTDTRACE	Demographics	Race
PEC1Q1A	Highbrow Culture	Attended a live jazz performance in the last 12 months
PEC1Q3A	Highbrow Culture	Attended a live classical music performance in the last 12 months
PEC1Q4A	Highbrow Culture	Attended a live opera performance in the last 12 months
PEC1Q7A	Highbrow Culture	Attended a live ballet performance in the last 12 months
PEC1Q10A	Highbrow Culture	Visited art museum or gallery last 12 months
PEC1Q15B	Highbrow Culture	Read any poetry the last 12 months
PEC1Q2A	Middlebrow/Popular Culture	Attended a live Latin, Spanish, or salsa music performance in the last 12 months
PEC1Q5A	Middlebrow/Popular Culture	Attended a live musical stage play in the last 12 months
PEC1Q6A	Middlebrow/Popular Culture	Attended a live nonmusical stage play in the last 12 months
PEC1Q8A	Middlebrow/Popular Culture	Attended a live dance (non-ballet) performance in the last 12 months
PEC1Q11A	Middlebrow/Popular Culture	Visited a crafts fair or visual arts festival last 12 months
PEC1Q12A	Middlebrow/Popular Culture	Visited an outdoor festival that featured performing artists last 12 months
PEC1Q13A	Middlebrow/Popular Culture	Visited a historic park or monument or tour a building/neighborhood for historic design last 12 months
PEC1Q14A	Middlebrow/Popular Culture	Read any books during the last 12 months
PEC1Q17A	Middlebrow/Popular Culture	Listened to any audiobooks the last 12 months
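The selected variables and the description table can drift apart. Applied to the objects above, setdiff(vars_to_keep, vars_structure$Variable_Code) would return "PEC1Q9A" (other live performances): it is selected but has no row in the table. The minimal base R sketch below shows that check together with a vectorised label lookup. It uses small hypothetical stand-in vectors rather than the real SPPA objects. In the lookup, codes without a label come back as NA instead of raising an error.

```r
# Hypothetical stand-ins for the document's objects:
# selected      ~ vars_to_keep
# described     ~ vars_structure$Variable_Code
# var_labels    ~ attr(sppa1, "variable.labels")
selected   <- c("PESEX", "PEC1Q9A", "PEC1Q14A")
described  <- c("PESEX", "PEC1Q14A")
var_labels <- c(PESEX    = "Sex",
                PEC1Q14A = "Read any books during the last 12 months")

# Codes that were selected but have no row in the description table
missing_rows <- setdiff(selected, described)
print(missing_rows)

# Vectorised lookup: indexing a named vector by a name it lacks yields NA
lookup <- unname(var_labels[selected])
print(lookup)
```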

Filtering and Cleaning

First, the data are filtered using the two filter variables (PRPERTYP and HRINTSTA). A quick look at the data structure shows that almost all variables are factors, several of them with many levels. Age (PRTAGE) is the only numeric variable. This section therefore converts the variables into binary 0/1 indicators. To do so, multi-level factors are first grouped into categorical subgroups. Each resulting level is then encoded as its own 0/1 variable.

str(sppa1)

## 'data.frame':    147629 obs. of  26 variables:
##  $ PRPERTYP: Factor w/ 3 levels "(1) Child household member",..: NA NA NA NA NA 2 2 2 2 2 ...
##  $ HRINTSTA: Factor w/ 4 levels "(1) Interview",..: 3 3 3 3 3 1 1 1 1 1 ...
##  $ PEEDUCA : Factor w/ 16 levels "(31) Less than 1st grade",..: NA NA NA NA NA 4 6 4 9 10 ...
##  $ HEFAMINC: Factor w/ 16 levels "(01) Less than $5,000",..: NA NA NA NA NA 12 12 6 7 13 ...
##  $ PRMJOCC1: Factor w/ 11 levels "(01) Management, business, and financial occupations",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ HETENURE: Factor w/ 3 levels "(1) Owned or being bought by a HH member",..: NA NA NA NA NA 1 1 1 1 2 ...
##  $ GTMETSTA: Factor w/ 3 levels "(1) Metropolitan",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ PRTAGE  : num  NA NA NA NA NA 73 85 72 70 22 ...
##   ..- attr(*, "value.labels")= Named num(0) 
##   .. ..- attr(*, "names")= chr(0) 
##  $ PESEX   : Factor w/ 2 levels "(1) Male","(2) Female": NA NA NA NA NA 2 1 2 2 2 ...
##  $ PTDTRACE: Factor w/ 26 levels "(01) White Only",..: NA NA NA NA NA 1 1 1 1 1 ...
##  $ PEC1Q3A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q4A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q7A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q10A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q1A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q15B: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q2A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q8A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q9A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q11A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q12A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q14A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q17A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q13A: Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q5A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  $ PEC1Q6A : Factor w/ 2 levels "(1) Yes","(2) No": NA NA NA NA NA NA NA NA NA NA ...
##  - attr(*, "variable.labels")= Named chr [1:639] "ICPSR Case Identification Number" "Household Identifier" "Month of Interview" "Year of Interview" ...
##   ..- attr(*, "names")= chr [1:639] "CASEID" "HRHHID" "HRMONTH" "HRYEAR4" ...
##  - attr(*, "codepage")= int 28591

Since each factor level begins with its numeric code enclosed in parentheses, these codes are matched with regular expressions, and the variables are recoded into categories according to the SPPA codebook. Only completed interviews with adult respondents (aged 18 or over) are retained.

library(dplyr)
library(stringr)

sppa_converted <- sppa1 %>%
  # convert all columns to text, then keep completed interviews of adults
  mutate(across(everything(), as.character)) %>%
  filter(
    str_detect(PRPERTYP, "\\(2\\)"), # person type 2 - adult
    str_detect(HRINTSTA, "\\(1\\)"), # survey type 1 - interview
    as.numeric(PRTAGE) >= 18) %>%
  mutate(
    # socio-economic variables
    # education
    Education = case_when(
      str_detect(PEEDUCA, "\\(3[1-8]\\)") ~ "No_Diploma",      # < HS
      str_detect(PEEDUCA, "\\(39\\)") ~ "HighSchool",          # HS 
      str_detect(PEEDUCA, "\\(4[0-2]\\)") ~ "SomeCollege",     # college, no degree 
      str_detect(PEEDUCA, "\\(43\\)") ~ "Bachelor",            # bachelor
      str_detect(PEEDUCA, "\\(44\\)") ~ "Master",              # master
      str_detect(PEEDUCA, "\\(4[5-6]\\)") ~ "PhD_Prof",        # phd/prof
      TRUE ~ NA_character_),
    # family income
    Income = case_when(
      str_detect(HEFAMINC, "\\(0[1-7]\\)") ~ "Income_Low",            # < $25k
      str_detect(HEFAMINC, "\\(0[8-9]\\)|\\(1[0-1]\\)") ~ "Income_LowerMid", # $25k-$50k 
      str_detect(HEFAMINC, "\\(1[2-3]\\)") ~ "Income_Middle",         # $50k-$75k 
      str_detect(HEFAMINC, "\\(1[4-5]\\)") ~ "Income_UpperMid",       # $75k-$150k 
      str_detect(HEFAMINC, "\\(16\\)") ~ "Income_High",              # > $150k 
      TRUE ~ NA_character_),
    # occupation
    Job = case_when(
      str_detect(PRMJOCC1, "\\(01\\)") ~ "Management",        
      str_detect(PRMJOCC1, "\\(02\\)") ~ "Professional",     
      str_detect(PRMJOCC1, "\\(03\\)|\\(05\\)") ~ "Service_and_Administartion", 
      str_detect(PRMJOCC1, "\\(04\\)") ~ "Sales",           
      str_detect(PRMJOCC1, "\\(0[6-9]\\)|\\(1[0-1]\\)") ~ "Manual",  
      is.na(PRMJOCC1) ~ "Unemployed",  # NA: no occupation recorded (e.g. respondents outside the labor force)
      TRUE ~ "Job_Other"),
    # gender 
    Sex = if_else(str_detect(PESEX, "\\(2\\)"), "Female", "Male"),
    # race
    Race = case_when(
      str_detect(PTDTRACE, "\\(01\\)") ~ "White",
      str_detect(PTDTRACE, "\\(02\\)") ~ "Black",
      str_detect(PTDTRACE, "\\(04\\)") ~ "Asian",
      str_detect(PTDTRACE, "\\(03\\)") ~ "NativeAm", 
      TRUE ~ "Mixed"),  # catch-all; note that (05) Hawaiian/Pacific Islander only also lands here
    # housing and location
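    # every tenure code other than (1) owned is treated as renting
    Housing = if_else(str_detect(HETENURE, "\\(1\\)"), "House_Owner", "House_Renter"),
    # every non-metropolitan GTMETSTA code is labelled "Rural"
    Location = if_else(str_detect(GTMETSTA, "\\(1\\)"), "Metropolitan", "Rural"),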
    # splitting numeric age into groups
    Age_Group = cut(as.numeric(PRTAGE), 
                    breaks = c(17, 29, 49, 64, 100),
                    labels = c("18-29", "30-49", "50-64", "65+")),
    # cultural variables 
    Opera = str_detect(PEC1Q4A, "\\(1\\)"),
    Classical_Music = str_detect(PEC1Q3A, "\\(1\\)"),
    Ballet = str_detect(PEC1Q7A, "\\(1\\)"),
    Art_Museum = str_detect(PEC1Q10A, "\\(1\\)"),
    Jazz = str_detect(PEC1Q1A, "\\(1\\)"),
    Musical = str_detect(PEC1Q5A, "\\(1\\)"),
    Theater = str_detect(PEC1Q6A, "\\(1\\)"),
    Sightseeing = str_detect(PEC1Q13A, "\\(1\\)"),
    Books = str_detect(PEC1Q14A, "\\(1\\)"),
    Poetry = str_detect(PEC1Q15B, "\\(1\\)"), 
    Latin_Music = str_detect(PEC1Q2A, "\\(1\\)"), 
    Live_Dance = str_detect(PEC1Q8A, "\\(1\\)"), 
    Crafts_Fair = str_detect(PEC1Q11A, "\\(1\\)"), 
    Outdoor_Festival = str_detect(PEC1Q12A, "\\(1\\)"),
    Audiobook = str_detect(PEC1Q17A, "\\(1\\)")
  ) %>%
  # selection and cleaning
  dplyr::select(Education, Income, Job, Race, Sex, Housing, Location, Age_Group,
         Opera, Classical_Music, Ballet, Art_Museum, Jazz, Musical, Theater, Sightseeing,
         Books, Poetry, Latin_Music, Live_Dance, Crafts_Fair, Outdoor_Festival, Audiobook)
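The regular expressions above match each parenthesised code directly. An alternative, sketched below under the assumption that every label starts with a parenthesised integer code (as in the str() output), is to extract the code once as an integer and bin it with cut(). The helpers extract_code() and income_group() are hypothetical, and apart from "(01) Less than $5,000" the example labels are placeholders rather than codebook text.

```r
# Hypothetical helper: pull the leading "(NN)" code out of an SPPA/CPS label
# and return it as an integer (NA when no leading code is present).
extract_code <- function(labels) {
  labels <- as.character(labels)
  has_code <- grepl("^\\([0-9]+\\)", labels)
  codes <- sub("^\\(([0-9]+)\\).*$", "\\1", labels)
  out <- rep(NA_integer_, length(labels))
  out[has_code] <- as.integer(codes[has_code])
  out
}

# Hypothetical helper mirroring the Income case_when() above:
# 1-7 low, 8-11 lower-middle, 12-13 middle, 14-15 upper-middle, 16 high.
# cut() uses right-closed intervals (a, b].
income_group <- function(codes) {
  cut(codes,
      breaks = c(0, 7, 11, 13, 15, 16),
      labels = c("Income_Low", "Income_LowerMid", "Income_Middle",
                 "Income_UpperMid", "Income_High"))
}

# The first label comes from the str() output; the others are placeholders.
labels <- c("(01) Less than $5,000", "(08) placeholder", "(16) placeholder", NA)
codes <- extract_code(labels)
codes                                # 1 8 16 NA
as.character(income_group(codes))    # "Income_Low" "Income_LowerMid" "Income_High" NA
```

Working with integer codes makes range conditions such as "codes 8 to 11" explicit, instead of encoding them in character classes like `\\(0[8-9]\\)|\\(1[0-1]\\)`.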

glimpse(sppa_converted)

## Rows: 97,201
## Columns: 23
## $ Education        <chr> "No_Diploma", "No_Diploma", "No_Diploma", "HighSchool…
## $ Income           <chr> "Income_Middle", "Income_Middle", "Income_Low", "Inco…
## $ Job              <chr> "Unemployed", "Unemployed", "Unemployed", "Unemployed…
## $ Race             <chr> "White", "White", "White", "White", "White", "White",…
## $ Sex              <chr> "Female", "Male", "Female", "Female", "Female", "Male…
## $ Housing          <chr> "House_Owner", "House_Owner", "House_Owner", "House_O…
## $ Location         <chr> "Metropolitan", "Metropolitan", "Metropolitan", "Metr…
## $ Age_Group        <fct> 65+, 65+, 65+, 65+, 18-29, 18-29, 30-49, 50-64, 30-49…
## $ Opera            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Classical_Music  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Ballet           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Art_Museum       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Jazz             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Musical          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Theater          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Sightseeing      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Books            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Poetry           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Latin_Music      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Live_Dance       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Crafts_Fair      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Outdoor_Festival <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Audiobook        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

A glimpse into the converted dataset reveals a large number of missing values. The share of missing values per variable is computed below to decide whether variables or observations should be eliminated.

# missing values analysis 

missings <- sppa_converted %>%
  summarize(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), 
               names_to = "variable", 
               values_to = "missings") %>%
  mutate(missings_share = missings/nrow(sppa_converted)) %>%
  arrange(desc(missings_share))
print(missings)

## # A tibble: 23 × 3
##    variable         missings missings_share
##    <chr>               <int>          <dbl>
##  1 Audiobook           88609          0.912
##  2 Poetry              88602          0.912
##  3 Books               88567          0.911
##  4 Sightseeing         88521          0.911
##  5 Outdoor_Festival    88503          0.911
##  6 Crafts_Fair         88497          0.910
##  7 Art_Museum          88490          0.910
##  8 Live_Dance          88465          0.910
##  9 Ballet              88454          0.910
## 10 Theater             88454          0.910
## # ℹ 13 more rows

None of the variables derived from the CPS (the demographic and socio-economic data) contain missing values, whereas every variable measuring the respondent's participation in the arts is missing for over 90% of the observations. While this might seem alarming, it follows from the design of the SPPA: as described previously, the SPPA supplement to the CPS includes five modules designed to capture other types of arts participation, and each respondent is randomly assigned to one of the two core modules and, subsequently, to two of the five additional modules. The missing values are therefore planned. To address this, I focus on the respondents with no missing values in Module A, from which most of the variables are derived. The only variable taken from Module B is Poetry; the number of observations it shares with the Module A variables is displayed in the Venn diagram below.

venn_diag <- list(
  Core_A = which(!is.na(sppa1$PEC1Q4A)),
  Poetry_B = which(!is.na(sppa1$PEC1Q15B)))

ggVennDiagram(venn_diag, set_color = "mediumpurple4") +
  scale_fill_gradient(low = "lavender", high = "mediumpurple3")

The Venn diagram illustrates that retaining the only Module B variable costs a minimal number of observations. The remaining variables, whose coverage also varies slightly, all belong to Module A, so observations with missing values are removed row-wise; there is no methodological basis for eliminating any of the variables instead.

sppa_clean <- sppa_converted %>%
  drop_na()
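Row-wise deletion keeps exactly the respondents observed on every retained variable, that is, the intersection of the per-variable sets of observed rows that the Venn diagram visualises. A minimal base-R sketch, assuming a hypothetical five-respondent data frame in which Opera stands in for a Module A item and Poetry for the Module B item:

```r
# Hypothetical toy data; NA marks "question not asked in this module".
toy <- data.frame(
  Opera  = c(TRUE, FALSE, NA, TRUE, NA),
  Poetry = c(FALSE, NA, NA, TRUE, TRUE)
)

# Rows observed in each module (the two sets of the Venn diagram)
core_a   <- which(!is.na(toy$Opera))
poetry_b <- which(!is.na(toy$Poetry))

# complete.cases() is the base-R counterpart of tidyr::drop_na() on all columns
kept <- which(complete.cases(toy))

kept                                         # rows 1 and 4
setequal(kept, intersect(core_a, poetry_b))  # TRUE
```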

Subsequently, to prepare the data for association rule mining, each level of the factor variables is placed in its own column and assigned a binary value: 1 if the observation has the characteristic described by that column, 0 otherwise. In other words, this section proceeds with the discretization (strictly, the binarization) of the variables. For this purpose, the arules package is used: coercing the data frame with as(..., "transactions") automatically turns every factor level into an item and every logical column into a single item that is present when the value is TRUE.

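library(arules)

sppa_prepared <- sppa_clean %>%
  # arules expects factors (one item per level) or logicals (one item for TRUE)
  mutate(across(where(is.character), as.factor)) %>%
  # the cultural columns are already logical; this is only a safeguard
  mutate(across(c(Opera:Poetry), as.logical)) %>%
  drop_na()

# coerce to a sparse transactions object
sppa_t <- as(sppa_prepared, "transactions")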
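For readers unfamiliar with arules, the item layout produced by this coercion can be mimicked in base R: each factor level becomes a column labelled Variable=Level, and each logical column becomes a single item named after the variable. The helper to_item_matrix() below is hypothetical and not part of arules; unlike as(), it returns a dense logical matrix rather than a sparse itemMatrix, and the toy data frame is illustrative only.

```r
# Hypothetical helper mimicking the item layout of as(df, "transactions"):
# factors -> one column per level, logicals -> one column for the TRUE item.
to_item_matrix <- function(df) {
  cols <- lapply(names(df), function(v) {
    x <- df[[v]]
    if (is.factor(x)) {
      m <- sapply(levels(x), function(l) !is.na(x) & x == l)
      m <- matrix(m, nrow = length(x))   # keep matrix shape even for one row
      colnames(m) <- paste0(v, "=", levels(x))
      m
    } else if (is.logical(x)) {
      m <- matrix(!is.na(x) & x, ncol = 1)
      colnames(m) <- v
      m
    } else {
      stop("column ", v, " must be a factor or logical")
    }
  })
  do.call(cbind, cols)
}

toy <- data.frame(
  Sex   = factor(c("Female", "Male", "Female")),
  Opera = c(FALSE, TRUE, FALSE)
)
m <- to_item_matrix(toy)
colnames(m)   # "Sex=Female" "Sex=Male" "Opera"
rowSums(m)    # items per transaction: 1 2 1
```

As in the itemInfo() output, FALSE values of a logical column produce no item at all, which is why transaction lengths vary only with the number of cultural activities.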

Exploratory Analysis

Prior to the application of association rule mining algorithms, it is essential to understand and further inspect the nature and structure of the transactional data and to ensure its proper transformation and discretization. This section serves that purpose.

The final dataset consists of 8,498 observations, each corresponding to a distinct adult respondent of the CPS supplemented with the SPPA, conducted by the United States Census Bureau in collaboration with the National Endowment for the Arts in July 2017, the last available year of the SPPA unaffected by the COVID-19 pandemic. The filtered and transformed dataset describes the respondents' socio-economic and demographic background in 8 variables, subdivided into factor levels, and their participation in public art and other forms of leisure in 15 binary variables, one per form of participation. The discretization of the dataset yields 47 columns (32 demographic items and 15 cultural items) capturing the socio-economic and cultural capital of the respondents.

Applying summary() to the discretized dataset yields further information on the data structure. The density of the data matrix is 0.224 (on average, 10.54 of the 47 items are present in each transaction), which indicates a relatively dense dataset compared with standard applications of association rule mining such as basket analysis. High density suggests that association rule learning algorithms may generate a large number of rules; moreover, since the most frequent items are demographic, many of these rules are likely to be trivial. Because every transaction contains exactly one level of each of the 8 demographic variables, the transaction length minus 8 equals the number of cultural activities reported. The most common length is 8 items, which means that 2,422 respondents did not take part in any of the 15 surveyed activities. While this finding needs further inspection, it does not render the analysis redundant: the absence of participation is in itself an interesting phenomenon, potentially dictated by the socio-economic status of the individual. The most frequent items are Race=White (7,043 occurrences), Location=Metropolitan, Housing=House_Owner and Books, which together may suggest that a large share of the respondents belong to the wealthier strata.

print(itemInfo(sppa_t))

##                            labels        variables                     levels
## 1              Education=Bachelor        Education                   Bachelor
## 2            Education=HighSchool        Education                 HighSchool
## 3                Education=Master        Education                     Master
## 4            Education=No_Diploma        Education                 No_Diploma
## 5              Education=PhD_Prof        Education                   PhD_Prof
## 6           Education=SomeCollege        Education                SomeCollege
## 7              Income=Income_High           Income                Income_High
## 8               Income=Income_Low           Income                 Income_Low
## 9          Income=Income_LowerMid           Income            Income_LowerMid
## 10           Income=Income_Middle           Income              Income_Middle
## 11         Income=Income_UpperMid           Income            Income_UpperMid
## 12                 Job=Management              Job                 Management
## 13                     Job=Manual              Job                     Manual
## 14               Job=Professional              Job               Professional
## 15                      Job=Sales              Job                      Sales
## 16 Job=Service_and_Administartion              Job Service_and_Administartion
## 17                 Job=Unemployed              Job                 Unemployed
## 18                     Race=Asian             Race                      Asian
## 19                     Race=Black             Race                      Black
## 20                     Race=Mixed             Race                      Mixed
## 21                  Race=NativeAm             Race                   NativeAm
## 22                     Race=White             Race                      White
## 23                     Sex=Female              Sex                     Female
## 24                       Sex=Male              Sex                       Male
## 25            Housing=House_Owner          Housing                House_Owner
## 26           Housing=House_Renter          Housing               House_Renter
## 27          Location=Metropolitan         Location               Metropolitan
## 28                 Location=Rural         Location                      Rural
## 29                Age_Group=18-29        Age_Group                      18-29
## 30                Age_Group=30-49        Age_Group                      30-49
## 31                Age_Group=50-64        Age_Group                      50-64
## 32                  Age_Group=65+        Age_Group                        65+
## 33                          Opera            Opera                       TRUE
## 34                Classical_Music  Classical_Music                       TRUE
## 35                         Ballet           Ballet                       TRUE
## 36                     Art_Museum       Art_Museum                       TRUE
## 37                           Jazz             Jazz                       TRUE
## 38                        Musical          Musical                       TRUE
## 39                        Theater          Theater                       TRUE
## 40                    Sightseeing      Sightseeing                       TRUE
## 41                          Books            Books                       TRUE
## 42                         Poetry           Poetry                       TRUE
## 43                    Latin_Music      Latin_Music                       TRUE
## 44                     Live_Dance       Live_Dance                       TRUE
## 45                    Crafts_Fair      Crafts_Fair                       TRUE
## 46               Outdoor_Festival Outdoor_Festival                       TRUE
## 47                      Audiobook        Audiobook                       TRUE

summary(sppa_t)

## transactions as itemMatrix in sparse format with
##  8498 rows (elements/itemsets/transactions) and
##  47 columns (items) and a density of 0.2242655 
## 
## most frequent items:
##            Race=White Location=Metropolitan   Housing=House_Owner 
##                  7043                  6666                  5691 
##                 Books            Sex=Female               (Other) 
##                  4762                  4626                 60785 
## 
## element (itemset/transaction) length distribution:
## sizes
##    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23 
## 2422 1712 1059  837  639  520  403  348  202  141   98   58   39   15    3    2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    8.00   10.00   10.54   12.00   23.00 
## 
## includes extended item information - examples:
##                 labels variables     levels
## 1   Education=Bachelor Education   Bachelor
## 2 Education=HighSchool Education HighSchool
## 3     Education=Master Education     Master
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 3             3

The plot below illustrates the distribution of the number of cultural activities per respondent. The most common outcome is no participation in any of the 15 activities (2,422 respondents, roughly 28.5% of the sample), and the number of respondents declines steadily as the number of distinct activities increases.

# each transaction holds exactly 8 demographic items, so the remainder
# counts the cultural activities reported by the respondent
activity_counts <- size(sppa_t) - 8

df_activity <- data.frame(Number_of_Activities = activity_counts)

ggplot(df_activity, aes(x = Number_of_Activities)) +
  geom_bar(fill = "mediumpurple", color = "mediumpurple4") +
  scale_x_continuous(breaks = 0:15) +
  labs(title = "Distribution of the Number of Cultural Activities Participated In",
    x = "Number of Activities",
    y = "Number of Respondents") +
  theme_minimal()

Applying inspect() to the first ten transactions confirms that the data were correctly transformed into the transactional format. Each transaction is a basket of the respondent's characteristics. The first cultural activity appears in the third transaction: a white, high-school-educated, low-income woman aged 65 or over, classified as Job=Unemployed, who owns her house in a metropolitan area and reads books in her free time.

inspect(sppa_t[1:10])

##      items                             transactionID
## [1]  {Education=No_Diploma,                         
##       Income=Income_UpperMid,                       
##       Job=Service_and_Administartion,               
##       Race=White,                                   
##       Sex=Female,                                   
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=30-49}                            1 
## [2]  {Education=HighSchool,                         
##       Income=Income_Low,                            
##       Job=Unemployed,                               
##       Race=White,                                   
##       Sex=Female,                                   
##       Housing=House_Owner,                          
##       Location=Rural,                               
##       Age_Group=65+}                              2 
## [3]  {Education=HighSchool,                         
##       Income=Income_Low,                            
##       Job=Unemployed,                               
##       Race=White,                                   
##       Sex=Female,                                   
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=65+,                                
##       Books}                                      3 
## [4]  {Education=PhD_Prof,                           
##       Income=Income_Middle,                         
##       Job=Professional,                             
##       Race=White,                                   
##       Sex=Male,                                     
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=50-64}                            4 
## [5]  {Education=HighSchool,                         
##       Income=Income_LowerMid,                       
##       Job=Unemployed,                               
##       Race=White,                                   
##       Sex=Female,                                   
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=50-64,                              
##       Outdoor_Festival}                           5 
## [6]  {Education=HighSchool,                         
##       Income=Income_LowerMid,                       
##       Job=Sales,                                    
##       Race=White,                                   
##       Sex=Female,                                   
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=65+,                                
##       Books}                                      6 
## [7]  {Education=HighSchool,                         
##       Income=Income_Low,                            
##       Job=Unemployed,                               
##       Race=White,                                   
##       Sex=Male,                                     
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=65+}                              7 
## [8]  {Education=HighSchool,                         
##       Income=Income_LowerMid,                       
##       Job=Manual,                                   
##       Race=White,                                   
##       Sex=Male,                                     
##       Housing=House_Renter,                         
##       Location=Metropolitan,                        
##       Age_Group=18-29,                              
##       Outdoor_Festival}                           8 
## [9]  {Education=No_Diploma,                         
##       Income=Income_UpperMid,                       
##       Job=Unemployed,                               
##       Race=White,                                   
##       Sex=Female,                                   
##       Housing=House_Owner,                          
##       Location=Metropolitan,                        
##       Age_Group=65+,                                
##       Crafts_Fair}                                9 
## [10] {Education=HighSchool,                         
##       Income=Income_LowerMid,                       
##       Job=Unemployed,                               
##       Race=White,                                   
##       Sex=Male,                                     
##       Housing=House_Owner,                          
##       Location=Rural,                               
##       Age_Group=50-64,                              
##       Books}                                      10

Item Frequency

The output below lists the relative frequency, or support, of every item in the dataset in decreasing order. The most frequent items are demographic, such as race and sex. The most common cultural activities are reading books (~56%), sightseeing (~30%), attending a crafts fair (~25.7%), visiting an art museum (~25%) and attending an outdoor festival (~24%). The least frequent items are the Native American (~1.1%) and mixed (~1.9%) race categories, doctoral or professional degrees (~3.2%) and exclusive cultural activities such as opera (~2.4%) and ballet (~3.5%). Identifying the relevant items and their support is crucial for choosing the minimum support parameter of the Apriori algorithm. Setting the global support threshold too high would eliminate precisely the activities that serve as class signifiers: a threshold of 0.03 keeps ballet and doctoral or professional degrees but drops opera, which survives only at 0.024 or below. Moreover, an itemset can never be more frequent than its rarest item, so any rule involving opera has a support of at most 0.024, and rules linking opera to demographic characteristics require an even lower threshold. Such low thresholds can make Apriori computationally expensive, although the relatively small size of the dataset (roughly 8,500 transactions) means this may not pose an issue.

freqs <- itemFrequency(sppa_t)
sort(freqs, decreasing = TRUE)

##                     Race=White          Location=Metropolitan 
##                     0.82878324                     0.78441986 
##            Housing=House_Owner                          Books 
##                     0.66968699                     0.56036715 
##                     Sex=Female                       Sex=Male 
##                     0.54436338                     0.45563662 
##                 Job=Unemployed           Housing=House_Renter 
##                     0.37526477                     0.33031301 
##                Age_Group=30-49                    Sightseeing 
##                     0.31842787                     0.29912921 
##           Education=HighSchool          Education=SomeCollege 
##                     0.28747941                     0.27841845 
##                  Age_Group=65+                Age_Group=50-64 
##                     0.27065192                     0.26700400 
##                    Crafts_Fair         Income=Income_LowerMid 
##                     0.25700165                     0.25347141 
##                     Art_Museum               Outdoor_Festival 
##                     0.24923511                     0.24205695 
##         Income=Income_UpperMid              Income=Income_Low 
##                     0.23487880                     0.22923041 
##                 Location=Rural             Education=Bachelor 
##                     0.21558014                     0.21299129 
##           Income=Income_Middle Job=Service_and_Administartion 
##                     0.18369028                     0.17815957 
##                        Musical                      Audiobook 
##                     0.17568840                     0.17133443 
##               Job=Professional                Age_Group=18-29 
##                     0.15380089                     0.14391622 
##                     Job=Manual                         Poetry 
##                     0.12920687                     0.12238174 
##                 Job=Management                        Theater 
##                     0.10861379                     0.10343610 
##                     Race=Black             Income=Income_High 
##                     0.10320075                     0.09872911 
##           Education=No_Diploma                Classical_Music 
##                     0.09578724                     0.09413980 
##               Education=Master                           Jazz 
##                     0.09355142                     0.09096258 
##                     Live_Dance                      Job=Sales 
##                     0.06660391                     0.05495411 
##                    Latin_Music                     Race=Asian 
##                     0.04871735                     0.03730289 
##                         Ballet             Education=PhD_Prof 
##                     0.03506707                     0.03177218 
##                          Opera                     Race=Mixed 
##                     0.02435867                     0.01941633 
##                  Race=NativeAm 
##                     0.01129678
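The consequences of different minimum support thresholds can be checked directly against the frequencies printed above. The sketch below copies the supports of the rarest items and lists which of them would survive a given threshold, assuming that an itemset counts as frequent when its support is at least the threshold. The helper names kept_at() and upper_bound_support() are hypothetical.

```r
# Relative supports of the rarest items, copied from itemFrequency(sppa_t)
rare <- c("Race=NativeAm"      = 0.01129678,
          "Race=Mixed"         = 0.01941633,
          "Opera"              = 0.02435867,
          "Education=PhD_Prof" = 0.03177218,
          "Ballet"             = 0.03506707,
          "Race=Asian"         = 0.03730289,
          "Latin_Music"        = 0.04871735)

# Items that can still appear in frequent itemsets at threshold minsup
kept_at <- function(supports, minsup) names(supports)[supports >= minsup]

for (s in c(0.05, 0.03, 0.024)) {
  cat("minsup =", s, ":", paste(kept_at(rare, s), collapse = ", "), "\n")
}

# Support is anti-monotone: an itemset is never more frequent than its
# rarest member, which gives an upper bound on the support of any rule.
upper_bound_support <- function(supports, items) min(supports[items])
upper_bound_support(rare, c("Opera", "Education=PhD_Prof"))
```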

The plot below illustrates the output above graphically, limited to the 25 most frequent items by relative frequency.

itemFrequencyPlot(sppa_t, topN = 25, type = "relative", cex.names = 0.8, main = "Top 25 Items - Frequency Plot", col = "mediumpurple")

Binary Matrix

To gain a deeper understanding of the data structure, the image() function is used to plot the binary matrix of the transactional data: each row denotes the answers of an individual respondent (a transaction) and each column represents one of the 47 items used in the analysis, describing either the socio-economic background of the respondent or their participation in public arts. The plots below show the structure of the first 25 rows of the data and of a randomly selected subsample of 100 observations. What might catch the reader's eye are the relatively dense vertical strips, representing the most frequent items (Race=White, Location=Metropolitan, Sex, etc.). This carries a crucial implication for the application of the Apriori algorithm: it will be inclined to generate a large number of trivial rules solely because some demographic items appear so frequently. Outside these dense strips, however, the matrix is relatively sparse, as shown by the empty white space. In short, the binary matrix is sparse except for the demographic columns, which implies the need to look for strong connections (high lift and confidence) among rare occurrences (low support) and hence warrants a lower support threshold.
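For reference, the measures named above have the following standard definitions for a rule X => Y over N transactions t_1, ..., t_N:

```latex
\[
\operatorname{supp}(X) = \frac{\lvert \{\, i : X \subseteq t_i \,\} \rvert}{N}, \qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\]
\[
\operatorname{lift}(X \Rightarrow Y) = \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}, \qquad
\operatorname{supp}(X \cup Y) \le \min\bigl(\operatorname{supp}(X), \operatorname{supp}(Y)\bigr).
\]
```

A lift above 1 means that X and Y co-occur more often than expected under independence, so a rare consequent such as Opera can still head a high-lift rule even though its support is low.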

Correlation

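The distinct rectangular blocks of color in the correlation matrix plot below, which appear once the items are ordered by hierarchical clustering, indicate groups of items that tend to co-occur (positive correlation) or to exclude each other (negative correlation). Some strong negative correlations are structural, because levels of the same variable are mutually exclusive (Sex=Female and Sex=Male, for instance, correlate at exactly -1). The remaining structure suggests that the items do not occur independently of one another, which motivates the application of association rule mining.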

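# logical item matrix -> numeric 0/1 indicators
binary_matrix <- as(sppa_t, "matrix") * 1

# Pearson correlation of 0/1 columns (equivalent to the phi coefficient)
M <- cor(binary_matrix)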

# correlation plot
corrplot(M, 
         method = "color",       
         type = "lower",        
         order = "hclust",
         tl.col = "black",
         tl.cex = 0.4, 
         diag = FALSE,
         col = colorRampPalette(c("red", "white", "blue"))(200),
         title = "Correlation Matrix",
         mar = c(0,0,2,0))
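The correlations plotted above are Pearson coefficients computed on 0/1 indicator columns, which for binary data coincide with the phi coefficient of the 2x2 contingency table. The short base-R check below uses hypothetical toy vectors (not SPPA data) to show this equivalence, and that two complementary levels of one variable correlate at exactly -1. The helper phi_coef() is hypothetical.

```r
# Phi coefficient of two 0/1 vectors from their 2x2 contingency table
phi_coef <- function(x, y) {
  n11 <- sum(x == 1 & y == 1); n10 <- sum(x == 1 & y == 0)
  n01 <- sum(x == 0 & y == 1); n00 <- sum(x == 0 & y == 0)
  (n11 * n00 - n10 * n01) /
    sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
}

# Hypothetical toy items for eight respondents
x <- c(1, 1, 0, 0, 1, 0, 1, 0)   # e.g. an item such as Art_Museum
y <- c(1, 0, 0, 0, 1, 0, 1, 1)   # e.g. an item such as Theater
phi_coef(x, y)   # 0.5
cor(x, y)        # same value: Pearson r on 0/1 data is phi

# Two levels of one variable (e.g. Sex=Female and Sex=Male) are
# complementary, so their correlation is -1 by construction.
female <- c(1, 0, 1, 1, 0)
cor(female, 1 - female)   # -1
```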

The exploratory data analysis unveils crucial patterns and characteristics of the dataset that inform the selection of the most appropriate association rule mining strategies. After cleaning and transformation, the SPPA 2017 data form a relatively small sample of fewer than 8,500 observations. The structure of the data, characterized both by the sparsity of the crucial cultural variables and by the relative density of the demographic variables, warrants considering multiple association rule learning algorithms and comparing them.

Association Rule Mining

Association rule mining is an unsupervised learning technique for discovering hidden relationships, patterns and associations within large transactional datasets. While it was devised for, and is mainly used in, market basket analysis, recommendation systems and customer behavior analysis, it has lately been gaining traction in sociological studies of taste (Pan et al. 2019, Gondal 2025). Applying unsupervised learning algorithms to sociological data extends the possibilities of other forms of analysis, because the results are driven by the structure of the data rather than by the researcher's intuition and literature review. The key contribution of this technique is the discovery of relationships that might otherwise have remained unnoticed because they are neither intuitive nor obvious. Bourdieu's analysis of habitus relies heavily on the relationships between different forms of capital; however, the relationships between the different manifestations of his three forms of capital are yet to be revealed. This study endeavors to find hidden relationships between social, economic and cultural capital, and hence cultural preferences and tastes, by applying an association rule mining algorithm.

With the technical characteristics of the data outlined above, distinct association rule mining algorithms were evaluated to determine the most robust approach for a dataset marked by a specific structural duality: sparse cultural items alongside dense demographic items.

While algorithms for frequent itemset mining are plentiful, the field is dominated by three primary strategies: Apriori (Breadth-First Search), ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal, a Depth-First Search) and FP-Growth (Frequent Pattern Growth, based on tree compression). Given the modest size of the dataset (N ≈ 8,500) but its complex internal structure, a comparative assessment was conducted to balance computational efficiency with interpretability. The table below outlines the methodological trade-offs between the three algorithms.

Comparison Criteria	Apriori	ECLAT	FP-Growth
Search Strategy	Horizontal (Breadth-First). Iterative candidate generation (level-wise).	Vertical (Depth-First). Intersection of Transaction ID (TID) lists.	Tree-based (FP-Tree). Compressed representation.
Handling of Dense Data	High computational cost due to ‘candidate explosion’ when combining frequent items.	Memory intensive for long ID lists (e.g., common demographic traits).	Excellent compression of repeated patterns (e.g., identical demographic profiles).
Handling of Sparse Data	Must scan empty space to find rare items (e.g., Opera). Requires low support threshold.	Very fast for rare items (skips empty space via vertical layout).	No candidate generation. Very fast extraction of rare items.
Output Type	Association Rules (Directly generates antecedents -> consequents).	Frequent Itemsets (Requires a secondary step to induce rules).	Frequent Itemsets.
Computational Cost (N=8,500)	Low / Negligible (< 1 second).	Low / Negligible (< 1 second).	Lowest (Most Efficient).

The comparative analysis of the three dominant algorithms indicates that, while FP-Growth is theoretically the most efficient owing to its compressed tree structure, it is not necessary given the relatively small size of the sample. Similarly, although ECLAT excels at identifying rare events through its vertical layout, its primary output is frequent itemsets rather than directional rules. Given that the primary objective is to reveal implications of taste, the Apriori algorithm appears to be the optimal choice.
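To make the contrast between horizontal and vertical layouts concrete, the base-R sketch below builds the vertical "TID list" representation used by ECLAT and counts the support of an itemset by intersecting TID lists. It is a hypothetical illustration on toy transactions (not the SPPA sample), it shows only ECLAT's support-counting idea rather than its full depth-first search, and it is not the arules eclat() implementation. A rare item such as Opera has a short TID list, so intersections involving it are cheap.

```r
# Toy transactions (hypothetical; not the SPPA 2017 data)
transactions <- list(
  c("Books", "Metropolitan"),
  c("Books", "Opera", "Metropolitan"),
  c("Books"),
  c("Opera", "Ballet", "Metropolitan"),
  c("Books", "Metropolitan")
)

# Vertical layout: map each item to the IDs of the transactions that contain it
to_tid_lists <- function(transactions) {
  items <- sort(unique(unlist(transactions)))
  tid_lists <- lapply(items, function(item) {
    which(vapply(transactions, function(t) item %in% t, logical(1)))
  })
  names(tid_lists) <- items
  tid_lists
}

# Support count of an itemset = size of the intersection of its items' TID lists
tid_support <- function(tid_lists, itemset) {
  length(Reduce(intersect, tid_lists[itemset]))
}

tid_lists <- to_tid_lists(transactions)
tid_lists$Opera                                     # 2 4
tid_support(tid_lists, c("Opera", "Metropolitan"))  # 2
tid_support(tid_lists, c("Books", "Opera"))         # 1
```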

Apriori Algorithm

The analysis of association rules begins with the implementation of the Apriori algorithm. Apriori works on a horizontal data layout and follows a Breadth-First Search strategy: it finds all frequent itemsets of a given size before moving on, level by level, through the lattice of item combinations, up to the maximal size if one is set. The algorithm relies on one specific mathematical property, anti-monotonicity (the Apriori property): every non-empty subset of a frequent itemset must also be frequent; equivalently, any superset of an infrequent itemset is itself infrequent. The algorithm uses this property to cut off dead-end branches of the search. It proceeds in the following steps:

Candidate Generation – the algorithm takes the frequent itemsets found at the previous level and joins them with one another to create candidates that are one item larger (e.g. the frequent itemsets {A,B} and {A,C}, which share the item A, are joined to create the candidate {A,B,C}),
Prune the Search Space – the algorithm eliminates bad candidates to avoid wasting processing power: it first discards any candidate that has an infrequent subset (by the Apriori property, {A,B,C} cannot be frequent if {B,C} is not), then scans the database to count the occurrences of the remaining candidates and discards those that fall below the minimum support threshold.

These two steps are then iterated until no more frequent itemsets can be found.
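The two steps above can be sketched in base R as a minimal level-wise search for frequent itemsets. This is an illustrative sketch under simplifying assumptions (a small logical matrix, itemsets only, no rule generation), not the arules apriori() implementation used in this study; the function name apriori_itemsets() and the toy data are hypothetical.

```r
# Level-wise (breadth-first) search for frequent itemsets.
# m: logical matrix, rows = transactions (respondents), columns = named items.
apriori_itemsets <- function(m, min_support) {
  n <- nrow(m)
  key <- function(s) paste(s, collapse = "|")
  # Share of rows in which every item of the itemset s is present
  support_of <- function(s) sum(rowSums(m[, s, drop = FALSE]) == length(s)) / n

  # Level 1: frequent single items (each itemset is a sorted character vector)
  current <- as.list(colnames(m)[colMeans(m) >= min_support])
  frequent <- current

  while (length(current) > 1) {
    k <- length(current[[1]])
    known <- vapply(current, key, character(1))
    candidates <- list()
    for (i in seq_len(length(current) - 1)) {
      for (j in (i + 1):length(current)) {
        a <- current[[i]]
        b <- current[[j]]
        # Join step: merge two k-itemsets that share their first k - 1 items
        if (k > 1 && !identical(a[-k], b[-k])) next
        cand <- sort(c(a, b[k]))
        # Prune step: skip the candidate if any of its k-subsets is infrequent
        subsets <- combn(cand, k, simplify = FALSE)
        if (all(vapply(subsets, key, character(1)) %in% known)) {
          candidates[[length(candidates) + 1]] <- cand
        }
      }
    }
    # Counting step: compute each surviving candidate's support, keep those above the threshold
    current <- Filter(function(s) support_of(s) >= min_support, candidates)
    frequent <- c(frequent, current)
  }
  frequent
}

# Hypothetical toy data: five transactions over four items
toy <- matrix(c(
  TRUE,  TRUE,  TRUE,  FALSE,
  TRUE,  TRUE,  FALSE, FALSE,
  TRUE,  FALSE, TRUE,  FALSE,
  TRUE,  TRUE,  TRUE,  TRUE,
  FALSE, TRUE,  TRUE,  FALSE
), nrow = 5, byrow = TRUE, dimnames = list(NULL, c("A", "B", "C", "D")))

frequent <- apriori_itemsets(toy, min_support = 0.4)
vapply(frequent, paste, character(1), collapse = ",")
# "A" "B" "C" "A,B" "A,C" "B,C" "A,B,C"; D (support 0.2) is dropped at level 1
```

The join only merges itemsets with a common prefix, so each larger candidate is generated once, and the subset check discards candidates before any support is counted.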

The code below initializes the Apriori algorithm for the dataset employed in this project. The minimum support threshold is set to 0.005 in order to capture the rarely reported attendance at high-brow cultural events; the minimum confidence (the ratio of the support of the whole rule to the support of its antecedent) is set to 0.15; and the minimal and maximal rule lengths (counting the items on both sides of a rule) are set to 3 and 23 respectively, in order to capture the largest possible number of rules containing variables representing cultural participation. Thus parameterized, the algorithm generated 1,067,650 rules, a number consistent with the density of the transaction matrix and the permissive thresholds.

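library(arules)  # apriori(), inspect() and the transactions class
# sppa_t: transactions object built from the cleaned SPPA 2017 data (47 items, 8,498 respondents)
rules <- apriori(sppa_t, parameter = list(support = 0.005,    # minimum support
                                          confidence = 0.15,  # minimum confidence
                                          minlen = 3,         # minimum rule length (lhs + rhs)
                                          maxlen = 23))       # maximum rule length (lhs + rhs)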

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.15    0.1    1 none FALSE            TRUE       5   0.005      3
##  maxlen target  ext
##      23  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 42 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[47 item(s), 8498 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 11 done [0.14s].
## writing ... [1067650 rule(s)] done [0.05s].
## creating S4 object  ... done [0.12s].
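The relative support threshold can be translated into an absolute count using the transaction total reported in the log above (8,498). This is plain base-R arithmetic that assumes only that a rule must satisfy support ≥ 0.005. The log prints an absolute minimum support count of 42, apparently the truncated value of 42.49; a count of 42 corresponds to a support of about 0.00494, so the smallest count satisfying the threshold is 43, which matches the minimum count (and the minimum support of 0.005060) in the rule summary.

```r
n_transactions <- 8498  # from the apriori() log above
min_support <- 0.005

min_support * n_transactions                    # 42.49 transactions
min_count <- ceiling(min_support * n_transactions)
min_count                                       # 43
min_count / n_transactions                      # ~0.00506, the minimum support in summary(rules)
(min_count - 1) / n_transactions < min_support  # TRUE: 42 transactions are not enough
```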

The output of summary() reveals the distribution of rule lengths: the number of rules increases with rule length, peaks at length 6 (307,780 rules) and then decreases until the maximal rule length of 11 is reached. The summary of quality measures follows the rule-length distribution; the minimum values of support and confidence comply with the parameters set when initializing the Apriori algorithm. The mean support is below 1% (0.95%), which, while low, is typical of sociological data, especially data pertaining to culture. The statistics describing confidence and lift offer a more optimistic outlook: the mean and median confidence are relatively close to each other (63% and 67%), and the maximum confidence reaches 1. Lift, measured as \(lift(X \rightarrow Y) = confidence(X \rightarrow Y)/support(Y)\), ranges from 0.228 to 12.99, with a median of 1.75 and a mean of 2.17. A lift above 1 means that the consequent (right-hand side, RHS) is more likely among respondents who exhibit the antecedent than in the sample as a whole: for the average rule about twice as likely, and for the strongest rule almost 13 times as likely. A lift below 1, as at the minimum, indicates a negative association.

summary(rules)

## set of 1067650 rules
## 
## rule length distribution (lhs + rhs):sizes
##      3      4      5      6      7      8      9     10     11 
##  17099  90942 224505 307780 254115 128737  38303   5850    319 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   6.182   7.000  11.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift        
##  Min.   :0.005060   Min.   :0.1500   Min.   :0.005060   Min.   : 0.2283  
##  1st Qu.:0.005766   1st Qu.:0.4088   1st Qu.:0.008473   1st Qu.: 1.1406  
##  Median :0.007178   Median :0.6667   Median :0.012827   Median : 1.7502  
##  Mean   :0.009526   Mean   :0.6265   Mean   :0.018796   Mean   : 2.1716  
##  3rd Qu.:0.010002   3rd Qu.:0.8421   3rd Qu.:0.021417   3rd Qu.: 2.9128  
##  Max.   :0.443516   Max.   :1.0000   Max.   :0.635914   Max.   :12.9939  
##      count        
##  Min.   :  43.00  
##  1st Qu.:  49.00  
##  Median :  61.00  
##  Mean   :  80.95  
##  3rd Qu.:  85.00  
##  Max.   :3769.00  
## 
## mining info:
##    data ntransactions support confidence
##  sppa_t          8498   0.005       0.15
##                                                                                                   call
##  apriori(data = sppa_t, parameter = list(support = 0.005, confidence = 0.15, minlen = 3, maxlen = 23))
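The quality measures in this summary can be reproduced from their definitions. The base-R sketch below computes support, confidence, coverage and lift for a rule X -> Y on a small logical matrix; the function rule_quality() and the toy data are hypothetical illustrations, not part of arules or of the SPPA data. The second call shows how a lift below 1, as at the minimum of the summary, signals a negative association.

```r
# Quality measures of a rule lhs -> rhs on a logical transaction matrix
# (rows = respondents, columns = items)
rule_quality <- function(m, lhs, rhs) {
  has_all <- function(items) rowSums(m[, items, drop = FALSE]) == length(items)
  support    <- mean(has_all(c(lhs, rhs)))      # P(X and Y)
  coverage   <- mean(has_all(lhs))              # P(X), the support of the antecedent
  confidence <- support / coverage              # P(Y | X)
  lift       <- confidence / mean(has_all(rhs)) # P(Y | X) / P(Y)
  c(support = support, confidence = confidence, coverage = coverage, lift = lift)
}

# Hypothetical toy data: five respondents, three cultural items
toy <- matrix(c(
  TRUE,  TRUE,  TRUE,
  TRUE,  TRUE,  FALSE,
  TRUE,  FALSE, FALSE,
  FALSE, TRUE,  TRUE,
  FALSE, FALSE, TRUE
), nrow = 5, byrow = TRUE, dimnames = list(NULL, c("Opera", "Ballet", "Books")))

rule_quality(toy, "Opera", "Ballet")  # support 0.4, confidence ~0.667, coverage 0.6, lift ~1.11
rule_quality(toy, "Opera", "Books")   # support 0.2, confidence ~0.333, coverage 0.6, lift ~0.56
```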

Rule Inspection

The analysis of the rules generated by the Apriori algorithm specified above begins with an inspection of the rules sorted by support. The output below presents the five rules with the highest support. Most of these rules seem trivial and redundant in the context of this study: the variables capturing cultural participation are mostly infrequent and are, hence, filtered out by a high-support criterion. The first three rules are permutations of the same itemset, with a support of 44% and a lift of approximately 1: the dataset is dominated by white respondents who own their house and live in a metropolitan area. The fourth and fifth rules, likewise permutations of a single itemset, contain a variable capturing whether the respondent has read a book in the last 12 months; they show that 37% of respondents are white book readers living in a metropolitan area. While this might seem trivial on the surface, it might point to a systemic privilege, as reading a book in full is time-consuming, and metropolitan areas, often characterized by a high concentration of culturally engaged people, might promote engagement with culture in the form of book reading. Note, however, that the lift of all five rules is close to 1, meaning that these attributes co-occur about as often as their individual frequencies alone would predict.

inspect(sort(rules, by = "support")[1:5])

##     lhs                         rhs                       support confidence  coverage      lift count
## [1] {Housing=House_Owner,                                                                             
##      Location=Metropolitan}  => {Race=White}            0.4435161  0.8688336 0.5104731 1.0483242  3769
## [2] {Race=White,                                                                                      
##      Housing=House_Owner}    => {Location=Metropolitan} 0.4435161  0.7527462 0.5891975 0.9596215  3769
## [3] {Race=White,                                                                                      
##      Location=Metropolitan}  => {Housing=House_Owner}   0.4435161  0.6974463 0.6359143 1.0414512  3769
## [4] {Location=Metropolitan,                                                                           
##      Books}                  => {Race=White}            0.3732643  0.8353964 0.4468110 1.0079793  3172
## [5] {Race=White,                                                                                      
##      Books}                  => {Location=Metropolitan} 0.3732643  0.7820513 0.4772888 0.9969805  3172
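The near-independence behind rule [1] can be checked with plain base-R arithmetic on the values printed above, assuming nothing beyond the definitions of confidence and lift: confidence is support divided by coverage, and dividing confidence by lift recovers the share of the consequent, Race=White, in the whole sample.

```r
# Values copied from rule [1] of the support-sorted output above
support    <- 0.4435161  # share with House_Owner, Metropolitan and White
coverage   <- 0.5104731  # share with House_Owner and Metropolitan
confidence <- 0.8688336
lift       <- 1.0483242

support / coverage  # ~0.8688: reproduces the reported confidence
confidence / lift   # ~0.829: share of White respondents in the whole sample
```

About 86.9% of metropolitan house owners are white, against about 82.9% of all respondents; the small gap between the two is what a lift of 1.05 expresses.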

Subsequently, the Apriori-generated rules are examined according to their confidence values. Confidence is measured as:

\[ confidence(X \rightarrow Y) = support(X \cup Y)/support(X) \]

The rules revealed by sorting on confidence prove more interpretable and insightful in the context of cultural capital. The first 22 rules sorted by confidence reach the maximal possible value of 1, meaning that every occurrence of the antecedent (left-hand side) is invariably accompanied by the consequent (right-hand side), and they share a common consequent: Location=Metropolitan. This corresponds to a conditional probability of \(P(Y \mid X) = 1\), suggesting a deterministic relationship within the analyzed subsample; however, it is crucial to bear in mind the value of support in such cases. The support of the displayed rules is below 1% in all but one case (rule 3, at 1.15%), which might indicate overfitting (most of the 22 rules concern the small subsample of Asian respondents living in metropolitan areas). Sorting by confidence also surfaces a larger number of rules pertaining to participation in cultural events, and the frequent appearance of Location=Metropolitan as a consequent suggests that living in a metropolitan area is associated with the consumption of cultural goods. Rule 6, for instance, indicates that every respondent who attended both a ballet performance and a Latin music concert lives in a metropolitan area. An interesting shift in the rules' consequent happens at the 23rd position: rules 23–26, whose consequent is Books, indicate a deterministic relationship between high education, the consumption of high-brow cultural goods such as going to the theater, visiting an art museum and reading poetry, and book reading. The emergence of these specific rules with a confidence of 1 lends empirical support to Bourdieu's concept of institutionalized cultural capital. It suggests that the possession of high educational credentials (a PhD or a professorial title) acts as a binding predictor of participation in legitimate culture.
This suggests that high-brow cultural consumption serves as a structural imperative of class habitus for this specific social stratum.

inspect(sort(rules, by = "confidence")[1:30])

##      lhs                          rhs                         support confidence    coverage     lift count
## [1]  {Income=Income_High,                                                                                  
##       Race=Asian}              => {Location=Metropolitan} 0.006707461          1 0.006707461 1.274827    57
## [2]  {Race=Asian,                                                                                          
##       Musical}                 => {Location=Metropolitan} 0.005060014          1 0.005060014 1.274827    43
## [3]  {Education=Bachelor,                                                                                  
##       Race=Asian}              => {Location=Metropolitan} 0.011532125          1 0.011532125 1.274827    98
## [4]  {Race=Asian,                                                                                          
##       Outdoor_Festival}        => {Location=Metropolitan} 0.005060014          1 0.005060014 1.274827    43
## [5]  {Race=Asian,                                                                                          
##       Sightseeing}             => {Location=Metropolitan} 0.009649329          1 0.009649329 1.274827    82
## [6]  {Ballet,                                                                                              
##       Latin_Music}             => {Location=Metropolitan} 0.005177689          1 0.005177689 1.274827    44
## [7]  {Income=Income_High,                                                                                  
##       Race=Asian,                                                                                          
##       Housing=House_Owner}     => {Location=Metropolitan} 0.005177689          1 0.005177689 1.274827    44
## [8]  {Job=Professional,                                                                                    
##       Race=Asian,                                                                                          
##       Age_Group=30-49}         => {Location=Metropolitan} 0.005766063          1 0.005766063 1.274827    49
## [9]  {Job=Professional,                                                                                    
##       Race=Asian,                                                                                          
##       Housing=House_Owner}     => {Location=Metropolitan} 0.006001412          1 0.006001412 1.274827    51
## [10] {Education=Bachelor,                                                                                  
##       Race=Asian,                                                                                          
##       Age_Group=30-49}         => {Location=Metropolitan} 0.005530713          1 0.005530713 1.274827    47
## [11] {Education=Bachelor,                                                                                  
##       Race=Asian,                                                                                          
##       Sex=Male}                => {Location=Metropolitan} 0.005177689          1 0.005177689 1.274827    44
## [12] {Education=Bachelor,                                                                                  
##       Race=Asian,                                                                                          
##       Sex=Female}              => {Location=Metropolitan} 0.006354436          1 0.006354436 1.274827    54
## [13] {Education=Bachelor,                                                                                  
##       Race=Asian,                                                                                          
##       Books}                   => {Location=Metropolitan} 0.006354436          1 0.006354436 1.274827    54
## [14] {Education=Bachelor,                                                                                  
##       Race=Asian,                                                                                          
##       Housing=House_Owner}     => {Location=Metropolitan} 0.006825135          1 0.006825135 1.274827    58
## [15] {Income=Income_UpperMid,                                                                              
##       Race=Asian,                                                                                          
##       Sex=Male}                => {Location=Metropolitan} 0.005413038          1 0.005413038 1.274827    46
## [16] {Race=Asian,                                                                                          
##       Age_Group=30-49,                                                                                     
##       Sightseeing}             => {Location=Metropolitan} 0.005177689          1 0.005177689 1.274827    44
## [17] {Race=Asian,                                                                                          
##       Art_Museum,                                                                                          
##       Sightseeing}             => {Location=Metropolitan} 0.005883737          1 0.005883737 1.274827    50
## [18] {Race=Asian,                                                                                          
##       Sex=Female,                                                                                          
##       Sightseeing}             => {Location=Metropolitan} 0.006472111          1 0.006472111 1.274827    55
## [19] {Race=Asian,                                                                                          
##       Sightseeing,                                                                                         
##       Books}                   => {Location=Metropolitan} 0.007884208          1 0.007884208 1.274827    67
## [20] {Race=Asian,                                                                                          
##       Housing=House_Owner,                                                                                 
##       Sightseeing}             => {Location=Metropolitan} 0.005295364          1 0.005295364 1.274827    45
## [21] {Race=Asian,                                                                                          
##       Sex=Male,                                                                                            
##       Books}                   => {Location=Metropolitan} 0.006472111          1 0.006472111 1.274827    55
## [22] {Housing=House_Renter,                                                                                
##       Opera,                                                                                               
##       Art_Museum}              => {Location=Metropolitan} 0.005766063          1 0.005766063 1.274827    49
## [23] {Education=PhD_Prof,                                                                                  
##       Musical,                                                                                             
##       Theater}                 => {Books}                 0.006236762          1 0.006236762 1.784544    53
## [24] {Education=PhD_Prof,                                                                                  
##       Theater,                                                                                             
##       Crafts_Fair}             => {Books}                 0.005413038          1 0.005413038 1.784544    46
## [25] {Education=PhD_Prof,                                                                                  
##       Theater,                                                                                             
##       Sightseeing}             => {Books}                 0.007178160          1 0.007178160 1.784544    61
## [26] {Education=PhD_Prof,                                                                                  
##       Art_Museum,                                                                                          
##       Poetry}                  => {Books}                 0.005177689          1 0.005177689 1.784544    44
## [27] {Education=Bachelor,                                                                                  
##       Age_Group=30-49,                                                                                     
##       Ballet}                  => {Location=Metropolitan} 0.005295364          1 0.005295364 1.274827    45
## [28] {Income=Income_Middle,                                                                                
##       Job=Sales,                                                                                           
##       Books}                   => {Race=White}            0.005060014          1 0.005060014 1.206588    43
## [29] {Education=Bachelor,                                                                                  
##       Housing=House_Renter,                                                                                
##       Latin_Music}             => {Location=Metropolitan} 0.005060014          1 0.005060014 1.274827    43
## [30] {Education=Master,                                                                                    
##       Age_Group=65+,                                                                                       
##       Outdoor_Festival}        => {Race=White}            0.007178160          1 0.007178160 1.206588    61
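A short calculation shows why every confidence-1 rule with the same consequent shares the same lift: when confidence equals 1, lift reduces to 1/support(Y), so it is capped by how common the consequent is. The base-R arithmetic below uses only values printed in the output above, for example rule [6], whose support equals its coverage.

```r
n <- 8498  # number of transactions

# Rule [6]: {Ballet, Latin_Music} => {Location=Metropolitan}
support  <- 0.005177689
coverage <- 0.005177689
support / coverage  # 1: every respondent matching the antecedent matches the consequent
round(support * n)  # 44 respondents, the rule's count

# With confidence = 1, lift = 1 / support(Y)
1 / 1.274827        # ~0.784: share of metropolitan respondents
1 / 1.784544        # ~0.560: share of respondents who read a book
1 / 1.206588        # ~0.829: share of White respondents
```

Because roughly 78% of respondents live in metropolitan areas, a rule predicting Location=Metropolitan cannot exceed a lift of about 1.27, however deterministic it is; rules with rarer consequents, such as Education=PhD_Prof, can reach far higher lifts.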

All of the rules displayed above have a confidence of 1, which prompts further inspection of deterministic rules. However, a full analysis of them is impractical given their sheer number: 4,427 rules.

length(subset(rules, confidence == 1))

## [1] 4427

As expected, the most relevant results are yielded by inspecting the Apriori rules sorted by lift. The first thirteen rules, all with Education=PhD_Prof as their consequent, predict the level of education of a respondent from a set of specific demographic characteristics and variables indicating participation in the arts. Combining income level, job classification and cultural participation makes the possession of high educational credentials up to about 13 times more likely than its base rate in the sample. The confidence of these rules is, however, only about 37–41%, so they do not identify PhD holders deterministically; rather, the share of PhD holders among respondents matching the antecedent is an order of magnitude above their share in the sample. This is consistent with the view that the intellectual elite, as a distinct class fraction, is defined not merely by its level of income or education but by the accumulation of economic capital together with cultural capital in both its embodied and institutionalized forms. Another interesting pattern of cultural consumption is revealed by rules 14, 17, 18, 21, 22 and 24, which, rather than predicting demographic attributes of a respondent, link together distinct forms of participation in the arts. These rules allow for the identification of a structural homology of taste: rules linking sets of cultural goods such as {Classical_Music, Art_Museum, Books, Poetry} to other types of participation in the arts, such as {Opera} (rule 24), reveal that high-culture consumers are disproportionately likely to engage with other forms of high-brow culture. This clustering of high-brow practices is consistent with the existence of a coherent aesthetic disposition.
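The gap between the modest confidence and the very high lift of these rules can be made explicit with the values printed for rule [1] of the lift-sorted output below. This is base-R arithmetic only; the number of respondents matching the antecedent (109) is derived here from coverage × 8,498 rather than printed in the output.

```r
n          <- 8498
count      <- 45          # respondents matching the whole rule
coverage   <- 0.01282655  # share matching the antecedent
confidence <- 0.4128440
lift       <- 12.99388

antecedent_n <- round(coverage * n)  # 109 respondents match the antecedent
count / antecedent_n                 # ~0.413: reproduces the confidence
confidence / lift                    # ~0.032: share of PhD_Prof in the whole sample
round(confidence / lift * n)         # ~270 respondents hold a PhD or professorial title
```

In other words, about 41% of the 109 respondents matching the antecedent hold a PhD or professorial title, against roughly 3% of the sample.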

inspect(sort(rules, by = "lift")[1:25])

##      lhs                         rhs                      support confidence   coverage     lift count
## [1]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Sightseeing,                                                                                    
##       Books}                  => {Education=PhD_Prof} 0.005295364  0.4128440 0.01282655 12.99388    45
## [2]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Location=Metropolitan,                                                                          
##       Sightseeing}            => {Education=PhD_Prof} 0.005060014  0.3944954 0.01282655 12.41638    43
## [3]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Sightseeing,                                                                                    
##       Books}                  => {Education=PhD_Prof} 0.005648388  0.3934426 0.01435632 12.38324    48
## [4]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Location=Metropolitan,                                                                          
##       Sightseeing,                                                                                    
##       Books}                  => {Education=PhD_Prof} 0.005177689  0.3928571 0.01317957 12.36481    44
## [5]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Sightseeing}            => {Education=PhD_Prof} 0.005530713  0.3884298 0.01423864 12.22547    47
## [6]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Art_Museum,                                                                                     
##       Books}                  => {Education=PhD_Prof} 0.005295364  0.3879310 0.01365027 12.20977    45
## [7]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Sex=Male}               => {Education=PhD_Prof} 0.005413038  0.3865546 0.01400329 12.16645    46
## [8]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Sex=Male,                                                                                       
##       Location=Metropolitan}  => {Education=PhD_Prof} 0.005060014  0.3839286 0.01317957 12.08380    43
## [9]  {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Location=Metropolitan,                                                                          
##       Art_Museum}             => {Education=PhD_Prof} 0.005413038  0.3833333 0.01412097 12.06506    46
## [10] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Art_Museum}             => {Education=PhD_Prof} 0.005766063  0.3798450 0.01518004 11.95527    49
## [11] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Location=Metropolitan,                                                                          
##       Sightseeing}            => {Education=PhD_Prof} 0.005530713  0.3790323 0.01459167 11.92969    47
## [12] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Sightseeing}            => {Education=PhD_Prof} 0.006001412  0.3722628 0.01612144 11.71663    51
## [13] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Books}                  => {Education=PhD_Prof} 0.007178160  0.3696970 0.01941633 11.63587    61
## [14] {Location=Metropolitan,                                                                          
##       Ballet,                                                                                         
##       Crafts_Fair}            => {Opera}              0.005060014  0.2828947 0.01788656 11.61372    43
## [15] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Location=Metropolitan,                                                                          
##       Books}                  => {Education=PhD_Prof} 0.006589786  0.3684211 0.01788656 11.59571    56
## [16] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Books}                  => {Education=PhD_Prof} 0.008119558  0.3556701 0.02282890 11.19439    69
## [17] {Location=Metropolitan,                                                                          
##       Opera,                                                                                          
##       Crafts_Fair}            => {Ballet}             0.005060014  0.3909091 0.01294422 11.14747    43
## [18] {Race=White,                                                                                     
##       Location=Metropolitan,                                                                          
##       Classical_Music,                                                                                
##       Art_Museum,                                                                                     
##       Books,                                                                                          
##       Live_Dance}             => {Ballet}             0.005060014  0.3909091 0.01294422 11.14747    43
## [19] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Housing=House_Owner,                                                                            
##       Books}                  => {Education=PhD_Prof} 0.005883737  0.3521127 0.01670981 11.08242    50
## [20] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Location=Metropolitan,                                                                          
##       Books}                  => {Education=PhD_Prof} 0.007413509  0.3519553 0.02106378 11.07747    63
## [21] {Location=Metropolitan,                                                                          
##       Opera,                                                                                          
##       Musical}                => {Ballet}             0.005295364  0.3879310 0.01365027 11.06254    45
## [22] {Location=Metropolitan,                                                                          
##       Ballet,                                                                                         
##       Musical}                => {Opera}              0.005295364  0.2694611 0.01965168 11.06222    45
## [23] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Race=White,                                                                                     
##       Housing=House_Owner,                                                                            
##       Location=Metropolitan,                                                                          
##       Books}                  => {Education=PhD_Prof} 0.005413038  0.3511450 0.01541539 11.05196    46
## [24] {Location=Metropolitan,                                                                          
##       Classical_Music,                                                                                
##       Art_Museum,                                                                                     
##       Books,                                                                                          
##       Poetry}                 => {Opera}              0.005648388  0.2666667 0.02118145 10.94750    48
## [25] {Income=Income_High,                                                                             
##       Job=Professional,                                                                               
##       Housing=House_Owner,                                                                            
##       Books}                  => {Education=PhD_Prof} 0.006825135  0.3473054 0.01965168 10.93112    58

Visual Analysis

Following the formal inspection of the generated rules comes the visual analysis of their structure and characteristics. It begins with the matrix-based visualization of all 1,067,650 rules, which reveals the relationships between rule antecedents and consequents: the x axis shows the unique itemsets in the LHS and the y axis shows the rule consequents. The color intensity of each cell represents the value of lift. The presence of horizontal bands indicates that certain consequents are generated by a large number of distinct sets of attributes.

plot(rules, method="matrix", measure="lift")

The scatter plot of the generated rules presented below takes the shape of the letter L, which reflects the explosion in the number of rules. This is to be expected given the parameters set for the Apriori algorithm, which were nonetheless necessitated by the specificities of sociological data. The x axis shows support (the relative frequency of occurrence), the y axis shows lift, and the shading indicates confidence. The upper left corner captures rare but significant itemsets characterized by extremely high lift values, presumably pertaining to elite culture. The lower right corner, by contrast, contains occurrences with high support but low lift: the trivial platitudes generated mostly by different combinations of demographic variables.

plot(rules, measure=c("support","lift"), shading="confidence", colors = c("mediumpurple", "lavender"))

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
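The axes of these plots rest on a handful of rule quality measures. As a hedged illustration of their standard definitions (support of X => Y is the share of transactions containing both X and Y, confidence is support(X and Y) / support(X), coverage is support(X), lift is confidence / support(Y)), the base-R sketch below recomputes them on a tiny participation matrix. The toy data and the helper `rule_measures()` are hypothetical and are not part of the SPPA analysis.

```r
# Hypothetical toy data (NOT SPPA data): 8 respondents x 3 cultural activities,
# TRUE = the respondent reported the activity.
toy <- matrix(c(
  TRUE,  TRUE,  FALSE,
  TRUE,  TRUE,  TRUE,
  TRUE,  FALSE, FALSE,
  FALSE, FALSE, TRUE,
  FALSE, FALSE, TRUE,
  FALSE, FALSE, FALSE,
  TRUE,  TRUE,  TRUE,
  FALSE, FALSE, TRUE
), ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("Opera", "Ballet", "Books")))

# Hypothetical helper: quality measures of the rule lhs => rhs.
rule_measures <- function(data, lhs, rhs) {
  has_lhs <- apply(data[, lhs, drop = FALSE], 1, all)  # rows containing every LHS item
  has_rhs <- apply(data[, rhs, drop = FALSE], 1, all)  # rows containing every RHS item
  support    <- mean(has_lhs & has_rhs)                # P(LHS and RHS)
  coverage   <- mean(has_lhs)                          # P(LHS), the support of the LHS
  confidence <- support / coverage                     # P(RHS | LHS)
  lift       <- confidence / mean(has_rhs)             # P(RHS | LHS) / P(RHS)
  c(support = support, confidence = confidence, coverage = coverage, lift = lift)
}

m <- rule_measures(toy, lhs = "Opera", rhs = "Ballet")
print(m)
```

In this toy sample Opera => Ballet has support 0.375, confidence 0.75, coverage 0.5 and lift 2, i.e. opera-goers attend the ballet at twice the baseline rate. The same identities can be checked against the printed output: for the Location=Metropolitan, Opera, Crafts_Fair => Ballet rule listed above, 0.005060014 / 0.3909091 is approximately 0.01294, its reported coverage.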

The Two-Key Plot shows the rules scattered according to their support (horizontally) and confidence (vertically), with color indicating rule order, i.e. the number of items in the rule. Longer rules are concentrated at the low end of the support axis, with a large number of them accumulated in the upper left corner, indicating that greater specialization lowers support but does not drastically reduce confidence. High confidence is thus achieved mostly by highly specific, complex rules, while simple rules that apply to a larger share of the population have lower confidence values.

plot(rules, shading="order", control=list(main="Two-Key Plot"))

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
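The pattern in the Two-Key Plot follows from the downward-closure (anti-monotonicity) property that Apriori exploits: adding items to an itemset can never increase its support, whereas confidence, being a ratio of two supports, can stay high or even rise. The base-R sketch below checks this on a small matrix by growing the antecedent of a rule with Ballet as consequent. The data and the helper `itemset_support()` are hypothetical illustrations.

```r
# Hypothetical toy data (NOT SPPA data): rows are respondents, columns activities.
toy <- matrix(c(
  1, 1, 1, 0,
  1, 1, 1, 1,
  1, 1, 0, 0,
  1, 0, 0, 1,
  1, 0, 0, 0,
  0, 1, 1, 0,
  1, 1, 1, 0,
  0, 0, 0, 0
), ncol = 4, byrow = TRUE,
dimnames = list(NULL, c("Books", "Art_Museum", "Ballet", "Opera")))
toy <- toy == 1  # convert to a logical participation matrix

# Hypothetical helper: share of rows containing every item in `items`.
itemset_support <- function(data, items) {
  mean(apply(data[, items, drop = FALSE], 1, all))
}

# Antecedents that grow by one item at a time; the consequent is always Ballet.
lhs_chain <- list("Books",
                  c("Books", "Art_Museum"),
                  c("Books", "Art_Museum", "Opera"))

two_key <- t(sapply(lhs_chain, function(lhs) {
  rule_support <- itemset_support(toy, c(lhs, "Ballet"))
  c(order      = length(lhs) + 1,                        # items in the whole rule
    support    = rule_support,                           # never increases along the chain
    confidence = rule_support / itemset_support(toy, lhs))
}))
print(two_key)
```

As the rule grows from order 2 to order 4, support drops from 0.375 to 0.125 while confidence rises from 0.5 to 1: specific rules are rare but can be highly deterministic, which is the shape visible in the plot.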

The exceptionally large number of generated rules, the values of their key quality measures and the banality of the vast majority of rules, caused by the use of demographic data, necessitate a targeted comparative analysis.

Targeted Analysis

To obtain more specific insights into the inner workings of class habitus, a targeted Apriori analysis is implemented. This section is divided according to Bourdieu’s forms of capital (1986), which are themselves subdivided into distinct forms of manifestation:

economic capital – immediately and directly convertible into money, and possibly institutionalized in the form of property rights. The variables in the SPPA 2017 dataset pertaining to this form of capital are income and housing.
cultural capital – convertible, under certain conditions, into economic capital and possibly institutionalized in the form of educational qualifications. Cultural capital may manifest itself in three distinct forms:
- the embodied state – materialized in the mind and body of an individual, connected to the private acquisition and accumulation of knowledge. This form of cultural capital can be summarized as the means of consumption and understanding of cultural goods – a proxy of taste.
- the objectified state – cultural goods, i.e. physical objects or activities such as writings, paintings and monuments, inherently interconnected with the embodied state, which shapes and defines an individual’s relation to and valuation of these material objects. It is transmissible in its material form through property rights; however, its specific appropriation is preconditioned by the possession of a certain taste.
- the institutionalized state – the objectification of cultural capital in the form of an individual’s educational credentials; it confers original properties on the cultural capital that it is presumed to guarantee.
social capital – made up of social “connections”, convertible, under certain conditions, into economic capital and possibly institutionalized in the form of a title of nobility. In this dataset, location and job serve as proxies of social capital, as the status of the respondent’s living area and occupation often determine the opportunities to acquire social connections and build community through shared identities.

Having outlined the roadmap for the subsequent analysis, I proceed with the examination of specifically targeted Apriori rules. The code below sets up the further analysis by grouping the item labels into sets that represent the distinct forms of capital.

# defining a set of cultural goods 
cultural_goods <- c("Opera", "Ballet", "Art_Museum", "Classical_Music", "Sightseeing", "Crafts_Fair", "Outdoor_Festival", "Latin_Music",
                   "Jazz", "Musical", "Theater", "Books", "Poetry", "Live_Dance", "Audiobook")

# set of economic capital variables
economic <- grep("Income=|Housing=", itemLabels(sppa_t), value = TRUE)

# institutionalized cultural capital
education <- grep("Education=", itemLabels(sppa_t), value = TRUE)

# set of social capital proxies
social <- grep("Job=|Location=", itemLabels(sppa_t), value = TRUE)

# preferred color :)) for visualizations
my_color <- "mediumpurple"

Economic Capital

This section presents the results of the targeted analysis of SPPA 2017 respondents based on their level of income and home ownership, direct indicators of an individual’s socio-economic status and ease of living. Cultural goods are rightly considered experience goods: their consumption requires time as well as the prior possession of a certain taste, which dictates the value a person attributes to a given good. The satisfaction of basic needs and guarantees of stability may therefore serve as strong drivers of participation in the arts. Bourdieu (1986) describes this phenomenon as distance from necessity, a key characteristic of the upper classes, which allows these strata an appreciation of art that favors form over function. This disposition drives them towards abstract art, as well as towards art forms historically established as elite (opera, ballet), which have a high cognitive barrier to entry because their appreciation is predicated on the acquisition of the specific cultural codes needed to decode complex aesthetic structures.

Therefore, the Apriori rules are inspected in terms of how economic capital, revealed through an individual’s level of income and property ownership, influences the consumption of cultural goods. The table below presents aggregated economic profiles of respondents and their corresponding cultural consumption. Inspecting the respondents with high income reveals a wide diversification of cultural consumption, with the largest basket sizes. This potentially supports the theory proposed by Peterson (1992): following the democratization of access to cultural goods, the tastes that distinguish social strata evolved from the high-brow versus popular divide described by Bourdieu (1984) and Gans (1974) into an omnivore-univore distinction.

Peterson’s analysis of the American audience, based on an earlier edition of the SPPA survey, reveals a correlation between the diversification and volume of cultural consumption and an individual’s socio-economic position, suggesting that the mode of consumption of the higher classes changed from elitist, in the strict sense of high-brow culture, into omnivorous consumption: an eclectic combination of a variety of cultural goods, be they high-brow or popular. The consumption of the lower classes, meanwhile, remained tethered to a single lane. However, further inspection of the cultural baskets of low-income respondents shows that, while staying in one lane, they also tend to consume popular goods with a low barrier to entry, such as books, sightseeing, crafts fairs and outdoor festivals.

# rule filtering : lhs - income or housing, rhs - cultural goods; economic capital -> cultural goods 
rules_economic <- apriori(sppa_t, parameter = list(supp = 0.005, conf = 0.15, minlen = 2),
                                 appearance = list(lhs = economic, 
                                                   rhs = cultural_goods,
                                                   default = "none"),
                                 control = list(verbose = FALSE))

# data frame of economic rules 
rules_df <- data.frame(
  lhs = labels(lhs(rules_economic)), 
  rhs = labels(rhs(rules_economic)),
  quality(rules_economic) 
)

profile_summary <- rules_df %>%
  mutate(
    lhs = str_remove_all(lhs, "\\{|\\}"),
    rhs = str_remove_all(rhs, "\\{|\\}")
  ) %>%
  group_by(lhs) %>%
  summarise(
    Basket_Size = n(),
    Cultural_Basket = paste0(rhs, " (Lift=", round(lift, 2), ")", collapse = ", "),
    Avg_Lift = round(mean(lift), 2),
    Max_Confidence = round(max(confidence), 2)
  ) %>%
  arrange(desc(Avg_Lift))

datatable(profile_summary, 
          options = list(scrollX = TRUE, pageLength = 10),
          caption = "Economic Profiles and Cultural Consumption")
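The `Basket_Size` column above counts rules per economic profile, which is an indirect measure of breadth. A more direct, respondent-level proxy for Peterson’s omnivore-univore distinction is the number of distinct cultural activities each person reports, averaged by group. The base-R sketch below shows that computation on a handful of hypothetical respondents; the data frame `resp` and the "breadth" index are illustrative assumptions, not results from SPPA 2017.

```r
# Hypothetical respondents (NOT SPPA data): income group plus
# TRUE/FALSE participation in four cultural activities.
resp <- data.frame(
  income      = c("Income_High", "Income_High", "Income_High",
                  "Income_Low",  "Income_Low",  "Income_Low"),
  Opera       = c(TRUE,  FALSE, TRUE,  FALSE, FALSE, FALSE),
  Jazz        = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE),
  Books       = c(TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  Crafts_Fair = c(FALSE, TRUE,  TRUE,  FALSE, TRUE,  FALSE)
)
activities <- c("Opera", "Jazz", "Books", "Crafts_Fair")

# Simplified omnivorousness proxy: number of distinct activities per respondent.
resp$breadth <- rowSums(resp[, activities])

# Mean breadth per income group.
breadth_by_income <- tapply(resp$breadth, resp$income, mean)
print(breadth_by_income)
```

Here the high-income group averages 3 activities and the low-income group 1; on real data the same grouping could be applied to the transaction matrix, with the caveat that breadth alone does not distinguish high-brow from popular goods.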

The targeted rules are visualized in the interactive plot below. One can inspect the rules pertaining to specific variables, as well as the number and variety of rules that the chosen item is connected to.

# interactive network graph 
plot(rules_economic, method = "graph", engine = "htmlwidget", 
     control = list(nodeCol = my_color))  # rule nodes in the preferred color

The parallel coordinates plot illustrates rules as arrows: the color intensity of each arrow represents its lift, while the line width corresponds to its support. All of the plotted rules pertain to individuals with high income, and the graph reveals the diversity of their consumption, ranging from musicals, through museum attendance (highest support) and jazz listening, to classical music.

#  parallel coordinates (static)
plot(head(sort(rules_economic, by="lift"), 10), method="paracoord", 
     control = list(col = my_color))

Cultural Capital

This section proceeds with the analysis of the influence that the different forms of cultural capital hold over one another. Though the exercise might seem endogenous, it is worth verifying the power of the institutionalized state, i.e. academically acquired and evaluated tastes, to shape and perpetuate an individual’s mode of cultural consumption. Verifying the homology of tastes is equally important in the context of taste formation.

Institutionalized State

Cultural capital in its institutionalized state serves as a key proxy of an individual’s cultural socialization, a process largely shaped by the theories, opinions and competences acquired through education. Education shapes the eye of the beholder of cultural goods and is therefore bound to strongly influence both the volume and the type of an individual’s participation in the arts. The table below presents the aggregated baskets of cultural goods of the respondents, depending on their level of education.

# rule filtering : education -> cultural goods 
rules_institutionalized <- apriori(sppa_t, parameter = list(supp = 0.005, conf = 0.15, minlen = 2),
                                 appearance = list(lhs = education, 
                                                   rhs = cultural_goods,
                                                   default = "none"),
                                 control = list(verbose = FALSE))

rules_df_edu <- data.frame(
  lhs = labels(lhs(rules_institutionalized)), 
  rhs = labels(rhs(rules_institutionalized)),
  quality(rules_institutionalized) 
)

profile_summary_edu <- rules_df_edu %>%
  mutate(
    lhs = str_remove_all(lhs, "\\{|\\}"),
    rhs = str_remove_all(rhs, "\\{|\\}")
  ) %>%
  group_by(lhs) %>%
  summarise(
    Basket_Size = n(),
    Cultural_Basket = paste0(rhs, " (Lift=", round(lift, 2), ")", collapse = ", "),
    Avg_Lift = round(mean(lift), 2),
    Max_Confidence = round(max(confidence), 2)
  ) %>%
  arrange(desc(Avg_Lift))

datatable(profile_summary_edu, 
          options = list(scrollX = TRUE, pageLength = 10),
          caption = "Education and Cultural Consumption")

Readers inclined to further inspect the relationship between an individual’s level of education and their cultural consumption can explore the graph of rules below, selecting the level of education by which to filter the rules.

# interactive network graph 
plot(rules_institutionalized, method = "graph", engine = "htmlwidget", 
     control = list(nodeCol = my_color))  # rule nodes in the preferred color

The parallel coordinates plot for the ten rules with the highest lift illustrates a similarly diversified consumption of cultural goods. The arrows starting at Education=Master are characterized by higher lift values but are sparser than those starting at Education=PhD_Prof.

plot(head(sort(rules_institutionalized, by="lift"), 10), method="paracoord", 
     control = list(col = my_color))

Embodied and Objectified State

The variables representing cultural goods in the SPPA 2017 dataset cannot be strictly attributed to one form of cultural capital (i.e. the embodied or the objectified state). Participation in culture requires a certain level of savvy, pre-existing knowledge and predispositions, and therefore cannot be reduced to the materiality of a public event or an owned good; this is why this section binds the two forms of capital together. What is essential to the embodied state and acquired taste is its homology and coherence: to fully substantialize a form of cultural capital, it must be integrated into the very selfhood and identity of an individual, their habitus, from which the individual is inseparable.

This section analyzes the coherence of tastes by inspecting rules whose left-hand and right-hand sides both contain only cultural goods. For that purpose, a subset of the SPPA data is created that contains only the cultural variables and only the respondents who reported at least one cultural activity. The Apriori algorithm is then run with an increased minimum confidence (0.4, compared with 0.15 in the previous targeted runs), and redundant rules are removed. The output below shows the 20 rules with the highest lift. Their analysis, however, does not support a qualification of taste as homogeneous: most of the printed rules link distinct cultural baskets to ballet attendance, and the fact that so many distinct baskets lead to the same RHS points to the heterogeneity of taste rather than its homology. This might validate the findings of Peterson (1992): in the era of democratized access, taste is captured in the eclectic combination of distinct modes of cultural consumption.

# keep only respondents who reported at least one cultural activity
sppa_active <- sppa_t[size(sppa_t[, cultural_goods]) > 0]

# restrict the item set to cultural goods only
sppa_culture <- sppa_active[, cultural_goods]

# apriori on subset
rules_homology <- apriori(sppa_culture, 
                          parameter = list(supp = 0.005, conf = 0.4, minlen = 2),
                          control = list(verbose = FALSE))

# removing redundant rules 
rules_homology <- rules_homology[!is.redundant(rules_homology)]

# inspect
inspect(head(sort(rules_homology, by = "lift"), 20))

##      lhs                    rhs               support confidence   coverage     lift count
## [1]  {Opera,                                                                              
##       Musical,                                                                            
##       Theater,                                                                            
##       Books}             => {Ballet}      0.005102041  0.4558824 0.01119157 9.295105    31
## [2]  {Opera,                                                                              
##       Art_Museum,                                                                         
##       Crafts_Fair,                                                                        
##       Musical}           => {Ballet}      0.005431205  0.4400000 0.01234365 8.971275    33
## [3]  {Opera,                                                                              
##       Musical,                                                                            
##       Theater}           => {Ballet}      0.005266623  0.4383562 0.01201448 8.937759    32
## [4]  {Opera,                                                                              
##       Sightseeing,                                                                        
##       Crafts_Fair,                                                                        
##       Musical}           => {Ballet}      0.005266623  0.4266667 0.01234365 8.699418    32
## [5]  {Art_Museum,                                                                         
##       Crafts_Fair,                                                                        
##       Jazz,                                                                               
##       Musical,                                                                            
##       Books,                                                                              
##       Live_Dance}        => {Ballet}      0.005102041  0.4246575 0.01201448 8.658454    31
## [6]  {Opera,                                                                              
##       Crafts_Fair,                                                                        
##       Musical}           => {Ballet}      0.006418697  0.4193548 0.01530612 8.550336    39
## [7]  {Art_Museum,                                                                         
##       Sightseeing,                                                                        
##       Crafts_Fair,                                                                        
##       Musical,                                                                            
##       Theater,                                                                            
##       Books,                                                                              
##       Live_Dance}        => {Ballet}      0.005102041  0.4189189 0.01217907 8.541447    31
## [8]  {Art_Museum,                                                                         
##       Sightseeing,                                                                        
##       Crafts_Fair,                                                                        
##       Musical,                                                                            
##       Theater,                                                                            
##       Live_Dance}        => {Ballet}      0.005431205  0.4177215 0.01300197 8.517033    33
## [9]  {Classical_Music,                                                                    
##       Jazz,                                                                               
##       Musical,                                                                            
##       Live_Dance}        => {Ballet}      0.005102041  0.4133333 0.01234365 8.427562    31
## [10] {Art_Museum,                                                                         
##       Crafts_Fair,                                                                        
##       Musical,                                                                            
##       Theater,                                                                            
##       Live_Dance}        => {Ballet}      0.005924951  0.4090909 0.01448321 8.341062    36
## [11] {Opera,                                                                              
##       Art_Museum,                                                                         
##       Sightseeing,                                                                        
##       Musical,                                                                            
##       Books}             => {Ballet}      0.005102041  0.4078947 0.01250823 8.316673    31
## [12] {Opera,                                                                              
##       Art_Museum,                                                                         
##       Sightseeing,                                                                        
##       Musical}           => {Ballet}      0.005431205  0.4074074 0.01333114 8.306736    33
## [13] {Art_Museum,                                                                         
##       Classical_Music,                                                                    
##       Jazz,                                                                               
##       Books,                                                                              
##       Live_Dance}        => {Ballet}      0.005431205  0.4074074 0.01333114 8.306736    33
## [14] {Art_Museum,                                                                         
##       Crafts_Fair,                                                                        
##       Jazz,                                                                               
##       Musical,                                                                            
##       Live_Dance}        => {Ballet}      0.005266623  0.4050633 0.01300197 8.258941    32
## [15] {Opera,                                                                              
##       Art_Museum,                                                                         
##       Theater,                                                                            
##       Books}             => {Ballet}      0.005102041  0.4025974 0.01267281 8.208664    31
## [16] {Opera,                                                                              
##       Theater,                                                                            
##       Books}             => {Ballet}      0.005760369  0.4022989 0.01431863 8.202577    35
## [17] {Art_Museum,                                                                         
##       Classical_Music,                                                                    
##       Sightseeing,                                                                        
##       Outdoor_Festival,                                                                   
##       Jazz,                                                                               
##       Books,                                                                              
##       Poetry}            => {Latin_Music} 0.005431205  0.5156250 0.01053325 7.567482    33
## [18] {Art_Museum,                                                                         
##       Classical_Music,                                                                    
##       Sightseeing,                                                                        
##       Outdoor_Festival,                                                                   
##       Jazz,                                                                               
##       Poetry}            => {Latin_Music} 0.005595787  0.5074627 0.01102699 7.447689    34
## [19] {Art_Museum,                                                                         
##       Sightseeing,                                                                        
##       Jazz,                                                                               
##       Poetry,                                                                             
##       Live_Dance}        => {Latin_Music} 0.005266623  0.5000000 0.01053325 7.338164    32
## [20] {Art_Museum,                                                                         
##       Classical_Music,                                                                    
##       Crafts_Fair,                                                                        
##       Outdoor_Festival,                                                                   
##       Jazz,                                                                               
##       Poetry}            => {Latin_Music} 0.005266623  0.5000000 0.01053325 7.338164    32
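Because `is.redundant()` was applied before printing, each rule in this list should add predictive power over its own generalizations. Following the definition in the arules documentation (a rule is redundant if a more general rule, with the same RHS and one or more LHS items removed, has the same or a higher confidence), the base-R sketch below checks three of the printed rules, [16], [1] and [15], together with one hypothetical rule invented for contrast. `is_redundant_sketch()` is an illustrative helper, not the arules implementation, and it only compares rules within the supplied table.

```r
# Rules [16], [1] and [15] from the output above (confidence as printed),
# plus one HYPOTHETICAL rule that does not appear in the output.
rules_tbl <- data.frame(
  lhs = I(list(
    c("Opera", "Theater", "Books"),               # rule [16]
    c("Opera", "Musical", "Theater", "Books"),    # rule [1]
    c("Opera", "Art_Museum", "Theater", "Books"), # rule [15]
    c("Opera", "Poetry", "Theater", "Books")      # hypothetical
  )),
  rhs = "Ballet",
  confidence = c(0.4022989, 0.4558824, 0.4025974, 0.39)
)

# A rule is flagged if some more general rule (same RHS, LHS a proper subset)
# is at least as confident.
is_redundant_sketch <- function(tbl) {
  sapply(seq_len(nrow(tbl)), function(i) {
    general <- vapply(seq_len(nrow(tbl)), function(j) {
      j != i &&
        tbl$rhs[j] == tbl$rhs[i] &&
        all(tbl$lhs[[j]] %in% tbl$lhs[[i]]) &&
        length(tbl$lhs[[j]]) < length(tbl$lhs[[i]])
    }, logical(1))
    any(tbl$confidence[general] >= tbl$confidence[i])
  })
}

redundant <- is_redundant_sketch(rules_tbl)
print(data.frame(rule = c("[16]", "[1]", "[15]", "hypothetical"), redundant))
```

Rules [1] and [15] survive because each is more confident than its generalization [16], [15] only marginally (0.4026 against 0.4023); the hypothetical rule would be dropped, since [16] already predicts ballet attendance at least as well with fewer items.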

The graph below enables further exploration of the combinations of cultural goods that co-occur. Items are drawn as light-blue boxes and rules as red circles: arrows run from the antecedent items into a rule node and from the rule node to its consequent. For example, arrows leading from “Opera” through a rule node to “Ballet” indicate that opera attendance is associated with a higher than baseline likelihood of attending the ballet (lift above 1), although an association rule does not by itself establish causation.

rules_top <- head(sort(rules_homology, by = "lift"), 50)

plot(rules_top, method = "graph", engine = "htmlwidget", 
     control = list(
       igraphLayout = "layout_with_fr"  # Fruchterman-Reingold force-directed layout
     ))

Social Capital

Social capital, approximated here by the type of area in which the respondent lives and by their occupation, often determines the opportunities for participating in cultural events: metropolitan areas offer a wider variety of options owing to a higher concentration of cultural institutions. Occupation, in turn, shapes an individual's inner circle and, above all, their class identity. This section examines the baskets of cultural goods associated with the respondent's occupation and living area. The table below aggregates the rules for every combination of occupation and location. The sparsest cultural baskets, consisting mostly of items related to popular culture, belong to sales associates, managers and manual workers living in rural areas, while the most diversified baskets belong to professionals and managers living in metropolitan areas.

# rule filtering: social capital -> cultural goods
rules_social <- apriori(sppa_t,
                        parameter = list(supp = 0.005, conf = 0.15, minlen = 2),
                        appearance = list(lhs = social,
                                          rhs = cultural_goods,
                                          default = "none"),
                        control = list(verbose = FALSE))
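To make the combined effect of the `appearance` restriction and the two thresholds concrete, the following base-R sketch enumerates, by brute force, every two-item rule whose antecedent is a social item and whose consequent is a cultural good, then keeps those clearing a support and a confidence threshold. The incidence matrix, item names and thresholds are all hypothetical (the toy data are far too small for the 0.005 / 0.15 thresholds used above). This is not how `apriori()` works internally: it prunes the search through frequent itemsets and also admits longer antecedents, such as an occupation combined with a living area.

```r
# Hypothetical toy incidence matrix: rows are respondents, columns are items.
baskets <- matrix(
  c(TRUE,  FALSE, TRUE,  TRUE,  FALSE,
    TRUE,  FALSE, TRUE,  FALSE, FALSE,
    FALSE, TRUE,  FALSE, FALSE, TRUE,
    FALSE, TRUE,  FALSE, TRUE,  TRUE,
    TRUE,  FALSE, TRUE,  TRUE,  FALSE,
    FALSE, TRUE,  FALSE, FALSE, FALSE),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("Metro_Professional", "Rural_Manual",
                          "Opera", "Jazz", "Country_Music"))
)

lhs_items <- c("Metro_Professional", "Rural_Manual")  # allowed antecedents (social)
rhs_items <- c("Opera", "Jazz", "Country_Music")       # allowed consequents (cultural)
min_supp <- 0.2   # hypothetical thresholds suited to six baskets
min_conf <- 0.5

# Every {lhs} => {rhs} pair permitted by the restriction
candidates <- expand.grid(lhs = lhs_items, rhs = rhs_items,
                          stringsAsFactors = FALSE)
candidates$support <- mapply(function(l, r) mean(baskets[, l] & baskets[, r]),
                             candidates$lhs, candidates$rhs, USE.NAMES = FALSE)
candidates$coverage <- sapply(candidates$lhs, function(l) mean(baskets[, l]),
                              USE.NAMES = FALSE)
candidates$confidence <- candidates$support / candidates$coverage
candidates$lift <- candidates$confidence /
  sapply(candidates$rhs, function(r) mean(baskets[, r]), USE.NAMES = FALSE)

# Keep rules clearing both thresholds, strongest association first
kept <- candidates[candidates$support >= min_supp &
                     candidates$confidence >= min_conf, ]
kept <- kept[order(-kept$lift), ]
print(kept, row.names = FALSE)
```

On this toy matrix, Rural_Manual => Jazz falls below both thresholds (support 1/6, confidence 1/3) and is discarded, while Metro_Professional => Opera survives with confidence 1 and lift 2.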

# data frame of social rules 
rules_df_social <- data.frame(
  lhs = labels(lhs(rules_social)), 
  rhs = labels(rhs(rules_social)),
  quality(rules_social) 
)

profile_summary_social <- rules_df_social %>%
  mutate(
    lhs = str_remove_all(lhs, "\\{|\\}"),
    rhs = str_remove_all(rhs, "\\{|\\}")
  ) %>%
  group_by(lhs) %>%
  summarise(
    Basket_Size = n(),
    Cultural_Basket = paste0(rhs, " (Lift=", round(lift, 2), ")", collapse = ", "),
    Avg_Lift = round(mean(lift), 2),
    Max_Confidence = round(max(confidence), 2)
  ) %>%
  arrange(desc(Avg_Lift))

datatable(profile_summary_social, 
          options = list(scrollX = TRUE, pageLength = 10),
          caption = "Social Profiles and Cultural Consumption")

The graph below facilitates further exploration of the socially targeted rules.

plot(rules_social, method = "graph", engine = "htmlwidget",
     control = list(nodeCol = my_color))

## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).

The parallel coordinates plot below shows the ten rules with the highest lift, all of which concern respondents living in metropolitan areas. It illustrates the inclination of professionals and managers towards the consumption of legitimized high-brow culture.

plot(head(sort(rules_social, by="lift"), 10), method="paracoord", 
     control = list(col = my_color))

Conclusions

According to the idea of social distinction, members of different classes send out signals meant to embody the values and tastes of their social stratum. This process resembles the signalling described by Michael Spence (1973) in the context of the job market; taste, however, serves not solely as a signal of belonging but also as a means of symbolic power that perpetuates social differences and prejudices. The analysis performed in this project appears consistent with later studies of taste, such as Peterson (1992), which argue that the democratization of access led to the dismantling of the hierarchies of taste described by Bourdieu (1984) and Veblen (1899), who observed a clear bifurcation between high-brow or luxury culture and popular or mass culture. Nowadays, the divide once drawn by taste appears far less sharp, imposing and separating. The higher classes are characterized by more diversified consumption: rather than treating popular culture with disdain, they actively engage in an eclectic variety of cultural activities, combining high-brow culture with the popular.

This study successfully applies an association rule mining algorithm to analyze the inner workings of taste and the interaction among the three forms of capital distinguished by Bourdieu (1986): economic, cultural and social. Applying unsupervised learning algorithms to sociological or survey-based data gives the researcher a powerful tool for seeing patterns beyond common intuition. The key contribution of the technique implemented in this study, association rule mining, is the discovery of hidden relationships that might otherwise have remained unnoticed or unconsidered. The value of this study lies in offering a nuanced representation of distinct class profiles and their cultural consumption. There is, however, considerable room for improvement in future research: while association rule mining in itself yields interesting results, future studies could incorporate other forms of unsupervised learning to uncover hidden patterns in cultural consumption in relation to social class.

Bibliography

Bourdieu, P. (1984). Distinction: A social critique of the judgement of taste. In Inequality (pp. 287-318). Routledge.

Bourdieu, P. (1986). The forms of capital. In The sociology of economic life (pp. 78-92). Routledge.

Gans, H. J. (1974). Popular culture and high culture. New York: Basic Books.

Gondal, N. (2025). Rulenet: Mapping the structure of cultural preferences using association-rules and network graphs. Poetics, 110, 101996.

Marx, K. (1886). Pisma pomniejsze [Minor writings], Vol. 1. Paris: Librairie Keva, p. 128.

National Endowment for the Arts, and United States. Bureau of the Census. Survey of Public Participation in the Arts (SPPA), United States, 2017. Inter-university Consortium for Political and Social Research [distributor], 2019-02-04.

Pan, Z., Li, J., Chen, Y., Pacheco, J., Dai, L., & Zhang, J. (2019). Knowledge discovery in sociological databases: An application on general society survey dataset. International Journal of Crowd Science, 3(3), 315-332.

Peterson, R. A. (1992). Understanding audience segmentation: From elite and mass to omnivore and univore. Poetics, 21(4), 243-258.

Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3), 355-374.

Veblen, T., & Howells, W. D. (1899). The theory of the leisure class: 1899. AM Kelley.