1 Introduction

This is an exploratory data analysis (EDA) of the brucellosis knowledge, attitudes, and practices (KAP) survey conducted across six locations in Isiolo County, Kenya: garbatulla_reserve, sericho_reserve, kina_reserve, kina_main, sericho_main, and garbatulla_main. The survey was administered to 409 livestock-keeping households.

The purpose of this document is not just to describe the data but to check whether the dataset can actually answer the four research objectives set out in the analysis plan:

Objective 1: Assess livestock keepers’ level of awareness and knowledge of brucellosis.
Objective 2: Evaluate management practices and behaviours relevant to brucellosis transmission and control.
Objective 3: Examine different households’ perceptions and attitudes towards brucellosis transmission and prevention.
Objective 4: Determine the influence of household economic status and access to resources on knowledge and adoption of preventive practices.

Rows (households): 409

Columns (variables): 147

Note on data missingness

A large share of “missingness” in this survey is not random; it is structural, caused by the collection tool’s skip logic. For example, only respondents who answered “Yes” to “Are you aware of brucellosis?” (brucella_aware) were shown the entire knowledge section (Section B) and several subsequent sections. This means low coverage on a variable is often a feature of the questionnaire design, not a data quality failure. Each section below makes clear which kind of missingness is in play.
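The distinction between structural and unexplained missingness can be sketched as a simple classification rule. This is a minimal illustration on toy records, not the procedure used to produce this report: the item name know_signs_abortion is hypothetical, and only brucella_aware comes from the survey.

```python
# Toy records standing in for survey rows; None marks a missing answer.
# "know_signs_abortion" is a hypothetical name for one Section B item.
records = [
    {"brucella_aware": "Yes", "know_signs_abortion": 1},
    {"brucella_aware": "Yes", "know_signs_abortion": 0},
    {"brucella_aware": "No", "know_signs_abortion": None},
    {"brucella_aware": "Yes", "know_signs_abortion": None},
]


def classify_missing(rows, item, gate="brucella_aware", gate_value="Yes"):
    """Count answered, structurally missing, and unexplained-missing cells."""
    counts = {"answered": 0, "structural": 0, "unexplained": 0}
    for row in rows:
        if row[item] is not None:
            counts["answered"] += 1
        elif row[gate] != gate_value:
            # The respondent was routed past the item by the skip logic.
            counts["structural"] += 1
        else:
            # The respondent should have seen the item but has no answer.
            counts["unexplained"] += 1
    return counts


summary = classify_missing(records, "know_signs_abortion")
print(summary)
```

Only the "unexplained" count would point to a possible data quality problem; the "structural" count is a consequence of the questionnaire design.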

2 Dataset summary

Before looking at brucellosis-specific responses, it’s worth understanding who was surveyed, since every objective will use these demographic and livelihood variables as explanatory factors.

2.1 Where respondents were surveyed

Figure 1: Number of households surveyed by location

Table 1: Households surveyed by ward
ward	n_households	pct
Garbatulla	174	42.5
Kinna	144	35.2
Sericho	91	22.2

The survey spans three wards (Garbatulla, Kinna, Sericho), each split into a “main” town area and a “reserve” (more remote pastoral) area, giving six location strata in total. Garbatulla ward has the largest share of the sample, Sericho the smallest.

2.2 Demographic summary of respondents

Table 2: Demographic summary of respondents (n = 409)
Statistic	Value
Mean age	43.6
Median age	40.0
% Female	44.0
Mean household size	7.0
Mean years schooling	4.8
% No formal education	51.5

Figure 2: Age group and education category distribution

Figure 3: Age group and education category distribution

Figure 4: Primary income source of surveyed households

Explanation

The sample is mostly pastoralist, which fits the research context well since brucellosis is fundamentally a disease of livestock-keeping populations.

For Objectives 1 and 2 (factors associated with knowledge, and management practices), age_group, edu_cat, sex, income_source, and no_hh_members are all fully populated (100% or near-100% coverage) and ready to be used as explanatory variables. There is enough spread across age groups and education categories to support meaningful comparisons.

3 Objective 1 | Awareness and knowledge of brucellosis among livestock-keeping households

This objective asks how many households have heard of brucellosis and, among those who have, how much they actually know about it.

3.1 Overall awareness

Figure 5: Brucellosis awareness among surveyed households

84.1% of respondents (344 of 409) reported having heard of brucellosis before. This is the key skip-logic gate for the rest of the knowledge section: only these 344 respondents went on to answer the detailed knowledge items below.

3.2 Where does awareness come from?

Figure 6: First source of brucellosis awareness, among aware respondents

3.3 Knowledge depth: what aware respondents actually know

Awareness (“have you heard of it”) is a much lower bar than knowledge (“do you know how it spreads, what it looks like, and how it affects people”). The questionnaire probed four separate knowledge domains, each captured as a set of multi-select items now expanded into 0/1 indicator columns:

Animal signs: abortion, stillbirth, weak calf, retained placenta, reduced milk, swollen testes, swollen joints, weight loss
Animal-to-animal transmission routes
Animal-to-human transmission routes
Human signs of brucellosis

Figure 7: Distribution of overall knowledge score (% of items correctly identified)

Table 3: Knowledge score (%) summary statistics, aware respondents only
Min	25th pct	Median	Mean	75th pct	Max
2.5	19.375	22.5	24.1	27.5	75

The distribution is right-skewed and clustered at the low end: most aware respondents correctly identify only a small fraction of the full set of signs and transmission routes. A mean knowledge score of 24.1% (median 22.5%) suggests that simply being “aware” of brucellosis does not translate into a detailed understanding of the disease.
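As a rough illustration of how a score of this kind can be derived, the sketch below assumes the score is the share of the 40 knowledge indicator variables (reported in the Objective 1 summary) that a respondent identified. That definition is an assumption, although it is consistent with the minimum of 2.5% in Table 3, which equals 1 item out of 40.

```python
N_ITEMS = 40  # number of knowledge indicator columns reported in this document


def knowledge_score_pct(indicators):
    """Return the percentage of knowledge items identified (0/1 indicators)."""
    if len(indicators) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} indicators, got {len(indicators)}")
    return 100.0 * sum(indicators) / N_ITEMS


# Hypothetical respondent who identified 9 of the 40 items.
respondent = [1] * 9 + [0] * 31
score = knowledge_score_pct(respondent)
print(score)  # 22.5
```

Under this assumed definition, the median score of 22.5% would correspond to 9 of 40 items identified.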

3.4 Which knowledge domain is weakest?

Figure 8: Average number identified, by knowledge domain

3.5 Knowledge score by demographic group

This section compares whether knowledge varies systematically by age, sex, or education.

Figure 9: Knowledge score (%) by age group, sex, and education category

Figure 10: Knowledge score (%) by age group, sex, and education category

Figure 11: Knowledge score (%) by age group, sex, and education category

Summary

Awareness itself is recorded for all 409 respondents with zero missingness, and the detailed knowledge items (40 indicator variables across four domains) are consistently populated for all 344 aware respondents (84% of the sample). The data clearly support both halves of Objective 1. The descriptive half (how aware are people, and what do they actually know) is fully answerable with the figures above. The inferential half (what predicts higher knowledge) has a complete, non-missing set of explanatory variables (age_group, sex, edu_cat, income_source, yrs_keep_livestock, rec_ext_services, grp_member) to regress knowledge_score_pct or aware_binary against.

Of the three demographic splits shown, education shows the clearest separation:

The tertiary-educated group’s median knowledge score (27.5%) sits visibly above the other three education categories, which cluster tightly together around 22.5% regardless of formal schooling level.
Age shows a smaller shift, with the 31–45 group scoring modestly higher than the other two age bands.
Sex shows the smallest difference of the three, with substantial overlap between male and female distributions.

This ordering (education > age > sex) gives a working hypothesis for the regression: education is the demographic variable most likely to retain a significant, independent effect on knowledge score once the other covariates are controlled for.

4 Objective 2 | Management practices and behaviours related to transmission and control

This objective asks what households actually do to prevent or manage brucellosis risk, as opposed to what they know. The questionnaire captured this through a multi-select list of specific prevention practices, plus a smaller sub-section on individual risk behaviours.

4.1 Which prevention practices are actually used?

Figure 12: Prevention practices reported, as a share of all 409 households

A genuine finding, not a data error

Eight of the twelve prevention-practice options in the questionnaire (restricting movement, farm sanitation, slaughtering positive animals, isolating animals during parturition, disposing of fetal material safely, disinfecting, public education, and seeking veterinary advice) were selected by zero respondents across all 409 households.

This suggests a real and large gap between the practices considered “textbook” prevention and what pastoralist households in this part of Isiolo are actually doing.

4.2 How many practices does a typical household report?

Figure 13: Number of prevention practices reported per household

Table 4: Households adopting at least one prevention practice (used as the binary outcome for Objective 2 modelling)
adopted_any_practice	n	pct
Adopted none	124	30.3
Adopted ≥1 practice	285	69.7
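The two derived outcome variables in this section can be sketched as follows. The indicator column names are hypothetical stand-ins for the four practices that show variation in this survey; only prev_practice_count and adopted_any_practice are names used in this document, and the three toy households are invented for illustration.

```python
# Hypothetical column names for the four practices that show variation.
PRACTICES = ["vaccination", "testing", "isolate_infected", "test_new_animals"]

households = [
    {"vaccination": 1, "testing": 0, "isolate_infected": 1, "test_new_animals": 0},
    {"vaccination": 0, "testing": 0, "isolate_infected": 0, "test_new_animals": 0},
    {"vaccination": 1, "testing": 1, "isolate_infected": 1, "test_new_animals": 1},
]

for hh in households:
    # Count of practices reported, then a binary "at least one" flag.
    hh["prev_practice_count"] = sum(hh[p] for p in PRACTICES)
    hh["adopted_any_practice"] = int(hh["prev_practice_count"] >= 1)

counts = [hh["prev_practice_count"] for hh in households]
adopted = [hh["adopted_any_practice"] for hh in households]
share_adopted = 100.0 * sum(adopted) / len(adopted)
print(counts, adopted, round(share_adopted, 1))
```

Because the other eight options were selected by no household, counting over four indicators or over all twelve gives the same result.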

4.3 Individual-level risk behaviours

A smaller sub-section of the questionnaire asked about specific risk behaviours during animal handling. This block was only shown to a subset of respondents (the human-illness branch of the questionnaire), so coverage is much lower at about 17% of the sample (60–69 households).

Figure 14: Self-reported risk behaviours, among the sub-sample asked (n ≈ 69)

4.4 Does knowledge translate into practice?

A key analytical question for Objective 2: do households with higher knowledge scores actually report more prevention practices?

Figure 15: Knowledge score vs. number of prevention practices adopted

What this means for Objective 2

The self-reported adoption question (adopt_prev_ctrl) turned out to have zero variance: every respondent who answered it said “Yes.” It therefore cannot serve as a regression outcome. The variables built from the multi-select responses (prev_practice_count, adopted_any_practice, and the four practices with real variation: vaccination, testing, isolating infected animals, testing new animals) should anchor this objective’s analysis instead. The individual risk-behaviour items (raw milk consumption, glove use, etc.) are real and usable, but only for a descriptive sub-analysis; at n ≈ 69 the sub-sample is too small to support a separate regression model.
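A zero-variance check of the kind described above can be sketched in a few lines. The toy values are illustrative only; the helper has_variation is a hypothetical name, not part of the analysis code behind this report.

```python
def has_variation(values):
    """True if the non-missing answers take more than one distinct value."""
    answered = [v for v in values if v is not None]
    return len(set(answered)) > 1


# Toy columns; None marks respondents who were not asked the question.
adopt_prev_ctrl = ["Yes", "Yes", None, "Yes", None]
adopted_any_practice = [1, 0, 1, 1, 0]

print(has_variation(adopt_prev_ctrl))       # False
print(has_variation(adopted_any_practice))  # True
```

Running a check like this on every candidate outcome before modelling avoids fitting a regression to a variable that cannot discriminate between households.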

The boxplot above shows:

Households with 0–3 practices show essentially indistinguishable medians (22–26%) with heavily overlapping IQRs, so partial adoption does not appear to track with knowledge score.
The 4-practices group sits higher (median ≈ 32%) but is also the smallest of the five groups (n = 10, versus n = 59–32 for groups 0–3) and shows no plotted outliers. This is consistent with too few observations for any point to fall outside 1.5×IQR, rather than a genuinely tighter distribution.
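The 1.5×IQR rule behind the plotted outliers can be sketched as below. The scores are invented for illustration, and the quartile method ("inclusive" in Python's statistics module) is an assumption; the plotting software used for Figure 15 may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_summary(scores):
    """Return median, IQR fences, and points outside 1.5 x IQR."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [s for s in scores if s < lower or s > upper]
    return {"median": median, "lower": lower, "upper": upper, "outliers": outliers}


# Hypothetical knowledge scores (%) for a small and a larger group.
small_group = [25.0, 27.5, 30.0, 32.5, 35.0]
large_group = [15.0, 20.0, 20.0, 22.5, 22.5, 22.5, 25.0, 25.0, 27.5, 75.0]

print(box_summary(small_group))
print(box_summary(large_group))
```

In this toy example the small group has no points beyond its fences, while the larger group flags its single extreme score. An absence of plotted outliers in a small group therefore says little about how tight the underlying distribution is.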

Because the data are cross-sectional, any association that does emerge cannot establish whether knowledge drives adoption or adoption (and the experience that comes with it) builds knowledge. This directionality should be flagged as a limitation regardless of the regression result.

5 Objective 3 | Perceptions and attitudes towards brucellosis transmission and prevention

This objective covers Section E of the questionnaire: Likert-scale items capturing how serious, preventable, and threatening respondents perceive brucellosis to be. Unlike the knowledge section, this section has two separate blocks with very different coverage:

Perceived risk of transmission (animal-to-animal, a2a, or animal-to-human, a2h)
Perceived effectiveness of prevention

Table 5: Coverage of the two perception blocks in Section E
Block	Example items	n (coverage)	% of sample
seqb / seqc (transmission & prevention attitudes)	Risk from consuming milk; vaccination effectiveness	275	67.2
seq1–seq16 (general severity & risk perception)	Brucellosis is a serious threat to animals/humans	69	16.9
comm_* (community-level perception)	Shared grazing; shared water points	275	67.2

Why the seq1–16 block has only 17% coverage

This is a structural skip-logic split tied to which field team administered the questionnaire, not random missingness. The broader seq1–seq16 items were only shown to a subset of respondents.
This means seq1–seq16 should be treated as descriptive-only (n = 69 is too small for a stable regression), while the seqb/seqc/comm_* block (n = 275, 67% coverage) can be used for inferential analysis.

5.1 Attitudes toward transmission risk and prevention (seqb / seqc)

Figure 16: Distribution of responses across transmission and prevention attitude items

5.2 Community-level perception items

Figure 17: Community-level perception items

5.3 General severity perception (descriptive only, n = 69)

Figure 18: General severity/preventability perception (small sub-sample)

Summary

The seqb/seqc/comm_* block (16 items, n = 275) is enough to support both descriptive summaries and a perception-index regression (e.g. summing or averaging Likert scores and regressing against knowledge score and SES).
The seq1–16 block (n = 69) should be reported as a descriptive table; as stated earlier, it is not large enough for inferential analysis.
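One way the perception index mentioned above could be built is by averaging each respondent's answered Likert items. The sketch below is only illustrative: the item names and the 1 to 5 coding are assumptions, and any reverse-worded items would need to be recoded before averaging.

```python
from statistics import mean

# Hypothetical item names and a 1-5 Likert coding; the actual coding may differ.
ITEMS = ["seqb_milk_risk", "seqc_vaccination_effective", "comm_shared_grazing"]


def perception_index(row, items=ITEMS):
    """Average the answered Likert items; return None if none were answered."""
    answered = [row[i] for i in items if row.get(i) is not None]
    return mean(answered) if answered else None


respondents = [
    {"seqb_milk_risk": 4, "seqc_vaccination_effective": 5, "comm_shared_grazing": 3},
    {"seqb_milk_risk": 2, "seqc_vaccination_effective": None, "comm_shared_grazing": 4},
    {"seqb_milk_risk": None, "seqc_vaccination_effective": None, "comm_shared_grazing": None},
]

indices = [perception_index(r) for r in respondents]
print(indices)
```

Averaging rather than summing keeps the index on the original response scale even when a respondent skipped an item, and returning None for respondents outside the block keeps structural missingness visible instead of turning it into a score of zero.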

6 Objective 4 | Influence of socioeconomic status and access to resources on knowledge and adoption of preventive practices

This objective characterizes household wealth and access both as a descriptive picture of the sample and as an explanatory variable for knowledge and practice in the inferential models.

6.1 The SES index

ses_index is a composite asset index built from ten components: seven binary assets (radio, bicycle, motorbike, car, house ownership, piped water, electricity), phone count (capped at 3), house wall material, and toilet type. Each component is rescaled to a 0–1 contribution, and the ten contributions are averaged.
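The construction described above can be sketched as follows. The seven binary assets and the phone cap follow the description; the field names and the 0 to 1 scores assigned to wall material and toilet type are hypothetical, since the actual category scoring is not given here.

```python
BINARY_ASSETS = ["radio", "bicycle", "motorbike", "car",
                 "owns_house", "piped_water", "electricity"]

# Hypothetical 0-1 scores for the two categorical components.
WALL_SCORE = {"mud": 0.0, "iron_sheet": 0.5, "brick": 1.0}
TOILET_SCORE = {"none": 0.0, "pit_latrine": 0.5, "flush": 1.0}


def ses_index(hh):
    """Average ten 0-1 components into a single asset index."""
    parts = [float(hh[a]) for a in BINARY_ASSETS]   # seven binary assets
    parts.append(min(hh["phone_count"], 3) / 3)      # phone count capped at 3
    parts.append(WALL_SCORE[hh["wall_material"]])
    parts.append(TOILET_SCORE[hh["toilet_type"]])
    return sum(parts) / len(parts)


household = {
    "radio": 1, "bicycle": 0, "motorbike": 1, "car": 0,
    "owns_house": 1, "piped_water": 0, "electricity": 0,
    "phone_count": 5, "wall_material": "iron_sheet", "toilet_type": "pit_latrine",
}
index = ses_index(household)
print(round(index, 2))
```

Because every component is bounded between 0 and 1 and the index is a simple average, the result is also bounded between 0 and 1, and each component carries equal weight.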

Figure 19: Distribution of the household SES index

Table 6: Missingness pattern in the SES block
Households missing SES index	All from brucella_aware = No?	% of full sample
56	TRUE	13.7

The 56 households missing an SES index are exactly the households who were not asked Section I (because they were routed past it after answering “No” to brucellosis awareness). This is structural missingness, not random, and should not be imputed.

6.2 Asset ownership breakdown

6.3 Livestock holdings (a wealth proxy specific to pastoralist households)

Figure 21: Distribution of total livestock holdings

6.4 Institutional access: extension services and group membership

Figure 22: Access to extension services and group membership

6.5 Does SES relate to knowledge?

Figure 23: SES index vs. knowledge score

Summary

The ten-component index, plus all of its raw inputs (assets, house construction, toilet type, phone count) and the separate livestock-holding variables, are populated for 353 of 409 households (86%), enough for both a descriptive wealth profile and use as an explanatory variable in the knowledge and practice regressions. The missingness pattern is fully understood and structural (tied to the brucellosis-aware skip gate), so it should be reported as such rather than imputed.

The scatter plot above shows whether wealthier households tend to know more about brucellosis (n = 344).

The bivariate association is weak (r = 0.06, p = 0.306), with the fitted trend rising only about 3 percentage points across the full SES range (0–0.85). The confidence band also widens noticeably above SES ≈ 0.6, where data are sparser, so the apparent upward tilt at the high end of the scale should be read cautiously rather than as a strong trend.
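For readers who want to see how a correlation of this size maps to a p-value, the sketch below computes Pearson's r from raw data and converts r and n into an approximate two-sided p-value. It uses a normal approximation to the t distribution, which is an assumption that is reasonable only because n is large. Since the reported r is rounded to two decimals, the result will not reproduce p = 0.306 exactly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: r = 0, using a normal approximation to t."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return math.erfc(abs(t) / math.sqrt(2))


# Rounded values reported for the SES index vs. knowledge score.
p = approx_p_value(0.06, 344)
print(round(p, 2))
```

With the rounded inputs, the approximation lands in the same clearly non-significant range as the reported p-value, which supports reading the association as weak.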

7 Will the final dataset help answer our objectives?

Table 7: Summary verdict by objective
Objective	Key variables	Coverage	Verdict
Obj 1 | Awareness & knowledge	brucella_aware, knowledge_score_pct, 40 knowledge dummies	100% awareness / 84% knowledge (n=344)	Fully answerable
Obj 2 | Management practices	prev_practice_count, adopted_any_practice, 4 practice dummies	100% practice dummies / 66%* self-report (unusable)	Answerable, reframed outcome variable
Obj 3 | Perceptions & attitudes	seqb/seqc (16 items), comm_* (4 items); seq1-16 descriptive only	67% (n=275) main block / 17% (n=69) general block	Answerable in two tiers
Obj 4 | SES & access	ses_index, 10 asset/housing inputs, livestock holdings	86% (n=353)	Fully answerable

Taken together, this dataset can answer all four objectives, though two of them require a small adjustment in framing compared to how they might originally have been conceived:

For Objective 2, the self-reported “did you adopt any prevention practice” question turned out to have no variation in the data (everyone who answered said yes), so the practice count and practice indicators built from the multi-select responses should be used as the outcome instead.

For Objective 3, the perception data need to be presented in two tiers: the seqb/seqc questions (n = 275), which are suitable for both description and regression, and the seq1-16 questions (n = 69), which are genuinely useful for context but should be explicitly labelled as descriptive only.

Objectives 1 and 4 are fully supported by the data as originally framed, with large, well-covered variable sets and no structural barriers.