From Financial Topics to Information Needs: Peer Financial Sense-Making in the Philippine Reddit Community r/phinvest

Author: Dan Anthony Dorado

Affiliation: UP School of Library and Information Studies

Abstract

Online personal-finance communities increasingly shape how users seek advice, evaluate credibility, and make sense of financial uncertainty. Yet computational studies of online financial discourse often classify posts by topic, risking a construct-validity error: financial topics do not, by themselves, reveal what kind of information support users seek, whom they trust, what risks they perceive, or how peer communities evaluate financial claims. This study examines high-visibility discussions in r/phinvest, a Philippine-oriented Reddit community, as public episodes of everyday financial information seeking and peer sense-making. Using a qualitatively driven sequential mixed-methods design, the study combines exploratory corpus mapping with directed qualitative content analysis. Topic-domain mapping, flair analysis, term frequencies, local-term detection, and exploratory topic modeling are used to organize the corpus, while qualitative coding provides the main basis for interpreting information-need types, uncertainty objects, trust objects, risk frames, peer-response types, and Philippine contextual anchors. The corpus consists of 30 high-visibility r/phinvest threads through June 30, 2026, including 5,298 parsed comments and 5,328 mapping-eligible post/comment documents. Findings show that high-visibility r/phinvest discourse spans business, investing, real estate, insurance, banking, digital finance, and government-linked funds, but these topic domains are analytically insufficient. The same financial topic can express different information needs, including procedural guidance, product comparison, trust verification, risk assessment, decision validation, and experiential advice. Peer responses function as informal credibility evaluations through warnings, corrections, calculations, anecdotes, recommendations, and referrals. 
Philippine-specific institutions and products, including VUL, MP2, SSS, BSP, PDIC, GCash, Maya, Pag-IBIG, and UITFs, localize financial uncertainty and shape how users evaluate safety, legitimacy, suitability, liquidity, and trust. The study contributes to information behavior, financial capability, and computational text-analysis research by showing why financial topics must not be treated as direct evidence of information needs, social importance, or financial behavior.

Keywords

financial information behavior, everyday life information seeking, financial capability, Reddit, r/phinvest, peer financial advice, online credibility, topic modeling, qualitative content analysis

1 Introduction

1.1 Problem background

Everyday financial decision-making in the Philippines increasingly unfolds through digital platforms, peer advice, and locally specific financial institutions. Filipino users encounter questions about banking, e-wallets, government-linked savings vehicles, insurance products, scams, business, real estate, debt, and investing in a financial environment marked by uneven inclusion, product complexity, and rapid fintech adoption. National and international evidence shows that formal access, account ownership, digital payments, insurance, savings, and borrowing remain unevenly distributed (Bangko Sentral ng Pilipinas, 2022; World Bank, 2015, 2022). These conditions make financial information seeking a practical everyday activity, not merely a matter of abstract financial knowledge.

Existing financial-literacy and financial-inclusion research has generated valuable evidence about knowledge, access, and behavior, but it often relies on surveys, institutional indicators, or individual capability measures (Johnson & Sherraden, 2007; Lusardi & Mitchell, 2014; Stolper & Walter, 2017). Such approaches are necessary, but they are less able to show how people publicly formulate financial uncertainty, seek judgment, evaluate credibility, and negotiate decisions with peers. Everyday life information seeking offers a useful starting point because it treats information practices as embedded in routine problem solving, social relationships, and situated constraints (Case & Given, 2016; Savolainen, 1995; Wilson, 1999). In personal finance, the issue behind a post is rarely exhausted by its topic: a discussion of MP2, VUL insurance, digital banks, debt, scams, real estate, or small business may express a need for procedural guidance, product comparison, trust verification, risk assessment, decision validation, or experiential advice.

r/phinvest is a useful site for examining these processes because it is a public, Philippine-oriented Reddit community where financial questions become visible as posts, comments, disagreement, warnings, calculations, anecdotes, and peer judgments. Prior studies of Reddit and online financial discourse show that personal-finance communities can be analyzed to surface financial concerns, advice-seeking, and user interactions (Karpenko et al., 2021; Thukral et al., 2023). However, Reddit data must be interpreted as platform-mediated public discourse, not as a representative measure of Filipino financial behavior. Reddit visibility is shaped by ranking, voting, moderation, timing, humor, controversy, and community norms; public posts may also contain sensitive financial, employment, family, and scam-related details (boyd & Crawford, 2012; Franzke et al., 2020; Lazer et al., 2020; Massanari, 2017; Nissenbaum, 2004; Proferes et al., 2021).

1.2 Research problem

The central problem addressed by this study is not simply that r/phinvest has received limited scholarly attention. A missing-site claim would be too weak. The stronger problem is a construct-validity problem: financial topics do not, by themselves, reveal the underlying information need, uncertainty structure, trust object, or peer-evaluation process. Computational studies of online finance discourse can identify topical clusters, but topic prevalence should not be treated as direct evidence of what users need, why they ask, whom they trust, or how peer communities evaluate financial claims (Blei, 2012; Grimmer & Stewart, 2013; Isoaho et al., 2021). The study therefore intervenes in a common inferential error in computational social research: treating topic prevalence as evidence of user need, social importance, or behavioral reality.

This study therefore examines high-visibility r/phinvest threads as public episodes of financial information seeking and peer sense-making. It distinguishes five related but non-equivalent analytic layers: topic domains, information-need types, uncertainty/trust/risk frames, peer response types, and visible community response. The study treats topic modeling and term mapping as exploratory tools for organizing the corpus, while directed qualitative content analysis provides the main evidentiary basis for interpreting articulated information needs and peer evaluation. This positioning is especially important because the corpus consists of 30 high-visibility threads, a scale appropriate for qualitative mixed-methods analysis but too small for topic modeling to carry standalone inferential weight.

Table 1: Construct-validity boundaries for the study's analytic layers.

Analytic Layer	What It Can Show	What It Must Not Claim
Topic-domain mapping	The financial subjects and issue areas that appear in high-visibility r/phinvest discourse.	That topics are automatically equivalent to information needs, motives, or financial behavior.
Information-need coding	The type of support requested or implied, such as procedural guidance, trust verification, risk assessment, decision validation, or experiential advice.	That inferred needs are latent psychological states independent of textual and interactional evidence.
Trust/risk/uncertainty analysis	The object of doubt, credibility evaluation, or perceived risk in posts and comments.	That Reddit discussion verifies actual product quality, advice accuracy, or real-world outcomes.
Peer-response analysis	How commenters recommend, warn, correct, calculate, share experience, moralize, or refer users elsewhere.	That comments are neutral answers or expert advice.
Engagement description	How score and comment volume indicate visible platform response.	That engagement equals importance, need intensity, correctness, or public representativeness.

Table 1 summarizes the inferential boundaries that organize the study. The key move is to prevent topic discovery, need inference, peer evaluation, and platform engagement from collapsing into one another. Without this discipline, the study would risk becoming a descriptive inventory of popular r/phinvest topics. With it, the study can contribute to information behavior and digital financial discourse research by showing how financial uncertainty is publicly articulated, socially evaluated, and localized through Philippine-specific institutions and products (Grimmer & Stewart, 2013; Wilson, 1999).

1.3 Research questions

The study is guided by five research questions:

1. What topic domains appear in high-visibility r/phinvest threads, and how are these domains distributed across posts and comments?
2. What types of financial information needs are articulated in original r/phinvest posts, and how do these needs differ from platform-assigned topic labels?
3. How do users frame uncertainty, trust, and risk when asking for or evaluating financial advice in r/phinvest discussions?
4. How are visible community responses, including score and comment volume, descriptively patterned across information-need types and trust/risk frames?
5. How do Philippine-specific institutions, products, platforms, and household constraints shape the articulation and evaluation of financial information needs?

These questions are deliberately tiered. The first question maps the discourse terrain. The second is the central construct-validity question. The third and fifth interpret the uncertainty and local context that shape information seeking. The fourth treats engagement descriptively and exploratorily rather than as a statistical test of importance, consistent with cautions that behavioral traces and automated text patterns require construct-specific interpretation (boyd & Crawford, 2012; Grimmer & Stewart, 2013; Lazer et al., 2020).

1.4 Scope and boundaries

The study is bounded in five ways. First, it analyzes r/phinvest only. Second, it focuses on high-visibility threads from this year through June 30, 2026. Third, it examines public discourse rather than private decision-making or actual financial outcomes. Fourth, it analyzes articulated information needs and peer evaluation, not latent needs that users did not express. Fifth, it does not claim that Reddit users represent Filipinos, Philippine investors, or financially active adults more broadly.

These boundaries are not weaknesses to be hidden; they are conditions for making defensible claims. The sample is best understood as a purposive set of visible financial sense-making episodes. It can show how particular financial uncertainties become publicly discussable, how peers evaluate credibility and risk, and how local products and institutions structure advice-seeking. It cannot estimate national financial literacy, measure financial capability, determine advice correctness, or generalize to all Filipino financial behavior (Franzke et al., 2020; Nissenbaum, 2004).

1.5 Contribution

The study makes three contributions. Theoretically, it reframes online financial discourse as situated information behavior rather than as a simple expression of financial literacy or deficiency. Methodologically, it offers a construct-validity correction for computational financial discourse research by showing why topic modeling must be paired with validated qualitative coding before researchers infer information needs, trust, risk, or decision support. Contextually, it provides a Philippine-oriented account of how users localize financial uncertainty through institutions, products, platforms, and household constraints such as MP2, SSS, PDIC, BSP, GCash, Maya, VUL insurance, digital banks, business risk, and family financial obligations.

The disciplined claim is therefore modest but defensible: high-visibility r/phinvest discussions reveal how Filipino Reddit users publicly articulate financial uncertainty, seek peer validation, evaluate trust and risk, and localize financial decision-making through Philippine-specific institutions, products, and constraints. Broader claims about Filipino financial behavior, population-level information needs, or actual financial capability are outside the scope of this study (Grimmer & Stewart, 2013; Johnson & Sherraden, 2007; Savolainen, 1995).

2 Review of Related Literature

2.1 Everyday life information seeking

Everyday life information seeking (ELIS) provides the theoretical basis for treating personal finance as a situated information problem rather than as a narrow knowledge deficit. Savolainen (1995) argues that information seeking is embedded in everyday “ways of life” and practical efforts to manage ordinary problems. Later information-behavior scholarship extends this view by emphasizing context, source selection, barriers, and the difficulty of inferring information need directly from observable behavior (Case & Given, 2016; Wilson, 1999). This matters for r/phinvest because a post about insurance, a digital bank, MP2, debt, real estate, or a small business may be topically identifiable while still expressing different kinds of need.

Uncertainty is the key bridge between ELIS and the present study. Kuhlthau (1991) shows that information seeking often begins in uncertainty and moves through gradual formulation, interpretation, and confidence-building. In r/phinvest, uncertainty may concern safety, legality, profitability, affordability, timing, credibility, or product suitability. These uncertainty objects are not reducible to financial topics. For example, the same banking topic may express procedural guidance, institutional distrust, scam suspicion, liquidity planning, or decision validation. ELIS therefore supports the study’s central claim that topic domains must be distinguished from financial information needs.

The concept of information grounds further explains why peer spaces matter. Information can circulate in social settings where people gather around shared interests rather than formal expertise (Fisher & Naumer, 2006). r/phinvest can be understood as a digital information ground where users make financial uncertainty public and invite peer interpretation. This does not imply that the advice is accurate, representative, or professionally vetted. It means that the community provides a setting in which financial claims are narrated, questioned, corrected, validated, or contested.

2.2 Financial capability and inclusion

Financial capability literature complements ELIS by shifting attention from knowledge alone to the conditions under which people can make and act on financial decisions. Financial literacy is associated with planning, debt behavior, and financial outcomes, but literacy does not exhaust the problem of financial decision-making (Lusardi & Mitchell, 2014; Stolper & Walter, 2017). Capability-oriented approaches emphasize the interaction of knowledge, access, opportunity, confidence, institutional context, and usable services (Johnson & Sherraden, 2007; World Bank, 2015). This distinction is important because r/phinvest posts cannot measure actual capability, but they can reveal how users narrate perceived capability constraints.

The Philippine context makes this capability framing necessary. National and international evidence documents expanding digital and formal financial access, but also uneven account ownership, saving, borrowing, insurance, and resilience (Bangko Sentral ng Pilipinas, 2022; World Bank, 2022). These conditions appear in online discourse as uncertainty about government-linked funds, e-wallets, banks, insurance products, scams, business viability, employment insecurity, and household obligations. The study therefore treats r/phinvest not as a measure of Filipino capability, but as a public archive of how capability-related uncertainty is articulated and socially evaluated.

This framing also avoids a common interpretive error. If users ask whether a product is safe, whether an agent is trustworthy, or whether a financial move is reasonable, the issue is not necessarily a lack of literacy. It may reflect product opacity, weak trust, limited recourse, unstable income, family obligations, or uncertainty about regulation. Financial capability, in this study, is treated as a sensitizing framework for interpreting perceived constraints, not as an outcome variable measured by Reddit discourse.

2.3 Peer financial advice and platform-mediated credibility

Online peer spaces increasingly shape how people encounter financial information. Studies of personal finance communities and online financial media show that users discuss credit, money management, real estate, insurance, retirement, investing, employment, fraud, and other practical concerns (Karpenko et al., 2021; Thukral et al., 2023). Personal-finance blogs and social-media finance content may provide accessible information, but they may also reach already engaged audiences and raise questions about source credibility, advice quality, and unequal expertise (Cao et al., 2020; Hoffmann & Otteby, 2018). These studies support using online financial discourse as evidence of public concern while also requiring caution about who participates and how advice is evaluated.

Reddit adds platform-specific credibility dynamics. Pseudonymity may encourage disclosure of sensitive financial details, but it also weakens conventional markers of expertise. Voting and ranking increase the visibility of some responses while burying others. Flair labels organize posts by topic, but they do not necessarily identify the underlying need. Comment count and score measure platform-mediated response, not truth, importance, or decision quality. Research on web credibility shows that users rely on heuristics, source cues, and interface cues when assessing online information (Metzger, 2007; Sundar, 2008). Reddit-specific scholarship further shows that governance, moderation, voting, and platform culture shape what becomes visible and researchable (Massanari, 2017; Proferes et al., 2021). Studies of social influence in online choices also caution that peer interaction can shape decisions in ways that are not always deliberative or beneficial (Zhu et al., 2012).

For this study, comments are therefore not treated as neutral answers. They are peer evaluations. Commenters may recommend, warn, correct, calculate, share experience, moralize, challenge the original framing, or refer the user elsewhere. This peer-evaluation lens is central to analyzing r/phinvest because the community often evaluates credibility and risk, not only the financial topic itself.

2.4 Computational text analysis and construct validity

Computational text analysis can organize large or noisy text collections, but it cannot by itself establish social-scientific constructs. Topic models estimate patterns in words or documents; they do not directly identify information need, trust, uncertainty, risk, capability, or advice quality (Blei, 2012; Grimmer & Stewart, 2013). Structural topic modeling can incorporate document metadata, and BERTopic can be useful for short social-media texts, but both still require human interpretation and validation (Grootendorst, 2022; Roberts et al., 2014).

This point is especially important for the present corpus. Thirty original posts are too few for topic modeling to function as a major inferential method. Comments add text volume, but comments are not equivalent to original information-need articulations; they represent peer response. For that reason, computational analysis is treated as exploratory corpus mapping, not as the evidentiary core. Mixed computational and qualitative designs are strongest when model outputs structure interpretation without replacing close reading, coding, and validation (Isoaho et al., 2021; Nelson, 2020).

The construct-validity problem is therefore methodological and conceptual. If researchers infer needs directly from topics, they risk confusing the financial subject with the user’s requested support. If they infer importance from engagement, they risk confusing platform visibility with social significance. If they infer capability from discourse, they risk confusing articulated uncertainty with actual financial behavior. The literature on text-as-data supports this study’s decision to use topic mapping cautiously and to make directed qualitative content analysis the main analytic engine.

2.5 Internet research ethics

The ethics of public online data are central because r/phinvest discussions may involve income, debt, employment, family obligations, scam experiences, health expenses, and other sensitive details. Public availability does not eliminate contextual privacy (Nissenbaum, 2004). Internet research ethics guidelines emphasize that researchers must consider vulnerability, traceability, platform expectations, and potential harm even when data are publicly accessible (Franzke et al., 2020).

This study therefore treats r/phinvest data as public but sensitive. The reporting strategy avoids unnecessary direct quotation, suppresses identifying combinations of details, separates source links from analysis materials, hashes usernames, and masks sensitive quantities in modeling text. These safeguards are not merely technical. They follow from the study’s theoretical position: if financial information seeking is shaped by uncertainty and vulnerability, the research design must avoid amplifying the exposure of people seeking help in public peer spaces.

2.6 Conceptual model

The literature reviewed above can be synthesized into the model shown in Figure 1. The model integrates ELIS, financial capability, information grounds, platform studies, and computational text-analysis caution into a single analytic sequence: local financial context shapes situated financial uncertainty; users convert that uncertainty into articulated information needs; peers evaluate credibility, risk, and decision quality; and Reddit records that interaction as visible platform response.

Figure 1: Conceptual model for everyday financial information needs on r/phinvest. The figure depicts local Philippine financial context shaping situated uncertainty, articulated information needs, peer evaluation, and visible platform response, with boundary conditions and sensitizing concepts.

Figure 1 clarifies the study’s core construct relationships. Topic domain and uncertainty object belong to situated financial uncertainty. Information need type describes the kind of support requested or implied. Trust object and peer response type belong to peer evaluation. Score and comment volume are visible platform responses. The model also states the boundary conditions: the study concerns public r/phinvest discourse, high-visibility threads in the sampled period, articulated rather than latent needs, peer evaluation rather than verified advice quality, and visible engagement rather than population-level importance (Fisher & Naumer, 2006; Grimmer & Stewart, 2013; Kuhlthau, 1991).

Table 2: Construct taxonomy for the qualitative analysis.

Construct	Definition	Observable Indicators
Topic domain	The financial subject being discussed.	Banking, insurance, MP2, VUL, business, real estate, debt, digital wallets.
Information need type	The kind of support requested or implied.	Procedural guidance, product comparison, trust verification, risk assessment, decision validation, experiential advice.
Uncertainty object	What is unknown, doubted, or unresolved.	Safety, legality, profitability, credibility, affordability, timing, suitability.
Trust object	The person, product, institution, or claim being evaluated.	Agent, bank, app, government fund, broker, family member, online source.
Peer response type	How commenters respond to the articulated need.	Recommendation, warning, correction, calculation, anecdote, moral judgment, referral.
Local financial context	Philippine-specific institutional or socioeconomic grounding.	SSS, Pag-IBIG, MP2, BSP, PDIC, GCash, Maya, VUL, family obligations, unstable income.
Visible community response	Platform-mediated response to a thread.	Score, declared comments, parsed comments.

Table 2 operationalizes the model for analysis. It formalizes the distinction between topic domain, need type, uncertainty object, trust object, peer response type, local context, and visible response. This taxonomy addresses the central gap identified in the literature: online finance discourse can be topically mapped, but an information-behavior study must also show how topics differ from needs and how peer communities evaluate credibility, risk, and decision quality (Isoaho et al., 2021; Thukral et al., 2023; Wilson, 1999).

2.7 Synthesis and research gap

The reviewed literature provides the theoretical materials for the study but also reveals a specific gap. Information behavior research explains everyday uncertainty and information seeking, but has given less sustained attention to Philippine peer financial communities (Kuhlthau, 1991; Savolainen, 1995). Financial capability research explains why knowledge, access, and institutional conditions matter, but often relies on surveys and formal indicators (Johnson & Sherraden, 2007; World Bank, 2015). Online finance studies show that users discuss money in peer spaces, but many analyses emphasize topic prevalence or engagement without fully distinguishing topics from information needs (Karpenko et al., 2021; Thukral et al., 2023). Computational text-analysis research provides useful mapping tools, but it also warns against treating model output as direct evidence of theoretical constructs (Grimmer & Stewart, 2013; Isoaho et al., 2021).

The contribution of this study is to connect these strands through a construct-validity intervention. It examines high-visibility r/phinvest threads not as a representative survey of Filipino financial behavior and not merely as a topic map, but as public episodes in which financial uncertainty is articulated, peer-evaluated, and localized through Philippine institutions, products, platforms, and household constraints (Fisher & Naumer, 2006; Nelson, 2020).

3 Methodology

3.1 Research design

This study uses a qualitatively driven sequential mixed-methods design. The design is sequential because corpus construction and exploratory computational mapping precede directed qualitative content analysis. It is qualitatively driven because the main evidentiary claims concern information need, uncertainty, trust, risk, peer evaluation, and local context, all of which require interpretive coding rather than automated topic assignment alone.

The study treats topic modeling, flair analysis, term frequency, and keyword-in-context review as exploratory mapping tools. These procedures organize the discourse terrain and help identify cases for close reading, but they do not carry the main inferential burden. This is necessary because the corpus contains 30 high-visibility threads: a defensible size for qualitative content analysis of visible financial sense-making episodes, but too small for a robust standalone topic model of original posts. Following Wilson (1999) and Kuhlthau (1991), information need is treated as an inferred construct rather than something directly given by a topic label. Following Hsieh & Shannon (2005), qualitative content analysis is directed: initial categories are derived from ELIS, financial capability, information grounds, platform-mediated credibility, and the construct taxonomy in Table 2. The design also follows methodological work showing that topic modeling can support qualitative inquiry when researchers preserve interpretation, validation, and contextual reading (Isoaho et al., 2021; Nelson, 2020).

3.2 Data source and sampling

The empirical material consists of public r/phinvest threads captured from Reddit and prepared for analysis. The sampling frame is the top 30 r/phinvest threads from this year, with a cutoff date of June 30, 2026. Here, “top” refers to highly visible threads in the sampled period, and the unit is a thread rather than a subreddit. The study treats the corpus as a purposive sample of high-visibility financial sense-making episodes, not as a statistically representative sample of r/phinvest activity or Filipino financial behavior.

This sampling choice is strategic and bounded. High-visibility threads are useful for studying financial issues that attract public attention, disagreement, advice, and peer evaluation. They are less suitable for estimating population prevalence, low-engagement concerns, deleted or removed content, or silent information needs. The study therefore limits its claims to public discourse in a Philippine-oriented Reddit community. It does not claim to represent all Filipinos, all Reddit users in the Philippines, or all financial decision-making behavior (boyd & Crawford, 2012; Franzke et al., 2020).

Table 3: Corpus profile and methodological use.

Corpus Feature	Value	Methodological Use
Cutoff date	June 30, 2026	Defines the endpoint for the this-year sampling frame.
High-visibility threads	30	Thread-level interaction episodes for qualitative analysis.
Unique post-content threads	30	Threads retained after duplicate-content checking.
Parsed comments	5,298	Comment-level material for peer response and advice evaluation.
Mapping-eligible documents	5,328	Post and comment documents available for exploratory computational mapping.

Table 3 documents the study’s empirical base and clarifies the difference between thread-level interaction episodes, comments, and mapping-eligible text records. This distinction matters because original posts articulate information needs, while comments represent peer evaluation and response. The table also makes explicit that the June 30, 2026 cutoff bounds the phrase “this year” and prevents later r/phinvest activity from being folded into the analysis without a new sampling decision.

3.3 Unit construction and data preparation

Three analytic units are constructed. The original post is the primary unit for coding articulated information need, uncertainty object, topic domain, trust object, and local context. The comment is the primary unit for coding peer response type, credibility cue, risk framing, disagreement, advice orientation, and local-context references. The full thread is the interaction episode that connects the original post, comment sequence, and visible platform response.
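The three units can be pictured as a small data structure. The following is a minimal sketch, not the study's actual pipeline: the class names, field names, and the `mapping_documents` helper are hypothetical and chosen only for illustration. It shows how posts and comments can stay distinguishable while still being counted together as mapping-eligible documents. The corpus figures are consistent with this layout, since 30 post documents plus 5,298 parsed comments give 5,328 documents.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Comment:
    comment_id: str
    author_hash: str               # hashed identifier, never the raw username
    text: str


@dataclass
class Post:
    post_id: str
    author_hash: str
    title: str
    text: str
    flair: Optional[str] = None    # platform-assigned label, not a need code


@dataclass
class Thread:
    """Interaction episode: one original post, its comments, visible response."""
    post: Post
    comments: List[Comment] = field(default_factory=list)
    score: Optional[int] = None
    declared_comments: Optional[int] = None   # count displayed by the platform

    @property
    def parsed_comments(self) -> int:
        return len(self.comments)


def mapping_documents(threads: List[Thread]) -> Iterator[Tuple[str, str, str]]:
    """Yield (unit_type, doc_id, text) so posts and comments remain separable."""
    for thread in threads:
        yield ("post", thread.post.post_id, f"{thread.post.title}\n{thread.post.text}")
        for comment in thread.comments:
            yield ("comment", comment.comment_id, comment.text)
```

Keeping `unit_type` on every record makes it straightforward to run the mapping steps on posts and comments separately.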

The raw source materials included substantive post and comment text as well as navigation elements, advertisements, sidebar rules, voting controls, profile badges, promoted blocks, and repeated interface text. The preparation workflow separates thread metadata, post text, and comment text; removes platform chrome; repairs common text-encoding artifacts; and preserves post/comment distinctions. Duplicate-content checking is performed so repeated high-visibility material does not inflate computational mapping or descriptive results.
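As an illustration of the chrome-removal and duplicate-checking steps, the sketch below uses only the Python standard library. The entries in `CHROME_LINES` are hypothetical examples of interface strings, and exact matching on a whitespace-normalized, lowercased fingerprint is an assumed duplicate criterion; the study's actual rules may differ. Encoding repair is not covered here.

```python
import hashlib
import re

# Hypothetical interface strings; a real list would come from inspecting captures.
CHROME_LINES = {"share", "reply", "upvote", "downvote", "award", "promoted"}


def strip_chrome(raw: str) -> str:
    """Drop blank lines and lines that consist only of known interface text."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped and stripped.lower() not in CHROME_LINES:
            kept.append(stripped)
    return "\n".join(kept)


def content_fingerprint(text: str) -> str:
    """Hash of the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(docs):
    """docs: iterable of (doc_id, text). Keep the first document per fingerprint."""
    seen = set()
    unique = []
    for doc_id, text in docs:
        fingerprint = content_fingerprint(text)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append((doc_id, text))
    return unique
```

Exact-match fingerprints only catch verbatim repeats after normalization. Near-duplicates such as lightly edited reposts would need a separate similarity check.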

Privacy-sensitive cleaning is built into preparation. Usernames are replaced with deterministic hashes, source links are separated from the main analysis materials, and modeling text masks links, emails, phone numbers, Reddit user mentions, money amounts, percentages, and ages. This strategy follows the ethical principle that public internet data may remain contextually sensitive (Franzke et al., 2020; Nissenbaum, 2004). The cleaning protocol avoids over-normalizing language: Taglish, Filipino, English, abbreviations, and Philippine financial terms are retained where they carry meaning. Terms such as MP2, Pag-IBIG, SSS, PDIC, VUL, UITF, GCash, Maya, and Philippine bank names are treated as local-context indicators, not as noise.
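The masking step might look roughly like the following sketch. The salt value, placeholder tokens, and every regular expression are assumptions made for illustration and are not the study's actual patterns. The phone pattern covers only common Philippine mobile formats, and the money pattern covers only amounts with a peso prefix. The currency rule requires a word boundary before `P`, so local terms such as MP2 are left intact.

```python
import hashlib
import re

SALT = "project-specific-secret"  # assumption: stored outside the analysis files


def hash_username(username: str, salt: str = SALT) -> str:
    """Deterministic pseudonym: the same username always maps to the same hash."""
    digest = hashlib.sha256((salt + username.lower()).encode("utf-8")).hexdigest()
    return "user_" + digest[:12]


# Applied in order: links and emails first, so later patterns never see them.
MASKS = [
    (re.compile(r"https?://\S+"), "<LINK>"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<EMAIL>"),
    (re.compile(r"(?<![\w/])/?u/[A-Za-z0-9_-]+"), "<USER>"),
    (re.compile(r"(?<!\d)(?:\+63|0)\s?9\d{2}[\s-]?\d{3}[\s-]?\d{4}(?!\d)"), "<PHONE>"),
    (re.compile(r"(?:₱\s?|\bphp\s?|\bp(?=\d))\d+(?:,\d{3})*(?:\.\d+)?(?:\s?[km]\b)?",
                re.IGNORECASE), "<MONEY>"),
    (re.compile(r"\d+(?:\.\d+)?\s?%"), "<PCT>"),
    (re.compile(r"\b\d{1,2}\s?(?:yo|y/o|years?\s+old)\b", re.IGNORECASE), "<AGE>"),
]


def mask_sensitive(text: str) -> str:
    """Replace sensitive entities with placeholder tokens for modeling text."""
    for pattern, placeholder in MASKS:
        text = pattern.sub(placeholder, text)
    return text
```

Replacing entities with typed placeholders rather than deleting them keeps the signal that a post mentions an amount or an age without retaining the value itself.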

3.4 Exploratory computational mapping

Computational analysis is used for exploration, not standalone inference. The study uses platform flairs, term frequencies, local-term detection, keyword-in-context inspection, and an exploratory topic model to map the corpus. Original posts and comments are kept analytically separate because posts articulate information needs while comments enact peer evaluation. Combining them without distinction would contaminate need articulation with response discourse.
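Local-term detection and keyword-in-context inspection can be illustrated with a short sketch. The term list below is only the subset of local terms named in Section 3.3, the matching rule (case-insensitive, whole-token) is an assumption, and the post and comment snippets are hypothetical. Posts and comments are counted separately, in line with the design.

```python
import re
from collections import Counter

# Subset of local terms named in the paper; the study's full lexicon is not reproduced here.
LOCAL_TERMS = ["MP2", "Pag-IBIG", "SSS", "PDIC", "VUL", "UITF", "GCash", "Maya"]
CANONICAL = {term.lower(): term for term in LOCAL_TERMS}
TERM_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(t) for t in LOCAL_TERMS) + r")(?!\w)",
    re.IGNORECASE,
)

def count_local_terms(texts):
    """Count whole-token hits per canonical local term."""
    counts = Counter()
    for text in texts:
        for match in TERM_PATTERN.finditer(text):
            counts[CANONICAL[match.group(1).lower()]] += 1
    return counts

def kwic(texts, term, window=30):
    """Return (left context, hit, right context) for every hit of one term."""
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
    rows = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - window):m.start()]
            right = text[m.end():m.end() + window]
            rows.append((left, m.group(0), right))
    return rows

# Hypothetical snippets, kept in separate lists for posts and comments.
posts = ["Safe pa ba ang MP2 at SSS contributions ko?"]
comments = ["MP2 is under Pag-IBIG, hindi covered ng PDIC.", "Try mo mag-VUL... joke, wag."]

post_counts = count_local_terms(posts)
comment_counts = count_local_terms(comments)
pdic_contexts = kwic(comments, "PDIC", window=12)
```

Reading the context rows, rather than the counts alone, is what lets an analyst decide whether a hit reflects a question about safety, a correction, or a joke.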

The topic model is fitted only as a descriptive aid. For a corpus of 30 original posts, topic modeling cannot establish stable latent topics; it can only suggest issue clusters that guide interpretation. Topic validity is assessed through top terms, representative documents, keyword-in-context checks, and comparison with flair and qualitative codes. The preprocessing strategy is conservative: it removes interface noise and masks sensitive entities but does not aggressively remove local financial terms, mixed-language tokens, product names, or institution names. This approach follows cautions that automated text analysis must be interpreted through construct-specific theory and validation rather than treated as self-explanatory (Blei, 2012; Grimmer & Stewart, 2013; Isoaho et al., 2021).

3.5 Directed qualitative content analysis

Directed qualitative content analysis is the primary analytic method. Original posts are coded for topic domain, information need type, uncertainty object, trust object, and local financial context. Topic domain captures the financial subject of the post. Information need type captures the kind of support requested or implied. Uncertainty object captures what is unknown, doubted, or unresolved. Trust object captures the person, institution, product, platform, or claim whose credibility is being evaluated. Local context captures Philippine institutions, products, platforms, regulations, and household constraints.

Comments are coded for peer response type, credibility cue, risk framing, disagreement, advice orientation, and local context. Peer response type includes recommendation, warning, correction, calculation, anecdote, moral judgment, and referral. Credibility cues include personal testimony, community consensus, institutional reputation, regulatory reference, expert claim, warning signal, and skepticism toward sales agents or promotional claims. Risk framing includes fraud/scam risk, institutional risk, product risk, liquidity risk, market risk, employment or income risk, family-obligation risk, and decision-suitability risk.

Table 4: Required codebook structure for directed qualitative content analysis.

Codebook Element	Purpose
Code name	Provides a stable label for each construct or category.
Definition	States what the code means conceptually.
Inclusion criteria	Specifies when the code should be applied.
Exclusion criteria	Specifies closely related cases that should not be coded this way.
Examples or paraphrases	Gives non-searchable illustrations for coder calibration.
Decision rules	Clarifies ambiguous cases and multiple-coding rules.
Coding level	Identifies whether the code applies to posts, comments, or full threads.
Adjudication notes	Records disagreements and changes after pilot coding.

Table 4 specifies the minimum codebook elements required before treating coded categories as validated findings. In a completed validation phase, the initial codebook would be seeded by the construct taxonomy in Table 2 and then stabilized through pilot coding. At least two coders would independently code a pilot subset consisting of at least 20% of original posts and 10% of comments. Reliability would be calculated for major categorical codes using Cohen’s kappa or Krippendorff’s alpha, with a target threshold of at least .70 for major categories before full coding proceeds. Disagreements would be adjudicated, the codebook revised, and an audit log maintained. The analysis would also use memo writing, negative case analysis, and reflexivity notes on researcher assumptions about finance, Reddit culture, and Philippine institutions (Elo & Kyngas, 2008; Hsieh & Shannon, 2005; Lombard et al., 2004).
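The pilot-subset sizes and the reliability statistic can be made concrete with a small sketch. The corpus sizes (30 posts, 5,298 comments) and the .70 target come from the text; the two coders' labels are hypothetical, and the helper names `pilot_size` and `cohens_kappa` are introduced here for illustration. Cohen's kappa is computed as (observed agreement minus chance agreement) divided by (1 minus chance agreement).

```python
from collections import Counter

def pilot_size(n_units, percent):
    """Smallest whole number of units that is at least `percent` of the total."""
    return (n_units * percent + 99) // 100

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one categorical label per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of units.")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

pilot_posts = pilot_size(30, 20)       # 20% of original posts -> 6
pilot_comments = pilot_size(5298, 10)  # 10% of parsed comments -> 530

# Hypothetical labels from two coders on six pilot posts.
coder_a = ["trust", "trust", "procedural", "validation", "general", "general"]
coder_b = ["trust", "procedural", "procedural", "validation", "general", "validation"]
kappa = cohens_kappa(coder_a, coder_b)
meets_target = kappa >= 0.70
```

In this made-up example the coders agree on four of six posts, which yields a kappa of 4/7 (about .57) and would fall short of the .70 target, triggering adjudication and codebook revision before full coding.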

3.6 Coding execution and reliability status

The present version reports an exploratory pilot analysis rather than a completed validated content analysis. One researcher prepared the corpus, generated preliminary information-need labels, inspected thread-level patterns, and used the construct taxonomy to identify candidate trust, risk, uncertainty, and peer-response patterns. A second-coder reliability phase has not yet been completed. Therefore, the study does not report Cohen’s kappa, Krippendorff’s alpha, coder characteristics, adjudication outcomes, or a finalized codebook.

This evidence status determines how the Results section is written. Descriptive outputs such as corpus size, topic-domain distribution, local-term frequency, and visible engagement are treated as empirical descriptions of the prepared corpus. Information-need labels, topic-need crosswalks, trust/risk patterns, peer-response types, and qualitative vignettes are treated as pilot findings that demonstrate analytic promise and guide codebook refinement. They are not presented as final validated prevalence estimates. For the study to become a fully validated empirical article, the next stage must involve at least two coders, independent pilot coding, reliability calculation for major categorical codes, codebook revision, adjudication, and a final reliability table (Elo & Kyngas, 2008; Hsieh & Shannon, 2005; Lombard et al., 2004).

3.7 Coverage, saturation, and information power

The top-30 strategy provides signs of coverage, not exhaustive saturation. Saturation is used here only in a limited sense: recurring topic domains, information-need types, local financial references, and peer-response forms appear sufficiently often to support focused qualitative interpretation of high-visibility discourse. It does not mean that all r/phinvest information needs, low-response concerns, deleted posts, or silent user needs have been exhausted. Qualitative sample adequacy is interpreted in relation to study aim, sample specificity, analytic depth, and information power rather than a universal number of cases (Guest et al., 2006; Malterud et al., 2016).

For this study, coverage is assessed through recurrence of topic domains, local financial terms, uncertainty objects, information-need types, trust objects, and peer-response types. Negative cases are also required: cases where the same topic expresses different needs, where comments reject the original framing, where high score does not correspond with high comment volume, and where local financial context is implicit rather than named. These checks keep the analysis from treating the most frequent or most visible patterns as the only meaningful ones.

3.8 Engagement analysis

Engagement is operationalized as visible community response: score, declared comment count, and parsed comment count. The study does not treat engagement as importance, quality, truth, or need intensity. Engagement is analyzed descriptively and visually to identify patterns that guide interpretation, such as high-comment threads that invite controversy or low-comment threads that express routine questions. This restrained interpretation follows computational social-science cautions that digital traces are shaped by platform affordances and should not be read as direct measures of social importance (boyd & Crawford, 2012; Lazer et al., 2020).

Inferential modeling is not used because the thread-level sample is too small for defensible regression analysis. For larger future samples, count models may be appropriate for comment volume, but in the present study engagement remains exploratory. The key interpretive rule is that visible response reflects platform-mediated visibility and response intensity, not objective importance.

3.9 Robustness and validation checks

The computational analysis uses four robustness checks. First, post-only and comment-inclusive mappings are compared to ensure that peer-response language does not overwhelm original post concerns. Second, alternative preprocessing settings are compared to check whether local terms remain stable. Third, keyword-in-context checks are conducted for major Philippine financial terms. Fourth, model topics are validated through representative documents, not only through top terms. These checks follow recommendations that text models require validation against the substantive construct being claimed (Grimmer & Stewart, 2013; Isoaho et al., 2021; Roberts et al., 2014).

The qualitative analysis uses four validation checks. First, the study examines whether the same topic domain expresses multiple information-need types. Second, it identifies cases where commenters reject or reframe the original post. Third, it compares high-score/low-comment and lower-score/high-comment threads to avoid equating engagement with importance. Fourth, it records negative cases and ambiguous cases in analytic memos. These procedures support trustworthiness by making interpretation auditable rather than treating coding categories as self-evident (Elo & Kyngas, 2008; Hsieh & Shannon, 2005).

3.10 Validity and trustworthiness

Table 5: Validity threats and mitigation strategies.

Threat	Why It Matters	Mitigation
Visibility bias	Top threads may overrepresent controversy, humor, emotional resonance, or practical usefulness.	Limit claims to high-visibility discourse and avoid prevalence claims.
Survivorship and deletion bias	Deleted, removed, collapsed, or dynamically unloaded content may be absent.	Report the corpus as captured public discourse and avoid claims about all subreddit activity.
Platform affordance bias	Voting, ranking, flairs, and moderation shape what becomes visible.	Interpret engagement as platform-mediated response rather than importance.
Language bias	English, Filipino, Taglish, abbreviations, and local terms affect computational mapping.	Retain local terms and validate automated outputs through close reading.
Construct-inference bias	Topics, preliminary labels, and engagement can be mistaken for information needs.	Use the construct taxonomy and directed qualitative coding as the evidentiary core.
Coder inference bias	Information needs and trust/risk frames are interpreted from text.	Use pilot coding, reliability checks, adjudication, memos, and negative case analysis.
Ethical reidentification risk	Financial details can be sensitive even in public posts.	Hash usernames, mask sensitive quantities, paraphrase examples, and suppress rare identifying detail combinations.

Table 5 incorporates the main bias and trustworthiness concerns into the analytic design rather than treating them as afterthoughts. The study explicitly addresses visibility bias, survivorship bias, platform affordance bias, language bias, construct-inference bias, coder inference bias, and ethical reidentification risk. These threats define the scope of interpretation and support the study’s restrained claims (boyd & Crawford, 2012; Lazer et al., 2020; Zimmer, 2018).

3.11 Ethical safeguards

The study uses public Reddit material but treats the data as sensitive. Users did not consent to participate in research, so the public-interest justification rests on studying aggregate patterns of public financial information behavior while minimizing exposure of individual users. The reporting protocol avoids direct searchable quotations unless necessary and ethically justified. Examples should be paraphrased; usernames should not be reported; and rare combinations of income, occupation, location, family situation, product detail, and financial amount should be generalized or suppressed.

Source links are separated from the main analysis materials, and financial quantities are masked in modeling text. Raw data access should be restricted to the research team, retained only as long as necessary, and not redistributed. The study will not contact users, infer identities, or link accounts across platforms. It will also avoid evaluating individual financial decisions as correct or incorrect unless the analysis concerns how the community itself evaluates them. These decisions follow the Association of Internet Researchers’ emphasis on contextual judgment and Nissenbaum’s principle of contextual integrity (Franzke et al., 2020; Nissenbaum, 2004).

3.12 Methodological limitations

The methodology has five major limitations. First, the top-30 sample privileges high-visibility threads and cannot represent the full range of r/phinvest discourse. Second, captured Reddit pages may omit deleted, removed, collapsed, or dynamically unloaded comments. Third, exploratory topic modeling is sensitive to preprocessing, model choice, language mixing, and parameter settings. Fourth, qualitative interpretation depends on coder judgment and therefore requires reliability checks and an audit trail. Fifth, engagement analysis is descriptive because the thread-level sample is too small for inferential modeling.

These limitations do not invalidate the study, but they define its scope. The study is designed to make defensible claims about how everyday financial information needs are articulated and evaluated in a high-visibility r/phinvest sample through June 30, 2026. It is not designed to estimate national financial literacy, measure actual financial capability, verify advice quality, or infer the real-world outcomes of advice received on Reddit (Franzke et al., 2020; Grimmer & Stewart, 2013).

4 Results

The results are reported with explicit attention to evidence status. The descriptive corpus outputs are based on the cleaned dataset and can be read as corpus descriptions. The information-need, trust/risk, and peer-response patterns are pilot analytic outputs that require full intercoder validation before they can be treated as final coded findings. This distinction follows the construct-validity boundary established in Table 1 and the coding-status statement in the Methodology (Grimmer & Stewart, 2013; Hsieh & Shannon, 2005).

Table 6: Evidence status of the results reported in this pilot analysis.

Result Area	Evidence Status	Interpretive Use
Corpus profile	Descriptive corpus output	Documents the scale and composition of the prepared dataset.
Topic-domain distribution	Descriptive corpus output	Maps visible platform-assigned topic domains.
Local-term frequencies	Descriptive corpus output	Identifies Philippine financial anchors in the corpus.
Preliminary information-need patterns	Pilot coding output	Suggests candidate information-need categories for validation.
Topic-need crosswalk	Pilot coding output	Demonstrates why topic labels require qualitative interpretation.
Trust/risk peer response	Illustrative qualitative pattern	Guides codebook refinement and later reliability testing.
Engagement patterns	Descriptive corpus output	Shows visible platform response without inferring importance.

Table 6 is the interpretive key for the Results section. It prevents descriptive corpus outputs from being conflated with validated qualitative findings. The table also clarifies why the current analysis is best read as a construct-validity pilot: it demonstrates the need for qualitative coding and provides candidate categories, but it does not yet claim completed intercoder validation.

4.1 Corpus profile

The prepared corpus contains 30 unique high-visibility r/phinvest threads from this year through June 30, 2026. It includes 5,298 parsed comments and 5,328 mapping-eligible post/comment documents. These counts confirm that the corpus is small at the thread level but dense at the interaction level. The appropriate interpretation is therefore qualitative and discourse-centered: the dataset is suited to analyzing visible financial sense-making episodes, not estimating population-level prevalence.

Table 7: Corpus profile after cleaning and duplicate checking.

Corpus Metric	Observed Value	Analytic Interpretation
Parsed threads	30	Thread-level units available for metadata and post-level analysis.
Unique post-content threads	30	Threads retained after duplicate post-content filtering.
Parsed comments	5,298	Comment-level units available for peer-response analysis.
Mapping-eligible documents	5,328	Post and comment records retained for text modeling after duplicate-thread exclusion.
Duplicate-content threads	0	No duplicate-content threads were detected; any such threads would be retained for audit but excluded from modeling.

Table 7 shows the central tradeoff of the dataset. The top-30 sample is limited in thread count, but comment-rich. This supports the study’s revised methodological stance: computational analysis is useful for exploratory mapping, while qualitative content analysis is required for claims about information needs, trust, risk, and peer evaluation.

4.2 Exploratory topic-domain mapping

Platform flairs provide the first descriptive view of the issue areas that entered the high-visibility sample. Business-related discussions were the largest group, followed by general investing, real estate, personal finance, insurance, and banking. Digital banking, government-initiated funds, stocks, and investment advice also appeared. This distribution confirms that the corpus spans both formal financial products and everyday financial decision contexts.

Figure 2: Distribution of platform-assigned topic domains in unique high-visibility threads. (Horizontal bar chart showing the number of unique threads by platform-assigned topic domain.)

Figure 2 shows that high-visibility r/phinvest discourse is not restricted to investing in the narrow sense. The spread of threads across business, real estate, personal finance, insurance, banking, and digital finance suggests that the community functions as a broad financial information ground. However, this result is deliberately descriptive. Flairs identify topic domains, but they do not reveal whether the user is seeking trust verification, procedural guidance, risk assessment, decision validation, or experiential advice (Fisher & Naumer, 2006; Wilson, 1999).

4.3 Preliminary information-need patterns

The strongest result is the separation between topic domain and information-need type. Preliminary need labels show that the same platform topic can contain different forms of support-seeking. Conversely, the same information-need type can appear across different topics. This is the core construct-validity finding: topic labels are necessary for mapping the discourse terrain, but insufficient for identifying what kind of information support is being requested.

Table 8: Preliminary information-need patterns in high-visibility threads.

Preliminary Information-Need Type	Threads	Median Score	Median Declared Comments	Total Parsed Comments
general_question	9	1,400	347	1,887
not_explicit_or_sharing	9	1,900	251	1,454
scam_or_trust_verification	4	1,042	342	779
decision_validation	4	1,100	190	557
procedural_guidance	4	1,800	174	621

Table 8 shows that general questions, sharing-oriented posts, trust verification, decision validation, and procedural guidance all appear in the high-visibility sample. These labels should be interpreted as preliminary coding outputs that require final human validation. Even at this stage, however, they show why topic prevalence alone is insufficient: a financial discussion may invite explanation, judgment, warning, reassurance, comparison, or procedural instruction (Hsieh & Shannon, 2005; Kuhlthau, 1991).
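As a consistency check, the thread and parsed-comment columns of Table 8 can be summed and compared with the corpus profile. The values below are transcribed from Table 8; reading the mapping-eligible total as posts plus comments is an inference from the reported counts, and the variable names are illustrative.

```python
# (threads, parsed comments) per preliminary need type, transcribed from Table 8.
table8 = {
    "general_question": (9, 1887),
    "not_explicit_or_sharing": (9, 1454),
    "scam_or_trust_verification": (4, 779),
    "decision_validation": (4, 557),
    "procedural_guidance": (4, 621),
}

total_threads = sum(threads for threads, _ in table8.values())     # 30
total_comments = sum(comments for _, comments in table8.values())  # 5298
mapping_eligible = total_threads + total_comments                  # 5328

# Share of parsed comments attached to each preliminary need type.
comment_share = {need: comments / total_comments for need, (_, comments) in table8.items()}
```

The totals reproduce the 30 threads, 5,298 parsed comments, and 5,328 mapping-eligible documents reported in the corpus profile, so the preliminary labels partition the corpus without gaps or overlaps.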

Table 9: Topic domains crosswalked with preliminary information-need types.

Topic Domain	Threads	Observed Preliminary Need Types	Median Declared Comments
Business	10	general_question; not_explicit_or_sharing; procedural_guidance	282.5
General Investing	4	decision_validation; not_explicit_or_sharing	176.5
Real Estate	3	general_question; not_explicit_or_sharing	289.0
Personal Finance	3	general_question; procedural_guidance	187.0
Insurance	2	procedural_guidance; scam_or_trust_verification	291.5
Banking	2	decision_validation; scam_or_trust_verification	225.0
Government-Initiated/Other Funds	1	scam_or_trust_verification	446.0
MF/UITF/ETF	1	not_explicit_or_sharing	352.0
Investment/Financial Advice	1	not_explicit_or_sharing	272.0
Merkado Barkada	1	scam_or_trust_verification	262.0
Digital Banking / E-wallets	1	decision_validation	193.0
Stocks	1	not_explicit_or_sharing	71.0

Table 9 is the key empirical demonstration of the topic-versus-need distinction. It shows that platform topics do not map one-to-one onto information needs. Business threads, for example, involve general questions, procedural guidance, and experience-sharing; personal finance threads combine general questions with procedural guidance; and banking threads combine decision validation with trust verification. The table therefore supports the study’s core claim: topic labels organize financial subjects, but information-need coding is required to interpret what users are asking the community to do (Case & Given, 2016; Wilson, 1999).
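The crosswalk can also be read programmatically. The sketch below transcribes Table 9 and asks two questions: which topic domains carry more than one preliminary need type, and how many domains each need type spans. The data structure and variable names are illustrative assumptions; the labels remain pilot outputs.

```python
from collections import defaultdict

# domain -> (threads, observed preliminary need types), transcribed from Table 9.
crosswalk = {
    "Business": (10, ["general_question", "not_explicit_or_sharing", "procedural_guidance"]),
    "General Investing": (4, ["decision_validation", "not_explicit_or_sharing"]),
    "Real Estate": (3, ["general_question", "not_explicit_or_sharing"]),
    "Personal Finance": (3, ["general_question", "procedural_guidance"]),
    "Insurance": (2, ["procedural_guidance", "scam_or_trust_verification"]),
    "Banking": (2, ["decision_validation", "scam_or_trust_verification"]),
    "Government-Initiated/Other Funds": (1, ["scam_or_trust_verification"]),
    "MF/UITF/ETF": (1, ["not_explicit_or_sharing"]),
    "Investment/Financial Advice": (1, ["not_explicit_or_sharing"]),
    "Merkado Barkada": (1, ["scam_or_trust_verification"]),
    "Digital Banking / E-wallets": (1, ["decision_validation"]),
    "Stocks": (1, ["not_explicit_or_sharing"]),
}

# Domains that express more than one need type.
multi_need_domains = [d for d, (_, needs) in crosswalk.items() if len(needs) > 1]

# Domains with a single need type, and whether each has only one thread.
single_need_are_single_thread = all(
    threads == 1 for threads, needs in crosswalk.values() if len(needs) == 1
)

# Invert the crosswalk: which domains does each need type appear in?
domains_by_need = defaultdict(list)
for domain, (_, needs) in crosswalk.items():
    for need in needs:
        domains_by_need[need].append(domain)
```

Six of the twelve domains show more than one need type, and the six that show only one are all single-thread domains, so no domain with at least two threads maps onto a single need. In the other direction, every preliminary need type appears in at least three domains.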

4.4 Illustrative trust and risk patterns

The pilot analysis suggests that peer response is organized around credibility and risk, not only around topic knowledge. Threads involving insurance agents, government-linked savings, digital banks, financial platforms, business opportunities, and scams invite comments that evaluate whether claims are credible, whether risks are being understated, whether a product or institution can be trusted, and whether a proposed action is suitable. This pattern is consistent with the conceptual model: users convert situated uncertainty into public questions, and commenters evaluate credibility, risk, and decision quality.

Table 10: Trust, risk, and peer-response patterns in the corpus.

Analytic Pattern	Observed Discourse Signal	Interpretive Claim
Trust verification	Questions and comments around legitimacy, safety, agents, banks, apps, funds, or online claims.	Peer response evaluates credibility rather than simply supplying facts.
Risk assessment	Discussion of scams, product loss, liquidity, default, market movement, business viability, or institutional reliability.	Financial uncertainty is framed through possible loss and suitability.
Decision validation	Posts and comments asking whether a choice is reasonable, mistaken, worth continuing, or worth avoiding.	Users seek social judgment and reassurance, not only information retrieval.
Corrective peer response	Comments that challenge assumptions, correct claims, add calculations, or redirect users to institutions or professionals.	The comment thread functions as peer evaluation of financial reasoning.
Experience-based advice	Comments that rely on personal experience with products, agents, platforms, or institutions.	Experiential testimony operates as a credibility cue in the absence of formal expertise.

Table 10 summarizes response patterns for later validation through full coding. The table does not claim that advice is correct or expert. Rather, it shows the kinds of social evaluation visible in the corpus: commenters warn, correct, calculate, refer, and share experience as they assess credibility and risk. This pattern is why comments are treated as peer evaluation rather than as neutral answers (Fisher & Naumer, 2006; Metzger, 2007; Sundar, 2008; Thukral et al., 2023).

Table 11: Paraphrased pilot vignettes illustrating candidate information-need and peer-evaluation patterns.

Pilot Pattern	Paraphrased Corpus Vignette	Analytic Point
Trust verification	A user asks whether an online financial offer or platform-linked opportunity is legitimate and what action should be taken before proceeding.	The central need is credibility assessment, not simply information about a product.
Product suitability	A user describes uncertainty about an insurance or investment-linked product and asks whether continuing, exiting, or changing course is reasonable.	The thread asks peers to evaluate fit, risk, and sales claims.
Institutional safety	A user raises concern about government-linked funds, bank protections, or regulatory limits and asks whether money remains safe.	Local institutions become objects of uncertainty and trust.
Decision validation	A user narrates a household or personal finance decision and seeks reassurance, correction, or alternative reasoning from peers.	The requested support is social judgment as much as factual retrieval.
Negative case	Some high-visibility posts share observations, experiences, or market commentary without a direct request for advice.	Not every popular financial thread expresses an explicit information need.

Table 11 adds qualitative texture without reproducing searchable user text. The vignettes show why a validated codebook is necessary: similar topics can require different kinds of support, and some high-visibility posts function more as experience-sharing or commentary than as direct advice-seeking. The negative case is especially important because it prevents the analysis from treating every visible financial thread as an articulated information need.

4.5 Philippine contextual anchors

The detected local financial terms show that the corpus is strongly Philippine-specific. VUL, MP2, SSS, Maya, GCash, ATRAM, BSP, Pag-IBIG, PDIC, and UITF appear among the most frequently detected local terms, alongside the more generic ETF and Crypto. These references anchor the discourse in concrete institutions, products, and platforms rather than in generic personal-finance vocabulary.

Figure 3: Most frequently detected Philippine financial terms across unique-thread discourse. (Horizontal bar chart showing the most frequent local financial terms detected in the analyzed corpus.)

Figure 3 supports the study’s claim that financial uncertainty is locally situated. VUL and MP2 are not merely examples of insurance and saving; they are locally meaningful products around which users negotiate trust, fees, guarantees, liquidity, suitability, and long-term risk. Likewise, Maya, GCash, BSP, SSS, Pag-IBIG, PDIC, and Philippine investment products show that everyday financial information needs are shaped by local financial infrastructure (Bangko Sentral ng Pilipinas, 2022; Johnson & Sherraden, 2007; World Bank, 2015).

4.6 Engagement as visible response

Engagement patterns show that community response is uneven across topic domains and preliminary need types. The most commented unique threads included real estate, business, personal finance, government-initiated funds, insurance, and banking. Several of the highest-response threads were preliminarily labeled as general questions or scam/trust verification, suggesting a pattern for future validation rather than proving that uncertainty and credibility concerns always attract peer participation.

Figure 4: Declared comment count by score for unique high-visibility threads. (Scatterplot of declared comment count against score, colored by preliminary information-need type.)

Figure 4 illustrates why engagement must be interpreted cautiously. Score and comment count do not move in lockstep: a thread can be highly upvoted without being the most commented, and a thread can attract many comments because it invites debate, skepticism, correction, or clarification. Thus, engagement is best read as visible platform response rather than as a measure of importance, correctness, or need intensity (boyd & Crawford, 2012; Lazer et al., 2020).
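A purely descriptive illustration of this point can be drawn from the group medians in Table 8. The sketch below computes a Spearman rank correlation between median score and median declared comments across the five preliminary need types. It uses group medians, not the thread-level points plotted in Figure 4, and with only five groups it is an illustration of the descriptive logic rather than an inferential result. The helper names are assumptions.

```python
def ranks(values):
    """Rank from 1 (smallest); assumes no tied values, which holds for the data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Medians per preliminary need type, in Table 8 row order.
median_score = [1400, 1900, 1042, 1100, 1800]
median_declared_comments = [347, 251, 342, 190, 174]

rho = spearman_rho(median_score, median_declared_comments)
```

The resulting coefficient is about -0.3: at the level of need-type medians, higher scores do not go with higher comment counts, which is consistent with reading score and comment volume as distinct forms of visible response.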

4.7 Exploratory topic model as corpus mapping

The exploratory NMF model is retained as a corpus-mapping aid rather than a core result. Because the post-level corpus contains 30 original posts, the model cannot establish stable latent topics. It can, however, identify issue clusters that help orient close reading. The strongest clusters center on insurance and financial-advisor discourse, franchise/business discussion, food and coffee entrepreneurship, public-fund safety and government-linked institutions, overlooked business opportunities, and construction or investment-progress concerns.

Table 12: Exploratory post-focused NMF topic model for corpus mapping.

Exploratory Topic	Top Weighted Terms	Post Documents with Highest Loading
T1	advisor, financial advisor, financial, life, insurance, policy, mong, vul, wag, husband, cousin, pru life	4
T2	potato corner, potato, corner, other people, regarding location, people ideas, location using, statement regarding, statement, corner statement, regarding, location	2
T3	percent, who, want, should, coffee, food, over, healthy, year, most, write, project	11
T4	thoughts, anymore, bsp, anymore want, hear, money safe, hear thoughts, safe anymore, safe, want, sss, money	6
T5	business, basta, nyo, pays, conventional business, lowkey income, realizes talaganng, papansin marami, marami lowkey, pays well, business pays, realizes	2
T6	gold, ayoko, namin, exposure, dahil, meron, bar, alahas, gold bar, kita, laki, villar	5

Table 12 is therefore interpreted as an exploratory map, not as evidence of information-need categories. It supports sampling, comparison, and qualitative interpretation, but it does not decide whether a thread is about trust verification, risk assessment, decision validation, or procedural guidance. That interpretive work belongs to directed qualitative coding (Grimmer & Stewart, 2013; Isoaho et al., 2021; Nelson, 2020).
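For readers unfamiliar with the technique, the factorization behind a table of this kind can be sketched in a few lines. The paper does not specify the software, vectorizer, or solver used, so the following is a stand-in rather than the study's implementation: it applies the multiplicative-update rule for non-negative matrix factorization to a hypothetical four-document toy matrix, using plain Python lists. All names and values are illustrative.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius distance between V and the reconstruction W x H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into non-negative W (docs x k) and H (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h

def top_terms(h, terms, n_top=3):
    """List the highest-weighted terms for each topic row of H."""
    return [[terms[j] for j in sorted(range(len(terms)), key=lambda j: -row[j])[:n_top]]
            for row in h]

# Hypothetical term counts for four toy documents.
terms = ["vul", "insurance", "advisor", "franchise", "location", "business"]
doc_term = [[2, 1, 1, 0, 0, 0], [1, 2, 1, 0, 0, 0], [0, 0, 0, 2, 1, 1], [0, 0, 0, 1, 1, 2]]

w, h = nmf(doc_term, k=2)
error = frob_error(doc_term, w, h)
topics = top_terms(h, terms)
```

Because the solution depends on the random start, the number of topics, and preprocessing, output like this is a prompt for close reading of the highest-loading documents, not a set of categories in its own right.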

4.8 Summary of results

The pilot results support five restrained claims. First, the top-30 high-visibility r/phinvest sample is broad in topic domain, but topic domains are not analytically sufficient. Second, preliminary need patterns and the topic-need crosswalk suggest that the same topic can express different information needs. Third, peer response appears to involve trust, risk, correction, and experiential evaluation, but these categories require full coding validation. Fourth, Philippine institutions, products, and platforms localize financial uncertainty. Fifth, engagement reflects visibility and response intensity, not objective importance.

Taken together, these results justify the study’s revised mixed-methods design. Descriptive and computational mapping establish the terrain, but the study’s main contribution depends on validated qualitative coding of information needs, trust/risk frames, and peer response types. The pilot results therefore support the study’s central methodological claim while preserving its evidentiary boundary: financial topics can be mapped computationally, but financial information needs must be interpreted through construct-aware qualitative analysis and validated coding (Grimmer & Stewart, 2013; Hsieh & Shannon, 2005).

5 Discussion

5.1 Financial need is not reducible to financial topic

The study’s central theoretical contribution is the distinction between financial topic domains and financial information needs. The pilot results show that r/phinvest threads cover business, investing, real estate, insurance, banking, personal finance, digital finance, and government-linked funds. Yet these topic labels do not reveal what users are asking the community to do. A post about VUL may involve product comparison, trust verification, decision validation, or risk assessment. A post about MP2, SSS, or digital wallets may involve procedural guidance, institutional trust, safety verification, or concern about liquidity. This supports the information-behavior warning that information need cannot be read directly from topic labels or observable subject matter (Case & Given, 2016; Wilson, 1999).

This finding reframes r/phinvest as an information-behavior site rather than merely a personal-finance discussion board. The issue is not only what financial objects appear in the discourse, but how users convert uncertainty into public requests for explanation, reassurance, judgment, comparison, or warning. The topic-versus-need distinction is therefore not a technical coding detail. It is the conceptual spine of the study and the basis for its contribution to computational financial discourse research (Grimmer & Stewart, 2013; Nelson, 2020).

5.2 r/phinvest as peer credibility infrastructure

The comment-rich structure of the corpus suggests that r/phinvest can be interpreted as a peer credibility infrastructure. Users bring financial claims, proposed decisions, suspicious offers, product questions, and institutional doubts into a public setting where other users evaluate them. In the pilot analysis, comments appear to recommend, warn, correct, calculate, share experience, challenge sales language, and sometimes reject the premise of the original post. This aligns with the concept of information grounds, where information circulates through participation in shared social settings rather than only through formal experts (Fisher & Naumer, 2006). It also aligns with research showing that Reddit interactions can reveal the structure of financial asks and the user interactions that develop around them (Thukral et al., 2023).

Calling r/phinvest a peer credibility infrastructure does not mean that the community produces verified advice. Rather, it means that the community provides informal tests of credibility. Users evaluate whether an agent sounds trustworthy, whether a fund appears safe, whether a business claim is plausible, whether an app or bank deserves confidence, or whether a financial decision is reasonable. These evaluations are socially useful but epistemically uneven: they mix experience, calculation, hearsay, skepticism, humor, and community norms. This is why the study treats comments as peer evaluation rather than neutral answers or expert advice.

5.3 Philippine financial capability as situated uncertainty

The pilot findings complicate narrow financial-literacy interpretations. If users ask whether something is safe, legitimate, worth continuing, or suitable for their situation, the issue is not necessarily a lack of knowledge. It may reflect uncertainty created by opaque products, aggressive sales practices, institutional distrust, unstable income, unclear guarantees, digital-platform risk, or family obligations. This interpretation is consistent with financial capability approaches that extend beyond individual knowledge to include access, confidence, institutional conditions, and usable decision support (Johnson & Sherraden, 2007; World Bank, 2015).

The Philippine setting is not simply background context. Local institutions and products structure the uncertainty visible in the corpus. VUL, MP2, SSS, PDIC, BSP, GCash, Maya, ATRAM, UITF, and related references are locally meaningful anchors around which users appear to negotiate safety, trust, liquidity, fees, regulation, and suitability. A generic personal-finance framework would miss much of this specificity. The study therefore contributes a contextual account of financial capability as situated uncertainty: users’ information needs emerge from the interaction of financial products, household constraints, institutional trust, platform affordances, and local economic conditions (Bangko Sentral ng Pilipinas, 2022; World Bank, 2022).

5.4 Computational text analysis requires construct discipline

The study also makes a methodological contribution. Exploratory topic modeling and term mapping are useful for organizing the corpus, but on their own they cannot identify information needs, trust objects, risk frames, or peer-response types. This is especially true in a 30-thread corpus, where topic modeling cannot carry standalone inferential weight. In comment-rich Reddit data, the problem is compounded because original posts and comments perform different functions: posts may articulate uncertainty and request support, while comments evaluate, contest, or extend the request.

The implication is that computational text analysis must be paired with construct-aware qualitative validation. Topic models can map issue clusters, identify candidate cases, and support comparison, but directed content analysis is needed to determine whether a thread expresses procedural guidance, trust verification, risk assessment, decision validation, experiential advice, or another information-need type. This supports broader cautions in text-as-data research: automated methods are powerful for organizing text, but social-scientific interpretation depends on valid constructs, transparent coding, and human review (Grimmer & Stewart, 2013; Isoaho et al., 2021; Nelson, 2020).
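
The pairing of computational mapping with qualitative coding can be illustrated by cross-tabulating topic labels against human-coded information-need labels. The following is a minimal sketch under stated assumptions: the rows, the label names, and the function `needs_per_topic` are hypothetical and are not drawn from the study's corpus or codebook.

```python
from collections import Counter, defaultdict

# Hypothetical coded threads: (topic label from mapping, need label from human coding).
# These rows are invented for illustration and are not study data.
coded_threads = [
    ("insurance", "product_comparison"),
    ("insurance", "trust_verification"),
    ("insurance", "decision_validation"),
    ("government_funds", "procedural_guidance"),
    ("government_funds", "risk_assessment"),
    ("digital_finance", "trust_verification"),
]


def needs_per_topic(rows):
    """Count human-coded information-need labels within each topic label."""
    table = defaultdict(Counter)
    for topic, need in rows:
        table[topic][need] += 1
    return table


table = needs_per_topic(coded_threads)
for topic, needs in sorted(table.items()):
    # A topic with more than one need label cannot be reduced to a single need.
    print(topic, dict(needs))
```

In a table of this kind, any topic row that spreads across several need labels shows why a topic label alone would be an unreliable proxy for information need.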

5.5 Limits of visibility-based sampling

The study’s top-thread design is both useful and limited. It is useful because high-visibility threads show where financial uncertainty becomes publicly discussable and where peer evaluation accumulates. These threads reveal visible points of attention, disagreement, advice, and credibility testing. They are therefore well suited to analyzing public episodes of financial sense-making.

The same design limits representativeness. High-visibility threads may overrepresent controversial, entertaining, emotionally resonant, or practically useful content, and their prominence may also reflect algorithmic amplification or moderator effects. They may underrepresent routine questions, low-response anxieties, deleted posts, removed posts, and silent information needs. Engagement metrics must therefore be interpreted as platform-mediated visibility and response intensity, not as objective importance or population-level need. The study therefore claims interpretive visibility, not representativeness (boyd & Crawford, 2012; Lazer et al., 2020).

5.6 Practical implications

For financial educators, the pilot findings suggest that materials should be organized around real question types as well as formal topics. Users may need help comparing products, checking credibility, interpreting guarantees, recognizing sales incentives, evaluating scams, and deciding whether advice applies to their situation. Educational materials that only define financial concepts may miss the uncertainty structure that drives peer advice-seeking (Lusardi & Mitchell, 2014; Stolper & Walter, 2017).

For regulators and public agencies, high-visibility peer discourse can serve as an early signal of confusion, mistrust, and unmet information needs around institutions, scams, digital platforms, and government-linked funds. Reddit should not be treated as representative survey evidence, but recurring questions can reveal the language, scenarios, and credibility tests that official communication needs to address (Franzke et al., 2020; World Bank, 2015).

For financial institutions, the pilot findings point to the need for clearer plain-language communication around fees, guarantees, liquidity, risks, complaint mechanisms, and the role of agents or intermediaries. When users turn to peers to verify basic product safety or agent credibility, this may indicate a gap in institutional communication and trust.

For researchers, the recommendation is methodological: do not use topic models as direct evidence of financial information needs. Use computational mapping to organize discourse, then validate information-need, trust, risk, and peer-response claims through qualitative coding, reliability checks, negative cases, and transparent audit trails (Grimmer & Stewart, 2013; Hsieh & Shannon, 2005; Lombard et al., 2004).
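
As an illustration of what a reliability check can involve, the sketch below computes Cohen's kappa, one common chance-corrected agreement index for two coders. It is offered under assumptions: the study does not state which index it will report, and the labels and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one nominal label per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of units.")
    n = len(coder_a)
    # Observed agreement: share of units where the two coders chose the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both coders use one identical label throughout;
        # returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four threads (not study data).
coder_a = ["trust", "trust", "risk", "risk"]
coder_b = ["trust", "risk", "risk", "risk"]
print(cohens_kappa(coder_a, coder_b))  # prints 0.5
```

Here the coders agree on three of four units (0.75), but half of that agreement is expected by chance given their label frequencies, which is why the chance-corrected value is lower than raw agreement.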

5.7 Limitations and future research

The study is limited by its high-visibility sample, small thread count, platform-specific context, browser-captured corpus, and reliance on inferred constructs. It cannot measure actual financial behavior, advice quality, financial capability, or population-level information needs. It also cannot recover deleted, removed, unobserved, or low-visibility discussions.

Future research should expand the sampling frame beyond top threads. A stronger design would combine high-visibility threads with random threads from the same period, low- or no-response threads, and stratified samples from key topic areas such as insurance, MP2, scams, debt, digital banks, and real estate. Longitudinal analysis could examine how financial uncertainty changes over time. Cross-platform analysis could compare Reddit with Facebook groups, TikTok, YouTube, or forums. Finally, interviews or surveys could triangulate whether publicly articulated needs correspond to users’ private decision processes, actual information use, and financial outcomes (Lazer et al., 2020; Malterud et al., 2016).
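
The expanded sampling frame described above could be assembled along the following lines. This is a minimal sketch, not the study's procedure: the thread fields (`id`, `score`, `num_comments`, `topic`), the low-response threshold of one comment, and the function `sample_threads` are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_threads(threads, top_n=2, per_topic=1, low_response_n=1, seed=42):
    """Return thread ids for three strata: high-visibility, low-response, per-topic random."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    ranked = sorted(threads, key=lambda t: t["score"], reverse=True)
    top = ranked[:top_n]
    chosen = {t["id"] for t in top}

    # Low- or no-response threads, excluding anything already chosen.
    quiet = [t for t in threads if t["num_comments"] <= 1 and t["id"] not in chosen]
    low = rng.sample(quiet, min(low_response_n, len(quiet)))
    chosen.update(t["id"] for t in low)

    # Random draws within each topic stratum from the remaining threads.
    by_topic = defaultdict(list)
    for t in threads:
        if t["id"] not in chosen:
            by_topic[t["topic"]].append(t)
    stratified = []
    for topic in sorted(by_topic):
        pool = by_topic[topic]
        stratified.extend(rng.sample(pool, min(per_topic, len(pool))))

    return {
        "high_visibility": [t["id"] for t in top],
        "low_response": [t["id"] for t in low],
        "stratified_random": [t["id"] for t in stratified],
    }


# Hypothetical thread metadata (not study data).
threads = [
    {"id": "t1", "score": 900, "num_comments": 300, "topic": "insurance"},
    {"id": "t2", "score": 700, "num_comments": 150, "topic": "real_estate"},
    {"id": "t3", "score": 12, "num_comments": 0, "topic": "insurance"},
    {"id": "t4", "score": 8, "num_comments": 4, "topic": "insurance"},
    {"id": "t5", "score": 5, "num_comments": 1, "topic": "debt"},
    {"id": "t6", "score": 3, "num_comments": 6, "topic": "debt"},
]
frame = sample_threads(threads)
print(frame)
```

Keeping the strata separate in the returned structure lets later analysis compare high-visibility threads against quieter ones rather than pooling them.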

6 Conclusion

This study advances a construct-validity argument for research on online financial discourse: financial topics are not the same as financial information needs, information needs are not the same as peer evaluation, and engagement is not the same as social importance. Using high-visibility r/phinvest threads through June 30, 2026, the pilot analysis shows that Philippine Reddit financial discourse can be mapped by topic domain, local financial references, visible engagement, and preliminary information-need patterns. It also shows why topic modeling alone cannot support claims about user need, trust, risk, or financial capability.

Empirically, the study demonstrates that high-visibility r/phinvest discourse is broader than investing narrowly defined. It includes business, real estate, insurance, banking, government-linked funds, digital finance, scams, and household financial decision-making. The pilot coding and paraphrased vignettes suggest that similar topic domains can express different needs, including procedural guidance, product suitability assessment, trust verification, risk assessment, decision validation, and experience-sharing. Philippine institutions and products such as MP2, SSS, PDIC, BSP, GCash, Maya, VUL, and UITF anchor these needs in local financial infrastructure.

The study does not claim that Reddit users represent Filipinos, that high-visibility threads represent all r/phinvest discourse, that peer advice is correct, or that preliminary labels are final validated codes. Its strongest methodological implication is therefore cautionary: computational finance-discourse research should not infer information needs directly from topic prevalence or engagement metrics. Topic models can organize discourse, but qualitative coding, reliability checks, negative cases, and ethical reporting are required before making claims about need, trust, risk, and peer evaluation (Grimmer & Stewart, 2013; Hsieh & Shannon, 2005; Lombard et al., 2004).

For Philippine financial communication, the practical implication is that public education and institutional messaging should address the uncertainty structures that users bring to peer spaces. Users are not only asking what a product is; they are often asking whether it is safe, whether a claim is credible, whether an agent or institution can be trusted, whether a decision is suitable, and whether peers have encountered similar risks. A validated next-stage study can build on this pilot by completing intercoder reliability testing, expanding the sampling frame, and testing whether these preliminary patterns hold across less visible threads and other Philippine financial communities.

7 References

Bangko Sentral ng Pilipinas. (2022). 2021 financial inclusion survey. Bangko Sentral ng Pilipinas. https://www.bsp.gov.ph/Media_And_Research/Financial%20Inclusion%20Dashboard/2021/FIS_2021.pdf

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84. https://doi.org/10.1145/2133806.2133826

boyd, d., & Crawford, K. (2012). Critical questions for big data. Information, Communication & Society, 15(5), 662–679. https://doi.org/10.1080/1369118X.2012.678878

Cao, Y., Gong, F., & Zeng, T. (2020). Antecedents and consequences of using social media for personal finance. Journal of Financial Counseling and Planning, 31(1), 162–176. https://doi.org/10.1891/JFCP-18-00049

Case, D. O., & Given, L. M. (2016). Looking for information: A survey of research on information seeking, needs, and behavior (4th ed.). Emerald Group Publishing.

Elo, S., & Kyngäs, H. (2008). The qualitative content analysis process. Journal of Advanced Nursing, 62(1), 107–115. https://doi.org/10.1111/j.1365-2648.2007.04569.x

Fisher, K. E., & Naumer, C. M. (2006). Information grounds: Theoretical basis and empirical findings on information flow in social settings. In A. Spink & C. Cole (Eds.), New directions in human information behavior (pp. 93–111). Springer. https://doi.org/10.1007/1-4020-3670-1_6

Franzke, A. S., Bechmann, A., Zimmer, M., & Ess, C. M. (2020). Internet research: Ethical guidelines 3.0. Association of Internet Researchers. https://aoir.org/reports/ethics3.pdf

Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028

Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv. https://arxiv.org/abs/2203.05794

Guest, G., Bunce, A., & Johnson, L. (2006). How many interviews are enough? An experiment with data saturation and variability. Field Methods, 18(1), 59–82. https://doi.org/10.1177/1525822X05279903

Hoffmann, A. O. I., & Otteby, K. (2018). Personal finance blogs: Helpful tool for consumers with low financial literacy or preaching to the choir? International Journal of Consumer Studies, 42(2), 241–254. https://doi.org/10.1111/ijcs.12412

Hsieh, H.-F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277–1288. https://doi.org/10.1177/1049732305276687

Isoaho, K., Gritsenko, D., & Mäkelä, E. (2021). Topic modeling and text analysis for qualitative policy research. Policy Studies Journal, 49(1), 300–324. https://doi.org/10.1111/psj.12343

Johnson, E., & Sherraden, M. S. (2007). From financial literacy to financial capability among youth. Journal of Sociology & Social Welfare, 34(3), 119–146. https://doi.org/10.15453/0191-5096.3276

Karpenko, V., Mukhina, K., Rybakova, D., Busurkina, I., & Bulygin, D. (2021). A study of personal finance practices. The case of online discussions on Reddit. Proceedings of the International Conference Internet and Modern Society, CEUR Workshop Proceedings, 3090, 206–211. https://ceur-ws.org/Vol-3090/spaper19.pdf

Kuhlthau, C. C. (1991). Inside the search process: Information seeking from the user’s perspective. Journal of the American Society for Information Science, 42(5), 361–371. https://doi.org/10.1002/(SICI)1097-4571(199106)42:5<361::AID-ASI6>3.0.CO;2-#

Lazer, D. M. J., Pentland, A., Watts, D. J., Aral, S., Athey, S., Contractor, N., Freelon, D., González-Bailón, S., King, G., Margetts, H., Nelson, A., Salganik, M. J., Strohmaier, M., Vespignani, A., & Wagner, C. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060–1062. https://doi.org/10.1126/science.aaz8170

Lombard, M., Snyder-Duch, J., & Bracken, C. C. (2004). A call for standardization in content analysis reliability. Human Communication Research, 30(3), 434–437. https://doi.org/10.1093/hcr/30.3.434

Lusardi, A., & Mitchell, O. S. (2014). The economic importance of financial literacy: Theory and evidence. Journal of Economic Literature, 52(1), 5–44. https://doi.org/10.1257/jel.52.1.5

Malterud, K., Siersma, V. D., & Guassora, A. D. (2016). Sample size in qualitative interview studies: Guided by information power. Qualitative Health Research, 26(13), 1753–1760. https://doi.org/10.1177/1049732315617444

Massanari, A. (2017). #Gamergate and The Fappening: How Reddit’s algorithm, governance, and culture support toxic technocultures. New Media & Society, 19(3), 329–346. https://doi.org/10.1177/1461444815608807

Metzger, M. J. (2007). Making sense of credibility on the web: Models for evaluating online information and recommendations for future research. Journal of the American Society for Information Science and Technology, 58(13), 2078–2091. https://doi.org/10.1002/asi.20672

Nelson, L. K. (2020). Computational grounded theory: A methodological framework. Sociological Methods & Research, 49(1), 3–42. https://doi.org/10.1177/0049124117729703

Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79(1), 119–157. https://digitalcommons.law.uw.edu/wlr/vol79/iss1/10

Proferes, N., Jones, N., Gilbert, S., Fiesler, C., & Zimmer, M. (2021). Studying Reddit: A systematic overview of disciplines, approaches, methods, and ethics. Social Media + Society, 7(2). https://doi.org/10.1177/20563051211019004

Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103

Savolainen, R. (1995). Everyday life information seeking: Approaching information seeking in the context of "way of life". Library & Information Science Research, 17(3), 259–294. https://doi.org/10.1016/0740-8188(95)90048-9

Stolper, O. A., & Walter, A. (2017). Financial literacy, financial advice, and financial behavior. Journal of Business Economics, 87(5), 581–643. https://doi.org/10.1007/s11573-017-0853-9

Sundar, S. S. (2008). The MAIN model: A heuristic approach to understanding technology effects on credibility. In M. J. Metzger & A. J. Flanagin (Eds.), Digital media, youth, and credibility (pp. 73–100). MIT Press. https://doi.org/10.1162/dmal.9780262562324.073

Thukral, S., Sangwan, S., Chauhan, V., Chatterjee, A., & Dey, L. (2023). Generating insights about financial asks from Reddit posts and user interactions. Proceedings of the 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’23, 294–299. https://doi.org/10.1145/3625007.3627313

Wilson, T. D. (1999). Models in information behaviour research. Journal of Documentation, 55(3), 249–270. https://doi.org/10.1108/EUM0000000007145

World Bank. (2015). Enhancing financial capability and inclusion in the Philippines: A demand-side assessment. World Bank. https://doi.org/10.1596/25073

World Bank. (2022). The Global Findex Database 2021: Financial inclusion, digital payments, and resilience in the age of COVID-19. World Bank. https://doi.org/10.1596/978-1-4648-1897-4

Zhu, H., Huberman, B. A., & Luon, Y. (2012). To switch or not to switch: Understanding social influence in online choices. American Behavioral Scientist, 56(12), 1799–1813. https://doi.org/10.1177/0002764212463363

Zimmer, M. (2018). Addressing conceptual gaps in big data research ethics: An application of contextual integrity. Social Media + Society, 4(2). https://doi.org/10.1177/2056305118768300