PIF_LIST_ALL Data Issue: Row Bleed / Column Shift

Date: 2026-01-30
Reported by: A. Perez
Affected table: Alliance_RPT.DataScience.PIF_LIST_ALL (replicated from PROD_EDW.DATASCIENCE.PIF_LIST_ALL)
Impact: “CLOSED” and “OPEN” appearing in HCPCS code fields on the PDR dashboards
Severity: 100% of rows affected (63,225 out of 63,225 contain a status value in hcpcs_code_list)


Summary

Data from unrelated closed cases is bleeding into columns 34-40 of every row in the SQL Server replica of PIF_LIST_ALL. Starting at column 34 (HCPCS_CODE_LIST), the values are not HCPCS codes — they are the STATUS, CASENUMBER, CLAIMNUMBER, etc. of a different case that got concatenated onto the row.

This is causing the word “CLOSED” to appear in the HCPCS Codes field on our PDR dashboards.


Root Cause: Replication Creates Phantom Columns

The Snowflake source table has 33 columns. The SQL Server replica has 40 columns.

Snowflake (PROD_EDW) SQL Server (Alliance_RPT)
Column count 33 40
Last column HCPCS_CODE_LIST (col 33) PaidDateList (col 40)
Column casing ALL UPPERCASE Cols 1-33 lowercase, cols 34-40 mixed case
Extra columns HCPCS_CODE_LIST, REVENUECODE_LIST, FirstPaidDate, LastPaidDate, DistinctPaidDates, MultiplePaidDates, PaidDateList

The 7 extra columns (34-40) in SQL Server do not exist in Snowflake. They are being created by the replication process. The different casing of these columns (mixed case vs. the lowercase of the original 33 columns) confirms they did not come from the same source DDL.

The contamination is NOT in the Snowflake PIF_LIST_ALL table — it is in the Snowflake-to-SQL-Server replication/ETL that populates Alliance_RPT.DataScience.PIF_LIST_ALL.

Column name comparison

Position  Snowflake (PROD_EDW)           SQL Server (Alliance_RPT)
--------  ----------------------------   ----------------------------
 1        STATUS                         status
 2        CASENUMBER                     casenumber
 3        CLAIMNUMBER                    claimnumber
...       ... (cols 4-32 match) ...      ...
33        HCPCS_CODE_LIST                icd10_rollup_list
                                         ---- Snowflake table ends here ----
34        (does not exist)               HCPCS_CODE_LIST      <-- contains STATUS of another case
35        (does not exist)               REVENUECODE_LIST     <-- contains CASENUMBER of another case
36        (does not exist)               FirstPaidDate        <-- contains CLAIMNUMBER of another case
37        (does not exist)               LastPaidDate         <-- contains UNIQUE_KEY of another case
38        (does not exist)               DistinctPaidDates    <-- contains BUS_DAYS of another case
39        (does not exist)               MultiplePaidDates    <-- contains FIRST_RECEIVEDDATE of another case
40        (does not exist)               PaidDateList         <-- contains remaining columns of another case

The extra columns 34-40 contain another case’s fields 1-7+, shifted by exactly 33 positions. The replication process is appending a second case’s entire row as extra columns.


Proof

We verified this by reading the contaminating case number directly from the phantom columns, looking it up in PIF_LIST_CLOSED, and confirming the values match field-by-field.

Example 1: FL026174482

Phantom column (SQL Server) Value found Maps to Value from PIF_LIST_CLOSED for FL022084990 Match
HCPCS_CODE_LIST (col 34) CLOSED status CLOSED TRUE
REVENUECODE_LIST (col 35) FL022084990 casenumber FL022084990 TRUE
FirstPaidDate (col 36) 44108602 claimnumber 44108602 TRUE
LastPaidDate (col 37) FL022084990_44108602 unique_key FL022084990_44108602 TRUE
  • The contaminating case FL022084990 does not exist in PIF_LIST_ALL as its own row
  • It does exist in PIF_LIST_CLOSED — it’s a closed case whose data got absorbed into FL026174482’s row

Example 2: FL026174535

Phantom column (SQL Server) Value found Maps to Value from PIF_LIST_CLOSED for FL025036053 Match
HCPCS_CODE_LIST (col 34) CLOSED status CLOSED TRUE
REVENUECODE_LIST (col 35) FL025036053 casenumber FL025036053 TRUE
FirstPaidDate (col 36) 46893019 claimnumber 46893019 TRUE
LastPaidDate (col 37) FL025036053_46893019 unique_key FL025036053_46893019 TRUE
  • Same pattern — FL025036053 exists only in PIF_LIST_CLOSED, not in PIF_LIST_ALL

Scale of the Problem

total_rows    contains_status    starts_with_newline
63225         63225              43570
  • All 63,225 rows in the SQL Server replica have “CLOSED” or “OPEN” somewhere in the phantom HCPCS_CODE_LIST column (col 34)
  • 43,570 rows (69%) have the phantom HCPCS_CODE_LIST starting with a newline character (\n), which is the row-concatenation seam

Sample of affected rows

Every row’s phantom REVENUECODE_LIST (col 35) contains a case number instead of revenue codes:

casenumber   claimnumber  status  hcpcs_code_list  revenuecode_list  firstpaiddate
FL027292966  <NA>         OPEN    \nCLOSED         FL026433261       <NA>
FL013528788  <NA>         CLOSED  \nCLOSED         FL08476382        <NA>
FL08474305   <NA>         CLOSED  \nCLOSED         FL08899910        <NA>
FL026433982  <NA>         CLOSED  \nCLOSED         FL010675624       <NA>
FL010235252  <NA>         CLOSED  \nCLOSED         FL08410660        <NA>
FL026433911  <NA>         CLOSED  \nCLOSED         FL09052943        <NA>
FL08381633   <NA>         CLOSED  \nCLOSED         FL010295368       <NA>
FL07715766   <NA>         CLOSED  \nCLOSED         FL013528874       <NA>
FL026433874  <NA>         CLOSED  \nCLOSED         FL026360933       <NA>
FL013325035  <NA>         CLOSED  \nCLOSED         FL018776585       <NA>

Full Row Dump (FL026174482)

Column                                     Value
----------------------------------------   -------------------------------------------------------
status                                   : OPEN
casenumber                               : FL026174482
claimnumber                              : 44108531
unique_key                               : FL026174482_44108531
bus_days                                 : 74.000000
first_receiveddate                       : 2025-10-14
last_receiveddate                        : 2025-10-14
first_closeddate                         : <NA>
last_closeddate                          : <NA>
rows_merged                              : 36
max_count                                : 1
vendorname_list                          : THE FOOT DOCTORS SANTA CRUZ CTY
vendornumber_list                        : 1091975
contracttype_list                        : NC
claimcategorycode_list                   : MED
determination_list                       : <NULL>
finalcategory_list                       : MED
maincategory_list                        : MED
has_subcategory_list                     : false
dim_cptprocedure_key_list                : 206774,209983
dim_claimprofile_key_list                : 11057,16048,24533,54792
claimtypedesc_list                       : Medical Claims
claimvisittype_list                      : Other Visit
claimpaidstatus_list                     : Denied,Paid
cptprocedurecode_list                    : 11055,11721
icd10diagnosisgroupsize_list             : 4
hcpcs_key_list                           : <NA>
hcpcs_desc_list                          : Unknown
hcpcs_cat1_list                          : Unknown
hcpcs_cat2_list                          : Unknown
icd10_key_list                           : 2948,45832,63328,63332
icd10_desc_list                          : Corns And Callosities | Pain In Left Toe(S) | ...
icd10_rollup_list                        : B35,L84,M79
--- BELOW THIS LINE: PHANTOM COLUMNS (34-40) NOT IN SNOWFLAKE SOURCE ---
--- DATA IS FROM A DIFFERENT CASE (FL022084990) ---
HCPCS_CODE_LIST                          : \nCLOSED          <-- STATUS of other case
REVENUECODE_LIST                         : FL022084990        <-- CASENUMBER of other case
FirstPaidDate                            : 44108602           <-- CLAIMNUMBER of other case
LastPaidDate                             : FL022084990_44108602  <-- UNIQUE_KEY of other case
DistinctPaidDates                        : 31.000000          <-- BUS_DAYS of other case
MultiplePaidDates                        : 2024-10-11         <-- FIRST_RECEIVEDDATE of other case
PaidDateList                             : 2024-10-11|*2024-11-26|*2024-11-26|*18|*1|*
                                           VALLEY CHILDREN'S HOSPITAL OP|*1083798|*C|*
                                           OTC|*Upheld|*Hospital Outpatient|*
                                           <-- ALL remaining columns of other case

What’s Happening

The Snowflake source table PROD_EDW.DATASCIENCE.PIF_LIST_ALL has 33 columns and appears to be clean. The replication process that copies this table to Alliance_RPT.DataScience.PIF_LIST_ALL is creating 7 extra columns (34-40) that do not exist in the source.

These phantom columns contain another case’s data — specifically, the first 7+ fields of an unrelated closed case get appended as new columns on every row:

Phantom column position Phantom column name Contains
34 HCPCS_CODE_LIST STATUS of another case
35 REVENUECODE_LIST CASENUMBER of another case
36 FirstPaidDate CLAIMNUMBER of another case
37 LastPaidDate UNIQUE_KEY of another case
38 DistinctPaidDates BUS_DAYS of another case
39 MultiplePaidDates FIRST_RECEIVEDDATE of another case
40 PaidDateList Remaining columns (pipe-delimited) of another case

The \n (newline) before CLOSED in column 34 is the seam — the second case’s data is appended as a new line during replication, and the column parser interprets it as additional columns.

Likely cause

The replication process (Snowflake -> SQL Server) is encountering newline characters within the data of column 33 (HCPCS_CODE_LIST in Snowflake). The replication likely uses a text-based export (CSV/flat file) and the newline in the data is being interpreted as a row delimiter. This causes:

  1. The first part of the row (columns 1-33) is written as one row
  2. The text after the newline is interpreted as a new row — but since it doesn’t have enough delimiters, it gets appended as extra columns to the previous row
  3. The “new row” data starting with CLOSED (a status from another case) lands in column 34

Alternatively, the replication may be doing a join or union that combines rows from PIF_LIST_ALL with rows from another table (like PIF_LIST_CLOSED), and the result has more columns than the source.

Evidence

  • The Snowflake source has 33 columns; SQL Server has 40 — 7 phantom columns are created during replication
  • Column casing differs: Snowflake columns are ALL UPPERCASE, SQL Server cols 1-33 are lowercase, but cols 34-40 are mixed case — proving they came from a different source/process
  • 100% of rows are affected — this is not an edge case, it’s a structural issue in the replication
  • rows_merged = 36 for case FL026174482, which only has 1 claim (max_count = 1) — the extra rows likely come from the other case
  • The contaminating cases (FL022084990, FL025036053) are closed cases that exist in PIF_LIST_CLOSED but not in PIF_LIST_ALL as their own rows — their data was absorbed into other rows
  • The contamination always starts at column 34, and the values map 1:1 to columns 1-6+ of PIF_LIST_ALL’s own schema

Suggested Investigation

1. Confirm Snowflake source is clean

-- Snowflake: verify only 33 columns
DESCRIBE TABLE PROD_EDW.DATASCIENCE.PIF_LIST_ALL;

-- Snowflake: check if HCPCS_CODE_LIST contains status values
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN HCPCS_CODE_LIST LIKE '%CLOSED%'
              OR HCPCS_CODE_LIST LIKE '%OPEN%' THEN 1 ELSE 0 END) AS contains_status
FROM PROD_EDW.DATASCIENCE.PIF_LIST_ALL;
-- If contains_status = 0, Snowflake is clean and the issue is in replication

2. Check the replication process

  • What tool/process replicates PROD_EDW.DATASCIENCE.PIF_LIST_ALL to Alliance_RPT.DataScience.PIF_LIST_ALL?
  • Does it use a CSV/flat file export? If so, check the text qualifier and row delimiter settings — newlines in data fields will break row boundaries
  • Does it do a join or union with another table during replication?
  • Why does it produce 40 columns when the source only has 33?

3. Verify the contaminating cases

-- Snowflake: the affected case
SELECT * FROM PROD_EDW.DATASCIENCE.PIF_LIST_ALL WHERE CASENUMBER = 'FL026174482';

-- Snowflake: the contaminating closed case is NOT in PIF_LIST_ALL
SELECT * FROM PROD_EDW.DATASCIENCE.PIF_LIST_ALL WHERE CASENUMBER = 'FL022084990';
-- returns 0 rows

-- Snowflake: but it IS in PIF_LIST_CLOSED
SELECT STATUS, CASENUMBER, CLAIMNUMBER
FROM PROD_EDW.DATASCIENCE.PIF_LIST_CLOSED
WHERE CASENUMBER = 'FL022084990';
-- returns CLOSED, FL022084990, 44108602
-- These match columns 34-36 of the SQL Server row exactly

Impact

  • All 63,225 rows in the SQL Server replica have contaminated data in phantom columns 34-40
  • HCPCS code fields on all three PDR dashboards show “CLOSED” or “OPEN” instead of actual procedure codes
  • Revenue codes, paid dates, and paid date lists are also wrong (columns 35-40 contain the other case’s identity fields)
  • The rows_merged count may be inflated by rows from unrelated cases
  • Dashboards are reading from SQL Server (Alliance_RPT), not Snowflake, so the Snowflake data being clean does not help until replication is fixed

Schema Reference

Snowflake source: PROD_EDW.DATASCIENCE.PIF_LIST_ALL — 33 columns

# Column
1 STATUS
2 CASENUMBER
3 CLAIMNUMBER
4 UNIQUE_KEY
5 BUS_DAYS
6 FIRST_RECEIVEDDATE
7 LAST_RECEIVEDDATE
8 FIRST_CLOSEDDATE
9 LAST_CLOSEDDATE
10 ROWS_MERGED
11 MAX_COUNT
12 VENDORNAME_LIST
13 VENDORNUMBER_LIST
14 CONTRACTTYPE_LIST
15 CLAIMCATEGORYCODE_LIST
16 DETERMINATION_LIST
17 FINALCATEGORY_LIST
18 MAINCATEGORY_LIST
19 HAS_SUBCATEGORY_LIST
20 DIM_CPTPROCEDURE_KEY_LIST
21 DIM_CLAIMPROFILE_KEY_LIST
22 CLAIMTYPEDESC_LIST
23 CLAIMVISITTYPE_LIST
24 CLAIMPAIDSTATUS_LIST
25 CPTPROCEDURECODE_LIST
26 ICD10DIAGNOSISGROUPSIZE_LIST
27 HCPCS_KEY_LIST
28 HCPCS_DESC_LIST
29 HCPCS_CAT1_LIST
30 HCPCS_CAT2_LIST
31 ICD10_KEY_LIST
32 ICD10_DESC_LIST
33 ICD10_ROLLUP_LIST

SQL Server replica: Alliance_RPT.DataScience.PIF_LIST_ALL — 40 columns

Columns 1-33 match Snowflake (but lowercase). Columns 34-40 are phantom columns not in the source:

# Column Notes
1-33 (same as Snowflake, lowercase) Legitimate data
34 HCPCS_CODE_LIST Phantom — contains STATUS of another case
35 REVENUECODE_LIST Phantom — contains CASENUMBER of another case
36 FirstPaidDate Phantom — contains CLAIMNUMBER of another case
37 LastPaidDate Phantom — contains UNIQUE_KEY of another case
38 DistinctPaidDates Phantom — contains BUS_DAYS of another case
39 MultiplePaidDates Phantom — contains FIRST_RECEIVEDDATE of another case
40 PaidDateList Phantom — contains remaining columns of another case (pipe-delimited)