Did you know that the IRS releases annual data files on e-filers? We use the ECDS and PTIN datasets to describe the modern tax preparation industry. While our analysis has limitations due to the source data, it provides context for firms looking to compare themselves to peer competitors.
Our major findings include:
First, firm size follows a long-tail distribution. Intuit, H&R Block, and TaxHawk are the largest e-filers, submitting 35% of e-filed returns. 89% of e-filers are small firms submitting fewer than 1,000 returns.
Second, firms have significantly different rejection rates. The average rejection rate is 6.55%. But firms focusing on direct-to-consumer services have high rejection rates (such as 14% at Intuit or 61% at Jackson Hewitt). Lower rejection rates are found in firms that focus on payroll services or more complex returns (such as 1.6% at ADP or 2.7% at CohnReznick).
Third, we find gender imbalances. Firms of every size have fewer women than men as partners, and the imbalance grows with firm size. Gender is balanced among PTIN holders, except at larger firms, which are male-dominated.
Fourth, we match PTIN records to e-filers to find productivity baselines. While our analysis is limited, it suggests that the average preparer files between 267 and 323 returns per year.
Fifth, geography plays a role in firm size, rejection rates, and gender. Southern states have higher rejection rates and slightly more small firms. Gender also varies by region, with the South having a higher percentage of female PTIN holders.
This study builds on … No prior works were found that summarize… Surveys show that …
We use two datasets. The first is the External Customer Data Store (ECDS), which provides information on all registered e-filers. The ECDS data is composed of three files. Each required significant pre-processing, as the numbers do not align with the IRS submission data. Additionally, we combine this data with the PTIN dataset, which has information on registered tax preparers.
The first ECDS file is titled ‘master.’ This file contains one row per e-filer identity, with 368,412 rows and 166 columns. Some organizations use multiple ECDS identities, whether for different business lines, geographic locations, or other reasons. We manually reviewed all e-filers with more than 1,000 returns to combine multiple identities, and wrote code to automatically merge smaller filer identities for H&R Block and Jackson Hewitt. In this article, we use the term ‘e-filer’ to refer to a single ECDS identity and ‘firm’ to refer to the combined entity.
There are differences between the main IRS submission information and the ECDS file. The ECDS file records 246 million transmitted returns, significantly more than the IRS’s count. This is likely because the ECDS file does not differentiate between original returns, amended returns, extensions, or other submissions. Also, e-filing records do not differentiate between self-prepared returns and those signed by a CPA or EA. According to the IRS, 152 million returns were e-filed in 2024 (of 163 million total returns). Of these, 86 million were from tax professionals and 66 million were self-prepared (source). Similarly, the TIGTA report on the 2023 filing season found that 75 million were from a tax preparer and 63 million were prepared by the taxpayer (page 4).
The second ECDS file contains e-filer partner information, with 591,467 rows. People show up multiple times, as each person has a separate record for each title and prof-type. De-duplicating the information leaves a total of 350,102 unique combinations of customer ID (e-filer identity) and name. Data on the number of partners at an e-filer was inconsistent, as the field in the master table does not align with the number of partners in the partner table.
The third and final ECDS file contains contact information, and has 408,656 rows and 7 columns. Each row lists one person, along with their company, phone number, and whether they are the primary contact for their company. Each e-filer has either one or two contacts. We used this contact information to merge PTIN and e-filer data.
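The de-duplication step can be sketched as follows. This is a minimal illustration on invented rows, not the code used for the analysis; `cust_id` follows the naming in the master file, while `name`, `title`, and `n_partners` are hypothetical column names.

```r
library(dplyr)

# Toy partner table: one row per person per title (invented data).
t_partner <- data.frame(
  cust_id = c(1, 1, 1, 2, 2, 3),
  name    = c("A. Smith", "A. Smith", "B. Jones", "C. Lee", "C. Lee", "A. Smith"),
  title   = c("Owner", "President", "Partner", "Owner", "Responsible Official", "Partner")
)

# Keep one row per (e-filer identity, person), dropping the repeated titles.
t_partner_unique <- t_partner %>%
  distinct(cust_id, name)

# Partners per e-filer, which could then be compared against the master table's field.
partner_counts <- t_partner_unique %>%
  count(cust_id, name = "n_partners")

print(partner_counts)
```

Note that the same name under two different customer IDs is kept twice, which matches a count of unique ID and name combinations rather than unique people.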
We downloaded the IRS list of PTIN holders and matched it to the ECDS data. Joining PTIN to e-filers is complicated, as they do not share any common ID fields.
We used the following logic to join the tables:
Since PTINs often matched an e-filer based on more than one rule, this left a total of 314,707 matched and 401,306 unmatched PTIN rows. Among organizations that submitted returns in 2024, 190,165 had at least one matching PTIN, leaving 122,306 with no matching PTIN holders.
The probability of an e-filer having matching PTINs generally increases with size, with 52% of tiny firms, 66% of small firms, and 77% of medium firms having a match. Large firms reverse this trend, with only 54% having at least one matching PTIN.
Our first approach to analyzing the data focuses on e-filers. Who submits returns, and what does the data reveal about their characteristics?
Intuit files the most returns (20%), followed by H&R Block (10%), TaxHawk (5%), and ADP - San Dimas (2%). The number of returns submitted by each firm follows a long-tail distribution: a few very large firms at the ‘head’ of the data, and many much smaller firms in the tail.
This long-tail distribution is best seen when grouping firms into size categories based on the number of returns transmitted in 2024. The categories are large (more than 10,000 returns), medium (over 1,000), small (over 100), and tiny (under 100).
Although there were only 325 large firms, they account for 46% of returns. In contrast, tiny firms submitted only 2% of total returns, but make up 46% of registered e-filers.
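As a rough sketch of how such a grouping might be computed, the snippet below assigns size categories to a toy table and summarises each category's share of firms and returns. The column names `ytd_trnsmt_ret` and `firm_size` follow the ECDS-derived tables used later in this article; the data is invented, and the handling of firms sitting exactly on a boundary (for example exactly 100 returns) is an assumption.

```r
library(dplyr)

# Toy data (invented): one row per firm with its 2024 transmitted returns.
t_example <- data.frame(
  cust_id        = 1:6,
  ytd_trnsmt_ret = c(25000000, 12000, 4500, 800, 150, 40)
)

# Assumed cut-offs: strictly greater than each threshold.
t_sized <- t_example %>%
  mutate(firm_size = case_when(
    ytd_trnsmt_ret > 10000 ~ "large",
    ytd_trnsmt_ret > 1000  ~ "medium",
    ytd_trnsmt_ret > 100   ~ "small",
    TRUE                   ~ "tiny"
  ))

# Share of firms and share of returns in each size category.
size_summary <- t_sized %>%
  group_by(firm_size) %>%
  summarise(n_firms = n(), returns = sum(ytd_trnsmt_ret), .groups = "drop") %>%
  mutate(pct_firms   = n_firms / sum(n_firms),
         pct_returns = returns / sum(returns))

print(size_summary)
```

Comparing `pct_firms` with `pct_returns` is what exposes the long tail: a category can hold most of the firms while contributing only a sliver of the returns.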
More e-filings were submitted in 2024 than in 2023. However, most of this growth came from large firms. Among firms submitting in both years, large firms grew an average of 30.2%, medium firms 3.5%, small firms 1.1%, and tiny firms only 0.3%.
The ECDS contains information on filing rejection rates over time. The rejection rate has increased slightly over the last three years, from 6.49% two years ago to 6.55% in 2024.
However, the rejection rate varies significantly by firm. Among the 325 firms with at least 10,000 returns, the rejection rate varies from 84% to 0.02%.
Some of the high rejection rates can be explained by an organization’s niche. For example, the worst rejection rate of 85% is for Administrative Systems, Inc. Their minimal online presence indicates that they are a payroll service with a focus on charitable trusts and funeral homes. Other organizations with high rejection rates include those focused on self-service or simple returns. For example, Jackson Hewitt has a 61% rejection rate. However, some self-service firms show much lower rejection rates, with H&R Block at 28% and Intuit at 14%.
What causes a return to be rejected? The National Taxpayer Advocate’s 2021 Annual Report to Congress reports that mismatches with the prior-year PIN or adjusted gross income are the most commonly broken business rule (38%). The report also notes that problems with SSNs or duplicate returns make up another 16% of rejections.
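One way to compute growth figures of this kind is sketched below on invented data. The columns `ret_2023` and `ret_2024` are hypothetical names, and averaging per-firm growth rates (rather than comparing category totals) is an assumption about how the averages were taken.

```r
library(dplyr)

# Toy data (invented): returns transmitted by each firm in two consecutive years.
t_years <- data.frame(
  cust_id   = 1:5,
  firm_size = c("large", "large", "small", "small", "tiny"),
  ret_2023  = c(20000, 10000, 200, 400, 0),
  ret_2024  = c(26000, 13000, 210, 396, 50)
)

growth_by_size <- t_years %>%
  # Keep only firms that submitted in both years.
  filter(ret_2023 > 0, ret_2024 > 0) %>%
  mutate(growth = (ret_2024 - ret_2023) / ret_2023) %>%
  group_by(firm_size) %>%
  summarise(n_firms = n(), mean_growth = mean(growth), .groups = "drop")

print(growth_by_size)
```

An unweighted mean of per-firm growth treats every firm equally, so a handful of fast-growing firms can move a category's average a long way when the category is small.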
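For reference, a rejection rate here is rejected returns divided by transmitted returns. The sketch below, on invented numbers, shows the per-firm calculation alongside two different "overall" figures: a pooled rate (all rejections over all transmissions) and an unweighted mean of firm-level rates. Which of these the headline rate corresponds to is not stated in the source data, so treat the distinction as illustrative. Column names follow the ECDS master file.

```r
library(dplyr)

# Toy data (invented): transmitted and rejected returns per firm.
t_example <- data.frame(
  cust_id        = c("A", "B", "C", "D"),
  ytd_trnsmt_ret = c(100000, 20000, 500, 0),
  ytd_rej_ret    = c(14000, 400, 30, 0)
)

# Firm-level rate; firms with no transmissions get NA instead of 0/0.
t_rates <- t_example %>%
  mutate(reject_rate = ifelse(ytd_trnsmt_ret == 0, NA,
                              ytd_rej_ret / ytd_trnsmt_ret))

# Pooled rate: dominated by the largest firms.
pooled_rate <- sum(t_example$ytd_rej_ret) / sum(t_example$ytd_trnsmt_ret)

# Unweighted mean of firm rates: every firm counts equally.
mean_firm_rate <- mean(t_rates$reject_rate, na.rm = TRUE)

print(t_rates)
print(c(pooled = pooled_rate, mean_of_firms = mean_firm_rate))
```

With a long-tail size distribution the two summaries can differ noticeably, because the pooled figure largely reflects what happens at the biggest firms.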
Moving beyond firms, we can analyze the people behind the e-filers. Who are the partners and preparers, and what can we learn about them?
We used each partner’s first name to estimate their gender. While imperfect, [Genderizer](https://genderize.io/) was able to classify 90.1% of partner names at a ≥90% confidence level. We exclude the 9.9% of names with an unknown gender from further analysis.
Partners are male-dominated, with 57.5% male versus 42.5% female. However, this ratio differs by firm size. The share of female partners at tiny e-filers is almost double that at large e-filers (42% versus 24%).
While partners skew male, PTIN holders are more balanced. Gender differences only appear for PTIN holders matched to large e-filers. Excluding ambiguous names, female PTIN holders make up 52% of tiny e-filers, 49% of small e-filers, 52% of medium e-filers, and 43% of large e-filers. Among PTIN holders that could not be matched to a firm, 56% were classified as female by Genderizer.
Determining the average productivity for a preparer is complicated. Some e-filers act as a platform for self-submissions, while others have a team of licensed professionals who prepare each return. E-filers may also focus on returns of different levels of complexity. To simplify the analysis, we exclude larger e-filers, as they are more likely to have a mix of self-service and professional services.
Matching PTIN holders to e-filer identities allows us to estimate the number of returns filed per preparer. Among e-filers with matched PTIN holders, the median number of returns per preparer is 160. However, this figure is somewhat misleading, as e-filers with only a single associated PTIN have a highly skewed long-tail distribution. E-filers with multiple matched PTIN holders show a more normal curve. The number of returns per preparer also depends heavily on the number of preparers, with e-filers that have fewer preparers showing higher productivity.
Finally, we analyzed geography. Do e-filers have an association with their region or state? Does region have an impact on an e-filer’s size, rejection rate, productivity, or gender balance? To simplify the analysis, we exclude e-filers in US territories and foreign countries.
Using e-filer locations can be misleading. For example, Intuit has a primary e-filer identity in California, and H&R Block has one of its larger e-filer identities in Missouri. Removing large e-filers results in a more accurate distribution by state, with an average of 0.37 e-filed returns per resident (including large e-filers, the average would be 0.57 e-filed returns per resident).
The number of returns submitted per e-filer differs by region. Those in the South tend to submit fewer returns than those in the other regions.
E-filers in the South also have a higher rejection rate than other regions. This is most prominent with large e-filers, but the same pattern can be seen for all firm sizes.
Mapping rejection rates by state shows that southern states have higher rejection rates than other regions. The map excludes large e-filers, as H&R Block in Missouri and Intuit in California have a disproportionate number of returns.
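The confidence filter can be illustrated with a short sketch. It assumes the name lookups have already been collected into a table; the column names `first_name`, `gender`, and `probability`, the threshold variable `min_conf`, and all rows are invented for illustration and do not reproduce the service's actual output.

```r
library(dplyr)

# Toy lookup results (invented): one estimated gender and confidence per first name.
t_names <- data.frame(
  first_name  = c("Maria", "James", "Jordan", "Pat", "Linda"),
  gender      = c("female", "male", "male", "female", "female"),
  probability = c(0.99, 0.99, 0.76, 0.55, 0.98)
)

min_conf <- 0.90  # assumed cut-off, matching the 90% confidence level in the text

# Names below the cut-off are treated as unknown.
t_gender <- t_names %>%
  mutate(gender_est = ifelse(probability >= min_conf, gender, "unknown"))

# Share of names that could be classified at the chosen confidence level.
coverage <- mean(t_gender$gender_est != "unknown")

# Gender split after excluding the unknown names.
gender_share <- t_gender %>%
  filter(gender_est != "unknown") %>%
  count(gender_est) %>%
  mutate(share = n / sum(n))

print(coverage)
print(gender_share)
```

Raising `min_conf` trades coverage for accuracy: fewer names are classified, but the ones that remain are less likely to be mislabeled.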
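A minimal sketch of the per-preparer calculation is shown below on invented data. `ytd_trnsmt_ret` follows the master file's naming, while `n_ptin` (the number of matched PTIN holders) and `ptin_group` are hypothetical columns.

```r
library(dplyr)

# Toy data (invented): returns and matched PTIN holders per e-filer.
t_matched <- data.frame(
  cust_id        = 1:6,
  ytd_trnsmt_ret = c(900, 120, 40, 600, 450, 800),
  n_ptin         = c(1, 1, 1, 3, 2, 5)
)

t_prod <- t_matched %>%
  mutate(ret_per_preparer = ytd_trnsmt_ret / n_ptin,
         ptin_group = ifelse(n_ptin == 1, "single PTIN", "multiple PTINs"))

# Overall median across e-filers with at least one matched PTIN holder.
overall_median <- median(t_prod$ret_per_preparer)

# The same statistic split by single versus multiple matched preparers.
by_group <- t_prod %>%
  group_by(ptin_group) %>%
  summarise(n_efilers = n(),
            median_ret = median(ret_per_preparer),
            .groups = "drop")

print(overall_median)
print(by_group)
```

Splitting by group helps separate the skewed single-PTIN e-filers, where unmatched preparers would inflate the ratio, from e-filers whose staffing is better captured by the match.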
Gender is also unevenly distributed by state. The map shows that the South has a higher proportion of female PTIN holders than other regions.
TBD
The median e-filer submitted 10% of its returns in the second half of the year. Rejection rates increased very slightly, with the median firm-level rejection rate at 6% in the first half of the year and 7% in the second half.
# Compare the mid-year and year-end snapshots of the master file (368,412 rows).
library(dplyr)

t_master_diff <- t_master %>%
  select(cust_id, ytd_trnsmt_ret, ytd_rej_ret, firm_size) %>%
  left_join(
    t_master_midyear %>%
      select(cust_id, ytd_trnsmt_ret, ytd_rej_ret) %>%
      rename(mid_trnsmt_ret = ytd_trnsmt_ret,
             mid_rej_ret = ytd_rej_ret),
    by = 'cust_id') %>%
  mutate(
    # Share of the year's returns transmitted after the mid-year snapshot
    trnsmt_ret_change = (ytd_trnsmt_ret - mid_trnsmt_ret) / ytd_trnsmt_ret,
    # Rejection rates for the full year and for the first half
    ytd_reject_rate = ytd_rej_ret / ytd_trnsmt_ret,
    mid_reject_rate = mid_rej_ret / mid_trnsmt_ret,
    # Rejection rate for the second half only; NA when nothing was sent after mid-year
    end_reject_rate = ifelse(ytd_trnsmt_ret - mid_trnsmt_ret == 0,
                             NA,
                             (ytd_rej_ret - mid_rej_ret) / (ytd_trnsmt_ret - mid_trnsmt_ret)),
    ytd_v_mid = ytd_reject_rate - mid_reject_rate)

# Exclude e-filers whose returns all arrived in the second half (change == 1)
hist(filter(t_master_diff, trnsmt_ret_change < 1)$trnsmt_ret_change)
# The threshold is far above the observed maximum, so this filter only drops NA rows
hist(filter(t_master_diff, trnsmt_ret_change < 100000)$ytd_v_mid)

summary(t_master_diff$trnsmt_ret_change)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    0.00    0.02    0.10    0.14    0.21    1.00   58232
#summary(t_master_diff$ytd_v_mid)
summary(t_master_diff$ytd_reject_rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    0.00    0.03    0.07    0.10    0.13    1.00   55945
summary(t_master_diff$mid_reject_rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    0.00    0.02    0.06    0.09    0.12    1.00   61370
summary(t_master_diff$end_reject_rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    0.00    0.00    0.07    0.12    0.17    1.00  111550

# Release the intermediate table
t_master_diff <- NULL
https://www.cpajournal.com/2024/01/23/state-of-the-profession-3/
Some insights can be gathered by examining the list of contacts. Unfortunately, the files are posted without data dictionaries that explain the certification level for the partners table.
We cross-referenced individuals on LinkedIn and consulted the lists of qualifications in the [IRS Return Preparer Office federal tax return preparer statistics](https://www.irs.gov/tax-professionals/return-preparer-office-federal-tax-return-preparer-statistics). Our best guess at the classifications is as follows.