Holdings by Blockholders

Note that there can be multiple values of shares outstanding, as I guess these numbers come from the filings themselves and the filings can occur on different dates even for a given report date. The SAS code essentially selects the number at random from the alternatives, whereas it seems to make most sense to use the filings made on the same date as the report date (i.e., what most institutions do).

rs <- dbGetQuery(pg, "
    DROP TABLE IF EXISTS agl.percent_held;
    SET work_mem='15GB';
    -- SELECT COUNT(*) FROM tfn.s34;

    CREATE TABLE agl.percent_held AS
    WITH s34 AS (SELECT * FROM tfn.s34)
    SELECT *
    FROM (
      SELECT cusip, rdate, 
        CASE WHEN total_shares_out > 0 THEN shares_held/total_shares_out END AS percent_held
      FROM (
        SELECT cusip, rdate, sum(shares)/1000 AS shares_held
        FROM s34
        GROUP BY cusip, rdate) AS a
      INNER JOIN (
        SELECT DISTINCT cusip, rdate,
          CAST(COALESCE(shrout2, shrout1*1000) AS bigint) AS total_shares_out
        FROM s34
        WHERE fdate=rdate) AS b
      USING (cusip, rdate)) AS a
    LEFT JOIN (SELECT DISTINCT permno, ncusip::character(8) AS cusip FROM crsp.stocknames) AS b
    USING (cusip);")

Here's a histogram of percent_held.

library(lattice)
instit <- dbGetQuery(pg, "SELECT * FROM agl.percent_held")
histogram(~percent_held, data = instit, subset = percent_held <= 1, breaks = 100)

plot of chunk make_hist