Institutional Investor Classification Data:
Variable Definitions

(Website updated on September 27, 2018)

Spectrum manager number

This is the fund manager number used by the Spectrum database, which WRDS labels as “mgrno”.

Manager number version

Spectrum recycles manager numbers. I assign a new version number every time there is a break of more than two quarters in the holdings information for a manager number. I use this information to help update the permanent key (see below).
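
The versioning rule can be sketched as follows. This is only an illustration, not the code used to build the dataset. It assumes that a "break" is counted as the number of missing quarters between two consecutive holdings reports for the same manager number. The helper names `qidx` and `assign_versions` are hypothetical.

```python
def qidx(year, quarter):
    """Map (year, quarter) to a running quarter index (hypothetical helper)."""
    return year * 4 + (quarter - 1)


def assign_versions(quarters, max_gap=2):
    """Assign version numbers to the report quarters of ONE manager number.

    quarters: quarter indices (see qidx) in which holdings were reported.
    A new version starts when more than max_gap quarters are missing
    between two consecutive reports (assumed reading of the rule).
    Returns a list of (quarter_index, version) pairs.
    """
    versions = []
    version = 1
    prev = None
    for q in sorted(quarters):
        missing = 0 if prev is None else q - prev - 1
        if missing > max_gap:
            version += 1
        versions.append((q, version))
        prev = q
    return versions


# One missing quarter (1990Q3) keeps version 1; four missing quarters
# (all of 1991) start version 2.
reports = [qidx(1990, 1), qidx(1990, 2), qidx(1990, 4),
           qidx(1992, 1), qidx(1992, 2)]
print([v for _, v in assign_versions(reports)])
```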

Permanent key

For a few years, Spectrum assigned fund managers a permanent key to allow researchers to merge the 13F data with the mutual fund data. Spectrum has discontinued this data item, but I have used it as a basis to tie together the holdings history for fund managers that change manager numbers. My RAs and I have created permanent keys for managers that did not have them assigned by Spectrum and have updated them over time. Given this procedure, I cannot guarantee that these histories are 100% accurate. Please contact me if you find any errors in this assignment and I will update the dataset.

Year

This is the calendar year of the classification. In classifying institutions, I compute averages across the four holdings reports for each calendar year.
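
A minimal sketch of the yearly averaging, assuming each holdings report yields one numeric characteristic per quarter. The function name `yearly_average` and the input layout are hypothetical. How years with fewer than four reports are handled is not stated above; this sketch simply averages whatever reports are present.

```python
from collections import defaultdict
from statistics import mean


def yearly_average(reports):
    """Average a per-report value within each calendar year.

    reports: iterable of (year, quarter, value) tuples for one manager.
    Returns {year: mean of the values reported in that year}.
    """
    by_year = defaultdict(list)
    for year, _quarter, value in reports:
        by_year[year].append(value)
    return {year: mean(values) for year, values in by_year.items()}


# Four quarterly values for one hypothetical manager in 2000.
example = [(2000, 1, 10.0), (2000, 2, 20.0), (2000, 3, 30.0), (2000, 4, 40.0)]
print(yearly_average(example))
```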

Type

This is the legal type of the institutional investor, using the following code:

BNK = bank trust (Spectrum type code 1)
INS = insurance company (2)
INV = investment company (3)
IIA = independent investment advisor (4)
CPS = corporate (private) pension fund (5)
PPS = public pension fund (5)
UFE = university and foundation endowments (5)
MSC = miscellaneous (5)

As noted on the WRDS website, the type code variable on Spectrum is not reliable after 1998. I have taken the “reliable” Spectrum type codes and carried them forward in time for institutions still in existence after 1998. For new institutions, my RAs and I have attempted to assign a type code based on searches for information about the fund manager. In doing so, we have not attempted to distinguish between type code 3 and type code 4. In my research, I merge these two types into one group. In addition, we have taken the type code 5 group (“other”) and attempted to determine whether the fund manager was a private pension, public pension, or an endowment. All other institutions were classified as “miscellaneous”. Given this procedure, I cannot guarantee that these histories are 100% accurate. Please contact me if you find any errors in this assignment and I will update the dataset.
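
For readers who want to follow the same grouping, here is a small sketch of the code table above together with the merge of types 3 and 4. The merged label `"INV/IIA"` and the function name `research_group` are hypothetical and are not values in the dataset.

```python
# Type labels from the table above mapped to the Spectrum type code.
SPECTRUM_CODE = {
    "BNK": 1, "INS": 2, "INV": 3, "IIA": 4,
    "CPS": 5, "PPS": 5, "UFE": 5, "MSC": 5,
}


def research_group(type_label):
    """Collapse INV and IIA into one group; leave other labels unchanged.

    "INV/IIA" is a made-up label used only for this illustration.
    """
    if type_label not in SPECTRUM_CODE:
        raise ValueError(f"unknown type label: {type_label!r}")
    if type_label in ("INV", "IIA"):
        return "INV/IIA"
    return type_label


print(research_group("IIA"), research_group("PPS"))
```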

Transient/Quasi-indexer/Dedicated classification

This classification uses the following code:

DED = dedicated
QIX = quasi-indexer
TRA = transient

This classification is based on the one used in Bushee (2001) and Bushee and Noe (2000). Note that I changed my classification scheme after the Bushee (1998) paper by dropping the momentum variables to allow it to be used in more general situations. I extended those classifications by applying the factor loadings reported in those papers to the more recent data to compute factor scores, which I used to add the new data to the existing clusters. If a fund has no classification for a given year, it means that some of the data was missing, the fund has a small portfolio (i.e., fewer than four stocks that have available CRSP and Compustat data), or the fund has not been listed on Spectrum for two years. As a consequence of these restrictions, not all of the institutions have been classified using this approach. There are a large number of unclassified institutions in the past five years, primarily due to growth in the number of new fund managers. For a potential solution to this problem, please see the data item below.
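
The extension step can be pictured roughly as below. This is a sketch under several assumptions, not the procedure from the papers: the input variables are assumed to be already standardized, the loadings are applied as simple weights, and a new observation joins the cluster whose centroid is nearest in Euclidean distance. All numbers are made-up placeholders, not the loadings or cluster centers reported in the papers.

```python
import math


def factor_scores(standardized_vars, loadings):
    """Compute one score per factor as a weighted sum of the variables.

    loadings[k][j] is the weight of variable j on factor k.
    """
    return [sum(w * x for w, x in zip(row, standardized_vars))
            for row in loadings]


def nearest_cluster(scores, centroids):
    """Return the label whose centroid is closest to the factor scores."""
    return min(centroids, key=lambda label: math.dist(scores, centroids[label]))


# Placeholder inputs for illustration only.
loadings = [[0.5, 0.5, 0.0],
            [0.0, 0.0, 1.0]]
centroids = {"DED": [-1.0, 1.0], "QIX": [0.0, 0.0], "TRA": [1.0, -1.0]}

scores = factor_scores([2.0, 0.0, -1.0], loadings)
print(scores, nearest_cluster(scores, centroids))
```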

Permanent Transient/Quasi-indexer/Dedicated classification

Using the permanent key variable we created, I find the modal classification for each permanent key and assign that classification to each year of data for the fund manager.

This approach potentially helps solve the following problems:
  • Some fund managers, especially those that manage a large number of mutual funds, have classifications that frequently shift across years (the first-order autocorrelation in the classifications is generally around 0.8). This approach fixes the classification across time to the most common classification.

  • Because fund manager classifications can change, it is generally not a good idea to compute changes in holdings for a given type as the difference between the percentage ownership by the type in one period and percentage ownership by the type in a prior period. Such a measure would consider a fund that does not change its holdings, but does change its type, to be a change in holdings by the type. The permanent approach eliminates this potential problem.

  • As noted above, in the first two years of a fund's history, I cannot compute its classification. This approach allows me to fill in some of this missing data.

This approach has the following drawbacks:
  • Some fund managers likely do change their trading orientation over time. Using the modal classification obscures such changes.

  • The modal classification is based on my permanent key, which may not be 100% accurate. A potentially safer way to compute this variable would be to take the modal classification across manager number and manager version number. Note: you do not want to compute the mode by manager number alone as this is recycled and may combine different managers in the measure.

I have tended to use this modal classification scheme in my work, but you should make your own call based on this trade-off.
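
The modal calculation, and the alternative grouping by manager number and version, might look like the sketch below. The field names `permakey`, `version`, and `cls` are hypothetical stand-ins for the dataset's columns, and the rows are invented. Ties go to the classification seen first, which is an assumption; the tie-breaking rule is not described above.

```python
from collections import Counter, defaultdict


def modal_by_key(rows, key_fields, class_field="cls"):
    """Return {key tuple: most common classification} over all years.

    Rows with a missing classification (None) are ignored when counting.
    """
    counts = defaultdict(Counter)
    for row in rows:
        if row[class_field] is not None:
            key = tuple(row[f] for f in key_fields)
            counts[key][row[class_field]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


# Invented data: manager number 500 is recycled across two managers.
rows = [
    {"permakey": 10, "mgrno": 500, "version": 1, "year": 2000, "cls": None},
    {"permakey": 10, "mgrno": 500, "version": 1, "year": 2001, "cls": "QIX"},
    {"permakey": 10, "mgrno": 500, "version": 1, "year": 2002, "cls": "TRA"},
    {"permakey": 10, "mgrno": 500, "version": 1, "year": 2003, "cls": "QIX"},
    {"permakey": 20, "mgrno": 500, "version": 2, "year": 2010, "cls": "DED"},
]

by_permakey = modal_by_key(rows, ["permakey"])
by_mgr_version = modal_by_key(rows, ["mgrno", "version"])
by_mgrno_only = modal_by_key(rows, ["mgrno"])  # mixes two managers

# Assign the modal class to every year, including the unclassified 2000 row.
permanent = [by_permakey[(r["permakey"],)] for r in rows]
print(permanent)
```

In this toy example, grouping by manager number alone assigns QIX to both managers, which is the pooling problem noted in the last bullet.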

Investment style classification

In Abarbanell, Bushee, and Raedy (2003), we classified institutions based on investment styles or preferences for firm size and growth, using the following code:

LVA = Large Value style
LGR = Large Growth style
SVA = Small Value style
SGR = Small Growth style

As discussed in that paper, this classification stems from a cluster analysis on a firm size factor and a value/growth factor. Please see the paper for more details.

Permanent Investment style classification

This classification is based on the modal classification for each permanent key. See above for more details.

Growth style classification

In Bushee and Goodman (2007), we classified institutions based on preferences for growth or value firms, using the following code:

GRO = Growth style
VAL = Value style
G&I = Growth & Income style (middle group)

As discussed in that paper, this classification stems from a cluster analysis on a value/growth factor. Please see the paper for more details.

Permanent Growth style classification

This classification is based on the modal classification for each permanent key. See above for more details.

Tax-sensitivity classification

In Blouin, Bushee, and Sikes (2017), we classified institutions based on sensitivity to capital gains taxes, using the following code:

TII = Tax-insensitive institutional investor
TSI = Tax-sensitive institutional investor

As discussed in that paper, this classification is based on measures of tax-motivated trading around calendar-year ends and on other portfolio characteristics. We first classify institutions as tax-sensitive on a yearly basis, and then use the mode over the prior five years to determine the TSI or TII classification. Because this method requires five years of data, it does not classify an institution until five years of data are available for it. Also, this method does not classify institutions prior to 1991 because we chose to compute embedded gains and losses from 1980 to 1987 before attempting to use them to classify institutions. Please see the paper for more details.
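
The five-year mode can be sketched as follows. This is an illustration only. It assumes that the five-year window ends in the year being classified and that the yearly step has already produced a true/false tax-sensitive flag. The function name `rolling_mode_classification` is hypothetical. With five yes/no observations the mode is never tied.

```python
def rolling_mode_classification(yearly_flags, window=5):
    """Classify each year from the mode of the yearly flags in its window.

    yearly_flags: {year: True if classified tax-sensitive that year}.
    Returns {year: "TSI" or "TII"} only for years whose full window of
    `window` consecutive years (ending in that year) is available.
    """
    result = {}
    for year in sorted(yearly_flags):
        years = range(year - window + 1, year + 1)
        if all(y in yearly_flags for y in years):
            n_sensitive = sum(yearly_flags[y] for y in years)
            result[year] = "TSI" if 2 * n_sensitive > window else "TII"
    return result


# Invented yearly flags for one institution.
flags = {2001: True, 2002: True, 2003: False,
         2004: True, 2005: False, 2006: False}
print(rolling_mode_classification(flags))
```

In this example no classification is produced for 2001 through 2004, because a full five-year window is not yet available.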

Extended tax-sensitivity classification

This classification makes two modifications to the BBS (2017) approach to extend the classification. First, for institution-years with fewer than five years of data, I use as much past data as possible plus some future data to get five years of data to classify the institution. For example, for an institution that started in 2001, its 2003 classification would be based on data from 2001, 2002, 2003, 2004, and 2005. Second, I "smooth" the classification by re-classifying any one-year blips in the rolling five-year modal classification. For example, if an institution has a time series of TSI, TSI, TII, TSI, TSI, I will change the TII to a TSI. Only 58 institution-years were smoothed in this way.
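
Both modifications can be sketched in a few lines. The function names are hypothetical, and the definition of a "blip" used here (a single year whose label differs from two agreeing neighboring years) is an assumption drawn from the example above rather than a stated rule.

```python
def extended_window(year, first_year, window=5):
    """Years used to classify `year` for an institution starting in first_year.

    Uses prior years where they exist and tops up with future years so
    that the window always spans `window` consecutive years.
    """
    start = max(first_year, year - window + 1)
    return list(range(start, start + window))


def smooth_blips(labels):
    """Re-classify one-year blips in a chronological list of labels.

    A label is flipped when its two neighbors agree with each other but
    not with it. Comparisons use the original series, so one correction
    never triggers another.
    """
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            smoothed[i] = labels[i - 1]
    return smoothed


print(extended_window(2003, first_year=2001))
print(smooth_blips(["TSI", "TSI", "TII", "TSI", "TSI"]))
```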