Datasets & Codebooks

As of Spring 2014, we have updated our website to include a revised 2014 Topics Codebook and subsequent updates to all datasets. These revisions increase alignment with the Comparative Agendas Project (CAP) coding system. Dataset updates include new columns for corresponding CAP codes where appropriate. Archived versions and detailed summary information are available here. We are presently working on further recoding of foreign affairs topics within the NYT Index dataset.

The major topics codebook and all Policy Agendas datasets are listed below. Click on each dataset name to expand its description (including date range) and to download a .csv of current data and a .pdf codebook that documents the included variables.

Except as noted, datasets and codebooks available on this site are © Policy Agendas Project 2013. These datasets are distributed free of charge, with attribution, for the educational and research communities. Policy Agendas Project topic codes and other Project-generated variables are released under a Creative Commons Attribution-NonCommercial-ShareAlike License. For more information, visit our how to cite page.

We are currently collecting and coding additional observations for most of the datasets listed below. We continuously make improvements to our datasets, and researchers should routinely check for updates. Email us at policyagendas@gmail.com with any questions. Recent changes are listed here. Additional related datasets can be found here.

    Master Topics Codebook
  • Codebook
    The Policy Agendas Project employs a coding scheme utilizing 19 major topic and 225 subtopic codes. Codes are assigned based on policy content across all project datasets and are mutually exclusive and exhaustive.
    view codebook  download codebook
  • Committee Codebook
    Committee codes are found in the hearings, laws, and CQ Almanac datasets. These codes assign a unique number to each congressional committee associated with a particular record in each of the datasets.
    download codebook
  • Congressional Bills
    This public resource, compiled by E. Scott Adler and John Wilkerson, provides information about more than 400,000 bills introduced in the U.S. Congress, currently 1947-2012, along with extensive information about each bill's progress and sponsor. It is used by researchers to study legislative institutions and behavior; by policy experts to study issue attention in Congress; and even by citizens studying their family histories (the dataset provides the only digitized records of tens of thousands of private bills introduced between 1947 and 1972). Each bill is coded according to the topic coding system of the Policy Agendas Project. Note that the full dataset is available for download, with corresponding codebook, via the Congressional Bills Project's website. An abbreviated dataset is used in our Trends Tool and is available for download by topic via the tool with updated PAP2014/CAP codes.
    Congressional Bills Project for codebook and full data
  • Congressional Hearings
    This dataset contains information summarizing each U.S. Congressional hearing from 1946 to 2013 (95,060 hearings). Using the Congressional Information Service (CIS) Abstracts, we code each hearing by our system of policy content codes. Other variables, including committee and subcommittee, are also available. Identification variables link our records to the original CIS source material. Note: Research making use of the congressional hearings dataset should bear in mind that the hearings for the last year available on our website are incomplete. This is due to the CIS archival system.
    download dataset  download codebook
  • Congressional Quarterly Almanac
    This dataset contains information from all articles in the main chapters of the CQ Almanac from 1948 to 2011 (14,217 records). Each CQ Almanac article typically covers one legislative initiative; when an article contains information about several different public laws or bills, it is divided so that each record in our dataset contains information about one legislative initiative. Each record is coded according to our policy content scheme. Several other variables concerning each legislative initiative (e.g., bill numbers, Public Law number if applicable, committees involved, primary sponsors) are also included. Identification variables link our records to the original CQ source material as well as to our Public Laws dataset. A note of caution: article length has varied over the span of this dataset.
    download dataset  download codebook
  • Public Laws
    This dataset contains information about each public law passed from 1948 to 2011 (19,914 records). Each record is coded by our policy content scheme and other variables. Identification variables allow linkage to the CQ Almanac dataset. The dataset directly links users to the full text (starting with the 104th Congress) and bill summary (starting with the 93rd Congress) information found on THOMAS and other public domain websites.
    download dataset  download codebook
  • Roll Call Votes
    The Congressional Roll Call Voting dataset codes every congressional roll call vote from 1947 to 2012 (49,216 votes) using the Policy Agendas Project content coding system. In addition, this dataset standardizes information from multiple sources into an easily utilized format. As of August 2014, we have streamlined the variables that we collect and offer for download in the RC dataset. A link to the legacy version and corresponding data codebook is available below.
    download dataset  download codebook  download legacy dataset  download legacy codebook
  • Executive Orders
    This dataset contains information about each executive order issued from 1945 to 2013 (4,129 records). Each record is coded according to our policy content scheme and other variables, including the president's party, whether the order was issued during a time of divided government, and whether the order was issued at the beginning or end of a presidential term.
    download dataset  download codebook
  • State of the Union Speeches
    This dataset contains information on each quasi-statement in the Presidential State of the Union Speeches from 1946 to 2015 (22,417 records). Each quasi-statement is coded according to our system of policy content categories and other variables. Users can directly link to full text versions of the speech for further analysis.
    download dataset  download codebook
    Supreme Court
  • Supreme Court Cases
    The Supreme Court dataset contains information on each case on the Court's docket from 1945 to 2009 (8,955 records), and is the only publicly available dataset to examine the Court's agenda from a policy perspective. Cases are coded according to policy content and include additional variables such as the Court's ruling in cases in which one was issued. The accompanying codebook addresses Court-specific coding issues and serves as a reference guide for those unfamiliar with the Court's terminology and procedures.
    download dataset  download codebook
    Public Opinion and Interest Groups
  • Encyclopedia of Associations
    Since 1956, Gale Research, later Thomson/Gale, has published a printed volume entitled the Encyclopedia of Associations. The database on which the book is based also serves as a web-based research tool available through libraries and entitled Associations Unlimited. While not originally designed with dynamic analysis in mind, the accumulated volumes of the EA in fact allow a researcher considerable opportunity for analyzing trends over time. The Policy Agendas Project (PAP) has used the annual volumes of the EA to compile a time-series database of all associations, coded both by the EA subject categories and by the major topics of the PAP. Forty-two editions of the EA were published from 1956 to 2005. We have compiled a simple list of each group and coded it into the PAP topic classification system. Complete data are available in 5-year intervals from 1970-2005 as well as estimated annual counts for the full period. A description of coverage and important details concerning the lag between reported copyright years and the information they represent is included in the full dataset codebook. Note that as of March 2014, we have implemented a 4-year lag in the annual dataset, with the previous Year variable now listed as CopyrightYear. Below are links to the annual imputed counts dataset (1966-2001, 972 records) used in the trends analysis tool (with corresponding codebook) as well as the full 1970-2005 dataset (with corresponding codebook). A recently published article about the dataset is also provided.
    download dataset  download codebook  full dataset  full dataset codebook  article
  • Gallup's Most Important Problem
    This dataset contains responses to Gallup's Most Important Problem question aggregated at the annual level from 1946 to 2012 (1,407 records) and coded by major topic. Years with missing observations (1953/1955) are those in which there were no corresponding MIP data available. Contact us for quarterly MIP data if needed.
    download dataset  download codebook
  • Policy Moods
    The policy specific moods data set, compiled by James A. Stimson and K. Elizabeth Coggins, was created to supplement the traditional Global Mood measure in an effort to provide scholars with as many policy specific mood measures as possible. The global mood database, which consists of nearly 400 survey questions and almost 8,000 administrations across 70 years, was disaggregated to generate longitudinal measures of public opinion in specific policy domains. By matching each survey item with a policy code from the Policy Agendas Project coding scheme, it was possible to estimate 61 unique series as well as five additional series relating to abortion and gay rights spanning 1940 to 2015 (5,604 records). More information about survey items, administrations and time periods can be found in the corresponding data codebook.
    download dataset  download codebook  custom series application
    News Media
  • New York Times Index
    This dataset is a systematic random sample of the New York Times Index from 1946 to 2008 (49,201 records). The sample includes the first entry on every odd-numbered page of the Index. Each entry is coded by Policy Agendas major topics and includes other variables such as the length, date and location of the story and whether it addressed government actions.
    download dataset  download codebook
  • New York Times Index Weights
    This dataset provides information on the number of pages in the New York Times Index and an estimate of the number of articles per page for each of the years included in our Index dataset. These weights address the occasional newspaper format changes that systematically alter the number of articles on each page and the variation in the size of the New York Times and its Index over time.
    download dataset
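The sampling rule and the weights described for the two New York Times Index datasets can be sketched in Python. This is a minimal illustration under stated assumptions, not the Project's own procedure: the function names, the `entries_by_page` mapping, and the example values are hypothetical, and multiplying pages by articles per page is only one plausible way to use the weights.

```python
def sample_index(entries_by_page):
    """Return the first entry on every odd-numbered page.

    entries_by_page is a hypothetical mapping from a page number to the
    ordered list of Index entries printed on that page.
    """
    return [
        entries[0]
        for page, entries in sorted(entries_by_page.items())
        if page % 2 == 1 and entries  # odd page that has at least one entry
    ]


def estimated_entries(pages, entries_per_page):
    """Estimate the total number of Index entries in one year.

    Assumes the yearly total is simply pages times the estimated
    number of entries per page taken from the weights dataset.
    """
    return pages * entries_per_page


# Placeholder pages, not real Index content.
example_pages = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"], 4: [], 5: []}
sampled = sample_index(example_pages)
total = estimated_entries(pages=1000, entries_per_page=25.0)
```

Comparing the sampled count for a year with an estimated yearly total of this kind is one way to account for changes in Index size and page layout over time.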
    Federal Budget
  • Budget Authority (Adjusted)
    This dataset provides annual data, adjusted for inflation, of U.S. Budget Authority from FY 1947 through FY 2014 (7,820 records). Using Office of Management and Budget Functions and Subfunctions, we have revised the data to be consistent across time. We utilize the most recent OMB deflator to generate inflation-adjusted variables.
    download dataset  download full codebook  data codebook
  • Budget Authority-Policy Crosswalk
    This file compares the Policy Agendas Project topic codes with the OMB codes used in the Budget Authority dataset to assess how well they correspond. A "1" represents nearly complete correspondence, while a "5" represents significant divergence.
    download crosswalk
  • Budget Outlays
    This dataset, compiled by Bryan D. Jones, Frank R. Baumgartner and John Lovett, provides two 'synthetic' series of annual, long-term budget outlays. There is no single series reporting expenditures (outlays) for the US Federal Government since the founding of the Republic. However, two separate data series are available for US Federal Expenditures, compiled by the Treasury Department and Office of Management and Budget. The Treasury Series runs from 1791 to 1970, and the OMB series covers 1940 to the present. From these data sources, two synthetic budget series are constructed by merging data from the US Treasury with data from OMB. The series labeled Treasury Synthetic uses Treasury data from 1791 through 1970, OMB afterward. OMB Synthetic uses Treasury numbers until 1940, OMB afterward. For a complete description of these data sources, methods used to construct the series, and variable descriptions, please see the corresponding codebook below.
    download codebook  download dataset  related article appendix
  • Budget Resources
    These pages highlight the main issues concerning the study of budgetary outcomes across countries and time. A brief glossary of budgetary terminology and data sources from international, national, and research institutions are provided.
    view resources
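The construction of the two synthetic outlay series in the Budget Outlays description can be sketched as a simple splice of two year-indexed series. This is a hedged illustration only: `splice_series` and `last_treasury_year` are hypothetical names, the numbers are placeholders rather than real budget figures, and treating 1940 as the first OMB year in the OMB Synthetic series is an assumption about the phrase "until 1940". The codebook remains the authority on the actual method.

```python
def splice_series(treasury, omb, last_treasury_year):
    """Combine two {year: value} series into one.

    Treasury values are used through last_treasury_year (inclusive),
    and OMB values are used for every later year.
    """
    spliced = {}
    for year in sorted(set(treasury) | set(omb)):
        source = treasury if year <= last_treasury_year else omb
        if year in source:  # skip years the chosen source does not cover
            spliced[year] = source[year]
    return spliced


# Placeholder values, NOT real outlays.
treasury = {1939: 1.0, 1940: 2.0, 1970: 3.0}
omb = {1940: 20.0, 1970: 30.0, 1971: 40.0}

# Treasury Synthetic: Treasury data through 1970, OMB afterward.
treasury_synthetic = splice_series(treasury, omb, last_treasury_year=1970)

# OMB Synthetic: Treasury data before 1940 (assumed boundary), OMB afterward.
omb_synthetic = splice_series(treasury, omb, last_treasury_year=1939)
```

The two series differ only in the overlap years, where both sources report values, so the choice between them matters mainly for analyses covering 1940 to 1970.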