Datasets & Codebooks
Below you will find the major topics codebook and a complete list of the Policy Agendas datasets listed by political institution. Click on each dataset name to expand its description (including date range) as well as to download a .csv of current data and a .pdf codebook that documents each variable.
Except as noted, datasets and codebooks available on this site are © Policy Agendas Project 2013. These datasets are distributed free of charge, with attribution, for the educational and research communities. Policy Agendas Project topic codes and other Project-generated variables are released under a Creative Commons Attribution-NonCommercial-ShareAlike License. For more information visit our how to cite page.
We are currently collecting and coding additional observations for most of the datasets listed below. Email us at policyagendas@gmail.com with any questions. Recent changes are listed here. Additional related datasets can be found here.
- Master Topics Codebook
- Codebook: The Policy Agendas Project employs a coding scheme with 19 major topic codes and 225 subtopic codes. Codes are assigned based on policy content across all project datasets and are mutually exclusive and exhaustive.
view codebook download codebook
- Congress
- Committee Codebook: Committee codes are found in the hearings, laws, and CQ Almanac datasets. These codes assign a unique number to each congressional committee associated with a particular record in each of the datasets.
download codebook
- Congressional Bills: This public resource, compiled by E. Scott Adler and John Wilkerson, provides information about more than 400,000 bills introduced in the U.S. Congress, currently 1947-2008, along with extensive information about each bill's progress and sponsor. It is used by researchers to study legislative institutions and behavior; by policy experts to study issue attention in Congress; and even by citizens studying their family histories (the dataset provides the only digitized records of tens of thousands of private bills introduced between 1947 and 1972). We have also manually classified each bill's title according to the topic coding system of the Policy Agendas Project. Each bill is designated to be primarily about one major topic and one subtopic. Note: the full dataset is available for download, with corresponding codebook, via the Congressional Bills Project's website. An abbreviated dataset is used in our Trends Tool.
download data and codebook
- Congressional Hearings: This dataset contains information summarizing each U.S. Congressional hearing from 1946 to 2010 (91,658 hearings). Using the Congressional Information Service (CIS) Abstracts, we code each hearing by our system of policy content codes. Other variables, including committee and subcommittee, are also available. Identification variables link our records to the original CIS source material. Note: research using the congressional hearings dataset should bear in mind that the hearings for the last year available on our website are incomplete. This is due to the CIS archival system.
download dataset download codebook
- Congressional Quarterly Almanac: This dataset contains information from all articles in the main chapters of the CQ Almanac from 1948 to 2011 (14,217 records). Each CQ Almanac article typically covers one legislative initiative; when an article contains information about several different public laws or bills, it is divided so that each record in our dataset contains information about one legislative initiative. Each record is coded according to our policy content scheme. Several other variables concerning each legislative initiative (e.g., bill numbers, Public Law number if applicable, committees involved, primary sponsors) are also included. Identification variables link our records to the original CQ source material as well as to our Public Laws dataset. A note of caution: article length has varied over the span of this dataset.
download dataset download codebook
- Public Laws: This dataset contains information about each public law passed from 1948 to 2011 (19,914 records). Each record is coded by our policy content scheme and other variables. Identification variables allow linkage to the CQ Almanac dataset. The dataset directly links users to the full text (starting with the 104th Congress) and bill summary (starting with the 93rd Congress) information found on THOMAS and other public domain websites.
download dataset download codebook
- Roll Call Votes: The Congressional Roll Call Voting dataset codes every congressional roll call vote from 1947 to 2004 (40,384 votes) using the Policy Agendas Project content coding system. In addition, this dataset standardizes information from multiple sources into an easy-to-use format.
download dataset download codebook
- Presidency
- Executive Orders: This dataset contains information about each executive order issued from 1945 to 2012 (4,109 records). Each record is coded according to our policy content scheme and other variables, including the president's party, whether the order was issued during a time of divided government, and whether the order was issued at the beginning or end of a presidential term.
download dataset download codebook
- State of the Union Speeches: This dataset contains information on each quasi-statement in the Presidential State of the Union Speeches from 1946 to 2012 (21,285 records). Each quasi-statement is coded according to our system of policy content categories and other variables. Users can directly link to full-text versions of the speech for further analysis.
download dataset download codebook
- Supreme Court
- Supreme Court Cases: The Supreme Court dataset contains information on each case on the Court's docket from 1944 to 2009 (8,956 records), and is the only publicly available dataset to examine the Court's agenda from a policy perspective. Cases are coded according to our policy content scheme and include additional variables such as the Court's ruling in cases in which one was issued. The accompanying codebook addresses Court-specific coding issues and serves as a reference guide for those unfamiliar with the Court's terminology and procedures.
download dataset download codebook
- Public Opinion and Interest Groups
- Encyclopedia of Associations: Since 1956, Gale Research, later Thomson/Gale, has published a printed volume entitled the Encyclopedia of Associations (EA). The database on which the book is based also serves as a web-based research tool, available through libraries under the title Associations Unlimited. While not originally designed with dynamic analysis in mind, the accumulated volumes of the EA give a researcher considerable opportunity to analyze trends over time. The Policy Agendas Project (PAP) has used the annual volumes of the EA to compile a time-series database of all associations, coded both by the EA subject categories and by the major topics of the PAP. Forty-two editions of the EA were published from 1956 to 2005. We have compiled a simple list of each group and coded it into the PAP topic classification system. Complete data are available in 5-year intervals from 1970-2005, as well as estimated annual counts for the full period. A description of coverage and important details concerning the lag between reported copyright years and the information they represent is included in the full dataset codebook. Below are links to the annual imputed counts dataset (1970-2005, 972 records) used in the trends analysis tool (with corresponding codebook) as well as the full 1970-2005 dataset (with corresponding codebook).
download dataset download codebook full dataset full dataset codebook
- Gallup's Most Important Problem: This dataset contains responses to Gallup's Most Important Problem (MIP) question aggregated at the annual level from 1946 to 2012 (1,340 records) and coded by major topic. Years with missing observations (1953/1955) are those in which there were no corresponding MIP data available. Quarterly data are available from 1956 through 2012 below.
download dataset download codebook download quarterly dataset
- Policy Moods: The policy-specific moods dataset, compiled by James A. Stimson and K. Elizabeth Coggins, was created to supplement the traditional Global Mood measure in an effort to provide scholars with as many policy-specific mood measures as possible. The global mood database, which consists of nearly 400 survey questions and almost 8,000 administrations across 70 years, was disaggregated to generate longitudinal measures of public opinion in specific policy domains. By matching each survey item with a policy code from the Policy Agendas Project coding scheme, it was possible to estimate 61 unique series as well as five additional series relating to abortion and gay rights spanning 1946 to 2011 (4,086 records). More information about survey items, administrations and time periods can be found in the corresponding data codebook.
download dataset download codebook
- News Media
- New York Times Index: This dataset is a systematic random sample of the New York Times Index from 1946 to 2008 (49,204 records). The sample includes the first entry on every odd-numbered page of the Index. Each entry is coded by Policy Agendas major topics and includes other variables such as the length, date and location of the story and whether it addressed government actions.
download dataset download codebook
- New York Times Index Weights: This dataset provides information on the number of pages in the New York Times Index and an estimate of the number of articles per page for each of the years included in our Index dataset. These weights address the occasional newspaper format changes that systematically alter the number of articles on each page and the variation in the size of the New York Times and its Index over time.
download dataset
- Federal Budget
- Budget: This dataset provides annual data, adjusted for inflation, on U.S. Budget Authority from FY 1947 through FY 2009 (7,245 records). Using Office of Management and Budget Functions and Subfunctions, we have revised the data to be consistent across time.
download dataset download codebook
- Budget Resources: These pages highlight the main issues concerning the study of budgetary outcomes across countries and time. A brief glossary of budgetary terminology and data sources from international, national, and research institutions are provided.
view resources
- Budget-Policy Crosswalk: This file compares the Policy Agendas Project topic codes with the OMB codes used in the Budget dataset to assess how well they correspond. A "1" represents nearly complete correspondence, while a "5" represents significant divergence.
download dataset
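
The New York Times Index Weights entry describes per-year page counts and estimated articles per page. One plausible way to use them, sketched below, is to scale each year's sampled topic shares by an estimated total number of Index entries (pages times articles per page). This is an illustration only, not the Project's documented procedure: the column names (`year`, `major_topic`, `index_pages`, `articles_per_page`) and all values are hypothetical, so check the actual variable names against the downloaded files and codebook.

```python
import csv
import io

# Hypothetical stand-ins for the two downloads. Column names and values
# are assumptions made for illustration; they are not real project data.
SAMPLE_CSV = """year,major_topic
1950,1
1950,1
1950,19
1980,1
1980,19
1980,19
"""

WEIGHTS_CSV = """year,index_pages,articles_per_page
1950,1000,40
1980,2000,30
"""


def estimated_articles_by_topic(sample_file, weights_file):
    """Scale sampled topic counts to an estimated number of Index entries.

    Returns a dict mapping (year, major_topic) to the estimated count.
    """
    # Estimated total entries per year = pages * articles per page.
    totals = {}
    for row in csv.DictReader(weights_file):
        year = int(row["year"])
        totals[year] = int(row["index_pages"]) * float(row["articles_per_page"])

    # Count sampled entries per (year, topic) and per year.
    topic_counts = {}
    year_counts = {}
    for row in csv.DictReader(sample_file):
        year, topic = int(row["year"]), int(row["major_topic"])
        topic_counts[(year, topic)] = topic_counts.get((year, topic), 0) + 1
        year_counts[year] = year_counts.get(year, 0) + 1

    # Topic share of the sample, scaled by that year's estimated total.
    return {
        (year, topic): count * totals[year] / year_counts[year]
        for (year, topic), count in topic_counts.items()
    }


if __name__ == "__main__":
    estimates = estimated_articles_by_topic(
        io.StringIO(SAMPLE_CSV), io.StringIO(WEIGHTS_CSV)
    )
    for (year, topic), value in sorted(estimates.items()):
        print(year, topic, round(value))
```

Comparing raw sample counts across years would conflate changes in attention with changes in the size and layout of the Index; scaling by a per-year total is one way to separate the two.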


