Datasets & Codebooks
Below you will find the major topics codebook and a complete list of the Policy Agendas datasets listed by political institution. Click on each dataset name to expand its description (including date range) as well as to download a .csv of current data and a .pdf codebook that documents each variable.
Note that we are currently engaged in the collection and coding of additional observations for most of our datasets listed below. If you have specific questions about the future availability of updates, contact us at policyagendas@gmail.com.
- Master Topics Codebook
- Codebook: The Policy Agendas Project employs a coding scheme with 19 major topic codes and 225 subtopic codes. Codes are assigned based on policy content across all project datasets and are mutually exclusive and exhaustive.
view codebook download codebook
- Congress
- Congressional Hearings: This dataset contains information summarizing each U.S. Congressional hearing from 1946 to 2008 (87,896 hearings). Using the Congressional Information Service (CIS) Abstracts, we code each hearing by our system of policy content codes. Other variables, including committee and subcommittee, are also available. Identification variables link our records to the original CIS source material. Note: The hearings for the last year available on our website are incomplete because of the CIS archival system, and research using this dataset should take that into account.
download dataset download codebook
- Congressional Quarterly Almanac: This dataset contains information from all articles in the main chapters of the CQ Almanac from 1948 to 2007 (14,028 records). Each CQ Almanac article typically covers one legislative initiative; when an article contains information about several different public laws or bills, it is divided so that each record in our dataset contains information about one legislative initiative. Each record is coded according to our policy content scheme. Several other variables concerning each legislative initiative (e.g., bill numbers, Public Law number if applicable, committees involved, primary sponsors) are also included. Identification variables link our records to the original CQ source material as well as to our Public Laws dataset. A note of caution: article length has varied over the span of this dataset.
download dataset download codebook
- Public Laws: This dataset contains information about each public law passed from 1948 to 2007 (19,215 records). Each record is coded by our policy content scheme and other variables. Identification variables allow linkage to the CQ Almanac dataset. The dataset directly links users to the full text (starting with the 104th Congress) and bill summary (starting with the 93rd Congress) information found on THOMAS and other public domain websites.
download dataset download codebook
- Most Important Laws: This dataset identifies 576 of the most important laws from 1948 through 1998, based on the number of lines of CQ Almanac coverage they receive (with adjustments made between 1948 and 1961 for below-average levels of CQ coverage).
download dataset
- Roll Call Votes: The Congressional Roll Call Voting dataset codes every congressional roll call vote from 1946 to 2004 using the Policy Agendas Project content coding system. In addition, this dataset standardizes information from multiple sources into an easy-to-use format.
download dataset download codebook
- Committee Codebook: Committee codes are found in the hearings, laws, and CQ Almanac datasets. These codes assign a unique number to each congressional committee associated with a particular record in each of the datasets.
download codebook
- Presidency
- Executive Orders: This dataset contains information about each executive order issued from 1945 to 2003 (3,800 records). Each record is coded according to our policy content scheme and includes other variables, such as the president's party, whether the order was issued during a time of divided government, and whether the order was issued at the beginning or end of a presidential term.
download dataset download codebook
- State of the Union Speeches: This dataset contains information on each quasi-statement in the Presidential State of the Union Speeches from 1946 to 2005 (18,854 records). Each quasi-statement is coded according to our system of policy content categories and other variables. Users can directly link to full text versions of the speech for further analysis.
download dataset download codebook
- Supreme Court
- Supreme Court Cases: The Supreme Court dataset contains information on each case on the Court's docket from 1944 to 2006 (8,776 records), and is the only publicly available dataset to examine the Court's agenda from a policy perspective. Cases are coded according to our policy content scheme and include additional variables such as the Court's ruling in cases in which one was issued. The accompanying codebook addresses Court-specific coding issues and serves as a reference guide for those unfamiliar with the Court's terminology and procedures.
download dataset download codebook
- Public Opinion and Interest Groups
- Gallup's Most Important Problem: This dataset contains responses to Gallup's Most Important Problem question aggregated at the annual level from 1946 to 2007 (1,240 records) and coded by major topic.
download dataset download codebook
- Encyclopedia of Associations: Since 1956, Gale Research, later Thomson/Gale, has published a printed volume entitled the Encyclopedia of Associations (EA). The database on which the book is based also serves as a web-based research tool, available through libraries, entitled Associations Unlimited. While not originally designed with dynamic analysis in mind, the accumulated volumes of the EA allow a researcher considerable opportunity for analyzing trends over time. The Policy Agendas Project (PAP) has used the annual volumes of the EA to compile a time-series database of all associations, coded both by the EA subject categories and by the major topics of the PAP. Forty-two editions of the EA have been published from 1956 to 2005. We have compiled a simple list of each group and coded it into the PAP topic classification system. Please see the codebook for a more detailed description.
Full Dataset Codebook Annual Counts Dataset Annual Counts Codebook
- News Media
- New York Times Index: This dataset is a systematic random sample of the New York Times Index from 1946 to 2005 (46,458 records). The sample includes the first entry on every odd-numbered page of the Index. Each entry is coded by Policy Agendas major topics and includes other variables such as the length, date, and location of the story and whether it addressed government actions.
download dataset download codebook
- New York Times Index Weights: This dataset provides information on the number of pages in the New York Times Index and an estimate of the number of articles per page for each of the years included in our Index dataset. These weights address the occasional newspaper format changes that systematically alter the number of articles on each page and the variation in the size of the New York Times and its Index over time.
download dataset
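As an illustration of how the Index sample and the weights might be combined, the sketch below scales each year's sampled topic shares by that year's page count and estimated articles per page. It is one plausible approach, not necessarily the project's own procedure. The rows, the weights, and the function name `estimated_topic_counts` are all hypothetical; real variable names and values should be taken from the datasets and codebook.

```python
from collections import Counter

# Hypothetical sampled entries as (year, major_topic_code) pairs.
# These values are made up for illustration only.
sample = [
    (1990, 1), (1990, 1), (1990, 3), (1990, 16),
    (1991, 1), (1991, 3), (1991, 3), (1991, 3),
]

# Hypothetical weights: year -> (pages in the Index, estimated articles per page).
weights = {1990: (1200, 40.0), 1991: (1000, 50.0)}


def estimated_topic_counts(sample, weights):
    """Scale each year's sampled topic shares up to an estimated article count."""
    entries_per_year = Counter(year for year, _ in sample)
    entries_per_year_topic = Counter(sample)
    estimates = {}
    for (year, topic), n in entries_per_year_topic.items():
        pages, articles_per_page = weights[year]
        share = n / entries_per_year[year]  # topic's share of that year's sample
        estimates[(year, topic)] = share * pages * articles_per_page
    return estimates


estimates = estimated_topic_counts(sample, weights)
for key in sorted(estimates):
    print(key, estimates[key])
```

Working with shares rather than raw sample counts keeps years comparable even when format changes alter how many articles appear on each Index page.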
- Federal Budget
- Budget: This dataset provides annual, inflation-adjusted data on U.S. Budget Authority from FY 1947 through FY 2008. Using Office of Management and Budget (OMB) Functions and Subfunctions, we have revised the data to be consistent across time.
download dataset download codebook
- Budget-Policy Crosswalk: This file compares the Policy Agendas Project topic codes with the OMB codes used in the Budget dataset to assess how well they correspond. A "1" represents nearly complete correspondence, while a "5" represents significant divergence.
download dataset
- Budget Resources: These pages highlight the main issues concerning the study of budgetary outcomes across countries and time. A brief glossary of budgetary terminology and data sources from international, national, and research institutions are provided.
view resources
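For readers combining the Budget dataset with topic-coded data, the sketch below shows one way the crosswalk's 1-to-5 correspondence scores could be used: keep only the topic and OMB function pairs that score at or below a chosen cutoff. The rows, the cutoff of 2, and the function name `usable_pairs` are assumptions made for illustration; they are not taken from the crosswalk file itself.

```python
# Hypothetical crosswalk rows: (PAP major topic code, OMB function code, score).
# Scores follow the scale described above: 1 = nearly complete correspondence,
# 5 = significant divergence. The rows themselves are made up.
crosswalk = [
    (16, "050", 1),
    (3, "550", 2),
    (7, "300", 3),
    (10, "400", 2),
    (6, "500", 5),
]


def usable_pairs(rows, max_score=2):
    """Return topic -> OMB function for pairs scoring at or below max_score."""
    return {topic: function for topic, function, score in rows if score <= max_score}


close_matches = usable_pairs(crosswalk)
print(close_matches)
```

A stricter or looser cutoff can be passed as `max_score`, depending on how much divergence between the two coding systems an analysis can tolerate.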


