Post45 Data Collective
  • Our Data
  • Submissions
  • People
  • About
  • News

Table of Contents

  • 1. Major Literary Prize Winners and Judges
    • Data Table
    • Collection and Creation
    • Description
  • 2. Prize-Winning Authors’ Books (HathiTrust Metadata)
    • Data Table
    • Collection and Creation
    • Description

The Index of Major Literary Prizes in the US

prizes
education
gender
MFA
dataset
Authors

Claire Grossman

Juliana Spahr

Stephanie Young

Published

December 5, 2022

Doi

10.18737/CNJV1733p4520221212

Abstract
The Index of Major Literary Prizes in the US includes datasets to the winners and judges of prizes for prose, poetry, or unspecified genre between 1918 and 2020 with a purse of $10,000 and over.

Credit: National Book Foundation

The Index of Major Literary Prizes in the US includes two related datasets:  

  1. The first is a dataset of the winners and judges of prizes for prose, poetry, or unspecified genre between 1918 and 2020 with a purse of $10,000 and over.
  2. The second contains records for volumes in the HathiTrust Digital Library written by authors who won a prize in the prize winners dataset.

1. Major Literary Prize Winners and Judges

This dataset includes information about the winners and judges of literary prizes (for prose, poetry, or unspecified genres) between 1918 and 2020 with a purse of $10,000 and over. The dataset includes details about the winners of 52 unique prizes awarded by 22 institutions. For a subset of 39 prizes, it includes details about judges; not every prize has complete judge data. The dataset does not include prizes awarded specifically for children’s literature, nonfiction, drama, or translation.

Data Table

Creative Commons License

This work is licensed under CC BY 4.0

The details about winners and judges includes information about their gender and education (if and where the winners/judges attended college, MFA programs, or other graduate programs, if applicable). This information was collected by hand and is described in more detail below.

Additionally, the dataset also includes persistent identifiers for authors, such as VIAF, LCCN, and Wikidata numbers.

Collection and Creation

The data about prizes and winner/judge demographics was collected by hand mainly from institutional websites. Gender and higher education data for individuals was collected from author biographies, interviews, and other materials.  Some information about judges not listed on websites was obtained through correspondence with institutions. Claire Grossman, Juliana Spahr, and Stephanie Young are the principal investigators, did the majority of the data gathering, and are responsible for any errors. They were assisted by Jennifer Chukwu, Clare Lilliston, Jordan Pruett, Esther Vinarov, and Betty He. Richard Jean So provided significant support for this project.

Gender information was provisionally labeled by the research team based on pronouns used by author in biographical notes at the time research was completed. It is possible a judge/winner’s gender identity and/or pronoun may have changed subsequently. This information is intended to enable study of broad patterns over time and not as definitive statements on any individual identity. The possible gender values are “male,” “female,” “nonbinary/he,” “nonbinary/they,” “unknown,” and “No Winner”; nonbinary was used only when the term appeared in the individuals’ biography.

Higher education information was labelled by the research team based on whether the individual mentioned that they attended (even if they did not graduate from) an institution. Again, this information is intended to enable the study of broad patterns over time and is not meant to be definitive. The possible MFA degree values are the name of institution, “No Winner,” or blank (in most cases, a blank means it is unlikely that the individual attended an MFA program, because higher education affiliations were listed in biographical notes but did not include an MFA, or because the team was unable to locate any educational information about the individual).

The possible “elite education” values are “Barnard College,” “Brown University,” “Columbia University,” “Cornell University,” “Dartmouth College,” “Harvard University,” “Princeton University,” “Radcliffe College,” “Stanford University,” “University of Pennsylvania,” “University of Chicago,” “Yale University,” “No Winner,” or blank. The possible “graduate degree” values (including masters, PhD, JD, and medical degrees) are “graduate,” “No Winner,” or blank.

At a later stage, persistent identifiers for winners and judges, such as VIAF, LCCN, and Wikidata identifiers, were added by Matt Miller computationally.

Please report any errors and/or corrections via this Google Form.

Description

The columns in the dataset include:

  • person_id: unique numeric identifier for each name; assigned alphabetically by first name

  • full_name: pen names were used; in case of name change, most recent name was used

  • given_name: first name; includes middle name, if used

  • last_name: last name

  • gender: provisionally labeled by research team based on pronouns used by author in biographical notes at the time research was completed; it is possible a judge/winner’s gender identity and/or pronoun may have changed subsequently; intended for study of broad patterns over time and not as definitive statements on any individual identity; values are “male,” “female,” “nonbinary/he,” “nonbinary/they,” “unknown,” and “No Winner”; nonbinary was used only when the term appeared in the individuals’ biography.

  • elite_institution: individual mentioned they attended (even if they did not graduate from) one of the listed institutions; intended for study of broad patterns over time and not as definitive; values are “Barnard College,” “Brown University,” “Columbia University,” “Cornell University,” “Dartmouth College,” “Harvard University,” “Princeton University,” “Radcliffe College,” “Stanford University,” “University of Pennsylvania,” “University of Chicago,” “Yale University,” “No Winner,” or blank (means unlikely as individual listed higher education affiliations in biographical notes but did not include an elite institution or unable to locate any educational information about the individual); intended for study of broad patterns over time but not as definitive.

  • graduate_degree: individual mentioned they attended (even if they did not graduate from) a graduate program (includes masters, PhD, JD, and medical degrees); values are “graduate,” “No Winner,” or blank (means unlikely as individual listed higher education affiliations in biographical notes but did not include a graduate degree or unable to locate any educational information about the individual); intended for study of broad patterns over time but not as definitive.

  • mfa_degree: individual mentioned they attended (even if they did not graduate from) an MFA program; values are name of institution, “No Winner,” or blank (means unlikely as individual listed higher education affiliations in biographical notes but did not include an MFA or unable to locate any educational information about the individual); intended for study of broad patterns over time and not as definitive.

  • iowa_mfa_person_id: values are either a number that corresponds to the Post45 Iowa Writers’ Workshop “People” table, “missing” (means that the individual’s biographical materials suggest they attended Iowa for an MFA but a corresponding entry could not be found in the Iowa dataset which ends in 2014 and does not include graduates of the MFA in playwriting), “unknown” (unable to locate any educational information about the individual), “No Winner,” or blank (means that the individual did not list University of Iowa in their biographical notes or unable to locate any educational information about the individual)

  • stegner: individual mentioned they were awarded a Wallace Stegner Fellowship at Stanford; the Stegner program does not award degrees but it resembles an MFA program in pedagogy except it is not unusual for those admitted to already have an MFA; we thus treat it as the equivalent of an MFA (and not a prize); values are either “Stegner,” “No Winner,” or blank (means that the individual did not mention the Stegner Fellowship in their biographical notes or unable to locate any educational information about the individual)

  • role: values are “winner” or “judge”

  • prize_institution: nonprofit organization that oversees the prize

  • prize_name: name of prize; for the Gold Medal Awards from the American Academy of Arts and Letters, we only included awards categorized as fiction and poetry; for the Morton Dauwen Zabel Award from American Academy of Arts and Letters, we excluded periodic awards given specifically for “Criticism”; for the National Book Award, we only included prizes for poetry and fiction; for the Academy of American Poets, we only included the Academy of American Poets Fellowship, the Lenore Marshall Poetry Prize, and the Wallace Stevens Award; for the Poet Laureate Consultant in Poetry to the Library of Congress, we included the US Consultants in Poetry but did not include the three Special Bicentennial Consultants that served in an advisory role from 1999-2000 and excluded William Carlos Williams (who was named as Laureate, but did not serve); for the Pulitzer Prize, we only included prizes for fiction and poetry; for the MacArthur Fellowships, we included those who were categorized by the MacArthur website as “poetry” and most of those categorized as “fiction and nonfiction” (if a writer exclusively published journalistic nonfiction or essay, they were not included).

  • prize_year: year awarded; in the case of the Poet Laureate Consultant in Poetry to the Library of Congress, which begins in September and continues until May, we included entries for the Laurate under both years

  • prize_genre: values are “poetry,” “prose” (“prose” includes prizes for “short stories,” “essays,” “fiction,” and “novel”), and “no genre” (prize has no genre requirement, as in the MacArthur Fellowship or the Whiting Award)

  • prize_type: values are “career” (prize is awarded to author on basis of overall career) or “book” (prize is awarded to author for a specific book)

  • prize_amount: value here is the amount of money awarded in 2022; amounts change over time, which we do not track

  • title_of_winning_book: if “prize_type” is “book,” then the awarded book title is listed (if the jury awarded more than one book in same year, titles for both are listed); other values are “No Winner,” and blank (prize was not awarded for a specific book)

This dataset also includes various persistent identifiers:

  • author_lccn – Author’s LCCN from id.loc.gov
  • author_viaf – Author viaf.org cluster number
  • author_wikidata – Author’s Wikdiata Q number

2. Prize-Winning Authors’ Books (HathiTrust Metadata)

The second dataset contains records for volumes in the HathiTrust Digital Library written by authors who won a prize in the prize winners dataset. It includes many duplicate titles, as in the case of later editions of the same work.

The HathiTrust identifiers included here can be used to extract words per page for all included volumes (using tools such as the HTRC Feature Reader), thus enabling various kinds of computational text analysis on these works.

Data Table

Note: You can also explore and export this data via  GitHub’s Flat Viewer, which includes some additional data visualizations and filtering options.

Collection and Creation

This data was assembled by Jordan Pruett. Matches between prize-winning authors and authors in the HathiTrust Digital Library were produced by performing an exact string comparison between the last_name and given_name columns of the prize dataset to the author column of the HathiTrust dataset held by the Post45 Data Collective. Fields were forced to lowercase and stripped of punctuation and spacing before comparison. This conservative matching process is likely to produce two kinds of errors: missed matches, in the case of authors who appear under different names in the two spreadsheets; and false positive matches, in the rare case of two authors with identical first and last names. The first type of error was considered acceptable: the data aims to maximize true positive matches rather than minimize false negatives. Since matches were assessed restrictively, researchers can be confident that the vast majority of the entries in this dataset were in fact authored by somebody who also appears in the prize dataset. In order to estimate the rate of the second, more problematic type of error, a random sample of 100 entries was taken from the final dataset and checked manually for accuracy. This sample contained no errors, though it did contain one match that could not be verified, since no secondary literature could be located for the author in question.

Finally, it is worth noting that hathitrust_prizewinners.csv does not distinguish between the types of prizes won by authors nor the point in their careers that those authors won those prizes. For each author in the prize dataset, it simply lists every HathiTrust volume authored by that author that could be located.

At a later stage, persistent identifiers for authors, such as VIAF, LCCN, and Wikidata identifiers, were added by Matt Miller computationally. He also added book information from OCLC—a global library organization that contains information from more than 16,000 member libraries in more than 100 countries—such as how many copies of each edition are held by these libraries (oclc_holdings or oclc_eholdings). This information was accessed through the OCLC Classify API, which was shut down in January 2024.

Description

The columns in the dataset include:

  • hathi_id: HathiTrust item identifier number

  • hathi_rights - the rights code from Hathi Trust

All HathiTrust Rights Codes

HathiTrust Rights Codes (Source)

id name type dscr
1 pd copyright public domain
2 ic copyright in-copyright
3 op copyright out-of-print (implies in-copyright)
4 orph copyright copyright-orphaned (implies copyright)
5 und copyright undetermined copyright status
6 umall access available to UM affiliates and walk-in patrons (all campuses)
7 ic-world access in-copyright and permitted as world viewable by the copyright holder
8 nobody access available to nobody; blocked for all users
9 pdus copyright public domain only when viewed in the US
10 cc-by-3.0 copyright Creative Commons Attribution license, 3.0 Unported
11 cc-by-nd-3.0 copyright Creative Commons Attribution-NoDerivatives license, 3.0 Unported
12 cc-by-nc-nd-3.0 copyright Creative Commons Attribution-NonCommercial-NoDerivatives license, 3.0 Unported
13 cc-by-nc-3.0 copyright Creative Commons Attribution-NonCommercial license, 3.0 Unported
14 cc-by-nc-sa-3.0 copyright Creative Commons Attribution-NonCommercial-ShareAlike license, 3.0 Unported
15 cc-by-sa-3.0 copyright Creative Commons Attribution-ShareAlike license, 3.0 Unported
16 orphcand copyright orphan candidate - in 90-day holding period (implies in-copyright)
17 cc-zero copyright Creative Commons Zero license (implies pd)
18 und-world access undetermined copyright status and permitted as world viewable by the depositor
19 icus copyright in copyright in the US
20 cc-by-4.0 copyright Creative Commons Attribution 4.0 International license
21 cc-by-nd-4.0 copyright Creative Commons Attribution-NoDerivatives 4.0 International license
22 cc-by-nc-nd-4.0 copyright Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license
23 cc-by-nc-4.0 copyright Creative Commons Attribution-NonCommercial 4.0 International license
24 cc-by-nc-sa-4.0 copyright Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license
25 cc-by-sa-4.0 copyright Creative Commons Attribution-ShareAlike 4.0 International license
26 pd-pvt access public domain but access limited due to privacy concerns
27 supp access suppressed from view; see note for details
  • shorttitle: the short title of the work as listed in HathiTrust

  • prize: name of prize; either NBA (National Book Award) or pulitzer

  • author: name of the author of the award-winning work

  • person_id: unique numeric identifier for each name; assigned alphabetically by first name

  • inferreddate: earliest publication date for this particular volume

  • imprintdate: the date of this edition of the text

  • oclc: a unique identifier for this volume as registered in WorldCat

  • full_name: pen names were used; in case of name change, most recent name was used

  • given_name: first name; includes middle name, if used

  • last_name: last name

  • gender: provisionally labeled by research team based on pronouns used by author in biographical notes at the time research was completed; it is possible a judge/winner’s gender identity and/or pronoun may have changed subsequently; intended for study of broad patterns over time and not as definitive statements on any individual identity; values are “male,” “female,” “nonbinary/he,” “nonbinary/they,” “unknown,” and “No Winner”; nonbinary was used only when the term appeared in the individuals’ biography.

This dataset also includes various persistent identifiers and information from OCLC:

  • author_lccn – Author’s LCCN from id.loc.gov
  • author_viaf – Author viaf.org cluster number
  • author_wikidata – Author’s Wikdiata Q number
  • oclc_eholdings– from OCLC Classify – the electronic holdings count
  • oclc_holdings – from OCLC Classify – the total holdings count
  • oclc_owi – from OCLC Classify – the Classify work identifier

Citation

BibTeX citation:
@article{grossman2022,
  author = {Grossman, Claire and Spahr, Juliana and Young, Stephanie},
  editor = {Sinykin, Dan and Walsh, Melanie},
  title = {The {Index} of {Major} {Literary} {Prizes} in the {US}},
  journal = {Post45 Data Collective},
  date = {2022-12-05},
  url = {data.post45.org/the-index-of-major-literary-prizes-in-the-us/},
  doi = {10.18737/CNJV1733p4520221212},
  langid = {en},
  abstract = {The Index of Major Literary Prizes in the US includes
    datasets to the winners and judges of prizes for prose, poetry, or
    unspecified genre between 1918 and 2020 with a purse of \$10,000 and
    over.}
}
For attribution, please cite this work as:
Grossman, Claire, Juliana Spahr, and Stephanie Young. 2022. “The Index of Major Literary Prizes in the US.” Edited by Dan Sinykin and Melanie Walsh. Post45 Data Collective, December. https://doi.org/10.18737/CNJV1733p4520221212.
 

Supported by National Endowment for the Humanities Emory Center for Digital Scholarship Built with Quarto