The Index of Major Literary Prizes in the US

Authors: Claire Grossman, Juliana Spahr, and Stephanie Young

DOI: https://doi.org/10.18737/CNJV1733p4520221212

The Index of Major Literary Prizes in the US includes two related datasets.  

The first is a dataset of the winners and judges of prizes for prose, poetry, or unspecified genre between 1918 and 2020 with a purse of $10,000 and over. The second contains records for volumes in the HathiTrust Digital Library written by authors who won a prize in the prize winners dataset.

1. Major Literary Prize Winners and Judges (Dataset)

This dataset includes information about the winners and judges of literary prizes (for prose, poetry, or unspecified genres) between 1918 and 2020 with a purse of $10,000 and over. The dataset includes details about the winners of 52 unique prizes awarded by 22 institutions. For a subset of 39 prizes, it includes details about judges; not every prize has complete judge data. The dataset does not include prizes awarded specifically for children’s literature, nonfiction, drama, or translation.

Note: You can also explore and export this data via GitHub’s Flat Viewer, which includes some additional data visualizations and filtering options.

The details about winners and judges include information about their gender and education (whether and where they attended college, MFA programs, or other graduate programs). This information was collected by hand and is described in more detail below.

The dataset also includes persistent identifiers for authors, such as VIAF, LCCN, and Wikidata numbers.

Collection and Creation

The data about prizes and winner/judge demographics was collected by hand, mainly from institutional websites. Gender and higher education data for individuals was collected from author biographies, interviews, and other materials. Some information about judges not listed on websites was obtained through correspondence with institutions. Claire Grossman, Juliana Spahr, and Stephanie Young are the principal investigators, did the majority of the data gathering, and are responsible for any errors. They were assisted by Jennifer Chukwu, Clare Lilliston, Jordan Pruett, Esther Vinarov, and Betty He. Richard Jean So provided significant support for this project.

Gender information was provisionally labeled by the research team based on pronouns used by the author in biographical notes at the time research was completed. It is possible a judge/winner’s gender identity and/or pronoun may have changed subsequently. This information is intended to enable study of broad patterns over time and not as definitive statements on any individual identity. The possible gender values are “male,” “female,” “nonbinary/he,” “nonbinary/they,” “unknown,” and “No Winner”; nonbinary was used only when the term appeared in the individual’s biography.
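Because the gender labels are meant for studying broad patterns over time, a typical use is a tally by decade. The following is a minimal sketch of that tally, not the authors' own analysis. It assumes the column names documented for this dataset (role, gender, prize_year); the sample rows, names, and prize are invented for illustration, and the layout of the "No Winner" row is an assumption.

```python
import csv
import io
from collections import Counter, defaultdict

# Invented sample rows that follow the documented column names.
# The people, the prize, and the shape of the "No Winner" row are hypothetical.
SAMPLE_CSV = """person_id,full_name,gender,role,prize_name,prize_year
1,Example Author A,female,winner,Example Prize,1985
2,Example Author B,male,winner,Example Prize,1986
3,Example Author C,nonbinary/they,winner,Example Prize,2019
4,Example Judge D,female,judge,Example Prize,1985
,No Winner,No Winner,winner,Example Prize,1987
"""


def gender_counts_by_decade(rows, role="winner"):
    """Count gender labels per decade for one role, skipping 'No Winner' rows."""
    counts = defaultdict(Counter)
    for row in rows:
        if row["role"] != role or row["gender"] == "No Winner":
            continue
        decade = int(row["prize_year"]) // 10 * 10
        counts[decade][row["gender"]] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_decade = gender_counts_by_decade(rows)
for decade in sorted(by_decade):
    print(decade, dict(by_decade[decade]))
```

Rows labeled “unknown” are kept as their own category here rather than dropped, so that gaps in the biographical record stay visible in the counts.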

Higher education information was labeled by the research team based on whether the individual mentioned that they attended (even if they did not graduate from) an institution. Again, this information is intended to enable the study of broad patterns over time and is not meant to be definitive. The possible MFA degree values are the name of the institution, “No Winner,” or blank (in most cases, a blank means it is unlikely that the individual attended an MFA program, because higher education affiliations were listed in biographical notes but did not include an MFA, or because the team was unable to locate any educational information about the individual).

The possible “elite education” values are “Barnard College,” “Brown University,” “Columbia University,” “Cornell University,” “Dartmouth College,” “Harvard University,” “Princeton University,” “Radcliffe College,” “Stanford University,” “University of Pennsylvania,” “University of Chicago,” “Yale University,” “No Winner,” or blank. The possible “graduate degree” values (including masters, PhD, JD, and medical degrees) are “graduate,” “No Winner,” or blank.
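The education fields mix three kinds of values: a real value (an institution name or “graduate”), the placeholder “No Winner,” and a blank. A small helper can keep these apart before counting. This is a sketch under assumptions: the function names and the status labels are hypothetical, and the field names mfa_degree and stegner are taken from the column list in the Description section, where the Stegner Fellowship is treated as the equivalent of an MFA.

```python
def education_status(value):
    """Classify one education field as 'no_winner', 'none_recorded', or 'attended'."""
    text = (value or "").strip()
    if text == "No Winner":
        return "no_winner"
    if text == "":
        # Blank: no such affiliation was found, or no educational
        # information could be located at all.
        return "none_recorded"
    return "attended"


def has_mfa_equivalent(row):
    """True if the row names an MFA institution or a Stegner Fellowship."""
    mfa = education_status(row.get("mfa_degree")) == "attended"
    stegner = (row.get("stegner") or "").strip() == "Stegner"
    return mfa or stegner


# Invented example rows.
print(has_mfa_equivalent({"mfa_degree": "Example University", "stegner": ""}))
print(has_mfa_equivalent({"mfa_degree": "", "stegner": "Stegner"}))
print(has_mfa_equivalent({"mfa_degree": "No Winner", "stegner": "No Winner"}))
```

Note that a blank cannot distinguish "did not attend" from "no information found," so counts built on it are best read as lower bounds.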

At a later stage, persistent identifiers for winners and judges, such as VIAF, LCCN, and Wikidata identifiers, were added computationally by Matt Miller.

Please report any errors and/or corrections via this Google Form.

Description

The columns in the dataset include:

    • person_id: unique numeric identifier for each name; assigned alphabetically by first name
    • full_name: pen names were used; in case of name change, most recent name was used
    • given_name: first name; includes middle name, if used
    • last_name: last name
    • gender: provisionally labeled by research team based on pronouns used by the author in biographical notes at the time research was completed; it is possible a judge/winner’s gender identity and/or pronoun may have changed subsequently; intended for study of broad patterns over time and not as definitive statements on any individual identity; values are “male,” “female,” “nonbinary/he,” “nonbinary/they,” “unknown,” and “No Winner”; nonbinary was used only when the term appeared in the individual’s biography.
    • elite_institution: individual mentioned they attended (even if they did not graduate from) one of the listed institutions; values are “Barnard College,” “Brown University,” “Columbia University,” “Cornell University,” “Dartmouth College,” “Harvard University,” “Princeton University,” “Radcliffe College,” “Stanford University,” “University of Pennsylvania,” “University of Chicago,” “Yale University,” “No Winner,” or blank (means unlikely as individual listed higher education affiliations in biographical notes but did not include an elite institution or unable to locate any educational information about the individual); intended for study of broad patterns over time and not as definitive.
    • graduate_degree: individual mentioned they attended (even if they did not graduate from) a graduate program (includes masters, PhD, JD, and medical degrees); values are “graduate,” “No Winner,” or blank (means unlikely as individual listed higher education affiliations in biographical notes but did not include a graduate degree or unable to locate any educational information about the individual); intended for study of broad patterns over time but not as definitive.
    • mfa_degree: individual mentioned they attended (even if they did not graduate from) an MFA program; values are name of institution, “No Winner,” or blank (means unlikely as individual listed higher education affiliations in biographical notes but did not include an MFA or unable to locate any educational information about the individual); intended for study of broad patterns over time and not as definitive.
    • iowa_mfa_person_id: values are either a number that corresponds to the Post45 Iowa Writers’ Workshop “People” table, “missing” (means that the individual’s biographical materials suggest they attended Iowa for an MFA but a corresponding entry could not be found in the Iowa dataset which ends in 2014 and does not include graduates of the MFA in playwriting), “unknown” (unable to locate any educational information about the individual), “No Winner,” or blank (means that the individual did not list University of Iowa in their biographical notes or unable to locate any educational information about the individual)
    • stegner: individual mentioned they were awarded a Wallace Stegner Fellowship at Stanford; the Stegner program does not award degrees but it resembles an MFA program in pedagogy except it is not unusual for those admitted to already have an MFA; we thus treat it as the equivalent of an MFA (and not a prize); values are either “Stegner,” “No Winner,” or blank (means that the individual did not mention the Stegner Fellowship in their biographical notes or unable to locate any educational information about the individual)
    • role: values are “winner” or “judge”
    • prize_institution: nonprofit organization that oversees the prize
    • prize_name: name of prize; for the Gold Medal Awards from the American Academy of Arts and Letters, we only included awards categorized as fiction and poetry; for the Morton Dauwen Zabel Award from American Academy of Arts and Letters, we excluded periodic awards given specifically for “Criticism”; for the National Book Award, we only included prizes for poetry and fiction; for the Academy of American Poets, we only included the Academy of American Poets Fellowship, the Lenore Marshall Poetry Prize, and the Wallace Stevens Award; for the Poet Laureate Consultant in Poetry to the Library of Congress, we included the US Consultants in Poetry but did not include the three Special Bicentennial Consultants that served in an advisory role from 1999-2000 and excluded William Carlos Williams (who was named as Laureate, but did not serve); for the Pulitzer Prize, we only included prizes for fiction and poetry; for the MacArthur Fellowships, we included those who were categorized by the MacArthur website as “poetry” and most of those categorized as “fiction and nonfiction” (if a writer exclusively published journalistic nonfiction or essay, they were not included).
    • prize_year: year awarded; in the case of the Poet Laureate Consultant in Poetry to the Library of Congress, which begins in September and continues until May, we included entries for the Laureate under both years
    • prize_genre: values are “poetry,” “prose” (“prose” includes prizes for “short stories,” “essays,” “fiction,” and “novel”), and “no genre” (prize has no genre requirement, as in the MacArthur Fellowship or the Whiting Award)
    • prize_type: values are “career” (prize is awarded to author on basis of overall career) or “book” (prize is awarded to author for a specific book)
    • prize_amount: value here is the amount of money awarded in 2022; amounts change over time, which we do not track
    • title_of_winning_book: if “prize_type” is “book,” then the awarded book title is listed (if the jury awarded more than one book in same year, titles for both are listed); other values are “No Winner,” and blank (prize was not awarded for a specific book)
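Since some terms are entered under more than one prize_year (as noted for the Poet Laureate), counting rows is not the same as counting people. A minimal sketch of counting distinct winners per prize follows. It assumes the documented columns person_id, prize_name, and role; the rows and the prize name are invented for illustration.

```python
from collections import defaultdict

# Invented rows: each term appears under two consecutive years.
rows = [
    {"person_id": "10", "prize_name": "Example Laureateship", "prize_year": "2001", "role": "winner"},
    {"person_id": "10", "prize_name": "Example Laureateship", "prize_year": "2002", "role": "winner"},
    {"person_id": "11", "prize_name": "Example Laureateship", "prize_year": "2003", "role": "winner"},
    {"person_id": "11", "prize_name": "Example Laureateship", "prize_year": "2004", "role": "winner"},
    {"person_id": "12", "prize_name": "Example Laureateship", "prize_year": "2003", "role": "judge"},
]


def unique_winners_per_prize(rows):
    """Count distinct person_id values among winner rows for each prize."""
    seen = defaultdict(set)
    for row in rows:
        if row["role"] == "winner":
            seen[row["prize_name"]].add(row["person_id"])
    return {prize: len(ids) for prize, ids in seen.items()}


print(unique_winners_per_prize(rows))
```

This approach also collapses people who won the same prize more than once, and it does not treat “No Winner” rows specially; both would need handling depending on the question being asked.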

 

This dataset also includes various persistent identifiers:

    • author_lccn – Author’s LCCN from id.loc.gov
    • author_viaf – Author viaf.org cluster number
    • author_wikidata – Author’s Wikidata Q number

2. Prize-Winning Authors' Books (HathiTrust Metadata) Dataset

The second dataset contains records for volumes in the HathiTrust Digital Library written by authors who won a prize in the prize winners dataset. It includes many duplicate titles, as in the case of later editions of the same work.

The HathiTrust identifiers included here can be used to extract words per page for all included volumes (using tools such as the HTRC Feature Reader), thus enabling various kinds of computational text analysis on these works.
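Because later editions of the same work appear as separate rows, analyses that should count each work once may want to collapse editions first. One possible approach, sketched here as an assumption rather than a documented method, groups rows by person_id and oclc_owi (the OCLC Classify work identifier) and keeps the volume with the earliest inferreddate. The three rows are copied from this dataset's preview; the function name is hypothetical.

```python
# Rows copied from the dataset preview (James Agee, person_id 960).
rows = [
    {"person_id": "960", "shorttitle": "A death in the family",
     "inferreddate": "1956", "hathi_id": "inu.30000117261648", "oclc_owi": "533662"},
    {"person_id": "960", "shorttitle": "A death in the family : a restoration of the author's text",
     "inferreddate": "2007", "hathi_id": "mdp.39015073600093", "oclc_owi": "533662"},
    {"person_id": "960", "shorttitle": "The morning watch",
     "inferreddate": "1951", "hathi_id": "mdp.39015058019038", "oclc_owi": "1413478"},
]


def earliest_volume_per_work(rows):
    """Keep one row per (person_id, oclc_owi): the one with the earliest inferreddate."""
    best = {}
    for row in rows:
        key = (row["person_id"], row["oclc_owi"])
        year = int(row["inferreddate"])
        if key not in best or year < int(best[key]["inferreddate"]):
            best[key] = row
    return list(best.values())


for row in earliest_volume_per_work(rows):
    print(row["hathi_id"], row["inferreddate"], row["shorttitle"])
```

Grouping by work identifier also merges textually distinct versions, such as the restored text in the second row, so it suits counting works better than comparing editions. Rows with a missing oclc_owi or inferreddate would need separate handling.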

person_id,full_name,gender,shorttitle,inferreddate,hathi_id,oclc_holdings,oclc_eholdings,author,author_authorized_heading,author_lccn,author_viaf,author_wikidata_qid,given_name,hathi_rights,imprintdate,last_name,oclc,oclc_owi
1891,Reuben Bercovitch,unknown,Hasen : a novel,1978,mdp.39015030947983,395,11,"Bercovitch, Reuben","Bercovitch, Reuben",n85265603,11295941,,Reuben,ic,1978,Bercovitch,3396700,3855432226
2322,Walter Abish,male,In the future perfect,1977,uc1.b3464211,373,11,"Abish, Walter","Abish, Walter",n80102276,96215590,Q213806,Walter,ic,1977,Abish,3034528,470642
2322,Walter Abish,male,Eclipse fever,1993,mdp.39015029965731,465,11,"Abish, Walter","Abish, Walter",n80102276,96215590,Q213806,Walter,ic,1993,Abish,26805844,29236443
378,Chris Adrian,male,Gob's grief,2000,mdp.39015049989240,680,62,"Adrian, Chris","Adrian, Chris, 1970-",n00032432,23370866,Q1076930,Chris,ic,2000,Adrian,44541880,38606595
378,Chris Adrian,male,The children's hospital,2006,uc1.32106018461399,680,61,"Adrian, Chris","Adrian, Chris, 1970-",n00032432,23370866,Q1076930,Chris,ic,2006,Adrian,71260047,58223197
960,James Agee,male,A death in the family,1956,inu.30000117261648,"5,886",157,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1979,Agee,4755871,533662
960,James Agee,male,A death in the family : a restoration of the author's text,2007,mdp.39015073600093,"5,886",157,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,2007,Agee,86110082,533662
960,James Agee,male,"Let us now praise famous men ; : A death in the family, & shorter fiction",2005,mdp.39015062426500,"1,408",4,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,2005,Agee,58422789,1864205539
960,James Agee,male,The collected short prose of James Agee,1972,mdp.39015004995091,"1,233",28,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1972,Agee,1863828,556283
960,James Agee,male,A death in the family,1969,pst.000029710478,"5,886",157,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1969,Agee,313456,533662
960,James Agee,male,The collected short prose of James Agee,1968,mdp.39015008527270,"1,233",28,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1968,Agee,449409,556283
960,James Agee,male,A death in the family,1967,mdp.39015000695174,"5,886",157,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1967,Agee,249006,533662
960,James Agee,male,The morning watch,1951,mdp.39015058019038,945,16,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,und,1951,Agee,276629,1413478
960,James Agee,male,A death in the family,1957,pst.000028906216,"5,886",157,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1970,Agee,1932948,533662
960,James Agee,male,A death in the family,1957,mdp.39015004995885,"5,886",157,"Agee, James","Agee, James, 1909-1955",n79039544,46756190,Q352963,James,ic,1957,Agee,276631,533662
428,Conrad Aiken,male,Thee; a poem,1967,uc1.$b399274,362,11,"Aiken, Conrad","Aiken, Conrad, 1889-1973",n80060447,71433883,Q380645,Conrad,ic,1967,Aiken,780400,2063847
428,Conrad Aiken,male,The collected short stories of Conrad Aiken,1966,mdp.39015012310051,512,40,"Aiken, Conrad","Aiken, Conrad",n80060447,71433883,Q380645,Conrad,ic,1966,Aiken,3333703,196722594
428,Conrad Aiken,male,"3 novels : Blue voyage, Great circle, King Coffin",1965,uc1.b4097285,122,1,"Aiken, Conrad","Aiken, Conrad, 1889-1973",n80060447,71433883,Q380645,Conrad,ic,1965,Aiken,3714274,3943649328
428,Conrad Aiken,male,The collected novels of Conrad Aiken,1964,mdp.39015012825637,124,2,"Aiken, Conrad","Aiken, Conrad (Conrad Potter), 1889-1973",n80060447,71433883,Q380645,Conrad,ic,1964,Aiken,63150510,47347154
428,Conrad Aiken,male,The collected short stories of Conrad Aiken,1960,uc1.b4098076,933,12,"Aiken, Conrad","Aiken, Conrad, 1889-1973",n80060447,71433883,Q380645,Conrad,und|ic,1960,Aiken,15498682,2908636461
428,Conrad Aiken,male,Great circle,1985,mdp.39015007064101,348,30,"Aiken, Conrad","Aiken, Conrad, 1889-1973",n80060447,71433883,Q380645,Conrad,ic,1985,Aiken,11599198,2207467
428,Conrad Aiken,male,The short stories of Conrad Aiken,1950,uc1.b4098079,566,4,"Aiken, Conrad","Aiken, Conrad, 1889-1973",n80060447,71433883,Q380645,Conrad,und,1950,Aiken,181324,2908571720
428,Conrad Aiken,male,Collected short stories,1960,mdp.39015010832353,933,12,"Aiken, Conrad","Aiken, Conrad, 1889-1973",n80060447,71433883,Q380645,Conrad,und|ic,1960,Aiken,244000,2908636461
2100,Sherman Alexie,male,The toughest Indian in the world,2000,mdp.39015048835402,"2,132",360,"Alexie, Sherman","Alexie, Sherman 1966-",n91093126,46892548,Q727331,Sherman,ic,2000,Alexie,43227429,27966210
1715,Nelson Algren,male,The neon wilderness,1960,pst.000028010135,"1,353",18,"Algren, Nelson","Algren, Nelson, 1909-1981",n50033341,44294455,Q547914,Nelson,ic,1966,Algren,10818350,20647360

Note: You can also explore and export this data via GitHub’s Flat Viewer, which includes some additional data visualizations and filtering options.

Collection and Creation

This data was assembled by Jordan Pruett. Matches between prize-winning authors and authors in the HathiTrust Digital Library were produced by an exact string comparison of the last_name and given_name columns of the prize dataset with the author column of the HathiTrust dataset held by the Post45 Data Collective. Fields were forced to lowercase and stripped of punctuation and spacing before comparison. This conservative matching process is likely to produce two kinds of errors: missed matches, in the case of authors who appear under different names in the two spreadsheets; and false positive matches, in the rare case of two authors with identical first and last names. The first type of error was considered acceptable: the data aims to ensure that the matches it records are correct rather than to capture every possible match. Since matches were assessed restrictively, researchers can be confident that the vast majority of the entries in this dataset were in fact authored by somebody who also appears in the prize dataset. To estimate the rate of the second, more problematic type of error, a random sample of 100 entries was taken from the final dataset and checked manually for accuracy. This sample contained no errors, though it did contain one match that could not be verified, since no secondary literature could be located for the author in question.
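The matching step described above (lowercase, strip punctuation and spacing, then compare exactly) can be sketched as follows. This is an illustration, not the original matching script: the function names are hypothetical, and joining last_name before given_name is an assumption based on the "Last, First" form of the author column seen in the data preview.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace.
_DROP = str.maketrans("", "", string.punctuation + string.whitespace)


def normalize(text):
    """Lowercase the text and remove punctuation and spacing."""
    return text.lower().translate(_DROP)


def is_match(last_name, given_name, hathi_author):
    """Exact comparison of the normalized prize name with the normalized HathiTrust author."""
    return normalize(last_name + given_name) == normalize(hathi_author)


print(is_match("Agee", "James", "Agee, James"))
print(is_match("Agee", "James", "Agee, James, 1909-1955"))
print(is_match("Aiken", "Conrad", "Aiken, Conrad (Conrad Potter)"))
```

The second and third calls show why the process is conservative: any extra material in the HathiTrust author string, such as life dates or a fuller form of the name, prevents a match. That behavior produces the missed matches described above while keeping false positives rare.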

Finally, it is worth noting that hathitrust_prizewinners.csv does not distinguish between the types of prizes won by authors or the point in their careers at which those authors won those prizes. For each author in the prize dataset, it simply lists every HathiTrust volume by that author that could be located.

At a later stage, persistent identifiers for authors, such as VIAF, LCCN, and Wikidata identifiers, were added computationally by Matt Miller. He also added book information from OCLC (a global library organization that contains information from more than 16,000 member libraries in more than 100 countries), such as how many copies of each edition are held by these libraries (oclc_holdings or oclc_eholdings). This information was accessed through the OCLC Classify API, which was shut down in January 2024.

Description

The columns in the dataset include:

    • hathi_id: HathiTrust item identifier number
    • shorttitle: the short title of the work as listed in HathiTrust
    • prize: name of prize; either NBA (National Book Award) or pulitzer
    • author: name of the author of the award-winning work
    • person_id: unique numeric identifier for each name; assigned alphabetically by first name
    • inferreddate: earliest publication date for this particular volume
    • imprintdate: the date of this edition of the text
    • oclc: a unique identifier for this volume as registered in WorldCat
    • full_name: pen names were used; in case of name change, most recent name was used
    • given_name: first name; includes middle name, if used
    • last_name: last name
    • gender: provisionally labeled by research team based on pronouns used by the author in biographical notes at the time research was completed; it is possible a judge/winner’s gender identity and/or pronoun may have changed subsequently; intended for study of broad patterns over time and not as definitive statements on any individual identity; values are “male,” “female,” “nonbinary/he,” “nonbinary/they,” “unknown,” and “No Winner”; nonbinary was used only when the term appeared in the individual’s biography.

 

This dataset also includes various persistent identifiers and information from OCLC:

    • author_lccn – Author’s LCCN from id.loc.gov
    • author_viaf – Author viaf.org cluster number
    • author_wikidata – Author’s Wikidata Q number
    • oclc_eholdings – from OCLC Classify – the electronic holdings count
    • oclc_holdings – from OCLC Classify – the total holdings count
    • oclc_owi – from OCLC Classify – the Classify work identifier
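In the data preview, some holdings counts are displayed with thousands separators (for example "5,886"). If the exported file stores them the same way, they need to be parsed before any numeric comparison. The sketch below assumes this; the helper name parse_count is hypothetical, and the values are copied from the preview rows.

```python
def parse_count(value):
    """Turn a holdings string such as '5,886' into an int; a blank becomes None."""
    text = (value or "").replace(",", "").strip()
    return int(text) if text else None


# Values copied from the preview rows of this dataset.
rows = [
    {"shorttitle": "A death in the family", "oclc_holdings": "5,886", "oclc_eholdings": "157"},
    {"shorttitle": "The morning watch", "oclc_holdings": "945", "oclc_eholdings": "16"},
    {"shorttitle": "Thee; a poem", "oclc_holdings": "362", "oclc_eholdings": "11"},
]

for row in rows:
    row["holdings"] = parse_count(row["oclc_holdings"])
    row["eholdings"] = parse_count(row["oclc_eholdings"])

# Find the most widely held title in the sample.
most_held = max(rows, key=lambda r: r["holdings"])
print(most_held["shorttitle"], most_held["holdings"])
```

Holdings counts repeat across rows that share a work, so sums over rows are likely to overcount unless editions are collapsed first.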

The Post45 Data Collective is supported and maintained by the Emory Center for Digital Scholarship.