Recapping the Data Development Project, 2021

In the summer of 2021, the Post45 Data Collective awarded its inaugural round of Data Development Grants. These grants were designed to help researchers in the late stages of data collection, where projects tend to stall, complete their data curation and cleaning and prepare their data for publication with the Data Collective.

We were fortunate to have an excellent partner supporting these efforts. In collaboration with the University of Chicago’s Graduate Global Impact Program, we worked with Jordan Pruett, a doctoral candidate in English, to provide DDP grant recipients with up to 60 hours of research assistance each. Jordan brought his expertise as a student of computational text analysis and post-45 American literature, providing considerable support and insight.

In all, we supported four projects addressing a range of subjects, from the Literature Program of the National Endowment for the Arts (NEA) to a corpus of film scripts. Together, these projects represent the exciting possibilities for computational approaches to post-45 US literature and culture.

—

The Wars in Vietnam, Iraq, and Afghanistan in US Fiction, 1965-2020

David F. Eisler

This dataset collects titles of American novels and short story collections published between 1965 and 2020 about the conflicts in Vietnam, Iraq, and Afghanistan, bringing together multiple bibliographic resources to build a comprehensive picture of more than half a century of the US war fiction genre. We began by merging and cleaning two established sources on fiction about the Vietnam War: John Newman’s Vietnam War Literature: An Annotated Bibliography of Imaginative Works about Americans Fighting in Vietnam and La Salle University’s special archival collection, “Imaginative Representations of the Vietnam War.” We then used the WorldCat database to extract a set of imaginative works about the wars in Iraq and Afghanistan, which we combined with the Vietnam data to form the final dataset. We will complete the dataset by adding demographic data on each author’s gender and whether they were a military veteran.
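As a rough illustration of the merging and cleaning step described above, the sketch below combines two bibliographic lists and collapses duplicate entries by normalized author and title. It is not the project's actual workflow: the record fields (author, title, year), the source labels, and the sample entries are all hypothetical placeholders, and a real pipeline would likely need fuzzier matching and manual review.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def merge_bibliographies(*sources):
    """Merge (source_name, records) pairs, keeping one entry per work.

    Two records count as the same work when their normalized author and
    title match. Each merged entry remembers which sources listed it.
    """
    merged = {}
    for source_name, records in sources:
        for record in records:
            key = (normalize(record["author"]), normalize(record["title"]))
            entry = merged.setdefault(key, {**record, "sources": []})
            if source_name not in entry["sources"]:
                entry["sources"].append(source_name)
    return list(merged.values())


# Hypothetical placeholder records, not entries from the real bibliographies.
source_a = [
    {"author": "Doe, Jane", "title": "The Example Novel", "year": 1971},
    {"author": "Roe, Richard", "title": "Another Example", "year": 1985},
]
source_b = [
    {"author": "Doe, Jane", "title": "The Example Novel.", "year": 1971},
    {"author": "Poe, Pat", "title": "Third Example", "year": 1990},
]

merged = merge_bibliographies(("source_a", source_a), ("source_b", source_b))
for entry in merged:
    print(entry["author"], "|", entry["title"], "|", entry["sources"])
```

Keeping a list of sources on each merged entry makes it easy to see which works appear in only one bibliography and may deserve a second look.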

New York Times Nonfiction Bestsellers by Black Authors

Ariel Lawrence

To examine the prevalence, profitability, and popularity of nonfiction texts written by Black authors, we collected weekly bestseller lists published by the New York Times from 2010 to 2020. We researched each author and identified those who either self-identified as Black or of African descent or were identified as such by reputable sources. Out of 1,657 titles, we found 117 written by at least one Black author, a rate of about 7%. In addition to title and author, this dataset includes categories for genre (essays, celebrity or general memoir, antiracist pedagogy, manifesto, social analysis, etc.), author occupation, publisher, year published, book description, top rank, and number of weeks each book remained on the list. With this dataset, we aim to better understand the past and future of Black nonfiction.
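For readers who want to check the arithmetic behind the "about 7%" figure, a minimal calculation using the two counts reported above looks like this. The variable names are ours, chosen for illustration.

```python
# Counts reported in the project description above.
total_titles = 1657
titles_by_black_authors = 117

# Share of bestselling titles with at least one Black author.
share = titles_by_black_authors / total_titles

print(f"{share:.1%}")  # prints "7.1%"
```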

The Black List Film Script Corpus

Johnny Ma

The Black List Film Script Corpus (TBL Corpus) is a collection of roughly 1,000 movie scripts that were featured on The Black List. Founded in 2006, The Black List is an annual collection of promising screenplays compiled from the votes of hundreds of film executives. Our TBL Corpus contains nearly all TBL scripts in .txt format, with line indentation parsed from the original PDF submissions. We include author and script metadata where available.

This corpus builds upon other screenplay corpora, such as ScriptBase, that aim to understand the creative output of screenwriters. The TBL Corpus differs from previous datasets of this kind in that it contains unproduced screenplays, giving us a rare look at screenplays at the pre-greenlight stage of film production. The corpus also covers scripts written between 2006 and 2020, making it the most current dataset of movie scripts available. We plan to update the dataset with each annual iteration of The Black List.

The Literature Program of the National Endowment for the Arts

Alexander Manshel

This project will create a comprehensive database of the writers funded by the National Endowment for the Arts (NEA) under the auspices of its literature fellowship program. Since its inception in 1965, the NEA has funded more than 3,600 poets and fiction writers. Over that half century, the Endowment’s selection process and funding priorities have shifted repeatedly in response to government oversight, the composition of its advisory boards, and evolving notions of literary prestige. This database will allow scholars and students to investigate how these changes have influenced which writers are sponsored by the NEA. In its first iteration, the database will include essential information about each writer, including year of funding, demographic data, and information on their undergraduate and graduate education. As the project develops, it may also include information about the specific literary works funded by NEA grants. In all, this project aims to provide a useful resource for scholars of twentieth- and twenty-first-century American literature, multi-ethnic American literature, and literary sociology.

The Post45 Data Collective is supported and maintained by the Emory Center for Digital Scholarship.