Submissions

The Post45 Data Collective peer reviews and houses literary and cultural data from 1945 to the present.

Data Tables: All datasets on the Post45 Data Collective website are displayed as interactive data tables, which enable users to filter, explore, and download them. The Post45 Data Collective editorial team creates the data tables. Contributors need only submit one or more spreadsheet files, such as Excel or CSV files.

Data Essays: All datasets are accompanied by a contextual statement or “data essay” that addresses (in 4,000 words or fewer) the significance of the data; the social and historical context of the data; how the data was collected, cleaned, and organized; and possible ethical concerns or misuses of the data. Contributors prepare this essay.

Visualizations: We sometimes feature visualizations alongside the data tables, such as those featured in “The Canon of Asian American Literature.” We use Quarto to host our website, and we can display visualizations created with Python, R, and JavaScript Observable notebooks. If you’d like to prepare your own visualization or discuss the creation of one, please get in touch.

Interested in Submitting?

If you’re interested in submitting to the Post45 Data Collective, please consult our “data formatting” guidelines below, and prepare a data essay with this Google Doc/Word template.

We would appreciate it if you shared your data essay as a Google Doc or a Word document. If you’re familiar with Quarto, you can also submit with our .qmd template.

When you’re ready, please email your dataset and data essay to Melanie Walsh (melwalsh@uw.edu), Alexander Manshel (alexander.manshel@mcgill.ca), and J.D. Porter (porterjd@sas.upenn.edu).

Please note that no technical expertise is required to submit to the Post45 Data Collective! We know that many researchers have created valuable datasets manually, and we are eager to work with you to publish this data. We may be able to offer some technical support to help clean, manipulate, or augment data. Feel free to reach out if you’d like to discuss this possibility.

Data Essay Criteria

Data essays (4,000 words or fewer) should include the following sections and answer most, if not all, of the following questions.

Significance and Context 

  • How is the data significant for the fields of literary and cultural studies, especially post-1945 scholarship?
  • What is the social and historical context of the data? What background information or domain knowledge is necessary to use this data responsibly? (Please note that Post45 Data Collective datasets are often used by people who do not specialize in post-1945 scholarship.)
  • Who might the data be useful for? What could the data be used for? Please suggest at least three specific uses.
  • For what purpose was the dataset created? Was there a gap that needed to be filled? Has the data been used already? Does similar or overlapping data exist publicly? If so, please describe.

Collection + Creation

  • How and when was the data collected, acquired, or created? What mechanisms or procedures were used to curate it (e.g. human curation, software, API)?
  • If the data was hand-curated, what organizational heuristic was adopted, and why?
  • What aspects of the data are products of the researcher’s judgment or interpretation? What judgments, interpretations, or decisions have been inherited by the researcher(s), if any? What are the implications of these interpretations or decisions?
  • Who was involved in the data collection process (e.g. students, crowdworkers, contractors), and how were they compensated?
  • Was any cleaning of the data undertaken (e.g. removal of instances, processing of missing values)?

Provide enough detail that readers understand how the dataset was created and could, within reason, recreate it.

Description

  • What does the data describe? Are all instances included or a selection? If selected, what principles were used to justify inclusions and exclusions?
  • If your dataset uses categorical variables or other labels or fields that you have created, explain how they were constructed. Should the user be aware of any categories or fields that condense or erase information?
  • Is any information missing? If so, please provide a description, explaining why this information is missing (e.g. because it was unavailable). Are there any errors, sources of noise, or redundancies? If so, please describe.
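When describing missing information, it can help to count the empty cells in each column first. The following is a minimal sketch in Python, assuming a CSV file with a header row. The function name `count_missing`, the sample rows, and the set of markers treated as missing are illustrative assumptions, not part of the Post45 Data Collective guidelines.

```python
import csv
import io

# Hypothetical markers treated as "missing"; adjust to your own conventions.
MISSING_MARKERS = {"", "na", "n/a", "null"}


def count_missing(csv_text):
    """Return a dict mapping each column name to its number of missing cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Short rows yield None for absent cells; treat those as missing too.
            value = (row.get(name) or "").strip().lower()
            if value in MISSING_MARKERS:
                counts[name] += 1
    return counts


# Illustrative sample data (the second row is invented for the example).
sample = "title,author,year\nBeloved,Toni Morrison,1987\nExample Title,,NA\n"
missing_counts = count_missing(sample)
print(missing_counts)
```

To run this on your own file, read it as text first, for example `count_missing(open("my_data.csv", encoding="utf-8").read())`, where `my_data.csv` stands in for your spreadsheet exported as CSV.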

Ethical Considerations

  • What possible negative impacts or harms might result from the publication of your data? How could the data be misused?
  • Does the dataset contain information that might be considered confidential (e.g. data that includes the content of individuals’ non-public communications)? If so, please describe.
  • Does the dataset contain information that might be considered sensitive (e.g. data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)? If so, please describe.
  • Were any ethical review processes undertaken in the collection or curation of this dataset (e.g. the consultation of an institutional review board)? If so, please describe these review processes, including the outcomes, and provide a link to supporting documentation.

Versioning

  • Will the data be updated (e.g. to correct errors, add new instances, delete instances)? If so, please describe how often and by whom.

Bibliography

  • Please provide a list of sources that reference or draw on this data, or that were used or consulted to produce the data. If there are other sources that would help users understand this data, please include them as well.

Licensing

  • What is the license for this data? If applicable, the data must be deposited under an open license that permits unrestricted access (e.g. CC0, CC-BY).

Dataset Criteria: Preparing + Formatting

The Post45 Data Collective aims to maximize the reusability and interoperability of our datasets. To that end, we have worked to include unique identifiers, such as OCLC numbers, ISBNs, HathiTrust IDs, and VIAF records, for data related to books and authors whenever possible.

When curating your own datasets, we ask that you use similar unique identifiers.
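If your spreadsheet already includes ISBNs, a quick check-digit test can catch typos before you submit. The following is a minimal sketch in Python using the standard ISBN-10 and ISBN-13 checksum rules. The column name `isbn`, the function names, and the sample rows are assumptions made for illustration. This is not how BookReconciler works, and a passing checksum only shows that an ISBN is well formed, not that it matches the right book.

```python
import csv
import io


def is_valid_isbn(raw):
    """Check the ISBN-10 or ISBN-13 check digit, ignoring hyphens and spaces."""
    chars = raw.replace("-", "").replace(" ", "").upper()
    if not chars.isascii():
        return False
    if len(chars) == 13 and chars.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the total must be divisible by 10.
        total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(chars))
        return total % 10 == 0
    if len(chars) == 10 and chars[:9].isdigit() and (chars[9].isdigit() or chars[9] == "X"):
        # ISBN-10: weights run 10 down to 1; a final "X" stands for 10.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(chars))
        return total % 11 == 0
    return False


def rows_with_bad_isbn(csv_text, column="isbn"):
    """Return (line_number, value) pairs for non-empty ISBN cells that fail the check.

    Line numbers assume a header on line 1 and no multi-line cells.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad = []
    for line_no, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        if value and not is_valid_isbn(value):
            bad.append((line_no, value))
    return bad


# Illustrative sample: row B has a wrong final digit, row C has no ISBN.
sample = "title,isbn\nA,9780306406157\nB,9780306406158\nC,\nD,0-306-40615-2\n"
bad_rows = rows_with_bad_isbn(sample)
print(bad_rows)
```

Empty cells are skipped rather than flagged, since a missing identifier is a different problem from a mistyped one.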

To aid in this process, we’ve been developing a new tool called BookReconciler, which can help automatically add persistent identifiers to spreadsheets with book title and author information. You can learn more about installing and using BookReconciler on our GitHub page. We also published a tutorial on YouTube.

While our goal is to make BookReconciler as accessible as possible, we understand that the barrier to entry is still high. If you’re not able to add persistent identifiers to your dataset, we may be able to collaborate with you to make this possible.

For further help formatting your datasets, consider the following resources:

The language for these criteria was drawn from Katherine Bode, Jennifer Doty, Lauren F. Klein, Melanie Walsh, Cultural Analytics, Journal of Open Humanities Data, and “Datasheets for Datasets” by Timnit Gebru et al.