SUBMISSIONS


The Post45 Data Collective peer reviews and houses literary and cultural data from 1945 to the present on an open-access website designed, hosted, and maintained by Emory University’s Center for Digital Scholarship.

Each dataset must be accompanied by a contextual statement or “data essay” that addresses (in 4,000 words or fewer) the significance of the data; the social and historical context of the data; how the data was collected, cleaned, and organized; and possible ethical concerns or misuses of the data. Data essays are published alongside datasets on the Post45 Data Collective website.

If you’re interested in submitting a dataset to the Post45 Data Collective, please prepare a data essay and format your data according to the guidelines below. When you’re ready, email the dataset and data essay (as a Word document) to Dan Sinykin (daniel.sinykin@emory.edu) and Melanie Walsh (melwalsh@uw.edu).

Peer Review & Data Essay Criteria

Data essays should include the following sections and answer most, if not all, of the following questions. These criteria are also used by reviewers during the peer review process.

Significance and Context 

  • How is the data significant for the field of literary studies and cultural studies, especially post-1945 scholarship?
  • What is the social and historical context of the data? What background information or domain knowledge is necessary to use this data responsibly? (Please note that Post45 Data Collective datasets are often used by people who do not specialize in post-1945 scholarship.)
  • Who might the data be useful for? What could the data be used for? Please suggest at least three specific uses.
  • For what purpose was the dataset created? Was there a gap that needed to be filled? Has the data been used already? Does similar or overlapping data exist publicly? If so, please describe.

Collection + Creation

  • How and when was the data collected, acquired, or created? What mechanisms or procedures were used to curate it (e.g. human curation, software, API)?
  • If the data was hand-curated, what organizational heuristic was adopted, and why?
  • What aspects of the data are products of the researcher’s judgment or interpretation? What judgments, interpretations, or decisions have been inherited by the researcher(s), if any? What are the implications of these interpretations or decisions?
  • Who was involved in the data collection process (e.g. students, crowdworkers, contractors), and how were they compensated?
  • Was any cleaning of the data undertaken (e.g. removal of instances, processing of missing values)?

 

Provide enough detail that readers can understand how the dataset was created and could, within reason, recreate it.

 

Description

  • What does the data describe? Are all instances included or a selection? If selected, what principles were used to justify inclusions and exclusions?
  • If your dataset uses categorical variables or other labels or fields that you have created, explain how they were constructed. Should the user be aware of any categories or fields that condense or erase information?
  • Is any information missing? If so, please provide a description, explaining why this information is missing (e.g. because it was unavailable). Are there any errors, sources of noise, or redundancies? If so, please describe.
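
One way to ground the account of missing information is to tally empty or placeholder cells per column before writing this section. The following is a minimal sketch in Python, not a required procedure: the function name missing_value_report, the column names, the sample rows, and the list of placeholder markers are all assumptions made for illustration.

```python
import csv
import io


def missing_value_report(csv_text, missing_markers=("", "NA", "N/A", "unknown")):
    """Count, for each column, the rows whose value is empty or a placeholder.

    The default markers are illustrative assumptions; adjust them to match
    how gaps are actually recorded in your dataset.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in counts:
            # A short row yields None for its absent fields; treat that as empty.
            value = (row.get(name) or "").strip()
            if value in missing_markers:
                counts[name] += 1
    return total_rows, counts


# Hypothetical sample data, invented for this example.
sample = """title,author,isbn
Example Title A,Example Author,9780306406157
Example Title B,,NA
"""

total, counts = missing_value_report(sample)
print(total, counts)  # 2 {'title': 0, 'author': 1, 'isbn': 1}
```

Counts like these only show where gaps are. The data essay still needs to explain why the information is missing.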

Ethical Considerations

  • What possible negative impacts or harms might result from the publication of your data? How could the data be misused?
  • Does the dataset contain information that might be considered confidential (e.g. data that includes the content of individuals’ non-public communications)? If so, please describe.
  • Does the dataset contain information that might be considered sensitive (e.g. data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)? If so, please describe.
  • Were any ethical review processes undertaken in the collection or curation of this dataset (e.g. the consultation of an institutional review board)? If so, please describe these review processes, including the outcomes, and provide a link to supporting documentation.

Versioning

  • Will the data be updated (e.g. to correct errors, add new instances, delete instances)? If so, please describe how often and by whom.

Bibliography

  • Please provide a list of sources that reference or draw on this data, or that were used or consulted to produce the data. If there are other sources that would help users understand this data, please include them as well.

Licensing

  • What is the license for this data? If applicable, the data must be deposited under an open license that permits unrestricted access (e.g. CC0, CC-BY).

The language for these criteria was drawn from Katherine Bode, Jennifer Doty, Lauren F. Klein, and Melanie Walsh; from the journals Cultural Analytics and Journal of Open Humanities Data; and from “Datasheets for Datasets” by Timnit Gebru et al.

Preparing + Formatting Your Data

The Post45 Data Collective aims to maximize the reusability and interoperability of our datasets. To that end, we have worked to include unique identifiers, such as OCLC numbers, ISBNs, HathiTrust IDs, and VIAF records, for data related to books and authors whenever possible.

When curating your own datasets, please use similar unique identifiers. If necessary, we may be able to help potential authors add unique identifiers to datasets. Please reach out to the editors to consult about this possibility.
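
Identifiers are only useful for linking datasets if they are well formed, so it can help to check them before submission. The sketch below, in Python, tests ISBNs against the standard ISBN-10 and ISBN-13 check-digit rules. It is an illustration under stated assumptions, not a tool used by the Post45 Data Collective. The function names are hypothetical, and a passing check digit shows only that a value is well formed, not that it identifies the right book.

```python
ASCII_DIGITS = set("0123456789")


def normalize_isbn(raw):
    """Remove hyphens and spaces, and upper-case a trailing 'x'."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn13(isbn):
    """ISBN-13: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10."""
    if len(isbn) != 13 or not set(isbn) <= ASCII_DIGITS:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn10(isbn):
    """ISBN-10: digits weighted 10 down to 1 must sum to a multiple of 11.

    The final character may be 'X', which stands for the value 10.
    """
    if len(isbn) != 10 or not set(isbn[:9]) <= ASCII_DIGITS:
        return False
    last = isbn[9]
    if last not in ASCII_DIGITS and last != "X":
        return False
    values = [int(c) for c in isbn[:9]] + [10 if last == "X" else int(last)]
    total = sum(v * (10 - i) for i, v in enumerate(values))
    return total % 11 == 0


def is_valid_isbn(raw):
    """Return True if raw is a well-formed ISBN-10 or ISBN-13."""
    isbn = normalize_isbn(raw)
    return is_valid_isbn13(isbn) or is_valid_isbn10(isbn)


candidates = [
    "978-0-306-40615-7",  # valid ISBN-13
    "978-0-306-40615-8",  # wrong check digit
    "0-306-40615-2",      # valid ISBN-10
    "not an isbn",        # malformed
]
for candidate in candidates:
    print(candidate, is_valid_isbn(candidate))
```

Rows that fail a check like this are good candidates for manual review, and any that cannot be resolved can be noted in the data essay as known errors.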

For further help, consider the following resources:

  • “Format your data” from the UK Data Service
  • “Sustainability of Digital Formats” from the Library of Congress

The Post45 Data Collective is supported and maintained by the Emory Center for Digital Scholarship.