Post45 Data Collective

The Post45 Data Collective peer reviews and houses post-1945 literary data on an open-access website designed, hosted, and maintained by Emory University’s Center for Digital Scholarship.

About the Peer Review Process

Submissions must adhere to the following criteria and must be accompanied by an 800-1000 word paper that addresses the following questions, as appropriate.

The language for these criteria was drawn from Katherine Bode, Jennifer Doty, Lauren F. Klein, Melanie Walsh, Cultural Analytics, Journal of Open Humanities Data, and “Datasheets for Datasets” by Timnit Gebru et al.

How is the data relevant to post-1945 scholarship? Who might it be useful for? What could it be used for? Please suggest at least three specific uses.

For what purpose was the dataset created? Was there a gap that needed to be filled? Has the data been used already? Does similar or overlapping data exist publicly? If so, please describe.

What does the data describe? Are all instances included or a selection? If selected, what principles were used to justify inclusions and exclusions?

If your dataset uses categorical variables or other labels or fields that you have created, explain how they were constructed. Should the user be aware of any categories or fields that condense or erase information?

Is any information missing? If so, please provide a description, explaining why this information is missing (e.g. because it was unavailable). Are there any errors, sources of noise, or redundancies? If so, please describe.

What is the file type and size of the data?

How was the data acquired or created? What mechanisms or procedures were used to collect it (e.g. hardware apparatus, human curation, software, API)?

If the data was hand-curated, what organizational heuristic was adopted, and why? What aspects of the data are products of the researcher’s judgment or interpretation, and which aspects were inherited? What are the implications of these decisions?

Who was involved in the data collection process (e.g. students, crowdworkers, contractors) and how were they compensated? Over what timeframe was the data collected?

Was any cleaning of the data done (e.g. removal of instances, processing of missing values)? Was the “raw” data saved in addition to the cleaned (e.g. to support unanticipated future uses)?

Provide sufficient detail so that readers understand how the dataset was created and could, within reason, recreate it.

What possible negative impacts or harms might result from the publication of your data?

Does the dataset contain data that might be considered confidential (e.g. data that includes the content of individuals’ non-public communications)? If so, please describe.

Does the dataset contain data that might be considered sensitive (e.g. data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)? If so, please describe.

Were any ethical review processes conducted (e.g. by an institutional review board)? If so, please describe these review processes, including the outcomes, as well as a link or other access point to supporting documentation.

The Collective aims to maximize interoperability. To that end, we have strict requirements for the format of submitted data if it can be merged with extant data. For example, data oriented around book titles must use columns that match those used by HathiTrust. If it is a new category of data, the Collective will work with submissions toward creating exemplary standards.
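As a rough illustration of the column-matching requirement, the sketch below checks whether a CSV header contains a required set of column names before submission. The names in `REQUIRED_COLUMNS` and the function `missing_columns` are hypothetical placeholders chosen for this example; they are not the actual HathiTrust schema, which should be confirmed with the editors.

```python
import csv
import io

# Hypothetical column names for illustration only; NOT the actual
# HathiTrust schema. Replace with the columns the Collective specifies.
REQUIRED_COLUMNS = ["title", "author", "imprint"]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    sample = "title,author,year\nExample Novel,Example Author,1950\n"
    print(missing_columns(sample))
```

An empty list means every required column is present. A check like this only compares header names; it does not verify that the values in each column follow the expected format.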

For further help, consider the following resources:

Will the data be updated (e.g. to correct errors, add new instances, delete instances)? If so, please describe how often and by whom.

Provide a list of sources consulted or drawn from to produce the dataset.

If applicable, the data must be deposited under an open license that permits unrestricted access (e.g. CC0, CC-BY).

Terms of Use

These terms have been derived from Dataverse Project’s recommendations for best practices in academic credit and data citation.

The Post45 Data Collective standardizes the citation of datasets to make it easier for researchers to publish their data and receive credit and recognition for their work. As an open-access framework and research data repository, the Post45 Data Collective is committed to helping researchers, journals, and organizations make humanities data accessible, reusable, and open (when possible), which includes implementing community-accepted standards for data publication.

The citation standard defined here offers proper recognition to authors as well as permanent identification through the use of global, persistent identifiers in place of URLs, which can change frequently.

By depositing data into the Post45 Data Collective, researchers make their datasets more discoverable to the scholarly community.

By increasing research data’s visibility with the Post45 Data Collective, researchers can get recognition and proper academic credit for their scholarly work through a data citation. These citations also help ensure that when research data is published, funder and publisher requirements are met, and data is reused by other scholars, replicated for verification, and tracked to measure usage and impact over time, which can help fund future research.

A data citation in the Post45 Data Collective has six components:

  • author name(s)
  • date published in the Post45 repository
  • title
  • global persistent identifier: DOI
  • repository name: Post45 Data Collective
  • version number

Example replication data citation from The Program Era Project, Kelly, White, and Glass, 2021: 

Kelly, Nicholas; White, Nicole; Glass, Loren, 03/01/2021, “The Program Era Project,” DOI:TBD, Post45 Data Collective, V1.
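The citation components can be assembled programmatically. The following is a minimal sketch, not an official tool of the Collective: the `DataCitation` class and its field names are hypothetical, and it assumes that authors are separated by semicolons and that the date is written as MM/DD/YYYY, both inferred from the example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataCitation:
    """Hypothetical container for the components of a data citation."""
    authors: list          # each entry written as "Family, Given"
    published: date        # date published in the repository
    title: str
    doi: str               # persistent identifier, without the "DOI:" prefix
    version: int
    repository: str = "Post45 Data Collective"

    def format(self):
        # Assumption: authors are joined with semicolons.
        names = "; ".join(self.authors)
        # Assumption: the date is rendered as MM/DD/YYYY.
        published = self.published.strftime("%m/%d/%Y")
        return (
            f"{names}, {published}, \u201c{self.title},\u201d "
            f"DOI:{self.doi}, {self.repository}, V{self.version}."
        )


if __name__ == "__main__":
    citation = DataCitation(
        authors=["Kelly, Nicholas", "White, Nicole", "Glass, Loren"],
        published=date(2021, 3, 1),
        title="The Program Era Project",
        doi="TBD",  # placeholder, as in the example citation
        version=1,
    )
    print(citation.format())
```

Keeping the components as separate fields means a new version of a dataset only requires changing `version` (and, if applicable, the date), while the rest of the citation stays stable.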

The Editors

Dan Sinykin, Assistant Professor of English, Emory University
Melanie Walsh, Assistant Teaching Professor, Information School at the University of Washington

The Editorial Board

Katherine Bode, Professor of Literary and Textual Studies, ANU
J.D. Connor, Associate Professor of Cinematic Arts, USC
Jennifer Doty, Research Data Librarian, Emory 
Lauren F. Klein, Associate Professor of English and Quantitative Theory and Methods, Emory
Laura B. McGrath, Assistant Professor of English and Digital Humanities, Temple University
Thomas Padilla, Director of Information Systems and Technology Strategy, Center for Research Libraries
Kenton Rambsy, Assistant Professor of African American Literature and Digital Humanities, UT-Arlington
Richard Jean So, Assistant Professor of English, McGill

The Project Team

Bailey Betik, Digital Publication Specialist, Emory Center for Digital Scholarship

Sara Palmer, Digital Text Specialist, Emory Center for Digital Scholarship

The Post45 Data Collective is supported and maintained by the Emory Center for Digital Scholarship.