Post45 Data Collective
The Post45 Data Collective peer reviews and houses post-1945 literary data on an open-access website designed, hosted, and maintained by Emory University’s Center for Digital Scholarship.
About the Peer Review Process
Submissions must adhere to the following criteria and must be accompanied by an 800-1000 word paper that addresses the following questions, as appropriate.
The language for these criteria was drawn from Katherine Bode, Jennifer Doty, Lauren F. Klein, Melanie Walsh, Cultural Analytics, Journal of Open Humanities Data, and “Datasheets for Datasets” by Timnit Gebru et al.
Relevance and Reuse
How is the data relevant to post-1945 scholarship? Who might it be useful for? What could it be used for? Please suggest at least three specific uses.
For what purpose was the dataset created? Was there a gap that needed to be filled? Has the data been used already? Does similar or overlapping data exist publicly? If so, please describe.
Description
What does the data describe? Are all instances included or a selection? If selected, what principles were used to justify inclusions and exclusions?
If your dataset uses categorical variables or other labels or fields that you have created, explain how they were constructed. Should the user be aware of any categories or fields that condense or erase information?
Is any information missing? If so, please provide a description, explaining why this information is missing (e.g. because it was unavailable). Are there any errors, sources of noise, or redundancies? If so, please describe.
What is the file type and size of the data?
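The questions above about missing information and file size can be answered partly by inspection. The following is a minimal sketch, assuming the dataset is a single UTF-8 CSV file with a header row; the function name summarize_csv is hypothetical and is not part of any Collective tooling. It reports the file size, the number of rows, and the number of empty cells in each column.

```python
import csv
import os
from collections import Counter


def summarize_csv(path):
    """Return file size in bytes, row count, and empty-cell counts per column."""
    missing = Counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        for row in reader:
            rows += 1
            for column in columns:
                value = row.get(column)
                # Short rows yield None; blank cells yield empty strings.
                if value is None or value.strip() == "":
                    missing[column] += 1
    return {
        "size_bytes": os.path.getsize(path),
        "rows": rows,
        "missing": {column: missing[column] for column in columns},
    }
```

A count of empty cells only shows where values are absent. Explaining why they are absent still belongs in the accompanying paper.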
Collection and Creation
How was the data acquired or created? What mechanisms or procedures were used to collect it (e.g. hardware apparatus, human curation, software, API)?
If the data was hand-curated, what organizational heuristic was adopted, and why? What aspects of the data are products of the researcher’s judgment or interpretation, and which aspects were inherited? What are the implications of these decisions?
Who was involved in the data collection process (e.g. students, crowdworkers, contractors) and how were they compensated? Over what timeframe was the data collected?
Was any cleaning of the data done (e.g. removal of instances, processing of missing values)? Was the “raw” data saved in addition to the cleaned (e.g. to support unanticipated future uses)?
Provide enough detail that readers can understand how the dataset was created and, within reason, recreate it.
Ethics
What possible negative impacts or harms might result from the publication of your data?
Does the dataset contain data that might be considered confidential (e.g. data that includes the content of individuals’ non-public communications)? If so, please describe.
Does the dataset contain data that might be considered sensitive (e.g. data that reveals racial or ethnic origins, sexual orientations, religious beliefs, political opinions or union memberships, or locations; financial or health data; biometric or genetic data; forms of government identification, such as social security numbers; criminal history)? If so, please describe.
Were any ethical review processes conducted (e.g. by an institutional review board)? If so, please describe these review processes, including the outcomes, as well as a link or other access point to supporting documentation.
Format
The Collective aims to maximize interoperability. To that end, we have strict requirements for the format of submitted data if it can be merged with extant data. For example, data oriented around book titles must use columns that match those used by HathiTrust. If it is a new category of data, the Collective will work with submitters toward creating exemplary standards.
For further help, consider the following resources:
- “Format your data,” from the UK Data Service
- “Sustainability of Digital Formats,” from the Library of Congress
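A column-matching requirement of the kind described in this section can be checked before submission. The sketch below assumes a CSV file with a header row. The column names in REQUIRED_COLUMNS are hypothetical placeholders, not the actual HathiTrust or Collective column names, and check_columns is an illustrative helper rather than an official tool.

```python
import csv

# Hypothetical column names for illustration only; the actual required
# columns are whatever the Collective specifies for a given data category.
REQUIRED_COLUMNS = ["title", "author", "pub_date"]


def check_columns(path, required=REQUIRED_COLUMNS):
    """Compare a CSV header row against a list of required column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return {
        "missing": [name for name in required if name not in header],
        "extra": [name for name in header if name not in required],
    }
```

An empty "missing" list means every required column is present. Columns reported as "extra" are not necessarily a problem, but they may need documentation in the accompanying paper.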
Versioning
Will the data be updated (e.g. to correct errors, add new instances, delete instances)? If so, please describe how often and by whom.
Bibliography
Provide a list of sources consulted or drawn from to produce the dataset.
Licensing
If applicable, the data must be deposited under an open license that permits unrestricted access (e.g. CC0, CC-BY).
Terms of Use
These terms have been derived from Dataverse Project’s recommendations for best practices in academic credit and data citation.
Data Citation
The Post45 Data Collective standardizes the citation of datasets to make it easier for researchers to publish their data and to receive credit and recognition for their work. As an open-access framework and research data repository, the Post45 Data Collective is committed to helping researchers, journals, and organizations make humanities data accessible, reusable, and open (when possible), which includes implementing community-accepted standards for data publication.
The citation standard defined here offers proper recognition to authors as well as permanent identification through the use of global, persistent identifiers in place of URLs, which can change frequently.
Academic Credit
By depositing data into the Post45 Data Collective, researchers make their datasets more discoverable to the scholarly community.
By increasing the visibility of their research data through the Post45 Data Collective, researchers can receive recognition and proper academic credit for their scholarly work through a data citation. These citations also help ensure that, when research data is published, funder and publisher requirements are met and that the data can be reused by other scholars, replicated for verification, and tracked to measure usage and impact over time, which can help fund future research.
A data citation in the Post45 Data Collective has six components:
- author name(s)
- date published in the Post45 repository
- title
- global persistent identifier: DOI
- Post45 Data Collective
- version number
Example replication data citation from The Program Era Project, Kelly, White, and Glass, 2021:
Kelly, Nicholas; White, Nicole; Glass, Loren, 03/01/2021, “The Program Era Project,” DOI:TBD, Post45 Data Collective, V1.
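The citation pattern above can be assembled from its listed components. The following is a minimal sketch, not the Collective's own tooling: the DataCitation class and its field names are hypothetical, authors are assumed to be separated by semicolons, and the date is assumed to be written month/day/year.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataCitation:
    authors: list[str]  # each name in "Family, Given" form
    published: date     # date published in the repository
    title: str
    identifier: str     # global persistent identifier, e.g. a DOI
    version: str
    repository: str = "Post45 Data Collective"

    def format(self) -> str:
        """Join the components in the order used by the example citation."""
        names = "; ".join(self.authors)
        # Assumption: the example date is month/day/year.
        published = self.published.strftime("%m/%d/%Y")
        return (
            f"{names}, {published}, “{self.title},” "
            f"{self.identifier}, {self.repository}, {self.version}."
        )


citation = DataCitation(
    authors=["Kelly, Nicholas", "White, Nicole", "Glass, Loren"],
    published=date(2021, 3, 1),
    title="The Program Era Project",
    identifier="DOI:TBD",  # placeholder kept from the example; no DOI is given
    version="V1",
)
print(citation.format())
```

Keeping the components as separate fields, rather than storing one formatted string, makes it straightforward to update the version number or identifier when a dataset is revised.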
The Editors
Dan Sinykin, Assistant Professor of English, Emory University
Melanie Walsh, Assistant Teaching Professor, Information School at the University of Washington
The Editorial Board
Katherine Bode, Professor of Literary and Textual Studies, ANU
J.D. Connor, Associate Professor of Cinematic Arts, USC
Jennifer Doty, Research Data Librarian, Emory
Lauren F. Klein, Associate Professor of English and Quantitative Theory and Methods, Emory
Laura B. McGrath, Assistant Professor of English and Digital Humanities, Temple University
Thomas Padilla, Director of Information Systems and Technology Strategy, Center for Research Libraries
Kenton Rambsy, Assistant Professor of African American Literature and Digital Humanities, UT-Arlington
Richard Jean So, Assistant Professor of English, McGill
The Project Team
Bailey Betik, Digital Publication Specialist, Emory Center for Digital Scholarship
Sara Palmer, Digital Text Specialist, Emory Center for Digital Scholarship