Humanities Data: a Hands-on Approach 2016
Making the Most of Messy Data
- Conveners: Megan Senseney and Andrea Thomer
- Hashtags: #dhcuration and #DHOxSS
- Computers: Students are not required to bring their own laptops for this workshop. Desktop computers will be provided by DHOxSS.
Abstract:
Humanists have data. Moreover, advances in the methodologies and approaches of digital humanities research have exposed the importance of maintaining research data and digital information in a manner that preserves its meaning and usefulness. Data curation is the active and ongoing management of data through its lifecycle of interest. Purposeful curation provides the foundation for a range of related activities from analyzing and visualizing research data to promoting access and reuse across a broader scholarly community.
This workshop will provide a hands-on introduction to a suite of useful tools, methods, and concepts for managing, organizing, cleaning, and processing data in digital humanities projects. Sessions will cover a range of topics, including information organization, data modelling, data quality and cleaning, and workflows. Participants will be introduced to humanities data from a selection of real-world digital humanities projects, and these datasets will serve as project case studies for use with each tool introduced throughout the week. At the end of the week, participants will present on their experiences working with the case studies.
The programme is aimed at humanities researchers—whether traditional faculty or alternative academic professionals—and may also be of interest to librarians, archivists, cultural heritage specialists, other information professionals, and advanced graduate students. Sessions will be led by experts from the iSchool at Illinois's Center for Informatics Research in Science and Scholarship and the HathiTrust Research Center as well as the University of Oxford’s Bodleian Libraries, Oxford e-Research Centre, and Oxford Internet Institute.
Timetable
Time | Monday | Tuesday | Wednesday | Thursday | Friday |
---|---|---|---|---|---|
11:00 - 12:30 | Introductions<br>Introduction to Humanities Data | Information Organization | Contextual Data Modeling | Provenance, Reproducibility, and Research Workflows | Further Topics in Data Curation |
Lunch | Venue: St Anne's College, Dining Room | | | | |
14:00 - 16:00 | Hands On with GitHub | Introduction to Data Quality<br>Hands on with OpenRefine | Hands on with SQLite | Hands on Provenance, Reproducibility | Further hands on with GitHub, OpenRefine, SQLite, and YesWorkflow<br>Participant presentations on their work with DH use cases |
16:30 - 17:30 | Data and Project Management | Hands on with OpenRefine [Continued] | The Physical and Digital via the Meta: A Hands-On Linked Data in a Musicological Case Study | From Project to Preservation: Institutional Repositories | Closing Discussion |
Schedule Details
Monday
11:00 - 12:30
Introductions
David De Roure
Introductions should come first. We want to know about you, your projects, and your data.
Introduction to Humanities Data
Allen Renear and Andrea Thomer
In this session, we’ll review some of the unique characteristics and challenges in working with humanities data. We will also introduce the “messy” dataset we’ll be cleaning throughout the week, and review the workshop agenda.
14:00 - 16:00
Hands On with GitHub
Andrea Thomer
GitHub is a Git-based web repository service for code, documentation, and data. We’ll explain what all those words mean and provide a brief overview of this useful service.
16:30 - 17:30
Data and Project Management
Andrea Thomer
Moving away from agency-required data management plans, this session discusses data management within the overarching context of digital project management. We will also introduce ZenHub, a free project management plugin for GitHub.
Tuesday
11:00 - 12:30
Information Organization
Allen Renear
An overview of basic strategies for information organization through structured tables, trees, and triples along with a discussion of different levels of information representation.
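As a rough illustration of the three shapes named above, the sketch below writes one record as a table row, as a tree, and as a set of triples. The record, its field names, and the `row_to_triples` helper are hypothetical and invented for this example; they are not taken from the workshop dataset.

```python
# Hypothetical record, invented for illustration only.
row = {"id": "letter-1", "author": "A. Writer", "year": 1850}

# Table: one row per item, one column per attribute.
table = [row]

# Tree: attributes nested under the item they describe.
tree = {
    "letter": {
        "id": row["id"],
        "metadata": {"author": row["author"], "year": row["year"]},
    }
}


def row_to_triples(record, key="id"):
    """Turn one table row into (subject, predicate, object) triples.

    The value of the key column becomes the subject; every other
    column contributes one triple.
    """
    subject = record[key]
    return [
        (subject, field, value)
        for field, value in record.items()
        if field != key
    ]


triples = row_to_triples(row)
```

The same information survives in all three forms. What changes is which questions are easy to ask: column-wise comparison in the table, containment in the tree, and linking across records in the triples.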
14:00 - 16:00
Introduction to Data Quality
Andrea Thomer and Bertram Ludäscher
This session introduces key concepts in data quality and cleaning, including stakeholder analysis, fitness for use, and provenance.
Hands on with OpenRefine
Andrea Thomer
OpenRefine is a “free, open source power tool for working with messy data and improving it.” We’ll demo this tool and prepare you to work with OpenRefine on your own in this session.
16:30 - 17:30
Hands on with OpenRefine [Continued]
Wednesday
11:00 - 12:30
Contextual Data Modeling
Neil Jefferies
Building upon concepts from Information Organization, this session approaches data modeling through deeper considerations of context, provenance, and evidence.
14:00 - 16:00
Hands on with SQLite
Bertram Ludäscher
SQLite is one of the most widely deployed database engines in the world. It’s lightweight, relatively easy to learn, and can be an important asset in your data curation arsenal. Participants will dive in with a hands-on introduction to database structures and data profiling.
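For a sense of what data profiling can look like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The `letters` table, its `place` column, the sample values, and the `profile_column` helper are all assumptions made up for this example, not the workshop's actual dataset or exercises.

```python
import sqlite3


def profile_column(conn, table, column):
    """Count total rows, missing values, and distinct values in one column.

    Table and column names cannot be bound as SQL parameters, so they are
    quoted and interpolated; only pass names you trust.
    """
    query = (
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    total, non_null, distinct = conn.execute(query).fetchone()
    return {"rows": total, "nulls": total - non_null, "distinct": distinct}


# Hypothetical messy data: inconsistent capitalisation and a missing value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letters (id INTEGER PRIMARY KEY, place TEXT)")
conn.executemany(
    "INSERT INTO letters (place) VALUES (?)",
    [("Oxford",), ("oxford",), ("Oxford",), (None,)],
)

profile = profile_column(conn, "letters", "place")
```

A distinct count higher than expected (here "Oxford" and "oxford" count as two values) is often the first hint that a column needs cleaning.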
16:30 - 17:30
The Physical and Digital via the Meta: A Hands-On Linked Data in a Musicological Case Study
Kevin Page
A data curation case study from MetaMuSAK that explores a musicological annotation effort through digital tools, data capture strategies, RDF representation models, and explorations of linked data outputs.
Thursday
11:00 - 12:30
Provenance, Reproducibility, and Research Workflows
David De Roure
This session will explore how and why scholars capture their personal research workflows to ensure documentation of provenance and support reproducibility and reuse.
14:00 - 16:00
Hands on Provenance, Reproducibility
Bertram Ludäscher and David De Roure
Concepts from the morning session will be put to use through a demonstration of the YesWorkflow initiative. Participants will also have an opportunity to explore, alter, and run annotated scripts from the workshop’s shared dataset.
16:30 - 17:30
From Project to Preservation: Institutional Repositories
David Tomkins
What happens to your data when your project is complete? This session provides an overview of archiving and data management from the perspective of institutional repositories.
Friday
11:00 - 12:30
Further Topics in Data Curation
David Weigl and Andrea Thomer
This session will be a set of lightning-round demos and discussions of special topics in data curation including data integration, data visualization, and non-computational workflows.
- 11:00-11:30 -- Data integration with Karma (David Weigl)
- 11:30-12:00 -- Data visualization using Bookworm (Andrea Thomer)
- 12:00-12:30 -- Capturing non-computational workflows (Andrea Thomer)
14:00 - 16:00
Further hands on with GitHub, OpenRefine, SQLite, and YesWorkflow
Andrea Thomer and Bertram Ludäscher available for consultation
Each group will have a final opportunity to revisit tools from the week and continue working on their use cases.
Participant presentations on their work with DH use cases
Andrea Thomer and Bertram Ludäscher available for consultation
Every working group had the same dataset, but each use case posed a different set of challenges. What choices did you make based on your use scenarios? What issues did you encounter and how might you resolve them?
16:30 - 17:30
Closing Discussion
Andrea Thomer, Bertram Ludäscher, and Allen Renear
What can you do to improve your own curatorial practices in the near term and in the long term? What are the key lessons you learned from your week as data curators?