Tech Week: Online Databases and Data Sharing
This post is part of the May 2012 Technology Week, a quarterly topical discussion about technology and historical archaeology, presented by the SHA Technology Committee. This week’s topic examines the use and application of digital data in historical archaeology. Visit this link to view the other posts.
The Digital Archaeological Archive of Comparative Slavery (http://www.daacs.org/) provides standardized artifact, contextual, spatial, and image data from excavated sites of slavery throughout the early modern Atlantic World. Currently, DAACS is the largest archive, paper or digital, of standardized archaeological data related to slavery and slave societies. We have built it with grant funds, generous data sharing, and the intellectual input of more than 50 collaborating archaeologists and historians. For over ten years, these scholars and many others have contributed to DAACS’ overarching goal: to facilitate the comparative archaeological study of the spatial and temporal variation in slavery and the archaeological record by providing standardized archaeological data from multiple archaeological sites that were once homes to enslaved Africans.
DAACS strives to achieve this goal by giving researchers access to detailed standardized archaeological data in a format that allows the assemblages to be seamlessly compared quantitatively without any additional processing by the researcher. We do so by physically reanalyzing the assemblages, and their associated contexts, to the same classification and measurement protocols that were established with the help of the DAACS Steering Committee in 2000. This is the critical aspect of the DAACS program—providing the standardized data that are essential to any comparative archaeological study.
DAACS data are stored in a massive relational Structured Query Language (SQL) database and are delivered over the internet via the DAACS website. The website debuted in 2004 with complete data sets from 15 domestic slave sites in Virginia. They were made available then, as they are today, through an easy-to-use, point-and-click query interface. By the end of 2012, DAACS will contain complete archaeological data sets, including data on over 2 million artifacts, from 60 sites of slavery in Maryland, Virginia, South Carolina, Jamaica, Nevis, and St. Kitts.
During the past year, over 10,000 unique visitors have landed on the DAACS website. Many DAACS users go straight for the Archive’s meta-data: the section of the website that contains information on the DAACS data structures and authority terms, DAACS cataloging manuals and stylistic element guides, and research papers and posters. Others spend time browsing and reading through the archaeological sites pages, the text-heavy portion of the DAACS website that provides extensive background data on each site, site chronologies, access to images and maps, and bibliographies. We consider these pages essential to anyone using the archaeological data accessible through the DAACS Query Module.
Visitors often move from the background pages to the DAACS Query Module, which provides access to standardized data on hundreds of thousands of artifacts and archaeological contexts. The query interface masks a complex set of queries to the relational database that contains the raw archaeological data from all sites in the Archive. Queried data are returned to users in the web browser and as downloadable ASCII files that can easily be imported into the user’s favorite statistical package.
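As a rough illustration of what "imported into a statistical package" can look like in practice, the sketch below reads a delimited text file and computes each ware type's share of a site's assemblage so that sites can be compared directly. It is only a sketch under stated assumptions: the tab delimiter, the column names (`site`, `ware`, `count`), and the sample rows are all made up for the example and are not the actual layout of a DAACS download.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up sample standing in for a downloaded query result.
# The delimiter and column names are assumptions, not the DAACS export format.
SAMPLE_EXPORT = """site\tware\tcount
Site A\tCreamware\t40
Site A\tPearlware\t10
Site B\tCreamware\t5
Site B\tPearlware\t15
"""


def ware_proportions(text_file, delimiter="\t"):
    """Return {site: {ware: proportion of that site's sherds}}."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(text_file, delimiter=delimiter):
        counts[row["site"]][row["ware"]] += int(row["count"])

    proportions = {}
    for site, wares in counts.items():
        total = sum(wares.values())
        proportions[site] = {ware: n / total for ware, n in wares.items()}
    return proportions


if __name__ == "__main__":
    result = ware_proportions(io.StringIO(SAMPLE_EXPORT))
    for site, wares in sorted(result.items()):
        for ware, share in sorted(wares.items()):
            print(f"{site}\t{ware}\t{share:.2f}")
```

With a real download, `io.StringIO(SAMPLE_EXPORT)` would be replaced by an opened file, and the column names would need to match whatever the export actually contains. Working with proportions rather than raw counts is one simple way to keep differences in assemblage size from swamping the comparison.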
DAACS is explicitly and clearly designed for large-scale comparative archaeological research. The website features—the Query Module, Archaeological Sites Pages, and corresponding meta-data—are critical to meeting the goals of the project.
In the evolving ecology of accessible digital data, digital archives vary in the extent to which they are designed to facilitate comparative research versus the extent to which they support the preservation of archaeological data. These elements of online archives and databases are not mutually exclusive; many research archives preserve data and preservation archives encourage research. Projects such as tDAR (The Digital Archaeological Record) and ADS (Archaeological Data Service) are essential to the preservation of born-digital data generated by individual researchers. These critical resources preserve and make searchable data from any type of archaeological project, regardless of region or time period. Data from projects range from digital reports and basic finds lists to full-blown archaeological databases. However, comparability suffers to the extent that the contributing researchers use different classification and measurement protocols.
To date, research archives have focused on specific regions and time periods in order to provide datasets that enable researchers to address synthetic research questions. Examples include the Chaco Research Archive, A Comparative Archaeological Study of Colonial Chesapeake Culture, and DAACS. These projects provide a venue in which protocols that work well in particular times and places encourage individual researchers to think seriously about how to ensure their data play well with others’ data, making it easier for researchers to glimpse the fruits of comparative analysis that shared protocols make possible.
But each archive type requires specific tradeoffs. For research archives, making comparative quantitative research easy requires standardization. However, it is not clear how, over the long term, the requisite standardization will emerge. Sites like DAACS may be one way forward. No matter where one sits on the continuum, a firm commitment to open and transparent data sharing underpins all digital archiving projects.
The demand for archives that specialize in digital data preservation and accessibility will continue to grow as individuals, museums, universities, and the government grapple with archiving the large quantities of archaeological data they curate and making those data accessible. The success and growth of research archives that generate detailed, comparable digital data for explicit research purposes will depend on how we meet the analytical needs of inquisitive archaeological researchers.
Over the past six years, we’ve seen a marked increase in the number of graduate students who approach us with the desire to pursue data-driven comparative research. Their questions and needs may be a bellwether for the development, use and longevity of research archives.
Our experience at DAACS is that undergraduate and graduate students are eager to engage in archaeological data analysis, both on the single-site and comparative levels. They come to DAACS asking questions that require serious archaeological data analysis; however, many are missing two critical skills: the ability to link arguments about what happened in the past to archaeological variation, and the skills in data analysis that allow them to summarize patterns in the data that speak to those arguments.
A concrete example relates to chronology. Chronological control is the critical first analytical step in any archaeological study, whether a single-site or a comparative analysis: you do not want to mistake temporal change for synchronic variation. Yet we have discovered that graduate students who have completed their coursework in historical archaeology do not know how to get started. From framing an argument to executing data retrieval, discovering patterns in the results, and linking those patterns back to the original argument, most historical archaeology students come to us seeking advice on where and how to begin working with their data and the data in DAACS. An informal survey suggests that one reason is that only a handful of graduate programs that provide advanced degrees with specializations in historical archaeology require students to take even a single course in statistical methods.
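To make the chronology example a little more tangible, here is a minimal sketch of one widely used starting point, the mean ceramic date: the average of each ware type's manufacturing midpoint, weighted by its sherd count in a context. This is offered as an illustration of the kind of calculation involved, not as the method DAACS uses or teaches. The ware names, date ranges, and counts below are hypothetical placeholders chosen to keep the arithmetic easy to follow.

```python
def mean_ceramic_date(sherd_counts, date_ranges):
    """Weighted mean of manufacturing midpoints.

    sherd_counts: {ware: number of sherds in the context}
    date_ranges:  {ware: (start_year, end_year) of manufacture}
    """
    total = 0
    weighted_sum = 0.0
    for ware, count in sherd_counts.items():
        start, end = date_ranges[ware]
        midpoint = (start + end) / 2
        weighted_sum += count * midpoint
        total += count
    if total == 0:
        raise ValueError("no datable sherds in this context")
    return weighted_sum / total


# Hypothetical ware types and date ranges, for illustration only.
DATE_RANGES = {
    "Ware X": (1760, 1820),  # midpoint 1790
    "Ware Y": (1780, 1840),  # midpoint 1810
}

# Hypothetical sherd counts for two contexts.
context_1 = {"Ware X": 30, "Ware Y": 10}
context_2 = {"Ware X": 10, "Ware Y": 30}

if __name__ == "__main__":
    print(mean_ceramic_date(context_1, DATE_RANGES))  # 1795.0
    print(mean_ceramic_date(context_2, DATE_RANGES))  # 1805.0
```

In this toy case the two contexts contain the same ware types in different proportions, and the weighted means place the second context about a decade later than the first. Ordering contexts in time this way, before comparing them, is what guards against reading a change through time as a difference between contemporaneous households.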
But it is clear that students (and our colleagues) want more resources for learning how to work with these data. We receive regular requests to provide training in statistical analysis and to teach the more arcane analytical methods that we occasionally use but which are necessary to fully engage with the quantity of fine-grained data available through DAACS.
As the promise of using online databases for research has become increasingly obvious over the past five years, the demand for data has risen. How we meet the demand, not only for the data but also for the analytical skills to make sense of the data, will determine the trajectory of online databases over the next five to ten years.
While I worry about the trajectory of archaeological training, I remain sanguine about the promise of research archives in large part because I am lucky enough to work with graduate and undergraduate students engaging with DAACS’s online database, students who work doggedly to learn methods they were never taught, and who have come to realize that the data in DAACS are so rich that the hard work it takes to learn analytical approaches to their data provides big payoffs and exciting answers to previously unanswerable questions.