Reporters:
* Dashun Wang, Northwestern University (dashun.wang@kellogg.northwestern.edu)
* Lu Liu, Pennsylvania State University (lpl5107@ist.psu.edu)
Abstract:
The rapid development of digital libraries and the proliferation of scholarly big data have created an
unprecedented opportunity to explore scientific production and reward at scale. The science of science is an
emerging field fueled by this recent data explosion. It uses and develops tools from computer science,
information science, and network science to mine scholarly big data, offering lessons about creativity within a
systematic, quantitative framework and exploring opportunities for innovation. In this tutorial, we aim to
provide an overview of the science of science, including its rich historical context, state-of-the-art
technologies and exciting discoveries, and promising future applications, especially for the JCDL community.
This tutorial will cover the main topics in this field, including scientific
careers, scientific collaborations and scientific knowledge from the perspective of participants in the digital
library domain, and will be specifically geared toward the audience that JCDL attracts.
Reporters:
* William Ingram, Virginia Polytechnic Institute and State University (waingram@vt.edu)
* Edward Fox, Virginia Polytechnic Institute and State University (fox@vt.edu)
Abstract:
Computational analyses are playing an increasingly central role in research and are a feature of many advanced
digital libraries. Journals, sponsors, and researchers, including those in the digital library field, are calling
for published research to include associated data and code. However, many involved in research have not received
training in best practices and tools for building systems (e.g., using containers) and implementing methods that
facilitate sharing code and data. This tutorial aims to address this gap in training while also providing those
who support researchers with curated best-practice guidance and tools.
This tutorial is unique compared to other reproducibility events due to its practical, step-by-step design. It
consists of hands-on exercises to prepare research code and data for computationally reproducible publication.
Although the tutorial starts with some brief introductory information about computational reproducibility, the
bulk of the tutorial is guided work with data and code. The basic best practices for publishing code and data are
covered with curated resources. Examples will be drawn from the digital library and information retrieval domains.
Participants move through the steps of preparing research for reuse: organizing, documenting, automating, and
submitting their code and data for sharing. Tools to support reproducibility will be introduced, but all lessons
will be platform-agnostic.
Reporters:
* Edward Fox, Virginia Polytechnic Institute and State University (fox@vt.edu)
* William Ingram, Virginia Polytechnic Institute and State University (waingram@vt.edu)
Abstract:
This tutorial is a thorough introduction to the Digital Libraries (DL) field. It provides a firm foundation by
covering key concepts and terminology, as well as services, systems, technologies, methods,
standards, projects, issues, and practices. It introduces and builds upon a firm theoretical foundation (starting
with the ‘5S’ set of intuitive aspects: Streams, Structures, Spaces, Scenarios, Societies), giving careful
definitions and explanations of all the key parts of a ‘minimal digital library’, and expanding from that basis
to cover key DL issues. Illustrations come from a set of case studies, including several from current projects
that work with webpages, tweets, and social networks. Attendees will be exposed to four Morgan and Claypool books
that elaborate on 5S, published in 2012-2014. Complementing the coverage of 5S will be an overview of key aspects
of the DELOS Reference Model and DL.org activities. Further, new material will be added on building digital
libraries using container and cloud services, on developing a digital library for electronic theses and
dissertations, and on methods to integrate UX and DL design approaches.
Reporters:
* Kevin Bretonnel Cohen, University of Colorado School of Medicine (kevin.cohen@gmail.com)
* Daniela Gifu, University of Iasi & Romanian Academy - Iasi Branch (daniela.gifu73@gmail.com)
Abstract:
We propose a tutorial on writing about research in data science. The focus will be on writing conference papers
and journal articles. The target audience is graduate students, post-doctoral fellows, and early-career faculty.
The teaching methodology will include lectures and a heavy hands-on component.
Reporters:
* David Bainbridge, University of Waikato (davidb@cs.waikato.ac.nz)
Abstract:
This tutorial is designed for those who want an introduction to building a digital library using an open source
software program. The tutorial will focus on the Greenstone digital library software. In particular, participants
will work with the Greenstone Librarian Interface, a flexible graphical user interface designed for developing
and managing digital library collections. Attendees do not require programming expertise; however, they should be
familiar with HTML and the Web, and be aware of representation standards such as Unicode, Dublin Core, and XML.
The Greenstone software has a pedigree of more than two decades, with over 1 million downloads from SourceForge.
The premier version of the software has, for many years, been Greenstone2. This tutorial will introduce users to
Greenstone3 -- a redesign and reimplementation of the original software that takes better advantage of the
standards and web technologies developed since the original implementation of Greenstone. Written in Java,
Greenstone3 is more modular in design, which increases its flexibility and extensibility. Emphasis in the
tutorial is placed on where Greenstone3 goes beyond what Greenstone2 can do. Through the hands-on practical
exercises, participants will, for example, build collections in which geo-tagged metadata embedded in photos is
automatically extracted and used to provide a map-based view of the collection in the digital library.
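
As a rough illustration of the geo-tagging step mentioned above, the sketch below converts a photo's GPS
coordinates into the signed decimal degrees that map views typically expect. It is not Greenstone's
implementation. It assumes the coordinates have already been read from the photo as degrees, minutes, and seconds
plus a hemisphere letter, which is how EXIF GPS tags are usually stored. The function name `dms_to_decimal` and
the sample values are hypothetical.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) and a hemisphere letter to signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical values of the kind a photo's GPS tags might hold (not taken from the tutorial).
photo_gps = {"lat": ((37, 47, 15), "S"), "lon": ((175, 19, 0), "E")}

lat = dms_to_decimal(*photo_gps["lat"])
lon = dms_to_decimal(*photo_gps["lon"])
print(lat, lon)
```

The resulting latitude and longitude pair could then be stored as ordinary metadata fields and handed to
whatever mapping component the collection uses.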