Dean, Penrose Library, University of Denver
Nancy Allen discusses the Colorado Digitization Project (CDP) and working collaboratively with other cultural heritage institutions and creating an infrastructure for sharing digital content, which became a model for digitization projects nationwide.
“Information flow rather than archival management needs to be ported out to the cultural heritage world.”
A shared vision
Nancy Allen and Nancy Bolt (former Colorado State Librarian) shared the vision for CDP: “Nancy and I were talking about how it was too bad that there was no state infrastructure to share the knowledge that they [DPL’s Western History collections] had gained organizationally about digitization.” Many of the projects discussed by the Digital Pioneers began with one, two or a few people with a shared vision. It was often a vision of creating improved access.
Your goal is to be able to share institutional knowledge and systems that you are creating. You are creating knowledge. Therefore, in designing the process and while undergoing the process, build in a way to document and share it.
User adoption is key
Adoption by others is key since the value of a project is in its ability to be duplicated and re-purposed.
The larger purpose of information
What is the purpose of information? Information has value to users when it serves their purposes. You are creating value by allowing others to duplicate and repurpose data and information.
When applying for grants, think big, and consider reusing the same application for a larger grant in the future to further your project’s goals. Build into the grant a system for documenting the process and sharing the knowledge so that others can adopt the work and benefit from it.
Focus on collaboration between libraries and cultural heritage institutions, not just the technology
Technology is the tool that helps you achieve your goals. Therefore, focus not just on the technology but on the goal: what to do with the information and the data.
Expect many challenges
Expect many challenges and build them into your grant: plan both how to spot them early and how to deal with them. These include challenges in collaboration, standards, best practices and technology. There may not be standards in place, so you may be the one to develop them. Keep this in mind, as it will influence your project design. There are many benefits to creating standards that meet the needs not just of libraries but of museums and other cultural heritage institutions. The CDP had the goal of sharing infrastructure with libraries and cultural heritage institutions like museums, archives and historical societies.
Collaborative challenges: Project collaborators must share a common purpose
Project collaborators must have common ground with shared professional understandings, goals and purpose for the work. Lacking shared purpose will cause problems later in the process.
An effective Executive Director with vision and drive
You will need an Executive Director who can drive the growth and vision of the project. They will need to be well-connected so that they can network beyond libraries and create collaborative organizational relationships and partnerships.
A board with grand vision
It is important to have a high-level, big-thinking and visionary board.
Have a massive vision of infrastructure elements
A grant will make you come up with a plan. Build a vision very practically. A first grant can be the gateway to much larger grants. Additional grants can add to the infrastructure. Creating an adaptable and adoptable infrastructure will allow you to make a difference nationwide.
Visualize your vision
Storyboard your vision so that you can visualize the elements needed to achieve your goals.
Provide best practices and guidelines to collaborators
Provide best practices and guidelines to collaborators for greater efficiency and effectiveness in implementation. Provide the same equipment, metadata, software to all partners. Make teaching, training and workshops available to all partners. Have practical use rules agreed upon by partners.
The biggest challenge of the CDP digitization collaborative was sustainability. You need to develop a sustainability model from the very beginning. You must plan for both funding and leadership sustainability.
You cannot depend on always getting your grants. There is a need to build a sustainable model from the beginning and create more diverse revenue streams, creative ways to fundraise for infrastructure, upgrades, new policies, travel funding and unanticipated costs. Strive to generate funding sources that will have minimal disruptions.
Leadership and staff will change during the course of projects. Strive to minimize the impact of leadership and staff turnover. Plan for the institutional transfer of information. Combine leadership expertise by having leaders with varying strengths. It is critical to maintain your commitments to members through changes in leadership, staff, operations and grants.
Information Flow in a collaborative world
“Information flow rather than archival management needs to be ported out to the cultural heritage world.” Other cultural institutions need to switch their view from traditional archival management to the movement of information from repositories to users.
Digitization projects bridge the gap
Digitization projects bridge the gap between libraries and cultural heritage institutions since digitizing collections connects users to archived materials. We must work to systematically bridge the knowledge gap between the developers in libraries and archives, and other cultural heritage organizations.
Collaboration and partnerships with cultural heritage institutions
Collaboration is hard work but it is worth it. It is better for society. It is deep and lasting work.
Libraries need to do outreach to museums, galleries and school teachers.
Associate Dean Information Technology Services, Marriott Library, University of Utah
Kenning Arlitsch discusses four very successful digitization projects: the Mountain West Digital Library (portal for digital collections about the Mountain West region), Utah Digital Newspapers, Western Waters Digital Library (primary and secondary resources on water in the western United States) and the Western Soundscape Archive (thousands of recordings of Western animal species and their environments). Learn about the “lightweight” digital library model, the power of pilot projects, the importance of marketing your digital collections to the public, and the demand for catalogers who understand linked data and how it can help create context around materials.
“Everybody, virtually everybody in the library is now a stakeholder in how their digital collections…get presented to the world. And whether the world can find them.”
Mountain West Digital Library (MWDL)
MWDL “is a portal to digital resources from universities, colleges, public libraries, museums, archives, and historical societies in Utah, Nevada, Idaho, and Hawaii.” (About)
What initially began as a project in 2001 to digitize 200 glass-plate negatives from the Utah State Historical Society’s collection has grown into a portal for over 690,000 resources (which translates to several million digital objects) from the digital collections of 17 institutions and over 60 partners including universities, colleges, public libraries, museums, historical societies, and government agencies, counties, and municipalities in Utah, Nevada, Idaho, Wyoming, Hawaii and other parts of the U.S. West.
The successful MWDL’s distributed and lightweight digital library model
- The University of Utah manages the aggregating server and the website
- Each digitization center digitizes their own collections and puts them on the University of Utah’s aggregating server
- Each center also supports and hosts other institutions (historical societies, museums, etc.) that cannot afford their own digitization and digital preservation infrastructure
- Use an OAI harvester that just harvests metadata from each of the servers back to the aggregating server; the metadata is exposed through the OAI
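The metadata-only harvesting described above can be sketched in Python. This is a minimal illustration of the OAI-PMH ListRecords exchange, not MWDL's actual setup (the source says MWDL harvests with Ex Libris' Primo). The function names, the sample response and the example identifier are hypothetical; the verb, arguments and XML namespaces come from the OAI-PMH protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        titles = [t.text for t in record.iter(DC_NS + "title") if t.text]
        records.append({"identifier": identifier, "titles": titles})
    token = (root.findtext(f".//{OAI_NS}resumptionToken") or "").strip()
    return records, (token or None)


def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield metadata records from one partner server, page by page.

    Only metadata travels over the wire; the digital objects stay on
    the partner's own server.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        with urlopen(base_url + "?" + urlencode(params)) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if not token:
            break
        # A resumption token replaces all other arguments except the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Hypothetical response from a partner server, used to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Glass-plate negative</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

sample_records, sample_token = parse_list_records(SAMPLE_RESPONSE)
```

An aggregator would call harvest() once per partner server and index the resulting records centrally, which is what keeps the model lightweight: each partner only has to expose an OAI endpoint.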
-- Lessons --
- Management: Hire a program director to lead, manage, and grow the digital library by bringing in new partners and new collections
- Financial: Use a lightweight model that does not rely long-term on soft money (grant funding)
- Ex Libris
- MWDL is using Ex Libris’ Primo for their harvesting mechanism and search interface.
- The University of Utah recently became only the third institution in the United States to purchase Ex Libris’ Rosetta digital preservation software. Because two other Rosetta institutions (the LDS Church and BYU) are also in Utah, they are now considering building a digital preservation network on the backbone of MWDL.
Utah Digital Newspapers (UDN)
The Utah Digital Newspapers Program has over 1.3 million pages. Because every article on every page is currently a separate file, those 1.3 million pages translate to about 17 million individual files.
They are gradually transitioning to the newer method of digitizing, which focuses more on creating individual JPEG 2000 files for each page, with built-in article coordinates. This will reduce the file count from 17 million down to the actual number of pages, currently at 1.3 million. They will use METS/ALTO, the new metadata and processing standard for digital newspapers. This will address scalability issues.
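As a rough check on those figures, the following Python sketch works out how many files each page currently produces and how much the file count shrinks under a one-file-per-page layout. It uses only the two rounded totals quoted above, so the results are approximate.

```python
# Rounded totals quoted for the Utah Digital Newspapers Program.
pages = 1_300_000
article_files = 17_000_000

# Average number of article-level files each page currently generates.
files_per_page = article_files / pages

# Share of files eliminated when each page becomes a single file.
reduction = 1 - pages / article_files

print(f"{files_per_page:.1f} files per page")  # 13.1 files per page
print(f"{reduction:.0%} fewer files")          # 92% fewer files
```

Roughly thirteen files per page collapse into one, which is why the change matters for scalability.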
-- Lessons --
- If you try developing processes in-house and they prove unable to scale, find external help. Work with an outside vendor to develop the process.
- Hire a project manager
- Write the grant with the project manager
- Encourage the project manager to write and submit other grants
Western Waters Digital Library (WWDL)
WWDL “provides free public access to digital collections of significant primary and secondary resources on water in the western United States.” (Homepage)
WWDL was essentially built on the model of the MWDL. A successful concept, project and model prove that you have a track record and will help you expand and get more grants.
Western Soundscape Archive
This collection has nearly 3,000 individual sound files. They can all be streamed and listened to live on the website. Some of the files can also be downloaded and re-used for educational purposes, if the copyright or creative commons licenses allow.
-- Lessons --
Audio and video files are huge. They will require petabyte-scale storage space, which is expensive.
- “Audio and video files are our biggest sector of growth.”
- “We have—we have roughly 100 terabytes of digitized data that we’re—that we’re having to manage. Over the next five years, we expect that to grow to about 250 terabytes, or a quarter petabyte.”
- “The vast majority of that growth will be in audio and video files. Because they’re—they’re just bigger. So yeah, that creates—that creates stresses on our infrastructure, it creates stresses on our funding, and it creates huge stress on storage and digital preservation.”
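For budgeting purposes, the quoted growth from 100 to 250 terabytes over five years can be turned into an implied yearly rate. The Python sketch below assumes smooth compound growth, which is an assumption made here for illustration and not something stated in the interview; the function names are hypothetical.

```python
def implied_annual_growth(start_tb, end_tb, years):
    """Compound annual growth rate that takes start_tb to end_tb."""
    return (end_tb / start_tb) ** (1 / years) - 1


def project_storage(start_tb, rate, years):
    """Storage needed at the end of each year, starting with year 0."""
    return [start_tb * (1 + rate) ** year for year in range(years + 1)]


rate = implied_annual_growth(100, 250, 5)
projection = project_storage(100, rate, 5)

print(f"Implied growth: {rate:.1%} per year")
for year, terabytes in enumerate(projection):
    print(f"Year {year}: {terabytes:.0f} TB")
```

Under that assumption, storage needs grow by about 20% a year, a figure that can feed directly into storage and preservation cost estimates.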
A lightweight funding sustainability model
Develop lightweight projects that require minimal staff and funding.
- “The models that have been set up for Mountain West Digital Library and for Western Waters Digital Library are not contingent on new funding coming in. The model is lightweight enough that there are very few personnel, beside my IT staff, who support these, and—it just sort of vacuums up the metadata of collections that the participating institutions we hope would digitize anyway. So in that way it’s a relatively lightweight model.”
Marketing what libraries have created
Libraries, in general, do not market either themselves or what they have created very well. Libraries create products and services that are focused on the public. Libraries would benefit from greater public support if they marketed these products and services effectively.
Fund your marketing costs by building in a marketing budget into grant proposals. Create a sustained marketing plan. Engage your marketing and development departments to promote your digital collections.
- “So as popular as this program has been, I think it could be a lot more popular if we had a genuine marketing and advertising program.”
- “I think what we’re still lacking is getting into the—general consciousness of—of the public. And this particular project, the Utah Digital Newspapers Program, and Mountain West Digital Library are—are really focused at the general public.”
- “We don’t advertise ourselves well enough. And I think what we have to do—it’s a culture change, frankly, in libraries.”
- “The public doesn’t need us as much as they used to. So we have to make more of an effort to show them what we can provide. And part of that means thinking more like businesses. Actively engaging our university marketing departments.”
- “I think in general, development departments and marketing departments have to start promoting digital collections more.”
Utah Digital Newspapers is still not part of the Mountain West Digital Library due to scalability issues. Using Ex Libris’ Primo aggregating software will help make UDN more scalable. Then the metadata from the millions of records in UDN will be aggregated into MWDL. Using the new METS/ALTO metadata and processing standard will reduce the file count from 17 million down to the actual number of pages, currently at 1.3 million, and will address some scalability issues.
The power of pilot projects
Pilot projects allow you to prove concepts. When you develop a concept, build a pilot website to visually and audibly show your concept. Pilots are powerful tools that help you communicate your vision to grant funders.
- “Pilot projects are always great things…they allow you to prove the concept.”
- “I hired Jeff Rice for six months just on departmental money to help us develop the concept, build a pilot website, put some—put some sound files into a collection. And then, based on that, we wrote a proposal to IMLS. And this was late 2006, early 2007 and we were funded in September of 2007. And that was another three year proposal.”
Find private donors
Focus on working with private donors to fund your digital collections.
- “Most of what I see in terms of—in terms of development, in terms of donor relations, still focuses on the building and print materials, in particular our special collections. There’s not very much of a focus yet on trying to get external funders, private donors, to contribute money to digital collections like this. And that—that has to change.”
- “How we do that? We just have to keep talking about it. Just have to keep promoting it and pushing it. And—you know, things change slowly. But they do change.”
Copyright limits knowledge increase and innovation.
- “Copyright was never intended to be for the life of the author or even beyond. The—the current copyright laws are ridiculous. It’s something like life of the author plus 75 years or maybe—maybe it’s even longer at this point.”
- “Initially—when copyright was developed by the founding fathers, it was intended to give the inventor or the author some years of profit from their—from their works. But eventually, things have to go into the public domain because that’s how innovation happens. That’s how knowledge increase happens. And locking these things up and making them unavailable—freely accessible—I think hurts us.”
ADVICE FOR STUDENTS
Scientists need help managing their data.
- “There are—there is room for tremendous innovation and the places now where I’m seeing a lot of room for growth and a lot of room for librarians really taking the bull by the horns is data management—managing—figuring out with scientists how to manage their data.”
- “Because I can tell you that most scientists really have no idea how to deal with their data. They don’t even know the right questions to ask. Everything from basic storage, where do I—where do I store my data to migrating it to—putting it in a database to make—and assigning metadata to it and making it accessible. So—so data management is a huge issue.”
Catalogers and linked data
Wanted: Catalogers who understand linked data and how it can help create context around materials.
- “Metadata itself is—I think is—I think there is more need now for cataloging librarians than there’s ever been before, it’s just a—it’s just a tremendous paradigm shift.”
- “The cataloguing librarians who are willing to think in new ways, who are willing to think about linked data and how it can—help—create context around materials is—is just enormous.”
- “And we’re behind the 8-ball. We’re falling behind in that area again. So I think there’s tremendous potential for growth there.”
Read widely and broadly
- “I would suggest that you—read widely. You don’t have [to] become an expert in any particular field but you have to know what’s going on broadly.”
Hire the right people into the right positions, give them the power to run projects and the opportunity to creatively solve problems.
- “Try to achieve success by leading and providing opportunity.”
- “I think the best things that I’ve done in my career are hiring the right people and putting them into the right positions. Giving them the power to run with projects. You have—you have no idea what—what people can do until you present them with a problem and an opportunity and see how creative they can be. See what they can bring back to you.”
- “If you go into management, you have to be prepared to earn your salary. Which means that sometimes you have to—you have to counsel people out. You have to—deal with—with staff who are not doing a good job and who are unproductive.”
- “Always try to—to achieve success by leading and providing opportunity, but you have to be prepared to deal with the other side as well, and that’s—that’s not easy.”
Core issues in digital libraries
The core issues in libraries and digital libraries are the same. What has changed are the tools, methods and the need to work across boundaries.
- “We still have to acquire materials, we still have to organize them, we still have to make them accessible to the public and we still have to preserve them. Right? Those are the core—core issues in libraries”
- “In a digital library it’s no different. I need the special collections people to bring the collections in, I need the public services people to tell their patrons about these collections, I need the catalogers to help—help people make sense of the digital collections. None of that has really changed. It’s just the tools and the methods have changed.”
- “But what has changed is the need to work across the lines, the need to work across boundaries.”
SEO and institutional repositories (IRs)
Arlitsch’s search engine optimization (SEO) research shows that institutional repositories are practically invisible to Google Scholar. However, Arlitsch discovered that “this is less of a technical problem than it is of an administrative and a communication problem.”
“Everybody, virtually everybody in the library is now a stakeholder in how their digital collections—or how the library’s digital collections get—presented to the world. And whether the world can find them. And so search engine optimization has to be talked about broadly across all departments.”
Article: Invisible institutional repositories: addressing the low indexing ratios of IRs in Google.
Kenning Arlitsch, Associate Dean for IT Services and Patrick O’Brien, SEO Research Manager, are both from the J. Willard Marriott Library at the University of Utah.
Two of the authors’ pilot studies on the University of Utah’s IR, USpace, demonstrated that converting the metadata tags to the more precise bibliographic Highwire Press tags increased the indexing ratio from 0% to 62% in the second pilot and to over 90% in the third pilot.
The authors conclude that IRs can substantially increase indexing ratios when libraries do the following:
- Use the metadata schemas that Google Scholar recommends -- Highwire Press, EPrints, PRISM, and Bepress
- Provide precise bibliographic information in the HTML page header tags.
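To make the recommendation concrete, here is a minimal Python sketch that builds Highwire Press style meta tags for the HTML header of an IR item page. The tag names (citation_title, citation_author, citation_publication_date, citation_pdf_url) are the ones Google Scholar documents for this schema; the function name and its parameters are hypothetical, and a real repository would generate the tags from its own metadata store.

```python
from html import escape


def highwire_meta_tags(title, authors, publication_date, pdf_url=None):
    """Return HTML <meta> tags carrying bibliographic data for one item."""
    tags = [("citation_title", title)]
    # Each author gets a separate citation_author tag.
    tags += [("citation_author", author) for author in authors]
    tags.append(("citation_publication_date", publication_date))
    if pdf_url:
        tags.append(("citation_pdf_url", pdf_url))
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


example_tags = highwire_meta_tags(
    title="Invisible institutional repositories: addressing the low "
          "indexing ratios of IRs in Google",
    authors=["Arlitsch, K.", "O’Brien, P.S."],
    publication_date="2012",
)
print(example_tags)
```

The output belongs inside the page's head element so that crawlers see precise bibliographic fields instead of loosely typed descriptive metadata.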
The authors specify two additional factors that can improve IR content visibility to search engine crawlers:
- Addressing technical SEO issues
- Optimizing HTML tags in PDF files
This article has impact on academic libraries and institutions. Libraries invest significant hours into marketing their IRs to faculty in order to establish user buy-in and persuade faculty to deposit their publications into the IR. Libraries will be able to demonstrate to faculty that using the IR will make their publications significantly more visible. Being able to communicate personal benefits to faculty can help to establish long-term user adoption. Additionally, being able to communicate how the university can benefit from the higher rankings as a result of an improved IR can help bolster library support.
Arlitsch, K. and O’Brien, P.S. (2012). Invisible institutional repositories: addressing the low indexing ratios of IRs in Google. Library Hi Tech, 30(1), 60–81. Retrieved from http://dx.doi.org/10.1108/07378831211213210
ACRL Webinar from June 6, 2012: Google Scholar and Institutional Repositories: Improving IR Discovery (http://www.ala.org/acrl/irdiscovery)
Professor of Cinema Studies and Director of New York University’s Moving Image Archiving & Preservation Program (MIAP),
Senior Scientist for Digital Library Initiatives for NYU’s Library
Howard Besser discusses working on multiple digitization projects including the first scan and direct digitization of an oil painting by the Pacific Film Archive at UC Berkeley; the first interactive image query system that produced high-quality art images on a computer screen called Image Query; and the complexities of working on a huge multi-institutional project like Museum Educational Site Licensing Project, which involved seven universities, six museums and the Library of Congress.
“Have some vision of the future when you’re doing your little project that would allow your little project to be part of a world of information.”
An eclectic work background
Your career path can be eclectic. It does not have to be linear.
Funding your projects
If you need funding for operations, you could apply for a grant for a very creative project on a hot topic that will also fund operations. Align your ideas with larger campus-wide initiatives. This can help generate support, including staff and funding, for your own ideas.
“It’s not enough to just have a user get an image. They need to be able to do something with it.”
It’s not just about the data. It is about attaching meaning to the data. Users attribute meaning to the data based on their own needs. It is important to remember that we cannot anticipate what users will want to do with the data. Users create value when they use, reuse and repurpose the data to create new things.
Questions to ask when working with materials for a digital collection:
- What types of attributes might the user want to query?
- What type of functions does a user want?
- Will they want to zoom in on it?
- Will they want to be able to save it on their own computer?
- Will they want to link it to a map?
- Will they want to be able to add their own metadata to it?
Prototype projects are opportunities to figure out what you really want, what you can do without, and where you might need to go in a different direction than you originally envisioned. Keep an open and flexible attitude. Be open to new ways of thinking and questioning.
Create multi-institutional collaborations and lifelong professional relationships
Think larger and create multi-institutional collaborations. Howard Besser worked on a project that included fourteen institutions: six museums, seven universities and the Library of Congress. Anticipate building lifelong professional relationships with people from working on multi-year multi-institution collaborations. When working on large projects, you will learn “how to try to do a new project, how to work with other people, how to collaborate, how—how to actually get things done.” You will gain experience that can lead to higher-level management positions and full-time project management positions.
“You had to do things to encourage use”
The “if you build it, they will come” notion does not apply to digital collections. You will need to encourage users in order to have users. User adoption is key and this requires promoting and marketing and working directly with users. For example, find ways to encourage faculty to teach with the tools you are producing.
Problems that you did not anticipate or envision will come up. You need to have a stable and steady source of income to pay for unanticipated costs of items taking longer than expected or not working as expected.
Project participants need to be valued
Project participants need to know that they have a voice, that they are being listened to and that their contributions and input to the project are being valued. If not, it could result in “a groundswell of opposition to the project management.” Listen to the participants and “make sure there’s a participant voice” by being sensitive to “when things are brewing and there’s murmurs and disgruntlement.” Ensure that those who are devoting significant amounts of time to the project (on top of their own full-time jobs) are appreciated because “if they are not feeling good about the project—then the project is not going to succeed.”
Advice to new people from Howard Besser:
1) For most important things, you should not do them alone. Things are best done in groups.
2) You need both a short-range plan and a long-range vision. For example, know how you might need to scale and change the process of how to work with a collection of 500 objects when that collection grows to 10,000 or 100,000 objects.
3) Your short-range plan has to fit into your long-range vision. For example, consider how your collection of 500 objects fits with other, larger collections. Think through what data may be useful for future unanticipated uses.
4) Have a larger vision for your project: “Have some vision of the future when you’re doing your little project that would allow your little project to be part of a world of information.”
Main issue: Digital preservation
One of the main challenges is digital preservation since the digital files come in so many different formats. How will we be able to open them all up later?
Main issue: Copyright
Copyright is a huge challenge. Copyright for orphan works is one of the biggest challenges, since institutions often do not know who owns the rights to much of the content in their repositories. Be cognizant of the TEACH Act, which addresses copyright law for distance learning and the distribution of media via online classes, since it does not fully address moving image materials. See The Copyright Clearance Center’s Copyright Basics: The Teach Act (http://www.copyright.com/Services/copyrightoncampus/basics/teach.html).
Upstream and downstream cataloging
Cataloging metadata is necessary in order to retrieve, contextualize and preserve the material. Libraries rely on users to gather the metadata downstream. Both content creators upstream and users downstream have to be involved in cataloging metadata. “What we need to be doing is to be pushing the cataloging upstream and downstream. Upstream to the content creators and getting more of the cataloging that we need, more of the information that we need from the creators and downstream to our users and having them contribute the metadata that we need to—to find things.”
A study of workflows of a born-digital public television show indicated that, “there is a huge amount of metadata that we need for retrieval and for preservation that is known at early stages of the production and is thrown away.” Producers need a standard format tool for gathering and documenting the metadata from creating the television show.
Besser co-wrote the following paper on gathering metadata upstream from content producers:
Besser, H. and Van Malssen, K. (2007). Pushing Metadata Capture Upstream into the Content Production Process: Preliminary Studies of Public Television
Retrieved from www.ils.unc.edu/digccurr2007/.../besserVanmalssen_paper_4-1.pdf
From abstract: “This paper examines the issue of metadata lifecycle management, highlighting the need for conscious metadata creation to become part of the digital media production workflow in order to facilitate effective digital curation.”
Materials in museums and libraries have historically been the “quality material” that “is not representative of the average person.” There is a need to look much more at ephemeral material that’s “produced by ordinary, everyday people as part of ordinary, everyday activities.” Although historically considered marginal, it is rich and valuable for understanding history and society. YouTube movies are examples of ephemeral material that reflects and describes what people are thinking about and discussing today. The NYU Orphan Film Symposium showcases “all manner of films outside the commercial mainstream: amateur, educational, ethnographic, industrial, government, experimental, censored, independent, sponsored, obsolescent, small-gauge, silent, student, medical, unreleased, and underground films, as well as kinescopes, home movies, test reels, newsreels, outtakes, fringe TV, and other ephemeral moving images” (http://www.nyu.edu/orphanfilm/).
Outdated copyright laws
The pre-internet copyright laws are ill-equipped to handle today’s artistic creations, which are “pastiches of previously created things;” new creations are “riffed” where “you take something that someone else made and you re-edit it and put other things around it,” thereby making it your own new creation. As “custodians of culture and our cultural heritage,” we must be concerned not only with collecting and preserving these materials but also with allowing people to re-use and re-purpose them.
“Libraries are not primarily about that physicality. They’re about a place. They’re about a civic engagement and fostering civic engagement.” Libraries and museums are described in the IMLS report on the future of museums and libraries as “third spaces” that “foster civic engagement.” Being “third spaces” gives libraries the opportunity to remain relevant to communities and to reach new audiences. These audiences may not be interested in the library’s physical objects or physical spaces.
In The Great Good Place, Ray Oldenburg defines the significance of the third place in a healthy society. Neither work nor home, the third place is a neutral community space, where people come together voluntarily and informally in ways that level social inequities and promote community engagement and social connection. As public gathering places organized around public service and the transfer of information and ideas across individuals, museums and libraries are a unique form of the third place because of their distinct resources as easily accessible, low-cost barrier places rich in content and experience. (Pastore, p. 9)
Pastore, E. (2009). The future of museums and libraries: A discussion guide (IMLS-2009-RES-02). Washington, D.C. Institute of Museum and Library Services. Retrieved from http://www.imls.gov/about/future.aspx
Ray Oldenburg, The Great Good Place: Cafes, Coffee Shops, Bookstores, Bars, Hair Salons, and Other Hangouts at the Heart of a Community, 3rd ed. (Jackson, TN: Da Capo Press, 1999).
Director, Digital and Preservation Service, BCR, Inc.
Liz Bishoff discusses her role as Executive Director of the Colorado Digitization Project; the importance of building sustainability into a program; and incorporating digitization into the overall library or cultural heritage institution program, not just as a project, but as an integral part of the program.
“While we need to continue to create digital collections, we’re going to have to look at how to incorporate digitization into the overall library program or cultural heritage institution program. It cannot be just a project, it has to be a part of the program.”
Collaborations with other cultural heritage institutions
Collaborations with other cultural heritage institutions are necessary in order to create a fuller picture of Colorado’s heritage. The library, by itself, cannot create the whole picture. You need multiple perspectives.
A common vocabulary
When collaborating with other agencies and institutions, it is critical to develop a common vocabulary that helps the partners share common goals. Developing this vocabulary at the outset will prevent problems later during project implementation.
Digitization as part of the overall program
Digitization must be incorporated into the overall library program or cultural heritage institution program. It cannot be just a project. It must be an integral part of the program and sustainability must be built into the program.
It is critical to achieve sustainability in two specific areas:
- Organizational sustainability: people, money and institutional support
- Sustainability of the digital objects: long-term access and digital preservation
Programs with an institutional affiliation and, more importantly, a solid institutional home and foundation achieve long-term success because they are sustainable.
Applying best practices in other geographical locations
Consider how the best practices that you are developing will be applied to areas around the country and the world that are different geographically, geopolitically, economically and demographically.
Adoption of your project by institutions in other geographical locations
When naming or branding projects and programs, consider whether adding a geographical boundary to the name (e.g., Colorado, Denver, Western) would make it less applicable to other areas later. Ask yourself whether the name would need to be changed if other areas wanted to adopt the program or project.
Digital content is the future of our cultural heritage institutions
There are endless opportunities with digital content. It is the way we will be able to reach everyone, today and in the future, including the geographically disadvantaged.
The Colorado State Library
The Colorado State Library has a history of doing and supporting innovative work and collaborations, including the Colorado Digitization Project and the Colorado Virtual Library.
Director of Digital Libraries, UC San Diego
Robin Chandler shares the dramatic story of how her library’s digitization of five cartons of incriminating evidence led to the University of California San Francisco being sued by the Brown and Williamson (B&W) tobacco company for the return of stolen property. The documents showed that B&W knew that tobacco was addictive and that there were linkages between smoking, heart disease and lung cancer. B&W was attempting to keep documents from being made freely available in the public domain. Plus, learn from Robin Chandler why we need to have a Grateful Dead Archive.
A powerful tobacco company. An anonymous sender. A library scanning allegedly “stolen property”. Death threats. Was it “stolen” property? Does the public have the right to know? Who was Mr. Buds?
“Seize opportunities…there are events that happen around you that can lead to really important things…the youngest person in the room can actually contribute a great deal to something. I’d also say to be brave…don’t be afraid…you can really make a difference.”
In her first job leading a department, Robin Chandler was the head of Archives and Special Collections at UC San Francisco’s Library and Center for Knowledge Management from 1995 to 2000. While only in her mid-thirties, she was managing the Legacy Tobacco Documents Library (LTDL) (http://legacy.library.ucsf.edu/). LTDL currently “contains more than 14 million documents (80+ million pages) created by major tobacco companies related to their advertising, manufacturing, marketing, sales, and scientific research activities” (LTDL Homepage).
Documenting grassroots movements
Consider documenting the process of popular movements. For example, the UC San Francisco Library, being part of an academic health center, had a mission of public health. They were documenting the whole process of a grassroots movement around tobacco in California that revolved around the anti-smoking ballot initiative, Proposition 99. “Prop 99” passed and allowed the state to levy a tax on tobacco sales. Revenues would be spent on helping educate the public about the hazards of smoking. Documenting the process involved reaching out and getting the papers of individuals and organizations that had made this ballot initiative possible.
Advice to young people entering the field: Seize opportunities and be brave!
Robin Chandler was only in her mid-thirties when she took her first job as head of a department, running a Special Collections unit within the University of California system.
- “Seize opportunities…there are events that happen around you that can lead to really important things…the youngest person in the room can actually contribute a great deal to something. I’d also say to be brave…don’t be afraid…you can really make a difference.”
Collection level access vs. Item level access
Item-level metadata creates access at the item level. How do you apply archival principles of arrangement and description to a project with unique challenges?
- “Archivists are usually thinking very much at a sort of collection level or series level and what was really clear about this particular set of materials is that we were really talking about item level access, and each—each letter or memorandum or report was something of unique interest.”
- “Because we knew that the power of the material was going to be in the ability to search the documents. The more possibility that you could mine that material, the better it would be.”
Arm yourself with knowledge
Be knowledgeable about the definition of “property”, prior restraint and issues of publication, the First Amendment, and what is appropriate to be in the public domain.
- For example, the five cartons contained copies of original documents. B&W was arguing that the documents were their property. Essentially, “that meant that what they were really trying to assert was that—that copies were still property. And in a sense they wanted to control the information.”
- UCSF argued that “they were copies, they weren’t the originals, so they really couldn’t use that…sort of argument that—that they had their originals, so how could you argue that you had stolen property?”
- “And it quickly sort of escalated to this really interesting kind of free speech first amendment issue that—could the tobacco companies essentially exercise prior restraint on us to keep us from making the material accessible.”
Focus on sharing science pre-print data
Physicists want to share their pre-print data.
- “Physicists gather data and they analyze the data and—and they’re always sort of on the cutting edge of—sort of computers and the idea of sharing information.”
- “High energy physicists from the beginning are—they’re all about the pre-print…they’re more interested in actually sharing their information even before it gets published.”
- “So it’s just—it’s just this—very much this dissemination kind of culture.”
- “There are only usually a few labs around the world…SLAC [Stanford Linear Accelerator Center] or CERN or KEK in Japan, but you’ve got—you’ve got physicists from all over the world that want to use those machines, so you might have, you know, 2 to 3 to 400 scientists all working on one experiment, but they’re all very interconnected and they—again, they share all of their information.”
- “The physicists are distributed world-wide, and there is—it is no coincidence that Tim Berners-Lee was at CERN and that he invented the Web because of that need to essentially connect physicists.”
Be prepared for backlash (or worse, death threats)
Not everyone in the field will agree with you, even if you feel that you’ve done a very important public service.
- “There were our defenders, that really felt that, you know, we had done something fantastic, that it was just like the Pentagon Papers, that it was this great—great step forward to have put forward all this primary research material. I mean, and that was really what it was. This was making primary research material available on the web. It was a huge thing.”
- “Understandably, the archivists that were working in the corporate sector, were pretty concerned about it…that we had violated—what were archival ethics…essentially violated the trust that an archivist would have with a corporate organization and that we had—we had done the wrong thing.”
- “And not long after the publication I got—I got some—essentially what was sort of hate mail from an individual who said, you know, you have no right to do this and you better watch your back because the bullets are going to be flying.”
Committing to digitizing on a massive scale
When you have information, users and the mechanism to make the information available to users, make the commitment and digitize the content despite the size and cost. You have the “why”. Now focus on the “how”. Find the funding and get it done.
- “Based on what we had done already with Brown and Williamson, and then after that we took on the ‘Joe Camel papers’, which was another litigation that we put the materials up, that—that it was just—it was just clear to me that no matter what the size, that it was going to be used.”
- “And I thought that no matter what it was going to cost to do it, that in some sense the idea that you would put 30 million electronic documents up online was, yeah, a formidable challenge, but it’s kind of like, how many other times in the history of information are you going to have the technology to make something available and then a really good sense of the fact you’ve really got users.”
- “I mean, this is going to be used…you’ve got a mechanism, you’ve got information, and you’ve got users.”
- “I mean, this was scale. This was revolutionary. And it just felt like something that should get done.”
- “They embraced it and we started looking for money, you know, for how to do it.”
Archivists are really powerful people
You need to understand the “real power of records” and “the power of what archivists do”. At the same time, you also need to understand that there must be a balance between that power, the consequences of using it and your institutional responsibilities. Robin Chandler recommends reading Archives Power by Randall C. Jimerson and The Ethical Archivist by Elena S. Danielson.
- “They’re both people I really admire but they—they speak to me about—the real power of records. And the power of what archivists do.”
- “Archivists are really powerful people…we really are powerful people, and Rand, very, I think, eloquently in the sense of thinking about how we have very important roles in the creation of history and memory and—we have very important jobs to shape what happens.”
- “The really important things are thinking about that balance. Because I’ve thought about it…it was a very profound thing to know that one was putting archivists that were corporate archivists in some kind of jeopardy by what we had done and…I considered that a lot. And those were very—they were poignant arguments that they were making.”
- “Understand how powerful you are. You know, don’t doubt it. Don’t doubt it at all. It’s a very powerful, powerful job, definitely.”
Archivists are part of the larger conversation
- “There’s even kind of a larger dialogue that we’re a part of, and it’s especially prevalent now with the interconnectedness of the web that sometimes you need to—help participate in a—in a debate that would end up in a reasoned discussion and hopefully a reasonable way forward.”
A Grateful Dead Archive? Of course!
The Grateful Dead Archive Online (GDAO) http://www.gdao.org/
From the About Us page:
- “What is GDAO? The Grateful Dead Archive Online (GDAO) is a socially constructed collection comprised of over 45,000 digitized items drawn from the UCSC Library’s extensive Grateful Dead Archive (GDA) and from digital content submitted by the community and global network of Grateful Dead fans.”
- “On April 24, 2008 band members Bob Weir and Mickey Hart announced at the San Francisco Fillmore press conference that the group was donating its archives to the University of California at Santa Cruz (UCSC) Library Special Collections. There was a great deal of excitement about making the archives available as a research collection and as a resource on the Internet where the band’s thirty-year history could be interpreted through educational use of archives and artifacts.”
- “The GDAO website is powered by the open-source web-publishing platform Omeka supporting the display of collections and exhibits, social media tools and the uploading of user contributions as well as the community development of plugins to enhance the software.”
From Robin Chandler:
- “And there are people that…dismiss the Grateful Dead for certain associations with cultures of the 1960s, and of course that’s all true, but at the same time, it’s clear to me that historically, in another 100 years, that period of time will be looked on like any romantic period.”
- “You go back, look at anything, like the 19th century, mid-19th century and romantic movements that were happening that were—sort of, in a sense rebelling against the industrialization and these things happen all the time…It’s just part of—it’s part of youth, it’s part of history, to re-envision the world. And in that sense, you know, that’s what the Grateful Dead was part of.”
- “What’s interesting just as an historian is also to look at just how the band changed over those thirty years. And that’s very interesting because you don’t really see—you don’t necessarily have an archival record for something—a group like that.”
- “That period of time deserves to be documented as well, and it should be documented and I think it’s—again, it’s just—it’s part of what we do and what you need to do.”
Advice to young people entering the field
Seize opportunities, be brave, make sure it fits with your moral compass and go for it!
- “Seize opportunities…there are events that happen around you that can lead to really important things…the youngest person in the room can actually contribute a great deal to something. I’d also say to be brave…don’t be afraid…you can really make a difference.”
- “I’d say go for it. You know, when it comes, just go for it. Don’t let the naysayers stop you. You do have to make sure that it fits with your own moral compass. I mean, that’s the most important thing, that it’s something that—morally that you feel comfortable with...And that’s really important because whatever you do decide to do, you’ll need to live with it. But if you’ve got that…then you’ve got everything so…you’re fine. Go forth and prosper.”
Moving archivists and librarians into data management
We need to address records management at the beginning of the lifecycle during R&D. We need to work with faculty and researchers and discuss documentation strategy. Ask questions such as: What faculty papers are important to save? What data do we need to capture?
Manager of Open Collections Program at Harvard University Library
Steve Chapman, who has been at Harvard since 1996, discusses the importance of infrastructure in supporting the long-term lifecycle of collections. More importantly, Chapman urges libraries to take the risks now and aggressively identify materials of value, especially materials in the 20th and 21st centuries that are truly at risk of being lost, and assert our role by making copies or changing the system so that we can assert the right to make copies. “Because to do nothing is to make the stuff obsolete.”
“I think that the measure of our success ten years from now will largely map to how open we are. Not how much content we’ve made, not the quality of that content, how good it is, but how open it is.”
The process of developing best practices
Do, practice, distribute and engage
- “Cornell’s posture then and for a long time thereafter was to—learn by doing and then as practitioners, to then distribute guidelines and try to engage others.”
- “So you can engage other people formally by hosting workshops or you can engage other people, you know, just through the professional literature.”
- “So putting out guidelines and practices as Cornell was doing them, you know, created a vetting opportunity and others modified those kinds of things. And other—you know, future standards and best practices emerged. But certainly Cornell was interested in putting their practices out there.”
Infrastructure: vision and knowledge-building
When sharing an infrastructure, for example in a training workshop where you are sharing best practices, you are providing a framework for the audience to bring back to their own institutions.
- “And the knowledge building is just a huge piece of infrastructure, you know, just figuring—all the policies, the procedures, direction setting.”
- “It’s not just the vision thing, it’s really—just at the operational level, what is it we’re trying to do, how do we organize ourselves to do that.”
- “So I think the Cornell workshop was a good catalyst for people to go back to their home institutions with some questions and some organizing framework, particularly in the narrow realm of preservation copying of brittle materials.”
Infrastructure: staff, space and equipment
Staffing is key. You won’t know how to do everything. Know what you do not know and find a way to get those who do know to be part of your team.
- “The key piece of infrastructure that made all of this happen in imaging services was staffing…staffing, space, equipment.”
- “Everything had to be developed and I knew what I didn’t know so that’s why I got support to bring in a consultant. And that—that helped a lot. That helped a lot, to have somebody come in and help us do our needs assessment and fit out space.”
Siloed projects and the lack of infrastructure
In 1995-96, the senior leadership of the Harvard libraries recognized two major issues, at Harvard and beyond: siloed projects and a lack of infrastructure (knowledge, systems, and services). First, institutions were creating very locally focused digital collections with their own custom metadata, which ultimately were not interoperable with other collections. Second, there was no infrastructure to support the long-term lifecycle of the collections.
- “The first was that silos were emerging in the libraries and the academic departments. So you had libraries making websites, you had different people in the libraries making small exhibits and probably more importantly, on the cataloging front, you had people using different vocabularies to make different kinds of databases that were searchable to patrons. And they weren’t talking to each other.”
- “And so if we did nothing, the one thing that they observed that was going to keep happening is, of course, libraries were going to continue to embrace the internet. And they were going to continue to kind of work narrowly and locally to develop the kinds of protocols for metadata—even though we weren’t using that term then, but you know, for metadata and for digital objects. And there was going to be a problem because these things weren’t going to talk to each other.”
- “So if we give ourselves the ability to make digital things, we also needed and lacked the infrastructure of—of knowledge, systems, and services. I think of those as sort of the three legs of the stool.”
- “Sort of looking at this broadly, if we create things in digital form, do we have the means to manage, preserve, describe, permit discovery, delivery, and use. And the answer to almost all of those questions was either no or not enough.”
Access to resources from anywhere in the world
The library’s role is “to mediate between producers of information and users” and to satisfy the faculty’s and students’ need for access to resources from anywhere in the world.
- “A big mandate that—that has been there for the research library was—was always there, and I think it’s shaped the evolution of technology and infrastructure of the campus, and this was to mediate between producers of information and users. And it’s not simple.”
- “So our faculty as the top—always at the top of the food chain in universities, of course, you know, our faculty as huge consumers with huge appetites where the proverbial reader in the small—demanded quickly to—have access to resources regardless of where they were located.”
- “And so—so now as a research library’s role, our role wasn’t simply limited anymore to providing services between the content that we physically had at this institution and their needs.”
- “If we have faculty that are interested in biodiversity or religious studies or fine arts, we don’t have all of the materials that they need at this institution, and it—a lot of pressure that’s been there from our key constituents in universities -- faculty, students, independent scholars -- a huge mandate that’s been placed upon us ever since the internet has been to facilitate the discovery and distribution and acquisition of content and services from anywhere in the world.”
“Moving from a local mindset to a collaborative and global mindset”
So many initiatives and stand-alone projects have a local focus and are not collaborative. We need to move to a more collaborative and global mindset and have a portal that provides access to the very best materials from a wide variety of institutions and collections.
- “I think that it’s been a very, very long path to moving from a local mindset to a collaborative and global mindset.”
- “So if we want to develop a portal, and have sort of a best of breed experience, that the best publicly domain materials—historic materials that are available in this field, are distributed widely, let’s acknowledge that and from a collection development perspective, work collaboratively so that the user has the convenience of a portal.”
Three successful parts of a collaborative digital project
Collaboratively use technology to produce content
A distributed architecture for discovery
- “So we had the collaborative nature of the funding, the—and technology development—the collaborative nature of producing content, but the third thing that they didn’t try to do was have a distributed architecture of discovery.”
- “What was available at that time were repositories of metadata, the protocols that we are using today to create persistent links, and persistent naming and name resolution have been around for a long time, number two. So one can populate these catalog records with persistent links.”
- “And number three, we have had local—and we should, continue to have local servers, even if they weren’t called repositories, we had locally managed servers and storehouses for digital objects and to federate access and delivery of content, one does not need to have all data objects in a single repository, you just need to have links.”
- “And then you need to have policies at institutions that would say, regardless of where our catalog records end up—this is really important to me—regardless of where they end up, and in particular, when they end up outside of our own domains, for example, when we—load our content to WorldCat, we do not block access between that link and the digital object.”
- “If we—if we are truly open as organizations, our links travel with our records, and that link resolves to an object where nobody needs to be authenticated or authorized to use it.”
The next big challenge
How do we deliver content for multiple equations?
- “There are two present and future challenges. It’s the policy first and the technology -- the technology/implementation second.”
“It’s a question of openness”
Being open means not claiming rights, providing open APIs between repositories, and not requiring anyone to be authenticated or authorized to access records.
- “If we—if we are truly open as organizations, our links travel with our records, and that link resolves to an object where nobody needs to be authenticated or authorized to use it.”
- “I think that we really need to be better educated about what the rights are to this material, and I think that our behavior should reflect that.”
- “So if we cannot claim any right to our objects, then I think our behavior should be as transparent and as open as possible.”
- “So if there are things that we’ve created and acquired that truly are open for anybody to use, I don’t think that we should have any limitations to the way that they use those.”
- “And the logical technical implementation that would follow from that is that we would have open APIs between our repositories and all application developers to say, you know, here are the technical means by which you can get to our objects, they’ll do whatever applications you wish because these data are completely open. We’re not claiming the rights for this kind of stuff.”
Libraries should be focused on open distribution and not on developing their own apps
- “Now, we might want acknowledgement, but if we’re not trying to control access, if we’re truly open about this, evidence of that would manifest itself in policies, in APIs, and a proliferation of technologies and tools largely—I hope, that the majority of those in the future will be ones that we don’t make.”
- “That, with all of the people who are writing—I mean, who’s writing all of the apps? For mobile phones and for Apple? I mean, there’s so many people who are writing the apps, we shouldn’t be developing apps for our content in libraries—we don’t need to. I mean we can follow.”
- “Be in the content management business and be in the open distribution business and be in the business of continuing to respect the privacy and the rights of users.”
The library’s position on open access
Libraries have a stake in creating and protecting the information commons and materials in the public domain.
- “So our position on open access in both independent and collaborative ways really, really needs to be defined.”
- “I think that the measure of our success ten years from now will largely map to how open we are. Not how much content we’ve made, not the quality of that content, how good it is, but how open it is.”
- “I really think that that—in all domains, in the legal, the financial, and the—technical domains, universities in particular with other affiliates really need to be talking much, much more about open access and what that means. And not just access to journals. Not just access to scholarly publications. Not just access to the things that we’ve digitized, but all the material that’s being created today.”
- “In the for profit realm, as big aggregators put more pay walls up, it’s going to—that that behavior of pay walls and subscriptions to things that are ostensibly free today, I think will really heighten user’s awareness—scholars and the general public—I think it will really heighten their awareness of the value and fragility of having things in an information commons.”
- “And things that are really in a public domain, that are commonly held, and we have a stake in creating that and protecting that realm.”
Deconstruction and reuse
When libraries make materials openly accessible, then we facilitate the re-use of materials in new and creative ways.
- “On the technology front I think it will be exciting to see our created material be packaged in tons of new ways. I’m really excited by deconstruction and reuse.”
- “It’s great that we’ve made collection websites, it’s great that we have digital objects, but I’m really excited about better object characterization and people just taking things apart and using components of what we’ve made in creative ways that satisfy their needs. And more power to them.”
Have a long-term and global view
Libraries need to see beyond the short-term thinking of their own institutions, take a long-term view, and collaborate with others.
- “If you have people looking at the long term, then I think trying to retain control over the commons is a more powerful mandate and open access is more powerful when you look at things in the long term rather than the short term.”
- “I think the collection development issues are very profound, that we have a collective challenge in partnering with organizations to try to have enough material that’s available openly, that can be packaged and aggregated and distributed to meet research and teaching needs into the future.”
- “No single library is going to go it alone to do that anymore. So how can we collaborate to really not only meet the needs of today’s teachers and students but also have some policies and programs in place so that when—not if—when this content is distributed electronically all over the place, that libraries will be key players in identifying, federating, packaging, aggregating these kinds of things in a way to meet the teaching function.”
- “And there’s a lot of challenges of us to do that. But I think you need long-term thinking. I think you need a global thinking now to do that, I don’t think you can look specifically to the needs of your own institution and try to build all of these services just locally. So this collaboration is important.”
Preserving 20th and 21st century materials
We must preserve more of the 20th and 21st century materials.
- “This is [the] time—that this is a good opportunity to put the challenge of triage out into the public record…and triage is the means to the end. So our end goal, of course, is to continue to sustain access literally to the historic artifacts or when that’s not possible, to good representations of those things.”
- “And so the means to facilitate continued usability of cultural heritage material—is what’s all about.”
- “Whatever financial and technical and legal and—means we can use to meet that preservation function is critical.”
- “That’s our ongoing role, to—to make sure that historic materials are appropriately described, discoverable, and either those materials or adequate representations thereof are made available to people.”
Act and make copies now
Libraries need to take risks and assert their rights through action: identify materials of value and make copies of them. “Because to do nothing is to make the stuff obsolete.”
- “Too much of our attention has been focused on things that are unambiguously in the public domain where we can assert the right to copy.”
- “The University of Michigan, in partnership with Google—but they were doing this long before Google came along…I believe that they have asserted a right in action.”
- “They’re not asserting this legally, they’re not asserting this legislatively, just in their action, they are asserting a right to make copies of material.”
- “Because to do nothing is—is to make the stuff obsolete.”
- “I really think that we have to sustain all of the attention that we have been paying to our heritage, pre-1900.”
- “But we—in every institution I think we have to just really move much more aggressively—aggressively and in a much more risk-embracing sort of mindset, rather than a risk-averse mindset to—to identify material that’s of value, that’s truly at risk to being lost, and assert our role either to begin making copies or to change the apparatus around us so that we can assert—assert the right to do that.”
- “Even if we don’t permit ourselves right now to copy it, to provide any discovery services from it, just to capture this and park the data and—figure out what to do with it.”
- “And to figure out what to do with it in highly collaborative ways where you have people who are really passionate about it and have things at stake really working on—on solutions around the content to promote more discovery.”
Reasserting the library’s role in information management
We can reassert our role in information management by collaborating internationally.
- “But we are in a world of silos today, and it’s not good enough. And I don’t think—I don’t think that silos alone—merit huge public support.”
- “But the portal, the sort of central point of service to—to get people to those places, it has to be something other than one just controlled by the commercial world.”
- “So, we have Google, we have Bing, we have these different kinds of things, and the commercial realm is moving into the space that’s always been ours.”
- “And if we want to reassert our role and our credibility to work in that space of information management, organization and use, I think that we have to find the means to collaborate internationally.”
Ask the “why” question
There are people who ask “why” and there are those who just want to be told what to do. Be one of those who ask “why?”
- “I think that as a manager, I try to impart rationales with procedures and some people take them up and some people don’t. And I think that people who are interested in rationale and are interested in the “why” question, I think those are the people in our organizations—whether it’s in universities or in our field as practitioners—I think those people self-identify themselves.”
- “They’re people who just kind of step up and they self-identify themselves and I think all of the people that you talked to, I hope that what we all have in common, through this self-identification, is a curiosity to look behind the what and to ask why.”
- “And you need a certain number of people to do that, and those people then depend on all of the “just tell me what to do” folks to implement it. And—and seeing all of us as being in one community where we value each other, not to get too touchy-feely about it all, I think is a positive step forward.”
Associate Dean, Library Digital Programs, Johns Hopkins University
Sayeed Choudhury discusses his role in establishing the R&D Group that applied engineering principles to building the Lester S. Levy Collection of sheet music and the Roman de la Rose Digital Library of medieval manuscripts. Choudhury discusses the critical role of the Data Humanist or Data Scientist as an intermediary between scholars and technologists.
“More than anything, how do you take data that was created for a particular purpose and repurpose it for other uses?”
Large projects require systems
Handling large collections with hundreds of thousands of pieces (e.g. developing a digital collection for the 130,000 sheets of music of the Lester S. Levy Collection) requires developing a workflow to handle the materials as efficiently and effectively as possible.
Delineate roles: human versus machine
When scaling up, determine the tasks or materials that either machines or humans can handle more efficiently. Can a task be automated via hardware, software or robot? Large amounts of material can often be handled more efficiently through automation. That automation can also often be outsourced to a commercial vendor.
Each institution does not have to do every part of the project. It may be more efficient to outsource parts of the project to a vendor, consortium or another library.
When requesting funding for a proposal, you will need to be able to articulate the answers to these questions:
1. How much do you need?
2. What do you need it for?
3. What are the long-term implications of what you are proposing to do?
Institutional support often determines sustainability and can be the difference between a project and a program. When you have a committed institution, then your project can become a program that is part of the institution’s long-term strategy.
Aligning your grant and project with the library’s priorities
“Because the best sustainability plan quite frankly is to align internal library priorities with the grant or the proposal priorities. If they’re out of alignment, I don’t think you’re ever going to be able to get them back in alignment, even if you’re successful with the project itself.”
A Library R&D Group
An R&D Group within a library broadens perspectives and introduces new viewpoints within the library.
Applying engineering principles to projects
Engineering principles such as integrated workflows and efficient practices can be applied to projects.
“It’s about a whole new way of approaching how people interact with this content.” – Sayeed Choudhury
Ask yourself what the potential uses are for the data:
• How can the data for and/or from the project be applied more broadly?
• What kinds of analysis can you run on this data?
• How can the data be repurposed?
Yet unimagined uses for the data
If you digitize content for one project or create a repository infrastructure for born-digital data, ask yourself, “what can you do with it that they did not imagine?”
“More than anything, how do you take data that was created for a particular purpose and repurpose it for other uses?” – Sayeed Choudhury
Projects are not self-contained
Researchers may have individual digital projects that seem to have unique needs and to require new hardware and technology. Even so, ask yourself which other projects in your digital program could use the same hardware and technology. Gain a sense of “moving from the mindset of individual projects to these are common pieces of infrastructure we will use across our digital programs.”
For example, the success of the Roman de la Rose Digital Library caused other researchers to ask about digitizing other manuscripts. The same infrastructure that supported the Roman de la Rose Digital Library could work with other manuscripts.
Engage the scholars early in the process
Identifying faculty champions early in the process can help make your proposals stronger.
Find champions for your project
Find faculty, post-docs and grad students who can be champions for your project. These are the people who will talk about how to use the content and data. Tenured faculty members tend to be more willing to experiment since they have more security.
Data Humanist or Data Scientist
“So just as there are these technology interfaces that exist between what scholars do and what we build, there are human interfaces that are really important.”
Data Humanists have expertise in particular scholarly disciplines and act as intermediaries between the scholars with the needs and the technologists building the software or hardware to meet the needs. Data Humanists have a unique perspective on how the technologists can better meet the scholars’ needs.
The library as partner, not service provider
When the library and faculty work together to secure funding and the library is either a PI or co-PI on the project, the library is viewed as a partner rather than as a service provider.
Finding a sense of community at conferences
Conferences are opportunities to gain a sense of community with others engaging in similar research and share lessons, tips, and standards.
The next big technical challenge
“One of the other things I’ve learned about technology is there’s always the next big technical challenge.”
Associate Professor of the School of Information, University of Michigan
Summary: Learn about Paul Conway’s fascinating story of how a project he was doing (and complaining about) at the National Archives ended up teaching him more than he ever imagined and led to his position as Head of the Preservation Department at Yale. Are you embarking on a digital project? Conway shares how to determine if you should build or buy, do-it-yourself or use a vendor. Conway discusses how libraries need to let go of perfecting digitization standards and workflows, and instead focus on why libraries exist in the first place: to get materials and content out for people to use.
Quotes: “The day is coming when all of our fixed visual resources are going to be available digitally. And if they’re not, they don’t count. They don’t count.”
“My biggest single concern was the time it took for the library and archives community to come to a consensus over what the best way to get the job done was.”
Seize opportunities to be mentored by leaders with a clear vision and plan
If you are fortunate enough to have the opportunity to work with someone who has both a clear vision and plan, seize it!
- “So I had the good fortune of being—mentored/supervised/forced to do a very clearly defined project with a very clear deliverable that resulted in teaching me more than I ever imagined. And I came away interested and fascinated with the—with the potential.”
- “So I spent a year complaining, essentially, and doing the project. But I came out the other end of it, knowing a whole lot about what standards are, how they’re developed, what government agencies decide to use or not to use standards in—in the technology arena, how they sort out what they’re going to do, how they work with vendors, …”
- “So I was just—it was just fortuitous timing, a boss that cared, was willing to be patient with somebody who complained all the time. I’ve told this story about be careful what you complain about because you never know when what you think is the worst job you’ve ever had may turn out to be your ticket for the next decade or two.”
Project Open Book
Yale and Cornell each wrote a grant to the National Endowment for the Humanities (NEH) to digitize books. Michael Lesk of NEH challenged the cultural heritage sector, and Yale and Cornell in particular, to test which approach was more cost-effective and produced a better-quality product: scanning or filming. Yale decided on filming and Cornell on digitizing.
Conway at Yale and Anne Kenney at Cornell decided to do a dual study in which they tried to control for as many variables as possible and do as many things as possible in the exact same way. That way, the only two questions left to answer were:
- Is it best to scan first or film first?
- Which had better quality?
Conclusion: “We found almost no statistically significant difference between the cost of scanning from the original and the cost of scanning from microfilm. And yet there were very significant differences in quality.”
Cost, control and obsolescence are major issues when dealing with vendors. Use caution!
- “The real challenges that I faced were vendor—relationships with vendors. And I use vendors very loosely defined. In some cases it’s services providers, vendors as service providers, and in other cases it’s vendors as product developers.”
With the National Archives project, Conway discovered that “the real key to understanding the future of digital imaging in federal agencies was to understand the deals that federal agencies were making with an industry that was just itself beginning to stabilize” since “there weren’t enough international standards yet, they were starting to emerge.”
- “The—highly competitive storage medium—competition in the storage area, competition in the tools area, competition in the workflow, software area—every one of those competitive pieces generated income for private industry via contracts, procurement contracts with federal agencies.”
At Yale, Conway used a beta version of Xerox’s Documents on Demand (XDOD). Xerox’s planned obsolescence and insistence on supporting only the newest version caused Yale to sever their relationship with Xerox. Moreover, Xerox never developed XDOD.
- “In the end, what I thought was going to be about scanning turned out to be all about file management, all about vendor relations, all about storage in a completely proprietary environment, in which we were completely dependent upon Xerox for everything, from supplying the next upgrade, installing the next upgrade, telling us when the next upgrade was going to come, insisting that we buy the next upgrade because if we didn’t, they weren’t going to support the immediate previous.”
- “So the obsolescence problem, which has become an obsession with the cultural heritage sector, was—real—in the early 90s.”
Build or buy?
When does it make sense to build or buy?
- “The question was, and still is for many libraries and archives, build it or buy it. And at the time, the feeling in the mid—early to mid 90s was that it was much more cost effective to acquire technology externally.”
- “Form good working relationships with the vendor community. And in doing so, in a sense, we fostered the development of tools that meet our needs.”
- “So the idea was to plunge in, form these relationships, perhaps at the R&D level, at the beta prototype level, so that we can have some influence over what—what these tools were going to be like, so that we can continue this relationship. And it was far more effective than building these tools from the ground up.”
For small pilot projects: Build
You have the flexibility to build for very small pilot projects to which you have not fully committed yet.
- “On the very small pilot project when you’re not committed and you’re not sure. You want to experiment. Then buy some tools, repurpose some staff, find a dark room, go to work. And test and evaluate and figure out what you can do and what your organization can bear.”
For large projects: Build collaboratively
Examples of large-scale projects that used a collaborative approach include HathiTrust and the California Digital Library.
- “If you’ve got a particular scale and a level of institutional commitment, then building makes a whole lot of sense.”
For middle sized projects: Buy
It is not cost-effective for most of us in the middle to build.
- “It’s in the middle, which is where most of us are…almost the smallest of the small up to the largest of the large, the great middle…I don’t think it makes economic or technological sense to build...It’s not cost effective to do it. For lots of reasons.”
- “But who’s going to do it? You know, it’s—the software’s free but the support of it isn’t.”
The true cost of do-it-yourself projects
When does it make sense to do it yourself or to use a vendor?
- “And the fundamental—problem with home grown digitization services is throughput efficiencies. In order to—pay the cost of the hardware and the software and the people, you need to keep the equipment running like a factory. Preferably two shifts. And libraries don’t work on two shifts. They don’t work on one shift.”
- “Vendors can run two shifts. Vendors can supervise staff who specialize in different tasks. Vendors can have seven different pieces of equipment optimized for seven different types of material. And it’s very, very—when you do the math on what it costs to actually run an in-house shop, it—it really takes your breath away.”
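The “math” Conway refers to can be sketched as a simple cost-per-image calculation. This is only an illustration: every figure below (equipment cost, hourly labor rate, scanning rate, working days) is a made-up placeholder, not data from the interview, and the function name `cost_per_image` is hypothetical.

```python
def cost_per_image(equipment_cost, labor_rate, images_per_hour,
                   hours_per_day, working_days):
    """Annual cost per image: fixed equipment cost plus labor, divided by output."""
    hours = hours_per_day * working_days
    images = images_per_hour * hours
    total_cost = equipment_cost + labor_rate * hours
    return total_cost / images

# Placeholder figures, assumed purely for illustration.
EQUIPMENT_COST = 100_000   # annual hardware and software cost
LABOR_RATE = 25            # cost per staffed hour
IMAGES_PER_HOUR = 100
WORKING_DAYS = 250

# An in-house shop that runs the equipment four hours a day.
part_time = cost_per_image(EQUIPMENT_COST, LABOR_RATE, IMAGES_PER_HOUR, 4, WORKING_DAYS)

# A vendor that runs the same equipment for two eight-hour shifts.
two_shifts = cost_per_image(EQUIPMENT_COST, LABOR_RATE, IMAGES_PER_HOUR, 16, WORKING_DAYS)

print(part_time, two_shifts)
```

Under these assumed numbers the fixed equipment cost is spread over four times as many images in the two-shift case, which is the throughput effect described in the quotes above.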
Fun, but expensive and unsustainable
When you run project operations in-house, you “trade cost effectiveness for fun.” An in-house operation is also unsustainable in the long run.
- “It’s fun though. See, this is the thing. You trade—you trade cost effectiveness for fun. Or you trade fun for cost effectiveness. Yeah. And because it’s—it’s not all that fun to box all your stuff up and put it on a FedEx—truck insured and send it to Minnesota so that it can be digitized and six weeks later you unpack the boxes and make sure that the vendor did the right thing. That’s not fun.”
- “It’s fun putting your hands on a piece of equipment and getting dirty and making something happen. So—and that’s—and I can understand that because it was fun. While it lasted. It’s not sustainable.”
Libraries need to get the job done!
Instead of focusing on “getting the job done” so that users can have access to digitized materials, libraries, in their aversion to risk, want to perfect standards and workflows. Libraries exist “to get content out there that people could use,” which is “the most compelling argument to just move ahead” and “get the job done.”
- “My biggest single concern was the time it took for the library and archives community to come to a consensus over what the best way to get the job done was.”
- “The more we try to perfect the workflow, and the more we try to establish just the right—perfect technical guidelines, the more our focus and our—gaze shifts to what do the materials need that we want to digitize instead of what do the users who are going to use our digital products need.”
- “And so the more we focus on process and the more we focus on technology, the more—the more we lose sight of the fact that real live people want to get their hands on this material and that their definition of perfection or their definition of okay may be different from our technically driven definition of perfection and okay. And that gap is growing rather than shrinking.”
- Conway, paraphrasing MacKenzie Smith: “You can only take guidelines and best practices so far and at some point you have to decide to do the job.”
- “I think the biggest single argument for mass digitization is because that’s the way the world works now, and it will be working that way for the foreseeable future. So if we want to be a part of the way the world is now, then let’s get on board and get the job done. And if we don’t do it, Google, Microsoft, or other organizations that are in the content business, are going to do it.”
- “Entertainment industries, they’re going to do it. The movies are going to get digitized by the Hollywood back offices. And the sound motion picture—the sound recordings are going to get done by the people who see commercial value in the—in the bits and bytes.”
- “So we’ve got to do it. If we want to—if we want to manage it and own it and control it and foster open—uses in new and interesting ways, then we’ve got to do the work.”
Wasteful quest for perfect standards
Libraries lost time, momentum and leadership in the digitization field due to the wasteful and drawn-out quest to perfect standards. Libraries spent so much time coming to a consensus on digitization standards and trying to perfect the workflow that Google, Corbis and Getty Images came to dominate the digitization field.
- “My biggest single concern was the time it took for the library and archives community to come to a consensus over what the best way to get the job done was.”
- “And in—in the process there’s been a tremendous amount of wheel reinvention. Issuance of new and better guidelines. Guidelines tweaked for a particular audience. Guidelines more specific. Guidelines less specific. Guidelines that are regionally based. Guidelines specially for archivists. Guidelines especially for librarians.”
- “But my research has shown that by about 1999 or 2000—I could almost put a specific date on it—the people who had been experimenting knew what the answer was and had already developed a well-documented set of guidelines that could have been—either turned into standards or could have somehow been adopted, in some—through some mechanism not fully clear.”
- “But instead we’ve spent another decade—still trying to figure out what the right guidelines are. And then Google steps in and does all the books and—Corbis comes in and does all the photographs and Getty steps in and does—Getty Imaging does more photographs.”
- “And I fear that the scale—our ability to—to migrate our collections into the digital realm at a scale that it’s worth doing—we’ve lost precious time. And lost precious momentum in an effort to perfect the workflow. And that bothers me. Even today. It—it bothers me a lot.”
If it’s not digital, does it exist?
- “If it isn’t digital, it either doesn’t exist or it doesn’t matter.”
- “The day is coming when all of our fixed visual resources are going to be available digitally. And if they’re not, they don’t count. They don’t count.”
Digitizing all photographs
It is possible to digitize all photographs since there is a finite set of photos from the last 150 years.
- “And now we can—you can’t not get up in front of people and say, well, in a decade maybe all the photographs are going to be done. They’re easy to do, they’re fun, everybody wants photographs, there’s a finite set—yeah, there’s lots of photographs, but the photographic era is over. Kodak isn’t producing film anymore, nobody’s buying film cameras. We effectively have a 150-year period in which the concept of still photography exists. Why not to say, let’s digitize everything. Like Google did every book. Just do it all.”
The marginalization of smaller organizations
Libraries can collaborate with smaller organizations, e.g. historical societies, to digitize and preserve their materials and thereby broaden their reach.
- “I worry about marginalization of the smaller organizations that want to be and are online but are highly selective. They’ve—they’ve got 27 things on their website. And there’s lots of other good stuff.”
- “Every—every one of these organizations either was founded or continues to exist because its collections resonate with someone. And there’s always something interesting and good in every one of these—they’re all valuable.”
- “That’s why I’m a proponent of large-scale digitization as an antidote to historical skewing.”
- “The more that we can do this collaboratively, the more that we can—push the costs down and do—and be realistic about the quality requirements, the more we can get done. And the more we can get done, the broader the reach. We can reach into the smaller organizations. We can reach deeper into collections that aren’t considered treasures, they’re just considered good stuff.”
Historical and cultural skewing due to digitization
What happens when a young person’s knowledge is based only on what he or she can find online? Their view of the world will be defined by what they find online.
- “And then there’s the skewing of our view of what history is all about. If the histories get written, if my high school son is going to write a history paper and it’s only stuff that he can find online, if a doctoral student is going to choose a dissertation topic because he or she has found an invaluable resource of newly digitized records and it makes it possible to do this dissertation without spending 18 months in Tanzania, okay. So then that collection is going to define Tanzania. That dissertation, that book, those articles.”
- “I worry about our grandchildren coming up with their high school textbooks, their view of the world, which is going to be written by people who write digitally.”
- “Documentary films are being made because the material can be made available digitally. Music videos, the online environment and then formal histories, English literature, on and on.”
- “So I, at the meta-level, I’m very worried about a kind of cultural skewing that is almost inexorably driven by digitization.”
- “That’s why I’m a proponent of large-scale digitization as an antidote to historical skewing.”
Senior Program Officer in OCLC Research
Ricky Erway discusses being hired by the Library of Congress right out of library school and being involved with the American Memory Project, which she describes as “the crown jewel of the National Digital Library.” Erway challenges us to focus on digitizing for access instead of for preservation.
“Getting people to follow metadata standards is almost impossible, and yet every project you hear about to this day, that’s one of the first things they do.”
Standards “are like toothbrushes, nobody wants to use anybody else’s.”
“We talk too much about it, we focus too much on it. I mean, standards—I heard recently— standards are like toothbrushes, nobody wants to use anybody else’s.”
“Getting people to follow metadata standards is almost impossible”
“Getting people to follow metadata standards is almost impossible, and yet every project you hear about to this day, that’s one of the first things they do.”
With Dublin Core, where you have fifteen simple elements with very few requirements, the goal is to be able to map it to another metadata schema. However, “in the end, when you’re mapping different metadata schemas, you end up just dumbing it down to the lowest common denominator.”
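The “lowest common denominator” effect can be illustrated with a small crosswalk sketch. The local schema, its field names, and the sample record are invented for illustration; only the target element names (`creator`, `title`, `date`) come from simple Dublin Core.

```python
# Hypothetical crosswalk from a local sheet-music schema to Dublin Core elements.
# The local field names are assumptions made up for this example.
CROSSWALK = {
    "composer": "creator",
    "lyricist": "creator",
    "song_title": "title",
    "publication_year": "date",
}

def to_dublin_core(record, crosswalk):
    """Map a local record to Dublin Core; fields with no mapping are dropped."""
    dc = {}
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None:
            continue  # no equivalent element, so the detail is lost
        dc.setdefault(element, []).append(value)
    return dc

# Invented sample record.
sheet_music = {
    "composer": "A. Composer",
    "lyricist": "B. Lyricist",
    "song_title": "An Example Waltz",
    "publication_year": "1899",
    "plate_number": "1234",
}

mapped = to_dublin_core(sheet_music, CROSSWALK)
print(mapped)
```

In this sketch the composer and lyricist both collapse into `creator`, and the plate number disappears because it has no mapping. The record can now be aggregated, but it is less precise than the original description.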
Archives for local use
Often, even with a well-defined standard like EAD, metadata records do not include the institution’s name since they were intended for local use. The institutional name and unique identifier would have to be supplied. Often, those were the only two dependable elements on a large project.
“Access needs to persevere”
Both the content and how that content is accessed need to be updated. Focus on access to content rather than on preservation-quality digitization.
Free-text searching, search results and access
Focus on access quality and improving full-text access to content.
“For my whole career of working with providing access to digitized collections, I would spend more time thinking about how to provide useful feedback from a free-text search.” – Ricky Erway
Focus on ranking results to improve access.
“When you’ve got all the text in books, when you’ve got great descriptive—great descriptions of archival collections, lots and lots of words, why not let people use those words and then focus on ranking their results, offering faceted browsing through the results, finding ways to extract from all those words, you know extract personal names, extract subjects, we can do all those things now, and I, you know, wish we had thought more about that along the way.”
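As a rough sketch of what “focus on ranking their results” and “faceted browsing” can mean in practice, the example below ranks records by how often the query words occur in their descriptions and then counts the results by type. The sample records, the scoring rule, and the function names are assumptions for illustration; a production search system would use a more refined ranking method.

```python
from collections import Counter
import re

# Invented sample descriptions, used only for illustration.
RECORDS = [
    {"id": "r1", "type": "photograph", "text": "Portrait of a miner in a Colorado mining camp"},
    {"id": "r2", "type": "map", "text": "Map of mining claims near Denver"},
    {"id": "r3", "type": "photograph", "text": "Street scene in Denver"},
]

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(query, records):
    """Return matching records, highest count of query-word occurrences first."""
    terms = tokenize(query)
    hits = []
    for rec in records:
        counts = Counter(tokenize(rec["text"]))
        score = sum(counts[term] for term in terms)
        if score > 0:
            hits.append((score, rec))
    # The sort is stable, so ties keep their original order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in hits]

def facet_counts(results, field):
    """Count results by the value of one field, for faceted browsing."""
    return Counter(rec[field] for rec in results)

results = search("mining denver", RECORDS)
print([rec["id"] for rec in results])
print(facet_counts(results, "type"))
```

The map record ranks first because it matches both query words, and the facet counts give users a way to narrow the results by material type.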
Special Collections: Digitizing for access
For special collections, digitizing for preservation is “slow, expensive and not very productive.” Focus instead on digitizing for access.
“In special collections, where you’re going to preserve the original collections, maybe we can just start thinking about digitizing for access, making a good enough copy to improve access, and then putting our efforts towards preserving the originals.”
In Shifting Gears: Gearing Up to Get Into the Flow, Ricky Erway and Jennifer Schaffner outline eight points describing the shift from preservation quality to digitizing for access for special collections (non-book collections, such as photographs, manuscripts, pamphlets, minerals, insects, or maps):
1. Access vs. preservation—Access wins!
2. Selection has already been done
3. Do it ONCE (then iterate)
4. Programs not projects
5. Describing special collections: Take a page from archivists
6. Quality vs. quantity—Quantity wins!
7. Discovery happens elsewhere
8. Brother can you spare a dime?
Erway, Ricky, and Jennifer Schaffner. 2007. Shifting Gears: Gearing Up to Get Into the Flow. Report produced by OCLC Programs and Research. Published online at: www.oclc.org/programs/publications/reports/2007-02.pdf
Google is the aggregator and the portal
Focus on getting materials to where users are looking.
• “Google is the aggregator...you’ve got to plan for that”
• “It’s not about offering a great interface. It’s about getting it into Google.”
• “The Library of Congress and the Smithsonian might be able to create destination sites, but the rest of us should really think about how to get our collections into Google and into, you know, the places that researchers and citizens are likely to look, so this idea of building these handcrafted beautiful portals is really sort of—that time has come and gone.”
Re-digitizing due to change
We assume that materials will only need to be digitized once. However, because standards, user needs and technologies change, materials may need to be re-digitized.
Rapid capture of special collections
Google and the Internet Archive are rapidly digitizing books with advanced scanners. What about special collections (non-book collections, such as photographs, manuscripts, pamphlets, minerals, insects, or maps)?
Programs, not projects
If your digitization project is to be sustainable, then digitization has to be what you do, not what your project is. Digitization needs to be part of your institution’s program and the institution needs to commit funding and IT support.
Short-term funding resulting in short-term projects
The reality is that special projects are expected to be done on a tight deadline on top of the normal job load. When the grant funding ceases, neither time nor labor is allotted to that special project. As a result, what was digitized and made accessible disappears.
Build compelling prototypes that tell compelling stories
A great prototype can be used to great advantage to share the vision of your project and to obtain philanthropic funding. A two-year digitization project will not have much to show in terms of results until close to the end of the two years. Building a compelling prototype with a compelling story will not only help persuade funders, it will help you get through the periods when you do not have results yet.
Outsourcing to experts
“You can’t be expert in everything and that there are experts out in the world and you should take advantage of them.”
Harness the expertise of people at other institutions, of consultants and advisors, and of outsourcing companies. For example, outsource the imaging to a company with the right type of imaging equipment and imaging expertise.
Creative outsourcing is another option. For example, the Library of Congress outsourced to a company that had women in a federal prison key the text of books. This was a cost-effective option since it was a federal program.
Lessons in collaboration
When working on a multi-institutional project, it is difficult “to motivate people to pull together when they all have different timelines, different priorities, different ways of doing things.” On the flip side, it is satisfying “pulling together these great collections on sort of one theme and having them all accessible in one place.”
Advice to students
• “There aren’t answers to every question, and no one knows all the ones that there are answers to.”
• “Once you’ve been doing it a little bit, you’re the expert, go ahead. And experiment. And accept that you might have to do it over.”
Figure out what your niche is, develop that, stick your neck out and distinguish yourself.
“Differentiate yourself in whatever ways you can, stick your neck out, be noticed, be amazing.”
“Make weird decisions”
“One little decision can cause so many other things to happen that you end up some place you just never could have imagined.”
Senior Policy Advisor at Cornell University Library
Summary: Why do good digital projects fail? Peter Hirtle shares his wisdom gleaned from working on a myriad of successful, and not so successful, projects at the National Library of Medicine, National Archives and Records Administration (NARA), and the Cornell Institute for Digital Collections where he served as the Director. Hirtle addresses critical issues in the future of digitization: linking siloed digital projects, mashing-up small projects, and getting people to think about how their own digital project fits in with other collections, as part of the national construct.
Quote: “Parking digital images away in a dark archive or in a preservation system is always a bad idea…You need to have them be part of a live system…As soon as digital data is not being used, then it’s likely to be destroyed.”
Reasons why digital projects fail
Good digital projects can fail because:
- The materials are in formats that are unusual, proprietary, or expensive to work with (and you cannot justify the cost)
- The quality of the results is poor
- The approach taken is technology-focused instead of user-focused
How to approach your digital projects
Be user-focused and assume that you will not be able to go back and re-digitize the materials. Do not approach the project from a technology-focused standpoint, where your focus is on what a particular piece of equipment can do. Ask these questions:
- What is the nature of the documents?
- What is it going to take to have the full informational capture?
On Anne Kenney’s approach to digitization
Anne Kenney is the University Librarian at Cornell University and was instrumental in bringing a user-centered, research-based approach to digitization. See Anne Kenney’s interview on the Digital Pioneers website (http://digitalpioneers.library.du.edu/).
“You want to create rich enough files to support a multiple range of uses because you may not be able to go back and scan again.” ~ Anne Kenney on digitizing files and focusing on the highest image quality and the broadest range of use
- “It was just—Anne’s (Kenney) perspective of not sitting down and saying, what is it that the equipment can do, which is what Bill Houghton had been doing, but—with the—the Honeywell equipment at NARA, but instead sitting down and saying, what is it—what’s the nature of the documents, and what is it going to take to have the full informational capture? That just struck me as a really—clever and—right way of approaching it, on the assumption that we weren’t going to be going back and—and being able to—do this again.”
- “Anne’s (Kenney) really great contribution is—saying, don’t do that, think about the document. Think about what you’re trying to capture. And I worry that not enough people remember that lesson. That they just have kind of—either say, oh, 600 dpi sounds good, that’s what everyone uses, and forget why we came to that number—”
- “But I thought that—the fact that she sat down and approached things as a research question and did—you know, the kind of analytical work and then wrote it up and presented it—so her studies on—her first work on image quality, her articles, Steve Chapman on the metrics for image quality that you’ve read in D-Lib Magazine, their report on converting micro—is it better to first microfilm and then go to digital or digitize and then try to produce microfilm for it. Their early report on using Kodak photo CD technology as—a mechanism for color images, that’s all really—top quality stuff. And very, very impressive.”
Capturing artifactual qualities
With digital imaging, you can capture artifactual qualities, e.g. color and sense of the page.
- “In reality now, we realize that digital imaging is better than preservation microfilming. Yes, you can capture the words in the book, but you can also capture the color, a sense of the page, more of the artifactual qualities.”
Keep control of your data
Proprietary software stores structuring information in a proprietary format. Once that structuring information is encoded in a format you cannot read or export, you may not be able to access your data.
- “Always be able to export your data. Be careful about proprietary solutions.”
On the original Making of America project
Digitization is a means of preservation, and preservation requires use. Digital objects that are simply parked in storage are not used; they are forgotten and, through technical obsolescence, become inaccessible.
- “Using digital imaging for preservation purposes, as opposed to preserving the digital object.”
- “And so the digital imaging was being done in order to produce new books. And in some cases, tipping in the illustrations from the original books into these things. And the images were then just stored on a server. And only after the fact did we sit down and say, oh, we could actually provide access to these images as well.”
- “You know, the primary thought was to make the analog replacement for the scanned item. But even those images were on a server that started to fail because it had been sort of forgotten about and it took a heroic rescue effort to get the images off of that and save them.”
- “And that convinced me that parking digital images away in a dark archive or in a preservation system is always a bad idea. That you need to have them be part of a live system, that as soon as digital data is not being used, then it’s likely to be destroyed.”
Converting a project into a long-term sustainable program is difficult, especially if the project is “a bit outside of the mainstream.”
- “The thing that I think we’ve done really poorly on…is turning the projects into programs.”
- “…the idea of trying to turn this into an—an ongoing program was—is difficult. And I’m not sure that we’ve ever solved that problem.”
Democracy and insularity
It is now very easy for anyone to digitize a collection and post it on the web. But you need to ask yourself these questions:
- How is your digital project going to fit in with other collections?
- How is this part of a national construct?
- “And that anyone can buy a scanner and set it up and start throwing some things and making a website and -- generating—files. And so it’s sort of—very democratic. But that leads to—I think a little bit of—insularity.”
- “People get focused on sitting down and saying, well—what’s my special collection? I’m going to sit down and digitize it because this is really special to me.”
- “They don’t think about this as being part of a national construct. Anytime now when I sit down and hear that we’re digitizing—somebody X’s collection on—library X’s collection on whatever. You know, you sit down and say, but how is going to fit in with other collections that are like that, how’s it going to work?”
Disappointments with duplicated efforts
With preservation microfilm, there was an effort to reduce duplication. However, in digitization, because there is no one coherent and comprehensive cross-collection catalog, there is a lot of duplicated effort.
- “Why is it that there is, as far as I know, no one place where you can find out if there’s—if a work’s been digitized or not?”
- “I’m so disappointed—I was looking up a book that—we had—there was some question about—in the Internet Archive. I think I found nine different versions of this book that had been digitized—by different libraries in different projects at different times and in different ways. And we were so good with preservation microfilming, sitting down and saying, this person—you know, knows how to do preservation microfilm is going to take responsibility and master it and—and not duplicate efforts. And we just ignored all of that with digital imaging.”
- “It just worries me that—you know, you may have some historical society in Ohio that sits down and says, oh, we’re going to digitize—our local histories and put them online for our users, not realizing that they’re all in the University of Michigan and they’ve all been done for the Hathi project.”
- “Is it because these were projects that were done without—outside of the normal technical services structure so that people weren’t thinking about—generating MARC records for them? But those are where—I think we have fallen down. Now, maybe there’s hope, you know, maybe they’ll all show up in WorldCat. But I fear that—Google by default will be the place where—everyone goes.”
Mashing-up small projects
The future lies in reference-linking disparate and distinct projects, many of them in non-standard formats, so that they can interact with each other.
- “The other problem we have is when you have these distinct little projects. What we’re really interested in is having mash-ups and interactions. And they’re in non-standing formats, we don’t have an agreed upon—a standard book reader—to do reference linking and to do other things and so—maybe that’s one of—it’s not so much a regret, as where the future’s going to be.”
Critical issue: Connecting siloed digital projects
How do we link up siloed digital projects?
- “So we’ve built—20 different—you know, there’s probably—20, 50, 100 silos of information around the world of digitized information. And now the issue is how do we make them talk to each other and interact and make them easy to use.”
Issue: Lack of interoperability
If you lack good metadata for your images, then Google cannot index them.
- “And how many, you know, image databases are there that don’t even have the metadata for the images catalogued by—or indexed by Google? So again, you have to go into the—database to find it. So—it’s that interoperability that’s—becoming the important issue.”
Do what is fun and interesting to you
Do things that you’re interested in and that you think would be fun.
- “I was always interested in how we could use new technology—to—make scholarship better.”
- “And I just took to it and kept on coming up with projects that I think would be fun things to do.”
Director, Resource Center for the National Science, Technology, Engineering & Math (STEM) Digital Library
Do you see yourself as a “constructor of knowledge”? Kaye Howe discusses how the great democratization of information is changing and threatening institutions. Authoritarian structures are being turned upside down as young people see and understand themselves as creative persons who construct knowledge through video, music, etc. This is revolutionary, transformational, and also very scary to some.
On management: “The technology was easy and the people side of it was very difficult.”
On digitization: “This was really about communication and the creation of communities, and not about content.”
One of the greatest management challenges is that you may find yourself in a role where you have a lot of responsibility but very little authority.
- “The technology was easy and the people side of it was very difficult.”
You may also find yourself charting unknown territory as part of a large project where no one knows exactly what needs to be done. Ask yourself these questions:
- What should it be?
- Where should we go?
- What should we do?
- What are we to make of this?
- How do we work together in this new kind of organization?
From content to communication and communities
Content, which had been in such scarcity, is anything but scarce now. The key is to shift your focus from content to communication and creating communities.
- “We were moving from content to communication.”
- “This was really about communication and the creation of communities, and not about content.”
New tools, technologies and support
When you provide access to new tools and technologies, you also need to provide people with support on how to use them.
- “Just putting things down in front of people and not providing support is really insufficient.”
- “You have to provide the context for these new tools and these new insights and these new possibilities.”
Going with the change
Be patient. Try to understand things, accept them as they are, and go with change. Realize that there is not much you can do about certain things.
- “Things take time, things go along, and you have to just go with that.”
Brain science and epistemology (how people learn)
Every day we find out more about how our brain works and how we learn.
- “How do we take advantage of what we are increasingly knowing every day and how do we pay attention to the user?”
We learn differently from each other. Ask what the context of learning is for the user. Know the user’s unique environment, which shapes how the user learns. What can you do to help the user learn in their specific environment?
- “So not only the understanding of how we learn, more and more precisely and the application of that, but the understanding in a generous way about what is the context of learning in all sorts of environments.”
- “That understanding of the learning process and the application of what we know to education -- which is absolutely the most important thing on the planet except for kindness perhaps.”
The Great Democratization of Information
The Great Democratization of Information allows everyone to construct knowledge.
- “All of this has led to a great democratization of information. And it’s changing and threatening a lot of institutions.”
Authoritarian structures are being turned upside down as individuals see and understand themselves as creative persons who construct knowledge.
- “But here are little kids taking stuff and putting it together and—and having a kind of auto—autonomy about how they learn and their own creativity and a confidence in that. That is truly transformational, really wonderful. Also very scary for a lot of people.”
Associate Professor, University of Washington Information School, MLIS Program Chair
Joseph Janes discusses the Internet Public Library (IPL), which began in 1995 as a two-credit special topics course and quickly grew into the first global, freely available online library reference service, one that is still thriving today.
“Never underestimate the power of a good idea and people willing to work themselves like dogs to see it through.”
Special topics course on the impact of technology
The Internet Public Library (IPL) (http://www.ipl.org/) started out as a special topics course on the impact of technology. This type of course would be an excellent way for LIS students to gain knowledge about current and relevant technologies.
What can the library learn from others and what can others learn from the library?
Ask yourself what the library world can offer to new environments, including new communities, initiatives, projects, issues, problems, challenges, etc.
Enthusiasm, excitement, commitment and potential
Even if you do not know exactly what the process will be, go for it! When people with a lot of enthusiasm and excitement commit to an idea, that idea has the potential to be catapulted into action and grow into something that has long-term impact.
A project can be an immersive learning experience
Working on a project can help you gain experience in different kinds of processes that librarians go through out in the field. For example, when developing an online collection, you would need to create a collection development policy and determine your criteria for selection and what the collection should cover.
Look to the past
Especially for those of you who are just starting out and for those who discount the past: “The more you look at what has come before, the more you will find that it’s familiar going forward.” Today’s issues have been dealt with in a similar form in the past. Looking to the past and seeing the evolution of ideas can help you gain perspective on how to manage current issues. For example, the developers of the IPL researched the literature on phone reference when determining how to do online reference.
It’s easier to ask for forgiveness than for permission
In certain circumstances, it may be best to boldly go forward and execute without seeking permission first. You can ask for forgiveness afterwards.
Public perception and public relations
You can help shape public perception by the way you frame, describe, promote and brand your project. For example, the IPL’s PR group sent out a press release about the launch of the Internet Public Library that generated so much interest that 3,000 people immediately signed up for the listserv. Publicity helps to generate interest, which, in turn, helps to encourage trial, acceptance and use. Hence, user adoption can be highly dependent on public perception.
Be prepared for pushback
Not everyone will be excited about your new ideas. There are some in the library profession who fear change. This fear runs the gamut from a healthy amount of fear of the new and unknown to complete paranoia.
Embracing and rejecting innovation
Although there is a strong tradition of innovation in librarianship, there is also resistance to change. This co-existence “simultaneously opens the door for us and inhibits us.”
Create a sustainability plan
The long-term success of your project usually depends on having a sustainability plan for the management and funding of the project.
Anything is possible
“Never underestimate the power of a good idea and people willing to work themselves like dogs to see it through.” Anything is possible when a dedicated group in sync works hard together to reach the same goal.
“It’s exactly the same and completely different.” – Dave Carter
“The idea of being able to embrace and enfold previous practice, but also to invest it with new ideas and not be beholden to the past, but not abandon it altogether either—I think a lot of that is bound up in exactly the same but completely different.” -- Joseph Janes
“There’s a lot to be learned from the heritage of librarianship and archival work and museums and other cultural heritage institutions but you kind of have to be able and willing to look at that with fresh eyes, leave some of it behind, grieve it a little bit if you do, take the stuff that makes sense, and be willing to add to it and think about it in an entirely different way.” -- Joseph Janes
Familiarity makes adoption easier
When new ideas are familiar to communities, colleagues and clients, it is easier for them to accept and adopt those ideas.
Branding: The power of the word “library” in Internet Public Library
Because they called it a “library” and because they acted like librarians, it was easier for people to understand the concept of the IPL.
There’s no way to fail
Joseph Janes had five goals for the IPL course, and the last of the five was “There’s no way to fail.” This created a safety net: a safe space in which the outcome was not what mattered. The work itself was what was valuable because the students learned so much in the process about web technologies, the internet, librarianship, service, collections and evaluation.
University Librarian at Cornell University
Summary: Imagine capturing the detail in a one-millimeter high character in Bodoni Italic font. Anne Kenney discusses how the Great Collections Microfilming Brittle Books Program, which was focused on preserving our national and international heritage, led to standards for benchmarking digital imaging that were eventually adopted by JSTOR and Google. Kenney also shares why she thinks Special Collections is “an area where there’ll be some heavy mining for digitization” and why libraries put themselves at risk when they limit access to Special Collections and fail to meet “our traditional time-bound responsibilities to promote scholarship and learning.”
“I think you don’t want to be on the bloody cutting edge very often. It is much better to be a clever adapter, to take things from different places and build something than to try to be an innovator all the time.”
Talk about your project
Discuss the projects you are working on with others. You never know who might be involved with, or know someone involved with, a related project. Be open to others not just in the library world, but to those in companies, non-profits, and other industries.
Focus on the highest image quality and the broadest range of use.
− “You want to create rich enough files to support a multiple range of uses because you may not be able to go back and scan again.”
How much metadata is enough? Shift your thinking from the maximum you could collect to the minimum essential level that you can collect efficiently.
− “I think in the metadata realm—I was more interested in not how much we could collect, but how little we could collect to meet our needs. And so there were some fairly elaborate standards for preservation and—and other descriptive metadata that are—that are pristine and beautiful but—but probably aren’t going to be fully implemented because of the overhead associated with them.”
− “The RFP that we developed—in looking for a vendor had both the standards for the imaging process and the quality control but also the essential metadata.”
Grants are a blessing and a curse
Outside funding comes with great and lasting responsibility. You own the responsibility to mainstream and sustain your digital projects. This responsibility will last long after the life of the grant.
− “It was easy to get funds to do imaging projects. It’s—it’s become obviously tougher over time. But—you know, it’s a blessing and a curse, to have—outside funds to do such work. Because ultimately the issue of mainstreaming and creating sustainable paths for—for keeping such materials haunts you.”
Integrate your projects with your institution’s mission
Outside funding allows you to do research and development. However, this is often done in parallel, instead of as an integral part of your institution’s mission and goals. Sustaining a project long-term requires institutional anchoring and commitment. Your project needs to be integrated into your library’s day-to-day operations.
- “So—once a project is done, you’re—you are then left holding something that hasn’t been well integrated into the—the mission and goals and processes that an institution may have. It’s done in parallel. And that is highly problematic.”
Don’t mix up research and production grants
Research and development grants are smaller grants where you do research and there is flexibility to change course as you learn more. Production grants are focused on output, the end product(s) and quality.
- “You can either do research and stop to admire what you’re doing and sort of change course as you learn more, or you can do a production grant, where, you know, throughput is the big deal. Quality and throughput and reliability. But mixing and matching the two is—is problematic.”
Think before you buy
Invest the time in determining what the problem is before deciding on a technical solution and buying equipment that does not actually solve your problem.
− “Buying equipment and then determining what you want to do with it is—an all too common occurrence.”
− “Oh, we’re going to do rare books so we bought a flatbed scanner. Well, you know, how are you really going to capture those rare books? Because you’re not going to disbind them.”
Don’t be so enamored with the money: Think before you apply
Ask yourself “what is it that’s critical for my institution moving forward?” Will receiving the grant actually distract you from your institution’s mission and goals?
− “I can get a grant to do this. It’s going to be a hard sell back home, but man, there’s a lot of money there that might be really cool to have!”
− “You understand and not be—distracted from, first and foremost, is this good for the institution, does it support its mission, is it—is it a priority that means that funds will be diverted from somewhere else?”
Be “a clever adapter”
Don’t try to be an innovator all the time. Innovation = high risk. Failure can be very expensive. Instead, think broadly and efficiently about how you can inexpensively build on others’ innovations.
- “I think you don’t want to be on the bloody—cutting edge very often. It is much better to be a clever adapter, to take things from different places and build something than to try to be an innovator all the time.”
− “Because innovation has such a high risk element associated with it for—for failure. And—and failure in the digital access realm is quite dismal to everybody else. So—and a lot of time and effort can be spent in that.”
Beware of gifts
Gifts may often cost you much more than the value of the gift itself.
− “So you’ll often start with something that seems like a really—great thing and then end up paying over and over and over for—for those kinds of gifts that come through.”
Meeting the service expectations of “uppity users” from all over the world
When you make a collection available on the internet, you are committing to servicing that collection. Expect to serve users from all over the world with different needs and demands.
− “In fact, when you put something out there and you provide it freely accessible, it’s naïve not to think that you are serving a much broader community, which can have very different needs for what they’re doing. So uppity users of the world, unite.”
− “The amount of service expectation that came with all of these customers and all of these users around the world was phenomenal. And we had to not only to provide technical support for them but also to provide a lot of reference support for them.”
Letting Special Collections be mined for digitization
Libraries have the responsibility to make materials available, including Special Collections. However, libraries have been held back by fears of litigation. Limiting access to Special Collections actually puts libraries at risk. Libraries need to “meet our traditional time-bound responsibilities to promote scholarship and learning.”
− “I think special collections is—an area where there’ll be some heavy mining for digitization.”
− “That the focus has been for so long now on avoiding risks of litigation. And by doing so, we have curtailed what have been traditional roles that libraries have played in society, which is to make material available for use and new knowledge and creative expression.”
− “We have to be very mindful of privacy rights, of donor rights, of expectations for users. And not be so fearful of asserting fair use rights.”
− “I think that—institutions—given the pressure from external forces, have been relatively timid about—supporting risk-taking in terms of providing access to materials. And I think we are—putting our institutions at risk by being so timid about doing that.”
Don’t be a one-trick pony
Think broadly, deeply and holistically about your skill sets. Your job description will change over the next two years.
− “So broader thinking as well as deeper—appreciation of some areas of specialization. But not so narrowly defining yourself that—you know, you don’t want to be a one-trick pony in terms of what the needs are.”
- “It’s a constant cycle of reinventing, relearning, appreciating new things”
- “That you not box yourselves in—oh, yeah, that’s our—that’s our imaging person.”
- “That you have a holistic understanding of the full scope of what it means to fulfill missions in a really changing environment.”
− “I think most staff at Cornell have different jobs than they did two years ago. Even though—you know, it’s a constant cycle of reinventing, relearning, appreciating new—new things that come down—the pike. But don’t lose the enthusiasm.”
Get mentored up and down
Cornell has an effective mentoring program where the staff is “mentored up and down.” Find mentors up and down staff ranks. Everyone has different skills and talents that they can share with others.
− “Being mentored up and down I think is absolutely—absolutely key.”
Manager, Western History & Genealogy, Denver Public Library
James Kroll discusses the beginnings of DPL’s Western Heritage Program, participating in the American Memory Project of the Library of Congress, and successfully becoming self-sustaining.
“In many ways, it’s the public that drives the vision.”
Augie Mastrogiuseppe: A man with a vision
For DPL’s Western Heritage Program, Augie Mastrogiuseppe had a vision “that photo digitization could benefit the researcher in many ways.” Many of the projects discussed by the Digital Pioneers began with one, two or a few people with a shared vision. It was often a vision of creating improved access.
Sustainability: Developing funding sources
That DPL has been able to sustain their digitization program through photograph sales is proof that engaging the public makes the data more valuable and is a win-win for the library and the public. Invest income back into your project so that it can help make your project more sustainable by paying for salaries and equipment. Owning the equipment can simultaneously save on labor and generate more income and improve turnaround times. Determine the owner of the data at the outset since this will affect your ability to generate income from the sale of items based on the data. Creating greater accessibility increases relevance and renders the data more valuable since it has more meaning to more people.
Sustainability: The cost of funding your project long-term
Look at all the costs of funding your project long-term, including human resources, physical resources (i.e. building space and data storage), financial resources, new technologies and the cost of replacing old equipment. Like a business, you need to have reserves in place for both anticipated and unanticipated costs.
Lack of standards, best practices and a workflow process
You may have to develop your own standards, best practices and workflow processes since none exist yet.
You will need people on your team with technical expertise in areas such as software, metadata, cataloging and research.
Increased demand and income due to greater exposure
Becoming part of an even larger project will expose your project to a greater audience. Being invited by the Library of Congress to join the American Memory project led to much wider exposure for DPL’s Western Heritage Program. This exposure, in turn, led to a dramatic increase in photograph sales. For the past ten years, photograph sales have grossed between $130,000 and $150,000 a year. This income has paid for the salaries of technicians, supplies and equipment, and has essentially allowed the program to become self-sustaining.
Collaborations and public demand open up your project to different materials besides photographs including maps, works of art, architectural drawings, manuscript collections, rare books, etc.
The public’s demands drive the vision
The vision needs to respond to the community’s needs. “In many ways, it’s the public that drives the vision.” – James Kroll
You will need to become more creative in your grant applications, i.e. making the project not just about photo digitization, but about photo digitization as part of a larger project.
Main issue: digital preservation
The main concern is how long the digital files will last.
Choosing a vendor
Check to see if a vendor has consistently made enhancements to their product. This will be indicative of their investment in and future support of the product.
Advice for students preparing to work with digital objects
“You want to be as flexible as possible. You want to be as nimble in your thinking as possible.”
Executive Director, Coalition for Networked Information
Summary: How do digital collections connect communities? Clifford Lynch discusses how the Library of Congress has successfully used crowdsourcing to catalog their photo collections on Flickr and how this has resulted in connections, conversations about family, genealogy and local history, and “really deep storytelling.” What happens when museums digitize entire collections? Typically, museums are only able to display the top 5% of their holdings at a time. Learn how the new “encyclopedic museums” will revolutionize the way scholars access, understand, and analyze materials.
Quote: “The whole strategy of imaging important collections of cultural heritage was really a stewardship and survival strategy.”
Issues in integration
In the past, the workflow process was adversely affected by the siloed nature of the various parts, e.g. metadata and imaging.
“I would say the biggest hole though, was around—integration and delivery. So you had one silo where you could do the metadata stuff, another silo where you’re doing the imaging stuff, a real…workflow problem connecting the two reliably, and then the problem of how to craft something to allow retrieval and delivery across this whole mess, which was really quite a different world than the production workflow for it. So things were pretty bad back then.”
“You still had…an integration problem though. There weren’t good platforms for building a system that really combined text and—imagery nicely.”
Digital collections: “stewardship and survival strategy”
Libraries have the responsibility to be good stewards and ensure the preservation of material. While digitizing does not protect against loss of the original material, it provides a record in case of damage or a natural disaster.
“The whole strategy of imaging important collections of cultural heritage was really a stewardship and survival strategy.”
“It was something that institutions charged with stewardship were going to be obligated to do to be good stewards and to ensure the preservation of their material.”
“Probably the strategy going forward would be to have digital records of the material and then the underlying material and that that would give you…certainly not protection against loss of the underlying material, but at least leave you in a much better place given the, you know, ugly multimillennia history of wars and natural disasters and things that have damaged so many collections.”
Issue: Color fidelity
Capturing color in high fidelity is very difficult.
“In terms of the sort of underlying technology, probably the thing that I didn’t see coming—was how much trouble color fidelity was going to be.”
“In terms of the amount of grief we’ve had with standards around color space and color management and calibration schemes, I mean, this is still a big headache for—workflows that in—that where you’re really concerned about, you know, the capture of color with—with high fidelity.”
Issue: Lack of standards and interoperability
The lack of standards and the multiple ways of doing things results in systems not being interoperable.
“Another area which—I think we—we would have been so well served to get out in front of with a good standard 10 or 15 years ago is the situation where you’ve got—textual material imaged and then you’ve got a—an OCR transliteration attached to it. And you want to—work with those two as connected objects essentially.”
“And people have way too many different ways of doing this today. I mean, it’s a disgrace. If you look at—for example, the—projects that people like Mellon have been funding to digitize manuscript collections—a lot of those still don’t interoperate.”
“And that’s the kind of thing where, if we’d made a strategic investment in some standards and maybe some reference software that could have been given away open source or very cheaply—early on, it would have saved a lot of pains, some of which is still to come as we—get all this stuff into some kind of homogeneous form.”
Issue: Scaling up
Success in scaling up depends largely on funding from a committed institution and not attempting to scale too early in the process.
“When you look at the history of a lot of this digitization—at work, there—there was a lot of overpromising, and it’s still very hard for people who make large scale funding commitments, and I don’t mean, you know, here’s a—here’s a small, you know, project that we’re going to get a grant for to digitize something, but I’m going to make an institutional commitment at scale.”
“It’s still really tough, I think, and has been, over the last 10 or 15 or 20 years to figure out when the right moments are in terms of cost performance and quality of the technology to make those choices. When—when to move from little pilot projects and when to do something at scale.”
“And so there’s been a history of attempts to do things at scale too early, where the technology was overpromised to management and funders and then a lot of money spent and not much result.”
“That’s a very, you know, different environment and I think, you know, sort of collectively, everybody didn’t do as well as they could have in terms of thinking strategically about when to fund pilots, when to fund fundamental research and how to communicate the outcomes of these to policy makers, essentially, who would drive or—you know, decisions to do large-scale deployments.”
About the optiputer (http://www.optiputer.net/) as described by Clifford Lynch
Dr. Larry Smarr of UC San Diego, who ran the National Center for Supercomputing Applications, has been a pioneer in high performance computing for about 20 years. Dr. Smarr’s interests lie in the high performance visualization of models, e.g. astrophysical or biological phenomena coming out of supercomputer simulations. Because some of these datasets are extremely large, e.g. 600,000 pixels on one side, which when squared is gigantic, Dr. Smarr worked on display devices for them.
Dr. Smarr built optiputers, which are walls of large LCDs that can handle a large number of pixels. The 24-inch LCDs are lined up 20 across and 4 down. Dr. Smarr uses a Beowulf cluster for graphics management. IO (input/output) drivers are used on one or two of the monitors on each of the parallel machines in the cluster. This allows for backing up the monitors with enough computational power and storage to allow for zooming, dragging and dropping, etc.
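To give a sense of the scale described above, the arithmetic can be sketched as follows. The 600,000-pixel side and the wall of 20 panels across and 4 down come from the description. The per-panel resolution of 1920 by 1200 pixels is an assumption made only for illustration (the source gives no resolution), and the function names are hypothetical.

```python
def square_image_pixels(side: int) -> int:
    """Total pixels in a square image with the given side length."""
    return side * side


def wall_pixels(cols: int, rows: int, panel_w: int, panel_h: int) -> int:
    """Total pixels on a tiled wall of identical panels."""
    return cols * rows * panel_w * panel_h


# From the text: an image 600,000 pixels on a side, shown on a wall
# that is 20 panels across and 4 panels down.
image = square_image_pixels(600_000)

# ASSUMPTION: each 24-inch panel is 1920 x 1200 pixels.
# The source does not state the panel resolution.
wall = wall_pixels(20, 4, 1920, 1200)

print(f"image pixels: {image:,}")
print(f"wall pixels:  {wall:,}")
print(f"full image / one wall view: about {image // wall:,}x")
```

Under that assumed panel resolution, the full image holds roughly two thousand times more pixels than the wall can show at once, which suggests why the display needs computation and storage behind it for zooming and dragging.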
“So when you start thinking about medical imaging or simulations, certainly people are starting to work with this.”
“They’ll work with it for cultural heritage, too. So—you know, you’re going to see large paintings reproduced even larger on this thing, and an ability to—zoom in on detail that—you know, is a bit different than what we’ve had to date.”
“I think we need to be—you know, kind of cautious about—that—how much is enough resolution. You know, I remember back in the day, we underestimated that more than once in the interest of proving that a system would be affordable on an engineering basis. Now, I think this is really only going to apply in cultural heritage to things that people really need to see the details of, and we’re going to need to make some choices about this and it’s clearly silly to do this for—you know, a lot of hand manuscripts and things like that. We just don’t need that kind of resolution.”
Reading ancient handwriting: “a bigger paleography problem”
Paleography is the study of ancient handwriting and inscriptions. A current problem is that many younger readers cannot read handwritten text.
“Just as a sideline on OCR, one of the things we probably are going to need to spend some R&D money on is OCR for more kinds of texts and for handwritten texts and things like that.”
“You know, it’s all very interesting to hear this rhetoric about—about engaging people with primary sources. But the fact of the matter is that—most kids aren’t taught handwriting anymore, really, of the sort of Victorian copperplate kind, and you show them 19th century letters, handwritten letters or manuscripts, and, you know, they may as well be looking at a, you know, 10th century manuscript.”
“So—things that we can do to—to help with, you know, OCR and transliteration of those. I mean, maybe another way of saying it is, we’ve got a bigger paleography problem than we’d like to admit, so I’m mindful of that one.”
3D as “one of the next frontiers”
Clifford Lynch sees the digitization of 3D objects, e.g. buildings and statues, as one of the next frontiers.
“So those are a few of the things that—that I think are real issues on the capture side going in. There’s also a question about how we do 3-D capture. We’re starting to see a lot more work on imaging 3-D objects and there’s a whole lot of different strategies ranging from the kind of old fashioned one of just, you know, you document it from all four sides and the top if you need to—through things where, you know, for statues now, we’re doing these laser scans.”
“I mean, we can do whole buildings and statues and stuff like this. So the whole question of digitizing 3-D stuff is still very much on the table and—and I think is going to be the kind of next frontier—or one of the next frontiers.”
Museums typically display the best 5% of their holdings at a time. Digitizing museum collections in their entirety will provide more context and change the way scholars do research, e.g. it will allow scholars to compare and contrast objects within the collection; scholars will be able to analyze how objects and practices changed over time.
“So now the question is, how do you build tools to let people understand the variation across 100 objects? And if you think about a lot of scholarly work in museums, it has much the same quality, right? Think about—you know, those sort of endless Greek vases. Now, you know, what you do with an exhibition is you put out, you know, about half a dozen particularly nice examples. But what a lot of the scholarly work is really about—is understanding what’s sort of typical about these and what’s unusual and it means looking at lots of examples of these and trying to understand their similarities and variations.”
“A big museum, maybe, can exhibit maybe 5% of its holdings at a given time. And the whole way people understand museum collections is going to change radically as they can get to representations of the entire collection. And they’re going to want to do this kind of analytical stuff across large numbers of—you know, not necessarily individually stellar examples, but—to understand how the production of things and the practices changed across time. So you’re going to see—I think, a whole line of development around these kinds of systems—accompanying the move to open up the collections of large kind of, you know, encyclopedic museums.”
Pictures are difficult to catalog
It is very difficult to adequately catalog pictures because they are so rich and they require subjective interpretation.
“Pictorial material is, at some level, almost impossible to comprehensively describe because it’s got so much in it, it’s so rich. Often it has interpretations that are influenced by cultural context, allegories and things or historical representations in very complex ways. So we’re very bad at describing images even in those rare cases where we can throw huge amounts of human time at it.”
“On top of this, the actual fact is that most of the time, we don’t have the money to throw the human energy at it and to do elaborate cataloging of this even if we could. So it’s not uncommon to find things on the net like, you know—20,000 photographs of New York City street life in the 1950s.”
Crowdsourcing, conversations, critical mass and “really deep storytelling”
The Library of Congress is effectively using crowdsourcing in asking the public to help identify and describe photos on Flickr.
“If you look, for example, at the experience the Library of Congress has had, and they’ve done a very good report on this, putting up photographs on Flickr Commons and dealing with the conversation around it, you know, you start realizing that, for instance, there are lots of people around who are very interested in aspects of material culture. You know, airplanes, trains, machines, cars—there are lots of people who are interested in genealogy and family and local history. And there’s a ton of material kind of in private hands.”
“Now, the place where this all kind of you know, reaches critical mass and ignites is where you’re dealing with photographic collections because photography is still a relatively recent technology.”
“So, unlike putting up, you know, images of 16th century paintings, the property of most photographic collections, especially big ones, is they’re dealing with stuff that still hits the edges of living memory.”
“So you started eliciting these things about, you know, this connects somehow with my family and my family’s history, you know, that’s my granddad as a kid and his pet dog and I happen to know the name of his pet dog, and—you know, that’s the store he owned in the 1930s. This kind of, you know, really deep storytelling.”
Lots of floppy discs and no floppy drives
Archivists are now dealing with materials from the 1980s and 1990s that require old computer technology that is no longer available.
“Right now, people are having close encounters with the horrors of, you know, consumer electronics in the 1980s and early 90s. You know, nasty sorts of floppy discs with manuscripts on them that they want and things.”
The future of digitization
With the pervasiveness of photo and video recording via mobile devices, we will need to anticipate how to manage these types of technologies and collections.
“So we’ve not only got the things digitized according to our standards of cultural heritage, but we’re going to have some pretty funky images coming in off of—you know, not off of nice SLR cameras but off of—you know, cell phone cameras and things like that that leave a great deal to be desired. And we’re going to need to think about how to mix these into our collections and manage them and where various kinds of image enhancement is appropriate and that sort of thing. So I think—I think there are a whole new collection of issues as we start thinking about, you know, what our—you know, what our collections going to look like in 2050 as documenting and understanding the lives of individuals in our culture continues to be an important activity.”
Senior Program Officer in OCLC Research
Merrilee Proffitt discusses her role as the Director of Digital Library Development at UC Berkeley’s Bancroft Library; the importance of strong curatorial direction; and the advantages and benefits of working on large multi-institutional projects such as the California Heritage Project and the Making of America II Project.
“Stop trying to perfect the data and the metadata that we’re putting into the system because there’s going to be so many unanticipated uses downstream that we can’t possibly think of them all.”
Merrilee Proffitt shares what inspired her about electronic texts: “The idea of putting those things online so that people could discover them and it would help enable their research was really very powerful to me and still kind of gives me chills. You know, the idea of taking little things that are in these collections and making them more accessible to researchers.” Many of the projects discussed by the Digital Pioneers began with one, two or a few people with a shared vision. It was often a vision of creating improved access.
Say “Yes!” to projects
You never know where different project roles might lead. You’ll learn a lot and gain invaluable experience.
Librarians can be the mediator in the Digital Humanities
Librarians who can speak the languages of both programmers and the humanities can be the link between the two worlds and become mediators for the Digital Humanities.
“Selection is expensive”
It is very difficult to select only 35,000 photos out of an inventory of millions of great photos. It would be impractical to look through the collection image by image. If you can, avoid selection by digitizing everything in some sense. Or have very clear curatorial direction and be very clear about what you choose for digitization. For example, select an entire series or the entire collection for important photo collections. Know what your goals are for the collection. Do you want it to serve the broadest range of interests? Or to serve the specific needs of the campus community?
Don’t encode everything and don’t worry about encoding perfectly
Make documents available as PDFs and let Google take care of the first level discovery.
Take advantage of opportunities to work with and at other institutions
Work on multi-institutional projects so that you can see how things are done differently at other institutions. It is invaluable experience to work with other people outside of your own institution. It will give you perspective into how things are not perfect at any institution and give you the opportunity to bring back best practices to your own institution and try them out.
“Question previous practices” – Merrilee Proffitt
“Librarians are excellent at learning how to do things well and then doing them over and over again. And then we teach each other to do things well and do them over and over again and do them really consistently. And that is such a great thing. But it also doesn’t lead to us questioning how to do things differently.”
We need to ask the difficult, big picture, and fundamental questions. We cannot narrowly focus on how to incrementally improve the tools or techniques that we are currently using.
Merrilee Proffitt recommends Think Like A Startup by Brian Mathews on how libraries can have more of a start-up mentality. Brian Mathews is Associate Dean at Virginia Tech and blogs at The Ubiquitous Librarian: http://chronicle.com/blognetwork/theubiquitouslibrarian/
Mathews, B. (2012). Think Like A Startup: a white paper to inspire library entrepreneurialism. Retrieved from http://vtechworks.lib.vt.edu/handle/10919/18649
Abstract: This document is intended to inspire transformative thinking using insight into startup culture and innovation methodologies. It’s a collection of talking points intended to stir the entrepreneurial spirit in library leaders at every level.
Description: Facing the Future -- We don’t just need change, we need breakthrough, paradigm-shifting, transformative, disruptive ideas.
Cultural problems within the LIS community that prevent us from moving forward
Librarians are too protective of the ways we currently do things. It is unrealistic to hope that all researchers will “magically learn the importance of information literacy” and “eschew Google when appropriate.” Proffitt recommends that we:
•“Look at where the rest of the world is going in terms of discovery and seeing if we can get there.”
•“Stop trying to perfect the data and the metadata that we’re putting into the system because there’s going to be so many unanticipated uses downstream that we can’t possibly think of them all.”
•“Constantly ask ourselves the question, is the work that we’re putting into this going to be worth it for the long term?”
Former Associate Deputy Director for Library Services at the US Institute of Museum and Library Services (1997 to 2011)
Note: Joyce Ray left IMLS in August 2011. She is now Visiting Professor, Information Studies at University College London
Summary: IMLS was the first and is the only funding agency in Washington with statutory authority to fund digitization. Joyce Ray was there when IMLS was established in 1997. Ray discusses the beginnings of the WebWise Conference as a place to bring together people from different types of libraries, archives and museums who had an interest in digitization and technology. Ray urges us to broaden our thinking about “innovation” from using content to connecting with, building and making a difference in a community.
Quote: What does it mean to be “innovative”?
“It’s not always complex technology, sometimes it is more about community building, like working with a new or different community. And bringing different groups of people together…using the content in innovative ways.”
New projects: short on how-to but filled with excitement
Take advantage of opportunities to work in a new agency or on a new project. You may not know exactly what you are doing and you may have to make things up as you go along, but it is very exciting.
IMLS and the statute on digitization
IMLS was the first, and remains the only, funding agency in Washington with the authority to fund digitization written into its statute.
IMLS initially did not know how to help people prepare for a digitization project.
− “There was a real lack of knowledge about how to do it—how to do it right.”
This prompted IMLS to launch the WebWise Conference as:
− A place to bring together all those interested in digitization and technology
− A place to present models and share how to prepare for a digitization project
− A place for collaboration to bring together people from different types of libraries, archives and museums
Digitization for long-term preservation
People tend to think of digitization primarily in terms of access. But it has a large role for the future long-term preservation of physical collections, which are subject to being damaged or lost.
− “People think of it primarily for access, and, you know, that’s certainly an important part of it. But there are so many examples where physical collections have been lost—through fires and floods and earthquakes and thefts and—destruction that—knowing that you have a good digital surrogate as a good backup is really an important part of that—that picture.”
Funding “innovative model projects”
In terms of funding digitization projects, IMLS has always focused on funding “innovative model projects.” Initially, IMLS funded projects that developed criteria, guidance and workflows. As those projects turned into established best practices, the focus evolved to innovative projects such as statewide collaboratives, aggregation projects for metadata harvesting, tool development and the interaction of tools with content.
Yet unknown uses for your digitized collection
IMLS used to require digitization projects to determine their audience and show demand for the content. However, they learned that these were often unknowns and that “our imagination about who’s going to use content has been greatly expanded just by doing it and putting it out there.”
− “So you have collections that used to only be available to a few scholars, and people had no idea that when they put this stuff online that there would be so much interest from people that they never imagined, like school children, homeschoolers, scholars, you know, around the world. And that’s been very gratifying and eye-opening.”
IMLS also used to require projects to interact with an audience, evaluate actual use, be tied to learning outcomes, show impact, etc. However, IMLS also realized that they were sometimes requiring grantees “to do too much in one project” within the three-year project period; that is, focusing on the technical issues of “a really cutting-edge technology project should be enough in itself.”
What does it mean to be innovative?
It means using content in innovative ways in order to:
− Connect with a community
− Make a difference in a community
− Bring people together and build a community
− “For a lot of people it really means getting connected to a community. So that means going to conferences, even if it’s local conferences, and finding out what the state-of-the-art is so they can figure out how they can make a difference in their community.”
− “And it’s not always complex technology, sometimes it is more about community building, like working with a new or different community. And bringing different groups of people together. So using the content—can be using the content in innovative ways.”
Persistence pays off
When you apply for a grant for the first time and you are not successful, don’t get discouraged. Learn from the comments and advice of the reviewers. Then, apply again.
− “We also have seen—applicants that are not successful the first time they apply for a grant, if they really listen to the reviewers’ comments and take them to heart—I feel that our reviewers are very sensitive and try to give really helpful advice. And we have seen people make very good use of that and come back with a successful project.”
Private versus public funding
There are issues with private sector money for projects, including:
− Funding often goes to “cherry-picked” projects that satisfy an immediate demand
− It is more difficult for smaller institutions with less exciting yet high value projects to get funded
− Access to the content may be controlled, limited or “locked”
Associate University Librarian for Digital Scholarship and Preservation Services, Cornell University Library
Oya Rieger encourages us to recognize that digitization projects have a life. They are living projects that need extension, assessment and a clear understanding of how they are being used by faculty and researchers. They need to connect with the academic learning-teaching-research environment. Rieger also shares the challenges of organizationally mainstreaming projects, preventing projects from being orphaned and integrating standards into practice.
Life-cycle management: “Whatever we do in libraries, especially with digitization programs, it is not about starting up, it’s about sustaining and maintaining and developing and phasing.”
Libraries are creating knowledge and information
In the past: Libraries used to focus on organizing and delivering information and supporting the use of information.
In the present: In the last ten years, libraries have been actively participating in the creation of information and especially working with faculty and other researchers as they are creating knowledge, transforming knowledge into information, publishing, etc.
Projects need to be based on targeted faculty needs
Libraries are most effective when they work on programs that are targeted and specifically based on what the faculty is selecting, i.e. creating digital repositories, managing the content that the faculty is creating or digitizing content to support the faculty’s teaching and learning.
The purposes of digitization
In the beginning, digitization had two purposes:
1) To protect the originals: the digital copy reduces wear and tear on the original by acting as an intermediary
2) To provide global access to core historical materials
Digitization is “a way to unify, in a way distribute primary collections that are historically important, that form the canon of any given discipline.”
Librarians always seem to have a kind of “perpetual anxiety about the future and the role of libraries.”
The evolving role of librarians within the Digitization and Preservation Research Unit at Cornell
Digitization gave libraries a new role to digitize historic collections in order to make them more accessible to the world and to help connect the world. This new role elevated and rebranded librarians as technologists who knew not only how to scan images but the detailed technical specifications of colors and bits. As technologists, librarians were seen as “pioneering and trying to, in a way, move the library’s agenda into an innovative area.” However, a gap developed between the “traditional librarians and now this new age digital librarian.”
This new and innovative Digitization and Preservation Research Unit was seen as “a new shop” or “a new operation” with its own team that immediately went into production. The Digitization Unit was not organizationally mainstreamed into the library. This turned out to be an impediment since it took years to integrate this stand-alone group into the library. It would be advisable instead to approach digitization as research and development work to be explored and understood, with the goal of eventually mainstreaming it. Organizational mainstreaming would involve:
- Moving the responsibilities of metadata librarians within cataloging
- Moving the maintenance of image databases to the IT unit
- Not solely relying on soft money and grants
- Developing a sustainable infrastructure
Share what you have learned
You will gain expertise in developing your digital collections. Consider sharing your knowledge with the greater library and cultural heritage community via webinars, hands-on workshops, etc.
Many libraries are facing the same challenge where “we start an experiment, we get money, but then they are kind of orphaned and we move on.”
This applies to digitization and other types of projects. Organizationally mainstreaming projects and creating an infrastructure for them will help projects become more sustainable.
Standardization takes years
Standardization is usually a broad international collaboration that is based on groupthink and requires consensus-building.
“It takes a very, very long time to come up with standards, and that—also I think maybe what makes standards work well is that they are tedious and they are detail-oriented.”
The challenge becomes, after these standards are developed, how to integrate them into practice and the fast-paced, low-resourced work front.
Look at your digital project holistically
From an organizational perspective, look at your digitization projects more holistically. The program can then be foundational rather than an add-on.
Connecting digital collections with the learning-teaching-research environment
In the past, the prevailing attitude was “libraries manage information, you give it to the faculty, they consume it.” Today, we need to “establish very strong partnerships with faculty and researchers so that they are enduring relationships.” We need to be more embedded in research, learning and teaching and ask if we can do anything innovative that would help faculty with their research activities. Libraries need to be more collaborative and connect their projects with the learning-teaching-research environment of the faculty and researchers.
Digitization: the life-cycle management of a living project
Digitization programs are about “sustaining and maintaining and developing and phasing.”
Recognize that digitization projects “have a life and that we need to attend to it.”
“You select, you digitize, create metadata, provide access—you digitize more, you add, you change the interface”
“You should see it as a living project that needs extension, assessment, understanding how it’s being used.”
Director of Research and Scientific Data Management at the Smithsonian Institution
Summary: Thornton Staples shares the story of how he discovered Sandra Payette’s paper on the Flexible Extensible Digital Object and Repository Architecture system (better known as Fedora), how he liked its information architecture, and how his team tested (and proved) its scalability with 30 million objects. Staples also shares a model of how to successfully get humanities faculty hooked on digitizing their work and taking it to the classroom. What is the new frontier in digital libraries? According to Staples, it is in developing durable, extensible and interoperable repositories.
Quote: “So the new frontier is pulling it all together in a way that doesn’t get in the way of the scholars doing their work, but ends up with a durable product that can be in a repository and can be moved from one repository to another as it needs to be but is a stable part of the scholarly record, or I would even say the human record. I think the human record is the Web, and is this digital—sphere that we’re building. And—if we don’t get good at it, I think we’re in for a dark age.”
Hook the humanities faculty
The key was to hook the humanities faculty on using their digital information. This was the way to reach the classrooms and the students. The library needs to “work with faculty to digitize texts and make them available to them more generally.”
“And their notion was that if you hooked the faculty on using digital information in their work—in their research—they’ll take it to the classroom, and to spend a lot of money and a lot of time trying to inject this directly into classroom situations was a non-starter.”
“The committee that put together the original vision had enough vision to say, get the faculty hooked on their own research, and they’ll take it to the classroom. And that was really the driving—and I really don’t—wouldn’t change that. I think it really—it was successful.”
Test your system for scalability
Will your digital repository system be able to handle 30 million objects? Test it for its stability and scalability.
“So we did a new interpretation of their—of their architecture using one SQL database and one Java servlet and demonstrated all the principles would work and put thirty million objects by doubling—like, copying objects and changing the identifiers until we had, like, forty thousand real objects and we kept duplicating them to get thirty million, and the system was still working.”
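The approach Staples describes, copying a small set of real objects and changing their identifiers until the store is large, can be sketched in a few lines. This is only an illustration and not the Fedora team's code: SQLite stands in for the unnamed SQL database, the table layout, the `demo:` identifier scheme and all function names are hypothetical, and the target count is kept small so the sketch runs quickly.

```python
import sqlite3
import time


def load_duplicates(conn, seed_objects, target_count):
    """Fill the store by cycling through seed objects, giving each copy a new identifier."""
    conn.execute("CREATE TABLE objects (pid TEXT PRIMARY KEY, title TEXT, body TEXT)")
    rows = (
        (f"demo:{i}",) + seed_objects[i % len(seed_objects)]
        for i in range(target_count)
    )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    conn.commit()


def timed_lookup(conn, pid):
    """Fetch one object by identifier and report how long the lookup took."""
    start = time.perf_counter()
    row = conn.execute("SELECT title FROM objects WHERE pid = ?", (pid,)).fetchone()
    return row, time.perf_counter() - start


# Hypothetical seed objects standing in for the real ones (title, body).
seed = [("Letter, 1862", "sample body"), ("Photograph, 1910", "sample body")]

# ASSUMPTION: a small target for a quick run; raise it to stress a real system.
TARGET = 50_000

conn = sqlite3.connect(":memory:")
load_duplicates(conn, seed, TARGET)

count = conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
row, elapsed = timed_lookup(conn, f"demo:{TARGET - 1}")
print(f"{count:,} objects loaded; last object {row} found in {elapsed:.6f}s")
```

In a real test the interesting measurements are how load time and lookup time change as the target count grows, and whether the system still answers correctly at the largest size.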
Network and join forces to get funding
Thornton Staples optimized Fedora for real use. He joined forces with Sandra Payette of Cornell to get funding for the Fedora project. Fedora is based on Payette’s original research, the Flexible Extensible Digital Object and Repository Architecture (Fedora). By joining forces, they were able to obtain a Mellon grant that was the beginning of the Fedora project.
“So in the meantime we started looking around for funding, and I had a couple of other Mellon grants for some other things. And Don Waters at Mellon had been the head of the Digital Library Federation right before he went to Mellon. And he’s very—he’d always been interested in FEDORA, the architecture. And we were having a drink one night at a conference, talking about another grant, and he says, “What about FEDORA? What are you guys doing with FEDORA?” and so that meant green light, green light, make a proposal, and so Sandra Payette who had done the original work at Cornell had in the meantime contacted me, saying, I’d really, she was basically getting jealous, so—we decided we got—we had a meeting. She came to Charlottesville and we had a meeting and we decided we were gonna join forces and try to get some funding. And Don had already—we’d already had this sort of opening with Mellon, so we put together a proposal. And that was the beginning of the FEDORA project.”
Sometimes, you’ll have to make it up as you go along.
“There weren’t any rules; no one knew what we were doing; we were making it all up from scratch.”
You are a peer, not the help
As technical experts, consider yourself a peer of the faculty, not the help. The faculty knows their subject and you know the technology.
“We were worried in the very beginning that...these faculty were very well known in their fields and we thought we were going to be treated as the help and we weren’t.”
“We were treated—we were considered peers around the table at IF because they didn’t know what they were doing either.”
“They knew their subject; we knew the technology, but none of us really knew computing and the humanities and what it meant to put these two things together.”
“So they were pretty good about it. But we were worried, because when you work in the university as a technical person, you often get treated like the help.”
A successful model
Thornton Staples shares a model that was successful in getting faculty hooked on digitizing their work and taking it to the classroom.
•You have a committee of faculty to judge proposals
•Faculty members propose projects
•When awarded by the committee, the faculty member takes a year off to work with the institute to digitize their work
•They get hooked on digitizing their work
•They take their digitized work back to the classroom to the students
“The way IF got set up is the faculty would apply, they had a project that they’d propose, there was a committee of faculty who had to say that that was an interesting enough project, presented interesting technical problems, and the scholarly—value was high enough to make it worthwhile, and then they got a year off teaching. They got office space in the institute and we were—all the technical people, myself and others—were actually housed in the institute, so we were there together for a year. So—I think—that model worked really well.”
Lessons from Thornton Staples
Focus on structure, standards and organization early on in the project as part of the research.
“I think I would have switched—not in the first year but—over the four years, if we had switched to think more about—thinking of the overall structure and the organization of these projects as being part of the research, I think we would have—we would have shortened—we would have been where we should have been sooner.”
“It was—it was very much about new technology and not about standardizing the output. And that’s a good thing, but if we had just a little more thinking that standardization is research, I think we would have—we would have put the pieces of the puzzle together better sooner. And I don’t think we’ve put those pieces of the puzzle together yet, really, at all.”
“But if we had been thinking about that…we would have arrived at the digital library platting later—with the idea that we’re not just putting—digitizing books and putting them online, that’s part of it, but we’re really have to prepare ourselves for these complex—webs, graph-like structures of related objects that—that we’re—it’s clearly dealing with now.”
“The Web, scholarly record is clearly becoming like the Web, not like books and articles and journals.”
The new frontier
The new frontier is developing durable, extensible and interoperable repositories.
“So the new frontier is pulling it all together in a way that doesn’t get in the way of the scholars doing their work, but ends up with a durable product that can be in a repository and can be moved from one repository to another as it needs to be but is a stable part of the scholarly record, or I would even say the human record. I think the human record is the Web, and is this digital—sphere that we’re building. And—if we don’t get good at it, I think we’re in for a dark age.”
“They—they worked their butts off and they worked all these graduate students for years to get these really brilliant projects out there, and they’re like built on sand, and they don’t know it.”
“And they all, you would ask them and they would tell you the library’s gonna collect it and save it forever. And you know, the library, we already knew that we didn’t know how to do that…I think people still think that the libraries or the archives are just gonna do it for them.”
Senior Program Officer, Research Division of OCLC
Roy Tennant emphasizes the complete dependence of digital preservation on commitment, even of just one person. Tennant describes the value of reaching out to the community when they embarked on The Jack London Online Collection. In discussing eScholarship, Tennant shares the benefits of creating a prototype to help stakeholders see your vision and the secrets to getting faculty research into your institutional repository.
“Preservation is about commitment”
“The single most important aspect of digital preservation is commitment. That’s it. Commitment. It’s not what the bits are on, it’s not any of that stuff. It’s simply committing to be there to keep that stuff around.”
Include the wider community’s items in your collection
The Jack London Online Collection became one of the first virtual libraries where the digitized materials were not housed in any one physical location. When planning for a digital collection, reach out to the wider community and include items from collections that are not owned by the library.
For example, while digitizing The Jack London Online Collection (http://london.sonoma.edu/), which began at the UC Berkeley Bancroft Library, Tennant’s project reached out to the Jack London community and included items that were not part of Bancroft’s collection, such as photographs of Jack London’s family from a private collection. They also contacted both Dr. Clarice Stasz at Sonoma State University and the California State Parks System. Sonoma State University Library now manages the collection.
“Preservation is about commitment”
•“The single most important aspect of digital preservation is commitment. That’s it. Commitment. It’s not what the bits are on, it’s not any of that stuff. It’s simply committing to be there to keep that stuff around.”
•When the project lacks commitment, you may eventually need to supply it yourself.
The library needs to care about your project
When managing a project, you may be fortunate enough to have free rein over how to run it. The downside is that your project runs the risk of becoming too disconnected from the library. It is critical to 1) make an effort to promote your project as part of the library and 2) make your project a part of the regular processes of the library. Your goal is to have the stakeholders care about your project.
Projects as launching pads and playgrounds
Use current projects as launching pads for future projects. Projects can be playgrounds for experimenting, building on and creating new things. Projects can act as building blocks for larger projects. For example, while working on the Librarian’s Index to the Internet, Tennant found that he “could provide an infrastructure and capability and technical ability to take that project to the next big step.”
Share your expertise and your findings
You will develop expertise in various technical areas as your project progresses. Experiment and report your findings back to the community so that others can benefit. For example, Tennant and his team held a five-day boot camp to help educate librarians on the digitization process.
Start a technology-learning program at your library
Tennant and about half a dozen library staff started Library Technology Watch, where they each watched out for specific technologies, wrote about them and held brown bag sessions to teach the other staff about the various technologies. Although now defunct, Library Technology Watch eventually became Information Systems Instruction and Support (ISIS), which both educated staff on new technologies and served as a help desk.
Have the courage to go on a different path and have the courage to move on if it does not work out
You may have a powerful idea. However, no one else may believe in it. If no one else is excited about your project, the lack of support in staff, staff time and funding will adversely affect your project.
“One of my lessons from all of this that there are going to be times when you have exactly the right idea but it’s the wrong time or the wrong place, or whatever, and you have to let it go and move on. If you can’t make it real within that particular context or that situation, then go be successful in something else.”
“Prototypes are worth a million words”
Build prototypes to tell the story of your project. People need to see what you are asking them to support. Prototypes illustrate your vision of what is possible – the purpose and the “why” of the project. Someone else can manage the “how” and actually build it in the way it needs to be built.
“I am a huge fan of prototypes. You know, if you want to make your case to someone about doing something, build it, even if it’s just smoke and mirrors. I mean, literally it could be as little as an HTML page, you hit a button which links to another page, you hit a button which leads to another page, and that’s all it is. That could be a prototype. It doesn’t even have—actually have to work, you just have to be able to tell the story that that is telling you.”
Lessons from the eScholarship Repository: http://escholarship.org/
Tennant was instrumental in “pioneering an infrastructure” that powers this journal-publishing platform.
From the eScholarship website: “eScholarship provides a suite of open access, scholarly publishing services and research tools that enable departments, research units, publishing programs, and individual scholars associated with the University of California to have direct control over the creation and dissemination of the full range of their scholarship.”
•A simple upload interface
Although the benefits of eScholarship were heavily promoted, faculty members did not use it. Tennant and his project staff were then inspired by an economist at the Berkeley Electronic Press who had built an easy-to-use platform that encouraged people to deposit their works. This led Tennant and staff to marry that simple upload interface on the front end, which the faculty was more open to using, with the full-featured journal-publishing platform on the back end.
•Focus on training the departmental support to upload faculty work
Once Tennant made the departmental administrative assistants the point of contact and trained them to use eScholarship, the faculty’s works were deposited in the repository. It was the administrative staff and graduate students who consistently uploaded the documents for the entire department, not the faculty.
Keep learning: “Keep the juices flowing and keep your ear to the ground”
Understand the ways you naturally learn best and use those different methods to constantly keep learning.
•“The biggest thing you have to realize is that you don’t leave school and stop learning, you leave school and you just keep learning because the world shifts underneath your feet on a daily basis.”
•“The profession, and really probably any profession these days, is just one of constant learning.”
•“Try to find people whose work might be a good bellwether to what’s coming down the road who you can kind of watch and see what they’re looking at.”
•Use a “tiered model of current awareness”: “So it’s a variety of different things. It might be anywhere from blogs that you might want to monitor to Twitter feeds to, you know, individuals to, you know, Fast Company Magazine, I mean, whatever it might be that just kind of keeps the juices flowing and keeps your ear to the ground.”
Every librarian needs programming skills
Learn basic programming skills.
“Knowledge of structured text and how to manipulate those, so for example, XML and XSLT, that’s a solid skill which I don’t see going away anytime soon.”
“Basic programming skills and I’m not saying you need to be a programmer, but at this point I think every librarian needs to understand enough about programming to understand what it does, how it can be useful, to be able to spot a problem that a program won’t solve, and have a rough idea on how long it would take for someone to write that—that’s a key skill.”
Know how to parse metadata
You should be able to parse and effectively work with metadata from different formats such as Dublin Core, MARC, etc.
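As a hedged illustration of what working with structured metadata can look like, the sketch below reads a small Dublin Core record using only Python’s standard library. The sample record and the function name `parse_dublin_core` are invented for this example; real records differ in their wrapper elements and may mix in other namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample record, made up for this sketch.
SAMPLE_RECORD = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Sample photograph</dc:title>
  <dc:creator>Unknown photographer</dc:creator>
  <dc:subject>Portraits</dc:subject>
  <dc:subject>Families</dc:subject>
  <dc:date>1905</dc:date>
</record>
"""

def parse_dublin_core(xml_text):
    """Return a dict mapping each Dublin Core element name to a list of values."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree spells qualified tags as "{namespace}localname".
    prefix = "{" + DC_NS + "}"
    fields = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            value = (element.text or "").strip()
            # Elements such as subject may repeat, so values are kept in lists.
            fields.setdefault(name, []).append(value)
    return fields

record = parse_dublin_core(SAMPLE_RECORD)
print(record["title"])
print(record["subject"])
```

Keeping every value in a list reflects the fact that Dublin Core elements are repeatable. MARC records would need a dedicated parser, which is outside the scope of this sketch.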
For young professionals, having mentors is an important and effective way to “get some assistance, to get a leg up, to get going, to get out there in a way that is hard to do when you’re first starting out.”
•“Coming from someone who has had many mentors in his life, I suggest you try to find someone who you’d like to have mentor you and attach yourself to them. And by that I mean, you know, ask them. Just come out and say, I’m really interested in your work, I’m a young professional, I could use some help in terms of meeting other people I need to know, some advice just on, you know, career direction, you know, blah blah blah, could we have dinner at ALA some time?”
•“There are plenty of older professionals like me out there who would be perfectly happy to help young professionals get started.”
Alumni Distinguished Professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill
In this video, Helen Tibbo touches on a wide range of areas in digital curation, including how a conference workshop on dealing with digital cultural heritage materials back in 2000 became her “transformative moment,” helping to develop standards and best practices for digitization for cultural heritage professionals, the DigCCurr’s International Digital Curation Curriculum, the Digital Curation Lifecycle Model, ISO 16363 standard on the audit and certification of trustworthy digital repositories and DataNet and big science data. Tibbo also shares the unique combination of skills that students can develop to ensure that they will be in demand as new professionals in digitization.
“If you have those digital skills and you have good communication skills so that you can talk to the people who don’t have the digital skills. And you can talk to the content creators and the content users and be that person in the middle. Very few people can pull that off, those—that’s the combination.”
“Having enough technology to be able to bridge between the people who have not enough—people who are going to be the programmers. You’re probably not going to be the programmers, but you have to have enough to be in that middle spot. And there are so few people who can actually do that, you will have a job.”
You may have a “transformative moment” at a workshop or session that affects your career trajectory.
Helen Tibbo attended a conference in 2000 at Rice University, and her “transformative moment” was a workshop on dealing with digital cultural heritage materials held by the Humanities Advanced Technology and Information Institute at Glasgow. This led to her teaching a semester-long class at UNC-CH in 2000, which then developed into week-long institutes in 2002-2004.
Archiving: From physical to digital collections
The physical materials, and the actual handling of and working with tactile objects, have long attracted people to collections. We also have legacy content, which archivists need to know how to preserve.
“But going forward, all of our new content is probably going to be digital. We have to be able to deal with that.”
“You can have great material in a digital format, but somehow there’s a different relationship we have with it, perhaps.”
Not much distinction between digitized and “born digital”
“Once something is digital, it’s digital. It doesn’t make any difference. The bits are the bits. So to me, a digitized item and a born-digital item are the same thing once the work becomes digital.”
Digital curation is not just digital preservation. It includes the preservation component as well as the access.
Funding your institutional repository
When trying to sell your institutional repository to university funders, you need to sell “who are the users of this content going to be, and what’s the exciting content, who’s going to have access to it.”
Use standard formats since they are the most supported formats that allow for maximum use and reuse; “rogue software is not a good thing for preservation.”
How funding from the National Science Foundation (NSF) is used for scientific data
In the past: “I’m Historian A and I go into the archive and I read the papers, I read the papers, I write my book, the book goes in the library, but there’s no data that goes back into the repository. And then Historian B comes and uses those same archival materials and writes his book.”
Today: “Scientist A creating content that goes into the repository and Scientist B actually uses that scientific data.”
DataNet: Sustainable Digital Data Preservation and Access Network Partners
DataNet looks at science data, interoperability and cyberinfrastructure for big data in science. Dr. Tibbo’s team at UNC is looking at the data user needs of hydrologists and ocean scientists.
About DataNet: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503141
“This solicitation addresses that challenge by creating a set of exemplar national and global data research infrastructure organizations (dubbed DataNet Partners) that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.”
See What Has Been Funded (Recent Awards Made Through This Program, with Abstracts) including UNC’s DataNet Full Proposal: DataNet Federation Consortium
The Digital Curation Centre (DCC)
“So there are lots of different digital curation lifecycle models, but if you look at the one from the Digital Curation Centre in the UK, it has all the right assets.”
“The Digital Curation Centre (DCC) is a world-leading centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community.”
DigCCurr (say dij-seeker)
DigCCurr: Preserving Access to Our Digital Future: Building an International Digital Curation Curriculum
DigCCurr focuses on building graduate-level curricular frameworks that prepare students to work in the 21st-century environment of trusted digital and data repositories, along with symposia and a professional institute for practitioners.
DigCCurr I: Preserving Access to Our Digital Future: Building an International Digital Curation Curriculum developed an openly accessible graduate-level curriculum to prepare students to work in the field of digital curation.
DigCCurr II: Extending an International Digital Curation Curriculum to Doctoral Students and Practitioners furthers the work of DigCCurr I.
The DCC’s Digital Curation Lifecycle Model
The Digital Curation Lifecycle Model “has all the right assets” from inception of the digital object and the design of the content, to the curator working with the researchers to advise on file formats, to metadata creation and how to make it useful for reuse.
“In the digital world, what we know is that decisions made about the creation of content -- the early curation by the content creator, the early management of that content -- will vastly influence whether or not we are ever able to preserve it.”
“The likelihood that we are ever able to preserve something is increased if the archivist and the preserver work with the content creator.”
The Digital Curation Lifecycle Model
“Our Digital Curation Lifecycle Model provides a graphical, high-level overview of the stages required for successful curation and preservation of data from initial conceptualisation or receipt. The model can be used to plan activities within an organisation or consortium to ensure that all of the necessary steps are covered – and that the process is completed in the correct order.”
See page 8 of The Digital Curation Centre: A new phase, a new perspective. Retrieved from http://www.dcc.ac.uk/sites/default/files/documents/publications/dcc-phase-3v3.pdf
ISO 16363 is the international standard that lays out what constitutes a trustworthy digital repository.
Standard: ISO 16363:2012
Title: Space data and information transfer systems -- Audit and certification of trustworthy digital repositories
Abstract: ISO 16363:2012 defines a recommended practice for assessing the trustworthiness of digital repositories. It is applicable to the entire range of digital repositories. ISO 16363:2012 can be used as a basis for certification.
ISO 16363 + OAIS = Recipe book to create a repository
When you combine the ISO 16363 audit and certification standard with the Open Archival Information System (OAIS), you have a “recipe book for how to create a repository.” However, remember that not everyone is a good cook or can read a recipe book.
Does your institution have a preservation mission?
The very first item in the ISO 16363 standard asks whether you have a mission statement that supports preservation. In a test audit, three institutional repositories believed they had a preservation mission. However, their mission statements said that their mission was to provide access to content; preservation was never mentioned. A gap like this will affect your long-term sustainability.
“I think if the repository and the preservation is the core of your mission—and it’s relevant to your funder, then there’ll be sustainability.”
Advice to students
Advice #1: Get as much technology behind you as you can
“Get as much technology behind you as you can. Because it is a technical, it’s a digital world.”
“Most of the new hires in archives, they’re really looking for somebody with digital skills.”
Advice #2: Find something you really want to do
a. Think about what type of job you want to have and how you can go about getting that job
b. For your master’s capstone, pick, if not the place where you want to work, at least an area you want to work in, the type of place you want to work and the type of job you want to do
c. Do some research on that for your paper
d. Take that to your interview and actually say, “these are the issues that are relevant to you and look, I’ve done some exploration and I have some answers for you”
Digital skills + Communication skills = You will be in demand
Build up your digital skills and be the bridge between the programmers and those without enough digital skills. The field needs people who can communicate, without being intimidating, with librarians, researchers and stakeholders who lack digital skills. This combination is what is needed, and very few people have it.
•“If you have those digital skills and you have good communication skills so that you can talk to the people who don’t have the digital skills. And you can talk to the content creators and the content users and be that person in the middle. Very few people can pull that off, those—that’s the combination.”
•“Having enough technology to be able to bridge between the people who have not enough—people who are going to be the programmers. You’re probably not going to be the programmers, but you have to have enough to be in that middle spot. And there are so few people who can actually do that, you will have a job.”
Associate University Librarian for Library Information Technology at the University of Michigan and Executive Director of HathiTrust
John Wilkin discusses his roles with the Digital Library Production Service at the University of Michigan and the Making of America Project; developing skills from working on large multi-institutional projects; and current librarian skills and trends.
“It was really friendships and a recognition of the way that our different experiences can come together around this one problem.”
In any project, there will be “an ebb and flow of systems and materials.” You will need to plan for sustaining an enterprise and system due to “turnover and staff and intersections with other systems.”
Workflow versus agility
In some projects, designing a formal workflow could be “more cumbersome than the value it contributed.” Strive for a balance between a workflow and “the freedom to operate in more improvisational ways.” Improvising contributes to agility, flexibility and figuring out new ways of looking at things.
Build modules for constant systems
Building in modularity allows you to “build the code around modules that were constant and worked for all the systems and that what we did for each new one would be unique for—would be the unique piece for that.”
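The idea of constant modules plus one unique piece per system can be sketched as a shared pipeline that accepts a collection-specific step. Everything below (the function names, the record fields, and the two sample collections) is hypothetical and only illustrates the shape of such a design; it is not a description of the University of Michigan’s code.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[Record], Record]

def normalize_whitespace(record: Record) -> Record:
    """Constant module: collapse runs of whitespace in every field."""
    return {key: " ".join(value.split()) for key, value in record.items()}

def require_identifier(record: Record) -> Record:
    """Constant module: refuse records that lack an identifier."""
    if not record.get("id"):
        raise ValueError("record has no identifier")
    return record

def build_pipeline(unique_step: Step) -> Callable[[List[Record]], List[Record]]:
    """Combine the constant modules with the one step that differs per collection."""
    def run(records: List[Record]) -> List[Record]:
        return [
            require_identifier(normalize_whitespace(unique_step(record)))
            for record in records
        ]
    return run

# Hypothetical collection-specific pieces: the only code written per new system.
def newspaper_step(record: Record) -> Record:
    return {**record, "type": "newspaper"}

def photo_step(record: Record) -> Record:
    return {**record, "type": "photograph", "title": record.get("caption", "")}

process_photos = build_pipeline(photo_step)
print(process_photos([{"id": "p1", "caption": "  Family   portrait "}]))
```

Adding a new collection then means writing one small function and passing it to `build_pipeline`, while the shared steps stay unchanged.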
Using unique identifiers
“Semantically rich identifiers are stupid” since they are not scalable and are not that helpful. Unique identifiers need to be automated and be able to be validated. Barcodes are a very successful use of unique identifiers.
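One way to picture identifiers that carry no meaning, are assigned automatically, and can be validated is a sequence number followed by a check digit, similar in spirit to a barcode. The sketch below uses the Luhn algorithm for the check digit; that choice, the fixed width, and the names `mint_identifier` and `is_valid` are assumptions for illustration, not something Wilkin describes.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every other digit, starting with the rightmost.
    for index, char in enumerate(reversed(payload)):
        digit = int(char)
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)

def mint_identifier(sequence_number: int, width: int = 8) -> str:
    """Build an opaque identifier: zero-padded sequence number plus check digit."""
    payload = str(sequence_number).zfill(width)
    return payload + luhn_check_digit(payload)

def is_valid(identifier: str) -> bool:
    """Return True if the last digit matches the Luhn check digit of the rest."""
    if len(identifier) < 2 or not (identifier.isascii() and identifier.isdigit()):
        return False
    return luhn_check_digit(identifier[:-1]) == identifier[-1]

identifier = mint_identifier(42)
print(identifier, is_valid(identifier))
```

The identifier says nothing about the item it labels, yet a mistyped digit can be caught before any lookup is attempted, because the Luhn check detects every single-digit error.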
Working together on multi-institution projects
Working on the large multi-institution projects was about friendship and sharing experiences and strategies. The participants worked on determining efficient ways to develop scalable and sustainable architectures. They did this by recognizing that different experiences can come together to solve issues, by bringing together resources, and sharing when best practices and standards were or weren’t appropriate. “Doing things in a shared space rather than in our own institutional spaces and then knitting them together” was more efficient.
Hot topic: Ambiguity
Ambiguity arises when a census of the digital materials is unavailable. U.S. government documents lack “a comprehensive corpus” since large regional depositories are not cataloged comprehensively at the item level. This makes it hard to determine the contents and quantity of the inventory and to determine copyright renewal.
LIS Education: Strategies, methodologies and skills
Because skills are going to change, your education should be about strategies and methodologies rather than about specific skills. Due to the rapid change in skills and formats, you need to know about frameworks, strategies, and applying them. Know some key formats. But more importantly, “know why those formats are meaningful and how to extrapolate things.”
Going through a list of the last fifteen librarians that the University of Michigan’s library had hired, Wilkin discovered that none of them were catalogers or reference librarians. Instead, the list included positions for copyright specialist, user interface specialist, programmer, or someone with specific skills that can be applied to particular types of problems, such as project management. “Picking an area of usability or user experience or HCI and figuring out strategies is more effective than learning this framework for that thing” because “it’s not about the tools, it’s about the way you use things.”