Peter Hirtle - Learning Resources
Senior Policy Advisor at Cornell University Library
Summary: Why do good digital projects fail? Peter Hirtle shares his wisdom gleaned from working on a myriad of successful, and not so successful, projects at the National Library of Medicine, National Archives and Records Administration (NARA), and the Cornell Institute for Digital Collections, where he served as the Director. Hirtle addresses critical issues in the future of digitization: linking siloed digital projects, mashing up small projects, and getting people to think about how their own digital project fits in with other collections, as part of the national construct.
Quote: “Parking digital images away in a dark archive or in a preservation system is always a bad idea…You need to have them be part of a live system…As soon as digital data is not being used, then it’s likely to be destroyed.”
Reasons why digital projects fail
Good digital projects can fail because:
- The materials are in formats that are unusual, proprietary, or expensive to work with (and the cost cannot be justified)
- The quality of the results is poor
- The approach taken is technology-focused instead of user-focused
How to approach your digital projects
Be user-focused and assume that you will not be able to go back and re-digitize the materials. Ask these questions:
- What is the nature of the documents?
- What is it going to take to have the full informational capture?
Do not approach the project from a technology-focused standpoint, where the focus is on what a particular piece of equipment can do.
On Anne Kenney’s approach to digitization
Anne Kenney is the University Librarian at Cornell University and was instrumental in bringing a user-centered, research-based approach to digitization. See Anne Kenney’s interview on the Digital Pioneers website (http://digitalpioneers.library.du.edu:8080/).
“You want to create rich enough files to support a multiple range of uses because you may not be able to go back and scan again.” ~ Anne Kenney on digitizing files and focusing on the highest image quality and the broadest range of use
- “It was just—Anne’s (Kenney) perspective of not sitting down and saying, what is it that the equipment can do, which is what Bill Houghton had been doing, but—with the—the Honeywell equipment at NARA, but instead sitting down and saying, what is it—what’s the nature of the documents, and what is it going to take to have the full informational capture? That just struck me as a really—clever and—right way of approaching it, on the assumption that we weren’t going to be going back and—and being able to—do this again.”
- “Anne’s (Kenney) really great contribution is—saying, don’t do that, think about the document. Think about what you’re trying to capture. And I worry that not enough people remember that lesson. That they just have kind of—either say, oh, 600 dpi sounds good, that’s what everyone uses, and forget why we came to that number—”
- “But I thought that—the fact that she sat down and approached things as a research question and did—you know, the kind of analytical work and then wrote it up and presented it—so her studies on—her first work on image quality, her articles, Steve Chapman on the metrics for image quality that you’ve read in D-Lib Magazine, their report on converting micro—is it better to first microfilm and then go to digital or digitize and then try to produce microfilm for it. Their early report on using Kodak photo CD technology as—a mechanism for color images, that’s all really—top quality stuff. And very, very impressive.”
Capturing artifactual qualities
With digital imaging, you can capture artifactual qualities, e.g. color and sense of the page.
- “In reality now, we realize that digital imaging is better than preservation microfilming. Yes, you can capture the words in the book, but you can also capture the color, a sense of the page, more of the artifactual qualities.”
Keep control of your data
Proprietary software stores structuring information in a proprietary format. Once that information is encoded this way, you may not be able to access your data without that software.
- “Always be able to export your data. Be careful about proprietary solutions.”
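One way to act on this advice is to check regularly that records can be written out to open, plain-text formats and read back without loss. The following is a minimal sketch in Python, not a description of any system mentioned in the interview. The record fields (`id`, `title`, `file`) and the function names are hypothetical, chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical records standing in for whatever a collection system stores.
RECORDS = [
    {"id": "img-0001", "title": "Title page", "file": "img-0001.tif"},
    {"id": "img-0002", "title": "Frontispiece", "file": "img-0002.tif"},
]


def export_json(records):
    """Serialize records to JSON text, readable without special software."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def export_csv(records):
    """Serialize records to CSV text, using the first record's keys as header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def round_trips(records):
    """Check that both exports can be read back and match the original records.

    The CSV comparison assumes every field value is a string, because CSV
    does not preserve other types.
    """
    from_json = json.loads(export_json(records))
    from_csv = list(csv.DictReader(io.StringIO(export_csv(records))))
    return from_json == records and from_csv == records


if __name__ == "__main__":
    print(export_csv(RECORDS))
    print("Round trip succeeded:", round_trips(RECORDS))
```

The round-trip check is the useful part: an export that cannot be read back into an equivalent structure has not really freed the data from the system that holds it.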
On the original Making of America project
Digitization is a means of preservation, and preservation requires use. When digital objects are simply kept in storage, they are not used. Those files are then forgotten and, through technical obsolescence, become inaccessible.
- “Using digital imaging for preservation purposes, as opposed to preserving the digital object.”
- “And so the digital imaging was being done in order to produce new books. And in some cases, tipping in the illustrations from the original books into these things. And the images were then just stored on a server. And only after the fact did we sit down and say, oh, we could actually provide access to these images as well.”
- “You know, the primary thought was to make the analog replacement for the scanned item. But even those images were on a server that started to fail because it had been sort of forgotten about and it took a heroic rescue effort to get the images off of that and save them.”
- “And that convinced me that parking digital images away in a dark archive or in a preservation system is always a bad idea. That you need to have them be part of a live system, that as soon as digital data is not being used, then it’s likely to be destroyed.”
Converting a project into a long-term sustainable program is difficult, especially if the project is “a bit outside of the mainstream.”
- “The thing that I think we’ve done really poorly on…is turning the projects into programs.”
- “…the idea of trying to turn this into an—an ongoing program was—is difficult. And I’m not sure that we’ve ever solved that problem.”
Democracy and insularity
It is now very easy for anyone to digitize a collection and post it on the web. But you need to ask yourself these questions:
- How is your digital project going to fit in with other collections?
- How is this part of a national construct?
-- “And that anyone can buy a scanner and set it up and start throwing some things and making a website and -- generating—files. And so it’s sort of—very democratic. But that leads to—I think a little bit of—insularity.”
-- “People get focused on sitting down and saying, well—what’s my special collection? I’m going to sit down and digitize it because this is really special to me.”
-- “They don’t think about this as being part of a national construct. Anytime now when I sit down and hear that we’re digitizing—somebody X’s collection on—library X’s collection on whatever. You know, you sit down and say, but how is going to fit in with other collections that are like that, how’s it going to work?”
Disappointments with duplicated efforts
With preservation microfilm, there was an effort to reduce duplication. However, in digitization, because there is no one coherent and comprehensive cross-collection catalog, there is a lot of duplicated effort.
- “Why is it that there is, as far as I know, no one place where you can find out if there’s—if a work’s been digitized or not?”
- “I’m so disappointed—I was looking up a book that—we had—there was some question about—in the Internet Archive. I think I found nine different versions of this book that had been digitized—by different libraries in different projects at different times and in different ways. And we were so good with preservation microfilming, sitting down and saying, this person—you know, knows how to do preservation microfilm is going to take responsibility and master it and—and not duplicate efforts. And we just ignored all of that with digital imaging.”
- “It just worries me that—you know, you may have some historical society in Ohio that sits down and says, oh, we’re going to digitize—our local histories and put them online for our users, not realizing that they’re all in the University of Michigan and they’ve all been done for the Hathi project.”
- “Is it because these were projects that were done without—outside of the normal technical services structure so that people weren’t thinking about—generating MARC records for them? But those are where—I think we have fallen down. Now, maybe there’s hope, you know, maybe they’ll all show up in WorldCat. But I fear that—Google by default will be the place where—everyone goes.”
Mashing-up small projects
The future lies in reference linking: connecting disparate, distinct projects, which now sit in non-standard formats, so that they can interact with each other.
- “The other problem we have is when you have these distinct little projects. What we’re really interested in is having mash-ups and interactions. And they’re in non-standing formats, we don’t have an agreed upon—a standard book reader—to do reference linking and to do other things and so—maybe that’s one of—it’s not so much a regret, as where the future’s going to be.”
Critical issue: Connecting siloed digital projects
How do we link up siloed digital projects?
- “So we’ve built—20 different—you know, there’s probably—20, 50, 100 silos of information around the world of digitized information. And now the issue is how do we make them talk to each other and interact and make them easy to use.”
Issue: Lack of interoperability
If you lack good metadata for your images, then Google cannot index them.
- “And how many, you know, image databases are there that don’t even have the metadata for the images catalogued by—or indexed by Google? So again, you have to go into the—database to find it. So—it’s that interoperability that’s—becoming the important issue.”
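One common way to expose image metadata to web crawlers is to embed it in the page as structured data. The following is a minimal sketch in Python that builds a schema.org `ImageObject` description as JSON-LD. It is an illustration only: the function name, the sample values, and the example URL are hypothetical, and embedding such markup does not guarantee that any particular search engine will index the image.

```python
import json


def image_jsonld(name, description, content_url, creator):
    """Build an HTML <script> element holding schema.org ImageObject metadata."""
    record = {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "name": name,
        "description": description,
        "contentUrl": content_url,
        "creator": {"@type": "Organization", "name": creator},
    }
    payload = json.dumps(record, indent=2, ensure_ascii=False)
    # Escape "</" so a value cannot close the script element early.
    # "\/" is a valid JSON escape, so the payload still parses.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(
        image_jsonld(
            name="Title page",
            description="Scanned title page of a nineteenth-century monograph.",
            content_url="https://example.org/images/img-0001.jpg",
            creator="Example Library",
        )
    )
```

The point of the sketch is that the description travels with the page itself, so a reader does not have to go into the database to find the image.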
Do what is fun and interesting to you
Do things that you’re interested in and that you think would be fun.
- “I was always interested in how we could use new technology—to—make scholarship better.”
- “And I just took to it and kept on coming up with projects that I think would be fun things to do.”