Paul Conway - Learning Resources
Paul Conway
Associate Professor of the School of Information, University of Michigan
Interviewed 5/25/2010
Summary: Learn about Paul Conway’s fascinating story of how a project he was doing (and complaining about) at the National Archives ended up teaching him more than he ever imagined and led to his position as Head of the Preservation Department at Yale. Are you embarking on a digital project? Conway shares how to decide whether to build or buy, and whether to do it yourself or use a vendor. Conway discusses how libraries need to let go of perfecting digitization standards and workflows, and instead focus on why libraries exist in the first place: to get materials and content out for people to use.
Quotes: “The day is coming when all of our fixed visual resources are going to be available digitally. And if they’re not, they don’t count. They don’t count.”
“My biggest single concern was the time it took for the library and archives community to come to a consensus over what the best way to get the job done was.”
Seize opportunities to be mentored by leaders with a clear vision and plan
If you are fortunate enough to have the opportunity to work with someone who has both a clear vision and plan, seize it!
- “So I had the good fortune of being—mentored/supervised/forced to do a very clearly defined project with a very clear deliverable that resulted in teaching me more than I ever imagined. And I came away interested and fascinated with the—with the potential.”
- “So I spent a year complaining, essentially, and doing the project. But I came out the other end of it, knowing a whole lot about what standards are, how they’re developed, what government agencies decide to use or not to use standards in—in the technology arena, how they sort out what they’re going to do, how they work with vendors, …”
- “So I was just—it was just fortuitous timing, a boss that cared, was willing to be patient with somebody who complained all the time. I’ve told this story about be careful what you complain about because you never know when what you think is the worst job you’ve ever had may turn out to be your ticket for the next decade or two.”
Project Open Book
Yale and Cornell each wrote a grant proposal to the National Endowment for the Humanities (NEH) to digitize books. Michael Lesk of NEH challenged the cultural heritage sector, and Yale and Cornell in particular, to test which approach was more cost-effective and produced a better-quality product: scanning or filming. Yale decided on filming and Cornell on digitizing.
Conway at Yale and Anne Kenney at Cornell decided to do a dual study in which they tried to control for as many variables as possible and do as many things as possible in exactly the same way. That way, the study could isolate just two questions:
- Is it better to scan first or film first?
- Which approach produced better quality?
Conclusion: “We found almost no statistically significant difference between the cost of scanning from the original and the cost of scanning from microfilm. And yet there were very significant differences in quality.”
Vendor challenges
Cost, control and obsolescence are major issues when dealing with vendors. Use caution!
- “The real challenges that I faced were vendor—relationships with vendors. And I use vendors very loosely defined. In some cases it’s services providers, vendors as service providers, and in other cases it’s vendors as product developers.”
With the National Archives project, Conway discovered that “the real key to understanding the future of digital imaging in federal agencies was to understand the deals that federal agencies were making with an industry that was just itself beginning to stabilize” since “there weren’t enough international standards yet, they were starting to emerge.”
- “The—highly competitive storage medium—competition in the storage area, competition in the tools area, competition in the workflow, software area—every one of those competitive pieces generated income for private industry via contracts, procurement contracts with federal agencies.”
At Yale, Conway used a beta version of Xerox’s Documents on Demand (XDOD). Xerox’s planned obsolescence and insistence on supporting only the newest version caused Yale to sever its relationship with Xerox. Moreover, Xerox never developed XDOD.
- “In the end, what I thought was going to be about scanning turned out to be all about file management, all about vendor relations, all about storage in a completely proprietary environment, in which we were completely dependent upon Xerox for everything, from supplying the next upgrade, installing the next upgrade, telling us when the next upgrade was going to come, insisting that we buy the next upgrade because if we didn’t, they weren’t going to support the immediate previous.”
- “So the obsolescence problem, which has become an obsession with the cultural heritage sector, was—real—in the early 90s.”
Build or buy?
When does it make sense to build or buy?
- “The question was, and still is for many libraries and archives, build it or buy it. And at the time, the feeling in the mid—early to mid 90s was that it was much more cost effective to acquire technology externally.”
- “Form good working relationships with the vendor community. And in doing so, in a sense, we fostered the development of tools that meet our needs.”
- “So the idea was to plunge in, form these relationships, perhaps at the R&D level, at the beta prototype level, so that we can have some influence over what—what these tools were going to be like, so that we can continue this relationship. And it was far more effective than building these tools from the ground up.”
For small pilot projects: Build
For very small pilot projects to which you have not yet fully committed, you have the flexibility to build in-house.
- “On the very small pilot project when you’re not committed and you’re not sure. You want to experiment. Then buy some tools, repurpose some staff, find a dark room, go to work. And test and evaluate and figure out what you can do and what your organization can bear.”
For large projects: Build collaboratively
Examples of large-scale projects that used a collaborative approach include HathiTrust and the California Digital Library.
- “If you’ve got a particular scale and a level of institutional commitment, then building makes a whole lot of sense.”
For middle-sized projects: Buy
It is not cost-effective for most of us in the middle to build.
- “It’s in the middle, which is where most of us are…almost the smallest of the small up to the largest of the large, the great middle…I don’t think it makes economic or technological sense to build...It’s not cost effective to do it. For lots of reasons.”
- “But who’s going to do it? You know, it’s—the software’s free but the support of it isn’t.”
The true cost of do-it-yourself projects
When does it make sense to do it yourself or to use a vendor?
- “And the fundamental—problem with home grown digitization services is throughput efficiencies. In order to—pay the cost of the hardware and the software and the people, you need to keep the equipment running like a factory. Preferably two shifts. And libraries don’t work on two shifts. They don’t work on one shift.”
- “Vendors can run two shifts. Vendors can supervise staff who specialize in different tasks. Vendors can have seven different pieces of equipment optimized for seven different types of material. And it’s very, very—when you do the math on what it costs to actually run an in-house shop, it—it really takes your breath away.”
Fun, but expensive and unsustainable
When you run digitization operations in-house, you “trade cost effectiveness for fun”: the work is enjoyable, but it is less efficient and more expensive than using a vendor. It is also unsustainable in the long run.
- “It’s fun though. See, this is the thing. You trade—you trade cost effectiveness for fun. Or you trade fun for cost effectiveness. Yeah. And because it’s—it’s not all that fun to box all your stuff up and put it on a FedEx—truck insured and send it to Minnesota so that it can be digitized and six weeks later you unpack the boxes and make sure that the vendor did the right thing. That’s not fun.”
- “It’s fun putting your hands on a piece of equipment and getting dirty and making something happen. So—and that’s—and I can understand that because it was fun. While it lasted. It’s not sustainable.”
Libraries need to get the job done!
Instead of focusing on “getting the job done” so that users can have access to digitized materials, libraries, in their aversion to risk, want to perfect standards and workflows. Libraries exist “to get content out there that people could use,” which is “the most compelling argument to just move ahead” and “get the job done.”
- “My biggest single concern was the time it took for the library and archives community to come to a consensus over what the best way to get the job done was.”
- “The more we try to perfect the workflow, and the more we try to establish just the right—perfect technical guidelines, the more our focus and our—gaze shifts to what do the materials need that we want to digitize instead of what do the users who are going to use our digital products need.”
- “And so the more we focus on process and the more we focus on technology, the more—the more we lose sight of the fact that real live people want to get their hands on this material and that their definition of perfection or their definition of okay may be different from our technically driven definition of perfection and okay. And that gap is growing rather than shrinking.”
- Conway, paraphrasing MacKenzie Smith: “You can only take guidelines and best practices so far and at some point you have to decide to do the job.”
- “I think the biggest single argument for mass digitization is because that’s the way the world works now, and it will be working that way for the foreseeable future. So if we want to be a part of the way the world is now, then let’s get on board and get the job done. And if we don’t do it, Google, Microsoft, or other organizations that are in the content business, are going to do it.”
- “Entertainment industries, they’re going to do it. The movies are going to get digitized by the Hollywood back offices. And the sound motion picture—the sound recordings are going to get done by the people who see commercial value in the—in the bits and bytes.”
- “So we’ve got to do it. If we want to—if we want to manage it and own it and control it and foster open—uses in new and interesting ways, then we’ve got to do the work.”
Wasteful quest for perfect standards
Libraries lost time, momentum and leadership in the digitization field because of a wasteful, drawn-out quest to perfect standards. They spent so much time reaching consensus on digitization standards and trying to perfect the workflow that Google, Corbis and Getty Images came to dominate the field.
- “My biggest single concern was the time it took for the library and archives community to come to a consensus over what the best way to get the job done was.”
- “And in—in the process there’s been a tremendous amount of wheel reinvention. Issuance of new and better guidelines. Guidelines tweaked for a particular audience. Guidelines more specific. Guidelines less specific. Guidelines that are regionally based. Guidelines specially for archivists. Guidelines especially for librarians.”
- “But my research has shown that by about 1999 or 2000—I could almost put a specific date on it—the people who had been experimenting knew what the answer was and had already developed a well-documented set of guidelines that could have been—either turned into standards or could have somehow been adopted, in some—through some mechanism not fully clear.”
- “But instead we’ve spent another decade—still trying to figure out what the right guidelines are. And then Google steps in and does all the books and—Corbis comes in and does all the photographs and Getty steps in and does—Getty Imaging does more photographs.”
- “And I fear that the scale—our ability to—to migrate our collections into the digital realm at a scale that it’s worth doing—we’ve lost precious time. And lost precious momentum in an effort to perfect the workflow. And that bothers me. Even today. It—it bothers me a lot.”
If it’s not digital, does it exist?
- “If it isn’t digital, it either doesn’t exist or it doesn’t matter.”
- “The day is coming when all of our fixed visual resources are going to be available digitally. And if they’re not, they don’t count. They don’t count.”
Digitizing all photographs
It is possible to digitize all photographs since there is a finite set of photos from the last 150 years.
- “And now we can—you can’t not get up in front of people and say, well, in a decade maybe all the photographs are going to be done. They’re easy to do, they’re fun, everybody wants photographs, there’s a finite set—yeah, there’s lots of photographs, but the photographic era is over. Kodak isn’t producing film anymore, nobody’s buying film cameras. We effectively have a 150-year period in which the concept of still photography exists. Why not to say, let’s digitize everything. Like Google did every book. Just do it all.”
The marginalization of smaller organizations
Libraries can collaborate with smaller organizations, e.g. historical societies, to digitize and preserve their materials and thereby broaden their reach.
- “I worry about marginalization of the smaller organizations that want to be and are online but are highly selective. They’ve—they’ve got 27 things on their website. And there’s lots of other good stuff.”
- “Every—every one of these organizations either was founded or continues to exist because its collections resonate with someone. And there’s always something interesting and good in every one of these—they’re all valuable.”
- “That’s why I’m a proponent of large-scale digitization as an antidote to historical skewing.”
- “The more that we can do this collaboratively, the more that we can—push the costs down and do—and be realistic about the quality requirements, the more we can get done. And the more we can get done, the broader the reach. We can reach into the smaller organizations. We can reach deeper into collections that aren’t considered treasures, they’re just considered good stuff.”
Historical and cultural skewing due to digitization
What happens when young people’s knowledge is based only on what they can find online? Their view of the world will be defined by whatever happens to have been digitized.
- “And then there’s the skewing of our view of what history is all about. If the histories get written, if my high school son is going to write a history paper and it’s only stuff that he can find online, if a doctoral student is going to choose a dissertation topic because he or she has found an invaluable resource of newly digitized records and it makes it possible to do this dissertation without spending 18 months in Tanzania, okay. So then that collection is going to define Tanzania. That dissertation, that book, those articles.”
- “I worry about our grandchildren coming up with their high school textbooks, their view of the world, which is going to be written by people who write digitally.”
- “Documentary films are being made because the material can be made available digitally. Music videos, the online environment and then formal histories, English literature, on and on.”
- “So I, at the meta-level, I’m very worried about a kind of cultural skewing that is almost inexorably driven by digitization.”
- “That’s why I’m a proponent of large-scale digitization as an antidote to historical skewing.”