Clifford Lynch - Learning Resources

Clifford Lynch
Executive Director, Coalition for Networked Information
Interviewed 2011

Summary: How do digital collections connect communities? Clifford Lynch discusses how the Library of Congress has successfully used crowdsourcing to catalog its photo collections on Flickr, and how this has led to connections, conversations about family, genealogy and local history, and “really deep storytelling.” What happens when museums digitize entire collections? Typically, museums are only able to display the top 5% of their holdings at a time. Learn how the new “encyclopedic museums” will revolutionize the way scholars access, understand, and analyze materials.
Quote: “The whole strategy of imaging important collections of cultural heritage was really a stewardship and survival strategy.”

Issues in integration
In the past, metadata and imaging were handled in separate silos, which made it hard to connect the two reliably in the production workflow and to build retrieval and delivery across them.
“I would say the biggest hole though, was around—integration and delivery. So you had one silo where you could do the metadata stuff, another silo where you’re doing the imaging stuff, a real…workflow problem connecting the two reliably, and then the problem of how to craft something to allow retrieval and delivery across this whole mess, which was really quite a different world than the production workflow for it. So things were pretty bad back then.”
“You still had…an integration problem though. There weren’t good platforms for building a system that really combined text and—imagery nicely.”
Digital collections: “stewardship and survival strategy”
Libraries have a responsibility to be good stewards and to ensure the preservation of their material. While digitizing does not protect against loss of the original material, it provides a record in case of damage or natural disaster.
“The whole strategy of imaging important collections of cultural heritage was really a stewardship and survival strategy.”
“It was something that institutions charged with stewardship were going to be obligated to do to be good stewards and to ensure the preservation of their material.”
“Probably the strategy going forward would be to have digital records of the material and then the underlying material and that that would give you…certainly not protection against loss of the underlying material, but at least leave you in a much better place given the, you know, ugly multimillennia history of wars and natural disasters and things that have damaged so many collections.”

Issue: Color fidelity
Capturing color in high fidelity is very difficult.
“In terms of the sort of underlying technology, probably the thing that I didn’t see coming—was how much trouble color fidelity was going to be.”
“In terms of the amount of grief we’ve had with standards around color space and color management and calibration schemes, I mean, this is still a big headache for—workflows that in—that where you’re really concerned about, you know, the capture of color with—with high fidelity.”
Issue: Lack of standards and interoperability
The lack of standards and the many competing ways of doing things result in systems that do not interoperate.
“Another area which—I think we—we would have been so well served to get out in front of with a good standard 10 or 15 years ago is the situation where you’ve got—textual material imaged and then you’ve got a—an OCR transliteration attached to it. And you want to—work with those two as connected objects essentially.”
“And people have way too many different ways of doing this today. I mean, it’s a disgrace. If you look at—for example, the—projects that people like Mellon have been funding to digitize manuscript collections—a lot of those still don’t interoperate.”
“And that’s the kind of thing where, if we’d made a strategic investment in some standards and maybe some reference software that could have been given away open source or very cheaply—early on, it would have saved a lot of pains, some of which is still to come as we—get all this stuff into some kind of homogeneous form.”

Issue: Scaling up
Success in scaling up depends largely on funding from a committed institution and on not attempting to scale too early in the process.
“When you look at the history of a lot of this digitization—at work, there—there was a lot of overpromising, and it’s still very hard for people who make large scale funding commitments, and I don’t mean, you know, here’s a—here’s a small, you know, project that we’re going to get a grant for to digitize something, but I’m going to make an institutional commitment at scale.”
“It’s still really tough, I think, and has been, over the last 10 or 15 or 20 years to figure out when the right moments are in terms of cost performance and quality of the technology to make those choices. When—when to move from little pilot projects and when to do something at scale.”
“And so there’s been a history of attempts to do things at scale too early, where the technology was overpromised to management and funders and then a lot of money spent and not much result.”
“That’s a very, you know, different environment and I think, you know, sort of collectively, everybody didn’t do as well as they could have in terms of thinking strategically about when to fund pilots, when to fund fundamental research and how to communicate the outcomes of these to policy makers, essentially, who would drive or—you know, decisions to do large-scale deployments.”

The Optiputer
About the optiputer, as described by Clifford Lynch
Dr. Larry Smarr of UC San Diego, who ran the National Center for Supercomputing Applications, has been a pioneer in high performance computing for about 20 years. Dr. Smarr’s interests lie in the high performance visualization of models, e.g. astrophysical or biological phenomena coming out of supercomputer simulations. Because some of these datasets are extremely large, e.g. 600,000 pixels on one side, which is gigantic when squared, Dr. Smarr worked on display devices for them.
Dr. Smarr built optiputers, which are walls of large LCDs that can handle a large number of pixels. The 24-inch LCDs are lined up 20 across and 4 down. Dr. Smarr uses a Beowulf cluster for graphics management: each of the parallel machines in the cluster runs the I/O (input/output) drivers for one or two of the monitors. This backs the monitors with enough computational power and storage to allow for zooming, dragging and dropping, etc.
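To put the figures above in perspective, the short sketch below works through the pixel arithmetic. The dataset size (600,000 pixels on a side) and the wall layout (20 panels across, 4 down) come from the description above; the per-panel resolution of 1920 x 1200 is purely an assumption for illustration and is not stated in the interview.

```python
# Back-of-the-envelope pixel arithmetic for a tiled display wall.
# Figures taken from the text: dataset side length and panel layout.
# ASSUMPTION (not from the interview): each 24-inch panel is 1920 x 1200.

DATASET_SIDE_PX = 600_000   # from the text: 600,000 pixels on one side
PANELS_ACROSS = 20          # from the text
PANELS_DOWN = 4             # from the text
PANEL_WIDTH_PX = 1920       # assumed panel resolution (hypothetical)
PANEL_HEIGHT_PX = 1200      # assumed panel resolution (hypothetical)


def dataset_pixels(side_px: int) -> int:
    """Total pixels in a square image with the given side length."""
    return side_px * side_px


def wall_pixels(across: int, down: int, panel_w: int, panel_h: int) -> int:
    """Total pixels the whole wall can show at once."""
    return across * down * panel_w * panel_h


def wall_fills_needed(data_px: int, wall_px: int) -> float:
    """How many full walls of pixels the dataset contains."""
    return data_px / wall_px


if __name__ == "__main__":
    data_px = dataset_pixels(DATASET_SIDE_PX)
    wall_px = wall_pixels(PANELS_ACROSS, PANELS_DOWN,
                          PANEL_WIDTH_PX, PANEL_HEIGHT_PX)
    print(f"Dataset pixels: {data_px:,}")
    print(f"Wall pixels (assumed panels): {wall_px:,}")
    print(f"Dataset is about {wall_fills_needed(data_px, wall_px):,.0f} "
          f"times larger than one wall view")
```

Under that assumed panel resolution, even an 80-panel wall shows only a tiny fraction of such a dataset at full resolution at any moment, which is one way to see why the monitors need computation and storage behind them for zooming and dragging.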
“So when you start thinking about medical imaging or simulations, certainly people are starting to work with this.”
“They’ll work with it for cultural heritage, too. So—you know, you’re going to see large paintings reproduced even larger on this thing, and an ability to—zoom in on detail that—you know, is a bit different than what we’ve had to date.”
“I think we need to be—you know, kind of cautious about—that—how much is enough resolution. You know, I remember back in the day, we underestimated that more than once in the interest of proving that a system would be affordable on an engineering basis. Now, I think this is really only going to apply in cultural heritage to things that people really need to see the details of, and we’re going to need to make some choices about this and it’s clearly silly to do this for—you know, a lot of hand manuscripts and things like that. We just don’t need that kind of resolution.”
Reading ancient handwriting: “a bigger paleography problem”
Paleography is the study of ancient handwriting and inscriptions. A current problem is that younger generations often cannot read handwritten text, even from the 19th century.
“Just as a sideline on OCR, one of the things we probably are going to need to spend some R&D money on is OCR for more kinds of texts and for handwritten texts and things like that.”
“You know, it’s all very interesting to hear this rhetoric about—about engaging people with primary sources. But the fact of the matter is that—most kids aren’t taught handwriting anymore, really, of the sort of Victorian copperplate kind, and you show them 19th century letters, handwritten letters or manuscripts, and, you know, they may as well be looking at a, you know, 10th century manuscript.”
“So—things that we can do to—to help with, you know, OCR and transliteration of those. I mean, maybe another way of saying it is, we’ve got a bigger paleography problem than we’d like to admit, so I’m mindful of that one.”

3D as “one of the next frontiers”
Clifford Lynch sees the digitization of 3D objects, e.g. buildings and statues, as one of the next frontiers.
“So those are a few of the things that—that I think are real issues on the capture side going in. There’s also a question about how we do 3-D capture. We’re starting to see a lot more work on imaging 3-D objects and there’s a whole lot of different strategies ranging from the kind of old fashioned one of just, you know, you document it from all four sides and the top if you need to—through things where, you know, for statues now, we’re doing these laser scans.”
“I mean, we can do whole buildings and statues and stuff like this. So the whole question of digitizing 3-D stuff is still very much on the table and—and I think is going to be the kind of next frontier—or one of the next frontiers.”

“Encyclopedic museums”
Museums typically display the best 5% of their holdings at a time. Digitizing museum collections in their entirety will provide more context and change the way scholars do research: scholars will be able to compare and contrast objects within a collection and analyze how objects and practices changed over time.
“So now the question is, how do you build tools to let people understand the variation across 100 objects? And if you think about a lot of scholarly work in museums, it has much the same quality, right? Think about—you know, those sort of endless Greek vases. Now, you know, what you do with an exhibition is you put out, you know, about half a dozen particularly nice examples. But what a lot of the scholarly work is really about—is understanding what’s sort of typical about these and what’s unusual and it means looking at lots of examples of these and trying to understand their similarities and variations.”
“A big museum, maybe, can exhibit maybe 5% of its holdings at a given time. And the whole way people understand museum collections is going to change radically as they can get to representations of the entire collection. And they’re going to want to do this kind of analytical stuff across large numbers of—you know, not necessarily individually stellar examples, but—to understand how the production of things and the practices changed across time. So you’re going to see—I think, a whole line of development around these kinds of systems—accompanying the move to open up the collections of large kind of, you know, encyclopedic museums.”

Pictures are difficult to catalog
It is very difficult to adequately catalog pictures because they are so rich and they require subjective interpretation.
“Pictorial material is, at some level, almost impossible to comprehensively describe because it’s got so much in it, it’s so rich. Often it has interpretations that are influenced by cultural context, allegories and things or historical representations in very complex ways. So we’re very bad at describing images even in those rare cases where we can throw huge amounts of human time at it.”
“On top of this, the actual fact is that most of the time, we don’t have the money to throw the human energy at it and to do elaborate cataloging of this even if we could. So it’s not uncommon to find things on the net like, you know—20,000 photographs of New York City street life in the 1950s.”
Crowdsourcing, conversations, critical mass and “really deep storytelling”
The Library of Congress is using crowdsourcing effectively by asking the public to help identify and describe photos on Flickr.
“If you look, for example, at the experience the Library of Congress has had, and they’ve done a very good report on this, putting up photographs on Flickr Commons and dealing with the conversation around it, you know, you start realizing that, for instance, there are lots of people around who are very interested in aspects of material culture. You know, airplanes, trains, machines, cars—there are lots of people who are interested in genealogy and family and local history. And there’s a ton of material kind of in private hands.”
“Now, the place where this all kind of you know, reaches critical mass and ignites is where you’re dealing with photographic collections because photography is still a relatively recent technology.”
“So, unlike putting up, you know, images of 16th century paintings, the property of most photographic collections, especially big ones, is they’re dealing with stuff that still hits the edges of living memory.”
“So you started eliciting these things about, you know, this connects somehow with my family and my family’s history, you know, that’s my granddad as a kid and his pet dog and I happen to know the name of his pet dog, and—you know, that’s the store he owned in the 1930s. This kind of, you know, really deep storytelling.”

Lots of floppy discs and no floppy drives
Archivists are now dealing with materials from the 1980s and 1990s that require old computer technology that is no longer available.
“Right now, people are having close encounters with the horrors of, you know, consumer electronics in the 1980s and early 90s. You know, nasty sorts of floppy discs with manuscripts on them that they want and things.”
The future of digitization
With the pervasiveness of photo and video recording via mobile devices, we will need to anticipate how to manage these types of technologies and collections.
“So we’ve not only got the things digitized according to our standards of cultural heritage, but we’re going to have some pretty funky images coming in off of—you know, not off of nice SLR cameras but off of—you know, cell phone cameras and things like that that leave a great deal to be desired. And we’re going to need to think about how to mix these into our collections and manage them and where various kinds of image enhancement is appropriate and that sort of thing. So I think—I think there are a whole new collection of issues as we start thinking about, you know, what our—you know, what our collections going to look like in 2050 as documenting and understanding the lives of individuals in our culture continues to be an important activity.”

