
Clifford Lynch - Transcript

Clifford Lynch
Executive Director, Coalition for Networked Information
Interviewed March 4, 2010


----Beginnings----
[How did you get involved in the field of digitization with libraries and museums?]
I’m actually by background and training a computer scientist, although I have spent pretty much all my working life dealing with issues around information retrieval, library automation, computer networking, those—those kinds of things. And I’ve been involved with various phases of libraries since about the mid-70s. So I’m really ancient. I can—I can, you know, tell you horrible stories about ancient old machines in libraries, you know, and the trilobites I kept for pets at the time and that sort of thing. So I go back a long way. In fact, I go back really before the technology of imaging was within—certainly financial reach of the library, museum, and archive community, with the—you know, exception of a couple of very small scale hero projects—that various government agencies did. And I’ve kind of watched the evolution of this as the technology has gotten less expen—less expensive and more accessible.
So in terms of my working career, I was with the University of California from around ’79 to around 1997 and I did a bunch of things there starting with the building of the online catalog, MELVYL, that covered the UC system and the computer network to support it and then growing that out from an online catalog to a multi-faceted kind of an information resource that held various kinds of secondary and then later on, primary materials as the sort of cost curves moved.

The last 10 years I was at UC I held the title of Director of Library Automation and that gave me not just responsibility for MELVYL but a sort of viewpoint and a connection into a lot of other things that were happening in the UC system at the campus level, where sometimes I’d get involved on a consultative basis—or other kinds of system-wide initiatives and I’ll come back to a couple of those.
I left UC in 1997 to take my current position as Director of the Coalition for Networked Information. And—you know, I’ve been doing that since and again, in that role I see a tremendous amount of digitization stuff and have had opportunities to talk with or occasionally influence the thinking on a number of projects.
Now I—you know, sort of in preparation for this did some thinking about what actual digitization things I have been substantially involved in, as opposed to just kind of around the periphery of. And it’s kind of interesting. If you look at a lot of what I’ve done with digitization, it’s really been with the delivery of digitized page image material primarily. So, for example, the University of California was a very early pioneer in a number of projects to take—page images of scholarly journals and deliver them across the network to—display terminals scattered around the UC system and beyond. And, you know, that was a technically substantially challenging undertaking. In the early days, of course, about all the publishers could give us was bit-mapped images. This was before XML, before web browsers. We were actually doing this with specialized X-Windows programs probably back in—I guess this would have been the early 90s.

We weren’t really much involved in the digitization side of that. That—that was done by the publishers and we worried about delivery and so UC was part of the TULIP project with Elsevier, which you’ve probably run across. We later did a very large scale trial with the Institute of Electrical and Electronics Engineers, IEEE, doing all their material, and there were others. So that was probably the biggest kind of imaging thing that I was directly involved in the—the deployment of.

On the other hand, there’s a whole pile of things I’ve—I’ve been involved in on kind of a consulting basis, so back—in the—I guess probably this would be around 19—83, probably about that early, I got involved with a series of projects that the National Agricultural Library did. They were a very early leader in doing digitization, again, largely of textual materials. Grey literature, journals, things like that, which they were delivering on CD-ROM to various—institutions, including material on sustainable agriculture to third world countries and things like that. They actually did a number of things with various laser discs before the industry sort of coalesced around CD-ROM as a—as a sort of a standard medium for this. And they’re—you know, NAL set up a pretty significant production line for capturing this stuff. Again it was—in some ways not challenging by today’s standards. You know, basically monochromatic and you were really more concerned with readability than—you know, perfect fidelity of reproduction for the material, some of which—you know, wasn’t superb in its original form. So—did a lot of that kind of thing.

Had some peripheral involvement with several projects to image manuscripts, medieval and renaissance kinds of manuscripts back in the—again, probably this would be the late 80s on CD-ROM. Charles Faulhaber at Berkeley, who’s now the—the head of The Bancroft Library there, was actually a faculty member in Spanish and Portuguese at Berkeley and was one of the early pioneers in capturing manuscripts. He—I remember he did a collection of Spanish manuscripts which he presented on CD-ROM to the King of Spain when the king was over.
So there you were starting to get more into issues of fidelity of capture and reproduction. Later in the—in the 90s, various UC entities got quite involved in the first round of the NSF digital library program. So you had the geospatial project at UC-Santa Barbara, the—which really actually in some ways goes on today, still. They’re kind of the center of geospatial material, both digitized maps, aerial photographs, this kind of thing. There was also another project at Berkeley, which—among other things imported a lot of material on the flora of California. There’s a book called—the Jepson manual, which they had the rights to and the Berkeley CS folks worked with a herbarium and other groups there to put this kind of material up.
Berkeley supported a large museum informatics program, actually. They were one of the pioneers in doing that institutionally back in the 80s and 90s and I talked with them frequently about their work. So they were capturing imagery of anthropological materials from the Hearst Museum, materials from the herbarium, and a few other sources.

So those were some of the—the kind of early—you know, imaging and digitization things that I—I was around in one capacity or another. I guess the other one I’d mention, which I really can’t take, you know, any credit for other than doing a bit of consulting with it, but I think was very influential in shaping some of my thinking was the big—the big image database that New York Public launched about four years ago—five years ago now. That’s about—that launch was about half a million images from their collection and represented one of the fairly early efforts to deal with a—a large and extremely heterogeneous database of imagery. I mean, the Library of Congress has done some things in this area as well, but—I would say probably more focused projects at least up to that point.

So those are—those were some of the—the places where I—you know, came into this, that I can think of, at least, in getting ready for this.


[What was it like working on these projects in the beginning, without metadata standards or guidelines?]

The first problem—you can kind of break this up into periods. There was the period before CD-ROM really got stable. Then there was the period where you were doing things mainly on CD-ROM for most kinds of projects—and you were doing typically workstation-based delivery using the CD-ROM. Then there was the period when you started really wanting to move the stuff onto the network. So you can kind of think of these as three epochs.
Now, the pre-CD-ROM period was just horrible. There were basically no standards for much of anything. You know, there were sort of early forms of TIFF for dealing with the images. Compression wasn’t well-standardized, typically people were using some variation of Huffman coding, at least for the bi-tonal images, but it was—it was pretty messy.
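[Editor's note: as a rough illustration of the compression approach described above, here is a minimal Python sketch of run-length encoding for a single bi-tonal scan line, the kind of run data that Huffman-style codes (such as the later CCITT Group 3/4 schemes) were applied to. It is illustrative only, not any particular historical format.]

```python
# Minimal sketch: run-length encode one bi-tonal scan line (0 = white, 1 = black).
# Huffman-style codes were then used to represent the run lengths compactly.

def run_lengths(scanline):
    """Return (pixel_value, run_length) pairs for a list of 0/1 pixels."""
    runs = []
    if not scanline:
        return runs
    current, count = scanline[0], 1
    for pixel in scanline[1:]:
        if pixel == current:
            count += 1
        else:
            runs.append((current, count))
            current, count = pixel, 1
    runs.append((current, count))
    return runs

# A mostly-white line with one short black mark collapses from 20 pixels to 3 runs.
line = [0] * 8 + [1] * 4 + [0] * 8
print(run_lengths(line))   # [(0, 8), (1, 4), (0, 8)]
```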
This was an age where there were a lot of custom turnkey systems that were put together that really weren’t very transparent where you’d sort of fall into the clutches of the system integrator and find it very hard to extricate yourself. This was a period where basically, you couldn’t afford to use magnetic storage to store any significant number of these images, and there was this sort of endless supply of—of crappy and unstable optical storage solutions. Some of them with the extra bonus of involving jukebox kinds of technologies. So you had mechanical problems as well as optical problems dealing with this stuff. They tended to use proprietary file formats on these things. And of course, you know, you really wanted to do stuff like couple this up to main—large mainframes, which was a particularly bad match for various reasons having to do with I/O channel protocols and things like that. So you wound up often with some kind of minicomputer feeding your larger mainframe where you were dealing with the metadata management and creation.
There were, as you say, basically no standards for metadata, but—well, actually, that’s not fair. There were standards for metadata for a lot of stuff I was working with because it was primarily textual in nature so bibliographic kinds of—you know, MARC sorts of standards, or the kinds of standards established by major A&I (abstracting and indexing) databases in selected fields were around to help.

I would say the biggest hole though, was around—integration and delivery. So you had one silo where you could do the metadata stuff, another silo where you’re doing the imaging stuff, a real prob—workflow problem connecting the two reliably, and then—and then the problem of how to craft something to allow retrieval and delivery across this whole mess, which was really quite a different world than the production workflow for it. So things were pretty bad back then.
And, you know, there were zero standards for delivery. I mean, we kind of—we wound up building custom delivery things for different platforms and they—they had a fairly short life because the platforms were unstable themselves. I mean, one of the things that—that people tend to forget now, where you’ve got a manageable rate of innovation in things like the personal computer industry, and you have—sufficiently large deployed bases of technology that vendors have to worry about back compatibility and things like that—it wasn’t like that in the 80s. I mean, there were a zillion companies that popped up, you know, had aspirations to be the next Microsoft or—well, they didn’t know about Microsoft back then, maybe the next—the next—WordPerfect or something like that and then, you know, vanished equally rapidly. So there was tremendous churn in the industry and a great deal of sort of discontinuity where companies would go away and the sunk investment and data would be kind of stranded. This is a—this is a situation that is really much less commonplace today. So you were fighting all that.

Now, once you got to the CD world, that rapidly started to help standardize things around image handling, you know Apple’s development of things like QuickTime was very helpful there—you saw standards like JPEG starting to get well-established—GIF earlier but really JPEG for more serious quality imaging. The—there started to be some serious thinking about metadata standards that would be relevant for images of things other than the sorts of things that MARC comfortably describes—art objects, cultural objects, scientific objects.
You still had a sort of an integration problem though. There weren’t good platforms for building a system that really combined text and—imagery nicely. So, you know, you saw a lot of the early stuff in that day like—things like the Voyager work. You know, those had a certain handcrafted quality to them, which was beautiful for these closed—kind of closed content systems. You know, Voyager would say, we’re going to present something, you know—a corpus of music or, you know, Darwin’s notebooks or whatever. And they—they’d kind of do a finite CD, but many of these cultural memory institutions were really interested in not just snapshots but kind of growing collections that—that could be presented and really struggled with those kinds of things. It’s easy to forget how really slow those machines were and how memory-constrained they were and how slow the original CDs were and there was a certain—you know, excruciating quality to many of those systems. They could do things but they couldn’t do things fast.

So then—in the kind of third era, life got really interesting because the kind of theme there was you wanted to build up these network-based servers and do delivery across the network. And that turned out to be really hard for a lot of reasons. Coming up with the right kind of architecture for how to distribute function between some kind of network endpoint, you know, a PC or an X-Windows workstation or a Mac or something, picking the right standards to go with that architecture for the distribution of function, those were really complicated issues that were very poorly understood.
You know, such things as, do you use something like X-Windows or do you use Z39.50 and move the images around as data, you know, objects and then deal with them—you know, out on the workstation? One of the problems with using something like—a client server style protocol where the object that’s being delivered is an image but it—really the software on the client is what understands it’s an image, is that—it doesn’t let you do tricks like progressive transmission very well, you have to get the whole damned image over to the client before you can start doing something with it. People forget how slow networks were then, especially long-haul networks, and even more especially last-mile networks.
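[Editor's note: a toy Python sketch of the progressive-transmission idea mentioned here: send a coarse version of the image first so the client can display something immediately, then send the remaining detail. This is a conceptual illustration, not JPEG's actual progressive mode or any specific protocol.]

```python
# Conceptual sketch of progressive transmission: ship a crude preview first,
# then the rest of the samples, instead of making the client wait for everything.

import numpy as np

def coarse_pass(image, factor=4):
    """A crude low-resolution preview: keep every Nth pixel in each direction."""
    return image[::factor, ::factor]

image = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for a scanned page
preview = coarse_pass(image)

print(preview.shape)                 # (2, 2): about 1/16 of the data, viewable at once
print(image.size - preview.size)     # 60 remaining samples can follow afterwards
```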

You probably can sort of faintly remember or—or you know, some of you maybe can, as children, the earliest days of the web where you had graphical web browsers like Mosaic starting to come online. But in fact, for most people, access to the web was still a dial-up modem operating at, you know, 32 kilobits per second, maybe 50 kilobits with some compression in the modem if you’re lucky—to home, you know, and that’s really not enough capacity, you know, to support fluid and—and kind of quality imagery work. So—that was—it was really quite a while before the network grew into the kind of promise that it immediately offered and that you could sense when you saw things like the first demonstrations of Mosaic. It took a while for the underlying networks to really catch up with that on a deployed basis. Browsers, of course, were—web browsers—were an absolute godsend for all kinds of imaging things because they started to provide a kind of a common platform. I mean, prior to the web browsers, or—or actually, the web browsers and a couple of early predecessors, things like Gopher, there was still a tendency to have custom clients for databases, which was really a messy—you know—business that impeded the use and reuse of images in a really serious way.
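[Editor's note: to make the bandwidth point concrete, a back-of-the-envelope Python calculation of transfer times over a dial-up modem. The image sizes are assumptions chosen for illustration.]

```python
# Rough transfer times over a dial-up link, ignoring protocol overhead.

def transfer_seconds(image_bytes, link_bits_per_second):
    """Seconds needed to move image_bytes over a link of the given speed."""
    return image_bytes * 8 / link_bits_per_second

examples = {
    "100 KB compressed page image": 100_000,
    "1 MB scanned photograph": 1_000_000,
}
for label, size in examples.items():
    print(f"{label}: ~{transfer_seconds(size, 33_600):.0f} s at 33.6 kbit/s")

# Roughly 24 seconds for the page image and about 4 minutes for the photograph,
# which is why fluid image browsing over dial-up was out of reach.
```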
[When did you discover that digitization was important to cultural institutions?]
I don’t know that there was, you know, a sort of sudden revelation that I had. I mean, it—it was clear to me pretty early on that—you know, the potential was—was there to really just completely blow open the doors of cultural heritage institutions and fundamentally change the equations about the way things were used. And that it was really largely just a question of how long would it take for the cost curves and the deployment rates to, you know, really make it possible to do it.

I guess—one very important kind of shift for me, and it was one—it—you know, it wasn’t one that happened in a moment, but—but it was kind of a change in thinking, that—that I started working through in the late 90s was a realization that we actually were rather quickly going to get to the capability to image objects—and other materials in such a way that they were as good as the underlying objects for not all purposes, but many purposes, and that in fact they needed to be—that the whole strategy of imaging important collections of cultural heritage was really a stewardship and survival strategy. It was something that institutions charged with stewardship were going to be obligated to do to be good stewards and to ensure the preservation of their material and that—you know, probably the strategy going forward would be—to have digital records of the material and then the underlying material and that that would give you—you know, certainly not protection against loss of the underlying material but at least leave you in a much better place given the, you know, ugly multimillennial history of wars and natural disasters and things that have damaged so many collections.

----Hindsight----
[Looking back, what would you have done differently?]
Well—let me pick out maybe a couple of things for brief mention there. In terms of the sort of underlying technology, probably the thing that I didn’t see coming—was how much trouble color fidelity was going to be. And it may be that I was just naïve about this, because certainly, you know, I had started to hear from people who had worked, for example, in film—you know, that this was going to be an issue. Or people who had done—art books, you know, where they were trying to do high quality reproductions in print of paintings and things. And, you know, I was sort of prepared to believe, yeah, it’d be an issue, but in terms of the amount of grief we’ve had with standards around color space and color management and calibration schemes, I mean, this is still a big headache for workflows where you’re really concerned about, you know, the capture of color with—with high fidelity.

Another area which—I think we—we would have been so well served to get out in front of with a good standard 10 or 15 years ago is the situation where you’ve got—textual material imaged and then you’ve got a—an OCR transliteration attached to it. And you want to—work with those two as connected objects essentially. And people have way too many different ways of doing this today. I mean, it’s a disgrace. If you look at—for example, the—projects that people like Mellon have been funding to digitize manuscript collections—a lot of those still don’t interoperate. And that’s the kind of thing where, if we’d made a strategic investment in some standards and maybe some reference software that could have been given away open source or very cheaply—early on, it would have saved a lot of pain, some of which is still to come as we—get all this stuff into some kind of homogeneous form. So that’s another—that’s another problem area.
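[Editor's note: a minimal Python sketch of one way the two connected objects described here, a page image and its OCR transcription, can be tied together: each recognized word carries a bounding box in image coordinates. The field names are hypothetical; this is not ALTO, hOCR, or any other real schema.]

```python
# Sketch: OCR words linked to regions of the page image they were read from.

page = {
    "image": "page_017.tif",            # hypothetical filename
    "image_size": (2400, 3300),         # width, height in pixels
    "words": [
        {"text": "Whereas", "bbox": (310, 412, 520, 468)},   # x0, y0, x1, y1
        {"text": "the",     "bbox": (535, 412, 610, 468)},
    ],
}

def words_in_region(page, region):
    """Return the OCR words whose boxes fall entirely inside a selected image
    region, e.g. to show the transcription for an area a reader has zoomed into."""
    rx0, ry0, rx1, ry1 = region
    return [w["text"] for w in page["words"]
            if w["bbox"][0] >= rx0 and w["bbox"][1] >= ry0
            and w["bbox"][2] <= rx1 and w["bbox"][3] <= ry1]

print(words_in_region(page, (300, 400, 700, 500)))   # ['Whereas', 'the']
```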
Beyond the technology though—and keep in mind that a lot of what I did in the 80s and 90s was really thinking about building big, deployable systems—when you look at the history of a lot of this digitization work, there—there was a lot of overpromising. And it’s still very hard for people who make large scale funding commitments—and I don’t mean, you know, here’s a—here’s a small, you know, project that we’re going to get a grant for to digitize something, but I’m going to make an institutional commitment at scale. It’s still really tough, I think, and has been, over the last 10 or 15 or 20 years to figure out when the right moments are in terms of cost performance and quality of the technology to make those choices. When—when to move from little pilot projects and when to do something at scale. And so there’s been a history of attempts to do things at scale too early, where the technology was overpromised to management and funders and then a lot of money spent and not much result.

The kinds of things that one deals with in pilot projects are very different than the kind of things one worries about at scale. It’s actually really easy to make nice-looking pilots of image collections, even reasonable size image collections—you know, and I’ve seen literally hundreds of these over the years. It’s much, much harder to do these things, you know, where you’ve got, a hun—you know, 50,000 concurrent users beating on this thing from around the world, where you have basically no control over the devices that they’re using and the quality of the network paths to them and you want this thing to be at least reasonably robust. That’s a very, you know, different environment and I think, you know, sort of collectively, everybody didn’t do as well as they could have in terms of thinking strategically about when to fund pilots, when to fund fundamental research and how to communicate the outcomes of these to policy makers, essentially, who would drive, you know, decisions to do large-scale deployments.


----Advice----
[What advice can you give emerging library science students?]

Okay. So, there’s this—there’s this professor at UC-San Diego named Larry Smarr. That name may sound familiar because he ran the National Center for Supercomputing Applications back when they built Mosaic and has done a number of other notable things. He’s been a pioneer in high performance computing for about 20 years. So, when he went back to San Diego, one of the things he was very interested in was really high performance visualization—of models of—you know, things like astrophysical phenomena or biological phenomena coming out of supercomputer simulations. So he decided that what they needed to do was to get serious about display devices and—because some of these datasets are—you know—oh, I don’t know, 600,000 pixels on a side? You know, so square that up and you’ve got a big number.
So basically he started building these things called OptIPuters, which are walls of big LCDs. You take about a—I don’t know, maybe a 24-inch LCD and you gang up, like, 20 of them by 4 of them, into a wall that actually can handle a lot of pixels. However, you’ve got some problems essentially with the graphics management on this, so what you do is behind it you stick a Beowulf cluster. And you put I/O drivers for one or two of the monitors on each of the parallel machines in the cluster. So you basically are backing up this monster monitor with an appropriate amount of computational power and storage so you can do things like zoom and, you know, drag and drop stuff, you know, across the ganged—screens—appropriately.
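[Editor's note: rough Python arithmetic for a tiled display wall of the kind described, with each panel driven by a node in the cluster behind it. The per-panel resolution and grid size here are assumptions for illustration.]

```python
# Tiled display wall: total resolution, and which panel owns a given pixel.

COLS, ROWS = 20, 4                  # panels across and high, as described
PANEL_W, PANEL_H = 1920, 1200       # assumed per-panel resolution

wall_w, wall_h = COLS * PANEL_W, ROWS * PANEL_H
print(f"Wall: {wall_w} x {wall_h} pixels, about {wall_w * wall_h / 1e6:.0f} megapixels")

def panel_for_pixel(x, y):
    """Map a global wall coordinate to the (column, row) of the panel that renders
    it, i.e. the partitioning each cluster node's display driver is responsible for."""
    return x // PANEL_W, y // PANEL_H

print(panel_for_pixel(25_000, 3_000))   # (13, 2)
```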

We’re going to see stuff like that as the—you know, as display devices, more and more commonly I think in—high end kind of imaging applications. So when you start thinking about medical imaging or simulations, certainly people are starting to work with this. They’ll work with it for cultural heritage too. So—you know, you’re going to see large paintings reproduced even larger on this thing, and an ability to—zoom in on detail that—you know, is a bit different than what we’ve had to date.

So I think we need to be—you know, kind of cautious about that—how much resolution is enough. You know, I remember back in the day, we underestimated that more than once in the interest of proving that a system would be affordable on an engineering basis. Now, I think this is really only going to apply in cultural heritage to things that people really need to see the details of, and we’re going to need to make some choices about this and it’s clearly silly to do this for—you know, a lot of hand manuscripts and things like that. We just don’t need that kind of resolution.

Anyway. So, moving on, so we’ve got color, we’ve got level of resolution—I think we’ve got some questions we need to be thinking about around multispectral imaging. That’s starting to be used to rather good effect in manuscript scanning, for example. Because it sets you up to do some nice image enhancement—that you can’t do with monospectral visible light imaging. If you look at the kind of work people have done on things like the Archimedes Palimpsest—you know, having—having those—those scans at different wavelengths is really useful. But again, how you—correlate them and align them is messy, just like the problem of managing OCR against the underlying images.
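[Editor's note: an illustrative Python/NumPy sketch of the kind of simple band arithmetic that multispectral capture makes possible once the bands are registered: ink that is faint in one wavelength can stand out when two bands are differenced. This is a toy example, not the actual processing used on the Archimedes Palimpsest.]

```python
# Toy example: difference two registered spectral bands to enhance faint ink.

import numpy as np

# Two already-aligned bands of the same small page region (reflectance 0.0 to 1.0).
visible = np.array([[0.80, 0.78, 0.79],
                    [0.77, 0.70, 0.78],      # ink barely visible at the center
                    [0.79, 0.78, 0.80]])
ultraviolet = np.array([[0.81, 0.79, 0.80],
                        [0.78, 0.40, 0.79],  # ink absorbs strongly in this band
                        [0.80, 0.79, 0.81]])

# In the difference image the background nearly cancels and the ink stands out.
enhanced = visible - ultraviolet
print(np.round(enhanced, 2))
```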

Just as a sideline on OCR, one of the things we probably are going to need to spend some R&D money on is OCR for more kinds of texts and for handwritten texts and things like that. You know, it’s all very interesting to hear this rhetoric about—about engaging people with primary sources. But the fact of the matter is that—most kids aren’t taught handwriting anymore, really, of the sort of Victorian copperplate kind, and you show them 19th century letters, handwritten letters or manuscripts, and, you know, they may as well be looking at a, you know, 10th century manuscript. So there are things we can do to—to help with, you know, OCR and transliteration of those. I mean, maybe another way of saying it is, we’ve got a bigger paleography problem than we’d like to admit, so I’m mindful of that one.
So those are a few of the things that—that I think are probably real issues on the capture side going in. There’s also a question about how we do 3-D capture. We’re starting to see a lot more work on imaging 3-D objects and there’s a whole lot of different strategies ranging from the kind of old-fashioned one of just, you know, you document it from all four sides and the top if you need to—through things where, you know, for statues now, we’re doing these laser scans. If you look at stuff like the Michelangelo project or the work that people like Stephen Murray are doing on things like French churches. I mean, we can do whole buildings and statues and stuff like this. So the whole question of digitizing 3-D stuff is still very much on the table and—and I think is going to be the kind of next frontier—or one of the next frontiers.
So yeah. Those are a few of the kind of striking things about the capture side. Let me say a few things about the rendering side. So one of the—one of the things we need to get a lot smoother about is how you do zooms in and out in very high resolution objects and how you render three-dimensional things. Another area—and actually, when I talk about delivery—I want to—I’ll first talk about delivery, you know, at this sort of individual object level and then I want to say a few things about corpora. So that’s—that’s one issue at the individual object level.

Another, again, is this whole business of color calibration on delivery, which is just a mess and—and needs some thinking about. Another is how you contextualize objects in an apparatus, for example, that involves layers of annotation, that involves the multi-spectral imaging, that involves transliteration and perhaps translation attached to that. We don’t have good standards for that. We—certainly people have built good—individual systems, but we need to make this a routine process that—you know, allows databases to interoperate freely and is just much easier for people on both the capture and the delivery side and where we stop reinventing the wheel. So I think—I think that’s a really important set of problems.
Other things on the—the individual delivery side. I think those are probably the most important. I think that another thing that we’re starting to run into as we digitize more materials is that people want to work with them in different ways. So for example, one of the things that textual scholars do a lot is they compare texts. And—you know, it’s not—it’s certainly not uncommon, in the days before computers, to see a scholar working with two or perhaps three manuscripts, trying to understand textual variations to build a critical edition or something like that.

Even working with two or three is clumsy in most systems right now, especially if you don’t have two or three monitors to work with. But just think about this. So—in—in reality, it’s usually not just two or three texts, it’s just that that’s all we could work with. So you have a project like—The Romance of the Rose, at Hopkins, where they’re basically digitizing every manuscript of that they can find. So they’ll have 100 or 150, something in that range, by—the time they’ve finished the major scanning phase of the project, which I think is sometime this year.

So now the question is, how do you build tools to let people understand the variation across 100 objects? And if you think about a lot of scholarly work in museums, it has much the same quality, right? Think about—you know, those sort of endless Greek vases. Now, you know, what you do with an exhibition is you put out, you know, about half a dozen particularly nice examples. But what a lot of the scholarly work is really about—is understanding what’s sort of typical about these and what’s unusual and it means looking at lots of examples of these and trying to understand their similarities and variations.
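[Editor's note: a tiny Python sketch of the shape of the tooling this question implies: scoring pairwise similarity between transcriptions so that closely related witnesses cluster together. The sample texts are invented; real collation and critical-edition tools do far more than this.]

```python
# Pairwise similarity between short transcriptions of hypothetical witnesses.

from difflib import SequenceMatcher
from itertools import combinations

witnesses = {
    "MS A": "the rose in the garden was red",
    "MS B": "the rose in the gardin was red",
    "MS C": "a lily in the garden was white",
}

for (name1, text1), (name2, text2) in combinations(witnesses.items(), 2):
    ratio = SequenceMatcher(None, text1, text2).ratio()
    print(f"{name1} vs {name2}: {ratio:.2f}")

# MS A and MS B come out nearly identical; MS C is clearly more distant,
# which is the kind of signal a scholar would want surfaced across 100+ witnesses.
```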
So you’ve got a—you’ve got again here a real tool challenge as you start thinking about—you know, a big museum can exhibit maybe 5% of its holdings at a given time. And the whole way people understand museum collections is going to change radically as they can get to representations of the entire collection. And they’re going to want to do this kind of analytical stuff across large numbers of—you know, not necessarily individually stellar examples, but—to understand how the production of things and the practices changed across time. So you’re going to see—I think, a whole line of development around these kinds of systems—accompanying the move to open up the collections of large kind of, you know, encyclopedic museums.
Another thing that I’ve—I’ve had very much brought to mind by some recent talks I’ve seen is—we’ve got a lot of issues about when you display things for various purposes, how you indicate uncertainty and how you deal with reconstruction of—of damaged things, and represent that in ways that people understand what they’re looking at. So this gets you to things like the London conventions for the reconstruction of architectural models, where you might have—you know, a ruin of a building and you want to build—you want to start with the ruin and then represent, kind of scaffolded on the ruin, what you think the place looked like. And trying to understand what parts are how speculative and based on what evidence is very critical. And this is going to scaffold on top of the—the actual digitizations that provide underlying evidence in all kinds of areas around archaeology and architecture, to some extent manuscripts that are damaged and are being reconstructed. And I don’t think there’s nearly enough thinking going on about how to handle that.

The other thing I’d say is that, especially when you’re dealing with pictorial material—pictorial material is—is, at some level, almost impossible to comprehensively describe because it’s got so much in it, it’s so rich. Often it has interpretations that are influenced by cultural context, allegories and things, you know, or—historical representations in very complex ways. So we’re very bad at describing images even in those rare cases where we can throw huge amounts of human time at it.
On top of this, the actual fact is that most of the time, we don’t have the money to throw the human energy at it and to do elaborate cataloging of this even if we could. So it’s not uncommon to find things on the net like, you know—20,000 photographs of New York City street life in the 1950s. Helpful. You know? And so, what happens around these collections—and, you know, it will happen I think whether it’s well-documented or very sparsely documented is that—basically they turn into a conversation with the audience. And a couple of the speakers here [WebWise 2010 conference] got into this a little bit, but I think—I think that the impact of this phenomenon is still rather underestimated.

If you look, for example, at the experience the Library of Congress has had, and they’ve done a very good report on this, putting up photographs on Flickr Commons and dealing with the conversation around it, you know, you start realizing that, for instance, there are lots of people around who are very interested in aspects of material culture. You know, airplanes, trains, machines, cars—there are lots of people who are interested in genealogy and family and local history. And there’s a ton of material kind of in private hands.
Now, the place where this all kind of, you know, reaches critical mass and ignites is where you’re dealing with photographic collections because photography is still a relatively recent technology. So, unlike putting up, you know, images of 16th century paintings, the property of most photographic collections, especially big ones, is they’re dealing with stuff that still hits the edges of living memory. So you start eliciting these things about, you know, this connects somehow with my family and my family’s history, you know, that’s my granddad as a kid and his pet dog and I happen to know the name of his pet dog, and—you know, that’s the store he owned in the 1930s. This kind of, you know, really deep storytelling.
So when you start making these kinds of things available, you’ve got to be ready to figure out what you’re going to do with that conversation. And this gets into really tricky questions about, you know, to what extent you want to adjudicate this conversation, to what extent you want to lend authority to it, how do you distinguish between assertions you make and assertions that other people make about the object? How do you deal with the acquisitions piece? When people say, not only is that my granddad, but I’ve got two shoeboxes more of pictures. You know, and he actually wrote some memoirs, too, and I have the rights to that if you’d like to digitize it.
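[Editor's note: a minimal Python sketch of one way to keep institutional description and community contributions about the same object distinct but connected, so a display can label them differently. The field names and records are hypothetical, not drawn from any real collection system.]

```python
# Annotations about one object, each tagged with who asserted it.

from datetime import date

annotations = [
    {"object_id": "photo_04521",
     "source": "institution",
     "asserted_by": "catalog department",
     "text": "Street scene, lower Manhattan, ca. 1950s.",
     "date": date(2009, 6, 1)},
    {"object_id": "photo_04521",
     "source": "public",
     "asserted_by": "website commenter",
     "text": "That is my grandfather's store; he owned it until 1961.",
     "date": date(2010, 2, 14)},
]

def by_source(records, source):
    """Filter annotations so institutional and community assertions
    can be displayed and weighted differently."""
    return [a for a in records if a["source"] == source]

print(len(by_source(annotations, "public")))   # 1
```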

So, you know, this can be just a treasure trove for a cultural memory organization, or it can be a disaster, you know, in many dimensions if you don’t think this through. And, I think that, you know, kind of as a community, we really need to start talking about this in a—in a much more sophisticated way than—than we have been. We’ve got some pilot projects, some institutions have done some very, very good thinking about this, but we shouldn’t all be reinventing the wheel on this one. So I think in terms of—of making large collections of—especially photographic material available, this is—this is a monster issue. So those—those are a few speculations I’d throw out.

And one last thing, since we’re really focused on cultural heritage. It’s—one of the places where I’ve been paying a lot of attention lately in talking with a lot of people is, what happens to cultural heritage going forward? As—as archives and special collections and libraries and other groups acquire personal papers and personal collections from people, these are having more and more of a digital component. And—you know, you’re always a little bit in the rear-view mirror there because you tend to get things after they’ve sat around for a while. So, you know, right now, people are having close encounters with the horrors of, you know, consumer electronics in the 1980s and early 90s. You know, nasty sorts of floppy discs with manuscripts on them that they want and things.

If you look at what people are doing today, though, it’s absolutely clear that the nature of those personal collections is changing dramatically, and in particular, the amount of still and moving image material is, you know, just orders of magnitude bigger than it’s been in the past. So, you know, a family from the 1950s, a typical one, somebody might have, you know, some little bit of 16 mm film in the attic or something from somebody’s birthday but people didn’t take video casually. That really didn’t start till camcorders in the 90s. And now, with the ability to do it on cell phones and things like that, it’s just become pervasive.

You get people—you know, individuals now, who are routinely trying to bumble through the management of 50,000, you know, still images that they’ve taken and, you know, can’t find anything or figure out how to organize it, and are starting to, you know, geo-reference their images and things as the technology moves on. That’s all coming into our collections in a few years, that’s going to start appearing.
So we’ve not only got the things digitized according to our standards of cultural heritage, but we’re going to have some pretty funky images coming in off of—you know, not off of nice SLR cameras but off of—you know, cell phone cameras and things like that that leave a great deal to be desired. And we’re going to need to think about how to mix these into our collections and manage them and where various kinds of image enhancement is appropriate and that sort of thing. So I think—I think there are a whole new collection of issues as we start thinking about what our—you know, what our collections are going to look like in 2050 as documenting and understanding the lives of individuals in our culture continues to be an important activity. So I—I think we need to have an eye on that future stuff as well.

July 2012