Peter Hirtle - Transcript
Peter Hirtle, Senior Policy Advisor at Cornell University Library. Interviewed 5/25/2010.
-- Beginnings --
I’m really honored that you think of me as a digital pioneer, because I certainly don’t think of myself that way. At best I was there very close to the creation and was involved with a lot of, I think, interesting projects—often at the tail end, just as they were failing. I used to like to say that I think I have been involved with more failed digital projects than any other person in the United States. But anyway, I’m glad to be doing this.
Now, you wanted to know about how I got started in all of this. And that’s a very interesting question. You know, I had started off going to graduate school in history, and even then, for reasons that I still can’t fathom, I was very interested in how one could use new technologies to make historical research work. I remember being at a meeting of the French Historical Studies Association, talking with a fellow—a friend who’s now the vice president of ISI—saying that this is what I thought was really going to be the future: how could we use this kind of emerging stuff to make exciting things happen.
Okay. So I was talking about how I was always interested in how we could use new technology to make scholarship better. When I was working at Stanford briefly in the library, and then teaching in the program in Values, Technology, Science and Society, it was a wonderful time in the first years of the 1980s. RLG had been around, but their new system was just coming in; it had its email system to communicate between RLG computers, and messaging across the systems was really neat. I remember when Stanford brought in their first OCR machine, which was about the size of a kitchen table, and if you had documents that were on perfect white paper in Courier 10-point or 12-point type it would get about 80% accuracy. But I thought that was just amazing.
I typed my wife’s dissertation on the mainframe and printed it off, and I suspect it’s the first laser-printed dissertation Johns Hopkins ever got delivered. So this was an interest of mine, and it continued as I went and started working in archives: to sit down and say, how can we do this? In graduate school I wasn’t much into the computing-in-the-humanities side of things—that was going to the summer institute to learn how to use punch cards and do statistical analysis. But I was interested in saying, how can we use documents?
And the real breakthrough came when I went to the National Library of Medicine. I was serving as a curator of modern manuscripts, which was everything from 1500 to the present—that’s modern manuscripts. It was just an exciting time to be there, because NLM had for a long time been a technology leader; they produced the first computer-generated printed book, the Index Medicus of, I think, October 1964.
They were—now, I can’t remember if they were just immediately before or immediately after Dialog as the first online database for people to search. They’d been doing things with facsimile transmission in the 1950s for interlibrary loan, and George Thoma was running a digital imaging project, which was primarily concerned with current literature and experimenting with that. I think he called it SAIL. But he had an engineering group that was real keen on imaging of materials. And I just took to it and kept on coming up with projects that I thought would be fun things to do. I was real interested in trying to convert the index catalog of the Surgeon General’s Office into electronic form, but it was a little too soon to do that and we couldn’t justify it.
But we did do a big project converting the photo collection into retrievable electronic format. This was probably in about ’85, I guess. John Stokes came in—Stokes Imaging—and we were doing it on video disc. And the plus was—there was a Canadian project, I can’t remember if it was at CAHA or the Canadian Center for Civilization, that was also doing a video disc, and they just shot straight with a camera onto the video disc. We had 35 mm slides made of everything and then digitized from the slides. And of course that was great, because we could claim the project was over. Who cared about analog video disc and NTSC resolution?
And then you can go back and run the slides through a slide scanner at some point. It was while I was at NLM that I really became interested in copyright and dealing with issues of copyright and unpublished materials. So I got involved in this project a lot, advising on copyright issues—what could we do with these images, were they copyrighted—and handling things that way. I know I wrote a paper about copyright and photographs when I was in library school at Maryland at the same time, for Frank Burke. Frank then, I think the next year, put it on reserve for his class without asking me, and I was always nervous about that because I wasn’t sure I really believed in my own conclusions in the paper. But it was a nice thing.
So that’s what—listening to George Thoma, and also, because it was in Washington, getting to go over and see the optical imaging program at the Library of Congress and meet with Carl Fleischhauer, who was doing their early CD-ROM projects, the Mathew Brady photographs. And they had current literature on their big optical disc, Newsweek and some other things like that, so we could see what they were doing with digital imaging. And a little bit with the National Archives.
But I got to know that better because then I left NLM to go over to work for the technology research staff at the National Library—or, excuse me, the National Archives. Taking Paul Conway’s position when he left, and joining Avra Michelson, who was working on trying to bring the internet to the Archives. Ted Weir was in that unit as well, and it was headed up by—Bill Hooton and Charles Dollar were there, and Bill Holmes was the head of it.
And they had been running their Optical Digital Image Storage System (ODISS) pilot for quite a while, and I came in and helped finish up the final report on that project. It was an interesting project—they did a whole lot, but they probably did it, you know, the wrong way. Which was to sit down and say, given this incredibly expensive equipment that Honeywell had given us that wasn’t very good, was this good enough for people? Would they accept the results? And the response was, yes, they would. People were happy to accept 200 dpi bi-tonal scans of Confederate records. Because it was better than dealing with microfilm, and these things hadn’t even been microfilmed, so it was easy to get to. And they had an index that was done, I think, if I recall now, in an ORACLE 5 database, but the database was maybe tied to the hardware on which this had all been developed.
Anyway, it turned into a horrendous preservation nightmare, because the images were on 14-inch platters—not even 12-inch platters—and the database was in this proprietary ORACLE 5 format. At one point I figured out that we could modify the imaging system that had just gone into the Bush library, which was left over from the White House, to get the index data off and get the images off, but it was going to cost probably a couple hundred thousand dollars to do it, and the image quality just didn’t make it worthwhile. And the index data, well, probably wasn’t worth the $75,000 to make a go of it. So all of the product from that went away. An example of one of those failed digital projects.
Well, you know—I’m not sure—well, indirectly, there were some people who worked on that who then became important in NARA’s later very good work. But I credit Steve Puglia with most of that and the knowledge and expertise that he brought, and Steve wasn’t part of the ODISS project. They were too busy microfilming in the labs to let him have a chance to get away and really deal with the area where he had this outstanding expertise.
But I do remember when Steve and Barry Rosinski, the other fellow who had worked on the project, went up to Cornell and took maybe the first digital imaging workshop that Anne Kenney and Steve Chapman taught, and came back filled with the exciting ideas from there. And I can remember Anne and Stuart Lynn coming down and presenting their initial vision—I’m not even sure they were calling it Making of America at that point; I think I still have the prospectus that they were distributing in a box at home.
But the idea—yeah, the great literature of this sort of stuff—it was just—Anne’s perspective was not sitting down and saying, what is it that the equipment can do, which is what Bill Hooton had been doing with the Honeywell equipment at NARA, but instead sitting down and saying, what’s the nature of the documents, and what is it going to take to have full informational capture? That just struck me as a really clever and right way of approaching it, on the assumption that we weren’t going to be going back and being able to do this again.
So, you know, there was some expertise that developed there at NLM—but boy, I’d be hard pressed to say much beyond that—you know, they learned a lot about digital imaging. Of course, the other thing they were worrying about was working with NIST, and I think even funding some NIST studies, on measuring optical discs, trying to figure out their life expectancy. That was all good research work, and you know, you’d see exciting things—a guy from NIST would show up and show you how to change one bit in a JPEG image and the thing becomes unreadable, because you’re dealing with a lossy compressed format.
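That single-bit demonstration is easy to reproduce today. Here is a minimal sketch (mine, not something from the interview; it assumes the Pillow library, and the file name is a placeholder):

from io import BytesIO
from PIL import Image

# Read any JPEG as raw bytes and flip one bit in the middle of the
# compressed stream.
with open("sample.jpg", "rb") as f:
    data = bytearray(f.read())
data[len(data) // 2] ^= 0x01

# Try to decode the damaged file. Depending on where the flipped bit lands,
# decoding may fail outright or produce an image with large corrupted
# regions; either way, far more than one pixel is affected, which is the
# fragility of lossy compressed formats that the demonstration showed.
try:
    img = Image.open(BytesIO(bytes(data)))
    img.load()  # force a full decode
    print("Decoded, but the image is likely visibly corrupted")
except OSError as err:
    print("Decode failed:", err)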
But, on the other hand, wasn’t that misdirected? It was the sort of notion that the Archives had with the Center for Electronic Records, saying, we have to take all of our data and put it onto one of three approved tape formats that we think will last for a long time. Or, we want our microfilm on silver halide film. In fact, not only did things have to go onto an approved tape format, NARA would only accept certain storage formats for ingest into the archives in order to copy them onto the approved tape cartridges.
And so there was a fascination with medium, and sitting down and saying, can we come up with the CDs, the optical media, that will last for 100 years? And I suppose out of it came a realization that none of these things were going to last for 100 years, and that you’re going to have to deal with constant changes in software and other things. But I think there was still that sort of hope that there was a Rosetta Stone you could write this to, the cuneiform tablet that you can write your digital bits to and stick ’em in a closet and come back 500 years later and it will still be there.
There were good projects that were done at the technology research staff—again, some failed projects. I think the most interesting project they did was they were trying to build a new—information system for the National Archives. And it never went anywhere, but the contractor that they had wrote some really terrific reports about the nature of archival work and how it was done. In fact, those reports were probably about as useful as anything I was reading in graduate school at the time. And so, that was good.
So we had Anne Kenney show up from Cornell, and that was a great aha! moment, but NARA wasn’t ready to commit to doing things with anyone else. I had Michael Ester from Luna Imaging come in—I’m not sure he’d even started Luna at that point. But he had done that really interesting work—an article that was in, maybe, a magazine called Rembrandt—trying to assess how art historians felt about digital images and the quality issues associated with digital images. And Michael was just one of the real—now there’s a digital pioneer for you, especially in the art field.
And then, when I had become sort of an internet guru, I guess, for the agency—I was running our BITNET accounts for people out of NLM—Lee Miller at Tulane and I started teaching Introduction to the Internet for Archivists courses. That was, you know—I can’t even remember the classes—how to get onto the internet, but even ones where you’re having to teach people what a mouse is and how to use it. And I guess, you know, you just say, well, yeah, but I’ve seen this at Xerox PARC when I was out at Stanford—doesn’t everyone know what these things are?
And we were using Gopher. I was a big believer in Gopher. I still miss Gopher—I think because of its hierarchical file structure it’s really well-suited for archival organization. And we did everything we wanted with Gopher. But one day our systems guy said, well, you know, our Gopher server can also deliver up the web. So I said, okay, we’ll have a website then, and we’ll make our Gopher web-accessible. And then we put together a committee to put together NARA’s first website and talk about what should be on it. And I can remember people saying, should this be preserved? And saying, oh no, this is not a record in NARA’s terms, it’s an ephemeral publication. Thank God, because I would have no idea how to preserve this.
And then we did a study—we got money at NARA from Bob Kerrey, the senator from Nebraska, to try to think about putting digital technologies online. But we had to do it in conjunction with Nebraska, and we needed to make sure that it was going to meet the needs of Nebraska citizens.
So I was the PI on a project where NARA staff went out and traveled around Nebraska speaking to focus groups about, if we were digitizing things, what would be useful for you. And of course they wanted the Federal Register, and they wanted to know about regulations, and they wanted to know about lots of other things—and the National Archives is so huge, the problem of trying to digitize anything in there and have it be something that’s really in high demand was very challenging. I tried to push very hard to do digital imaging for the census, but that had to wait for another 10 years. Opening up census records is such a massive operation, and it has to be done so quickly, that it was probably a little bit premature.
I can remember meeting with the census people about getting their records that were going to be stored at the Archives out over the internet, doing it with FTP. And I said, oh, we could make these things available—and then staff in the Center for Electronic Records said no one would ever want to access the census data we have over the internet. You might look at a guide and find out what you want, but of course they’re going to want us to send tapes. And so I was really pleased when, a couple years later, they could announce that they were making—I think their first set was maybe the Vietnam Death Record Index—available online. So it took them a little while, but they came along.
I would do the introductory workshops—what is it, how are people going to be able to access it, and that soon everyone’s going to have access to it. NARA’s a very interesting place. There’s an incredible number of very talented, very bright people there. But they also tend to let the scope of what they’re doing get in their way, because the volume is so great, and maybe because they’ve all been trained internally. At that time, they would bring in historians and send them through a two-year rotation to train them in the NARA way. It didn’t really open itself up to a lot of outside, you know, unusual thinking. And especially with technology they were very, very conservative, because they were primarily worried about not doing anything that was going to damage records; their primary job was to be a custodian of these things.
-- Challenges --
Two examples. They had a database that—I think it was called NARS A-1. And they had, like, three or four people on a Justice Department contract—probably prisoners—keying in data about the Archives’ holdings into this database, which was an ORACLE database. And I said, oh, well, this is great. If you give me an output of all of the records in electronic form, I can index it using WAIS—because I’d met Brewster Kahle at that point. And I can’t remember if it was the second or the third WAIS conference we hosted at the Archives, and had Brewster there. We were running WAIS, the Wide Area Information Server. It was, you know, one of the first full-text search engines on the internet, and Brewster’s second great technical accomplishment after his work on supercomputers at Thinking Machines.
But anyway, I said, give this to me, and we’ll index all of these records, which were a kind of electronic guide to the National Archives, and we’ll put it on the internet and people will be able to find what we have, and isn’t that great? And then they said, well, no, we don’t have any report that spits out all of the data from the database. This database is only so that we can do print-outs on lined paper that can go into the reading room. We have two different reports with this data in it, and to write a new report—because it was an obsolete database at that time—was going to take nine months of a programmer’s time, and why would anybody want to do this anyway? And so NARS A-1—we called it the original, biggest data roach motel, because the data went in and never came out.
And the other one: we were moving out to Archives II at that point—you know, moving millions of boxes, a tremendous logistical problem—and to help out they were preparing labels to go on the outside of each one of the boxes identifying what was inside them. To do that they would prepare a label saying, this is record group whatever, and box number, and usually a brief little description of the contents of the box. And they were doing this on a label-making piece of hardware that would then spit out the adhesive label. And I said, oh, well, this data would be wonderful—give it to me and we’ll stick it in and make it available. Oh no, we didn’t save any of that. We just typed it in to make the labels. So, very bright people, but sometimes doing challenging things.
There was not the idea of reusability. There was no recognition that people were going to, you know, disintermediate the process, that you could use technology and have people consult the material directly—or the assumption was that, if you were going to do it, you had to build large, complex systems. And so that’s when they decided—they had done an experiment with trying to use the MARC AMC format through RLG to describe NARA records and decided that really wasn’t going to work very well—that they needed to design their own systems, and they spent years doing that and are still trying to do that.
And again, given the unique nature of their material and the volume that they have, perhaps that’s understandable. But it was still a little bit frustrating. I’ve always had the something-is-better-than-nothing approach to life—give people as much information as you can and charge along. In the same way, I was trying to convince them—they have a database that indexes the archival literature, and I would say, oh, make that available online, and—well, no, you know, we have it in this particular software and there might be problems with doing that, so—
And they’re now, you know, on the internet in a full way, doing it seriously. They have people like Steve Puglia, who still sets the standards in terms of what imaging is appropriate. But by that point I had left NARA to go to Cornell and work as the first assistant and later Director of the Cornell Institute for Digital Collections, and starting to work with Anne Kenney—and Steve Chapman left shortly after I arrived—but then—well, it’s 2010? Yeah. I don’t know, about 1996 maybe? Something like that?
And, you know, having an opportunity to work with Anne Kenney was a real plus. In the end I came to question a little bit a fundamental element of her approach, and I can talk about that. But the fact that she sat down and approached things as a research question and did, you know, the kind of analytical work and then wrote it up and presented it—so her first work on image quality, her articles with Steve Chapman on the metrics for image quality that you’ve read in D-Lib Magazine, their report on conversion—is it better to first microfilm and then go to digital, or digitize and then try to produce microfilm from it—their early report on using Kodak Photo CD technology as a mechanism for color images—that’s all really top-quality stuff. And very, very impressive.
Where I think Anne may have been a little mistaken is that she came from the preservation microfilming community. Her focus initially was to sit down and say, can we use digital imaging to produce something that’s as good as preservation-quality microfilm? So she took the measuring devices and metrics from preservation microfilming, including the quality index figure, to sit down and say, this digital image is as good as microfilm—and the preservation microfilming community was completely sold on the idea that preservation microfilm was an acceptable substitute for a book and would capture all of the information content in the book.
In reality, we now realize that digital imaging is better than preservation microfilming. Yes, you can capture the words in the book, but you can also capture the color, a sense of the page, more of the artifactual qualities. You may not need that for all objects, but I’m not sure that having a metric that used preservation microfilming as the standard was, in retrospect, the most appropriate choice.
But her idea of sitting down and saying, why should you scan a book at 36-bit color at 600 dpi and generate huge files when there’s really no significant additional information in those files—when, you know, you could do it at 400 dpi in color or even in grayscale and get out what you need—that’s really important.
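To make the file-size arithmetic behind that point concrete, here is a rough back-of-the-envelope sketch (my own illustration, not from the interview; the 6 x 9 inch page is an assumption). Uncompressed size grows with the square of the resolution and linearly with the bit depth:

# Uncompressed raster size: pixels = (width_in * dpi) * (height_in * dpi),
# bytes = pixels * bits_per_pixel / 8.
def uncompressed_size_mb(width_in, height_in, dpi, bits_per_pixel):
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1_000_000

page_w, page_h = 6, 9  # a hypothetical 6 x 9 inch book page

# 600 dpi at 36-bit color: roughly 87 MB per page, uncompressed.
print(round(uncompressed_size_mb(page_w, page_h, 600, 36)))

# 400 dpi at 8-bit grayscale: roughly 9 MB per page, about a tenth the size.
print(round(uncompressed_size_mb(page_w, page_h, 400, 8)))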
And it was part of that thinking of not saying, what can the machines do—because the machine manufacturers were sitting down and saying, 200 dpi, no, now it’s 300, now it’s 600, now it’s 1200 dpi, now you can buy the desktop machine that’s 2400 dpi. And of course we always think that bigger is better, right? So if I scan at 2400 dpi, that’s got to be six times better than a 400 dpi image, right? And so that’s Anne’s really great contribution—saying, don’t do that, think about the document. Think about what you’re trying to capture. And I worry that not enough people remember that lesson. That they just kind of say, oh, 600 dpi sounds good, that’s what everyone uses, and forget why we came to that number—it just makes a bigger file. And of course, the other thing that was an eye-opener was Anne pointing out that many people at that time had scanners that could only optically scan at 300 dpi and were then using software to increase the size of the file to a 600 dpi file. But that makes no sense at all, right? At most you’re going to do 300 dpi and store it that way, and then you can use algorithms to increase the size later on—and you have to assume the algorithms are going to get better, and that after 10 years, an algorithm to change something from 300 to 600 dpi is going to be better than what it was 10 years ago. So, for what was a very small, very tight-knit group, they were doing really spectacular things.
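And a small sketch of that interpolation point (again mine, not the interview’s; the file names are invented and it assumes a recent version of the Pillow library). Software upsampling adds pixels but no new information, so storing the 300 dpi optical master and resampling later, with whatever better algorithm exists then, loses nothing:

from PIL import Image

# Hypothetical 300 dpi optical master.
scan = Image.open("page_300dpi.tif")
w, h = scan.size

# A "600 dpi" file made by interpolation: four times the pixels and four
# times the storage, but no information that was not already in the master.
upsampled = scan.resize((w * 2, h * 2), Image.Resampling.LANCZOS)
upsampled.save("page_600dpi_interpolated.tif")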
Yeah. Well, and of course, right from the very start, back when I was at NLM, the technical services librarian at the National Agricultural Library, of all places, put together a terrific conference on the application of scanning technologies in libraries and brought in all of these people. So she had the people from LC, and she had George Thoma from NLM, and—I can’t remember if Anne Kenney was there or not. They were doing a little project at NAL. They had the people from Syracuse who were doing a very strange project on adult education—I mean, it was a fine project, but it just seemed to have disappeared off the map entirely. And it was a big project, funded with COLOG money. So it was really a chance to get all these library people who were experimenting with digital imaging together at a very early moment. Anne had an RLG conference at Cornell one March. I remember it was March because I was snowed out flying into the conference and snowed out flying out of it—and, you know, the forsythia were blooming and it was spring in DC, and who knew that six hours north it was still the middle of a blizzard.
It was the first and only time I’d been to Cornell, you know, before I moved there. But that brought together a lot of the key players. And then RLG played a good role as well in having the digital imaging conference, especially with the emphasis on photographs. We talked to John Stokes, and you heard about the people who were working at the archives in Seville. Cornell got involved with John Wilkin at Michigan on the MOA and made that a collaborative project, working with Paul Conway at—you know, it wasn’t Paul then, but at Yale—they were using the same Xerox CLASS machines that Cornell was using, and those were an early challenge.
Of course, that was proprietary software, and it put the structuring information into a proprietary format. We have one version where we got software from them that allowed us to break apart their structuring information and use it. But in the latest version, all the structuring information is encoded and it can’t be touched at all. So that’s messy. You know, you learn those lessons the hard way. Always be able to export your data. Be careful about proprietary solutions, all those—
Even with all the images from the original Making of America project—that was done as a digital preservation project for brittle books, to replace brittle books with new books printed off onto acid-free paper. So the digital imaging was being done in order to produce new books, in some cases tipping in the illustrations from the original books into these things. And the images were then just stored on a server. Only after the fact did we sit down and say, oh, we could actually provide access to these images as well.
You know, the primary thought was to make the analog replacement for the scanned item. But even those images were on a server that—started to fail because it had been sort of forgotten about and it took a heroic rescue effort to get the images off of that and save them. And that convinced me that parking digital images away in a dark archive or in a preservation system is always a bad idea. That you need to have them be part of a live system, that as soon as digital data is not being used, then it’s likely to be destroyed.
And, you know, there were other exciting things going on. As an archivist I was asked to be part of the RLG-CPA task force on digital archiving that Don Waters headed up, which had Margaret Hedstrom and me on it. And that was a terrific opportunity and really started us thinking about—we didn’t even call it digital preservation at that point.
In fact, speaking of digital preservation, I gave a talk about its history at the National Archives in the UK. And I asked, now, when did that term first come up? And you could see that digital preservation initially—in the early ’90s, and I think this is in Stuart Lynn’s glossary—meant using digital technologies to produce preservation copies: the idea, just as we were doing, of scanning to produce an analog replacement copy.
-- Hindsight --
Well, you know, there are two areas where I think we’ve done really poorly. One is turning the projects into programs. Anne early on identified that as an important priority, but even though she identified it, it’s just really hard when you’re starting with a project—including projects that in some ways were a little bit outside of the mainstream.
Certainly when I was running the Cornell Institute for Digital Collections, which was working primarily on visual materials—we were doing things with the art museums, and because Anne had done such a good job on text, I was focusing on images. But even Anne’s projects, even though they started off with Lynne Personius, who was head of the systems office in the library, there was still a certain part of this where it was all happening outside of the normal IT infrastructure and being done on soft money, and the idea of trying to turn this into an ongoing program was—is difficult. And I’m not sure that we’ve ever solved that problem.
And then it’s exacerbated by what I think is the tremendous need to collaborate—and to have people do it right—but also by the fact that it’s so easy to do. Anyone can buy a scanner and set it up and start throwing some things up, making a website, and generating files. And so it’s sort of very democratic. But that leads, I think, to a little bit of insularity.
People get focused on sitting down and saying, well, what’s my special collection? I’m going to digitize it because this is really special to me. They don’t think about this as being part of a national construct. Anytime now when I hear that we’re digitizing library X’s collection on whatever, you sit down and say, but how is it going to fit in with other collections that are like it, how’s it going to work?
I’m so disappointed—I was looking up a book in the Internet Archive that there was some question about, and I think I found nine different versions of this book that had been digitized, by different libraries, in different projects, at different times, and in different ways. We were so good with preservation microfilming, sitting down and saying, this person, you know, knows how to do preservation microfilming, is going to take responsibility, and master it, and not duplicate efforts. And we just ignored all of that with digital imaging.
Yeah—and everyone thought that they should be doing that. Where we were really bad, and I’ve always thought that this is where the Digital Library Federation could have been putting more effort, is in trying to come up with a coherent cross-collection catalogue. Why is it that there is, as far as I know, no one place where you can find out whether a work’s been digitized or not, so that, you know, you might—
But, you know, it shouldn’t be that hard—we’re good at generating catalogue records and thinking about this. And there should be a way of sitting down and saying, here is everything that Gallica has digitized, and the Europeans, and in the States here. Instead you just have to know: this book—don’t I remember that Gallica for some reason digitized the papers of the Royal British Society?—and go look and say, oh yes, indeed they did. And again, is it because these were projects that were done outside of the normal technical services structure, so that people weren’t thinking about generating MARC records for them? But those are where I think we have fallen down. Now, maybe there’s hope—you know, maybe they’ll all show up in WorldCat. But I fear that Google by default will be the place where everyone goes.
My worry is—well, I don’t see so much of a problem there. You know, if the books are gone, who cares? They’ve been digitized before, in other places and for other projects; there are too many copies out there. So I really see that duplication as being the bigger problem. It just worries me that, you know, you may have some historical society in Ohio that sits down and says, oh, we’re going to digitize our local histories and put them online for our users, not realizing that they’re all at the University of Michigan and they’ve all been done for the Hathi project.
Well, you know—or looking for their local foundation or whatever. So that’s where I think—and, you know, the other problem we have is when you have these distinct little projects, what we’re really interested in is having mash-ups and interactions. And they’re in non-standard formats, we don’t have an agreed-upon standard book reader, a way to do reference linking and to do other things—so maybe that’s not so much a regret as where the future’s going to be.
-- Advice --
So we’ve built—you know, there are probably 20, 50, 100 silos of digitized information around the world. And now the issue is how do we make them talk to each other and interact, and make them easy to use. Instead, because they’ve all been built without having that up front, there are technical problems, and there are people who sit down and say, you know, we don’t want our stuff intermingled with other people’s, we need to keep control over it. But I think that’s going to be the big issue. Otherwise, you know, people will just go to Google and say, that’s good enough. It’ll be there in their 20 million scanned books, and if I can only get a snippet of a few, well, so be it.
And how many, you know, image databases are there that don’t even have the metadata for the images catalogued by—or indexed by Google? So again, you have to go into the—database to find it. So—it’s that interoperability that’s—becoming the important issue.
Well, of course, the digital imaging workshop at Cornell started long before the School for Scanning, and the people from School for Scanning came to spend the week-long seminar, learned what we were doing, and then—stole it. And again, you know, Anne is just so terrific that she also did the online tutorial—the free online tutorial to accompany the digital imaging workshop—you went through that?
And the same thing with the Digital Preservation workshop, when we did that to go along with it. You know, once again it was hard to keep the Digital Imaging Tutorial up to date, especially when we stopped teaching the workshops and shifted over to the digital preservation workshops. And we were also editing RLG DigiNews from Cornell—you remember DigiNews as the place for that. And then when Bill Arms came up to Cornell, I took over as associate editor of D-Lib Magazine. So we were doing lots of outreach sorts of things.
Now, forget the audience—it was a wonderful learning experience for me. There was an interesting sort of tension in some of the workshops, because on the one hand it was a mass presentation and they wanted to get, you know, 300 people into the auditorium, but on the other hand we were trying to sit down and say, you know, if you’re doing this, and you want to do it right, it’s not easy. It’s not just buying a $30 scanner—it would have been a $300 scanner at that time—and popping along. A good digital project took time.
We had done—I can’t remember the timing now—whether the NINCH best practices document for digital collection building came first, and I was part of the group that was working on that, or School for Scanning—NEDCC did its manual that Maxine Sitts edited, and I came on board to do sort of an editorial review of that before it went out. So I can’t remember which one of those came first, but the stress from most of the instructors was really upon how to do this right.
But occasionally Roy Tennant would jump up and say, oh, go out and buy your $100 scanner and slap some images up—anyone can do this, just go do it—and, you know, there’s a certain amount where he’s right too. So, you know, if you’re investing no money in it, and you just want to do it and throw something up and make it quick and dirty, sure.
And if you want to do it right—or, you know, have something of enduring value that’s part of the national infrastructure and everything else—that’s going to be complicated. And it was interesting to see, I guess in the same way as when I talked about how, when we taught the internet workshop, we might have to show people how to use a mouse—I can remember the first time we taught the web, showing people how, if you typed in some HTML code, something would pop up: “Hello world” in formatted form. The same thing with the workshops, where you’d have to go from explaining what a pixel is and what a digital image is—you know, all the basics—and seeing how the audience became more and more sophisticated as time went on. So that was good.
And so then we were doing MOA2—Making of America 2 was the project—and it was to sit down and try to say, how can we structure archival documents? And it’s an incredible problem to sit down and say, you know, if you have a document and you have enclosures with it that are of a different date, and there may be stamps or marginalia on these things, how do you relate them? And if you have the same document that’s in an enclosure somewhere else in your database, do you have to scan it again if it’s identical—what do you do with it? It was a project that was going nowhere, and I thought it was doomed. And MacKenzie came along and rescued it and turned it into METS. And I just kind of said, wow. Clearly it’s a good thing I’m not doing this anymore, because there are a lot brighter people than I am.
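For readers who have never looked at METS, here is a minimal sketch (my own, not from the interview) of the kind of structure MOA2 and METS allow: a letter with an enclosure expressed as nested divs in a structMap, each pointing at scanned page files in the fileSec. The identifiers, labels, and file names are invented:

import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

mets = ET.Element(f"{{{METS}}}mets")

# fileSec: the scanned images themselves (hypothetical IDs and file names).
file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="master")
for fid, path in [("F1", "letter_p1.tif"), ("F2", "enclosure_p1.tif")]:
    file_el = ET.SubElement(grp, f"{{{METS}}}file", ID=fid, MIMETYPE="image/tiff")
    ET.SubElement(file_el, f"{{{METS}}}FLocat", {f"{{{XLINK}}}href": path}, LOCTYPE="URL")

# structMap: the letter, with its enclosure nested inside as a child div.
smap = ET.SubElement(mets, f"{{{METS}}}structMap", TYPE="physical")
letter = ET.SubElement(smap, f"{{{METS}}}div", TYPE="letter", LABEL="Letter, 1864")
ET.SubElement(letter, f"{{{METS}}}fptr", FILEID="F1")
enclosure = ET.SubElement(letter, f"{{{METS}}}div", TYPE="enclosure", LABEL="Enclosed clipping")
ET.SubElement(enclosure, f"{{{METS}}}fptr", FILEID="F2")

print(ET.tostring(mets, encoding="unicode"))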