Ricky Erway - Transcript

Ricky Erway
Senior Program Officer in OCLC Research
Interviewed 5/24/2012

[How did you get involved with the Library of Congress and the American Memory Project?]

In my last semester at library school, I applied to be an intern at the Library of Congress. Every year they have an intern program. And it’s really a great jump-start into the profession. It happened to be the year of the Gramm-Rudman-Hollings budget cuts, and they cut the intern program that year. So—and because of the budget cuts they had to lay off a lot of staff at the Library of Congress and then hire temporary staff to fill in behind them. So when an announcement of a job opening in the planning office at the Library of Congress came to the attention of my placement officer, because I had applied for the internship program, she knew I was interested and passed it on to me. And I had a telephone interview and then moved to Washington from Wisconsin. So it was kind of just a little stroke of luck that connected me with this great job. The planning office was in the—is in the Office of the Librarian, so my first job out of library school was in the sixth floor corner suite of the Madison Building overlooking the Capitol Dome and the dome of the Jefferson Building of the Library of Congress.

And one of the things that I got involved in in the planning office was, you know, uses of new technology for the library. And it was in the final years of the library’s optical disc pilot project, so I got very much involved in that, assessing the success and so forth. In the optical disc pilot project, the—the big technological advances it boasted were a jukebox that would hold a hundred 12-inch WORM discs, read once—write once, read many, and also video discs, which would hold a hundred and four thousand analog images. So that was the cutting-edge technology. It was a while ago.

And so as the optical disc pilot project wound down, in about 1987, James Billington came to the library as the next Librarian of Congress. And he envisioned American Memory as a way to get the champagne out of the bottle, you know, share the wealth with the nation. And so there were three of us who started prototyping, you know, what American Memory would be even as we were helping to digitize more special collections. So we started with some of the material that had already been digitized or imaged for the optical disc pilot project and repurposed that in the early prototypes and did a lot of, you know, one-at-a-time scanning so we could put together a prototype with a compelling story.

So the—this prototype was kind of hauled out every time there was a visiting dignitary, you know, from Steve Jobs to the Queen of England. And we would take this big cart of equipment, you know, it had, well, it ran on a Macintosh using HyperCard, so this was way pre-web. It had a video monitor, a computer monitor, the Mac, a video printer, a frame grabber, this thing that we used to get the analog images off the video discs and into digital form. It had a whole raft of equipment and we’d haul this big cart through these tunnels underground to the Capitol building to show it to congressmen in attempts to get support and funding.

And Dr. Billington was brilliant at getting funding. He really used the prototype to great advantage, got philanthropic funding, which up until that time the library hadn’t done all that much. Occasionally it received huge philanthropic gifts, but it didn’t really seek them. And he put together the Madison Council, did all this stuff to get funding for this project so that in 1990 we started a pilot project with 44 schools, colleges, and other types of libraries to test out, you know, how would people use it, how would they receive it, what do they need. And that ran for a couple years in the early ’90s, and then in about 1995, the library got significant funding to really roll out the project. In about 1993, we started realizing that the Internet was the future for distribution of digital content. And we started putting a quarter of a million images on the library’s web server in—I think in ’94 or so.

I left in 1995, ironically on the same day that the library started putting what would be 60 million dollars to use in what was becoming the National Digital Library rather than American Memory, although American Memory has always been kind of the crown jewel of the National Digital Library. So they’ve really staffed up since then. And I would say that digitization at the Library of Congress is now kind of an ongoing program, it’s no longer about that special project status, which is a great thing.

So at that time in 1995 I had started thinking about what I might do next. I—I had the best job in the world. I mean, there was no doubt, and I loved living in Washington. But it was my first job out of library school; I thought I should maybe have another experience. And so I was starting to look around and realizing I didn’t have any management expertise. I kind of wanted to be in an academic library but it would kind of mean, you know, starting out as an entry-level reference librarian or something, and so I looked around quite a bit and thought OCLC Research would be a cool place to work and the Research Libraries Group would be a cool place to work. And I had kind of enough credibility to probably get an entry-level position in either one of them. And since OCLC was in Dublin, Ohio and RLG was in Mountain View, California, my sights were set on RLG. And RLG had, not to my knowledge, but had decided that they needed a digitization expert, and they sent a job vacancy announcement to my boss. And he handed it to me and said, do you know anybody who might be interested in this? And so I came out and interviewed and was hired.

My first project at RLG was called Studies in Scarlet and it was a collaboration between seven institutions in the US and the UK to digitize, I don’t know, 300,000 pages of textual material related to marriage and sexuality. You know, women and the law, marriage, sexuality in sort of a century timeframe ending in World War I. There I learned a lot about collaboration (laughs). And just how hard it is to motivate people to pull together when they all have different timelines, different priorities, different ways of doing things. It was a real eye-opener and a challenge. But you also find out all the great things about collaboration, where, you know, we’ve got one institution that’s further along in this area and the others can learn, and vice versa. And also pulling together these great collections on sort of one theme and having them all accessible in one place.

We also learned a lot about providing a portal to a digital collection. So RLG hosted this gateway to Studies in Scarlet. And it was—it was one of those early gateways and—fairly non-remarkable except, you know, we had to figure out how to represent TEI-encoded text onscreen and how to relate pages of text with page images and how to use METS wrappers to hold together all the page images with the text. So we explored a lot of different issues with Studies in Scarlet.

After that I was involved in the beginning of AMICO, the Art Museum Image C-O. (Laughs.) I don’t remember what the C-O stands for, we’ll just call it AMICO. And this was working with fifty art museums, maybe thirty, I can’t remember, to put their art images online. And this was an interesting project in that it was a lot about licensing and control. Nobody wanted to just put them up online because they thought that they would then sort of lose the control of their images. So it had to be, you know, closely contained and accessible only to people with signed use agreements. And we also included current—contemporary art, which had all sorts of licensing ramifications so we had to work with the licensing organizations to make those things available all within the subscription environment. And that later became CAMIO, which is still offered by OCLC today.

The RLG Cultural Materials was another collaborative effort just to take special collections. So this wasn’t on a theme, this was just give us your digitized collections, we’ll create an aggregation and people can make all sorts of interesting connections. That went a lot better because instead of saying, we want you to pick and choose on this topic, and digitize things you probably wouldn’t otherwise be digitizing, we were saying, whatever you’re digitizing, that’s great, let’s put it in this big aggregation.

There was a lot to learn there too, though. People always want guidelines. They want to know what the best practices are, they’re hungry for them, but they almost—to a one—won’t follow them. (Laughs.) This is especially true of metadata guidelines. You know, if you’ve got a collection of photographs, and you know three things about each photo, there’s no sense saying, okay, here are the six required metadata elements. You don’t have six. You’ve got three. In other collections you might have way more than what is requested in the metadata guidelines, and you kind of want to include that information, you don’t want to leave it behind. So people have what they have, you know, some things don’t have titles, some things don’t have creators. Who is the creator of a butterfly? Okay, God, but I don’t know if we have an authority record for him. I’m sure we do.

So getting people to follow metadata standards is almost impossible, and yet every project you hear about to this day, that’s one of the first things they do. Well, let’s, you know, figure out the metadata standards, let’s decide which are required elements, let’s see how we’re going to do mapping. Mapping of metadata is another one of my—stalking horses for, you know, kind of poking holes in all the things we’ve thought for decades. Which is, you know, as long as you use one standard, you can map to another, right? So it doesn’t matter what you do, if you use a standard, you can always map it to another. Which kind of evolved into, okay, you could map it to Dublin Core, you know, ten simple elements that have very few requirements, but even then, you know, things—some collections, they’re—what they use for a description is more like a title. Maybe they don’t have a title. Some—you know, some—like what is the subject of a painting is quite different than the subject of a book and they might be in different elements. So in the end, when you’re mapping different metadata schemas, you end up just dumbing it down to the lowest common denominator.
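
The lowest-common-denominator effect described above can be illustrated with a minimal Python sketch. The field names, the sample records, and the CROSSWALK table are all hypothetical, invented for illustration; they are not taken from any RLG or OCLC mapping. The sketch only shows that whatever has no equivalent in the simpler target schema gets left behind.

```python
# Hypothetical crosswalk from local field names to simple Dublin Core elements.
# None of these local field names come from a real project.
CROSSWALK = {
    "caption": "title",
    "photographer": "creator",
    "artist": "creator",
    "depicts": "subject",
    "topic": "subject",
    "summary": "description",
}


def to_dublin_core(record):
    """Map a local record to Dublin Core; return (mapped, dropped)."""
    mapped, dropped = {}, {}
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            # No equivalent element in the target schema: the detail is lost.
            dropped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, dropped


# Two invented records from different kinds of collections.
painting = {"artist": "Unknown", "depicts": "Harbor at dusk", "medium": "Oil on canvas"}
book = {"caption": "Letters from the West", "topic": "Westward expansion", "binding": "Cloth"}

for record in (painting, book):
    mapped, dropped = to_dublin_core(record)
    print(mapped, "| lost in mapping:", sorted(dropped))
```

Note that the painting's "depicts" and the book's "topic" both land in "subject" even though they mean different things, which is the mismatch between collections that the passage describes.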

And this was never more true than with EAD. One of the RLG services was—is called ArchiveGrid, and we’re still working with this today at OCLC. But in ArchiveGrid, initially we were just working with EAD-encoded finding aids. So you think, whoa, this is luxurious, I’ve got a really well-defined standard, I’ve got, you know, a practicing archivist marking up descriptions of collections in this standard. We ought to be able to offer killer access. So we start looking at it. Well, first of all, because most of these things are—were intended for use locally, most institutions don’t put their institution name in their metadata records, at least then they didn’t. So we had to add that. A lot of the titles of collections would be something like “Collected Papers: 1918 to 1952.” And somewhere else, in the description or something, you’d find the name of the person.

So we, you know, and EAD supports marking up geographic place names and personal names. But very few archivists did that. So if you were to create an index of geographic place names and make it searchable, people would search in that index and not find probably 90% of the content that really matched their search. So we ended up doing things like offering a full-text index but calling it Geographic Place Names, so that people would be reminded oh, I can type in Philadelphia, but we were just doing a free-text search. We did that with a couple different elements.
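
A small Python sketch of the trade-off described above: an index built only from tagged place names misses finding aids where nobody did the markup, while a free-text search over the same descriptions finds them. The three finding aids and both function names are hypothetical, and this is an illustration under those assumptions rather than how ArchiveGrid was actually implemented.

```python
import re

# Hypothetical finding aids: only the first has a tagged geographic name.
FINDING_AIDS = [
    {"id": "fa1", "geognames": ["Philadelphia"],
     "text": "Letters written from Philadelphia, 1918 to 1952."},
    {"id": "fa2", "geognames": [],
     "text": "Business records of a Philadelphia printing firm."},
    {"id": "fa3", "geognames": [],
     "text": "Diaries kept during travel in Ohio."},
]


def search_tagged_places(query):
    """Search only the place names that an archivist explicitly marked up."""
    q = query.lower()
    return [fa["id"] for fa in FINDING_AIDS
            if any(q == name.lower() for name in fa["geognames"])]


def search_free_text(query):
    """Search the whole description as a case-insensitive whole-word match."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [fa["id"] for fa in FINDING_AIDS if pattern.search(fa["text"])]


print(search_tagged_places("Philadelphia"))
print(search_free_text("Philadelphia"))
```

Labeling the second search "Geographic Place Names" in the interface, as the passage describes, would change nothing in the code; it only prompts the user about what to type.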

So in the end, we had—the only thing we could count on to absolutely be there were the institution name, because we had supplied it, and some sort of unique identifier, because we had supplied it (laughs). And nothing else was really dependable. So we ended up just primarily offering free-text searching. And this, you know, in the sort of category of what would you do differently, I think, you know, for my whole career of working with providing access to digitized collections, I would spend more time thinking about how to provide useful feedback from a free-text search.

When you’ve got all the text in books, when you’ve got great descriptive—great descriptions of archival collections, lots and lots of words, why not let people use those words and then focus on ranking their results, offering faceted browsing through the results, finding ways to extract from all those words, you know extract personal names, extract subjects, we can do all those things now, and I, you know, wish we had thought more about that along the way.

-- Challenges --

[What were some of the issues you faced trying to digitize these things, since you weren't working with any standards at the time?]

Well, one thing that I think we realized early on was that you can’t be expert in everything and that there are experts out in the world and you should take advantage of them. So one of the things the library did even in the optical disc pilot project was outsource some of the imaging, and fortunately they outsourced to Stokes Imaging, a company in Texas who—they were asked to provide the analog images for video discs of posters and broadsides and cartoons and—but mostly photographs of like, the WPA photographs and so forth. And so they were asked to provide digi—analog images on video disc, but in the process of capturing them, they captured digital and wrote it out as analog and he saved the digital. So that when the library realized okay, now we need digital to put it up on the Internet, we went back to Stokes and, you know, for a relatively small price, got the digital images. So that was kind of a lucky happenstance.

But in the early days, I mean, we—it was kind of early in SGML markup and the Text Encoding Initiative, and we sort of adapted TEI to work for our texts. We did a lot of creative outsourcing. One of the—a collection of books about the westward movement to California was outsourced to—I think it was called UNICOR, which had women in a federal prison in Lexington, Kentucky doing the keying of the text of the books. And they would—you know, with OCR you can usually get 99.95% accuracy and they said they could do that too, because they had each—they had two women key the book and when the second woman did anything that disagreed with the first one, a little alarm went off and she had to rectify it. So they double typed all of those books, so that was creative outsourcing. It was a good deal because, you know, it was a federal program.
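
The double-keying check described above amounts to comparing two independent transcriptions and flagging every place they disagree. Here is a minimal Python sketch of that idea using the standard library's difflib. The function name and the sample sentences are hypothetical; the original system raised an alarm as the second typist keyed, whereas this sketch compares two finished texts after the fact.

```python
from difflib import SequenceMatcher


def keying_disagreements(first_keying, second_keying):
    """Return (first, second) word spans wherever the two keyings differ."""
    a = first_keying.split()
    b = second_keying.split()
    # autojunk=False keeps difflib from ignoring frequent words in long texts.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example: the second typist transposed two digits.
first = "We left St. Louis in the spring of 1849"
second = "We left St. Louis in the spring of 1894"
for mine, theirs in keying_disagreements(first, second):
    print("check against the page:", repr(mine), "vs", repr(theirs))
```

Two typists rarely make the same mistake in the same place, so each flagged span marks a spot where someone has to go back to the page image and settle it.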

We—what else—we were looking at scanning from microfilm; kind of early on saw that that would be a quick way and an easy way to outsource is by having microfilm scanned. In the beginning with the Mac and HyperCard, it was pretty much bi-tonal. We were doing black and white scans, often of half-toned images from publications, so you’d get these crazy moiré patterns so we had to do a lot of experimenting with that sort of thing. Metadata issues—most of the items in special collections hadn’t been individually cataloged, so you know, what kind of description was needed. I should give a lot of credit to the people in the Prints and Photographs Division, who were in many, many ways, the real leaders at the Library of Congress. Elisabeth Betz Parker, Helena Zinkham, other sort of luminaries of the time. And they were really out there, ahead of things with both imaging and metadata for images.

One challenge I think most institutions face is dealing with short-term funding. You know, as long as something is treated like a special project, it means you’re adding on all this digitization effort on top of their normal jobs and you know, expecting a lot out of them, usually on a tight deadline, and that causes problems. Also for a grant funded project, they’re usually covering digitization and some form of access and then they’re gone. So, you know, access, both the digital content needs to be preserved, and that’s more than just keeping a backup copy, it needs to be migrated to new media and so forth. But also that access needs to persevere and that needs to be updated almost as much as the content. So when the grant funding runs out and there’s no more attention put to something that was digitized and made accessible, often they just disappear.

I would say another sort of thing I learned was, in the early days, portals made some sense. You could publicize them and people would look, kind of because it was a curiosity. We’re no longer there. The Library of Congress and the Smithsonian might be able to create destination sites, but the rest of us should really think about how to get our collections into Google and into, you know, the places that researchers and citizens are likely to look, so this idea of building these handcrafted beautiful portals is really sort of—that time has come and gone.

One of the sort of big changes in my thinking has been about digitization for access rather than for preservation. Even in the early LC days, even though we were scanning in relatively low-res, and you know, kind of making decisions like scanning it black and white instead of grayscale or color, we were thinking about doing it to preservation standards. And those standards change, and a lot of the stuff had to be redigitized. And I think we all have to assume that, you know, we’re not going to do it once and for all, things will change, you’ll do it again. It’s amazing to imagine because you know, you can’t digitize everything and the thought of digitizing something again is—it sounds not very practical. But it happens.

But in special collections, where you’re going to preserve the original collections, maybe we can just start thinking about digitizing for access, making a good enough copy to improve access, and then putting our efforts towards preserving the originals. I wouldn’t make that argument for books—and you know, published materials there—it’s kind of like, let’s get one really good copy, make sure it’s stored in a lot of places, and then store the physical copy in one or two of them in safe places, but for special collections we know we’re not going to throw these things out, so let’s try to do more digitizing, provide more access, and step back from feeling like we’re only going to get one chance to do this and it has to be to the nth degree. Because that’s just slow, expensive, and not very productive.

In 2006, RLG and OCLC merged, and I became part of the OCLC research staff. And one of the first things we did was host a meeting about digitization of special collections. And out of that came the essay that Jennifer Schaffner and I wrote called Shifting Gears, which really was addressing that, you know, RLG and OCLC for years have been talking about preservation quality, best practices and so forth, but for special collections we now think access is more important. So it gave, you know, a couple—seven points that were kind of, this is how thinking has changed. And that helped people, I think, because we were the ones always putting out these guidelines and best practices that helped us say that, that maybe we could lighten up a little bit and try to get our stuff out there where people will use it.

And since then I’ve done more work with digitization of special collections including looking at how to address rights issues for unpublished—collections of unpublished materials. And we assembled a group of lawyers, archivists, experts in the field and kind of came up with what we called “well-intentioned practices.” And said, let’s just make this a community standard; if you adhere to these approaches, you’ve done a real good job, and if you’re challenged, you can say, well I, you know, I’ve followed the community norm and, you know, they have a good take-down policy and you take things down if the owner comes forward. I’m talking about for things where you can’t identify the copyright holder.

I’ve also done some work on rapid capture—of special collections. So, you know, now that Google and Internet Archive have come up with those great book scanners with the foot pedals and have really made that a lot faster than it ever was before, what can we do with special collections? And I—interviewed or visited several institutions where they’ve really kind of done a breakthrough to really push through the digitization of different formats of non-book materials.

I’ve also done a little bit about sustainability of digital collections, you know, what it takes to keep one going after it’s all said and—after it’s all digitized, described, and accessible. What does it take to keep that alive? So it’s all kind of, you know, built a lot on experiences and things I’ve learned along the way. I expect a lot more learning in the future. So more challenges. You can’t know everything. There’s so much to know, there aren’t a lot of right answers, there are a lot of smart people, so use them. A lot of them won’t be at your institution, so think about advisors, consultants, outsourcing. There’s a lot of ways to harness expertise, and you know, no one can have it all.

At the beginning of every digitization project—say it’s a two-year digitization project, you won’t have much to show until close to the end of the two years, so build a compelling prototype, you know, get yourself through the lean years. It’s great to have something to show while you’re doing all the hard work that it takes to come up with a true end result, so the compelling prototype is a really wonderful thing.

For Cultural Materials we had—you know, prototypes that were basically, we just went out and found things, got permission, where necessary, scanned them, to tell a story, you know, so if a user was interested in this, he’d get all these results and it’d be one of each different format, from different institutions, and you know, it was a great story. It took probably another year and a half before we had actual content to try to make another compelling story. Part of sustainability I think is realizing that digitization is what you do, it’s not a special project. You know, it should be part of what the institution does. And, you know, building it in so you’ve got IT support, you’ve got funding commitments and so forth. It’s almost got to be in-house, you can’t just keep going for grants to make a new interface or to, you know, revisit the metadata, it’s gotta be part of what you do.

As an aggregator, there’s many different approaches to keeping an aggregation alive. And they apply to more modern-day examples like disciplinary repositories of scholarly research articles and so forth. But they all have—either no business model or a business model, and business models can vary quite a bit, from subscription access, charging the contributor, having some sort of premium service, so anyone can search this database, but if you want to have email updates as content is added or if you want to have any sort of personalization, that costs something.

Ideal if you can get an endowment or ongoing government funding. But really, I mean, nobody likes to think about paying for access to information. And I think, you know, arXiv, the physics repository at Cornell, has a good approach of saying, look, you 200 institutions are the biggest users, here’s your share of it, you pay up because you use it a lot, and everyone else can access it for free. I mean, that’s—that’s a compelling model, I think.

So we tried a lot of different things with Cultural Materials, including creating a service called trove.net, where we put smaller images of all the image content on an open website and offered licensing. So if somebody saw an image they wanted, they could pay for a higher resolution version of it. And that—the free service was popular; there wasn’t a lot of licensing. And part of that was that if we had taken the 3,000 “best” images from Cultural Materials and put them in a licensing site, then it might have gotten more attention. There was, you know, a lot of dreck. In any collection there’s you know, just, you know—not all the images were great. And it wasn’t compelling. It’s not like a Corbis or a Getty Images. So that had limited success.

There just wasn’t that much interest in licensing images from library collections. But it could be done in a different way and probably could be successful. We wanted to make sure that every institution that contributed was included. So while we tried to partner with Corbis and Getty, they were not—they would only be interested in, you know, picking and choosing—cherry picking the collection to have things to include in their own offering. So we ended up partnering with a sort of middle-tier imaging vendor—image licensing agency. So we knew we didn’t want to do it ourselves because the licensing agencies have all of this marketing—they know who to contact at different publishers, textbooks, all these different uses for images. And for any institution or organization like RLG that was just too much. We didn’t have a voice, we didn’t have a way to reach that audience. So I think we were right to partner with someone and—and it just wasn’t a perfect fit. We tried a lot of different things to try to make it sustainable, and in the end, it wasn’t.

I mean, standards, something that I heard recently: standards are like toothbrushes, nobody wants to use anybody else’s.

Every project seems to think, mine is different, so I need to come up with new standards. And new standards, multiple standards, it starts to not be anything related to standardized. To me it seems like the digitization standards have kind of settled down. There are answers. The metadata standards, not so much. And I lobby for free text access and doing smarter things with the descriptions and the words that are in the materials themselves. And, you know, it’s not about offering a great interface. It’s about getting it into Google, getting it into larger aggregations and it could be that the National Digital Public Library ends up being the sort of massive aggregator for our materials. If they can create a real destination, then I’d say focus on what you need to work nicely in the Digital Public Library of America.

-- Hindsight --

[Knowing what you know now what would you have done differently?]

Well, I—you know, there’s access quality versus preservation quality—the—improving full-text access to content. De-emphasizing portals in favor of getting the materials where people are more likely to be looking. And I would say, you know, I’ve probably been more tentative than I needed to over the decades. You know, I would be bolder—people—not everybody—there aren’t answers to every question, and no one knows all the ones that there are answers to, and—you know, once you’ve been doing it a little bit, you’re the expert, go ahead. And experiment. And accept that you might have to do it over. I’d say, be bold.

-- Advice --

[What advice do you have for us students going into the library world?]

Every project seems overwhelming and you’ve got to think about them in chunks. Probably in the beginning you need to specialize. What chunk are you going to take on? Are you going to be a metadata expert? Are you going to be a technology expert? Figure out where your niche is and develop that. I think doing anything to stick your neck out and distinguish yourself. When I was in library school there weren’t that many technology courses but I took them all. So I knew about videotex and Minitel and interactive television and all sorts of things that have gone by the wayside. But because I specialized, I think that made me more noticeable and I was able to get the jobs that I did. One little decision can cause so many things to happen you can end up somewhere you never could have imagined. So make weird decisions and push yourself. I mean that application for the Library of Congress internship was such a tiny thing, but it really determined the direction of my career. Differentiate yourself in whatever ways that you can. Stick your neck out and be noticed and be amazing.

[Is there anything else you would like to tell us?]

One of the big changes that I have noticed in my thinking is digitization for access instead of preservation. Even in the early LC days, even though we were scanning at low res and making decisions like scanning it in black and white instead of grayscale or color, we were thinking about doing it to preservation standards. Those standards changed and a lot of the stuff had to be re-digitized. We all have to assume that we won’t be able to do it once and for all; things will change and we’ll have to do it again. It’s amazing to imagine because you can’t digitize everything and the thought of digitizing again sounds not very practical, but it happens. In special collections where you’re going to preserve the original collections, maybe we can just start thinking about digitizing for access. Making a good enough copy to improve access and then putting our efforts towards preserving the originals. I would not make that argument for books and published materials; it’s more like let’s get one good copy and store it a lot of places and store the original copy in one or two safe places. But for special collections we know we aren’t going to throw these things out. Let’s try to do more digitizing and provide more access and step back from feeling like we are only going to get one chance to do this and it has to be to the nth degree, because that’s just slow and expensive and not productive.

July 2012
