Oya Rieger - Transcript
Associate University Librarian for Digital Scholarship and Preservation Services, Cornell University Library
-- Beginnings --
I came to Cornell right after getting my library science degree. Prior to my library science degree I was in a doctoral program in public policy, but I decided to stop with my master’s degree because I discovered the libraries as I was doing my PhD work. So as I said, I came to Cornell almost 20 years ago. I will tell you a little bit about what I do now and then take you through, not necessarily all 20 years in detail, but what relates to digitization.
Currently I’m the associate university librarian and my program area is called Digital Scholarship and Preservation. It’s a rather new program area within libraries and digital scholarship is definitely a bit of a broad kind of a catch-it-all phrase that we have been using over the last several years. And basically, just to kind of characterize the program I’m seeing, libraries—you know, used to really focus on organizing and delivering information and supporting the use of information and the last ten years or so what we are seeing is libraries are also actively participating in creation of information and especially working with—faculty and other researchers as they are, you know, creating knowledge, transforming knowledge into information, publishing and so on and so forth.
So the programs I see are kind of at that balance and also a little bit closer to working with faculty in managing the content they are creating, so it could be digital repositories, digitization of their content or digitization of content to support their teaching and learning, which means based on what they are selecting. Usability issues and publishing. And of course the big—in a way—you know, the broader program area of scholarly communication—how changing, how creating and sharing information is changing. So that’s where I am now.
And within my program there is still a group called Digital Media Group engaged in—providing—supporting digitization programs all the way from—oh, goodness—in-house digitization to outsourcing to copyright and metadata all the way to delivery. So I also have a unit with digitization services. So as I said, I came to Cornell 20 years ago and what really attracted me to the position, the first position, was—actually the position was called Numeric Files Librarian.
And it really was kind of a pioneering position at libraries because it was basically to manage data collections. And when I look now at the increasing importance of research data programs, actually I’m really amazed that Cornell, 20 years ago, felt the need to bring a data librarian. But my scope was more—social, economic, and agricultural data. But it’s about basically working with faculty—especially with census data in helping them in, you know, extracting subsets, analyzing subsets, of course, GIS.
So as I was involved in this program area, just due to the nature of my program, I started working with the USDA, different USDA groups, especially groups that were creating economic data, economic research data. And we started actually experimenting with repositories. USDA Economic and Statistics Service, that was actually one of my initial projects, and basically the goal was to work with the USDA, receive their reports, and have the library manage them. We started with FTP, and then there was Gopher, and then there was Telnet, and then there was the Web. It was about distributing the information but also supporting the information: accuracy, timeliness of service, promoting the service, helping users, and so on and so forth.
And that really pretty much led me into digitization, because soon we were not only talking about providing access to these PDF files sent to us from the USDA agencies, but also digitizing existing files. And as my interest increased in digitizing primary materials, I started working with Anne Kenney, who was also at Cornell. She pretty much pioneered that area of digitization, and by the time I joined her, I believe she was two or three years into her research into digitization.
And actually my job changed drastically when I joined Anne’s unit, because up until then, the first four years, I was working with statistical data, I was working with the USDA, I was a selector for datasets, and I was also doing reference work and instruction. But after I started working with Anne, and we actually had a unit called Digitization and Preservation Research Unit, my job became almost 100% research. So these were the early days, and our research involved understanding the characteristics of primary materials, from books to maps to 3-dimensional objects, and understanding how they need to be captured in digital format: file formats, you know, many specs from resolution to bit depth or color accuracy. And interestingly, our starting point was experimenting with digitization to preserve rare and special materials.
-- Challenges --
In terms of digitization, I would say one challenge that still continues is, and I know there are some libraries that are proud to be an exception, but many libraries I know, including my library, still suffer from digitization being kind of a latecomer to the program. Several of our digitization staff are still on soft money, and we still rely very heavily on soft money, with grants and so forth. That also applies to some other programs within the libraries that were developed within the last ten, fifteen years. But I think, especially because libraries moved some of these programs from experimenting with innovation to, all of a sudden, production, organizationally streamlining them and finding a sustainable infrastructure for them was challenging, and it’s still challenging from Cornell’s perspective. So that’s one area.
The other area is, as I said, it was complex in the sense that we ended up creating our own team because, you know, catalogers didn’t know about metadata then. The IT folks only worked with library management systems; they didn’t understand image management systems. So luckily, I think we are doing much better now. I would say that it’s definitely more streamlined in terms of distributing responsibilities. You know, we have curators who are in a way selecting materials, as I said our cataloging unit is very much engaged in metadata creation, and IT is storing files. I think I’m seeing a better division of labor and it’s not as siloed any longer, but that was definitely a challenge to start with.
So one challenge we have is that—at Cornell, as I said, I’m afraid many libraries are in the same, you know—are facing the same challenge—we start an experiment, we get money, but then they are kind of orphaned and we move on. That we don’t have the same—and I—as I said, I by no means want to generalize it, because, you know, like Blake Archives, there are some libraries that they said, you know, this is our niche, we are going to create—or—in Carolinas, with North Carolina history and so on and so forth. I think some libraries have been very successful in carving out a subject domain and seeing it as one big program. It did not happen at Cornell. We still have probably 40 collections that loosely tie into each other from a digitization perspective.
So I can continue, but let me stop there and talk about challenges from a standardization perspective. I mean, you know, I don’t know if I can put it out there as a challenge, but it takes years. It’s usually a broad collaboration, an international collaboration. It’s very groupthink, you know, it’s consensus building. It takes a very, very long time to come up with standards, and I think maybe what makes standards work well is that they are tedious and they are detail-oriented. But also, with some of these standards, after they are developed, the challenge becomes, you know, how do you integrate them with practice. For instance, I think I spent almost five years working on standards for image quality control. You look back and then you see that some of those were very nitty-gritty, that they just don’t blend well with the fast-paced, kind of low-resourced work front. But just in a nutshell, I don’t want to go on and on about your question, so that you can ask other ones. As I said, the first one was looking at it from an organizational perspective, more holistically, so that the program emerges with a nice foundation rather than as this kind of add-on thing. So there are certain things I would be doing differently, but not me. I think it was more the role of administrators then. I’m an administrator now, but when I was very heavily engaged, I was a librarian. You know, I had a research line, I had academic freedom in terms of research and production of my scholarship, but not necessarily from a program perspective within the library.
And the other issue is, I think, we tried, but probably we could have put more emphasis on usability issues and also on establishing very strong partnerships with faculty and researchers so that they are enduring relationships. It is not asking them, oh, we are digitizing historic map collections, which books should we be digitizing, but much more engaging questions: you know, how could you be using these collections in your learning, can we do anything innovative that would help you with your research activities? You know, kind of more embedded in research, learning, and teaching. I think the basis was still, you know, libraries manage information, you give it to the faculty, they consume it. It was not as interactive as we are looking at it now.
Probably, yeah, I mean, I would say each project, but I’m thinking about, you know, two. As I said, Making of America is one, but also, with funds from IMLS, we had a two-year project on Southeast Asia travels. It was a fascinating project, because we digitized primary materials documenting the stories of the first Western visitors who were travelling in Southeast Asia, and Southeast Asia from their perspective. That was a fantastic collection that we could have expanded, could have, you know, turned into a curriculum, or it could have engaged more faculty. I think, again, we took two, three years, we digitized, and it’s still a beautiful portal, the collection’s very nice, but frankly, I don’t know, looking back, if it’s being used, who’s using it, what’s happening. So it’s a bit disconnected in terms of, you know, you finish it, you move on. And I think we are still sometimes weak about connecting it with the learning-teaching-research environment.
And I actually, I would say every year we have at least one big project, all externally funded. Very often IMLS or NEH. So we continue, I would say, with the same speed. And also we work closely with Google. We did work with Microsoft and Internet Archive, and we continue to actually send books to Google for digitization. At a lower rate, but we do.
-- Hindsight --
So basically digitization was seen as a reformatting tool. You know, there were two purposes. One was to protect the originals, so that if there was a digital copy, it would be an intermediary for users to look at and the wear and tear on the original material would be reduced. And also it was seen as access to core historical material for the world. Some of the first collections we worked on were math, and agriculture from Mann Library. We also did collaborate on the history of, actually, the Americas. So we really saw digitization as a way to unify, in a way, distributed primary collections that are historically important, that form the canon of any given discipline. As I said, we especially focused on math and some historical areas, and the Americas in terms of 18th century America.
Actually, frankly, you know, in a way, we have witnessed quite a bit of advancement in technology in the last fifteen years, but on the other hand, the environment was not terribly different than what we have right now. We had scanning devices and some of them were very high-end. Some scanning devices have now gotten faster, or maybe they have better image quality capture, but even then we collaborated with Xerox especially, and they had some very high-end scanners. We had computers by then and, you know, we had broad access to the internet. So frankly, clearly now we have better tools, faster computers, more storage, but I wouldn’t tell you that the technical environment was inferior to what we have right now. It’s more, as I said, now, more powerful, faster, a bit better, but a very similar technical environment. Yeah.
The difference was—and I actually—I really think it still lingers, this difference that I’m going to explain to you. I think libraries—you know, just—I have been in the libraries for the last 20 years and I feel like librarians have this—I mean, this is my own perspective, but they seem to have this—kind of perpetual anxiety about the future and the role of libraries. And that—you know, it was there 20 years ago, and then when we began digitization, it was really seen as a new role, you know, the role of libraries would be partnering in digitizing the historic collections we have and making it accessible to the world so that we connect the world.
So I think there was a little bit of ideology in that work to elevate the role of the librarians. And especially elevate the—increase the—or in a way, rebrand librarians as technologists, too. I remember spending long hours trying to understand how scanners worked or we worked with image scientists from Kodak to understand—you know, kind of optical issues associated with resolution or bit-depth and color accuracy, color spaces. So—you know, working with color engineers and so on and so forth. It was definitely seen for us as librarians again as kind of pioneering and trying to, in a way, move library’s agenda into an innovative area.
And part of that feeling was, in a way, we were not mainstreamed. You know, we had our own group, own community, and I think it’s probably decreased—it has decreased quite a bit, but there was almost, like, a gap between, you know, traditional librarians and now this new age digital librarian. I remember even within my group, for years we had our own technologist, we had a metadata specialist, you know, we had a scanning technician, so on and so forth, that we were almost seen as, like, a new shop, a new operation.
And I would say probably, that approach was one of—it really turned out to be an impediment in a way, because—we didn’t really approach digitization as a research and development work that we would explore it, we would understand it, and then we would mainstream it. You know it started as research and before we knew it, it was production. Because we were doing things. And then we started forming our own team. It took years to really take this kind of broader stand-alone functional group and integrate it with the library. It took us years to really move the responsibilities of the metadata librarians within cataloging. Or maintenance of image databases to the technologies—info technology unit, just because as I said, we were—we didn’t have a very strong identity. We were neither Research and Development nor Production, we were in that kind of gray space.
And—so I think I spent almost 8 years very intensely doing research, conducting research on digitization, all the way from as I said, understanding how materials should be prepared for digitization to how they should be digitized, how they should be stored, delivery mechanisms. And a very gratifying and enjoyable component of those years for me was doing a workshop at Cornell. It was called Digital Imaging for Libraries and Archives. We ran it two to three times a year. We brought—you know—a couple of dozen—it was a rather small group—of archivists, librarians from all around the world, and it was a hands-on workshop, a weeklong workshop that we took them all the way from selecting materials to digitize, digitizing, doing quality control, understanding color spaces, creating metadata. It was a hands-on workshop.
We ran that for—oh goodness, I want to say five-plus—at least, I was involved with it almost for five years. And then that workshop really evolved into a digital preservation workshop, which began at Cornell and then moved to ICPSR at Michigan. But—but what I really want to emphasize is that I think it was really the excitement of the internet and all these digital tools and the librarians impressed and—exploring new roles and new programs for libraries. So it was really a hybrid of experimentation, research, and production.
And I think, you know, being there fifteen years ago, it was extremely enjoyable, for instance, to work with JSTOR as they began, to work with them in understanding how the journal articles should be scanned, their illustrations, the technical requirements for their illustrations. So I felt like it was, in a way, I mean, obviously, it’s not necessarily me, I was a part of a group of archivists and librarians, an international group, who had influence. But when I look back at what we really have done, you still see the influence right now. Even though no one is talking about digitization or delivery requirements, they are still using some of the formulas that we developed fifteen years ago. Again, I want to emphasize, I was a partner, but it was definitely a broader group, an international group, involved in these standardization efforts. I would say, you know, if I look back, that was the most gratifying part of my career, to be a part of that standardization effort.
We were pretty early adopters of DLXS from the University of Michigan for delivering books. So it was, you know, our effort to make sure that we are not only ahead on the digitizing and storing of these materials, but that they are delivered to users and they can look at them, they can zoom in, they can zoom out, and we OCR them so that they can go to a specific page or they can search by author.
And interestingly, maybe one technological impediment for us was that the network was slower. So for instance, delivering large images, high quality images, was more challenging then. So, you know, we actually experimented with compression to be able to reduce file size so that the files are delivered faster. OCR definitely was an important part of the work, especially OCR accuracy and OCR being used for keyword access. So you know, I can go step by step, but if I kind of fast forward it, this work evolved, and then the next phase of my career in terms of digitization was probably working with the Internet Archive, Microsoft, and Google, in kind of broader collaboration initiatives.
-- Advice --
I will just make some observations, but not necessarily in an order of priority, just as they kind of appear in my mind. One of them is, you know, whatever we do, I think, especially now, it’s more important than ever to establish collaboration, especially with academic units. That’s, I would say, number one. You know, connect the dots, especially with, for instance, digital humanities. I know that master’s and doctoral students in many humanities fields are still very interested in digitization or other digital collection building activities.
The other advice, probably, is life-cycle management. Again, whatever we do in libraries, especially with digitization programs, it is not about starting up, it’s about sustaining and maintaining and developing and phasing. You know, you select, you digitize, create metadata, provide access, you digitize more, you add, you change the interface. You should really see it as a living project that needs extension, assessment, understanding how it’s being used. So that’s probably my second advice.
One is that, as I said, usability and integration with the academic and research scholar environment. The second thing is the recognition that these projects have a life and that we need to attend to it. I continue to be disappointed that libraries were not able to create a broad partnership to digitize certain materials altogether. You know, before Google and Microsoft started digitizing books working with libraries, I think I was involved in conversations for, goodness, maybe ten years, especially in the framework of the Digital Library Federation. That’s probably one of my biggest disappointments about being a librarian, that we couldn’t pull it together. You know, if we all were to put together the money we were spending, we would be investing the same money Google is investing today. But it just didn’t happen. I think one scenario in my mind was libraries and archives and education institutions digitizing public domain materials and having strong ownership over them. And then if, you know, Google wants to digitize in-copyright materials, they want to take risks and so forth, I think I’m very open to it and I think we all benefit from it in a way. But as I said, I would have liked to see a strong library collaboration and a cooperative digitization strategy for all the public domain materials so that we have full ownership.