Kenning Arlitsch - Transcript
Kenning Arlitsch
Associate Dean Information Technology Services, Marriott Library, University of Utah
Interviewed 5/22/2012
-- Beginnings --
[What was your first project?]
Okay. So I—I’ll start with the Mountain West Digital Library. I was running—in 2001, I was running the Marriott Library Digitization Center, which was a fairly new department at that time, and we’d been operational for—I don’t know, probably a year, year and a half. I had initially just one part-time—one part-timer helping me and then gradually got some more staff.
But sometime in 2001, Max Evans came to visit us. He was then the director of the Utah State Historical Society and Max had a small LSTA grant to digitize about 200 glass-plate negatives from the State Historical Society’s collection. These were beautiful negatives, 8x10 inches in size and some of them were even 11x14 inches in size. And the grant proposal that he had written for LSTA specified that he was going to send those glass plates—200 of them—to the Nebraska State Historical Society because they had good digitization facilities. They were going to scan the negatives for him and ship them back, and at that point he was kind of stuck. He didn’t know what he was going to do after that, how he was going to make them accessible to his constituents.
So we had recently—in—a year or so prior—had purchased—what’s now known as CONTENTdm. At that time it was just known as Content. It was a research project at the University of Washington led by Dr. Greg Zick, who was a faculty member there. And his group, called the Center for Information Optimization, developed this software for storing and retrieving digital images. It was a very—it was—much more rudimentary software then than it is now. And I think we were only the third or fourth customer of the University of Washington.
So we were talking to Max and it occurred to me that not only could we help him scan his negatives, thereby eliminating the shipping costs and possible breakage by going to Nebraska, but we could also give him a piece, the client piece of the CONTENTdm software that he could install on computers at the historical society. And then we would give him—space on our CONTENTdm server that he would—have administrative rights over. So it was—it was an idea, it was a plan, we tried it out, it worked spectacularly. Instead of 200 glass plate negatives he was able to scan 400 because—because he didn’t have to pay the shipping costs to go to Nebraska. We scanned the images, gave them back to him, he up—loaded them into the CONTENTdm client, and then added metadata and uploaded them to our server. So we did 400 of those. Then partway through the project he got an NEH grant and the project suddenly exploded to over 10,000 images. The Shipler collection is large and—and very rich, so, it also then started to include panorama negatives. Some of these were as long as 90 inches. And we scanned them in sections, seamed them together, and did the same process.
So while we were doing this, it occurred to me that this model could be expanded to other institutions to support other institutions. And then in talking with Greg Zick at the University of Washington, I learned that they had—they had piloted a—a software for CONTENTdm that would allow them to search multiple CONTENTdm sites at one time. And it was called the Multi-Site Server. So—we tried something else. We got that, we installed it—I wrote a proposal to the Utah Academic Library Consortium in late 2001 and ran it through the UALC digitization committee at the time.
And we got some—some basic funding from UALC to establish digitization centers at four universities in the state of Utah. Utah State University, University of Utah, of course, BYU, and Southern Utah University. And then we ran the multi-site server here at the University of Utah and we—what it did was just harvest metadata from each of those CONTENTdm servers. So the model that I proposed was that each center would not only digitize their own—collections and put them on their CONTENTdm server, but they would also support and host other institutions the way we had been doing with the state historical society. And that’s how the Mountain West Digital Library was born.
So from those four initial—centers we have now grown to—I should have my numbers straight—I think about 17. And more than 60 partners. At this point we have over 650,000 item records, which translates to several million digital objects. It’s still a very distributed digital library—and we only harvest metadata back to our aggregating server, and that aggregating server has changed at least twice more since that original multi-site server. After the—the multi-site server could only harvest from CONTENTdm servers. So when we started to bring in other—digital asset management systems, we switched to an OAI harvester. And at first we used an open source harvester from the Public Knowledge Project, and more recently we’ve scaled up to Primo from Ex Libris. So that’s now our harvesting mechanism and our search interface. And I just want to show you quickly—maybe—there it is. [Shows current website for Mountain West Digital Library]. That’s the current Mountain [West Digital Library. As I said, over 60 partners, over 650,000 item records. And we are] growing dramatically all the time. And I have to say, I shepherded this along for about the first five years—but then I was able to convince UALC to hire a program director. And her name is Sandra MacIntyre, and she has been managing and leading this project ever since. She—she is paid by UALC but she reports to me—works in our IT department with the rest of my staff, and she has just done a phenomenal job bringing in new partners, new collections, and this continues to grow.
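[Editorial illustration, not part of the interview. The metadata-only harvesting described above can be sketched roughly as follows. This is a minimal sketch that assumes partners expose simple Dublin Core records over OAI-PMH; the sample response, the partner name, the identifiers, and the function name harvest_records are all hypothetical, and this is not the software the Mountain West Digital Library actually runs.]

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response from one partner repository.
# All identifiers, titles, and hostnames are made up for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:partner-a.example.org:photo/1</identifier>
        <datestamp>2012-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Glass plate negative, street scene</dc:title>
          <dc:identifier>http://partner-a.example.org/photo/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:partner-a.example.org:photo/2</identifier>
        <datestamp>2012-05-02</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_records(xml_text, source):
    """Extract metadata-only index entries from one ListRecords response.

    Only descriptive metadata and a link back to the partner's server are
    kept; the digital object itself stays with the hosting institution.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        # Records flagged as deleted carry no metadata, so skip them.
        if header is None or header.get("status") == "deleted":
            continue
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue
        records.append({
            "source": source,
            "oai_id": header.findtext("oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "link": dc.findtext("dc:identifier", default="", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for entry in harvest_records(SAMPLE_RESPONSE, "partner-a"):
        print(entry["source"], entry["title"], entry["link"])
```

[In a real harvest the XML would come from each partner's OAI-PMH endpoint rather than an embedded string, and the aggregator would merge the entries from every partner into one search index.]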
In fact, we are now—the University of Utah just a few weeks ago became only the third—institution in the United States, maybe even in North America, to purchase Ex Libris’ Rosetta digital preservation software. And two of the other four institutions are also here in Utah. It’s the LDS Church and BYU. So we’re now—thinking about building a digital preservation network on the backbone of the Mountain West Digital Library. And it’ll work on the same principles of bringing this kind of support to other institutions that—that couldn’t on their own, afford their own digitization and digital preservation infrastructure.
So it—it’s pretty exciting the way Mountain West continues to grow. It has subsisted on very little funding. It’s a relatively lightweight model, but people see the benefit of it. They—we continue to get—bring on new partners. It covers Utah, Nevada, a little bit of Idaho, and we’re about to bring on the Arizona Memory Project. So we’re—we’re continually growing. And that’s the Mountain West Digital Library.
[So what was the next project you moved onto?]
The next project—was the Utah Digital Newspapers Program. This also started in late 2001. It was a very fertile period. I got an LSTA grant that proposed figuring out how to digitize newspapers. And we started with three local weekly newspapers—and we tried to do this in-house. We tried to develop a process that would work in-house that we could do ourselves but it became apparent pretty quickly that it wasn’t going to scale. We needed external help. So we started working with a company called iArchives, which is down in Utah Valley, just about 30 or 40 miles south of here, and with them, developed a process that—a scanning process both from microfilm and from paper that segmented out individual articles and they returned to us the—the processed material and we loaded them into CONTENTdm.
So that first grant developed this process. Then I went back to LSTA the second year and said, basically, look, this is working. I submitted another proposal wherein I asked to expand this dramatically. I think initially the first grant we did 30,000 pages of newspapers, the second grant asked for another 100,000 pages, plus I asked for funding to hire a—a project manager, basically, and that’s how John Herbert got hired. And he has been the program director of the Utah Digital Newspapers almost ever since.
So—so during that second grant from LSTA, well—I should say I hired him at the beginning of January and we spent the next month writing a major IMLS proposal. And it was sort of a baptism by fire for him. And that proposal was successful and so we got over a million dollars to greatly expand the Utah Digital Newspapers. I can’t remember how many more pages we—we digitized—it was probably 300,000 more or so. And we continued to perfect the process with iArchives. And then—that was a three year program. By the end of that, John had pretty much taken over as program director of the Utah Digital Newspapers, and then he began to submit proposals to NEH and the Library of Congress.
We had talked early on with—with NEH and LOC and tried to get them—urged them on to—to start digitizing newspapers. And I have—I have to just tell a funny story. We were—University of Utah was a recipient of—newspaper funding in the 1980s from NEH called the United States Newspapers Program. And this was an effort to collect and microfilm and catalog all the newspapers in each recipient’s state. However, the University of Utah had begun microfilming newspapers in the 1950s, in the early 1950s, we were pioneers at that point as well.
But there were no standards. And the microfilm that we sometimes had to deal with was just—stunning in its lack of quality. I mean, there was—there was—you know, across any given frame you could have it go from very dark to very light because of poor lighting conditions at the time. You could have one side be focused, the other side be out of focus. And then there were all kinds of interesting artifacts in the frames as well. You’d see people’s fingers, occasionally you’d see a ciga—a lighted cigarette lying across the frame that was just photographed with everything else. So the quality of our microfilm was not always very good and we saw that right away.
And that’s actually what drove us to digitize as much as we could from paper. And that meant we had to go around the state and make connections and—and try to figure out whether libraries or archives, in many cases even the publishers themselves—whether they had their backruns and whether they would lend them to us. It was quite successful. For awhile, I think we were digitizing at least 50 or 60% of our collection from paper itself. And that of course required a completely different scanning operation.
But when we started to—when NEH and Library of Congress developed the NDNP, the National Digital Newspaper Program, and John started to submit proposals and was successful, we were at that point pushed to go back to microfilm because they didn’t want to pay the extra—the extra money that it cost to scan from print newspapers. So John was very successful with that, we’ve had three rounds of NDNP funding and at this point the Utah Digital Newspapers is over 1.3 million pages and because of the way we’ve—we’ve zoned out, each—every article on every page is a separate file at this point. Which means that those 1.3 million pages translate to about 17 million individual files. And those are being handled by our CONTENTdm server.
The—the newer method of digitizing focuses more on creating individual JPEG 2000 files for each page with article coordinates built in. And so we are gradually transitioning to that. Through the NDNP process we have—we’ve received about—we’ve received—so when we started getting the NDNP funding, we continued our digitization process but we also received from iArchives the formatting that—that NEH and LOC required. So we have all those archived and we’re gradually starting to transition to—to move our systems over to using—to using those methods.
But it’s been another very exciting project because of how successful it’s been. We had no idea how interested people were going to be in this, especially genealogists. And it has been by far our most successful—our most used digital newspaper pro—sorry, digitization program. So. Then—go ahead.
[Tell us about the Western Waters Project.]
[A]n idea that grew—that—that was born at the University of Utah. Greg Thompson is the Associate Dean for Special Collections and has been here for a long time. And this was an idea that he had in the late 1990s, and he talked a lot with our former director, Sarah Michalak, about this, about developing this. And she eventually took it to the Greater Western Library Alliance.
The western United States, of course, beyond the 100th meridian, is—is a very arid place. Particularly the inter-mountain west. And so the water is a much more crucial issue for population growth and for agriculture and even for recreation than it has been in most of the eastern United States, at least historically.
So the Western Waters Digital Library was an effort to bring together materials about water issues in the western United States. And the Greater Western Library Alliance brought the strength of roughly 30 institutions mostly west of the Mississippi who could focus on the materials. We started seriously going after grant money in about 2001. In 2003 we were successful, in fact the Utah Digital Newspapers Program from IM—that was funded by IMLS and the Western Waters Digital Library proposal was—was also funded by IMLS in the same round—in the same year, so that was a very good year of funding. The Western Waters Digital Library was essentially built on the model of the Mountain West Digital Library. So you start to get to see how important the Mountain West Digital Library is. It was a concept that we proved there but then expanded to the Western Waters Digital Library. The difference was that it covered a much greater geographic area.
But again, we at the University of Utah have always managed the aggregating server, we’ve managed the website, but there are numerous participants around the west that digitize their own water collection—water-based collections and then expose their metadata through OAI to our—our aggregator. And so it’s—you know, it’s another program that has struggled a bit over the year due to—years due to lack of funding. That initial IMLS grant expired in 2005. In 2000 and —oh gosh, 8 or 9, GWLA was successful in getting an—another grant from the National Endowment for the Humanities. That—that particular grant was led by Colorado State University. And, so we’ve continued to—add material that way.
So—so I’m saying things that are a little bit contradictory. On the one hand, these programs do suffer from a lack of funding and they could be a lot better than they are if they had more funding. On the other hand, the models that have been set up for Mountain West Digital Library and for Western Waters Digital Library are not contingent on new funding coming in. The model is lightweight enough that there are very few personnel, beside my IT staff, who support these, and—it just sort of vacuums up the metadata of collections that the participating institutions we hope would digitize anyway. So in that way it’s a relatively lightweight model.
The last thing I want to talk about is the Western Soundscape Archive. This was another IMLS-funded—program started in 2007. But in 2006, a guy named Jeff Rice called me up, out of the blue. He was a sound recordist, he had a journalism background, and he proposed this idea for an archive—or for a project in which the library at the University of Utah would serve as the archive. And his proposal was to record or collect sound files that represent the western United States. And those would be animal sounds, those could be environmental sounds, ambient sounds—and after talking to him for awhile, I convinced him that the library could play a much bigger role than just being the archive where these sound recordings would be—preserved long term, that we could actually turn this into an interactive digital library, just as we had with—with several other projects.
So we—we did a pilot project. Pilot projects are always great things. They get—they allow you to prove the concept. I hired Jeff Rice for six months just on departmental money to help us develop the concept, build a pilot website, put some—put some sound files into a collection. And then, based on that, we wrote a proposal to IMLS. And this was late 2006, early 2007 and we were funded in September of 2007. And that was another three year proposal. It has been very successful. We’ve—we have nearly 3,000 individual sound files now in our collections, most of them are—streamed. Some of them are downloadable if the—if the copyright or Creative Commons licenses allow. A lot of these have been recorded by Jeff actually going out into the field. And some of them he has also collected from other sound recordists that he knows. So the—the copyright varies. They can all be streamed, they can all be listened to live on the website, but some of them can also be downloaded and reused for educational purposes.
We also have a collection of about 20,000 spectrograms from the National Park Service. The National—a spectrogram is basically an image file that shows over a period of time the sound levels for a given location. And the National Park Service—it’s very interesting—a very interesting thing—has a mandate to collect this kind of material, but they have no mandate to make it accessible to the public. So this provided a perfect outlet for them. We were able to—we asked them if they would consider giving us those spectrograms to post on the website, and they said sure. So those are up there as well. This particular project, the Western Soundscape Archive, is—is really important to me, personally because—I’ve lived here in Utah for over 18 years. I grew up on the east coast, where it’s a lot more crowded, a lot more busy. I love being outside in Utah and throughout the inter-mountain west, but I can see even here—anyone can see even here, and the statistics bear this out, that population growth is—is increasing dramatically in the west. We all see new housing developments that come up or new shopping centers or whatever other kind of development. That—that’s continuous, that’s very visual.
But I think what many of us, and I include myself in this, are less aware of, is how that development, how that human encroachment, affects the soundscapes of a place as well. A place that you visited 20 years ago will not sound the same as it does today. Whether it’s as a result of direct—development or traffic that goes by, car traffic that goes by a few miles away or airplane traffic that flies overhead—so it’s—it’s important to document the animals that have lived in a place, what a particular place sounds like—even what a particular place didn’t sound like. What it—it’s hard to record silence. But silence is—is endangered. Silence is a big piece of—of the picture that is gradually going away.
So—so this is a very important project to me personally. The grant, of course, expired in 2011. We have submitted another proposal to the National Science Foundation, we should hear about that in the next month or two and this will be an effort to—educate the public—about the loss of our soundscapes. And the plight of some—particularly the plight of some indicator species. Frogs. Frogs are one of the first species that disappear from an environment when—when something changes in it. So.
-- Challenges --
[What were some of the challenges with Mountain West and Utah Newspapers?]
I think—you know, there are—there are, of course, technical hurdles, and the technology is constantly changing, so one of the dangers of getting in early, of course, is that you are going to do things that are not standardized. So that’s—that’s a hurdle that we’re having to overcome now by transitioning to—the new method of digitization—of digitizing newspapers. So there are always technical hurdles to overcome, there are scaling issues—I have to say though, I think—I think that what libraries in general don’t do very well is market ourselves, market what we have created. So as popular as this program has been, I think it could be a lot more popular if we had a genuine marketing and advertising program. We—John and I—between us have given a lot of presentations over the years about this. But—I think what we’re still lacking is getting into the—general consciousness of—of the public. And this particular project, the Utah Digital Newspapers Program, and Mountain West Digital Library are—are really focused at the general public.
Another significant issue that we have faced—the Mountain West Digital Library is sort of an umbrella for all of our digital collections. Everything that we digitize pretty much goes—gets aggregated into the Mountain West Digital Library. And I think that’s mostly true with all of the other centers as well. However, at this moment, the Utah Digital Newspapers is still not part of the Mountain West Digital Library. And it’s simply because of the sheer size of it, because of these—these scalability issues, the aggregating software that we were running was not capable of pulling in those 17 million files or—or whatever they were. Now that we have Primo, it’s much more scalable, so we’re getting close. I’m pretty confident that sometime this summer we will finally pull the Utah Digital Newspapers metadata into the Mountain West Digital Library, but that’s been a significant challenge.
So—so we’re—we’re addressing this from two sides. One is by improving the aggregating software that we’ve been using for the Mountain West Digital Library, the other side is starting to change our—digital newspaper processes—processes to use this new method. And you know, the name of the—the name of the standard—METS/ALTO, sorry. METS/ALTO is the—is the new metadata and processing standard for digital newspapers. And—so by—by switching over to that method, we will have many, many fewer files. We should be able to reduce our file count from 17 million all the way down to the actual—number of pages. 1.3 million, at this point. It will address some scalability issues that we’ve had.
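[Editorial illustration, not part of the interview. The file-count reduction described above can be checked with simple arithmetic. This is a rough sketch using the approximate figures quoted in the interview, 1.3 million pages and 17 million article-level files; the function name is made up for illustration.]

```python
def file_count_summary(pages, article_files):
    """Compare article-level storage with one-file-per-page storage.

    Returns the average number of article files per page and the fraction
    of files eliminated by moving to one file per page.
    """
    files_per_page = article_files / pages
    reduction = 1 - pages / article_files
    return files_per_page, reduction


# Approximate figures quoted in the interview.
PAGES = 1_300_000
ARTICLE_FILES = 17_000_000

files_per_page, reduction = file_count_summary(PAGES, ARTICLE_FILES)

if __name__ == "__main__":
    print(f"Average article files per page: {files_per_page:.1f}")
    print(f"Share of files eliminated: {reduction:.0%}")
```

[On these figures each page currently averages about 13 article files, so one file per page would remove roughly 92 percent of the files the aggregator has to handle.]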
Another—another significant problem with all projects is—virtually everything we’ve built has been on soft money. I haven’t even talked about Western Waters Digital Library and Western Soundscape Archive yet, but—but all four of these major projects have been built through grants. And as you—as you well know, sustaining those projects after the grants go is a problem. And the transition of libraries from focusing their work on print materials to focusing on digital materials and having the money follow that has been a long and slow transition. It’s getting better all the time, of course, and the University of Utah is very—supportive in many ways of the digital programs that we’ve created, but we—we suffer from a lack of funding as I’m sure most institutions do when it comes to digital programs.
We don’t advertise ourselves well enough. And I think what we have to do—it’s a culture change, frankly, in libraries. You know—it’s—libraries have been so dramatically changed over the past two decades anyway because of—of the internet. We used to be the gatekeepers of information, right? People used to have to come to us to get access to most information. And it’s not true anymore. They don’t—they don’t—the public doesn’t need us as much as they used to. So we have to make more of an effort to show them what we can provide. And part of that means thinking more like businesses. Actively engaging our university marketing departments.
We did this, actually, a little bit with the Western Soundscape Archive. We built in a marketing budget that IMLS funded and we did do radio spots, we did other kinds of advertising. It had a good effect, it wasn’t necessarily a sustained effect. I think in general, development departments and marketing departments have to start promoting digital collections more. Most of what I see in terms of—in terms of development, in terms of donor relations, still focuses on the building and print materials, in particular our special collections. There’s not very much of a focus yet on trying to get external funders, private donors, to contribute money to digital collections like this. And that—that has to change. How we do that—just have to keep talking about it. Just have to keep promoting it and pushing it. And—you know, things change slowly. But they do change.
[What were some of the challenges with adding sound in Western Soundscape?]
Yes. Yes. So. Yeah, that’s been another interesting evolution over these 10 or 12 years that I’ve been doing this. You know, initially we just started with simple things, as most digital collections did. Photographs, right? Documents. Maybe we got into books. Eventually EAD finding aids. Digital newspapers was a huge paradigm shift for us. As I said earlier, we tried to do this in-house. We really had—it’s not that we had no idea, we did figure out how to do it. But we could not have done it in a very scalable way. We needed outside help to do that. So yeah, that was a big transition. Sound and video—yeah, that was another big shift, so—and of course, now, audio and video files are our biggest sector of growth. We have—we have roughly 100 terabytes of digitized data that we’re—that we’re having to manage. Over the next five years, we expect that to grow to about 250 terabytes, or a quarter petabyte. The vast majority of that growth will be in audio and video files. Because they’re—they’re just bigger. So yeah, that creates—that creates stresses on our infrastructure, it creates stresses on our funding, and it creates huge stress on storage and digital preservation.
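[Editorial illustration, not part of the interview. The storage projection above, roughly 100 terabytes growing to about 250 terabytes over five years, implies a yearly growth rate that can be estimated as follows. This is a sketch that assumes steady compound growth, which the interview does not state; the function names are made up for illustration.]

```python
def annual_growth_rate(start_tb, end_tb, years):
    """Compound annual growth rate implied by a start size and an end size."""
    return (end_tb / start_tb) ** (1 / years) - 1


def yearly_projection(start_tb, rate, years):
    """Projected size in terabytes at the end of each year, starting at year 0."""
    return [start_tb * (1 + rate) ** year for year in range(years + 1)]


# Approximate figures quoted in the interview.
START_TB = 100
END_TB = 250
YEARS = 5

rate = annual_growth_rate(START_TB, END_TB, YEARS)
projection = yearly_projection(START_TB, rate, YEARS)

if __name__ == "__main__":
    print(f"Implied annual growth: {rate:.1%}")
    for year, size in enumerate(projection):
        print(f"Year {year}: {size:.0f} TB")
```

[Under that assumption the collection grows by about 20 percent a year. Actual growth would likely be lumpier, since large audio and video collections tend to arrive in batches.]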
-- Hindsight --
[Looking back on Western Waters, were there opponents?]
You’re—you’re definitely right that it’s a contentious issue. Mark Twain very famously said, whiskey is for drinkin’, water is for fightin’. And—however, I don’t—I mean, we certainly ran into strong opinions, we ran into some contentious issues. I don’t think anyone ever tried to prevent us from digitizing anything in particular or asked us to take anything down. We have been somewhat limited in what we’ve been able to put up simply because of copyright issues. I’ll rant on copyright in just a second. But most of what’s in the Western Waters Digital Library is historical material. You know, how these agreements came to be. The Colorado River Compact, for instance, is the very famous document from 1928 that divided up the water in the Colorado River between the upper basin states like Colorado and Wyoming and Utah and the lower basin states like Arizona and California and Nevada.
The problem with the Colorado River Compact is that the best historical data they had at that time went back about 30 to 50 years. And those happened to be, it turns out, some of the wettest years on record. So they expected they had a lot more water than actually turned out to be the case. So we—we actually did a video—oral histories project at one point called, Water is for Fightin’ from that—from that Mark Tain—Mark Twain quote. And—and those—those interviews that we did are available on the Western Waters Digital Libraries site. That’s where we ran into a lot of opinions, a lot of strong opinions about water and should the Colorado River Compact be revisited and renegotiated. And, you know—yes, very strong opinions. But no, I can’t say that anyone ever tried to tell us, we couldn’t digitize something.
Copyright, however, is—is I think just a huge problem. Copyright was never intended to be for the life of the author or even beyond. The—the current copyright laws are ridiculous. It’s something like life of the author plus 75 years or maybe—maybe it’s even longer at this point. Initially—when copyright was developed by the founding fathers, it was intended to give the inventor or the author some years of profit from their—from their works. But eventually, things have to go into the public domain because that’s how innovation happens. That’s how knowledge increase happens. And locking these things up and making them unavailable—freely accessible—I think hurts us.
-- Advice --
[We’ve asked the last couple of people we’ve interviewed—just what sort of advice you’d give us as students going into the digital realm?]
Huh. I think—I think this is a—I graduated from library school in late 1993. And I think 199—2 was when Mosaic was developed. That was the first graphical web browser. And you can’t imagine how different that made—when I was—here I am, I’m Associate Dean for IT Services. When I was in college, I wanted nothing to do with computers. I hated them. It was all very text—textually based at that time, it was completely graphic-less. Mosaic made the internet come alive, you know. It created incredible possibilities. And when I came here in 1994, I helped develop the library’s first website. Those were—those were tremendously exciting times.
Now—I think now, things are even more exciting. I mean, you’ve got so much of these early tech—technological problems ironed out. Computers are so much better. Everything from personal computers to servers—everything is so much faster, our networking is so much better. There are some—it just creates huge possibilities for you.
There are—there is room for tremendous innovation and the places now where I’m seeing a lot of room for growth and a lot of room for librarians really taking the bull by the horns is data management—managing—figuring out with scientists how to manage their data. Because I can tell you that most scientists really have no idea how to deal with their data. They don’t even know the right questions to ask. Everything from basic storage, where do I—where do I store my data to migrating it to—putting it in a database to make—and assigning metadata to it and making it accessible. So—so data management is a huge issue.
Metadata itself is—I think is—I think there is more need now for cataloging librarians than there’s ever been before, it’s just a—it’s just a tremendous paradigm shift. The cataloguing librarians who are willing to think in new ways, who are willing to think about linked data and how it can—help—create context around materials is—is just enormous. And we’re behind the 8 ball. We’re falling behind in that area again. So I think there’s tremendous potential for growth there.
And digital preservation is another area that we’ve been talking about for years and years, we still haven’t got it completely figured out and it’s a mess out there. I mean there’s—there’s—different institutions are doing all kinds of different things, there’s no cohesion. So I would—I would suggest to you that there is tremendous room for innovation, tremendous room for doing exciting work, building on internet technologies.
I would suggest that you—read widely. You don’t have to become an expert in any particular field but you have to know what’s going on broadly. If you—if you move into management—I think the best things that I’ve done in my career are hiring the right people and putting them into the right positions. Giving them the power to run with projects. You have—you have no idea what—what people can do until you present them with a problem and an opportunity and see how creative they can be. See what they can bring back to you.
If you do go into management, there’s a flip side to that happy view that I just gave you. So. If you go into management, you have to be prepared to earn your salary. Which means that sometimes you have to—you have to counsel people out. You have to—deal with—with staff who are not doing a good job and who are unproductive. It’s just—it’s just two sides of the coin. But if you want to be—you always try to—to achieve success by leading and providing opportunity, but you have to be prepared to deal with the other side as well, and that’s—that’s not easy.
I think the library used to be a much more departmentally focused and territorial place. And I think—I think that—in some ways what libraries have to do has not changed at all. We still have to acquire materials, we still have to organize them, we still have to make them accessible to the public and we still have to preserve them. Right? Those are the core—core issues in libraries, and in a digital library it’s no different. I need the special collections people to bring the collections in, I need the public services people to tell their patrons about these collections, I need the catalogers to help—help people make sense of the digital collections. None of that has really changed. It’s just the tools and the methods have changed.
But what has changed is the need to work across the lines, the need to work across boundaries. The—the—I’m sure we talked about this in the CNI presentation about search engine optimization research that we’re doing, how we’ve—we’ve discovered that this is less of a technical problem than it is of an administrative and a communication problem. Everybody, virtually everybody in the library is now a stakeholder in how their digital collections—or how the library’s digital collections get—presented to the world. And whether the world can find them. And so search engine optimization has to be talked about broadly across all departments. And so yeah, it’s another example of the blurring of the lines.
July 2012