Howard Besser - Transcript
Howard Besser
Professor of Cinema Studies and Director of New York University’s Moving Image Archiving & Preservation Program (MIAP), Senior Scientist for Digital Library Initiatives for NYU’s Library
Interviewed 3/5/2010
-- Beginnings --
I’m Howard Besser. I’m a professor of cinema studies at New York University and I’m director of the Moving Image Archiving and Preservation master’s degree program. I also wear a zero—I have a zero percent time appointment as senior scientist for digital initiatives for the NYU library. And that just means that I work on projects kind of when I have time to—with arrangements of the dean of libraries. So.
My background is I have a—several degrees in library and information studies. I have a master’s, a certificate in Bibliography of Non-print Media, and a PhD, all from Berkeley: School of Librarianship for the first one, School of Library and Information Studies for the last two. It’s changed its name many times. So that’s—my undergraduate degree is in pataphysics—which is the science of imaginary solutions, from Berkeley—so that’s my educational background. Are you interested in my work background?
Okay. In terms of my work background, I—I have a pretty eclectic background, I worked for fifteen years at the Pacific Film Archive, part of the University Art Museum in Berkeley in a variety of capacities ranging from organizing the film stills collection early on to working on creating a cataloging—online cataloging system or computer-based cataloging system for Japanese film collection in the late—late 70s, to doing projection, to designing the new calendar to all types of computing work at Pacific Film Archive. I worked for the—vice chancellor for computing for the Berkeley campus for a number of years in charge of image database projects. I was—I’ve been a faculty member in Library and Information Studies or some similar name at University of Pittsburgh, University of Michigan, University of California-Berkeley, UCLA—I have been the—Head of Technology for the Canadian Centre for Architecture—probably—undoubtedly the premier architecture museum in the world. And a host of other kinds of things, so I have a fairly extensive and eclectic work background.
[Okay. How did you get involved in digitization?]
How did I get involved with digitization? It’s kind of a long story. In the 1980s, I was working at the Pacific Film Archive in Berkeley, and we had—had a lot of trouble getting funding for basic day-to-day activities. Activities like cataloging, like collection management—this was the early to mid 80s. Ronald Reagan was president, advocating trickle-down theory. I noticed that very flashy and sexy projects—got—significant funding and day-to-day operations didn’t. So I envisioned a very sexy project for the Pacific Film Archive, a project where, if we got money for it, it would trickle down to our normal activities and would keep people doing some of those normal activities.
So I wrote a paper around 1984 that envisioned—scanning every frame of a film—and then asking the director of that film to come into residence for a semester on the Berkeley campus. And one of the activities would be that we would give him or her, probably a him in those days, a—the capabilities to remove scratches from their film, to rebalance color, to essentially re—digitally restore that film. And then at the end of that semester that we would output the film, a new, restored film onto 35 mm film and output another—or keep another version in digital, and just store that in a long-term fashion.
Now, this was 1984, three years after the IBM PC came out; most people’s hard discs were about 20 megabytes in size, and to actually do something like this right, one would have to fill up a hard disc with every frame of a film. And—so that you would have literally tens of thousands of hard discs in order to do this. So this was clearly not a—a very doable kind of project. But I thought, maybe this could attract money and the money would trickle down for us to do what—what I really wanted to do, which just normal stuff. This was kind of a ploy on my part.
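[Editor's illustration: the storage estimate above can be checked with some rough arithmetic. The figures below are assumptions chosen for illustration, not numbers from the interview: a 90-minute feature at 24 frames per second, each frame scanned uncompressed at 2048 x 1536 pixels with 3 bytes per pixel, stored on 20-megabyte discs. The function name `disks_needed` is hypothetical.]

```python
FPS = 24  # assumed frame rate for sound film


def disks_needed(minutes, width, height, bytes_per_pixel=3, disk_mib=20):
    """Estimate how many fixed-size discs an uncompressed film scan would fill."""
    frames = minutes * 60 * FPS
    frame_bytes = width * height * bytes_per_pixel
    total_bytes = frames * frame_bytes
    disk_bytes = disk_mib * 1024 * 1024
    # Integer ceiling division: a partly filled disc still counts as a disc.
    return -(-total_bytes // disk_bytes)


# Assumed example: 90-minute film, 2048 x 1536 frames, 24-bit color.
print(disks_needed(90, 2048, 1536))  # 58320
```

[Under these assumptions each frame takes 9 of a disc's 20 megabytes, and the whole film needs 58,320 discs, which is in line with the "tens of thousands" described above. A higher scanning resolution would push the count higher still.]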
About a year later, around 1985—we—the Berkeley campus had a new vice chancellor—for information technology, the computing vice chancellor. And he—wanted to get support from the campus to build a set of wires connecting different buildings on the campus. He had this vision of a networked campus. And he could—he managed to get some support from people in the sciences, but not from the humanities or the arts for this. Why would they want wires connecting one part of campus to the other? He saw my paper and a light bulb went off in his head. Oh! Movies! Everyone likes movies! We could—Pacific Film is a block from the campus. If we could—over this network, play those movies, in classrooms, in—faculty members’ offices, and let them access that, that’s a demonstration of why we should have a network on the campus.
So he met with me and the—the Pacific Film Archive is a part of the University Art Museum. He met with me and the associate director of the museum and—and we developed a plan where he would pay my salary for the next few years while I worked—on this project to digitize things. But one of the things that we quickly realized was that digitizing film was not possible. So instead we backed down to the idea of digitizing photographs and—paintings in the art museum, which was the parent organization of the film archive.
And then—so for about six months I worked with programmers there on that, and then we added a couple other different areas. We added architecture department, slide collection, architectural things, and the geography department’s map collection so that we could then—when we—after we scanned these images we could put them somewhere on a map and associate that with it. So we built—so—so all of this was in place, he assigned two programmers to work with me, everything was going really good, and then he—of course we didn’t have any scanners, this was—you know, 1985, and there were very few scanners in the world at that point in time. There were really only a couple companies that even made a scanner.
One of them was called Iconix, which was later bought by Kodak; the other was Hewlett-Packard, which had something that wasn’t really a scanner, it was something that measured light in some way, reflective light, but wasn’t quite a scanner as we know it today. So—we found out that Digital Equipment Corporation, which made mini computers at the time—afterwards was bought by Compaq, which was bought by Hewlett-Packard. Digital Equipment’s office in Palo Alto, about an hour drive away, had just acquired a scanner from Iconix. So I went down to Digital Equipment with an oil painting, a small oil painting from the art museum’s collection, to try to scan this oil painting. It—I—I still believe that no one had ever done a direct scan of an oil painting before this. I think this was the first attempt to really scan directly off of an oil painting.
So we—the camera that DEC had bought was not set up yet, they had never even tried it, so I worked with a—a very clever guy, Armando, forgot his last name, who was head of the DEC office, head researcher of the DEC office, and we set up the—I spent a whole—I think two days down there. We spent most of the day setting up the camera. And at this point in time, a scanner did not have a viewfinder. Or any way of displaying the scan. So you would scan something, but you wouldn’t—this was a scanner that was on a flatbed—it was like a copy stand. So the scanner was here and we’d push it up and down and we had the painting down here. We could not tell where the field of view was. We couldn’t see where the edge of what was being scanned was.
So, you know, we’re clever guys, figured out some things—we—first we started—we didn’t put the oil painting in there. First we started flat—with just a piece of white paper. And then we put a pencil down. We did the scan and then we had to take the file from the scanner, FTP it to a computer, and on that computer we just saw a set of numbers. Of the number values for each pixel. And when that number value changed, we knew that’s where the pencil was. So we had to keep—we had to keep on going up or down until we got where the pencil was on this side, where the pencil was on that side, so we got the field of the view appropriately. So that’s what we had to do even to set up a scanner at that time, 1985.
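[Editor's illustration: a minimal sketch of the pencil trick described above, reading a row of pixel values and finding where the numbers change. The sample values, the tolerance, and the function name `find_marker` are assumptions for illustration; the interview does not record the actual values or file format.]

```python
def find_marker(row, background=None, tolerance=20):
    """Return (first, last) positions where a row departs from the paper value.

    `row` is one scan line of brightness numbers. If no background value is
    given, the brightest pixel is assumed to be the white paper.
    """
    if background is None:
        background = max(row)
    hits = [i for i, value in enumerate(row) if abs(value - background) > tolerance]
    return (hits[0], hits[-1]) if hits else None


# Hypothetical scan line: white paper (about 250) with a dark pencil across it.
scan_line = [250, 251, 249, 250, 40, 35, 42, 250, 252]
print(find_marker(scan_line))  # (4, 6)
```

[If the function returns None, the pencil is outside the field of view, which is the signal to move the scanner or the pencil and scan again.]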
In—the next thing we did was put the oil painting down. We actually were told not to take it out of its frame. So we had—it had kind of a raised frame. We put the oil painting down and then—tried doing this—oh, well, no. Actually before we did the painting—I had actually—no, I remember. I had actually on the way out of my house, I had gone to the museum, gotten the museum, put the painting—oil painting from a museum—in the backseat of the car, and I stopped at my house just to get something, and I grabbed a poster. And it was lucky I grabbed this poster.
So first we tried to scan the poster. What we found was that in order to do a scan of approximately 8”x10”, it took about fifty minutes under extremely hot light. And I started, like, thinking, oh well, this project, I don’t know, we’ve invested all this into these great ideas for this project, at what temperature does oil in an oil painting boil? You know? Are—am I going to wreck this painting? And, you know, fifty minute—these were really, really hot lights. Oh, the other thing that I had grabbed going out of my house was my darkroom thermometer. The dar—so I had that as well. So—so we were actually measuring the heat, the darkroom thermometer goes to 110 degrees, right? It went off the edge after about twelve minutes, right? So, you know, this is getting really hot, it’s under light for a really long time, you know—maybe I should just pack it in and this whole idea was just not going to fly.
But you know, I’d gone all the way down to Palo Alto to DEC and you know—so we started getting very—you know, as I said, we’re clever guys, Armando and I are both kind of clever guys when it comes to this. So we thought, okay, what will lower temperature? So we actually—we found a fan—we blew a fan across the surface of the poster. And we got it so that after those fifty minutes of the scan, it was only around 100 degrees. Which we thought, probably not so good for the oil painting, and certainly not good to have fifty minutes of light, but—you know, it’s probably okay. So we did the scan—of the poster, worked out okay, so then we did the scan, direct scan off the oil painting, which, as I said, I think, I believe to be the first direct scan—direct digitization of an oil painting. So—we came back with that.
Up until that time we had been fooling around with geospatial data—we had been—the images that we had been using were from satellites—views of earth from satellites. And so we finally had our first art image that we could use in this. And then we went down there a few other times—they—the conservator at the museum didn’t want us taking any more oil paintings down. But we took other photographic objects and other types of material down. So that’s—that’s my first experience with digitizing.
[What was one of the first projects you were involved with?]
Well, the first project was this project that I just talked about. That was the first major digitization project I was involved in. And—in 1985 and the first half of 1986 we did a lot of work in digitizing quite a few images in the areas of geography, art, and architecture. And we developed a system—again, what I believe was the first client server—interactive query—image query system. We called it Image Query. And we showed this in May and June of—of 1986 at the annual meetings of the American Association of Museums and the American Library Association.
We rented a booth there, or we partnered with Sun Microsystems to rent a booth there, and—people were blown away. Because they had never seen high-quality art images on a computer screen before. And we were scanning at a pretty high resolution then. And they had never seen a query system where you can look at cer—essentially a database where you can put in certain attributes and you will get back images that are—that fit those attributes. And—and we even—not at that point, but by a year later from that, we had mapping—icons on a map where you could actually put them—on a map. So we had that by ’87.
So this was—you know, it really was the first project. It was a prototype project. You know, we designed it with the idea that some of the software—some of the manipulation had to be done on the user’s side, and some of it had to be on the server’s side. And—we—because we envisioned this as working on the internet, which really at that time was really the ARPANET. And we assumed that we would have our digital repository at Berkeley and that people anywhere in the country would access it.
But of course this is long before there were web browsers. This is long before the notion of—a set of software on everyone’s desks so that some things can happen on their machine and other things can happen on the server. So we used the only open source tool that was relatively widely available at the time, XWindows. And we built it around an XWindows environment. So certain manipulation when you zoomed in on the image and moved the image around on your screen, that was happening on your own machine rather than on the server because, for one thing, the speed of networks at that time would not have permitted you to try to do that from the server side. It—it meant thinking about, you know, what happens on the client side, what happens on the server side, and then to make things even worse, XWindows reverses what is client and what is server. So, that’s a whole other—other thing.
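[Editor's illustration: a minimal sketch of the division of labor described above, where the image crosses the network once and zooming and panning happen on the user's machine. This is not the Image Query code or the X Window System API; the class `ImageViewer`, the function `fake_server`, and the tiny 3 x 3 "image" are all hypothetical.]

```python
class ImageViewer:
    """Client that fetches an image once, then zooms and pans locally."""

    def __init__(self, fetch):
        self._fetch = fetch          # function standing in for the network call
        self._cache = {}             # images already downloaded
        self.server_requests = 0     # how many times we went to the server

    def load(self, image_id):
        if image_id not in self._cache:
            self._cache[image_id] = self._fetch(image_id)
            self.server_requests += 1
        return self._cache[image_id]

    def view(self, image_id, top, left, height, width, zoom=1):
        """Crop a window out of the cached image and enlarge it by pixel repetition."""
        pixels = self.load(image_id)
        window = [row[left:left + width] for row in pixels[top:top + height]]
        return [
            [value for value in row for _ in range(zoom)]
            for row in window
            for _ in range(zoom)
        ]


def fake_server(image_id):
    # Stand-in for the remote repository: returns a 3 x 3 grid of pixel values.
    return [[0, 1, 2], [3, 4, 5], [6, 7, 8]]


viewer = ImageViewer(fake_server)
print(viewer.view("painting", top=0, left=1, height=2, width=2, zoom=2))
print(viewer.view("painting", top=1, left=0, height=2, width=2))  # pan, no new download
print(viewer.server_requests)  # 1
```

[However many times the user zooms or pans, the request counter stays at one, which is the point of putting that manipulation on the user's side of a slow network.]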
So—so we built this system and it—it was—it got a lot of use and it certainly answered some major, major questions for—for us. Questions that other people later tried to ask again and got the same answers that we got. Questions like—what types of attributes might you want—might people want to query on in a system like that? What type of functions does a user want? I mean, we very quickly learned that it’s not enough to just have a user get an image. They need to be able to do something with it. They want to be able to zoom in on it, they want to be able to save it on their own workstation, they want to be able to link it to a map, they may even want to be able to add their own metadata to it.
We discovered all of this in the 1980s. And—so in a way, it was really—it was just what a prototype project should be. It’s something that is trying to figure out what are the things that you really want, what things can you do without, wh—in what areas might you need to go in a different direction than you originally envisioned, things like that. We—we also learned a lot about things like display and color and pixels on computer screens, you know. None of us—even though all of us were pretty savvy, there were things we did not realize that, when you displayed the same image on two different screens, that that image would look different. And that in some cases the aspect ratio would be different because if you had pixels that were square versus pixels that were rectangular, you would have an effect that would either squish or extend one dimension of your image.
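[Editor's illustration: the squishing or stretching described above follows from simple arithmetic. The sketch below assumes a pixel aspect ratio defined as pixel width divided by pixel height; the function names and the 640 x 480 example are assumptions for illustration, not details of the monitors used in the project.]

```python
def displayed_aspect(width_px, height_px, pixel_aspect=1.0):
    """Shape of the image as seen on screen (width / height).

    `pixel_aspect` is the width of one pixel divided by its height;
    1.0 means square pixels.
    """
    return (width_px * pixel_aspect) / height_px


def corrected_width(width_px, source_pixel_aspect, target_pixel_aspect):
    """Pixel columns needed on the target screen to keep the original shape."""
    return round(width_px * source_pixel_aspect / target_pixel_aspect)


# A 640 x 480 image looks 4:3 on square pixels...
print(displayed_aspect(640, 480))
# ...but looks wider on a screen whose pixels are 1.25 times as wide as tall.
print(displayed_aspect(640, 480, pixel_aspect=1.25))
# Resampling to 512 columns restores the original shape on that screen.
print(corrected_width(640, 1.0, 1.25))  # 512
```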
So we discovered a lot of things like that, which led me into a world of—of imaging science people that were working on things like standards of image display with color representation standards and—and being able to actually—set the attributes of a monitor and be able to calibrate monitors and things like that. We also learned that the phosphors on different monitor manufacturers reflected light in different ways. And that even if you had two Sun3 workstations, if you looked on the back and found one of them was a Sony Trinitron make and one of them was another company’s make, even though they’re both sold as Sun3 monitors, that those are significantly different. So there’s lots and lots and lots of stuff that we learned in this project.
-- Challenges --
[What were a few of the major challenges that you found?]
Well, we were way ahead of the technology, so challenges on—you know, the hardest thing at that point—trying to do this at that point in time really was the—the fact that people did not have the software on their own machines to run this. That was the problem that was finally solved in ’93 when—people started having web browsers. So even though we were using something—you know, we picked XWindows because there were XWindow clients for Mac, Windows, and—UNIX-based machines. Even so, most people weren’t running XWindows. You could download that, and you could install this, but most people didn’t have it on their machine. So that was a huge challenge, and as I said, that challenge was solved when people started—when web browsers became ubiquitous and used to be on every machine. So that was certainly a challenge.
The speed of the network was a challenge, and the way that we finessed that at the time was moving more functions down to the user’s desktop so that there was only minimum—the text was going back and forth but the download of the image and different views of that image happened, you know, just once and then the manipulation happened on the user’s workstation. So those types of challenges.
There really—we really had very few challenges in terms of collaboration. Mainly because we had a sugar daddy, we had the vice chancellor for computing, who—later was fired for going way over budget on everything. We were just very—you know, a hundredth of a percent of his budget, but he went, you know, many, many million dollars over in his budget. So—so there was enough funding to do it, there were—there was enough staff and personnel, and this was enough—a project that was important enough to all the stakeholders, to all the players, that people were given the time to actually work on the project.
That started decaying about a year later around ’87 when the—vice chancellor was let go and a number of other things happened then, and things started—started being more of a challenge. But there—and then in terms of the partners in the content, the repositories in the Geography department, the Architecture School and the Art Museum all worked together very well, collaborated very well, and the partners in the computer center. And I—you know, it was an exciting new project and there was all this stuff going on and—none of—I can’t remember a single serious disagreement—you know, we had lots of little, oh, I think we should do it this way, I think we should do it this way, and we’d come to some kind of compromise of how to do it. There was—I can’t remember any single case where there was any kind of significant battle or serious disagreement on things.
But I also—I attribute that to being—it being such a—so far ahead of its time, such a sexy project and well-funded. I mean, the only thing I had to hustle for was equipment, and you know, one look at, you know, I show this—this art image query thing on a Sun machine to the representative from Digital Equipment and she’s, like, quickly rushing a bunch of Digital Equipment machines to the campus. And then I show it to someone from IBM and they’re giving me their latest—RT—workstation—mini-type workstation for use on that. So it was not any problem even leveraging the equipment from vendors. It was—it was just really—yeah, so—the hard part was just, you know, thinking up the project, figuring out some of the technical stuff, we had stellar programmers on it—Steve Jacobson was the lead programmer on this, and he was, you know, just fantastic. He—he committed suicide a few years later, and I think part of it was that he had to do less interesting projects. But it was—it was a really—really interesting thing to do and an interesting time.
[Did you want to touch on the digital delivery—if you want to?]
Well—since—since—today we are in the middle of the Institute of Museum and Library Services WebWise, an interesting aspect of digital delivery is one that I think in some ways kind of highlights some difficulties between libraries and museums in their own—in each of their kind of traditions and heritage. A museum—okay. In order to—to really be able to enhance a user’s experience—one of the key things that we know will enhance a user’s experience is to let them see something and then stop, come back later and continue, to have the system somehow recognize things—that, oh, you’ve already looked at these things. Oh, you looked at these things and we’ll suggest that you look at these other things. Things like that. Now—to someone from the museum field, it’s cut and dry. Let’s do this. Let’s—we need to follow the user. And I actually heard someone from a museum speaking today at the conference—just flat out saying, this is really critical, this is one of the most important things that we need to do, is to be able to have that user relationship where we follow them all the time and we know what they’re doing.
Now, it’s very easy for a museum person to say that. It’s not so easy for someone steeped in library traditions to say that. Libraries have a tradition of protecting the privacy of the user. Librarians have gone to jail for refusing to give up names of users who have read certain books and who have done certain—checked out certain things. And for many years, I can remember having conversations with Cliff Lynch when he was still at the—Division of Library Automation in California, about how user experience and tracking user experience was something that was a really important thing to do and that the Melvyl system would have been highly enhanced by doing that. But at the same time it clashes tremendously with—with library traditions of not following users and not keeping that record and keeping that data. And, you know, the American Library Association has spent millions and millions of dollars in lawsuits trying to overturn federal laws that—that will—require them to track where people go on workstations hooked up to the internet in their libraries.
So there are or—there are possible ways and there may be future ways that are more difficult but that would allow one to maintain a degree of privacy while still being able to offer these kinds of services. But this is something that people from the library world really concern themselves deeply about. And this is not a tradition of the museum world. There is not a tradition of privacy. The privacy traditions are very strong in the library world; they’re fairly strong—they’re actually stronger in some ways in the archive world at least as far as contributors go. Not—not so much as far as users, but the privacy of contributors is really recognized. And the museum world, the only area of privacy that’s recognized is the privacy of donors. It’s not the privacy of the users. And—and the museum would rather that every donor own up to having donated what they did—that’s good PR for the museum.
But there aren’t these traditions of—so in our delivery systems that—I think that’s a problem that we’re going to continue to face as we build tools within our delivery systems to “enhance” the delivery system by recognizing that you’ve already visited here and—you know, and—libraries are still struggling with this. And museums have—not had—the only times museums have had to struggle with this is when they’ve been in partnerships with libraries and the libraries have insisted on it. So I think in the context of partnerships for delivery, I think that’s certainly an issue that we’re not near resolving, and—it really stems from different traditions.
[Is there anything else that you’d like to add or share with us that you haven’t been able to?]
And this kind of blends a little bit with my last statement. In—in one of their recent documents on the future of libraries, museums, and archives, IMLS talked about “third spaces.” And the need to foster these third spaces. These are spaces that don’t look like libraries, don’t look like museums, don’t look like—like anything that we really know. But they’re places for communities to gather together and interact with one another and to learn. And they can be virtual spaces or they can be meat spaces or human spaces. And they—they talk about how these are public spaces that need to foster civic engagement—that’s their wording—and foster community bonds, again, their wording.
Now, I think this is very important. This is certainly a role of libraries as place and as public spaces and this has always been very important. But again, I think there’s a conflict here between what museums seek to do and what libraries seek to do. Museums want to foster these third places—these third spaces. But are not particularly encumbered by certain things. Whereas libraries, when they create these third spaces, worry about free speech, being tolerated within those spaces, and worry about privacy concerns within those spaces. And libraries I think have been very good at being able to balance when free speech comes into conflict with respect for privacy and we’ve—we’ve I think traditionally been fairly good about that. But—again, museums—and I’m not at all trying to knock museums, I’ve worked in so many museums and worked with so many museums, I love museums and I teach about museums and I teach about the heritage of museums.
But—but in general museums do not give much worry and concern when they’re setting up these spaces for civic engagement. They don’t think a whole lot about free speech issues and, you know, what happens if someone doesn’t like what someone else said in this space, you know, what do you do? And libraries, before they even set up things like this, start devel—they think of, what are we going to do when this happens? They go through, you know, all this agonizing, you know, self—reflection about what will happen? And they run through all these scenarios.
Part of it is that we all, at least in library school, had some kind of exposure to public libraries and all those strange things that happen in public libraries around—where one person’s rights come into conflict with someone else’s rights, you know, someone talking in the library, right? And bothering someone else. So—I mean, we’re steeped in these traditions of trying to struggle with this. You know, that’s a—that’s a very different kind of thing.
I—I hope that libraries will not use that as an excuse to not set up these third spaces. I could see some libraries doing that. I think that that’s—that that would be a—that would be a mistake in terms of really libraries—maintaining their relevancy and opening themselves up to different types of audiences. Audiences that may find those spaces very attractive and may have no interest at all in our physical spaces or in—in any of our physical objects. And, you know, libraries are not primarily about that physicality. They’re about a place. They’re about a civic engagement and fostering civic engagement. But those things can happen in various spaces and times and, you know, those do not have to happen within the walls of a library. You know, many, many years ago, I imagine there was controversy about bookmobiles. You’re sending the library out to a different space! You know? I think most of us now love bookmobiles. They may be, you know, polluting or, you know, we may have our critiques of them, but the idea that you bring the library to where the users are—is one that—I think we all cherish. And you know, our users are online, we should be there.
-- Hindsight --
In around 199—4, probably—we—we hadn’t seen much progress in the area of image retrieval, particularly of our type of images, since this image query project in the late 80s. So we’d gone, like, maybe four or five years without a whole lot of real concrete progress on the technical end of things and on the user—deployment of things like this. So we’d had the experiments in the late 80s and then we’d had a lot of people talk—it got people excited talking about it. We had a lot of actual progress in the area of metadata standards and things like that. But we had had no progress in deployment.
And so—the Getty Trust through the what was then called the Art History Information Pro—Program had done a number of experiments in just looking at image quality and trying to see what art historians required in terms of image quality. And could they give up their slides and their transparencies in favor of something digital? They had gone that—that far with the kind of research end, and they decided that—kind of the next step—they would take a few next steps.
One next step was to contract with me to write a book on how to do this stuff, which they published in ’95, called Introduction to Imaging, and which then became part of a whole series. They have Introduction to Metadata, Intro—Introduction to all these things and it became wildly popular. But at the same time—this is around 1994, I was beginning to work on this—on this kind of—manual of how to do things, they thought, well, let’s—let’s actually try to have some kind of implementation to go along with that. So they brought together a group of us, including myself, Jennifer Trant, who had just been hired at the Getty to work on imaging issues, Cliff Lynch, who then was—Head of the Division of Library Automation for what later became the California Digital Library, and Paul Peters, who was then—what Cliff Lynch is now, the head of the—Coalition for Networked Information.
So we got together in a hotel room in Miami—or no, in Orlando, Florida, the Dolphin Hotel and try—we had previously outlined what this thing we were trying to do was. And we got together there to actually decide who the participants would be. So this thing was to be a set of seven—museums and seven universities and that the seven museums, which turned out to be six museums and Library of Congress—that these people contributed—these organizations contributed a total of about ten thousand images and text records—metadata—for those images—from their collections. And that this aggregation from these seven organizations would be deployed on seven university campuses. And that we would then suddenly have these image databases on seven campuses and people using them. And we would have the rich metadata. And we would also have the comparison between each deployment because each university used a different set of hardware or software to deliver it on their campus. Had different ways that they used it on their campus, had different—permissions around who could use it, had different advertising schemes, so that we could compare all that.
So this became known as the Museum Educational Site Licensing Project, even though, so little—site licensing project sounds like it’s an intellectual property project. That had very little to do with the project. The intellectual property end was—almost ignored—it wasn’t ignored, but it was just one percent of the work that—that went on here.
So we had fourteen organizations collaborating on this—on deploying this. And—there—so—we had meetings—the project lasted about two and a half years. Initially it was supposed to be two years, it was extended for another year but there wasn’t full funding for the year, so it was about a two and a half year project. Things were deployed for about three years on those seven campuses, and we had meetings of two people from each organization. So that’s twenty-eight people, meeting—plus assorted other people who were invited at least two or three times a year during that—and so I was part of the management team for that, which was David Bearman, myself, and Jennifer Trant, were the main management team, and then the advisors were Cliff Lynch and Paul Peters until he died.
So this—this project went on ’85, ’86—I’m sorry, ’75—’95, ’96, ’97. And in ’97 I got a grant from the Mellon Foundation to actually do a—a careful study of what had happened in—within this project, and that was published in ’98. So it was just—it was an enormous project but it yielded really interesting results. Ranging from—little things like, initially we specked out standards that everyone would contribute all the—museums and then the Library of Congress would contribute—their images in lossless JPEG. Well, as it turned out, we looked at every codec on the market at the time, and we took a— an image and we compressed it using lossless JPEG, and we decompressed it, and compared them, and every one lost something. So we found out that lots of these products that claimed to be lossless JPEG were, in fact, not lossless.
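The round-trip test described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual procedure: `zlib` stands in for a codec that really is lossless (the standard library has no lossless JPEG codec), `drop_low_bits` is a hypothetical encoder invented here to show what a failed check looks like, and the `pixels` buffer is made-up sample data rather than a real image.

```python
import zlib


def round_trip_is_lossless(encode, decode, original: bytes) -> bool:
    """Return True only if decode(encode(original)) reproduces every byte."""
    return decode(encode(original)) == original


def drop_low_bits(pixels: bytes) -> bytes:
    """Hypothetical 'not quite lossless' encoder: clears the lowest bit of each sample."""
    return bytes(p & 0xFE for p in pixels)


# Stand-in for raw 8-bit pixel data (assumption: not a real image file).
pixels = bytes(range(256)) * 4

# zlib is a genuinely lossless compressor, so the round trip matches exactly.
zlib_ok = round_trip_is_lossless(zlib.compress, zlib.decompress, pixels)

# The hypothetical encoder silently alters odd-valued samples, so the check fails.
lossy_ok = round_trip_is_lossless(drop_low_bits, lambda b: b, pixels)

print("zlib round trip lossless:", zlib_ok)
print("drop_low_bits round trip lossless:", lossy_ok)
```

The point of the design is that the check compares the decoded data with the original rather than trusting the label on the product.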
So we learned things like—like, consumer beware, your—you know—your software may not be doing what it’s telling you. We learned things—ranging from that to issues of our use of metadata and how that—you know, certainly a goal that—that we had identified from the beginning was, if you have metadata coming from all these different types of institutions, how can you do a search across it? So, how were we going to try to map all this metadata into a single, workable solution? And we had seven different approaches to that from each of the universities that deployed it. And so we could look through that and—and see that.
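One common way to approach the cross-search problem described above is a crosswalk: each institution's field names are mapped onto a shared vocabulary before searching. The sketch below is only an illustration under assumptions. The institution names, field names, and records are all hypothetical, and nothing here represents what any of the seven universities actually built.

```python
# Hypothetical per-institution field names mapped onto a shared vocabulary.
CROSSWALKS = {
    "museum_a": {"artist": "creator", "object_name": "title", "date_made": "date"},
    "library_b": {"author": "creator", "title": "title", "pub_date": "date"},
}


def to_common(source: str, record: dict) -> dict:
    """Rename a record's fields to the shared vocabulary, dropping unmapped ones."""
    mapping = CROSSWALKS[source]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    common["source"] = source  # keep track of which institution contributed it
    return common


def search(records: list, field: str, term: str) -> list:
    """Case-insensitive substring search on one shared field."""
    term = term.lower()
    return [r for r in records if term in str(r.get(field, "")).lower()]


# Made-up sample records, one per hypothetical institution.
records = [
    to_common("museum_a", {"artist": "Example, Ann", "object_name": "Harbor View", "date_made": "1901"}),
    to_common("library_b", {"author": "Ann Example", "title": "Sketchbook", "pub_date": "1899"}),
]

hits = search(records, "creator", "example")
print([h["source"] for h in hits])
```

A single `creator` query now reaches both records even though one institution called the field `artist` and the other `author`. Differences in how the values themselves are written (name order, date formats) are a separate problem that this sketch does not address.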
And—we also learned a lot about how do you get people to use this material? You know, it’s—the whole idea that’s always—it’s been floating around for the last thirty years I’ve been involved in these things—if you build it will they come? You know? The notion was always, oh, we’ll just put this stuff out there and we’ll have users. Well, in fact, you—at that point in time, you wouldn’t have users. You had to do things to encourage use.
And we did—we did a lot of work with faculty members. We got them to teach, we found ways to try to entice them to use these things to teach, and we got them—we got faculty members on different campuses to share their teaching templates and their teaching approaches. And we used the project itself and money from the project to bring them together to meet—to figure out how they could collaborate.
And so it was—it was extremely successful at doing things like that. We got a lot of classroom use, we got a lot of individual use; one of the things we had hoped to do was to push at how we would write a site license because we were afraid that there would—that we would—we thought that we would generate all this bad behavior that we would need to—write—restrict in a future license. We thought people would be doing t-shirts from these images and coffee cups. Turned out we had almost no bad behavior, so it wasn’t very useful—in—in that sense.
But with any of these large projects that you—work on, you—you have your vision of where your problems are going to be and what kinds of things are you going to find out. And usually, you know, fifty, seventy, eighty percent of that ends up being accurate. But there’s another twenty, thirty, thirty-five percent that are things that you never dreamed of were issues. New issues that—that come up that you didn’t dream of. And issues that you thought were going to be contentious end up being—just really easy.
So—but that was a—a huge project and certainly a huge part out of a lot of people’s lives. There were not only the twenty-eight people who came to regular meetings, you know. There were project teams at each university and at each—museum that were much larger than this. This was over a hundred people—working, you know, solidly for several years on that.
And this—but this did build—you know, kind of lifelong professional relationships. So, for instance out there you are showing Thorny Staples and your interview with Thorny. Thorny was one of the representatives from University of Virginia on this project. I’m having dinner with Thorny on Wednesday. We’re—you know, we’re professional colleagues, we work on other things, and that’s true with lots of other people from that project that have—kept—kept together and—you know, because we were in an intense, intense environment for several years.
[Do you think there’s one or two things from that project you just talked about that made you change something the next time you did a project—the next step you took?]
Yeah. There are—there were a number of things from that project that I learned from and that kind of changed the way that I would look at things in the future. One of the issues in the project that emerged maybe halfway through the project was a feeling amongst the—on the ground project participants that they were being—that they were not having enough input into what the project was. And so there was a—a groundswell of opposition to the project management. This resulted in actually a change in the project manager—in a shift to—in a very direct way, the former project manager was replaced by a participant who became the new project manager.
And so—that—I think that was fairly successful, you know but for me, what I learned from that was if you have people who are devoting so much of their work time to a project that is centered away from their organization, those people have to be listened to a lot, and have to almost be—I wouldn’t use the word pampered, but you know, they’re doing—they have a full-time job to do on top of this thing that they’re being asked to devote twenty hours a week to. And so if—if they are not feeling good about the project—then the project is not going to succeed.
So—so that kind of participant—listening to the participants—and trying to make sure there’s a participant voice I think is a—an incredibly important lesson that—that came through this. And particularly when—it’s very hard when you’ve got a project with a hundred participants to really listen to everyone. But you have to have an ear on the ground as to—and a sensitivity when things are brewing and there’s murmurs and disgruntlement and you need to be able to kind of—draw a line between things that are just, oh, it’s a lot of work and oh, we could do it in a way that would be much better and result in less work. I mean it’s not just—there’s a difference between grumbling and having better solutions or—or being able to explore alternative paths that may make it better for everyone. So—so that was certainly something that I learned from that project.
Another thing that I learned is that trying to do such a massive project is near impossible unless you have really a steady source of income to be able to pay for the things that weren’t envisioned when you first laid out the project, so you know, there’s a budget for the project at the beginning that has matched exact activities. When you have that many people involved, certain things go more slowly than anticipated, certain things don’t work as they anticipated, and if you don’t have the funding to buy your way into a solution for things like that, you are—you’re going to get stuck in these little corners that you can’t get out of—paint yourself into corners. And—and so that’s—that’s something else that—that I learned.
But it’s kind of interesting. As much as that project was a total drain on my life and I think everyone involved, and as much as we were all complaining all the way through the project, I—I doubt that any of us would have—would today say we wish we hadn’t been involved in that project. I think all of us really appreciate what we learned on that project.
And I’m not talking so much about what we learned about image distribution. I’m talking about what we learned about how to try to do a new project, how to work with other people, how to collaborate, how—how to actually get things done. And you know, most of the people went on to kind of more—higher level management positions managing teams because of their experience. Or managing whole proj—becoming full-time project managers. Things like that. Because—you know—we all learned a lot about how do you try to get something done.
-- Advice --
[What would you tell people who are starting out in the area of digitizations in the libraries—what to be aware of? What would you tell people starting out?]
What would I tell people starting out—well, I do tell people starting out all the time, I give workshops and I teach classes and—you know, so I am telling people starting out right at the beginning. And I have a whole class on this and—but I’m—right now I guess I have to give something akin to a soundbite on this.
I think—maybe I’ll make a short list of things that I would—I would emphasize to people starting out. One is that—for most important things, you should not do them alone. Things are best done in groups.
Two is that you need both a short-range and long-range—short-range plan and a long-range vision. And your short-range plan has to fit into your long-range vision, so you may be doing a project that is just something involving a small number—a relatively small number of objects, 500 objects. You’re building some kind of digital collection, digital library, digital museum out of these 500 objects.
You need to look at what you’re doing and plot out that—you know, how you’re going to do it for that 500 objects. But in the back of your head has to be your vision of what you’re going to do when that is 10,000 or 100,000 objects. This may be one collection of many that you have. So you have to envision that this—how this will fit with the other collections, and you don’t have to have that vision to the level of detail that you have for the project. You just have to know where you’re going with it and be able, in your detailed project, to put in the hooks that you will need for the later things.
And—you know, in my years in doing this, I have seen hundreds and hundreds of projects where the project was cool, neat project, project gets done, and it’s a—it’s a small project, and later they want to do more or add to it or—make it work for their whole collection, and they end up having to start over. When, in almost all of these, if they had just thought about a couple of little things, or done—just a small incremental amount more of work on the initial project, it could have grown into the larger project. So these are thing—you know, simple things—I’ll throw out some examples now.
Say someone is doing a project involves—gathering—photographs of, say, botanical specimens or something, and making an image database of those and they send people out with cameras. Well, if they had thought to have those cameras with GPS chips and have the—have the georeference data as metadata in the file when they get out, think of all the things that could be done down the road if you could—if every one of those images that was taken could be mapped. You know, you wouldn’t have to sit there and say, oh, well I got this in the foothills, you know, up by—Boulder, you know. Or you say something like that that’s a very inexact thing and, you know, takes someone time to write it down, you know. Instead you could have implemented that with the exact location marking in that—file from the very beginning.
Little things—that’s one little example, but just—to be able to think through—to have some vision of the future when you’re doing your little project that would allow your little project to be part of a world of information and—you know, a networked world of all these really interesting things. And, you know, so that’s—that’s another thing that I always tell people, to have the narrow vision but also to have the broader vision. And then to put in all the hooks that you can think of.
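To make the GPS example above concrete, here is a minimal Python sketch of what becomes possible once every image record carries exact coordinates. Everything in it is an assumption for illustration: the `Photo` record, the filenames, and the coordinates are made up, and a real project would read the position from the metadata embedded in the image file rather than typing it in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    filename: str
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive


def within_box(photos, south, west, north, east):
    """Return photos inside a latitude/longitude box.

    Simplification: assumes the box does not cross the 180-degree meridian.
    """
    return [p for p in photos if south <= p.lat <= north and west <= p.lon <= east]


# Hypothetical records with made-up coordinates.
photos = [
    Photo("specimen_001.jpg", 40.01, -105.30),
    Photo("specimen_002.jpg", 37.87, -122.27),
]

# Find everything photographed inside a small search area.
nearby = within_box(photos, south=39.9, west=-105.4, north=40.1, east=-105.2)
print([p.filename for p in nearby])
```

With only a free-text note such as "in the foothills", a query like this would not be possible without someone re-reading and interpreting every record by hand.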
[What is critical now in digitization? The hot, critical issues now?]
There’s—there are not too many hot, critical issues in digitization. There are plenty of hot, critical issues in digital delivery and in—in—kind of—architecture and other kinds of things, so if I’m able to go broader than the actual act of digitizing, there—there are quite a few interesting issues.
For me, where I am right now teaching in a program that focuses on archiving and preservation, of course one of the first things that’s critical for me is how do we save things over time. And that’s, you know, a very serious issue, and it’s a more serious issue as we get into formats that are more than words—you know—it’s hard enough to save a word processing file. Most people who had word processors fifteen years ago, even if they were using Microsoft Word, cannot open those files today. But it’s considerably tougher when you’re dealing with raster images or vector images, or sound, or moving image, where the file formats are—the word processors are all based on ASCII, so ultimately you’re still—when you’re trying to recover that file that was made in an old format, you can go back down to the ASCII. You know—our—our image and media formats are—are just so—different from one another at their very base that to be able to pull these out later is really a problem.
So—I still think we’re at the level where digital preservation is still a challenge. A lot of people have made a lot of progress on this in the last eight years or so. Library of Congress has spent—you know, tens or dozens of millions of dollars on funding really good prototype projects in different domains through the NDIIPP project—program. And we’ve made a lot of progress, we’re way, way ahead of where we were like—what was it, fifteen years ago. I was on a committee that—issued a report saying, you know, we’re in peril, we’re going to lose our digital heritage. So we’ve made a lot of progress but there’s a huge amount that we have ahead of us that’s a real problem.
So, digital preservation—is certainly a challenge. Intellectual property is a huge challenge. Copyright is a huge challenge. And probably the—the biggest challenge for us in copyright is—for our field, is orphan works and—the fact that so much of the content in our repositories we do not know who owns the rights to it, and we’re not allowed to make copies of it unless we know—or to put it up on the web or to do various things, unless we first ask the rightsholder, and we don’t know who the rightsholders are. So the—the orphan works problem is a huge problem.
There are all kinds of other contentious problems around—intellectual property. As we’re recording this there’s—in early 2010—there’s been the first case of a—a media association—association of media producers and distributors—suing a university for putting content—behind a firewall on their campus. This was the AIME suing UCLA.
Yesterday, in fact, UCLA—they sued a couple months ago and UCLA took everything down, denied access to that, and yesterday UCLA issued a press release saying that they were going to fight back. And that this was an intrusion on Fair Use, that they had purchased this material for use in a controlled way, that everything was password controlled, and that—AIME had no right to prevent them from delivering it this way, and so this—this is something that will play out in the courts and in public opinion.
Do we have the right to do things—you know, UCLA paid for these things. They bought licenses for them, they were to be used in classrooms, and these just happened to be online classrooms, which is where the future—you know, one future of classrooms is like that. They were password protected, they were—almost always they followed the TEACH Act, which was—an attempt to bring the laws—copyright laws for distance learning up to the present.
The one area that they may be a little shaky was that the TEACH Act was a little bit—was not quite forthcoming on moving image material as it was on—on all other types of materials. So this will play out. But you know, the copyright issue is a huge challenge, and that’s one we can’t solve by—having more—throwing more technology at it or throwing more money at it. This is something that is solved in Congress and in public opinion. And, well, of course, throwing more money at Congress may, in fact, work, but we don’t have that kind of money to do—to do.
Okay. Another challenge that—that we’re facing in—in the coming years is the kind of vast quantity of—born-digital information that is coming at us. And you know, think about the old models, of—every book that came in the library had to be cataloged. It’s going to take a cataloger an hour to catalog a book. Think of that with all the digital information that’s coming at us. That’s just absolutely untenable and we’ve tried to find ways to deal with that, a variety of ways. But—as someone who’s, you know, supposed to be a scholar thinking through all of this, and conceptualizing it—the critical things I think—the ways to look at this are—the—you have the library or the museum here, you have the material coming in and getting ingested. It is just at the point of this glass, right as it’s ingested, that we do the cataloging. That’s the traditional model, is that the cataloging happens here. What we need to be doing is to be pushing the cataloging upstream and downstream. Upstream to the content creators and getting more of the cataloging that we need, more of the information that we need from the creators and downstream to our users and having them contribute the metadata that we need to—to find things.
And so I’ve been involved in a project for the last five years funded by Library of Congress through the NDIIPP program—where we’re looking at public television material and preserving born-digital public television material. One of the things that we did early on was send our students to a variety of different programs—documentary programs, public affairs programs, more narrative app—programs—to study the workflows in each of those programs. And—and what we discovered from that was that there is a huge amount of metadata that we need for retrieval and for preservation that is known at early stages of the production and is thrown away.
And so it is not a matter of trying to get the producers to fill in a whole bunch of information that they didn’t have and to spend the time pretending like they’re catalogers. It’s a matter of them writing down in some kind of standard format what they know at the point at which they know it. And to give them the tools for doing that.
So we first wrote a paper about this that we delivered at the DigCCurr conference in 2008 called Pushing the Data—“Pushing Preservation Metadata Gathering Upstream” or some such title, and then we had the very lucky opportunity that one of our partners, the public television station in New York, WNET—while we were working on this partnership with them around NDIIPP decided to start a nightly news program called World Focus. And so we were involved in the ground floor in conceptualizing World Focus where all this metadata gathering happened way upstream. Including things like what I have—said in another context earlier today, putting GPS chips in the camera so that every clip that is taken in—everywhere in the world is marked with the—with the metadata that stays with that clip even if it’s an outtake. And so that when these things come into the repository, we have a lot of the information that we need in order to provide access and in order to do preservation. So that’s one end of it.
The other end of it is the kinds of things that, you know, we’re here at WebWise today and the kinds of things that the WebWise community is talking about a lot, and that is user-contributed metadata, user contributed content. And you know, I think we have to rely on that, there are ways to distinguish between what was contributed by the library, museum, and archive or what was contributed by the user. There’s all kinds of ways to implement this in smart ways. But ultimately our specialists that are on the staffs of our repositories do not have the time to contribute all the metadata that will be needed in order to retrieve this material, in order to really contextualize the material, in order to preserve the material. So relying on—on our audience, relying on our users, is another part answer to this problem of how will the metadata get added.
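One simple way to keep institution-contributed and user-contributed metadata distinguishable, as described above, is to record the source alongside every value. The Python sketch below assumes a hypothetical `Item` record with two source labels, `"institution"` and `"user"`; the identifier and tag values are invented for illustration and do not describe any particular repository system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    value: str
    source: str  # hypothetical labels: "institution" or "user"


@dataclass
class Item:
    identifier: str
    tags: List[Tag] = field(default_factory=list)

    def add(self, value: str, source: str) -> None:
        """Attach a descriptive term and remember who supplied it."""
        self.tags.append(Tag(value, source))

    def values(self, source: Optional[str] = None) -> List[str]:
        """Return all terms, or only those from one kind of contributor."""
        return [t.value for t in self.tags if source is None or t.source == source]


# Made-up example item and terms.
item = Item("clip-0001")
item.add("street scene", "institution")
item.add("1930s hats", "user")

print(item.values())               # everything, useful for search
print(item.values("institution"))  # only staff-supplied description
```

Searching can then draw on every term while a display or an audit can still show only what staff supplied.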
[I’m going to take a leap here, and I’ve been looking at some of your work. Would you talk about—a little bit about the do-it-yourself media and maybe even digital delivery systems? Is that part of—does this link in to what we’ve been talking about before?]
What, do-it-yourself media and what?
[And digital delivery systems.]
Digital delivery systems? Those two as an intersection, or as separate?
[They don’t have to be—]
You know, clearly, we live in a do-it-yourself age. You know, anyone who doesn’t recognize that has blinders on, and certainly doesn’t interact with any young people at all, and you know, it’s pretty hard not to notice all the social media around us and—let me put this in a little bit bigger context. We in the—the field of cultural heritage have historically looked at the kind of cream of the crop, so to speak, the high quality material. And it really—other than the field of anthropology, most of the material—culture, art, different types of things that make their way into museums or into libraries are the quality material. That material is not very representative of human activity. It is very representative of the power or the—the kind of high art or the—the kind of highly respected sectors. It is not representative of the average person.
And one of the things that many of us have—have really been advocating probably since—since the early 1990’s is looking much more at ephemeral material. At the kind of material that’s produced by ordinary, everyday people as part of ordinary, everyday activities. And—so—this comes out in a whole lot of different ways. We at New York University sponsor the Orphans Film Symposium. Every two years we highlight orphan materials on film and video. This ranges from newsreels to home movies to anthropological material to lots of outtakes from—various types of things. Even to kind of the things that were—street cameras and bank cameras and things like that.
Now, when you—when you think about it, that’s pretty marginal, that’s ephemeral material. But if you look at what we have discovered—in our archives, special—library special collections as we’ve put more and more of our photographic collections up online, we have—it’s become quite clear that people’s, like, home photographs of them in the 1930’s walking down a street end up being incredibly valuable, rich material about what clothing people wore at that period of time, how couples walked together or separate, does the woman walk behind the man, does—you know. There’s endless numbers of things that you can find in a single photograph or in ten seconds of film of this—this time period that are incredibly, incredibly important to understand history and social—social groupings and—and things like that.
A friend of mine has been a pioneer in this, Rick Prelinger, who—collected—in garbage bins and dumpsters—collected films that people were throwing out. These types of films, educational films—how to be a good homemaker, how to go out on a date, things that were shown to students in—in K through—in middle school and high school in the 1950’s and 1960’s. These are the—the most important material we have to understand gender dynamics in that time period and to understand how gender dynamics—were internalized by people who are now in their sixties or so. And—so this types—this type of everyday material is becoming increasingly recognized as being of fundamental importance. This ephemeral, formerly regarded as ephemeral material. And so there’s a huge number of us that now try to put some kind of focus on this material.
Certainly the do-it-yourself kind of material that we see on YouTube is—is the ephemeral material of today. This is the material that says, what are—what topically are people thinking about? How are they dealing with things? What’s on their minds? You know, this is—this is the—you know, this is even better. This is like also having had a microphone in on the water cooler for discussions that people have at work. Or, you know, in people’s homes to hear discussions in their—while they’re—boring sections of a football—Super Bowl or something. This is where this very rich material is. And so the capture of this material is really interesting.
Let me just give you an aside, this you’ll probably cut but—New York Times a couple days ago. There was a guest editorial by—a musical group—let’s see if any of you remember this. A couple years ago there was this top hit on YouTube of these guys singing while they were walking on exercise machines or on these rolling sidewalk-type machines, do you remember that? They were singing a song and they’re walking on this. This was like the number one thing on YouTube and I think it was probably the first big hit on YouTube. But the guy—this is Andy Warhol’s famous for fifteen minutes, they were really famous then, no one’s heard of the band anymore, nobody thinks of them—he had a guest editorial in the New York Times a couple days ago. And in it he was talking about how—his record company is not letting them distribute that anymore even though it ended up being, you know, a huge money maker for the record company. So it—it ended up being about copyright.
But this is—you know, there are tens of thousands of these stars for short periods of time. The woman on—the British equivalent of American Idol, older woman who’s singing and is now, like, famous all across the world, and everyone will forget about her in a few years. But the record of that and of her and of the hundreds and hundreds of parodies of her that are posted on YouTube is—is a tremendous insight into what is in the mindset of people today. And so for me, the—the kind of do-it-yourself material is—is really interesting from the perspective of being a window in to today.
It’s also interesting from another perspective. And that’s—that if you believe the theorists—the theories that emerged thirty or so years ago around post-modernism—most things that are created are pastiches of previously created things. And you look at 20th century arts. You look at music. Jazz, riffing off of other people. Rap music. The whole idea of sampling. You know, you take something that someone else made and you re-edit it and put other things around it.
Dada—cubism—Dada, surrealism, pop art, post-modern art, all of the 20th century art forms were built around taking something and riffing off of it to some degree. And so if we think of what our more modern forms of artistic expression, you know the trajectory we’ve been on for the last hundred years is that you don’t just create something whole hog, you comment upon something that exists and you incorporate that thing into your commentary. And so that’s—that’s what our art forms are built on. And our copyright laws just don’t know how to handle it. They just are—are just totally archaic, they’re pre-20th century in terms of handling these things.
But—but in terms of us as—as custodians of culture and our cultural heritage, these things, these—these types of do-it-yourself creations are primarily commentary on other creations and incorporate those other creations and—and so we—we really have to be worried about being able to continue to collect them—by those people to be able to create them and to be able to preserve them.
Just as one further example, my friend Rick Prelinger, when he started the movie section of the Internet Archive, he—was very adamant that he would not only just stream this material. They had to be available for download so that people could remix them. And you know, that kind of remixing is part of this age and is something that we really need to be aware of and need to be collecting.
June 2012