Dilemmas of digitisation

Published by agiffard on 15 October 2008 - 11:16

 

The Maison Française d'Oxford invited me to a symposium entitled "The dilemmas of digitization. How to digitize the humanities?", held on 22, 23 and 24 May. Paolo d'Iorio (CNRS-ITEM), Anne Simonin (CNRS-MFO), Alexis Tadié (CNRS-MFO) and Paul Flather (Europaeum) were the organisers.

 

My talk took place in the second panel, "Did they want what they have achieved?" (1). The organisers asked me to speak on "From the Très Grande Bibliothèque Numérique to Gallica". (I was in charge of the information system and digitisation program of the new national library in Paris from 1989 to 1993.)

 

Here are my speaking notes, slightly rewritten, with some footnotes I had left out.


By way of introduction, I would like to make clear that the Très Grande Bibliothèque Numérique never existed.

 

In the letter (2) from François Mitterrand, Président de la République, which became known as the first program of the new library, it was said: "I should like to go further in this domain, with the creation of a "très grande bibliothèque", a very large and great library, of an entirely new type". The text also said: "this large library ... will use the most modern information technologies, and will be connected, accessible from outside".

 

The "très grande bibliothèque", without capital letters, was to become, under the pen of journalists, the "Très Grande Bibliothèque" and, later on, "la TGB". But there was never any TGBN project. We began with a target of 500 000 digitised books, then 200 000, and ended, for financial reasons, with 100 000. 80 000 were actually digitised, which was quite a large number for the time.

 

There is a sort of contradiction between the banality of the words used in the letter to speak about technology and its uses, and the pompous tone of the general scheme.

 

"Of an entirely new type" raises logical questions: did the text mean "an entirely new sub-type of library", or "a library that would belong to an entirely new type of something that nobody knew and had no word for at that moment"?

 

We proposed this answer: "a library, a new one". (3)

This direction had two related parts: the digital library and digital reading. I leave aside the computerization of the library's activities, an important part of the work, but not exactly the new one, and surely not entirely new.

 

1/ Digital library

 

The digital library was not, for this program, a database, nor a utility to supply digital information, digitised documents, or even digitised books.

 

The basic idea of this program was to consider: (a) books as texts (not as information), remembering that they took the form of books (not "documents"); (b) the library as a collection, remembering that the choice and order of the collection are the founding principle of the library. (4)

 

On the first point, let us remember that, at the time, the prevailing view of the use of digitization in libraries was "electronic management of documents", including "on-demand publishing". Readers would have chosen certain pages or chapters, and the library would have digitised them for them. And indeed, mostly in the field of scientific documentation, many programs in the 1990s were based on this economic model. The digitization program of the TGB was not.

 

In the same spirit, we decided to offer the books in both image mode and text mode. Text mode is necessary for all work that calls for computer processing. Image mode is also necessary because, in a humanistic and historical library, the materiality of the book is crucial to the reception of the text.

 

On the second point, the library as a collection, we had to face two different tendencies that were as opposed to this idea as they were to each other.

 

From one point of view, let us call it conservative, or a little too professional, digitization was only a way to preserve books in poor condition, as we do with microfilm. Of course, the list of books in poor condition is not completely eclectic. But we do not see in this sort of list the idea of a collection. This point of view is still very present in digital library programs (5).

 

The other point of view was presented as "dynamic management of the library", or "dynamic management of documents", and was the dominant perspective. Applied to the program of the French national library, it meant: "you will digitise the books that people most want to read". As you know, the other face of dynamic management is "weeding" the library, "désherbage" in French. I must admit that we were quite static or, if you prefer, equally dynamically opposed to dynamic and conservative management.

 

And so we had this definition of the digitized collection: a humanistic and historical reference library. The digitised collection was itself a library or a corpus.

 

The program encountered many difficulties. But the main difficulty for a digitization program is to choose the books, and to choose the people who choose the books. I think that is still the case. I have little confidence in digitization programs that are full of technical and economic considerations but curiously silent on the intellectual aspects.

 

This corpus was digitised on that basis, up to 70 000 or 80 000 titles. In 1997, with the national initiative to develop the internet in France, it was requested that all these books be put on the web, accessible, as François Mitterrand's text had said, from everywhere. That is what is known as Gallica.

 

Did we want what has been achieved?

For the most part, the answer is yes. A bad decision was taken when a budget cut halted the conversion to text mode. But the other points of the assessment are positive.

 

The foresight was good, if we accept the idea that what is confirmed by industry, trade and social uses is a figure of necessity. From the beginning, the success has been much greater than expected (including by me). It is due to the internet and, in large part, also to the idea of a collection.

 

And we have not betrayed the true idea, the cultural idea of the library. I will give you an example of use: for a piece of work on the Gallic Hercules, I found the first French translation of Lucian's text (Geoffroy Tory), a second one, very pleasant, from the Enlightenment (Perrot d'Ablancourt), and another, more classical, from the twentieth century (Eugène Talbot, 1912).

 

Another, slightly different question is: "Do we still want what has been achieved?" My answer is also yes. It means, basically, that we have to consider digital libraries as libraries from an intellectual point of view, and their relations with digitised books and digital reading as equivalent, as homothetic, to the relations that the classical library has with manuscripts, printed books and classical reading.

 

Digitising is not a utility; it is a symbolic gesture by which a community, whether a political, a scientific or a knowledge community, says: "these books are our books, we consider them as our living archives for memory, culture, art, science, etc".

 

Digitising a library is a choice, a point of view, because making a library is such a choice.

 

I do not want to address every question; I shall just point to three figures.

 

On the European library: a portal to the digitised books of the different national libraries is useful, but it is not a European library. A portal or a collective catalogue may be a sort of "virtual library", but only if it is built on the basis of a coherent collection, virtually produced from the different libraries. For the same reason, the web is not a virtual library. The sum of the national European libraries is not a European library, because the criterion "what is European?" is not there. Nor is it a library, because the different digitisation programs do not use the same criteria. It is useful; it is not a European library.

 

Second figure: will a library whose books have all been digitised be a digital library? Well, it seems it will, if we do not want to complicate things with too formal an approach. But this does not mean that total digitization is a sufficient answer. In fact, there are not many cases in which total digitisation will be a sufficient answer. Because digitization will take a long time; for a long period, some books will be digitised and others not. Because some books will never be digitised, for example books under copyright, or will be digitised elsewhere. Because digitizing is not only image digitising, as at the Bibliothèque Nationale de France; nor is it only text digitizing: it will later require text publishing, encoding (6) and, probably, in the future, other work. Quite simply, it is a process, a long-term process, and long-term processes are the rule in libraries. If it is a long-term process, it needs an order, a way in which the digital library is a library from the beginning and remains one afterwards. Maybe this process of long-term digitisation will require work on the model of the concrete library: what is its true unity? Is it a sort of collection of collections?

 

Last figure: what about the place of web texts in this digital library? Obviously, this question is very different from that of general public access to the internet in libraries, which is not a specific utility. And it is different from the "critical approach" to the web, which needs to be organised in libraries and other places. Digital libraries have to include some of the texts offered on the web, or bring them closer to their own digital texts. For example, I should have liked Gallica to offer me at least a link to the original text of Lucian/Lykianos on the Gallic Hercules, and to the Latin translation by Erasmus.

 

For all these figures and reasons, I propose to keep the idea of choice, selection and collection, and all the intellectual principles which are the basis of libraries and which have never been as useful as they are now.

 

2/ Digital reading

 

Just a few words about a question that is less well known than the previous one.

 

From the beginning of the project, the digital library of the TGB/BdF/BNF was associated with a project for a reader, a piece of computer-assisted reading software (CARE was the acronym in English and PLAO in French).

 

This orientation was important from a forward-looking point of view. The idea was not only that texts and books would be digitized, but also that they would be read on a screen, with a computer. And so the software needed to practise this reading had to be designed. That is still the case.

 

I asked the French philosopher Bernard Stiegler to organise a group to design this humanistic, scholarly reading software. Two prototypes were produced and tested by a group of readers, in their own reading, on texts we had digitised (7). In 1993, this project was abandoned.
 
To the question "Did we want what has been achieved?", the answer here is clearly no. We did not want this sort of achievement, i.e. this surrender.

 

We now encounter various pieces of software that revolve around this idea of digital reading, for example the "readers". But digital reading also includes many other kinds of software, such as search engines; the automatic production of reading acts; the link between marketing and reading; and many other aspects, like the practice of reading as a simulation, and the publicity of reading acts (I mean that some are published (blogs), and also that a lot of reading is not secret but public, including the reading of e-mail).

 

More generally, we have entered the era of "industrial readings". Since the arrival of the web, digital reading has become a very widespread cultural practice, and we may ask whether it is "of a new type" and what this type is. Questions are arising about the attention of the reader in this sort of practice, the completion of the reading act, and the type of reading (information or reflection).

 

Libraries are not only places for books. They are also places for readers. And humanities libraries in particular must be part of this research, for example through the design of new devices for digital reading that take into account the practices of online readers.

 

Reading industries are an astonishing thing.

 

We have known reading technologies, such as the art of reading of the Middle Ages (the Didascalicon of Hugh of Saint Victor).

We have known "literary industries", as Tocqueville called them; they are the first example of Adorno's Kulturindustrie.

 

But what is characteristic of the digital era is reading industries rather than literary industries or, in the words of the moment, access industries rather than "content" (a word I don't like) industries.

 

These reading industries lie at the intersection of the information industry (software, telecoms), the cultural industry and the marketing industry.

 

Marketing, advertising, is the language of the economy. I can see that many librarians and scholars want to speak the fine language of their century, as Baudelaire put it. But do you think that this language is the best one in which to imagine the continuation of reading in the digital era?


(1) A view on this subject by a pioneer:

Roberto Busa, Concluding a life safari from punched cards to WWW, in the proceedings of the conference "Digital resources in Humanities", Oxford, 1997.

 

(2) In this letter (August 1988) to Michel Rocard, Prime Minister, the Très Grande Bibliothèque is still separate from the Bibliothèque Nationale. In a later letter from the President (October 1990), when it was clear that the TGB and the BN were merged, it was said: "the novelty will lie in the possibility of using the most modern computer techniques for access to the catalogues and documents of the Bibliothèque de France".

 

(3) For a general presentation of the program, see:
Gérald Grunberg and Alain Giffard, "New orders of knowledge, new technologies of reading", in R. Howard Bloch and Carla Hesse (eds), "Future libraries", originally published as a special issue of Representations, n°42, Spring 1993, and then by the University of California Press, 1995.

 

(4) This basic idea is clearly close to the principles of "digital philology" as explained by François Rastier. See, for example, ""resources linguistiques" ou corpus", in the chapter "Philologie numérique" of Arts et sciences du texte, PUF, 2001.

 

(5) A mistake is frequently made, and was made again during the Oxford symposium, in confusing digitisation policy with the sources of the digitised books. A collection policy may use certain books taken, for example, from microfilms produced for preservation purposes; but it must not be confused with preservation-oriented digitisation. The same argument applies to rare books or texts.

 

(6) On encoding: Lou Burnard and C. M. Sperberg-McQueen, TEI Lite: An Introduction to Text Encoding for Interchange.

 

(7) An English presentation of this software can be found in:

a. Chahuneau F, Lecluse Ch, Stiegler B, Virbel J, Prototyping the ultimate tool for scholarly qualitative research on texts, Proceedings of the 8th Annual Conference of the New Oxford English Dictionary, 1992.

b. Virbel J, Reading and managing texts on the BnF station, in "The digital word", P Delany, G Landow (eds), The MIT Press, 1993.

In French: Alain Giffard, La lecture numérique à la Bibliothèque de France, in Aurèle Crasson (ed.), "L'édition du manuscrit", Academia Bruylant, 2008, with a bibliography.

Other texts at: http://alaingiffard.blogs.com