Monday, 10 September 2012

The fundamental interconnectedness of all things: the impact of networked knowledge systems on cataloguing

This is the text of a paper I delivered at the CILIP Cataloguing and Indexing Group Conference 2012 in Sheffield.

What is cataloguing? Really, what is it that we do? 

This is “the dawn of a new era in cataloguing”. In these times of change, it’s good to get back to first principles and ask the philosophical questions. What are we doing here? What is it that we do? What is our value? What is cataloguing really? (1)

If you asked a hundred cataloguers, I suspect you’d get a hundred answers. But I choose to think of cataloguing and classification as the process of describing the bibliographic universe. As librarians and information professionals, we are familiar with the world of books, journals, information, data. We know about the secret web of connections and links and citations and themes and genres that connect the millions of information sources that we deal with every day. We have chosen to live in this bibliographic universe: to wander its streets together; to climb its mountains; to congregate in its squares; to see what lies down every dark alleyway; to explore every inch of it. And among information professionals, cataloguers and classifiers are the ones who have chosen to map the bibliographic universe. We are cartographers of the abstract. We chart the world of books, journals, and information: describing what we see, encoding it in a usable form, and sharing the results with our users and with each other. This abstract mapping is, I would argue, the true value of cataloguing. 

Nowadays we have new tools at our disposal to do this. FRBR – Functional Requirements for Bibliographic Records – is one such tool that we can use to describe the bibliographic universe. It’s one attempt to define the arrangement of the abstract entities that information professionals work with. It defines the relationship between these things: between the ‘work’ as envisioned by the author all the way down to the physical ‘item’ which a user can hold and touch. As well as defining the relationships between different versions of the same ‘work’ – between Group 1 entities – it defines the relationships and the links that works have with people and corporate bodies – Group 2 entities – and through this, defines their relationships with each other. FRBR is a framework on which to base our maps of the bibliographic universe. That is its abiding value. 
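As a rough illustration of the Group 1 chain described above, the sketch below models work, expression, manifestation, and item as nested Python dataclasses, with a Group 2 person attached to the work. The class names, field names, and the `items_of` helper are assumptions made for this example rather than anything defined by FRBR itself, and the publisher, years, and barcodes are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """A Group 2 entity: someone responsible for a work."""
    name: str


@dataclass
class Item:
    """A single physical copy that a user can hold."""
    barcode: str


@dataclass
class Manifestation:
    """One publication of an expression."""
    publisher: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """One realisation of a work, such as a translation."""
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract creation as envisioned by its author."""
    title: str
    creators: List[Person] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)


def items_of(work: Work) -> List[Item]:
    """Walk down from the work to every physical item that embodies it."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Hypothetical holdings: placeholder publisher, years, and barcodes.
tractatus = Work(
    title="Tractatus Logico-Philosophicus",
    creators=[Person("Ludwig Wittgenstein")],
    expressions=[
        Expression("German", [
            Manifestation("Hypothetical Press", 2000, [Item("COPY-001")]),
        ]),
        Expression("English", [
            Manifestation("Hypothetical Press", 2001,
                          [Item("COPY-002"), Item("COPY-003")]),
        ]),
    ],
)

barcodes = [item.barcode for item in items_of(tractatus)]
```

Nesting is the simplest way to show the chain from 'work' to 'item', though it is itself a tree; the relationships between separate works need something more like a graph.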

RDA is built on a foundation of FRBR and will be another useful tool. It places a new emphasis in cataloguing on “clustering of bibliographic records” and using metadata to define the relationships between works. Previously the relationships between one edition of a book and a later edition of the same book or between the print version of a book and the ebook version have been somewhat ill-defined if not totally unexplained by a catalogue record. FRBR and RDA are tools to help accurately describe the universe of information and so they’re both heavily informed by epistemology and ontology: two separate but linked branches of philosophy. 

Epistemology is the study of knowledge systems: what knowledge is, how it’s arranged, and how we can have it. It’s been studied from the time of the Ancient Greek philosophers – from Plato and his pupil Aristotle – through the Enlightenment philosophers – Descartes, Locke, Hume, Schopenhauer – to the present day when the debate continues and has been renewed by new scientific discoveries and what seems to be an ever-expanding world of knowledge. The World Wide Web has emerged as a quasi-physical embodiment of our abstract world of information – our realm of knowledge – and so the debate is more physical, more real, and more important than ever. It’s said that Socrates was in constant communication with a ‘daemon’ who supplied him with all his ideas and inspiration: today, we can all communicate with hundreds of people every day who give us fresh ideas and invite us to interesting events. The Web and these changing paradigms of communication have changed our view of epistemology and I’ll get to that later. Ontology is the study of being and existence. It’s relevant to considering the bibliographic universe in terms of trying to define that universe’s metaphysical status. By this, I mean the questions ‘What kind of entity is information?’ and ‘What kind of thing is knowledge?’ Ontology tries to define what things are: is the text of a book a purely mental construct or does it have some kind of physical reality? Can a ‘work’ be said to exist in the same way as a chair? 

Because FRBR assumes the existence of a bibliographic universe with some ontological status and it’s the predominant intellectual trend in cataloguing, we go along with that. Partly because of the introduction of FRBR and RDA, epistemology and the ontology of knowledge are of central importance in modern cataloguing, indexing, and classification. We need to consider what shape knowledge has, how it’s arranged, and how we can accurately describe and represent this for our users. 

For centuries, knowledge has been represented as a hierarchy and this has informed the traditional classification systems that are in use in librarianship and bibliography today. Dewey, Library of Congress, LCSH: they’re based on ideas of hierarchy and taxonomy; of dividing and subdividing subjects like the branches of a tree. The conceptualisation of knowledge, in particular the ‘tree’ metaphor, has a long history. 

One of the first, if not the first, representations of knowledge is in the Book of Genesis: God provides the first humans, Adam and Eve, with the ‘tree of knowledge’. After that, one of the first real articulations of the concept of hierarchical knowledge comes from a library – from someone who was trying to work out what knowledge looks like so that he could organise his books. Aristotle, the great philosopher, had the largest personal library in Athens and to organise his collection accurately he envisioned knowledge in his work the Organon as a hierarchy based on the now-familiar principles of taxonomy and categorisation. His ‘tree of knowledge’ concept became codified as information theory developed and there are numerous examples stretching from Ancient Greece to the 20th Century. Linnaeus’ classification of the natural world in his Systema Naturae divides things by genus and species and subdivides into nested groups. In 1605, Francis Bacon published The Proficience and Advancement of Learning, Divine and Human which divides all knowledge into History, Poetry, and Philosophy which were then subdivided and so on into different branches. In 1783, Thomas Jefferson catalogued his collection of books – a collection that would go on to start the Library of Congress. Jefferson divided the world of knowledge, similarly to Bacon, into Memory, Reason, and Imagination broadly corresponding to History, Philosophy, and Fine Arts. There are hundreds of other examples – Diderot’s system for the Encyclopédie; John Wilkins’ 40 Universal Categories – and in terms of classification, we have examples closer to home. 

The classification schemes that we still use in libraries today are heavily influenced by hierarchical thinking. Enumerative classification schemes – Dewey, Library of Congress, Cutter’s Classification – explicitly “treat knowledge as if it were a unity which can be subdivided into smaller and smaller units. At the top of the tree is the whole universe, which is divided and subdivided to arrive at all the different entities, events and activities represented in the subjects of books.” Faceted classifications and analytico-synthetic classifications, though more flexible, also exhibit an essentially hierarchical structure with the small building up to form the large. The tree of knowledge – our centuries-old conception – continues to inform our epistemological systems and our thoughts on the ontological status of knowledge. Broadly speaking, our current maps of the bibliographic universe look like trees. 

Now we’ve rethought this conception of knowledge as a tree and are starting to think of different knowledge systems. A new model – a new intellectual paradigm – is emerging. It’s the idea of knowledge as a network rather than a tree: a web of interconnections between ideas, concepts, theories, data. 

A network can be defined as a system of interrelations: “individuals function as autonomous nodes, negotiating their own relationships, forging ties, coalescing into clusters. There is no “top” in a network; each node is equal and self-directed.” As science and philosophy have advanced and the universe of human knowledge has grown, we’ve discovered connections and interrelations between things that seemed totally unrelated. It turns out that the branches of the tree of knowledge are all connected in different ways. Everything is connected. The universe appears to be holistic in that everything depends on everything else. We’re beginning to see, in the words of the great detective, Mr. Dirk Gently, “the fundamental interconnectedness of all things”. The abstract world of knowledge turns out to be more complex – far more complex – than a tree shape and the more appropriate visualisation is something like a web or, better yet, a rhizome seed. 

In a 2010 paper, Lyn Robinson and Mike Maguire of City University adopt Deleuze and Guattari’s image of a rhizome as the better metaphor for information organisation. A rhizome is essentially a root: an underground mass of shoots and stems that grow in unpredictable ways in complex, laterally branching networks with different nodes shooting off in different directions. Deleuze and Guattari use it as an “image of thought” which represents complex networked knowledge systems. Robinson and Maguire’s paper is well worth reading as an excellent discussion of the changing concepts of knowledge structures. 

Broadly speaking, we are moving from the tree to the rhizome. And we can see this shift towards networked systems in a range of subjects and different areas. In physics, chaos theory tells us that everything is linked: that one tiny imperceptible event can cascade to significant consequences in a seemingly random and impossible-to-predict way that is nonetheless based on cause and effect in a networked system. In social life, we readily talk about social networks, recognising that human relationships can be mapped onto a network with each person connected to every other person: Stanley Milgram’s small world theory tells us that this can be done with a maximum of six degrees of separation. In technology, computer networks surround us, transferring data along connections between computers and servers and routers. They form the conceptual foundation for the Internet and the World Wide Web. 

In academia, we’re recognising the importance of the citation network – a network of references to and from various papers, journal articles, books. You may have heard of the mathematician Paul Erdős. His work was so prolific that any mathematician working today can be connected through citations to Erdős: an estimated 90% of mathematicians are connected to him through no more than 8 links. (2) 
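Counting the links between two nodes in a network like this is a shortest-path problem, and a breadth-first search is one common way to solve it. The sketch below is a minimal illustration on an invented toy network: the node names, the `links` dictionary, and the `degrees_of_separation` function are all assumptions for the example and do not reflect real citation data.

```python
from collections import deque


def degrees_of_separation(links, start, target):
    """Return the smallest number of links joining start to target,
    or None if the two nodes are not connected at all."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# A toy, undirected network of hypothetical authors A to E.
links = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),  # E has no links to anyone
}

a_to_d = degrees_of_separation(links, "A", "D")
a_to_e = degrees_of_separation(links, "A", "E")
```

Because the search fans out one link at a time, the first route it finds to the target is guaranteed to be a shortest one.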

Of these examples of networked systems, the citation network most closely relates to the networked systems of knowledge which are important for cataloguing and classification. We’re recognising that knowledge can’t be neatly divided into hierarchical categories and that in the bibliographic universe everything is connected in strange and sometimes complex ways. For an example, let’s look at Ludwig Wittgenstein’s Tractatus Logico-Philosophicus: in my humble opinion, one of the greatest books ever written. 

When we come to catalogue this book – here’s the catalogue record for the book at Durham – when we come to catalogue it, off-hand we’d say it’s a philosophy book – it’s one of the cornerstones of modern formal logic – and at Durham, we file it at 192 for Modern Western Philosophy of the British Isles but we could also stick it somewhere in 160 for logic, or, depending on how much you consider its implications, somewhere in 110 for Metaphysics. It depends how you interpret it and there are a lot of interpretations. 

That’s straightforward hierarchical classification and it puts the book neatly into a distinct place on a shelf but it’s not the whole truth. This doesn’t represent the links that the book has with everything else in the bibliographic universe. What about its links to science, language, mathematics, and possible worlds theory? What about the links to Wittgenstein’s other works? His other masterpiece, Philosophical Investigations, is a whole different genre of philosophy and refutes bits of the Tractatus: the two are nonetheless conceptually linked. What about the books written about this book: the different theories; the different interpretations; the books that owe their existence to this book? What about the Prototractatus: the original manuscript version written in the trenches of World War I? What’s the relationship there: is it the same work or not? Whatever the answer, there is some kind of strange link. What about the different translations: this is the Pears and McGuinness translation but what about the German original, the versions without Bertrand Russell’s introduction, the far more confusing Ogden translation? What about the links to the fiction inspired by this book? 

Even something as simple as this 80 page book is connected through a thousand interrelations to myriad other books and other nodes in the bibliographic universe. When we look at it closely and think about it, this book is a centre of a web – of a rhizome – connected to intensely different books, journal articles, people, and ideas. If we accept that it’s our job as cataloguers to describe the bibliographic universe accurately and represent it as truthfully as possible, then we need to think about how to represent these connections. A MARC-encoded, AACR2-standard catalogue record doesn’t do justice to the complex web of connections and interrelations that surround this book. Or any of the other books, journals, ebooks, ejournals, and other publications that exist in our libraries. 

This is, I think, one of the central issues in cataloguing today. How do we represent networked knowledge systems and adjust our practices accordingly? Electronic resources are growing in importance in librarianship (3) and are fundamentally arranged in a network. We’re all going to be interacting with information arranged in networks and we should be thinking about mapping the digital world. Thinking about networked knowledge systems is an important consideration for doing this. So how do we catalogue in a network? It’s an open question but broadly speaking, I think we need new practices, new technology, and new thinking. 

In terms of cataloguing practice, RDA isn’t necessarily the answer to all the riddles but it’s a definite step forward. RDA is based on FRBR and therefore has a footing in ontology and serious thought about the bibliographic universe’s structure. RDA as a new practice will help us to think about the connections between items, to look at things in a new way – for old and new professionals alike – and to better appreciate that information exists in a rich, complex, shifting epistemological network. How do we actually catalogue to reflect this? Do we use more access points? Do we index more fields? Do we add a bunch more fields in the 700s or do we need to more fully define relationships using 500 note fields? RDA is the biggest change to cataloguing in 30 years and so hopefully its implementation will give us the opportunity to consider some of these issues and perhaps rethink how we view our collections. 

We also need new technology in cataloguing. Our modern epistemology – this vision of a networked universe with everything connected – is beyond the scope of our current technology for cataloguing and data representation. Though there are interesting things going on with e-resource management and linked data and things like that, these haven’t really affected day-to-day cataloguing which is still based on flat, hierarchical MARC records. MARC needs to be replaced and the replacement needs to be able to show relationships more clearly, needs to help users to find information within a bibliographic network, and needs to make use of the links that integration with other software and other systems can provide. 

The development of new Semantic Web technology can help with this. The development of OWL and other web ontology languages can help us to define domain-specific ontologies (4). RDF is a data model that describes resources as subject–predicate–object statements and, together with RDF Schema, helps to define classes within an ontology; it also has the potential to be utilised for accurate description of bibliographic systems. Semantic Web languages – the development of Web 3.0 – will help us to map the digital frontier and make it into a true mirror of our abstract knowledge systems. 
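RDF expresses each statement as a subject, a predicate, and an object. To give a rough sense of how the connections around the Tractatus might be recorded in that shape, the sketch below stores them as plain Python tuples. This is not real RDF syntax and uses no RDF library; the predicate names (`authorOf`, `translationOf`, and so on) and the helper functions are hypothetical labels invented for this example, not terms from any published vocabulary.

```python
TRACTATUS = "Tractatus Logico-Philosophicus"

# Each statement is a (subject, predicate, object) tuple.
# The predicate names are made up for illustration only.
triples = [
    ("Ludwig Wittgenstein", "authorOf", TRACTATUS),
    ("Ludwig Wittgenstein", "authorOf", "Philosophical Investigations"),
    ("Philosophical Investigations", "refutesPartsOf", TRACTATUS),
    ("Prototractatus", "manuscriptVersionOf", TRACTATUS),
    ("Pears and McGuinness translation", "translationOf", TRACTATUS),
    ("Ogden translation", "translationOf", TRACTATUS),
]


def neighbours(statements, node):
    """Every node that shares at least one statement with the given node."""
    linked = set()
    for subject, _predicate, obj in statements:
        if subject == node:
            linked.add(obj)
        elif obj == node:
            linked.add(subject)
    return linked


def subjects(statements, predicate, obj):
    """All subjects that stand in the given relationship to obj."""
    return [s for s, p, o in statements if p == predicate and o == obj]


around_tractatus = neighbours(triples, TRACTATUS)
translations = subjects(triples, "translationOf", TRACTATUS)
```

Unlike a single shelf mark, a list of statements like this can keep growing: each new relationship is one more line, and no relationship has to be chosen at the expense of another.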

And then there are data visualisation tools which can take metadata and turn it into something more visual and usable. The UK Institutional Repository Search produced by Mimas in Manchester can produce a basic visualisation of search results and the networked links between them. The results from a search term are grouped in different colours by subject – economics, technology, biology – and you can move them around and click on different nodes to produce more results similar to the ones you’ve clicked on. The more you click, the more complex the network becomes and it can actually get quite beautiful. This is beta code powered by Autonomy software and it gives a demonstration of what can be done with data visualisation. 

Most importantly – more importantly than practice or technology – we need new thinking in cataloguing. We need to think about networked knowledge systems and move on from the hierarchical bibliographic philosophy that has dominated librarianship and information management. Instead of Linnaeus and Dewey, we can look to d’Alembert, Paul Otlet, Vladimir Vernadsky: all of whom have advocated networked knowledge systems of one form or another. Crucially, we need to think about what networked cataloguing can achieve. Cataloguing is a way to map the bibliographic universe and in the act of mapping, we can bring subjects together and see the intellectual landscape more clearly. The biologist, Edward O. Wilson, uses the term ‘consilience’ to refer to the unification of knowledge: the belief that different academic disciplines don’t represent completely different domains but are part of a single ontology. One knowledge system. One network encompassing everything. “...a maze of mazes, a sinuous, ever growing maze which will take in both past and future and will somehow involve the stars.” Consilience encourages interdisciplinary research and bringing together seemingly disparate intellectual strands to form a single map of the world of knowledge. I researched consilience for both my undergraduate and postgraduate dissertations and I think this kind of synthesis will be a major intellectual trend in the 21st Century. Networked cataloguing is one way to achieve consilience and it’s here that modern librarians can make a real impact. 

Cataloguing and indexing a networked knowledge system requires changes to our practice, our technology, and our thinking. RDA, FRBR, new ontological languages, linked data, and ever-developing software are helping to bring these changes but we as cataloguers need to embrace them. We need to encourage and accept the change. We need to start thinking in networks. 

I’d like to end by contradicting everything I’ve said. I have argued that the most accurate – the most real – depiction of knowledge and the bibliographic universe is in the shape of a network. However, I’m aware that I and the prevailing intellectual trend could be as wrong and misguided as we now believe the hierarchical theoreticians to be. My favourite writer, the Argentinean poet and one-time librarian, Jorge Luis Borges, wrote that “…obviously there is no classification of the universe that is not arbitrary and speculative. The reason is quite simple: we do not know what the universe is.” He reminds us that all our human schemes for arranging knowledge are provisional and potentially deluded. Learning and discovery is a process of continuous development and who knows what we’ll discover on the journey towards consilience and networked knowledge systems? In the words of Socrates, the only thing I know for certain is that I know nothing.

(1) When I ask ‘What is cataloguing?’ possible answers are either describing the bibliographic universe or, depending on your thoughts about the ontological status of knowledge, creating the bibliographic universe. In other words, applying order where none actually exists. That discussion is beyond the scope of this paper and there are some philosophical assumptions later on that depend on the first interpretation. 

(2) The source for that statistic is Wikipedia so… yeah.

(3) Slight bias here: I'm an e-resources librarian. 

(4) The word ‘ontology’ is used here in a slightly different but conceptually linked sense to the philosophical use.
