Who owns the story of the future – and what does it have to do with information?

I am always drawn to events professing to talk about the future, especially if they give me a chance to listen to William Gibson (@greatdismal) in person, and so I was at the British Library for one of their panel discussions in the series The Future: Science and Society, earlier this week.

The other commentators were by no means lightweights in their respective fields (writers Cory Doctorow [@doctorow] and Mark Stevenson, economist Diane Coyle and chair Jon Turney) but obviously I was not the only starry-eyed Gibson fan in the room, which was packed with the sort of people who cannot resist treating their idol to a rambling monologue on metaphysics, drawn from the random clutter of their inner psyche, during question time.

No matter – for in addition to hearing some of William Gibson’s clever, considered comments, I could not help but enjoy the comforting smugness which enveloped me as it became clear that for many people in the audience, “the future” was all about information (ha!). Mark Stevenson reminded us that “.. it is not called the information society for nothing..”.

Ostensibly, the discussion was to draw out ideas from current scientific research on what our future may look like – thus the mix of science/sci-fi writers on the panel. Although Mark Stevenson mentioned he had been talking to people at IBM and MIT who were engaged in “amazing stuff”, I did not catch what this might be. I did count four mentions of Star Trek though, and have to admit that since my almost word-perfect knowledge of the original series episodes far exceeds my familiarity with most of the other sci-fi writers mentioned, I feel well equipped to deal with the future as foreseen in the 1960s Kirk/Spock era.

The future, it seems, is very personal. And William Gibson commented that it is only possible to write about the future from the perspective of the present. He wondered about the reception that his first novel, Neuromancer (1984), would have had if he had described today’s world of personal wifi, AIDS, international terrorism and the non-existence of the Soviet Union in his early 1980s vision of the future. So I guess that hints at the future we envisage as being a product of our personal view of the present.

Diane Coyle provided the economist’s perspective – that the future is all about investment, and that investors rarely look beyond the next 5 years – the near future. There was then discussion around whether there were any “far future” ideas any more, and whether we were currently experiencing such an enormity of technological advancement that we were simply “rendering” what we already have – a rather good analogy from a member of the audience. Other voices commented on the fact that technology had yet to live up to its promise, giving the scarcity of augmented reality apps as an example. I have seen some interesting early instantiations of augmented reality (Aurasma, and the Museum of London’s Street Museum app) but have to say for the moment I agree that it doesn’t propel me very far forward. Maybe in time though.

To information then, and the concern that so much information will never be digitized that finding it will be impossible. Cory Doctorow argued that Google had digitized over 90% of books anyway, and that the rest would soon be dealt with. I am not sure his figures are quite right – digitization is not always easy or straightforward, and a lot of documents have certainly not yet reached the scanner. The enthusiasm for digitized material may also lead to relevant items being missed in a search – unless you happen to be an information specialist; the question is rather whether anyone is looking hard enough, in the right place, in the right way.

In response to the issue of relevant documents being lost within “too much information” Diane Coyle argued that it was about attention; most information can be found, but is missed because no-one is looking at it – for example if it is listed beyond the first page on the Google search results listing.

On the flip side, we moved on to “bit rot” where information is lost because the technology to read it no longer exists. Cory Doctorow again voted for confidence in technology, stating that if information was held on “spinning platters” then it could be transferred to another type of spinning platter indefinitely. No-one considered whether this was always cost-effective though.

And to one of my favourite concerns – that nothing is ever deleted, and the more dire the image, the more likely it is to pop up again and bite you at some inconvenient time in the future. “It’s tweeted in stone” – William Gibson’s observation – seemed entirely apposite.

So what about the story of the future – and who writes it? I don’t think the discussion answered this, although I was pleased to think that the future will clearly contain a lot of information which will need to be organized, and that thus, LIS specialists could still find employment. Interestingly the information related concerns were all problems of the present, so at least we are recognizing that things that are problematic now may go on to be a bigger nuisance in the future.

Other discussion centered around what it means to be human, and what we mean by “progress” – more knowledge, or a “better society”. And what is a better society – longer lived? Better informed? And how can we know how the future will be fashioned by our present?

William Gibson wondered if the inventors of the pager knew how much it would change drug dealing.

Humanity’s motto, he concluded, could well be “who knew?”

Information is the new black

It must be the popularising effect of James Gleick’s new book “The Information”, because suddenly everyone I meet wants to talk about information: its history, its epistemology and Shannon’s 1948 mathematical theory of communication (MTC), which – with Weaver’s 1949 commentary – became known as the mathematical theory of information. This is certainly good news for our information science course, where information has been considered from an academic perspective since 1961. I feel my time has come; all those hours spent memorizing equations to show that I truly, deeply understood how many signals you can push down a channel of a certain size, allowing for noise, have finally been rewarded, and I can now brandish my information-science credentials with a superior air of I told you so. Information is the new black, and everyone is wearing it.
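That channel calculation can in fact be sketched in a few lines. This is the Shannon–Hartley capacity formula, C = B log₂(1 + S/N); the function name and figures below are my own, chosen only for illustration.

```python
import math

def channel_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# A 3 kHz telephone-style channel with a signal-to-noise ratio of 1000
# (30 dB) can carry at most about 30 kbit/s, however clever the coding.
print(channel_capacity(3000, 1000))  # ≈ 29901.68
```

However much noise is present, the formula gives a hard ceiling; no amount of signalling ingenuity pushes more through the channel.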

I believed that I would forget Shannon’s theory entirely as soon as the exam was over. It did not seem so relevant to my work at the time, which was with information resources in toxicology. Life, however, with a patient smirk, ensured that the MTC rose, phoenix-like, from its ashes 20 years later, when I was faced with presenting the mathematical good news to contemporary LIS students taking our Library and Information Science Foundation module as part of their master’s. I dusted off my 1986 copy of Robert Cole’s “Computer Communications”, my notes still there in the margins of page 10, where I left them.

The issue I faced was one of presenting a definition of ‘information science’, and of outlining its history as a discipline, to modern LIS students. Many of the papers considering the origins of information science gaze back in time to illuminate Shannon’s equations with a rosy pink glow, suggesting that his theory somehow led to the birth of information science as a true science (Shera 1968, Meadows 1987). This was the story in the 1980s, but in the 21st century a more plausible thread is emphasized: the work of Kaiser, Otlet and Farradane on the indexing of documents, which suggests that the MTC was a bit of a red herring in respect of the history of information science. Rather, information science grew out of a need to control scientific information, coupled with the feeling amongst scientists that this activity was somehow separate from either special librarianship or the more continental term for dealing with the literature, documentation (see Gilchrist 2009, Vickery 2004, Webber 2003).

MTC

A look back at the original ideas and documents shows that Shannon’s work was built on that of Hartley (1928). Stonier (1990 p 54) refers to Hartley:

“.. who defined information as the successive selection of signs or words from a given list. Hartley, concerned with the transmission of information, rejected all subjective factors such as meaning, since his interest lay in the transmission of signs or physical signals.”

Consequently, Shannon used the term information, even though his emphasis was on signalling. The interpretation of the MTC as a theory of information was thus somewhat coincidental, but this did not prevent it being embraced as a foundation of a true ‘information science’.

Shannon himself suggested that there were likely to be many theories of information. More recently, contemporary authors such as Stonier (1992) and Floridi (2010) have reiterated that MTC is about data communication rather than meaningful information.

Floridi (2010 p 42 and 44) explains:

“MTC is primarily a study of the properties of a channel of communication, and of codes that can efficiently encipher data into recordable and transmittable signals.”

“.. since MTC is a theory of information without meaning, (not in the sense of meaningless, but in the sense of not yet meaningful), and since [information – meaning = data], mathematical ‘theory of data communication’ is a far more appropriate description…”

He quotes Weaver as confirming:

“The mathematical theory of communication deals with the carriers of information, symbols and signals, not with information itself.”

Floridi’s definition of information as ‘meaningful data’ is more aligned to the field of information science as understood for our LIS related courses. Whilst we can still argue what is data and what is meaning, we can see that the MTC utilizes ‘information’ as a physical quantity more akin to the bit, rather than the meaningful information handled by library and information scientists.

This difference is set out by Stonier (1990, p 17):

“In contrast to physical information, there exists human information which includes the information created, interpreted, organised or transmitted by human beings.”

Nonetheless, the MTC is still relevant to today’s information science courses because it has played a pivotal role in the subsequent definitions and theories about information per se. And it is rather hard to have information science without an understanding of ‘information’. Many papers have been written on theories of information, and on the relevance of such theories to information science (see, for example, Cornelius 2002).

MTC and other disciplines

The MTC provides the background for signalling and communication theory within fields as diverse as engineering and neurophysiology. At the same time that Shannon was writing, Norbert Wiener was independently considering the problems of signalling and background noise. Wiener (1948 p 18) writes that they:

“.. had to develop a statistical theory of the amount of information, in which the unit amount of information was that transmitted as a single decision between equally probable alternatives.”

Further (p 19), that

“This idea occurred at about the same time to several writers, among them the statistician R.A. Fisher, Dr. Shannon of the Bell Telephone Laboratories, and the author.”

Wiener decided to:

“call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics”.

The relationship of information to statistical probability (the amount of information being a statistical probability) meant that information in Shannon and Wiener’s sense related readily to entropy (anecdotally von Neumann is said to have suggested to Shannon that he use the term entropy, as it was already in use within the field of thermodynamics, but not widely understood).

“The quantity which uniquely meets the natural requirements that one sets up for ‘information’ turns out to be exactly that which is known in thermodynamics as entropy.”

Shannon and Weaver (1949) p 103

“As the amount of information in a system is a measure of its degree of organization, so the entropy of a system is a measure of its degree of disorganization; and the one is simply the negative of the other.”

Wiener (1948) p 18
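The quantity Shannon and Wiener describe above can be computed directly: the entropy of a source is the average number of bits per symbol, highest when all outcomes are equally probable. A minimal sketch (my own illustration, not drawn from any of the works quoted):

```python
import math

def shannon_entropy(probs):
    """H = -sum(p_i * log2(p_i)): average information per symbol, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain: exactly 1 bit per toss.
print(shannon_entropy([0.5, 0.5]))   # 1.0
# A heavily biased coin is far more predictable, so each toss carries less
# information - the "degree of organization" in Wiener's phrase.
print(shannon_entropy([0.9, 0.1]))   # ≈ 0.469
```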

The link between information and entropy had been around for some time. In 1929, Szilard wrote about Maxwell’s demon, which could sort out the faster molecules from the slower ones in a chamber of gas. Szilard concluded that the demon had information about the molecules of gas, and was converting information into a form of negative entropy.

The term ‘negentropy’ was coined in 1956 by Brillouin:

“… information can be changed into negentropy, and that information, whether bound or free, can be obtained only at the expense of the negentropy of some physical system.”

Brillouin (1956) p 154

Brillouin’s outcome was that information is associated with order or organization, and that as one system becomes more organized (entropy decrease), another system must become more disorganized (entropy increase).

Stonier (1992 p 10), agrees:

“Any system exhibiting organization contains information.”

A well-known anomaly becomes apparent, however, when over 60 years later we try to understand the correlation between information and either entropy or probability. A trawl through the original equations and explanations, and subsequent revisitations, reveals that an increase in information can be associated with either an increase or decrease in entropy/probability according to your viewpoint. Tom Stonier (1990) refers to this in chapter 5, but Qvortrup (1993) gives a more detailed explanation:

“In reality, however, Wiener’s theory of information is not the same, but the opposite of Shannon’s theory. While to Shannon information is inversely proportional to probability, to Wiener it is directly proportional to probability. To Shannon, information and order are opposed; to Wiener they are closely related.”
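Shannon’s side of this disagreement can be seen concretely in the self-information (surprisal) of a single message, −log₂(p): the rarer the message, the more bits it carries. The sketch below is my own illustration of that sign convention, not taken from Qvortrup:

```python
import math

def surprisal(p: float) -> float:
    """Shannon's self-information -log2(p): rarer messages carry more bits."""
    return -math.log2(p)

# Under Shannon's convention, information is inversely related to probability:
print(surprisal(0.5))    # 1.0 bit
print(surprisal(0.125))  # 3.0 bits
# Wiener's convention flips the sign, treating information as negative
# entropy - hence the apparent contradiction Qvortrup describes.
```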

The correlation between the measurement of entropy and information did however, lead to the separate field of information-physics, where information is considered to be a fundamental, measurable property of the universe, similar to energy (Stonier 1990).

This field stimulates much debate, and is currently enjoying what passes for popularity in science. A recent article in New Scientist tells how Shannon’s entropy provides a reliable indicator of the unpredictability of information, and thus of uncertainty, and how this has been related to the quantum world and Heisenberg’s uncertainty principle (Ananthaswamy 2011).

Information-biology also appears to stem from work undertaken around the MTC. The connection between signalling in engineering and physiology was made by Wiener in the 1940s, and in 1944 Schrödinger, in his book “What is Life?”, made a connection with entropy as he considered that a living organism:

“… feeds upon negative entropy.”

Further that:

“.. the device by which an organism maintains itself stationary at a fairly high level of orderliness (= fairly low level of entropy) really consists in continually sucking orderliness from its environment.”

In the same book, Schrödinger outlined the way in which genetic information might be stored, although the molecular structure of DNA was not published until 1953, by Crick and Watson (see Crick 1988). The genetic information coded in the nucleotides of the DNA is transcribed by messenger RNA and used to synthesize proteins. Information contained in genetic sequences also plays a role in the inheritance of phenotypes, so that informational approaches have been made within the study of biology (see Floridi 2010, also for discussion of neural information).

Information and LIS

For the purposes of our library and information science courses here at City University, we consider information as that which is ‘recorded for the purposes of meaningful, human communication’. Although I personally find Floridi’s definition helpful, information in our model is open to definition and interpretation, and is often used interchangeably with the term ‘knowledge’. In either case we regard the information as being instantiated within a ‘document’. The term ‘document’ also does not demand a definitive explanation; it merely needs to be understood as the focus of ‘information science’, its practitioners and researchers.

To complete the picture, when I became Program Director for #citylis at City University London, I wanted to strengthen and clarify the way in which we defined ‘information science’, and particularly to explain its relationship with library science (Robinson 2009). I suggested that library science and information science were part of the same disciplinary spectrum, and that information science (used here to include library-science) could be understood as the study of the information-communication chain, represented below:

Author  —> Publication and Dissemination —> Organisation —> Indexing and Retrieval —>  User

The chain represents the flow of recorded information, instantiated as documents, from the original author or creator, to the user. The understanding and development of the activities within the communication chain is what library and information specialists do in both practice and research. As a point of explanation, I take organisation in the model to include the workings of actual organisations such as libraries and institutions, information management and policy, and information law. Information organisation per se fits within the indexing and retrieval category.

Our subject is thus a very broad area of study, one which is perhaps better referred to as the information sciences. The question of how we study the activities of the model can be answered by applying Hjorland’s underlying theory for information science, domain analysis (Hjorland 2002). The domain analytic paradigm describes the competencies of information specialists, such as knowledge organization, bibliometrics, epistemology and user studies. The competencies or aspects distinguish what is unique about the information specialist, in contrast to the subject specialist. Further, domain analysis can be seen as the bridge between academic theory and vocational practice; each competency of domain analysis can be approached from either the point of view of research or of practice.

There are many definitions of information science, and there are other associated theories or meta-theories, the latter of which may also be associated with a philosophical stance. Nonetheless, the model portrayed above has proved to be a robust foundation for teaching and research, yet it is flexible enough to accommodate diverse opinions, debate as to what is meant by ‘information’, and diverse theories of information.

It is interesting to reflect on whether ‘information’ as understood for the purposes of library and information science has any connection with ‘information’ as understood by physics and/or biology, or whether it is a standalone concept. Indeed later authors such as Bateson (1972) have suggested that if information is inversely related to probability, as Shannon says, then it is also related to meaning, as meaning is a way of reducing complexity. Cornelius (2002) reviews the literature attempting to elucidate a theory of information for information science (see also Zunde 1981, Meadow and Yuan 1997).

At a recent conference in Lyon, Birger Hjorland’s (2011) presentation considered the question of whether it was possible to have information science without information. He writes that there should at least be some understanding of the concept that supports our aims, but concludes:

“.. we cannot start by defining information and then proceed from that definition. We have to consider which field we are working in, and what kind of theoretical perspectives are best suited to support our goals.”

I agree with him. I do not think we can have information science without a consideration of what we mean by information – but information is a complex concept, and one that can be interpreted in several ways, according to the discipline doing the interpretation, and then again within any given discipline per se. It is not an easy subject to study, despite its sudden popularity. The literature of information theory is extensive, and scary maths can be found in most of it. Nonetheless, it is essential for anyone within our profession to have in mind an understanding of what we are working with; otherwise it is impossible to justify what we are doing, and we appear nondescript. Understanding information is like wearing black. Any colour will do, but black makes you look so much taller and slimmer.

References

Ananthaswamy A (2011). Uncertainty untangled. New Scientist. 30th April. 2011, 28-31

Bateson G (1972). Steps to an ecology of mind. Ballantine: New York

Brillouin L (1956). Science and information theory. Academic Press: New York

Cornelius I (2002). Theorizing information science. Annual Review of Information Science and Technology 2002, 393-425

Crick F (1988). What mad pursuit. A personal view of scientific discovery. Penguin: London

Floridi L (2010). Information: a very short introduction. Oxford University Press: Oxford

Gilchrist A (2009). Editorial. In: Information science in transition. Facet: London

Hartley RVL (1928). Transmission of information. Bell System Technical Journal, vol 7, 535-563

Hjorland B (2011). The nature of information science and its core concepts. Paper presented at: Colloque sur l’épistémologie comparée des concepts d’information et de communication dans les disciplines scientifiques (EPICIC), Université Lyon3, April 8th 2011. Available from: http://isko-france.asso.fr/epicic/en/node/18

Meadow CT and Yuan W (1997). Measuring the impact of information: defining the concepts. Information Processing and Management, vol 33(6) 697-714

Meadows AJ (1987). Introduction. In: The origins of information science. Taylor Graham: London

Qvortrup L (1993). The controversy of the concept of information. Cybernetics and Human Knowing, vol 1(4) 3-24

Robinson L (2009). Information science: communication and domain analysis. Journal of Documentation, vol 65(4) 578-591

Schrödinger E (1944). What is life? The physical aspect of the living cell. Cambridge University Press: Cambridge

Shannon CE and Weaver W (1949). The mathematical theory of communication. University of Illinois Press: Urbana

Shera JH (1968). Of librarianship, documentation and information science. Unesco Bulletin for Libraries, 22(2) 58-65

Stonier T (1992). Beyond information. The natural history of intelligence. Springer-Verlag: New York

Stonier T (1990). Information and the internal structure of the universe. Springer-Verlag: New York

Szilard L (1929). Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Zeitschrift für Physik, vol 53, 840-856

Vickery B (2004). The long search for information. Occasional Papers no. 213. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign

Webber S (2003). Information science in 2003: a critique. Journal of Information Science, vol 29(4) 311-330

Wiener N (1948). Cybernetics: or control and communication in the animal and the machine. Wiley: New York

Zunde P (1981). Information theory and information science. Information Processing and Management, vol 17(6) 341-347

Core Collections

Collections are anarchic: they exist for, and are defined entirely by, their own purpose; they have their own identity and exude individuality.

Collecting is not just about acquiring everything. Collections can be very small, representative (core) rather than comprehensive. Collecting is about making connections, considering relationships and rearranging until the collected items sit in the ‘right’ order – this latter being open to interpretation. When looking at a collection, it is possible to gauge how one thing relates to another, to see where there is duplication, and where there are omissions. Good examples make themselves known, as do poorer contributors. Collections are aesthetically pleasing to behold, and they exude a calming stability in a world of dizzying change. Although as living entities, collections may have to be restructured and reinterpreted over time, at least at any one moment, collections are finite and thus comprehensible. I have made collections from just about all of the things I possess. I seek out items I gave away years ago in order to ease the pain of the ‘gap in the collection’. Books, of course, but also old magazines, items of glass, crockery and anything from Liberty’s. I am immediately interested in someone who has a collection of their own. The whole is always greater than the sum of the parts, and collections tell us things that isolated items cannot.

Which leads me to core listings, and resource lists. These surrogate collections furnish us with a manageable format via which to comprehend the real thing – they are often used to facilitate assembly of an actual collection, and they have traditionally been valued for their integrity and the effort expended over their construction.

The creation of resource listings is one of the fundamental areas within library and information science, (one of Hjorland’s1 eleven aspects of domain analysis) and yet very little is written about their construction – there is no definitive way for them to be produced. Nor, I am rather sorry to say, is it entirely clear today that listings of any sort are valued in the face of ‘go google’ – the land of instant lists.

I have a longstanding interest in listings. I started my career thinking about the ideal (comprehensive) listing of toxicology resources, at a time when it was feasible to contemplate such a thing. Once the Internet rendered our world global instead of local, any hope of a comprehensive resource listing within any subject area vanished. Instead we were left with the representative listing, the expert listing, the popular listing or the ‘here are some resources you could try’ listing.

As part of my doctoral research, I scoured the literature for methodologies relating to the compilation of resource lists and subject guides. In response to the void, I postulated (Robinson2) that ideal resource lists could be created by locating items via a cascade style of hierarchical searching; one would search first for lists of lists (quaternary resources), then for lists (tertiary resources), then for value-added resources (secondary resources) and finally for the first instantiation of a work within the literature (primary resources). I called this hierarchy the ‘fundamental framework of resources’.

The idea was academic, because in the real world resource creators do not adhere to my framework. They often create hybrid resources (a list containing other tertiary as well as secondary items, for example), and not all resources are entered into a higher resource (not all lists are listed in a list of lists, not every article is indexed in a database …), which makes a systematic cascade impossible to follow. It would be great if we had a single world list of lists for every subject – a bit like the gopher system – alas a distant memory. Nonetheless, I concluded that searching systematically, across databases, the internet and within ‘level specific’ resources, for resources within each of the four categories, would result in a representative listing of resources within any area.
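For illustration only, that cascade can be sketched as a simple top-down loop. The level names follow the ‘fundamental framework of resources’, but the search function here is a hypothetical placeholder; as noted, no real systematic interface to the four levels exists:

```python
# The four levels of the 'fundamental framework of resources', searched
# top-down so that lists of lists lead to lists, and so on down to the
# first instantiation of a work in the literature.
RESOURCE_LEVELS = [
    "quaternary",  # lists of lists
    "tertiary",    # lists
    "secondary",   # value-added resources: indexes, databases
    "primary",     # first instantiation of a work
]

def cascade_search(subject, search_level):
    """Search each level in turn, accumulating candidate items.

    `search_level` is any callable (subject, level) -> list of items;
    in practice each level needs its own sources and search strategy.
    """
    found = []
    for level in RESOURCE_LEVELS:
        found.extend(search_level(subject, level))
    return found

# Example with a stub search function standing in for real searching:
stub = lambda subject, level: [f"{subject} {level} resource"]
print(cascade_search("toxicology", stub))
```

The hybrid resources and gaps described above are exactly why the real-world version of this loop breaks down: an item missing from its ‘parent’ level never surfaces in the cascade.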

The problem, as exemplified by my attempt to create a toxicology listing, was one of overload. There were simply too many resources to make even the term ‘representative’ an obvious way to go on its own. It thus became necessary to apply some selection criteria, whereby an item was included in the listing if it was the only one of its kind, or it was exemplary in some way. Me-toos were cut out, so that the list offered good examples of resources from categories such as books, journals, library collections and databases. This particular piece of work was carried out a decade ago; today we are faced with many more modes of dissemination. And a much bigger task.

So how then, to create resource listings in 2010? My interest has been stirred by joining the working group behind the creation of an updated edition of the Core Collection of Medical Books, under the auspices of CILIP’s Health Libraries Group. The last Core Collection3 was published by Tomlinsons in 2006.

The idea is to stick to a listing that just covers books, in order to bring the project within a manageable framework, but even then we come up against the issue of e-books and electronic access. The group favours including only works available in print, although some of these may be available as electronic editions, and in time, it may be that some works are missed if they are only available in electronic format. It was thought that this decision could be revised for a future edition.

Other considerations were the intended audience, previously stated as small to medium libraries, and the level of texts to include. It was considered that any constraints on potential audience should be removed, and that even though the listing would have a UK focus, it might be helpful to libraries internationally. The size of the listing was considered, currently around 1000 items, and the method of publication – another printed edition was favoured unanimously, but the group are using LibraryThing to solicit new items to be considered for inclusion, and comments on items in the existing list. It was suggested that if items received no comments they should be removed, but this was left undecided, as ‘no comment’ may not mean that an item should no longer be considered core. A date of no earlier than 2005 was mooted as a limit for publication date, as medicine progresses rapidly and texts date quickly. It was thought that this would be waived in a few cases where older texts are still believed to be valid (in psychotherapy for example).

Finally, we considered the methodology for creating the listing, which for now is constructed from the last listing, plus any additions sent in by volunteer LIS workers contacted largely via lis-medical. This method was agreed to be limited, and an extension to the deadline for comments/submissions was proposed so that more LIS professionals could be asked to contribute (CHILL and UHMLG members). The group also conceded that input from clinicians would be valuable, although probably time-consuming to extract. I raised the issue of systematic searching by expert LIS staff within each category, but this was perhaps expecting too much time and effort from already time-poor staff. It was felt that an expert eye (an LIS professional in our group) should assess the entire list in order to identify any obvious gluts or gaps; this is quite an onerous task, which highlights the strong desire of the group to bring the core collection into being. The third part of the methodology would be editorial: checking the text for publication and so on.

The final aspect for consideration was the categorization used, or tags (the latter used with LibraryThing). The current tag list has been copied from the last edition, and the group will consider whether any changes should be made in the form of new tags or division of older joint tags such as ‘pharmacology and toxicology’ into separate headings. The tags have not been taken from any existing medical vocabularies, and the group has no plans to change this at the moment. It was decided, however, that we would not encourage free tagging, and that anyone suggesting an item for the list should use an existing tag.

The ease with which LibraryThing can be updated and maintained raises the question of whether the list needs to be finite at all, as theoretically new suggestions can be added at any time – though there is then the question of a mechanism for editorial control.

So, the group intends to make a final call for comments and additions, whilst the group lead will look over all the entries to identify subjects (tags) where input is needed. We will meet again in the new year to consider our final material, and our options for producing a printed version, which despite the ready availability of the core collection on LibraryThing, was felt to be highly desirable.

Two companion works are already available; the Nursing Core Collection4 and the Mental Health Core Collection5. Further details can be found on the CILIP HLG website, from the link above.

It was my pleasure to meet a group of like-minded collection and resource list lovers, and I wholeheartedly admire their dedication to this project.

References

1) Hjorland B (2002). Domain Analysis in Information Science: eleven approaches, traditional as well as innovative. Journal of Documentation, vol 58 (4) 422-462

2) Robinson L (2000). A Strategic Approach to Research using Internet Tools and Resources. ASLIB Proceedings, vol 52 (1), 11-19

3) CILIP Health Libraries Group (2006). Core Collection of Medical Books 2006 5th edition. Tomlinsons

4) CILIP Health Libraries Group (2010). Nursing Core Collection 2010 4th edition. Tomlinsons.

5) CILIP Health Libraries Group (2009). Mental Health Core Collection 2009 2nd edition. Tomlinsons