An introduction to naming and reference on the Web

Henry S. Thompson
28 January 2012

1. Acknowledgements

I owe much of what I understand about the Web to Tim Berners-Lee, along with Dan Connolly, Harry Halpin, Jonathan Rees and Larry Masinter

Brian Cantwell Smith has hugely influenced my approach to all things computational

2. The Web demands our attention

To say the Web is ubiquitous, at least in the so-called developed world, is commonplace to the point of vacuity

Ubiquity alone doesn't require philosophical enquiry

What I aim to show you today is that we do need to study the mechanisms of the Web

3. Why focus on URIs?

URIs turn out to be something quite new

And getting agreement on what they really are has proved. . .challenging

Understanding them is a crucial part of understanding how to make the Web work better for everyone

Their very name proclaims them to be names

4. The official story of URIs

There is a moderately clear official consensus about URIs

'URI' stands for

Uniform
There's a standard generic syntax and (partial) semantics
Identifier
URIs are identifiers, that is, names (not addresses)
Resource
What it is they identify -- anything at all

5. URI syntax: some terminology

We will need to refer to parts of URIs as they are written:

sample URI with scheme (http), domain (www.ltg.ed.ac.uk) and path (/~ht/identity/ComponentGraph.svg)
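
As a minimal sketch, the same three parts can be picked out with Python's standard urllib.parse module. The full URI string below is an assumption: it is reassembled from the scheme, domain and path labelled in the sample.

```python
from urllib.parse import urlsplit

# Assumption: the sample URI, reassembled from its labelled parts.
uri = "http://www.ltg.ed.ac.uk/~ht/identity/ComponentGraph.svg"

parts = urlsplit(uri)
scheme = parts.scheme    # the part before "://"
domain = parts.hostname  # the domain name part
path = parts.path        # everything from the first "/" after the domain

print(scheme, domain, path)
```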

More examples:

6. Behind the scenes

Before we dive further into the details, let's try to bring what's hidden out into view

Put ILCC people page in a separate window & view source

What actually happens when someone clicks on a link to my home page:

  1. User clicks on a link in a page displayed by a Web browser
  2. Browser looks at URI behind the link and identifies http scheme
  3. Browser looks up domain name part of the URI via DNS and gets an IP address
  4. Browser sends HTTP GET message to port 80 at that address with the path part of the URI
  5. The operating system and Ethernet hardware of the browser's computer (the client)
    1. Disassemble the message into packets
    2. Send them via TCP/IP to a gateway
    3. Which routes them through the Internet to their destination
    4. Where they are reassembled into a message by the destination computer (the server) hardware and operating system
    5. And delivered to the computer program (the web server) listening on port 80
  6. Web server maps path onto its file system
  7. Server sends header specifying MIME type text/html plus file contents back to the client [see above wrt TCP/IP]
  8. Client dispatches response to browser
  9. Notice the URI appears in the address bar
  10. Browser interprets message body as HTML, displays page to user

We'll call the process embodied in steps 3–8 accessing the URI
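
The client side of the steps above can be roughly sketched in Python. This is an illustration under stated assumptions, not how any real browser is written: the function names build_get_request and access are hypothetical, only the http scheme is handled, and any query part of the URI is ignored. Packet disassembly and routing (step 5) are left to the operating system's socket layer.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(uri: str) -> tuple[str, int, bytes]:
    """Steps 2 and 4: identify scheme, domain and path, then build the GET message."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    host = parts.hostname
    port = parts.port or 80   # http uses port 80 unless the URI says otherwise
    path = parts.path or "/"  # an empty path is requested as "/"
    message = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, message


def access(uri: str) -> bytes:
    """Steps 3 to 8: DNS lookup, send the request, collect the raw response."""
    host, port, message = build_get_request(uri)
    ip_address = socket.gethostbyname(host)  # step 3: domain name to IP address
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(message)                # step 4; the OS takes care of step 5
        chunks = []
        while True:                          # read until the server closes
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)                  # status line, headers and body
```

Calling access needs a network connection; build_get_request can be tried on its own to see the message that would be sent.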

7. The nature of resources

Back to URIs

At first 'resource' seems like a vacuous label

But consider the word referent

So, being a resource is not about some intrinsic property of something

8. More about resources

Who gets to say what the resource is that a URI identifies?

We'll use the phrase "referent of a URI" for the resource which the minter of a URI intends for it to identify

Who is the owner (sometimes also called the 'minter')?

9. More about identification

In principle, all URIs are opaque

In practice, for most http URIs, this is false

10. Out of scope

Giving your URIs consistent names contributes to the utility of the Web as a shared information space

It's part of what is referred to as the social contract

11. How do I know what a URI identifies?

Now things start to get interesting

In the beginning, there was only one (legitimate) way

12. Ah, finally, something about web pages

Isn't that what URIs are really for?

13. Representations?

'Representation' names a pair: a character sequence and a media type.

Just as, in order to interpret utterances or inscriptions, we need to know the language they are expressed in

The result of a successful access of a URI consists mostly of a representation
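
A small sketch of the pair idea, assuming nothing beyond the definition above. The names Representation and describe are hypothetical, and the two media types are just convenient examples: the same character sequence calls for different interpretation depending on the media type it is paired with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """A representation as a pair: a character sequence and a media type."""
    content: str
    media_type: str


def describe(rep: Representation) -> str:
    """Choose an interpretation from the media type, not from the characters."""
    if rep.media_type == "text/html":
        return "interpret as HTML markup"
    if rep.media_type == "text/plain":
        return "interpret as plain text"
    return "no interpreter known for " + rep.media_type


same_characters = "<p>Hello</p>"
as_html = Representation(same_characters, "text/html")
as_text = Representation(same_characters, "text/plain")

print(describe(as_html))
print(describe(as_text))
```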

14. Representation to presentation

A browser goes beyond just retrieving a representation

15. Putting it all together

We can combine our story about representations with the behind-the-scenes narrative

illustration of HTTP request-response

Notice the HTTP/1.1 200 OK line in the response
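
To make the shape of such a response concrete, here is a hedged sketch that splits a response message into its status line, headers and body. The sample message and the name parse_response are assumptions made for illustration, and real responses have complications (chunked bodies, repeated headers) that this ignores.

```python
def parse_response(raw: str) -> tuple[int, str, dict[str, str], str]:
    """Split a raw HTTP response into status code, reason, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# Assumption: a minimal, made-up response for illustration only.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Hello</body></html>"
)
code, reason, headers, body = parse_response(raw)

# The media type and the body together make up the representation.
representation = (body, headers["content-type"])
```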

16. Resources vs. representations

Why make this distinction? Why isn't the web page itself the resource?

The distinction was barely there, if at all, in the earliest standard (RFC 1630)

Thereafter, the distinction emerged over time

17. Resources vs. representations, cont'd

From quite early on, as well, the idea arose of URIs without associated web pages

We'll return to this idea later on.

18. How can a resource lack a representation?

There are a number of un-interesting reasons why accessing a URI may fail to produce a representation

But there's a more interesting reason as well

19. Having a representation isn't universal

Here's the official story about representations

20. Having a representation, cont'd

A weather report is certainly an information resource

But what about a GET on http://cities.example.org/oaxaca?

So whatever response you get, it shouldn't be a 200 OK

But what should it be, if you have useful information to offer about Oaxaca?

two-resource (one a description of the other) diagram

21. Resources about vs. representations of

If what you have describes or depicts a resource, as opposed to conveying its essential characteristics

So the officially correct response is something like

HTTP/1.1 303 See Other
Location: http://cities.example.org/metadata/oaxaca.html

The browser interprets the 303 See Other to mean "Do a GET on the 'Location' URI instead"
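
A sketch of how a server might follow the official story, under loud assumptions: the tables DOCUMENTS and DESCRIPTIONS, the function respond and the page content are all hypothetical, and only the two example URIs come from the slides. Paths that name documents get a 200, while paths that name things with a description elsewhere get a 303.

```python
# Hypothetical server-side tables for cities.example.org.
# DOCUMENTS: paths that identify information resources the server can represent.
DOCUMENTS = {
    "/metadata/oaxaca.html": "<html><body>Facts about Oaxaca</body></html>",
}
# DESCRIPTIONS: paths that identify things (here a city) described elsewhere.
DESCRIPTIONS = {
    "/oaxaca": "http://cities.example.org/metadata/oaxaca.html",
}


def respond(path: str) -> tuple[str, dict[str, str], str]:
    """Return (status line, headers, body) for a GET on the given path."""
    if path in DOCUMENTS:
        # An information resource: send a representation of it.
        return "HTTP/1.1 200 OK", {"Content-Type": "text/html"}, DOCUMENTS[path]
    if path in DESCRIPTIONS:
        # Not an information resource: point at a description instead.
        return "HTTP/1.1 303 See Other", {"Location": DESCRIPTIONS[path]}, ""
    return "HTTP/1.1 404 Not Found", {}, ""


status, headers, body = respond("/oaxaca")
# A browser would now do a second GET on headers["Location"].
```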

22. Angels on the head of a pin?

Surely this is all a bit over the top?

Well, things and their descriptions are not the same

And when people started using URIs to make assertions (using RDF, on the Semantic Web)

23. Interim summary

URIs identify resources

Resources can be anything at all

Accessing a URI may yield a representation of (the current state of) its referent

Only information resources have representations

24. A glaring omission

What about the fact that information resources (Voltaire's Candide, Debussy's La Mer, today's issue of Le Monde, Minard's visualisation of Napoleon's Russian campaign) have parts?

Do we have to make up URIs for all their parts?

How can we do that?

First Brian will expand on this,

25. What's the problem, then?

That's all pretty much OK, isn't it?

No, it's not OK

In practice

26. A narrow view. . .

Not before time, I need to emphasise that there's a lot I've ignored so far

I'll branch out a bit with respect to the first point in a minute

27. Marxist interlude

Why did things go wrong?

The Web as the standards tried to capture it has not stood still

The Web is now more than a conduit for documents to read

The potential the Web offers for people to make money influences behaviour in ways and at a pace that no after-the-fact standard can hope to match

28. Web practice: web pages

Historically, URIs were mostly seen as simply the way you accessed web pages

Not any more

Furthermore, the relation between retrieved representation and observed presentation has changed enormously

The presentation a user experiences as a result of accessing a URI depends

29. Web practice: web pages, cont'd

For example, the representation you get by accessing the www.weather.com home page

Such a representation certainly captures all the 'essential characteristics' of whatever it is that that URI identifies

30. Web practice: URIs/The Semantic Web/Linked Open Data

The Semantic Web names an initiative and a Web-based technology

You could think of it as the marriage of Knowledge Representation and the Web

At its core is the idea of a web of assertions

31. Web practice: SemWeb URIs

One triple from that example:

Subject: http://www.example.org/index.html
Predicate: http://purl.org/dc/elements/1.1/creator
Object: http://www.example.org/members/1234

The subject is presumably meant to name an information resource

The growth of the Semantic Web

mean that huge numbers (at least billions) of such URIs are in use
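
For concreteness, the triple above can be held as a simple three-place record and written out in N-Triples style, where each URI sits in angle brackets and the statement ends with a full stop. The names Triple and to_ntriples are hypothetical; this is a sketch, not an RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: subject, predicate and object, each given as a URI."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Write a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


example = Triple(
    subject="http://www.example.org/index.html",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="http://www.example.org/members/1234",
)
line = to_ntriples(example)
print(line)
```

Nothing in the syntax distinguishes the subject URI, which presumably names a document, from the object URI, which names something else.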

32. SemWeb URIs: the bad news

Accessing one of those URIs will in practice almost always either

But per the official story accessing a URI which identifies a person should never result in a 200 OK response!

33. Does this matter?

We've identified a divergence between Web standard/theory and Web practice

Not only does this matter in principle

It also matters in practice

In other words, laws that everybody breaks bring the law into disrepute

34. What can be done?

We need to look hard at the things we missed before

35. Where can we look for help?

Although URIs are a new kind of identifier, they share some properties with other identifiers

Other disciplines have something to contribute as well

36. Taking presentation seriously

For the vast majority of web developers, it's all about what I've been calling presentation

Maybe we should make that more central in our theory

Here's a sketch of how we might start

Compose all three of these, and a browser implements a function from URIs to presentations (with dependencies on three kinds of state: browser, server and web): B(U) → π

People responsible for URIs know this

37. Theory of presentations, cont'd

B2 (and thus B) depends on Web state

In principle this recursion could fail to terminate
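
One way to picture the recursion, offered purely as an assumption since the component functions are not spelled out here: building a presentation may require accessing further URIs embedded in a representation, and those may lead back to where we started. The toy web state WEB and the function present are hypothetical; the guard on URIs already in progress is what stops the cycle.

```python
# Hypothetical toy Web state: URI -> (text, URIs embedded in the representation).
WEB = {
    "http://example.org/a": ("page A", ["http://example.org/b"]),
    "http://example.org/b": ("page B", ["http://example.org/a"]),  # leads back to A
}


def present(uri: str, web: dict, in_progress: frozenset = frozenset()) -> str:
    """Build a toy 'presentation' by recursively presenting embedded URIs."""
    if uri in in_progress:
        # Without this check, the A -> B -> A cycle would never terminate.
        return f"[cycle at {uri}]"
    if uri not in web:
        return f"[nothing at {uri}]"
    text, embedded = web[uri]
    inner = [present(u, web, in_progress | {uri}) for u in embedded]
    return text + "".join(" + " + p for p in inner)


result = present("http://example.org/a", WEB)
print(result)
```

Changing WEB changes the result for the same URI, which is one concrete sense in which the outcome depends on Web state.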

38. Presentations and persistence

The stability over time of B for some URI is then what people are talking about when they discuss persistence

We can outline some interesting classes of expected behaviour in these terms

  1. Once a particular instance of B is established, there is no expectation that the resultant presentation will change in any way
    • URIs which present as images, audio and video are often in this category
  2. Once a particular instance of B is established, there is an expectation that the resultant presentation will not change
    • but there is a recognition that errors may need to be corrected
    • URIs which present as some form of transcription of documents created outside the Web are typically in this category
    • as are W3C Recommendations
  3. URIs whose presentations change only in accordance with a published versioning policy
  4. A newspaper home page
  5. My home page
  6. A 'live' news blog

39. Resource, representations, alternatives

This space is much richer and more complex than the current theory allows for

The Information Scientists have struggled with the ontology of created works for a long time

40. The nature of the work of art

Information Scientists have developed a detailed story about all this

FRBR has a four-level ontology

41. FRBR and the Web

Not all of the FRBR architecture maps directly onto the Web situation

Some people in the IS community, notably Allen Renear, have made a start on this

42. Time-varying resources

Clearly there's a lot of it about

We looked at this briefly under the heading of persistence

The Philosophy of Language offers another possible way in

Consider English words such as this, here and tomorrow, as well as you and I.

On one well-thought-of Philosophy of Language account, an indexical such as now has

The meaning is fixed, the interpretation varies according to the context of use

43. Indexicals, cont'd

The meaning of now is something like "the time at which the utterance containing the word is made"

But the interpretation is, well, whatever time it happens to be when the word is used

More generally, the meaning of an indexical can be understood as a function from contexts (that is, contexts of utterance) to interpretations

In a formal approach to all this, indexical meaning really is just an index

The parallel with time-varying resources is clear
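
The function-from-contexts idea can be sketched directly. Everything named here (Context, meaning_of_now, meaning_of_i, todays_le_monde) is a hypothetical illustration: each meaning is a fixed function, and only the interpretation it returns varies with the context. The last function treats a time-varying resource, today's issue of Le Monde, in the same way.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Context:
    """A context of utterance (or, for a URI, a context of access)."""
    time: datetime
    speaker: str


def meaning_of_now(context: Context) -> datetime:
    """Fixed meaning of 'now': the time at which the utterance is made."""
    return context.time


def meaning_of_i(context: Context) -> str:
    """Fixed meaning of 'I': whoever is speaking."""
    return context.speaker


def todays_le_monde(context: Context) -> str:
    """A time-varying resource seen as an indexical: same meaning, new referent daily."""
    return "Le Monde, issue of " + context.time.date().isoformat()


monday = Context(time=datetime(2012, 1, 23, 9, 0), speaker="Henry")
tuesday = Context(time=datetime(2012, 1, 24, 9, 0), speaker="Brian")

print(todays_le_monde(monday))
print(todays_le_monde(tuesday))
```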

44. Local context

Arguably, much of the focus of Web architecture discussions to date has been misplaced

Words, after all, don't mean anything

They have to be used before we can talk about their meaning

Consider the following character sequence as if found on a piece of paper in an otherwise featureless bottle on a desert island beach:

chat

Conversation? Cat? Something else altogether? There's no way to tell

It's words in use that have meaning

45. Enter philosophy

One of the primary roles of "the philosophy of ..."

Within the web community, discussions about URIs use words such as 'identify' and 'denote', as well as 'name' itself

These are terms of art within the Philosophy of Language

46. Quibble from the philosophers

In natural language, names are easily discoverable

That is, we know what to call things

There is nothing corresponding to this for URIs

47. Back to local context

Ordinary language names function within specific linguistic contexts

What a name means depends on the details of the surrounding utterance or sentence

Consider what at first seems like a very ordinary (imaginary) name: EZY386 (short for easyJet 386)

Here are some example uses of this 'simple' name

  • EZY386 will depart from gate E17 at 2010 [announcement]
  • Just arrived on EZY386 [text message]
  • EZY386 flies from Stansted to Avalon
  • EZY386 is easyJet's 3rd most popular flight to Avalon
  • I prefer EZY386 to EZY387
  • EZY386 has an 102% on-time record
  • EZY386 was cancelled yesterday
  • EZY386 was delayed because of a problem with one of its engines

48. Not so unique. . .

So EZY386 isn't so simple after all

People are smart

None the less it might be that this kind of flexibility in the use and understanding of names would help us with our theories about URIs

49. Local context for URIs

So, for example, maybe the resource/representation distinction is like this

So apparently non-canonical behaviour gets brought back into line

For example, contrast these two local contexts:

From this perspective, maybe 200 OK should be more flexible

50. Elaborating context

To put this in a way that connects up with the discussion of indexicals

Something like this seems preferable to just throwing up our hands

51. Envoi

A whistle-stop tour of some key components of the Web

A handful of cases where these two don't line up

And finally some more-or-less wild-and-crazy suggestions

Importing an outside perspective into a complex space is easily dismissed

I hope at least some of the foregoing proves to be neither

52. References and further reading