URIs in data: An ideology-free analytic

1. Acknowledgements
2. The core argument
3. Narrowing the focus: URIs and HTTP
4. Narrowing the focus: extensions
5. Failures of interoperability 1: Aggregation
6. Failures of interoperability 2: Landing pages
7. Landing pages, cont'd
8. Towards a solution
9. Conclusion

1. Acknowledgements

I owe much of what I understand about the Web to Tim Berners-Lee, along with Dan Connolly, Harry Halpin and Larry Masinter

Brian Cantwell Smith has hugely influenced my approach to all things computational

Jonathan and Jeni, my co-authors, haven't vetted these slides, and so can't be held responsible for anything contained herein with which you disagree

2. The core argument

The foundational Web standards left the question of the 'meaning', or 'referent', of URIs (and of how we find it out) under-, if not altogether un-, specified
Berners-Lee knew what he meant, and his view is reflected in the Architecture of the World Wide Web (AWWW)
But underspecification amounts to an extension point
And other extensions, incompatible with Berners-Lee's, have emerged and been widely implemented
The effort by the W3C's Technical Architecture Group (TAG) to settle the question after the fact, essentially by endorsing Berners-Lee's extension, has not succeeded
Mutually incompatible extensions at the same extension point can lead to failures of interoperability
Machine-actionable documentation can restore interoperability without prejudice to at least two common extensions

3. Narrowing the focus: URIs and HTTP

The URIs this discussion focusses on are limited to those which are:

in the http: scheme
hashless
'retrieval-enabled':
- an HTTP GET request for the URI will result in an HTTP 200 response
- (barring connectivity problems, broadly construed)

Restricted, but this still covers a very substantial part of the usage of URIs in data

And the only HTTP operation we address is GET
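The three restrictions can be read as a simple membership test. The sketch below is illustrative only: `in_scope` and its `get_status` hook are hypothetical names, and the hook merely stands in for "do an HTTP GET and report the response status", so the check itself needs no network access.

```python
from typing import Callable
from urllib.parse import urlsplit


def in_scope(uri: str, get_status: Callable[[str], int]) -> bool:
    """True if `uri` is an http:, hashless, retrieval-enabled URI.

    `get_status` is a hypothetical hook standing in for an HTTP GET:
    it returns the status code of the response for `uri`.
    """
    if urlsplit(uri).scheme != "http":  # only the http: scheme
        return False
    if "#" in uri:                      # only hashless URIs
        return False
    return get_status(uri) == 200       # 'retrieval-enabled': GET yields 200


# Usage with a canned status table in place of a real network fetch
statuses = {"http://www.w3.org/People/Berners-Lee/": 200}
print(in_scope("http://www.w3.org/People/Berners-Lee/", lambda u: statuses.get(u, 404)))  # → True
```

Injecting the fetch keeps the scope question (which URIs?) separate from the connectivity question, which the talk sets aside.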

4. Narrowing the focus: extensions

By considering the use of URIs in data, we can exemplify two common usage patterns which reveal more-or-less covert allegiance to two distinct extensions

Without actually taking a stance on what any of the controversial words mean

Sometimes information in data involving a URI is evidently about what you can retrieve from that URI:

{           "@id": "http://www.w3.org/People/Berners-Lee/",
  "last modified": "2012-06-08" }

{           "@id": "http://www.websci13.org/files/2013/04/ht-317x370.jpg",
     "resolution": "72x72"}

[We use JSON for our examples throughout, and assume the widespread JSON-LD '@id' convention]

Sometimes information in data involving a URI is evidently about what is described/depicted by what you can retrieve from that URI:

{           "@id": "http://www.w3.org/People/Berners-Lee/",
       "birthday": "1955-06-08" }

{           "@id": "http://www.websci13.org/files/2013/04/ht-317x370.jpg",
        "surname": "Thompson"}

The key word here is 'evidently': it's evident to us, but not, at least not without help, to computers

5. Failures of interoperability 1: Aggregation

The value proposition of the Web:

URIs in data enable information aggregation
"No-one ever expects the network effect"

What happens if data based on two different extensions are aggregated?

{           "@id": "http://www.w3.org/People/Berners-Lee/",
       "birthday": "1955-06-08",
  "last modified": "2012-06-08" }

At best this is confusing; at worst it undermines reliable inference
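A minimal sketch of how the merged record arises, assuming an aggregator that naively merges records sharing an '@id'. The `aggregate` function is hypothetical; the two input records are the examples from the previous section.

```python
import json

# One record about what is retrieved, one about what it describes (from the examples)
about_retrievable = json.loads(
    '{"@id": "http://www.w3.org/People/Berners-Lee/", "last modified": "2012-06-08"}')
about_described = json.loads(
    '{"@id": "http://www.w3.org/People/Berners-Lee/", "birthday": "1955-06-08"}')


def aggregate(*records: dict) -> dict:
    """Naively merge records keyed on '@id' (hypothetical aggregator)."""
    merged: dict = {}
    for rec in records:
        merged.setdefault(rec["@id"], {}).update(rec)
    return merged


result = aggregate(about_described, about_retrievable)
# A single record now carries both 'birthday' and 'last modified'
print(json.dumps(result["http://www.w3.org/People/Berners-Lee/"]))
```

Nothing in the merged record tells a program that the two properties are about different entities.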

6. Failures of interoperability 2: Landing pages

First, some terminology for the case where data containing URIs is about whatever is described/depicted by what you can retrieve from them

Proxy page: The describing/depicting result of retrieval in such a case
Landing page: A proxy page which describes something (commonly an image) which is itself retrievable via URI
Retrievable: What you can retrieve from a URI

So, for example, http://www.w3.org/People/Berners-Lee/ is a proxy page for Berners-Lee, and http://www.w3.org/2011/05/w3cteam.html is a landing page for http://www.w3.org/2011/05/team-photo.jpg

The image and both pages are all retrievables

Berners-Lee is not a retrievable
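This terminology can be made concrete with a toy model. It is a simplification under stated assumptions: the classes and predicates are hypothetical, and here a page counts as a proxy page simply when it records a subject, whereas in the talk proxy status depends on how data uses the URI.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Thing:
    """Something that is not retrievable, e.g. a person."""
    name: str


@dataclass(frozen=True)
class Retrievable:
    """What you can retrieve from a URI."""
    uri: str
    # What it describes/depicts, if recorded in this toy model
    describes: Optional[Union["Retrievable", Thing]] = None


def is_proxy_page(r: Retrievable) -> bool:
    return r.describes is not None


def is_landing_page(r: Retrievable) -> bool:
    # A proxy page whose subject is itself retrievable via a URI
    return isinstance(r.describes, Retrievable)


tbl = Thing("Tim Berners-Lee")
photo = Retrievable("http://www.w3.org/2011/05/team-photo.jpg")  # subject left unrecorded
tbl_page = Retrievable("http://www.w3.org/People/Berners-Lee/", describes=tbl)
team_page = Retrievable("http://www.w3.org/2011/05/w3cteam.html", describes=photo)
```

Berners-Lee is a `Thing`, not a `Retrievable`, mirroring the last point above.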

7. Landing pages, cont'd

How do we (even we humans) understand data involving the URIs of landing pages?

When it includes properties which apply to retrievables?

For example

{           "@id": "http://www.w3.org/2011/05/w3cteam.html",
  "last modified": "2012-06-08" }

Does the information here apply to the landing page, or to the image?

This is not hypothetical: metadata found on, e.g., Flickr landing pages is often, but not always, about the image linked from that page rather than about the page itself

8. Towards a solution

Consider what we have seen in terms of relationships:

[Diagram relating URIs, retrievables, etc.]

ERf: entity retrieved/retrievable from
EDb: entity described/depicted by
Immediate property: supplies information about ERf(U), e.g. last modified
Shorthand property: supplies information about EDb(ERf(U)), e.g. birthday
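The two relations compose as plain functions. The sketch assumes hypothetical lookup tables standing in for the Web (ERf) and for what each retrievable describes or depicts (EDb); the string values are placeholder labels, not real representations. The function names mirror the notation above rather than Python naming conventions.

```python
# Hypothetical toy tables; the values are placeholder labels for entities
RETRIEVABLE_FROM = {
    "http://www.w3.org/People/Berners-Lee/": "TimBL's home page",
    "http://www.w3.org/2011/05/w3cteam.html": "W3C team landing page",
}
DESCRIBED_BY = {
    "TimBL's home page": "Tim Berners-Lee (the person)",
    "W3C team landing page": "W3C team photo (image)",  # itself retrievable
}


def ERf(uri: str) -> str:
    """Entity retrieved/retrievable from `uri`."""
    return RETRIEVABLE_FROM[uri]


def EDb(retrievable: str) -> str:
    """Entity described/depicted by `retrievable`."""
    return DESCRIBED_BY[retrievable]


def subject_of(uri: str, kind: str) -> str:
    """Which entity a property of the given kind is about, for URI `uri`."""
    if kind == "immediate":
        return ERf(uri)
    if kind == "shorthand":
        return EDb(ERf(uri))
    raise ValueError(f"unknown property kind: {kind}")


print(subject_of("http://www.w3.org/2011/05/w3cteam.html", "shorthand"))
```

For a landing page, EDb(ERf(U)) is the image, which is why metadata attached to a landing page URI is ambiguous until the property's kind is known.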

Machine-readable documentation stating whether each property is immediate or shorthand would enable interoperability, at least with respect to the two problems identified earlier
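A minimal sketch of how such documentation might be consumed, assuming a hypothetical `PROPERTY_DOCS` table that classifies each property; no existing vocabulary or format for this is implied. Applied to the aggregated record from Section 5, it separates the statements about ERf(U) from those about EDb(ERf(U)).

```python
# Hypothetical machine-readable property documentation
PROPERTY_DOCS = {
    "last modified": "immediate",  # about ERf(U)
    "resolution": "immediate",
    "birthday": "shorthand",       # about EDb(ERf(U))
    "surname": "shorthand",
}


def split_by_subject(record: dict, docs: dict = PROPERTY_DOCS):
    """Partition a record's properties by which entity they are about."""
    parts = {"immediate": {}, "shorthand": {}, "undocumented": {}}
    for prop, value in record.items():
        if prop == "@id":
            continue
        parts[docs.get(prop, "undocumented")][prop] = value
    return record["@id"], parts


uri, parts = split_by_subject({"@id": "http://www.w3.org/People/Berners-Lee/",
                               "birthday": "1955-06-08",
                               "last modified": "2012-06-08"})
print(parts["immediate"], parts["shorthand"])
```

Properties with no documentation are kept apart rather than guessed at, so both extensions can coexist in the same aggregated data.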

9. Conclusion

This talk has been an attempt at a proof-of-concept

To move the conversation around httpRange-14 away from turf wars over the meaning etc. of URIs
And towards a pragmatically-grounded acknowledgement that reasonable people do differ

And an introduction to an analysis of usage which provides the basis for an approach to interoperability based on documentation of properties
