The Open Knowledge Foundation (OKFN) held a London Open Data Meetup on the evening of the first day of the Linked Geospatial Data 2014 workshop. The event was held, as they themselves put it, at the amazing open-concept OKFN office at the Centre for Creative Collaboration in Central London. What could sound cooler? True, OKFN threw a good party, with its ever-engaging and charismatic founder Rufus Pollock presiding. Phil Archer noted, only half in jest, that OKFN was so influential and visible, with the ear of government and public alike, that it put W3C to shame.
Now, OKFN is a party in the LOD2 FP7 project, so I have met its people on and off over the years. In LOD2, OKFN is praised to the skies for its visibility, influence, and outreach, and sometimes, in passing, criticized for not publishing enough RDF, let alone five-star linked data.
As it happens, CSV rules, and even the W3C will, it appears, undertake to standardize a CSV-to-RDF mapping. As far as I am concerned, as long as there is no alignment of identifiers or vocabulary, it makes little difference whether a thing is CSV or exactly equivalent RDF, except that CSV is smaller and loads into Excel.
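To make that concrete, here is a minimal sketch in Python of the kind of direct mapping such a standard would describe: each row becomes a subject, each column a predicate. The sample data, base URI, and vocabulary namespace are all invented for illustration, and they are exactly the kind of unaligned identifiers the point is about.

```python
# A naive direct CSV-to-RDF mapping: one subject per row, one
# predicate per column. All names and URIs below are made up for
# illustration; nothing here aligns with anyone else's identifiers.
import csv
import io

SAMPLE = """id,name,population
1,London,8308369
2,Bristol,432500
"""

BASE = "http://example.org/city/"    # hypothetical subject namespace
VOCAB = "http://example.org/vocab#"  # hypothetical predicate namespace

for row in csv.DictReader(io.StringIO(SAMPLE)):
    subject = f"<{BASE}{row['id']}>"
    for column, value in row.items():
        if column != "id":
            # Emit one N-Triples statement per cell
            print(f'{subject} <{VOCAB}{column}> "{value}" .')
```

The output is syntactically RDF, but with made-up URIs it links to nothing, which is the point: the format change alone buys little without shared identifiers or vocabulary.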
For OKFN, which has a mission of opening data, insisting on any particular format would just hinder the cause.
What do we learn from this? OKFN is praised not only for government relations but also for developer friendliness. Lobbying for open data is something I can understand, but how do you do developer relations? This is not like talking to customers, where the customer wants to do something and it is usually possible to advise on how our technology can serve the purpose.
Are JSON and MongoDB the key? A renowned database guy once said that to be with the times, JSON is your data model, Hadoop your file system, MongoDB your database, and JavaScript your language; failing this, you are an old fart, a legacy suit, in short, an uncool fossil.
The key is not limited to JSON. More generally, it is zero time to some result and no learning curve. Some people will sacrifice almost anything for this, such as the possibility of doing arbitrary joins. People will even write code, even lots of it, as long as it is in their framework of choice.
Phil again deplored the early fiasco of RDF messaging. "Triples are not so difficult. It is not true that RDF has a very steep learning curve." I would have to agree. The earlier gaffes of the RDF/XML syntax and the infamous semantic web layer cake diagram now lie buried and unlamented; let them be.
Generating user experience from data or schema is an old mirage that has never really worked out. The imagined gain from eliminating application writing has, however, continued to fascinate IT minds, and attempts in this direction have never ceased. The lesson of history seems to be that coding is not to be eliminated, but that it should have a fast turnaround time and immediately visible results.
And since this is the age of data, databases should follow this lead. Schema-last is a good point; maybe adding JSON alongside XML as an object type in RDF would not be so bad. There are already XML functions, so why not the analog for JSON? Just don't mention XML to the JSON folks...
How does this relate to OKFN? Well, in the first instance this is the cultural impression I received from the meetup, but in a broader sense these factors are critical to realizing the full potential of OKFN's successes so far. OKFN is a data opening advocacy group; it is not a domain-specific think tank or special interest group. The data owners and their consultants will do analytics and even data integration if they see enough benefit in this, all in the established ways. However, the widespread opening of data does create possibilities that did not exist before. Actual benefits depend in great part on constant lowering of access barriers, and on a commitment by publishers to keep the data up to date, so that developers can build more than just a one-off mashup.
True, there are government users of open data, since there is a productivity gain when the neighboring department's data is already open to a point: one no longer has to go through red tape to gain access to it.
For an application ecosystem to keep growing on a base of tens of thousands of very heterogeneous datasets coming into the open, continuing to lower barriers is key. This is a very different task from making ever-faster databases or optimizing a particular business process, and it demands different thinking.