Document stores for Big Linked Geodata

On Wednesday, March 22, 2017, I attended the Location Powers Big Linked Geodata meeting, held at Delft University of Technology in the Netherlands. It was something of a side event to an Open Geospatial Consortium Technical Committee meeting, but this Wednesday was open for registration to interested members of the general public. The meeting featured interesting talks from many capable speakers, and both the venue and the lunch were excellent.

One thing that struck me as odd, however, was the technological focus. As far as big linked geodata storage solutions went, the attention was mostly on scaling either ‘classic’ object-relational databases such as Oracle Spatial and PostGIS on the geospatial side, or SPARQL with its spatial extension GeoSPARQL on the Linked Data side. The OGC-proposed marriage of the spatial world and Linked Data resulted in the GeoSPARQL standard, a combination of a vocabulary for describing geospatial relations in Linked Data and an extension of the SPARQL standard for using spatial functions. Compared with the OGC/W3C Linking Geospatial Data conference I attended in London in 2014, the field has progressed enormously in these three years. I think Wouter Beek gave a nice overview of the maturity of GeoSPARQL: there are basically two products that seem to offer ‘production ready’ GeoSPARQL implementations, Oracle Spatial and Graph and Ontotext GraphDB. Back in 2014, not a single implementation had moved beyond the experimental phase.

However, I could not help but notice that the focus of the talks did not seem to get beyond GeoSPARQL. Especially with the ‘Big’ in Big Linked Geodata, GeoSPARQL does not seem to me a likely candidate for analyzing petabytes of Linked Geodata. The scalability of SPARQL is still improving, but as Wouter mentioned, the vendors of the GeoSPARQL implementations advise against putting up a public GeoSPARQL endpoint since no guarantees on their availability and stability under heavy load can be given. That is still aside from the ‘big’ aspect. To put it more bluntly in my own words, exposing a Big Linked Geodata SPARQL endpoint looks to me like infrastructural suicide and a very hefty infrastructure bill.

So I wondered out loud at the end of the day why other solutions that do support Linked Data and geospatial analysis were hardly considered. Where JSON-LD was mentioned, it was mostly as an exchange format rather than as a storage format. Why is there so little focus on geospatially enabled document stores as big data solutions? I think that we need to look more closely at leveraging JSON-LD as a scalable, performant and highly available geospatial Linked Data storage solution.
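To illustrate what JSON-LD as a storage format could look like, here is a minimal sketch in Python. The documents, the example.org identifiers and vocabulary term, the coordinates and the in_bbox helper are all assumptions made up for illustration; a plain list stands in for the document store, and a real store would answer the spatial filter from an index rather than by scanning.

```python
# A JSON-LD record is ordinary JSON: the @context maps plain keys to IRIs,
# so the stored document doubles as an RDF description.
# The example.org IRIs below are hypothetical placeholders.
CONTEXT = {
    "name": "http://schema.org/name",
    "geometry": "http://example.org/vocab#geometry",
}

delft = {
    "@context": CONTEXT,
    "@id": "http://example.org/place/delft",
    "name": "Delft",
    # GeoJSON-style point, longitude first (approximate coordinates).
    "geometry": {"type": "Point", "coordinates": [4.36, 52.01]},
}

london = {
    "@context": CONTEXT,
    "@id": "http://example.org/place/london",
    "name": "London",
    "geometry": {"type": "Point", "coordinates": [-0.13, 51.51]},
}

# Stand-in for a document store collection.
store = [delft, london]


def in_bbox(doc, west, south, east, north):
    """Return True if the document's point lies inside the bounding box."""
    lon, lat = doc["geometry"]["coordinates"]
    return west <= lon <= east and south <= lat <= north


# Rough bounding box around the Netherlands (illustrative values).
hits = [doc["@id"] for doc in store if in_bbox(doc, 3.0, 50.5, 7.5, 53.7)]
print(hits)
```

The point of the sketch is that nothing Linked Data specific is needed to store or filter these records: the @context travels with the document, and the geometry is just another JSON field.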

While Elasticsearch and MongoDB are rapidly gaining popularity by offering scalable and very complete geospatial functionality, on Wednesday it seemed as if document stores had been forgotten, along with the perspectives they may offer on Big Linked Geodata. In 2014, we embraced JSON-LD as a native RDF serialization, but I hear little of organisations shifting their attention from SPARQL to JSON document stores to build more stable, scalable and performant infrastructures. Is it due to the dogmas of the Linked Data community that SPARQL is still considered the single solution for creating a global web of data, or the famous Linked Open Data Cloud? Why does the SPARQL standard still only allow federation to other SPARQL endpoints when the web offers so much more, at much lower cost?
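To make the document-store side a little more concrete, here is a minimal sketch of how a bounding-box search could be phrased as a MongoDB-style filter document. $geoWithin and $geometry are MongoDB query operators; the field name "geometry", the helpers bbox_polygon and within_query, and the coordinates are assumptions for illustration. The query is only built here, not sent to a server.

```python
def bbox_polygon(west, south, east, north):
    """Return a GeoJSON Polygon covering a bounding box (lon/lat order)."""
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # GeoJSON rings repeat the first position at the end
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def within_query(field, polygon):
    """Build a filter matching documents whose `field` lies inside `polygon`."""
    return {field: {"$geoWithin": {"$geometry": polygon}}}


# Rough bounding box around the Netherlands (illustrative values).
query = within_query("geometry", bbox_polygon(3.0, 50.5, 7.5, 53.7))
print(query)
```

With a driver such as pymongo, a dictionary like this would typically be passed to a collection's find method; the stored documents can be JSON-LD records that carry a GeoJSON geometry in the queried field.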

This ties in neatly with a remark Rinke Hoekstra made in the 5th anniversary publication of the Platform Linked Data Netherlands (in Dutch only unfortunately, p. 49). One of the problems with the adoption of Linked Data is the unfounded assumption that special Linked Data storage infrastructure is mandatory: the native RDF triple or quad store with SPARQL as its main interface. With the assumption of ‘special infrastructure’ comes the downside of niche RDF solutions with a small user base. With JSON-LD, I would say, we have storage technology at our fingertips in the form of JSON document stores with a huge user base, offering a scalable, performant, cheap and highly available solution that includes the elusive stable big geospatial Linked Data functionality.

It is time to drop the Linked Data fixation on SPARQL and move on to more stable options.