Full history here

Using rdf2vec on geospatial graphs

As I wrote earlier, I’m investigating machine learning strategies for geospatial tabular data, known in the GIS field as geospatial ‘vector data’. In theory, features can be learned from their surrounding context by constructing a geospatial graph of neighbouring features. We discussed the use of dnc and concluded that it is probably too early to apply such a novel but complex, and probably hardware-intensive, approach.

Instead, we are now considering rdf2vec. The code is here

Description of the task

The problem to be solved is as follows. The national Cadastre collects user reports of errors in its data. In theory, a neural net could be trained to pre-select geospatial data that has a high probability of being faulty, using the reports already generated by the crowd as training data. After training, the net can predict for an individual geospatial object whether some property of that object is erroneous.

Approach

Rdf2vec constructs graph embeddings based on random walks. The idea is to feed it spatial data in which neighbouring spatial objects are interlinked through reified information about their distance and angle. The neighbourhood is bounded by a parameter: either the maximum number of nearest neighbours or the maximum distance. Which of the two to use is still to be decided.
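To make the idea concrete, here is a minimal sketch in plain Python of how such a neighbour graph and its random walks could look. It is not the rdf2vec code itself. The sample objects, the predicate names (hasNeighbourLink, target, distanceBin, angleBin) and the binning of distance and angle into coarse tokens are all assumptions made for illustration. Both candidate parameters, the maximum number of neighbours (k) and the maximum distance (max_dist), are supported.

```python
import math
import random
from collections import defaultdict

# Hypothetical sample objects: id -> (x, y) in a projected CRS, in metres.
objects = {
    "road_1": (0.0, 0.0),
    "cyclepath_1": (10.0, 0.0),
    "building_1": (0.0, 20.0),
    "water_1": (50.0, 50.0),
}


def bearing_deg(a, b):
    """Angle from a to b in degrees, clockwise from north (the +y axis)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def neighbour_triples(objs, k=2, max_dist=None):
    """Link each object to at most k nearest neighbours (optionally within
    max_dist) via a reified link node that carries distance and angle."""
    triples = []
    for src, p in objs.items():
        others = sorted(
            (math.dist(p, q), dst) for dst, q in objs.items() if dst != src
        )
        if max_dist is not None:
            others = [(d, dst) for d, dst in others if d <= max_dist]
        for d, dst in others[:k]:
            link = f"link_{src}_{dst}"
            # Assumption: bin continuous values (10 m, 45 degrees) so that
            # they recur as shared tokens in the walks.
            dist_bin = f"dist_{int(d // 10) * 10}m"
            angle_bin = f"angle_{int(bearing_deg(p, objs[dst]) // 45) * 45}"
            triples += [
                (src, "hasNeighbourLink", link),
                (link, "target", dst),
                (link, "distanceBin", dist_bin),
                (link, "angleBin", angle_bin),
            ]
    return triples


def random_walks(triples, start, n_walks=5, depth=4, seed=0):
    """Random walks from start; edge labels are kept as tokens in the walk."""
    rng = random.Random(seed)
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))
    walks = []
    for _ in range(n_walks):
        node, walk = start, [start]
        for _ in range(depth):
            edges = out_edges.get(node)
            if not edges:  # dead end, e.g. a distance or angle token
                break
            pred, node = rng.choice(edges)
            walk += [pred, node]
        walks.append(walk)
    return walks


triples = neighbour_triples(objects, k=2)
walks = random_walks(triples, "road_1")
```

Each walk is a token sequence such as object, predicate, link node, predicate, object, and so on. Sequences of this kind could then be passed to a word2vec-style model to obtain one embedding per token.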

Data

The training set is obtained as open data from the national Cadastre: https://geodata.nationaalgeoregister.nl/bgtterugmeldingen/wms. The BGT is the large-scale topographical dataset, containing fine-grained information about spatial objects in the public environment. The data itself can be viewed here: browse down the list and select ‘Terugmeldingen’ (reports), then select ‘BGT Terugmeldingen [wms]’.

Example:

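Tijdstip_registratie (registration time): 2017-04-14T17:36:09.188
Omschrijving (description): Dit stukje Geiserlaan is een fietspad, geen lokale weg. (“This stretch of Geiserlaan is a cycle path, not a local road.”)
Status: Nieuw (new)
Meldings_nummer_as_text (report number): BGT00007895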

Questions

Does rdf2vec retain edge labels?
Can rdf2vec be accelerated using a GPU?
Is it reasonable to expect training results on a large dataset within, say, a week?
Many reports concern missing data. Can missing data actually be recognized by a neural network?

Resources

http://data.dws.informatik.uni-mannheim.de/rdf2vec/