Locations that exhibit a certain interest or serve a certain purpose are commonly referred to as Points of Interest (POIs). The concept of a POI is quite broad, encompassing anything from a shop, restaurant or museum to an ATM or bus stop. POIs are complex entities that are characterized by their geospatial features (points, polygons) along with various other attributes and metadata indicating their name, type, functionality, etc., as well as their relations to each other (e.g., containment, part-of) with respect to spatial, temporal, and/or thematic contexts.
POI data are the cornerstone of any application, service, and product even remotely related to our physical surroundings. From navigation applications and social networks to tourism and logistics, we use POIs to search, communicate, decide, and plan our actions. A common misconception is that earth imagery (satellite, aerial) is the major market driver. However, the actual cartographic products and services are composed of three fundamental types of entities: roads, addresses, and POIs. Combined, these answer the most foundational algorithmic and cognitive queries: What is there? How do I get there? Where do I find something?
There are numerous available file formats for exchanging POI data, each of them supporting different schemas and metadata types for describing POIs: The GPS Exchange Format is supported by commercial vendors, such as Garmin or iGo. TomTom Overlay is a proprietary format for encoding POIs. The OziExplorer format focuses on off-road routes and areas and is supported by vendors like Garmin and Magellan. OpenStreetMap adopts an XML format to represent POIs as nodes (points) or ways (linestrings or polygons).
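As an illustration of the OpenStreetMap case, the following minimal sketch reads point POIs from an OSM XML fragment using only the Python standard library. The sample document, its coordinates and the function name `extract_point_pois` are assumptions made for this example; ways (linestrings or polygons) are deliberately left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical OSM XML fragment (IDs, coordinates and tags invented for illustration).
OSM_SAMPLE = """
<osm version="0.6">
  <node id="1" lat="37.9838" lon="23.7275">
    <tag k="name" v="Slipoville Hotel"/>
    <tag k="tourism" v="hotel"/>
  </node>
  <node id="2" lat="37.9840" lon="23.7280"/>
</osm>
"""


def extract_point_pois(osm_xml):
    """Return every tagged <node> as a dict; untagged nodes are skipped,
    since in OSM they usually serve only as vertices of ways."""
    root = ET.fromstring(osm_xml.strip())
    pois = []
    for node in root.iter("node"):
        # Each <tag k="..." v="..."/> child carries one key/value attribute.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if not tags:
            continue
        pois.append({
            "id": node.get("id"),
            "lat": float(node.get("lat")),
            "lon": float(node.get("lon")),
            "tags": tags,
        })
    return pois


for poi in extract_point_pois(OSM_SAMPLE):
    print(poi["id"], poi["tags"].get("name"), poi["lat"], poi["lon"])
```

Other formats would need their own readers and would expose different attribute sets, which is the schema diversity described above.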
The list of formats goes on; this diversity stems from technical and market restrictions that are now obsolete. The vast majority of formats were originally designed more than a decade ago, taking into account the specifications and limitations of Personal Navigation Devices (PNDs). Given the PND processing and storage capabilities of the time, POIs (and navigation data in general) were represented in a condensed way, which led to schemas lacking representation accuracy (e.g., a point instead of a polygon) and richness (e.g., limited sets of attributes). Further, the development of formats and datasets was dispersed among competing vendors, hindering the design of common formats.
Linked Data are a set of principles and best practices, based on Semantic Web standards (RDF(S), SPARQL), that prescribe and facilitate exposing, sharing and interconnecting resources in the Web of Data. The Resource Description Framework (RDF) is a language for representing information about resources on the Web. RDF intends to organize information in a machine-readable format, providing a common framework for expressing information so it can be exchanged between applications without loss of meaning. RDF statements are triples (subject, predicate, object) consisting of the resource being described (the subject), a property (the predicate), and a property value (the object). The RDF Schema (RDFS) is an extension of RDF designed to describe resources and/or relationships between resources, using a set of reserved words called the RDFS vocabulary. It provides constructs for the description of types of objects (classes), type hierarchies (subclasses), properties that represent object features (properties) and property hierarchies (sub-properties). The SPARQL Protocol and RDF Query Language (SPARQL) is the de facto query language for RDF data. The evaluation of SPARQL queries is based on graph pattern matching. The OGC GeoSPARQL standard defines a vocabulary for representing spatial features and geometries in RDF, and an extension to SPARQL for querying and processing geospatial data.
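To make the triple model and graph pattern matching concrete, here is a minimal sketch in plain Python. It is not an RDF library or a SPARQL engine: triples are simple tuples, and the `match` function only handles a conjunction of triple patterns. The `http://example.org/poi/` resources are hypothetical; the rdf:type, rdfs:label, geo:hasGeometry and geo:asWKT IRIs are the standard ones.

```python
# Hypothetical namespace for the example resources.
EX = "http://example.org/poi/"
# Standard RDF, RDFS and GeoSPARQL property IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_HAS_GEOMETRY = "http://www.opengis.net/ont/geosparql#hasGeometry"
GEO_AS_WKT = "http://www.opengis.net/ont/geosparql#asWKT"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (EX + "hotel1", RDF_TYPE, EX + "Hotel"),
    (EX + "hotel1", RDFS_LABEL, "Slipoville Hotel"),
    (EX + "hotel1", GEO_HAS_GEOMETRY, EX + "hotel1/geom"),
    (EX + "hotel1/geom", GEO_AS_WKT, "POINT(23.7275 37.9838)"),
    (EX + "atm7", RDF_TYPE, EX + "ATM"),
]


def is_var(term):
    """Variables are written like SPARQL variables, e.g. '?poi'."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield every variable binding under which all patterns occur in `triples`."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound differently.
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, extended)


# Roughly: SELECT ?name ?wkt WHERE { ?poi a ex:Hotel ; rdfs:label ?name ;
#          geo:hasGeometry ?g . ?g geo:asWKT ?wkt }
QUERY = [
    ("?poi", RDF_TYPE, EX + "Hotel"),
    ("?poi", RDFS_LABEL, "?name"),
    ("?poi", GEO_HAS_GEOMETRY, "?g"),
    ("?g", GEO_AS_WKT, "?wkt"),
]

for row in match(QUERY, TRIPLES):
    print(row["?name"], row["?wkt"])
```

A real SPARQL engine adds indexing, filters, optional patterns and, with GeoSPARQL, spatial functions over the geometry literals; none of that is attempted here.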
Pioneered by the FP7 project GeoKnow, Linked Data technologies have been applied to effectively maximize the value extracted from open, crowdsourced and proprietary Big Data sources. Validated in the domains of tourism and logistics, these technologies have proven their benefit as a cost-effective and scalable foundation for the quality-assured integration, enrichment, and sharing of general-purpose geospatial data.
In SLIPO, we argue that Linked Data technologies can address the limitations, gaps and challenges in integrating, enriching, and sharing POI data. Our goal is to transfer the research output generated by our work in the GeoKnow project to the specific challenge of POI data, introducing validated and cost-effective innovations across their value chain.
At first glance, POI data might not seem "Big". Google Places contains 100 million POIs, a number that is easily manageable by a modern GIS. However, POIs carry a wealth of thematic, spatial and temporal metadata that are either explicitly connected with each POI or can be implicitly extracted and attributed to it. Further, in several datasets, POIs carry their histories (versions), which may multiply the attributes related to a single POI. On top of that, applying Linked Data technologies to interlink and enrich POIs adds further attributes. If we account for the representation and storage overhead of Linked Data technologies, a single (linked and enriched) POI may require 50-100 triples (records) to store all its properties (metadata). This leads to billions of records!
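The estimate can be checked with a few lines of arithmetic. The sketch below simply multiplies the two figures quoted above (100 million POIs, 50 to 100 triples per POI); the function name `estimate_triples` is introduced only for this illustration.

```python
def estimate_triples(poi_count, triples_per_poi_range):
    """Return the (low, high) number of triples for a dataset of `poi_count` POIs."""
    low_per_poi, high_per_poi = triples_per_poi_range
    return poi_count * low_per_poi, poi_count * high_per_poi


POI_COUNT = 100_000_000      # figure quoted above for Google Places
TRIPLES_PER_POI = (50, 100)  # range quoted above for a linked and enriched POI

low, high = estimate_triples(POI_COUNT, TRIPLES_PER_POI)
print(f"{low:,} to {high:,} triples")  # 5,000,000,000 to 10,000,000,000 triples
```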
Another aspect of the problem is that some of the integration tasks, such as interlinking or clustering, are algorithmically and, thus, computationally challenging problems. As a result, conventional systems are not capable of executing these tasks on datasets of this size; Big Data architectures and novel, distributed algorithms are required.
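To see why interlinking is costly, consider that comparing every POI with every other one requires n(n-1)/2 comparisons, which is roughly 5 × 10^15 for 100 million POIs. The sketch below contrasts this naive pairing with a simple grid-based blocking step that only pairs POIs falling in the same cell. Blocking is shown here as one common way to reduce candidate pairs, not as the method used in SLIPO; the cell size and the sample coordinates are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def naive_pairs(pois):
    """All unordered index pairs: n * (n - 1) / 2 candidate comparisons."""
    return list(combinations(range(len(pois)), 2))


def grid_blocked_pairs(pois, cell_deg=0.01):
    """Pair only POIs in the same grid cell (hypothetical cell size in degrees).

    Note: this simple version misses matches that straddle a cell border;
    a fuller implementation would also inspect neighbouring cells.
    """
    cells = defaultdict(list)
    for idx, (lon, lat) in enumerate(pois):
        cells[(int(lon // cell_deg), int(lat // cell_deg))].append(idx)
    pairs = []
    for members in cells.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical (lon, lat) coordinates: two POIs close together in each of two cities.
POIS = [
    (23.7275, 37.9838),
    (23.7276, 37.9839),
    (2.3522, 48.8566),
    (2.3523, 48.8567),
]

print(len(naive_pairs(POIS)), "naive candidate pairs")
print(len(grid_blocked_pairs(POIS)), "candidate pairs after blocking")
```

Each surviving candidate pair would still need a detailed comparison (names, addresses, geometries), and at the dataset sizes discussed above that work has to be distributed across machines.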
Let’s consider a simple example involving an imaginary POI, the “Slipoville Hotel”. Once this POI was created, it was registered in the local Yellow Pages directory with some basic information (hotel name, address, telephone). In parallel, the hotel owner created and published a web site providing additional information about the hotel, such as photos, offered facilities and services, price ranges and directions on how to get there. After some time, the Yellow Pages record was picked up by a company collecting local POI data for creating a mobile city guide. The information from the Yellow Pages was combined with information from the hotel’s web site, and also augmented with additional content by the application owners, such as photos and reviews. At some point in time, the hotel was renovated and expanded, adding a car parking space behind the main building and opening a bar/restaurant in the lobby. After a period of time, these changes were detected and included in the mobile city guide application. Moreover, following the changes, any prior photos and reviews about the hotel had to be re-examined and updated accordingly. In the meantime, a geomarketing company collecting and analyzing POI data and location profiles incorporated the hotel information, both from its web site and the city guide, into its own POI database, combining it with additional knowledge about area demographics and customer profiles. This analysis was used by a mobile advertising application based on geo-fencing to push personalized advertisements to users visiting the area, as well as to arrange deals and offers with the hotel’s owners regarding their customers.
This example is a simple but realistic and indicative scenario showing how different companies, in different sectors and focusing on different purposes and services, collect, use and extend information about the same POI. It also points out the importance of volatile data in the POI profile, e.g., its facilities, prices, events, etc. Moreover, it brings up several cases of ambiguity that arise and lead to data integration challenges that these companies have to face throughout this process.
POI data might seem, at first glance, a narrow field. However, they constitute an extremely broad, cross-border, and cross-domain type of data, affecting several industrial and commercial areas, such as tourism, leisure and entertainment, geo-marketing, navigation software, and logistics.
The creation, update, and provision of POI datasets constitutes a multi-billion, cross-domain and cross-border industry, with a value chain natively incorporating most domains of our economy, from mobility and tourism to logistics and manufacturing. Advances in the timely and accurate provision of POIs result in significant direct and indirect gains throughout our economy. Productivity gains, optimization of value chains, match-making of consumers with goods and service providers, and new value-added products are just a few examples. POI data are truly one of the foundations and value multipliers of our Digital Economy.