Distributed Storage and Processing of Knowledge Graphs

Research area: Knowledge Representation, Data Lake
Description: A knowledge graph (KG) represents real-world entities and the relationships between them. The knowledge represented in this way is often context-dependent, which leads to the construction of contextualized KGs. Because context is multidimensional and hierarchical, the cube model of online analytical processing (OLAP) is a natural fit for representing contextualized KGs. Traditional OLAP systems use cube models to represent numeric values, which are then analyzed with dedicated query operations. Knowledge Graph OLAP (KG-OLAP) adapts the OLAP cube model to KGs. In KG-OLAP, the roll-up operation of traditional OLAP is decomposed into a merge operation and an abstraction operation. The merge operation combines knowledge selected from different contexts into a more general context, whereas the abstraction operation replaces entities with more general entities. The result of such a query is a more abstract, higher-level view of the contextualized KG.
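The two steps of the decomposed roll-up can be illustrated with a minimal, single-machine sketch in plain Python. It assumes a deliberately simplified model: a context is a tuple of dimension members, each context holds a set of (subject, predicate, object) triples, merge rolls every dimension up by exactly one level, and abstraction is a plain entity-to-entity mapping. The data, the hierarchies and the names `merge`, `abstract`, `CONTEXTUAL_KG`, `LOCATION_PARENT`, `DATE_PARENT` and `GENERALIZATION` are hypothetical; the sketch does not reproduce the operators as formally defined for KG-OLAP or any existing implementation.

```python
from collections import defaultdict

# Hypothetical toy data: each context is a (location, date) tuple and maps
# to a set of (subject, predicate, object) triples.
CONTEXTUAL_KG = {
    ("Vienna", "2024-01-01"): {("flight1", "usesRunway", "rwy16")},
    ("Linz", "2024-01-02"): {("flight2", "usesRunway", "rwy26")},
}

# Hypothetical one-level roll-up hierarchies, one per dimension: member -> parent.
LOCATION_PARENT = {"Vienna": "Austria", "Linz": "Austria"}
DATE_PARENT = {"2024-01-01": "2024-01", "2024-01-02": "2024-01"}

# Hypothetical mapping from individual entities to more general entities.
GENERALIZATION = {"flight1": "Flight", "flight2": "Flight",
                  "rwy16": "Runway", "rwy26": "Runway"}


def merge(kg, parents):
    """Roll each context up one level per dimension and union the triples
    of all contexts that end up in the same coarser context."""
    merged = defaultdict(set)
    for context, triples in kg.items():
        # Members without a parent in the hierarchy are kept unchanged.
        coarser = tuple(parent.get(member, member)
                        for parent, member in zip(parents, context))
        merged[coarser] |= triples
    return dict(merged)


def abstract(kg, generalization):
    """Replace subjects and objects by more general entities; triples that
    become identical collapse into one because each context holds a set."""
    return {
        context: {(generalization.get(s, s), p, generalization.get(o, o))
                  for s, p, o in triples}
        for context, triples in kg.items()
    }


if __name__ == "__main__":
    rolled_up = merge(CONTEXTUAL_KG, (LOCATION_PARENT, DATE_PARENT))
    print(abstract(rolled_up, GENERALIZATION))
    # {('Austria', '2024-01'): {('Flight', 'usesRunway', 'Runway')}}
```

Storing each context as a set makes the deduplication after abstraction implicit. In a GraphX implementation, entities would presumably become vertices and triples edges, so abstraction would amount to relabelling vertices and removing the resulting duplicate edges across a distributed graph; working out how to do this efficiently is the kind of question the thesis addresses.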

The goal of this thesis is to provide an algorithmic implementation of some of the KG-OLAP query operators using GraphX, the graph-processing API of Apache Spark, a high-performance distributed framework for data analysis.
Literature
  • http://www.semantic-web-journal.net/system/files/swj2269.pdf
  • http://kg-olap.dke.uni-linz.ac.at/
  • https://spark.apache.org/graphx/
Contact person: Christoph Schütz