HBase Graph for Jena
HBase is an open-source, distributed, highly scalable, column-oriented database developed under the Apache Software Foundation. It is modeled after Google's Bigtable, which runs on top of the Google File System; in the same way, HBase runs on top of Hadoop and stores its data in HDFS. HBase is designed to hold very large tables distributed across a cluster of machines. The HBase extension for Jena uses these distributed capabilities so that large RDF graphs can be created and stored in Jena. The extension covers the following aspects:
Extend the Jena framework with HBase capabilities:
This extension makes it possible to create a Jena model that is backed by HBase. The idea is to map Jena's triple store structure onto an HBase table. To do this, every triple added to the RDF graph is indexed three times when it is stored: once under its subject, once under its predicate, and once under its object. Each index entry is stored as a row in the table that is keyed by the index value and contains the triple. For reification, a reified statement node and its corresponding triple are transformed into the four standard reification triples (a reification quad) and stored as four rows in the table.
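The indexing scheme above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the extension's actual code. An in-memory `TreeMap` stands in for the HBase table, since HBase likewise keeps rows sorted by key. The class name `TripleTableSketch` and the `S|`/`P|`/`O|` row-key prefixes are hypothetical, and nodes are plain strings rather than Jena `Node` objects. The `reify` method only builds the four reification triples. It does not model how the extension lays them out as rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for the HBase table (HBase also keeps
// row keys sorted). The class name and key prefixes are hypothetical.
class TripleTableSketch {

    record Triple(String s, String p, String o) {}

    // row key -> triples stored in that row
    final Map<String, List<Triple>> rows = new TreeMap<>();

    // Index the triple three times: under its subject, its predicate and its
    // object. The "S|", "P|", "O|" prefixes are an assumed way to keep the
    // three indexes apart inside one table.
    void add(Triple t) {
        put("S|" + t.s(), t);
        put("P|" + t.p(), t);
        put("O|" + t.o(), t);
    }

    private void put(String rowKey, Triple t) {
        rows.computeIfAbsent(rowKey, k -> new ArrayList<>()).add(t);
    }

    // Standard RDF reification: the statement node describes the original
    // triple with four triples (type, subject, predicate, object).
    static List<Triple> reify(String node, Triple t) {
        return List.of(
                new Triple(node, "rdf:type", "rdf:Statement"),
                new Triple(node, "rdf:subject", t.s()),
                new Triple(node, "rdf:predicate", t.p()),
                new Triple(node, "rdf:object", t.o()));
    }

    public static void main(String[] args) {
        TripleTableSketch table = new TripleTableSketch();
        Triple t = new Triple("ex:alice", "foaf:knows", "ex:bob");
        table.add(t);
        System.out.println(table.rows.keySet());
        System.out.println(reify("ex:stmt1", t).size());
    }
}
```

Running `main` prints `[O|ex:bob, P|foaf:knows, S|ex:alice]` followed by `4`. One triple yields three index rows, and one reified statement yields four triples. Storing three copies trades storage space for lookups: whenever any one position of a pattern is known, the matching triples sit in a single row.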
Ability to query the extended HBase model:
The extended model uses Jena's ARQ query engine to break SPARQL queries down into simple find operations (triple-pattern matches) on the RDF graph, which are then executed against the HBase table. As a result, the HBase extension supports both direct find operations on the RDF graph and SPARQL queries.
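The find operations that ARQ issues are triple patterns in which any position may be a wildcard (`Node.ANY` in Jena). The following plain-Java sketch shows how such a pattern could be answered against a three-way index. It makes the same kind of assumptions as an illustration, not the extension's implementation: a `TreeMap` replaces HBase, the class name `FindSketch` and the `S|`/`P|`/`O|` row-key prefixes are hypothetical, and nodes are strings with `null` as the wildcard. The sketch reads the single row for one bound position and then filters on the remaining bound positions. It falls back to scanning the subject rows only when nothing is bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for the HBase table; the class name and
// the "S|"/"P|"/"O|" row-key prefixes are hypothetical.
class FindSketch {

    record Triple(String s, String p, String o) {}

    // row key -> triples stored in that row
    final Map<String, List<Triple>> rows = new TreeMap<>();

    // Three-way indexing as described above: subject, predicate and object rows.
    void add(Triple t) {
        for (String key : List.of("S|" + t.s(), "P|" + t.p(), "O|" + t.o())) {
            rows.computeIfAbsent(key, k -> new ArrayList<>()).add(t);
        }
    }

    // Answer a triple pattern; null plays the role of Jena's Node.ANY wildcard.
    List<Triple> find(String s, String p, String o) {
        List<Triple> candidates;
        if (s != null) {
            candidates = rows.getOrDefault("S|" + s, List.of()); // single-row read
        } else if (p != null) {
            candidates = rows.getOrDefault("P|" + p, List.of());
        } else if (o != null) {
            candidates = rows.getOrDefault("O|" + o, List.of());
        } else {
            // Nothing bound: each triple appears exactly once among the subject rows.
            candidates = new ArrayList<>();
            for (Map.Entry<String, List<Triple>> e : rows.entrySet()) {
                if (e.getKey().startsWith("S|")) {
                    candidates.addAll(e.getValue());
                }
            }
        }
        // Keep only triples that also match the other bound positions.
        List<Triple> result = new ArrayList<>();
        for (Triple t : candidates) {
            if ((s == null || s.equals(t.s()))
                    && (p == null || p.equals(t.p()))
                    && (o == null || o.equals(t.o()))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        FindSketch g = new FindSketch();
        g.add(new Triple("ex:alice", "foaf:knows", "ex:bob"));
        g.add(new Triple("ex:bob", "foaf:knows", "ex:carol"));
        g.add(new Triple("ex:alice", "foaf:name", "\"Alice\""));
        System.out.println(g.find(null, "foaf:knows", null).size()); // 2
        System.out.println(g.find("ex:alice", null, "ex:bob"));      // one match
    }
}
```

Consider a SPARQL basic graph pattern such as `?x foaf:knows ?y . ?y foaf:name ?n`. Roughly, ARQ matches the first pattern with `find(null, "foaf:knows", null)`. Then, for each binding of `?y`, it looks up the second pattern with `?y` substituted, so every step becomes a single-row read in this layout.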
Future work on the HBase extension includes adding delete operations on a graph and the ability to update the contents of a node in the graph. Neither is supported in the current version.
Professors: Dr. Murat Kantarcioglu and Dr. Bhavani Thuraisingham
Student: Vaibhav Khadilkar