Optimal Chunking for Ontology RAG: Empirical Analysis & Orphan Axiom Problem
semanticweb.org·8h·

Retrieval-Augmented Generation (RAG) systems require effective chunking strategies to segment knowledge into retrievable units. While text-based chunking (word, sentence, paragraph boundaries) is well-studied for documents, ontologies present unique challenges due to their semantic structure. This study empirically evaluates 10 chunking strategies (4 text-based and 6 OWL-aware) on a legal-domain ontology, measuring similarity scores, answer quality, retrieval consistency, and computational costs.

In my small-scale, independent experiments, I discovered the "Orphan Axiom Problem": 93.8% of axioms in my test ontology were non-hierarchical (individuals, properties, annotations), causing traditional OWL-aware strategies to produce highly unbalanced chunks. In these te...
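To make the imbalance concrete, here is a minimal sketch of how a hierarchy-driven chunker might end up with one oversized catch-all chunk. It is not the implementation used in the experiments: the toy axioms, the `HIERARCHICAL` set (which assumes only `rdfs:subClassOf` counts as hierarchical), and the helper names `orphan_ratio` and `chunk_by_hierarchy` are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy axioms as (subject, predicate, object) triples.
# These are illustrative only and are not taken from the test ontology.
AXIOMS = [
    ("Contract", "rdfs:subClassOf", "LegalDocument"),
    ("Statute", "rdfs:subClassOf", "LegalDocument"),
    ("contract_001", "rdf:type", "Contract"),             # individual
    ("contract_001", "hasParty", "acme_corp"),            # property assertion
    ("contract_001", "rdfs:label", "Supply agreement"),   # annotation
    ("acme_corp", "rdf:type", "Organization"),            # individual
    ("hasParty", "rdfs:domain", "Contract"),              # property axiom
    ("statute_17", "rdfs:comment", "Governs supply contracts"),  # annotation
]

# Assumption: only subclass axioms are treated as hierarchical.
HIERARCHICAL = {"rdfs:subClassOf"}


def is_hierarchical(axiom):
    """Return True if the axiom's predicate places it in the class hierarchy."""
    return axiom[1] in HIERARCHICAL


def orphan_ratio(axioms):
    """Fraction of axioms that have no place in the class hierarchy."""
    if not axioms:
        return 0.0
    orphans = sum(1 for a in axioms if not is_hierarchical(a))
    return orphans / len(axioms)


def chunk_by_hierarchy(axioms):
    """Group subclass axioms by parent class; all other axioms share one chunk."""
    chunks = defaultdict(list)
    for axiom in axioms:
        key = axiom[2] if is_hierarchical(axiom) else "orphans"
        chunks[key].append(axiom)
    return dict(chunks)


if __name__ == "__main__":
    print(f"orphan ratio: {orphan_ratio(AXIOMS):.1%}")
    for name, members in chunk_by_hierarchy(AXIOMS).items():
        print(f"{name}: {len(members)} axioms")
```

In this toy set, six of the eight axioms land in the single `orphans` chunk while the hierarchy-based chunk holds only two, which is a small-scale version of the unbalanced chunks described above.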
