Built a tiny C++ text chunker for Python

Semantic Text Chunker

Advanced semantic text chunking library that preserves meaning and context

Overview

This project provides a C++ library, with Python bindings via pybind11, for fast semantic chunking of large bodies of text. It aims to split text into coherent, context-preserving segments using semantic similarity, discourse markers, and section detection.
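To make the idea concrete, here is a minimal Python sketch of similarity-driven chunking. It is not the library's API or its actual algorithm. It assumes a toy bag-of-words cosine similarity in place of real semantic embeddings, a hypothetical `chunk_text` function, and a hypothetical `SHIFT_MARKERS` list of discourse markers. A boundary is placed wherever two adjacent sentences fall below a similarity threshold or a sentence opens with one of those markers. Section detection is left out.

```python
import math
import re
from collections import Counter

# Hypothetical discourse markers treated as topic-shift signals (illustrative only).
SHIFT_MARKERS = ("in contrast", "on the other hand", "meanwhile", "in conclusion")


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence: str) -> Counter:
    # Toy stand-in for a semantic embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk_text(text: str, threshold: float = 0.1) -> list[str]:
    sentences = split_sentences(text)
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        drifted = cosine(bag_of_words(prev), bag_of_words(cur)) < threshold
        marked = cur.lower().startswith(SHIFT_MARKERS)
        if drifted or marked:
            chunks.append([cur])    # boundary: open a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return [" ".join(c) for c in chunks]


if __name__ == "__main__":
    sample = ("Cats are small pets. Cats like to sleep. "
              "Stock markets fell today. Stock prices dropped sharply.")
    for chunk in chunk_text(sample):
        print(chunk)
```

On that sample the sketch returns two chunks, one about cats and one about stocks, because the two groups of sentences share no words. A real semantic chunker would replace the word counts with sentence embeddings, so that related sentences using different vocabulary can still stay together.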

Features

  • Semantic chunking of text based on meaning and context
  • Adjustable chunk size and coherence thresholds
  • Extraction of chunk details: coherence scores, dominant topics, section types
  • Python bindings for easy integration
  • Memory-safe C++ implementation with comprehensive error handling
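Adjustable chunk sizes and per-chunk details could be sketched as follows in Python. Everything here is a hypothetical illustration, not the library's data model. The `ChunkInfo` record and `chunk_sentences` function are invented names. Coherence is approximated as the mean Jaccard word overlap between adjacent sentences, and the dominant topic is simply the most frequent word of four or more letters. Section types are not modeled.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ChunkInfo:
    # Hypothetical per-chunk record; field names are illustrative only.
    text: str
    coherence: float     # mean Jaccard overlap of adjacent sentences
    dominant_topic: str  # most frequent word with 4+ letters


def words(sentence: str) -> set[str]:
    return set(re.findall(r"[a-z']+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def describe(sentences: list[str]) -> ChunkInfo:
    pairs = list(zip(sentences, sentences[1:]))
    # A single-sentence chunk is treated as fully coherent.
    coherence = (sum(jaccard(words(x), words(y)) for x, y in pairs) / len(pairs)
                 if pairs else 1.0)
    counts = Counter(w for s in sentences for w in re.findall(r"[a-z']{4,}", s.lower()))
    topic = counts.most_common(1)[0][0] if counts else ""
    return ChunkInfo(" ".join(sentences), coherence, topic)


def chunk_sentences(sentences: list[str], max_chars: int = 200,
                    min_similarity: float = 0.1) -> list[ChunkInfo]:
    chunks: list[ChunkInfo] = []
    current: list[str] = []
    for s in sentences:
        # Close the chunk if adding s would exceed the size limit or the topic drifts.
        # A single sentence longer than max_chars still forms its own chunk.
        too_long = bool(current) and len(" ".join(current + [s])) > max_chars
        drifted = bool(current) and jaccard(words(current[-1]), words(s)) < min_similarity
        if too_long or drifted:
            chunks.append(describe(current))
            current = []
        current.append(s)
    if current:
        chunks.append(describe(current))
    return chunks


if __name__ == "__main__":
    sents = ["Cats are small pets.", "Cats like to sleep.",
             "Stock markets fell today.", "Stock prices dropped sharply."]
    for info in chunk_sentences(sents):
        print(f"{info.dominant_topic}: {info.coherence:.2f} | {info.text}")
```

Lowering `max_chars` or raising `min_similarity` yields more, smaller chunks. A single-sentence chunk reports a coherence of 1.0 by convention.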

Installation

Prerequisites

  • C++17 compiler (GCC 7+, Clang 5+, or MSVC 2017+)
  • [pybin…
