Bye-bye Bridges: Direct Java Support for Docling is Now a Reality
Official Java Support for Docling Has Landed!
Just days after a post about a third-party Java implementation by Arconia-io, the Docling Project officially rolled out its own Java API! This is the game-changer developers have been waiting for: a direct, officially supported integration that lets you consume Docling services from your Java applications without third-party bridges or workarounds.
Which features and services are already implemented?
The current implementation offers the following features:
- Parsing of multiple document formats incl. PDF, DOCX, PPTX, XLSX, HTML, WAV, MP3, VTT, images (PNG, TIFF, JPEG, …), and more
- Advanced PDF understanding incl. page layout, reading order, table structure, code, formulas, image classification, and more
- Unified, expressive DoclingDocument representation format
- Various export formats and options, including Markdown, HTML, DocTags and lossless JSON
- Local execution capabilities for sensitive data and air-gapped environments
- Plug-and-play integrations including LangChain4j
- Extensive OCR support for scanned PDFs and images
- Support of several Visual Language Models (GraniteDocling)
- Audio support with Automatic Speech Recognition (ASR) models
Implementation
The landing page of the repository includes a code sample that demonstrates basic usage.
Note: To run the samples or your own code in your own environment, you'll need a Docling server running locally.
import java.net.URI;

import ai.docling.api.DoclingApi;
import ai.docling.api.convert.request.ConvertDocumentRequest;
import ai.docling.api.convert.response.ConvertDocumentResponse;
import ai.docling.client.DoclingClient;

// Build a client that points at the running Docling server
DoclingApi doclingApi = DoclingClient.builder()
    .baseUrl("<location of docling server>")
    .build();

// Describe the conversion: one document fetched over HTTP
ConvertDocumentRequest request = ConvertDocumentRequest.builder()
    .addHttpSources(URI.create("https://arxiv.org/pdf/2408.09869"))
    .build();

// Send the request and print the converted document as Markdown
ConvertDocumentResponse response = doclingApi.convertSource(request);
System.out.println(response.document().markdownContent());
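Since the sample only works when a Docling server is reachable, it can help to check the server before pointing the client at it. The following is a minimal sketch using only the JDK's built-in HTTP client, not part of the Docling Java API. The class and method names (`DoclingServerCheck`, `healthUri`, `isReachable`) are hypothetical, and both the `/health` path and the default address `http://localhost:5001` are assumptions about how your server is set up; adjust them to your environment.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the Docling Java API.
final class DoclingServerCheck {

    // Builds the health-check URL for a base URL.
    // Assumption: the server exposes its health endpoint at "/health".
    static URI healthUri(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(trimmed + "/health");
    }

    // Returns true if the server answers the health request with HTTP 200.
    static boolean isReachable(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(3))
                .build();
        HttpRequest request = HttpRequest.newBuilder(healthUri(baseUrl))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: a local server on port 5001 unless an argument is given
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:5001";
        System.out.println("Docling server reachable: " + isReachable(baseUrl));
    }
}
```

If the check reports `false`, start the server (or fix the base URL) before running the conversion sample.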
The DoclingClientTests integration test suite shows how to interact with the Docling service. Using Testcontainers for a reproducible test environment, the code first starts a Docling container and then exercises key functionality: calling the health endpoint, converting a document from a public HTTP source, and processing a local file (a PDF) supplied via Base64 encoding. It also shows how to configure conversion options, such as enabling OCR and setting the table extraction mode, which gives a good picture of what the API already covers.
package ai.docling.client;

import static org.assertj.core.api.Assertions.assertThat;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.time.Duration;
import java.util.Base64;
import java.util.Optional;

import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import ai.docling.api.convert.request.ConvertDocumentRequest;
import ai.docling.api.convert.request.options.ConvertDocumentOptions;
import ai.docling.api.convert.request.options.TableFormerMode;
import ai.docling.api.convert.response.ConvertDocumentResponse;
import ai.docling.api.health.HealthCheckResponse;
import ai.docling.testcontainers.DoclingContainer;
import ai.docling.testcontainers.config.DoclingContainerConfig;

/**
 * Integration tests for {@link DoclingClient}.
 */
@Testcontainers
class DoclingClientTests {

    @Container
    private static final DoclingContainer doclingContainer = new DoclingContainer(
        DoclingContainerConfig.builder()
            .imageName(Images.DOCLING)
            .enableUi(true)
            .build(),
        Optional.of(Duration.ofMinutes(2))
    );

    private static DoclingClient doclingClient;

    @BeforeAll
    static void setUp() {
        doclingClient = DoclingClient.builder()
            .baseUrl("http://localhost:%s".formatted(doclingContainer.getMappedPort(Images.DOCLING_DEFAULT_PORT)))
            .build();
    }

    @Test
    void shouldSuccessfullyCallHealthEndpoint() {
        HealthCheckResponse response = doclingClient.health();

        assertThat(response).isNotNull();
        assertThat(response.status()).isEqualTo("ok");
    }

    @Test
    void shouldConvertHttpSourceSuccessfully() {
        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
            .addHttpSources(URI.create("https://docs.arconia.io/arconia-cli/latest/development/dev/"))
            .build();

        ConvertDocumentResponse response = doclingClient.convertSource(request);

        assertThat(response).isNotNull();
        assertThat(response.status()).isNotEmpty();
        assertThat(response.document()).isNotNull();
        assertThat(response.document().filename()).isNotEmpty();
        if (response.processingTime() != null) {
            assertThat(response.processingTime()).isPositive();
        }
        assertThat(response.document().markdownContent()).isNotEmpty();
    }

    @Test
    void shouldConvertFileSourceSuccessfully() throws IOException {
        var fileResource = readFileFromClasspath("story.pdf");
        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
            .addFileSources("story.pdf", Base64.getEncoder().encodeToString(fileResource))
            .build();

        ConvertDocumentResponse response = doclingClient.convertSource(request);

        assertThat(response).isNotNull();
        assertThat(response.status()).isNotEmpty();
        assertThat(response.document()).isNotNull();
        assertThat(response.document().filename()).isEqualTo("story.pdf");
        if (response.processingTime() != null) {
            assertThat(response.processingTime()).isPositive();
        }
        assertThat(response.document().markdownContent()).isNotEmpty();
    }

    @Test
    void shouldHandleConversionWithDifferentDocumentOptions() {
        ConvertDocumentOptions options = ConvertDocumentOptions.builder()
            .doOcr(true)
            .includeImages(true)
            .tableMode(TableFormerMode.FAST)
            .build();

        ConvertDocumentRequest request = ConvertDocumentRequest.builder()
            .addHttpSources(URI.create("https://docs.arconia.io/arconia-cli/latest/development/dev/"))
            .options(options)
            .build();

        ConvertDocumentResponse response = doclingClient.convertSource(request);

        assertThat(response).isNotNull();
        assertThat(response.status()).isNotEmpty();
        assertThat(response.document()).isNotNull();
    }

    private static byte[] readFileFromClasspath(String filePath) throws IOException {
        try (InputStream inputStream = Thread.currentThread().getContextClassLoader().getResourceAsStream(filePath)) {
            if (inputStream == null) {
                throw new IOException("File not found in classpath: " + filePath);
            }
            return inputStream.readAllBytes();
        }
    }
}
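The file-based test above passes the PDF to the service as a Base64 string. As a small standalone illustration of that encoding step, here is a sketch that reads a file from disk (rather than from the classpath) and encodes it with the JDK's `java.util.Base64`. The class and method names (`Base64FileSource`, `encode`, `encodeFile`) are hypothetical and not part of the Docling Java API; the resulting string is the kind of value the test hands to `addFileSources`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the Docling Java API.
final class Base64FileSource {

    // Encodes raw bytes with the standard (non-URL-safe) Base64 alphabet.
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // Reads the whole file into memory and encodes it.
    // Fine for small documents; very large files may need a streaming approach.
    static String encodeFile(Path file) throws IOException {
        return encode(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Use a temporary file so the sketch runs without any fixture
        Path tmp = Files.createTempFile("sample", ".txt");
        try {
            Files.writeString(tmp, "hello docling");
            String encoded = encodeFile(tmp);

            // Decoding returns the original bytes, so the round trip is lossless
            String decoded = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
            System.out.println("Round trip matches: " + decoded.equals("hello docling"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Reading from a `Path` instead of the classpath is a design choice for this sketch: it matches the common case of converting user-supplied files, while the test suite loads a bundled fixture.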
Final Words: The Power of Docling Java is Now Yours
The official Docling Java implementation is more than a simple client: it's a toolkit for modern document processing in the JVM ecosystem. It supports parsing a wide range of document formats (PDF, DOCX, XLSX, images, audio, and more), coupled with advanced PDF understanding (layout, reading order, and table structure). Developers get a unified, expressive DoclingDocument format and flexible export options (Markdown, HTML, JSON). For security-conscious or restricted environments, the implementation offers local execution capabilities and extensive OCR support. It also supports Visual Language Models (GraniteDocling) and audio ASR models, and provides plug-and-play integrations with tools like LangChain4j. This functionality is modularized across key artifacts: the framework-agnostic docling-api, the docling-client, and developer tools like docling-testing and the docling-testcontainers module for local development and integration testing. This official release lets Java developers build document-aware applications with confidence and ease.
Links
- Docling Java: https://github.com/docling-project/docling-java
- Docling documentation: https://docling-project.github.io/docling/
- Docling GitHub: https://github.com/docling-project
- Granite Docling: https://huggingface.co/ibm-granite/granite-docling-258M