CONCORD: Asynchronous Sparse Aggregation for Device-Cloud RAG under Document Isolation
Retrieval-augmented generation (RAG) has emerged as a pivotal technique for improving language models by incorporating external knowledge at inference time. As device-cloud collaborative inference makes it feasible to deploy small language models on edge devices, a new setting arises in which private documents remain on the device and public knowledge resides in the cloud. Privacy and policy constraints often forbid raw document exchange, creati...