TOGGLE: Temporal Logic-Guided Large Language Model Compression for Edge

View PDF

Abstract:Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose Temporal Logic-Guided Large Language Model Compression (TOGGLE), a novel framework that leverages Signal Temporal Logic (STL) to formally specify and enforce linguistic properties during compression. TOGGLE employs an STL robustness-guided Bayesian optimization to systematically explore layer-wise quantization and pruning configurations, generating…

View PDF

Abstract:Large Language Models (LLMs) deliver exceptional performance across natural language tasks but demand substantial computational resources, limiting their deployment on resource-constrained edge devices. Existing compression techniques, such as quantization and pruning, often degrade critical linguistic properties and lack formal guarantees for preserving model behavior. We propose Temporal Logic-Guided Large Language Model Compression (TOGGLE), a novel framework that leverages Signal Temporal Logic (STL) to formally specify and enforce linguistic properties during compression. TOGGLE employs an STL robustness-guided Bayesian optimization to systematically explore layer-wise quantization and pruning configurations, generating compressed models that formally satisfy specified linguistic constraints without retraining or fine-tuning. Evaluating TOGGLE on four LLM architectures (GPT-2, DeepSeek-V2 7B, LLaMA 3 8B, and Mistral 7B), we achieve up to 3.3x reduction in computational costs (FLOPs) and up to a 68.8% reduction in model size while satisfying all linguistic properties. TOGGLE represents the first integration of formal methods into LLM compression, enabling efficient, verifiable deployment of LLMs on edge hardware.


Comments:	Published in the IEEE ICCAD 2025 conference
Subjects:	Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Cite as:	arXiv:2512.16855 [cs.AI]
	(or arXiv:2512.16855v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2512.16855 arXiv-issued DOI via DataCite (pending registration)
Related DOI:	https://doi.org/10.1109/ICCAD66269.2025.11240962 DOI(s) linking to related resources

Submission history

From: Khaza Anuarul Hoque [view email] [v1] Thu, 18 Dec 2025 18:27:42 UTC (514 KB)

Submission history

Similar Posts