Abstract: Understanding the role of citations is essential for research assessment and citation-aware digital libraries. However, existing citation classification frameworks often conflate citation intent (why a work is cited) with cited content type (what part is cited). This limits their effectiveness in automatic classification, because they face a trade-off between fine-grained type distinctions and practical classification reliability. We introduce SOFT, a Semantically Orthogonal Framework with Two dimensions that explicitly separates citation intent from cited content type, drawing inspiration from semantic role theory. We systematically re-annotate the ACL-ARC dataset using SOFT and release a cross-disciplinary test set sampled from ACT2. Evaluation with both zero-shot and fine-tuned Large Language Models demonstrates that SOFT enables higher agreement between human annotators and LLMs, and supports stronger classification performance and more robust cross-domain generalization than the ACL-ARC and SciCite annotation frameworks. These results confirm SOFT’s value as a clear, reusable annotation standard that improves clarity, consistency, and generalizability for digital libraries and scholarly communication infrastructures. All code and data are publicly available on GitHub.
| Comments: | Accepted at the 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025) |
| Subjects: | Digital Libraries (cs.DL); Computation and Language (cs.CL) |
| Cite as: | arXiv:2601.05103 [cs.DL] |
| | (or arXiv:2601.05103v1 [cs.DL] for this version) |
| | https://doi.org/10.48550/arXiv.2601.05103 (arXiv-issued DOI via DataCite, pending registration) |
| Related DOI: | https://doi.org/10.1007/978-3-032-05409-8_12 |
Submission history
From: Changxu Duan
[v1] Thu, 8 Jan 2026 16:48:36 UTC (179 KB)