Efficient LLM Adaptation to Low-Resource Languages via Cross-Lingual Semantic Anchoring
- Shukhratov, Bekhzod;
- Baydadaev, Shokhrukh;
- Kwon, Jang Woo
Web of Science citations: 0
Scopus citations: 0
Abstract
Most large language models (LLMs) remain heavily English-centric, resulting in high tokenizer fertility, where a single word in a low-resource language decomposes into multiple sub-tokens. This inefficiency inflates computational costs and weakens adaptation performance. We introduce bilingual exchange for optimized dense (BXOD) embeddings, a multi-stage framework for efficient cross-lingual adaptation based on cross-lingual semantic anchoring. BXOD employs a cold-start bilingual embedding initialization that anchors new vocabularies for low-resource languages within the semantic space of a high-resource model, guided by the multilingual LaBSE encoder. The pipeline expands the tokenizer to reduce fertility and initializes new embeddings through semantic alignment rather than sub-token composition. BXOD was evaluated using Llama 3.2 (1B, 3B) models adapted to Uzbek and Malay, and Gemma 2 (2B) models adapted to Uzbek, Malay, Korean, and Spanish. It achieved significant gains in low-resource settings, such as improving Malay news classification accuracy from 29.95% to 50.35% (a relative gain of 68.1%, p < 0.001) and English-to-Uzbek translation BLEU from 3.22 to 4.74 (a relative gain of 47.2%, p < 0.001), while preserving English performance on the Massive Multitask Language Understanding (MMLU) benchmark and accelerating convergence. Results highlight BXOD's strength for typologically distant, low-resource languages, while showing reduced benefits for lexically similar, high-resource languages such as Spanish. BXOD thus establishes an effective and computationally efficient paradigm for extending LLMs across linguistically diverse settings. © 2013 IEEE.
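Two quantities in the abstract can be made concrete with a small sketch: tokenizer fertility (average sub-tokens per word) and the relative gains reported for Malay classification and English-to-Uzbek BLEU. The code below is illustrative only and is not the authors' implementation. The functions `toy_tokenize` and `expanded_tokenize`, the chunk size, and the two sample Uzbek words are hypothetical stand-ins for a real sub-word tokenizer before and after vocabulary expansion.

```python
def fertility(words, tokenize):
    """Average number of sub-tokens produced per word."""
    if not words:
        raise ValueError("words must be non-empty")
    return sum(len(tokenize(w)) for w in words) / len(words)


def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def toy_tokenize(word, chunk=3):
    # Hypothetical stand-in for an English-centric tokenizer:
    # splits any word into pieces of at most `chunk` characters.
    return [word[i:i + chunk] for i in range(0, len(word), chunk)]


def expanded_tokenize(word, vocab=frozenset({"kitoblar"})):
    # Hypothetical expanded tokenizer: words added to the vocabulary
    # stay whole; everything else falls back to the toy splitter.
    return [word] if word in vocab else toy_tokenize(word)


# Illustrative sample words (assumed, not taken from the paper).
sample = ["kitoblar", "uy"]

fertility_before = fertility(sample, toy_tokenize)
fertility_after = fertility(sample, expanded_tokenize)

# Relative gains recomputed from the figures stated in the abstract.
malay_gain = relative_gain(29.95, 50.35)
uzbek_bleu_gain = relative_gain(3.22, 4.74)
```

Under these assumptions, expanding the vocabulary lowers fertility on the sample from 2.0 to 1.0, and the recomputed gains round to the 68.1% and 47.2% quoted in the abstract, which confirms that those percentages are relative rather than absolute improvements.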
Keywords
- Title: Efficient LLM Adaptation to Low-Resource Languages via Cross-Lingual Semantic Anchoring
- Authors: Shukhratov, Bekhzod; Baydadaev, Shokhrukh; Kwon, Jang Woo
- Issue date: 2026-03
- Type: Article
- Journal: IEEE Access
- Volume: 14
- Pages: 55331-55344