HateBertBN: a hybrid transformer based model for Bangla hate speech detection across various social contexts

Azhar, Tanvir; Mahmud, Tahsin; Hasan, Muhammad Asif; Uddin, Mohammed Nazim; Park, Seung-Bo

doi:10.1007/s10791-025-09804-x

Citations: Web of Science 0; Scopus 0

Abstract

The widespread use of online social media platforms has made efficient hate speech detection increasingly important, especially in low-resource languages such as Bengali. While traditional machine learning approaches show promise, deep learning is more effective at capturing the nuanced context of hate speech. Current challenges include a lack of diverse datasets and a lack of models capable of context-sensitive detection. To address these, we introduce HateCorpBN-XL, the largest labeled Bengali hate speech dataset to date, containing 65,251 comments across five categories: political (PoHS), religious (ReHS), misogynistic (MisoHS), slander (SlaHS), and xenophobic (XenHS). We also propose HateBertBN, a hybrid transformer-based model that combines BanglaBERT embeddings with three neural network fusion strategies using CNN, LSTM, and MLP. We evaluate our approach on two tasks: Task-1, classifying Bengali text as hateful or non-hateful, and Task-2, categorizing hateful content into five distinct classes. For Task-1, all HateBertBN variants outperformed current transformer models, achieving an accuracy of 0.92 and a weighted F1-score of 0.92. For Task-2, the HateBertBN-MLP and HateBertBN-CNN variants achieved an accuracy of 0.90 and a weighted F1-score of 0.90, surpassing M-BERT, Distil-M-BERT, BanglaBERT, and XLM-R-Base. Although HateBertBN-LSTM performed slightly lower overall, it achieved strong F1-scores in the ReHS (0.93) and XenHS (1.00) categories. Overall, our hybrid model outperforms state-of-the-art approaches in both tasks, demonstrating its effectiveness and robustness.
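
The abstract reports weighted F1-scores for both tasks. As a reminder of what that metric usually means, the sketch below computes a support-weighted mean of per-class F1 scores in plain Python. It assumes the common definition (each class F1 weighted by its share of the true labels); the record does not say how the authors computed theirs. The function name `weighted_f1` and the four toy labels are hypothetical and are not taken from HateCorpBN-XL.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0

    for label, n_true in support.items():
        # True positives: predicted this class and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)

        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        # Weight each class by its share of the true labels.
        score += f1 * n_true / total

    return score


# Toy labels, invented for illustration only.
y_true = ["PoHS", "PoHS", "ReHS", "XenHS"]
y_pred = ["PoHS", "ReHS", "ReHS", "XenHS"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by its support, a perfect score on a small class (such as the 1.00 reported for XenHS) moves the overall figure less than the same score on a large class would.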

Keywords

Hate speech; Social media; Large language models; BERT; XLM-R; CNN; LSTM; MLP

Publication date: 2026-01-08

Type: Article

Journal: DISCOVER COMPUTING

Volume: 29

Issue: 1