dc.contributor.author |
MANSOUR KHOUDJA, Asmaa |
|
dc.date.accessioned |
2025-07-23T08:25:25Z |
|
dc.date.available |
2025-07-23T08:25:25Z |
|
dc.date.issued |
2025-06 |
|
dc.identifier.uri |
http://dspace.univ-chlef.dz/handle/123456789/2141 |
|
dc.description.abstract |
This thesis addresses the challenges of gender profiling and bot detection in Modern
Standard Arabic (MSA) using advanced machine learning techniques, including LSTM,
ARABERT, and Prompt-Based Learning. The research highlights the scarcity of
resources and research in Arabic Natural Language Processing (NLP) compared to
high-resource languages like English, aiming to bridge this gap by creating novel
datasets and exploring innovative algorithms. Two datasets were curated: one for
gender profiling (10,000 MSA texts), sourced from PAN 2018, the Arabic Parallel Gender
Corpus 2.0, and Google Forms, and one for bot detection (1,100 MSA
texts), sourced from Fake News and Automatically-Generated Arabic Tweets.
Preprocessing steps included tokenization, balancing, and translation of dialectal
Arabic to MSA. The experiments evaluated the performance of LSTM, ARABERT, and
Prompt-Based Learning, with ARABERT achieving the highest accuracy (92.4% for
gender profiling and 88% for bot detection), followed by Prompt-Based Learning (92.3%
and 80%) and LSTM (78.5% and 66.8%). The results demonstrate the superiority
of transformer-based models and the potential of prompt-based approaches for
low-resource languages. Key contributions include the creation of high-quality datasets,
the introduction of Prompt-Based Learning to Arabic NLP, and a comprehensive
comparison of model performance. Future work includes dataset expansion,
optimizing prompt-based approaches, and cross-domain applications such as sentiment
analysis and machine translation. This research advances Arabic NLP by providing
tailored models and methodologies for author profiling and bot detection, offering
valuable insights for addressing similar challenges in low-resource language settings. |
en_US |
dc.publisher |
Mourad LOUKAM |
en_US |
dc.subject |
LSTM |
en_US |
dc.subject |
Modern Standard Arabic |
en_US |
dc.subject |
Bot Detection |
en_US |
dc.subject |
Gender Profiling |
en_US |
dc.title |
Author Profiling based on Machine Learning Techniques for Modern standard Arabic language |
en_US |
dc.type |
Thesis |
en_US |