DSpace/Manakin Repository

Author Profiling based on Machine Learning Techniques for Modern standard Arabic language

dc.contributor.author MANSOUR KHOUDJA, Asmaa
dc.date.accessioned 2025-07-23T08:25:25Z
dc.date.available 2025-07-23T08:25:25Z
dc.date.issued 2025-06
dc.identifier.uri http://dspace.univ-chlef.dz/handle/123456789/2141
dc.description.abstract This thesis addresses the challenges of gender profiling and bot detection in Modern Standard Arabic (MSA) using advanced machine learning techniques, including LSTM, ARABERT, and Prompt-Based Learning. The research highlights the scarcity of resources and research in Arabic Natural Language Processing (NLP) compared to high-resource languages like English, and aims to bridge this gap by creating novel datasets and exploring innovative algorithms. Two datasets were curated: one for gender profiling (10,000 MSA texts), sourced from PAN 2018, Arabic Parallel Gender Corpus 2.0, and Google Forms, and one for bot detection (1,100 MSA texts), sourced from Fake News and Automatically-Generated Arabic Tweets. Preprocessing steps included tokenization, balancing, and translation of dialectal Arabic to MSA. The experiments evaluated the performance of LSTM, ARABERT, and Prompt-Based Learning, with ARABERT achieving the highest accuracy (92.4% for gender profiling and 88% for bot detection), followed by Prompt-Based Learning (92.3% and 80%) and LSTM (78.5% and 66.8%). The results demonstrate the superiority of transformer-based models and the potential of prompt-based approaches for low-resource languages. Key contributions include the creation of high-quality datasets, the introduction of Prompt-Based Learning to Arabic NLP, and a comprehensive comparison of model performance. Future work includes dataset expansion, optimization of prompt-based approaches, and cross-domain applications such as sentiment analysis and machine translation. This research advances Arabic NLP by providing tailored models and methodologies for author profiling and bot detection, offering valuable insights for addressing similar challenges in low-resource language settings. en_US
dc.publisher Mourad LOUKAM en_US
dc.subject LSTM en_US
dc.subject Modern Standard Arabic en_US
dc.subject Bot Detection en_US
dc.subject Gender Profiling en_US
dc.title Author Profiling based on Machine Learning Techniques for Modern standard Arabic language en_US
dc.type Thesis en_US

