
Data Science

Artificial Intelligence

Building a robust spam identifier


The objective of this dissertation is to create and evaluate a robust text spam classification system using machine learning techniques.

The project aims to classify SMS messages as either ham (legitimate) or spam. The trained machine learning model will serve as the foundation for developing an SMS spam filter web application in the future.

The data used for this dissertation consists of 5,574 messages in English labelled as ham or spam.

The data were pre-processed using tokenisation, lemmatisation, stemming, and TF-IDF vectorisation techniques.
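To make the vectorisation step concrete, here is a minimal standard-library sketch of tokenisation followed by TF-IDF weighting. It is not the project's code: the function names `tokenise` and `tfidf` and the three sample messages are invented for illustration, and lemmatisation and stemming are left out because they need a language resource such as NLTK. The IDF term uses the smoothed form ln((1 + n) / (1 + df)) + 1, with no length normalisation.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document (hypothetical helper).

    weight = raw term count * (ln((1 + n_docs) / (1 + doc_freq)) + 1)
    """
    tokenised = [tokenise(doc) for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Invented sample messages, not taken from the dissertation's dataset.
messages = [
    "Free prize! Call now to claim your free prize",
    "Are we still meeting for lunch today",
    "Call me when you are free",
]
vectors = tfidf(messages)
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1])[:3]:
    print(f"{term}: {weight:.3f}")
```

Terms that are frequent in one message but rare across the collection (here "prize") receive the largest weights, which is what lets a classifier pick up on spam-specific vocabulary.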

The methodology involved comparing and combining algorithms selected based on a literature review, including Support Vector Machine (SVM), K-Nearest Neighbours (KNN), and Extremely Randomized Trees (Extra Trees). Additionally, this study implemented a combined approach using Extra Trees and Random Forest with a soft voting classifier and the Synthetic Minority Oversampling Technique (SMOTE) to enhance classification performance.
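The idea behind soft voting can be sketched in a few lines: each model outputs a probability per class, the probabilities are averaged, and the class with the highest average wins. The snippet below is an illustrative stand-in, not the project's implementation; the function `soft_vote` and the probability values are assumptions. In scikit-learn the same behaviour is provided by `VotingClassifier` with `voting="soft"`.

```python
def soft_vote(probabilities_per_model, weights=None):
    """Average class probabilities across models and pick the top class.

    probabilities_per_model: list of {label: probability} dicts, one per model.
    weights: optional per-model weights; equal weighting if omitted.
    """
    if weights is None:
        weights = [1.0] * len(probabilities_per_model)
    total_weight = sum(weights)
    labels = probabilities_per_model[0].keys()
    averaged = {
        label: sum(w * probs[label]
                   for w, probs in zip(weights, probabilities_per_model)) / total_weight
        for label in labels
    }
    return max(averaged, key=averaged.get), averaged


# Hypothetical outputs for one message from two tree ensembles.
extra_trees_probs = {"ham": 0.40, "spam": 0.60}
random_forest_probs = {"ham": 0.70, "spam": 0.30}

label, averaged = soft_vote([extra_trees_probs, random_forest_probs])
print(label, averaged)
```

Unlike hard (majority) voting, soft voting lets a confident model outweigh an uncertain one, which is why the two models disagreeing above still yields a clear decision.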

Model performance was assessed using several metrics: accuracy, precision, recall, F1-score, and ROC-AUC.
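As a reminder of how the count-based metrics relate to each other, the following sketch computes accuracy, precision, recall, and F1 for one class from true and predicted labels. The helper name `classification_metrics` and the six example labels are made up for illustration and are not results from this project. ROC-AUC is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard each ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels for six messages.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]

spam_metrics = classification_metrics(y_true, y_pred, positive="spam")
ham_metrics = classification_metrics(y_true, y_pred, positive="ham")
print(spam_metrics)
print(ham_metrics)
```

Calling the function once per class, as above, gives the per-class precision and recall figures that are usually reported for ham and spam separately.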

The best results came from the soft voting classifier combining Extra Trees and Random Forest, which reached 99% accuracy. Precision was 0.98 for ham and 0.99 for spam, and recall was 0.99 for ham and 0.98 for spam. Both message classes had F1 scores of 0.99, indicating high precision and recall across classes.

https://github.com/Carmenmontanes/Hello-Spam/blob/Carmenmontanes-patch-1/Spamnan.py

The Pickle library was used for model serialization, enabling efficient deployment and providing a robust spam detection solution tested with real user-generated sentences.
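The serialisation round trip can be illustrated with the standard `pickle` module. This is a sketch under stated assumptions: the "model" here is a hypothetical dictionary of keyword weights standing in for the trained classifier, and `predict` is an invented helper. A fitted scikit-learn estimator can be saved and restored with the same `pickle.dumps` / `pickle.loads` (or `pickle.dump` / `pickle.load` for files) calls.

```python
import pickle

# Hypothetical stand-in for a trained model: keyword weights plus a threshold.
model = {
    "weights": {"prize": 2.0, "free": 1.5, "lunch": -1.0},
    "threshold": 1.0,
}


def predict(model, message):
    """Label a message as spam if its summed keyword weight reaches the threshold."""
    score = sum(model["weights"].get(word, 0.0)
                for word in message.lower().split())
    return "spam" if score >= model["threshold"] else "ham"


# Serialise to bytes (what would be written to a .pkl file) and restore.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

print(predict(restored, "Claim your free prize"))
print(predict(restored, "Lunch today"))
```

Because unpickling can execute arbitrary code, a deployed app should only load pickle files it created itself.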

The app is designed to filter text messages received on a mobile phone, allowing users to copy and paste suspicious messages into the spam detector feature. A crucial component of the app is its reporting function, which enables users to report spam messages to the police. This action sends the relevant message along with its metadata—including the sender’s number and timestamp—to local authorities or a centralized agency for further investigation.

This feature plays a critical role in preventing users from encountering spam and helps identify potentially illegal activities conducted via text messages.

🥒 link

https://github.com/Carmenmontanes/Hello-Spam/blob/main/test%20of%20the%20model.ipynb

Design of the app
