My workmate Alessandro and I recently finished a project report on spam filtering with basic machine learning algorithms for the Data Mining course here at UPC. Here are the abstract and the complete report in PDF.
Spam is one of the major threats to the use of electronic mail today. It consumes precious connection bandwidth, slows down mail servers and wastes people’s time. In this project we study the performance of a variety of supervised-learning models, including Logistic Regression, Naïve Bayes, Neural Networks, K-Nearest Neighbours, and Quadratic and Linear Discriminant Analysis, as well as their behaviour after applying a number of feature selection methods. Finally, we briefly compare our results with those of some previous studies that used the same database.
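To give a flavour of one of the models named in the abstract, here is a minimal sketch of a Naïve Bayes spam classifier in plain Python. It is not the code from the report: the four training messages, the word-level features and the `train`/`predict` helper names are all made up for illustration, and the add-one (Laplace) smoothing is an assumption on my part.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy messages, invented for illustration only
# (this is NOT the database used in the report).
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("project report attached", "ham"),
]


def train(examples):
    """Count messages per class and word occurrences per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the class with the highest log-posterior score."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: fraction of training messages with this label.
        score = math.log(class_counts[label] / total_messages)
        # Add-one smoothing so unseen words never give probability zero.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("cheap money", model))
print(predict("project meeting", model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, which matters once messages get longer than these toy ones.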