This repository contains code for parameter optimization of Support Vector Machines (SVM) using the Dry Bean Dataset. The dataset poses a multi-class classification problem with 16 features and 7 classes. The goal of this project is to find the SVM parameters that achieve the highest possible classification accuracy.
The Dry Bean Dataset consists of 13,611 instances with 17 attributes: 16 feature attributes (12 dimension measurements and 4 shape factors) and 1 class attribute. The dataset is publicly available at the UCI Machine Learning Repository. The features include geometric shape, form, and structure features of the beans that are important in selecting high-quality beans. The dataset is preprocessed and ready to be used for classification tasks. The attributes are:
1.) Area (A): The area of a bean zone and the number of pixels within its boundaries.
2.) Perimeter (P): Bean circumference is defined as the length of its border.
3.) Major axis length (L): The distance between the ends of the longest line that can be drawn from a bean.
4.) Minor axis length (l): The longest line that can be drawn from the bean while standing perpendicular to the main axis.
5.) Aspect ratio (K): The ratio of the major axis length to the minor axis length: L/l
6.) Eccentricity (Ec): Eccentricity of the ellipse having the same moments as the region.
7.) Convex area (C): Number of pixels in the smallest convex polygon that can contain the area of a bean seed.
8.) Equivalent diameter (Ed): The diameter of a circle having the same area as a bean seed area.
9.) Extent (Ex): The ratio of the pixels in the bounding box to the bean area.
10.) Solidity (S): Also known as convexity. The ratio of the pixels in the convex shell to those found in beans.
11.) Roundness (R): Calculated with the following formula: (4πA)/(P^2)
12.) Compactness (CO): Measures the roundness of an object: Ed/L
13.) ShapeFactor1 (SF1)
14.) ShapeFactor2 (SF2)
15.) ShapeFactor3 (SF3)
16.) ShapeFactor4 (SF4)
17.) Class: The bean variety (Seker, Barbunya, Bombay, Cali, Dermason, Horoz, or Sira). This is the target label, not an input feature.
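Several of the attributes above are derived from the basic measurements. The following is a minimal sketch of those formulas, not code from the notebook: the function name `shape_features` is hypothetical, the aspect ratio is assumed to be L/l, and the equivalent diameter is assumed to be sqrt(4A/π), which follows from the definition of a circle with the same area.

```python
import math


def shape_features(area, perimeter, major_axis, minor_axis):
    """Compute derived shape descriptors from basic measurements.

    Hypothetical helper for illustration; names and signature are assumptions.
    """
    # Ed: diameter of the circle whose area equals the bean area
    equivalent_diameter = math.sqrt(4.0 * area / math.pi)
    return {
        # K = L / l (assumed definition of the aspect ratio)
        "aspect_ratio": major_axis / minor_axis,
        "equivalent_diameter": equivalent_diameter,
        # R = (4 * pi * A) / P^2
        "roundness": 4.0 * math.pi * area / perimeter ** 2,
        # CO = Ed / L
        "compactness": equivalent_diameter / major_axis,
    }


# Sanity check on a perfect circle of radius 10:
# roundness, compactness, and aspect ratio should all be 1.
radius = 10.0
circle = shape_features(
    area=math.pi * radius ** 2,
    perimeter=2.0 * math.pi * radius,
    major_axis=2.0 * radius,
    minor_axis=2.0 * radius,
)
print(circle)
```

For a circle, roundness and compactness both reach their maximum of 1, so elongated beans score lower on both.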
To run the code in this repository, you will need to have the following libraries installed:
- scikit-learn
- pandas
- numpy
- matplotlib
- seaborn
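As a quick way to confirm the environment is ready, the sketch below checks whether each library can be found. It is an illustrative helper rather than part of the repository; the function name `find_missing` is hypothetical. Note that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# Map each package name to the module name used in an import statement.
REQUIRED = {
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def find_missing(required=REQUIRED):
    """Return the package names whose modules cannot be located."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```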
The repository contains a Jupyter notebook (svm_drybean.ipynb) that walks through the process of parameter optimization for SVM using the Dry Bean Dataset. The notebook includes detailed explanations and comments for each step of the process.
To run the notebook, you can simply open it in Jupyter and run the cells one by one. You can also run the notebook in Google Colab or any other similar environment.
The results of the parameter optimization process are summarized in the notebook. The best parameters for the SVM model were found using GridSearchCV and resulted in an accuracy of 97.8%.
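For readers who want a feel for the approach before opening the notebook, here is a minimal sketch of a GridSearchCV search over SVM parameters. It is not the notebook's code: the parameter grid, the scaling step, and the train/test split are assumptions, and a small synthetic dataset with 16 features and 7 classes stands in for the Dry Bean data so the example runs on its own. Its accuracy will therefore not match the figure reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Dry Bean data (assumption): 16 features, 7 classes.
X, y = make_classification(
    n_samples=700,
    n_features=16,
    n_informative=10,
    n_redundant=2,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Scale inside the pipeline so each CV fold is scaled using only its training part.
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid (assumption); the notebook may search different values.
param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("Best parameters:", best_params)
print("Held-out accuracy:", round(test_accuracy, 3))
```

Placing the scaler in the pipeline is a design choice worth keeping when adapting this to the real data: SVMs are sensitive to feature scale, and attributes such as Area and ShapeFactor values differ by several orders of magnitude.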