Feature Selection
What is Feature Selection?
Feature selection is the process of selecting the subset of features in a dataset that is most relevant for predicting the target variable. Careful feature selection is especially important for large datasets, since it reduces model complexity, the risk of overfitting, and computational cost, and can improve model accuracy. In machine learning terminology, a "feature" is an individual measurable property or characteristic of the phenomenon being observed (often represented as a column in a dataset).
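As a concrete illustration, here is a minimal sketch of what "selecting a subset of features" means in practice. The dataset and column names are purely hypothetical:

```python
import pandas as pd

# A toy dataset: each column is a feature, "price" is the target.
# All values and column names here are illustrative.
data = pd.DataFrame({
    "sqft":     [1400, 1600, 1700, 1100],
    "bedrooms": [3, 3, 4, 2],
    "house_id": [101, 102, 103, 104],   # an ID column carries no predictive signal
    "price":    [245000, 312000, 279000, 199000],
})

# Feature selection keeps only the columns believed to predict the target
# and discards the rest (here, the uninformative ID column).
selected_features = ["sqft", "bedrooms"]
X = data[selected_features]   # model inputs
y = data["price"]             # target variable
```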
Why is Feature Selection necessary?
When a dataset contains too many variables, machine learning algorithms can learn from noise or irrelevant data. This can lead to overfitting, a scenario where the model performs exceptionally well on its training data but fails to make accurate predictions on new, unseen data. Feature selection systematically removes irrelevant or redundant variables so that the algorithm learns only from the most predictive ones. It also significantly decreases the memory and processing power required to train the model.
What are the main categories of Feature Selection techniques?
From a theoretical perspective, feature selection methods are grouped into three main categories (each is sketched in code after this list):
1. Filter methods: Evaluate the relevance of features based on their statistical properties (such as correlation with the target variable) independently of any specific machine learning algorithm.
2. Wrapper methods: Evaluate different combinations of features by repeatedly training and testing a specific machine learning model to find the best-performing subset.
3. Embedded methods: Perform feature selection automatically as an integrated step during the training process of the machine learning algorithm itself.
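As a sketch of a filter method, the snippet below uses scikit-learn's SelectKBest with an ANOVA F-test to score each feature against the target. The synthetic dataset and the choice of k are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 20 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Filter method: score each feature with the ANOVA F-test and keep the top 5.
# No model is trained; the scores depend only on statistical properties of the data.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # indices of the kept features
```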
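For a wrapper method, one common choice is recursive feature elimination (RFE), which repeatedly fits a model and drops the weakest feature each round. A minimal sketch, where the estimator and feature counts are again illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Wrapper method: RFE trains the model repeatedly, eliminating the
# least important feature each iteration until 5 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask of the selected features
```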
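For an embedded method, L1 (Lasso) regularization is a standard example: the penalty drives some coefficients to exactly zero during training, so selection happens inside the fitting step itself. A sketch, with an illustrative dataset and alpha value:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=20,
                       n_informative=5, noise=10.0, random_state=0)

# Embedded method: the L1 penalty shrinks the coefficients of
# uninformative features to exactly zero as part of training.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # features with nonzero weight
print(selected)
```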
How does Feature Selection differ from Dimensionality Reduction (Feature Extraction)?
Both processes reduce the number of variables in a dataset, but they go about it in different ways.
- Feature selection keeps a subset of the original, unaltered features and completely discards the rest.
- Dimensionality reduction (or feature extraction) creates entirely new features by mathematically combining the original variables.
Feature selection preserves the original meaning of each variable, while feature extraction transforms the data into new, typically less interpretable variables. The sketch below contrasts the two approaches.
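The difference is easy to see in code. The following sketch contrasts SelectKBest (selection: the output columns are original features) with PCA (extraction: the output columns are new linear combinations); the dataset and dimensions are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Feature selection: the 5 output columns ARE 5 of the original columns,
# so each retains its original meaning.
X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)

# Feature extraction: the 5 output columns are NEW variables, each a
# weighted combination of all 20 original columns.
X_pca = PCA(n_components=5).fit_transform(X)

print(X_sel.shape, X_pca.shape)  # both (500, 5), but with different meanings
```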