Feature Extraction

Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing. A characteristic of these large data sets is a large number of variables that require a lot of computing resources to process. Feature extraction is the name for methods that select and /or combine variables into features, effectively reducing the amount of data that must be processed, while still accurately and completely describing the original data set.

Feature Generation

This is the process of taking raw, unstructured data and defining features (i.e. variables) for potential use in your statistical analysis. For instance, in the case of text mining you may begin with a raw log of thousands of text messages (e.g. SMS, email, social network messages, etc) and generate features by removing low-value words (i.e. stopwords), using certain size blocks of words (i.e. n-grams) or applying other rules.

Both techniques are normally combined in classification.

Leave a Reply

Your email address will not be published. Required fields are marked *