Advanced Options

Overview

Advanced Options let you fine-tune your classifier with parameters such as Max Features, N-Gram Range, and Default Stopwords.

Advanced Options

Max Features

Features are the relevant words or n-grams selected from the training text for classification. Max Features is the maximum number of top-ranked features the classifier selects for classifying text. By default, Max Features is set to 10000; you can adjust it as required.
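The idea of keeping only the top features can be sketched as follows. This is a minimal illustration that assumes features are ranked by their total frequency in the training text; the classifier's actual ranking criterion is not documented here, and `select_top_features` is a hypothetical helper.

```python
from collections import Counter

def select_top_features(tokenized_docs, max_features=10000):
    """Return up to max_features tokens, most frequent first.

    Assumption: ranking by total frequency across the training text.
    The classifier may rank features differently.
    """
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(tokens)
    # most_common() sorts by count, highest first.
    return [feature for feature, _ in counts.most_common(max_features)]

docs = [
    ["refund", "my", "order"],
    ["order", "status", "please"],
    ["refund", "order", "now"],
]
print(select_top_features(docs, max_features=2))  # ['order', 'refund']
```

A lower Max Features value keeps the feature vectors smaller at the cost of discarding rarer words.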

N-Gram Range

An n-gram is a contiguous sequence of n words. The classifier uses this setting to build n-gram feature vectors for classification. By default, the N-Gram Range is set to “Unigrams, Bigrams”, i.e., both single words and two-word sequences are used for classification.

The available options are:

  • Unigrams (n=1, single word)
  • Bigrams (n=2, sequence of two words)
  • Trigrams (n=3, sequence of three words)
  • Fourgrams (n=4, sequence of four words)
  • Unigrams, Bigrams
  • Unigrams, Bigrams, Trigrams
  • Unigrams, Bigrams, Trigrams, Fourgrams
  • Bigrams, Trigrams
  • Bigrams, Trigrams, Fourgrams
  • Trigrams, Fourgrams
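Each option above is a contiguous range of n values; for example, “Unigrams, Bigrams” corresponds to n = 1 to 2. The sketch below shows how such a range expands a tokenized sentence into n-gram features. The function names are hypothetical and only illustrate the concept, not the classifier's internal code.

```python
def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(tokens, min_n, max_n):
    """Collect n-grams for every n in [min_n, max_n].

    For example, (1, 2) mirrors the "Unigrams, Bigrams" option.
    """
    features = []
    for n in range(min_n, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features

tokens = ["track", "my", "order"]
print(ngram_features(tokens, 1, 2))
# ['track', 'my', 'order', 'track my', 'my order']
```

Wider ranges capture more word order but also produce many more candidate features, which Max Features then limits.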

Default Filter Stopwords

Stopwords are common, low-information words that are unlikely to help classify text. By default, this setting is enabled and the system-provided stopwords are filtered out of the text. To use custom stopwords, disable “Default Filter Stopwords”.
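Stopword filtering simply removes listed words before features are built. The sketch below uses a small hypothetical English stopword set; the system-provided list depends on the classifier's language and is not reproduced here.

```python
# Hypothetical stopword set for illustration only; the real system-provided
# list depends on the language selected for the classifier.
STOPWORDS = {"the", "a", "is", "my", "to"}

def filter_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens found in the stopword set (case-insensitive match)."""
    return [t for t in tokens if t.lower() not in stopwords]

print(filter_stopwords(["Where", "is", "my", "order"]))
# ['Where', 'order']
```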

Default Stopwords

A comma-separated list of stopwords for the language selected for the classifier. To add custom stopwords to the list, disable the “Default Filter Stopwords” setting.
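Because the list is comma-separated, extending it amounts to parsing both lists and merging them. The sketch below assumes entries are trimmed and compared in lowercase; both lists and the `parse_stopwords` helper are hypothetical.

```python
def parse_stopwords(csv_text):
    """Turn a comma-separated stopword list into a set, skipping blanks."""
    return {w.strip().lower() for w in csv_text.split(",") if w.strip()}

default_list = "the, a, an, is, of"   # hypothetical default list
custom_words = "please, thanks"       # hypothetical custom additions

stopwords = parse_stopwords(default_list) | parse_stopwords(custom_words)
print(sorted(stopwords))
# ['a', 'an', 'is', 'of', 'please', 'thanks', 'the']
```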

Normalize Weights

Weight normalization is useful when the training data is unbalanced across categories, that is, when some categories have many more examples than others. Disable weight normalization if the classifier should keep the original imbalance between categories. By default, weight normalization is enabled.
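One common way to normalize weights is the "balanced" heuristic, which gives each category the weight n_samples / (n_categories × count). The sketch below assumes that scheme (the same formula scikit-learn uses for `class_weight="balanced"`); the classifier's exact normalization method is not documented here.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each category by n_samples / (n_categories * count).

    Assumption: the common "balanced" heuristic; the classifier's
    own normalization scheme may differ.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_categories = len(counts)
    return {c: n_samples / (n_categories * k) for c, k in counts.items()}

labels = ["billing"] * 6 + ["shipping"] * 2
weights = balanced_class_weights(labels)
# billing ≈ 0.667, shipping = 2.0
```

With these weights, each category contributes the same total weight (6 × 2/3 = 4 and 2 × 2 = 4), so the larger category no longer dominates training.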

Use Stemming

Stemming reduces words to their root form (for example, “orders” and “ordering” to “order”), which helps generalize feature patterns. Disable stemming if features should not be generalized this way. By default, stemming is enabled.
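The effect of stemming on features can be illustrated with a deliberately simplified suffix stripper. This is not the stemmer the classifier uses; real stemmers such as Porter's apply many more rules. The suffix list and minimum stem length are hypothetical choices for the example.

```python
# Simplified suffix stripping for illustration only; not the classifier's
# stemmer. Longer suffixes are checked before shorter ones.
SUFFIXES = ("ing", "ed", "es", "s")

def simple_stem(word):
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters of stem to avoid over-stripping.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

print([simple_stem(w) for w in ["Ordered", "ordering", "orders", "order"]])
# ['order', 'order', 'order', 'order']
```

All four variants collapse into a single feature, which is the generalization stemming provides; with stemming disabled they would remain four separate features.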