Understanding Outliers when Training Context Themes

What are ‘Outliers’ and how can they affect my Context Theme?

When training a context theme, outliers are data points that do not fit well with the rest of that theme’s data set. Outliers arise for various reasons, most notably when:

Unclear examples are added to the context theme
Examples are wrongly categorised while training the theme

Outliers can skew the model and analysis of a context theme, reducing the accuracy of the theme’s outputs. In other words, the theme may return verbatim that is not representative of the context theme, because the model has been influenced by the outliers. This can affect verbatim volumes and, if there are enough outliers to introduce a bias, skew sentiment.

How does the Wordnerds platform help you to identify and correct Outliers?

Once 50 or more examples have been added to a context theme, the system places a red box around any examples that it deems to be outliers. Outliers are identified mathematically, based on the probability that the verbatim belongs in the context theme (or not).

In other words, the platform will identify:

Examples that the model thinks belong in the theme (but have been tagged as ‘no’)
Examples that the model thinks do NOT belong in the theme (but have been tagged as ‘yes’)
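
To make the idea concrete, here is a minimal Python sketch of how a disagreement between a tag and a model score could be flagged. It is an illustration only, not the platform's actual implementation: the `Example` class, the `find_outliers` function and the 0.5 threshold are all hypothetical, and only the 50-example minimum comes from this article.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    tagged_yes: bool    # the tag a user gave the example ('yes' or 'no')
    probability: float  # hypothetical model score that the text belongs in the theme


def find_outliers(examples, min_examples=50, threshold=0.5):
    """Return the examples whose tag disagrees with the model's score.

    threshold is an assumed cut-off for illustration; the real platform
    may decide this differently.
    """
    # No outliers are flagged until the theme has enough examples.
    if len(examples) < min_examples:
        return []
    return [
        e for e in examples
        # Model says 'belongs' but tag is 'no', or model says
        # 'does not belong' but tag is 'yes'.
        if (e.probability >= threshold) != e.tagged_yes
    ]
```

In this sketch, an example tagged ‘yes’ with a low score and an example tagged ‘no’ with a high score would both be returned, which mirrors the two cases listed above.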

Note! You may need to skip between consecutive pages of examples to see any outliers. Run through all your pages to make sure you've caught them all!

In the above example:

The context theme has been trained to find verbatim that refers to customers taking time off work to be at home for a scheduled repair appointment. In this example we can see that:

(1) The model is struggling to classify verbatim that refers to a person not working so they could be at home for an appointment, and believes this should be marked as a positive example of the theme. This seems correct, as per the theme name. Therefore the best action is to correct the outlier. To do this, select the green tick to classify the example correctly, then hit 'retrain'.

(2) The model is struggling to classify verbatim that refers to double glazing, and believes this should be marked as a negative example of the theme. This seems correct, as per the theme name. Therefore the best action is to correct the outlier. Reclassify the example and select 'retrain'.

(3) The model is struggling to classify an example about an appointment being on the weekend, which was previously marked as a negative example of the theme. The model believes this should instead be marked as a positive example. However, this does not seem to fit the theme in question. Therefore, no action is needed on this outlier.

Note - false positive/negative outliers can be symptomatic of the context theme requiring more training.

❗Important! Make sure you retrain your context theme after correcting outliers, so the model can adjust itself, based on your input.

✍️ Article written by Izzie, Product Manager

Our Outliers feature adds a layer of data-driven rationale to help users quickly tune the effectiveness of their theme.
