Traditional text classification in the Classify module (using Natural Language Processing) performs well, but building it comes with time-consuming challenges:
Creating Training Data:
- Challenge: It takes a lot of effort to create suitable training data manually.
- Limitation: The amount of data generated is usually small, around a few hundred records per category at most.
Topic Segregation Issues:
- Challenge: Selecting topics can be subjective and may not align well with the actual content.
- Issue: Topics can be unbalanced, leading to a significant difference in the amount of training data for each topic.
Subjectivity in Text Assignment:
- Challenge: Assigning texts to specific topics is subjective, and similar texts might end up in different categories, especially when created by different people.
- Requirement: Each language requires its own training data for the classification model.
In summary, building text classification models demands substantial effort and specific domain knowledge, and often involves multiple iterations to achieve acceptable accuracy. This process can be costly in both time and money.
TopicAI is a fully automated system of unsupervised topic detection and sentiment analysis using pre-trained NLP neural networks.
With this tool, you can upload a target source, receive suggested "most relevant" topics, categorise and verify them, and select them in order to create a fully automatic topic model that can be used on any source within the sandsiv+ platform, without the need for preliminary manual classification or topic extraction.
This tool can be accessed under the module AI in the menu bar.
The TopicAI interface includes 2 tabs: Topic Sets and Applied Topic Sets.
- Topic Sets: List of created Topic sets with details and actions related to the configuration of these topic sets;
- Applied Topic Sets: List of topic sets which have been applied to specific sources & text columns with further details and actions.
Languages: The languages used in the Topic Set.
Sources: The sources which have been used to build the Topic Set.
Roles: View a list of users who have permissions to view or edit the Topic Set. *
Actions: Share (extend permissions to users to view or edit the Topic Set), Apply to source (apply the Topic Set to a target source), Delete (delete the Topic Set; only allowed for users with full permissions).
*By default, the user who creates the Topic Set is the “Owner” of the classifier and has full rights on it (read/edit/delete). The user or the support team can give “full”, “editor” or “view” rights to other users. A user with “Editor” rights can only edit the Topic Set: modify the list of labels or add (not remove) additional languages to the classifier.
Creating a new Topic Set
Press on the "Add topic set" button. The following page will appear:
Once the source and text columns are configured and Show topics is pressed, a list of suggested relevant topics appears (these topics are identified automatically by the tool based on their relevance in the source).
Clicking on the topics will turn them green and add them to the list of topics you want to use in this Topic Set. Furthermore, by checking the check-box you can add similar topics to a category in order to group them.
Once the configurations have been made (source and column selected) and the required topics are selected (in green), you can adjust the Accuracy Threshold, which determines the sensitivity of the topic being matched with the text-case (with a value from 0 to 1). A high accuracy threshold (>0.6) will yield a more accurate classification (dismissing classifications with a low accuracy) and therefore may provide fewer topics per text. A low accuracy threshold (<0.6) will yield a less accurate classification and provide more topics per text.
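The effect of the Accuracy Threshold can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function name, the scores and the use of "at or above" as the cut-off rule are all assumptions.

```python
def topics_above_threshold(scores, threshold):
    """Return topics scoring at or above the threshold, best first.

    Assumption: 'scores' maps a topic name to a 0-1 score for one text case.
    """
    kept = [(topic, score) for topic, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [topic for topic, _ in kept]


# Hypothetical scores for a single text case (illustrative values only).
scores = {"Delivery": 0.82, "Price": 0.64, "Website": 0.35}

strict = topics_above_threshold(scores, 0.8)   # high threshold: fewer topics
lenient = topics_above_threshold(scores, 0.3)  # low threshold: more topics
```

With the higher threshold only the strongest match is kept, while the lower threshold also lets weaker matches through.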
Chart View: In chart view, you get an overview of your topics and their respective occurrence vs probability threshold. Aim for a higher occurrence at the higher probability thresholds; otherwise, your topic may not be relevant to the dataset.
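One way to read "occurrence vs probability threshold" is as a count of text cases that still match a topic as the threshold rises. The sketch below assumes that reading; how the chart is actually computed is not documented here, and the function name and scores are hypothetical.

```python
def occurrences_by_threshold(topic_scores, thresholds):
    """Count, per threshold, how many text cases score at or above it."""
    return {t: sum(1 for s in topic_scores if s >= t) for t in thresholds}


# Hypothetical scores of one topic across five text cases.
delivery_scores = [0.91, 0.85, 0.77, 0.40, 0.15]

counts = occurrences_by_threshold(delivery_scores, [0.2, 0.5, 0.8])
```

A topic whose count stays high at the upper thresholds is likely relevant to the dataset; a count that collapses early suggests the opposite.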
Table View: Table view gives you a full view of the text cases, organised by labelled topic and sorted by probability score.
Applying a topic set to a source
Once the Topic Set is created, you can apply it to a target source in the Actions field by clicking on "Apply to Source".
The list of options you'll see are the following:
- Dropdown with “Topic set” - the topic set you're applying is automatically pre-selected.
- Dropdown with “Source” - list of available target sources within the platform to choose from (surveys or data in the Store module);
- Checkbox "Sentiment" - apply a sentiment classification (to further classify data with "Positive", "Negative", "Neutral" labels);
- Checkbox “Multi topic” - allow classification of multiple topics in a text case. You can further configure this application by moving sliders for “Accuracy Threshold” (0=low, 1=high) and “Categories count” (count of max categories that can be classified in a text case).
Multi topic classification works with sources that a Topic Set has been applied to, and enables the following options:
- Accuracy Threshold will determine the sensitivity of the topic being matched with the text-case (with a value from 0 to 1).
- Categories count is a customised number of categories your text responses will be classified into.
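The interplay of the two sliders can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's logic: the function, its parameter names and the sample scores are hypothetical, and the order of operations (filter by threshold, then cap the count) is assumed.

```python
def classify_multi_topic(scores, accuracy_threshold, categories_count):
    """Keep topics at or above the threshold, capped at categories_count."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    kept = [topic for topic, score in ranked if score >= accuracy_threshold]
    return kept[:categories_count]


# Hypothetical scores for one text case.
scores = {"Delivery": 0.82, "Price": 0.64, "Support": 0.58, "Website": 0.35}

result = classify_multi_topic(scores, accuracy_threshold=0.5, categories_count=2)
```

Here three topics pass the threshold, but the count limit of 2 keeps only the two strongest.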
By enabling both of these configurations, you allow the tool to perform Topic Based Sentiment Analysis - providing a sentiment for every topic discovered in a text-case. In your exported dataset, you will find a sentiment column for every topic classified.
Here is an example of Topic Based Sentiment Analysis:
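As a rough illustration of the "one sentiment column per topic" layout, the sketch below builds a single exported row. The column naming scheme, the function and the sample text are assumptions made for this example; the actual export format may differ.

```python
def export_row(text, topic_sentiments):
    """Build one row with a sentiment column for each classified topic.

    Assumption: columns are named '<topic> sentiment'.
    """
    row = {"text": text}
    for topic, sentiment in topic_sentiments.items():
        row[f"{topic} sentiment"] = sentiment
    return row


# Hypothetical text case with two topics and their sentiments.
row = export_row(
    "Delivery was fast but the price is too high.",
    {"Delivery": "Positive", "Price": "Negative"},
)
```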
You can group similar topics into Categories, allowing you to organise your topics by need and create multi-level classifications.
Example: Support, Agent, Customer service and Contact person topics can be categorised as "Customer Support".
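The grouping in this example can be pictured as a simple mapping from a category to its topics. The structure and the lookup helper below are hypothetical and only illustrate the idea of a two-level classification.

```python
# Category -> topics, using the example above.
CATEGORIES = {
    "Customer Support": ["Support", "Agent", "Customer service", "Contact person"],
}


def category_for(topic, categories):
    """Return the category a topic belongs to, or None if it is ungrouped."""
    for category, topics in categories.items():
        if topic in topics:
            return category
    return None
```

For instance, `category_for("Agent", CATEGORIES)` resolves to "Customer Support", while a topic that was never grouped resolves to `None`.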