Domain customization

The domain of a text matters a lot for automatic text analysis. Each subject area or industry has its own grammar and specialized vocabulary, and common words often take on very particular meanings. To analyze such texts correctly, an NLP system must be trained on texts from that domain, be supplied with lexicons that encode the domain's grammar and vocabulary, or both.

Legal texts such as patents are one example where general-language parsers suffer a large drop in accuracy. The word "said" is very common in patents, but there it functions as a determiner ("... the said switch, the said cables ...") rather than as a verb. A parser trained on texts where "said" was always a verb will get the structure of the entire sentence wrong, causing problems for all downstream processing steps. As a result, the overall accuracy of a general-language parser on the patent domain falls by 20% compared to other domains.

Domain customization also makes a large difference to the accuracy of sentiment analysis. A customer review saying "the live music is cool" conveys positivity, while "cool coffee" implies the customer is unhappy. Similarly, the generally neutral word "weighty" is negative when talking about laptops, and the generally negative "monster" becomes positive when talking about discounts. A sentiment analyzer trained on the wrong domain will not be able to deal with these nuances.
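The effect of domain on word polarity can be illustrated with a toy lexicon lookup. This is only a sketch of the general idea, not a description of how the system works internally: the lexicon contents, the domain names "laptops" and "discounts", and the function name polarity are all made up for the example.

```python
# Toy polarity lexicons: +1 positive, 0 neutral, -1 negative.
# General-language polarities, following the examples in the text.
GENERAL = {"weighty": 0, "monster": -1}

# Domain-specific overrides; the domain names are hypothetical.
OVERRIDES = {
    "laptops": {"weighty": -1},
    "discounts": {"monster": 1},
}

def polarity(word, domain=None):
    """Return a word's polarity, preferring the domain lexicon when it has an entry."""
    domain_lexicon = OVERRIDES.get(domain, {})
    return domain_lexicon.get(word, GENERAL.get(word, 0))

print(polarity("weighty"))               # 0  (neutral in general language)
print(polarity("weighty", "laptops"))    # -1 (negative for laptops)
print(polarity("monster"))               # -1 (negative in general language)
print(polarity("monster", "discounts"))  # 1  (positive for discounts)
```

An analyzer that only had the general lexicon would score "monster discounts" as negative, which is the kind of error domain customization is meant to prevent.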

In GetSentiiment, there are several options for adapting the system to the domain of the texts being processed.

General domain

If you are processing texts whose domain is not known in advance, you can use the general-language version of the system. To do so, set "general" as the value of the "domain" parameter in an API request or, if you are using the Excel add-in, as the value of the "domain" field in the settings.xml file.
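As a rough sketch, a request carrying the "domain" parameter might be assembled as follows. Only the "domain" parameter and its "general" value come from the text above; the endpoint URL, the JSON body format and the "text" field name are assumptions made for illustration, so check them against the API documentation for your account. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL from your API documentation.
API_URL = "https://api.example.com/v1/analyze"

def build_request(text, domain="general"):
    """Build (but do not send) a POST request carrying the "domain" parameter."""
    # "domain" is the parameter named in the text; "text" is an assumed field name.
    payload = {"text": text, "domain": domain}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_request("The battery lasts all day.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Defaulting domain to "general" in the helper mirrors the advice above: it is the value to fall back on when nothing is known about the incoming texts.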

Prebuilt domains

Five pre-built domains are available out of the box: Electronics, Retail, Hospitality, Automotive and Telecommunication. The "domain" parameter values and topical categories of each domain are as follows:

Domain        Topical categories
electronics   Accessories, Battery, BuildQuality, Design, Keyboard, Performance, Picture, Price, Sound, Usability
retail        Availability, FoodProducts, NonFoodProducts, OnlineDelivery, Parking, Price, Service, StoreExperience
hospitality   Ambience, Food, Location, Parking, Price, Service
automotive    Accessories, Aesthetics, Capacity, Comfort, Ecofriendliness, Efficiency, Performance, Price, Safety
telecom       CustomerSupport, DownloadSpeed, Price, SignalQuality

To use these customizations, simply set the domain value in your API requests.
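When the domain value comes from user input or configuration, it can help to check it on the client side before sending a request. The sketch below copies the domain values and topical categories from the table above; the helper name categories_for and the choice to raise ValueError are illustrative assumptions, not part of the API. The "general" domain and custom domains are not listed in that table, so they are not covered here.

```python
# Pre-built domain values and their topical categories, copied from the table above.
PREBUILT_DOMAINS = {
    "electronics": ["Accessories", "Battery", "BuildQuality", "Design", "Keyboard",
                    "Performance", "Picture", "Price", "Sound", "Usability"],
    "retail": ["Availability", "FoodProducts", "NonFoodProducts", "OnlineDelivery",
               "Parking", "Price", "Service", "StoreExperience"],
    "hospitality": ["Ambience", "Food", "Location", "Parking", "Price", "Service"],
    "automotive": ["Accessories", "Aesthetics", "Capacity", "Comfort",
                   "Ecofriendliness", "Efficiency", "Performance", "Price", "Safety"],
    "telecom": ["CustomerSupport", "DownloadSpeed", "Price", "SignalQuality"],
}

def categories_for(domain):
    """Return the topical categories of a pre-built domain (hypothetical helper)."""
    if domain not in PREBUILT_DOMAINS:
        valid = ", ".join(sorted(PREBUILT_DOMAINS))
        raise ValueError(f"unknown pre-built domain {domain!r}; expected one of: {valid}")
    return PREBUILT_DOMAINS[domain]

print(categories_for("telecom"))
```

Note that the parameter value for the Telecommunication domain is "telecom", so passing the full name would be rejected by this check.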

Your own domain

Finally, you can create your own domain, customized to your specific application needs, either by uploading external lexicons or by editing them in a browser-based interface. In the browser interface, you can create a domain by extending one of the five pre-built ones: adding, deleting or renaming its topical categories. More details can be found in this tutorial.