In April 2015 Microsoft launched Project Oxford, a set of machine-learning-based REST APIs and SDKs that help developers easily add intelligent capabilities such as language understanding, speech recognition, image understanding, and face recognition to their applications.
Previously, similar APIs, albeit more basic and search-oriented, were available under the Bing brand. At Build 2016 Microsoft introduced a significant number of additions and extensions to the Project Oxford APIs and renamed the initiative ‘Microsoft Cognitive Services’.
The APIs are cloud-based, and because they are cross-platform REST APIs they can be reached from any internet-connected device. Every major platform is supported, including Android, iOS, Windows, and third-party IoT devices. Each API offers a trial mode with both a rate limit (transactions per second or per minute) and a monthly usage cap; a transaction is simply an API call. Upgrading to a paid tier lifts these restrictions.
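To make the REST access concrete, the sketch below constructs (without sending) an authenticated request. Cognitive Services requests carry the subscription key in the `Ocp-Apim-Subscription-Key` header; the endpoint URL and the placeholder key used here are illustrative assumptions, so consult the API reference for the real values.

```python
# Sketch of building an authenticated Cognitive Services REST request.
# The endpoint and key below are illustrative placeholders, not real values.
import json
import urllib.request

SUBSCRIPTION_KEY = "your-subscription-key"  # hypothetical placeholder
# Hypothetical endpoint shape; the real URL depends on the chosen API.
ENDPOINT = "https://api.projectoxford.ai/vision/v1.0/analyze"

def build_request(image_url: str) -> urllib.request.Request:
    """Construct (but do not send) an authenticated POST request."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )

req = build_request("https://example.com/photo.jpg")
```

Sending the request (for example with `urllib.request.urlopen(req)`) would count as one transaction against the trial-mode limits described above.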
Currently there are APIs for Academic Knowledge, Autosuggest, Search, News Search, Speech, Spell Check, Video Search, Web Search, Computer Vision, Emotion, Entity Linking, Face, Knowledge Exploration, Linguistic Analysis, Recommendations, Speaker Recognition, Text Analytics, and Video. These APIs are being extended on an ongoing basis.
Clearly, with such a wide array of functionality the commercial possibilities are virtually unlimited, and the application of these technologies is best viewed on a case-by-case basis. With that caveat in mind, it seems appropriate to settle on one broadly applicable aspect of one of the technologies, namely speech recognition with intent.
Many companies run call centers to interact with their customer base, be it for product support or informational services. Using the Speech API it is possible to transcribe WAV files to text. More significantly, when the Speech API is used in conjunction with the Linguistic Analysis API, which is built on Natural Language Processing (NLP) and identifies the structure of text, organizations can conduct sentiment-based analysis of their customer calls.
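Before uploading call recordings for transcription, it is worth verifying the WAV payload is in a format the service accepts. The sketch below assumes 16 kHz, 16-bit, mono PCM as the target format (a common requirement for speech services, though the exact list should be checked against the Speech API documentation) and uses only the Python standard library.

```python
# Validate that a WAV payload looks suitable for speech transcription.
# The 16 kHz / 16-bit / mono PCM target is an assumption for illustration.
import io
import wave

def is_speech_ready(wav_bytes: bytes) -> bool:
    """Check that a WAV payload is 16 kHz, 16-bit, single-channel PCM."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getsampwidth() == 2
            and wav.getnchannels() == 1
        )

# Build a one-second silent clip in memory to demonstrate the check.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

print(is_speech_ready(buffer.getvalue()))  # True
```

A recording that fails the check would need resampling or channel mixing before it is submitted for transcription.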
The Linguistic Analysis API uses sentence separation, tokenization, part-of-speech tagging (categorization of each word as a noun, verb, etc.), and finally constituency parsing (also known as “phrase structure parsing”). The goal is to identify key phrases and to see the modifiers and actions surrounding them.
Extensive reference is made to the Penn Treebank Project, which annotates naturally occurring text for linguistic structure. It produces skeletal parses showing rough syntactic and semantic information, and as a result makes it possible to distill message content down to Phrase Types.
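The first pipeline stages can be sketched in miniature. The toy tagger below uses a handful of genuine Penn Treebank tags (DT = determiner, NN = singular noun, NNS = plural noun, VBZ = third-person singular verb), but its tiny hand-written lexicon and regex-based splitting are purely illustrative stand-ins for the API's trained models; it shows the shape of the output, not how the service computes it.

```python
# Toy illustration of sentence separation, tokenization, and POS tagging.
# The lexicon is a hand-written assumption, not the API's model.
import re

TOY_LEXICON = {
    "the": "DT", "agent": "NN", "answers": "VBZ",
    "calls": "NNS", "quick": "JJ", ".": ".",
}

def split_sentences(text: str) -> list:
    """Split on whitespace that follows sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence: str) -> list:
    """Separate words from punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def tag(tokens: list) -> list:
    """Look each token up in the lexicon, defaulting to NN (noun)."""
    return [(t, TOY_LEXICON.get(t.lower(), "NN")) for t in tokens]

sentences = split_sentences("The agent answers calls. The quick agent answers.")
for sentence in sentences:
    print(tag(tokenize(sentence)))
```

A constituency parser would then group these tagged tokens into nested phrases (e.g. a noun phrase “The agent” inside a clause), which is the skeletal-parse structure the Penn Treebank annotates.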
When the resulting models are properly trained, an organization can extract significant value from recordings of customer calls. It can move beyond simply processing customer enquiries, concerns, and support needs to build a textual repository of keywords and implied sentiment. That repository can then be used to streamline customer-interaction processes, improve the customer experience, and keep the organization aware of, and responsive to, the changing dynamics of its customer base.
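As a deliberately naive sketch of what such a repository enables, the snippet below scores a transcript with hand-written word lists and tallies keywords. The word lists and scoring rule are illustrative assumptions; the Text Analytics models do this with trained classifiers rather than lookups.

```python
# Naive keyword and sentiment mining over a call transcript.
# The POSITIVE/NEGATIVE/STOPWORDS lists are illustrative assumptions.
from collections import Counter

POSITIVE = {"great", "helpful", "resolved", "thanks"}
NEGATIVE = {"broken", "waiting", "refund", "frustrated"}
STOPWORDS = {"the", "a", "is", "was", "and", "my", "i", "it", "to"}

def analyse_call(transcript: str) -> dict:
    """Return a crude sentiment score and the most frequent keywords."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    keywords = Counter(w for w in words if w and w not in STOPWORDS)
    return {"sentiment": score, "keywords": keywords.most_common(3)}

result = analyse_call("My router is broken and I was waiting, frustrated.")
print(result["sentiment"])  # -3
```

Aggregating such per-call records over time is what lets an organization spot trends in customer sentiment rather than handling each call in isolation.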