Core Technologies

deep learning / supervised and unsupervised large-scale learning / generalized linear models / support vector machines / decision trees & random forests / clustering / distributed computing / Spark & Hadoop / sparse regression / reinforcement learning / MapReduce / and many more.

Text Understanding

Unstructured text documents such as product reviews, websites or project reports contain a wealth of valuable information. Smart algorithms can unearth these treasures by automatically interpreting and “understanding” the content of huge datasets.

 

Sample Applications

  • Sentiment Analysis: distinguish positive and negative posts in social media
  • Main Text Extraction: extract and summarize the most important content of websites
  • Named Entity Extraction: identify companies, organizations and persons in unstructured texts
  • Machine Translation: translate text into arbitrary languages

Example: Sentiment Analysis

Sentiment analysis determines whether the tone of a text document is positive, negative, mixed or neutral. It is used, for instance, to evaluate customer feedback in social media, where huge numbers of tweets, Facebook posts and blog entries are analyzed automatically.

Solution

We identified a set of approximately two million representative “features” that are indicative of the sentiment of a text. Sample features are: bag-of-words (which words are used), word embeddings (a semantic representation of each word as a vector), n-grams (consecutive sequences of several words), part-of-speech tags (occurrences of nouns, verbs or adjectives), usage of negation, sentiment information of single words and word combinations, use of punctuation, etc. These features are fed into a combination of SVM classifiers and Random Forests. The system was trained on tens of thousands of sample documents that were manually labeled as positive, neutral or negative, plus billions of additional unlabeled text documents.
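To make a few of the feature types above more concrete, here is a minimal sketch in Python. It is an illustration only, not our production pipeline: the tiny word lists, the function names and the feature names are hypothetical, and it covers only bag-of-words, n-grams, negation, single-word sentiment and punctuation. Word embeddings, part-of-speech tags and the SVM and Random Forest classifiers are left out.

```python
import re
from collections import Counter

# Hypothetical toy lexicons for illustration; a real system would rely on
# much larger resources.
NEGATIONS = {"not", "no", "never", "cannot", "don't"}
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful"}


def tokenize(text):
    """Lowercase the text and split it into word tokens and '!' / '?' tokens."""
    return re.findall(r"[a-z']+|[!?]", text.lower())


def extract_features(text, n=2):
    """Return a sparse feature dictionary mapping feature names to counts."""
    tokens = tokenize(text)
    words = [t for t in tokens if t not in {"!", "?"}]
    features = Counter()

    # Bag-of-words: which words are used, and how often.
    for word in words:
        features["bow=" + word] += 1

    # n-grams: consecutive sequences of n words.
    for i in range(len(words) - n + 1):
        features["ngram=" + " ".join(words[i:i + n])] += 1

    # Usage of negation.
    features["negation_count"] = sum(1 for w in words if w in NEGATIONS)

    # Sentiment information of single words.
    features["positive_words"] = sum(1 for w in words if w in POSITIVE_WORDS)
    features["negative_words"] = sum(1 for w in words if w in NEGATIVE_WORDS)

    # Use of punctuation.
    features["exclamations"] = tokens.count("!")
    features["questions"] = tokens.count("?")

    return dict(features)


if __name__ == "__main__":
    for name, value in sorted(extract_features("I do not love this phone!").items()):
        print(name, value)
```

In this sketch the sentence "I do not love this phone!" yields both a positive-word count and a negation count, which is why a classifier needs several feature types in combination rather than a word list alone.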

Quality Assessment

Our classifier achieves an accuracy of 68% on the target data. This is state of the art compared with other commercial tools, as we have shown in a scientific evaluation (cf. http://ceur-ws.org/Vol-1096/paper4.pdf).

Our technology proved its quality in the official SemEval competition in both 2014 and 2015. SemEval is the annual international competition for semantic text analysis. In both years, our solution achieved rank 8 in the final ranking and was the only tool among the top ten in both years (cf. http://alt.qcri.org/semeval2015/task10 and our published system papers for 2014 and 2015).

Numerical Data Analysis

Data is everywhere, and the amount of data increases every day. We handle “everything” that comes as numbers and tables: sales reports, monthly revenues, customer feedback forms, stock exchange rates, sensor data, etc.

 

Sample Tasks

  • Trend prediction and forecasting: will the numbers go up or down in the future
  • Anomaly detection: identify outliers or radical changes
  • Clustering: find patterns of similar entities in huge datasets
  • Data Curation: clean up data for further processing
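As an illustration of the anomaly detection task, the following Python sketch flags values that lie far from the mean of a series, measured in standard deviations (a z-score test). This is one simple, generic technique chosen for the example, not a description of our own methods; the function name and the threshold values are assumptions.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=3.0):
    """Return the indices of values whose z-score exceeds the threshold."""
    if len(values) < 2:
        return []  # the sample standard deviation needs at least two values
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    daily_sales = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    print(find_outliers(daily_sales, threshold=2.0))
```

For short series a single extreme value inflates the standard deviation itself, so a lower threshold (2.0 in the usage example) or a more robust statistic such as the median may be needed.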

Audio Processing

Audio signals can come from TV or radio, conference presentations, or YouTube videos. The best-known task is speech recognition, where the audio signal is transcribed into written text.

 

Sample Tasks

  • Speech Recognition: transcribe spoken words into written text
  • Speaker Recognition: identify the current speaker
  • Background Noise Removal: eliminate unintended signals

Image and Video Analysis

Images are everywhere, and interpreting their content automatically can yield interesting insights for business processes. One question that we were asked recently was: How often does my logo appear in TV shows?

 

Sample Tasks

  • Face Recognition: identify a person in a picture
  • Sentiment Analysis: is the emotion of an image positive or negative
  • Logo Detection: find appearances of company logos in images and videos