Invited Speakers

New! We are pleased to announce the invited speakers for TextGraphs-10:

Ivan Titov
Institute for Logic, Language and Computation (ILLC)
University of Amsterdam, Netherlands

Prof. Ivan Titov is an Associate Professor at the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam (UvA). Prior to joining the ILLC in April 2013, he led a research group as a junior faculty member at Saarland University (2009-13), following a postdoc at UIUC (2008-09). His research interests are in natural language processing and machine learning and, in recent years, in learning semantic representations for reasoning. His research is supported by an ERC Starting Grant and an NWO VIDI grant, as well as by industrial funding (including from Google and SAP). Prof. Titov is an action editor for the Journal of Machine Learning Research (JMLR). His other professional services include serving as an area chair for NLP-related machine learning at ACL 2016, EMNLP 2014, and EACL 2012; a senior PC member for IJCAI 2011; a best-paper award committee member at EACL 2012; and, regularly, a PC member / reviewer for ACL, ICML, NIPS, NAACL, EMNLP, the CL journal, JAIR, TACL, and others.

The title and abstract of Prof. Titov's talk are:

Inducing Semantic Representations from Text with Little or No Supervision
When: June 17, 2016; 9:10 am
Abstract: The lack of accurate methods for predicting meaning representations of text is the key bottleneck for many natural language processing applications, such as question answering and text summarization. Although state-of-the-art semantic analyzers work fairly well on closed domains (e.g., interpreting natural language queries to databases), accurately predicting even shallow forms of semantic representation (e.g., the underlying predicate-argument structure) for less restricted text remains a challenge. The reason for this unsatisfactory performance is the reliance on supervised learning: the amount of annotation required for accurate open-domain parsing exceeds what is practically feasible.

In this talk, Prof. Titov will consider approaches that induce semantic representations primarily from unannotated text or, more specifically, from text annotated only with automatically produced syntactic dependency representations. Unlike semantically annotated data, such texts are easy to obtain for many languages and many domains, which makes the approach particularly promising. He will contrast the generative framework (including his non-parametric Bayesian model) with a new approach called reconstruction-error minimization (REM) for semantics. He will show that REM achieves state-of-the-art results on the unsupervised semantic role labeling and relation discovery tasks across languages, without any language-specific tuning. Moreover, the REM framework makes it possible to specialize the semantic representations so that they are useful for (basic forms of) semantic inference, and to integrate various forms of prior linguistic knowledge.

This is joint work with Diego Marcheggiani, Ehsan Khoddam, and Alex Klementiev.



Dylan Wenzlau
Graphiq, San Francisco, CA, USA

Dylan Wenzlau is the Lead Architect for Semantic Search at Graphiq, Inc.

Mr. Wenzlau will present and demonstrate the Graphiq tool. The title and abstract of his presentation are:

General purpose semantic platform as an information retrieval system
When: June 17, 2016; 12:20 pm
Abstract: Over the past couple of decades, information retrieval systems could be roughly categorized into two groups: keyword search and faceted search. Keyword search is the most popular offering, primarily driven by giant search engines like Google and Bing. Faceted search applications tend to be narrower, focused on specific verticals such as e-commerce, travel, and cars. While traditional search provides convenience, breadth, and flexibility, it falls short in the precision and structure of its results. Faceted search, on the other hand, is more constrained, but its results often convey a higher degree of structure and context. Roughly speaking, keyword search retrieves documents, whereas faceted search returns records/entities. In essence, traditional search provides a more natural interface for posing queries, while faceted search provides better-structured results. The ideal experience would therefore combine a natural-language approach to query construction with a structured knowledge base to power the results.

In this presentation we will show a working product, powered by a comprehensive knowledge graph (data) and the corresponding knowledge platform (software), which leverages insights from the fields of data ingestion, semantic data, natural language processing, and faceted search to create a hybrid information retrieval experience. To achieve this experience, we had to build a vast knowledge graph with billions of entities and relationships and hundreds of billions of facts. We cover dozens of verticals, from politics to sports to health, and have hundreds of entity collections for each one. Our knowledge graph is seen by over 300 million eyeballs a month, both through our owned and operated websites and through our partnerships with publishers and other enterprises.



Eduard Hovy
Language Technologies Institute
Carnegie Mellon University, Pittsburgh, PA, USA

Prof. Eduard Hovy is a Research Professor at the Language Technologies Institute of Carnegie Mellon University and the Co-Director of the Department of Homeland Security's Center of Excellence on Command, Control, and Interoperability for Advanced Data Analysis (CCICADA). He has also held multiple advisory and adjunct faculty positions at Beijing University of Posts and Telecommunications and the University of Waterloo and, until 2012, was at the University of Southern California as a Fellow of its Information Sciences Institute (ISI), Director of the Human Language Group, Research Associate Professor in USC's Computer Science Department, and Director of Research for ISI's Digital Government Research Center (DGRC).

The title and abstract of Prof. Hovy's talk are:

Taking Graph-Based Methods for NLP a Step Further
When: June 17, 2016; 2:15 pm
Abstract: Essentially all NLP problems exploit interrelationships among the data elements under study. Graphs over these relationships often provide powerful insights. The numerous variations of graphs include simple bipartite graphs (such as those between equivalent words of translated sentences); nested tree or frame structures (such as syntax trees and semantic structures); and large connectivity maps (such as those reflecting the connection topology of the Semantic Web or social media). Processing, however, is usually not reflected in the graph itself; graphs representing not the data elements to be processed but the processing modules, as in algorithm flowcharts or maps of compute servers, are uncommon in NLP.

The recent explosion of neural network processing in NLP turns this picture upside down. A simple neural network is a graph of simple neural processors connected in various configurations. Ensembles of simple networks can form more complex neural networks, which are just other graphs. But these graphs are not just about processors; they also represent the data itself. A layer in an NN, or even an entire sub-network within a larger one, is at the same time a processing module and a data representation.

This makes it much harder to understand how to represent an NLP problem appropriately. Since it is difficult to tease apart the two facets (and the NLP community is frantically inventing ever more complex neural architectures every month), we should take a step back and ask some fundamental questions. For example, what is the power/capability of each type of NN graph, for both representation and processing? How can one assemble an NN that responsibly and parsimoniously encodes just what is needed to solve a given problem? Can one view NNs themselves as graph structures that can be learned or reasoned about? This talk has many more questions than answers, but they may be interesting to discuss.