TextGraphs-4: Graph-based Methods for Natural Language Processing
7th August 2009, Singapore
Session I: Opening | |
8:30–8:45 | Inauguration by Chairs |
8:45–9:48 | Invited Talk by Prof. Vittorio Loreto |
9:48–10:00 | Social (distributed) language modeling, clustering and dialectometry David Ellis |
10:00–10:30 | Coffee Break |
Session II: Special Theme | |
10:30–10:55 | Network analysis reveals structure indicative of syntax in the corpus of undeciphered Indus civilization inscriptions Sitabhra Sinha, Raj Kumar Pan, Nisha Yadav, Mayank Vahia and Iravatham Mahadevan |
10:55–11:20 | Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology Martijn Wieling and John Nerbonne |
11:20–12:10 | Panel Discussion on "Bridging the gap between language dynamics and NLP: Can network theory help?" Panelists:
|
Session III: Semantics | |
13:50–14:15 | Random Walks for Text Semantic Similarity Daniel Ramage, Anna N. Rafferty and Christopher D. Manning |
14:15–14:40 | Classifying Japanese Polysemous Verbs based on Fuzzy C-means Clustering Yoshimi Suzuki and Fumiyo Fukumoto |
14:40–15:05 | WikiWalk: Random walks on Wikipedia for Semantic Relatedness Eric Yeh, Daniel Ramage, Christopher D. Manning, Eneko Agirre and Aitor Soroa |
15:05–15:18 | Measuring semantic relatedness with vector space models and random walks Amaç Herdağdelen, Katrin Erk and Marco Baroni |
15:18–15:30 | Graph-based Event Coreference Resolution Zheng Chen and Heng Ji |
15:30–16:00 | Coffee Break |
Session IV: Classification and Clustering | |
16:00–16:25 | Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce Delip Rao and David Yarowsky |
16:25–16:50 | Opinion Graphs for Polarity and Discourse Classification Swapna Somasundaran, Galileo Namata, Lise Getoor and Janyce Wiebe |
16:50–17:15 | A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Nonliteral Use of Multiword Expressions Linlin Li and Caroline Sporleder |
17:15–17:40 | Quantitative analysis of treebanks using frequent subtree mining methods Scott Martens |
17:40–18:00 | Closing Remarks |
Invited Talk: Collective Dynamics of Social Annotation
Prof. Vittorio Loreto
The enormous increase in the popularity and use of the WWW has led in recent years to important changes in the ways people communicate. An interesting example of this fact is provided by the now very popular social annotation systems, through which users annotate resources (such as web pages or digital photographs) with text keywords dubbed tags. Collaborative tagging has been quickly gaining ground because of its ability to recruit the activity of web users into effectively organizing and sharing vast amounts of information. Understanding the rich emerging structures resulting from the uncoordinated actions of users calls for an interdisciplinary effort. In particular, concepts borrowed from statistical physics, such as random walks, and the complex networks framework can effectively contribute to the mathematical modeling of social annotation systems. First I will introduce a stochastic model of user behavior embodying two main aspects of collaborative tagging: (i) a frequency-bias mechanism related to the idea that users are exposed to each other's tagging activity; (ii) a notion of memory, or aging of resources, in the form of a heavy-tailed access to the past state of the system. Remarkably, this simple modeling is able to account quantitatively for the observed experimental features with a surprisingly high accuracy. This points in the direction of a universal behavior of users who, despite the complexity of their own cognitive processes and the uncoordinated and selfish nature of their tagging activity, appear to follow simple activity patterns. Next I will show how the process of social annotation can be seen as a collective but uncoordinated exploration of an underlying semantic space, pictured as a graph, through a series of random walks.
This modeling framework reproduces several aspects, so far unexplained, of social annotation, among which the peculiar growth of the size of the vocabulary used by the community and its complex network structure that represents an externalization of semantic structures grounded in cognition and typically hard to access.
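The two mechanisms named in the abstract, frequency bias and heavy-tailed memory, can be illustrated with a small simulation. The following is only a sketch under stated assumptions and not the model presented in the talk: the function name `simulate_tag_stream`, the probability `p_new` of inventing a new tag, the offset `tau`, and the kernel form 1 / (age + tau) are all hypothetical choices made for illustration.

```python
import random


def simulate_tag_stream(n_steps, p_new=0.05, tau=10.0, seed=0):
    """Generate a stream of integer tag ids (illustrative sketch).

    p_new, tau and the kernel shape are assumptions, not values or
    formulas taken from the talk.
    """
    rng = random.Random(seed)
    stream = [0]      # the stream starts with a single tag, id 0
    next_tag = 1      # id to assign to the next newly invented tag
    for _ in range(1, n_steps):
        if rng.random() < p_new:
            # With probability p_new, a brand-new tag enters the vocabulary.
            stream.append(next_tag)
            next_tag += 1
        else:
            # Otherwise copy a tag from a past position in the stream.
            # Frequent tags occupy more positions, so they are copied more
            # often (frequency bias). A position `age` steps in the past
            # gets weight 1 / (age + tau), an assumed heavy-tailed kernel
            # that favors recent activity without forgetting old tags.
            t = len(stream)
            weights = [1.0 / ((t - i) + tau) for i in range(t)]
            stream.append(rng.choices(stream, weights=weights, k=1)[0])
    return stream


stream = simulate_tag_stream(2000)
print(len(set(stream)), "distinct tags in", len(stream), "assignments")
```

Recomputing the weights at every step makes the sketch quadratic in the stream length, which is acceptable for a few thousand steps; a longer simulation would need an incremental sampling scheme.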