This annotated copy of the program indicates papers of particular relevance to SIGMORPHON. Abstracts and links to the proceedings have been added.

Conference program for EMNLP 2011

Wednesday, July 27, 2011

Opening remarks and Invited Talk

Location: Pentland — Chair: Paola Merlo

  • 9:00—10:00 Object Detection Grammars David McAllester

Session 1: Plenary session

Location: Pentland — Chair: Jason Eisner

  • 11:00—11:25 Fast and Robust Joint Models for Biomedical Event Extraction Sebastian Riedel and Andrew McCallum
  • 11:25—11:50 Predicting Thread Discourse Structure over Technical Web Forums Li Wang, Marco Lui, Su Nam Kim, Joakim Nivre and Timothy Baldwin
  • 11:50—12:15 Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation Yin-Wen Chang and Michael Collins
  • 12:15—12:40 Optimal Search for Minimum Error Rate Training Michel Galley and Chris Quirk

Session 2A: Syntax and Parsing

Location: Pentland East — Chair: Stephen Clark

  • 14:10—14:35 Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance Shay B. Cohen, Dipanjan Das and Noah A. Smith
  • 14:35—15:00 Multi-Source Transfer of Delexicalized Dependency Parsers Ryan McDonald, Slav Petrov and Keith Hall
  • 15:00—15:25 SMT Helps Bitext Dependency Parsing Wenliang Chen, Jun'ichi Kazama, Min Zhang, Yoshimasa Tsuruoka, Yujie Zhang, Yiou Wang, Kentaro Torisawa and Haizhou Li
  • 15:25—15:50 Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP Federico Sangati and Willem Zuidema

Session 2B: Semantics

Location: Prestonfield — Chair: Mirella Lapata

  • 14:10—14:35 A Generate and Rank Approach to Sentence Paraphrasing Prodromos Malakasiotis and Ion Androutsopoulos
  • 14:35—15:00 Correcting Semantic Collocation Errors with L1-induced Paraphrases Daniel Dahlmeier and Hwee Tou Ng
  • 15:00—15:25 Class Label Enhancement via Related Instances Zornitsa Kozareva, Konstantin Voevodski and Shanghua Teng
  • 15:25—15:50 A Joint Model for Extended Semantic Role Labeling Vivek Srikumar and Dan Roth

Session 2C: Sentiment Analysis and Opinion Mining

Location: Pentland West — Chair: Bo Pang

  • 14:10—14:35 Domain-Assisted Product Aspect Hierarchy Generation: Towards Hierarchical Organization of Unstructured Consumer Reviews Jianxing Yu, Zheng-Jun Zha, Meng Wang, Kai Wang and Tat-Seng Chua
  • 14:35—15:00 Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng and Christopher D. Manning
  • 15:00—15:25 Unsupervised Discovery of Discourse Relations for Eliminating Intra-sentence Polarity Ambiguities Lanjun Zhou, Binyang Li, Wei Gao, Zhongyu Wei and Kam-Fai Wong
  • 15:25—15:50 Compositional Matrix-Space Models for Sentiment Analysis Ainur Yessenalina and Claire Cardie

Session 3A: Machine Translation

Location: Pentland East — Chair: Phil Blunsom

  • 16:20—16:45 Training a Parser for Machine Translation Reordering Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno and Hideto Kazawa
  • 16:45—17:10 Inducing Sentence Structure from Parallel Corpora for Reordering John DeNero and Jakob Uszkoreit
  • 17:10—17:35 Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax Jiajun Zhang, Feifei Zhai and Chengqing Zong
  • 17:35—18:00 A novel dependency-to-string model for statistical machine translation Jun Xie, Haitao Mi and Qun Liu

Session 3B: NLP related Machine Learning

Location: Prestonfield — Chair: David Smith

  • 16:20—16:45 Bayesian Checking for Topic Models David Mimno and David Blei
  • 16:45—17:10 Dual Decomposition with Many Overlapping Components Andre Martins, Noah Smith, Mario Figueiredo and Pedro Aguiar
  • 17:10—17:35 Approximate Scalable Bounded Space Sketch for Large Data NLP Amit Goyal and Hal Daume III
  • 17:35—18:00 Optimizing Semantic Coherence in Topic Models David Mimno, Hanna Wallach, Edmund Talley, Miriam Leenders and Andrew McCallum

Session 3C: Discourse Dialogue and Pragmatics

Location: Pentland West — Chair: Oliver Lemon

  • 16:20—16:45 A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents Yufan Guo, Anna Korhonen and Thierry Poibeau
  • 16:45—17:10 Linear Text Segmentation Using Affinity Propagation Anna Kazantseva and Stan Szpakowicz
  • 17:10—17:35 Minimally Supervised Event Causality Identification Quang Do, Yee Seng Chan and Dan Roth
  • 17:35—18:00 A Model of Discourse Predictions in Human Sentence Processing Amit Dubey, Frank Keller and Patrick Sturt

Thursday, July 28, 2011

Session 4: Plenary session

Location: Pentland — Chair: Michael Collins

  • 9:05—9:30 Simple Effective Decipherment via Combinatorial Optimization Taylor Berg-Kirkpatrick and Dan Klein

    We present a simple objective function that when optimized yields accurate solutions to both decipherment and cognate pair identification problems. The objective simultaneously scores a matching between two alphabets and a matching between two lexicons, each in a different language. We introduce a simple coordinate descent procedure that efficiently finds effective solutions to the resulting combinatorial optimization problem. Our system requires only a list of words in both languages as input, yet it competes with and surpasses several state-of-the-art systems that are both substantially more complex and make use of more information. [PDF]

  • 9:30—9:55 Universal Morphological Analysis using Structured Nearest Neighbor Prediction Young-Bum Kim, João Graça and Benjamin Snyder

    In this paper, we consider the problem of unsupervised morphological analysis from a new angle. Past work has endeavored to design unsupervised learning methods which explicitly or implicitly encode inductive biases appropriate to the task at hand. We propose instead to treat morphological analysis as a structured prediction problem, where languages with labeled data serve as training examples for unlabeled languages, without the assumption of parallel data. We define a universal morphological feature space in which every language and its morphological analysis reside. We develop a novel structured nearest neighbor prediction method which seeks to find the morphological analysis for each unlabeled language which lies as close as possible in the feature space to a training language. We apply our model to eight inflecting languages, and induce nominal morphology with substantially higher accuracy than a traditional, MDL-based approach. Our analysis indicates that accuracy continues to improve substantially as the number of training languages increases. [PDF]

  • 9:55—10:20 Training a Log-Linear Parser with Loss Functions via Softmax-Margin Michael Auli and Adam Lopez

Session 5A: Machine Translation

Location: Pentland East — Chair: Philipp Koehn

  • 11:00—11:25 Large-Scale Cognate Recovery David Hall and Dan Klein

    We present a system for the large scale induction of cognate groups. Our model explains the evolution of cognates as a sequence of mutations and innovations along a phylogeny. On the task of identifying cognates from over 21,000 words in 218 different languages from the Oceanic language family, our model achieves a cluster purity score over 91%, while maintaining pairwise recall over 62%. [PDF]

  • 11:25—11:50 Domain Adaptation via Pseudo In-Domain Data Selection Amittai Axelrod, Xiaodong He and Jianfeng Gao
  • 11:50—12:15 Language Models for Machine Translation: Original vs. Translated Texts Gennadi Lembersky, Noam Ordan and Shuly Wintner
  • 12:15—12:40 Better Evaluation Metrics Lead to Better Machine Translation Chang Liu, Daniel Dahlmeier and Hwee Tou Ng

Session 5B: Syntax and Parsing

Location: Prestonfield — Chair: Ryan McDonald

  • 11:00—11:25 Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation Reut Tsarfaty, Joakim Nivre and Evelina Andersson
  • 11:25—11:50 Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus Emily M. Bender, Dan Flickinger, Stephan Oepen and Yi Zhang
  • 11:50—12:15 Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming Kristian Woodsend and Mirella Lapata
  • 12:15—12:40 Bootstrapping Semantic Parsers from Conversations Yoav Artzi and Luke Zettlemoyer

Session 5C: Summarization and Generation

Location: Pentland West — Chair: Johanna Moore

  • 11:00—11:25 Timeline Generation through Evolutionary Trans-Temporal Summarization Rui Yan, Liang Kong, Congrui Huang, Xiaojun Wan, Xiaoming Li and Yan Zhang
  • 11:25—11:50 Corpus-Guided Sentence Generation of Natural Images Yezhou Yang, Ching Teo, Hal Daume III and Yiannis Aloimonos
  • 11:50—12:15 Corroborating Text Evaluation Results with Heterogeneous Measures Enrique Amigó, Julio Gonzalo, Jesus Gimenez and Felisa Verdejo
  • 12:15—12:40 Ranking Human and Machine Summarization Systems Peter Rankel, John Conroy, Eric Slud and Dianne O'Leary

Session 6A: Machine Translation

Location: Pentland East — Chair: Stefan Riezler

  • 14:10—14:35 Quasi-Synchronous Phrase Dependency Grammars for Machine Translation Kevin Gimpel and Noah A. Smith
  • 14:35—15:00 A Word Reordering Model for Improved Machine Translation Karthik Visweswariah, Rajakrishnan Rajkumar, Ankur Gandhe, Ananthakrishnan Ramanathan and Jiri Navratil
  • 15:00—15:25 Feature-Rich Language-Independent Syntax-Based Alignment for Statistical Machine Translation Jason Riesa, Ann Irvine and Daniel Marcu
  • 15:25—15:50 Efficient retrieval of tree translation examples for Syntax-Based Machine Translation Fabien Cromieres and Sadao Kurohashi

Session 6B: Semantics

Location: Prestonfield — Chair: Hwee Tou Ng

  • 14:10—14:35 A generative model for unsupervised discovery of relations and argument classes from clinical texts Bryan Rink and Sanda Harabagiu
  • 14:35—15:00 Random Walk Inference and Learning in A Large Scale Knowledge Base Ni Lao, Tom Mitchell and William W. Cohen
  • 15:00—15:25 Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases Matthias Hartung and Anette Frank
  • 15:25—15:50 Semantic Topic Models: Combining Word Distributional Statistics and Dictionary Definitions Weiwei Guo and Mona Diab

Session 6C: Sentiment Analysis and Opinion Mining

Location: Pentland West — Chair: Benjamin Snyder

  • 14:10—14:35 Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to Detect Sentiment in Microblogs Samuel Brody and Nicholas Diakopoulos

    We present an automatic method which leverages word lengthening to adapt a sentiment lexicon specifically for Twitter and similar social messaging networks. The contributions of the paper are as follows. First, we call attention to lengthening as a widespread phenomenon in microblogs and social messaging, and demonstrate the importance of handling it correctly. We then show that lengthening is strongly associated with subjectivity and sentiment. Finally, we present an automatic method which leverages this association to detect domain-specific sentiment- and emotion-bearing words. We evaluate our method by comparison to human judgments, and analyze its strengths and weaknesses. Our results are of interest to anyone analyzing sentiment in microblogs and social networks, whether for research or commercial purposes. [PDF]

  • 14:35—15:00 Personalized Recommendation of User Comments via Factor Models Deepak Agarwal, Bee-Chung Chen and Bo Pang
  • 15:00—15:25 Data-Driven Response Generation in Social Media Alan Ritter, Colin Cherry and William B. Dolan
  • 15:25—15:50 Predicting a Scientific Community's Response to an Article Dani Yogatama, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge and Noah A. Smith

Session 7A: Phonology Morphology Tagging Chunking and Segmentation

Location: Pentland East — Chair: Noah Smith

  • 16:20—16:45 Non-parametric Bayesian Segmentation of Japanese Noun Phrases Yugo Murawaki and Sadao Kurohashi

    A key factor of high quality word segmentation for Japanese is a high-coverage dictionary, but it is costly to manually build such a lexical resource. Although external lexical resources for human readers are potentially good knowledge sources, they have not been utilized due to differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task of Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure named hybrid type-based sampling, which has the ability to directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer. [PDF]

  • 16:45—17:10 Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model Markus Dreyer and Jason Eisner

    We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words.

    Our Bayesian generative model of the data explicitly represents tokens, types, inflections, paradigms, and locally conditioned string edits. It assumes that inflected word tokens are generated from an infinite mixture of inflectional paradigms (string tuples). Each paradigm is sampled all at once from a graphical model, whose potential functions are weighted finite-state transducers with language-specific parameters to be learned. These assumptions naturally lead to an elegant empirical Bayes inference procedure that exploits Monte Carlo EM, belief propagation, and dynamic programming. Given 50-100 seed paradigms, adding a 10-million-word corpus reduces prediction error for morphological inflections by up to 10%. [PDF]

  • 17:10—17:35 Multilayer Sequence Labeling Ai Azuma and Yuji Matsumoto
  • 17:35—18:00 A Bayesian Mixture Model for PoS Induction Using Multiple Features Christos Christodoulopoulos, Sharon Goldwater and Mark Steedman

    In this paper we present a fully unsupervised syntactic class induction system formulated as a Bayesian multinomial mixture model, where each word type is constrained to belong to a single class. By using a mixture model rather than a sequence model (e.g., HMM), we are able to easily add multiple kinds of features, including those at both the type level (morphology features) and token level (context and alignment features, the latter from parallel corpora). Using only context features, our system yields results comparable to the state-of-the-art, far better than a similar model without the one-class-per-type constraint. Using the additional features provides added benefit, and our final system outperforms the best published results on most of the 25 corpora tested. [PDF]

Session 7B: Semantics

Location: Prestonfield — Chair: Mark Stevenson

  • 16:20—16:45 Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus Su Nam Kim and Preslav Nakov
  • 16:45—17:10 Linguistic Redundancy in Twitter Fabio Massimo Zanzotto, Marco Pennacchiotti and Kostas Tsioutsiouliklis
  • 17:10—17:35 Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora Matteo Negri, Luisa Bentivogli, Yashar Mehdad, Danilo Giampiccolo and Alessandro Marchetti
  • 17:35—18:00 Literal and Metaphorical Sense Identification through Concrete and Abstract Context Peter Turney, Yair Neuman, Dan Assaf and Yohai Cohen

Session 7C: Spoken Language and IR/QA

Location: Pentland West — Chair: Steve Renals

  • 16:20—16:45 Syntactic Decision Tree LMs: Random Selection or Intelligent Design? Denis Filimonov and Mary Harper
  • 16:45—17:10 The Imagination of Crowds: Conversational AAC Language Modeling using Crowdsourcing and Large Data Sources Keith Vertanen and Per Ola Kristensson
  • 17:10—17:35 Using Syntactic and Semantic Structural Kernels for Classifying Definition Questions in Jeopardy! Alessandro Moschitti, Jennifer Chu-Carroll, Siddharth Patwardhan, James Fan and Giuseppe Riccardi
  • 17:35—18:00 Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French Spence Green, Marie-Catherine de Marneffe, John Bauer and Christopher D. Manning

Friday, July 29, 2011

Session 8: Plenary session

Location: Pentland — Chair: Shuly Wintner

  • 9:05—9:30 Unsupervised Semantic Role Induction with Graph Partitioning Joel Lang and Mirella Lapata
  • 9:30—9:55 Structural Opinion Mining for Graph-based Sentiment Representation Yuanbin Wu, Qi Zhang, Xuanjing Huang and Lide Wu
  • 9:55—10:20 Summarize What You Are Interested In: An Optimization Framework for Interactive Personalized Summarization Rui Yan, Jian-Yun Nie and Xiaoming Li

Session 9A: Machine Translation

Location: Pentland — Chair: John DeNero

  • 10:50—11:15 Tuning as Ranking Mark Hopkins and Jonathan May
  • 11:15—11:40 Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation. Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Och and Juri Ganitkevitch
  • 11:40—12:05 Hierarchical Phrase-based Translation Representations Gonzalo Iglesias, Cyril Allauzen, William Byrne, Adrià de Gispert and Michael Riley
  • 12:05—12:30 Improved Transliteration Mining Using Graph Reinforcement Ali El Kahki, Kareem Darwish, Ahmed Saad El Din, Mohamed Abd El-Wahab, Ahmed Hefny and Waleed Ammar

    Mining of transliterations from comparable or parallel text can enhance natural language processing applications such as machine translation and cross language information retrieval. This paper presents an enhanced transliteration mining technique that uses a generative graph reinforcement model to infer mappings between source and target character sequences. An initial set of mappings are learned through automatic alignment of transliteration pairs at character sequence level. Then, these mappings are modeled using a bipartite graph. A graph reinforcement algorithm is then used to enrich the graph by inferring additional mappings. During graph reinforcement, appropriate link reweighting is used to promote good mappings and to demote bad ones. The enhanced transliteration mining technique is tested in the context of mining transliterations from parallel Wikipedia titles in four alphabet-based language pairs, namely English-Arabic, English-Russian, English-Hindi, and English-Tamil. The improvements in F1-measure over the baseline system were 18.7, 1.0, 4.5, and 32.5 basis points for the four language pairs respectively. The results herein outperform the best reported results in the literature by 2.6, 4.8, 0.8, and 4.1 basis points for the four languages respectively. [PDF]

Session 9B: Semantics

Location: Prestonfield — Chair: Peter Turney

  • 10:50—11:15 Experimental Support for a Categorical Compositional Distributional Model of Meaning Edward Grefenstette and Mehrnoosh Sadrzadeh
  • 11:15—11:40 Cross-Cutting Models of Lexical Semantics Joseph Reisinger and Raymond Mooney
  • 11:40—12:05 Reducing Grounded Learning Tasks To Grammatical Inference Benjamin Börschinger, Bevan K. Jones and Mark Johnson
  • 12:05—12:30 Relation Extraction with Relation Topics Chang Wang, James Fan, Aditya Kalyanpur and David Gondek

Session 9C: Information Extraction

Location: Kirkland — Chair: Alessandro Moschitti

  • 10:50—11:15 Extreme Extraction — Machine Reading in a Week Marjorie Freedman, Lance Ramshaw, Elizabeth Boschee, Ryan Gabbard, Gary Kratkiewicz, Nicolas Ward and Ralph Weischedel
  • 11:15—11:40 Discovering Relations between Noun Categories Thahir Mohamed, Estevam Hruschka and Tom Mitchell
  • 11:40—12:05 Structured Relation Discovery using Generative Models Limin Yao, Aria Haghighi, Sebastian Riedel and Andrew McCallum
  • 12:05—12:30 Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances Burr Settles

Session 10A: Syntax and Parsing

Location: Pentland — Chair: Mark Steedman

  • 14:10—14:35 Third-order Variational Reranking on Packed-Shared Dependency Forests Katsuhiko Hayashi, Taro Watanabe, Masayuki Asahara and Yuji Matsumoto
  • 14:35—15:00 Training dependency parsers by jointly optimizing multiple objectives Keith Hall, Ryan McDonald, Jason Katz-Brown and Michael Ringgaard
  • 15:00—15:25 Structured Sparsity in Structured Prediction Andre Martins, Noah Smith, Mario Figueiredo and Pedro Aguiar
  • 15:25—15:50 Lexical Generalization in CCG Grammar Induction for Semantic Parsing Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater and Mark Steedman

Session 10B: Information Extraction

Location: Prestonfield — Chair: Sebastian Riedel

  • 14:10—14:35 Named Entity Recognition in Tweets: An Experimental Study Alan Ritter, Sam Clark, Mausam and Oren Etzioni
  • 14:35—15:00 Identifying Relations for Open Information Extraction Anthony Fader, Stephen Soderland and Oren Etzioni
  • 15:00—15:25 Active Learning with Amazon Mechanical Turk Florian Laws, Christian Scheible and Hinrich Schütze
  • 15:25—15:50 Bootstrapped Named Entity Recognition for Product Attribute Extraction Duangmanee Putthividhya and Junling Hu

Session 10C: Text Mining and NLP Applications

Location: Kirkland — Chair: Alexandre Klementiev

  • 14:10—14:35 Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter Eiji Aramaki, Sachiko Maskawa and Mizuki Morita
  • 14:35—15:00 A Simple Word Trigger Method for Social Tag Suggestion Zhiyuan Liu, Xinxiong Chen and Maosong Sun
  • 15:00—15:25 Rumor has it: Identifying Misinformation in Microblogs Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev and Qiaozhu Mei
  • 15:25—15:50 Exploiting Parse Structures for Native Language Identification Sze-Meng Jojo Wong and Mark Dras

Best Paper Award and Closing

Location: Pentland — Chair: Mark Johnson

  • 16:20—17:10 A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions Wei Lu and Hwee Tou Ng

Website hosted by the Department of Computing Science at the University of Alberta.