sravi_pubs

[1]	Sujith Ravi and Zornitsa Kozareva. On-device Conversational Slot Extraction. In Proceedings of SIGDIAL (upcoming), 2021. [ bib ]
[2]	Ramakanth Pasunuru, Mengwen Liu, Mohit Bansal, Sujith Ravi, and Markus Dreyer. Efficiently Summarizing Text and Graph Encodings of Multi-Document Clusters. In Proceedings of NAACL, 2021. [ bib \| .pdf ]
[3]	Hengxin Fun, Sunil Gandhi, and Sujith Ravi. Efficient Retrieval Optimized Multi-task Learning. In arXiv preprint, 2021. [ bib \| .pdf ]
[4]	Arthur Bražinskas, Mengwen Liu, Ramesh Nallapati, Sujith Ravi, and Markus Dreyer. Transductive Learning for Abstractive News Summarization. In arXiv preprint, 2021. [ bib \| .pdf ]
[5]	Chinnadhurai Sankar, Sujith Ravi, and Zornitsa Kozareva. ProFormer: Towards on-device LSH projection based transformers. In Proceedings of EACL, 2021. [ bib \| .pdf ]
[6]	Chinnadhurai Sankar, Sujith Ravi, and Zornitsa Kozareva. On-device text representations robust to misspellings via projections. In Proceedings of EACL, 2021. [ bib \| .pdf ]
[7]	Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, and Sujith Ravi. Environment-agnostic multitask learning for natural language grounded navigation. In Proceedings of ECCV, 2020. [ bib \| .pdf ]
[8]	Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. Low-dimensional hyperbolic knowledge graph embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020. [ bib \| .pdf ]
[9]	Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, and Sujith Ravi. Goemotions: A dataset of fine-grained emotions. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020. [ bib \| .pdf ]
[10]	Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, and Sujith Ravi. Ultra fine-grained image semantic embedding. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM), 2020. [ bib \| .pdf ]
[11]	Zornitsa Kozareva and Sujith Ravi. ProSeqo: Projection sequence networks for on-device text classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019. [ bib ]
[12]	Prabhu Kaliamoorthi‎, Sujith Ravi, and Zornitsa Kozareva. PRADO: Projection attention networks for document classification on-device. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019. [ bib ]
[13]	Sujith Ravi and Zornitsa Kozareva. On-device structured and context partitioned projection networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019. [ bib \| .pdf ]
[14]	Trapit Bansal, Da-Cheng Juan, Sujith Ravi, and Andrew McCallum. A2N: Attending to neighbors for knowledge graph inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019. [ bib \| .pdf ]
[15]	Chinnadhurai Sankar and Sujith Ravi. Deep reinforcement learning for modeling chit-chat dialog with discrete attributes. In Proceedings of the 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2019. [ bib \| .pdf ]
[16]	Sujith Ravi. Efficient On-Device models using neural projections. In Proceedings of the International Conference on Machine Learning (ICML), 2019. [ bib \| .html ]
[17]	Chinnadhurai Sankar, Sujith Ravi, and Zornitsa Kozareva. Transferable neural projection representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019. [ bib \| .pdf ]
[18]	Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, and Sujith Ravi. Graph-RISE: Graph-regularized image semantic embedding. In arXiv preprint, 2019. [ bib \| .pdf ]
[19]	Azade Nazi, Will Hang, Anna Goldie, Sujith Ravi, and Azalia Mirhoseini. GAP: Generalizable approximate graph partitioning framework. In arXiv preprint, 2019. [ bib \| .pdf ]
[20]	Krishnamurthy Viswanathan, Sushant Sachdeva, Andrew Tomkins, and Sujith Ravi. Improved graph based semi-supervised learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2019. [ bib \| .pdf ]
[21]	Chinnadhurai Sankar and Sujith Ravi. Conditional utterance generation with discrete dialog attributes in open-domain dialog systems. In AAAI 2019 Deep-Dial Workshop, 2019. [ bib ]
[22]	Zornitsa Kozareva and Sujith Ravi. Fast and small on-device neural networks for short text natural language processing. In NeurIPS 2018 On-Device ML Workshop (MLPCD 2), 2018. [ bib ]
[23]	Chinnadhurai Sankar and Sujith Ravi. Modeling non-goal oriented dialog with discrete attributes. In NeurIPS 2018 Conversational AI Workshop, 2018. [ bib ]
[24]	Sujith Ravi and Zornitsa Kozareva. Self-Governing Neural Networks for on-device short text classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. [ bib \| .pdf ] Deep neural networks reach state-of-the-art performance for wide range of natural language processing, computer vision and speech applications. Yet, one of the biggest challenges is running these complex networks on devices such as mobile phones or smart watches with tiny memory footprint and low computational capacity. We propose on-device Self-Governing Neural Networks (SGNNs), which learn compact projection vectors with local sensitive hashing. The key advantage of SGNNs over existing work is that they surmount the need for pre-trained word embeddings and complex networks with huge parameters. We conduct extensive evaluation on dialog act classification and show significant improvement over state-of-the-art results. Our findings show that SGNNs are effective at capturing low-dimensional semantic text representations, while maintaining high accuracy.
[25]	Thang D. Bui, Sujith Ravi, and Vivek Ramavajjala. Neural graph learning: Training neural networks using graphs. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. [ bib \| .pdf ] Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-regularised objective, namely "Neural Graph Machines", that can combine the power of neural networks and label propagation. This work generalises previous literature on graph-augmented training of neural networks, enabling it to be applied to multiple neural architectures (Feed-forward NNs, CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the neural networks to harness both labeled and unlabeled data by: (a) allowing the network to train using labeled data as in the supervised setting, (b) biasing the network to learn similar hidden representations for neighboring nodes on a graph, in the same vein as label propagation. Such architectures with the proposed objective can be trained efficiently using stochastic gradient descent and scaled to large graphs, with a runtime that is linear in the number of edges. The proposed joint training approach convincingly outperforms many existing methods on a wide range of tasks (multi-label classification on social graphs, news categorization, document classification and semantic intent classification), with multiple forms of graph inputs (including graphs with and without node-level features) and using different types of neural networks.
[26]	Ning Ye, Ariel Fuxman, Vivek Ramavajjala, Sergey Nazarov, Patrick McGregor, and Sujith Ravi. Photoreply: Automatically suggesting conversational responses to photos. In Proceedings of the 2018 World Wide Web Conference (WWW), 2018. [ bib \| http ] We introduce the problem of automatically suggesting conversational responses to photos and present an intelligent assistant called PhotoReply that solves the problem in the context of a messaging application. For example, when a user receives a photo showing a dog, PhotoReply suggests responses such as "Aww" and "Cute terrier!" This simplifies composing responses on constrained mobile keyboards, and delights users with uncanny insights into the photos they receive. PhotoReply is an integral part of the Allo chat application, being its predictive assistance feature with highest click-through rate. We formalize the problem of suggesting responses to images as an instance of multimodal learning which is akin to caption generation models, a topic that has recently received significant attention. We then present a system that "translates" image pixels to text responses. The system includes a conditioned language model, based on an LSTM, which, given an embedding of image pixels and the previous predicted words, calculates the probability of all words in a vocabulary of being the next word in the generated response and a triggering model trained with image embeddings and concept labels from a large concept taxonomy. We describe training of the models and a thorough experimental evaluation based on crowdsourced datasets and live traffic.
[27]	Sujith Ravi, Tom Rudick, and Yicheng Fan. Learning on-device conversational models. In NIPS 2017 MLPCD Workshop, 2017. [ bib ]
[28]	Sujith Ravi. ProjectionNet: Learning efficient on-device deep networks using neural projections. In arXiv preprint, 2017. [ bib \| .pdf ] Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly "on" device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks-a full trainer neural network (using existing architectures like Feed-forward NNs or LSTM RNNs) combined with a simpler "projection" network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficient-to-compute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question "how many neural bits are required to solve a given task?" using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets.
[29]	Thang D. Bui, Sujith Ravi, and Vivek Ramavajjala. Neural graph machines: Learning neural networks using graphs. In arXiv preprint, 2017. [ bib \| .pdf ] Label propagation is a powerful and flexible semi-supervised learning technique on graphs. Neural networks, on the other hand, have proven track records in many supervised learning tasks. In this work, we propose a training framework with a graph-regularised objective, namely "Neural Graph Machines", that can combine the power of neural networks and label propagation. This work generalises previous literature on graph-augmented training of neural networks, enabling it to be applied to multiple neural architectures (Feed-forward NNs, CNNs and LSTM RNNs) and a wide range of graphs. The new objective allows the neural networks to harness both labeled and unlabeled data by: (a) allowing the network to train using labeled data as in the supervised setting, (b) biasing the network to learn similar hidden representations for neighboring nodes on a graph, in the same vein as label propagation. Such architectures with the proposed objective can be trained efficiently using stochastic gradient descent and scaled to large graphs, with a runtime that is linear in the number of edges. The proposed joint training approach convincingly outperforms many existing methods on a wide range of tasks (multi-label classification on social graphs, news categorization, document classification and semantic intent classification), with multiple forms of graph inputs (including graphs with and without node-level features) and using different types of neural networks.
[30]	Cheng Li, Michael Bendersky, Vijay Garg, and Sujith Ravi. Related event discovery. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM), 2017. [ bib \| .pdf ] In this paper, we consider the problem of discovering local events on the web, where events are entities extracted from pages with Schema.org annotations. Examples of such local events include small venue concerts, farmers markets, sports activities, etc. Given an event entity, we propose a graph-based framework for retrieving a ranked list of related events that a user is likely to be interested in or to attend. We demonstrate that this framework can be easily extended to the keyword search scenario as well, where the user issues a query to a search engine, hoping to find relevant events to attend. Due to the difficulty of obtaining ground-truth labels for event entities, which are temporal and are constrained by location, our general retrieval framework is unsupervised, and its graph-based formulation addresses (a) the challenge of feature sparseness and noisiness, and (b) semantic mismatch problem in a self-contained and principled manner. To validate our methods, we collect human annotations and conduct a comprehensive empirical study, analyzing the performance of our methods with regard to relevance, recall, and diversity. This study shows that our graph-based framework is significantly better than any individual feature for both entity and keyword search scenarios, and can be further improved with minimal supervision. Finally, we demonstrate that our framework can be useful in understanding the temporal and the localized nature of the events on the web.
[31]	Anjuli Kannan, Karol Kurach, Sujith Ravi, Tobias Kaufmann, Andrew Tomkins, Balint Miklos, Greg Corrado, Laszlo Lukacs, Marina Ganea, Peter Young, and Vivek Ramavajjala. Smart reply: Automated response suggestion for email. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016. [ bib \| .pdf ] In this paper we propose and investigate a novel end-to-end method for automatically generating short email responses, called Smart Reply. It generates semantically diverse suggestions that can be used as complete email responses with just one tap on mobile. The system is currently used in Inbox by Gmail and is responsible for assisting with 10% of all mobile responses. It is designed to work at very high throughput and process hundreds of millions of messages daily. The system exploits state-of-the-art, large-scale deep learning. We describe the architecture of the system as well as the challenges that we faced while building it, like response diversity and scalability. We also introduce a new method for semantic clustering of user-generated content that requires only a modest amount of explicitly labeled data.
[32]	Harrie Oosterhuis, Sujith Ravi, and Michael Bendersky. Semantic video trailers. In ICML 2016 Workshop on Multi-View Representation Learning (MVRL), 2016. [ bib \| .pdf ] Query-based video summarization is the task of creating a brief visual trailer, which captures the parts of the video (or a collection of videos) that are most relevant to the user-issued query. In this paper, we propose an unsupervised label propagation approach for this task. Our approach effectively captures the multimodal semantics of queries and videos using state-of-the-art deep neural networks and creates a summary that is both semantically coherent and visually attractive. We describe the theoretical framework of our graph-based approach and empirically evaluate its effectiveness in creating relevant and attractive trailers. Finally, we showcase example video trailers generated by our system.
[33]	Justine Zhang, Ravi Kumar, Sujith Ravi, and Cristian Danescu-Niculescu-Mizil. Conversational flow in oxford-style debates. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL/HLT), 2016. [ bib \| .pdf ] Public debates are a common platform for presenting and juxtaposing diverging views on important issues. In this work we propose a methodology for tracking how ideas flow between participants throughout a debate. We use this approach in a case study of Oxford-style debates - a competitive format where the winner is determined by audience votes - and show how the outcome of a debate depends on aspects of conversational flow. In particular, we find that winners tend to make better use of a debate’s interactive component than losers, by actively pursuing their opponents' points rather than promoting their own ideas over the course of the conversation
[34]	Sujith Ravi and Qiming Diao. Large scale distributed semi-supervised learning using streaming approximation. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016. [ bib \| .pdf ] Traditional graph-based semi-supervised learning (SSL) approaches are not suited for massive data and large label scenarios since they scale linearly with the number of edges \|E\| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the label distribution per node thereby achieving a space reduction from O(m) to O(log m), under certain conditions. In this paper, we present a novel streaming graphbased SSL approximation that effectively captures the sparsity of the label distribution and further reduces the space complexity per node to O(1). We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with significant reduction in memory footprint. Finally, we propose a robust graph augmentation strategy using unsupervised deep learning architectures that yields further significant quality gains for SSL in natural language applications.
[35]	James B. Wendt, Michael Bendersky, Lluis Garcia-Pueyo, Vanja Josifovski, Balint Miklos, Ivo Krka, Amitabh Saikia, Jie Yang, Marc-Allen Cartright, and Sujith Ravi. Hierarchical label propagation and discovery for machine generated email. In Proceedings of the International Conference on Web Search and Data Mining (WSDM), 2016. [ bib \| .pdf ] Machine-generated documents such as email or dynamic webpages are single instantiations of a pre-defined structural template. As such, they can be viewed as a hierarchy of template and document specific content. This hierarchical template representation has several important advantages for document clustering and classification. First, templates capture common topics among the documents, while filtering out the potentially noisy variabilities such as personal information. Second, template representations scale far better than document representations since a single template captures numerous documents. Finally, since templates group together structurally similar documents, they can propagate properties between all the documents that match the template. In this paper, we use these advantages for document classification by formulating an efficient and effective hierarchical label propagation and discovery algorithm. The labels are propagated first over a template graph (constructed based on either term-based or topic-based similarities), and then to the matching documents. We evaluate the performance of the proposed algorithm using a large donated email corpus and show that the resulting template graph is significantly more compact than the corresponding document graph and the hierarchical label propagation is both efficient and effective in increasing the coverage of the baseline document classification algorithm. We demonstrate that the template label propagation achieves more than 91% precision and 93% recall, while increasing the label coverage by more than 11%.
[36]	Song Feng, Sujith Ravi, Ravi Kumar, Polina Kuznetsova, Wei Liu, Alex Berg, Tamara Berg, and Yejin Choi. Refer-to-as relations as semantic knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, 2015. [ bib \| .pdf ] We study Refer-to-as relations as a new type of semantic knowledge. Compared to the much studied Is-a relation, which concerns factual taxonomic knowledge, Refer-to-as relations aim to address pragmatic semantic knowledge. For example, a "penguin" is a "bird" from a taxonomic point of view, but people rarely refer to a "penguin" as a "bird" in vernacular use. This observation closely relates to the entry-level categorization studied in Psychology. We posit that Refer-to-as relations can be learned from data, and that both textual and visual information would be helpful in inferring the relations. By integrating existing lexical structure knowledge with language statistics and visual similarities, we formulate a collective inference approach to map all object names in an encyclopedia to commonly used names for each object. Our contributions include a new labeled data set, the collective inference and optimization approach, and the computed mappings and similarities.
[37]	Aaron Li, Amr Ahmed, Sujith Ravi, and Alex Smola. Reducing the sampling complexity of topic models. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2014. [ bib \| .pdf ] Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scaleslinearly with the number of actually instantiated topics kd in the document. For large document collections and in structured hierarchical models k_d << k. This yields an order of magnitude speedup. Our method applies to a wide variety of statistical models such as PDP [16, 4] and HDP [19]. At its core is the idea that dense, slowly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, use of sparsity, and amortized constant time sampling via Walker's alias method.
[38]	Sujith Ravi, Bo Pang, Vibhor Rastogi, and Ravi Kumar. Great Question! Question quality in Community Q&A. In Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM), 2014. [ bib \| .pdf ] Asking the right question in the right way is an art (and a science). In a community question-answering setting, a good question is not just one that is found to be useful by other people—a question is good if it is also presented clearly and shows prior research. Using a community question-answering site that allows voting over the questions, we show that there is a notion of question quality that goes beyond mere popularity. We present techniques using latent topical models to automatically predict the quality of questions based on their content. Our best system achieves a prediction accuracy of 72%, beating out strong baselines by a significant amount. We also examine the effect of question quality on the dynamics of user behavior and the longevity of questions.
[39]	Sujith Ravi, Sergei Vassilivitskii, and Vibhor Rastogi. Parallel algorithms for unsupervised tagging. In Proceedings of the Transactions of the Association for Computational Linguistics (TACL), 2014. [ bib \| .pdf ] We propose a new method for unsupervised tagging that finds minimal models which are then further improved by Expectation Maximization training. In contrast to previous approaches that rely on manually specified and multi-step heuristics for model minimization, our approach is a simple greedy approximation algorithm DMLC (DISTRIBUTED-MINIMUM-LABEL-COVER) that solves this objective in a single step. We extend the method and show how to efficiently parallelize the algorithm on modern parallel computing platforms while preserving approximation guarantees. The new method easily scales to large data and grammar sizes, overcoming the memory bottleneck in previous approaches. We demonstrate the power of the new algorithm by evaluating on various sequence labeling tasks: Part-of-Speech tagging for multiple languages (including lowresource languages), with complete and incomplete dictionaries, and supertagging, a complex sequence labeling task, where the grammar size alone can grow to millions of entries. Our results show that for all of these settings, our method achieves state-of-the-art scalable performance that yields high quality tagging outputs.
[40]	Anirban Dasgupta, Ravi Kumar, and Sujith Ravi. Summarization through submodularity and dispersion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013. [ bib \| .pdf ] We propose a new optimization framework for summarization by generalizing the submodular framework of (Lin and Bilmes, 2011). In our framework the summarization desideratum is expressed as a sum of a submodular function and a nonsubmodular function, which we call dispersion; the latter uses inter-sentence dissimilarities in different ways in order to ensure non-redundancy of the summary. We consider three natural dispersion functions and show that a greedy algorithm can obtain an approximately optimal summary in all three cases. We conduct experiments on two corpora—DUC 2004 and user comments on news articles - and show that the performance of our algorithm outperforms those that rely only on submodularity.
[41]	Sujith Ravi. Scalable decipherment for machine translation via hash sampling. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), 2013. [ bib \| .pdf ] In this paper, we propose a new Bayesian inference method to train statistical machine translation systems using only non-parallel corpora. Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models. In order to perform fast, efficient Bayesian inference in this framework, we then derive a hash sampling strategy that is inspired by the work of Ahmed et al. (2012). The new translation hash sampler enables us to scale elegantly to complex models (for the first time) and large vocabulary/corpora sizes. We show empirical results on the OPUS data - our method yields the best BLEU scores compared to existing approaches, while achieving significant computational speedups (several orders faster). We also report for the first time - BLEU score results for a largescale MT task using only non-parallel data (EMEA corpus).
[42]	Vidhya Navalpakkam, Ladawn Jentzsch, Rory Sayres, Sujith Ravi, Amr Ahmed, and Alex Smola. Measurement and modeling of eye-mouse behavior. In Proceedings of the 22nd International World Wide Web Conference (WWW), 2013. [ bib \| .pdf ] As search pages are becoming increasingly complex, with images and nonlinear page layouts, understanding how users examine the page is important. We present a lab study on the effect of a rich informational panel to the right of the search result column, on eye and mouse behavior. Using eye and mouse data, we show that the flow of user attention on nonlinear page layouts is different from the widely believed top-down linear examination order of search results. We further demonstrate that the mouse, like the eye, is sensitive to two key attributes of page elements – their position (layout), and their relevance to the user’s task. We identify mouse measures that are strongly correlated with eye movements, and develop models to predict user attention (eye gaze) from mouse activity. These findings show that mouse tracking can be used to infer user attention and information flow patterns on search pages. Potential applications include ranking, search page optimization, and UI evaluation.
[43]	Amr Ahmed, Sujith Ravi, Shravan Narayanamurthy, and Alex Smola. Fastex: Hash clustering with exponential families. In Proceedings of the 26th Conference on Neural Information Processing Systems (NIPS), 2012. [ bib \| .pdf ] Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters.
[44]	Bo Pang and Sujith Ravi. Revisiting the predictability of language: Response completion in social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL), 2012. [ bib \| .pdf ] The question “how predictable is English?” has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand predictability of user generated English.
[45]	Sujith Ravi and Kevin Knight. Deciphering foreign language. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), 2011. [ bib \| .pdf ] In this work, we tackle the task of machine translation (MT) without parallel training data. We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English and present novel methods for training translation models from non-parallel text.
[46]	Sujith Ravi and Kevin Knight. Bayesian inference for Zodiac and other homophonic ciphers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), 2011. [ bib \| .pdf ] We introduce a novel Bayesian approach for deciphering complex substitution ciphers. Our method uses a decipherment model which combines information from letter n-gram language models as well as word dictionaries. Bayesian inference is performed on our model using an efficient sampling technique. We evaluate the quality of the Bayesian decipherment output on simple and homophonic letter substitution ciphers and show that unlike a previous approach, our method consistently produces almost 100% accurate decipherments. The new method can be applied on more complex substitution ciphers and we demonstrate its utility by cracking the famous Zodiac-408 cipher in a fully automated fashion, which has never been done before.
[47]	Stephen Boxwell, Chris Brew, Jason Baldridge, Dennis Mehay, and Sujith Ravi. Semantic Role Labeling for CCG without treebanks. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP)., 2011. [ bib \| .pdf ] We describe a method for training a semantic role labeler for CCG in the absence of gold-standard syntax derivations. Traditionally, semantic role labeling is performed by placing human-annotated semantic roles on gold-standard syntactic parses, identifying patterns in the syntax-semantics relationship, and then predicting roles on novel syntactic analyses. The gold standard syntactic training data can be eliminated from the process by extracting training instances from semantic roles projected onto a packed parse chart. This process can be used to rapidly develop NLP tools for resource-poor languages of interest.
[48]	Zornitsa Kozareva and Sujith Ravi. Unsupervised name ambiguity resolution using a generative model. In Proceedings of the EMNLP Workshop on Unsupervised Learning in NLP, 2011. [ bib \| .pdf ] Resolving ambiguity associated with names found on the Web, Wikipedia or medical texts is a very challenging task, which has been of great interest to the research community. We propose a novel approach to disambiguating names using Latent Dirichlet Allocation, where the learned topics represent the underlying senses of the ambiguous name. We conduct a detailed evaluation on multiple data sets containing ambiguous person, location and organization names and for multiple languages such as English, Spanish, Romanian and Bulgarian. We conduct comparative studies with existing approaches and show a substantial improvement of 15 to 35% in task accuracy.
[49]	Sujith Ravi, Ashish Vaswani, Kevin Knight, and David Chiang. Fast, greedy model minimization for unsupervised tagging. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING), pages 940-948, 2010. [ bib \| .pdf ] Model minimization has been shown to work well for the task of unsupervised part-of-speech tagging with a dictionary. In Ravi and Knight (2009), the authors invoke an integer programming (IP) solver to do model minimization. However, solving this problem exactly using an integer programming formulation is intractable for practical purposes. We propose a novel two-stage greedy approximation scheme to replace the IP. Our method runs fast, while yielding highly accurate tagging results. We also compare our method against standard EM training, and show that we consistently obtain better tagging accuracies on test data of varying sizes for English and Italian.
[50]	Sujith Ravi, Jason Baldridge, and Kevin Knight. Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 495-503, 2010. [ bib \| .pdf ] We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.
[51]	Sujith Ravi and Kevin Knight. Does GIZA++ make search errors? Computational Linguistics, 36(3):295-302, 2010. [ bib \| .pdf ] Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) have provided the most popular word alignment algorithm to date, one that has been implemented in GIZA (Al-Onaizan et al. 1999) and GIZA++ (Och and Ney 2003) software and adopted by nearly every SMT project. In this paper, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, i.e., whether it returns alignments that are sub-optimal according to a trained model.
[52]	Jihie Kim, Erin Shaw, and Sujith Ravi. Mining student discussions to profile participation and scaffold learning. In Cristobal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan Baker, editors, The Handbook of Educational Data Mining, pages 299-310. CRC Press, 2010. [ bib ]
[53]	David Chiang, Jonathan Graehl, Kevin Knight, Adam Pauls, and Sujith Ravi. Bayesian inference for finite-state transducers. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL/HLT), pages 447-455, 2010. [ bib \| .pdf ] We describe a Bayesian inference algorithm that can be used to train any cascade of weighted finite-state transducers on end-toend data. We also investigate the problem of automatically selecting from among multiple training runs. Our experiments on four dierent tasks demonstrate the genericity of this framework, and, where applicable, large improvements in performance over EM. We also show, for unsupervised part-of-speech tagging, that automatic run selection gives a large improvement over previous Bayesian approaches.
[54]	Sujith Ravi, Andrei Z. Broder, Evgeniy Gabrilovich, Vanja Josifovski, Sandeep Pandey, and Bo Pang. Automatic generation of bid phrases for online advertising. In Proceedings of the International Conference on Web Search and Data Mining (WSDM), pages 341-350, 2010. [ bib \| .pdf ] One of the most prevalent online advertising methods is textual advertising. To produce a textual ad, an advertiser must craft a short creative (the text of the ad) linking to a landing page, which describes the product or service being promoted. Furthermore, the advertiser must associate the creative to a set of manually chosen bid phrases representing those Web search queries that should trigger the ad. For efficiency, given a landing page, the bid phrases are often chosen first, and then for each bid phrase the creative is produced using a template. Nevertheless, an ad campaign (e.g., for a large retailer) might involve thousands of landing pages and tens or hundreds of thousands of bid phrases, hence the entire process is very laborious. Our study aims towards the automatic construction of online ad campaigns: given a landing page, we propose several algorithmic methods to generate bid phrases suitable for the given input. Such phrases must be both relevant (that is, reflect the content of the page) and well-formed (that is, likely to be used as queries to a Web search engine). To this end, we use a two phase approach. First, candidate bid phrases are generated by a number of methods, including a (monolingual) translation model capable of generating phrases not contained within the text of the input as well as previously “unseen” phrases. Second, the candidates are ranked in a probabilistic framework using both the translation model, which favors relevant phrases, as well as a bid phrase language model, which favors well-formed phrases. Empirical evaluation based on a real-life corpus of advertisercreated landing pages and associated bid phrases confirms the value of our approach, which successfully re-generates many of the human-crafted bid phrases and performs significantly better than a pure text extraction method.
[55]	Sujith Ravi and Kevin Knight. Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conferenceof the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), pages 504-512, 2009. Nominated for the Best Paper Award. [ bib \| .pdf ] We describe a novel method for the task of unsupervised POS tagging with a dictionary, one that uses integer programming to explicitly search for the smallest model that explains the data, and then uses EM to set parameter values. We evaluate our method on a standard test corpus using different standard tagsets (a 45-tagset as well as a smaller 17-tagset), and show that our approach performs better than existing state-of-the-art systems in both settings.
[56]	Tugba Bodrumlu, Kevin Knight, and Sujith Ravi. A new objective function for word alignment. In Proceedings of the NAACL/HLT Workshop on Integer Programming for Natural Language Processing, pages 28-35, 2009. [ bib \| .pdf ] We develop a new objective function for word alignment that measures the size of the bilingual dictionary induced by an alignment. A word alignment that results in a small dictionary is preferred over one that results in a large dictionary. In order to search for the alignment that minimizes this objective, we cast the problemas an integer linear program. We then extend our objective function to align corpora at the sub-word level, which we demonstrate on a small Turkish-English corpus.
[57]	Sujith Ravi and Kevin Knight. Learning phoneme mappings for transliteration without parallel data. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL/HLT), pages 37-45, 2009. [ bib \| .pdf ] We present a method for performing machine transliteration without any parallel resources. We frame the transliteration task as a decipherment problem and show that it is possible to learn cross-language phoneme mapping tables using only monolingual resources. We compare various methods and evaluate their accuracies on a standard name transliteration task.
[58]	Sujith Ravi and Kevin Knight. Probabilistic methods for a japanese syllable cipher. In Proceedings of the 22nd International Conference on the Computer Processing of Oriental Languages (ICCPOL), pages 270-281, 2009. [ bib \| .pdf ] This paper attacks a Japanese syllable-substitution cipher. We use a probabilistic, noisy-channel framework, exploiting various Japanese language models to drive the decipherment. We describe several innova- tions, including a new objective function for searching for the highest- scoring decipherment. We include empirical studies of the relevant phenomena, and we give improved decipherment accuracy rates.
[59]	Sujith Ravi and Kevin Knight. Attacking letter substitution ciphers with integer programming. Cryptologia, 33(4):321-334, 2009. [ bib \| http ] We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy varies as a function of cipher length and n-gram order. We also make an empirical investigation of Shannon's (1949) theory of uncertainty in decipherment.
[60]	Sujith Ravi and Kevin Knight. Attacking decipherment problems optimally with low-order n-gram models. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 812-819, 2008. [ bib \| .pdf ] We introduce a method for solving substitution ciphers using low-order letter n-gram models. This method enforces global constraints using integer programming, and it guarantees that no decipherment key is overlooked. We carry out extensive empirical experiments showing how decipherment accuracy varies as a function of cipher length and n-gram order. We also make an empirical investigation of Shannon's (1949) theory of uncertainty in decipherment.
[61]	Sujith Ravi, Kevin Knight, and Radu Soricut. Automatic prediction of parser accuracy. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 887-896, 2008. [ bib \| .pdf ] Statistical parsers have become increasingly accurate, to the point where they are useful in many natural language applications. However, estimating parsing accuracy on a wide variety of domains and genres is still a challenge in the absence of gold-standard parse trees. In this paper, we propose a technique that automatically takes into account certain characteristics of the domains of interest, and accurately predicts parser performance on data from these new domains. As a result, we have a cheap (no annotation involved) and effective recipe for measuring the performance of a statistical parser on any given domain.
[62]	Sujith Ravi and Marius Pasca. Using structured text for large-scale attribute extraction. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM), pages 1183-1192, 2008. [ bib \| http ] We propose a weakly-supervised approach for extracting class attributes from structured text available within Web documents. The overall precision of the extracted attributes is around 30% higher than with previous methods operating on Web documents. In addition to attribute extraction, this approach also automatically identifies values for a subset of the extracted class attributes.
[63]	Jihie Kim, Erin Shaw, Sujith Ravi, Erin Tavano, Aniwat Arromratana, and Pankaj Sarda. Scaffolding on-line discussions with past discussions: An analysis and pilot study of pedabot. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems Conference (ITS), pages 343-352, 2008. [ bib \| .pdf ] PedaBot is a new discussion scaffolding application designed to aid student knowledge acquisition, promote reflection about course topics and encourage student participation in discussions. It dynamically processes student discussions and presents related discussions from a knowledge base of past discussions. This paper describes the system and presents a comparative analysis of the information retrieval techniques used to respond to free-form student discussions, a combination of topic profiling, term frequency-inverse document frequency, and latent semantic analysis. Responses are presented as annotated links that students can follow and rate. We report a pilot study of PedaBot based on student viewings, student ratings, and a small survey. Initial results indicate that there is a high level of student interest in the feature and that its responses are moderately relevant to student discussions.
[64]	Sujith Ravi and Jihie Kim. Profiling student interactions in threaded discussions with speech act classifiers. In Proceedings of the 13th International Conference on Artificial Intelligence in Education (AIED), pages 357-364, 2007. [ bib \| .pdf ] On-line discussion is a popular form of web-based computer-mediated communication and is an important medium for distance education. Automatic tools for analyzing online discussions are highly desirable for better information management and assistance. This paper presents an approach for automatically profiling student interactions in on-line discussions. Using N-gram features and linear SVM, we developed “speech act” classifiers that identify the roles that individual messages play. The classifiers were used in finding messages that contain questions or answers. We then applied a set of thread analysis rules for identifying threads that may have unanswered questions and need instructor attention. We evaluated the results with three human annotators, and 70-75% of the predictions from the system were consistent with human answers.
[65]	Sujith Ravi, Jihie Kim, and Erin Shaw. Mining on-line discussions: Assessing technical quality for student scaffolding and classifying messages for participation profiling. In Proceedings of the Educational Data Mining Workshop in the 13th International Conference on Artificial Intelligence in Education (AIED), 2007. [ bib ] On-line collaborative discussions play an important role in distance education and web-enhanced courses. Automatic tools for assessing student activities and promoting collaborative problem solving can provide a better learning experience for students and also offer useful assistance to teachers. This paper presents two novel instructional tools that apply data mining and information retrieval techniques. First, we describe an approach that could be used to scaffold undergraduate student discussions by retrieving useful information from past student discussions. The tool exploits both the discussions from the same undergraduate course and the ones from a graduate-level course. The second part of the paper presents an instructional tool that profiles student contributions with respect to student genders and the roles that students play in discussion. We apply speech act classifiers that automatically identify whether the given message contains questions and/or answers, and use the classification results in profiling male and female student contributions. Our initial evaluation of the scaffolding tool shows that discussions from the same course contain more number of similar concepts than the ones from the graduate-level course. However, technical quality of graduate-level discussions is higher. The results from the profiling tool indicate that female participation in undergraduate-level discussions is lower than that in graduate-level discussions, and graduate female students post more questions and answers compared to undergraduate female students.

This file was generated by bibtex2html 1.96.