University of Southern California

Title: Information Retrieval for Virtual Worlds

Abstract:

Computer-simulated virtual worlds have become increasingly important in recent years. These worlds range from off-line setups where a single person interacts with a single computer-generated character to massive on-line worlds where tens of thousands of people come together, interacting with each other and with numerous virtual characters. More and more people are using these computer-simulated environments for education, training, communication, and entertainment. These worlds are becoming a source for acquiring and polishing real-world skills. They are also being used for modeling and analyzing real-world human behavior patterns. Creating effective tools for both the analysis and the construction of virtual worlds is highly important.

In this talk, I will show how statistical natural language processing (NLP) techniques can be applied to address this problem. In the first part of the talk, I will discuss how to use NLP approaches such as language modeling and conditional random fields to build virtual characters capable of natural language understanding (NLU). I will describe three different methods for creating NLU subsystems for virtual characters of varying complexity. I will focus my presentation on a novel text classification algorithm that supports the creation of simple and effective virtual characters. This algorithm builds on ideas from cross-lingual information retrieval. I will describe experiments showing that the algorithm outperforms traditional classification techniques and remains very robust in the presence of partially correct language input. In the second part of the talk, I will show how statistical language modeling, text classification, and clustering can be applied to analyze players' conversations in an online virtual world, and how this analysis can be used to detect interesting player activities, the players participating in those activities, and interaction patterns.

Biography:

Dr. Anton Leuski is a Research Scientist at the Institute for Creative Technologies at the University of Southern California. He holds a Ph.D. in Computer Science from the University of Massachusetts at Amherst. His research interests center around interactive information access, human-computer interaction, and machine learning. Dr. Leuski's recent work has focused on natural language problems whose solutions facilitate dialog between humans and virtual characters, specifically language understanding and classification, natural language generation, and activity detection and tracking in massive collaborative environments.