The University of Arizona

SISTA Colloquium

Date: Friday, January 20, 2012
Time: 12:00 pm
Concludes: 1:30 pm
Location: Gould-Simpson 906
Speaker: Craig Knoblock, Ph.D.
School/Dept.: Research Professor, Director of the Information Integration Research Group
Affiliation: University of Southern California Information Sciences Institute

Automatic Source Modeling for Interactive Information Integration

Scientists, engineers, and everyday users of the Web frequently solve data integration problems, but there are few general tools available for such problems. Instead, users typically solve their problems on paper, in a spreadsheet, or by writing specialized applications. In this talk I will describe our work on developing an approach for interactive information integration that allows a user to rapidly build integrated applications. In particular, I will present our recent work on interactively constructing semantic descriptions of data sources in order to support the cleaning, normalization, integration, and publication of data integrated from multiple sources. The approach uses machine learning methods to learn to recognize semantic classes of the data, efficient search algorithms to find the most likely relations between the classes, and a graphical user interface that allows a user to quickly refine the semantic descriptions. I will present an evaluation of the approach on a set of bioinformatics sources and show that it supports the rapid modeling of complex sources with minimal user interaction.


Craig Knoblock is a Research Professor in Computer Science at the University of Southern California (USC) and the Director of Information Integration at the USC Information Sciences Institute. He received his Bachelor of Science degree from Syracuse University, and his Master’s and Ph.D. from Carnegie Mellon University, all in computer science. Dr. Knoblock is also a founder of Fetch Technologies, a web extraction and integration provider, and of Geosemble Technologies, which develops geospatial data integration solutions. At USC, Dr. Knoblock leads a team of about 20 researchers, staff and students in developing techniques for rapid, efficient information integration. He focuses on constructing distributed, integrated applications from online sources through information extraction, source modeling, record linkage, constraint reasoning and other technologies for geospatial and bioinformatics data integration.

Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Distinguished Scientist of the Association for Computing Machinery (ACM), President and Trustee of the International Joint Conference on Artificial Intelligence (IJCAI), and past President of the International Conference on Automated Planning and Scheduling (ICAPS). He has served on the Senior Program Committee of the National Artificial Intelligence Conference, among others, and is conference chair for the 2011 International Joint Conference on AI (IJCAI). Dr. Knoblock has published Generating Abstraction Hierarchies (Kluwer Academic Publishers, 1993), along with more than 200 journal articles, book chapters and conference papers. He serves on the Editorial Boards of several journals, including Artificial Intelligence and the Journal of Web Semantics.