Research Themes
My work is strongly interdisciplinary, drawing from a wide range of subject areas in 17 core research themes:
- Big Data / Supercomputing. I joined the National Center for Supercomputing Applications (NCSA) in 2000 and remain actively involved as a Center Affiliate. My work over the last decade has relied heavily on high performance computing (HPC) techniques, and the use of advanced computing systems and algorithms to make sense of "big data": massive high-dimensionality datasets (such as every large human event on earth since WWII, or every news or social media post around the world on climate change).
- Digital History. As my senior thesis I wrote the first history of the University of Illinois' buildings and spaces, digitized and integrated more than 70,000 pages of historical material, and took more than 80,000 photographs documenting the entire campus through the four seasons. The resulting UIHistories project is the first comprehensive digital library and the first comprehensive digital image archive of a major US university, and it has become the de facto standard for those researching the University's history. Images from its photographic gallery have been licensed more than 22,000 times, appearing in publications from all divisions of the University, from annual reports to wall-sized art and covers of magazines and textbooks.
- Digitization. I have worked extensively in all areas of digitization, including founding and overseeing the highest-volume microform digitization center in academia and consulting widely for academic, governmental, and industrial digitization initiatives.
- Education Trends. I've worked extensively with United States degree conferral data on projects ranging from the evolution of specific fields of study to spatial patterns in the distribution of certain classes of institutions. I am also the author of the first line-level 40-year combined HEGIS/IPEDS dataset for studying longitudinal patterns in post-secondary degree conferral.
- Funding Analysis. I was the chief architect and developer of the Office of Naval Research's "federal funding search, discovery, and analysis system." This system combined grant and contracting opportunities from across the federal government and offered advanced trend mining, sophisticated visualizations and pattern analysis, and spatial analysis in a single integrated "funding opportunities portal" to support national small business and entrepreneurship in support of federal needs.
- Gender Diversity in Engineering. I was involved with the University of Illinois section of the Society of Women Engineers for more than half a decade and served as their Staff Advisor from 2008 to 2009 as they entered their 50th anniversary as a founding chapter of the national society. I also served as co-lead and special advisor to their diversity and historical initiatives, and helped to plan their 50th anniversary and 2010 regional conference.
- GIS / Geocoding / Geographic Intelligence (GEOINT) / Spatial Representation. I have worked extensively with issues of spatial representation and applying GIS and spatial intelligence to global-scale problems, including the automated geocoding and analysis of translated historical news archives and broadcast intercepts numbering in the tens of millions of documents.
- Intelligence and Corporate Intelligence / Media Analysis / Industry Monitoring / Brand Mining / Public Perception / Social Media / Event Mining. I am the chief architect of the largest open source intelligence project in academia, leveraging commercial, governmental, and declassified intelligence products to produce global databases of human activity across multiple disciplines over time. One project involves cataloging all "societal stability" events (riots, assassinations, protests, etc.) across the world from 1946 to the present using tens of millions of local news reports captured and translated from the local presses of each nation. I was also the chief architect of the NCSA VIAS project, one of the early "web-scale" industry monitoring systems. I am the founder of the Carbon Capture Report, the premier global climate change news analytics service. I have worked extensively on a wide range of corporate intelligence, event mining, and media monitoring initiatives, especially on issues such as trend mining and public perception.
- Knowledge Management. I have extensive experience architecting and implementing enterprise-class knowledge management environments and the underlying human processes that make them successful.
- Measurements and Metrics from Non-Traditional Sources. My landmark 2006 study explored the use of non-traditional data sources as a model for how to understand institutions of higher education through a variety of data-driven dimensions, for example by mining all university web sites to generate topical networks relating departments, or by using human resources systems to measure interdisciplinary collaboration.
- Media Management. In addition to my work on digitization of historical materials, I have also worked extensively in large-scale media management, developing intelligent systems that observe human behavior and automatically reorganize image collections, as well as automated licensing systems that oversee the licensing operations of the University of Illinois' premier image archives.
- Natural Language Processing / Text Mining / Classification / Unstructured Text / Learning Systems. Natural language processing, the use of automated tools to extract meaning from text, is a core enabling technology underlying much of my work. I have been developing "web-scale" information extraction and unstructured NLP tools for more than a decade and have developed a number of innovative approaches to high-accuracy, robust text classification and tagging systems.
- Network Analysis / Relationship Extraction. Many of my research themes make implicit or explicit use of large-scale network representation, visualization, and analysis, in particular the synthesis of structured and large unstructured data sources into single composite networks in the face of conflicting and incomplete data sources. One example is analyzing a large collection of news articles on a given industry and automatically identifying actors in that industry and their relationships.
- Records Management. The "big data" problems that much of my work focuses on necessarily involve a heavy emphasis on records management. I have extensive experience at all levels of records management, including preservation, versioning, and provenance, and a 2008 study of mine on governmental document management was covered in the New York Times, on radio and television, and on thousands of news and blog websites around the world.
- Sentiment Analysis. I have worked extensively with sentiment analysis systems, applying them to international and domestic news coverage, historical and contemporary material, document collections, social media, and communica projects. I have also worked on cross-domain and learning systems for maximizing the accuracy of such systems by tailoring them for specific disciplines.
- Translation. I work extensively with translated material in much of my global work, including the approaches to translation, machine versus human translation, and the influence of translation on analysis.
- Virtual Reality. I have extensive experience developing immersive virtual reality applications (stereoscopic imagery / 6DOF head+hand tracking), including translating user interfaces into the virtual environment, and I built one of the first mixed-mode virtual reality applications that permitted both structured architectural design and freehand artistic creation, including an advanced plugin "operating system" architecture.
- Other Projects/Papers. This page is a compilation of assorted research projects and working and draft papers on various topics that do not fit into my core research themes. Selected undergraduate and graduate course papers that I receive requests for are also included.