
Research: Network Analysis / Relationship Extraction

Many of my research themes make implicit or explicit use of large-scale network representation, visualization, and analysis. In particular, I focus on synthesizing structured data and large unstructured data sources into single composite networks, even when those sources are conflicting and incomplete. For example, a system might analyze a large collection of news articles on a given industry, automatically identify the actors in that industry, and map their relationships.
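The text does not specify how conflicting or incomplete sources are reconciled. As a minimal sketch of one simple option, weighted voting, each source is assigned a hypothetical reliability weight. An edge survives if the weighted share of sources asserting it passes a threshold. The names `SOURCE_WEIGHTS` and `merge_networks`, and the weight values, are illustrative assumptions rather than the method used in this research.

```python
from collections import defaultdict

# Hypothetical per-source reliability weights (illustrative assumptions only).
SOURCE_WEIGHTS = {"structured_db": 1.0, "news_extraction": 0.6}


def merge_networks(edge_lists, weights, threshold=0.5):
    """Merge edge lists from several sources into one composite network.

    edge_lists maps a source name to a list of (a, b, asserted) tuples:
    asserted=True means the source claims the relationship exists,
    asserted=False means it explicitly contradicts it. A source that says
    nothing about an edge contributes no evidence (incomplete data).
    """
    support = defaultdict(float)  # weighted evidence for each edge
    total = defaultdict(float)    # weighted evidence for or against each edge
    for source, edges in edge_lists.items():
        w = weights.get(source, 0.0)
        for a, b, asserted in edges:
            key = tuple(sorted((a, b)))  # treat relationships as undirected
            total[key] += w
            if asserted:
                support[key] += w
    # Keep an edge only if the weighted share of asserting sources passes the threshold.
    return {
        key: support[key] / total[key]
        for key in total
        if total[key] > 0 and support[key] / total[key] >= threshold
    }
```

Here the structured source outweighs the noisier text-extraction source, so an edge that the database contradicts and only the news extraction asserts is dropped.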

ANALYTICS / VISUALIZATION

Displaying large-scale network data in forms conducive to human analysis is a complex endeavor: by their very nature, large-scale networks consist of enormous amounts of fine-grained detail arranged over a very large area. A good analogy is viewing imagery of the craters on the surface of Mars on a desktop computer. If one zooms in far enough to see the individual craters, the macro-level patterns of their arrangement are lost; if one zooms out far enough to see the entire image at once, the smaller craters are lost. Only on a high-resolution large-format display (in this case, a 40-projector seamless tiled display wall) can one see the entire image at native resolution and understand the broader patterns in the crater impact sites. Similarly, network visualization tools must make the broader patterns in large-scale networks visible without obscuring the fine-grained detail of individual connections. Analytical tools, in turn, must be able to make sense of the massive web of connections in network data and sift through the noise to find insight.
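On an ordinary display, a common way to balance overview and detail is to collapse nodes into groups and then drill down on demand. Groups might be detected communities or organizations. The sketch below illustrates that general technique, not the tiled-wall approach described above. It assumes that group labels are already available, and the function names `summarize_graph` and `expand` are hypothetical.

```python
from collections import Counter


def summarize_graph(edges, group_of):
    """Collapse nodes into groups and count the edges between groups.

    edges: iterable of (u, v) pairs; group_of: dict mapping node -> group label.
    Returns a Counter keyed by a sorted (group_u, group_v) pair, giving a
    coarse overview of the whole network.
    """
    summary = Counter()
    for u, v in edges:
        pair = tuple(sorted((group_of[u], group_of[v])))
        summary[pair] += 1
    return summary


def expand(edges, group_of, group):
    """Return the fine-grained edges touching one group (drill-down view)."""
    return [(u, v) for u, v in edges if group in (group_of[u], group_of[v])]
```

The overview shows how strongly groups are connected, while `expand` recovers the individual connections behind any part of it.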

As part of a larger project with the United States National Archives and Records Administration (NARA), the Enron email collection was used as a sample dataset to examine various modes of interaction with large-scale communication datasets.
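The first step in treating an email collection as a network is usually to turn message headers into directed sender-to-recipient edges. The minimal sketch below assumes each message is available as a raw RFC 822 string with `From`, `To`, and `Cc` headers. It uses only Python's standard `email` package, and the function name `email_edges` is hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def email_edges(raw_messages):
    """Count directed sender -> recipient edges across raw email messages."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = getaddresses(msg.get_all("From", []))
        # Treat both To and Cc recipients as receiving the message.
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        for _, sender in senders:
            for _, recipient in recipients:
                if sender and recipient:
                    # Normalize case so the same address maps to one node.
                    counts[(sender.lower(), recipient.lower())] += 1
    return counts
```

The resulting counts can serve directly as weighted edges of a "who emails whom" graph.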

RELATIONSHIP EXTRACTION

A key subcomponent of network analytics is "relationship extraction": the automatic identification of different classes of connections between entities in a data repository and the conversion of those relationships into structured network connections. For example, a relationship extraction system might analyze a corpus of news articles covering a specific industry and identify mentions of partnerships or other connections between companies and individuals in that space. It would then convert the unstructured knowledge in that large collection of documents into a structured network diagram that can be subjected to various network analysis techniques.

One system I developed, which is in use at the University of Illinois, exploits many aspects of network analysis and relationship extraction to monitor project-specific email mailing lists and automatically build complex models of communicative patterns among participants. It associates users based not only on structured information (who emails whom, who replies to whom) but also on unstructured information (who tends to ask similar questions or discuss similar topical areas).
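The sketch below shows two simple forms of the ideas above. Sentence-level co-mention links known entities that appear in the same sentence, and topical similarity links mailing-list users whose posts share vocabulary. Both are deliberately naive stand-ins chosen for illustration, not the extraction methods used in the Illinois system. They assume a precompiled entity list and plain substring matching, and they use a tiny hand-picked stopword list. Production systems would typically add named-entity recognition, TF-IDF weighting, and relation classification. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "do", "i", "how", "with"}


def comention_edges(documents, entities):
    """Link known entities mentioned in the same sentence; weight = sentence count."""
    edges = Counter()
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            found = sorted({e for e in entities if e in sentence})  # naive substring match
            for a, b in combinations(found, 2):
                edges[(a, b)] += 1
    return edges


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(c1, c2):
    """Cosine similarity between two term-count vectors."""
    dot = sum(c1[t] * c2[t] for t in c1 if t in c2)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


def topical_similarity(posts_by_user, min_sim=0.3):
    """Link users whose combined posts are topically similar (unstructured association)."""
    profiles = {u: Counter(tokenize(" ".join(posts))) for u, posts in posts_by_user.items()}
    links = {}
    for u, v in combinations(sorted(profiles), 2):
        s = cosine(profiles[u], profiles[v])
        if s >= min_sim:
            links[(u, v)] = s
    return links
```

The co-mention edges and similarity links can then be merged with structured "who replies to whom" edges into a single network for analysis.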