Graph Analytics

Much real-world data is inherently relational: molecules share common substructures, people share friends or neighborhoods, and cities are connected by highways or airports. However, most data science techniques are built to handle matrix or tabular views of data.

Graph techniques approach the analytics problem from the perspective that relationships are first-class entities to be explored and modeled. Graph techniques can be computationally demanding and are not applied as often as traditional machine learning approaches, but their ability to encode relationships between entities makes them ideally suited to problems involving communities or connectivity.

Primary Techniques

Community Detection – identifies tightly knit communities within a broader network by studying relationships in the data. These communities may be of interest in their own right, or community labels can be used as features during modeling.
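
As a rough illustration of the idea, the sketch below runs a simplified label propagation pass over a toy network of two triangles joined by a single bridge edge. The function name `label_propagation`, the toy edge list, and the tie-breaking rule (pick the largest label) are assumptions made for this example only. Label propagation is one of many community detection methods, and results on real data depend on update order and tie-breaking.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_rounds=20):
    """Group nodes by repeatedly adopting the most common neighbor label.

    Hypothetical helper for illustration. Ties are broken by taking the
    largest label, an arbitrary choice that keeps the run deterministic.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Every node starts in its own community.
    labels = {node: node for node in adj}

    for _ in range(max_rounds):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[n] for n in adj[node])
            top = max(counts.values())
            best = max(lbl for lbl, c in counts.items() if c == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break

    communities = defaultdict(set)
    for node, lbl in labels.items():
        communities[lbl].add(node)
    return sorted(sorted(members) for members in communities.values())


# Toy network: two triangles (a-b-c and d-e-f) joined by the bridge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
communities = label_propagation(edges)
print(communities)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The resulting community memberships could then be attached to each entity as a categorical feature for a downstream model.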

Graph Prediction – uses graph properties and relationships to classify entities, infer new links in the network, or highlight low-probability connections.
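
One simple way to infer new links is to score unconnected pairs by how much their neighborhoods overlap. The sketch below uses the Jaccard coefficient (shared neighbors divided by combined neighbors). The function name `jaccard_link_scores` and the toy friendship data are hypothetical, and Jaccard similarity is only one of several common link-prediction heuristics.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_link_scores(edges):
    """Rank unconnected node pairs by neighborhood overlap (illustrative)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = adj[u] & adj[v]
        if shared:
            scores[(u, v)] = len(shared) / len(adj[u] | adj[v])

    # Highest score first; ties ordered alphabetically by pair.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy friendship network (names are made up).
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
         ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]
ranked = jaccard_link_scores(edges)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Here the pair ("ann", "dan") ranks first because the two share both of ann's neighbors. The same scores, read from the other direction, can flag existing links whose endpoints have unusually little in common.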

Influence Maximization, Centrality, and PageRank – measures that aim to identify the influential entities in a network. What defines “influence” can vary by problem, but these techniques can locate “sources” or “sinks” in a network and are applicable to problems ranging from social media analysis to identifying critical failure points in a network.
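
To make the notion of influence concrete, the following is a minimal power-iteration sketch of PageRank on a small directed graph. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy edge list are assumptions chosen for illustration. Production work would normally rely on a tested graph library rather than a hand-rolled loop.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration (illustrative sketch)."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Toy graph: a, b, and d all point to c, and c points back to a.
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
rank = pagerank(edges)
most_influential = max(rank, key=rank.get)
print(most_influential)  # c
```

In this toy graph, node c collects links from every other node and ends up with the highest score, which matches the intuition of a "sink" that attention flows toward.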

Entity Resolution – takes advantage of the many relationships in a network to assist in data cleansing. Misspellings, incomplete records, or intentional obfuscation can muddle inferences, but the multidimensional and relational aspects of graph analytics can help to de-duplicate or canonicalize data sources.
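
A minimal sketch of relationship-based de-duplication: treat each record as a node, link records that share an identifying value, and read each connected group as one entity. The record fields (`email`, `phone`), the sample records, and the function name `resolve_entities` are all hypothetical. Exact matching on shared values is a deliberate simplification, and real pipelines typically add fuzzy matching and confidence scoring.

```python
from collections import defaultdict


def resolve_entities(records, keys=("email", "phone")):
    """Cluster records that are linked, directly or transitively, by a shared value."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (field, value) -> id of the first record carrying it
    for rec in records:
        for key in keys:
            value = rec.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(rec["id"], first_seen[token])
            else:
                first_seen[token] = rec["id"]

    clusters = defaultdict(list)
    for rec in records:
        clusters[find(rec["id"])].append(rec["id"])
    return sorted(clusters.values())


# Made-up records: 2 and 3 share nothing directly, but both link to 1.
records = [
    {"id": 1, "name": "Jon Smith", "email": "jsmith@example.com", "phone": "555-0100"},
    {"id": 2, "name": "John Smith", "email": "jsmith@example.com", "phone": None},
    {"id": 3, "name": "J. Smyth", "email": None, "phone": "555-0100"},
    {"id": 4, "name": "Ana Lopez", "email": "ana@example.com", "phone": "555-0199"},
]
clusters = resolve_entities(records)
print(clusters)  # [[1, 2, 3], [4]]
```

The transitive step is where the graph view helps: the misspelled record 3 never matches record 2 on any single field, yet both resolve to the same entity through record 1.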

General Applications

Fraud Detection

Even with the aid of machine learning, it can be challenging to keep up with emerging patterns of fraud or abuse. Bad actors are highly motivated to cover their tracks, mask their behaviors, and present as “normal” users. Graph analytics provides a means to consider together the many behaviors linked to a single actor, as well as many actors working in concert.

Connectivity (Online and Offline)

Perhaps the best-known application of graph analytics is analyzing and understanding online social relationship networks. But graph approaches can be similarly applied in the physical world to better understand logistics or how a policy might impact a community or workforce.

Knowledge Graphs

In addition to social networks or fraud-detection applications, graph analytics can be paired with structured knowledge to build a “knowledge graph,” linking concepts and drawing new connections between pieces of information.

Case Studies

  • Belair influence maximization
  • IRS application graph
  • IRS tax provider graph
  • Glyphic graph/text mining graph
  • Madison graph (connecting compounds by chemical similarity)