Importance of Adapting to Graph Thinking
“Defenders think in lists. Attackers think in graphs. So long as this is true, attackers will win.” This quote from John Lambert of the Microsoft Threat Intelligence Center highlights that we need to change the way we solve problems in order to stay ahead of fraud and abuse. In this video we’re going to talk about how we can use graph data to identify potential fraud within an employer’s childcare subsidy benefit.
Limitations of Relational Tables
Before we talk about graph data, let’s first talk about the way that we’ve seen data before, so we’re going to go over relational tables. This is the way that we have seen data for a long time, and we’re all probably used to it. We see tables here that have different rows, and different columns that identify different features, and we might have multiple tables that are all connected to one another through some kind of common key or similar feature.
We might see these as Excel tables or as SQL tables, and we can join them to combine what we have in each table and answer some simple questions. For example, what address might the child of some employee live at? To answer that question, we’d have to join the children’s table with the addresses table on some common feature, like the employee ID that both tables share.
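To make that join concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, column names, and the sample rows are assumptions made up for illustration, not the actual tables shown in the video.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE children (child_name TEXT, employee_id INTEGER);
CREATE TABLE addresses (employee_id INTEGER, address TEXT);
INSERT INTO children VALUES ('Charlie', 1);
INSERT INTO addresses VALUES (1, '123 Main Street');
""")

def child_address(conn, child_name):
    """Look up a child's address by joining the two tables on employee_id."""
    row = conn.execute(
        "SELECT a.address "
        "FROM children AS c "
        "JOIN addresses AS a ON c.employee_id = a.employee_id "
        "WHERE c.child_name = ?",
        (child_name,),
    ).fetchone()
    return row[0] if row else None

print(child_address(conn, "Charlie"))
```

Even this simple question needs a join across two tables, and every extra hop (child to parent to address to whoever else lives there) would add another join.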
The problem is that this can get quite complicated and expensive if the tables are very large. So to simplify looking at how the data are connected, we can use what’s known as graph data, where the data are stored as nodes and edges that highlight how they’re connected. And what are nodes and edges?
The Basics of Graph Data
In this graph right here, the graph is just all of this information that we have stored here. The nodes of the graph are the individual entities, so you might think of them as being like the rows that we see here. We have the employees, like John and Sally and Tom. We have their associated children, and the addresses that they live at as well.
The nodes are these circles, and that’s usually how we see them represented in a graph. How are they connected? They’re connected through these lines, or links, known as edges. Edges can have strength and direction. We can see that John is the parent of Charlie, John lives at 123 Main Street, and John and Sally work with one another.
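As a rough sketch of that structure, the three relationships just mentioned can be written in plain Python as a list of directed edges. The relationship labels (PARENT_OF, LIVES_AT, WORKS_WITH) and the helper name outgoing are assumptions chosen for illustration.

```python
# Each edge is (source node, relationship, target node).
edges = [
    ("John", "PARENT_OF", "Charlie"),
    ("John", "LIVES_AT", "123 Main Street"),
    ("John", "WORKS_WITH", "Sally"),
]

# The nodes are every entity that appears at either end of an edge.
nodes = {name for source, _, target in edges for name in (source, target)}

def outgoing(node, edges):
    """Return (relationship, target) pairs for edges that start at node."""
    return [(rel, target) for source, rel, target in edges if source == node]

print(sorted(nodes))
print(outgoing("John", edges))
```

Here the direction of an edge is simply the order of the tuple. An edge strength could be added as a fourth element, but it is left out to keep the sketch small.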
Nodes and Edges: Building Blocks of Graphs
And so this information is much clearer than the relational tables at showing how all of these entities are related to one another. One way that I like to remember how a graph is structured is that it’s similar to a sentence: nodes are equivalent to nouns, and edges are equivalent to verbs.
So just like in a sentence, nodes are connected to one another through a verb. For example, John lives at 123 Main Street, or John is the parent of Susan. So when you’re trying to develop a graph, or translate your data over to this type of data, you can think of your nodes as your nouns and your edges as the verbs that describe the way those nouns are related.
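One way to picture that translation is to turn each table row into a noun-verb-noun triple. The sketch below assumes two tiny hypothetical tables, stored as lists of dictionaries, and reads each resulting edge back as a sentence.

```python
# Hypothetical relational rows, one dictionary per table row.
children_rows = [{"employee": "John", "child": "Charlie"}]
address_rows = [{"employee": "John", "address": "123 Main Street"}]

def rows_to_edges(children_rows, address_rows):
    """Turn table rows into (noun, verb, noun) edges."""
    edges = []
    for row in children_rows:
        edges.append((row["employee"], "is the parent of", row["child"]))
    for row in address_rows:
        edges.append((row["employee"], "lives at", row["address"]))
    return edges

def as_sentence(edge):
    """Read an edge back as a plain sentence: noun, verb, noun."""
    subject, verb, obj = edge
    return f"{subject} {verb} {obj}"

for edge in rows_to_edges(children_rows, address_rows):
    print(as_sentence(edge))
```

If an edge does not read naturally as a sentence, that is often a hint that the node or the relationship was modeled awkwardly.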
Applying Graph Data to Fraud Detection
Now that we have a basic understanding of what a graph is, let’s return to our earlier example right over here. I’ve just taken a subset of this data so that we can more easily see how these entities are related. So here’s John, who’s the parent of Charlie. John lives at 123 Main Street.
Also, Jane is someone who lives at 123 Main Street and is the parent of Charlie. So these are two parents of the same child, and they want to use this childcare subsidy benefit because childcare is very expensive. So there might be an individual named Joe who lives at 1 Center Street. John might pay Joe, and Joe is therefore related to Charlie: Joe watches Charlie, or babysits, or is present with Charlie during the day while the parents are at work.
So this is what the relationship of all of these different entities should look like. This is what the graph structure should look like. Now, what might fraud look like in this particular case? Let’s say that instead of John paying Joe, John pays Jane, and Jane watches Charlie.
Now, just looking at the structure of the graph, we see that the individual who’s watching the child also lives at the same address and is also the child’s parent. So this might be an abuse of the benefit: instead of subsidizing expensive childcare, they might actually just be paying their spouse with that money from the employer. It’s basically a thousand-dollar-a-month raise to their own household, just for watching their own child. So this might be an example of what fraud would look like in this particular case.
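The two graph structures described above can be sketched in Python to show how the pattern could be checked from structure alone. The relationship labels, the function name find_suspicious_caregivers, and the exact rule (paid caregiver is also the child’s parent and shares the payer’s address) are assumptions for illustration. A match is a case worth reviewing, not proof of fraud.

```python
def find_suspicious_caregivers(edges):
    """Flag (payer, caregiver, child) where the paid caregiver is also the
    child's parent and lives at the same address as the paying parent."""
    parent_of = {(s, t) for s, r, t in edges if r == "PARENT_OF"}
    pays = {(s, t) for s, r, t in edges if r == "PAYS"}
    watches = {(s, t) for s, r, t in edges if r == "WATCHES"}
    # Simplifying assumption: each person has at most one address.
    lives_at = {s: t for s, r, t in edges if r == "LIVES_AT"}

    flagged = []
    for payer, caregiver in sorted(pays):
        for watcher, child in sorted(watches):
            if watcher != caregiver or (payer, child) not in parent_of:
                continue
            same_address = (
                payer in lives_at
                and lives_at[payer] == lives_at.get(caregiver)
            )
            if same_address and (caregiver, child) in parent_of:
                flagged.append((payer, caregiver, child))
    return flagged

# Edges shared by both scenarios.
household = [
    ("John", "PARENT_OF", "Charlie"),
    ("Jane", "PARENT_OF", "Charlie"),
    ("John", "LIVES_AT", "123 Main Street"),
    ("Jane", "LIVES_AT", "123 Main Street"),
    ("Joe", "LIVES_AT", "1 Center Street"),
]

# Expected structure: John pays Joe, and Joe watches Charlie.
expected_graph = household + [
    ("John", "PAYS", "Joe"),
    ("Joe", "WATCHES", "Charlie"),
]

# Possible abuse: John pays Jane, and Jane watches Charlie.
suspicious_graph = household + [
    ("John", "PAYS", "Jane"),
    ("Jane", "WATCHES", "Charlie"),
]

print(find_suspicious_caregivers(expected_graph))
print(find_suspicious_caregivers(suspicious_graph))
```

The check never looks at payment amounts or names. It only asks how the payer, the caregiver, the child, and the address are connected.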
Conclusion
The lesson I want you all to take away from this video is that graphs are a very powerful way of storing data that can help us understand how entities are related to one another. If you want to learn more about graphs, there’s a lot of information on Neo4j, and you can learn a lot about Cypher, the query language that Neo4j uses to query graphs. There’s so much you can learn. So I hope that this got you a little bit excited about how to catch fraud.