In this article, we will explore an example of modeling time-series data using a graph-based approach. We will use Neo4j Aura and Cypher.
For time-series data modeling techniques, we also refer to Graph Databases (2nd Edition), specifically Chapter 4: Building a Graph Database Application.
How do you manage the flow of money in your household?
From salary deposits to savings accounts, multiple bank accounts, credit cards for different purposes, and investment accounts for asset management. You might also have cryptocurrency wallets and debit cards for casual investments. Managing the financial transactions of an entire family can quickly become a complex and overwhelming task.
While there are financial tracking apps and savings assistance services that automatically sync with bank accounts, financial transactions are private information. Some people may prefer to manage them independently.
What would household financial transactions look like when modeled as a graph network?
First, let’s create a Timeline
node for each group of financial transactions we want to manage. In this case, we will create Timeline
nodes for different categories such as bank accounts, investment accounts, and credit cards.
In the example below, we create a timeline for a Foo Bank
account:
CREATE (t:Timeline {id: 1, name: 'Foo Bank'})
RETURN t;
Next, we will associate transaction records with the Timeline
node.
For instance, let’s represent the fact that a salary deposit of 10,000 JPY was made on September 1, 2021. Specifically, we will create nodes representing the year, month, and day, and link them to a Transaction
node that describes the transaction details.
This is an implementation of the Timetree technique, as introduced in Graph Databases (2nd Edition), Chapter 4.
MATCH (timeline:Timeline {name: 'Foo Bank'})
CREATE (transaction:Transaction {text: 'Salary', unit: 'JPY', amount: 10000})
MERGE (timeline)-[:YEAR]->(year:Year {value: 2021})
MERGE (year)-[:MONTH]->(month:Month {value: 9})
MERGE (month)-[:DAY]->(day:Day {value: 1})
MERGE (day)<-[:RECORDED_ON]-(transaction)
The MERGE
query combines MATCH
and CREATE
functionalities. If the node already exists, it returns the existing node; otherwise, it creates a new one.
Now, let’s add more transaction records, such as withdrawals, interest deposits, and salary payments from side jobs.
Example: Cash withdrawal of 5,000 JPY on September 15, 2021
MATCH (timeline:Timeline {name: 'Foo Bank'})
CREATE (transaction:Transaction {text: 'Withdraw', unit: 'JPY', amount: -5000})
MERGE (timeline)-[:YEAR]->(year:Year {value: 2021})
MERGE (year)-[:MONTH]->(month:Month {value: 9})
MERGE (month)-[:DAY]->(day:Day {value: 15})
MERGE (day)<-[:RECORDED_ON]-(transaction)
Example: Interest deposit of 10 JPY on September 30, 2021
MATCH (timeline:Timeline {name: 'Foo Bank'})
CREATE (transaction:Transaction {text: 'Interest', unit: 'JPY', amount: 10})
MERGE (timeline)-[:YEAR]->(year:Year {value: 2021})
MERGE (year)-[:MONTH]->(month:Month {value: 9})
MERGE (month)-[:DAY]->(day:Day {value: 30})
MERGE (day)<-[:RECORDED_ON]-(transaction)
Example: Salary of 10,000 JPY on October 1, 2021
MATCH (timeline:Timeline {name: 'Foo Bank'})
CREATE (transaction:Transaction {text: 'Salary', unit: 'JPY', amount: 10000})
MERGE (timeline)-[:YEAR]->(year:Year {value: 2021})
MERGE (year)-[:MONTH]->(month:Month {value: 10})
MERGE (month)-[:DAY]->(day:Day {value: 1})
MERGE (day)<-[:RECORDED_ON]-(transaction)
Example: Cash withdrawal of 8,000 JPY on October 12, 2021
MATCH (timeline:Timeline {name: 'Foo Bank'})
CREATE (transaction:Transaction {text: 'Withdraw', unit: 'JPY', amount: -8000})
MERGE (timeline)-[:YEAR]->(year:Year {value: 2021})
MERGE (year)-[:MONTH]->(month:Month {value: 10})
MERGE (month)-[:DAY]->(day:Day {value: 12})
MERGE (day)<-[:RECORDED_ON]-(transaction)
At this stage, the graph network should look as follows:
Now, let’s query the transaction records we created.
For example, to retrieve all transactions from September 2021, use the following Cypher query:
MATCH (timeline:Timeline {name: 'Foo Bank'})
MATCH (timeline)-[:YEAR]->(year:Year)-[:MONTH]->(month:Month)-[:DAY]->(day:Day)<-[:RECORDED_ON]-(t:Transaction)
WHERE year.value = 2021 AND month.value = 9
RETURN year.value, month.value, day.value, t.amount, t.unit
ORDER BY day.value
Results:
year.value | month.value | day.value | t.amount | t.unit |
---|---|---|---|---|
2021 | 9 | 1 | 10000 | JPY |
2021 | 9 | 15 | -5000 | JPY |
2021 | 9 | 30 | 10 | JPY |
Currently, our model lacks a direct connection between consecutive transactions. This makes it difficult to perform queries like pagination or cumulative balance calculations.
To address this, we introduce :NEXT
and :PREV
relationships to establish links between transactions.
This is an implementation of the Linked lists technique, also introduced in Graph Databases (2nd Edition), Chapter 4.
Example Cypher query to link transactions:
MATCH (t1:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 9 AND d.value = 1
WITH t1
MATCH (t2:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 9 AND d.value = 15
CREATE (t1)-[:NEXT]->(t2), (t2)-[:PREV]->(t1)
MATCH (t1:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 9 AND d.value = 15
WITH t1
MATCH (t2:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 9 AND d.value = 30
CREATE (t1)-[:NEXT]->(t2), (t2)-[:PREV]->(t1)
MATCH (t1:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 9 AND d.value = 30
WITH t1
MATCH (t2:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 10 AND d.value = 1
CREATE (t1)-[:NEXT]->(t2), (t2)-[:PREV]->(t1)
MATCH (t1:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 10 AND d.value = 1
WITH t1
MATCH (t2:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 10 AND d.value = 12
CREATE (t1)-[:NEXT]->(t2), (t2)-[:PREV]->(t1)
:FIRST/:LAST
We can also introduce :FIRST
and :LAST
relationships to mark the first and last transactions in a timeline.
Example query to create :FIRST
relation:
MATCH (transaction:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 9 AND d.value = 1
CREATE (timeline)-[:FIRST]->(transaction)
Example query to create :LAST
relation:
MATCH (transaction:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)<-[:YEAR]-(timeline:Timeline)
WHERE timeline.name = 'Foo Bank' AND y.value = 2021 AND m.value = 10 AND d.value = 12
CREATE (timeline)-[:LAST]->(transaction)
At this stage, the graph network should look like this:
:NEXT/:PREV
Retrieve all the transactions from Foo Bank
and add marked=true
flag:
MATCH (timeline:Timeline)-[:FIRST]->(start:Transaction),
p=(start)-[:NEXT*]->(finish:Transaction)
WHERE timeline.name = 'Foo Bank'
FOREACH (n IN nodes(p) | SET n.marked = true)
RETURN p
Retrieve the first 3 transactions from Foo Bank
:
MATCH (timeline:Timeline)-[:FIRST]->(start:Transaction),
p=(start)-[:NEXT*..3]->(trans:Transaction)-[:RECORDED_ON]->(d:Day)<-[:DAY]-(m:Month)<-[:MONTH]-(y:Year)
WHERE timeline.name = 'Foo Bank'
RETURN y.value, m.value, d.value, trans.text, trans.amount, trans.unit
This article demonstrated how to model time-series data using a graph network for household financial transactions. By combining the Timetree and Linked lists techniques, we can perform flexible queries on time-series data. Try applying this approach to other time-based datasets!