Using Neo4j as a Vector Database

As of version 5.13, Neo4j provides a vector search index. Usage is similar to other vector databases: embeddings are stored in a vector index, and approximate nearest-neighbor results are retrieved with Cypher queries.

This article introduces how to use the vector index in Neo4j. The example scenario uses Neo4j as the backend for a note-taking application. Each "Note" node contains an ID and metadata for a note. The note's title is stored in a "Title" node, and its summary is represented by an "Abstract" node. The vector index is created on the "Abstract" nodes.

CREATE
  (note:Note)-[:HAS]->(title:Title {text: "Hello, world!"})
  , (note)-[:HAS]->(abstract:Abstract)
RETURN
  note, title, abstract

[Figure: Neo4j Vector Search data model]

Preparation

First, we need to set up the vector index. This is done by calling the db.index.vector.createNodeIndex procedure. Its arguments are, in order, the index name, the node label, the property that holds the vector, the number of vector dimensions, and the similarity function. For example, the following call creates an index named abstract-embeddings on "Abstract" nodes, which store note summaries, with 768-dimensional vectors held in the embedding property.

Neo4j currently supports two similarity functions for vector search: Euclidean and cosine. In the example below, we specify cosine similarity.

CALL db.index.vector.createNodeIndex(
  'abstract-embeddings',
  'Abstract',
  'embedding',
  768,
  'cosine'
)
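
To make the choice of similarity function more concrete, here is a minimal Python sketch of how the two scores could be computed. It assumes the normalization described in the Neo4j documentation, where both scores fall between 0 and 1: cosine similarity is mapped as (1 + cos) / 2, and Euclidean similarity as 1 / (1 + d^2). The function names are illustrative and are not part of any Neo4j API.

```python
import math


def cosine(u, v):
    """Raw cosine similarity in [-1, 1]; undefined for zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_score(u, v):
    """Assumed normalization: map raw cosine from [-1, 1] to [0, 1]."""
    return (1.0 + cosine(u, v)) / 2.0


def euclidean_score(u, v):
    """Assumed normalization: 1 / (1 + squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1.0 / (1.0 + squared_distance)


if __name__ == "__main__":
    same = [1.0, 0.0]
    orthogonal = [0.0, 1.0]
    print(cosine_score(same, same))
    print(cosine_score(same, orthogonal))
    print(euclidean_score(same, orthogonal))
```

Under this assumption, identical directions score 1.0 with cosine, and a score of 0.5 corresponds to orthogonal vectors rather than to "half similar" ones.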

Writing Data

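Next, we insert vector data by calling the db.create.setNodeVectorProperty procedure, which takes the node, the property name, and the vector. Once the embedding property is set on an "Abstract" node, the node can be found through the index.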

MATCH
  (abstract:Abstract)<-[:HAS]-(note:Note)-[:HAS]->(title:Title {text: "Hello, world!"})
CALL
  db.create.setNodeVectorProperty(
    abstract,
    'embedding',
    [0.312, 0.234, -0.456, ...]
  )
RETURN
  abstract, note, title
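
In an application, the vector usually comes from an embedding model and is passed as a query parameter rather than written as a literal. The following Python sketch shows one way to prepare such a call. The helper name build_set_embedding_params and the $title and $embedding parameter names are hypothetical. The check against 768 mirrors the dimension chosen when the index was created above.

```python
EXPECTED_DIMENSIONS = 768  # must match the dimension given when creating the index

# Parameterized variant of the query above; $title and $embedding are
# illustrative parameter names chosen for this sketch.
SET_EMBEDDING_QUERY = """
MATCH (abstract:Abstract)<-[:HAS]-(:Note)-[:HAS]->(:Title {text: $title})
CALL db.create.setNodeVectorProperty(abstract, 'embedding', $embedding)
RETURN abstract
"""


def build_set_embedding_params(title, embedding, dimensions=EXPECTED_DIMENSIONS):
    """Validate the vector length and return the query parameters as a dict."""
    if len(embedding) != dimensions:
        raise ValueError(
            f"expected {dimensions} dimensions, got {len(embedding)}"
        )
    # Convert every component to float so integers are not mixed into the list.
    return {"title": title, "embedding": [float(x) for x in embedding]}


if __name__ == "__main__":
    # Placeholder vector of zeros, used only to exercise the helper.
    params = build_set_embedding_params("Hello, world!", [0] * EXPECTED_DIMENSIONS)
    print(len(params["embedding"]))
```

With the official Neo4j Python driver, the query string and the returned dictionary could then be passed to something like session.run(SET_EMBEDDING_QUERY, params). Connection setup is omitted here.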

Querying Data

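Once the vector data is stored in the vector index, we can retrieve similar notes by calling db.index.vector.queryNodes. The procedure takes the index name, the number of nearest neighbors to return, and the query vector, and yields each matching node together with its similarity score.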

The following query finds the top three "Note" nodes similar to a note titled "Hello world! I'm LLM app!", based on the "Abstract" node’s vector similarity.

MATCH (title:Title)<--(:Note)-->(abstract:Abstract)
WHERE title.text = "Hello world! I'm LLM app!"

CALL db.index.vector.queryNodes(
  'abstract-embeddings',
  3,
  abstract.embedding
)
YIELD node AS similarAbstract, score

MATCH (similarAbstract)<--(:Note)-->(similarTitle:Title)
RETURN similarTitle.text AS title, score
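
Because the index returns approximate nearest neighbors, it can help to keep the exact behavior in mind as a reference. The Python sketch below ranks every candidate by brute force and keeps the best k, which is the result the index lookup approximates. The three-dimensional vectors and the names "a" to "d" are made up for illustration, and the score assumes the (1 + cos) / 2 normalization described in the Neo4j documentation.

```python
import math


def cosine_score(u, v):
    """Cosine similarity rescaled to [0, 1] (assumed normalization)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (1.0 + dot / norms) / 2.0


def top_k(query_vector, candidates, k):
    """Exact nearest neighbors: score every candidate, sort, keep the best k."""
    scored = [
        (name, cosine_score(query_vector, vector))
        for name, vector in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy data standing in for the embedding property of "Abstract" nodes.
    abstracts = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.9, 0.1, 0.0],
        "c": [0.0, 1.0, 0.0],
        "d": [-1.0, 0.0, 0.0],
    }
    # Querying with the vector of "a" returns "a" itself as the top hit.
    for name, score in top_k(abstracts["a"], abstracts, 3):
        print(name, score)
```

As in the Cypher query, using a stored vector as the query vector means the node it belongs to appears in its own results.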

Query Result Example

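The query returns the titles of the notes whose abstracts are most similar to the abstract of "Hello world! I'm LLM app!", ranked by similarity score. The score is based on the cosine similarity of the vectors stored in the embedding property of the "Abstract" nodes, not on the title text itself.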

╒══════════════════════════════════════════════════════════════════╤══════════════════╕
│title                                                             │score             │
╞══════════════════════════════════════════════════════════════════╪══════════════════╡
│"Hello world! I'm LLM app!"                                       │1.0               │
├──────────────────────────────────────────────────────────────────┼──────────────────┤
│"Hello world! Here is a RAG demo."                                │0.8671051263809204│
├──────────────────────────────────────────────────────────────────┼──────────────────┤
│"Here is a RAG demo. I'm going to show you how to create the app."│0.8667137622833252│
└──────────────────────────────────────────────────────────────────┴──────────────────┘

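This demonstrates how cosine similarity is used to retrieve the top three closest results in vector space. The first row is the query note itself, which matches its own vector with a score of 1.0. The remaining two rows are its nearest neighbors in the index.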

2023-12-03