Graph database implementation for relationship-heavy data models. Use when building social networks, recommendation engines, knowledge graphs, or fraud detection. Covers Neo4j (primary), ArangoDB, Amazon Neptune, Cypher query patterns, and graph data modeling.
This skill guides selection and implementation of graph databases for applications where relationships between entities are first-class citizens. Unlike relational databases, which model relationships through foreign keys and joins, graph databases store connections natively as relationships (edges) between nodes, which makes traversal-heavy queries efficient.
Use graph databases when:
Do NOT use graph databases when:
DATA CHARACTERISTICS?
├── Fixed schema, shallow joins (≤3 hops)
│ └─ PostgreSQL (relational)
│
├── Already on PostgreSQL + simple graphs
│ └─ Apache AGE (PostgreSQL extension)
│
├── Deep traversals (4+ hops) + general purpose
│ └─ Neo4j (battle-tested, largest ecosystem)
│
├── Multi-model (documents + graph)
│ └─ ArangoDB
│
├── AWS-native, serverless
│ └─ Amazon Neptune
│
└── Real-time streaming, in-memory
    └─ Memgraph
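The decision tree above can be sketched as a small function. This is only an illustration: the `Workload` fields and the order in which the branches are checked (specialised constraints first, then traversal depth) are assumptions, not part of the original guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of an application's data characteristics."""
    max_hops: int                      # deepest traversal the queries need
    on_postgres: bool = False          # already running PostgreSQL
    simple_graph_queries: bool = False # only simple graph queries are needed
    needs_documents: bool = False      # documents and graph in one store
    aws_serverless: bool = False       # AWS-native, serverless deployment
    streaming_in_memory: bool = False  # real-time streaming, in-memory


def choose_database(w: Workload) -> str:
    """Return the first branch of the tree that matches the workload."""
    # Assumed precedence: deployment constraints win over traversal depth.
    if w.streaming_in_memory:
        return "Memgraph"
    if w.aws_serverless:
        return "Amazon Neptune"
    if w.needs_documents:
        return "ArangoDB"
    if w.max_hops >= 4:
        return "Neo4j"
    if w.on_postgres and w.simple_graph_queries:
        return "Apache AGE"
    return "PostgreSQL"


print(choose_database(Workload(max_hops=5)))
```

Real selection usually weighs several of these factors at once, so treat the function as a checklist rather than a rule.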
Graph databases store data as:
(Person {name: "Alice", age: 28})-[:FRIEND {since: "2020-01-15"}]->(Person {name: "Bob"})
| Language | Databases | Readability | Best For |
|---|---|---|---|
| Cypher | Neo4j, Memgraph, AGE | ⭐⭐⭐⭐⭐ SQL-like | General purpose |
| Gremlin | Neptune, JanusGraph | ⭐⭐⭐ Functional | Cross-database |
| AQL | ArangoDB | ⭐⭐⭐⭐ SQL-like | Multi-model |
| SPARQL | Neptune, RDF stores | ⭐⭐⭐ W3C standard | Semantic web |
See references/cypher-patterns.md for comprehensive examples.
// Find all users at a company
MATCH (u:User)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN u.name, u.title
// Find friends up to 3 degrees away
MATCH (u:User {name: 'Alice'})-[:FRIEND*1..3]->(friend)
WHERE u <> friend
RETURN DISTINCT friend.name
LIMIT 100
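To make the semantics of the bounded traversal above concrete, here is a minimal in-memory sketch of what `[:FRIEND*1..3]` with `DISTINCT` computes. The adjacency dictionary and the `friends_within` name are illustrative assumptions; a graph database performs this walk natively over stored relationships.

```python
from collections import deque


def friends_within(graph, start, max_depth=3):
    """Return nodes reachable from `start` in 1..max_depth outgoing hops."""
    seen = {start}            # excludes the start node, like WHERE u <> friend
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue          # do not expand beyond the depth bound
        for neighbor in graph.get(node, []):
            if neighbor not in seen:   # DISTINCT: report each node once
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


# Hypothetical FRIEND relationships, keyed by the person they start from.
graph = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Alice"],
    "Dave": ["Erin"],
    "Erin": ["Frank"],
}
print(friends_within(graph, "Alice"))
```

Frank is four hops from Alice, so the depth bound excludes him.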
// Find shortest connection between two users
MATCH path = shortestPath(
(a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path, length(path) AS distance
// Collaborative filtering: Products liked by similar users
MATCH (u:User {id: $userId})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT EXISTS { (u)-[:PURCHASED]->(rec) }
RETURN rec.name, count(*) AS score
ORDER BY score DESC
LIMIT 10
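The scoring in the collaborative-filtering query above can be reproduced on plain Python data to check the logic. This is a sketch under the assumption that purchases are available as a dictionary of sets; the `recommend` function and the sample data are hypothetical.

```python
from collections import Counter


def recommend(purchases, user_id, limit=10):
    """Score products bought by users who share purchases with `user_id`."""
    mine = purchases.get(user_id, set())
    scores = Counter()
    for other, theirs in purchases.items():
        if other == user_id:
            continue
        shared = len(mine & theirs)   # number of products in common
        if shared == 0:
            continue                  # not a "similar" user
        for product in theirs - mine: # skip products already purchased
            # One point per (shared product, similar user) path,
            # which is what count(*) tallies in the Cypher query.
            scores[product] += shared
    return scores.most_common(limit)


purchases = {
    "u1": {"book", "lamp"},
    "u2": {"book", "lamp", "desk"},
    "u3": {"book", "chair"},
    "u4": {"sofa"},
}
print(recommend(purchases, "u1"))
```

A user who shares more purchases contributes a higher score, so `desk` outranks `chair` for `u1`.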
// Detect circular money flows
MATCH path = (a:Account)-[:SENT*3..6]->(a)
WHERE all(r IN relationships(path) WHERE r.amount > 1000)
RETURN path, [r IN relationships(path) | r.amount] AS amounts
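For intuition, the circular-flow pattern above corresponds to a depth-limited search that never reuses a transfer. The following sketch assumes transfers are given as `(sender, receiver, amount)` tuples; `find_cycles` is a hypothetical helper, not part of any driver.

```python
def find_cycles(transfers, min_len=3, max_len=6, min_amount=1000):
    """Find cycles of min_len..max_len transfers, each above min_amount."""
    outgoing = {}
    for idx, (src, dst, amount) in enumerate(transfers):
        if amount > min_amount:   # mirrors all(r ... WHERE r.amount > 1000)
            outgoing.setdefault(src, []).append((idx, dst, amount))

    cycles = []

    def walk(start, node, used, amounts):
        for idx, dst, amount in outgoing.get(node, []):
            if idx in used:       # a transfer may appear once per path
                continue
            length = len(amounts) + 1
            if dst == start and length >= min_len:
                cycles.append((start, amounts + [amount]))
            if length < max_len:
                walk(start, dst, used | {idx}, amounts + [amount])

    for account in list(outgoing):
        walk(account, account, frozenset(), [])
    return cycles


transfers = [
    ("A", "B", 5000),
    ("B", "C", 4000),
    ("C", "A", 3000),
    ("C", "D", 500),   # below the threshold, ignored
]
for start, amounts in find_cycles(transfers):
    print(start, amounts)
```

As in the Cypher query, the same ring is reported once per starting account, so results usually need deduplication before alerting.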
Use for: General-purpose graph applications
Strengths:
Installation:
# Python driver
pip install neo4j
# TypeScript driver
npm install neo4j-driver
# Rust driver
cargo add neo4rs
Reference: references/neo4j.md
Use for: Multi-model applications (documents + graph)
Strengths:
Reference: references/arangodb.md
Use for: Adding graph capabilities to existing PostgreSQL
Strengths:
Reference: Implementation details in examples/
Use for: AWS-native, serverless deployments
Strengths:
See references/graph-modeling.md for comprehensive patterns.
Anti-pattern (storing relationships in node properties):
// BAD
(:Person {name: 'Alice', friend_ids: ['b123', 'c456']})
Pattern (explicit relationships):
// GOOD
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'b123'})
(:Person {name: 'Alice'})-[:FRIEND]->(:Person {id: 'c456'})
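Moving from the anti-pattern to explicit relationships is mostly a data-reshaping step. The sketch below assumes each exported person record carries an `id` and a `friend_ids` list; the `friend_ids_to_rows` helper and the `MIGRATION_QUERY` text are illustrative and would need adapting to the real schema.

```python
# Hypothetical batch query: one row per relationship to create.
MIGRATION_QUERY = """
UNWIND $rows AS row
MATCH (a:Person {id: row.from_id})
MATCH (b:Person {id: row.to_id})
MERGE (a)-[:FRIEND]->(b)
"""


def friend_ids_to_rows(people):
    """Flatten friend_ids properties into (from_id, to_id) rows."""
    rows = []
    for person in people:
        for friend_id in person.get("friend_ids", []):
            rows.append({"from_id": person["id"], "to_id": friend_id})
    return rows


people = [
    {"id": "a001", "name": "Alice", "friend_ids": ["b123", "c456"]},
    {"id": "b123", "name": "Bob"},
]
rows = friend_ids_to_rows(people)
print(rows)
# The rows would then be passed as the $rows parameter,
# for example: session.run(MIGRATION_QUERY, rows=rows)
```

Using `MERGE` keeps the migration idempotent, so it can be rerun after a partial failure.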
// Track interaction details on relationships
(:Person)-[:FRIEND {
since: '2020-01-15',
strength: 0.85,
last_interaction: datetime()
}]->(:Person)
// SLOW: Unbounded traversal
MATCH (a)-[:FRIEND*]->(distant)
RETURN distant
// FAST: Bounded depth with index
MATCH (a)-[:FRIEND*1..4]->(distant)
WHERE distant.active = true
RETURN distant
LIMIT 100
Problem: Nodes with thousands of relationships slow traversals.
Solution: Intermediate aggregation nodes
// Instead of: (:User)-[:POSTED]->(:Post) [1M relationships]
// Use time partitioning:
(:User)-[:POSTED_IN]->(:Year {year: 2025})
-[:HAS_MONTH]->(:Month {month: 12})
-[:HAS_POST]->(:Post)
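One way to write into the partitioned layout is sketched below. It assumes the `Year` and `Month` nodes are created per user (otherwise a shared `Year` node would itself become a supernode); the query text, property names, and `post_partition_params` helper are hypothetical.

```python
from datetime import datetime

# MERGE is anchored on the bound user, so each user gets their own
# Year and Month nodes instead of sharing global ones.
CREATE_POST_QUERY = """
MATCH (u:User {id: $userId})
MERGE (u)-[:POSTED_IN]->(y:Year {year: $year})
MERGE (y)-[:HAS_MONTH]->(m:Month {month: $month})
CREATE (m)-[:HAS_POST]->(p:Post {id: $postId, created_at: $createdAt})
"""


def post_partition_params(user_id, post_id, posted_at):
    """Derive the partition keys for a post from its timestamp."""
    return {
        "userId": user_id,
        "postId": post_id,
        "year": posted_at.year,
        "month": posted_at.month,
        "createdAt": posted_at.isoformat(),
    }


params = post_partition_params("u123", "p1", datetime(2025, 12, 3, 9, 30))
print(params)
# For example: session.run(CREATE_POST_QUERY, **params)
```

Queries for recent posts can then start from the relevant `Month` node instead of scanning every `POSTED` relationship on the user.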
Schema and implementation in examples/social-graph/
Key features:
Integration example in examples/knowledge-graph/
Key features:
Integration with Vector Databases:
# Step 1: Vector search in Qdrant/pgvector
# (qdrant is a QdrantClient; session is an open Neo4j driver session)
vector_results = qdrant.search(collection_name="concepts", query_vector=embedding)
# Step 2: Expand with graph relationships
concept_ids = [r.id for r in vector_results]
graph_context = session.run("""
MATCH (c:Concept) WHERE c.id IN $ids
MATCH path = (c)-[:RELATED_TO|IS_A*1..2]-(related)
RETURN c, related, relationships(path) AS rels
""", ids=concept_ids)
Examples in examples/social-graph/
Strategies:
Pattern detection in examples/
Detection patterns:
See references/cypher-patterns.md for detailed optimization.
// Single-property index
CREATE INDEX user_email FOR (u:User) ON (u.email)
// Composite index (multiple properties)
CREATE INDEX user_name_location FOR (u:User) ON (u.name, u.location)
// Full-text search
CREATE FULLTEXT INDEX product_search FOR (p:Product) ON EACH [p.name, p.description]
// Materialize friend count as property
MATCH (u:User)-[:FRIEND]->(f)
WITH u, count(f) AS friendCount
SET u.friend_count = friendCount
// Query becomes a property filter (index u.friend_count and refresh it when friendships change)
MATCH (u:User) WHERE u.friend_count > 100
RETURN u.name, u.friend_count
| Scale | Strategy | Implementation |
|---|---|---|
| Vertical | Add RAM/CPU | In-memory caching, larger instances |
| Horizontal (Read) | Read replicas | Neo4j Cluster, ArangoDB Cluster |
| Horizontal (Write) | Sharding | ArangoDB SmartGraphs, JanusGraph |
| Caching | App-level cache | Redis for hot paths |
Complete example in examples/social-graph/python-neo4j/
from neo4j import GraphDatabase

class GraphDB:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def find_friends_of_friends(self, user_id: str, max_depth: int = 2):
        # Cypher does not accept a parameter as a variable-length bound,
        # so validate the depth and interpolate it into the query text.
        depth = int(max_depth)
        if not 1 <= depth <= 5:
            raise ValueError("max_depth must be between 1 and 5")
        query = f"""
        MATCH (u:User {{id: $userId}})-[:FRIEND*1..{depth}]->(fof)
        WHERE u <> fof
        RETURN DISTINCT fof.id AS id, fof.name AS name
        LIMIT 100
        """
        with self.driver.session() as session:
            result = session.run(query, userId=user_id)
            return [dict(record) for record in result]

# Usage
db = GraphDB("bolt://localhost:7687", "neo4j", "password")
friends = db.find_friends_of_friends("u123", max_depth=3)
db.close()
Complete example in examples/social-graph/typescript-neo4j/
import neo4j, { Driver } from 'neo4j-driver'

class Neo4jService {
  private driver: Driver

  constructor(uri: string, username: string, password: string) {
    this.driver = neo4j.driver(uri, neo4j.auth.basic(username, password))
  }

  async findFriendsOfFriends(userId: string, maxDepth: number = 2) {
    // Cypher does not accept a parameter as a variable-length bound,
    // so validate the depth and interpolate it into the query text.
    if (!Number.isInteger(maxDepth) || maxDepth < 1 || maxDepth > 5) {
      throw new RangeError('maxDepth must be an integer between 1 and 5')
    }
    const session = this.driver.session()
    try {
      const result = await session.run(
        `MATCH (u:User {id: $userId})-[:FRIEND*1..${maxDepth}]->(fof)
         WHERE u <> fof
         RETURN DISTINCT fof.id AS id, fof.name AS name
         LIMIT 100`,
        { userId }
      )
      return result.records.map(r => r.toObject())
    } finally {
      await session.close()
    }
  }
}
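import (
	"context"
	driver "github.com/arangodb/go-driver"
)
// findFriendsOfFriends assumes a User struct is defined elsewhere and that
// startVertex is a full document ID such as "users/u123".
func findFriendsOfFriends(ctx context.Context, db driver.Database, startVertex string, maxDepth int) ([]User, error) {
	// In AQL, LIMIT must precede RETURN, so deduplicate in a subquery first.
	query := `
		FOR v IN (
			FOR vertex IN 1..@maxDepth OUTBOUND @startVertex GRAPH 'socialGraph'
				FILTER vertex._id != @startVertex
				RETURN DISTINCT vertex
		) LIMIT 100 RETURN v
	`
	cursor, err := db.Query(ctx, query, map[string]interface{}{
		"startVertex": startVertex,
		"maxDepth":    maxDepth,
	})
	if err != nil {
		return nil, err
	}
	defer cursor.Close()
	var users []User
	for cursor.HasMore() {
		var u User
		if _, err := cursor.ReadDocument(ctx, &u); err != nil {
			return nil, err
		}
		users = append(users, u)
	}
	return users, nil
}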
Use scripts/validate_graph_schema.py to check for:
Run validation:
python scripts/validate_graph_schema.py --database neo4j://localhost:7687
Combine vector similarity with graph context for AI/RAG applications.
See examples/knowledge-graph/
Implement relationship-based queries: "Find all users within 3 degrees of connection"
Use knowledge graphs to enrich LLM context with structured relationships.
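As a rough illustration of enriching LLM context, relationship rows returned from a graph query can be flattened into short sentences before they are added to a prompt. The `(subject, relation, object)` tuple shape and the `triples_to_context` helper are assumptions made for this sketch.

```python
def triples_to_context(triples, max_lines=20):
    """Turn (subject, RELATION, object) rows into deduplicated text lines."""
    seen = set()
    lines = []
    for subject, relation, obj in triples:
        key = (subject, relation, obj)
        if key in seen:
            continue                      # the same fact can arrive via several paths
        seen.add(key)
        verb = relation.replace("_", " ").lower()   # IS_A -> "is a"
        lines.append(f"{subject} {verb} {obj}")
        if len(lines) == max_lines:
            break                         # keep the prompt bounded
    return "\n".join(lines)


triples = [
    ("Neo4j", "IS_A", "graph database"),
    ("Neo4j", "RELATED_TO", "Cypher"),
    ("Neo4j", "IS_A", "graph database"),
]
print(triples_to_context(triples))
```

Capping the number of lines is a simple guard against high-degree nodes flooding the context window.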
Implement relationship-based access control: "Can user X access resource Y through relation Z?"
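The access question above reduces to a bounded path search. The sketch below works on in-memory `(source, relation, target)` tuples; the `MEMBER_OF`, `CAN_READ`, and `CAN_WRITE` relation names and the `can_access` function are hypothetical, and in a graph database the same check would be a single bounded path query.

```python
from collections import deque


def can_access(edges, user, resource, relation, max_hops=3):
    """True if `user` reaches `resource` via MEMBER_OF hops ending in `relation`."""
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))

    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue                      # bounded traversal, as with *1..N
        for rel, dst in adjacency.get(node, []):
            if rel == relation and dst == resource:
                return True               # permission granted directly or via a group
            if rel == "MEMBER_OF" and dst not in seen:
                seen.add(dst)             # follow group membership
                queue.append((dst, hops + 1))
    return False


edges = [
    ("alice", "MEMBER_OF", "eng"),
    ("eng", "MEMBER_OF", "staff"),
    ("staff", "CAN_READ", "handbook"),
    ("eng", "CAN_WRITE", "repo"),
]
print(can_access(edges, "alice", "handbook", "CAN_READ"))
```

The hop limit keeps authorization checks predictable in cost, at the price of ignoring very deep group nesting.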
(:User)-[:PURCHASED]->(:Product)
(:User)-[:VIEWED]->(:Product)
(:User)-[:RATED]->(:Product)
(:CEO)-[:MANAGES]->(:VP)-[:MANAGES]->(:Director)
(:Event {timestamp})-[:NEXT]->(:Event {timestamp})
- scripts/validate_graph_schema.py
- references/neo4j.md - Neo4j setup, drivers, GDS algorithms
- references/arangodb.md - ArangoDB multi-model patterns
- references/cypher-patterns.md - Comprehensive Cypher query library
- references/graph-modeling.md - Data modeling best practices
- examples/social-graph/ - Complete social network implementation
- examples/knowledge-graph/ - Hybrid vector + graph for AI/RAG