Show HN: SQLite Graph Ext – Graph database with Cypher queries (alpha)

github.com

24 points by gwillen85 6 hours ago

I've been working on adding graph database capabilities to SQLite with support for the Cypher query language. As of this week, both CREATE and MATCH operations work with full relationship support.

Here's what it looks like:

    import sqlite3
    conn = sqlite3.connect(":memory:")
    conn.enable_load_extension(True)  # Python's sqlite3 disables extension loading by default
    conn.load_extension("./libgraph.so")
    
    conn.execute("CREATE VIRTUAL TABLE graph USING graph()")
    
    # Create a social network
    conn.execute("""SELECT cypher_execute('
        CREATE (alice:Person {name: "Alice", age: 30}),
               (bob:Person {name: "Bob", age: 25}),
               (alice)-[:KNOWS {since: 2020}]->(bob)
    ')""")
    
    # Query the graph with relationship patterns
    conn.execute("""SELECT cypher_execute('
        MATCH (a:Person)-[r:KNOWS]->(b:Person) 
        WHERE a.age > 25 
        RETURN a, r, b
    ')""")

The interesting part was building the complete execution pipeline - lexer, parser, logical planner, physical planner, and an iterator-based executor using the Volcano model. All in C99, with no dependencies beyond SQLite.

What works now:

- Full CREATE: nodes, relationships, properties, chained patterns (70/70 openCypher TCK tests)
- MATCH with relationship patterns: (a)-[r:TYPE]->(b) with label and type filtering
- WHERE clause: property comparisons on nodes (=, >, <, >=, <=, <>)
- RETURN: basic projection with JSON serialization
- Virtual table integration for mixing SQL and Cypher

Performance:

- 340K nodes/sec inserts (consistent to 1M nodes)
- 390K edges/sec for relationships
- 180K nodes/sec scans with WHERE filtering

Current limitations (alpha):

- Only forward relationships (no `<-[r]-` or bidirectional `-[r]-`)
- No relationship property filtering in WHERE (e.g., `WHERE r.weight > 5`)
- No variable-length paths yet (e.g., `[r*1..3]`)
- No aggregations, ORDER BY, or property projection in RETURN
- Must use double quotes for strings: {name: "Alice"}, not {name: 'Alice'}

This is alpha - API may change. But core graph query patterns work! The execution pipeline handles CREATE/MATCH/WHERE/RETURN end-to-end.

Next up: bidirectional relationships, property projection, aggregations. Roadmap targets full Cypher support by Q1 2026.

Built as part of Agentflare AI, but it's standalone and MIT licensed. Would love feedback on what to prioritize.

GitHub: https://github.com/agentflare-ai/sqlite-graph

Happy to answer questions about the implementation!

mentalgear 5 hours ago

I like the ambition and the open-source spirit behind your project! Open-source graph databases are fantastic.

That said, I’d encourage you to consider leveraging existing projects rather than starting from scratch. There are already mature, local / in-browser graph databases that could benefit from your skills and vision.

For example:

- Kuzu https://github.com/kuzudb/kuzu: This project had very active development but was recently archived (as of October 10, 2025). Continuing or forking it could be a game-changer for the community.

- CozoDB https://www.cozodb.org/: It’s very feature-rich and actively seeking contributors. Your expertise could help push it even further.

I do get the appeal of building something from the ground up; it’s incredibly rewarding. But achieving production readiness is seriously challenging and time-consuming. These projects are already years ahead in scope, so contributing to them could accelerate your impact and save you from reinventing the wheel.

  • gwillen85 5 hours ago

    Thanks for the suggestions! I'm familiar with both. Different category though - this is a SQLite extension, not a standalone database. The value prop is:

    - Zero friction - if you're already using SQLite (Python scripts, mobile apps, embedded systems), just `.load graph_extension` and you have graph capabilities
    - Mix SQL + Cypher - join your relational tables with graph traversals in the same query
    - Works everywhere SQLite works - serverless functions, Raspberry Pi, iOS apps, wherever
    - Leverage SQLite's ecosystem - all existing tools, bindings, and deployment patterns just work
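    The "mix SQL and graph data" point can be sketched in plain SQLite, since the graph lives in ordinary tables. A minimal sketch, assuming the backing-table layout described elsewhere in this thread (my_graph_edges with source/target/edge_type columns; the real extension would do this through the virtual table and Cypher):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An ordinary relational table...
conn.execute("CREATE TABLE orders (id INTEGER, person_id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 2, 19.99)")
# ...alongside graph data living in the same database file.
conn.execute("CREATE TABLE my_graph_edges "
             "(id INTEGER, source INTEGER, target INTEGER, edge_type TEXT)")
conn.execute("INSERT INTO my_graph_edges VALUES (1, 1, 2, 'KNOWS')")

# Orders placed by anyone node 1 KNOWS: one query over both worlds.
rows = conn.execute("""
    SELECT o.id, o.total
    FROM my_graph_edges e
    JOIN orders o ON o.person_id = e.target
    WHERE e.source = 1 AND e.edge_type = 'KNOWS'
""").fetchall()
print(rows)  # [(1, 19.99)]
```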

    Kuzu and CozoDB are excellent if you want a dedicated graph database. But if you've already got SQLite (which is everywhere), this lets you add graph features without rearchitecting.

    Think of it like SQLite's FTS5 extension for full-text search - you're not competing with Elasticsearch, you're giving SQLite users a lightweight option that fits their existing workflow.

    • selecsosi 2 hours ago

      This reminds me of the apache age postgres extension as well. Very cool work

      • gwillen85 2 hours ago

        Thanks! As a Postgres user first, I really appreciate that comparison. Apache AGE does great work.

        Graph databases are crucial for AI memory, especially paired with vector databases. Graph for relationships, vectors for semantic similarity - particularly powerful for embedded systems and robotics where you need lightweight, on-device reasoning.

leetrout 5 hours ago

I have an ELI5 question...

So you're doing the planning and execution which results in what? Some direct calls into sqlite that create tables? Under the hood is this using tables in a conventional manner where there are adjacency lists or just edges and vertexes or ... ?

I'm looking at `graphFindEdgesByType` and it says they're done with SQL queries - are you effectively transpiling some of the Cypher or just have routines that build queries as needed?

Thanks!

  • gwillen85 5 hours ago

    Great question!

    The storage model is just regular SQLite tables. When you create a graph, it makes two backing tables:

        my_graph_nodes -- id, labels (JSON array), properties (JSON object)
        my_graph_edges -- id, source, target, edge_type, properties (JSON object)

    It's an edge list, not adjacency lists.
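    A minimal sketch of that layout in plain Python/SQLite (column types and constraints beyond the comment above are assumptions):

```python
import json
import sqlite3

# Edge-list storage as described: two plain SQLite tables.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_graph_nodes (
    id INTEGER PRIMARY KEY,
    labels TEXT,      -- JSON array, e.g. ["Person"]
    properties TEXT   -- JSON object, e.g. {"name": "Alice"}
)""")
conn.execute("""CREATE TABLE my_graph_edges (
    id INTEGER PRIMARY KEY,
    source INTEGER REFERENCES my_graph_nodes(id),
    target INTEGER REFERENCES my_graph_nodes(id),
    edge_type TEXT,
    properties TEXT   -- JSON object
)""")

conn.execute("INSERT INTO my_graph_nodes VALUES (1, ?, ?)",
             (json.dumps(["Person"]), json.dumps({"name": "Alice", "age": 30})))
conn.execute("INSERT INTO my_graph_nodes VALUES (2, ?, ?)",
             (json.dumps(["Person"]), json.dumps({"name": "Bob", "age": 25})))
conn.execute("INSERT INTO my_graph_edges VALUES (1, 1, 2, 'KNOWS', ?)",
             (json.dumps({"since": 2020}),))

# With an edge list, a relationship lookup is just a filtered scan.
rows = conn.execute(
    "SELECT target FROM my_graph_edges WHERE source = ? AND edge_type = ?",
    (1, "KNOWS")).fetchall()
print(rows)  # [(2,)]
```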

    Query processing is not transpiling Cypher directly. There's a pipeline: Cypher → AST → Logical Plan → Physical Plan (optimizer) → Iterators → SQL queries. The iterators generate SQL on the fly to fetch from those backing tables. Basically the Volcano model.
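    A toy illustration of the Volcano (pull-based iterator) model - the operator names here are invented for illustration, not the extension's actual API:

```python
# Toy Volcano-style pipeline: Scan -> Filter -> Project, each pulling from its child.
class Scan:
    def __init__(self, rows):
        self.it = iter(rows)
    def next(self):
        return next(self.it, None)  # None signals end-of-stream

class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def next(self):
        row = self.child.next()
        while row is not None and not self.pred(row):
            row = self.child.next()
        return row

class Project:
    def __init__(self, child, cols):
        self.child, self.cols = child, cols
    def next(self):
        row = self.child.next()
        return None if row is None else {c: row[c] for c in self.cols}

# Roughly: MATCH (a:Person) WHERE a.age > 25 RETURN a.name
people = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
plan = Project(Filter(Scan(people), lambda r: r["age"] > 25), ["name"])
out = []
row = plan.next()
while row is not None:
    out.append(row)
    row = plan.next()
print(out)  # [{'name': 'Alice'}]
```

    In the real extension, a leaf operator like Scan would issue SQL against the backing tables instead of iterating a Python list.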

    graphFindEdgesByType is actually deprecated and a no-op now. The comment says "edge lookups are done via SQL queries." There used to be in-memory structures, but it moved to just generating SQL like:

        SELECT e.target, e.id, e.edge_type
        FROM my_graph_edges e
        WHERE e.source = 123 AND e.edge_type = 'KNOWS'
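    "Build SQL as needed during execution" can be sketched like this - the helper name and table layout are hypothetical, just following the query above:

```python
import sqlite3

def edges_from(conn, source_id, edge_type):
    # Hypothetical expansion step for (n)-[:TYPE]->(): emit parameterized SQL on demand.
    sql = ("SELECT e.target, e.id, e.edge_type "
           "FROM my_graph_edges e "
           "WHERE e.source = ? AND e.edge_type = ?")
    yield from conn.execute(sql, (source_id, edge_type))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_graph_edges "
             "(id INTEGER, source INTEGER, target INTEGER, edge_type TEXT)")
conn.execute("INSERT INTO my_graph_edges VALUES (1, 123, 456, 'KNOWS')")
result = list(edges_from(conn, 123, "KNOWS"))
print(result)  # [(456, 1, 'KNOWS')]
```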

    So it's "build SQL queries as needed during execution" rather than "transpile the whole Cypher query upfront."

jeffreyajewett 6 hours ago

Nothing says weekend project like writing a Cypher planner from scratch in C99. We also recently launched AgentML -> check it out https://github.com/agentflare-ai/agentml (ALSO MIT)

  • gwillen85 6 hours ago

    This will also be used in the yet-to-be-released `memlite`, which is our first WASM component for AgentML.

    • mentalgear 5 hours ago

      Interesting, yet the XML syntax feels quite verbose vs. JSON, for example.

      • gwillen85 5 hours ago

        I agree, but LLMs are very good at generating XML. Additionally, SCXML, which AgentML extends, has been around and finalized for over 15 years. So generating AgentML works incredibly well.

        • mentalgear 5 hours ago

          I get your point, however I wonder how much better they are than JSON when using structured output endpoints, which is likely what you would want to use with such a format.

          • gwillen85 5 hours ago

            That's a fair point. We're considering adding JSON as a first-class citizen alongside XML - similar to OpenAPI supporting both JSON and YAML.

            But you're right that structured output endpoints make JSON generation more reliable, so supporting both formats long-term makes sense.

        • leetrout 5 hours ago

          I'm also curious if you know if anyone has any definitive test sets on this? Kind of like how Simon Willison uses the bird on the bicycle?

          • gwillen85 5 hours ago

            Good question - we're working on case studies for this.

            My theory: models are heavily trained on HTML/XML and many use XML tags in their own system prompts, so they're naturally fluent in that syntax. Makes nested structures more reliable in our testing.

            Structured output endpoints help JSON a lot though.