Full Content
TITLE: Vector Guide
NSG Vector/Semantic Search Setup Guide
Last Updated: 2026-01-31
Status: Verified working
Overview
Semantic search requires THREE tables with matching rowids:
file_vectors - VSS virtual table holding embeddings
file_lookup - Used by search queries (JOIN in search_vss)
vector_map - Used by status/count queries (get_vector_status)
Both file_lookup AND vector_map are required. Missing either one causes silent failures.
Required Schema
sql-- VSS for semantic search (384 dimensions for all-MiniLM-L6-v2)
CREATE VIRTUAL TABLE file_vectors USING vss0(embedding(384));
-- Used by search_vss() for search results
CREATE TABLE file_lookup (
rowid INTEGER PRIMARY KEY,
collection TEXT,
filepath TEXT
);
-- Used by get_vector_status() for counting embeddings
CREATE TABLE vector_map (
vector_rowid INTEGER PRIMARY KEY,
filepath TEXT
);
Critical: Rowid Matching
All three tables must have matching rowids. The rowid comes from the FTS5 files table:
python# Get rowid from FTS5 insert
rowid = conn.execute('SELECT last_insert_rowid()').fetchone()[0]
# Insert into ALL THREE tables with SAME rowid
conn.execute("INSERT INTO file_vectors(rowid, embedding) VALUES (?, ?)", (rowid, emb))
conn.execute("INSERT INTO file_lookup(rowid, collection, filepath) VALUES (?, ?, ?)", (rowid, collection, filepath))
conn.execute("INSERT INTO vector_map(vector_rowid, filepath) VALUES (?, ?)", (rowid, filepath))
Complete Build Script Template
python#!/usr/bin/env python3
"""
Vector index builder - correct schema
Requires: sentence_transformers, sqlite_vss
"""
import sqlite3
import sqlite_vss
from sentence_transformers import SentenceTransformer
DB_PATH = "indexes/YOUR_INDEX.db"
COLLECTION = "your-index-id" # Must match registry id
print("Loading model...")
model = SentenceTransformer('all-MiniLM-L6-v2')
conn = sqlite3.connect(DB_PATH)
conn.enable_load_extension(True)
sqlite_vss.load(conn)
# Drop old tables (clean rebuild)
print("Dropping old vector tables...")
conn.execute("DROP TABLE IF EXISTS file_vectors")
conn.execute("DROP TABLE IF EXISTS vector_map")
conn.execute("DROP TABLE IF EXISTS file_lookup")
# Create with correct schema
print("Creating tables...")
conn.execute("CREATE VIRTUAL TABLE file_vectors USING vss0(embedding(384))")
conn.execute("CREATE TABLE file_lookup (rowid INTEGER PRIMARY KEY, collection TEXT, filepath TEXT)")
conn.execute("CREATE TABLE vector_map (vector_rowid INTEGER PRIMARY KEY, filepath TEXT)")
conn.commit()
# Get all files from FTS table
rows = conn.execute("SELECT rowid, filepath, filename FROM files").fetchall()
print(f"Found {len(rows)} files")
# Build with matching rowids
for i, (rowid, filepath, filename) in enumerate(rows):
# Embed filename + path (for filesystem indexes without text content)
# For text indexes, use: content[:8000] instead
text = f"{filename} {filepath}"
emb = model.encode(text, convert_to_numpy=True)
conn.execute("INSERT INTO file_vectors(rowid, embedding) VALUES (?, ?)",
(rowid, emb.tobytes()))
conn.execute("INSERT INTO file_lookup(rowid, collection, filepath) VALUES (?, ?, ?)",
(rowid, COLLECTION, filepath))
conn.execute("INSERT INTO vector_map(vector_rowid, filepath) VALUES (?, ?)",
(rowid, filepath))
if (i+1) % 100 == 0:
print(f"{i+1}/{len(rows)}")
conn.commit()
conn.commit()
conn.close()
print(f"Done - {len(rows)} vectors")
Registry Update
After building vectors, update registry.db:
sqlUPDATE index_registry
SET vector_enabled = 1,
vector_table = 'file_vectors',
vector_type = 'vss'
WHERE id = 'your-index-id';
Verification
pythonimport sqlite3
import sqlite_vss
conn = sqlite3.connect('indexes/YOUR_INDEX.db')
conn.enable_load_extension(True)
sqlite_vss.load(conn)
# All three counts must match
fv = conn.execute("SELECT COUNT(*) FROM file_vectors").fetchone()[0]
fl = conn.execute("SELECT COUNT(*) FROM file_lookup").fetchone()[0]
vm = conn.execute("SELECT COUNT(*) FROM vector_map").fetchone()[0]
print(f"file_vectors: {fv}")
print(f"file_lookup: {fl}")
print(f"vector_map: {vm}")
assert fv == fl == vm, "ROWID MISMATCH - rebuild required"
print("✓ All tables match")
How Search Works (Reference)
get_vector_status() in index_vector.py:
python# Counts from vector_map
count = conn.execute("SELECT COUNT(*) FROM vector_map").fetchone()[0]
search_vss() in index_vector.py:
python# Joins file_vectors with file_lookup
SELECT v.rowid, v.distance, l.filepath, l.collection
FROM file_vectors v
JOIN file_lookup l ON v.rowid = l.rowid
WHERE vss_search(v.embedding, vss_search_params(?, ?))
Common Failures
SymptomCauseFixStatus shows "empty"Missing vector_map tableRebuild with both tablesSearch returns 0 resultsRowids don't match between tablesRebuild using rowid from files table"No vector index available"Registry not updatedUPDATE registry SET vector_enabled=1
Source Files (Authoritative)
/var/www/api.nsgia.com/index_api.py - Schema definition
/var/www/api.nsgia.com/upload_index.py - Insert pattern
/var/www/api.nsgia.com/index_vector.py - Search functions
Search RAG for these files before making changes.
Lesson from 2026-01-31: This schema was discovered by examining index_api.py which creates BOTH tables. Two hours were wasted guessing instead of reading existing working code.