Understanding Embeddings and Vector Databases (pgvector, Pinecone)
Search used to mean matching keywords. You searched for “database indexing” and got back documents containing those exact words. Embeddings break that constraint — they let you search by meaning, so a query for “how does a database speed up queries” still finds an article titled “B-tree indexes explained” even if it shares no keywords with your query. This is the engine underneath most modern AI applications: RAG systems, semantic search, recommendation engines, and duplicate detection all rely on it. Let’s look at how embeddings actually work and how vector databases make them queryable at scale.
What Is an Embedding?
An embedding is a fixed-length array of floating-point numbers that represents the meaning of a piece of text. Two semantically similar sentences will produce embeddings that are numerically close together in high-dimensional space.
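from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Return the embedding vector for a single piece of text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding
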
e1 = embed("How do database indexes work?")
e2 = embed("What is a B-tree index?")
e3 = embed("What is the capital of France?")
print(f"Dimensions: {len(e1)}")
Dimensions: 1536
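Calling embed() once per text costs one API round trip each time. The embeddings endpoint also accepts a list of inputs, so a batch version might look like the sketch below. The name `embed_batch` is hypothetical, and the client is passed in as a parameter (an assumption made here so the helper can be tried with a stub before pointing it at the real API).

```python
from typing import Any


def embed_batch(
    client: Any,
    texts: list[str],
    model: str = "text-embedding-3-small",
) -> list[list[float]]:
    """Embed several texts in a single request (hypothetical helper)."""
    if not texts:
        return []
    response = client.embeddings.create(model=model, input=texts)
    # Each returned item carries the index of the input it belongs to;
    # sorting on it keeps outputs aligned with `texts`.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

With the client defined above, `embed_batch(client, ["first text", "second text"])` would return one vector per input, in input order.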
The distance between e1 and e2 will be small (similar meaning), while the distance between e1 and e3 will be large (unrelated topics). Closeness is typically measured with cosine similarity, where a higher score means the vectors are closer.
Cosine Similarity
Cosine similarity measures the angle between two vectors and ignores their lengths. A score of 1.0 means the vectors point in the same direction (near-identical meaning), 0.0 means they are orthogonal (unrelated), and -1.0 means they point in opposite directions.
import numpy as np
def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(cosine_similarity(e1, e2)) # similar
print(cosine_similarity(e1, e3)) # unrelated
0.8732
0.1204
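The three reference points of the score are easier to see with hand-made 2D vectors than with 1536-dimensional ones. The sketch below uses made-up toy vectors (not real embeddings) to show that only direction matters, not length.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors chosen for illustration only.
same = cosine_similarity([1, 0], [3, 0])        # same direction, different length
orthogonal = cosine_similarity([1, 0], [0, 2])  # 90 degrees apart
opposite = cosine_similarity([1, 0], [-2, 0])   # 180 degrees apart
diagonal = cosine_similarity([1, 0], [1, 1])    # 45 degrees apart

print(same, orthogonal, opposite)  # 1.0 0.0 -1.0
print(round(diagonal, 4))          # 0.7071
```

Scaling either vector leaves the score unchanged, which is why cosine similarity compares meaning rather than magnitude.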
Choosing an Embedding Model
Different models produce different-sized embeddings with different quality/cost tradeoffs:
| Model | Dimensions | Best for |
|---|---|---|
| text-embedding-3-small (OpenAI) | 1536 | General use, low cost |
| text-embedding-3-large (OpenAI) | 3072 | Higher accuracy tasks |
| nomic-embed-text (Ollama, local) | 768 | Local/offline pipelines |
| all-MiniLM-L6-v2 (Sentence Transformers) | 384 | Fast, open source, no API |
| text-embedding-004 (Google) | 768 | Google Cloud users |
More dimensions generally mean higher accuracy, but also more storage per vector and slower queries.
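To put the storage side of that tradeoff in numbers, the sketch below estimates raw vector size for a few of the models in the table. It assumes each dimension is stored as a 4-byte float and ignores index structures and per-row overhead, so real usage will be higher. `raw_vector_bytes` is a hypothetical helper name.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vectors alone, excluding indexes and row overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


models = [
    ("all-MiniLM-L6-v2", 384),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]
for name, dims in models:
    gib = raw_vector_bytes(1_000_000, dims) / 1024**3
    print(f"{name}: {gib:.2f} GiB per million vectors")
# all-MiniLM-L6-v2: 1.43 GiB per million vectors
# text-embedding-3-small: 5.72 GiB per million vectors
# text-embedding-3-large: 11.44 GiB per million vectors
```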
What Is a Vector Database?
A vector database is a database optimized to store and search embeddings. Unlike a traditional database where you’d query WHERE name = 'Alice', a vector database answers “give me the 10 embeddings most similar to this query embedding” — fast, even across millions of vectors.
The key technique enabling this is Approximate Nearest Neighbor (ANN) search. Exact nearest-neighbor search compares the query against every stored vector, which becomes prohibitively slow across millions of 1536-dimensional vectors. ANN gives up a small amount of recall (it may occasionally miss a true nearest neighbor) in exchange for much faster queries.
Common index types:
- HNSW (Hierarchical Navigable Small World): approximate, graph-based; fast queries, high memory use
- IVF (Inverted File Index): approximate, cluster-based; good balance of speed and memory
- Flat: exact brute-force search rather than ANN; only viable for small datasets
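The difference between flat and IVF-style search can be sketched in a few lines of NumPy. This is a toy illustration on random data, not how pgvector or Pinecone implement their indexes: `ToyIVF` is a hypothetical class, and it picks centroids by random sampling where a real index would typically train them with k-means.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale rows to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def flat_search(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Exact search: score every vector and return the indices of the top k."""
    scores = vectors @ query
    return np.argsort(-scores)[:k]


class ToyIVF:
    """Toy inverted-file index: bucket vectors by their nearest centroid."""

    def __init__(self, vectors: np.ndarray, n_lists: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors
        # Assumption: centroids are a random sample of the data.
        picks = rng.choice(len(vectors), size=n_lists, replace=False)
        self.centroids = vectors[picks]
        nearest = np.argmax(vectors @ self.centroids.T, axis=1)
        self.lists = [np.flatnonzero(nearest == i) for i in range(n_lists)]

    def search(self, query: np.ndarray, k: int, n_probe: int = 1) -> np.ndarray:
        # Visit only the n_probe lists whose centroids best match the query.
        probe = np.argsort(-(self.centroids @ query))[:n_probe]
        candidates = np.concatenate([self.lists[i] for i in probe])
        scores = self.vectors[candidates] @ query
        return candidates[np.argsort(-scores)[:k]]


rng = np.random.default_rng(42)
data = normalize(rng.normal(size=(2000, 32)))
query = normalize(rng.normal(size=(1, 32)))[0]

index = ToyIVF(data, n_lists=20)
exact = flat_search(data, query, k=5)
approx = index.search(query, k=5, n_probe=3)
recall = len(set(exact.tolist()) & set(approx.tolist())) / 5
print(f"recall@5 when probing 3 of 20 lists: {recall:.2f}")
```

Probing more lists raises recall and costs more comparisons; probing all of them scores every vector, so it returns the same results as the flat search. That dial is the speed-for-accuracy trade described above.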
pgvector: Vector Search in PostgreSQL
If you’re already on Postgres, pgvector is the easiest path — you get vector search in the same database as the rest of your app, with no new infrastructure.
$ psql -c "CREATE EXTENSION vector;"
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536)  -- must match the embedding model's dimensions
);
-- HNSW indexes require pgvector 0.5.0 or later
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
Inserting documents:
import json

import psycopg2

conn = psycopg2.connect("dbname=myapp user=postgres")
cur = conn.cursor()

text = "Indexes speed up database queries by avoiding full table scans."
embedding = embed(text)

# json.dumps renders the list as "[0.1, 0.2, ...]", which pgvector parses as a vector
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
    (text, json.dumps(embedding)),
)
conn.commit()
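Real ingestion usually involves many documents rather than one. A minimal sketch of a batch insert follows, assuming the documents table defined earlier. `to_vector_literal` and `insert_documents` are hypothetical helper names, and the cursor and embedding function are passed in so the code does not depend on a live connection.

```python
from typing import Any, Callable, Sequence


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def insert_documents(
    cur: Any,
    texts: Sequence[str],
    embed_fn: Callable[[str], Sequence[float]],
) -> int:
    """Insert several documents with one executemany call; returns the row count."""
    rows = [(text, to_vector_literal(embed_fn(text))) for text in texts]
    cur.executemany(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
        rows,
    )
    return len(rows)
```

With the connection above, usage would be `insert_documents(cur, texts, embed)` followed by `conn.commit()`.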
Semantic search:
query = "How do databases find rows faster?"
query_embedding = embed(query)
# A named parameter lets the query vector be passed once and used twice
cur.execute(
    """
    SELECT content, 1 - (embedding <=> %(q)s::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> %(q)s::vector
    LIMIT 5
    """,
    {"q": json.dumps(query_embedding)},
)
for content, similarity in cur.fetchall():
    print(f"[{similarity:.3f}] {content}")
[0.873] Indexes speed up database queries by avoiding full table scans.
[0.791] B-tree indexes store data in sorted order for efficient range queries.
[0.734] Query planners use statistics to decide whether to use an index.
The <=> operator computes cosine distance, which ranges from 0 (same direction) to 2 (opposite direction). Lower distance means higher similarity, so 1 - distance recovers the cosine similarity, a score between -1 and 1.
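pgvector also provides <-> for Euclidean (L2) distance and <#> for negative inner product, alongside <=> for cosine distance. The NumPy functions below are a sketch of what each operator computes, using hypothetical function names and toy vectors. They are meant for checking intuition, not as a replacement for the database operators.

```python
import numpy as np


def l2_distance(a, b) -> float:
    """Equivalent of pgvector's a <-> b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


def negative_inner_product(a, b) -> float:
    """Equivalent of pgvector's a <#> b (negated so smaller means closer)."""
    return float(-np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))


def cosine_distance(a, b) -> float:
    """Equivalent of pgvector's a <=> b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine_distance([1, 0], [2, 0]))   # 0.0 (same direction)
print(cosine_distance([1, 0], [0, 1]))   # 1.0 (orthogonal)
print(cosine_distance([1, 0], [-1, 0]))  # 2.0 (opposite)
```

Whichever operator a query uses, the index should be built with the matching operator class (vector_cosine_ops for <=>), otherwise the planner cannot use it for that ordering.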
Pinecone: Managed Vector Database
Pinecone is a fully managed vector database built specifically for embedding storage and retrieval. There are no servers to install or configure: you install the client library, create an index, and start upserting vectors.
$ pip install pinecone
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")
pc.create_index(
    name="knowledge-base",
    dimension=1536,  # must match the embedding model's dimensions
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index("knowledge-base")
Upserting vectors:
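docs = [
    ("doc1", "Indexes speed up database queries by avoiding full table scans."),
    ("doc2", "B-tree indexes store data in sorted order for range queries."),
    ("doc3", "Connection pooling reuses database connections to reduce overhead."),
    ("doc4", "Query planners use statistics to decide whether to use an index."),
]
# Each vector is an (id, values, metadata) tuple; doc_id avoids shadowing the built-in id()
vectors = [(doc_id, embed(text), {"text": text}) for doc_id, text in docs]
index.upsert(vectors=vectors)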
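A single upsert call is fine for a handful of vectors, but larger collections are usually sent in chunks to keep each request small. The sketch below shows one way to do that. `batched` and `upsert_in_batches` are hypothetical helpers, and the default batch size of 100 is an assumption to tune against your vector size and the service's request limits.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def batched(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def upsert_in_batches(index: Any, vectors: Iterable[Any], batch_size: int = 100) -> int:
    """Send vectors to index.upsert in chunks; returns how many were sent."""
    sent = 0
    for batch in batched(vectors, batch_size):
        index.upsert(vectors=batch)
        sent += len(batch)
    return sent
```

With the index above, `upsert_in_batches(index, vectors)` would replace the single `index.upsert(vectors=vectors)` call.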
Querying:
results = index.query(
    vector=embed("How do databases find rows faster?"),
    top_k=3,
    include_metadata=True,
)
for match in results["matches"]:
    print(f"[{match['score']:.3f}] {match['metadata']['text']}")
[0.873] Indexes speed up database queries by avoiding full table scans.
[0.791] B-tree indexes store data in sorted order for range queries.
[0.698] Query planners use statistics to decide whether to use an index.
pgvector vs Pinecone
| | pgvector | Pinecone |
|---|---|---|
| Setup | Self-hosted, needs Postgres | Fully managed |
| Scale | Up to a few million vectors comfortably | Billions of vectors |
| Cost | Compute for your Postgres instance | Per vector stored + queries |
| Metadata filtering | Full SQL | Limited filter syntax |
| Good for | Apps already on Postgres | Large-scale, pure vector workloads |
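The metadata filtering row is easiest to judge side by side. The sketch below restricts a search to one topic in both systems. It assumes a hypothetical `topic` column on the documents table and a matching `topic` metadata field in Pinecone, neither of which exists in the schemas shown earlier. `pinecone_filtered_query` is likewise a hypothetical helper.

```python
from typing import Any, Sequence

# Assumption: the documents table has an extra `topic` column.
# pgvector filters with an ordinary SQL WHERE clause.
PGVECTOR_FILTERED_QUERY = """
SELECT content, 1 - (embedding <=> %(q)s::vector) AS similarity
FROM documents
WHERE topic = %(topic)s
ORDER BY embedding <=> %(q)s::vector
LIMIT 5
"""


def pinecone_filtered_query(
    index: Any,
    query_vector: Sequence[float],
    topic: str,
    top_k: int = 5,
) -> Any:
    """Query Pinecone, keeping only matches whose metadata topic equals `topic`."""
    return index.query(
        vector=list(query_vector),
        top_k=top_k,
        include_metadata=True,
        # Assumption: each vector was upserted with a `topic` metadata field.
        filter={"topic": {"$eq": topic}},
    )
```

The SQL version can combine the filter with joins, aggregates, or any other predicate, while the Pinecone filter is limited to conditions on the metadata stored with each vector.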
For most applications starting out, pgvector is the pragmatic choice — you likely already have Postgres, and it handles millions of vectors comfortably. Pinecone becomes compelling when you’re at the scale where managing Postgres infrastructure for vector workloads becomes a burden.
Conclusion
Embeddings convert meaning into geometry, and vector databases make that geometry searchable at scale. Once you understand that cosine similarity is just measuring how close two vectors point, the whole stack clicks into place — RAG, semantic search, and recommendation systems all become variations on the same pattern. Start with pgvector if you’re already on Postgres; migrate to a dedicated vector database if and when you genuinely need the scale.