<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Vector-Database on David Lang</title>
    <link>https://www.davidlang.tech/tags/vector-database/</link>
    <description>Recent content in Vector-Database on David Lang</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Mon, 22 Apr 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.davidlang.tech/tags/vector-database/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Vector Databases: Pinecone, Weaviate, and Chroma Compared</title>
      <link>https://www.davidlang.tech/posts/vector-databases-pinecone-weaviate-and-chroma-compared/</link>
      <pubDate>Mon, 22 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/vector-databases-pinecone-weaviate-and-chroma-compared/</guid>
      <description>&lt;p&gt;Vector databases store embeddings and perform similarity search, serving as the retrieval layer in RAG and recommendation systems.&lt;/p&gt;&#xA;&lt;h2 id=&#34;comparison&#34;&gt;Comparison&lt;/h2&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;&lt;/th&gt;&#xA;          &lt;th&gt;Pinecone&lt;/th&gt;&#xA;          &lt;th&gt;Weaviate&lt;/th&gt;&#xA;          &lt;th&gt;Chroma&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Hosting&lt;/td&gt;&#xA;          &lt;td&gt;Managed cloud&lt;/td&gt;&#xA;          &lt;td&gt;Self-host or cloud&lt;/td&gt;&#xA;          &lt;td&gt;Embedded / local&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Best for&lt;/td&gt;&#xA;          &lt;td&gt;Production scale&lt;/td&gt;&#xA;          &lt;td&gt;Hybrid search + GraphQL&lt;/td&gt;&#xA;          &lt;td&gt;Prototyping&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Ops burden&lt;/td&gt;&#xA;          &lt;td&gt;Low&lt;/td&gt;&#xA;          &lt;td&gt;Medium&lt;/td&gt;&#xA;          &lt;td&gt;Low&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;h2 id=&#34;pgvector-alternative&#34;&gt;pgvector Alternative&lt;/h2&gt;&#xA;&lt;p&gt;PostgreSQL with pgvector keeps vectors beside relational data, which works well when you already run Postgres and need ACID transactions.&lt;/p&gt;&#xA;&lt;h2 id=&#34;selection-criteria&#34;&gt;Selection Criteria&lt;/h2&gt;&#xA;&lt;p&gt;Consider query throughput (QPS), metadata filtering, hybrid keyword + vector search, cost, and data residency. Prototype on Chroma or pgvector; migrate to Pinecone or Weaviate at scale.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building RAG Systems: Retrieval-Augmented Generation Explained</title>
      <link>https://www.davidlang.tech/posts/building-rag-systems-retrieval-augmented-generation-explained/</link>
      <pubDate>Thu, 18 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/building-rag-systems-retrieval-augmented-generation-explained/</guid>
      <description>&lt;p&gt;RAG grounds LLM responses in your private data by retrieving relevant documents before generation. It reduces hallucinations and keeps answers current without retraining models.&lt;/p&gt;&#xA;&lt;h2 id=&#34;pipeline-overview&#34;&gt;Pipeline Overview&lt;/h2&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;strong&gt;Ingest&lt;/strong&gt; - Load PDFs, wikis, and tickets, then split them into chunks (500–1000 tokens).&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; - Convert chunks to vectors with an embedding model.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Store&lt;/strong&gt; - Save vectors in Pinecone, pgvector, or Chroma.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Retrieve&lt;/strong&gt; - On query, embed the question and find the top-k most similar chunks.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Generate&lt;/strong&gt; - Pass the retrieved chunks as context to the LLM.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;# retrieved_chunks: list of chunk strings from the Retrieve step&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;context &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;\n\n&amp;#34;&lt;/span&gt;.join(retrieved_chunks)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;# user_query: the question string entered by the user&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;prompt &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;f&amp;#34;Use only this context:\n{context}\n\nQuestion: {user_query}&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;chunking-strategy&#34;&gt;Chunking Strategy&lt;/h2&gt;&#xA;&lt;p&gt;Overlap adjacent chunks by 10–20% so that sentences cut at a chunk boundary still appear whole in a neighboring chunk. Storing metadata (source, page) with each chunk helps with citations and debugging.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
