<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>LLM on David Lang</title>
    <link>https://www.davidlang.tech/tags/llm/</link>
    <description>Recent content in LLM on David Lang</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Fri, 10 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://www.davidlang.tech/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Context Window Strategies: Making the Most of Long-Context LLMs</title>
      <link>https://www.davidlang.tech/posts/context-window-strategies-making-the-most-of-long-context-llms/</link>
      <pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/context-window-strategies-making-the-most-of-long-context-llms/</guid>
      <description>&lt;p&gt;Million-token context windows tempt teams to dump entire repos into prompts. That is expensive, slow, and often less accurate than targeted retrieval.&lt;/p&gt;&#xA;&lt;h2 id=&#34;when-full-context-helps&#34;&gt;When Full Context Helps&lt;/h2&gt;&#xA;&lt;p&gt;Single-file refactors, analyzing one large document, comparing a few long contracts.&lt;/p&gt;&#xA;&lt;h2 id=&#34;when-retrieval-wins&#34;&gt;When Retrieval Wins&lt;/h2&gt;&#xA;&lt;p&gt;Whole codebases, ticket backlogs, and wiki sites: embed, filter, rerank, then pass the top-k chunks.&lt;/p&gt;&#xA;&lt;h2 id=&#34;compression-techniques&#34;&gt;Compression Techniques&lt;/h2&gt;&#xA;&lt;p&gt;Summarize conversation history. Use hierarchical memory (session summary + recent turns). Strip comments and generated noise from code context.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building Reliable AI Agents: Lessons from Production</title>
      <link>https://www.davidlang.tech/posts/building-reliable-ai-agents-lessons-from-production/</link>
      <pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/building-reliable-ai-agents-lessons-from-production/</guid>
      <description>&lt;p&gt;Production agents fail in boring ways: timeouts, tool errors, runaway loops, and silent wrong answers. Reliability engineering applies to agents too.&lt;/p&gt;&#xA;&lt;h2 id=&#34;hardening-checklist&#34;&gt;Hardening Checklist&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Max steps and token budgets per session&lt;/li&gt;&#xA;&lt;li&gt;Idempotent tools with clear error messages&lt;/li&gt;&#xA;&lt;li&gt;Checkpoint state for long workflows&lt;/li&gt;&#xA;&lt;li&gt;Circuit breakers when external APIs fail&lt;/li&gt;&#xA;&lt;li&gt;Structured logging of every tool call&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;graceful-degradation&#34;&gt;Graceful Degradation&lt;/h2&gt;&#xA;&lt;p&gt;When the agent fails, fall back to search-only RAG or a human handoff, never an empty error.&lt;/p&gt;</description>
    </item>
    <item>
      <title>From RAG to Agentic AI: What&#39;s Next for LLM-Powered Apps</title>
      <link>https://www.davidlang.tech/posts/from-rag-to-agentic-ai-whats-next-for-llm-powered-apps/</link>
      <pubDate>Mon, 01 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/from-rag-to-agentic-ai-whats-next-for-llm-powered-apps/</guid>
      <description>&lt;p&gt;The industry moved from chatbots → RAG → agents. Understanding the progression helps you invest in the right layer for your product maturity.&lt;/p&gt;&#xA;&lt;h2 id=&#34;rag-era&#34;&gt;RAG Era&lt;/h2&gt;&#xA;&lt;p&gt;Ground models in private data. Mature patterns: chunking, hybrid search, citations. Still the right default for Q&amp;amp;A and search.&lt;/p&gt;&#xA;&lt;h2 id=&#34;agent-era&#34;&gt;Agent Era&lt;/h2&gt;&#xA;&lt;p&gt;Models call tools, plan multi-step workflows, and maintain state. Higher capability, higher risk.&lt;/p&gt;&#xA;&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s Next&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Evals-as-code&lt;/strong&gt; in every pipeline&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Smaller specialist models&lt;/strong&gt; routed by orchestrators&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;On-device&lt;/strong&gt; for privacy-sensitive steps&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Human-agent collaboration&lt;/strong&gt; UIs, not just chat&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;migration-path&#34;&gt;Migration Path&lt;/h2&gt;&#xA;&lt;p&gt;Master RAG and evals first. Add one well-scoped agent tool. Measure task completion before expanding autonomy.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Evaluating LLM Outputs: RAGAS, DeepEval, and Custom Metrics</title>
      <link>https://www.davidlang.tech/posts/evaluating-llm-outputs-ragas-deepeval-and-custom-metrics/</link>
      <pubDate>Sat, 18 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/evaluating-llm-outputs-ragas-deepeval-and-custom-metrics/</guid>
      <description>&lt;p&gt;Frameworks like RAGAS and DeepEval codify LLM evaluation metrics so you can regression-test prompts and pipelines in CI.&lt;/p&gt;&#xA;&lt;h2 id=&#34;ragas-rag-assessment&#34;&gt;RAGAS (RAG Assessment)&lt;/h2&gt;&#xA;&lt;p&gt;Measures context precision/recall, faithfulness, and answer relevance, which makes it a good fit for retrieval pipelines.&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#719e07&#34;&gt;from&lt;/span&gt; ragas &lt;span style=&#34;color:#719e07&#34;&gt;import&lt;/span&gt; evaluate&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#719e07&#34;&gt;from&lt;/span&gt; ragas.metrics &lt;span style=&#34;color:#719e07&#34;&gt;import&lt;/span&gt; faithfulness, answer_relevancy&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;result &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; evaluate(dataset&lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt;eval_dataset, metrics&lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt;[faithfulness, answer_relevancy])&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;deepeval&#34;&gt;DeepEval&lt;/h2&gt;&#xA;&lt;p&gt;Offers pytest-style LLM tests, G-Eval, and hallucination metrics with CI integration.&lt;/p&gt;&#xA;&lt;h2 id=&#34;custom-metrics&#34;&gt;Custom Metrics&lt;/h2&gt;&#xA;&lt;p&gt;Domain-specific checks often outperform generic scores: JSON schema match, SQL execution success, unit test pass rate for codegen.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building Multi-Agent AI Systems</title>
      <link>https://www.davidlang.tech/posts/building-multi-agent-ai-systems/</link>
      <pubDate>Tue, 20 May 2025 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/building-multi-agent-ai-systems/</guid>
      <description>&lt;p&gt;Multi-agent systems divide work among specialized agents (a researcher, a coder, a critic) coordinated by a supervisor or message bus.&lt;/p&gt;&#xA;&lt;h2 id=&#34;patterns&#34;&gt;Patterns&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Supervisor&lt;/strong&gt; - One model delegates subtasks and aggregates results.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Peer-to-peer&lt;/strong&gt; - Agents message each other until consensus or max rounds.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Pipeline&lt;/strong&gt; - Fixed stages (plan → implement → test).&lt;/p&gt;&#xA;&lt;h2 id=&#34;implementation-tips&#34;&gt;Implementation Tips&lt;/h2&gt;&#xA;&lt;p&gt;Give each agent a narrow system prompt and tool set. Pass structured state (JSON) between agents, not raw chat logs.&lt;/p&gt;&#xA;&lt;h2 id=&#34;failure-modes&#34;&gt;Failure Modes&lt;/h2&gt;&#xA;&lt;p&gt;Infinite loops, duplicated work, conflicting edits. Enforce step limits, idempotent tools, and single-writer rules for shared files.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Improving LLM Accuracy: Techniques Beyond Prompt Engineering</title>
      <link>https://www.davidlang.tech/posts/improving-llm-accuracy-techniques-beyond-prompt-engineering/</link>
      <pubDate>Tue, 25 Mar 2025 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/improving-llm-accuracy-techniques-beyond-prompt-engineering/</guid>
      <description>&lt;p&gt;When prompts plateau, these engineering levers move accuracy more than another adjective in the system message.&lt;/p&gt;&#xA;&lt;h2 id=&#34;better-retrieval&#34;&gt;Better Retrieval&lt;/h2&gt;&#xA;&lt;p&gt;Hybrid search (BM25 + vectors), rerankers (Cohere, cross-encoders), and metadata filters keep irrelevant context from reaching the model.&lt;/p&gt;&#xA;&lt;h2 id=&#34;structured-outputs&#34;&gt;Structured Outputs&lt;/h2&gt;&#xA;&lt;p&gt;Force JSON with schemas (Zod, Pydantic, OpenAI structured outputs). Parse failures trigger retry with repair prompts.&lt;/p&gt;&#xA;&lt;h2 id=&#34;model-routing&#34;&gt;Model Routing&lt;/h2&gt;&#xA;&lt;p&gt;Small models classify intent; large models answer hard questions. Cuts cost and reduces overconfident rambling on simple queries.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to Validate and Measure LLM Accuracy in Production</title>
      <link>https://www.davidlang.tech/posts/how-to-validate-and-measure-llm-accuracy-in-production/</link>
      <pubDate>Tue, 18 Feb 2025 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/how-to-validate-and-measure-llm-accuracy-in-production/</guid>
      <description>&lt;p&gt;Shipping an LLM feature without measurement is shipping a bug generator. Production validation combines automated metrics, human review, and business KPIs.&lt;/p&gt;&#xA;&lt;h2 id=&#34;levels-of-evaluation&#34;&gt;Levels of Evaluation&lt;/h2&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;strong&gt;Unit-level&lt;/strong&gt; - Schema validation, regex checks, refusal detection&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Golden set&lt;/strong&gt; - Curated Q&amp;amp;A pairs scored automatically&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Online&lt;/strong&gt; - User thumbs, task completion, support escalations&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Human&lt;/strong&gt; - Expert rubrics on sampled traffic&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;h2 id=&#34;metrics-that-matter&#34;&gt;Metrics That Matter&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;strong&gt;Faithfulness&lt;/strong&gt; - Answer grounded in retrieved context (RAG)&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Relevance&lt;/strong&gt; - Addresses the user question&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Toxicity / PII&lt;/strong&gt; - Safety filters&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Latency and cost&lt;/strong&gt; - p95 tokens and dollars per session&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;implementation-sketch&#34;&gt;Implementation Sketch&lt;/h2&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#719e07&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#268bd2&#34;&gt;validate_response&lt;/span&gt;(answer: &lt;span style=&#34;color:#b58900&#34;&gt;str&lt;/span&gt;, context: &lt;span style=&#34;color:#b58900&#34;&gt;str&lt;/span&gt;) &lt;span style=&#34;color:#719e07&#34;&gt;-&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#b58900&#34;&gt;dict&lt;/span&gt;:&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#719e07&#34;&gt;return&lt;/span&gt; {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;has_citation&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;[source:&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#719e07&#34;&gt;in&lt;/span&gt; answer,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;length_ok&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;50&lt;/span&gt; &lt;span style=&#34;color:#719e07&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#b58900&#34;&gt;len&lt;/span&gt;(answer) &lt;span style=&#34;color:#719e07&#34;&gt;&amp;lt;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;4000&lt;/span&gt;,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;grounded&amp;#34;&lt;/span&gt;: entailment_score(context, answer) &lt;span style=&#34;color:#719e07&#34;&gt;&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;0.7&lt;/span&gt;,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Log scores to your observability stack (Datadog, LangSmith, Phoenix).&lt;/p&gt;</description>
    </item>
    <item>
      <title>AI-Powered Code Review: Integrating LLMs into Dev Workflows</title>
      <link>https://www.davidlang.tech/posts/ai-powered-code-review-integrating-llms-into-dev-workflows/</link>
      <pubDate>Sun, 22 Sep 2024 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/ai-powered-code-review-integrating-llms-into-dev-workflows/</guid>
      <description>&lt;p&gt;LLMs can summarize diffs, flag security smells, and suggest tests, but they should augment human review, not replace it.&lt;/p&gt;&#xA;&lt;h2 id=&#34;ci-integration&#34;&gt;CI Integration&lt;/h2&gt;&#xA;&lt;p&gt;Post PR diffs to an LLM with a structured prompt. Output JSON findings consumed by GitHub Actions or GitLab CI. Fail builds only on high-severity, high-confidence issues to reduce noise.&lt;/p&gt;&#xA;&lt;h2 id=&#34;prompt-design-for-reviews&#34;&gt;Prompt Design for Reviews&lt;/h2&gt;&#xA;&lt;p&gt;Include: changed files, diff hunks, coding standards doc, and explicit instruction to cite line numbers and avoid nits.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Claude API vs OpenAI API: Choosing the Right LLM</title>
      <link>https://www.davidlang.tech/posts/claude-api-vs-openai-api-choosing-the-right-llm/</link>
      <pubDate>Wed, 14 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/claude-api-vs-openai-api-choosing-the-right-llm/</guid>
      <description>&lt;p&gt;Anthropic&amp;rsquo;s Claude and OpenAI&amp;rsquo;s GPT families both offer strong APIs. Choosing between them depends on task, context length, cost, and compliance, not benchmark hype alone.&lt;/p&gt;&#xA;&lt;h2 id=&#34;strengths-at-a-glance&#34;&gt;Strengths at a Glance&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Claude&lt;/strong&gt; - Long context windows, careful refusals, strong long-document analysis and coding reviews.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;OpenAI&lt;/strong&gt; - Broad ecosystem, function calling maturity, image and audio modalities, largest third-party integration surface.&lt;/p&gt;&#xA;&lt;h2 id=&#34;integration-pattern&#34;&gt;Integration Pattern&lt;/h2&gt;&#xA;&lt;p&gt;Abstract the provider behind an interface:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-typescript&#34; data-lang=&#34;typescript&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;interface&lt;/span&gt; LLMProvider {&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  chat(messages: &lt;span style=&#34;color:#dc322f&#34;&gt;Message&lt;/span&gt;[])&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; Promise&amp;lt;&lt;span style=&#34;color:#268bd2&#34;&gt;string&lt;/span&gt;&amp;gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Swap implementations per route (cheap model for classification, premium for generation).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fine-Tuning LLMs: When and How to Customize AI Models</title>
      <link>https://www.davidlang.tech/posts/fine-tuning-llms-when-and-how-to-customize-ai-models/</link>
      <pubDate>Wed, 15 May 2024 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/fine-tuning-llms-when-and-how-to-customize-ai-models/</guid>
      <description>&lt;p&gt;Fine-tuning adapts a base model to your domain with labeled examples. Use it when prompting and RAG cannot achieve consistent style, format, or task-specific behavior.&lt;/p&gt;&#xA;&lt;h2 id=&#34;when-to-fine-tune&#34;&gt;When to Fine-Tune&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Fixed output schema (legal clauses, medical codes)&lt;/li&gt;&#xA;&lt;li&gt;Brand voice across thousands of responses&lt;/li&gt;&#xA;&lt;li&gt;Specialized terminology poorly covered by general models&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;when-not-to-fine-tune&#34;&gt;When Not to Fine-Tune&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Facts that change frequently (use RAG)&lt;/li&gt;&#xA;&lt;li&gt;One-off tasks (use prompting)&lt;/li&gt;&#xA;&lt;li&gt;Small datasets without validation (risk overfitting)&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;openai-fine-tuning-flow&#34;&gt;OpenAI Fine-Tuning Flow&lt;/h2&gt;&#xA;&lt;p&gt;Prepare JSONL with &lt;code&gt;messages&lt;/code&gt; arrays. Upload, create job, evaluate on a holdout set. Monitor loss and human ratings before promoting to production.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building RAG Systems: Retrieval-Augmented Generation Explained</title>
      <link>https://www.davidlang.tech/posts/building-rag-systems-retrieval-augmented-generation-explained/</link>
      <pubDate>Thu, 18 Jan 2024 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/building-rag-systems-retrieval-augmented-generation-explained/</guid>
      <description>&lt;p&gt;RAG grounds LLM responses in your private data by retrieving relevant documents before generation. It reduces hallucinations and keeps answers current without retraining models.&lt;/p&gt;&#xA;&lt;h2 id=&#34;pipeline-overview&#34;&gt;Pipeline Overview&lt;/h2&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;&lt;strong&gt;Ingest&lt;/strong&gt; - Load PDFs, wikis, tickets into chunks (500–1000 tokens).&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; - Convert chunks to vectors with an embedding model.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Store&lt;/strong&gt; - Save vectors in Pinecone, pgvector, or Chroma.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Retrieve&lt;/strong&gt; - On query, embed the question and find top-k similar chunks.&lt;/li&gt;&#xA;&lt;li&gt;&lt;strong&gt;Generate&lt;/strong&gt; - Pass chunks as context to the LLM.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;context &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;\n\n&amp;#34;&lt;/span&gt;.join(retrieved_chunks)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;prompt &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;f&amp;#34;Use only this context:\n{context}\n\nQuestion: {user_query}&amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;chunking-strategy&#34;&gt;Chunking Strategy&lt;/h2&gt;&#xA;&lt;p&gt;Overlap chunks by 10–20% to avoid cutting sentences. Metadata (source, page) helps citations and debugging.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Prompt Engineering Fundamentals for Developers</title>
      <link>https://www.davidlang.tech/posts/prompt-engineering-fundamentals-for-developers/</link>
      <pubDate>Sat, 14 Oct 2023 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/prompt-engineering-fundamentals-for-developers/</guid>
      <description>&lt;p&gt;Prompt engineering is the practice of designing inputs so LLMs produce reliable, useful outputs. Developers who treat prompts as code ship better AI features.&lt;/p&gt;&#xA;&lt;h2 id=&#34;structure-your-prompts&#34;&gt;Structure Your Prompts&lt;/h2&gt;&#xA;&lt;p&gt;Use clear sections: role, context, task, format, and constraints.&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-fallback&#34; data-lang=&#34;fallback&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You are a code reviewer for a TypeScript React codebase.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Context: PR diff below.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Task: List bugs, security issues, and style problems.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Format: JSON array of { severity, file, message }.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Constraints: Max 10 items. No speculation beyond the diff.&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;few-shot-examples&#34;&gt;Few-Shot Examples&lt;/h2&gt;&#xA;&lt;p&gt;Include 2–3 input/output pairs for classification or extraction tasks. Examples beat lengthy instructions for format adherence.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Introduction to LangChain: Building AI-Powered Apps</title>
      <link>https://www.davidlang.tech/posts/introduction-to-langchain-building-ai-powered-apps/</link>
      <pubDate>Wed, 08 Mar 2023 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/introduction-to-langchain-building-ai-powered-apps/</guid>
      <description>&lt;p&gt;LangChain composes LLM calls with prompts, memory, tools, and retrieval. It standardizes patterns that every AI app eventually needs.&lt;/p&gt;&#xA;&lt;h2 id=&#34;chains-and-prompts&#34;&gt;Chains and Prompts&lt;/h2&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#719e07&#34;&gt;from&lt;/span&gt; langchain_openai &lt;span style=&#34;color:#719e07&#34;&gt;import&lt;/span&gt; ChatOpenAI&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#719e07&#34;&gt;from&lt;/span&gt; langchain_core.prompts &lt;span style=&#34;color:#719e07&#34;&gt;import&lt;/span&gt; ChatPromptTemplate&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;llm &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; ChatOpenAI(model&lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;gpt-4&amp;#34;&lt;/span&gt;)&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;prompt &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; ChatPromptTemplate&lt;span style=&#34;color:#719e07&#34;&gt;.&lt;/span&gt;from_messages([&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    (&lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;system&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;Answer as a senior engineer.&amp;#34;&lt;/span&gt;),&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    (&lt;span 
style=&#34;color:#2aa198&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;{question}&lt;/span&gt;&lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;&lt;/span&gt;),&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;])&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;chain &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; prompt &lt;span style=&#34;color:#719e07&#34;&gt;|&lt;/span&gt; llm&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;response &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; chain&lt;span style=&#34;color:#719e07&#34;&gt;.&lt;/span&gt;invoke({&lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#34;What is RAG?&amp;#34;&lt;/span&gt;})&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;retrieval&#34;&gt;Retrieval&lt;/h2&gt;&#xA;&lt;p&gt;Load documents, chunk text, embed with OpenAI or open models, store in a vector DB, and retrieve relevant chunks at query time. This is the foundation for RAG systems.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Getting Started with the OpenAI API in Node.js</title>
      <link>https://www.davidlang.tech/posts/getting-started-with-the-openai-api-in-nodejs/</link>
      <pubDate>Thu, 12 Jan 2023 00:00:00 +0000</pubDate>
      <guid>https://www.davidlang.tech/posts/getting-started-with-the-openai-api-in-nodejs/</guid>
      <description>&lt;p&gt;The OpenAI API brought large language models to application developers through a simple HTTP interface. Node.js remains a natural fit for BFF layers that call LLMs.&lt;/p&gt;&#xA;&lt;h2 id=&#34;installation-and-first-request&#34;&gt;Installation and First Request&lt;/h2&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;npm install openai&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-typescript&#34; data-lang=&#34;typescript&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;import&lt;/span&gt; OpenAI &lt;span style=&#34;color:#268bd2&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#39;openai&amp;#39;&lt;/span&gt;;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;const&lt;/span&gt; client &lt;span style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#719e07&#34;&gt;new&lt;/span&gt; OpenAI({ apiKey: &lt;span style=&#34;color:#dc322f&#34;&gt;process.env.OPENAI_API_KEY&lt;/span&gt; });&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#268bd2&#34;&gt;const&lt;/span&gt; completion &lt;span 
style=&#34;color:#719e07&#34;&gt;=&lt;/span&gt; &lt;span style=&#34;color:#719e07&#34;&gt;await&lt;/span&gt; client.chat.completions.create({&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  model&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#39;gpt-4&amp;#39;&lt;/span&gt;,&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  messages&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; [&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    { role&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#39;system&amp;#39;&lt;/span&gt;, content&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#39;You are a helpful coding assistant.&amp;#39;&lt;/span&gt; },&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    { role&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#39;user&amp;#39;&lt;/span&gt;, content&lt;span style=&#34;color:#719e07&#34;&gt;:&lt;/span&gt; &lt;span style=&#34;color:#2aa198&#34;&gt;&amp;#39;Explain async/await in JavaScript.&amp;#39;&lt;/span&gt; },&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  ],&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;});&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;console.log(completion.choices[&lt;span style=&#34;color:#2aa198&#34;&gt;0&lt;/span&gt;].message.content);&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;production-considerations&#34;&gt;Production Considerations&lt;/h2&gt;&#xA;&lt;p&gt;Never expose API keys in frontend bundles. 
Proxy requests through your backend. Set &lt;code&gt;max_tokens&lt;/code&gt;, timeouts, and retry policies. Log token usage for cost control.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
