Ai on David Lang

Context Window Strategies: Making the Most of Long-Context LLMs

Fri, 10 Apr 2026 00:00:00 +0000

Million-token context windows tempt teams to dump entire repos into prompts. That is expensive, slow, and often less accurate than targeted retrieval.

When Full Context Helps

Single-file refactors, analyzing one large document, comparing a few long contracts.

When Retrieval Wins

Whole codebases, ticket backlogs, and wiki sites-embed, filter, rerank, then pass top-k chunks.

Compression Techniques

Summarize conversation history. Use hierarchical memory (session summary + recent turns). Strip comments and generated noise from code context.

The State of AI Coding Assistants in 2026

Thu, 15 Jan 2026 00:00:00 +0000

By 2026, AI coding assistants are standard in professional workflows-not experiments. The landscape consolidated around a few patterns: inline completion, IDE agents, and terminal agents.

Market Snapshot

Cursor leads among developers who want an AI-native editor with codebase-wide context. GitHub Copilot remains the enterprise default tied to GitHub and Microsoft ecosystems. Claude Code and similar terminal agents dominate backend and automation workflows. Windsurf, Cody, and others compete on price and niche features.

Evaluating LLM Outputs: RAGAS, DeepEval, and Custom Metrics

Sat, 18 Oct 2025 00:00:00 +0000

Frameworks like RAGAS and DeepEval codify LLM evaluation metrics so you can regression-test prompts and pipelines in CI.

RAGAS (RAG Assessment)

Measures context precision/recall, faithfulness, and answer relevance-ideal for retrieval pipelines.

from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

result = evaluate(dataset=eval_dataset, metrics=[faithfulness, answer_relevancy])

DeepEval

Offers pytest-style LLM tests, G-Eval, and hallucination metrics with CI integration.

Custom Metrics

Domain-specific checks often outperform generic scores-JSON schema match, SQL execution success, unit test pass rate for codegen.

FastAPI + LangChain: Building Production-Ready AI APIs

Fri, 05 Sep 2025 00:00:00 +0000

FastAPI’s async support and automatic OpenAPI docs pair naturally with LangChain for production AI backends.

Project Structure

app/
  main.py
  routers/chat.py
  services/rag.py
  models/schemas.py

Async Endpoint

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")
async def chat(req: ChatRequest):
    result = await rag_chain.ainvoke({"input": req.message})
    return {"answer": result["answer"]}

Production Checklist

Rate limiting, API keys, structured logging, health checks, timeout on LLM calls, background tasks for long ingest jobs.

AI-First Development: Rethinking Your Engineering Workflow

Tue, 22 Jul 2025 00:00:00 +0000

AI-first development means designing processes assuming LLMs and agents participate in design, implementation, and review-not bolting a chatbot onto waterfall.

Shifts in Practice

Specs - Write acceptance criteria LLMs can verify (tests, schemas).

Architecture - Smaller modules with clear boundaries agents can reason about.

Reviews - AI first pass, human mandatory for security and product judgment.

Documentation - Keep AGENTS.md or rules files current so tools understand conventions.

Team Rituals

Start stories with a prompt draft. Pair with AI for spikes; human pair for production-critical paths. Track AI-assisted PR defect rates.

Claude Code: Agentic AI Coding from the Terminal

Mon, 30 Jun 2025 00:00:00 +0000

Claude Code brings agentic coding to the terminal-read files, edit code, run tests, and commit changes through natural language, powered by Anthropic’s models.

Workflow

Run from your repository root. Ask for features or fixes in plain language. Claude Code explores the tree, proposes edits, and executes commands with your approval.

Strengths

Strong on refactors spanning many files, understanding build errors from test output, and following git history. Terminal-native fits backend and DevOps workflows.

MCP (Model Context Protocol): The Future of AI Tool Integration

Tue, 08 Apr 2025 00:00:00 +0000

Model Context Protocol (MCP) standardizes how AI applications connect to data sources and tools-filesystems, databases, APIs, and IDEs speak a common protocol.

Why MCP Matters

Before MCP, every agent framework invented its own plugin format. MCP provides discoverable tools and resources with typed schemas-like LSP for AI tools.

Architecture

Host - Cursor, Claude Desktop, custom agent
MCP Server - Exposes tools (query_db, read_file) and resources
Transport - stdio or SSE

Developers implement servers once; any MCP-compatible host can use them.

Improving LLM Accuracy: Techniques Beyond Prompt Engineering

Tue, 25 Mar 2025 00:00:00 +0000

When prompts plateau, these engineering levers move accuracy more than another adjective in the system message.

Better Retrieval

Hybrid search (BM25 + vectors), rerankers (Cohere, cross-encoders), and metadata filters reduce wrong context reaching the model.

Structured Outputs

Force JSON with schemas (Zod, Pydantic, OpenAI structured outputs). Parse failures trigger retry with repair prompts.

Model Routing

Small models classify intent; large models answer hard questions. Cuts cost and reduces overconfident rambling on simple queries.

How to Validate and Measure LLM Accuracy in Production

Tue, 18 Feb 2025 00:00:00 +0000

Shipping an LLM feature without measurement is shipping a bug generator. Production validation combines automated metrics, human review, and business KPIs.

Levels of Evaluation

Unit-level - Schema validation, regex checks, refusal detection
Golden set - Curated Q&A pairs scored automatically
Online - User thumbs, task completion, support escalations
Human - Expert rubrics on sampled traffic

Metrics That Matter

Faithfulness - Answer grounded in retrieved context (RAG)
Relevance - Addresses the user question
Toxicity / PII - Safety filters
Latency and cost - p95 tokens and dollars per session

Implementation Sketch

def validate_response(answer: str, context: str) -> dict:
    return {
        "has_citation": "[source:" in answer,
        "length_ok": 50 < len(answer) < 4000,
        "grounded": entailment_score(context, answer) > 0.7,
    }

Log scores to your observability stack (Datadog, LangSmith, Phoenix).

Cursor vs GitHub Copilot vs Claude Code: The AI Coding Assistant Showdown

Fri, 10 Jan 2025 00:00:00 +0000

AI coding assistants evolved from inline completions to agentic editors. Cursor, GitHub Copilot, and Claude Code represent three philosophies-knowing the differences helps you pick the right tool per task.

GitHub Copilot

Strengths: Deep IDE integration (VS Code, JetBrains), inline Tab completion, Copilot Chat, enterprise policies, broad language support.

Best for: Day-to-day completion inside your existing editor, teams already on GitHub, minimal workflow change.

Cursor

Strengths: AI-native editor (VS Code fork), multi-file edits, Composer agent, codebase indexing, rules and .cursorrules for project context, integrated terminal agent.

Multi-Modal AI: Working with Images and Text

Tue, 05 Nov 2024 00:00:00 +0000

Multi-modal models accept images and text in one request-enabling document OCR, UI screenshot analysis, and visual Q&A.

Vision API Example

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What error is shown in this screenshot?' },
        { type: 'image_url', image_url: { url: imageDataUrl } },
      ],
    },
  ],
});

Use Cases

Receipt parsing, diagram explanation, accessibility alt-text generation, and visual regression triage.

AI-Powered Code Review: Integrating LLMs into Dev Workflows

Sun, 22 Sep 2024 00:00:00 +0000

LLMs can summarize diffs, flag security smells, and suggest tests-but they should augment human review, not replace it.

CI Integration

Post PR diffs to an LLM with a structured prompt. Output JSON findings consumed by GitHub Actions or GitLab CI. Fail builds only on high-severity, high-confidence issues to reduce noise.

Prompt Design for Reviews

Include: changed files, diff hunks, coding standards doc, and explicit instruction to cite line numbers and avoid nits.

Claude API vs OpenAI API: Choosing the Right LLM

Wed, 14 Aug 2024 00:00:00 +0000

Anthropic’s Claude and OpenAI’s GPT families both offer strong APIs. Choosing between them depends on task, context length, cost, and compliance-not benchmark hype alone.

Strengths at a Glance

Claude - Long context windows, careful refusals, strong long-document analysis and coding reviews.

OpenAI - Broad ecosystem, function calling maturity, image and audio modalities, largest third-party integration surface.

Integration Pattern

Abstract the provider behind an interface:

interface LLMProvider {
  chat(messages: Message[]): Promise<string>;
}

Swap implementations per route (cheap model for classification, premium for generation).

Fine-Tuning LLMs: When and How to Customize AI Models

Wed, 15 May 2024 00:00:00 +0000

Fine-tuning adapts a base model to your domain with labeled examples. Use it when prompting and RAG cannot achieve consistent style, format, or task-specific behavior.

When to Fine-Tune

Fixed output schema (legal clauses, medical codes)
Brand voice across thousands of responses
Specialized terminology poorly covered by general models

When Not to Fine-Tune

Facts that change frequently (use RAG)
One-off tasks (use prompting)
Small datasets without validation (risk overfitting)

OpenAI Fine-Tuning Flow

Prepare JSONL with messages arrays. Upload, create job, evaluate on a holdout set. Monitor loss and human ratings before promoting to production.

Vector Databases: Pinecone, Weaviate, and Chroma Compared

Mon, 22 Apr 2024 00:00:00 +0000

Vector databases store embeddings and perform similarity search-the retrieval layer in RAG and recommendation systems.

Comparison

	Pinecone	Weaviate	Chroma
Hosting	Managed cloud	Self-host or cloud	Embedded / local
Best for	Production scale	Hybrid search + GraphQL	Prototyping
Ops burden	Low	Medium	Low

pgvector Alternative

PostgreSQL with pgvector keeps vectors beside relational data-excellent when you already run Postgres and need ACID transactions.

Selection Criteria

Consider QPS, filtering (metadata predicates), hybrid keyword + vector search, cost, and data residency. Prototype on Chroma or pgvector; migrate to Pinecone or Weaviate at scale.

Streaming AI Responses with OpenAI API in Next.js

Sat, 30 Mar 2024 00:00:00 +0000

Streaming improves chat UX by showing tokens as they are generated. Next.js Route Handlers make it straightforward to proxy streams securely.

Route Handler

// app/api/chat/route.ts
import OpenAI from 'openai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const openai = new OpenAI();

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages,
    stream: true,
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        const text = chunk.choices[0]?.delta?.content || '';
        if (text) controller.enqueue(encoder.encode(text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}

Client Consumption

Use fetch with a reader loop or libraries like Vercel AI SDK’s useChat for React state management.

GitHub Copilot in Practice: AI-Assisted Development

Sun, 25 Feb 2024 00:00:00 +0000

GitHub Copilot suggests code inline and in chat, trained on public repositories. Used well, it accelerates boilerplate; used blindly, it introduces subtle bugs.

Effective Workflows

Write descriptive function names and docstrings-Copilot uses them as prompts. Accept suggestions in tests and CRUD handlers; scrutinize auth, crypto, and SQL.

Tab vs Chat

Inline completions excel for repetitive patterns. Copilot Chat handles explanations, refactors, and multi-file questions inside VS Code and JetBrains.

Team Policies

Define what code can be sent to cloud models. Some organizations restrict Copilot on regulated codebases. Review license implications for generated code.

Building RAG Systems: Retrieval-Augmented Generation Explained

Thu, 18 Jan 2024 00:00:00 +0000

RAG grounds LLM responses in your private data by retrieving relevant documents before generation. It reduces hallucinations and keeps answers current without retraining models.

Pipeline Overview

Ingest - Load PDFs, wikis, tickets into chunks (500–1000 tokens).
Embed - Convert chunks to vectors with an embedding model.
Store - Save vectors in Pinecone, pgvector, or Chroma.
Retrieve - On query, embed the question and find top-k similar chunks.
Generate - Pass chunks as context to the LLM.

context = "

".join(retrieved_chunks)
prompt = f"Use only this context:
{context}

Question: {user_query}"

Chunking Strategy

Overlap chunks by 10–20% to avoid cutting sentences. Metadata (source, page) helps citations and debugging.

Prompt Engineering Fundamentals for Developers

Sat, 14 Oct 2023 00:00:00 +0000

Prompt engineering is the practice of designing inputs so LLMs produce reliable, useful outputs. Developers who treat prompts as code ship better AI features.

Structure Your Prompts

Use clear sections: role, context, task, format, and constraints.

You are a code reviewer for a TypeScript React codebase.
Context: PR diff below.
Task: List bugs, security issues, and style problems.
Format: JSON array of { severity, file, message }.
Constraints: Max 10 items. No speculation beyond the diff.

Few-Shot Examples

Include 2–3 input/output pairs for classification or extraction tasks. Examples beat lengthy instructions for format adherence.

Introduction to LangChain: Building AI-Powered Apps

Wed, 08 Mar 2023 00:00:00 +0000

LangChain composes LLM calls with prompts, memory, tools, and retrieval. It standardizes patterns that every AI app eventually needs.

Chains and Prompts

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4")
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer as a senior engineer."),
    ("user", "{question}"),
])
chain = prompt | llm
response = chain.invoke({"question": "What is RAG?"})

Retrieval

Load documents, chunk text, embed with OpenAI or open models, store in a vector DB, and retrieve relevant chunks at query time-foundation for RAG systems.

Getting Started with the OpenAI API in Node.js

Thu, 12 Jan 2023 00:00:00 +0000

The OpenAI API brought large language models to application developers through a simple HTTP interface. Node.js remains a natural fit for BFF layers that call LLMs.

Installation and First Request

npm install openai

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Explain async/await in JavaScript.' },
  ],
});

console.log(completion.choices[0].message.content);

Production Considerations

Never expose API keys in frontend bundles. Proxy requests through your backend. Set max_tokens, timeouts, and retry policies. Log token usage for cost control.