Module 24 · Section 24.1

Vibe-Coding & AI-Assisted Software Engineering

Code completion, fill-in-the-middle, agentic coding tools, code generation from specs, SWE-bench evaluation, and building a mini coding agent
★ Big Picture

Software engineering is being fundamentally reshaped by LLMs. "Vibe-coding" describes the emerging practice where developers describe what they want in natural language and AI writes the implementation. This ranges from inline code completion (Copilot, Cursor) through agentic coding assistants that execute multi-file changes (Claude Code, Devin) to full application generators that produce working apps from descriptions (Bolt, v0, Lovable). Understanding these tools, their architectures, and their limitations is essential for any developer working in the LLM era.

1. Code Completion and Fill-in-the-Middle

The simplest form of AI-assisted coding is inline completion: the developer writes code, and the model predicts what comes next. Modern code completion goes beyond left-to-right continuation with Fill-in-the-Middle (FIM), where the model conditions on the code both before and after the cursor position. This lets it generate code that fits the existing context rather than only appending to the end.

FIM Architecture

FIM works by rearranging the input during training. A code file is split into a prefix (before the cursor), a middle (the target), and a suffix (after the cursor). The model receives <PRE> prefix <SUF> suffix <MID> and learns to predict the middle section. This ordering, sometimes called PSM (Prefix-Suffix-Middle), teaches the model to generate code that is syntactically and semantically consistent with both sides. The sentinel tokens shown here are generic placeholders; each model family defines its own. Models like StarCoder, DeepSeek-Coder, and Codestral are trained with FIM from the start.
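The rearrangement described above can be sketched in a few lines. This is a minimal illustration, not any particular model's training pipeline: the sentinel strings are the generic placeholders from the text, and the helper names `split_for_fim` and `to_psm` are hypothetical.

```python
import random

# Generic placeholder sentinels; real models define their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def split_for_fim(code: str, rng: random.Random) -> tuple[str, str, str]:
    """Cut a document at two random character offsets into prefix, middle, suffix."""
    a, b = sorted(rng.randint(0, len(code)) for _ in range(2))
    return code[:a], code[a:b], code[b:]

def to_psm(prefix: str, middle: str, suffix: str) -> tuple[str, str]:
    """Return (model input, training target) in Prefix-Suffix-Middle order."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}", middle

code = "def calculate(x, y):\n    result = x * y + x\n    return result\n"
prefix, middle, suffix = split_for_fim(code, random.Random(0))
prompt, target = to_psm(prefix, middle, suffix)

print(repr(prompt))
print(repr(target))
```

Because the middle is moved to the end of the sequence, an ordinary left-to-right language model can be trained on the result without any architectural change.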

[Diagram: the prefix (def calculate(x, y): result =) and the suffix (return result) are passed to the FIM LLM, which predicts the middle (x * y + x).]
Figure 24.1: Fill-in-the-Middle (FIM). The model receives prefix and suffix context, then generates code that fits between them.
# Using a FIM model directly (DeepSeek-Coder example)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-coder-6.7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# FIM format: prefix + suffix, model fills the middle
prefix = """def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
"""

suffix = """
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1"""

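# DeepSeek-Coder FIM sentinels. These use the full-width bar (｜) and the "▁"
# character, not ASCII "|" and "_"; ASCII look-alikes are tokenized as plain text.
fim_input = "<｜fim▁begin｜>" + prefix + "<｜fim▁hole｜>" + suffix + "<｜fim▁end｜>"

inputs = tokenizer(fim_input, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode only the newly generated tokens (the predicted middle), not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))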

2. AI-Native IDEs and Coding Assistants

The coding assistant landscape has evolved from simple autocomplete plugins into full AI-native development environments. These tools integrate LLM capabilities deeply into the editing experience, providing not just completions but also chat-based code editing, codebase-aware context, and multi-file refactoring.

Tool           | Type              | Key Feature             | Context Strategy
GitHub Copilot | IDE Plugin        | Inline completion, chat | Open files, neighboring tabs
Cursor         | AI-native IDE     | Cmd+K edit, Composer    | Codebase indexing, @-mentions
Windsurf       | AI-native IDE     | Cascade flows           | Proactive context gathering
Cline          | VS Code Extension | Agentic file editing    | Tool use, file search
Claude Code    | CLI Agent         | Terminal-native agentic | Full repo access, bash tools
📘 Context Engineering for Code

The quality of AI code generation depends heavily on the context provided to the model. Context engineering for coding involves: selecting the right files to include (open tabs, imports, related modules), providing project-specific conventions (via rules files or system prompts), including relevant documentation, and managing the context window budget effectively. Tools like Cursor's @codebase command and Claude Code's CLAUDE.md files let developers control exactly what context the model sees, dramatically improving output quality for project-specific tasks.
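Managing the context window budget can be pictured as a packing problem. The sketch below is one simple way to do it, assuming a greedy priority order and a crude four-characters-per-token estimate; `ContextItem`, `estimate_tokens`, and `pack_context` are hypothetical names, and real tools use the model's tokenizer and more elaborate retrieval.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    name: str      # e.g. a file path or a rules file
    text: str
    priority: int  # lower number = more important

def estimate_tokens(text: str) -> int:
    """Rough estimate (assumed ~4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)

def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the highest-priority items that still fit in the token budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda it: it.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item)
            used += cost
    return chosen

# Illustrative inputs only: a conventions file, two source files, one large document.
items = [
    ContextItem("CLAUDE.md", "Use pytest. Prefer dataclasses.", priority=0),
    ContextItem("api/auth.py", "x" * 400, priority=1),
    ContextItem("docs/changelog.md", "y" * 4000, priority=2),
    ContextItem("api/models.py", "z" * 200, priority=3),
]
selected = pack_context(items, budget=200)
print([it.name for it in selected])
```

With a 200-token budget the large changelog is skipped while the smaller, lower-priority file still fits, which is the kind of trade-off a rules file or an @-mention lets the developer steer by hand.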

3. Agentic Coding

Agentic coding represents the next evolution: instead of suggesting completions that a developer accepts or rejects, the AI operates as an autonomous agent that can read files, write code, run tests, debug errors, and iterate until a task is complete. The developer provides a high-level description and the agent handles the implementation details.

How Agentic Coding Tools Work

Agentic coding tools follow a plan-execute-observe loop. The LLM receives a task description and access to tools (file read/write, terminal execution, search). It plans an approach, executes code changes, runs tests to verify them, observes errors, and iterates. This is essentially the ReAct pattern (Module 21) applied to software engineering. Tools differ mainly in how they manage context, which tools they expose, and how autonomously they operate.

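[Diagram: the developer sends a task ("Add auth to API") to the LLM agent, which cycles through plan, generate code, debug, and iterate. The agent uses tools (read/write files, run terminal, search code) against the codebase (files, tests, git), and test results and errors flow back to the agent.]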
Figure 24.2: Agentic coding loop. The LLM agent plans, generates code, uses tools to interact with the codebase, and iterates based on test results and errors.
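# Simplified agentic coding loop (conceptual sketch; run it only in a sandboxed directory)
import json
import subprocess
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

def execute_tool(name: str, args: dict) -> str:
    """Dispatch one tool call and return its result as text for the model."""
    try:
        if name == "read_file":
            return Path(args["path"]).read_text()
        if name == "write_file":
            Path(args["path"]).write_text(args["content"])
            return f"Wrote {len(args['content'])} characters to {args['path']}"
        if name == "run_command":
            # shell=True runs model-generated commands verbatim: sandbox this in practice
            proc = subprocess.run(args["cmd"], shell=True, capture_output=True,
                                  text=True, timeout=60)
            return f"exit code {proc.returncode}\n{proc.stdout}{proc.stderr}"
        return f"Unknown tool: {name}"
    except Exception as exc:  # Report failures to the model instead of crashing the loop
        return f"Error: {exc}"

def coding_agent(task: str, max_iterations: int = 5):
    tools = [
        {"type": "function", "function": {
            "name": "read_file",
            "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}}},
        {"type": "function", "function": {
            "name": "write_file",
            "parameters": {"type": "object", "properties": {
                "path": {"type": "string"}, "content": {"type": "string"}}}}},
        {"type": "function", "function": {
            "name": "run_command",
            "parameters": {"type": "object", "properties": {"cmd": {"type": "string"}}}}},
    ]

    messages = [{"role": "system", "content": "You are a coding agent. Use tools to complete the task."},
                {"role": "user", "content": task}]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content  # No tool calls requested: the model considers the task complete

        # Execute every requested tool and feed the observations back to the model
        for tc in msg.tool_calls:
            result = execute_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

    return "Stopped: reached max_iterations before the task completed."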
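How much autonomy an agent gets is largely decided by what its tools will accept. As one hedged illustration, a file-writing tool can refuse paths that escape the project directory. The helpers `resolve_in_workspace` and `write_file` below are hypothetical and are not taken from any of the tools named in this section.

```python
import tempfile
from pathlib import Path

def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside the workspace."""
    root = workspace.resolve()
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{requested!r} is outside the workspace")
    return target

def write_file(workspace: Path, path: str, content: str) -> str:
    """Hypothetical write_file tool: always returns a text result for the model."""
    try:
        target = resolve_in_workspace(workspace, path)
    except PermissionError as exc:
        return f"Error: {exc}"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return f"Wrote {len(content)} characters to {path}"

with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    print(write_file(ws, "src/app.py", "print('hi')\n"))  # inside the workspace
    print(write_file(ws, "../outside.txt", "nope"))       # rejected
```

Returning the refusal as text rather than raising lets the agent see the error as an observation and try a different path on its next iteration.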

4. Code Generation from Specifications

A new category of tools generates complete applications from high-level descriptions. Bolt.new, Vercel's v0, and Lovable let users describe an app in natural language and receive a working, deployable application. These tools combine LLM code generation with scaffolding templates, component libraries, and deployment infrastructure to bridge the gap between description and running software.

SWE-bench: Evaluating Coding Agents

SWE-bench is a benchmark for evaluating coding agents on real software engineering tasks. It collects actual GitHub issues from popular Python repositories along with their corresponding pull requests. The agent receives the issue description and must produce a patch; the patch counts as resolving the issue only if the tests tied to the fix now pass and the tests that passed before still pass. SWE-bench Verified (a human-validated subset of 500 problems) is the most widely reported evaluation set. Top reported scores have risen quickly and passed 70% during 2025, so treat any quoted figure as a snapshot and check the public leaderboard for current results.
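As a rough sketch of how a "resolved" verdict and an overall pass rate can be computed: each SWE-bench task lists tests expected to go from failing to passing (the FAIL_TO_PASS field) and tests that must keep passing (PASS_TO_PASS). The function names, test names, and status strings below are illustrative assumptions, not the official harness code.

```python
def is_resolved(test_status: dict[str, str],
                fail_to_pass: list[str],
                pass_to_pass: list[str]) -> bool:
    """A patch resolves the issue only if every listed test passes after it is applied."""
    required = fail_to_pass + pass_to_pass
    return all(test_status.get(test) == "PASSED" for test in required)

def resolve_rate(outcomes: list[bool]) -> float:
    """Percentage of task instances resolved."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0

# Hypothetical results for two task instances after applying the agent's patches
runs = [
    # Fix works and nothing else broke
    ({"test_bug": "PASSED", "test_old": "PASSED"}, ["test_bug"], ["test_old"]),
    # Fix works but introduces a regression
    ({"test_bug": "PASSED", "test_old": "FAILED"}, ["test_bug"], ["test_old"]),
]
outcomes = [is_resolved(status, f2p, p2p) for status, f2p, p2p in runs]
print(outcomes)
print(f"Pass rate: {resolve_rate(outcomes):.1f}%")
```

The second instance shows why a patch that fixes the reported bug can still count as a failure: regressions in previously passing tests disqualify it.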

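# Evaluating an agent on SWE-bench (conceptual sketch)
# Assumes `pip install swebench` and a running Docker daemon. The harness is driven
# from the command line; the flags below follow the project README and have changed
# between releases, so check the documentation for your installed version.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "-m", "swebench.harness.run_evaluation",
        "--dataset_name", "princeton-nlp/SWE-bench_Verified",
        "--predictions_path", "predictions.json",  # Agent's generated patches
        "--max_workers", "4",
        "--timeout", "900",                        # Seconds allowed per instance
        "--run_id", "my-agent-run",                # Label for this run's logs and report
    ],
    check=True,
)

# The harness writes per-instance logs and a summary report for the run.
# Pass rate = resolved instances / total instances * 100.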
⚠ Risks of Vibe-Coding

AI-generated code carries real risks. Security vulnerabilities are common: models may generate code with SQL injection, hardcoded secrets, or insecure defaults. Subtle logic errors can pass tests but fail in production. Over-reliance on AI code can erode a developer's understanding of the codebase. License compliance is another concern, as models trained on open-source code may reproduce copyleft-licensed patterns. Always review AI-generated code carefully, maintain comprehensive test suites, and use static analysis tools as a safety net.
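To make the SQL injection risk concrete, here is a small self-contained sketch using Python's built-in sqlite3 module. The table, the function names, and the payload are invented for illustration; the point is only the contrast between string interpolation and a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

def find_user_unsafe(name: str):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized: the value is passed separately and never parsed as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "nobody' OR '1'='1"
print(find_user_unsafe(payload))  # every row leaks
print(find_user_safe(payload))    # no rows: the payload is treated as a literal name
```

Both functions return the same result for ordinary names, which is why a test suite that only checks the happy path will not catch the difference. A reviewer or a static analysis rule has to.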

🔍 Key Insight

The most effective AI-assisted coding workflow is not "let the AI write everything" but rather a collaborative loop where the developer provides high-level direction, domain knowledge, and quality judgment while the AI handles boilerplate, implementation details, and repetitive refactoring. The developer's role shifts from writing every line to specifying intent, reviewing outputs, and maintaining architectural coherence. This is why context engineering (providing the right project files, conventions, and constraints to the model) is becoming as important as traditional coding skills.

Knowledge Check

1. How does Fill-in-the-Middle (FIM) differ from standard autoregressive code completion?
Answer:
Standard autoregressive completion only sees the code before the cursor and predicts what comes next. FIM sees both the prefix (before cursor) and suffix (after cursor), generating code that fits between them. This produces completions that are consistent with both the preceding and following context, making it far more useful for editing code in the middle of existing files rather than just appending to the end.
2. What is the agentic coding loop and how does it differ from code completion?
Answer:
The agentic coding loop is a plan-execute-observe cycle where an LLM agent reads files, writes code, runs terminal commands, observes results (test output, errors), and iterates until the task is complete. Unlike code completion, which suggests a few lines at a time for human approval, agentic coding operates autonomously across multiple files and can execute multi-step tasks like adding a feature, running tests, fixing failures, and committing the result.
3. What does SWE-bench measure and why is it significant?
Answer:
SWE-bench evaluates coding agents on real GitHub issues from popular Python repositories. The agent must produce a patch that resolves the issue and passes the repository's test suite. It is significant because it measures end-to-end software engineering capability (understanding issue descriptions, navigating codebases, writing correct patches) rather than isolated code generation, providing a realistic benchmark for AI coding tools.
4. Why is context engineering important for AI-assisted coding?
Answer:
The quality of AI-generated code depends heavily on the context provided. Context engineering involves selecting relevant files, providing project conventions (via rules or config files), including documentation, and managing context window budget. Without proper context, the model generates generic code that may not follow project patterns, use the right libraries, or maintain consistency with existing code. Tools like CLAUDE.md files and @codebase commands give developers control over this context.
5. What are the main risks of vibe-coding in production environments?
Answer:
Key risks include: security vulnerabilities (SQL injection, hardcoded secrets, insecure defaults), subtle logic errors that pass tests but fail in edge cases, erosion of developer understanding of the codebase, license compliance issues from training on copyleft code, and over-reliance on AI that can lead to technical debt. Mitigation strategies include thorough code review, comprehensive test suites, static analysis tools, and maintaining developer understanding of generated code.

Key Takeaways