Software engineering is being fundamentally reshaped by LLMs. "Vibe-coding" describes the emerging practice where developers describe what they want in natural language and AI writes the implementation. This ranges from inline code completion (Copilot, Cursor) through agentic coding assistants that execute multi-file changes (Claude Code, Devin) to full application generators that produce working apps from descriptions (Bolt, v0, Lovable). Understanding these tools, their architectures, and their limitations is essential for any developer working in the LLM era.
1. Code Completion and Fill-in-the-Middle
The simplest form of AI-assisted coding is inline completion: the developer writes code, and the model predicts what comes next. Modern code completion goes beyond left-to-right continuation with Fill-in-the-Middle (FIM), where the model sees the code both before and after the cursor position. This lets it generate code that fits into the existing context rather than just appending to the end.
FIM Architecture
FIM works by rearranging the input during training. A code file is split into a prefix (before cursor), a middle (the target), and a suffix (after cursor). The model receives <PRE> prefix <SUF> suffix <MID> and learns to predict the middle section. This format, sometimes called PSM (Prefix-Suffix-Middle), teaches the model to generate code that is syntactically and semantically consistent with both surrounding contexts. Models like StarCoder, DeepSeek-Coder, and Codestral are trained with FIM from the start. The exact spelling of the sentinel tokens differs between model families, so always use the ones defined by the model's own tokenizer.
```python
# Using a FIM model directly (DeepSeek-Coder example)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-coder-6.7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# FIM format: prefix + suffix, model fills the middle
prefix = """def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
"""
suffix = """
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1"""

# DeepSeek-Coder FIM sentinels use the fullwidth bar "｜" and "▁", not ASCII "|" and "_".
# ASCII look-alikes are tokenized as ordinary text, so the model would not see FIM markers.
fim_input = "<｜fim▁begin｜>" + prefix + "<｜fim▁hole｜>" + suffix + "<｜fim▁end｜>"
inputs = tokenizer(fim_input, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)

# Decode only the newly generated tokens (the predicted middle), not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
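The training-time rearrangement described in this section can be sketched in a few lines. This is a minimal illustration, not any model's actual preprocessing: the function name `make_psm_example`, the character-level split, and the literal `<PRE>`/`<SUF>`/`<MID>` strings are assumptions for illustration. Real pipelines operate on tokens and use each tokenizer's own sentinel tokens.

```python
import random

# Placeholder sentinel strings (assumption); real models define their own tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def make_psm_example(code: str, rng: random.Random) -> tuple[str, str]:
    """Split `code` at two random cut points and return (model_input, target)."""
    # Two sorted cut points give prefix | middle | suffix.
    a, b = sorted(rng.randint(0, len(code)) for _ in range(2))
    prefix, middle, suffix = code[:a], code[a:b], code[b:]
    # PSM ordering: the model sees prefix and suffix, then predicts the middle.
    model_input = f"{PRE}{prefix}{SUF}{suffix}{MID}"
    return model_input, middle


if __name__ == "__main__":
    source = "def add(a, b):\n    return a + b\n"
    model_input, target = make_psm_example(source, random.Random(0))
    print(repr(model_input))
    print(repr(target))
```

Because the target is simply the text that was cut out, concatenating prefix, target, and suffix always reproduces the original file, which is what lets ordinary next-token training teach infilling.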
2. AI-Native IDEs and Coding Assistants
The coding assistant landscape has evolved from simple autocomplete plugins into full AI-native development environments. These tools integrate LLM capabilities deeply into the editing experience, providing not just completions but also chat-based code editing, codebase-aware context, and multi-file refactoring.
| Tool | Type | Key Feature | Context Strategy |
|---|---|---|---|
| GitHub Copilot | IDE Plugin | Inline completion, chat | Open files, neighboring tabs |
| Cursor | AI-native IDE | Cmd+K edit, Composer | Codebase indexing, @-mentions |
| Windsurf | AI-native IDE | Cascade flows | Proactive context gathering |
| Cline | VS Code Extension | Agentic file editing | Tool use, file search |
| Claude Code | CLI Agent | Terminal-native agentic | Full repo access, bash tools |
The quality of AI code generation depends heavily on the context provided to the model. Context engineering for coding involves: selecting the right files to include (open tabs, imports, related modules), providing project-specific conventions (via rules files or system prompts), including relevant documentation, and managing the context window budget effectively. Tools like Cursor's @codebase command and Claude Code's CLAUDE.md files let developers control exactly what context the model sees, dramatically improving output quality for project-specific tasks.
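One way to picture context-window budgeting is as a greedy packing problem: rank candidate files by relevance, then add them until the token budget runs out. The sketch below is illustrative only. The `Candidate` type, the priority scores, and the four-characters-per-token estimate are assumptions, not a description of how any particular tool works.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    text: str
    priority: int  # Higher means more relevant (e.g. open tab > distant module)


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)


def pack_context(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedily pick the highest-priority files that fit in `budget` tokens."""
    chosen, used = [], 0
    for cand in sorted(candidates, key=lambda c: c.priority, reverse=True):
        cost = estimate_tokens(cand.text)
        if used + cost <= budget:
            chosen.append(cand)
            used += cost
    return chosen


if __name__ == "__main__":
    files = [
        Candidate("CONVENTIONS.md", "Use snake_case. " * 10, priority=3),
        Candidate("app/models.py", "class User: ...\n" * 100, priority=2),
        Candidate("app/utils.py", "def slugify(s): ...\n" * 5, priority=1),
    ]
    for cand in pack_context(files, budget=100):
        print(cand.path)
```

A large file that does not fit is skipped rather than truncated here; a real tool might instead include only the relevant functions or a summary.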
3. Agentic Coding
Agentic coding represents the next evolution: instead of suggesting completions that a developer accepts or rejects, the AI operates as an autonomous agent that can read files, write code, run tests, debug errors, and iterate until a task is complete. The developer provides a high-level description and the agent handles the implementation details.
How Agentic Coding Tools Work
Agentic coding tools follow a plan-execute-observe loop. The LLM receives a task description and access to tools (file read/write, terminal execution, search). It plans an approach, makes code changes, runs tests to verify them, observes any errors, and iterates. This is fundamentally the ReAct pattern (Module 21) applied to software engineering. The key differentiators between tools are how they manage context, which tools they expose, and how autonomously they operate.
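```python
# Simplified agentic coding loop (conceptual)
import json
import subprocess
from pathlib import Path

from openai import OpenAI

client = OpenAI()


def execute_tool(name: str, args: dict) -> str:
    """Run one tool call and return its output as text. No sandboxing: demo only."""
    try:
        if name == "read_file":
            return Path(args["path"]).read_text()
        if name == "write_file":
            Path(args["path"]).write_text(args["content"])
            return f"Wrote {args['path']}"
        if name == "run_command":
            proc = subprocess.run(args["cmd"], shell=True, capture_output=True,
                                  text=True, timeout=120)
            return (proc.stdout + proc.stderr) or "(no output)"
        return f"Unknown tool: {name}"
    except Exception as exc:  # Return errors so the model can observe and retry
        return f"Error: {exc}"


def coding_agent(task: str, max_iterations: int = 5):
    tools = [
        {"type": "function", "function": {
            "name": "read_file",
            "parameters": {"type": "object",
                           "properties": {"path": {"type": "string"}},
                           "required": ["path"]}}},
        {"type": "function", "function": {
            "name": "write_file",
            "parameters": {"type": "object",
                           "properties": {"path": {"type": "string"},
                                          "content": {"type": "string"}},
                           "required": ["path", "content"]}}},
        {"type": "function", "function": {
            "name": "run_command",
            "parameters": {"type": "object",
                           "properties": {"cmd": {"type": "string"}},
                           "required": ["cmd"]}}},
    ]
    messages = [
        {"role": "system", "content": "You are a coding agent. Use tools to complete the task."},
        {"role": "user", "content": task},
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content  # Task complete

        # Execute each requested tool and feed the observation back to the model
        for tc in msg.tool_calls:
            result = execute_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

    return None  # Iteration budget exhausted before the agent finished
```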
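The control flow of this loop can be exercised without an API key by swapping the LLM for a scripted stand-in. In the sketch below, `make_scripted_model`, `run_tests`, `agent_loop`, and the candidate implementations are all hypothetical. The point is only to show the plan-execute-observe cycle ending when the observation (a test run) reports success.

```python
from typing import Callable, Optional


def run_tests(source: str) -> Optional[str]:
    """Execute candidate code and return an error message, or None if tests pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # Execute step (unsafe outside a sandbox)
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def agent_loop(propose: Callable[[Optional[str]], str], max_iterations: int = 5):
    """Ask for code, run the tests, and feed failures back until they pass."""
    feedback = None
    for step in range(1, max_iterations + 1):
        candidate = propose(feedback)    # Plan: the model proposes an implementation
        feedback = run_tests(candidate)  # Execute and observe: run the checks
        if feedback is None:
            return candidate, step       # Tests pass, task complete
    return None, max_iterations          # Gave up within the iteration budget


def make_scripted_model() -> Callable[[Optional[str]], str]:
    # Stand-in for an LLM (assumption): returns a buggy draft first, then a fix.
    drafts = iter(["def add(a, b):\n    return a - b\n",
                   "def add(a, b):\n    return a + b\n"])
    return lambda feedback: next(drafts)


if __name__ == "__main__":
    code, steps = agent_loop(make_scripted_model())
    print(f"Solved in {steps} iteration(s):\n{code}")
```

A real agent would use the `feedback` string to decide what to change; the scripted model ignores it, which keeps the example deterministic.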
4. Code Generation from Specifications
A new category of tools generates complete applications from high-level descriptions. Bolt.new, Vercel's v0, and Lovable let users describe an app in natural language and receive a working, deployable application. These tools combine LLM code generation with scaffolding templates, component libraries, and deployment infrastructure to bridge the gap between description and running software.
SWE-bench: Evaluating Coding Agents
SWE-bench provides a rigorous benchmark for evaluating coding agents on real software engineering tasks. It collects actual GitHub issues from popular Python repositories along with their corresponding pull requests. The agent receives the issue description and must produce a patch that passes the repository's test suite. SWE-bench Verified (a human-validated subset of 500 problems) is the standard evaluation set, with top agents scoring around 50 to 60% as of early 2026.
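```python
# Evaluating an agent on SWE-bench (conceptual; the harness requires Docker)
import json, subprocess, sys
from pathlib import Path

run_id = "agent_eval"
subprocess.run(
    [sys.executable, "-m", "swebench.harness.run_evaluation",
     "--dataset_name", "princeton-nlp/SWE-bench_Verified",
     "--predictions_path", "predictions.json",  # Agent's generated patches
     "--max_workers", "4", "--run_id", run_id, "--timeout", "900"],
    check=True,
)

# Assumption: the report is written as "<model_name>.<run_id>.json" with these
# keys; the file name and schema may differ across swebench versions.
results = json.loads(next(Path(".").glob(f"*.{run_id}.json")).read_text())
resolved, total = results["resolved_instances"], results["total_instances"]
print(f"Resolved: {resolved} / {total}")
print(f"Pass rate: {resolved / total * 100:.1f}%")
```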
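Under the hood, SWE-bench counts an instance as resolved only when the patch makes the previously failing tests pass without breaking tests that already passed; the dataset records these test sets as `FAIL_TO_PASS` and `PASS_TO_PASS`. The sketch below illustrates that criterion and the resulting pass rate. The `is_resolved` and `pass_rate` helpers, the use of plain Python lists for the test names, and the shape of the test-status dictionaries are assumptions for illustration, not the harness's own code.

```python
def is_resolved(instance: dict, test_status: dict[str, bool]) -> bool:
    """True if every FAIL_TO_PASS and PASS_TO_PASS test passed after the patch."""
    required = instance["FAIL_TO_PASS"] + instance["PASS_TO_PASS"]
    # A test missing from the results counts as a failure.
    return all(test_status.get(name, False) for name in required)


def pass_rate(instances: list[dict], results: dict[str, dict[str, bool]]) -> float:
    """Percentage of instances resolved; instances without results are unresolved."""
    resolved = sum(
        is_resolved(inst, results.get(inst["instance_id"], {})) for inst in instances
    )
    return 100.0 * resolved / len(instances)


if __name__ == "__main__":
    # Hypothetical instances and test outcomes, for illustration only.
    instances = [
        {"instance_id": "demo-1", "FAIL_TO_PASS": ["test_fix"], "PASS_TO_PASS": ["test_old"]},
        {"instance_id": "demo-2", "FAIL_TO_PASS": ["test_bug"], "PASS_TO_PASS": []},
    ]
    results = {
        "demo-1": {"test_fix": True, "test_old": True},
        "demo-2": {"test_bug": False},
    }
    print(f"Pass rate: {pass_rate(instances, results):.1f}%")
```

The `PASS_TO_PASS` requirement is what stops an agent from "fixing" an issue by breaking unrelated behavior.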
AI-generated code carries real risks. Security vulnerabilities are common: models may generate code with SQL injection, hardcoded secrets, or insecure defaults. Subtle logic errors can pass tests but fail in production. Over-reliance on AI code can erode a developer's understanding of the codebase. License compliance is another concern, as models trained on open-source code may reproduce copyleft-licensed patterns. Always review AI-generated code carefully, maintain comprehensive test suites, and use static analysis tools as a safety net.
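As a concrete instance of the first risk, the sketch below contrasts a string-built SQL query, a pattern that generated code sometimes contains, with the parameterized form. It uses Python's standard `sqlite3` module; the table, the function names, and the attack string are made up for illustration.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    attack = "nobody' OR '1'='1"
    print("unsafe:", find_user_unsafe(conn, attack))  # Leaks every row
    print("safe:  ", find_user_safe(conn, attack))    # Matches nothing
```

Both functions return the same rows for ordinary input, which is why a test suite built only from well-formed names would not catch the difference.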
The most effective AI-assisted coding workflow is not "let the AI write everything" but rather a collaborative loop where the developer provides high-level direction, domain knowledge, and quality judgment while the AI handles boilerplate, implementation details, and repetitive refactoring. The developer's role shifts from writing every line to specifying intent, reviewing outputs, and maintaining architectural coherence. This is why context engineering (providing the right project files, conventions, and constraints to the model) is becoming as important as traditional coding skills.
Key Takeaways
- Fill-in-the-Middle (FIM) models see both prefix and suffix context, generating code that fits seamlessly into existing files rather than just appending at the end.
- AI-native IDEs (Cursor, Windsurf) integrate LLMs deeply into the development workflow with codebase indexing, chat-based editing, and multi-file refactoring.
- Agentic coding tools (Claude Code, Devin, SWE-Agent) operate autonomously using a plan-execute-observe loop with file and terminal access.
- App generators (Bolt, v0, Lovable) produce complete working applications from natural language descriptions, targeting the rapid prototyping use case.
- SWE-bench provides a rigorous evaluation of coding agents on real GitHub issues, with top agents resolving 50 to 60% of verified problems.
- Context engineering (selecting the right files, conventions, and constraints) is becoming as important as traditional coding skills for effective AI-assisted development.