Weekly GitHub Copilot Roundup: Context, Controls, and Review

This roundup tracks a clear shift from agent capability to agent governance: more context, more observability, and more policy controls across Copilot, VS Code, and the CLI. On the platform side, Microsoft tightened the path from prototype to production with .NET agent building blocks, Azure AI Foundry deployment patterns, and data governance improvements that make RAG and operations easier to standardize. We also cover the less flashy work that keeps systems dependable at scale, including Fabric and Databricks operational updates, GitHub migration and ruleset changes, and security research that keeps token theft, privilege escalation, and supply chain risk in focus.

GitHub Copilot updates this week leaned into two themes we have been tracking: giving agents more context and giving enterprises tighter controls. At the same time, GitHub is pushing teams to stay current on model availability and review quality as more PRs arrive with AI fingerprints. After last week's mix of “more capability” (agents across IDEs, CLI, MCP tooling) and “more constraint” (individual plan limits, premium multipliers, model availability changes), this week's changes read like the next step: reduce wasted token spend, make sessions more observable, and give admins more levers so Copilot can scale without becoming unpredictable.

GitHub Copilot in Visual Studio Code (agent context, search, and cost controls)

VS Code users saw a steady set of Copilot improvements across the v1.116 through v1.119 releases, with the most practical changes focused on context gathering and responsiveness. That picks up directly from last week's emphasis on “intentional configuration” (model pickers, autonomy controls, and usage indicators) by making it easier for Copilot to find the right inputs without you manually pasting them. Semantic search now spans your local workspace and GitHub repositories, which matters when you are asking Copilot Chat questions that depend on “project memory” rather than the currently opened file. GitHub also introduced an experimental /chronicle chat-history index, aiming to make prior Copilot conversations usable as retrievable context instead of dead text in a sidebar (a natural extension of last week's push toward more resumable, auditable sessions).

On the performance and cost side, Copilot reduced token usage with prompt caching and deferred tool loading. In practice, that means less repeated context is sent on similar requests and tools are not initialized until the agent actually needs them, which should show up as faster starts and lower consumption in longer sessions. That matters more after last week's individual plan limits and GPT-5.5 premium multipliers made “how much context you send” a real workflow constraint, not an abstract optimization.

Agents themselves got more capable and easier to supervise: inline diffs help you see proposed code edits in place, terminal access makes it possible for an agent to run commands as part of a workflow, and browser tab sharing expands what an agent can “see” when troubleshooting docs, dashboards, or web UIs. This continues last week's cross-IDE direction (JetBrains inline agent mode, VS Code autonomy/permissions), where the differentiator is not just agent power but how clearly you can see and control what the agent is doing.

VS Code 1.119 also highlighted OpenTelemetry tracing for agent sessions, giving teams a way to instrument and diagnose agent activity (with availability shaped by plan and enterprise policy). That fits neatly with last week's theme of traceability (structured debugging output on the web, better session metadata in clients): once agents act across terminals, repos, and browsers, teams need logs and traces that look more like production tooling than chat transcripts.

For organizations standardizing on their own model relationships, BYOK (Bring Your Own Key) model providers continued rolling out for Copilot Business and Enterprise, letting teams route requests through approved providers and keys while keeping policy controls in view. In context, BYOK is becoming the enterprise counterpart to last week's “escape hatch” framing for individuals impacted by plan/model changes: reduce dependency on a single hosted SKU by putting model access behind keys and policies you control.
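
To make the “deferred tool loading” idea above concrete, here is a minimal Python sketch of the general pattern: tools are registered as factories and only constructed the first time an agent asks for them. This is an illustration of the technique, not Copilot's implementation; the class name LazyToolRegistry and the tool names are hypothetical.

```python
from typing import Any, Callable, Dict, List


class LazyToolRegistry:
    """Hypothetical registry: stores tool factories and builds a tool on first use."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._loaded: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        # Registering is cheap: nothing is constructed yet.
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Construct the tool only when it is first requested, then reuse it.
        if name not in self._loaded:
            self._loaded[name] = self._factories[name]()
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        # Names of tools that have actually been initialized so far.
        return sorted(self._loaded)


if __name__ == "__main__":
    registry = LazyToolRegistry()
    registry.register("terminal", lambda: "terminal-tool")  # stand-in tool objects
    registry.register("browser", lambda: "browser-tool")
    print(registry.loaded_names())   # nothing initialized yet
    registry.get("terminal")
    print(registry.loaded_names())   # only the tool that was used
```

The design choice is the usual lazy-initialization trade-off: startup stays cheap, and the cost of a tool is paid only in sessions that need it.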

Copilot CLI (Rubber Duck multi-model reviews and enterprise-managed plugins)

The Copilot CLI story this week was about making terminal-based Copilot usage more governable for enterprises and more reliable for individuals who want a second set of “AI eyes” on changes. It is a direct continuation of last week's CLI thread (longer-running, tool-loop-heavy workflows, plus BYOK/local-model options) but with more emphasis on structured review and admin control as usage grows.

Rubber Duck in Copilot CLI now supports cross-model “second opinion” flows: a GPT-orchestrated session can hand review to a Claude critic agent, and a Claude-orchestrated session can use GPT-5.5 for review. This builds on the workflow pattern called out last week (separate “builder” and “reviewer” models) and makes it easier to operationalize: instead of ad hoc copy/paste between chats, the CLI can run an explicit critique step that helps catch blind spots before code reaches a PR. That becomes even more relevant alongside this week's agent-PR review guidance (later in this section) because it lets you move some “reviewer mindset” left into the terminal loop before changes ever hit GitHub.

For enterprise rollouts, GitHub put enterprise-managed plugins in Copilot CLI into public preview. Central teams can configure the plugin marketplace, auto-install approved plugins, and enforce baseline standards (including hooks and MCP configuration) via a shared settings.json. This echoes last week's theme that as Copilot spreads into more surfaces, governance needs to follow. Combined with the VS Code note about remote monitoring of Copilot CLI sessions, the direction is clear: more CLI capability, but paired with more administrative guardrails and auditability so the CLI can be used at scale without each developer hand-curating plugins and configs.
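
The builder/reviewer split described above can be sketched in a few lines of Python. This is only an illustration of the pattern (pick a critic from a different model family than the orchestrator, then run an explicit critique step); it does not call the Copilot CLI, and the function names, model labels, and stub critics are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Review:
    reviewer: str
    findings: List[str]


def pick_reviewer(orchestrator: str) -> str:
    """Choose a critic from the other model family (assumed naming scheme)."""
    family = orchestrator.lower()
    if family.startswith("gpt"):
        return "claude"
    if family.startswith("claude"):
        return "gpt-5.5"
    raise ValueError(f"no cross-model reviewer known for {orchestrator!r}")


def second_opinion(
    orchestrator: str,
    diff: str,
    critics: Dict[str, Callable[[str], List[str]]],
) -> Review:
    """Run the explicit critique step with the selected reviewer."""
    reviewer = pick_reviewer(orchestrator)
    return Review(reviewer=reviewer, findings=critics[reviewer](diff))


if __name__ == "__main__":
    # Stub critics standing in for real model calls.
    stub_critics = {
        "claude": lambda diff: ["check error handling"] if "except" not in diff else [],
        "gpt-5.5": lambda diff: ["add a test"] if "test_" not in diff else [],
    }
    print(second_opinion("gpt-5.5", "def f(): return 1", stub_critics))
```

Keeping reviewer selection separate from the critique call makes it easy to swap in a policy-driven mapping later.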

Enterprise model governance (upcoming deprecations and model policy actions)

If your organization relies on specific Copilot chat models, this week came with deadlines. This is the follow-through to last week's “model churn needs knobs” storyline (model pickers, admin toggles, and premium multipliers): now the churn has concrete dates that force decisions. GPT-4.1 will be deprecated across GitHub Copilot experiences on June 1, 2026, with GPT-5.5 positioned as the replacement. Separately, Grok Code Fast 1 is scheduled for deprecation on May 15, 2026, and GitHub pointed admins toward alternatives like GPT-5 mini or Claude Haiku 4.5.

The practical implication is policy work, not just awareness. Copilot Enterprise admins may need to update model policies so replacement models actually appear where developers select them (including Copilot Chat in VS Code and on github.com). Treat this like any other dependency change: verify model availability ahead of the cutoff, test critical workflows (code review prompts, refactor tasks, internal coding standards), and communicate which models are approved so developers do not discover missing options mid-sprint. If you enabled GPT-5.5 last week (and accounted for its premium multiplier), this is the week to validate that it is not just available in principle but actually usable in the specific clients and flows your developers rely on.
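
One way to treat these deadlines like any other dependency change is a small check that flags approved models nearing their cutoff. The Python sketch below uses the two dates stated above; the approved-model list, the 30-day warning window, and the function name are assumptions for illustration, and nothing here queries GitHub.

```python
from datetime import date
from typing import Dict, List, Tuple

# Deprecation dates as reported in this roundup.
DEPRECATIONS: Dict[str, date] = {
    "GPT-4.1": date(2026, 6, 1),
    "Grok Code Fast 1": date(2026, 5, 15),
}


def models_at_risk(
    approved: List[str], today: date, warn_days: int = 30
) -> List[Tuple[str, int]]:
    """Return (model, days_left) for approved models within the warning window."""
    at_risk = []
    for model in approved:
        cutoff = DEPRECATIONS.get(model)
        if cutoff is None:
            continue  # no known deprecation for this model
        days_left = (cutoff - today).days
        if days_left <= warn_days:
            at_risk.append((model, days_left))
    # Soonest cutoff first, so the most urgent policy change is at the top.
    return sorted(at_risk, key=lambda item: item[1])


if __name__ == "__main__":
    approved_models = ["GPT-4.1", "GPT-5.5", "Grok Code Fast 1"]  # hypothetical policy
    for model, days_left in models_at_risk(approved_models, date(2026, 5, 10)):
        print(f"{model}: {days_left} days until deprecation")
```

Run on a schedule, a check like this turns the deadline into a visible reminder rather than a surprise in the model picker.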

Agent-driven development hygiene (review checklists, security, and measurement)

As agent-generated pull requests become routine, GitHub published a pragmatic checklist focused on where AI-authored changes tend to go wrong even when tests pass. This complements last week's theme that agents are becoming first-class (JetBrains inline agents, VS Code session controls, CLI tool loops) by addressing the uncomfortable next question: what does “good review” look like when a meaningful chunk of the change came from an agent rather than a human typing line-by-line?

The guidance calls out patterns reviewers should actively look for: CI weakening (for example, disabling or loosening checks to get a green build), duplicated utilities that quietly increase maintenance cost, and subtle logic bugs that slip through because coverage does not hit the right edge cases. It also puts security front and center for LLM-powered GitHub workflows, including prompt injection risks and the common mistake of leaving GITHUB_TOKEN permissions too broad for the job at hand. The takeaway is that “agent PRs” need a different reviewer mindset: you are not only reviewing code correctness, you are reviewing whether the workflow itself was bent to make the code look correct. That ties back to last week's shift toward more explicit guardrails (agent permissions, tool-call controls, structured debugging) because review is where those guardrails get tested in practice.

On the measurement side, Copilot usage metrics got more specific for teams trying to understand whether Copilot code review is helping. Last week, we saw more emphasis on usage signals and governance (warnings for limits, admin controls for models). This week adds finer-grained evidence: the Copilot usage metrics REST API now includes copilot_suggestions_by_comment_type under pull_requests, reporting totals and applied totals per Copilot code review comment type for enterprise and organization reports. With that breakdown, you can start answering practical questions like which comment types developers actually apply, whether certain teams ignore whole categories, and where training or policy tweaks might improve outcomes (for example, if security-related comment types are consistently skipped).
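
As a rough illustration of that analysis, the Python sketch below computes an applied rate per comment type and flags low-uptake categories. The row shape (comment_type, total, applied) is an assumption made for this example, not the documented response schema; check the API reference for the real field names before adapting it.

```python
from typing import Dict, List


def applied_rates(rows: List[dict]) -> Dict[str, float]:
    """Applied / total per comment type, skipping types with no suggestions."""
    rates: Dict[str, float] = {}
    for row in rows:
        total = row["total"]  # assumed field name
        if total == 0:
            continue  # no suggestions of this type, so no meaningful rate
        rates[row["comment_type"]] = row["applied"] / total  # assumed field names
    return rates


def low_uptake(rates: Dict[str, float], threshold: float = 0.2) -> List[str]:
    """Comment types whose applied rate falls below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


if __name__ == "__main__":
    # Made-up sample rows for illustration only; not real report data.
    sample = [
        {"comment_type": "security", "total": 40, "applied": 4},
        {"comment_type": "style", "total": 100, "applied": 50},
        {"comment_type": "performance", "total": 0, "applied": 0},
    ]
    rates = applied_rates(sample)
    print(rates)
    print(low_uptake(rates))
```

The 0.2 threshold is arbitrary; the useful part is comparing categories against each other and over time rather than against a fixed number.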

Other GitHub Copilot News

Copilot cloud agent configuration got easier to scale: GitHub added dedicated “Agents” secrets and variables with organization-level configuration and per-repository access controls, which helps when you need consistent settings across many repos without over-sharing credentials. This is a clean continuation of last week's “governance moves closer to the workflow” thread (policies, controls, auditability) because secrets and variables are where many agent experiments fail in practice. Centralizing them in an “Agents” scope makes it easier to standardize cloud-agent behavior across repos while keeping the blast radius smaller than ad hoc per-repo secret sprawl.