Weekly GitHub Copilot Roundup: Models, MCP Security, and Admin Controls

This week, model deprecations moved from a background concern to an operational deadline: admins need to update allowlists, defaults, and documentation before pinned model choices disappear. In VS Code and Copilot CLI, the theme is more agent capability paired with more governance: semantic indexing and chat history retrieval, richer agent sessions (terminal and browser tab access), and enterprise-managed plugins. MCP-based security tools expanded the agent inner loop, with secret scanning now GA and dependency scanning in preview, while new usage metrics, token-efficiency practices, and agent PR review guidance help teams measure cost, validate behavior, and ship changes more safely at scale.

This Week's Overview

Model churn: Copilot deprecations and what Enterprise admins need to do

GitHub Copilot is rotating several foundation models on tight timelines, which can break established defaults for Chat, code review, and agent workflows if your org relies on specific model selections. Building on last week's emphasis on “model churn needs knobs” (and the reality that plan and policy changes can reshape day-to-day workflows), this week's deprecation calendar makes the operational side unavoidable: Claude Sonnet 4 is deprecated (as of May 6, 2026), while Grok Code Fast 1 is scheduled to be removed on May 15, 2026, and GPT-4.1 follows on June 1, 2026.

For Copilot Enterprise, the practical work is policy-driven: admins should review Copilot Enterprise model policies so replacement models are allowed and appear in Copilot Chat (both in VS Code and on github.com). The suggested alternatives called out this week are Claude Sonnet 4.6 (for Claude Sonnet 4), GPT-5 mini or Claude Haiku 4.5 (for Grok Code Fast 1), and GPT-5.5 (for GPT-4.1), which matters if teams have standardized prompts, evaluation baselines, or compliance expectations around a model family.

If you operate shared agent templates or “golden” dev container/VS Code profiles, this is a good week to check for pinned model IDs, documentation that references deprecated models, and any onboarding steps that instruct developers to pick a model that will soon disappear. Treat this as an operational change as much as a product change: update allowlists, communicate the cutoff dates, and rerun any internal prompt or workflow evaluations on the replacement models before the deadlines.
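One way to run that check is a small script that scans templates and docs for the deprecated model names. The sketch below is a minimal example, not an official tool: the display-name spellings come from this roundup, while the assumption that pinned IDs use the same words joined by spaces, hyphens, or underscores is ours, as are the function names and the default file suffixes.

```python
import re
from pathlib import Path

# Deprecated model names and the replacements suggested in this roundup.
# Assumption: pinned IDs in your configs use the same words, possibly
# lowercase and joined by hyphens or underscores instead of spaces.
DEPRECATED = {
    "Claude Sonnet 4": "Claude Sonnet 4.6",
    "Grok Code Fast 1": "GPT-5 mini or Claude Haiku 4.5",
    "GPT-4.1": "GPT-5.5",
}


def build_pattern(name: str) -> re.Pattern:
    """Match the name, but not a longer version such as 'Claude Sonnet 4.6'."""
    body = r"[\s\-_]+".join(re.escape(part) for part in name.split())
    # The lookahead rejects matches that continue into another version digit.
    return re.compile(rf"(?<![\w.]){body}(?![.\-]?\d)", re.IGNORECASE)


PATTERNS = {name: build_pattern(name) for name in DEPRECATED}


def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, deprecated name, suggested replacement) per hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name, DEPRECATED[name]))
    return hits


def scan_tree(root: str, suffixes=(".md", ".json", ".yml", ".yaml")) -> dict:
    """Scan matching files under root and return hits keyed by file path."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Calling scan_tree(".") from a template repository lists each file that still mentions a deprecated model, along with the replacement suggested above. Treat the output as a starting point for review, since the name matching is heuristic.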

Copilot in VS Code: agent tooling, semantic indexing, and cost controls

April's Copilot-in-VS Code updates kept pushing agent workflows from “chat UI” toward “observable, governable tooling”. Continuing last week's story about expanding agent surfaces while tightening governance and predictability, the common thread across multiple releases (VS Code 1.116-1.119, plus supporting content around 1.113-1.118) is better context management (semantic indexing and dedicated skill context), more capable agent sessions (terminal and browser tab access, inline diffs, improved sub-sessions), and more levers for enterprises (BYOK and policy/plan gating).

Semantic indexing and /chronicle for chat history

VS Code's Copilot experience is leaning harder into retrieval so you can ask questions that span your workspace and linked GitHub repos without manually pasting context. April's changelog highlights semantic search across workspaces and GitHub repositories, plus the experimental /chronicle chat-history index that aims to make prior chats queryable as a first-class context source.

In practice, this should reduce “re-explain the repo” overhead for teams that use Copilot Chat as a long-running assistant across incidents or refactors. It also raises new governance questions for enterprises, since indexing and history features often intersect with retention rules and policy controls, so it is worth validating which parts are enabled under your plan and org policies.

Agent sessions: terminal access, browser tab sharing, and improved UX

Agent workflows gained more “hands” this month, including browser tab sharing with agents (useful for reproductions, web-based docs, and SaaS admin consoles) and stronger session ergonomics. Building on last week's focus on session titles, model visibility, and tighter agent controls, several of the point-release videos call out improvements like incremental rendering for chat responses, better handling of agent sessions and sub-sessions, and ongoing Agents UI work (for example, the Agents Window in the release highlights).

These changes matter most when Copilot is doing multi-step tasks (triage, migration, refactoring) where the cost is not just tokens but also human time spent shepherding the agent. Better diffs, tighter session control, and smoother rendering reduce review friction and make it more realistic to keep agents running while you context-switch.

BYOK and plan/policy-dependent Copilot capabilities

VS Code 1.117 calls out BYOK (Bring Your Own Key) support for Copilot Business/Enterprise, aligning with a broader enterprise push to control model providers and billing arrangements. This also follows directly from last week's Individual plan tightening and premium multiplier story: capability is expanding, but access and cost are increasingly mediated by plan and policy. Several updates also stress that specific Copilot/agent capabilities depend on enterprise policies and pricing plans, which is becoming a recurring operational theme as Copilot expands beyond autocomplete.

If you manage a fleet, this is a reminder to treat Copilot settings like other developer platform controls: define policy baselines, document allowed capabilities, and test feature availability in the exact SKU your developers are on. This also reduces “works on my machine” confusion when features like agent tooling or model options appear in one environment but not another.

Copilot CLI and managed extensibility: plugins, planning, and Rubber Duck cross-model review

Copilot CLI continues to evolve from a single assistant into a configurable client with enterprise controls and multi-model workflows. This week's updates extend last week's theme that longer-running, tool-loop agent sessions need both governance and cost discipline, with a focus on managed plugins, better task setup (the Plan feature), and more structured “second opinion” patterns (Rubber Duck spanning GPT and Claude).

Enterprise-managed plugins (public preview)

GitHub shipped a public preview of enterprise-managed plugins for Copilot CLI, giving enterprises centralized control over the plugin marketplace, auto-install behavior, and baseline standards via a shared settings.json. This is a concrete step toward making Copilot CLI manageable at scale, especially when plugins introduce new tools, hooks, or MCP configuration that can affect data access and workflow safety.

For platform teams, the key implication is that Copilot CLI can now be rolled out like other managed developer tooling: set approved sources, preconfigure required plugins, and ensure consistent defaults across teams. It also creates a path to standardize guardrails (hooks, MCP endpoints, instruction conventions) without relying on every developer to hand-configure their local machine.

Rubber Duck adds cross-model “second opinion” support

Rubber Duck in Copilot CLI now supports more models, specifically enabling cross-model critique. This builds on last week's “separate builder and reviewer models” idea (especially as premium multipliers make model choice a budgeting decision) by formalizing a workflow where a GPT-orchestrated session can call a Claude critic agent for review, and a Claude-orchestrated session can use GPT-5.5 as the reviewer.

This is most useful for tasks like refactors, security-sensitive diffs, or configuration changes where you want an independent read before opening a pull request. If your org is moving to usage-based billing, cross-model critique can also become a deliberate trade-off: spend a bit more on review passes to reduce expensive human rework later.

Planning-first workflow in Copilot CLI (Plan feature)

GitHub also highlighted Copilot CLI's Plan feature, where Copilot asks clarifying questions, produces an approach, and then generates code only after you agree to the plan. Following last week's push toward more intentional configuration (rather than defaulting to the most expensive or most autonomous mode), this reinforces a pattern that scales better than “prompt and hope”: align on steps, constraints, and acceptance checks up front, then let the agent execute within that boundary.

If you're trying to reduce waste under token-based billing, planning-first interactions are one of the simplest knobs to turn. They cut down on iterative back-and-forth and help you keep context tight, especially when multiple developers share the same repo conventions and review expectations.

MCP security tooling: secret scanning GA and dependency scanning preview

GitHub is turning security checks into tools that MCP-compatible agents can call before code ever reaches a commit or pull request. Building on last week's theme of governed tool execution through MCP (with policies and predictable boundaries), this week brought GA for secret scanning via the GitHub MCP Server, plus a public preview for dependency scanning powered by Dependabot and the GitHub Advisory Database.

Secret scanning in GitHub MCP Server (generally available)

Secret scanning in the GitHub MCP Server is now generally available, enabling agents and IDEs that speak MCP (Model Context Protocol) to detect exposed secrets pre-commit or pre-PR. The GA release also honors existing push protection customization, so your current detection rules and bypass behavior carry over into agent-driven workflows.

The practical win is earlier feedback in the same place developers are generating code. Instead of relying on a later pipeline or review step, an agent can warn about a leaked token while it's still cheap to fix, and it can do so using the same enterprise policy surface you already manage for push protection.

Dependency scanning in GitHub MCP Server (public preview)

Dependency scanning in GitHub MCP Server entered public preview, letting agents check code changes for vulnerable dependencies using Dependabot's toolset and the GitHub Advisory Database. The key shift is that vulnerability awareness can be part of the agent's “inner loop” (while editing), not only a “CI found it later” event.

For teams experimenting with agent-generated dependency bumps or template-driven scaffolds, this is a meaningful guardrail. It creates a tighter feedback cycle where an agent proposing a new package version can immediately validate it against known advisories and avoid generating work that will get blocked in CI anyway.

Measuring and controlling Copilot at scale: usage metrics, token efficiency, and validation

As Copilot becomes a platform (not just an IDE feature), teams are asking the same questions they ask about any other system: What is it doing, what is it costing, and how do we test it? This continues last week's arc of tighter usage controls and clearer UX signals around limits by adding better reporting for code review, plus deeper guidance on token spend and validating non-deterministic agent behavior.

Code review comment types in the usage metrics REST API

GitHub Copilot's usage metrics REST API now includes a copilot_suggestions_by_comment_type breakdown under pull_requests. Enterprise and organization reports now show totals and applied totals per Copilot code review comment type, giving a more granular view than aggregate “suggestions used”.

This is especially useful if you're trying to tune rollout policies for Copilot code review. You can correlate which comment types actually get applied, identify where Copilot is generating noise, and build more targeted enablement (or guardrails) instead of debating adoption based on anecdotes.
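To make that correlation concrete, the sketch below computes an applied rate per comment type from a report that has already been fetched and parsed. Only pull_requests and copilot_suggestions_by_comment_type come from the announcement. The entry fields (comment_type, total, total_applied), the thresholds, and the function names are assumptions for illustration, so check them against the actual API response before relying on this.

```python
def applied_rates(report: dict) -> dict[str, float]:
    """Map each comment type to the fraction of its suggestions that were applied."""
    breakdown = report.get("pull_requests", {}).get(
        "copilot_suggestions_by_comment_type", []
    )
    rates = {}
    for entry in breakdown:
        # Hypothetical field names; the real payload may differ.
        total = entry.get("total", 0)
        applied = entry.get("total_applied", 0)
        rates[entry["comment_type"]] = applied / total if total else 0.0
    return rates


def noisy_types(report: dict, min_total: int = 20, max_rate: float = 0.1) -> list[str]:
    """List comment types with enough volume but a low applied rate."""
    breakdown = report.get("pull_requests", {}).get(
        "copilot_suggestions_by_comment_type", []
    )
    volume = {e["comment_type"]: e.get("total", 0) for e in breakdown}
    rates = applied_rates(report)
    return sorted(
        name for name, rate in rates.items()
        if volume[name] >= min_total and rate <= max_rate
    )


# Illustrative data only, not real metrics.
sample_report = {
    "pull_requests": {
        "copilot_suggestions_by_comment_type": [
            {"comment_type": "security", "total": 40, "total_applied": 30},
            {"comment_type": "style", "total": 100, "total_applied": 5},
        ]
    }
}
```

With the sample data, noisy_types(sample_report) flags the style comments as high-volume but rarely applied, which is the kind of signal that could justify a targeted guardrail.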

Token efficiency in GitHub Agentic Workflows (and how to copy the pattern)

GitHub shared concrete techniques they used to reduce LLM token spend in GitHub Agentic Workflows: logging per-call usage, running daily auditor/optimizer workflows, pruning unused MCP tools, and shifting deterministic data fetching into GitHub CLI steps. This is the natural follow-through to last week's message that “limits are real” and that agentic loops can be token-hungry by design: treat token usage as an observable metric and iterate like you would on build time or cloud spend.

Two ideas are immediately reusable: (1) make tool availability explicit and minimal so the agent does not “browse” tool catalogs unnecessarily, and (2) move stable lookups (repo metadata, file listings, API reads) into deterministic steps outside the model. If you're adopting token-based billing, this kind of instrumentation plus workflow hygiene is how you keep experimentation sustainable.
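A minimal sketch of the logging-and-pruning idea follows. It assumes you already record one entry per LLM call. The record fields (workflow, input_tokens, output_tokens, tools_used) and the tool names are hypothetical and do not reflect how GitHub Agentic Workflows stores its logs.

```python
from collections import defaultdict


def summarize_usage(calls: list[dict]) -> tuple[dict[str, int], set[str]]:
    """Total tokens per workflow, plus the set of tools that were actually called."""
    totals = defaultdict(int)
    used_tools = set()
    for call in calls:
        totals[call["workflow"]] += call["input_tokens"] + call["output_tokens"]
        used_tools.update(call.get("tools_used", []))
    return dict(totals), used_tools


def unused_tools(configured: list[str], used: set[str]) -> list[str]:
    """Configured tools that never appear in the logs: candidates for pruning."""
    return sorted(set(configured) - used)


# Illustrative log entries only.
sample_calls = [
    {"workflow": "triage", "input_tokens": 1200, "output_tokens": 300,
     "tools_used": ["list_issues"]},
    {"workflow": "triage", "input_tokens": 800, "output_tokens": 200,
     "tools_used": []},
    {"workflow": "docs", "input_tokens": 500, "output_tokens": 100,
     "tools_used": ["get_file"]},
]
```

Running unused_tools over the configured tool list against the used set from summarize_usage gives a concrete pruning list, and the per-workflow totals show where a daily auditor should look first.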

Validating agent behavior when correctness is not deterministic

GitHub also described an approach for validating Copilot Coding Agent behavior in non-deterministic environments by learning a graph-based “trust layer” from successful traces. The technique uses dominator analysis over a learned structure (including a Prefix Tree Acceptor (PTA) concept) to identify which milestones are essential, then validates those milestones in CI rather than trying to assert identical end-to-end behavior every run.

For engineering teams, this reframes testing agentic workflows as “prove the invariants” instead of “replay the exact path”. It is relevant anywhere an agent's plan, tool calls, or intermediate steps can vary but still produce acceptable results, which is increasingly common once you add retrieval, browsing, or multi-agent critique to the loop.
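The “prove the invariants” idea can be illustrated with a simplified dominator computation. This sketch is not GitHub's implementation: it merges steps by name into a single graph, which is coarser than a prefix tree acceptor, and the step names and helper functions are hypothetical. A step counts as an essential milestone if every path from start to end in the graph learned from successful traces passes through it.

```python
START, END = "<start>", "<end>"


def build_graph(traces: list[list[str]]) -> dict[str, set[str]]:
    """Build a successor map from traces, adding synthetic start and end nodes."""
    succ = {START: set(), END: set()}
    for trace in traces:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            succ.setdefault(a, set()).add(b)
            succ.setdefault(b, set())
    return succ


def dominators(succ: dict[str, set[str]]) -> dict[str, set[str]]:
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over predecessors p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for a, targets in succ.items():
        for b in targets:
            preds[b].add(a)
    dom = {n: set(nodes) for n in nodes}
    dom[START] = {START}
    changed = True
    while changed:
        changed = False
        for n in nodes - {START}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def essential_milestones(traces: list[list[str]]) -> list[str]:
    """Steps that dominate the end node, meaning every learned path includes them."""
    dom = dominators(build_graph(traces))
    return sorted(dom[END] - {START, END})


def missing_milestones(trace: list[str], milestones: list[str]) -> list[str]:
    """CI-style check: which essential milestones did a new run skip?"""
    return [m for m in milestones if m not in trace]


# Hypothetical successful traces: search and lint are optional detours.
successful = [
    ["clone", "edit", "test", "open_pr"],
    ["clone", "search", "edit", "test", "open_pr"],
    ["clone", "edit", "lint", "test", "open_pr"],
]
```

Here the optional steps drop out and only the shared backbone remains, so a CI check can assert on those milestones while letting the rest of the path vary between runs.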

Safer agent-generated pull requests: review checklists, prompt injection risks, and workflow hygiene

Agent-authored pull requests are becoming common enough that teams need repeatable review practices, not just “skim the diff”. This week's guidance picks up where last week left off on reliability and security being the less glamorous (but necessary) work of making agents usable in production, focusing on where agents tend to cut corners and where risks hide in CI/CD and security boundaries.

How to review agent pull requests

A practical checklist emerged for reviewing agent-generated PRs: watch for weakened CI, duplicated helper utilities, subtle logic bugs that pass tests, and security issues in LLM-powered GitHub workflows. The guidance explicitly calls out prompt injection and over-scoped GITHUB_TOKEN permissions as recurring footguns when agents interact with Actions or external tools.

The most actionable takeaway is to treat the PR's surrounding automation as part of the change, not just the application code. If an agent touches workflow files, permissions, or “helper scripts that run in CI”, review those with the same rigor you would apply to production infrastructure, because a small diff can have outsized blast radius.
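As one small piece of that review, a text-level check can flag workflow files that leave GITHUB_TOKEN broadly scoped. The sketch below is a heuristic, not a YAML parser or a full audit. It looks only for a missing permissions key and for permissions: write-all, and the function name and messages are our own.

```python
import re


def audit_workflow(text: str) -> list[str]:
    """Flag workflow text with no permissions key or with 'permissions: write-all'."""
    # Crude comment stripping; this would also cut a '#' inside a quoted string.
    lines = [line.split("#", 1)[0].rstrip() for line in text.splitlines()]
    findings = []
    if not any(re.match(r"^\s*permissions\s*:", line) for line in lines):
        findings.append("no explicit permissions block")
    for lineno, line in enumerate(lines, start=1):
        if re.match(r"^\s*permissions\s*:\s*write-all\s*$", line):
            findings.append(f"line {lineno}: permissions: write-all")
    return findings


# Illustrative workflow snippets.
scoped = "name: ci\non: push\npermissions:\n  contents: read\njobs: {}\n"
unscoped = "name: ci\non: push\njobs: {}\n"
broad = "name: ci\non: push\npermissions: write-all\njobs: {}\n"
```

A check like this could run over .github/workflows in CI so that an agent-authored change that widens token scope gets a human look before merge. It does not replace reading the diff, and it says nothing about prompt injection paths.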

When Copilot CLI behavior goes sideways: debugging prompt files and extensions

A real-world troubleshooting story highlighted how Copilot CLI can be influenced by unintended local state: an extension caused Copilot to repeatedly prompt the author to close GitHub deployment-status notifications, and the root cause was a prompt written to disk that kept getting reloaded until it was removed. This complements last week's guidance on being deliberate with configuration and avoiding accidental drift, especially under constraints like limits and changing model access.

For teams standardizing Copilot CLI usage, this is a reminder to inventory where behavior can be configured (extensions, instruction files, prompt files, hooks) and to document a “reset to baseline” procedure. It also reinforces the value of managed settings (where available) so teams can reduce drift and isolate whether a weird behavior comes from policy, local config, or a third-party plugin.

Agent configuration and rollout: secrets, variables, and the “customization stack”

Copilot Agents are increasingly treated like deployable systems: they need credentials, shared defaults, and consistent behavior across many repos. Following last week's theme that governance is moving closer to the workflow (policies, usage reporting, and controllable agent behavior), this week added better configuration primitives (for Copilot cloud agent) and more community guidance on how to structure instructions and skills so agents behave predictably.

Dedicated secrets and variables for Copilot cloud agent

GitHub introduced dedicated “Agents” secrets and variables for Copilot cloud agent, including org-level configuration and per-repository access controls. This targets a common rollout pain: distributing shared settings (for example, endpoints, API keys, or MCP server config) across dozens or hundreds of repositories without hand-managing each repo's secrets.

For platform teams, the access control aspect is as important as convenience. You can centralize defaults while still controlling which repos can read which values, which reduces the temptation to use a single overly-permissive secret everywhere.

The five-file Copilot customization stack (instructions, skills, prompts, roles)

On the “make it predictable” front, guidance this week broke down a practical layering approach: AGENTS.md for repo-level guidance, scoped .instructions.md files for contextual rules, SKILL.md for reusable capabilities, .prompt.md workflows, and .agent.md roles. This continues last week's story that customization is shifting from ad-hoc prompting to maintainable assets (skills and instructions you can version, review, and reuse), with the value coming from separating standards (how we work) from tasks (what we do) so agents apply the right constraints at the right time.

If you're trying to standardize quality across teams, this is a workable way to encode conventions like architecture rules, logging expectations, test requirements, and security do/don't lists so they show up consistently in agent runs. It also makes review easier because your guardrails live in version control and can be evolved like any other engineering asset.
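To see which layers a repository already has, a short inventory script can classify files by the naming conventions listed above. This is a sketch with assumptions: it matches on file names only and makes no claim about the directories where Copilot expects to find each file. The layer labels and function names are ours.

```python
from pathlib import Path
from typing import Optional


def classify(filename: str) -> Optional[str]:
    """Map a file name to a customization layer, or None if it is not one."""
    if filename == "AGENTS.md":
        return "repo guidance"
    if filename == "SKILL.md":
        return "skill"
    if filename.endswith(".instructions.md"):
        return "scoped instructions"
    if filename.endswith(".prompt.md"):
        return "prompt workflow"
    if filename.endswith(".agent.md"):
        return "agent role"
    return None


def inventory(root: str) -> dict[str, list[str]]:
    """Walk a repository and group customization files by layer."""
    found: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.md"):
        layer = classify(path.name)
        if layer:
            found.setdefault(layer, []).append(str(path.relative_to(root)))
    return {layer: sorted(paths) for layer, paths in found.items()}
```

Running inventory(".") across several repositories gives a quick view of which ones carry shared conventions and which still rely on ad-hoc prompting.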

Copilot Spaces + a Markdown knowledge base for standards coaching

A related pattern uses Copilot Spaces backed by a Markdown standards repo to turn Copilot into a “best practices coach”. The approach reinforces the knowledge base with in-repo instruction files and reusable prompt-file slash commands, aiming to make the agent cite and follow team standards during code generation and review.

For teams struggling with inconsistent style or repeated review feedback, this can shift some of that burden left. The main implementation detail to get right is scope: keep the standards repo concise and opinionated, then ensure each repo has a small set of local instructions that connect the generic standards to the repo's specific architecture and tooling.

Copilot for modernization and infrastructure: Java upgrades, Terraform state, and spec-driven repo generation

A cluster of longer-form guides and case studies reinforced where Copilot is getting used beyond day-to-day coding: migrating legacy apps, modernizing integration stacks, and generating infrastructure from specifications. This complements last week's emphasis on getting more value per token under tighter limits and changing model access, since the common thread here is that the fastest results came with strong constraints (plans, specs, verification scripts, and state-aware migrations), not free-form prompting.

Java modernization: assessment dashboards, OpenRewrite milestones, and real-world migration results

Several resources focused on legacy Java modernization with Copilot, including a structured assess-upgrade-migrate-test/deploy loop and tooling that surfaces findings in a Mission Control dashboard (cloud readiness, upgrade issues, CVEs, and coverage signals). Another guide showed an IntelliJ-based flow where Copilot generates an upgrade plan, applies changes in milestones using OpenRewrite, then validates with tests and CVE checks to catch regressions.

A separate case study claimed a Java 5 / Struts 1.3 monolith moved to Java 21 and Spring Boot in about two days using Copilot's app modernization tooling, with emphasis on a detailed plan, custom instructions, and verification scripts. Even if timelines vary widely by codebase, the repeatable lesson is the same: treat modernization as a sequence of verifiable milestones, and make Copilot do the mechanical work while humans keep ownership of correctness and risk.

Terraform on Azure: stable keys, state moves, and Copilot-assisted command generation

A practical Terraform guide dug into a common Azure managed disk problem: index-based for_each keys can trigger destructive churn when list ordering changes. The recommended approach is to migrate to stable keys and use terraform state mv to preserve resource identity, avoiding delete/recreate behavior.

Copilot fits into this workflow as a “command generator” via a reusable Copilot skill that outputs deterministic state mv commands. That is a good example of using Copilot where it performs best: producing consistent boilerplate from explicit inputs, while the engineer retains control over state operations that can impact live infrastructure.
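The deterministic part of that workflow is simple enough to sketch directly. The function below turns an explicit mapping of old index keys to stable keys into terraform state mv commands. The resource address azurerm_managed_disk.data and the disk names are hypothetical, and this is not the skill from the guide, only an illustration of the output it aims for.

```python
import shlex


def state_mv_commands(resource: str, key_map: dict[str, str]) -> list[str]:
    """Build one 'terraform state mv' command per renamed for_each key.

    resource: the resource address without a key, e.g. azurerm_managed_disk.data
    key_map:  old index-based key -> new stable key
    """
    commands = []
    for old_key in sorted(key_map):  # Sorted so the output is reproducible.
        new_key = key_map[old_key]
        if old_key == new_key:
            continue  # Nothing to move.
        old_addr = f'{resource}["{old_key}"]'
        new_addr = f'{resource}["{new_key}"]'
        # shlex.quote keeps the brackets and double quotes intact in a POSIX shell.
        commands.append(
            f"terraform state mv {shlex.quote(old_addr)} {shlex.quote(new_addr)}"
        )
    return commands


# Hypothetical mapping from list positions to stable disk names.
sample_map = {"0": "disk-logs", "1": "disk-data"}
```

Review the generated commands, take a state backup, and check terraform plan for zero destroys after the moves before applying anything to live infrastructure.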

Spec-driven development and repo scaffolding for data platforms

Spec-driven development guidance argued for defining inputs, outputs, constraints, and edge cases before code generation to make AI-assisted changes more predictable. A related Azure Databricks post showed a hands-on implementation: turning a Medallion Architecture narrative into a structured repository that generates Terraform for Azure platform setup and Databricks bundle files for workloads, with strict placeholder/TODO rules to keep generated output reviewable.

Taken together, the pattern is to push “truth” into specs and repository structure so Copilot skills and agents can operate within guardrails. If you're building internal platform templates, this is a useful direction: the more the repo encodes standards, the less every prompt has to restate them.

MCP expands beyond GitHub: Azure Resource Manager MCP Server preview

Azure introduced a public preview of the Azure Resource Manager (ARM) MCP Server, a remote MCP server that lets AI agents perform tool-based Azure Resource Manager operations. This extends last week's MCP narrative from “connect tools with guardrails” into “agents can touch real cloud control planes”, with natural-language-to-Azure Resource Graph querying and ARM template deployments from VS Code positioning MCP as a bridge between coding agents and infrastructure operations.

For Copilot users, this is a concrete step toward “agent can change cloud resources”, which makes governance and least privilege non-negotiable. Treat this like any other infrastructure automation: pair it with Azure Policy, constrain permissions, and ensure the agent's tooling access maps cleanly to environments (dev vs prod) so experimentation does not spill into sensitive subscriptions.

Other GitHub Copilot News

Enterprise QA teams shared more examples of Copilot speeding up test scenario drafting and automation scaffolding while keeping strict human review, responsible AI practices, and regression maintenance discipline. This lines up with last week's “package workflows like code” direction (skills, instructions, repeatable loops) because test and refactor workflows benefit most when you can standardize expectations and reuse them across repos.

Copilot also showed up in platform modernization and developer enablement stories, including an open-source Logic Apps Migration Agent for moving from BizTalk Server and other integration platforms to Logic Apps Standard with stage gates and human checkpoints, plus more content aimed at teaching better day-to-day Copilot habits under usage-based billing.