Planning, Collaboration, Tooling: Building Multi-Agent Systems with Azure Foundry + Semantic Kernel
You already know the meme: chatbots talk, agents act, multi-agent systems actually get stuff done.
If you’ve ever begged a bot to fix Intune and got a poem instead, this one’s for you. In this episode, we go full Netflix hands-on: you watch, you snack, I poke the dangerous Service Principal things so nobody nukes prod. We build a mini self-healing, governed multi-agent system using Azure AI Foundry + Semantic Kernel, wired into real enterprise surfaces:
- Intune
- Entra ID
- Microsoft Graph
- Azure Automation
- Log Analytics
- Single agents are like gas-station Swiss Army knives: technically they have tools, practically they bend on the first real job.
- You stuff planning, reasoning, execution, approvals, and reporting into one prompt → context explodes, latency spikes, hallucinations creep in.
- One agent trying to:
- Plan a change
- Call Graph and Intune
- Write remediation scripts
- Request approvals
- Verify results
- Document everything
- Context windows flooded with logs, policies, and MDM miscellany
- Important details get truncated or invented
- Token usage and costs balloon
- “Fix” attempts that quietly break other things (like deleting the resource instead of rotating a secret 😬)
- Planner focuses on intent & constraints
- Operator focuses on tools & execution
- Reviewer focuses on guardrails & approvals
- Planner — understands the goal, constraints, environment; outputs a stepwise plan with tool calls
- Operator — executes the plan via tools: Graph, Azure Automation, Functions, Logic Apps, etc.
- Reviewer — checks groundedness, scope, compliance, and safety before risky changes
- Messenger/Concierge — interacts with humans: approvals, status updates, and audit summaries
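The four roles above can be sketched as plain functions that pass a small, structured plan between them. This is a minimal illustration in standard-library Python only, not Semantic Kernel or Foundry code; `Step`, `Plan`, `quarantine_device`, and the `max_devices` limit are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    tool: str   # name of a registered tool
    args: dict  # structured arguments for that tool


@dataclass
class Plan:
    goal: str
    steps: list = field(default_factory=list)


def planner(goal, device_ids):
    """Decompose the goal into one explicit tool call per device."""
    return Plan(goal, [Step("quarantine_device", {"device_id": d}) for d in device_ids])


def reviewer(plan, allowed_tools, max_devices):
    """Return a list of objections; an empty list means the plan may run."""
    problems = []
    if len(plan.steps) > max_devices:
        problems.append(f"scope too wide: {len(plan.steps)} devices > {max_devices}")
    for step in plan.steps:
        if step.tool not in allowed_tools:
            problems.append(f"tool not allowed: {step.tool}")
    return problems


def operator(plan, tools):
    """Execute each step through the tool table and collect structured results."""
    return [tools[step.tool](**step.args) for step in plan.steps]


def messenger(plan, problems, results):
    """Summarize the outcome for humans and the audit trail."""
    if problems:
        return f"BLOCKED '{plan.goal}': " + "; ".join(problems)
    return f"DONE '{plan.goal}': {len(results)} step(s) executed"


# Hypothetical tool: a stand-in that only reports what it would do.
def quarantine_device(device_id):
    return {"device_id": device_id, "action": "quarantined"}


TOOLS = {"quarantine_device": quarantine_device}


def run(goal, device_ids, max_devices=5):
    plan = planner(goal, device_ids)
    problems = reviewer(plan, allowed_tools=set(TOOLS), max_devices=max_devices)
    results = [] if problems else operator(plan, TOOLS)  # nothing runs if blocked
    return messenger(plan, problems, results)


if __name__ == "__main__":
    print(run("quarantine stale devices", ["dev-1", "dev-2"]))
    print(run("quarantine stale devices", [f"dev-{i}" for i in range(50)]))
```

Here the reviewer runs before the operator, so a plan that is too broad never reaches a tool. In a real build each function would be a separate agent with its own instructions and identity.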
- Tools = hands
- REST APIs (Graph, internal services)
- Azure Automation runbooks (device scripts, remediation)
- Azure Functions & Logic Apps (glue & approvals)
- RAG via Azure AI Search (curated knowledge, not random web junk)
- Memory = budget, not magic
- Minimize per-agent context
- Use external state (Search, state store, thread metadata)
- Only pass what’s needed for the next decision
- Planning vs Execution
- Planner decomposes → Operator calls tools → Reviewer checks → Messenger tells humans
- This is where Semantic Kernel shines: planners, skills, function catalogs, retries, cancellation
- Safety by design
- Managed Identities per agent
- RBAC split into read vs manage
- PIM for destructive operations
- Tool calls logged to Log Analytics
- Content Safety + prompt shields to block jailbreaks & indirect injection
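The read-versus-manage split and the gate on destructive operations can be pictured as a small authorization check that runs before any tool call. The sketch below is an assumption-laden illustration in plain Python: the tool names, the `AGENT_SCOPES` table, and the `human_approved` flag (standing in for a PIM elevation granted by a person) are all hypothetical, and the GUID check assumes device IDs are GUID-shaped.

```python
import re
from dataclasses import dataclass

# Assumption: device IDs are GUIDs, so anything else (e.g. "all devices") is rejected.
GUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


@dataclass(frozen=True)
class ToolSpec:
    required_scope: str  # "read" or "manage"
    destructive: bool    # destructive tools also need human approval


# Hypothetical tool catalog for illustration only.
TOOLS = {
    "get_device": ToolSpec(required_scope="read", destructive=False),
    "quarantine_device": ToolSpec(required_scope="manage", destructive=False),
    "retire_device": ToolSpec(required_scope="manage", destructive=True),
}

# Hypothetical per-agent grants: monitors read, remediators read and manage.
AGENT_SCOPES = {
    "monitor": {"read"},
    "remediator": {"read", "manage"},
}


def authorize(agent, tool, device_id, human_approved=False):
    """Return (allowed, reason) for a proposed tool call."""
    spec = TOOLS.get(tool)
    if spec is None:
        return False, "unknown tool"
    if not GUID.fullmatch(device_id):
        return False, "device_id is not a GUID"
    if spec.required_scope not in AGENT_SCOPES.get(agent, set()):
        return False, f"{agent} lacks {spec.required_scope} scope"
    if spec.destructive and not human_approved:
        return False, "destructive tool requires human approval"
    return True, "ok"


if __name__ == "__main__":
    device = "3f2b8c1e-9a4d-4e7b-8c2a-1d5f6e7a8b9c"
    print(authorize("monitor", "get_device", device))
    print(authorize("monitor", "quarantine_device", device))
    print(authorize("remediator", "retire_device", device))
    print(authorize("remediator", "retire_device", device, human_approved=True))
```

In production the enforcement would live in the platform (identity, RBAC, PIM), not in application code; a check like this is only a second line of defense and a readable statement of intent.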
- Instructions — short, role-specific prompts
- Deployments — different models per role (GPT-4-class for planning, SLMs for extraction)
- Knowledge — Azure AI Search indexes, uploaded docs, optional web grounding
- Actions — OpenAPI tools, Graph, Logic Apps, Functions, Azure Automation, Code Interpreter
- Connected agents — yes, one agent can call another like a tool
- Foundry handles threads, safety, tracing, and evaluations
- Semantic Kernel orchestrates the planner → operator → reviewer loop in code
- You keep prompts short and put power in tools with strict schemas
- Reasoning models for planning and complex decisions
- Small models (SLMs) for extraction, classification, parameter shaping
- Mix serverless endpoints and managed compute depending on cost & residency needs
- Content Safety on inputs and outputs
- Prompt shields against jailbreak and indirect injection
- Full tracing of tool calls (who, what, where, how long)
- Application Insights + Log Analytics for performance & audit
- Built-in evaluation flows for groundedness, relevance, and fluency
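To make "who, what, where, how long" concrete, here is a minimal sketch of a tracing wrapper around a tool function. It uses only the Python standard library; `TRACE_LOG` is an in-memory stand-in for a real sink such as Log Analytics, and `traced`, `quarantine_device`, and the `correlation_id` keyword are hypothetical names for the example.

```python
import functools
import time
import uuid

TRACE_LOG = []  # stand-in sink; a real system would ship these records elsewhere


def traced(agent):
    """Wrap a tool so every call records who, what, outcome, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, correlation_id=None, **kwargs):
            record = {
                "correlation_id": correlation_id or str(uuid.uuid4()),
                "agent": agent,
                "tool": fn.__name__,
                "params": {"args": args, "kwargs": kwargs},
                "status": "ok",
            }
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["status"] = f"error: {exc}"
                raise
            finally:
                # Runs on success and on failure, so no call goes unrecorded.
                record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
                TRACE_LOG.append(record)
        return wrapper
    return decorator


# Hypothetical tool used only to exercise the wrapper.
@traced(agent="remediator")
def quarantine_device(device_id):
    if not device_id:
        raise ValueError("device_id is required")
    return {"device_id": device_id, "action": "quarantined"}


if __name__ == "__main__":
    quarantine_device("dev-1", correlation_id="thread-42")
    for entry in TRACE_LOG:
        print(entry)
```

Passing the same `correlation_id` through every call in a thread is what lets an auditor stitch the planner, operator, and reviewer activity back into one story.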
1
00:00:00,000 --> 00:00:03,880
You know: chatbots talk, agents act, and multi-agent systems actually get stuff done.
2
00:00:03,880 --> 00:00:08,000
If you've begged a bot to fix Intune and it replied with a poem, this is for you.
3
00:00:08,000 --> 00:00:11,600
Here's the deal. Netflix hands on. You watch. You eat snacks.
4
00:00:11,600 --> 00:00:13,800
I do the dangerous Service Principal things so nobody
5
00:00:13,800 --> 00:00:19,080
nukes prod. We're building a mini self-healing, governed multi-agent system with Azure AI
6
00:00:19,080 --> 00:00:25,360
Foundry and Semantic Kernel. Real touch points: Intune, Entra, Graph, Azure Automation, Log
7
00:00:25,360 --> 00:00:29,200
Analytics. One agent versus multi-agent, head to head: 12 minutes versus three.
8
00:00:29,520 --> 00:00:31,520
And the only casualty is my subscription credit.
9
00:00:31,520 --> 00:00:35,760
First, why one agent stalls while teams fly. Why one agent isn't enough.
10
00:00:35,760 --> 00:00:38,560
Single agents are the Swiss Army knife you got at a gas station.
11
00:00:38,560 --> 00:00:41,640
Technically it has tools; practically it bends on the first screw.
12
00:00:41,640 --> 00:00:45,640
You cram planning, reasoning, execution, approvals, and reporting into one prompt.
13
00:00:45,640 --> 00:00:49,720
And now it's context-starved, slow, and weirdly confident about the wrong thing.
14
00:00:49,720 --> 00:00:53,160
Most people think bigger model, bigger brain, boom, problem solved.
15
00:00:53,160 --> 00:00:55,440
But the real failure is role confusion.
16
00:00:55,520 --> 00:01:00,240
One agent trying to plan a change, call Graph, write a remediation script, request approval,
17
00:01:00,240 --> 00:01:05,360
verify the result and document it is like running a help desk, a change board,
18
00:01:05,360 --> 00:01:07,960
and a post-mortem with one very tired intern.
19
00:01:07,960 --> 00:01:11,160
Here's what actually happens. The single agent juggles too much state.
20
00:01:11,160 --> 00:01:12,760
Your prompt becomes a novella.
21
00:01:12,760 --> 00:01:16,120
The context window fills with logs, policies and half the MDM glossary.
22
00:01:16,120 --> 00:01:19,440
So it either truncates the important bits or hallucinates missing detail.
23
00:01:19,440 --> 00:01:23,680
Latency spikes, tokens burn, and retries feel like déjà vu with a bill.
24
00:01:24,520 --> 00:01:27,840
Enter multi-agent: roles, boundaries, parallelism.
25
00:01:27,840 --> 00:01:30,360
The planner focuses on intent and constraints.
26
00:01:30,360 --> 00:01:33,040
The operator focuses on tools and execution.
27
00:01:33,040 --> 00:01:37,160
The reviewer focuses on guardrails, approvals, and "are we grounded in facts?"
28
00:01:37,160 --> 00:01:42,000
Each agent keeps a clean short instruction set with the minimum memory it needs.
29
00:01:42,000 --> 00:01:45,760
They pass messages, tool results, or a short summary, not the entire internet.
30
00:01:45,760 --> 00:01:47,600
Why this matters in the enterprise?
31
00:01:47,600 --> 00:01:51,520
You need end-to-end workflows with human-in-the-loop gates, identity-scoped actions,
32
00:01:51,520 --> 00:01:55,240
retries and audit trails. That's not a single threaded chat. That's a team sport.
33
00:01:55,240 --> 00:02:00,960
The planner, doer, reviewer pattern reduces risk and rework because every step has a second set of eyes,
34
00:02:00,960 --> 00:02:04,160
automated ones that aren't on hour 11 of a "quick" Teams meeting.
35
00:02:04,160 --> 00:02:08,680
Cost and latency win too. Use small language models for extraction and classification.
36
00:02:08,680 --> 00:02:11,800
Use premium reasoning models only where planning actually helps.
37
00:02:11,800 --> 00:02:14,360
Batch jobs for summaries while users sleep.
38
00:02:14,360 --> 00:02:17,840
You'll get fewer hallucinations, faster results, and lower bills.
39
00:02:18,360 --> 00:02:22,240
Micro story: I saw a one-bot-to-do-everything try to rotate a secret.
40
00:02:22,240 --> 00:02:27,040
It updated the key, forgot the app setting and fixed it by deleting the resource.
41
00:02:27,040 --> 00:02:29,600
That was the day we invented the reviewer agent.
42
00:02:29,600 --> 00:02:32,360
Multi-agent systems 101: clear, simple.
43
00:02:32,360 --> 00:02:34,240
Okay, so what is a multi-agent system
44
00:02:34,240 --> 00:02:37,120
you can actually ship? Think digital team, not one big brain.
45
00:02:37,120 --> 00:02:41,800
Each agent has a role, a tight instruction set and a limited toolbox.
46
00:02:41,800 --> 00:02:45,840
Planner understands the goal, constraints and environment.
47
00:02:46,080 --> 00:02:49,720
Produces a plan with tool calls the operator can execute.
48
00:02:49,720 --> 00:02:57,240
Operator calls tools (Graph, Logic Apps, Functions, Azure Automation runbooks) and returns structured results.
49
00:02:57,240 --> 00:03:04,080
Reviewer checks groundedness, compliance, scope, approvals, and safety outputs before anything risky happens.
50
00:03:04,080 --> 00:03:10,040
Messenger/Concierge handles human notifications, approvals, and summaries for the audit trail.
51
00:03:10,040 --> 00:03:15,280
They communicate through messages and shared context the way grown-ups should: short, clear, and only what's needed.
52
00:03:15,560 --> 00:03:20,960
Sequential when actions depend on previous results, parallel when sub tasks can fan out safely.
53
00:03:20,960 --> 00:03:26,320
You don't send a 50 page policy, you send the relevant snippet, the device IDs and the intended change.
54
00:03:26,320 --> 00:03:29,200
Tools are the hands: REST APIs for Microsoft Graph,
55
00:03:29,200 --> 00:03:35,280
RAG via Azure AI Search for curated knowledge, Code Interpreter for snippets, Logic Apps for orchestrated sequences,
56
00:03:35,280 --> 00:03:40,800
Functions for glue, Azure Automation for repeatable device tasks. Keep tool inputs structured with schemas
57
00:03:40,800 --> 00:03:43,240
so the operator can't creatively delete your tenant.
58
00:03:43,880 --> 00:03:48,880
Memory is not magic, it's a budget. Minimize per-agent memory. Use external state: search indexes,
59
00:03:48,880 --> 00:03:51,440
vector stores, or a small state store for thread metadata.
60
00:03:51,440 --> 00:03:52,720
Keep context windows lean.
61
00:03:52,720 --> 00:03:58,280
If it's not needed for the next decision, it doesn't ride along. Planning versus execution is where Semantic Kernel shines.
62
00:03:58,280 --> 00:04:01,840
The planner delegates; the operator picks the right tool with function catalogs.
63
00:04:01,840 --> 00:04:06,040
The reviewer critiques outputs with evaluation prompts and content safety.
64
00:04:06,040 --> 00:04:10,920
If the reviewer flags a scope risk, the planner revises. Once you understand this loop,
65
00:04:10,920 --> 00:04:14,080
you stop writing 800-word prompts and start wiring clear roles.
66
00:04:14,080 --> 00:04:22,320
Safety boundaries are non-negotiable: identity-scoped tool calls with managed identities per agent, permissions split between read and manage,
67
00:04:22,320 --> 00:04:28,920
PIM for elevation on destructive actions, all tool calls logged to Log Analytics, Content Safety on both input and output,
68
00:04:28,920 --> 00:04:35,320
prompt shields for jailbreaks and indirect prompt injection from knowledge sources. Analogy time: researcher,
69
00:04:35,440 --> 00:04:41,400
analyst, operator, QA. Each is accountable for their part. The operator doesn't approve their own change.
70
00:04:41,400 --> 00:04:43,040
The analyst doesn't mess with production.
71
00:04:43,040 --> 00:04:46,960
The QA doesn't write the code they're approving. Simple, boring, safe.
72
00:04:46,960 --> 00:04:53,640
Why Azure AI Foundry plus Semantic Kernel? Foundry gives you agent orchestration, model choices, knowledge grounding,
73
00:04:53,640 --> 00:04:57,520
tool wiring, content safety, tracing, evaluations, and governance in one place.
74
00:04:57,520 --> 00:05:03,200
Semantic Kernel gives you planners, skills, pipelines, retries, and cancellation in code.
75
00:05:03,960 --> 00:05:12,080
Together you mix models per role (SLMs for extraction, GPT-4 class for planning, batch for summaries), and you keep your prompts short,
76
00:05:12,080 --> 00:05:16,040
your blast radius small, and your weekend yours. Now let's build the thing.
77
00:05:16,040 --> 00:05:19,200
How Azure AI Foundry enables multi-agent workflows.
78
00:05:19,200 --> 00:05:24,080
Here's where Azure AI Foundry stops being a glossy portal and starts acting like a production shop.
79
00:05:24,080 --> 00:05:31,720
Foundry gives you places to define agents, connect models, add knowledge, wire actions and observe everything without living in six different blades and a wish.
80
00:05:32,120 --> 00:05:35,680
It's orchestration without duct tape. Start with agent definitions.
81
00:05:35,680 --> 00:05:43,480
In Foundry, each agent gets: instructions, meaning tight role guidance, not a manifesto; a deployment, the actual model behind it,
82
00:05:43,480 --> 00:05:48,080
mixing GPT-4-class for planning with Phi-class SLMs for extraction; knowledge:
83
00:05:48,080 --> 00:05:56,520
Azure AI Search indexes, uploaded files, or Bing grounding when you want web context; actions: OpenAPI tools, Logic Apps, Functions, Graph,
84
00:05:56,800 --> 00:06:06,280
Azure Automation runbooks, or Code Interpreter for quick math and parsing; and connected agents. Yes, an agent can treat another agent like a tool if that keeps the roles clean.
85
00:06:06,280 --> 00:06:10,680
Now this is important because Semantic Kernel plugs into that same universe in code.
86
00:06:10,680 --> 00:06:14,240
You build planners, define skills and register tools in function catalogs.
87
00:06:14,240 --> 00:06:21,040
You choreograph the planner, operator, reviewer loop and let Foundry handle the threads, history and safety rails.
88
00:06:21,040 --> 00:06:24,640
You keep prompts short and put the heavy lifting in tools with schemas.
89
00:06:24,840 --> 00:06:37,760
Translation: fewer hallucinations, fewer oops, more weekend. Model choices matter. Foundry abstracts inference through the model endpoint so you can swap models without rewriting your whole app. Two common lanes: serverless endpoints for premium models and quick iteration.
90
00:06:37,760 --> 00:06:45,840
You pay per token, scale on demand, and don't babysit GPUs like it's 2018. Managed compute for specific open models you want pinned to your tenant boundary.
91
00:06:45,840 --> 00:06:50,160
You pay by uptime and control the shape, but Microsoft handles the deployment plumbing.
92
00:06:50,680 --> 00:07:02,280
Mix models per role: planner on GPT-4o or 4.1 mini for reasoning, operator on a small model that excels at extraction and parameter shaping. Batch your long summaries overnight so humans get glossy reports by morning.
93
00:07:02,280 --> 00:07:12,680
And if a model retires or you want to try DeepSeek or Llama variants, Foundry's inference API lets you keep your code steady while you experiment. Tools and actions are where the operator earns its badge.
94
00:07:12,680 --> 00:07:19,800
This is where you bring real power without giving the agent keys to your kingdom. OpenAPI tools for internal services with Swagger.
95
00:07:19,960 --> 00:07:26,920
Keep the contract strict: if the schema says deviceId string, it better be a device ID. Logic Apps for approval flows and human gates.
96
00:07:26,920 --> 00:07:37,120
Let the reviewer route changes to a change approval, then resume execution on yes. Azure Functions for glue code and sane transformations: small, testable, versioned. Graph API for Intune,
97
00:07:37,160 --> 00:07:51,800
Entra user, device, and Conditional Access tasks: give read scopes to monitor agents, manage scopes only to remediators. Azure Automation for device scripts and runbooks: treat it like sharp objects in a locked drawer that only the fixer agent can open.
98
00:07:51,800 --> 00:07:57,720
RAG grounding: keep it curated. Use Azure AI Search with hybrid ranking and semantic reranking.
99
00:07:57,920 --> 00:08:27,880
Shard knowledge by domain (Intune policy separate from onboarding SOPs) so you don't ship the kitchen sink. The operator should pass a snippet, not the encyclopedia. Size your chunks, store citations, and have the reviewer check groundedness before it touches production. Safety is not optional. Foundry bakes in Content Safety filters for input and output, prompt shields for jailbreak and indirect injection, plus protected material checks. Run safety pre-model and post-model so you don't pay for bad inputs or output PR disasters. Pair that with managed identity per agent,
100
00:08:27,880 --> 00:08:51,880
least-privilege RBAC, and PIM for temporary elevation on destructive tools. Every tool call gets traced; every approval gets logged. If someone asks who ran what, you don't open Outlook; Log Analytics already knows. Observability is grown-up mode. Turn on tracing so you can see run history: which agent called which tool, with what parameters, at what latency, and what came back. Wire Application Insights for performance patterns,
101
00:08:52,040 --> 00:09:17,560
app logs for audit, and ship evaluations into CI to catch drift in relevance, groundedness, and fluency before a user does. If groundedness drops, the reviewer flags it, the planner narrows the retrieval query, and the operator tries again. No human panic required. Deployment primitives keep it tidy. Use Foundry projects for team-scoped quotas and policy baselines. Put Azure API Management in front of your tool surface so you can version, throttle, implement circuit breakers, and route between PTU and pay-as-you-go inference.
102
00:09:17,880 --> 00:09:33,160
Private endpoints where data residency or compliance says "we're serious." Regions that match your users, not just your aspirations. And yes, Semantic Kernel ties the bow. You describe the workflow in code. The planner decomposes. The operator chooses tools based on function catalogs and availability.
103
00:09:33,160 --> 00:09:43,960
The reviewer enforces checks with evaluation prompts and Content Safety outcomes, and you get retries with backoff and cancellation tokens when an external API pretends to be down for maintenance.
104
00:09:44,440 --> 00:10:14,400
SK gives you the execution brain; Foundry gives you the operating theater. Put it together and you've got a practical pattern: define agents in Foundry with strict instructions and identities, orchestrate roles in SK, ground with curated search, wire actions with OpenAPI, Logic Apps, Functions, and Automation, enforce safety before and after inference, and observe everything. It's boring in the best way, like seat belts, or backups, or not giving Global Admin to the intern named "default key." Demo scenario one: device cleanup, CSI Intune.
105
00:10:14,400 --> 00:10:44,080
Ghost device unit. You know those devices that show up as managed, retired, and undead at the same time? Welcome to CSI Intune. Our job: find stale, duplicate, or unenrolled hardware and remediate safely, with approvals, so nobody rage-emails legal. Cast of agents. Monitor: queries Microsoft Graph and Log Analytics, classifies device states, flags anomalies. Remediator: calls Intune and Azure Automation to quarantine or retire devices. Auditor: logs everything,
106
00:10:44,080 --> 00:11:13,680
notifies humans, and writes the after-action novella for compliance. The flow runs like this. Monitor pulls from Graph: device records, last check-in, compliance state, enrollment type, primary user, and retirement flags. It also grabs Log Analytics signals: Defender heartbeat, update compliance, and Intune Management Extension events. It fuses those into a short structured record per device: last seen UTC, enrolled, compliant, duplicate candidates, risk signals. No novels, just fields. Now the brains-to-brawn handoff. Monitor sends a
107
00:11:13,680 --> 00:11:26,760
proposed classification: stale, duplicate, no heartbeat, or fine. Reviewer checks groundedness: did these fields actually come from Graph and LA? Content Safety runs pre-model to block prompt injection from any uploaded file or web
108
00:11:26,760 --> 00:11:43,480
grounding. If classification passes, Remediator prepares actions. Phase one is soft remediation: tag and quarantine. We apply a "candidate for retire" device category, remove dynamic group bindings that trigger policy sprawl, and, if needed, move the device into a quarantine Azure
109
00:11:43,480 --> 00:12:00,480
AD group that blocks access. No deletes yet. Approvals land in a Logic App. The reviewer includes a compact plan: device IDs, reasons, proposed actions, rollback. The approver sees a digest in Teams with citations to Graph queries and Log Analytics KQL.
110
00:12:00,480 --> 00:12:12,480
Approve sends a token back. Remediator executes: retire, wipe when justified, or clean duplicates by keeping the device with the freshest heartbeat and unlinking ghosts. Managed identity keeps our hands clean.
111
00:12:12,480 --> 00:12:20,320
Monitor runs with Device.Read.All. Remediator uses a separate identity with Device.ReadWrite.All and exactly the Intune permissions it needs.
112
00:12:20,320 --> 00:12:31,480
Delete is behind PIM. The remediator agent can't elevate itself. When elevation is requested, the Logic App requires a human, and the clock starts ticking on that temporary permission. Observability is the grown-up part.
113
00:12:31,480 --> 00:12:58,480
Every tool call is traced: who called what, parameters, latency, response. Auditor writes event records to Log Analytics with correlation IDs so you can reconstruct the crime scene without Slack archaeology. We also run an evaluation step post-change: groundedness, relevance, fluency. If groundedness drops, the reviewer blocks further destructive actions and kicks the plan back to the planner to refine retrieval or narrow scope. Models are mixed intentionally. Classification uses a small model to keep cost and latency low.
114
00:12:58,480 --> 00:13:26,480
Edge cases, like two devices where Autopilot renamed one and Azure AD thinks they're twins, get a reasoning pass with GPT-4o mini. Summaries are batched; nobody needs a Shakespearean report at 2 p.m. Micro story: last week the single-agent test tried to fix a stale device by retiring everything in the same dynamic group. The reviewer in our multi-agent flow flagged the scope mismatch, asked for device-level confirmation, and stopped it. That's why we separate plan, act, and check. End result: ghost devices stop haunting compliance,
115
00:13:26,480 --> 00:13:38,480
you reduce license waste, cut false positives, and stop explaining to leadership why "retire failed" is a lifestyle. And yes, the audit trail is readable, which is more than I can say for half your change tickets.
116
00:13:38,480 --> 00:13:57,480
Demo scenario two: zero-touch onboarding, actually zero touch. Zero touch for the user, 500 touches for engineering, until agents. Go from an HR event to a ready user and device without swivel chair: smooth, governed, and reversible when HR spells a last name three different ways. Team lineup. Intake:
117
00:13:57,480 --> 00:14:13,480
listens to the HR webhook, normalizes payloads, and validates required fields. Provisioner: creates the user in Entra, assigns groups and licenses, registers Autopilot profiles. Validator: checks policy compliance and Conditional Access posture before we hand over a shiny laptop.
118
00:14:13,480 --> 00:14:42,480
Concierge: sends the welcome pack, SharePoint links, and your first-day checklist. Flow time. Intake gets the HR payload: name, start date, role, department, manager, region. A small model extracts and normalizes, maps to canonical fields, and catches nonsense like a start date in 29. If required fields are missing, it pings HR with a structured error via Logic Apps. No freeform poetry. Provisioner calls Graph with the managed identity scoped to user write. It creates the Entra user, sets usage location, and applies a base group. Then it
119
00:14:42,480 --> 00:15:01,480
assembles license assignments based on role and region. Reasoning helps here: does this person need E5 Security or Business Premium? We encode policy in data and let the model pick from allowed bundles. No model gets to invent licenses; it's a strict menu. Autopilot comes next. Provisioner either registers a hardware hash we already have or prepares
120
00:15:01,480 --> 00:15:31,380
a just-in-time profile tied to the user's group. It tags the device groups for baseline policies. Then Validator wakes up. It queries Graph and Defender for compliance signals, verifies Conditional Access won't break first sign-in, and checks that the right device configuration policies will apply. If Conditional Access would block the new hire on day one, Validator opens an approval with a time-bound exception and the rationale. Concierge handles the human stuff. It posts a Teams message to the manager with the new hire's UPN, device assignment status, and a link to a SharePoint starter
121
00:15:31,380 --> 00:16:00,780
kit. It emails the new hire with first-day steps, MFA setup guidance, and where to find the IT portal. It does not paste your internal IP ranges into the email; Content Safety checks outputs for leakage and tone. Governance is the same seat-belt theme. Provisioner's managed identity has license write but no delete. Intake can't provision; it only validates. PIM gates any high-risk actions, like adding users to a sensitive group. All actions trace into App Insights and Log Analytics. If HR sends a corrected payload, the
122
00:16:00,780 --> 00:16:24,540
planner recalculates the delta and the operator updates only what changed. No yo-yoing group memberships. Observability gives you milestones: HR ingest, Entra create, license assignment, Autopilot profile, compliance verdict, notification sent. You get SLA metrics and automatic retries on transient Graph errors with backoff. If the HR system burps, the thread doesn't die; it pauses, logs, and resumes when fixed.
123
00:16:24,540 --> 00:16:38,300
Evaluations run in CI on synthetic onboarding payloads to catch regressions in relevance and groundedness before a real hire suffers. Model mix keeps the bill sane: SLM for extraction and normalization, reasoning pass only on policy selection decisions,
124
00:16:38,300 --> 00:16:46,540
summaries and welcome emails batched when possible. The target is low-latency provisioning with human-grade clarity in the audit trail. Quick story:
125
00:16:46,540 --> 00:16:52,780
single-agent onboarding tried to add a user to All Staff and Contractors because the title said "contractor staff."
126
00:16:52,780 --> 00:17:08,140
Multi-agent caught the mutual exclusivity rule in Validator, asked for a human decision, and avoided an immediate toaster fire in Conditional Access. Outcome: truly zero touch for the user, one thread, four agents, clean approvals, and a paper trail you can show your CISO without praying to the demo gods.
127
00:17:08,140 --> 00:17:14,340
And yes, when HR inevitably changes the start date, the system recalculates without torching the license budget.
128
00:17:14,340 --> 00:17:25,300
Demo scenario three: automated security hardening. Please let BitLocker be on. Security drift is like weeds: you pull three, five pop up, and one of them disables BitLocker out of spite.
129
00:17:25,300 --> 00:17:42,100
Our goal: detect config drift continuously and remediate safely, with gates a lawyer would respect. Cast. Drift Detector: queries Graph and Defender Secure Score, pulls Intune compliance and device config baselines, and builds a drift delta. Fixer: pushes Intune policy changes,
130
00:17:42,100 --> 00:17:50,020
BitLocker settings, and Windows Update rings, and runs device scripts via Azure Automation when policy can't reach far enough.
131
00:17:50,020 --> 00:17:57,620
Reviewer: enforces guardrails, scopes changes, runs groundedness checks, and routes human approvals for risky steps. Messenger:
132
00:17:57,620 --> 00:18:05,860
nudges humans in Teams or email with compact plans and receipts. Flow: Drift Detector runs on a schedule and on demand.
133
00:18:05,860 --> 00:18:19,620
It pulls device compliance, BitLocker status, TPM status, OS version, update ring, and policy assignments via Graph. It compares each device against a golden profile stored as structured data, not a vibes document.
134
00:18:19,620 --> 00:18:28,020
It tags findings: missing BitLocker, weak encryption method, policy not applied, Secure Boot off, update ring out of bounds. Planner pass kicks in:
135
00:18:28,020 --> 00:18:34,340
propose a minimal plan per device group. For BitLocker, use Intune BitLocker policy if the device is capable.
136
00:18:34,340 --> 00:18:42,100
If policy is already targeted but the status is off, propose an automation script to re-kick protectors and escrow the key to Azure AD. For update drift,
137
00:18:42,100 --> 00:18:49,700
propose moving the device to a staged update ring, not "patch now, hope." For Conditional Access snafus, propose a time-bound
138
00:18:49,700 --> 00:18:58,500
exception with required justification. Reviewer evaluates groundedness: did Drift Detector actually cite the fields from Graph and Defender?
139
00:18:58,500 --> 00:19:11,380
It checks scope: device IDs, policy IDs, change type. If anything smells like "all devices," it gets blocked with prejudice. Content Safety is on input and output to prevent prompt injection from any external knowledge or pasted logs.
140
00:19:11,380 --> 00:19:20,420
High-risk actions, like removing a conflicting local policy, get PIM-gated. The fixer's identity can't self-elevate. Approval lives in a Logic App with a tidy card:
141
00:19:20,420 --> 00:19:25,460
what's wrong, what will change, rollback steps, and citations. Approve, and Fixer executes.
142
00:19:25,460 --> 00:19:32,500
Intune policy updates are idempotent and targeted. The BitLocker fix uses a runbook with explicit parameters and a sanity check:
143
00:19:32,500 --> 00:19:42,180
TPM present, OS supports XTS-AES 256, escrow path verified. If any precheck fails, it bails with a human-friendly error and no side effects.
144
00:19:42,180 --> 00:19:51,860
Messenger closes the loop. Devices fixed? A digest hits Teams with counts, links to App Insights traces, and a CSV for the person who still loves Excel more than their family.
145
00:19:51,860 --> 00:20:00,980
If a device refuses to comply, Reviewer opens an exception thread with a clock and retries scheduled. Observability: every tool call traced, correlated by thread.
146
00:20:00,980 --> 00:20:06,420
We log deltas before and after so audit can see non-compliant to compliant without camping in KQL.
147
00:20:06,420 --> 00:20:11,460
We also run an evaluation agent on post-change summaries to check groundedness and clarity.
148
00:20:11,460 --> 00:20:18,500
If groundedness drops or failure rates spike, Planner narrows retrieval, reduces batch size, or pauses destructive actions.
149
00:20:18,500 --> 00:20:25,300
Model mix: SLM for extraction and classification, reasoning model for plan synthesis only when needed. Batch the nightly report.
150
00:20:25,300 --> 00:20:32,020
No sonnets about AES. End result: BitLocker is actually on, update rings drift less, exceptions are documented, time-bound, and reviewable,
151
00:20:32,020 --> 00:20:39,060
and nobody fixes a device by reimaging finance laptops during quarter close. Architecture breakdown: Foundry plus
152
00:20:39,060 --> 00:20:44,740
SK plus tools plus identity. Let's open the hood without spilling engine oil on your quota.
153
00:20:44,740 --> 00:20:52,020
Think four layers that keep this sane: Foundry, Semantic Kernel, tools, identity. Foundry is home base.
154
00:20:52,020 --> 00:20:57,620
You define agents with strict instructions, pick deployments per role, attach knowledge via Azure AI Search,
155
00:20:57,620 --> 00:21:05,140
and register actions: OpenAPI tools, Logic Apps, Functions, Graph, Azure Automation, Code Interpreter.
156
00:21:05,140 --> 00:21:11,060
Threads, run history, tracing, evaluations, and safety live here. Content Safety, prompt shields, and
157
00:21:11,060 --> 00:21:17,300
protected material checks run pre- and post-model so bad inputs don't cost tokens and bad outputs don't cost jobs. Semantic
158
00:21:17,300 --> 00:21:24,020
Kernel is the conductor. In code, you wire the Planner, Operator, Reviewer loop, you register function catalogs for tools, set
159
00:21:24,020 --> 00:21:30,900
retry backoff, and add cancellation tokens. You keep prompts short, push complexity into schemas, and let SK decide which tool to
160
00:21:30,900 --> 00:21:38,020
call based on availability and policy. If Graph times out, SK retries with jitter. If groundedness dips, it routes back to the
161
00:21:38,020 --> 00:21:45,540
Planner to refine. Tools are the hands. Keep contracts strict with OpenAPI. Put API Management in front to throttle, version, and
162
00:21:45,540 --> 00:21:51,300
circuit-break. Logic Apps handle approvals and long-running flows. Functions do glue and transformation. Azure
163
00:21:51,300 --> 00:21:57,540
Automation executes scripts with guardrails and post-checks. Graph is your source of truth for Intune, Entra, compliance,
164
00:21:57,540 --> 00:22:03,940
and Secure Score. Azure AI Search serves curated knowledge per domain. No knowledge soup. Identity is the seatbelt.
165
00:22:03,940 --> 00:22:10,580
Each agent has its own managed identity. Monitor agents get read scopes, remediators get write scopes, never admin.
166
00:22:10,580 --> 00:22:17,380
PIM gates anything destructive. Elevation requires a human and expires. Private endpoints where data boundaries matter.
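The identity rules above can be sketched as a small authorization check. This is an illustration only: the `ROLE_SCOPES` map and `is_allowed` function are hypothetical, and in a real deployment the enforcement comes from managed identities, Graph permissions, and PIM rather than application code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical role-to-scope map; real scopes would be Graph permissions or RBAC roles.
ROLE_SCOPES = {
    "monitor": {"read"},
    "remediator": {"read", "write"},
}


def is_allowed(role: str, action: str,
               elevation_expires: Optional[datetime] = None,
               now: Optional[datetime] = None) -> bool:
    """Return True if an agent with this role may perform the action right now."""
    now = now or datetime.now(timezone.utc)
    scopes = ROLE_SCOPES.get(role, set())
    if action == "delete":
        # Destructive actions need write scope plus a human-approved,
        # time-limited elevation that has not expired yet.
        return ("write" in scopes
                and elevation_expires is not None
                and now < elevation_expires)
    return action in scopes


# Usage: a remediator holding a one-hour elevation may delete; a monitor never may.
one_hour = datetime.now(timezone.utc) + timedelta(hours=1)
print(is_allowed("remediator", "delete", elevation_expires=one_hour))
print(is_allowed("monitor", "delete", elevation_expires=one_hour))
```

Unknown roles fall through to an empty scope set, so the default answer is always "no".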
167
00:22:17,380 --> 00:22:23,460
Policy baselines on Foundry projects enforce allowed models, regions, and quotas. App Insights and Log
168
00:22:23,460 --> 00:22:30,180
Analytics collect traces, metrics, and audits, so you can answer "who did what" without Slack archaeology. Models plug in through
169
00:22:30,180 --> 00:22:36,100
Foundry's inference endpoint. Use serverless for premium reasoning, managed compute for pinned open models. Swap
170
00:22:36,100 --> 00:22:42,020
models without rewriting code, but still evaluate relevance, groundedness, and fluency in CI before promoting.
171
00:22:42,020 --> 00:22:48,180
Regions match your users, data zones match your regulators. Put together, it's a boring, reliable machine: agents
172
00:22:48,180 --> 00:22:55,220
defined in Foundry, orchestrated by SK, powered by strict tools, locked down by identity, measured by observability,
173
00:22:55,220 --> 00:23:02,900
and guarded by safety. Boring wins. Boring scales. Boring lets you sleep. Best practices: ship without regret.
174
00:23:02,900 --> 00:23:08,580
OK, build-time wisdom so you don't spend Sunday undoing Saturday. Give each agent one job: Planner plans,
175
00:23:08,580 --> 00:23:14,340
Operator operates, Reviewer reviews. If an agent starts writing a heartfelt novel, cut its prompt and
176
00:23:14,340 --> 00:23:19,860
give the prose to a tool. Keep instructions short, scoped, and boring. You'll get fewer surprises. Prefer tool calls
177
00:23:19,860 --> 00:23:26,020
over long prose. Put contracts in OpenAPI: schemas on inputs and guardrails on outputs. If a tool expects device
178
00:23:26,020 --> 00:23:31,220
ID, send a device ID, not "maybe this device in finance". Models are creative. Tools are not. That's the
179
00:23:31,220 --> 00:23:37,140
point. Enforce least privilege like your job depends on it, because it does. Separate identities: read
180
00:23:37,140 --> 00:23:42,260
only for monitor, write only for remediate, and no identity has delete without PIM. Elevation is
181
00:23:42,260 --> 00:23:47,060
temporary, justified, and logged. If someone asks for permanent Global Admin, hand them a stress ball
182
00:23:47,060 --> 00:23:51,780
and a policy doc. Start small: two agents, doer and checker. Add a planner once the loop is stable.
183
00:23:51,780 --> 00:23:56,020
Then, if you must, fan out. Don't start with a digital scrum team of twelve. That's how you recreate
184
00:23:56,020 --> 00:24:01,700
your org chart in YAML and cry. Mix models to cut cost: SLMs for extraction, GPT-4 class only for
185
00:24:01,700 --> 00:24:06,660
reasoning where it matters. Batch summaries overnight. You're not paying premium rates to format a date
186
00:24:06,660 --> 00:24:12,900
string. Save the heavy model for decisions, not narration. Add memory last. Use external state, Azure
187
00:24:12,900 --> 00:24:17,700
AI Search or a small state store, before you start stuffing context windows like a carry-on bag.
188
00:24:17,700 --> 00:24:22,740
Thread metadata should be tiny. If it's not needed for the next turn, it doesn't go. Log everything.
189
00:24:22,740 --> 00:24:28,660
Turn on tracing: thread IDs, tool parameters, latency, results. Ship evaluations into CI so you catch
190
00:24:28,660 --> 00:24:33,300
relevance and groundedness drops before production does. Alert when groundedness dips, not when
191
00:24:33,300 --> 00:24:38,820
Twitter does. Guardrails before speed: approvals for destructive tools, dry-run mode with no side effects,
192
00:24:38,820 --> 00:24:44,260
staged rollouts by device group or region. If your rollout plan is "all at once", I hope your resume is
193
00:24:44,260 --> 00:24:49,300
updated. Build for failure: retries with jitter, backoff on Graph throttling, circuit breakers in API
194
00:24:49,300 --> 00:24:55,060
Management, idempotent runbooks. If a step can be rerun safely, you sleep better. Finally, document your
195
00:24:55,060 --> 00:25:00,980
policies as data: mutual exclusivity rules, license bundles, device baselines. Put them in structured form.
196
00:25:00,980 --> 00:25:06,020
Models can select; they shouldn't invent. That's how you keep agents helpful and your blast radius small.
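"Policies as data" might look like the following Python sketch. The `POLICIES` contents, bundle names, baseline names, and the `validate_selection` helper are illustrative assumptions, not taken from the episode or from any Microsoft API. The point is that a model's proposal is checked against structured data before any tool runs.

```python
# Hypothetical policy data; in practice this could live in a JSON file or a state store.
POLICIES = {
    "license_bundles": {"E3", "E5", "F3"},
    "mutually_exclusive": [{"E3", "E5"}],
    "device_baselines": {"windows-corp", "windows-kiosk"},
}


def validate_selection(bundles: set, baseline: str) -> list:
    """Return a list of violations; an empty list means the proposal is valid."""
    errors = []
    # The model may only pick bundles that exist in the policy data.
    unknown = bundles - POLICIES["license_bundles"]
    if unknown:
        errors.append(f"unknown bundles: {sorted(unknown)}")
    # No two bundles from the same exclusivity group may be assigned together.
    for group in POLICIES["mutually_exclusive"]:
        clash = bundles & group
        if len(clash) > 1:
            errors.append(f"mutually exclusive: {sorted(clash)}")
    # The baseline must be one of the documented ones.
    if baseline not in POLICIES["device_baselines"]:
        errors.append(f"unknown baseline: {baseline}")
    return errors
```

A Reviewer agent, or a plain function in front of the Operator, could reject any plan where this list is not empty and hand the violations back to the Planner.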
197
00:25:06,580 --> 00:25:12,260
If you remember one thing: single agents help, but teams of agents, planned, tool-driven, and permission
198
00:25:12,260 --> 00:25:17,540
scoped, quietly transform your environment while you pretend to sleep. If this hit home, subscribe and
199
00:25:17,540 --> 00:25:21,780
watch the follow-up, where I drop the Semantic Kernel repo and the Bicep template so you can steal
200
00:25:21,780 --> 00:25:26,180
shamelessly. Want the evaluation agents episode that keeps your prompts honest? It's queued up next.