Dec. 8, 2025

Planning, Collaboration, Tooling: Building Multi-Agent Systems with Azure Foundry + Semantic Kernel

You already know the meme: chatbots talk, agents act, multi-agent systems actually get stuff done.
If you’ve ever begged a bot to fix Intune and got a poem instead, this one’s for you. In this episode, we go full Netflix hands-on: you watch, you snack, I poke the dangerous Service Principal things so nobody nukes prod. We build a mini self-healing, governed multi-agent system using Azure AI Foundry + Semantic Kernel, wired into real enterprise surfaces:

Intune
Entra ID
Microsoft Graph
Azure Automation
Log Analytics

We run one-agent vs multi-agent head-to-head on a real workflow: 12 minutes vs 3 minutes time-to-fix, with only my subscription credit on the line. You’ll see why one agent stalls while teams fly, and how to ship this pattern safely in your own tenant.

🔥 What You’ll Learn

1. Why a Single Agent Isn’t Enough in the Enterprise

We start by tearing apart the “one giant agent” fantasy:

Single agents are like gas-station Swiss Army knives: technically they have tools, practically they bend on the first real job.
You stuff planning, reasoning, execution, approvals, and reporting into one prompt → context explodes, latency spikes, hallucinations creep in.
One agent trying to:
- Plan a change
- Call Graph and Intune
- Write remediation scripts
- Request approvals
- Verify results
- Document everything

…is basically a help desk, change board, and postmortem crammed into one very tired intern.

We break down what actually goes wrong:

Context windows flooded with logs, policies, and MDM miscellany
Important details get truncated or invented
Token usage and costs balloon
“Fix” attempts that quietly break other things (like deleting the resource instead of rotating a secret 😬)

Then we introduce the fix: Multi-agent = roles + boundaries + parallelism

Planner focuses on intent & constraints
Operator focuses on tools & execution
Reviewer focuses on guardrails & approvals

Each agent gets a tight instruction set, minimal memory, and a focused toolset, passing around small structured messages, not a 50-page policy doc.

2. Multi-Agent Systems 101 (No Hype, Just The Pattern)

We map out a clear, shippable mental model: think digital team, not one big brain.

Roles:

Planner — understands the goal, constraints, environment; outputs a stepwise plan with tool calls
Operator — executes the plan via tools: Graph, Azure Automation, Functions, Logic Apps, etc.
Reviewer — checks groundedness, scope, compliance, and safety before risky changes
Messenger/Concierge — interacts with humans: approvals, status updates, and audit summaries
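
The four roles above can be sketched as plain functions that hand each other small structured messages. This is a minimal illustration, not the episode's Semantic Kernel code: the names `Step`, `Plan`, `planner`, `reviewer`, `operator`, `messenger`, the `tag_device` tool, and the five-device scope limit are all assumptions made for the example. The reviewer runs before the operator here because the notes say it checks scope before risky changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    tool: str                  # name of the tool the operator should call
    args: dict                 # small, structured arguments only
    destructive: bool = False  # destructive steps always need a human


@dataclass
class Plan:
    goal: str
    steps: list = field(default_factory=list)


def planner(goal, device_ids):
    # Hypothetical planner: turns a goal into one tagging step per device.
    steps = [
        Step("tag_device", {"device_id": d, "tag": "candidate-for-retire"})
        for d in device_ids
    ]
    return Plan(goal=goal, steps=steps)


def reviewer(plan, max_devices=5):
    # Returns (approved, reason). Blocks wide scope and destructive steps.
    touched = {s.args["device_id"] for s in plan.steps}
    if len(touched) > max_devices:
        return False, f"scope too wide: {len(touched)} devices > {max_devices}"
    if any(s.destructive for s in plan.steps):
        return False, "destructive step needs human approval"
    return True, "ok"


def operator(plan, tools):
    # Executes each step by looking the tool up in an explicit catalog.
    return [{"tool": s.tool, "result": tools[s.tool](**s.args)} for s in plan.steps]


def messenger(plan, reason, results):
    # Produces the short human-facing summary for the audit trail.
    return f"{plan.goal}: {reason}; {len(results)} step(s) executed"


def run(goal, device_ids, tools):
    plan = planner(goal, device_ids)
    approved, reason = reviewer(plan)
    results = operator(plan, tools) if approved else []
    return messenger(plan, reason, results)
```

Calling `run("cleanup", ["d1", "d2"], tools)` with a catalog that contains a `tag_device` callable executes two steps; passing six device IDs is rejected by the reviewer before any tool is called.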

Core concepts:

Tools = hands
- REST APIs (Graph, internal services)
- Azure Automation runbooks (device scripts, remediation)
- Azure Functions & Logic Apps (glue & approvals)
- RAG via Azure AI Search (curated knowledge, not random web junk)
Memory = budget, not magic
- Minimize per-agent context
- Use external state (Search, state store, thread metadata)
- Only pass what’s needed for the next decision
Planning vs Execution
- Planner decomposes → Operator calls tools → Reviewer checks → Messenger tells humans
- This is where Semantic Kernel shines: planners, skills, function catalogs, retries, cancellation
Safety by design
- Managed Identities per agent
- RBAC split into read vs manage
- PIM for destructive operations
- Tool calls logged to Log Analytics
- Content Safety + prompt shields to block jailbreaks & indirect injection
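
One way to picture the "safety by design" bullets is a small authorization gate. The sketch below is illustrative only: the agent names, the scope labels (`device.read`, `device.manage`, `device.delete`), and the in-memory `audit_log` list are assumptions standing in for managed identities, RBAC roles, PIM, and Log Analytics. It does not call any Azure API.

```python
import json
import time

# Hypothetical per-agent scopes: read and manage are split on purpose.
AGENT_SCOPES = {
    "monitor": {"device.read"},
    "remediator": {"device.read", "device.manage"},
}

# Destructive scopes are never held permanently; they need a human approval.
DESTRUCTIVE_SCOPES = {"device.delete"}

audit_log = []  # stand-in for a Log Analytics table


def authorize(agent, scope, human_approved=False):
    """Return True if the agent may use the scope, and log the decision."""
    held = AGENT_SCOPES.get(agent, set())
    if scope in DESTRUCTIVE_SCOPES:
        # Only a manage-capable agent can be elevated, and only by a human.
        allowed = human_approved and "device.manage" in held
    else:
        allowed = scope in held
    audit_log.append(json.dumps({
        "ts": time.time(),
        "agent": agent,
        "scope": scope,
        "human_approved": human_approved,
        "allowed": allowed,
    }))
    return allowed
```

Every decision is logged, including refusals, so the audit trail answers "who tried what" as well as "who ran what".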

3. How Azure AI Foundry Powers Multi-Agent Workflows

We then show how Azure AI Foundry becomes the control room. You’ll see how to define agents with:

Instructions — short, role-specific prompts
Deployments — different models per role (GPT-4-class for planning, SLMs for extraction)
Knowledge — Azure AI Search indexes, uploaded docs, optional web grounding
Actions — OpenAPI tools, Graph, Logic Apps, Functions, Azure Automation, Code Interpreter
Connected agents — yes, one agent can call another like a tool

Why this matters:

Foundry handles threads, safety, tracing, and evaluations
Semantic Kernel orchestrates the planner → operator → reviewer loop in code
You keep prompts short and put power in tools with strict schemas
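
"Tools with strict schemas" can be as simple as rejecting any tool call whose arguments do not match an allow-list. The following is a hedged sketch using only the standard library; the schema name `RETIRE_DEVICE_SCHEMA`, the field names, and the three allowed actions are assumptions for illustration, not a Foundry or Graph contract.

```python
import re

# Hypothetical tool contract: a GUID-shaped device ID and a fixed action menu.
RETIRE_DEVICE_SCHEMA = {
    "device_id": re.compile(
        r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
    ),
    "action": re.compile(r"tag|quarantine|retire"),
}


def validate_args(args, schema):
    """Return a list of problems; an empty list means the call is well-formed."""
    errors = [f"unexpected field: {k}" for k in sorted(set(args) - set(schema))]
    errors += [f"missing field: {k}" for k in sorted(set(schema) - set(args))]
    for key, pattern in schema.items():
        if key in args:
            value = args[key]
            # fullmatch ensures the whole value conforms, not just a prefix.
            if not (isinstance(value, str) and pattern.fullmatch(value)):
                errors.append(f"invalid value for {key}")
    return errors
```

A call such as `{"device_id": "all", "action": "delete"}` fails both checks, so the operator never gets the chance to act on it.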

Model strategy:

Reasoning models for planning and complex decisions
Small models (SLMs) for extraction, classification, parameter shaping
Mix serverless endpoints and managed compute depending on cost & residency needs
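
The model strategy above amounts to a routing table from task type to deployment. A minimal sketch, assuming made-up deployment names (`reasoning-large`, `slm-small`, `slm-small-batch`) that you would replace with whatever your project actually deploys:

```python
from collections import defaultdict

# Hypothetical deployment names, chosen only for this example.
DEPLOYMENT_BY_TASK = {
    "plan": "reasoning-large",
    "review": "reasoning-large",
    "extract": "slm-small",
    "classify": "slm-small",
    "summarize": "slm-small-batch",
}


def pick_deployment(task):
    """Return the deployment for a task; unknown tasks fail instead of
    silently falling back to the most expensive model."""
    if task not in DEPLOYMENT_BY_TASK:
        raise ValueError(f"no deployment configured for task {task!r}")
    return DEPLOYMENT_BY_TASK[task]


def group_by_deployment(tasks):
    """Group tasks by deployment so each model can be called in one batch."""
    grouped = defaultdict(list)
    for task in tasks:
        grouped[pick_deployment(task)].append(task)
    return dict(grouped)
```

Keeping the table in data rather than in prompts makes it easy to swap a deployment when a model retires without touching agent code.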

Safety & observability:

Content Safety on inputs and outputs
Prompt shields against jailbreak and indirect injection
Full tracing of tool calls (who, what, where, how long)
Application Insights + Log Analytics for performance & audit
Built-in evaluation flows for groundedness, relevance, and fluency
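
To make "full tracing of tool calls (who, what, where, how long)" concrete, here is a small decorator that records one entry per call. It is a sketch under stated assumptions: the `TRACE` list stands in for Application Insights or Log Analytics, and `get_device` is a hypothetical tool, not a real Graph call.

```python
import functools
import time

TRACE = []  # stand-in for an Application Insights / Log Analytics sink


def traced(agent):
    """Decorator factory: record agent, tool, arguments, status, and duration."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(**kwargs):
            start = time.perf_counter()
            record = {"agent": agent, "tool": fn.__name__, "args": kwargs}
            try:
                result = fn(**kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = f"error: {type(exc).__name__}"
                raise
            finally:
                # Runs on success and failure, so failed calls are traced too.
                record["duration_ms"] = (time.perf_counter() - start) * 1000
                TRACE.append(record)
        return wrapper
    return decorator


@traced(agent="operator")
def get_device(device_id):
    # Hypothetical tool: a real one would query a device inventory API.
    return {"id": device_id, "compliant": True}
```

After `get_device(device_id="d1")`, `TRACE` holds one record naming the agent, the tool, the parameters, and the latency, which is the minimum an auditor needs to reconstruct a run.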

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-show-podcast--6704921/support.


Transcript

1
00:00:00,000 --> 00:00:03,880
You know: chatbots talk, agents act, and multi-agent systems actually get stuff done.

2
00:00:03,880 --> 00:00:08,000
If you've begged a bot to fix Intune and it replied with a poem, this is for you.

3
00:00:08,000 --> 00:00:11,600
Here's the deal. Netflix hands on. You watch. You eat snacks.

4
00:00:11,600 --> 00:00:13,800
I do the dangerous service principal things so nobody

5
00:00:13,800 --> 00:00:19,080
nukes prod. We're building a mini self-healing, governed multi-agent system with Azure AI

6
00:00:19,080 --> 00:00:25,360
Foundry and Semantic Kernel. Real touch points: Intune, Entra, Graph, Azure Automation, Log

7
00:00:25,360 --> 00:00:29,200
Analytics. One agent versus multi-agent, head to head: 12 minutes versus three.

8
00:00:29,520 --> 00:00:31,520
And the only casualty is my subscription credit.

9
00:00:31,520 --> 00:00:35,760
First, why one agent stalls while teams fly. Why one agent isn't enough.

10
00:00:35,760 --> 00:00:38,560
Single agents are the Swiss Army knife you got at a gas station.

11
00:00:38,560 --> 00:00:41,640
Technically it has tools; practically it bends on the first screw.

12
00:00:41,640 --> 00:00:45,640
You cram planning, reasoning, execution, approvals, and reporting into one prompt.

13
00:00:45,640 --> 00:00:49,720
And now it's context-starved, slow, and weirdly confident about the wrong thing.

14
00:00:49,720 --> 00:00:53,160
Most people think bigger model, bigger brain, boom, problem solved.

15
00:00:53,160 --> 00:00:55,440
But the real failure is role confusion.

16
00:00:55,520 --> 00:01:00,240
One agent trying to plan a change, call Graph, write a remediation script, request approval,

17
00:01:00,240 --> 00:01:05,360
verify the result and document it is like running a help desk, a change board,

18
00:01:05,360 --> 00:01:07,960
and a post-mortem with one very tired intern.

19
00:01:07,960 --> 00:01:11,160
Here's what actually happens. The single agent juggles too much state.

20
00:01:11,160 --> 00:01:12,760
Your prompt becomes a novella.

21
00:01:12,760 --> 00:01:16,120
The context window fills with logs, policies and half the MDM glossary.

22
00:01:16,120 --> 00:01:19,440
So it either truncates the important bits or hallucinates missing detail.

23
00:01:19,440 --> 00:01:23,680
Latency spikes, tokens burn, and retries feel like déjà vu with a bill.

24
00:01:24,520 --> 00:01:27,840
Enter multi-agent: roles, boundaries, parallelism.

25
00:01:27,840 --> 00:01:30,360
The planner focuses on intent and constraints.

26
00:01:30,360 --> 00:01:33,040
The operator focuses on tools and execution.

27
00:01:33,040 --> 00:01:37,160
The reviewer focuses on guardrails, approvals, and "are we grounded in facts?"

28
00:01:37,160 --> 00:01:42,000
Each agent keeps a clean short instruction set with the minimum memory it needs.

29
00:01:42,000 --> 00:01:45,760
They pass messages: tool results or a short summary, not the entire internet.

30
00:01:45,760 --> 00:01:47,600
Why this matters in the enterprise?

31
00:01:47,600 --> 00:01:51,520
You need end-to-end workflows with human-in-the-loop gates, identity-scoped actions,

32
00:01:51,520 --> 00:01:55,240
retries and audit trails. That's not a single threaded chat. That's a team sport.

33
00:01:55,240 --> 00:02:00,960
The planner-doer-reviewer pattern reduces risk and rework because every step has a second set of eyes,

34
00:02:00,960 --> 00:02:04,160
automated ones that aren't on hour 11 of a "quick" Teams meeting.

35
00:02:04,160 --> 00:02:08,680
Cost and latency win too. Use small language models for extraction and classification.

36
00:02:08,680 --> 00:02:11,800
Use premium reasoning models only where planning actually helps.

37
00:02:11,800 --> 00:02:14,360
Batch jobs for summaries while users sleep.

38
00:02:14,360 --> 00:02:17,840
You'll get fewer hallucinations, faster results, and lower bills.

39
00:02:18,360 --> 00:02:22,240
Micro story: I saw a one-bot-to-do-everything try to rotate a secret.

40
00:02:22,240 --> 00:02:27,040
It updated the key, forgot the app setting and fixed it by deleting the resource.

41
00:02:27,040 --> 00:02:29,600
That was the day we invented the reviewer agent.

42
00:02:29,600 --> 00:02:32,360
Multi-agent systems 101: clear, simple.

43
00:02:32,360 --> 00:02:34,240
Okay, so what is a multi-agent system

44
00:02:34,240 --> 00:02:37,120
you can actually ship? Think digital team, not one big brain.

45
00:02:37,120 --> 00:02:41,800
Each agent has a role, a tight instruction set and a limited toolbox.

46
00:02:41,800 --> 00:02:45,840
Planner understands the goal, constraints and environment.

47
00:02:46,080 --> 00:02:49,720
Produces a plan with tool calls the operator can execute.

48
00:02:49,720 --> 00:02:57,240
Operator calls tools (Graph, Logic Apps, Functions, Azure Automation runbooks) and returns structured results.

49
00:02:57,240 --> 00:03:04,080
Reviewer checks groundedness, compliance, scope, approvals, and safety outputs before anything risky happens.

50
00:03:04,080 --> 00:03:10,040
Messenger/concierge handles human notifications, approvals, and summaries for the audit trail.

51
00:03:10,040 --> 00:03:15,280
They communicate through messages and shared context the way grownups should: short, clear, and only what's needed.

52
00:03:15,560 --> 00:03:20,960
Sequential when actions depend on previous results, parallel when sub tasks can fan out safely.

53
00:03:20,960 --> 00:03:26,320
You don't send a 50-page policy; you send the relevant snippet, the device IDs, and the intended change.

54
00:03:26,320 --> 00:03:29,200
Tools are the hands: REST APIs for Microsoft Graph,

55
00:03:29,200 --> 00:03:35,280
RAG via Azure AI Search for curated knowledge, Code Interpreter for snippets, Logic Apps for orchestrated sequences,

56
00:03:35,280 --> 00:03:40,800
Functions for glue, Azure Automation for repeatable device tasks. Keep tool inputs structured with schemas,

57
00:03:40,800 --> 00:03:43,240
so the operator can't creatively delete your tenant.

58
00:03:43,880 --> 00:03:48,880
Memory is not magic, it's a budget. Minimize per-agent memory. Use external state: search indexes,

59
00:03:48,880 --> 00:03:51,440
vector stores, or a small state store for thread metadata.

60
00:03:51,440 --> 00:03:52,720
Keep context windows lean.

61
00:03:52,720 --> 00:03:58,280
If it's not needed for the next decision, it doesn't ride along. Planning versus execution is where Semantic Kernel shines.

62
00:03:58,280 --> 00:04:01,840
The planner delegates, the operator picks the right tool with function catalogs.

63
00:04:01,840 --> 00:04:06,040
The reviewer critiques outputs with evaluation prompts and Content Safety.

64
00:04:06,040 --> 00:04:10,920
If the reviewer flags a scope risk, the planner revises. Once you understand this loop,

65
00:04:10,920 --> 00:04:14,080
you stop writing 800-word prompts and start wiring clear roles.

66
00:04:14,080 --> 00:04:22,320
Safety boundaries are non-negotiable: identity-scoped tool calls with managed identities per agent, permissions split between read and manage,

67
00:04:22,320 --> 00:04:28,920
PIM for elevation on destructive actions, all tool calls logged to Log Analytics, Content Safety on both input and output,

68
00:04:28,920 --> 00:04:35,320
prompt shields for jailbreaks and indirect prompt injection from knowledge sources. Analogy time: researcher,

69
00:04:35,440 --> 00:04:41,400
analyst, operator, QA. Each is accountable for their part. The operator doesn't approve their own change.

70
00:04:41,400 --> 00:04:43,040
The analyst doesn't mess with production.

71
00:04:43,040 --> 00:04:46,960
The QA doesn't write the code they're approving. Simple, boring, safe.

72
00:04:46,960 --> 00:04:53,640
Why Azure AI Foundry plus Semantic Kernel? Foundry gives you agent orchestration, model choices, knowledge grounding,

73
00:04:53,640 --> 00:04:57,520
tool wiring, content safety, tracing, evaluations, and governance in one place.

74
00:04:57,520 --> 00:05:03,200
Semantic Kernel gives you planners, skills, pipelines, retries, and cancellation in code.

75
00:05:03,960 --> 00:05:12,080
Together you mix models per role: SLMs for extraction, GPT-4-class for planning, batch for summaries. And you keep your prompts short,

76
00:05:12,080 --> 00:05:16,040
your blast radius small, and your weekend yours. Now let's build the thing.

77
00:05:16,040 --> 00:05:19,200
How Azure AI Foundry enables multi agent workflows.

78
00:05:19,200 --> 00:05:24,080
Here's where Azure AI Foundry stops being a glossy portal and starts acting like a production shop.

79
00:05:24,080 --> 00:05:31,720
Foundry gives you places to define agents, connect models, add knowledge, wire actions and observe everything without living in six different blades and a wish.

80
00:05:32,120 --> 00:05:35,680
It's orchestration without duct tape. Start with agent definitions.

81
00:05:35,680 --> 00:05:43,480
In Foundry, each agent gets instructions: tight role guidance, not a manifesto. Deployment: the actual model behind it;

82
00:05:43,480 --> 00:05:48,080
mix GPT-4-class for planning with Phi-class SLMs for extraction. Knowledge:

83
00:05:48,080 --> 00:05:56,520
Azure AI Search indexes, uploaded files, or Bing grounding when you want web context. Actions: OpenAPI tools, Logic Apps, Functions, Graph,

84
00:05:56,800 --> 00:06:06,280
Azure Automation runbooks, or Code Interpreter for quick math and parsing. Connected agents: yes, an agent can treat another agent like a tool if that keeps the roles clean.

85
00:06:06,280 --> 00:06:10,680
Now this is important because Semantic Kernel plugs into that same universe in code.

86
00:06:10,680 --> 00:06:14,240
You build planners, define skills and register tools in function catalogs.

87
00:06:14,240 --> 00:06:21,040
You choreograph the planner, operator, reviewer loop and let Foundry handle the threads, history and safety rails.

88
00:06:21,040 --> 00:06:24,640
You keep prompts short and put the heavy lifting in tools with schemas.

89
00:06:24,840 --> 00:06:37,760
Translation: fewer hallucinations, fewer oops, more weekend. Model choices matter. Foundry abstracts inference through the model endpoint so you can swap models without rewriting your whole app. Two common lanes: serverless endpoints for premium models and quick iteration.

90
00:06:37,760 --> 00:06:45,840
You pay per token, scale on demand, and don't babysit GPUs like it's 2018. Managed compute for specific open models you want pinned to your tenant boundary.

91
00:06:45,840 --> 00:06:50,160
You pay by uptime and control the shape, but Microsoft handles the deployment plumbing.

92
00:06:50,680 --> 00:07:02,280
Mix models per role: planner on GPT-4o or 4.1 mini for reasoning, operator on a small model that excels at extraction and parameter shaping. Batch your long summaries overnight so humans get glossy reports by morning.

93
00:07:02,280 --> 00:07:12,680
And if a model retires or you want to try DeepSeek or Llama variants, Foundry's inference API lets you keep your code steady while you experiment. Tools and actions are where the operator earns its badge.

94
00:07:12,680 --> 00:07:19,800
This is where you bring real power without giving the agent keys to your kingdom. OpenAPI tools for internal services with Swagger.

95
00:07:19,960 --> 00:07:26,920
Keep the contract strict: if the schema says device ID is a string, it better be a device ID. Logic Apps for approval flows and human gates.

96
00:07:26,920 --> 00:07:37,120
Let the reviewer route changes to a change approval, then resume execution on yes. Azure Functions for glue code and sane transformations: small, testable, versioned. Graph API for Intune,

97
00:07:37,160 --> 00:07:51,800
and user, device, and Conditional Access tasks. Give read scopes to monitor agents, manage scopes only to remediators. Azure Automation for device scripts and runbooks: treat it like sharp objects in a locked drawer. Only the fixer agent can open it.

98
00:07:51,800 --> 00:07:57,720
RAG grounding: keep it curated. Use Azure AI Search with hybrid ranking and semantic re-ranking.

99
00:07:57,920 --> 00:08:27,880
Shard knowledge by domain, Intune policy separate from onboarding SOPs, so you don't ship the kitchen sink. The operator should pass a snippet, not the encyclopedia. Size your chunks, store citations, and have the reviewer check groundedness before it touches production. Safety is not optional. Foundry bakes in Content Safety filters for input and output, prompt shields for jailbreak and indirect injection, plus protected material checks. Run safety pre-model and post-model so you don't pay for bad inputs or output PR disasters. Pair that with managed identity per agent,

100
00:08:27,880 --> 00:08:51,880
least-privilege RBAC, and PIM for temporary elevation on destructive tools. Every tool call gets traced, every approval gets logged. If someone asks who ran what, you don't open Outlook; Log Analytics already knows. Observability is grown-up mode. Turn on tracing so you can see run history: which agent called which tool, with what parameters, at what latency, and what came back. Wire Application Insights for performance patterns,

101
00:08:52,040 --> 00:09:17,560
app logs for audit, and ship evaluations into CI to catch drift in relevance, groundedness, and fluency before a user does. If groundedness drops, the reviewer flags it, the planner narrows the retrieval query, and the operator tries again. No human panic required. Deployment primitives keep it tidy. Use Foundry projects for team-scoped quotas and policy baselines. Put Azure API Management in front of your tool surface so you can version, throttle, implement circuit breakers, and route between PTU and pay-as-you-go inference.

102
00:09:17,880 --> 00:09:33,160
Private endpoints where data residency or compliance says "we're serious." Regions that match your users, not just your aspirations. And yes, Semantic Kernel ties the bow. You describe the workflow in code: the planner decomposes, the operator chooses tools based on function catalogs and availability.

103
00:09:33,160 --> 00:09:43,960
The reviewer enforces checks with evaluation prompts and Content Safety outcomes, and you get retries with backoff and cancellation tokens when an external API pretends to be down for maintenance.

104
00:09:44,440 --> 00:10:14,400
SK gives you the execution brain; Foundry gives you the operating theater. Put it together and you've got a practical pattern: define agents in Foundry with strict instructions and identities, orchestrate roles in SK, ground with curated search, wire actions with OpenAPI, Logic Apps, Functions, and Automation, enforce safety before and after inference, and observe everything. It's boring in the best way, like seat belts or backups or not giving global admin to the intern named "default key." Demo scenario one: device cleanup, CSI Intune.

105
00:10:14,400 --> 00:10:44,080
Ghost device unit. You know those devices that show up as managed, retired, and undead at the same time? Welcome to CSI Intune. Our job: find stale, duplicate, or unenrolled hardware and remediate safely, with approval, so nobody rage-emails legal. Cast of agents: Monitor queries Microsoft Graph and Log Analytics, classifies device states, flags anomalies. Remediator calls Intune and Azure Automation to quarantine or retire devices. Auditor logs everything,

106
00:10:44,080 --> 00:11:13,680
notifies humans, and writes the after-action novella for compliance. The flow runs like this. Monitor pulls from Graph: device records, last check-in, compliance state, enrollment type, primary user, and retirement flags. It also grabs Log Analytics signals: Defender heartbeat, update compliance, and Intune Management Extension events. It fuses those into a short structured record per device: last seen, [unclear], enrolled, compliant, duplicate candidates, risk signals. No novels, just fields. Now the brains-to-brawn handoff. Monitor sends a

107
00:11:13,680 --> 00:11:26,760
proposed classification: stale, duplicate, no heartbeat, or fine. Reviewer checks groundedness: did these fields actually come from Graph and LA? Content Safety runs pre-model to block prompt injection from any uploaded file or web

108
00:11:26,760 --> 00:11:43,480
grounding. If classification passes, Remediator prepares actions. Phase one is soft remediation: tag and quarantine. We apply a "candidate for retire" device category, remove dynamic group bindings that trigger policy sprawl, and if needed move the device into a quarantine Azure

109
00:11:43,480 --> 00:12:00,480
AD group that blocks access. No deletes yet. Approvals land in a Logic App. The reviewer includes a compact plan: device IDs, reasons, proposed actions, rollback. The approver sees a digest in Teams with citations to Graph queries and Log Analytics KQL.

110
00:12:00,480 --> 00:12:12,480
Approve sends a token back. Remediator executes: retire, wipe when justified, or clean duplicates by keeping the device with the freshest heartbeat and unlinking ghosts. Managed identity keeps our hands clean.

111
00:12:12,480 --> 00:12:20,320
Monitor runs with Device.Read.All. Remediator uses a separate identity with Device.ReadWrite.All and exactly the Intune permissions it needs.

112
00:12:20,320 --> 00:12:31,480
Delete is behind PIM. The remediator agent can't elevate itself. When elevation is requested, the Logic App requires a human, and the clock starts ticking on that temporary permission. Observability is the grown-up part.

113
00:12:31,480 --> 00:12:58,480
Every tool call is traced: who called what, parameters, latency, response. Auditor writes event records to Log Analytics with correlation IDs so you can reconstruct the crime scene without Slack archaeology. We also run an evaluation step post-change: groundedness, relevance, fluency. If groundedness drops, the reviewer blocks further destructive actions and kicks the plan back to the planner to refine retrieval or narrow scope. Models are mixed intentionally. Classification uses a small model to keep cost and latency low;

114
00:12:58,480 --> 00:13:26,480
edge cases, like two devices where Autopilot renamed one and Azure AD thinks they're twins, get a reasoning pass with GPT-4o mini. Summaries are batched; nobody needs a Shakespearean report at 2 p.m. Micro story: last week the single-agent test tried to fix a stale device by retiring everything in the same dynamic group. The reviewer in our multi-agent flow flagged the scope mismatch, asked for device-level confirmation, and stopped it. That's why we separate plan, act, and check. End result: ghost devices stop haunting compliance,

115
00:13:26,480 --> 00:13:38,480
you reduce license waste, cut false positives, and stop explaining to leadership why "retire failed" is a lifestyle. And yes, the audit trail is readable, which is more than I can say for half your change tickets.

116
00:13:38,480 --> 00:13:57,480
Demo scenario two: zero-touch onboarding, actually zero touch. Zero touch for the user, 500 touches for engineering, until agents. Go from an HR event to a ready user and device without swivel chair: smooth, governed, and reversible when HR spells a last name three different ways. Team lineup: Intake

117
00:13:57,480 --> 00:14:13,480
listens to the HR webhook, normalizes payloads, and validates required fields. Provisioner creates the user in Entra, assigns groups and licenses, and registers Autopilot profiles. Validator checks policy compliance and Conditional Access posture before we hand over a shiny laptop.

118
00:14:13,480 --> 00:14:42,480
Concierge sends the welcome pack, SharePoint links, and your first-day checklist. Flow time. Intake gets the HR payload: name, start date, role, department, manager, region. A small model extracts and normalizes, maps to canonical fields, and catches nonsense like a start date in 29. If required fields are missing, it pings HR with a structured error via Logic Apps. No freeform poetry. Provisioner calls Graph with the managed identity scoped to user write. It creates the Entra user, sets usage location, and applies a base group. Then it

119
00:14:42,480 --> 00:15:01,480
assembles license assignments based on role and region. Reasoning helps here: does this person need E5 Security or Business Premium? We encode policy in data and let the model pick from allowed bundles. No model gets to invent licenses; it's a strict menu. Autopilot comes next. Provisioner either registers a hardware hash we already have or prepares

120
00:15:01,480 --> 00:15:31,380
a just-in-time profile tied to the user's group. It tags the device groups for baseline policies. Then Validator wakes up. It queries Graph and Defender for compliance signals, verifies Conditional Access won't break first sign-in, and checks that the right device configuration policies will apply. If Conditional Access would block the new hire on day one, Validator opens an approval with a time-bound exception and the rationale. Concierge handles the human stuff. It posts a Teams message to the manager with the new hire's UPN, device assignment status, and a link to a SharePoint starter

121
00:15:31,380 --> 00:16:00,780
kit. It emails the new hire with first-day steps, MFA setup guidance, and where to find the IT portal. It does not paste your internal IP ranges into the email. Content Safety checks outputs for leakage and tone. Governance is the same seat belt theme. Provisioner's managed identity has license write but no delete. Intake can't provision; it only validates. PIM gates any high-risk actions, like adding users to a sensitive group. All actions trace into App Insights and Log Analytics. If HR sends a corrected payload, the planner

122
00:16:00,780 --> 00:16:24,540
recalculates the delta and the operator updates only what changed. No yo-yoing group memberships. Observability gives you milestones: HR ingest, Entra create, license assignment, Autopilot profile, compliance verdict, notification sent. You get SLA metrics and automatic retries on transient Graph errors with backoff. If the HR system burps, the thread doesn't die; it pauses, logs, and resumes when fixed.

123
00:16:24,540 --> 00:16:38,300
Evaluations run in CI on synthetic onboarding payloads to catch regressions in relevance and groundedness before a real hire suffers. Model mix keeps the bill sane: SLM for extraction and normalization, reasoning pass only on policy selection decisions,

124
00:16:38,300 --> 00:16:46,540
summaries and welcome emails batched when possible. The target is low-latency provisioning with human-grade clarity in the audit trail. Quick story:

125
00:16:46,540 --> 00:16:52,780
single-agent onboarding tried to add a user to All Staff and Contractors because the title said "contractor staff."

126
00:16:52,780 --> 00:17:08,140
Multi-agent caught the mutual exclusivity rule in Validator, asked for a human decision, and avoided an immediate toaster fire in Conditional Access. Outcome: truly zero touch for the user. One thread, four agents, clean approvals, and a paper trail you can show your CISO without praying to the demo gods.

127
00:17:08,140 --> 00:17:14,340
And yes, when HR inevitably changes the start date, the system recalculates without torching the license budget.

128
00:17:14,340 --> 00:17:25,300
Demo scenario three: automated security hardening. Please let BitLocker be on. Security drift is like weeds: you pull three, five pop up, and one of them disables BitLocker out of spite.

129
00:17:25,300 --> 00:17:42,100
Our goal: detect config drift continuously and remediate safely, with gates a lawyer would respect. Cast: Drift Detector queries Graph and Defender Secure Score, pulls Intune compliance and device config baselines, and builds a drift delta. Fixer pushes Intune policy changes,

130
00:17:42,100 --> 00:17:50,020
BitLocker settings, and Windows Update rings, and runs device scripts via Azure Automation when policy can't reach far enough.

131
00:17:50,020 --> 00:17:57,620
Reviewer enforces guardrails, scopes changes, runs groundedness checks, and routes human approvals for risky steps. Messenger

132
00:17:57,620 --> 00:18:05,860
nudges humans in Teams or email with compact plans and receipts. Flow: Drift Detector runs on a schedule and on demand.

133
00:18:05,860 --> 00:18:19,620
It pulls device compliance, BitLocker status, TPM status, OS version, update ring, and policy assignments via Graph. It compares each device against a golden profile stored as structured data, not a vibes document.

134
00:18:19,620 --> 00:18:28,020
It tags findings: missing BitLocker, weak encryption method, policy not applied, Secure Boot off, update ring out of bounds. Planner pass kicks in:

135
00:18:28,020 --> 00:18:34,340
propose a minimal plan per device group. For BitLocker, use Intune BitLocker policy if the device is capable.

136
00:18:34,340 --> 00:18:42,100
If policy's already targeted but the status is off, propose an automation script to re-kick protectors and escrow the key to Azure AD. For update drift,

137
00:18:42,100 --> 00:18:49,700
propose moving the device to a staged update ring, not "patch now, hope." For Conditional Access snafus, propose a time-bound

138
00:18:49,700 --> 00:18:58,500
exception with required justification. Reviewer evaluates groundedness: did Drift Detector actually cite the fields from Graph and Defender?

139
00:18:58,500 --> 00:19:11,380
It checks scope: device IDs, policy IDs, change type. If anything smells like "all devices," it gets blocked with prejudice. Content Safety is on input and output to prevent prompt injection from any external knowledge or pasted logs.

140
00:19:11,380 --> 00:19:20,420
High-risk actions, like removing a conflicting local policy, get PIM-gated. The fixer's identity can't self-elevate. Approval lives in a Logic App with a tidy card:

141
00:19:20,420 --> 00:19:25,460
what's wrong, what will change, rollback steps, and citations. Approve, and Fixer executes.

142
00:19:25,460 --> 00:19:32,500
Intune policy updates are idempotent and targeted. BitLocker fix uses a runbook with explicit parameters and a sanity check:

143
00:19:32,500 --> 00:19:42,180
TPM present, OS supports XTS-AES 256, escrow path verified. If any pre-check fails, it bails with a human-friendly error and no side effects.

144
00:19:42,180 --> 00:19:51,860
Messenger closes the loop. Devices fixed? A digest hits Teams with counts, links to App Insights traces, and a CSV for the person who still loves Excel more than their family.

145
00:19:51,860 --> 00:20:00,980
If a device refuses to comply, Reviewer opens an exception thread with a clock and retries scheduled. Observability: every tool call traced, correlated by thread.

146
00:20:00,980 --> 00:20:06,420
We log deltas before and after so audit can see non-compliant to compliant without camping in KQL.

147
00:20:06,420 --> 00:20:11,460
We also run an evaluation agent on post-change summaries to check groundedness and clarity.

148
00:20:11,460 --> 00:20:18,500
If groundedness drops or failure rates spike, Planner narrows retrieval, reduces batch size, or pauses destructive actions.

149
00:20:18,500 --> 00:20:25,300
model mix slm for extraction and classification reasoning model for plans synthesis only when needed batch the nightly report

150
00:20:25,300 --> 00:20:32,020
no sonnets about a s and result bit locker is actually on update rings drift less exceptions are documented time bound and reviewable

151
00:20:32,020 --> 00:20:39,060
And nobody fixes a device by reimaging finance laptops during quarter close. Architecture breakdown: Foundry plus

152
00:20:39,060 --> 00:20:44,740
SK plus tools plus identity. Let's open the hood without spilling engine oil on your quota.

153
00:20:44,740 --> 00:20:52,020
Think four layers that keep this sane: Foundry, Semantic Kernel, tools, identity. Foundry is home base.

154
00:20:52,020 --> 00:20:57,620
You define agents with strict instructions, pick deployments per role, attach knowledge via Azure AI Search,

155
00:20:57,620 --> 00:21:05,140
and register actions: OpenAPI tools, Logic Apps, Functions, Graph, Azure Automation, Code Interpreter.

156
00:21:05,140 --> 00:21:11,060
Threads, run history, tracing, evaluations, and safety live here. Content Safety, Prompt Shields, and

157
00:21:11,060 --> 00:21:17,300
protected material checks run pre- and post-model, so bad inputs don't cost tokens and bad outputs don't cost jobs. Semantic

158
00:21:17,300 --> 00:21:24,020
Kernel is the conductor. In code, you wire the planner-operator-reviewer loop, you register function catalogs for tools, set

159
00:21:24,020 --> 00:21:30,900
retry backoff, and add cancellation tokens. You keep prompts short, push complexity into schemas, and let SK decide which tool to

160
00:21:30,900 --> 00:21:38,020
call based on availability and policy. If Graph times out, SK retries with jitter. If groundedness dips, it routes back to the

161
00:21:38,020 --> 00:21:45,540
Planner to refine. Tools are the hands. Keep contracts strict with OpenAPI. Put API Management in front to throttle, version, and

162
00:21:45,540 --> 00:21:51,300
circuit-break. Logic Apps handle approvals and long-running flows. Functions do glue and transformation. Azure

163
00:21:51,300 --> 00:21:57,540
Automation executes scripts with guardrails and post-checks. Graph is your source of truth for Intune, Entra, compliance,

164
00:21:57,540 --> 00:22:03,940
and Secure Score. Azure AI Search serves curated knowledge per domain; no knowledge soup. Identity is the seatbelt.

165
00:22:03,940 --> 00:22:10,580
Each agent has its own managed identity. Monitor agents get read scopes, remediators get write scopes, never admin.

166
00:22:10,580 --> 00:22:17,380
PIM gates anything destructive. Elevation requires a human and expires. Private endpoints where data boundaries matter.
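
The scope rules above (read for monitors, write for remediators, destructive actions only with a human-approved, expiring elevation) can be modeled as a tiny authorization check. This is an illustrative sketch only: `ROLE_SCOPES`, `Elevation`, and `is_allowed` are hypothetical names, and a real deployment would enforce this through Entra role assignments and PIM rather than application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed mapping of agent role to the scopes its identity holds by default.
ROLE_SCOPES = {
    "monitor": {"read"},
    "remediator": {"write"},
}


@dataclass(frozen=True)
class Elevation:
    """Hypothetical record of a human-approved, time-boxed elevation."""
    approved_by: str
    expires_at: datetime


def is_allowed(role: str, action: str,
               elevation: Optional[Elevation] = None,
               now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    if action == "delete":
        # Destructive: no role has it by default; needs a live, approved elevation.
        return (
            elevation is not None
            and bool(elevation.approved_by)
            and now < elevation.expires_at
        )
    return action in ROLE_SCOPES.get(role, set())
```

For example, a remediator with an elevation approved ten minutes ago and valid for one hour may delete; the same call after expiry returns `False` without any extra cleanup logic.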

167
00:22:17,380 --> 00:22:23,460
Policy baselines on Foundry projects enforce allowed models, regions, and quotas. App Insights and Log

168
00:22:23,460 --> 00:22:30,180
Analytics collect traces, metrics, and audits, so you can answer "who did what" without Slack archaeology. Models plug in through

169
00:22:30,180 --> 00:22:36,100
Foundry's inference endpoint. Use serverless for premium reasoning, managed compute for pinned open models. Swap

170
00:22:36,100 --> 00:22:42,020
models without rewriting code, but still evaluate relevance, groundedness, and fluency in CI before promoting.

171
00:22:42,020 --> 00:22:48,180
Regions match your users, data zones match your regulators. Put together, it's a boring, reliable machine: agents

172
00:22:48,180 --> 00:22:55,220
defined in Foundry, orchestrated by SK, powered by strict tools, locked down by identity, measured by observability,

173
00:22:55,220 --> 00:23:02,900
and guarded by safety. Boring wins. Boring scales. Boring lets you sleep. Best practices: ship without regret.

174
00:23:02,900 --> 00:23:08,580
Okay, build-time wisdom so you don't spend Sunday undoing Saturday. Give each agent one job. Planner plans,

175
00:23:08,580 --> 00:23:14,340
Operator operates, Reviewer reviews. If an agent starts writing a heartfelt novel, cut its prompt and

176
00:23:14,340 --> 00:23:19,860
give the prose to a tool. Keep instructions short, scoped, and boring. You'll get fewer surprises. Prefer tool calls

177
00:23:19,860 --> 00:23:26,020
over long prose. Put contracts in OpenAPI: schemas on inputs and guardrails on outputs. If a tool expects a device

178
00:23:26,020 --> 00:23:31,220
ID, send a device ID, not "maybe this device in finance." Models are creative. Tools are not. That's the

179
00:23:31,220 --> 00:23:37,140
point. Enforce least privilege like your job depends on it, because it does. Separate identities: read

180
00:23:37,140 --> 00:23:42,260
only for monitor, write only for remediate, and no identity has delete without PIM. Elevation is

181
00:23:42,260 --> 00:23:47,060
temporary, justified, and logged. If someone asks for permanent Global Admin, hand them a stress ball

182
00:23:47,060 --> 00:23:51,780
and a policy doc. Start small: two agents, doer and checker. Add a planner once the loop is stable.

183
00:23:51,780 --> 00:23:56,020
Then, if you must, fan out. Don't start with a digital scrum team of twelve. That's how you recreate

184
00:23:56,020 --> 00:24:01,700
your org chart in YAML and cry. Mix models to cut cost: SLMs for extraction, GPT-4 class only for

185
00:24:01,700 --> 00:24:06,660
reasoning where it matters. Batch summaries overnight. You're not paying premium rates to format a date

186
00:24:06,660 --> 00:24:12,900
string. Save the heavy model for decisions, not narration. Add memory last. Use external state, Azure

187
00:24:12,900 --> 00:24:17,700
AI Search or a small state store, before you start stuffing context windows like a carry-on bag.

188
00:24:17,700 --> 00:24:22,740
Thread metadata should be tiny. If it's not needed for the next turn, it doesn't go. Log everything.

189
00:24:22,740 --> 00:24:28,660
Turn on tracing: thread IDs, tool parameters, latency, results. Ship evaluations into CI so you catch

190
00:24:28,660 --> 00:24:33,300
relevance and groundedness drops before production does. Alert when groundedness dips, not when

191
00:24:33,300 --> 00:24:38,820
Twitter does. Guardrails before speed: approvals for destructive tools, dry-run mode with no side effects,

192
00:24:38,820 --> 00:24:44,260
staged rollouts by device group or region. If your rollout plan is "all at once," I hope your resume is

193
00:24:44,260 --> 00:24:49,300
updated. Build for failure: retries with jitter, backoff on Graph throttling, circuit breakers in API

194
00:24:49,300 --> 00:24:55,060
Management, idempotent runbooks. If a step can be rerun safely, you sleep better. Finally, document your

195
00:24:55,060 --> 00:25:00,980
policies as data: mutual exclusivity rules, license bundles, device baselines. Put them in structured form.

196
00:25:00,980 --> 00:25:06,020
Models can select; they shouldn't invent. That's how you keep agents helpful and your blast radius small.
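
As a rough illustration of "policies as data," the sketch below keeps license bundles and mutual exclusivity rules in a plain structure and validates whatever an agent proposes against it. Everything here is hypothetical: the bundle names are placeholders rather than real SKUs, and `POLICY` and `violations` are names invented for this example.

```python
# Hypothetical policy data; bundle names are placeholders, not real SKUs.
POLICY = {
    "license_bundles": {
        "bundle-basic": ["mail", "chat"],
        "bundle-full": ["mail", "chat", "desktop-apps"],
        "bundle-kiosk": ["chat"],
    },
    "mutually_exclusive": [("bundle-basic", "bundle-full")],
}


def violations(selected: list[str], policy: dict = POLICY) -> list[str]:
    """Return reasons a proposed selection breaks policy (empty list = OK)."""
    problems = []
    known = policy["license_bundles"]
    for name in selected:
        if name not in known:
            # The model invented something that is not in the catalog.
            problems.append(f"unknown bundle: {name}")
    chosen = set(selected)
    for a, b in policy["mutually_exclusive"]:
        if a in chosen and b in chosen:
            problems.append(f"mutually exclusive: {a} + {b}")
    return problems
```

The model's output is treated as a selection from a known catalog, so an invented bundle or a conflicting pair is rejected before any tool runs.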

197
00:25:06,580 --> 00:25:12,260
If you remember one thing: single agents help, but teams of agents, planned, tool-driven, and permission

198
00:25:12,260 --> 00:25:17,540
scoped, quietly transform your environment while you pretend to sleep. If this hit home, subscribe and

199
00:25:17,540 --> 00:25:21,780
watch the follow-up, where I drop the Semantic Kernel repo and the Bicep template so you can steal

200
00:25:21,780 --> 00:25:26,180
shamelessly. Want the evaluation agents episode that keeps your prompts honest? It's queued up next.
