The Embodied Lie: How the Speaking Agent Obscures Architectural Entropy
It sounds governed, it feels safe, and every log lines up, yet the system still does the wrong thing. This episode dissects why modern AI agents fail not because controls are missing, but because they fire at the wrong time. We walk through how enterprises obsess over visibility (transcripts, logs, identities, conditional access) while ignoring the moment that actually matters: execution. Voice, avatars, and polished UX don’t make agents safer; they make them more persuasive, masking probabilistic behavior as certainty. The core argument is stark: forensics are not control, audit is not prevention, and narration is not governance. Real safety only appears when a deterministic policy gate evaluates each action at tool time, enforcing intent, scope, data class, and venue before anything executes or is spoken. Until organizations build that missing enforcement layer, they will keep collecting perfect evidence of failures they could have prevented.
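To make the idea of a tool-time gate concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any Microsoft feature. Every name in it (ActionRequest, Decision, evaluate, execute_with_gate, the rule IDs, the policy tables, the site paths) is hypothetical. The only things taken from the episode are the attributes the gate checks (intent, scope, data class, venue), the operation ID, and the idea that a decision is produced and recorded before the tool runs.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class ActionRequest:
    """Structured proposal from the agent: a request, not a command."""
    operation_id: str
    tool: str        # e.g. "delete_site"
    intent: str      # what the user asked for
    scope: str       # target resource
    data_class: str  # e.g. "internal", "confidential"
    venue: str       # where the result would be exposed


@dataclass(frozen=True)
class Decision:
    """Decision artifact stored next to the action."""
    operation_id: str
    verdict: Verdict
    rule_id: str
    reason: str


# Hypothetical policy tables; a real deployment would load authoritative state.
ALLOWED_TOOLS_BY_INTENT = {
    "cleanup_obsolete_sites": {"archive_site", "delete_site"},
    "summarize_document": {"read_file", "post_message"},
}
APPROVED_SCOPES = {"sites/project-alpha-2019"}
EXTERNAL_VENUES = {"meeting_with_guests", "external_email"}


def evaluate(req: ActionRequest) -> Decision:
    """Deterministic checks: same request and same state give the same verdict."""
    if req.tool not in ALLOWED_TOOLS_BY_INTENT.get(req.intent, set()):
        return Decision(req.operation_id, Verdict.DENY, "R1-intent",
                        "tool not permitted for stated intent")
    if req.tool in {"delete_site", "archive_site"} and req.scope not in APPROVED_SCOPES:
        return Decision(req.operation_id, Verdict.DENY, "R2-scope",
                        "target is not on the approved list")
    if req.data_class == "confidential" and req.venue in EXTERNAL_VENUES:
        return Decision(req.operation_id, Verdict.DENY, "R3-venue",
                        "confidential data in an external venue")
    return Decision(req.operation_id, Verdict.ALLOW, "R0-all-checks-passed",
                    "no rule denied the request")


def execute_with_gate(req, tool_fn, decision_log):
    """Record the decision first; run the tool only on ALLOW."""
    decision = evaluate(req)
    decision_log.append(decision)
    if decision.verdict is Verdict.ALLOW:
        tool_fn(req)
    return decision
```

In this sketch the agent never calls a tool directly. It hands an ActionRequest to execute_with_gate, and the tool function runs only if evaluate returns ALLOW. A valid token alone is never enough: a delete aimed at a site outside APPROVED_SCOPES is denied under the hypothetical rule R2-scope, and the denial is logged as a queryable record rather than a narrated explanation.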
Modern AI agents don’t just act, they speak. And that voice changes how we perceive risk, control, and system integrity. In this episode, we unpack “the embodied lie”: how giving AI agents a conversational interface masks architectural drift, hides decision entropy, and creates a dangerous illusion of coherence. When systems talk fluently, we stop inspecting them. This episode explores why that’s a problem, and why no amount of UX polish, prompts, or DAX-like logic can compensate for decaying architectural intent.
Key Topics Covered
- What “Architectural Entropy” Really Means
How complex systems naturally drift away from their original design, especially when governed by probabilistic agents.
- The Speaking Agent Problem
Why voice, chat, and persona-driven agents create a false sense of authority, intentionality, and correctness.
- Why Observability Breaks When Systems Talk
How conversational interfaces collapse multiple execution layers into a single narrative output.
- The Illusion of Control
Why hearing reasons from an agent is not the same as having guarantees about system behavior.
- Agents vs. Architecture
The difference between systems that decide and systems that merely explain after the fact.
- Why UX Cannot Fix Structural Drift
How better prompts, better explanations, or better dashboards fail to address root architectural decay.
Key Takeaways
- A speaking agent is not transparency — it’s compression.
- Fluency increases trust while reducing scrutiny.
- Architectural intent cannot be enforced at the interaction layer.
- Systems don’t fail loudly anymore — they fail persuasively.
- If your system needs to explain itself constantly, it’s already drifting.
Who This Episode Is For
- Platform architects and system designers
- AI engineers building agent-based systems
- Security and identity professionals
- Data and analytics leaders
- Anyone skeptical of “AI copilots” as a governance strategy
Notable Quotes
- “When the system speaks, inspection stops.”
- “Explanation is not enforcement.”
- “The agent doesn’t lie — the embodiment does.”
Final Thought
The future risk of AI isn’t that systems act autonomously; it’s that they sound convincing while doing so. If we don’t separate voice from architecture, we’ll keep trusting systems that can no longer prove they’re under control.
Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-show-modern-work-security-and-productivity-with-microsoft-365--6704921/support.
1
00:00:00,000 --> 00:00:02,200
At 0902, the agent signs in.
2
00:00:02,200 --> 00:00:06,880
Conditional access evaluates once, passes, and a token gets issued.
3
00:00:06,880 --> 00:00:11,000
At 0904, the meeting changes: an external guest joins, a channel gets renamed,
4
00:00:11,000 --> 00:00:13,200
a document link shifts, whatever.
5
00:00:13,200 --> 00:00:14,360
Context moves.
6
00:00:14,360 --> 00:00:20,920
At 0907, the agent executes a destructive tool call anyway, inside the workload with a still valid token.
7
00:00:20,920 --> 00:00:23,240
At 0908, Purview has the transcript.
8
00:00:23,240 --> 00:00:24,760
Copilot logs have the activity.
9
00:00:24,760 --> 00:00:28,080
The identity is correct, the timestamps are correct, the story is perfect.
10
00:00:28,080 --> 00:00:32,720
Every control worked, every log is correct, and the system still did the wrong thing.
11
00:00:32,720 --> 00:00:33,600
What this is not:
12
00:00:33,600 --> 00:00:36,280
Anti-voice, anti-UX, anti-Microsoft.
13
00:00:36,280 --> 00:00:37,880
This is not an anti-voice rant.
14
00:00:37,880 --> 00:00:43,320
Voice is useful, avatars are useful, accessibility matters, and real-time interaction matters.
15
00:00:43,320 --> 00:00:48,120
A speaking interface can lower friction, reduce cognitive load, and make a system usable
16
00:00:48,120 --> 00:00:50,560
for people who would never type into a chat box.
17
00:00:50,560 --> 00:00:52,760
This is also not an anti-Microsoft episode.
18
00:00:52,760 --> 00:00:56,880
Microsoft has shipped real governance improvements that most platforms still don't have.
19
00:00:56,880 --> 00:01:01,320
Purview can capture transcripts, Copilot Studio can log activities, Entra has a clearer model
20
00:01:01,320 --> 00:01:03,840
for workload identities and non-human identities.
21
00:01:03,840 --> 00:01:06,120
Conditional access exists and it's mature.
22
00:01:06,120 --> 00:01:07,200
Those are not small things.
23
00:01:07,200 --> 00:01:10,640
That is the scaffolding you need for operating agents at enterprise scale.
24
00:01:10,640 --> 00:01:14,040
But this is the boundary line the industry keeps refusing to say out loud.
25
00:01:14,040 --> 00:01:15,400
Forensics are not control.
26
00:01:15,400 --> 00:01:17,400
Audit tells you what happened after the fact.
27
00:01:17,400 --> 00:01:19,240
It gives you a narrative you can export.
28
00:01:19,240 --> 00:01:23,600
It helps legal, it helps incident response, it helps you argue with reality less.
29
00:01:23,600 --> 00:01:28,200
None of that prevents an allowed identity from doing a wrong thing at the moment of execution.
30
00:01:28,200 --> 00:01:30,200
And that's why the format of this episode matters.
31
00:01:30,200 --> 00:01:33,920
This isn't a tutorial on building agents, there won't be configuration walkthroughs, no
32
00:01:33,920 --> 00:01:35,400
click here demos.
33
00:01:35,400 --> 00:01:36,400
This is an autopsy.
34
00:01:36,400 --> 00:01:39,600
Claim, failure pattern, architectural cause, consequence.
35
00:01:39,600 --> 00:01:41,600
Because the failure isn't that teams lack tools.
36
00:01:41,600 --> 00:01:45,360
The failure is that teams keep buying comfort instead of determinism.
37
00:01:45,360 --> 00:01:49,760
The embodied lie: trust signaling wrapped around probabilistic execution.
38
00:01:49,760 --> 00:01:50,840
Here's the embodied lie.
39
00:01:50,840 --> 00:01:54,080
A voice and a face are not features; they are trust signals.
40
00:01:54,080 --> 00:01:58,000
They're a human interface hack that makes a probabilistic system feel like a deterministic
41
00:01:58,000 --> 00:01:59,000
one.
42
00:01:59,000 --> 00:02:02,440
And the moment you add them, you change how people evaluate risk.
43
00:02:02,440 --> 00:02:05,920
The thing most people miss is that the speaking agent isn't just an agent.
44
00:02:05,920 --> 00:02:08,200
It's an execution engine wearing a personality.
45
00:02:08,200 --> 00:02:10,200
The avatar doesn't make the agent more accurate.
46
00:02:10,200 --> 00:02:11,960
It makes the output more persuasive.
47
00:02:11,960 --> 00:02:14,880
That distinction matters because persuasion is not governance.
48
00:02:14,880 --> 00:02:17,600
In architectural terms, the agent is not a teammate.
49
00:02:17,600 --> 00:02:19,520
It is a distributed decision engine.
50
00:02:19,520 --> 00:02:23,880
It takes an input, retrieves some context, chooses a tool and executes an action.
51
00:02:23,880 --> 00:02:27,880
The choice is probabilistic, the retrieval is probabilistic, tool selection is probabilistic.
52
00:02:27,880 --> 00:02:31,480
Even when it's grounded, it's grounded in whatever it retrieved, not in what your intent
53
00:02:31,480 --> 00:02:32,480
actually was.
54
00:02:32,480 --> 00:02:36,760
Now add embodiment: low-latency speech, smooth turn-taking, a confident tone.
55
00:02:36,760 --> 00:02:38,040
Humans read those as competence.
56
00:02:38,040 --> 00:02:42,040
They stop asking "what approved this?" and start accepting "it sounded right."
57
00:02:42,040 --> 00:02:45,920
That's human interface trust bias doing what it always does, shifting scrutiny away from
58
00:02:45,920 --> 00:02:48,280
the control plane and onto the performance.
59
00:02:48,280 --> 00:02:50,400
That's why governance gets worse when you add a face.
60
00:02:50,400 --> 00:02:53,080
The organization starts optimizing for the experience plane.
61
00:02:53,080 --> 00:02:57,000
Prompt tweaks, persona tuning, make it sound more cautious.
62
00:02:57,000 --> 00:02:58,520
Add a confirmation question.
63
00:02:58,520 --> 00:02:59,600
Those are theater patches.
64
00:02:59,600 --> 00:03:01,440
They don't change the system's blast radius.
65
00:03:01,440 --> 00:03:02,760
They don't enforce intent.
66
00:03:02,760 --> 00:03:08,240
They don't create a deterministic gate between the agent's proposal and the platform's execution.
67
00:03:08,240 --> 00:03:12,320
This clicked for me when I watched teams celebrate transcripts as if they were safety.
68
00:03:12,320 --> 00:03:13,560
A transcript is not safety.
69
00:03:13,560 --> 00:03:15,440
A transcript is a post-incident artifact.
70
00:03:15,440 --> 00:03:16,440
It's a replay.
71
00:03:16,440 --> 00:03:17,440
It's a confession
72
00:03:17,440 --> 00:03:20,520
you hand to counsel when the action already happened.
73
00:03:20,520 --> 00:03:23,160
The system did not become safer because it can narrate itself.
74
00:03:23,160 --> 00:03:26,600
Now, Microsoft will say the right things here and they're not wrong.
75
00:03:26,600 --> 00:03:28,760
Conditional access evaluates at token acquisition.
76
00:03:28,760 --> 00:03:30,800
Purview can capture interactions.
77
00:03:30,800 --> 00:03:31,800
Activity logs exist.
78
00:03:31,800 --> 00:03:33,280
Workload identity controls exist.
79
00:03:33,280 --> 00:03:34,280
That's the ticket booth.
80
00:03:34,280 --> 00:03:35,280
That's the camera system.
81
00:03:35,280 --> 00:03:36,280
That's the audit trail.
82
00:03:36,280 --> 00:03:41,360
But the embodied lie lives in the gap between those controls and the moment a tool call
83
00:03:41,360 --> 00:03:42,680
executes.
84
00:03:42,680 --> 00:03:45,000
Token-time controls decide who can show up.
85
00:03:45,000 --> 00:03:47,560
Tool-time controls decide what is allowed to happen next.
86
00:03:47,560 --> 00:03:49,960
Most organizations build only the first one.
87
00:03:49,960 --> 00:03:53,080
Then they act surprised when the second one behaves like a suggestion.
88
00:03:53,080 --> 00:03:57,400
And this is where the speaking agent becomes an entropy generator because the more human
89
00:03:57,400 --> 00:04:01,440
it seems, the more likely you are to let it run with broad scopes, the more likely you
90
00:04:01,440 --> 00:04:06,040
are to skip segmentation, the more likely you are to accept "it's logged" as a substitute
91
00:04:06,040 --> 00:04:07,800
for "it's prevented."
92
00:04:07,800 --> 00:04:12,960
Over time, you accumulate permissions, exceptions, and implicit trust until you have conditional
93
00:04:12,960 --> 00:04:13,960
chaos.
94
00:04:13,960 --> 00:04:18,160
And that behaves correctly most of the time, right up until the moment it doesn't.
95
00:04:18,160 --> 00:04:21,920
So when the agent speaks with certainty, treat that as a warning, not a reassurance.
96
00:04:21,920 --> 00:04:23,360
You are not hearing determinism.
97
00:04:23,360 --> 00:04:27,000
You are hearing probability wrapped in a voice that implies accountability.
98
00:04:27,000 --> 00:04:30,600
The control plane versus the experience plane: two timelines that don't meet.
99
00:04:30,600 --> 00:04:34,320
There are two timelines running every time an agent helps someone.
100
00:04:34,320 --> 00:04:38,320
Most organizations only instrument one of them because it's the one humans notice.
101
00:04:38,320 --> 00:04:39,880
That's the experience plane.
102
00:04:39,880 --> 00:04:43,840
It includes the chat transcript, the speaking voice, the avatar, the response latency,
103
00:04:43,840 --> 00:04:48,760
the citations, the little thinking indicator, and the meeting dynamics where nobody wants
104
00:04:48,760 --> 00:04:53,320
to slow the room down by arguing with a confident sounding assistant.
105
00:04:53,320 --> 00:04:56,440
It's perception management, it's social flow, it's persuasion at scale.
106
00:04:56,440 --> 00:04:59,480
The other timeline is the only one that matters when something breaks.
107
00:04:59,480 --> 00:05:01,160
That's the control plane.
108
00:05:01,160 --> 00:05:07,280
Identity issuance, token lifetime, scope, retrieval boundaries, tool invocation, side effects,
109
00:05:07,280 --> 00:05:13,720
state transitions, retry behavior, compensating actions, data class enforcement, venue enforcement.
110
00:05:13,720 --> 00:05:16,360
That's the plane where blast radius is defined.
111
00:05:16,360 --> 00:05:19,840
And the uncomfortable truth is that these two timelines don't line up.
112
00:05:19,840 --> 00:05:21,600
They rarely even touch.
113
00:05:21,600 --> 00:05:25,920
Because the platform's strongest controls tend to fire at token time while the damage
114
00:05:25,920 --> 00:05:27,800
happens at tool time.
115
00:05:27,800 --> 00:05:29,360
Conditional access is a perfect example.
116
00:05:29,360 --> 00:05:30,560
It's the ticket booth.
117
00:05:30,560 --> 00:05:32,080
It answers a narrow question.
118
00:05:32,080 --> 00:05:35,360
Should this identity get a token right now under current conditions?
119
00:05:35,360 --> 00:05:40,520
It can evaluate signals: risk, device posture, location. It can deny, it can require stronger
120
00:05:40,520 --> 00:05:41,520
auth.
121
00:05:41,520 --> 00:05:42,520
That is real control.
122
00:05:42,520 --> 00:05:45,240
If the token exists, the train leaves the station.
123
00:05:45,240 --> 00:05:46,920
Now the system is in the workload.
124
00:05:46,920 --> 00:05:51,400
Tool selection happens, data gets read, writes happen, shares happen, deletes happen, and
125
00:05:51,400 --> 00:05:54,920
the control plane is often no longer in the loop in a deterministic way.
126
00:05:54,920 --> 00:05:59,120
You've moved from who may show up to what is happening, and most enterprises have no enforcement
127
00:05:59,120 --> 00:06:00,120
point in the middle.
128
00:06:00,120 --> 00:06:02,720
Purview is the other half of the same mismatch.
129
00:06:02,720 --> 00:06:04,640
Purview is the security camera system.
130
00:06:04,640 --> 00:06:09,000
It records, it correlates, it lets you do forensics after the fact, it's useful, and
131
00:06:09,000 --> 00:06:10,000
it's getting better.
132
00:06:10,000 --> 00:06:11,520
But cameras do not stop the train.
133
00:06:11,520 --> 00:06:14,440
They just help you reconstruct which door was forced and when.
134
00:06:14,440 --> 00:06:18,400
The reason this gap keeps surprising people is that the experience plane looks like control.
135
00:06:18,400 --> 00:06:19,520
The agent speaks calmly.
136
00:06:19,520 --> 00:06:20,520
It cites a document.
137
00:06:20,520 --> 00:06:22,200
It says, "based on policy."
138
00:06:22,200 --> 00:06:25,660
It feels governed, and because it feels governed, people assume the control plane must have
139
00:06:25,660 --> 00:06:26,660
approved it.
140
00:06:26,660 --> 00:06:28,120
That assumption is false.
141
00:06:28,120 --> 00:06:30,760
A citation is not an authorization decision.
142
00:06:30,760 --> 00:06:32,960
A transcript is not a policy evaluation.
143
00:06:32,960 --> 00:06:35,840
A token issuance event is not a per-action gate.
144
00:06:35,840 --> 00:06:38,520
If you want a mental model you can hold in your head, use the rail system.
145
00:06:38,520 --> 00:06:40,280
The ticket booth is conditional access.
146
00:06:40,280 --> 00:06:42,240
It can stop someone from entering the station.
147
00:06:42,240 --> 00:06:45,800
It cannot stop them from pulling the emergency brake once they're on the train.
148
00:06:45,800 --> 00:06:46,960
The cameras are Purview.
149
00:06:46,960 --> 00:06:48,840
They can tell you which car it happened in.
150
00:06:48,840 --> 00:06:50,320
They cannot prevent the derailment.
151
00:06:50,320 --> 00:06:52,240
The missing role is the guard on the train.
152
00:06:52,240 --> 00:06:56,880
The deterministic policy gate that evaluates each action at the moment it is about to execute.
153
00:06:56,880 --> 00:06:59,640
And that's the heart of the architectural lie.
154
00:06:59,640 --> 00:07:04,160
Organizations keep building governance around artifacts that exist before and after execution,
155
00:07:04,160 --> 00:07:05,440
but not at execution.
156
00:07:05,440 --> 00:07:08,240
So you get beautiful audit trails and ugly outcomes.
157
00:07:08,240 --> 00:07:11,520
This also explains why embodiment makes the problem worse.
158
00:07:11,520 --> 00:07:15,000
The more polished the experience plane becomes, the more it masks the absence of control plane
159
00:07:15,000 --> 00:07:16,000
enforcement.
160
00:07:16,000 --> 00:07:18,480
The organization feels safer because it can see more.
161
00:07:18,480 --> 00:07:21,640
But visibility without gating is just higher resolution regret.
162
00:07:21,640 --> 00:07:26,640
Once you separate the two planes, you stop arguing about whether the platform has governance.
163
00:07:26,640 --> 00:07:27,640
It does.
164
00:07:27,640 --> 00:07:31,120
You start arguing about where in the timeline governance actually applies.
165
00:07:31,120 --> 00:07:35,080
And you stop treating that as semantics because timing is where incidents live.
166
00:07:35,080 --> 00:07:38,960
Token time control without tool time control is a polite front door with no locks inside
167
00:07:38,960 --> 00:07:40,280
the building.
168
00:07:40,280 --> 00:07:43,680
What Microsoft gets right and why it still doesn't save you.
169
00:07:43,680 --> 00:07:45,160
Microsoft is not asleep at the wheel here.
170
00:07:45,160 --> 00:07:49,840
That's what makes this harder because the comfortable critique is the platform is immature.
171
00:07:49,840 --> 00:07:50,840
It isn't.
172
00:07:50,840 --> 00:07:54,360
The uncomfortable critique is that the platform is improving in the places enterprises
173
00:07:54,360 --> 00:07:58,800
like to measure while the failure happens in the place they avoid designing.
174
00:07:58,800 --> 00:07:59,800
Start with Purview.
175
00:07:59,800 --> 00:08:03,800
Getting Copilot conversations into a compliance surface is real progress.
176
00:08:03,800 --> 00:08:05,480
Transcripts change the nature of investigations.
177
00:08:05,480 --> 00:08:06,480
They give you a timeline.
178
00:08:06,480 --> 00:08:09,320
They give you a record of what was asked and what was answered.
179
00:08:09,320 --> 00:08:14,800
They also give you a way to correlate that conversation with a user identity and increasingly
180
00:08:14,800 --> 00:08:18,080
with the sources the system touched. That closes a lot of the old
181
00:08:18,080 --> 00:08:19,880
"we have no idea what it did" problem.
182
00:08:19,880 --> 00:08:25,360
Copilot Studio logging is the same category of win: activity logging, tool invocation traces.
183
00:08:25,360 --> 00:08:29,960
The ability to see what actions were taken and when. Again, real. For operations teams, that's
184
00:08:29,960 --> 00:08:33,800
better than folklore and screen recordings. It turns agent behavior into something you can
185
00:08:33,800 --> 00:08:34,800
query.
186
00:08:34,800 --> 00:08:35,800
Now identity.
187
00:08:35,800 --> 00:08:39,000
Entra's framing of workload identities and non-human identities is exactly where this
188
00:08:39,000 --> 00:08:40,000
should go.
189
00:08:40,000 --> 00:08:41,000
An agent is not a user.
190
00:08:41,000 --> 00:08:42,160
It is not an intern.
191
00:08:42,160 --> 00:08:47,080
It is a workload identity with automation privileges and treating it as such is the first
192
00:08:47,080 --> 00:08:48,600
admission of reality.
193
00:08:48,600 --> 00:08:51,480
Conditional access applying to those identities matters.
194
00:08:51,480 --> 00:08:54,920
Token issuance becomes conditional, signals-driven, and enforceable.
195
00:08:54,920 --> 00:08:55,920
Risk goes up.
196
00:08:55,920 --> 00:08:57,160
Token issuance gets blocked.
197
00:08:57,160 --> 00:08:58,400
Device posture is wrong.
198
00:08:58,400 --> 00:08:59,880
Token issuance gets blocked.
199
00:08:59,880 --> 00:09:02,120
Token issuance gets blocked.
200
00:09:02,120 --> 00:09:03,720
You can make the front door real.
201
00:09:03,720 --> 00:09:07,640
And there's also continuous access evaluation sitting in the background as Microsoft's answer
202
00:09:07,640 --> 00:09:09,960
to context changes after sign in.
203
00:09:09,960 --> 00:09:13,960
It's an attempt to reduce the time lag between a changing risk posture and what the token
204
00:09:13,960 --> 00:09:15,160
is allowed to do.
205
00:09:15,160 --> 00:09:16,160
That direction is correct.
206
00:09:16,160 --> 00:09:20,640
You can't keep treating authentication as a one time ceremony in a world where sessions
207
00:09:20,640 --> 00:09:22,320
persist and context drifts.
208
00:09:22,320 --> 00:09:23,520
All of that is necessary.
209
00:09:23,520 --> 00:09:25,040
All of it is still insufficient.
210
00:09:25,040 --> 00:09:26,760
Here's the boundary you don't get to hand wave.
211
00:09:26,760 --> 00:09:30,200
These controls mostly operate at token time and after execution.
212
00:09:30,200 --> 00:09:34,440
They don't operate at action time inside the tool call path with deterministic intent
213
00:09:34,440 --> 00:09:35,440
enforcement.
214
00:09:35,440 --> 00:09:36,600
Purview tells you what happened.
215
00:09:36,600 --> 00:09:38,960
It does not decide what is allowed to happen next.
216
00:09:38,960 --> 00:09:43,560
Conditional access decides whether an identity should be issued a token under current conditions.
217
00:09:43,560 --> 00:09:48,800
It does not evaluate whether a specific delete, share or send is appropriate given the intent
218
00:09:48,800 --> 00:09:53,680
of the request, the sensitivity of the target and the venue in which the result will be exposed.
219
00:09:53,680 --> 00:09:57,920
That distinction matters because enterprise harm rarely looks like the agent got global
220
00:09:57,920 --> 00:09:59,400
admin.
221
00:09:59,400 --> 00:10:02,320
Microsoft has already blocked a lot of those extremes for agent identities.
222
00:10:02,320 --> 00:10:07,520
The real harm looks like the agent had legitimate write access in the wrong place or the agent
223
00:10:07,520 --> 00:10:10,800
retrieved legitimate data and disclosed it in the wrong venue.
224
00:10:10,800 --> 00:10:12,360
And those are action time failures.
225
00:10:12,360 --> 00:10:14,400
If you want to hear the gap, walk the timeline.
226
00:10:14,400 --> 00:10:15,720
Agent signs in.
227
00:10:15,720 --> 00:10:17,960
Conditional access evaluates. Token issued.
228
00:10:17,960 --> 00:10:18,960
Fine.
229
00:10:18,960 --> 00:10:19,960
Agent retrieves documents
230
00:10:19,960 --> 00:10:20,960
it is entitled to retrieve.
231
00:10:20,960 --> 00:10:21,960
Fine.
232
00:10:21,960 --> 00:10:22,960
Transcript gets captured.
233
00:10:22,960 --> 00:10:24,840
The events get recorded. Fine.
234
00:10:24,840 --> 00:10:29,000
Now the agent proposes a tool call: delete a site, share a file, post a message, send an
235
00:10:29,000 --> 00:10:31,200
email, trigger a workflow.
236
00:10:31,200 --> 00:10:35,440
Where is the deterministic policy gate that evaluates that proposed action against intent,
237
00:10:35,440 --> 00:10:38,840
scope, data classification and venue before the tool executes?
238
00:10:38,840 --> 00:10:40,440
In most deployments it isn't there.
239
00:10:40,440 --> 00:10:42,520
The platform gave you the ticket booth and the cameras.
240
00:10:42,520 --> 00:10:45,120
It did not automatically give you a guard on the train.
241
00:10:45,120 --> 00:10:48,640
And because those Microsoft controls exist, organizations stop designing.
242
00:10:48,640 --> 00:10:50,320
They assume governance is covered.
243
00:10:50,320 --> 00:10:53,120
They feel safe because they can export transcripts.
244
00:10:53,120 --> 00:10:56,400
They feel safe because conditional access policies look mature.
245
00:10:56,400 --> 00:11:00,960
They feel safe because the agent has an identity object and identities feel like control.
246
00:11:00,960 --> 00:11:02,880
But control is not a directory object.
247
00:11:02,880 --> 00:11:04,160
Control is an enforcement point.
248
00:11:04,160 --> 00:11:05,880
So yes, praise the forensics.
249
00:11:05,880 --> 00:11:07,160
Praise the identity model.
250
00:11:07,160 --> 00:11:08,160
Praise the growing observability.
251
00:11:08,160 --> 00:11:10,120
And those are the raw materials you need.
252
00:11:10,120 --> 00:11:13,520
Then say the sentence that forces the architectural truth into the room.
253
00:11:13,520 --> 00:11:15,360
Microsoft has significantly improved visibility.
254
00:11:15,360 --> 00:11:18,200
They have not eliminated non-deterministic execution.
255
00:11:18,200 --> 00:11:23,000
Once you accept that, you stop asking the platform to save you with more logs and you start building
256
00:11:23,000 --> 00:11:24,480
the missing thing.
257
00:11:24,480 --> 00:11:28,280
Action-time, per-tool-call determinism.
258
00:11:28,280 --> 00:11:29,760
Audit, provenance, policy gate.
259
00:11:29,760 --> 00:11:32,360
Here is the trilogy that keeps getting blurred on purpose.
260
00:11:32,360 --> 00:11:33,880
Audit is a record of what happened.
261
00:11:33,880 --> 00:11:37,800
Who asked, what the agent said, which identity executed, which file got touched, which API
262
00:11:37,800 --> 00:11:39,440
got called, what time it happened.
263
00:11:39,440 --> 00:11:40,440
It's a timeline.
264
00:11:40,440 --> 00:11:41,440
It's useful.
265
00:11:41,440 --> 00:11:43,080
It's also inherently retrospective.
266
00:11:43,080 --> 00:11:46,320
Audit is the black box flight recorder you consult after the impact.
267
00:11:46,320 --> 00:11:48,880
It doesn't change the trajectory of the plane.
268
00:11:48,880 --> 00:11:51,920
Provenance is the missing middle that most teams pretend is nice to have.
269
00:11:51,920 --> 00:11:53,920
Provenance is not the transcript.
270
00:11:53,920 --> 00:11:57,720
Provenance is the decision chain, which chunks were retrieved, which candidates were considered
271
00:11:57,720 --> 00:12:02,560
and rejected, which tool options were available, which constraints were applied, and what caused
272
00:12:02,560 --> 00:12:04,080
the final selection.
273
00:12:04,080 --> 00:12:08,160
It is the explanation graph that ties inputs to outputs in a way that survives an incident
274
00:12:08,160 --> 00:12:09,160
review.
275
00:12:09,160 --> 00:12:12,360
Without provenance, you don't know why the agent did what it did.
276
00:12:12,360 --> 00:12:13,840
You only know that it did it.
277
00:12:13,840 --> 00:12:15,840
And then there's the part that prevents harm.
278
00:12:15,840 --> 00:12:16,840
The policy gate.
279
00:12:16,840 --> 00:12:20,560
A policy gate is a deterministic decision point that runs before execution.
280
00:12:20,560 --> 00:12:25,640
It evaluates a structured request against policy and authoritative state and returns allow,
281
00:12:25,640 --> 00:12:26,640
deny or transform.
282
00:12:26,640 --> 00:12:28,160
It is not a prompt instruction.
283
00:12:28,160 --> 00:12:29,160
It is not a persona.
284
00:12:29,160 --> 00:12:31,360
It is not a "please ask for confirmation."
285
00:12:31,360 --> 00:12:32,640
It is an enforcement layer
286
00:12:32,640 --> 00:12:34,600
the agent cannot bypass.
287
00:12:34,600 --> 00:12:35,920
Most enterprises have audit.
288
00:12:35,920 --> 00:12:37,520
Some have fragments of provenance.
289
00:12:37,520 --> 00:12:38,760
Almost none have a real gate.
290
00:12:38,760 --> 00:12:43,040
That distinction matters because your worst failures happen in the gap between entitled
291
00:12:43,040 --> 00:12:44,360
and appropriate.
292
00:12:44,360 --> 00:12:48,000
The agent can be entitled to read a document and still be wrong to disclose it in that
293
00:12:48,000 --> 00:12:49,000
venue.
294
00:12:49,000 --> 00:12:52,840
The agent can be entitled to write and still be wrong to write here, now, in that way.
295
00:12:52,840 --> 00:12:56,160
Audit will happily record the wrong thing with perfect fidelity.
296
00:12:56,160 --> 00:12:58,160
Provenance helps you argue with reality less.
297
00:12:58,160 --> 00:13:00,120
It tells you how you arrived at the bad action.
298
00:13:00,120 --> 00:13:03,680
It's what you need when a regulator asks, why did the system decide this?
299
00:13:03,680 --> 00:13:06,320
And your only other answer is, it felt right.
300
00:13:06,320 --> 00:13:09,760
Provenance turns post mortems from fan fiction into analysis, but provenance still doesn't
301
00:13:09,760 --> 00:13:11,000
prevent the incident.
302
00:13:11,000 --> 00:13:12,080
Only a gate does.
303
00:13:12,080 --> 00:13:13,960
And the thing most people miss is timing.
304
00:13:13,960 --> 00:13:17,680
The strongest built in controls are mostly outside the action path.
305
00:13:17,680 --> 00:13:19,680
Conditional access happens at token acquisition.
306
00:13:19,680 --> 00:13:23,480
Purview happens after the fact. Those are important controls, but they are not action-time
307
00:13:23,480 --> 00:13:24,720
authorization.
308
00:13:24,720 --> 00:13:28,680
So here's what audit, provenance, policy gate looks like on a real timeline.
309
00:13:28,680 --> 00:13:32,600
The user asks the agent to do something, and the agent retrieves context.
310
00:13:32,600 --> 00:13:34,160
It compiles candidates.
311
00:13:34,160 --> 00:13:35,520
It selects a tool.
312
00:13:35,520 --> 00:13:38,600
In a safe architecture, there's a hard boundary right there.
313
00:13:38,600 --> 00:13:41,680
The agent submits a request, not an imperative.
314
00:13:41,680 --> 00:13:45,600
Data, intent, scope, data class, venue and an operation ID.
315
00:13:45,600 --> 00:13:49,840
The policy engine evaluates those attributes against rules and authoritative state, produces
316
00:13:49,840 --> 00:13:53,000
a decision artifact and only then does execution happen.
317
00:13:53,000 --> 00:13:55,800
And the decision artifact gets stored next to the action.
318
00:13:55,800 --> 00:13:59,560
That last part is what makes governance real, because the artifact is proof, not narrative.
319
00:13:59,560 --> 00:14:00,560
You can sample it.
320
00:14:00,560 --> 00:14:01,560
You can query it.
321
00:14:01,560 --> 00:14:02,560
You can show it in an audit.
322
00:14:02,560 --> 00:14:07,720
Allowed under rule D104, constraints C17, based on state version 6.
323
00:14:07,720 --> 00:14:11,000
Or denied under rule V302 due to mixed audience.
324
00:14:11,000 --> 00:14:13,160
This is what prevention looks like when it's measurable.
325
00:14:13,160 --> 00:14:15,800
Now, the obvious pushback is, but we have transcripts.
326
00:14:15,800 --> 00:14:17,000
We have citations.
327
00:14:17,000 --> 00:14:18,000
We have activity logs.
328
00:14:18,000 --> 00:14:19,000
Isn't that provenance?
329
00:14:19,000 --> 00:14:20,000
No.
330
00:14:20,000 --> 00:14:22,440
Transcripts are experience-plane narration.
331
00:14:22,440 --> 00:14:24,200
Citations are retrieval references.
332
00:14:24,200 --> 00:14:25,360
Activity logs are event records.
333
00:14:25,360 --> 00:14:26,360
They are necessary.
334
00:14:26,360 --> 00:14:27,360
They are not sufficient.
335
00:14:27,360 --> 00:14:28,880
They do not tell you what was excluded.
336
00:14:28,880 --> 00:14:31,200
They do not tell you what alternatives were rejected.
337
00:14:31,200 --> 00:14:35,080
They do not tell you whether a policy evaluated the action before execution.
338
00:14:35,080 --> 00:14:37,680
They do not tell you whether the system could have stopped itself.
339
00:14:37,680 --> 00:14:40,960
If you remember nothing else from this section, keep this ordering straight.
340
00:14:40,960 --> 00:14:42,400
Audit explains what happened.
341
00:14:42,400 --> 00:14:44,680
Provenance explains why that path was taken.
342
00:14:44,680 --> 00:14:47,840
A policy gate decides whether it's allowed to happen at all.
343
00:14:47,840 --> 00:14:52,000
And when you add a face and a voice, you increase the probability that your organization
344
00:14:52,000 --> 00:14:54,120
confuses the first two for the third.
345
00:14:54,120 --> 00:14:58,120
Case study 1: mis-scoped tool call deletes the wrong SharePoint site.
346
00:14:58,120 --> 00:15:02,320
Here's the first failure pattern because it's the one that keeps happening quietly in enterprises
347
00:15:02,320 --> 00:15:04,160
that believe they're governed.
348
00:15:04,160 --> 00:15:08,160
A productivity team rolls out an agent to clean up obsolete project sites.
349
00:15:08,160 --> 00:15:09,480
The brief sounds harmless.
350
00:15:09,480 --> 00:15:11,280
The agent is grounded in SharePoint.
351
00:15:11,280 --> 00:15:15,680
It can read site metadata, parse a tracker spreadsheet, and it has Microsoft Graph write
352
00:15:15,680 --> 00:15:19,720
access because eventually it needs to delete or archive things.
353
00:15:19,720 --> 00:15:21,240
The organization is proud.
354
00:15:21,240 --> 00:15:23,920
It's using a dedicated workload identity.
355
00:15:23,920 --> 00:15:26,600
Conditional access is enforced and Purview capture is enabled.
356
00:15:26,600 --> 00:15:29,080
At 09:02, the agent authenticates.
357
00:15:29,080 --> 00:15:30,880
Conditional access evaluates and passes.
358
00:15:30,880 --> 00:15:31,880
A token is issued.
359
00:15:31,880 --> 00:15:32,880
No anomaly.
360
00:15:32,880 --> 00:15:33,880
No risk event.
361
00:15:33,880 --> 00:15:35,880
This is what good looks like.
362
00:15:35,880 --> 00:15:40,160
At 09:05, a user asks, "Can you remove the old project spaces from last year?"
363
00:15:40,160 --> 00:15:42,640
The active list is in the project's archive tracker.
364
00:15:42,640 --> 00:15:44,400
Now the agent does what agents do.
365
00:15:44,400 --> 00:15:45,400
It retrieves context.
366
00:15:45,400 --> 00:15:46,480
It reads the tracker.
367
00:15:46,480 --> 00:15:48,520
It searches for sites with similar names.
368
00:15:48,520 --> 00:15:49,720
It weighs signals.
369
00:15:49,720 --> 00:15:54,440
Last modified date, owner, whether a Teams channel exists, whether there are recent files,
370
00:15:54,440 --> 00:15:58,680
maybe a weak hint from an email thread. None of those are authoritative truth.
371
00:15:58,680 --> 00:15:59,680
They're clues.
372
00:15:59,680 --> 00:16:00,680
Then it makes the choice.
373
00:16:00,680 --> 00:16:02,640
It selects a site that looks obsolete.
374
00:16:02,640 --> 00:16:03,800
And it calls the tool.
375
00:16:03,800 --> 00:16:06,520
It executes a Graph delete on the wrong SharePoint site.
376
00:16:06,520 --> 00:16:07,520
Nothing exotic happened here.
377
00:16:07,520 --> 00:16:08,520
No prompt injection.
378
00:16:08,520 --> 00:16:10,120
No compromised credential.
379
00:16:10,120 --> 00:16:11,680
No global admin role.
380
00:16:11,680 --> 00:16:16,520
This is normal probabilistic selection, acting at machine speed, with standing write scopes.
381
00:16:16,520 --> 00:16:18,800
Now look at what your governance artifacts say.
382
00:16:18,800 --> 00:16:20,200
Purview will show an interaction.
383
00:16:20,200 --> 00:16:21,720
It will show the user request.
384
00:16:21,720 --> 00:16:23,520
It will show the agent's response.
385
00:16:23,520 --> 00:16:24,520
You will see timestamps.
386
00:16:24,520 --> 00:16:25,920
You will see the agent identity.
387
00:16:25,920 --> 00:16:29,920
You may see citations pointing to the tracker and maybe a policy doc.
388
00:16:29,920 --> 00:16:33,600
And you will see an activity: a site was deleted by that agent identity.
389
00:16:33,600 --> 00:16:34,600
Everything is correct.
390
00:16:34,600 --> 00:16:37,640
And none of it answers the question that matters in the incident review.
391
00:16:37,640 --> 00:16:38,800
Why that site?
392
00:16:38,800 --> 00:16:40,880
Not the narrative, "because it was obsolete."
393
00:16:40,880 --> 00:16:42,360
The actual decision chain.
394
00:16:42,360 --> 00:16:46,000
Which retrieved chunk pushed it over the threshold? Which alternative candidates were
395
00:16:46,000 --> 00:16:47,840
considered and rejected?
396
00:16:47,840 --> 00:16:51,040
What eligibility rule was evaluated at the moment of execution?
397
00:16:51,040 --> 00:16:54,920
In most deployments, the answer is, no eligibility rule was evaluated.
398
00:16:54,920 --> 00:16:56,760
The agent inferred eligibility.
399
00:16:56,760 --> 00:17:00,760
That inference became an action because the tool was callable and the token was valid.
400
00:17:00,760 --> 00:17:02,200
Audit gave you a story.
401
00:17:02,200 --> 00:17:03,680
It did not give you prevention.
402
00:17:03,680 --> 00:17:07,360
And the worst part is how the post-incident conversation usually goes because it's always
403
00:17:07,360 --> 00:17:09,120
experience-plane thinking.
404
00:17:09,120 --> 00:17:10,120
We'll improve the prompt.
405
00:17:10,120 --> 00:17:11,840
We'll add a confirmation step.
406
00:17:11,840 --> 00:17:13,960
We'll tell users to be more specific.
407
00:17:13,960 --> 00:17:15,280
Those are all entropy generators.
408
00:17:15,280 --> 00:17:18,800
They add more conditional branches, more human confusion and more opportunity for the
409
00:17:18,800 --> 00:17:21,280
agent to interpret a suggestion as a command.
410
00:17:21,280 --> 00:17:25,080
The architectural fix is boring and it works because it doesn't require belief.
411
00:17:25,080 --> 00:17:26,080
First, idempotency.
412
00:17:26,080 --> 00:17:30,120
Every destructive request carries an operation ID persisted before execution.
413
00:17:30,120 --> 00:17:33,560
If the same request is replayed, retried, duplicated or reordered,
413
00:17:33,560 --> 00:17:37,240
the system returns the prior outcome and does not re-execute side effects.
415
00:17:37,240 --> 00:17:40,760
That turns event-driven unreliability into safe replay.
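A minimal sketch of that contract, assuming a SQLite table as the stand-in for the authoritative store. The table name, `make_store` and `execute_once` are hypothetical; the point is only that the operation ID is persisted before the side effect runs.

```python
import sqlite3


def make_store():
    """Create a throwaway in-memory operation store (illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE operations (operation_id TEXT PRIMARY KEY, outcome TEXT)"
    )
    return db


def execute_once(db, operation_id, side_effect):
    """Run side_effect at most once per operation_id and return its outcome."""
    try:
        # Persist the operation ID BEFORE the side effect runs.
        db.execute(
            "INSERT INTO operations (operation_id, outcome) VALUES (?, NULL)",
            (operation_id,),
        )
        db.commit()
    except sqlite3.IntegrityError:
        # Replay, retry or duplicate: return the prior outcome, do nothing else.
        row = db.execute(
            "SELECT outcome FROM operations WHERE operation_id = ?",
            (operation_id,),
        ).fetchone()
        return row[0]  # None means the first attempt never recorded an outcome
    outcome = side_effect()
    db.execute(
        "UPDATE operations SET outcome = ? WHERE operation_id = ?",
        (outcome, operation_id),
    )
    db.commit()
    return outcome
```

A duplicate delivery of the same operation ID hits the primary-key constraint and gets the stored outcome back, so the destructive call runs once. A crash between the insert and the update leaves an outcome of `None`, which a real system would have to reconcile explicitly.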
416
00:17:40,760 --> 00:17:41,920
Second, authoritative state.
417
00:17:41,920 --> 00:17:43,720
"Eligible for deletion" is not a vibe.
418
00:17:43,720 --> 00:17:46,360
It's a state property stored in a system of record.
419
00:17:46,360 --> 00:17:50,000
If the authoritative catalog says retired true, dormant 90 days and owner approved
420
00:17:50,000 --> 00:17:51,720
true, then the site can be deleted.
421
00:17:51,720 --> 00:17:53,440
If not, the site cannot be deleted.
422
00:17:53,440 --> 00:17:55,280
The agent does not get to negotiate that.
423
00:17:55,280 --> 00:17:57,280
Third, the policy gate.
424
00:17:57,280 --> 00:18:00,080
Before the delete tool executes, the agent submits a structured
425
00:18:00,080 --> 00:18:01,080
request:
426
00:18:01,080 --> 00:18:02,080
Actor.
427
00:18:02,080 --> 00:18:03,080
Intent:
428
00:18:03,080 --> 00:18:04,080
delete.
429
00:18:04,080 --> 00:18:05,080
Scope:
430
00:18:05,080 --> 00:18:06,080
site
431
00:18:06,080 --> 00:18:07,080
ID.
432
00:18:07,080 --> 00:18:08,080
Data class.
433
00:18:08,080 --> 00:18:09,080
Venue.
434
00:18:09,080 --> 00:18:10,080
Operation
435
00:18:10,080 --> 00:18:11,080
ID.
436
00:18:11,080 --> 00:18:12,080
The policy engine evaluates that request against rules and authoritative state and returns allow,
437
00:18:12,080 --> 00:18:13,080
deny or transform.
438
00:18:13,080 --> 00:18:16,080
If it denies, the tool never sees the request.
439
00:18:16,080 --> 00:18:19,080
If it allows, the decision artifact is stored next to the action.
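The gate described here can be sketched in a few lines. Everything named below (`ToolRequest`, `Decision`, `CATALOG`, `DECISION_LOG`, `gated_delete`) is a hypothetical illustration, and the rule ID D104 is borrowed from the earlier example rather than from any real policy set. The sketch covers allow and deny only; transform is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRequest:
    actor: str
    intent: str
    scope: str          # site ID
    data_class: str
    venue: str
    operation_id: str


@dataclass(frozen=True)
class Decision:
    outcome: str        # "allow" or "deny"
    rule_id: str
    operation_id: str


# Hypothetical authoritative catalog: eligibility is stored state, not inference.
CATALOG = {
    "site-old": {"retired": True, "dormant_days": 120, "owner_approved": True},
    "site-live": {"retired": False, "dormant_days": 3, "owner_approved": False},
}
DECISION_LOG = []  # decision artifacts, stored next to the action


def evaluate(request, catalog=CATALOG):
    """Deterministic check of a delete request against authoritative state."""
    state = catalog.get(request.scope)
    eligible = (
        request.intent == "delete"
        and state is not None
        and state["retired"]
        and state["dormant_days"] >= 90
        and state["owner_approved"]
    )
    decision = Decision("allow" if eligible else "deny", "D104", request.operation_id)
    DECISION_LOG.append((request, decision))
    return decision


def gated_delete(request, delete_tool):
    """Call delete_tool only if the policy engine allows the request."""
    decision = evaluate(request)
    if decision.outcome == "allow":
        delete_tool(request.scope)
    # On deny, the tool never sees the request.
    return decision
```

The agent can propose any site it likes; whether the delete runs depends only on what the catalog says about that site ID.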
440
00:18:19,080 --> 00:18:21,440
Now, replay the same scenario under that model.
441
00:18:21,440 --> 00:18:22,840
The agent compiles candidates.
442
00:18:22,840 --> 00:18:24,520
It proposes the wrong site.
443
00:18:24,520 --> 00:18:28,160
The policy engine evaluates the proposal against the authoritative catalog.
444
00:18:28,160 --> 00:18:30,160
The wrong site fails eligibility.
445
00:18:30,160 --> 00:18:31,160
Deny.
446
00:18:31,160 --> 00:18:32,160
The user still gets a transcript.
447
00:18:32,160 --> 00:18:33,480
The activity logs still exist.
448
00:18:33,480 --> 00:18:37,320
The difference is that your incident is now a denied decision, not a post-mortem.
449
00:18:37,320 --> 00:18:41,080
That's what audit, provenance, policy gate means operationally.
450
00:18:41,080 --> 00:18:42,840
Audit will always be perfect after the damage.
451
00:18:42,840 --> 00:18:44,880
A gate makes the damage never happen.
452
00:18:44,880 --> 00:18:46,560
Case study 2.
453
00:18:46,560 --> 00:18:47,560
Compliant retrieval.
454
00:18:47,560 --> 00:18:49,560
Policy violation via voice in a meeting.
455
00:18:49,560 --> 00:18:53,600
Now move from wrong target to the failure that governance teams hate because it breaks
456
00:18:53,600 --> 00:18:55,760
all their comfortable categories.
457
00:18:55,760 --> 00:18:56,760
Everything is entitled.
458
00:18:56,760 --> 00:18:57,760
Everything is logged.
459
00:18:57,760 --> 00:18:58,760
Correct.
460
00:18:58,760 --> 00:19:00,080
And it's still unacceptable.
461
00:19:00,080 --> 00:19:03,560
An HR assistant agent gets deployed into Teams meetings.
462
00:19:03,560 --> 00:19:08,400
It's grounded on policy documents, FAQs, compensation guidance and a curated SharePoint
463
00:19:08,400 --> 00:19:11,080
library managed by the compensation team.
464
00:19:11,080 --> 00:19:12,520
The pitch sounds responsible.
465
00:19:12,520 --> 00:19:13,520
The agent is read-only.
466
00:19:13,520 --> 00:19:15,080
It's not writing anywhere.
467
00:19:15,080 --> 00:19:17,480
And it's meant to reduce interruptions in live calls.
468
00:19:17,480 --> 00:19:18,760
A director asks a question.
469
00:19:18,760 --> 00:19:19,760
The agent answers.
470
00:19:19,760 --> 00:19:20,760
Everyone moves on.
471
00:19:20,760 --> 00:19:22,120
The identity model looks clean.
472
00:19:22,120 --> 00:19:24,400
It runs under a workload identity.
473
00:19:24,400 --> 00:19:26,120
Conditional access protects token issuance.
474
00:19:26,120 --> 00:19:29,240
Purview is configured to capture conversation transcripts.
475
00:19:29,240 --> 00:19:30,920
Copilot activity logs are enabled.
476
00:19:30,920 --> 00:19:33,080
From a governance standpoint it checks boxes.
477
00:19:33,080 --> 00:19:34,080
Then the meeting happens.
478
00:19:34,080 --> 00:19:37,160
A director asks, "What are the employee trends this quarter?"
479
00:19:37,160 --> 00:19:38,720
That question is vague on purpose.
480
00:19:38,720 --> 00:19:43,240
Humans ask vague questions in meetings because they don't want to specify constraints out loud.
481
00:19:43,240 --> 00:19:46,400
They assume the audience understands the implied boundaries.
482
00:19:46,400 --> 00:19:49,120
Don't mention anything sensitive in front of externals.
483
00:19:49,120 --> 00:19:50,120
Keep it high level.
484
00:19:50,120 --> 00:19:53,120
Don't surface anything that can be misinterpreted or forwarded.
485
00:19:53,120 --> 00:19:54,800
The agent does not have those instincts.
486
00:19:54,800 --> 00:19:56,240
It does what it was built to do.
487
00:19:56,240 --> 00:19:57,240
It retrieves.
488
00:19:57,240 --> 00:19:58,240
It aggregates.
489
00:19:58,240 --> 00:19:59,240
It summarizes.
490
00:19:59,240 --> 00:20:01,680
It picks numbers because numbers sound authoritative.
491
00:20:01,680 --> 00:20:03,680
It synthesizes a clean verbal answer.
492
00:20:03,680 --> 00:20:04,840
And it says it out loud.
493
00:20:04,840 --> 00:20:07,920
Maybe it mentions compensation movement by level and region.
494
00:20:07,920 --> 00:20:10,280
Maybe it reports internal mobility rates.
495
00:20:10,280 --> 00:20:14,600
Maybe it references subgroup deltas because the underlying documents include those charts.
496
00:20:14,600 --> 00:20:15,600
No names.
497
00:20:15,600 --> 00:20:17,320
No row level PII.
498
00:20:17,320 --> 00:20:18,960
No single record disclosure.
499
00:20:18,960 --> 00:20:20,760
Still a policy violation.
500
00:20:20,760 --> 00:20:22,760
Because the harm here isn't access.
501
00:20:22,760 --> 00:20:24,520
The harm is venue.
502
00:20:24,520 --> 00:20:26,960
The harm is aggregation.
503
00:20:26,960 --> 00:20:29,080
The meeting includes external participants.
504
00:20:29,080 --> 00:20:33,200
Vendors, a partner org, someone dialing in from an unfamiliar domain. That happens constantly
505
00:20:33,200 --> 00:20:34,920
in modern enterprises.
506
00:20:34,920 --> 00:20:36,880
Teams meetings are porous by default.
507
00:20:36,880 --> 00:20:39,440
The audience boundary shifts in real time.
508
00:20:39,440 --> 00:20:42,160
And the agent, because it is speaking, becomes an egress path.
509
00:20:42,160 --> 00:20:46,120
Now look at the telemetry and watch how it fails you while remaining technically correct.
510
00:20:46,120 --> 00:20:47,320
Purview shows the transcript.
511
00:20:47,320 --> 00:20:48,800
The question and the answer are there.
512
00:20:48,800 --> 00:20:51,240
The citations point to the right HR library documents.
513
00:20:51,240 --> 00:20:52,880
The agent identity is valid.
514
00:20:52,880 --> 00:20:56,240
The user who asked the question is entitled to the documents.
515
00:20:56,240 --> 00:20:57,800
The SharePoint permissions are correct.
516
00:20:57,800 --> 00:20:59,440
The retrieval was security trimmed.
517
00:20:59,440 --> 00:21:01,240
All the traditional controls passed.
518
00:21:01,240 --> 00:21:02,480
So what exactly was missing?
519
00:21:02,480 --> 00:21:05,360
The policy evaluation that should have happened at speech time.
520
00:21:05,360 --> 00:21:09,320
Nobody asked a deterministic question like, is it permissible to verbalize this class of
521
00:21:09,320 --> 00:21:13,080
information at this aggregation level in this venue to this audience?
522
00:21:13,080 --> 00:21:15,720
Because speech was treated as output, not action.
523
00:21:15,720 --> 00:21:16,720
This is the trap.
524
00:21:16,720 --> 00:21:18,840
Teams treat tool calls like actions.
525
00:21:18,840 --> 00:21:22,320
Graph writes, deletes, shares. But they treat speech like harmless UI.
526
00:21:22,320 --> 00:21:23,160
It is not.
527
00:21:23,160 --> 00:21:25,040
In a meeting, speech is publication.
528
00:21:25,040 --> 00:21:28,000
It leaves the system boundary the moment it hits the room.
529
00:21:28,000 --> 00:21:29,080
People repeat it.
530
00:21:29,080 --> 00:21:30,080
Screenshots happen.
531
00:21:30,080 --> 00:21:31,640
Someone says, "Can you send that to me?"
532
00:21:31,640 --> 00:21:32,760
And now it's in chat.
533
00:21:32,760 --> 00:21:37,240
The output becomes durable even if the data never left SharePoint at the file level.
534
00:21:37,240 --> 00:21:40,640
Your DLP policies can stay green while your policy posture goes red.
535
00:21:40,640 --> 00:21:44,360
And because the agent sounds calm and competent, nobody interrupts it.
536
00:21:44,360 --> 00:21:47,440
Human interface trust bias turns the meeting into an amplifier.
537
00:21:47,440 --> 00:21:51,000
The agent just shipped an aggregation to a mixed audience at machine speed, wrapped in
538
00:21:51,000 --> 00:21:52,800
a tone that implies permission.
539
00:21:52,800 --> 00:21:54,560
Now the fix, again, isn't "ban voice."
540
00:21:54,560 --> 00:21:56,760
The fix is to treat voice as a tool call.
541
00:21:56,760 --> 00:22:00,640
Before the agent speaks, you classify the output, not just the input documents.
542
00:22:00,640 --> 00:22:01,680
The output.
543
00:22:01,680 --> 00:22:06,320
You attach attributes: data class compensation, aggregation cohort, venue Teams meeting,
544
00:22:06,320 --> 00:22:09,000
audience mixed, external present true.
545
00:22:09,000 --> 00:22:12,080
Then you submit that as a request to a policy engine.
546
00:22:12,080 --> 00:22:16,120
And the policy engine does what humans do automatically and machines never do unless
547
00:22:16,120 --> 00:22:17,160
you force them to.
548
00:22:17,160 --> 00:22:19,200
It evaluates a rule like:
549
00:22:19,200 --> 00:22:23,720
Compensation cohorts may not be disclosed verbally when external participants are present.
550
00:22:23,720 --> 00:22:27,040
Allow only high level summaries with no subgroup references.
551
00:22:27,040 --> 00:22:30,600
Transform the response or deny it. If it denies, the speech tool never runs.
552
00:22:30,600 --> 00:22:33,480
If it transforms, the agent speaks a sanitized version.
553
00:22:33,480 --> 00:22:35,680
High level trends remained within target ranges.
554
00:22:35,680 --> 00:22:39,120
Detailed breakdown is available to HR only audiences.
555
00:22:39,120 --> 00:22:43,240
And the decision artifact gets stored next to the action denied or transformed under rule
556
00:22:43,240 --> 00:22:46,000
V302 with the attributes that triggered it.
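Treating speech as a tool call might look like the sketch below. The attribute names, the "record" aggregation level, the "default" rule label and the `speak` wrapper are assumptions; the sanitized sentence and rule V302 come from the discussion above. The `tts` argument stands in for whatever actually produces audio.

```python
from dataclasses import dataclass

SAFE_SUMMARY = (
    "High-level trends remained within target ranges. "
    "Detailed breakdown is available to HR-only audiences."
)


@dataclass(frozen=True)
class SpeechRequest:
    text: str               # what the agent proposes to say
    data_class: str         # classification of the OUTPUT, e.g. "compensation"
    aggregation: str        # e.g. "summary", "cohort" or "record"
    venue: str
    external_present: bool


def evaluate_speech(req):
    """Return (outcome, rule_id, text_to_speak) for a proposed utterance."""
    restricted = req.data_class == "compensation" and req.external_present
    if restricted and req.aggregation == "record":
        return ("deny", "V302", None)
    if restricted and req.aggregation == "cohort":
        return ("transform", "V302", SAFE_SUMMARY)
    return ("allow", "default", req.text)


def speak(req, tts):
    """Gate the utterance; tts is only called with text the policy released."""
    outcome, rule_id, text = evaluate_speech(req)
    if outcome != "deny":
        tts(text)
    return outcome, rule_id
```

The same proposed answer is spoken in full to an internal-only audience and replaced by the sanitized summary once an external participant is present.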
557
00:22:46,000 --> 00:22:47,360
Now replay the incident.
558
00:22:47,360 --> 00:22:50,200
Same question, same retrieval, same entitlement, different outcome.
559
00:22:50,200 --> 00:22:54,040
The agent proposes a detailed answer, the control plane disposes, the meeting gets a safe
560
00:22:54,040 --> 00:22:57,280
summary, and your governance story becomes boring on purpose.
561
00:22:57,280 --> 00:23:01,040
Because compliant systems still fail when venue and intent aren't enforced at the moment
562
00:23:01,040 --> 00:23:02,440
of publication.
563
00:23:02,440 --> 00:23:06,120
Case study 3: external shadow agent with internal blast radius.
564
00:23:06,120 --> 00:23:10,280
Now the failure pattern that doesn't show up as a breach until the screenshots are already
565
00:23:10,280 --> 00:23:11,280
circulating.
566
00:23:11,280 --> 00:23:13,120
A developer is under pressure.
567
00:23:13,120 --> 00:23:14,360
Support tickets are piling up.
568
00:23:14,360 --> 00:23:18,920
The product team wants a deflection bot and someone has seen a demo where an agent answers questions
569
00:23:18,920 --> 00:23:19,920
instantly.
570
00:23:19,920 --> 00:23:21,720
So they do what modern platforms encourage.
571
00:23:21,720 --> 00:23:26,240
They stand up an externally accessible agent, put a chat widget on a public page, and wire
572
00:23:26,240 --> 00:23:29,320
it to enterprise knowledge so it doesn't sound stupid.
573
00:23:29,320 --> 00:23:34,360
And because it's just answering questions, they give it broad read scopes to internal content.
574
00:23:34,360 --> 00:23:38,960
A SharePoint site with runbooks, a wiki, maybe a knowledge base, maybe a support analytics
575
00:23:38,960 --> 00:23:39,960
store.
576
00:23:39,960 --> 00:23:42,280
They also add a couple of write scopes for later.
577
00:23:42,280 --> 00:23:46,320
Not because they're malicious, but because future features always arrive and nobody wants to redo consent.
578
00:23:46,320 --> 00:23:48,600
The agent authenticates using an app registration.
579
00:23:48,600 --> 00:23:49,600
It gets a token.
580
00:23:49,600 --> 00:23:51,000
It calls internal systems.
581
00:23:51,000 --> 00:23:52,000
Everything is legitimate.
582
00:23:52,000 --> 00:23:53,480
That's the core danger here.
583
00:23:53,480 --> 00:23:55,480
Nothing has to be compromised for this to go wrong.
584
00:23:55,480 --> 00:24:00,720
A customer asks a harmless question: "What's the workaround for the X120 firmware outage?"
585
00:24:00,720 --> 00:24:04,320
The agent retrieves internal runbooks and post-mortem fragments that were never meant
586
00:24:04,320 --> 00:24:05,600
to leave the tenant.
587
00:24:05,600 --> 00:24:07,320
It assembles a confident answer.
588
00:24:07,320 --> 00:24:08,800
It publishes it to the public chat.
589
00:24:08,800 --> 00:24:12,800
No exploit chain, no prompt injection, no data exfiltration tooling, just a public interface
590
00:24:12,800 --> 00:24:16,600
connected to an internal corpus by an overpermissioned workload identity.
591
00:24:16,600 --> 00:24:19,680
Now walk through what the logs tell you and what they can't.
592
00:24:19,680 --> 00:24:22,720
Entra shows token issuance under a workload identity.
593
00:24:22,720 --> 00:24:26,440
If conditional access is configured for that identity, it evaluates the sign-in context
594
00:24:26,440 --> 00:24:27,720
and issues the token.
595
00:24:27,720 --> 00:24:30,440
Purview shows the agent reading internal documents.
596
00:24:30,440 --> 00:24:34,880
The audit trail is pristine: identity, timestamps, resource access, downstream calls.
597
00:24:34,880 --> 00:24:39,160
The organization can prove down to the minute that the agent touched those files and responded
598
00:24:39,160 --> 00:24:40,640
to that external user.
599
00:24:40,640 --> 00:24:45,320
And that's the trap, because the logs being correct becomes evidence, incorrectly, that the
600
00:24:45,320 --> 00:24:46,680
system was governed.
601
00:24:46,680 --> 00:24:49,640
What's missing is the decision chain and the boundary enforcement.
602
00:24:49,640 --> 00:24:53,720
Why did the external request get access to internal-only material? Which rule asserted
603
00:24:53,720 --> 00:24:56,360
that this venue is allowed to consume that corpus?
604
00:24:56,360 --> 00:25:01,320
Where is the policy artifact that says "public audience, internal classification: deny disclosure"?
605
00:25:01,320 --> 00:25:04,840
In most shadow deployments, there is no artifact because there was no gate.
606
00:25:04,840 --> 00:25:08,280
The agent selected sources based on similarity and availability.
607
00:25:08,280 --> 00:25:10,920
The tool call executed because the token allowed it.
608
00:25:10,920 --> 00:25:13,040
The system did exactly what you configured.
609
00:25:13,040 --> 00:25:16,720
This is where audit, provenance, policy gate becomes operationally expensive.
610
00:25:16,720 --> 00:25:18,360
Audit tells you the leak happened.
611
00:25:18,360 --> 00:25:21,840
Provenance would tell you how the agent chose that runbook over a public doc.
612
00:25:21,840 --> 00:25:23,120
What other candidates existed?
613
00:25:23,120 --> 00:25:24,640
What was excluded and why?
614
00:25:24,640 --> 00:25:28,280
A policy gate would have prevented the response from ever being published externally.
615
00:25:28,280 --> 00:25:30,640
But the public facing agent usually has none of that.
616
00:25:30,640 --> 00:25:35,200
It has an experience plane that looks polished and a control plane that is effectively absent.
617
00:25:35,200 --> 00:25:37,240
Now the blast radius. This isn't a single reply.
618
00:25:37,240 --> 00:25:38,880
It's speed, reach and replication.
619
00:25:38,880 --> 00:25:41,920
The agent can answer a thousand external users in a day.
620
00:25:41,920 --> 00:25:45,120
Each answer can include a slightly different internal detail.
621
00:25:45,120 --> 00:25:48,600
Customers screenshot. Aggregators scrape. The information spreads because the interface is
622
00:25:48,600 --> 00:25:51,560
public and the system is consistent in the one way that matters.
623
00:25:51,560 --> 00:25:53,200
It's consistently allowed.
624
00:25:53,200 --> 00:25:54,960
And the post incident review is predictable.
625
00:25:54,960 --> 00:25:59,920
People say we'll tighten the prompt or we'll add a disclaimer or we'll retrain the model.
626
00:25:59,920 --> 00:26:01,480
Those are not containment strategies.
627
00:26:01,480 --> 00:26:03,000
Those are narrative strategies.
628
00:26:03,000 --> 00:26:07,000
The deterministic fix is boring and it works because it creates failure domains.
629
00:26:07,000 --> 00:26:09,280
First, split the identities.
630
00:26:09,280 --> 00:26:13,840
The public facing agent identity must have zero access to internal core data planes.
631
00:26:13,840 --> 00:26:14,840
None.
632
00:26:14,840 --> 00:26:17,640
It should only query a curated, published, approved external knowledge base.
633
00:26:17,640 --> 00:26:22,840
If the public corpus can't answer, the correct behavior is refusal or escalation, not improvisation.
634
00:26:22,840 --> 00:26:26,920
Second, if you truly need internal knowledge to support external responses, you introduce
635
00:26:26,920 --> 00:26:27,920
a broker.
636
00:26:27,920 --> 00:26:31,380
The public agent can ask the broker for candidate content but the broker is the policy
637
00:26:31,380 --> 00:26:32,380
gate.
638
00:26:32,380 --> 00:26:34,640
It evaluates venue, audience and data classification.
639
00:26:34,640 --> 00:26:35,980
It transforms or denies.
640
00:26:35,980 --> 00:26:39,880
The public agent never sees internal chunks that are not eligible for egress.
641
00:26:39,880 --> 00:26:42,780
Third, persist the decision artifact with the action.
642
00:26:42,780 --> 00:26:48,520
When a response is allowed externally, you store: allowed under rule EX2-1, source set
643
00:26:48,520 --> 00:26:51,320
PUB docs 2024 Q2.
644
00:26:51,320 --> 00:26:56,640
When it's denied, you store: denied under rule EX3-01, internal-only content.
645
00:26:56,640 --> 00:27:00,120
Now your audit stops being a story and becomes proof of enforcement.
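The broker step can be sketched as a filter that sits between the internal corpus and the public agent. `Chunk`, `broker_release` and the artifact dictionaries are hypothetical; the rule IDs are reused from the examples just mentioned purely as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    classification: str   # "public" or "internal"
    text: str


def broker_release(chunks, artifacts):
    """Release only chunks eligible for a public audience; record every decision."""
    released = []
    for chunk in chunks:
        if chunk.classification == "public":
            artifacts.append(
                {"outcome": "allow", "rule": "EX2-1", "source": chunk.source}
            )
            released.append(chunk)
        else:
            # Internal-only content never reaches the public agent.
            artifacts.append(
                {"outcome": "deny", "rule": "EX3-01", "source": chunk.source}
            )
    return released
```

The public agent only ever receives the return value, so an internal chunk cannot be improvised into an answer, and every allow or deny leaves a record behind.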
646
00:27:00,120 --> 00:27:01,960
Replay the same incident under that model.
647
00:27:01,960 --> 00:27:03,880
The customer asks about firmware.
648
00:27:03,880 --> 00:27:06,880
The public agent searches the external corpus and finds nothing definitive.
649
00:27:06,880 --> 00:27:08,160
It asks the broker.
650
00:27:08,160 --> 00:27:11,280
The broker evaluates the internal candidate and denies egress.
651
00:27:11,280 --> 00:27:12,920
The agent replies calmly.
652
00:27:12,920 --> 00:27:17,000
"I can't share internal remediation notes here, but I can connect you with support."
653
00:27:17,000 --> 00:27:19,160
The screenshot that circulates is a refusal.
654
00:27:19,160 --> 00:27:22,000
That is what containment looks like when you stop trusting the interface and start
655
00:27:22,000 --> 00:27:24,200
enforcing the control plane.
656
00:27:24,200 --> 00:27:27,320
The protocol standardizes the envelope, not the guarantees.
657
00:27:27,320 --> 00:27:31,560
Microsoft is right about one thing that most teams quietly misunderstand.
658
00:27:31,560 --> 00:27:35,800
Activities, TurnContext, Direct Line, the Bot Framework patterns: those are not hacks.
659
00:27:35,800 --> 00:27:37,560
They are intentional design surfaces.
660
00:27:37,560 --> 00:27:41,560
They're how Microsoft expects you to build conversational systems that can run across channels
661
00:27:41,560 --> 00:27:43,800
and survive real-world connectivity.
662
00:27:43,800 --> 00:27:48,640
But teams keep hearing supported protocol and mentally upgrading it to guaranteed behavior.
663
00:27:48,640 --> 00:27:49,640
It is not.
664
00:27:49,640 --> 00:27:51,680
A protocol standardizes the envelope.
665
00:27:51,680 --> 00:27:55,560
It standardizes field names, schemas and how events are represented on the wire.
666
00:27:55,560 --> 00:27:59,920
It does not standardize the guarantees you wish you had: ordering, exactly-once delivery,
667
00:27:59,920 --> 00:28:02,360
causal consistency or safe side effects.
668
00:28:02,360 --> 00:28:06,280
That distinction matters because the moment you wire tool execution to an event stream,
669
00:28:06,280 --> 00:28:07,840
you've built a distributed system.
670
00:28:07,840 --> 00:28:11,160
And distributed systems don't fail because you wrote bad code.
671
00:28:11,160 --> 00:28:13,800
They fail because reality is asynchronous.
672
00:28:13,800 --> 00:28:17,120
Here are the four failure modes you inherit the second you go event-driven.
673
00:28:17,120 --> 00:28:20,520
Duplication, delay, reordering and loss.
674
00:28:20,520 --> 00:28:23,080
Not edge cases, not rare, the environment.
675
00:28:23,080 --> 00:28:25,120
A retry duplicates an activity.
676
00:28:25,120 --> 00:28:27,720
A congested path delays it. Two workers reorder it.
677
00:28:27,720 --> 00:28:29,680
A transient broker drop loses it.
678
00:28:29,680 --> 00:28:31,480
The SDK abstracts the plumbing.
679
00:28:31,480 --> 00:28:32,720
It does not repeal physics.
680
00:28:32,720 --> 00:28:33,720
Now add tools.
681
00:28:33,720 --> 00:28:38,160
A send email, delete site or post-message tool call is not a chat reply.
682
00:28:38,160 --> 00:28:39,160
It's a side effect.
683
00:28:39,160 --> 00:28:41,480
Side effects are where your system becomes expensive.
684
00:28:41,480 --> 00:28:45,560
If you treat an incoming activity as authoritative state you are saying, "If I see this envelope,
685
00:28:45,560 --> 00:28:46,880
I will mutate the world."
686
00:28:46,880 --> 00:28:49,080
That's fine for rendering a typing indicator.
687
00:28:49,080 --> 00:28:51,040
It's insane for deleting a site.
688
00:28:51,040 --> 00:28:54,480
And this is exactly how you get the incident pattern from the opening.
689
00:28:54,480 --> 00:28:59,000
Conditional access issued a token once, the context drifted, the agent executed anyway,
690
00:28:59,000 --> 00:29:02,440
and Purview logged the whole tragedy with perfect fidelity.
691
00:29:02,440 --> 00:29:06,840
The comfortable response is to say, "We'll dedupe." Teams try to dedupe with best-effort
692
00:29:06,840 --> 00:29:11,520
in-memory caches, fuzzy "if the text matches" comparisons, or correlation IDs that exist only
693
00:29:11,520 --> 00:29:13,040
inside a process boundary.
694
00:29:13,040 --> 00:29:16,080
That's not idempotency, that's optimism.
695
00:29:16,080 --> 00:29:17,480
Idempotency is a contract.
696
00:29:17,480 --> 00:29:22,240
The same operation ID produces the same result once, no matter how many times it arrives.
697
00:29:22,240 --> 00:29:26,200
And the only way to make that true is to persist the operation ID in an authoritative store
698
00:29:26,200 --> 00:29:28,080
before the side effect happens.
699
00:29:28,080 --> 00:29:29,920
That store is the real boundary.
700
00:29:29,920 --> 00:29:33,080
Not the activity envelope, not the transcript, not the avatar.
701
00:29:33,080 --> 00:29:35,200
This is where most agent architectures quietly rot.
702
00:29:35,200 --> 00:29:37,000
They build a state machine out of envelopes.
703
00:29:37,000 --> 00:29:40,720
TurnContext feels like state because it carries context, but it's a context object, not
704
00:29:40,720 --> 00:29:41,720
a ledger.
705
00:29:41,720 --> 00:29:43,800
It's a structured wrapper for a single turn.
706
00:29:43,800 --> 00:29:46,680
Not a durable source of truth for workflow eligibility.
707
00:29:46,680 --> 00:29:49,680
When the process restarts, the truth evaporates.
708
00:29:49,680 --> 00:29:53,520
And the event stream happily replays old messages into a new process that has no memory
709
00:29:53,520 --> 00:29:54,920
of what it already did.
710
00:29:54,920 --> 00:29:56,400
That's conditional chaos.
711
00:29:56,400 --> 00:29:58,600
And notice what makes it worse, embodiment.
712
00:29:58,600 --> 00:30:00,520
Voice adds latency sensitivity.
713
00:30:00,520 --> 00:30:01,520
Streaming adds retries.
714
00:30:01,520 --> 00:30:04,080
WebRTC reconnect logic adds more events.
715
00:30:04,080 --> 00:30:08,040
The experience plane injects more asynchronous behavior into the system, which increases the
716
00:30:08,040 --> 00:30:11,640
probability of duplicates, reordering, and partial failures.
717
00:30:11,640 --> 00:30:13,160
Exactly where your tool calls live.
718
00:30:13,160 --> 00:30:14,880
So the fix starts with the demotion.
719
00:30:14,880 --> 00:30:17,160
Demote events to proposals and telemetry.
720
00:30:17,160 --> 00:30:20,960
Treat every event as a thing that happened or a request that arrived.
721
00:30:20,960 --> 00:30:23,440
Not a state transition that must execute.
722
00:30:23,440 --> 00:30:25,960
Your authoritative workflow position lives elsewhere.
723
00:30:25,960 --> 00:30:27,440
Eligibility lives elsewhere.
724
00:30:27,440 --> 00:30:28,960
The decision lives elsewhere.
725
00:30:28,960 --> 00:30:30,520
Then you do the boring thing that works.
726
00:30:30,520 --> 00:30:32,280
You interpose a deterministic gate.
727
00:30:32,280 --> 00:30:33,880
The agent does not send commands.
728
00:30:33,880 --> 00:30:36,640
It sends structured requests with an operation ID.
729
00:30:36,640 --> 00:30:42,240
The policy engine evaluates against authoritative state and returns allow, deny, or transform.
730
00:30:42,240 --> 00:30:45,040
Tools accept decisions, not imperatives.
731
00:30:45,040 --> 00:30:49,800
If an event replays, the same operation ID returns the same decision and the same effect.
732
00:30:49,800 --> 00:30:53,960
If the event arrives out of order, the state store says the resource doesn't exist yet,
733
00:30:53,960 --> 00:30:56,040
and the request gets denied deterministically.
734
00:30:56,040 --> 00:31:00,720
If the event arrives late, it gets the same cached decision, not a fresh guess.
735
00:31:00,720 --> 00:31:03,840
Protocol standardization made the system interoperable.
736
00:31:03,840 --> 00:31:05,680
Deterministic design makes it survivable.
737
00:31:05,680 --> 00:31:06,920
Event-driven entropy:
738
00:31:06,920 --> 00:31:09,320
why retries become incidents without determinism.
739
00:31:09,320 --> 00:31:12,800
This is where enterprise teams accidentally build roulette tables and then act shocked when
740
00:31:12,800 --> 00:31:14,120
the ball lands on red.
741
00:31:14,120 --> 00:31:15,840
They wire an agent to an event stream.
742
00:31:15,840 --> 00:31:19,920
They see clean activities arriving and they treat the arrival of an envelope as permission
743
00:31:19,920 --> 00:31:21,360
to mutate the world.
744
00:31:21,360 --> 00:31:25,160
Create the task, delete the site, send the email, post the message. And because the agent
745
00:31:25,160 --> 00:31:28,200
is "just responding," they don't treat it like a transactional system.
746
00:31:28,200 --> 00:31:29,440
They treat it like UI.
747
00:31:29,440 --> 00:31:30,840
That's the foundational mistake.
748
00:31:30,840 --> 00:31:33,240
In an event-driven system, delivery is not a guarantee.
749
00:31:33,240 --> 00:31:34,240
It is an attempt.
750
00:31:34,240 --> 00:31:35,240
The platform will retry.
751
00:31:35,240 --> 00:31:36,720
The SDK will reconnect.
752
00:31:36,720 --> 00:31:38,160
WebRTC will renegotiate.
753
00:31:38,160 --> 00:31:39,640
The broker will re-deliver.
754
00:31:39,640 --> 00:31:43,120
When you add a speaking agent, you add more opportunities for those retries because the
755
00:31:43,120 --> 00:31:46,840
experience plane depends on low-latency streaming and noisy networks.
756
00:31:46,840 --> 00:31:48,160
The system compensates.
757
00:31:48,160 --> 00:31:49,640
It tries again.
758
00:31:49,640 --> 00:31:50,800
Here's the part
759
00:31:50,800 --> 00:31:52,480
people refuse to internalize.
760
00:31:52,480 --> 00:31:54,760
At-least-once delivery is not reliability.
761
00:31:54,760 --> 00:31:56,880
It is duplication with good intentions.
762
00:31:56,880 --> 00:32:00,520
If your tool call is not idempotent, at-least-once becomes twice.
763
00:32:00,520 --> 00:32:03,120
Twice is an incident if the action isn't reversible.
764
00:32:03,120 --> 00:32:05,000
The cleanest example is email.
765
00:32:05,000 --> 00:32:07,880
Everyone thinks email is harmless because it's not deleting data.
766
00:32:07,880 --> 00:32:11,440
When the agent sends the same message twice because a transient error occurred after the
767
00:32:11,440 --> 00:32:15,600
first send but before the system persisted success, the business impact isn't technical.
768
00:32:15,600 --> 00:32:16,600
It's human.
769
00:32:16,600 --> 00:32:18,080
People respond to the wrong thread.
770
00:32:18,080 --> 00:32:19,240
Someone escalates.
771
00:32:19,240 --> 00:32:20,480
Someone forwards.
772
00:32:20,480 --> 00:32:24,480
Now you've created confusion and possibly disclosure, and your logs will insist everything
773
00:32:24,480 --> 00:32:26,840
was fine because both sends were legitimate.
774
00:32:26,840 --> 00:32:31,040
Now, upgrade the action to SharePoint deletion or permissions changes and the same pattern
775
00:32:31,040 --> 00:32:32,440
becomes catastrophic.
776
00:32:32,440 --> 00:32:34,360
This is what actually happens in real workloads.
777
00:32:34,360 --> 00:32:35,840
The agent proposes an action.
778
00:32:35,840 --> 00:32:37,320
The orchestrator calls the tool.
779
00:32:37,320 --> 00:32:38,320
The tool executes.
780
00:32:38,320 --> 00:32:41,560
The response is slow or the network flakes or the process restarts.
781
00:32:41,560 --> 00:32:44,160
The orchestrator never receives the success, so it retries.
782
00:32:44,160 --> 00:32:45,640
The tool executes again.
783
00:32:45,640 --> 00:32:47,360
You now have two side effects.
784
00:32:47,360 --> 00:32:50,560
And the only reason you're surprised is because you treated the first execution as if it
785
00:32:50,560 --> 00:32:52,400
was tied to the event receipt.
786
00:32:52,400 --> 00:32:53,400
It wasn't.
787
00:32:53,400 --> 00:32:54,400
The action was tied to optimism.
788
00:32:54,400 --> 00:32:56,280
That's why dedupe doesn't save you.
789
00:32:56,280 --> 00:32:58,880
Dedupe by best effort is not a safety mechanism.
790
00:32:58,880 --> 00:33:00,400
It is a logging convenience.
791
00:33:00,400 --> 00:33:01,800
People dedupe using message text.
792
00:33:01,800 --> 00:33:04,920
That fails the first time the model rephrases the same intent.
793
00:33:04,920 --> 00:33:09,600
People dedupe using timestamps. That fails when delayed delivery shifts the arrival window.
794
00:33:09,600 --> 00:33:10,680
People dedupe in memory.
795
00:33:10,680 --> 00:33:12,120
That fails on process restart.
796
00:33:12,120 --> 00:33:14,080
People dedupe by correlating turn IDs.
797
00:33:14,080 --> 00:33:17,240
That fails across channels and adapters where IDs are transformed.
798
00:33:17,240 --> 00:33:19,400
The important thing is not "probably the same."
799
00:33:19,400 --> 00:33:20,760
It is provably the same.
800
00:33:20,760 --> 00:33:24,480
And "provably the same" requires two things: a stable operation identity and an authoritative
801
00:33:24,480 --> 00:33:26,400
state store that outlives the process.
802
00:33:26,400 --> 00:33:27,560
This is the system law.
803
00:33:27,560 --> 00:33:30,960
If an event can't be safely replayed, it shouldn't control state.
804
00:33:30,960 --> 00:33:32,520
Now apply that law to agents.
805
00:33:32,520 --> 00:33:33,960
The agent's job is to propose.
806
00:33:33,960 --> 00:33:37,520
The system's job is to decide, and the decision must be persistent.
807
00:33:37,520 --> 00:33:39,560
So the deterministic design is simple,
808
00:33:39,560 --> 00:33:43,600
even if the implementation isn't. Every side-effecting operation gets an immutable operation
809
00:33:43,600 --> 00:33:48,360
ID that is generated once, not per retry, once.
810
00:33:48,360 --> 00:33:51,960
That operation ID is persisted before the tool executes.
811
00:33:51,960 --> 00:33:55,720
The persisted record includes the proposed action and its current status.
812
00:33:55,720 --> 00:33:58,280
Proposed, allowed, denied, executed, failed.
813
00:33:58,280 --> 00:34:00,000
Then every retry becomes boring.
814
00:34:00,000 --> 00:34:03,840
If the same operation ID arrives again, the system returns the already recorded decision
815
00:34:03,840 --> 00:34:04,840
and outcome.
816
00:34:04,840 --> 00:34:05,840
No new side effect.
817
00:34:05,840 --> 00:34:08,960
No second deletion, no second email, no double share.
818
00:34:08,960 --> 00:34:10,480
The replay is safe by design.
819
00:34:10,480 --> 00:34:12,800
This is also how you neutralize reordering.
820
00:34:12,800 --> 00:34:17,160
If "complete task" arrives before "create task," the state store says the task doesn't exist
821
00:34:17,160 --> 00:34:19,440
and the request is denied deterministically.
822
00:34:19,440 --> 00:34:20,920
Not handled later, denied.
823
00:34:20,920 --> 00:34:23,400
The event becomes telemetry, not authority.
824
00:34:23,400 --> 00:34:25,320
And this is how you neutralize delay.
825
00:34:25,320 --> 00:34:27,120
Late arrivals don't trigger fresh decisions.
826
00:34:27,120 --> 00:34:30,640
They map to existing operation IDs and return existing outcomes.
827
00:34:30,640 --> 00:34:33,720
The system does not relitigate intent because the packet arrived late.
828
00:34:33,720 --> 00:34:35,960
Now here's the uncomfortable part.
829
00:34:35,960 --> 00:34:37,560
None of this is a model problem.
830
00:34:37,560 --> 00:34:39,120
None of this is hallucination.
831
00:34:39,120 --> 00:34:40,840
None of this is Microsoft being sloppy.
832
00:34:40,840 --> 00:34:44,240
This is distributed systems behavior colliding with side effects.
833
00:34:44,240 --> 00:34:47,800
And the more you anthropomorphize the agent, the less likely you are to build these boring
834
00:34:47,800 --> 00:34:51,160
guarantees because you start believing the interaction is the system.
835
00:34:51,160 --> 00:34:52,160
It isn't.
836
00:34:52,160 --> 00:34:54,800
The system is the state spine behind the conversation.
837
00:34:54,800 --> 00:34:57,120
Without that spine, retries are not resilience.
838
00:34:57,120 --> 00:35:02,160
Retries are how your architecture manufactures incidents out of transient failures.
839
00:35:02,160 --> 00:35:05,800
Deterministic pattern 1: idempotency keys plus an authoritative state spine.
840
00:35:05,800 --> 00:35:09,200
Now the first deterministic pattern is the one everybody claims they already have.
841
00:35:09,200 --> 00:35:11,800
Right up until the first replay deletes the wrong thing.
842
00:35:11,800 --> 00:35:14,280
Idempotency is not "we try not to do it twice."
843
00:35:14,280 --> 00:35:15,480
Idempotency is a guarantee.
844
00:35:15,480 --> 00:35:19,640
The same operation, identified the same way, produces the same effect exactly once,
845
00:35:19,640 --> 00:35:22,400
no matter how many times the system replays the request.
846
00:35:22,400 --> 00:35:26,920
And the only way to get that guarantee is to stop pretending the event stream is your state.
847
00:35:26,920 --> 00:35:29,000
Here's the model that holds under pressure.
848
00:35:29,000 --> 00:35:33,080
Every side-effecting action gets an operation ID that is generated once, at the moment the
849
00:35:33,080 --> 00:35:34,680
intent becomes a request.
850
00:35:34,680 --> 00:35:37,200
Not after the tool call, not after the model replies. Before.
851
00:35:37,200 --> 00:35:39,480
That operation ID is immutable.
852
00:35:39,480 --> 00:35:43,240
Collision-resistant. Boring. It doesn't encode meaning; it encodes identity.
853
00:35:43,240 --> 00:35:47,360
Then you persist it to an authoritative state spine before you execute anything.
854
00:35:47,360 --> 00:35:51,520
That spine is not your turn state. It is not an in-memory cache. It is not "we'll reconstruct it
855
00:35:51,520 --> 00:35:52,720
from logs."
856
00:35:52,720 --> 00:35:57,760
It is a durable store that outlives the process and survives retries, restarts and parallel
857
00:35:57,760 --> 00:35:58,760
workers.
858
00:35:58,760 --> 00:36:03,480
Then you store the minimum fields that make replay safe: operation ID; proposed action,
859
00:36:03,480 --> 00:36:09,160
structured, not prose; current status (proposed, decided, executed);
860
00:36:09,160 --> 00:36:14,160
decision artifact pointer (allowed, denied, transformed); target resource identifiers;
861
00:36:14,160 --> 00:36:16,760
and a timestamp plus version for concurrency control.
862
00:36:16,760 --> 00:36:20,760
Once you have that, retries stop being dangerous because retries stop being meaningful.
863
00:36:20,760 --> 00:36:25,040
A duplicate event arrives. You look up the operation ID. You already have a status of
864
00:36:25,040 --> 00:36:28,640
executed. You return the previous outcome. No second side effect, no best-effort
865
00:36:28,640 --> 00:36:29,640
dedupe.
866
00:36:29,640 --> 00:36:33,800
It is deterministic because the state spine is authoritative. Reordering becomes boring
867
00:36:33,800 --> 00:36:38,720
for the same reason. An event arrives that says "complete task" before "create task." In
868
00:36:38,720 --> 00:36:43,000
an envelope-driven system, that creates a time machine. In a spine-driven system, you check
869
00:36:43,000 --> 00:36:44,000
state.
870
00:36:44,000 --> 00:36:45,800
Task doesn't exist.
871
00:36:45,800 --> 00:36:50,720
You deny deterministically, not because the agent is smart, but because the state is authoritative.
872
00:36:50,720 --> 00:36:52,240
Delays become boring as well.
873
00:36:52,240 --> 00:36:56,520
If the event arrives late, it still references the same operation ID. You return the same decision.
874
00:36:56,520 --> 00:37:00,240
The system doesn't reinterpret intent because a packet took a scenic route through someone's
875
00:37:00,240 --> 00:37:01,240
VPN.
876
00:37:01,240 --> 00:37:04,440
Now the thing most people miss is the separation of concerns.
877
00:37:04,440 --> 00:37:07,600
Idempotency prevents double harm. It does not decide what is allowed.
878
00:37:07,600 --> 00:37:13,520
That's why the operation id must exist before policy evaluation and before tool execution.
879
00:37:13,520 --> 00:37:17,880
It becomes the anchor for everything else: decision, execution, audit, provenance.
880
00:37:17,880 --> 00:37:22,480
Without that anchor, you can't tie "what we decided" to "what we did" in a way that survives
881
00:37:22,480 --> 00:37:23,480
failure.
882
00:37:23,480 --> 00:37:25,880
And you need one more piece to make the spine real.
883
00:37:25,880 --> 00:37:30,440
A workflow state machine that is defined outside the agent. The agent can propose; the system
884
00:37:30,440 --> 00:37:34,920
must track progression. Proposed, decided, executed is not optional ceremony.
885
00:37:34,920 --> 00:37:38,360
It is how you prevent event noise from becoming irreversible action.
886
00:37:38,360 --> 00:37:42,920
When a worker crashes after executing, but before replying, the state already says executed.
887
00:37:42,920 --> 00:37:44,720
The next worker doesn't try again.
888
00:37:44,720 --> 00:37:46,160
It returns the recorded effect.
889
00:37:46,160 --> 00:37:48,480
This is also why you don't store only success.
890
00:37:48,480 --> 00:37:50,440
You store denials and transforms too.
891
00:37:50,440 --> 00:37:54,480
Because the absence of an action is still a decision you need to replay safely.
892
00:37:54,480 --> 00:38:00,240
If the policy denied the delete at 09:05 and the same request replays at 09:06, you must deny
893
00:38:00,240 --> 00:38:04,320
again for the same operation ID. Otherwise you've built a system that can be bypassed by
894
00:38:04,320 --> 00:38:05,840
retry storms.
895
00:38:05,840 --> 00:38:08,440
The practical consequence is that you stop debugging ghosts.
896
00:38:08,440 --> 00:38:12,600
Your incident review stops being "how did this run twice?" and becomes "why did we ever allow
897
00:38:12,600 --> 00:38:15,240
this operation ID to execute once?"
898
00:38:15,240 --> 00:38:16,240
That's progress.
899
00:38:16,240 --> 00:38:17,560
That is where accountability lives.
900
00:38:17,560 --> 00:38:19,280
And yes, this costs engineering effort.
901
00:38:19,280 --> 00:38:23,400
So does every week you spend reconstructing an incident from transcripts and half-correlated
902
00:38:23,400 --> 00:38:28,440
log lines. Idempotency keys plus an authoritative state spine are not an optimization.
903
00:38:28,440 --> 00:38:33,080
They are the price of admission for letting probabilistic agents touch deterministic systems.
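A minimal sketch of the pattern just described, assuming a single worker and using SQLite as a stand-in for the durable state spine. The table layout and the names `open_spine`, `new_operation_id`, and `run_once` are hypothetical, chosen only for illustration. It records the operation ID before the side effect and returns the recorded outcome on replay. It does not cover a crash between executing and recording, or parallel workers; a real system would need to reconcile those cases, for example by passing the same operation ID to the downstream tool.

```python
import json
import sqlite3
import uuid


def open_spine(path=":memory:"):
    """Open the state store. Use a file path so it outlives the process."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS ops ("
        " op_id TEXT PRIMARY KEY,"
        " action TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " outcome TEXT)"
    )
    return db


def new_operation_id():
    """Generated once, when intent becomes a request. Never per retry."""
    return str(uuid.uuid4())


def run_once(db, op_id, action, execute):
    """Run `execute(action)` at most once per op_id; replays get the record."""
    with db:  # persist the proposal before any side effect happens
        db.execute(
            "INSERT OR IGNORE INTO ops (op_id, action, status)"
            " VALUES (?, ?, 'proposed')",
            (op_id, json.dumps(action, sort_keys=True)),
        )
    status, outcome = db.execute(
        "SELECT status, outcome FROM ops WHERE op_id = ?", (op_id,)
    ).fetchone()
    if status == "executed":
        return json.loads(outcome)  # replay: no second side effect
    result = execute(action)
    with db:
        db.execute(
            "UPDATE ops SET status = 'executed', outcome = ? WHERE op_id = ?",
            (json.dumps(result), op_id),
        )
    return result
```

Calling `run_once` twice with the same operation ID invokes the tool once and returns the same outcome both times.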
904
00:38:33,080 --> 00:38:35,320
Deterministic pattern 2: per-tool-call policy gate.
905
00:38:35,320 --> 00:38:37,160
Idempotency gives you safe replay.
906
00:38:37,160 --> 00:38:38,160
Good.
907
00:38:38,160 --> 00:38:40,000
It stops duplicates from turning into double damage.
908
00:38:40,000 --> 00:38:43,960
But idempotency doesn't answer the question that actually decides whether you have an incident.
909
00:38:43,960 --> 00:38:45,480
Should this action be allowed at all?
910
00:38:45,480 --> 00:38:47,920
That's where most enterprises fall back into religion.
911
00:38:47,920 --> 00:38:49,200
The agent knows.
912
00:38:49,200 --> 00:38:50,520
The prompt told it.
913
00:38:50,520 --> 00:38:51,360
We trained it.
914
00:38:51,360 --> 00:38:52,520
It has citations.
915
00:38:52,520 --> 00:38:54,120
None of that is an enforcement model.
916
00:38:54,120 --> 00:38:55,120
It's a hope model.
917
00:38:55,120 --> 00:39:00,080
A per-tool-call policy gate is the mechanism that converts hope into a deterministic decision.
918
00:39:00,080 --> 00:39:02,200
And the key change is conceptual, not technical.
919
00:39:02,200 --> 00:39:03,680
The agent stops issuing commands.
920
00:39:03,680 --> 00:39:05,600
It starts submitting requests.
921
00:39:05,600 --> 00:39:11,920
The moment you let the model speak in imperatives ("delete site X," "share file Y," "email this to Z"),
922
00:39:11,920 --> 00:39:13,920
you've made the LLM the control plane.
923
00:39:13,920 --> 00:39:16,640
You've delegated authority to a probabilistic system.
924
00:39:16,640 --> 00:39:18,040
That is not agentic.
925
00:39:18,040 --> 00:39:19,040
That is abdication.
926
00:39:19,040 --> 00:39:20,960
A policy gate flips the relationship.
927
00:39:20,960 --> 00:39:23,840
The agent proposes; the control plane disposes.
928
00:39:23,840 --> 00:39:25,480
So what does the gate actually evaluate?
929
00:39:25,480 --> 00:39:30,760
Not prose, not vibe, not "it sounded reasonable." A structured request.
930
00:39:30,760 --> 00:39:36,360
At minimum, every tool-reaching request carries a tuple: actor, intent, scope, data class,
931
00:39:36,360 --> 00:39:38,400
venue and operation id.
932
00:39:38,400 --> 00:39:43,320
Actor is the identity that would execute the call: human, workload identity, or segmented
933
00:39:43,320 --> 00:39:44,800
agent principal.
934
00:39:44,800 --> 00:39:51,280
Intent is the verb: delete, share, send, post, create, approve. Small, enumerable, boring.
935
00:39:51,280 --> 00:39:56,760
Scope is the concrete target: site ID, file ID, mailbox, distribution list, external domain,
936
00:39:56,760 --> 00:39:58,080
API endpoint.
937
00:39:58,080 --> 00:40:02,200
Data class is the sensitivity of the thing being touched or disclosed, derived from authoritative
938
00:40:02,200 --> 00:40:04,840
classification, not guessed by the model.
939
00:40:04,840 --> 00:40:08,920
Venue is where the effect will manifest: internal tenant, external email, Teams meeting
940
00:40:08,920 --> 00:40:13,640
with externals, public web chat. Operation ID anchors replay and traceability, as we already
941
00:40:13,640 --> 00:40:14,640
covered.
942
00:40:14,640 --> 00:40:18,360
Now the policy engine evaluates that tuple against rules and authoritative state.
943
00:40:18,360 --> 00:40:20,240
It returns one of three outcomes.
944
00:40:20,240 --> 00:40:24,240
Allow, deny, transform.
945
00:40:24,240 --> 00:40:27,680
Allow means it can proceed, but not as a blank check.
946
00:40:27,680 --> 00:40:32,920
It can attach constraints: time window, max recipients, required approval, rate limits,
947
00:40:32,920 --> 00:40:34,440
a narrow target set.
948
00:40:34,440 --> 00:40:36,280
Deny means the tool never executes.
949
00:40:36,280 --> 00:40:40,280
The refusal is not a moral stance. It's a deterministic result: intent X in venue
950
00:40:40,280 --> 00:40:46,320
Y with data class Z violates rule R. Transform is the underused one that keeps systems usable.
951
00:40:46,320 --> 00:40:49,600
It means the action is allowed only in a safer form.
952
00:40:49,600 --> 00:40:53,800
Replace "share externally" with "share internally and create an approval task."
953
00:40:53,800 --> 00:40:58,600
Replace "speak compensation cohort data" with "speak a high-level summary template."
954
00:40:58,600 --> 00:41:00,720
Replace "delete" with "move to quarantine."
955
00:41:00,720 --> 00:41:02,960
This is how you avoid the false choice
956
00:41:02,960 --> 00:41:05,840
between "agents are useless" and "agents are dangerous."
957
00:41:05,840 --> 00:41:09,760
The gate is also where you encode negative space. Not just what you did: what you refused
958
00:41:09,760 --> 00:41:13,720
to do, what you refused to retrieve, what you refused to disclose.
959
00:41:13,720 --> 00:41:17,440
Because governance without refusal telemetry becomes performative. It only shows motion.
960
00:41:17,440 --> 00:41:21,080
Now here's the part that separates a real gate from a prompt-based imitation.
961
00:41:21,080 --> 00:41:23,480
Tools must accept decisions, not requests.
962
00:41:23,480 --> 00:41:28,080
If your tool endpoint will execute any authenticated call that contains "delete: true," you don't have
963
00:41:28,080 --> 00:41:31,360
a gate, you have a suggestion layer in front of a loaded weapon.
964
00:41:31,360 --> 00:41:35,480
The tool should accept only a signed decision artifact from the policy engine bound to
965
00:41:35,480 --> 00:41:40,200
the operation ID with a short TTL. If the decision doesn't match, the tool denies.
966
00:41:40,200 --> 00:41:43,840
If the operation ID was already executed, the tool returns the prior outcome.
967
00:41:43,840 --> 00:41:48,440
That binds execution to policy and makes bypassing the gate materially harder.
968
00:41:48,440 --> 00:41:49,920
And yes, this sounds like overhead.
969
00:41:49,920 --> 00:41:50,920
It is.
970
00:41:50,920 --> 00:41:53,640
It's also the only place where intent can be enforced at action time.
971
00:41:53,640 --> 00:41:54,880
Conditional access can't do this.
972
00:41:54,880 --> 00:41:56,200
It doesn't see the tool call.
973
00:41:56,200 --> 00:41:57,760
It sees token issuance context.
974
00:41:57,760 --> 00:41:58,760
Purview can't do this.
975
00:41:58,760 --> 00:42:00,360
It sees the aftermath.
976
00:42:00,360 --> 00:42:01,360
Citations can't do this.
977
00:42:01,360 --> 00:42:03,160
They explain retrieval, not permission.
978
00:42:03,160 --> 00:42:05,160
Only a gate can stop the train while it's moving.
979
00:42:05,160 --> 00:42:09,680
If you want a concrete mental picture, treat the policy engine like an authorization compiler.
980
00:42:09,680 --> 00:42:11,360
The agent submits a high level request.
981
00:42:11,360 --> 00:42:13,280
The compiler checks it against rules and state.
982
00:42:13,280 --> 00:42:15,000
It emits a decision artifact.
983
00:42:15,000 --> 00:42:16,760
The runtime can execute.
984
00:42:16,760 --> 00:42:18,760
Without that artifact, execution is invalid.
985
00:42:18,760 --> 00:42:21,600
That's determinism grafted onto probabilistic reasoning.
986
00:42:21,600 --> 00:42:24,800
And once you have it, your incident reviews change shape.
987
00:42:24,800 --> 00:42:28,360
You stop asking "why did it do that?" as if the agent had agency.
988
00:42:28,360 --> 00:42:30,480
You ask "which rule allowed this,
989
00:42:30,480 --> 00:42:31,960
and who changed it?"
990
00:42:31,960 --> 00:42:33,440
That's accountability.
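A minimal sketch of the gate and the signed decision artifact, under stated assumptions. The two rules in `evaluate`, the field names, the shared secret, and the helper names (`decide`, `tool_accepts`) are all hypothetical. HMAC stands in for whatever signing scheme a real policy engine would use. The point it illustrates is that the tool checks the artifact, not the agent's request.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass

SECRET = b"demo-only-key"  # assumption: shared by the policy engine and the tool


@dataclass(frozen=True)
class Request:
    actor: str
    intent: str
    scope: str
    data_class: str
    venue: str
    operation_id: str


def evaluate(req):
    """Illustrative rules only. Returns (outcome, effective_intent)."""
    if req.venue == "external" and req.data_class == "confidential":
        return "deny", req.intent
    if req.intent == "delete":
        return "transform", "move_to_quarantine"
    return "allow", req.intent


def _sign(payload):
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def decide(req, ttl_seconds=30, now=None):
    """Policy engine: emit a signed artifact bound to the operation ID."""
    now = time.time() if now is None else now
    outcome, intent = evaluate(req)
    payload = {
        "operation_id": req.operation_id,
        "outcome": outcome,
        "intent": intent,
        "scope": req.scope,
        "expires": now + ttl_seconds,
    }
    return {"payload": payload, "sig": _sign(payload)}


def tool_accepts(artifact, operation_id, now=None):
    """Tool side: execute only on a valid, matching, unexpired decision."""
    now = time.time() if now is None else now
    payload = artifact["payload"]
    return (
        hmac.compare_digest(artifact["sig"], _sign(payload))
        and payload["operation_id"] == operation_id
        and payload["outcome"] in ("allow", "transform")
        and now < payload["expires"]
    )
```

In this sketch a delete request comes back as a transform to `move_to_quarantine`, and the tool rejects the artifact if it is tampered with, expired, or presented for a different operation ID.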
991
00:42:33,440 --> 00:42:38,040
Deterministic pattern 3: segmented agent identities as failure domains.
992
00:42:38,040 --> 00:42:41,920
Once you put a real policy gate in front of tools, you've solved the "should this be
993
00:42:41,920 --> 00:42:46,800
allowed" problem at action time, but you still haven't solved the bigger failure-domain problem.
994
00:42:46,800 --> 00:42:50,560
Because if one identity can do everything, your gate becomes your only brake, and brakes
995
00:42:50,560 --> 00:42:51,560
fail.
996
00:42:51,560 --> 00:42:52,560
Rules drift.
997
00:42:52,560 --> 00:42:53,880
Someone adds an exception.
998
00:42:53,880 --> 00:42:56,120
An urgent request becomes permanent.
999
00:42:56,120 --> 00:42:58,760
Entropy always wins unless you give it walls to hit.
1000
00:42:58,760 --> 00:43:00,360
Segmented agent identities are those walls.
1001
00:43:00,360 --> 00:43:01,960
One agent is not one identity.
1002
00:43:01,960 --> 00:43:03,520
One agent is an orchestrator.
1003
00:43:03,520 --> 00:43:07,600
It should coordinate multiple principals, each with a narrow capability and a narrow blast
1004
00:43:07,600 --> 00:43:08,600
radius.
1005
00:43:08,600 --> 00:43:10,120
Read, write, and egress.
1006
00:43:10,120 --> 00:43:13,840
That distinction matters because the dominant failure mode in enterprise agents is not "the
1007
00:43:13,840 --> 00:43:15,680
agent got global admin."
1008
00:43:15,680 --> 00:43:17,680
Microsoft has already constrained a lot of that.
1009
00:43:17,680 --> 00:43:19,440
The dominant failure is the boring one.
1010
00:43:19,440 --> 00:43:24,440
A convenience-driven, overscoped identity executing at machine speed in the wrong place.
1011
00:43:24,440 --> 00:43:25,880
Least privilege isn't a value statement.
1012
00:43:25,880 --> 00:43:26,880
It's math.
1013
00:43:26,880 --> 00:43:30,880
Scopes multiplied by ambiguity multiplied by speed equals blast radius.
1014
00:43:30,880 --> 00:43:35,360
If you let the same identity retrieve broadly, write broadly and communicate externally, you've
1015
00:43:35,360 --> 00:43:37,240
created a super user with a polite interface.
1016
00:43:37,240 --> 00:43:39,400
It doesn't matter how good your prompt is.
1017
00:43:39,400 --> 00:43:40,680
The capability exists.
1018
00:43:40,680 --> 00:43:42,680
The model will eventually route intent into it.
1019
00:43:42,680 --> 00:43:44,760
So you split capability at the identity boundary.
1020
00:43:44,760 --> 00:43:46,240
The read identity can only read.
1021
00:43:46,240 --> 00:43:50,000
It can query SharePoint metadata, retrieve files and summarize content.
1022
00:43:50,000 --> 00:43:51,000
It cannot delete.
1023
00:43:51,000 --> 00:43:52,000
It cannot share.
1024
00:43:52,000 --> 00:43:53,000
It cannot send.
1025
00:43:53,000 --> 00:43:54,960
It cannot write anywhere that matters.
1026
00:43:54,960 --> 00:43:56,600
Its job is to propose, not to act.
1027
00:43:56,600 --> 00:43:58,040
Then you create a write identity.
1028
00:43:58,040 --> 00:43:59,440
This one is intentionally painful.
1029
00:43:59,440 --> 00:44:03,320
It holds only the minimum permissions needed for irreversible actions.
1030
00:44:03,320 --> 00:44:07,880
And those permissions are resource-scoped, short-lived, and ideally minted just in time.
1031
00:44:07,880 --> 00:44:11,240
If you can't make them short-lived, then you rotate aggressively and monitor like you mean
1032
00:44:11,240 --> 00:44:12,240
it.
1033
00:44:12,240 --> 00:44:13,720
This identity never retrieves broadly.
1034
00:44:13,720 --> 00:44:14,720
It doesn't need to.
1035
00:44:14,720 --> 00:44:18,640
It executes against explicit targets that have already passed policy evaluation.
1036
00:44:18,640 --> 00:44:20,320
And then you create an egress identity.
1037
00:44:20,320 --> 00:44:24,560
This one can talk outside the tenant or publish to public surfaces or send email to external
1038
00:44:24,560 --> 00:44:25,560
domains.
1039
00:44:25,560 --> 00:44:27,720
It has zero access to internal corporate data planes.
1040
00:44:27,720 --> 00:44:28,720
None.
1041
00:44:28,720 --> 00:44:32,600
If it can see internal runbooks and also post externally, you've already lost.
1042
00:44:32,600 --> 00:44:33,680
Egress is not a feature.
1043
00:44:33,680 --> 00:44:34,680
It's a failure domain.
1044
00:44:34,680 --> 00:44:37,560
Now the obvious objection is "that's three times the complexity."
1045
00:44:37,560 --> 00:44:41,840
No, it's three times the clarity, because now every action has a lane, and lanes don't cross
1046
00:44:41,840 --> 00:44:43,000
without a broker.
1047
00:44:43,000 --> 00:44:44,440
The orchestrator can request.
1048
00:44:44,440 --> 00:44:47,960
The policy gate can decide. The correct identity can execute.
1049
00:44:47,960 --> 00:44:52,280
If the read identity gets compromised, the attacker gets visibility, not destruction.
1050
00:44:52,280 --> 00:44:56,040
If the write identity gets compromised, the attacker gets destruction, but only within
1051
00:44:56,040 --> 00:45:00,640
a narrowly scoped domain and ideally only for a short time window.
1052
00:45:00,640 --> 00:45:04,720
If the egress identity gets compromised, the attacker can speak, but they can't see your
1053
00:45:04,720 --> 00:45:06,040
internal knowledge base.
1054
00:45:06,040 --> 00:45:09,920
This is how you build containment into the system rather than writing incident reviews
1055
00:45:09,920 --> 00:45:11,680
about "we'll be more careful."
1056
00:45:11,680 --> 00:45:14,400
And it pairs cleanly with the previous two patterns.
1057
00:45:14,400 --> 00:45:16,040
Idempotency gives you safe replay.
1058
00:45:16,040 --> 00:45:19,200
The policy gate gives you action-time authorization.
1059
00:45:19,200 --> 00:45:22,360
Segmented identities give you blast radius containment when the gate is wrong.
1060
00:45:22,360 --> 00:45:25,200
Now here's the part most teams miss.
1061
00:45:25,200 --> 00:45:28,160
Separation must be enforced by design, not etiquette.
1062
00:45:28,160 --> 00:45:31,720
Don't let the agent choose which identity to use based on a prompt instruction.
1063
00:45:31,720 --> 00:45:32,720
That's still hope.
1064
00:45:32,720 --> 00:45:37,160
Identity selection should be a deterministic mapping from intent and venue to a principal
1065
00:45:37,160 --> 00:45:38,920
enforced by the control plane.
1066
00:45:38,920 --> 00:45:41,320
Delete intent routes to the write principal.
1067
00:45:41,320 --> 00:45:44,320
External publication routes to the egress principal.
1068
00:45:44,320 --> 00:45:46,080
Retrieval routes to the read principal.
1069
00:45:46,080 --> 00:45:50,400
The agent can't override that because it never directly holds the credentials for the
1070
00:45:50,400 --> 00:45:51,640
other lanes.
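A minimal sketch of that deterministic mapping, assuming a fixed set of intents per lane. The lane names follow the read, write, and egress split described here; the `LANES` table, the intent names, the venue values, and `select_principal` are hypothetical. The rule that non-egress intents are refused for an external venue is an assumption of this sketch, not something the episode specifies.

```python
# Assumed lane table: each intent belongs to exactly one lane.
LANES = {
    "read": {"retrieve", "summarize"},
    "write": {"delete", "create", "approve"},
    "egress": {"send_external", "publish"},
}


def select_principal(intent, venue):
    """Map (intent, venue) to a lane. Anything unmapped is refused."""
    for lane, intents in LANES.items():
        if intent in intents:
            # Egress intents require an external venue; all others require
            # an internal one. A mismatch is denied, never guessed at.
            if (lane == "egress") != (venue == "external"):
                raise PermissionError(
                    f"intent {intent!r} not allowed in venue {venue!r}"
                )
            return lane
    raise PermissionError(f"unknown intent {intent!r}")
```

The control plane would call `select_principal` and fetch credentials for the returned lane itself, so the agent never chooses or holds them.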
1071
00:45:51,640 --> 00:45:54,160
This is also how you survive shadow agent sprawl.
1072
00:45:54,160 --> 00:45:58,120
When someone spins up a quick external bot, the external lane simply cannot authenticate
1073
00:45:58,120 --> 00:46:00,200
to internal core data planes.
1074
00:46:00,200 --> 00:46:02,280
Even if they try, even if they copy code,
1075
00:46:02,280 --> 00:46:06,240
even if they add a connector. The design makes the bad path impossible without an explicit
1076
00:46:06,240 --> 00:46:07,400
governance decision.
1077
00:46:07,400 --> 00:46:11,280
So if you remember one sentence from this pattern, make it this.
1078
00:46:11,280 --> 00:46:13,880
Agents should fail small, not fail loud.
1079
00:46:13,880 --> 00:46:17,240
A single-identity design makes failure loud by default.
1080
00:46:17,240 --> 00:46:20,640
Segmented identities make failure bounded by default.
1081
00:46:20,640 --> 00:46:24,720
That's the difference between a contained incident and a tenant-wide outage delivered by
1082
00:46:24,720 --> 00:46:26,120
a calm voice.
1083
00:46:26,120 --> 00:46:31,040
RAG as a security boundary: retrieval filters plus negative space plus output classification.
1084
00:46:31,040 --> 00:46:34,960
Now tie the whole thing back to the part everybody treats as "just search."
1085
00:46:34,960 --> 00:46:37,120
Retrieval.
1086
00:46:37,120 --> 00:46:39,360
Most teams implement RAG like a convenience feature.
1087
00:46:39,360 --> 00:46:43,440
Embed documents, vector search, pull the top five chunks, stuff them into the prompt and
1088
00:46:43,440 --> 00:46:45,080
call it grounded.
1089
00:46:45,080 --> 00:46:46,080
That is not a boundary.
1090
00:46:46,080 --> 00:46:49,120
That is a suggestion engine feeding a probabilistic model.
1091
00:46:49,120 --> 00:46:51,520
In a real enterprise, retrieval is an authorization event.
1092
00:46:51,520 --> 00:46:55,240
It is the moment your system decides what information is allowed to exist for this actor
1093
00:46:55,240 --> 00:46:56,960
in this venue right now.
1094
00:46:56,960 --> 00:47:00,680
If you don't treat it that way, the nearest-neighbor algorithm will outrun your governance
1095
00:47:00,680 --> 00:47:01,680
model every time.
1096
00:47:01,680 --> 00:47:03,560
So the boundary starts before similarity.
1097
00:47:03,560 --> 00:47:05,800
Eligibility comes first.
1098
00:47:05,800 --> 00:47:10,200
Before you run a vector search, you filter the candidate set using hard predicates:
1099
00:47:10,200 --> 00:47:13,800
principal, access scope, confidentiality, and venue.
1100
00:47:13,800 --> 00:47:17,720
Principal means the workload identity or user context that is actually operating.
1101
00:47:17,720 --> 00:47:21,480
Access scope means what corpus this identity is allowed to see based on an authoritative
1102
00:47:21,480 --> 00:47:25,200
catalog, not on whatever connector happens to be configured.
1103
00:47:25,200 --> 00:47:28,520
Confidentiality means the classification level of the content.
1104
00:47:28,520 --> 00:47:31,320
Venue means where the answer will be consumed:
1105
00:47:31,320 --> 00:47:35,480
internal chat, mixed-audience meeting, external web, email, public site.
1106
00:47:35,480 --> 00:47:38,720
If a chunk is not eligible under those predicates, it does not exist.
1107
00:47:38,720 --> 00:47:40,040
Not "it won't be used."
1108
00:47:40,040 --> 00:47:41,480
It doesn't exist.
1109
00:47:41,480 --> 00:47:43,040
This is the uncomfortable truth.
1110
00:47:43,040 --> 00:47:44,760
Similarity search is not a permissions model.
1111
00:47:44,760 --> 00:47:48,600
It is math, and math will happily return the best match from an ineligible corpus unless
1112
00:47:48,600 --> 00:47:49,800
you fence it.
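A minimal sketch of "eligibility before similarity", assuming an in-memory chunk list. The names `CATALOG`, `VENUE_CEILING`, `Chunk`, `Request`, and the 0/1/2 confidentiality scale are hypothetical illustrations, not part of any product API. The point it shows: the hard predicates shrink the candidate set first, and cosine ranking only ever sees what survived.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    corpus: str            # corpus name as registered in the catalog
    confidentiality: int   # assumed scale: 0 = public, 1 = internal, 2 = restricted
    embedding: tuple


@dataclass(frozen=True)
class Request:
    principal: str         # workload identity or user context actually operating
    venue: str             # where the answer will be consumed
    query_embedding: tuple


# Hypothetical authoritative catalog: principal -> corpora it may see.
CATALOG = {"hr-agent": {"hr-policies", "public-docs"}, "web-bot": {"public-docs"}}
# Hypothetical ceiling: highest confidentiality level each venue may receive.
VENUE_CEILING = {"internal_chat": 1, "mixed_meeting": 0, "external_web": 0}


def eligible(chunk, req):
    """Hard predicates: access scope, confidentiality, and venue."""
    allowed_corpora = CATALOG.get(req.principal, set())
    ceiling = VENUE_CEILING.get(req.venue, -1)  # unknown venue: nothing is eligible
    return chunk.corpus in allowed_corpora and chunk.confidentiality <= ceiling


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(req, chunks, top_k=5):
    # Fence first: ineligible chunks never reach the similarity step.
    candidates = [c for c in chunks if eligible(c, req)]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(c.embedding, req.query_embedding),
        reverse=True,
    )
    return ranked[:top_k]
```

With this shape, an external bot whose query is closest to an internal HR chunk still gets only public chunks back, because the HR chunk was never a candidate.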
1113
00:47:49,800 --> 00:47:53,200
Then you do the thing that makes the whole system safer without anyone noticing.
1114
00:47:53,200 --> 00:47:55,160
You build negative space.
1115
00:47:55,160 --> 00:47:58,440
Negative space means the system records what it refused to retrieve and what it refused
1116
00:47:58,440 --> 00:47:59,440
to say.
1117
00:47:59,440 --> 00:48:03,600
When the pre-filters exclude chunks, you log that exclusion with a reason.
1118
00:48:03,600 --> 00:48:08,160
Excluded because venue is external, excluded because confidentiality is internal-only, excluded
1119
00:48:08,160 --> 00:48:10,320
because principal lacks access scope.
1120
00:48:10,320 --> 00:48:14,120
When the filtered retrieval returns nothing, that emptiness is not an error.
1121
00:48:14,120 --> 00:48:15,280
It is a safe outcome.
1122
00:48:15,280 --> 00:48:18,440
It is the system refusing to invent or overshare.
1123
00:48:18,440 --> 00:48:21,120
Most organizations treat no results as a UX bug.
1124
00:48:21,120 --> 00:48:23,120
They force the model to answer anyway.
1125
00:48:23,120 --> 00:48:25,680
That turns your RAG system into a leak mechanism.
1126
00:48:25,680 --> 00:48:27,680
The safe behavior is cite or silent.
1127
00:48:27,680 --> 00:48:30,480
If there is no eligible evidence, the agent says less.
1128
00:48:30,480 --> 00:48:33,080
"No eligible content found for this request" is a feature.
1129
00:48:33,080 --> 00:48:37,120
It's the guardrail that stops the model from converting uncertainty into confident
1130
00:48:37,120 --> 00:48:38,120
nonsense.
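One way negative space could be recorded, as a hedged sketch: every excluded chunk gets a reason, and an empty eligible set produces the "no eligible content" reply instead of a forced answer. `RetrievalResult`, `filter_with_negative_space`, and `respond` are hypothetical names, and the integer confidentiality ceiling is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalResult:
    chunk_ids: list = field(default_factory=list)
    exclusions: list = field(default_factory=list)  # (chunk_id, reason) pairs


def filter_with_negative_space(chunks, principal_corpora, venue, venue_ceiling):
    """chunks: dicts with 'id', 'corpus', and integer 'confidentiality'."""
    result = RetrievalResult()
    for c in chunks:
        if c["corpus"] not in principal_corpora:
            result.exclusions.append((c["id"], "principal lacks access scope"))
        elif c["confidentiality"] > venue_ceiling:
            reason = f"confidentiality {c['confidentiality']} exceeds ceiling {venue_ceiling} for venue {venue}"
            result.exclusions.append((c["id"], reason))
        else:
            result.chunk_ids.append(c["id"])
    return result


def respond(result):
    # Emptiness is a safe outcome, not an error: say less rather than invent.
    if not result.chunk_ids:
        return "No eligible content found for this request."
    return "Answer grounded in: " + ", ".join(result.chunk_ids)
```

The exclusions list is the first-class record: it can be shipped to whatever audit store you already use, alongside the request that produced it.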
1131
00:48:38,120 --> 00:48:40,840
Now you enforce the same discipline on generation.
1132
00:48:40,840 --> 00:48:45,240
The model can only assert claims that map to eligible chunk IDs and it must cite them.
1133
00:48:45,240 --> 00:48:46,720
If it can't cite, it downgrades.
1134
00:48:46,720 --> 00:48:49,200
If it can't downgrade safely, it refuses.
1135
00:48:49,200 --> 00:48:52,360
This is how you make grounding measurable instead of aspirational.
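A sketch of the cite, downgrade, or refuse rule, assuming the generation step hands back claims already paired with the chunk IDs they rely on (that pairing is itself an assumption, as are the function names). It keeps only claims whose citations are all eligible and reports a grounding ratio, which is what makes grounding measurable.

```python
def enforce_citations(claims, eligible_ids):
    """claims: list of (text, [chunk_ids]). Returns (status, kept_claims)."""
    kept = [
        (text, ids)
        for text, ids in claims
        if ids and all(i in eligible_ids for i in ids)
    ]
    if kept and len(kept) == len(claims):
        return "answer", kept        # every claim is cited to eligible evidence
    if kept:
        return "downgrade", kept     # say only the part that can be cited
    return "refuse", []              # nothing citable: stay silent


def grounding_ratio(claims, eligible_ids):
    """Share of claims backed entirely by eligible chunk IDs."""
    if not claims:
        return 0.0
    _, kept = enforce_citations(claims, eligible_ids)
    return len(kept) / len(claims)
```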
1136
00:48:52,360 --> 00:48:55,200
But the part that most people miss is output classification.
1137
00:48:55,200 --> 00:48:59,120
Enterprises label inputs and then pretend outputs inherit safety by osmosis.
1138
00:48:59,120 --> 00:49:00,120
They don't.
1139
00:49:00,120 --> 00:49:01,120
The output is a new artifact.
1140
00:49:01,120 --> 00:49:02,120
It can aggregate.
1141
00:49:02,120 --> 00:49:03,120
It can summarize.
1142
00:49:03,120 --> 00:49:07,040
It can combine two non-sensitive facts into a sensitive conclusion.
1143
00:49:07,040 --> 00:49:09,360
And in voice scenarios, output is publication.
1144
00:49:09,360 --> 00:49:13,680
So you derive an output sensitivity label from the sources used and the aggregation level
1145
00:49:13,680 --> 00:49:14,680
of the answer.
1146
00:49:14,680 --> 00:49:19,720
If the answer pulls from compensation guidance and produces cohort-level metrics, the output
1147
00:49:19,720 --> 00:49:24,000
is compensation-sensitive even if no single chunk was labeled secret.
1148
00:49:24,000 --> 00:49:27,400
Then you route that output through the same policy gate that controls tool calls, because
1149
00:49:27,400 --> 00:49:32,360
speech is a tool call. Venue plus output classification becomes your egress boundary.
1150
00:49:32,360 --> 00:49:37,960
Mixed audience, external participants? Then the speech path requires a transform or a deny.
1151
00:49:37,960 --> 00:49:39,120
Internal HR channel?
1152
00:49:39,120 --> 00:49:42,440
You might allow text, deny speech, or require a different identity.
1153
00:49:42,440 --> 00:49:46,920
The point is that the system decides at action time, not after the transcript is stored.
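A hedged sketch of output classification feeding an egress decision. The three-level `Sensitivity` scale, the venue names, and the single aggregation rule for compensation are illustrative assumptions. A real rule set would come from your own classification policy.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    SENSITIVE = 2


def classify_output(source_labels, source_topics, aggregation):
    """Derive the output label from the sources used and the aggregation level."""
    label = max(source_labels, default=Sensitivity.PUBLIC)
    # Assumed rule: cohort-level metrics over compensation material are
    # sensitive even if no single source chunk was labeled that way.
    if "compensation" in source_topics and aggregation == "cohort":
        label = max(label, Sensitivity.SENSITIVE)
    return label


def egress_decision(label, venue, channel):
    """Decide at action time: 'allow', 'transform', or 'deny'."""
    if venue in ("mixed_meeting", "external_web"):
        if label == Sensitivity.PUBLIC:
            return "allow"
        return "transform" if label == Sensitivity.INTERNAL else "deny"
    if venue == "internal_hr":
        # Text may be acceptable where speech, which is publication, is not.
        if channel == "speech" and label >= Sensitivity.SENSITIVE:
            return "deny"
        return "allow"
    return "deny"  # unknown venue: default deny
```

The design choice worth noting is the default deny for venues the table does not know about, so a new surface cannot inherit permission by accident.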
1154
00:49:46,920 --> 00:49:51,120
This is how RAG stops being a knowledge feature and becomes a security boundary.
1155
00:49:51,120 --> 00:49:53,360
Eligibility before similarity.
1156
00:49:53,360 --> 00:49:57,440
Negative space as a first-class record. Outputs classified and gated like actions.
1157
00:49:57,440 --> 00:49:59,200
The agent still speaks when it has proof.
1158
00:49:59,200 --> 00:50:00,880
It goes quiet when it doesn't.
1159
00:50:00,880 --> 00:50:05,360
And that silence is what prevents the next incident from being perfectly logged.
1160
00:50:05,360 --> 00:50:06,360
Conditional access:
1161
00:50:06,360 --> 00:50:07,360
necessary,
1162
00:50:07,360 --> 00:50:08,360
not sufficient.
1163
00:50:08,360 --> 00:50:12,080
Conditional access is the most overpraised control in the agent conversation, and it's
1164
00:50:12,080 --> 00:50:13,080
still mandatory.
1165
00:50:13,080 --> 00:50:14,080
It is the front gate.
1166
00:50:14,080 --> 00:50:18,600
It decides whether an identity should receive a token right now under current risk signals:
1167
00:50:18,600 --> 00:50:22,040
device posture, location, sign-in risk, workload context.
1168
00:50:22,040 --> 00:50:25,240
For agents and other non-human identities, that matters.
1169
00:50:25,240 --> 00:50:27,920
It shrinks who can even show up holding credentials.
1170
00:50:27,920 --> 00:50:28,920
You don't skip that.
1171
00:50:28,920 --> 00:50:33,160
But conditional access is also where enterprises stop thinking because it feels like enforcement.
1172
00:50:33,160 --> 00:50:34,880
This is the uncomfortable truth.
1173
00:50:34,880 --> 00:50:36,560
Conditional access is a token-time decision.
1174
00:50:36,560 --> 00:50:38,080
It is not an action-time decision.
1175
00:50:38,080 --> 00:50:40,760
It answers: may this identity obtain a token?
1176
00:50:40,760 --> 00:50:44,480
Not: may this identity delete this site, share this file, or speak this aggregation in this
1177
00:50:44,480 --> 00:50:45,480
venue?
1178
00:50:45,480 --> 00:50:48,720
Once the token exists, you are no longer in an authentication problem.
1179
00:50:48,720 --> 00:50:50,520
You are in an authorization problem.
1180
00:50:50,520 --> 00:50:55,480
And token issuance cannot adjudicate tool execution because tool execution happens later in a different
1181
00:50:55,480 --> 00:51:00,480
context after retrieval, after orchestration, after the meeting audience changes, after
1182
00:51:00,480 --> 00:51:03,160
the agent chooses a path you didn't anticipate.
1183
00:51:03,160 --> 00:51:08,040
That's why so many incidents look compliant in Entra and are still unacceptable to the business.
1184
00:51:08,040 --> 00:51:10,080
Walk the timeline and the gap becomes obvious.
1185
00:51:10,080 --> 00:51:12,040
The agent requests a token.
1186
00:51:12,040 --> 00:51:13,920
Conditional access evaluates and passes.
1187
00:51:13,920 --> 00:51:14,920
Good.
1188
00:51:14,920 --> 00:51:18,760
Then the agent retrieves allowed data under its scopes. Logged. Fine.
1189
00:51:18,760 --> 00:51:20,400
Then the agent proposes an action.
1190
00:51:20,400 --> 00:51:22,760
Delete, share, email, post, speak.
1191
00:51:22,760 --> 00:51:26,240
This is the moment that matters because this is the moment side effects happen.
1192
00:51:26,240 --> 00:51:30,120
And conditional access is not in that path unless you force it back in with a separate decision
1193
00:51:30,120 --> 00:51:31,120
point.
1194
00:51:31,120 --> 00:51:32,440
That is what the policy gate is for.
1195
00:51:32,440 --> 00:51:34,600
It is not a replacement for conditional access.
1196
00:51:34,600 --> 00:51:36,360
It is the missing second gate.
1197
00:51:36,360 --> 00:51:37,960
Conditional access decides who may try.
1198
00:51:37,960 --> 00:51:39,760
The policy engine decides what may happen.
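The two-gate split can be sketched as follows, assuming the token check has already happened upstream and arrives as a boolean. `POLICY`, `ToolCall`, `gate`, and `execute` are hypothetical names. This is a toy rule table, not Entra's or any policy engine's actual API. What it shows is that a valid token alone never reaches the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    identity: str
    action: str   # e.g. "delete", "share", "email", "post", "speak"
    target: str
    venue: str


# Hypothetical rule table: (identity, action) -> venues where it is allowed.
# Anything absent from the table is denied.
POLICY = {
    ("doc-agent", "share"): {"internal_chat"},
    ("doc-agent", "speak"): {"internal_chat", "mixed_meeting"},
}


def gate(call, token_valid):
    """Second gate: evaluated per tool call, at action time."""
    if not token_valid:
        return False, "no valid token"  # first gate (token time) already failed
    venues = POLICY.get((call.identity, call.action))
    if venues is None:
        return False, "action not permitted for this identity"
    if call.venue not in venues:
        return False, "action not permitted in this venue"
    return True, "allowed"


def execute(call, token_valid, tool):
    """The agent proposes; the tool runs only if the gate allows it."""
    allowed, reason = gate(call, token_valid)
    if not allowed:
        return {"executed": False, "reason": reason}
    return {"executed": True, "result": tool(call)}
```

For example, a `delete` proposed by `doc-agent` with a perfectly valid token is refused before any side effect, because no rule grants that identity that action.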
1199
00:51:39,760 --> 00:51:43,960
Now the mistake teams make is trying to stretch conditional access to cover what it can't.
1200
00:51:43,960 --> 00:51:48,040
They pile on network locations, token protection, session controls, device filters and assume
1201
00:51:48,040 --> 00:51:50,080
the blast radius shrinks automatically.
1202
00:51:50,080 --> 00:51:51,080
It doesn't.
1203
00:51:51,080 --> 00:51:54,960
If the agent holds broad write scopes, the radius is already baked in.
1204
00:51:54,960 --> 00:51:57,320
Conditional access just decides who gets to hold the match.
1205
00:51:57,320 --> 00:51:59,760
So the architecture you enforce is a braid.
1206
00:51:59,760 --> 00:52:02,800
Conditional access at token time, strict and non-negotiable.
1207
00:52:02,800 --> 00:52:06,400
Least privilege on scopes because permissions are blast radius math.
1208
00:52:06,400 --> 00:52:10,480
Segmented identities because one agent should not be one super identity, and per-tool-call
1209
00:52:10,480 --> 00:52:14,640
policy evaluation because action-time authorization is where incidents either happen or don't.
1210
00:52:14,640 --> 00:52:16,440
Now make monitoring earn its keep.
1211
00:52:16,440 --> 00:52:19,080
Watch token issuance patterns on agent identities.
1212
00:52:19,080 --> 00:52:22,240
Unusual cadence, unusual geos, new client types.
1213
00:52:22,240 --> 00:52:23,480
That's the identity plane.
1214
00:52:23,480 --> 00:52:28,760
But also watch tool call shapes: spikes in deletes, sudden external egress, novel venues.
1215
00:52:28,760 --> 00:52:29,760
That's the action plane.
1216
00:52:29,760 --> 00:52:32,640
And when you detect drift, you don't retrain the agent.
1217
00:52:32,640 --> 00:52:37,040
You tighten scopes, tighten policies, and shrink failure domains. Conditional access is necessary
1218
00:52:37,040 --> 00:52:39,560
because it keeps the wrong identities from showing up.
1219
00:52:39,560 --> 00:52:43,080
It is not sufficient because the right identity can still do the wrong thing, perfectly
1220
00:52:43,080 --> 00:52:45,240
logged with a valid token.
1221
00:52:45,240 --> 00:52:49,240
The experience plane tax: WebRTC, speech regions, and metered certainty.
1222
00:52:49,240 --> 00:52:52,960
Now the punchline nobody budgets for until the demo becomes production: the experience plane
1223
00:52:52,960 --> 00:52:53,960
tax.
1224
00:52:53,960 --> 00:52:56,400
The face and the voice don't just add engagement.
1225
00:52:56,400 --> 00:53:01,120
They add failure domains: networks, regions, and metering. And none of that complexity
1226
00:53:01,120 --> 00:53:04,800
buys you a single extra millisecond of deterministic control.
1227
00:53:04,800 --> 00:53:05,880
Start with WebRTC.
1228
00:53:05,880 --> 00:53:07,800
It works beautifully in a clean lab.
1229
00:53:07,800 --> 00:53:12,360
Then it meets enterprise reality: NAT traversal, VPN hairpins, deep packet inspection, split
1230
00:53:12,360 --> 00:53:16,760
tunnel policies, and firewalls that quietly hate UDP. So you fall back to relays.
1231
00:53:16,760 --> 00:53:18,040
TURN becomes mandatory.
1232
00:53:18,040 --> 00:53:20,640
That adds hops, jitter, and operational overhead.
1233
00:53:20,640 --> 00:53:25,200
The avatar stutters, the audio talks over itself and the system compensates with retries
1234
00:53:25,200 --> 00:53:30,000
and reconnects: more events, more envelope churn, more entropy injected into the same pathway
1235
00:53:30,000 --> 00:53:31,880
that also drives tool calls.
1236
00:53:31,880 --> 00:53:32,880
Then speech regions.
1237
00:53:32,880 --> 00:53:35,040
Azure Speech is region-bound by design.
1238
00:53:35,040 --> 00:53:37,080
Keys are region-locked and endpoints are regional.
1239
00:53:37,080 --> 00:53:40,400
If you serve multiple geographies, you don't have a voice.
1240
00:53:40,400 --> 00:53:45,840
You have a fleet of voices: separate resources, quotas, keys, routing logic, and failover plans.
1241
00:53:45,840 --> 00:53:50,120
When a region blips, the agent doesn't fail in a way your control plane can reason about.
1242
00:53:50,120 --> 00:53:53,840
It fails in the human layer: the voice disappears and the business interprets that as "the
1243
00:53:53,840 --> 00:53:58,640
agent is down," even if the decision engine is still happily proposing actions.
1244
00:53:58,640 --> 00:54:02,720
And it's all metered per second. Not per outcome, not per prevented incident: per second
1245
00:54:02,720 --> 00:54:04,040
of streamed certainty.
1246
00:54:04,040 --> 00:54:07,960
So you end up financing persuasion: tens of millions of seconds of compute to animate confidence
1247
00:54:07,960 --> 00:54:11,560
while the control plane that could prevent harm remains underbuilt.
1248
00:54:11,560 --> 00:54:15,160
Conclusion: assume the face is lying and enforce intent at action time.
1249
00:54:15,160 --> 00:54:16,600
The voice adds trust.
1250
00:54:16,600 --> 00:54:19,520
The system did not earn it, and logs won't save you after the fact.
1251
00:54:19,520 --> 00:54:22,800
Make the agent propose, then force the control plane to dispose.
1252
00:54:22,800 --> 00:54:29,160
Idempotency, authoritative state, per-tool-call policy gates, and segmented identities.
1253
00:54:29,160 --> 00:54:33,240
If you do one thing next, audit where actions execute without a gate and mark it red. Then
1254
00:54:33,240 --> 00:54:34,720
fund determinism, not avatars.