The Secret Architecture That Makes AI Agents Actually Work
Tired of “smart” AI agents doing dumb, dangerous things in your Microsoft 365 tenant? This episode shows you the one architectural move that turns flaky prompt-powered agents into reliable, auditable systems: a pre-execution contract check that blocks bad behavior before it ever hits your data. We walk through how to separate LLM cognition from real-world operations, why executors and validated workflow graphs beat prompt hacks every time, and how to wire this into Microsoft 365 Graph, Azure OpenAI and Copilot Studio without creating a compliance nightmare.
You’ll see how a validator proves three things before any tool call runs: the capability is real, the caller actually has permission right now, and the outcome is feasible and verifiable within strict data boundaries. No “trust the model,” no silent partial failures, no hallucinated tools. Instead, you get schema-checked JSON, idempotent executors, policy-enforced allow lists, human checkpoints as first-class workflow nodes, and graph validation that blocks unsafe workflows at build time and runtime.
By the end, you have a mental model you can run in your head: the LLM proposes, the executor enforces, the graph constrains, the validator decides. Accuracy stabilizes, latency narrows, cost flattens and audits turn from witch hunts into simple queries. If you’re serious about building Copilot-style agents on Microsoft 365, this is the secure-by-design blueprint that replaces vibes with numbers and turns “I think it worked” into “I am allowed, I know how and I can prove I did.”
Most people think AI agents fail because of weak prompts. Not true. Prompts guide reasoning, but executors, validation, and workflow graphs are what guarantee reliability. In this episode, we reveal the architecture behind stable, predictable, enterprise-ready AI agents using Microsoft 365 Graph, Azure OpenAI, and Copilot Studio. You’ll learn why traditional prompt-only agents hallucinate tools, break policies, and silently fail, and how a contract-first, validator-enforced architecture fixes accuracy, latency, cost, and auditability. This is the mental model and blueprint every AI builder should have started with.
What You’ll Learn
1. Why Prompts Fail at Real-World Operations
- The difference between cognition (LLMs) and operations (executors)
- Why models hallucinate tools and ignore preconditions
- How executors enforce idempotency, postconditions, and error recovery
- The “silent partial” problem that breaks enterprise workflows
2. Workflow Graphs: The Map AI Agents Actually Need
- Nodes, edges, state, and explicit control flow
- Why DAGs (directed acyclic graphs) dominate reliable workflows
- State isolation: persistent vs ephemeral vs derived
- Compensations and rollback logic for real-world side effects
- Memory boundaries to prevent cross-session leakage
3. Secure-by-Design: Validation That Stops Chaos
- Static graph validation: cycles, unreachable nodes, contract checks
- Runtime policy checks: RBAC, ABAC, allowlists, token scopes
- Input/output sanitization to prevent prompt injection
- Sandboxing, segmentation, and safe egress controls
- Immutable logging and node-level tracing for auditability
4. Microsoft Integration: M365 Graph + Azure OpenAI + Copilot Studio
- Least-privilege Graph access with selective fields and delta queries
- Chunking, provenance, and citation enforcement
- Azure OpenAI as a reasoning layer with schema-bound outputs
- Copilot Studio for orchestration, human checkpoints, and approvals
- Reliable execution using idempotency keys, retries, and validation gates
5. Before/After Metrics: The Proof
- Higher factual accuracy due to citation-verified grounding
- Lower p95 latency via parallel nodes + early exit
- Reduced token cost from selective context and structured plans
- Dramatic drop in admin overhead through traceability and observability
- Stable first-pass completion rates with fewer human rescues
6. The One Gate That Prevents Dumb Agent Mistakes
- The pre-execution contract check:
  - Capability match
  - Policy compliance
  - Postcondition feasibility
- Deny-with-reason paths that provide safe alternatives
- Preventing privilege escalation, data leaks, and invalid actions
Key Takeaways
- Prompts are thoughts. Executors are actions. Validation is safety.
- Reliable AI agents require architecture—not vibes.
- Graph validation, policy enforcement, and idempotent execution turn “smart” into safe + correct.
- Grounding with Microsoft Graph and Azure OpenAI citations ensures accuracy you can audit.
- A single contract gate prevents 90% of catastrophic agent failures.
1
00:00:00,000 --> 00:00:03,560
The one validation that prevents smart agents doing dumb things.
2
00:00:03,560 --> 00:00:07,760
There's one gate that turns clever into competent: the pre-execution contract check.
3
00:00:07,760 --> 00:00:10,960
Before any tool runs, the validator proves three things in order.
4
00:00:10,960 --> 00:00:14,960
Intent matches a real capability, the caller has permission right now,
5
00:00:14,960 --> 00:00:18,480
and the requested outcome is feasible within the declared data boundaries.
6
00:00:18,480 --> 00:00:21,000
Fail any part, and nothing executes.
7
00:00:21,000 --> 00:00:22,880
Not "be careful", not "try anyway".
8
00:00:22,880 --> 00:00:24,520
Deny with reasons and alternatives.
9
00:00:24,520 --> 00:00:26,000
Start with capability match.
10
00:00:26,000 --> 00:00:28,200
The plan says update SharePoint list item.
11
00:00:28,200 --> 00:00:32,040
The validator asks which tool, which method, which schema version.
12
00:00:32,040 --> 00:00:37,960
Arguments are checked against the registry, required fields present, types correct, value ranges sane,
13
00:00:37,960 --> 00:00:41,800
and yes, no extra fields smuggled in, hoping someone ignores them.
14
00:00:41,800 --> 00:00:45,240
Tool aliasing is forbidden. Use the canonical name or get rejected.
15
00:00:45,240 --> 00:00:49,160
This kills hallucinated tools and guessed parameters before they can misbehave.
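The capability match described above can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical in-memory registry; the tool name `sharepoint.update_list_item`, its fields, and the `check_capability` helper are made up for the example and are not part of any Microsoft API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    name: str
    version: str
    required: dict  # field name -> expected Python type


# Hypothetical registry keyed by (canonical name, schema version).
REGISTRY = {
    ("sharepoint.update_list_item", "1"): ToolContract(
        name="sharepoint.update_list_item",
        version="1",
        required={"site_id": str, "list_id": str, "item_id": str, "fields": dict},
    ),
}


def check_capability(tool, version, args):
    """Return a list of rejection reasons; an empty list means the call matches."""
    contract = REGISTRY.get((tool, version))
    if contract is None:
        # Aliases and hallucinated tools end up here: only canonical names resolve.
        return [f"unknown tool or version: {tool}@{version}"]
    reasons = []
    for name, expected in contract.required.items():
        if name not in args:
            reasons.append(f"missing required field: {name}")
        elif not isinstance(args[name], expected):
            reasons.append(f"wrong type for {name}: expected {expected.__name__}")
    for name in args:
        if name not in contract.required:
            # Extra fields are rejected rather than silently ignored.
            reasons.append(f"unexpected field: {name}")
    return reasons
```

A real registry would also carry value ranges and declared post-conditions; this sketch only covers names, versions, required fields, types and smuggled extras.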
16
00:00:49,160 --> 00:00:51,560
Next, policy compliance.
17
00:00:51,560 --> 00:00:55,240
The policy engine evaluates the proposed call against active policy,
18
00:00:55,240 --> 00:00:59,480
allow lists for tools and domains, RBAC or ABAC checks tied to
19
00:00:59,480 --> 00:01:04,680
Entra ID claims and scopes, environment tier rules, data classification boundaries.
20
00:01:04,680 --> 00:01:06,600
If the agent is scoped to Files.Read
21
00:01:06,600 --> 00:01:10,360
for a specific site, an update call or a different site is a hard no.
22
00:01:10,360 --> 00:01:14,920
If the payload dips into restricted classification without a human checkpoint, also no.
23
00:01:14,920 --> 00:01:20,040
Tokens are verified fresh, scopes are verified exact, and privilege escalation by
24
00:01:20,040 --> 00:01:23,560
"just this once" is treated like what it is: an attempted breach.
25
00:01:23,560 --> 00:01:27,080
Then postcondition feasibility. It's not enough to want an outcome.
26
00:01:27,080 --> 00:01:29,320
It has to be achievable and verifiable.
27
00:01:29,320 --> 00:01:33,080
The validator asks, does the destination support idempotency keys?
28
00:01:33,080 --> 00:01:36,520
Will the system emit a durable identifier or ETag we can check after?
29
00:01:36,520 --> 00:01:39,000
Is there a compensating action if downstream fails?
30
00:01:39,000 --> 00:01:43,400
If the plan can't produce verifiable post-conditions, it's rejected or rewritten.
31
00:01:43,400 --> 00:01:47,240
We don't accept "trust me" or "log it later". Later is how incidents happen.
32
00:01:47,240 --> 00:01:51,560
Put those together and you get the trio gate: capability match, policy compliance, postcondition
33
00:01:51,560 --> 00:01:56,200
feasibility. Pass all three and the executor proceeds; fail any and the deny-with-reason
34
00:01:56,200 --> 00:02:00,520
path activates. That path is polite, thorough and unambiguous.
35
00:02:00,520 --> 00:02:05,160
Here's what you tried, here's the exact policy or schema you broke, here are safe alternatives.
36
00:02:05,160 --> 00:02:09,560
If the user intent can be repaired, narrow the scope, switch to a read-only summary,
37
00:02:09,560 --> 00:02:11,160
route to a permitted site.
38
00:02:11,160 --> 00:02:14,600
The validator proposes a compliant plan and asks for approval.
39
00:02:14,600 --> 00:02:19,000
If it can't be repaired, escalate to a human checkpoint with full context or stop outright.
40
00:02:19,000 --> 00:02:21,400
No mystery stalls, no silent success.
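Putting the three gates in order, a validator along these lines could return either a green light or a deny with a reason and alternatives. This is only a sketch under simplifying assumptions: `validate`, `Decision`, and the tool names are hypothetical, and policy is reduced to a single required scope per tool.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    allowed: bool
    gate: str = ""        # which gate denied the call, if any
    reason: str = ""
    alternatives: list = field(default_factory=list)


def validate(call, known_tools, granted_scopes, verifiable_tools):
    """known_tools maps tool name -> required scope (a deliberate simplification)."""
    tool = call["tool"]
    # Gate 1: capability match. The tool must exist in the registry.
    if tool not in known_tools:
        return Decision(False, "capability", f"tool {tool!r} is not registered")
    # Gate 2: policy compliance. The caller must hold the exact scope right now.
    needed = known_tools[tool]
    if needed not in granted_scopes:
        return Decision(
            False, "policy", f"scope {needed!r} not granted",
            ["switch to a read-only summary"],
        )
    # Gate 3: postcondition feasibility. The outcome must be verifiable afterwards.
    if tool not in verifiable_tools:
        return Decision(False, "postcondition", "no verifiable postcondition declared")
    return Decision(True)
```

The gates run in a fixed order and stop at the first failure, so every denial names exactly one broken rule.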
41
00:02:21,400 --> 00:02:25,320
Quick micro story you've lived, even if you didn't notice.
42
00:02:25,320 --> 00:02:29,720
The agent is asked to summarize all HR docs from last quarter and email Legal.
43
00:02:29,720 --> 00:02:33,000
Retrieval proposes Graph queries across HR and Legal sites.
44
00:02:33,000 --> 00:02:34,920
Validator sees a boundary crossing.
45
00:02:34,920 --> 00:02:37,480
HR is restricted, legal is internal.
46
00:02:37,480 --> 00:02:40,600
The trio gate blocks the cross-read and the outbound email.
47
00:02:40,600 --> 00:02:42,360
The deny path returns.
48
00:02:42,360 --> 00:02:44,040
Restricted content detected.
49
00:02:44,040 --> 00:02:47,080
Proposed alternative: summarize internal policy index,
50
00:02:47,080 --> 00:02:50,520
provide a request link for HR summary to authorised reviewers.
51
00:02:50,520 --> 00:02:53,720
User approves the compliant plan, executor runs it,
52
00:02:53,720 --> 00:02:57,560
cites internal content and attaches a permission request for the HR material.
53
00:02:57,560 --> 00:02:59,160
No leak, no drama, still helpful.
54
00:02:59,160 --> 00:03:03,000
Implementation is boring by design. A policy DSL defines who can do what,
55
00:03:03,000 --> 00:03:04,600
where, with which side effects.
56
00:03:04,600 --> 00:03:09,800
A schema registry stores tool contracts: names, versions, argument shapes and post-conditions.
57
00:03:09,800 --> 00:03:14,040
An allow list resolver maps domains, sites and scopes to environment tiers.
58
00:03:14,040 --> 00:03:18,040
The validator composes these, stamps every decision with an audit record (inputs,
59
00:03:18,040 --> 00:03:22,920
policy evaluations, outcomes) and hands either a green light or a repair plan to the executor.
60
00:03:22,920 --> 00:03:27,160
The executor never freelances around a red light; it enforces the decision or stops.
61
00:03:27,160 --> 00:03:31,400
The mental model is clean enough to tattoo on the forehead of your architecture diagram.
62
00:03:31,400 --> 00:03:34,840
Executors enforce, graphs constrain, validators decide.
63
00:03:34,840 --> 00:03:37,880
In that order. The model proposes only within the fenced yard
64
00:03:37,880 --> 00:03:42,360
and the minute it tries to climb over, the validator pulls it back, explains why,
65
00:03:42,360 --> 00:03:43,880
and points to the gate.
66
00:03:43,880 --> 00:03:47,880
Order preserved, safety enforced, progress unblocked, within policy.
67
00:03:47,880 --> 00:03:52,360
You wanted one thing that prevents smart agents from doing dumb things.
68
00:03:52,360 --> 00:03:55,800
This is it: a pre-execution contract check that proves capability,
69
00:03:55,800 --> 00:03:59,240
permission and verifiable outcome before any real world mutation.
70
00:03:59,240 --> 00:04:03,400
It turns "I think I can" into "I am allowed, I know how and I can prove I did."
71
00:04:03,400 --> 00:04:08,200
You now have the architecture.
72
00:04:08,200 --> 00:04:09,000
Use it.
73
00:04:09,000 --> 00:04:10,840
Key takeaway plus CTA.
74
00:04:10,840 --> 00:04:12,840
Key takeaway: prompts are opinions,
75
00:04:12,840 --> 00:04:15,400
executors and validated graphs are operations,
76
00:04:15,400 --> 00:04:19,000
and the pre-execution contract check is the guardrail that keeps both honest.
77
00:04:19,000 --> 00:04:22,120
If this saved you time, repay the debt.
78
00:04:22,120 --> 00:04:22,680
Subscribe.
79
00:04:22,680 --> 00:04:28,920
Next, watch the LangGraph versus Microsoft Agent Framework breakdown on performance and observability,
80
00:04:28,920 --> 00:04:31,560
actual traces, costs and P95s.
81
00:04:31,560 --> 00:04:35,880
Lock in your upgrade path, follow, enable alerts and get the next episode delivered on schedule.
82
00:04:35,880 --> 00:04:36,520
Proceed.
83
00:04:36,520 --> 00:04:38,920
Most people think better prompts fix flaky agents.
84
00:04:38,920 --> 00:04:42,680
Cute theory. Prompts guide thoughts; they don't execute operations.
85
00:04:42,680 --> 00:04:46,520
The truth, reliability comes from executors and graph validation.
86
00:04:46,520 --> 00:04:49,560
The spine that keeps agents from face planting when reality shows up.
87
00:04:49,560 --> 00:04:51,560
We're going to wire this to Microsoft scenarios.
88
00:04:51,560 --> 00:04:55,160
Microsoft 365 Graph retrieval, Azure OpenAI reasoning
89
00:04:55,160 --> 00:04:57,800
and Copilot Studio agents that don't go rogue.
90
00:04:57,800 --> 00:05:02,920
Stakes are simple, accuracy, latency, cost and auditability.
91
00:05:02,920 --> 00:05:04,680
Measurable, not vibes.
92
00:05:04,680 --> 00:05:07,000
I'll give you a mental model you can run in your head,
93
00:05:07,000 --> 00:05:12,360
diagrams in words you won't forget and one validation step that stops smart agents from doing dumb things.
94
00:05:12,360 --> 00:05:14,840
Enter the architecture you should have used day one.
95
00:05:14,840 --> 00:05:18,360
Why prompts fail at operations and executors don't.
96
00:05:18,360 --> 00:05:19,560
Okay, so here's the thing.
97
00:05:19,560 --> 00:05:21,080
LLMs handle cognition.
98
00:05:21,080 --> 00:05:23,720
Executors handle operations.
99
00:05:23,720 --> 00:05:25,400
Mixing those is how you get chaos.
100
00:05:25,400 --> 00:05:27,320
The model can propose a plan.
101
00:05:27,320 --> 00:05:32,600
It cannot guarantee that the email was sent, the file was saved or the permission existed.
102
00:05:32,600 --> 00:05:35,880
It speaks in probabilities; operations demand guarantees.
103
00:05:35,880 --> 00:05:37,320
Enter the executor.
104
00:05:37,320 --> 00:05:42,520
Think of it as the adult in the room, a policy-bound function runner with state, constraints and idempotency.
105
00:05:42,520 --> 00:05:45,720
It doesn't believe an action succeeded.
106
00:05:45,720 --> 00:05:46,520
It checks.
107
00:05:46,520 --> 00:05:48,440
It doesn't assume a tool exists.
108
00:05:48,440 --> 00:05:52,920
It validates capability, parameters and permissions before it even tries.
109
00:05:52,920 --> 00:05:58,200
And when it fails, it fails loudly, classifies the error and takes the prescribed recovery path.
110
00:05:58,200 --> 00:06:01,000
Most prompt only agents fall into three failure modes.
111
00:06:01,000 --> 00:06:03,880
First, hallucinated tools.
112
00:06:03,880 --> 00:06:08,760
The model requests a function that isn't registered or calls it with fields that don't exist.
113
00:06:08,760 --> 00:06:10,600
Second, missing preconditions.
114
00:06:10,600 --> 00:06:16,600
It tries to edit a SharePoint file without checking if it can access the site, the list or the item version.
115
00:06:16,600 --> 00:06:21,480
Third, silent partials: step two fails, but the agent keeps going and declares victory because the text looks confident.
116
00:06:21,480 --> 00:06:22,200
You've seen this.
117
00:06:22,200 --> 00:06:23,640
You just called it flaky.
118
00:06:23,640 --> 00:06:26,600
Executors run a loop that looks boring and that's why it works.
119
00:06:26,600 --> 00:06:27,560
Preconditions.
120
00:06:27,560 --> 00:06:30,040
Verify inputs, permissions and invariants.
121
00:06:30,040 --> 00:06:30,680
Action.
122
00:06:30,680 --> 00:06:34,840
Call the tool with an idempotency key so retries won't double bill or double post.
123
00:06:34,840 --> 00:06:36,600
Post-conditions.
124
00:06:36,600 --> 00:06:38,600
Confirm effects against the source of truth.
125
00:06:38,600 --> 00:06:40,680
Error taxonomy.
126
00:06:40,680 --> 00:06:44,360
Is it validation, transient, rate limit, auth or policy?
127
00:06:44,360 --> 00:06:45,960
Recovery.
128
00:06:45,960 --> 00:06:49,400
Back-off and retry for transient, re-auth or reconsent for auth,
129
00:06:49,400 --> 00:06:53,080
fallbacks for known alternates and hard stops with reasons for policy violations.
130
00:06:53,080 --> 00:06:55,080
Idempotency is non-negotiable.
131
00:06:55,080 --> 00:07:00,360
If an action can be retried, it needs a key that makes the second attempt a no-op or a consistent override.
132
00:07:00,360 --> 00:07:02,120
Timeouts prevent zombie calls.
133
00:07:02,120 --> 00:07:03,720
Back-off respects rate limits.
134
00:07:03,720 --> 00:07:08,520
This is deterministic behavior governing inherently stochastic text generation.
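The executor loop described above (preconditions, action with an idempotency key, post-conditions, classified errors, recovery) might look something like this. It is a sketch only: `run_step`, `TransientError` and `PolicyError` are hypothetical names, and the action is assumed to accept the idempotency key as its only argument.

```python
import time
import uuid


class TransientError(Exception):
    """Retryable failure, e.g. a timeout or a throttled call."""


class PolicyError(Exception):
    """Hard stop: never retried."""


def run_step(precheck, action, postcheck, max_attempts=3, base_delay=0.5):
    # Preconditions: verify before touching anything.
    if not precheck():
        raise PolicyError("precondition failed")
    # One key for the whole step, reused on every retry, so the target
    # system can treat a repeated attempt as a no-op.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            result = action(key)
        except TransientError:
            if attempt == max_attempts:
                raise  # fail loudly once retries are exhausted
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential back-off
            continue
        # Post-conditions: confirm the effect instead of believing it.
        if not postcheck(result):
            raise RuntimeError("postcondition not verified")
        return result
```

In a real executor the post-condition check would query the source of truth (for example, re-reading the item by its durable identifier) rather than inspecting the returned value alone.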
135
00:07:08,520 --> 00:07:11,720
The executor is the gearbox, the LLM is the engine.
136
00:07:11,720 --> 00:07:14,680
You don't floor the engine and hope the wheels understand.
137
00:07:14,680 --> 00:07:17,800
Contract-first outputs are the antidote to best-effort paragraphs.
138
00:07:17,800 --> 00:07:20,280
The reasoning model doesn't get to ramble.
139
00:07:20,280 --> 00:07:24,840
It emits JSON matching a schema: tool name, arguments and expected post-conditions.
140
00:07:24,840 --> 00:07:27,640
Validators check schema compliance before anything runs.
141
00:07:27,640 --> 00:07:30,200
If the shape is wrong, the executor denies with a reason,
142
00:07:30,200 --> 00:07:32,200
requests a corrected plan or escalates.
143
00:07:32,200 --> 00:07:35,240
That's how you stop noun-salads from becoming production incidents.
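As a rough illustration of contract-first output checking, the snippet below accepts a model response only if it is a JSON object with exactly the three parts mentioned above. The key names (`tool`, `arguments`, `postconditions`) and the `parse_plan` helper are assumptions for the example; a production system would more likely use a full JSON Schema validator.

```python
import json

# Assumed plan contract: exactly these keys, with these Python types.
PLAN_KEYS = {"tool": str, "arguments": dict, "postconditions": list}


def parse_plan(raw):
    """Return (plan, None) if the output matches the contract, else (None, reason)."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(plan, dict):
        return None, "plan must be a JSON object"
    if set(plan) != set(PLAN_KEYS):
        return None, f"expected keys {sorted(PLAN_KEYS)}, got {sorted(plan)}"
    for key, expected in PLAN_KEYS.items():
        if not isinstance(plan[key], expected):
            return None, f"{key} must be of type {expected.__name__}"
    return plan, None
```

The returned reason is what a deny-with-reason path would hand back to the model when asking for a corrected plan.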
144
00:07:35,240 --> 00:07:38,760
Now you might be thinking, can't I just prompt the model to check its work?
145
00:07:38,760 --> 00:07:41,000
You can ask, it will say yes, it always does.
146
00:07:41,000 --> 00:07:42,680
The average user accepts that.
147
00:07:42,680 --> 00:07:47,000
Professionals require proofs. Proofs live in post-conditions verified against systems like
148
00:07:47,000 --> 00:07:51,640
Microsoft Graph, SharePoint or Exchange, real sources of truth, not the model's memory.
149
00:07:51,640 --> 00:07:53,240
But here's where it gets interesting.
150
00:07:53,240 --> 00:07:54,520
Single steps are fine.
151
00:07:54,520 --> 00:07:58,920
Real workflows have branches, parallelism, compensations and human checkpoints.
152
00:07:58,920 --> 00:08:02,040
You need more than an executor, you need a map the executor can read.
153
00:08:02,040 --> 00:08:03,160
That's a workflow graph.
154
00:08:03,160 --> 00:08:06,280
In a graph, nodes are tasks or sub-agents with explicit contracts.
155
00:08:06,280 --> 00:08:08,360
Edges define control flow and data flow.
156
00:08:08,360 --> 00:08:09,320
State is first class.
157
00:08:09,320 --> 00:08:11,880
What persists, what's check-pointed, what's ephemeral.
158
00:08:11,880 --> 00:08:16,360
The executor walks the graph deterministically, honoring allow lists and schemas at each edge.
159
00:08:16,360 --> 00:08:19,880
If a node fails, the graph specifies compensations.
160
00:08:19,880 --> 00:08:22,280
Undo, repair or escalate.
161
00:08:22,280 --> 00:08:25,320
No mystery pipes, no "and then the magic happens".
162
00:08:25,320 --> 00:08:27,080
This is the moment reliability flips.
163
00:08:27,080 --> 00:08:28,200
The LLM proposes.
164
00:08:28,200 --> 00:08:32,120
The executor enforces. The graph constrains. Validation decides.
165
00:08:32,120 --> 00:08:33,400
Order, restored.
166
00:08:33,400 --> 00:08:35,800
Once you separate cognition from operations,
167
00:08:35,800 --> 00:08:38,280
your agents stop improvising and start behaving.
168
00:08:38,280 --> 00:08:41,800
And yes, they still feel smart because they are, but now they are supervised.
169
00:08:41,800 --> 00:08:42,920
Blueprint ready.
170
00:08:42,920 --> 00:08:46,920
Let's wire it to Microsoft 365 without turning your tenant into a buffet.
171
00:08:46,920 --> 00:08:49,080
Graph workflows 101.
172
00:08:49,080 --> 00:08:51,400
Nodes, edges and state that doesn't leak.
173
00:08:51,400 --> 00:08:52,200
Picture this.
174
00:08:52,200 --> 00:08:54,440
You've got a reliable executor but no map.
175
00:08:54,440 --> 00:08:56,680
It will follow orders, but to where?
176
00:08:56,680 --> 00:08:59,240
Enter the workflow graph, your operating manual.
177
00:08:59,240 --> 00:09:02,520
Nodes are tasks or sub-agents with explicit contracts.
178
00:09:02,520 --> 00:09:04,360
Edges define control flow and data flow.
179
00:09:04,360 --> 00:09:06,920
The graph encodes what runs, when it runs,
180
00:09:06,920 --> 00:09:09,080
and what data is allowed to cross boundaries.
181
00:09:09,080 --> 00:09:10,600
No improvisational jazz.
182
00:09:10,600 --> 00:09:11,880
This is sheet music.
183
00:09:11,880 --> 00:09:13,000
Start with the shape.
184
00:09:13,000 --> 00:09:16,360
Most production graphs are DAGs, directed acyclic graphs.
185
00:09:16,360 --> 00:09:19,880
Because cycles invite infinite loops and state corrosion.
186
00:09:19,880 --> 00:09:23,640
You can still have loops, but you mark them intentionally with counters and guards.
187
00:09:23,640 --> 00:09:26,600
Maximum iterations, exit predicates and timeouts.
188
00:09:26,600 --> 00:09:29,320
That's how you keep "think harder" from becoming "think forever".
189
00:09:29,320 --> 00:09:32,280
Conditional routing is explicit.
190
00:09:32,280 --> 00:09:36,520
If the retrieval confidence is above threshold, branch to synthesis.
191
00:09:36,520 --> 00:09:38,280
If not, branch to requery.
192
00:09:38,280 --> 00:09:40,280
Parallelism is first class.
193
00:09:40,280 --> 00:09:43,400
Run summarization and citation extraction side by side,
194
00:09:43,400 --> 00:09:47,720
then join at a barrier node that verifies both met their post-conditions.
195
00:09:47,720 --> 00:09:49,320
State is not a vibe, it's a ledger.
196
00:09:49,320 --> 00:09:51,320
You maintain three kinds. Persistent state:
197
00:09:51,320 --> 00:09:53,880
Durable checkpoints you can recover from after a crash.
198
00:09:53,880 --> 00:09:55,800
Inputs, decisions, signed actions.
199
00:09:55,800 --> 00:09:58,920
Ephemeral state: short-lived buffers like intermediate model outputs
200
00:09:58,920 --> 00:10:00,680
you don't want to pollute long-term memory.
201
00:10:00,680 --> 00:10:04,520
Derived state: re-computable artifacts like embeddings or filtered results
202
00:10:04,520 --> 00:10:06,040
you can rebuild deterministically.
203
00:10:06,040 --> 00:10:06,920
The rule is simple.
204
00:10:06,920 --> 00:10:09,080
Only persist what you can defend in an audit.
205
00:10:09,080 --> 00:10:10,840
Everything else is disposable on purpose.
206
00:10:10,840 --> 00:10:12,440
Rollback strategy matters.
207
00:10:12,440 --> 00:10:14,440
When a node mutates the outside world,
208
00:10:14,440 --> 00:10:17,320
creates a calendar event, updates a list item,
209
00:10:17,320 --> 00:10:21,560
you record an inverse action if it exists or a compensating plan if it doesn't.
210
00:10:21,560 --> 00:10:25,960
If a downstream node fails fatally, the graph can walk those compensations in reverse order.
211
00:10:25,960 --> 00:10:27,160
No, this is not overkill.
212
00:10:27,160 --> 00:10:31,480
It's how you avoid "oops, double-booked the CEO" becoming "oops, we can't fix it".
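The rollback strategy above, recording an inverse for each side effect and walking them in reverse on failure, can be sketched in a few lines. The `run_with_compensation` helper and the (do, undo) pairing are illustrative assumptions, not a specific framework's API.

```python
def run_with_compensation(steps):
    """steps is a list of (do, undo) callables.

    Each undo is recorded only after its do succeeds. If a later step
    raises, the recorded undos run in reverse order and the error is re-raised.
    """
    completed = []
    try:
        for do, undo in steps:
            do()
            completed.append(undo)
    except Exception:
        for undo in reversed(completed):
            undo()
        raise
```

For example, if "create calendar event" and "update list item" both succeed and a third step fails, the list item is restored first and the event is deleted second.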
213
00:10:31,480 --> 00:10:32,920
Edges carry contracts.
214
00:10:32,920 --> 00:10:34,680
An edge isn't a mystery pipe.
215
00:10:34,680 --> 00:10:36,520
It's an API between nodes.
216
00:10:36,520 --> 00:10:38,120
Define IO schemas.
217
00:10:38,120 --> 00:10:40,280
Types, required fields, allowed values.
218
00:10:40,280 --> 00:10:43,800
Define allow lists, what tools or domains the next node may call.
219
00:10:43,800 --> 00:10:46,760
Define capability tags, what the receiving node promises to do
220
00:10:46,760 --> 00:10:48,120
and what it will refuse.
221
00:10:48,120 --> 00:10:50,440
The executor enforces those at runtime.
222
00:10:50,440 --> 00:10:53,720
A node can't smuggle a SharePoint token through a summary edge.
223
00:10:53,720 --> 00:10:55,080
That's not security theater.
224
00:10:55,080 --> 00:10:58,120
That's how you prevent lateral movement by your own agent.
225
00:10:58,120 --> 00:11:01,560
Error handling lives in the graph, not in vibes.
226
00:11:01,560 --> 00:11:05,400
Every node declares its error taxonomy, validation error,
227
00:11:05,400 --> 00:11:07,880
transient infrastructure, rate limiting,
228
00:11:07,880 --> 00:11:12,200
authentication, authorization, policy violation, and unknown.
229
00:11:12,200 --> 00:11:14,760
For each class, the graph provides a path.
230
00:11:14,760 --> 00:11:17,400
Retry with exponential back-off for transient,
231
00:11:17,400 --> 00:11:20,520
refreshed token for authentication, alternate tool for rate limiting,
232
00:11:20,520 --> 00:11:22,200
deny with reason for policy.
233
00:11:22,200 --> 00:11:24,360
Dead-letter queues exist for the unknowns.
234
00:11:24,360 --> 00:11:27,000
Failed payloads go to quarantine with full context,
235
00:11:27,000 --> 00:11:30,440
so humans can inspect without replaying chaos into production.
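One way to keep error handling in the graph rather than in ad hoc code is a plain routing table from error class to recovery path. The enum members and path names below are hypothetical labels for the classes and paths mentioned above; the `request_corrected_plan` path for validation errors is borrowed from the earlier contract-first discussion.

```python
from enum import Enum


class ErrorClass(Enum):
    VALIDATION = "validation"
    TRANSIENT = "transient"
    RATE_LIMIT = "rate_limit"
    AUTH = "auth"
    POLICY = "policy"
    UNKNOWN = "unknown"


# Declared once per graph; node code never improvises its own recovery.
RECOVERY = {
    ErrorClass.VALIDATION: "request_corrected_plan",
    ErrorClass.TRANSIENT: "retry_with_backoff",
    ErrorClass.RATE_LIMIT: "alternate_tool",
    ErrorClass.AUTH: "refresh_token",
    ErrorClass.POLICY: "deny_with_reason",
}


def route(error_class):
    """Anything without a declared path goes to the dead-letter queue."""
    return RECOVERY.get(error_class, "dead_letter_queue")
```

Because unknown errors fall through to quarantine by default, adding a new error class cannot accidentally inherit a retry path.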
236
00:11:30,440 --> 00:11:34,200
Human-in-the-loop checkpoints are nodes, not ad hoc Slack messages.
237
00:11:34,200 --> 00:11:37,320
They freeze the execution, present the proposed action and evidence,
238
00:11:37,320 --> 00:11:40,040
and require an approval or edit that's logged and signed.
239
00:11:40,040 --> 00:11:41,640
Once approved execution resumes,
240
00:11:41,640 --> 00:11:44,200
if denied the graph routes to a safe fallback.
241
00:11:44,200 --> 00:11:46,680
Congratulations, you've just implemented change control
242
00:11:46,680 --> 00:11:48,600
that developers will actually follow
243
00:11:48,600 --> 00:11:49,960
because it's faster than email.
244
00:11:49,960 --> 00:11:52,120
Memory isolation is non-negotiable.
245
00:11:52,120 --> 00:11:53,880
Each session gets scoped context,
246
00:11:53,880 --> 00:11:57,160
only the documents, tokens, and intermediate results it needs.
247
00:11:57,160 --> 00:12:00,280
Cross-session poisoning, where one conversation's prompt injection bleeds
248
00:12:00,280 --> 00:12:03,640
into another, is how you accidentally exfiltrate data.
249
00:12:03,640 --> 00:12:05,080
The graph enforces boundaries.
250
00:12:05,080 --> 00:12:08,520
No shared mutable memory, only sanctioned reads from a vetted store
251
00:12:08,520 --> 00:12:10,520
with content filters and schema validators.
252
00:12:10,520 --> 00:12:12,280
Yes, you can cache embeddings and summaries,
253
00:12:12,280 --> 00:12:14,200
but you tag them with provenance and permissions
254
00:12:14,200 --> 00:12:16,040
and you evict them on policy changes.
255
00:12:16,040 --> 00:12:18,520
Observability is built in, not bolted on.
256
00:12:18,520 --> 00:12:21,080
Node-level traces show inputs, outputs, durations,
257
00:12:21,080 --> 00:12:22,520
retries, and downstream effects.
258
00:12:22,520 --> 00:12:24,680
You stitch traces into a graph run ID
259
00:12:24,680 --> 00:12:27,880
so you can replay, diagnose, and prove compliance.
260
00:12:27,880 --> 00:12:28,920
Who did what?
261
00:12:28,920 --> 00:12:30,040
When and why?
262
00:12:30,040 --> 00:12:32,440
Anomaly detection flags weird patterns.
263
00:12:32,440 --> 00:12:35,480
Sudden spikes in tool calls, unusual domains, token blowups.
264
00:12:35,480 --> 00:12:36,680
That's your early warning system
265
00:12:36,680 --> 00:12:38,920
before interesting becomes incident.
266
00:12:38,920 --> 00:12:40,920
Essentially, the graph is the constitution.
267
00:12:40,920 --> 00:12:42,440
The executor is law enforcement.
268
00:12:42,440 --> 00:12:44,360
The LLM is counsel, not judge.
269
00:12:44,360 --> 00:12:47,400
When you encode nodes, edges, and state like this,
270
00:12:47,400 --> 00:12:49,640
you don't just get workflows that succeed.
271
00:12:49,640 --> 00:12:52,280
You get workflows that fail safely, explain themselves
272
00:12:52,280 --> 00:12:53,800
and recover predictably.
273
00:12:53,800 --> 00:12:57,000
Now, blueprint in hand, we can connect to Microsoft 365
274
00:12:57,000 --> 00:12:59,400
without turning your tenant into a buffet.
275
00:12:59,400 --> 00:13:02,920
Secure by design: graph validation beats chaos engineering.
276
00:13:02,920 --> 00:13:05,800
You don't prove reliability by throwing chaos at production
277
00:13:05,800 --> 00:13:08,040
and hoping the survivors are resilient.
278
00:13:08,040 --> 00:13:11,400
You prove it by rejecting unsafe workflows before they ever run.
279
00:13:11,400 --> 00:13:12,440
That's graph validation.
280
00:13:12,440 --> 00:13:14,920
Static checks to keep nonsense out, runtime guardrails
281
00:13:14,920 --> 00:13:16,360
to keep danger in a box.
282
00:13:16,360 --> 00:13:18,040
Static validation is the pre-flight.
283
00:13:18,040 --> 00:13:19,800
You check the structure before wheels up.
284
00:13:19,800 --> 00:13:22,120
Cycles that create unbounded loops?
285
00:13:22,120 --> 00:13:24,280
Rejected, or forced to declare iteration guards.
286
00:13:24,280 --> 00:13:27,640
Unreachable nodes? Dead code is risk: delete or justify.
287
00:13:27,640 --> 00:13:28,920
Missing contracts?
288
00:13:28,920 --> 00:13:31,400
Every node must declare input and output schemas,
289
00:13:31,400 --> 00:13:33,480
required capabilities and side effects.
290
00:13:33,480 --> 00:13:36,760
Privilege boundaries? Nodes that mutate external systems
291
00:13:36,760 --> 00:13:39,800
must run in segments with least-privilege credentials
292
00:13:39,800 --> 00:13:41,080
and explicit allow lists.
293
00:13:41,080 --> 00:13:43,640
And yes, if your summarize node suddenly requests
294
00:13:43,640 --> 00:13:46,920
write access to SharePoint, the validator says no, with prejudice.
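Two of the static checks above, cycles and unreachable nodes, can be sketched with a depth-first traversal. This is a minimal example assuming the graph is given as a plain adjacency dict; `validate_graph` is a hypothetical helper, and contract and privilege checks are left out.

```python
def validate_graph(edges, start):
    """edges maps each node to a list of successor nodes.

    Returns a list of problems found before anything runs: back edges
    (cycles) reachable from start, and nodes never reached from start.
    """
    problems = []
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GREY
        for nxt in edges.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is an ancestor on the current path: a cycle.
                problems.append(f"cycle via {node} -> {nxt}")
            elif state == WHITE:
                visit(nxt)
        color[node] = BLACK

    visit(start)
    for node in edges:
        if color[node] == WHITE:
            problems.append(f"unreachable node: {node}")
    return problems
```

A validator that permits intentional loops would skip cycle reports for edges carrying a declared iteration guard; that extension is omitted here.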
295
00:13:46,920 --> 00:13:48,120
Now, runtime.
296
00:13:48,120 --> 00:13:49,640
This is where people get sloppy.
297
00:13:49,640 --> 00:13:53,000
A policy engine sits beside the executor, not behind it.
298
00:13:53,000 --> 00:13:55,320
Tool and domain allow lists aren't documentation
299
00:13:55,320 --> 00:13:56,760
they're enforced decisions.
300
00:13:56,760 --> 00:13:59,560
At call time, the engine checks RBAC or ABAC
301
00:13:59,560 --> 00:14:00,920
against the active principal,
302
00:14:00,920 --> 00:14:03,320
Entra ID token, scopes, claims,
303
00:14:03,320 --> 00:14:05,160
and propagates auth correctly down the chain.
304
00:14:05,160 --> 00:14:06,600
No ambient superpowers.
305
00:14:06,600 --> 00:14:08,840
Tokens are scoped, refreshed when permitted,
306
00:14:08,840 --> 00:14:10,760
and never smuggled through friendly edges.
307
00:14:10,760 --> 00:14:13,720
The agent earns access on every call or it doesn't call.
308
00:14:13,720 --> 00:14:16,680
Input and output sanitization is hygiene, not optional.
309
00:14:16,680 --> 00:14:18,440
Prompt injection isn't clever.
310
00:14:18,440 --> 00:14:19,720
It's predictable.
311
00:14:19,720 --> 00:14:22,520
Every inbound content source passes through content filters,
312
00:14:22,520 --> 00:14:24,200
HTML and Markdown scrubbers,
313
00:14:24,200 --> 00:14:27,560
and instruction firewalls that strip "ignore previous" nonsense.
314
00:14:27,560 --> 00:14:30,040
Output passes schema validators.
315
00:14:30,040 --> 00:14:31,800
If a node promised JSON,
316
00:14:31,800 --> 00:14:34,840
the executor rejects prose and requests a repair.
317
00:14:34,840 --> 00:14:36,120
The model can argue,
318
00:14:36,120 --> 00:14:37,720
the validator doesn't negotiate.
319
00:14:37,720 --> 00:14:40,440
It enforces types, ranges, and invariants.
320
00:14:40,440 --> 00:14:43,320
Sandboxing and segmentation contain the blast radius.
321
00:14:43,320 --> 00:14:46,440
Nodes that call external code or untrusted connectors
322
00:14:46,440 --> 00:14:47,880
run in constrained environments.
323
00:14:47,880 --> 00:14:50,920
API gateways with rate limits and auth, network policies
324
00:14:50,920 --> 00:14:53,800
that prevent lateral movement and egress controls
325
00:14:53,800 --> 00:14:56,600
that only permit traffic to vetted domains.
326
00:14:56,600 --> 00:15:01,000
You don't let a retrieval node discover a new data source in production.
327
00:15:01,000 --> 00:15:02,440
Discovery happens in dev,
328
00:15:02,440 --> 00:15:05,240
behind tests with signed updates to the allow list.
329
00:15:05,240 --> 00:15:07,880
Observability isn't "logs somewhere"; it's surgical.
330
00:15:07,880 --> 00:15:10,040
Node-level tracing records inputs, outputs,
331
00:15:10,040 --> 00:15:12,680
durations, retries, and decisions from the policy engine.
332
00:15:12,680 --> 00:15:15,000
You correlate everything under a run ID.
333
00:15:15,000 --> 00:15:17,160
Audit logs are immutable and attributed:
334
00:15:17,160 --> 00:15:18,440
which principal authorized
335
00:15:18,440 --> 00:15:20,760
which action, with what scopes, at what time,
336
00:15:20,760 --> 00:15:22,280
and why the policy allowed it.
337
00:15:22,280 --> 00:15:24,120
When a regulator asks who did what,
338
00:15:24,120 --> 00:15:25,560
you don't shrug, you search.
339
00:15:25,560 --> 00:15:28,600
Compliance ready means your evidence is boring and complete.
340
00:15:28,600 --> 00:15:31,000
Patch and third party risk are supply chain problems.
341
00:15:31,000 --> 00:15:32,040
Treat them like it.
342
00:15:32,040 --> 00:15:33,960
Dependency scanning runs on every build.
343
00:15:33,960 --> 00:15:35,560
Connectors and plugins are audited,
344
00:15:35,560 --> 00:15:37,800
version pinned, and reviewed for permission creep.
345
00:15:37,800 --> 00:15:41,000
If you import an MCP or OpenAPI spec,
346
00:15:41,000 --> 00:15:43,000
you validate that the declared methods
347
00:15:43,000 --> 00:15:45,000
match the least-privilege policy you expect.
348
00:15:45,000 --> 00:15:47,240
No wildcard endpoints,
349
00:15:47,240 --> 00:15:49,320
no hidden write paths pretending to be read.
350
00:15:49,320 --> 00:15:51,640
Hygiene isn't glamorous,
351
00:15:51,640 --> 00:15:53,240
it is, however, the reason you sleep.
352
00:15:53,240 --> 00:15:56,520
The truth: graph validation outperforms chaos engineering
353
00:15:56,520 --> 00:15:58,440
because it prevents classes of incidents
354
00:15:58,440 --> 00:16:00,120
rather than documenting their fallout.
355
00:16:00,120 --> 00:16:01,320
You still test failure modes,
356
00:16:01,320 --> 00:16:02,920
but you do it to confirm guardrails
357
00:16:02,920 --> 00:16:04,840
not to discover that you forgot to install them.
358
00:16:04,840 --> 00:16:06,520
Let's stitch this back to the mental model.
359
00:16:06,520 --> 00:16:09,080
Static checks keep the blueprint sane.
360
00:16:09,080 --> 00:16:12,200
No impossible paths, no orphaned work,
361
00:16:12,200 --> 00:16:13,240
no privilege leaks.
362
00:16:13,240 --> 00:16:16,120
Runtime guardrails keep behavior sane.
363
00:16:16,120 --> 00:16:17,560
Every call authenticated,
364
00:16:17,560 --> 00:16:19,880
authorized, sanitized, and observed.
365
00:16:19,880 --> 00:16:22,040
The executor enforces, the graph constrains,
366
00:16:22,040 --> 00:16:23,880
the validator decides. The model?
367
00:16:23,880 --> 00:16:25,560
It proposes within boundaries
368
00:16:25,560 --> 00:16:27,240
and gets clipped when it wanders.
369
00:16:27,240 --> 00:16:28,520
Isn't this heavy?
370
00:16:28,520 --> 00:16:30,280
Only if you enjoy breaches.
371
00:16:30,280 --> 00:16:32,520
The overhead is mechanical and automated.
372
00:16:32,520 --> 00:16:35,640
Static validation runs at build time and deploy time.
373
00:16:35,640 --> 00:16:37,160
Runtime policy is a sidecar,
374
00:16:37,160 --> 00:16:38,200
fast and local.
375
00:16:38,200 --> 00:16:39,880
Schema checks are milliseconds.
376
00:16:39,880 --> 00:16:42,680
The cost you remove, incidents, manual reviews,
377
00:16:42,680 --> 00:16:45,720
retrofits, dwarfs the micro latency you add.
378
00:16:45,720 --> 00:16:47,240
And the payoff is measurable,
379
00:16:47,240 --> 00:16:49,080
fewer unauthorized calls,
380
00:16:49,080 --> 00:16:50,600
fewer token blowups,
381
00:16:50,600 --> 00:16:52,600
and far fewer "why did it do that?"
382
00:16:52,600 --> 00:16:53,400
postmortems.
383
00:16:53,400 --> 00:16:56,440
One more point the average user misses:
384
00:16:56,440 --> 00:16:58,600
Validation is composable.
385
00:16:58,600 --> 00:17:01,000
You can wrap third party tools with proxy nodes
386
00:17:01,000 --> 00:17:03,080
that enforce contracts and policies
387
00:17:03,080 --> 00:17:04,520
without trusting the tool itself.
388
00:17:04,520 --> 00:17:06,760
You can segment graphs by sensitivity,
389
00:17:06,760 --> 00:17:08,840
public, internal, restricted,
390
00:17:08,840 --> 00:17:10,680
and promote workflows between tiers
391
00:17:10,680 --> 00:17:13,240
only after validation passes for the new boundary.
392
00:17:13,240 --> 00:17:14,760
That's how you scale safely.
393
00:17:14,760 --> 00:17:16,360
Secure by design isn't a slogan,
394
00:17:16,360 --> 00:17:17,800
it's a workflow property.
395
00:17:17,800 --> 00:17:19,640
You don't hope agents behave.
396
00:17:19,640 --> 00:17:21,800
You make misbehavior structurally hard
397
00:17:21,800 --> 00:17:23,720
and operationally visible.
398
00:17:23,720 --> 00:17:24,760
With the rails in place,
399
00:17:24,760 --> 00:17:27,480
plugging in Microsoft 365 Graph and Azure OpenAI
400
00:17:27,480 --> 00:17:28,520
isn't roulette.
401
00:17:28,520 --> 00:17:30,360
It's controlled power on your terms.
402
00:17:30,360 --> 00:17:32,920
Now we can talk about wiring, not firefighting.
403
00:17:32,920 --> 00:17:34,520
The Microsoft scenario,
404
00:17:34,520 --> 00:17:38,920
M365 Graph plus Azure OpenAI plus Copilot Studio.
405
00:17:38,920 --> 00:17:40,360
Let's assemble the cast.
406
00:17:40,360 --> 00:17:42,600
Retrieval agent: a disciplined librarian
407
00:17:42,600 --> 00:17:45,320
that only fetches from Microsoft Graph with least privilege.
408
00:17:45,320 --> 00:17:47,160
Reasoning agent:
409
00:18:47,160 --> 00:18:49,080
an Azure OpenAI model that plans,
410
00:18:49,080 --> 00:18:51,160
cites, and never freelances past its brief.
411
00:17:51,160 --> 00:17:53,960
Executor: a policy-bound operator
412
00:17:53,960 --> 00:17:55,960
that runs tools with idempotency
413
00:17:55,960 --> 00:17:57,320
and postcondition checks.
414
00:17:57,960 --> 00:18:00,920
Validator: the bouncer. Schema, policy,
415
00:18:00,920 --> 00:18:02,360
and boundary enforcement.
416
00:18:02,360 --> 00:18:04,120
Policy guard: a runtime sidecar
417
00:18:04,120 --> 00:18:06,920
that ties everything to Entra ID scopes and RBAC.
418
00:18:06,920 --> 00:18:08,920
Together, they behave like a competent team
419
00:18:08,920 --> 00:18:10,440
instead of a committee thread.
420
00:18:10,440 --> 00:18:12,360
Data access starts with Graph, not guesswork.
421
00:18:12,360 --> 00:18:14,200
The retrieval agent holds an app registration
422
00:18:14,200 --> 00:18:16,280
with granular scopes: Files.Read
423
00:18:16,280 --> 00:18:17,880
for a specific site, Mail.ReadBasic
424
00:18:17,880 --> 00:18:20,280
for a confined mailbox, Calendars.Read
425
00:18:20,280 --> 00:18:21,880
for a resource calendar.
426
00:18:21,880 --> 00:18:24,360
No Graph ReadWrite.All heroics.
427
00:18:24,360 --> 00:18:26,200
Queries use delta and selective fields
428
00:18:26,200 --> 00:18:27,400
to keep payloads thin.
429
00:18:27,960 --> 00:18:28,920
Paging is first class.
430
00:18:28,920 --> 00:18:31,160
The executor follows next links deterministically
431
00:18:31,160 --> 00:18:33,480
with timeouts, honoring service throttling.
432
00:18:33,480 --> 00:18:35,000
And when 429s happen,
433
00:18:35,000 --> 00:18:36,360
backoff is mathematical.
434
00:18:36,360 --> 00:18:39,000
No tantrums, just exponential patience.
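One common way to make backoff "mathematical" is capped exponential backoff with full jitter, honoring the server's Retry-After hint when one is present. The sketch below only computes the wait time; the function name, the 1-second base, and the 60-second cap are assumptions for illustration.

```python
import random


def next_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    If the service sent a Retry-After value, honor it. Otherwise use capped
    exponential backoff with full jitter: a random wait in [0, ceiling].
    """
    if retry_after is not None:
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Passing `rng` as a parameter keeps the function deterministic under test while production code uses real randomness to avoid synchronized retries.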
435
00:18:39,000 --> 00:18:40,760
Grounding isn't a vibe, it's a pipeline.
436
00:18:40,760 --> 00:18:44,200
Retrieve candidate documents via graph search or list queries.
437
00:18:44,200 --> 00:18:46,520
Dedupe by item ID and version (ETag)
438
00:18:46,520 --> 00:18:48,360
so you don't blend stale and current.
439
00:18:48,360 --> 00:18:50,280
Chunk by semantic boundaries,
440
00:18:50,280 --> 00:18:52,040
section headers, slide breaks,
441
00:18:52,040 --> 00:18:53,400
then attach provenance,
442
00:18:53,400 --> 00:18:55,720
drive, site, path, item id,
443
00:18:55,720 --> 00:18:57,400
last modified, and a signed hash.
444
00:18:57,400 --> 00:19:00,200
The reasoning agent only sees chunks plus metadata
445
00:19:00,200 --> 00:19:03,480
and is required to output citations mapped back to those IDs.
446
00:19:03,480 --> 00:19:04,840
No citation, no claim.
447
00:19:04,840 --> 00:19:06,840
The executor enforces that as a post-condition
448
00:19:06,840 --> 00:19:08,520
before any outward action.
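The "no citation, no claim" postcondition can be sketched as a check over the model's sentences. This assumes citations appear inline as bracketed item IDs such as `[item-1]`; that format and the function name are hypothetical, chosen only to make the idea concrete.

```python
import re

# Assumed inline citation format: [item-id]
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def citation_problems(sentences, known_item_ids):
    """Return (sentence_index, problem) pairs; an empty list means the check passes."""
    problems = []
    for i, sentence in enumerate(sentences):
        cited = CITATION.findall(sentence)
        if not cited:
            problems.append((i, "no citation"))
            continue
        # A citation only counts if it maps back to a retrieved item.
        unknown = [c for c in cited if c not in known_item_ids]
        if unknown:
            problems.append((i, f"unknown item IDs: {unknown}"))
    return problems
```

The executor would refuse any outward action while this list is non-empty.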
449
00:19:08,520 --> 00:19:10,600
Enter Copilot Studio for orchestration.
450
00:19:10,600 --> 00:19:12,120
You define declarative tools,
451
00:19:12,120 --> 00:19:14,520
Graph query packs, SharePoint write actions,
452
00:19:14,520 --> 00:19:16,200
Teams posts, Outlook sends,
453
00:19:16,200 --> 00:19:19,160
each behind a proxy with explicit schemas and allow lists.
454
00:19:19,160 --> 00:19:21,240
Agent to agent coordination is structured.
455
00:19:21,240 --> 00:19:23,880
The retrieval agent exposes a ground tool.
456
00:19:23,880 --> 00:19:26,440
The reasoning agent requests it with parameters.
457
00:19:26,440 --> 00:19:28,680
The executor mediates, validates,
458
00:19:28,680 --> 00:19:30,280
and returns grounded context.
459
00:19:30,280 --> 00:19:33,080
Human checkpoints are native.
460
00:19:33,080 --> 00:19:35,640
A proposed action node pauses the run,
461
00:19:35,640 --> 00:19:39,080
presents the plan plus citations and requires approval.
462
00:19:39,080 --> 00:19:40,440
Approval is signed and logged,
463
00:19:40,440 --> 00:19:43,160
denial routes to a safe alternative or escalation.
464
00:19:43,160 --> 00:19:46,120
Tokens and latency are managed, not wished away.
465
00:19:46,120 --> 00:19:49,080
Selective context means you feed only the relevant chunks,
466
00:19:49,080 --> 00:19:50,600
not your entire tenant.
467
00:19:50,600 --> 00:19:52,280
Summaries are pre-computed and cached
468
00:19:52,280 --> 00:19:55,160
with embeddings keyed by content hash and permissions.
469
00:19:55,160 --> 00:19:56,760
Change the doc, change the key,
470
00:19:56,760 --> 00:19:58,440
miss the cache, recompute.
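A small sketch of a cache key derived from content hash plus permission scope, so that changing either one forces a miss. The `cache_key` helper and the use of SHA-256 are assumptions for illustration.

```python
import hashlib


def cache_key(content, scopes):
    """Derive a cache key from document bytes and the caller's permission scopes.

    Scopes are sorted so the same set always yields the same key, and two
    callers with different permissions never share a cached summary.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(content).digest())
    h.update("\n".join(sorted(scopes)).encode("utf-8"))
    return h.hexdigest()
```

Including the scopes in the key is what keeps a summary computed for a privileged caller from being served to a less privileged one.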
471
00:19:58,440 --> 00:20:00,440
Streaming responses keep the UI alive
472
00:20:00,440 --> 00:20:02,280
while the executor handles side effects
473
00:20:02,280 --> 00:20:04,840
only after the full, schema-valid plan arrives.
474
00:20:04,840 --> 00:20:07,240
Early exit conditions stop the reasoning loop
475
00:20:07,240 --> 00:20:09,400
when confidence plus coverage hits threshold.
476
00:20:09,400 --> 00:20:12,120
No extra thinking because the model felt poetic.
477
00:20:12,120 --> 00:20:13,960
Auditability is baked in.
478
00:20:13,960 --> 00:20:16,280
Every action is signed by the service principal
479
00:20:16,280 --> 00:20:19,480
or delegated user and stamped with run ID,
480
00:20:19,480 --> 00:20:24,120
tool, parameters (redacted where necessary), scopes, and result.
481
00:20:24,120 --> 00:20:26,520
Immutable logs live in your observability stack,
482
00:20:26,520 --> 00:20:28,600
pick your favorite, so you can replay a run
483
00:20:28,600 --> 00:20:30,680
without re-executing side effects.
484
00:20:30,680 --> 00:20:31,960
"Who did what, when, and why?"
485
00:20:31,960 --> 00:20:33,480
becomes a query, not a witch hunt.
486
00:20:33,480 --> 00:20:35,480
And yes, the citations survive intact
487
00:20:35,480 --> 00:20:37,480
so you can verify that the answer traced
488
00:20:37,480 --> 00:20:39,800
to actual tenant content, not model lore.
489
00:20:39,800 --> 00:20:41,400
Failure is normalized and boring.
490
00:20:41,400 --> 00:20:44,200
429s: the executor retries with jitter,
491
00:20:44,200 --> 00:20:47,320
then falls back to a lower cost query or reduced page size.
492
00:20:47,320 --> 00:20:50,200
Stale cache: the validator detects mismatched ETags
493
00:20:50,200 --> 00:20:51,640
and forces a refresh.
494
00:20:51,640 --> 00:20:54,040
Permission denial: the policy guard denies with reason,
495
00:20:54,040 --> 00:20:56,120
proposes a consent request path, or routes
496
00:20:56,120 --> 00:20:58,440
to a redacted summary that doesn't leak.
497
00:20:58,440 --> 00:21:00,680
Tool outage: the graph declares alternates
498
00:21:00,680 --> 00:21:02,520
or parks the run in a dead letter queue
499
00:21:02,520 --> 00:21:04,920
with full context for human remediation.
500
00:21:04,920 --> 00:21:08,200
Deterministic fallbacks turn incident into ticket.
501
00:21:08,200 --> 00:21:10,760
Now a very short walkthrough. User asks:
502
00:21:10,760 --> 00:21:14,040
"Draft a summary of last quarter's roadmap decisions with links."
503
00:21:14,040 --> 00:21:15,400
Reasoning agent proposes:
504
00:21:15,400 --> 00:21:17,720
use Graph search across a specific SharePoint site
505
00:21:17,720 --> 00:21:20,040
and a Teams channel, filter by last quarter,
506
00:21:20,040 --> 00:21:22,040
then synthesize. Validator checks
507
00:21:22,040 --> 00:21:24,760
that the requested scopes match the agent's role.
508
00:21:24,760 --> 00:21:25,640
They do.
509
00:21:25,640 --> 00:21:28,440
Executor issues Graph calls with paging and field selection,
510
00:21:28,440 --> 00:21:31,880
dedupes by item ID, chunks, and returns context with provenance.
511
00:21:31,880 --> 00:21:35,480
Reasoning produces a summary with inline citations mapped to item IDs.
512
00:21:35,480 --> 00:21:39,080
Validator checks schema and citations, passes.
513
00:21:39,080 --> 00:21:41,400
Human checkpoint appears with summary and evidence.
514
00:21:41,400 --> 00:21:43,320
Approver clicks OK.
515
00:21:43,320 --> 00:21:46,200
Executor posts the result in Teams and emails stakeholders,
516
00:21:46,200 --> 00:21:47,880
each action using idempotency keys,
517
00:21:47,880 --> 00:21:49,320
so retries don't double post.
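To make the idempotency-key idea concrete, here is a sketch in which the executor derives a key from run ID, tool, and parameters, and replays the stored result on a retry. The class name and the in-memory dictionary are assumptions for the example; a production executor would need a durable store.

```python
import hashlib
import json


class IdempotentExecutor:
    """Runs a side effect at most once per (run_id, tool, params)."""

    def __init__(self, send):
        self._send = send      # callable that performs the real side effect
        self._results = {}     # in-memory store; assumed durable in production

    def run(self, run_id, tool, params):
        payload = json.dumps([run_id, tool, params], sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self._results:
            # Retry of an action that already completed: replay, don't resend.
            return self._results[key]
        result = self._send(tool, params)
        self._results[key] = result
        return result
```

A retry with the same run ID and parameters returns the recorded result, so a Teams post or email goes out once even if the executor is invoked twice.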
518
00:21:49,320 --> 00:21:51,800
Note the discipline: no agent invents a tool,
519
00:21:51,800 --> 00:21:54,760
no node crosses a domain outside its allow list.
520
00:21:54,760 --> 00:21:58,200
Tokens are scoped and propagated correctly via Entra ID,
521
00:21:58,200 --> 00:22:00,360
not copy pasted between nodes.
522
00:22:00,360 --> 00:22:02,360
The model never concludes success.
523
00:22:02,360 --> 00:22:05,000
The executor proves it with Graph postconditions:
524
00:22:05,000 --> 00:22:07,960
created message ID, updated item ETag, calendar event ID,
525
00:22:07,960 --> 00:22:09,320
then stamps the run complete.
526
00:22:09,320 --> 00:22:11,000
And yes, you can extend this safely,
527
00:22:11,000 --> 00:22:15,000
bring in Planner, Viva, or third-party services via MCP or OpenAPI,
528
00:22:15,000 --> 00:22:18,680
but only behind proxy tools with strict schemas and network egress controls.
529
00:22:18,680 --> 00:22:21,640
Wrap every connector with the same validator logic and policy guard.
530
00:22:21,640 --> 00:22:24,360
Promotion between environments requires validation passes
531
00:22:24,360 --> 00:22:26,920
that match the new boundary, dev to test to prod,
532
00:22:26,920 --> 00:22:28,920
with scope increases reviewed, not assumed.
533
00:22:28,920 --> 00:22:31,320
That's Microsoft's architecture done properly.
534
00:22:31,320 --> 00:22:32,440
Graph for truth.
535
00:22:32,440 --> 00:22:34,200
Azure OpenAI for thinking,
536
00:22:34,200 --> 00:22:36,280
Copilot Studio for orchestration,
537
00:22:36,280 --> 00:22:37,960
executors for operations,
538
00:22:37,960 --> 00:22:39,960
validators and policy for safety,
539
00:22:39,960 --> 00:22:41,800
and observability for proof.
540
00:22:41,800 --> 00:22:42,920
Numbers next.
541
00:22:42,920 --> 00:22:43,880
Not vibes.
542
00:22:43,880 --> 00:22:45,160
Before/after metrics:
543
00:22:45,160 --> 00:22:48,440
accuracy, latency, cost, admin overhead.
544
00:22:48,440 --> 00:22:49,560
Nice architecture.
545
00:22:49,560 --> 00:22:50,280
Prove it.
546
00:22:50,280 --> 00:22:51,720
Numbers, not vibes.
547
00:22:51,720 --> 00:22:54,440
Baseline first: prompt-only agents are drama queens.
548
00:22:54,440 --> 00:22:58,600
Accuracy is inconsistent because they invent sources and forget citations.
549
00:22:58,600 --> 00:23:01,640
Without grounding, you get confident prose that points nowhere.
550
00:23:01,640 --> 00:23:03,800
Tail latency is brutal.
551
00:23:03,800 --> 00:23:06,440
One long chain of serial "think harder" calls,
552
00:23:06,440 --> 00:23:08,760
each bloated with redundant context.
553
00:23:08,760 --> 00:23:10,840
Cost spirals because every turn
554
00:23:10,840 --> 00:23:13,720
ships full transcripts and raw documents back to the model
555
00:23:13,720 --> 00:23:15,400
like an overpaid courier service.
556
00:23:15,400 --> 00:23:16,760
Admin overhead?
557
00:23:16,760 --> 00:23:17,880
High.
558
00:23:17,880 --> 00:23:20,040
Incidents, hotfixes, mystery failures,
559
00:23:20,040 --> 00:23:22,680
and audits that feel like archaeology with a blindfold.
560
00:23:22,680 --> 00:23:25,560
Now the after-state with executors and validated graphs.
561
00:23:25,560 --> 00:23:27,880
Accuracy jumps because claims require receipts.
562
00:23:27,880 --> 00:23:30,840
Grounded citations tied to graph item IDs,
563
00:23:30,840 --> 00:23:32,920
ETags, and signed hashes mean an answer
564
00:23:32,920 --> 00:23:35,480
that lacks provenance simply doesn't pass the validator.
565
00:23:35,480 --> 00:23:36,600
The effect is immediate.
566
00:23:36,600 --> 00:23:37,960
Fewer wrong answers shipped.
567
00:23:37,960 --> 00:23:42,280
Fewer rework loops, and a measurable lift in task success rates on eval sets.
568
00:23:42,280 --> 00:23:43,800
When a claim can't be supported,
569
00:23:43,800 --> 00:23:47,000
the agent denies with reason or requests human approval,
570
00:23:47,000 --> 00:23:49,320
predictable, reviewable, safe.
571
00:23:49,320 --> 00:23:51,320
Latency compresses for three reasons.
572
00:23:51,320 --> 00:23:53,560
First, parallelism: retrieval, re-ranking,
573
00:23:53,560 --> 00:23:55,640
and citation extraction run side by side,
574
00:23:55,640 --> 00:23:57,320
then synchronize at a barrier node.
575
00:23:57,320 --> 00:23:59,160
Second, caching: embeddings and summaries
576
00:23:59,160 --> 00:24:00,920
keyed by content hash and permission scope
577
00:24:00,920 --> 00:24:03,400
avoid recomputing what hasn't changed.
578
00:24:03,400 --> 00:24:04,760
Third, early exit.
579
00:24:04,760 --> 00:24:07,960
Once coverage and confidence hit threshold, the graph stops the loop.
580
00:24:07,960 --> 00:24:09,640
Compare that to serial prompting,
581
00:24:09,640 --> 00:24:11,640
where the model reflects for a paragraph
582
00:24:11,640 --> 00:24:13,080
and your users reflect on quitting.
583
00:24:13,080 --> 00:24:16,280
Cost drops because token discipline is enforced, not begged.
584
00:24:16,280 --> 00:24:18,680
Schema-constrained outputs prevent rambling.
585
00:24:18,680 --> 00:24:21,000
Selective context feeds only the relevant chunks
586
00:24:21,000 --> 00:24:23,720
with metadata, not entire sites.
587
00:24:23,720 --> 00:24:26,440
Shorter prompts, smaller responses, fewer retries.
588
00:24:26,440 --> 00:24:28,600
The executor's idempotency and backoff logic
589
00:24:28,600 --> 00:24:31,320
avoid duplicate calls and wasted cycles.
590
00:24:31,320 --> 00:24:34,040
The net effect is fewer tokens per successful outcome
591
00:24:34,040 --> 00:24:37,080
and far less variance. Finance likes variance reduction.
592
00:24:37,080 --> 00:24:39,560
Admin overhead shrinks because observability is engineered,
593
00:24:39,560 --> 00:24:40,680
not improvised.
594
00:24:40,680 --> 00:24:42,360
Node-level traces and immutable logs
595
00:24:42,360 --> 00:24:44,520
collapse incident time to diagnose.
596
00:24:44,520 --> 00:24:47,240
You see which node failed, why the policy engine denied,
597
00:24:47,240 --> 00:24:49,480
and what the executor tried next.
598
00:24:49,480 --> 00:24:52,680
Repeatable deployments cut "works on my machine" theater.
599
00:24:52,680 --> 00:24:54,600
Compliance stops being a seasonal crisis
600
00:24:54,600 --> 00:24:58,680
because every run already contains who, what, when, and why.
601
00:24:58,680 --> 00:25:00,440
Let's make this concrete with a measurement rig
602
00:25:00,440 --> 00:25:01,880
you can actually run.
603
00:25:01,880 --> 00:25:03,800
Build an eval set of representative tasks:
604
00:25:03,800 --> 00:25:06,760
Q&A with citations, summary with links and action proposals
605
00:25:06,760 --> 00:25:07,720
with approvals.
606
00:25:07,720 --> 00:25:11,160
For each, define golden answers or acceptance criteria:
607
00:25:11,160 --> 00:25:13,240
correct facts with mapped item IDs,
608
00:25:13,240 --> 00:25:15,720
citation coverage, and allowed variance.
609
00:25:15,720 --> 00:25:18,840
Instrument SLOs: P50 and P95 end-to-end latency,
610
00:25:18,840 --> 00:25:21,560
token spend per successful task, and policy deny rates.
611
00:25:21,560 --> 00:25:23,880
Link every metric to traces, so any regression
612
00:25:23,880 --> 00:25:25,240
has a breadcrumb trail.
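The SLO numbers named here can be computed from per-run records with a few lines of standard-library Python. The record fields (`latency_s`, `tokens`, `success`, `denied`) are assumed names, and the percentile uses the nearest-rank method, which is one of several reasonable definitions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (p in 0..100)."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(runs):
    """Aggregate per-run records into the metrics discussed above."""
    latencies = [r["latency_s"] for r in runs]
    successes = sum(1 for r in runs if r["success"])
    total_tokens = sum(r["tokens"] for r in runs)
    return {
        "p50_s": percentile(latencies, 50),
        "p95_s": percentile(latencies, 95),
        # Failed runs still cost tokens, so they count in the numerator.
        "tokens_per_success": total_tokens / successes if successes else None,
        "deny_rate": sum(1 for r in runs if r["denied"]) / len(runs),
    }
```

Dividing total tokens by successful tasks, rather than by all tasks, is what makes wasted retries and failed runs show up in the cost figure.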
613
00:25:25,240 --> 00:25:27,880
Results you should expect if you follow the architecture,
614
00:25:27,880 --> 00:25:29,080
not improvise:
615
00:25:29,080 --> 00:25:31,560
Higher answer validity because unsupported claims
616
00:25:31,560 --> 00:25:33,320
never leave staging.
617
00:25:33,320 --> 00:25:36,840
Lower P95 latency because long tails get sliced
618
00:25:36,840 --> 00:25:39,240
by parallel nodes and early exits.
619
00:25:39,240 --> 00:25:41,720
Lower token spend because you stop shipping novels
620
00:25:41,720 --> 00:25:44,040
and start shipping relevant snippets.
621
00:25:44,040 --> 00:25:46,280
Fewer pages to admins because most failures
622
00:25:46,280 --> 00:25:48,520
get handled by deterministic fallbacks.
623
00:25:48,520 --> 00:25:51,240
And yes, the boring metric everyone forgets.
624
00:25:51,240 --> 00:25:53,240
Successful first pass completion ratio,
625
00:25:53,240 --> 00:25:55,160
more runs finish without human rescue.
626
00:25:55,160 --> 00:25:56,840
Business impact: faster resolutions mean
627
00:25:56,840 --> 00:25:58,680
users stop opening duplicate tickets.
628
00:25:58,680 --> 00:26:00,360
Predictable spend means budgeting
629
00:26:00,360 --> 00:26:01,880
without surprise token hangovers.
630
00:26:01,880 --> 00:26:04,120
Compliance confidence means fewer audit cycles
631
00:26:04,120 --> 00:26:05,720
hijacking your roadmap.
632
00:26:05,720 --> 00:26:08,440
The non-obvious win is reputational.
633
00:26:08,440 --> 00:26:10,840
When the agent's answers cite tenant content
634
00:26:10,840 --> 00:26:14,120
and the links work, people trust it, use it,
635
00:26:14,120 --> 00:26:16,600
and stop forwarding screenshots that begin with
636
00:26:16,600 --> 00:26:18,120
"Why did it say this?"
637
00:26:18,120 --> 00:26:21,320
Direct, imperative advice: measure trace-to-metric linkage
638
00:26:21,320 --> 00:26:22,680
or you're flying on opinion.
639
00:26:22,680 --> 00:26:25,000
If you can't open a run and see exactly
640
00:26:25,000 --> 00:26:27,480
which node inflated tokens or stalled latency,
641
00:26:27,480 --> 00:26:30,360
you don't have observability, you have vibes with timestamps.
642
00:26:30,360 --> 00:26:32,520
Everything changes when the validator sits between
643
00:26:32,520 --> 00:26:34,840
nice plan and real action.
644
00:26:34,840 --> 00:26:37,480
Accuracy stabilizes, latency narrows,
645
00:26:37,480 --> 00:26:40,040
cost flattens, admins sleep. That's not magic;
646
00:26:40,040 --> 00:26:42,920
that's executors, graphs and validation
647
00:26:42,920 --> 00:26:45,640
doing the work you incorrectly assigned to prompts.