
UX Research · Usability Testing · AI Platform

Microsoft Foundry

End-to-end usability testing and research that drove strategic design improvements for Microsoft's AI developer platform and uncovered critical blockers in the agent-creation flow.

My Role
User Experience Researcher
Team
4 researchers · Microsoft CoreAI
Timeline
Jan – Mar 2026 (10 weeks)
Methods
Moderated Usability Testing · Think Aloud
azure.microsoft.com/en-us/products/ai-foundry
Azure AI Foundry website
Explore the live platform
Team
Microsoft CoreAI
Cross-functional
Tools
Figma, Lookback, Dovetail

Introduction

How might we make AI agent creation more intuitive for new developers?

Microsoft Foundry lets students, developers, and startup founders build AI agents for their products. The vision: democratize AI agent creation for people still learning. But vision only matters if users can get through the flow.

As a User Experience Researcher on Microsoft's CoreAI team, I led 8 moderated usability sessions testing the Knowledge, Data, and FoundryIQ features. I defined research questions, designed the study protocol, and owned team communication. In the first session (and confirmed in every subsequent one), we discovered a critical blocker that 100% of participants hit, one the team hadn't anticipated.

This study went from "let's see how intuitive the flow is" to "we need to fix this before anything else matters."

My role: I led UX research on a 4-person team. I defined research questions, designed the study protocol, ran 8 think-aloud usability sessions, owned team communication, and synthesized findings for the CoreAI team.

Note: Due to the confidential nature of this product and user privacy, visuals in this case study are limited.

Research Process

🎯
Scoping
Defined research questions with the design team
📋
Protocol Design
Designed study protocol and task scenarios
🎙️
8 Sessions
Moderated think-aloud usability sessions
🔍
Synthesis
Coded findings, identified 5 themes and 1 critical blocker
📊
Readout
Presented prioritized recommendations to CoreAI team

Research Questions

What we set out to understand.

I shaped research questions to go beyond surface-level usability, investigating where confusion transforms from "learning curve" into "I'm done." The distinction between temporary confusion and hard blockers turned out to be the most critical framing of the study.

01
At what point in the Foundry agent-creation flow do users first feel confused or overwhelmed?
02
Which concepts or terms make users feel like they need external help to continue?
03
Do users perceive confusion as a temporary learning curve or a hard blocker that makes the tool unusable?
04
What data types do users want to work with, and what do they want their agents to do?

Method

How we conducted the study.

I chose moderated usability testing because the research centered on mental models, and I needed to be present when confusion happened. Remote unmoderated testing would capture task failure, but not the reason behind it. The Think Aloud protocol revealed participants' reasoning in real time, and their silences were often the most revealing.

🎙️
Moderated Sessions
One-on-one moderated usability tests with real tasks in the live Foundry platform, using Think Aloud protocol to capture real-time reasoning and confusion points.
n = 8 participants
📋
Task-Based Scenarios
Participants completed structured tasks including agent creation, configuring knowledge sources, uploading data, and publishing agents to mirror real use cases.
Within-subject design
📊
Likert Ratings & Quotes
Post-task ease-of-use ratings on a 1–5 scale (1 = very easy), combined with qualitative Think Aloud data and direct participant quotes for triangulated findings.
Mixed methods

Participant Criteria

I recruited participants matching Foundry's target audience: students and early-career professionals with technical backgrounds, curious about AI and looking to leverage it for projects.

Critical Criteria
Active student status
CS or technical background
Curious about AI
Looking to leverage AI for a project
Experience with data sets

Findings

What worked well.

Not everything was broken, and that matters. Identifying what works is just as important as flagging problems. It tells the team which patterns to protect as they iterate.

Publishing an Agent
5 out of 8 participants rated the ease of publishing their agent as a 1 out of 5 (very easy). The publishing flow aligned with familiar patterns from other tools, making it intuitive and frictionless.
"Publishing the agent was pretty straightforward and aligns with what I would expect it to do because it's very similar UI elements to what other tools do right now."
Overall Task Completion
4 out of 8 participants rated the overall ease of use as a 2 out of 5 (easy). While the platform had areas of confusion, participants were ultimately able to navigate and complete tasks, suggesting a solid foundation to build on.

Areas of Improvement

Five issues. One that stops everything.

I prioritized findings using Nielsen's severity scale. A confusing label is a different problem from a blocker that prevents every user from completing the core task. Clear severity framing gave the Foundry team an actionable roadmap, not just a list of complaints.

Severity Scale (Nielsen's)
4 Usability catastrophe: Prevents task completion; imperative to fix
3 Major problem: Causes significant confusion; important to fix
2 Minor problem: Adds friction but doesn't block completion; should be addressed
1 Cosmetic: Surface-level issue; fix if time permits
Severity 4 Guardrail Blocks Agent Creation
8 out of 8 participants were unable to proceed with creating an agent because the interface blocked interactions due to an unassigned or mismanaged guardrail. The system leaves new agents in an ambiguous "inheriting" state instead of automatically assigning Microsoft's default guardrail, causing the interface to prevent users from completing the task.
The error message compounds the confusion by prompting users to "create guardrail" when the actual resolution is to reassign the agent to an existing default guardrail. Participants lacked contextual guidance on why the interaction was blocked, were unclear about the differences between guardrail versions, and could not see the active guardrail status, all of which increased frustration and wasted time.
Recommendations
Automatically assign Microsoft's default guardrail during agent creation. Update error messages to direct users to "Reassign Guardrail" rather than "Create Guardrail." Provide inline explanations about why interactions are blocked and how to resolve them. Clarify guardrail purposes and version differences through tooltips or descriptions. Make guardrail status more visible in the interface.
Severity 3 Confusing Terminology & Labeling
7 out of 8 participants were confused by overlapping or unclear terminology in the platform. Key points of confusion included the distinction between "Tools" and "Knowledge" (and why file uploads appeared under Tools), as well as the difference between "Agent Instructions" and "Message Agent." While this didn't fully prevent task completion, it was the most significant source of friction in the overall experience.
Recommendations
Audit and simplify terminology across the platform to ensure labels are distinct, descriptive, and consistent. Add contextual definitions (tooltips or inline descriptions) for key concepts like "Tools," "Knowledge," and "Agent Instructions." Consider renaming overlapping terms to reduce cognitive load for first-time users.
Severity 2 Cluttered Landing Page & Visual Hierarchy
4 out of 8 participants experienced discoverability issues on the landing page. The "Start Building" button lacked visual prominence due to its relatively small size and the presence of multiple competing visual elements. The "Coding Quick Start" bar was significantly larger and attracted users' attention first. Additionally, the similarity in terminology between these two options created confusion regarding the appropriate starting point.
"I see a couple of places I could go to. Do I go to Start Building? Do I go to the Coding Quick Start part?"
Recommendations
Rebalance the page layout to reduce visual competition among elements and clarify the hierarchy. Increase the visual prominence of the "Start Building" button by increasing its size and contrast and positioning it within a primary focal area.
Severity 2 Unclear Platform Navigation Flow
Each of the 8 participants took a different path when navigating Microsoft Foundry. After creating an agent, 2 out of 8 participants clicked through the navigation sidebar tabs to better understand the platform's terminology. While participants were ultimately able to navigate, the lack of guided structure forced exploratory behavior.
"I'll start by looking at the nav bar because there was no clear instruction of how like the different steps that will be involved in making the AI agents, so I'll have to explore the software on my own."
Recommendations
Introduce onboarding assistance or a guided tutorial for first-time users. 3 out of 8 participants specifically suggested this. As one shared: "I feel like if I looked up a tutorial or if the platform gave me some info when I created my account, it would be pretty easy to figure out as you go."
Severity 2 Misleading Error Messages on File Upload
5 out of 8 participants experienced confusion when files were successfully uploaded but an error message appeared. The upload process itself was straightforward, but the false error introduced unnecessary doubt and broke user confidence in the system.
"At least uploading [files] was straightforward. But that error message was a little bit confusing."
Recommendations
Investigate and resolve the underlying bug causing false error messages on successful uploads. Ensure confirmation states clearly communicate success and distinguish between warnings, errors, and informational messages.
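
The prioritization across these five findings can be sketched in a few lines of Python. This is a minimal illustration, not the process the team actually used: the `Finding` class, the `prioritize` and `share` helpers, and the tie-breaking rule (severity first, then number of participants affected) are all assumptions introduced here. The severity ratings and participant counts are copied from the findings above.

```python
from dataclasses import dataclass

TOTAL_PARTICIPANTS = 8  # n = 8 moderated sessions


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one usability finding."""
    title: str
    severity: int  # 1 (cosmetic) to 4 (usability catastrophe)
    affected: int  # participants reported for this finding


# Severity and counts as reported in the findings above.
FINDINGS = [
    Finding("Cluttered Landing Page & Visual Hierarchy", 2, 4),
    Finding("Guardrail Blocks Agent Creation", 4, 8),
    Finding("Misleading Error Messages on File Upload", 2, 5),
    Finding("Confusing Terminology & Labeling", 3, 7),
    Finding("Unclear Platform Navigation Flow", 2, 8),
]


def prioritize(findings):
    """Order by severity, breaking ties by participants affected (assumed rule)."""
    return sorted(findings, key=lambda f: (f.severity, f.affected), reverse=True)


def share(finding, total=TOTAL_PARTICIPANTS):
    """Percentage of participants affected, as a float."""
    return 100 * finding.affected / total


if __name__ == "__main__":
    for f in prioritize(FINDINGS):
        print(f"Severity {f.severity} | {f.affected}/{TOTAL_PARTICIPANTS} | {f.title}")
```

Sorting on severity before frequency keeps a blocker at the top even when a lower-severity issue touches just as many participants, which is the point of using a severity scale rather than a raw count.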

Study at a Glance

The numbers behind the research.

Eight sessions. Five distinct findings. One critical blocker affected every single participant, a finding so consistent it immediately became the team's top priority.

8
moderated usability sessions conducted with students from technical backgrounds
5
distinct usability issues identified and prioritized by severity for the Foundry team
100%
of participants encountered the critical guardrail blocker during agent creation

Next Steps

Where this research goes from here.

Good research doesn't just answer questions. It reveals the next ones. My study surfaced both immediate fixes and areas needing deeper exploration.

01
Resolve the Guardrail Blocker
The guardrail issue emerged in the first session and affected every single participant. Addressing this is the highest priority, as it completely blocks the core agent-creation workflow.
02
Test Agent Integration End-to-End
Future studies should explore how easily users can take a published agent and actually integrate it (via code or embed) into a product or website, a critical step my study didn't cover.
03
Translate Findings into Design Recommendations
Each of the five findings was paired with prioritized recommendations and handed to the Foundry design team for follow-up. The guardrail blocker became the team's top priority, with terminology audits and landing-page hierarchy queued for subsequent design sprints.

"You dived into a complex product, asked all the right questions, and it's very clear that you put a lot of thought into planning and executing the study, and then translated that into a clear, engaging readout."

Research Mentor · Microsoft CoreAI

Reflection

What I learned & what surprised me.

The biggest lesson was about adaptability. The guardrail blocker emerged in Session 1 and threatened to derail the study. Every participant hit the same wall, forcing a real-time decision: help them past it (losing data about the blocker's impact) or let them struggle (losing downstream data)? I chose a hybrid approach. Participants attempted the task fully while I documented confusion, then I provided a workaround so we could test the rest of the flow. That decision preserved both the critical finding and downstream insights.

Biweekly check-ins with the Microsoft sponsor became more critical than I'd expected: not just for logistics, but for real-time alignment on what mattered most. When I flagged the guardrail issue after Session 2, the sponsor confirmed it was a known-but-underestimated bug. My data gave the engineering team the evidence they needed to prioritize the fix. That moment of watching research directly influence a product decision was the highlight of this experience.

What Went Well
+ Timely and sufficient participant recruitment: I recruited 8 participants, the top of the 6–8 target range
+ Biweekly 1-hour check-ins with the Microsoft sponsor helped me resolve issues in real time
+ Strong team organization with weekly internal syncs
+ Each usability session was productive and surfaced new insights
What I'd Do Differently
~ Run a pilot session before the formal study: a dry run would have surfaced the guardrail blocker earlier and given me time to design a cleaner workaround protocol
~ Include a broader participant pool: startup founders (not just students) would have revealed whether the terminology issues are universal or expertise-dependent
~ Add a retrospective interview after each session: some of my best insights came from off-script comments, and a structured debrief would have captured more of them