Microsoft Foundry · UX Research Case Study

Introduction

How might we make AI agent creation more intuitive for new developers?

Microsoft Foundry lets students, developers, and startup founders build AI agents for their products. The vision: democratize AI agent creation for people still learning. But vision only matters if users can get through the flow.

As a User Experience Researcher on Microsoft's CoreAI team, I led 8 moderated usability sessions testing the Knowledge, Data, and FoundryIQ features. I defined research questions, designed the study protocol, and owned team communication. The first session surfaced a critical blocker the team hadn't anticipated, and every subsequent session confirmed it: all 8 participants hit it.

This study went from "let's see how intuitive the flow is" to "we need to fix this before anything else matters."

My role: I led UX research on a 4-person team. I defined research questions, designed the study protocol, ran 8 think-aloud usability sessions, owned team communication, and synthesized findings for the CoreAI team.

Note: Due to the confidential nature of this product and user privacy, visuals in this case study are limited.

Method

How we conducted the study.

I chose moderated usability testing because the research centered on mental models. I needed to be there when confusion happened. Remote unmoderated testing would capture that a task failed, but not why. The Think Aloud protocol revealed participants' reasoning in real time, and their silences were often the most revealing.

🎙️

Moderated Sessions

One-on-one moderated usability tests with real tasks in the live Foundry platform, using Think Aloud protocol to capture real-time reasoning and confusion points.

n = 8 participants

📋

Task-Based Scenarios

Participants completed structured tasks including agent creation, configuring knowledge sources, uploading data, and publishing agents to mirror real use cases.

Within-subject design

📊

Likert Ratings & Quotes

Post-task ease-of-use ratings on a 1–5 scale combined with qualitative Think Aloud data and direct participant quotes for triangulated findings.

Mixed methods

Participant Criteria

I recruited participants matching Foundry's target audience: students and early-career professionals with technical backgrounds, curious about AI and looking to leverage it for projects.

Critical Criteria

Active student status

CS or technical background

Curious about AI

Looking to leverage AI for a project

Experience with data sets

Areas of Improvement

Five issues. One that stops everything.

I prioritized findings using Nielsen's severity rating scale. A confusing label is a different problem from a blocker that prevents every user from completing the core task. Clear severity framing gave the Foundry team an actionable roadmap, not just a list of complaints.

Severity Scale (Nielsen's)

4 Usability catastrophe: Prevents task completion; imperative to fix

3 Major problem: Causes significant confusion; important to fix

2 Minor problem: Adds friction but doesn't block completion; should be addressed

1 Cosmetic: Surface-level issue; fix if time permits

Severity 4 Guardrail Blocks Agent Creation

8 out of 8 participants were unable to proceed with creating an agent because the interface blocked interactions due to an unassigned or mismanaged guardrail. The system leaves new agents in an ambiguous "inheriting" state instead of automatically assigning Microsoft's default guardrail, causing the interface to prevent users from completing the task.

The error message compounds the confusion by prompting users to "create guardrail" when the actual resolution is to reassign an existing default guardrail. Participants lacked contextual guidance on why the interaction was blocked, were unclear about the differences between guardrail versions, and could not see the active guardrail status, all of which increased frustration and wasted time.

Recommendations

Automatically assign Microsoft's default guardrail during agent creation. Update error messages to direct users to "Reassign Guardrail" rather than "Create Guardrail." Provide inline explanations about why interactions are blocked and how to resolve them. Clarify guardrail purposes and version differences through tooltips or descriptions. Make guardrail status more visible in the interface.

Severity 3 Confusing Terminology & Labeling

7 out of 8 participants were confused by overlapping or unclear terminology in the platform. Key points of confusion included the distinction between "Tools" and "Knowledge" (and why file uploads appeared under Tools), as well as the difference between "Agent Instructions" and "Message Agent." While this didn't fully prevent task completion, it was the most significant source of usability friction in the overall experience.

Recommendations

Audit and simplify terminology across the platform to ensure labels are distinct, descriptive, and consistent. Add contextual definitions (tooltips or inline descriptions) for key concepts like "Tools," "Knowledge," and "Agent Instructions." Consider renaming overlapping terms to reduce cognitive load for first-time users.

Severity 2 Cluttered Landing Page & Visual Hierarchy

4 out of 8 participants experienced discoverability issues on the landing page. The "Start Building" button lacked visual prominence due to its relatively small size and the presence of multiple competing visual elements. The "Coding Quick Start" bar was significantly larger and attracted users' attention first. Additionally, the similarity in terminology between these two options created confusion regarding the appropriate starting point.

"I see a couple of places I could go to. Do I go to Start Building? Do I go to the Coding Quick Start part?"

Recommendations

Rebalance the page layout to reduce visual competition among elements and clarify the hierarchy. Make the "Start Building" button more prominent by increasing its size and contrast and positioning it in a primary focal area.

Severity 2 Unclear Platform Navigation Flow

Each of the 8 participants took a different path through Microsoft Foundry. After creating an agent, 2 out of 8 participants clicked through the navigation sidebar tabs to better understand the platform's terminology. While participants were ultimately able to navigate, the lack of guided structure forced exploratory behavior.

"I'll start by looking at the nav bar because there was no clear instruction of how like the different steps that will be involved in making the AI agents, so I'll have to explore the software on my own."

Recommendations

Introduce onboarding assistance or a guided tutorial for first-time users. 3 out of 8 participants specifically suggested this. As one shared: "I feel like if I looked up a tutorial or if the platform gave me some info when I created my account, it would be pretty easy to figure out as you go."

Severity 2 Misleading Error Messages on File Upload

5 out of 8 participants experienced confusion when files were successfully uploaded but an error message appeared. The upload process itself was straightforward, but the false error introduced unnecessary doubt and broke user confidence in the system.

"At least uploading [files] was straightforward. But that error message was a little bit confusing."

Recommendations

Investigate and resolve the underlying bug causing false error messages on successful uploads. Ensure confirmation states clearly communicate success and distinguish between warnings, errors, and informational messages.
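The ranking of the five findings above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the `Finding` class and `prioritize` function are hypothetical names, and breaking ties by the number of participants affected is an assumption added for the example rather than part of the severity scale. The severity ratings and participant counts are taken from the findings above.

```python
from dataclasses import dataclass

TOTAL_PARTICIPANTS = 8  # n = 8, as reported in the Method section


@dataclass(frozen=True)
class Finding:
    title: str
    severity: int  # 1 (cosmetic) to 4 (usability catastrophe)
    affected: int  # participants who encountered the issue


# Severity ratings and counts as reported in the findings above.
FINDINGS = [
    Finding("Cluttered landing page & visual hierarchy", 2, 4),
    Finding("Guardrail blocks agent creation", 4, 8),
    Finding("Misleading error messages on file upload", 2, 5),
    Finding("Confusing terminology & labeling", 3, 7),
    Finding("Unclear platform navigation flow", 2, 8),
]


def prioritize(findings):
    """Sort by severity, then by participants affected, both descending.

    The tie-break on participant count is an assumption for illustration.
    """
    return sorted(findings, key=lambda f: (f.severity, f.affected), reverse=True)


if __name__ == "__main__":
    for f in prioritize(FINDINGS):
        print(f"Severity {f.severity} | {f.affected}/{TOTAL_PARTICIPANTS} | {f.title}")
```

Sorting on severity first keeps a blocker at the top of the list even when a lower-severity issue affects just as many participants.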

Next Steps

Where this research goes from here.

Good research doesn't just answer questions. It reveals the next ones. My study surfaced both immediate fixes and areas needing deeper exploration.

Resolve the Guardrail Blocker

The guardrail issue emerged in the first session and affected every single participant. Addressing this is the highest priority, as it completely blocks the core agent-creation workflow.

Test Agent Integration End-to-End

Future studies should explore how easily users can take a published agent and actually integrate it (via code or embed) into a product or website, a critical step my study didn't cover.

Translate Findings into Design Recommendations

Each of the five findings was paired with prioritized recommendations and handed to the Foundry design team for follow-up. The guardrail blocker became the team's top priority, with terminology audits and landing-page hierarchy queued for subsequent design sprints.

Reflection

What I learned & what surprised me.

The biggest lesson was about adaptability. The guardrail blocker emerged in Session 1 and threatened to derail the study. Every participant hit the same wall, forcing a real-time decision: help them past it (losing data about the blocker's impact) or let them struggle (losing downstream data)? I chose a hybrid approach. Participants attempted the task fully while I documented confusion, then I provided a workaround so we could test the rest of the flow. That decision preserved both the critical finding and downstream insights.

Biweekly check-ins with the Microsoft sponsor became more critical than I'd expected: not just for logistics, but for real-time alignment on what mattered most. When I flagged the guardrail issue after Session 2, the sponsor confirmed it was a known-but-underestimated bug. My data gave the engineering team the evidence they needed to prioritize the fix. That moment of watching research directly influence a product decision was the highlight of this experience.

What Went Well

+ Timely and sufficient participant recruitment: I recruited 8 participants, meeting the target of 6–8

+ Biweekly 1-hour check-ins with the Microsoft sponsor helped me resolve issues in real time

+ Strong team organization with weekly internal syncs

+ Each usability session was productive and surfaced new insights

What I'd Do Differently

~ Run a pilot session before the formal study: a dry run would have surfaced the guardrail blocker earlier and given me time to design a cleaner workaround protocol

~ Include a broader participant pool: startup founders (not just students) would have revealed whether the terminology issues are universal or expertise-dependent

~ Add a retrospective interview after each session: some of my best insights came from off-script comments, and a structured debrief would have captured more of them
