Evaluating the usability of Google Cloud's AI/ML developer platform - Vertex AI

ORGANIZATION

Google

ROLE

UX Researcher

TIMELINE

8 Weeks (2025)

RESPONSIBILITIES

Study Design, Usability Testing, Design Recommendations

CONTEXT
What is Vertex AI?

Vertex AI is Google Cloud's machine-learning development platform, designed for enterprise developers to build and deploy AI models and applications at scale. Its loginless experience is a free way to interact with Gemini models in Vertex AI Studio without signing in.

TEAM STRUCTURE AND OPERATION
I worked with 3 of my classmates on this project, and we reported to a UX Research manager at Google.

We had weekly meetings with Katie O'Leary, the staff UXR manager at Google, to discuss our progress, resolve open questions, and plan next steps. These meetings helped us address blockers and dependencies and kept us on track. This study was part of our grad school class led by Dr. Katya Cherukumilli, who also mentored us throughout.

At the Google SLU office after presenting our findings!

PROBLEM
Google wanted to uncover the usability barriers in Vertex AI's loginless experience for desktop, to improve its usability and retention.

Vertex AI targets developers with anywhere from a little to a good amount of experience building applications with AI models. Google asked us to conduct a usability study of the desktop, loginless experience to improve its usability and user retention.

SOLUTION / TASK
What we did:

This usability study assessed the loginless web experience of Google Vertex AI Studio, focusing on how developers navigate and engage with its features. The goal was to identify usability challenges and opportunities to improve user retention.

The study was conducted over 8 weeks with 8 participants: students and professional developers familiar with AI application development who were first-time users of Vertex AI. Through 60-minute in-person moderated sessions on the University of Washington campus, participants completed tasks while thinking aloud. We used audio and video recording equipment to document participant reactions and website navigation. Additionally, questionnaires were administered at the completion of each task and after testing to obtain supplemental quantitative data.

This study focused on assessing six tasks that attempted to replicate the initial user journey, from navigating to the Vertex AI Freeform page via the marketing page to getting the code for AI model implementation.
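For context, the end state of that journey is a working code snippet. As a rough illustration (the exact snippet Vertex AI Studio generates may differ, and the project ID, region, and model name below are placeholders rather than values from the study), a Gemini call through the Vertex AI Python SDK looks roughly like this:

    # Illustrative sketch only: roughly the kind of snippet a developer copies out
    # of Vertex AI Studio at the end of this journey. Project ID, region, and model
    # name are placeholders, not values from the study.
    import vertexai
    from vertexai.generative_models import GenerativeModel

    vertexai.init(project="your-project-id", location="us-central1")

    model = GenerativeModel("gemini-1.5-flash")
    response = model.generate_content("Summarize this support ticket in two sentences.")
    print(response.text)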

IMPACT

18 actionable insights presented to Google's team, along with specific suggestions to enhance the Vertex AI Studio experience

Influenced future platform priorities by identifying key usability issues in the loginless web experience

MY EXPERIENCE AND CONTRIBUTION
A hands-on end-to-end UX research experience with ample learnings!

I had the opportunity to collaborate with and learn from the amazing research team at Google. It was an end-to-end experience where I worked hands-on across all components of a usability study—starting with drafting the study plan, followed by creating the study kit, managing recruitment, conducting the tests, and finally analyzing the data.

I actively contributed to the creation of the plan and kits, moderated tests, took notes, and analyzed data to generate actionable insights aimed at improving Google Vertex AI Studio’s loginless web experience.

PROCESS BELOW ↓
TIMELINE
Originally planned as a 10-week project, we swiftly adapted our approach to meet an accelerated 8-week timeline due to scheduling changes.

This is how we structured it.

RESEARCH QUESTIONS
We started with the following research questions:
  • Is the loginless experience usable and satisfying?

  • What are the major frictions?

  • Does the loginless experience provide enough capabilities to engage and entice new users to sign up for more access?
TOOL FAMILIARISATION
To get acquainted with the tool first, we conducted cognitive walkthroughs ourselves and created interaction maps.

This was a crucial step that helped us learn more about Vertex AI, as none of us had any background in development. We were able to explore and understand common terminology or "developer lingo" that helped us empathize with our participants and better understand the reasoning behind certain decisions they made while interacting with the product.

A glimpse of how the interaction mapping was done to understand all the current flows & features

RECRUITMENT
We leveraged personal connections and the University of Washington's CS directories to recruit participants for our study.

We used convenience sampling from personal connections and large-scale email outreach (~1,500 emails) to computer science students at the University of Washington to recruit 8 participants with some level of experience in AI application development.

A screening survey helped us identify qualified participants and maintain the integrity of our results by excluding those who had previous experience with Vertex AI and those who were only available for remote testing.

INCLUSION CRITERIA AND FILTERING
Out of a total of 21 responses, we narrowed the pool down to 8 participants.

Our inclusion criteria were:

  • The participant must have a developer background, familiarity with programming languages, and some experience using AI platforms to build applications.

  • The participant must be willing to participate in-person on the University of Washington campus to ensure data collection accuracy and consistency across all participants.

Participants varied in occupation, familiarity with AI tools, the context in which they used them, and the AI development skills they had.

TESTING PROCEDURE
A rigorous multi-method testing protocol: Creating a controlled environment for comprehensive user insights.

We conducted in-person usability sessions in isolated study rooms on the UW campus. Each session was recorded with participant consent, which was obtained through a signed consent form at the start. The sessions began with reading a standard scenario, followed by a set of tasks created specifically for each participant. Each session involved a team of four: one moderator, one person collecting qualitative insights, one person tracking time to task completion, and one person noting the number of clicks to success and identifying any click path errors.

After each task, participants completed a post-task questionnaire to provide quantitative scores based on standard NASA-TLX and CSAT questions and qualitative feedback on their frustrations or satisfaction with the task. At the end of the test, participants filled out a post-test questionnaire based on standard SUS and NPS questions, summarizing their overall experience with Vertex AI after completing all the tasks. This allowed us to gather comprehensive insights into their interaction with the tool.
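For readers unfamiliar with SUS scoring, the standard formula converts the ten 1-5 responses into a 0-100 score: odd-numbered items contribute (score - 1), even-numbered items contribute (5 - score), and the sum is multiplied by 2.5. Below is a minimal sketch of that calculation; the example answers are placeholders, not participant data.

    # Minimal sketch of standard SUS scoring; the example answers are placeholders,
    # not data from this study.
    def sus_score(responses):
        """responses: ten answers on a 1-5 scale, in questionnaire order."""
        assert len(responses) == 10
        total = 0
        for i, r in enumerate(responses, start=1):
            # Odd (positively worded) items contribute (score - 1);
            # even (negatively worded) items contribute (5 - score).
            total += (r - 1) if i % 2 == 1 else (5 - r)
        return total * 2.5  # scale the 0-40 sum to 0-100

    print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # 85.0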

DATA COLLECTION
Here's how we streamlined our data collection.

As a team of four with clearly assigned roles, here is how we planned our data collection process to ensure smooth and efficient data capture during the in-person usability tests.

Contents under NDA

DATA ANALYSIS
We combined qualitative feedback with quantitative metrics to validate our findings.

After collecting the data, we conducted affinity mapping with qualitative notes from observations, interviews, and questionnaires to identify key patterns. We extracted representative quotes from session recordings that highlighted specific Vertex AI issues, ensuring diverse participant representation to validate our findings.

We analyzed quantitative metrics including task completion times and click counts, comparing these against our benchmarks. This quantitative data corroborated our qualitative insights, strengthening our findings' credibility.

From post-task and post-test questionnaires, we created visualization charts for satisfaction scores (CSAT), usability ratings (SUS), and cognitive load measurements (NASA-TLX), providing clear visual evidence of participants' experiences with the platform.
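As an example of the charting involved (the real figures are under NDA, so the numbers below are placeholders), a per-participant SUS bar chart against the commonly cited average of 68 takes only a few lines of matplotlib:

    # Illustrative sketch with placeholder scores; the study's actual results are
    # under NDA. Shows the style of chart used for the SUS results.
    import matplotlib.pyplot as plt

    participants = [f"P{i}" for i in range(1, 9)]
    sus_scores = [72.5, 65.0, 80.0, 55.0, 90.0, 67.5, 75.0, 60.0]  # placeholders

    plt.figure(figsize=(8, 4))
    plt.bar(participants, sus_scores)
    plt.axhline(68, linestyle="--", color="gray", label="Commonly cited SUS average (68)")
    plt.ylabel("SUS score (0-100)")
    plt.title("SUS per participant (placeholder data)")
    plt.legend()
    plt.tight_layout()
    plt.show()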

Affinity Mapping, grouping of findings (Under NDA)

FINAL OUTCOME
We presented 18 findings along with our recommendations to Google's UXR team at the SLU Google office, Seattle.

These insights are expected to guide the prioritization of future updates for Google Vertex AI Studio, helping the team address key usability challenges and improve the platform’s loginless web experience.


We organized and presented our findings in the following categories:

Task-based findings: We organized the results of our usability tests and questionnaires by the tasks through which they were identified. We rated the severity of each finding on a 4-point scale sourced from the Handbook of Usability Testing by Jeffrey Rubin, and drew heavily on participant quotes so the reasons behind each issue were quick to grasp.


Significant concerns beyond task-based issues: We discovered issues that were unrelated to the tasks but could be directly linked to our research questions around user retention.


Quantitative metrics: These metrics supported our qualitative insights.

Below are snippets showing how we presented our quantitative findings. Due to an NDA clause, I'm unable to share the insights that came out of this study.

REFLECTION
What would I do differently?
  1. Increase dry runs before testing: While our initial dry runs proved valuable in refining our test kit, additional sessions would have allowed us to further optimize our methods. The feedback from these runs helped us eliminate distracting elements and strengthen components that directly supported our research goals.

  2. Deepen technical domain knowledge prior to testing: Despite our interaction mapping and initial research, we encountered unfamiliar developer terminology during testing sessions. Although this didn't compromise our study results, it created momentary confusion that could have been avoided with more thorough preparation in product-specific technical language.


  3. Eye tracking: We initially tried to source eye-tracking equipment for the project, but technical issues got in the way and we did not have enough time to request replacements. In the future, I would love to use eye tracking to gather richer data about participants' natural gaze and click paths.

Designer by day, music enthusiast by night. Drop me a line about your project or that track you can't stop playing.

© Siddharth Hardikar

Illustrated, animated & built with love.