Gutenberg Technology | CMS

Gutenberg Technology | CMS

Redesigning AI from Hidden Feature to Intuitive Tool for CMS by Eye-Tacking Usability Testing

Redesigning AI from Hidden Feature to Intuitive Tool for CMS by Eye-Tacking Usability Testing

Gutenberg Technologies was advancing its AI-assisted CMS, but they lacked solid research for its design decisions. Our team conducted an in-depth usability study, using eye-tracking alongside task-based evaluation, and retrospective think-aloud (RTA) interviews to understand how first-time users navigate and apply these AI tools. This unique, visual methodology allowed us to precisely pinpoint systemic issues, such as poor discoverability, confusing terminology, and difficult version comparison. Through visual data, behavioral evidence, and user sentiment, we identified three major findings and proposed design recommendations to align with users’ mental models. This case study outlines the research process, key findings, and high-impact recommendations that guided GT’s 2026 roadmap for improving the AI authoring experience.

Gutenberg Technologies was advancing its AI-assisted CMS, but they lacked solid research for its design decisions. Our team conducted an in-depth usability study, using eye-tracking alongside task-based evaluation, and retrospective think-aloud (RTA) interviews to understand how first-time users navigate and apply these AI tools. This unique, visual methodology allowed us to precisely pinpoint systemic issues, such as poor discoverability, confusing terminology, and difficult version comparison. Through visual data, behavioral evidence, and user sentiment, we identified three major findings and proposed design recommendations to align with users’ mental models. This case study outlines the research process, key findings, and high-impact recommendations that guided GT’s 2026 roadmap for improving the AI authoring experience.

Content Audit

Eye-Tracking Usability Test

UX/UI

UX Consultancy

My Role

UX Researcher
Project Manager
UI Designer

Team

Matt: UX Researcher
Anvita: Facilitator
Uraiba: UI Designer

Deliverables

Usability Findings
Recommendations
Design Mockup

Timeline

2025.08 - 2025.12

Contributions

The Part I Played in This Work

The Part I Played in This Work

Ran usability & eye-tracking sessions 👀

Planned and facilitated tests to capture real user behavior and uncover friction points.

Synthesized research findings 🧩

Consolidated heatmaps, task data, and qualitative insights into clear problem themes.

Managed the project 📈

Coordinated meetings, planned the project timeline, and ensured we are on the right track.

Built design recommendations 🎨

Shaped solutions that aligned with user mental models and GT’s product goals.

01 Overview

01 Overview

Highlights

A redesigned AI interaction that boosts discoverability and makes AI feel intuitive to use

A redesigned AI interaction that boosts discoverability and makes AI feel intuitive to use

Redesigned Generate with AI Modal

Context

Gutenberg Technologies was creating new AI features and needed clarity on what “good AI” should look like

Gutenberg Technologies was creating new AI features and needed clarity on what “good AI” should look like

Gutenberg Technologies offers a suite of AI-enabled content creation tools designed to support editors in generating learning activities, drafting assessments, and improving written material. Despite the potential value, GT suspected that these AI capabilities were not being used to their full extent. Early feedback suggested that users were unsure where AI lived within the interface, what certain options meant, and whether the system’s behavior, particularly saving and versioning, worked the way they expected.

The Problem

GT's AI features were powerful but felt hidden, unclear, and disconnected from users’ mental models

GT's AI features were powerful but felt hidden, unclear, and disconnected from users’ mental models

While GT's AI tools offered powerful capabilities, they were largely perceived by participants as "hidden, confusing, and hard to trust". This friction stifled adoption and added unnecessary cognitive load to the content creation process. This project challenged us to look not just at usability, but at how language, workflow structure, and mental models influence users’ willingness to adopt AI-driven tools.

Current website: Gaze replay showing confusion on learning activity type

Current website: Gaze replay showing confusion on learning activity type

Goals

To evaluate the AI tools' discoverability and usability to identify critical friction points for first-time users.

To evaluate the AI tools' discoverability and usability to identify critical friction points for first-time users.

02 The Process

02 The Process

Research Overview

To uncover pain points and explore solutions, we designed a plan to get quantitative and qualitative data.

To uncover pain points and explore solutions, we designed a plan to get quantitative and qualitative data.

Our team was brought in to investigate these concerns through an eye-tracking usability evaluation. The project centered on two core questions:

  1. Can first-time users efficiently locate, understand, and apply the AI tools?
  2. What barriers prevent users from trusting or completing AI-assisted workflows?

To answer those questions, we created the research plan:

1

Kick-off Meeting

Understand clients' needs and constraints

2

Screener Survey

Find qualified participants for the test

3

Eye-Tracking Usability Testing

Observe where the usability issues are

4

System Usability Scale

Evaluate the usability of product

5

Retrospective-Think-Aloud

Understand user pain points and needs

6

Web Analytics

Learn behavioral data & Identify issues

7

Rainbow Sheet

Organize research findings and observations

8

Problem Lists

Identify the severity rates of each usability problem

5

Retrospective-Think-Aloud

Understand user pain points and needs

6

Web Analytics

Learn behavioral data & Identify issues

4

System Usability Scale

Evaluate the usability of product

1

Kick-off Meeting

Understand clients' needs and constraints

2

Screener Survey

Find qualified participants for the test

3

Eye-Tracking Usability Testing

Observe where the issues are

4

System Usability Scale

Evaluate the usability of product

5

Retrospective-Think-Aloud

Understand user pain points and needs

6

Web Analytics

Learn behavioral data & Identify issues

7

Research Findings

Organize research findings and observations

8

Problem Lists

Identify the severity rates of each usability problem

Test Setup

To understand where users struggled and why, we designed a usability test.

To understand where users struggled and why, we designed a usability test.

Eye-Tracking Usability Testing

Background of 8 Participant

●   Work types: Educator, Graduate Assistant, Editor, Author
●   Experience with AI: 0 experience - Some experience
●   Countries: United States

Tasks

To identify usability and learnability of the AI features for the CMS, we designed the following three tasks:

Task 1

Edit content using AI and compare versions

Task 2

Generate and edit assessment questions

Task 3

Generate a learning activity with AI

I conducted eye-tracking usability test in the lab

System Usability Scale (SUS)

The system usability scale (SUS) is a 10-question survey that provides a quick and reliable measure of a product's perceived usability with scores ranging from 1-100.

Retrospective Think-Aloud (RTA)

Immediately after each task, participants reviewed their session with us. This method helped us capture the emotional context behind each hesitation, misclick, or moment of confusion.

Data Sources & Tools

Our analysis sources and tools included:

Our analysis sources and tools included:

Data

  • Heatmaps showing misplaced attention

  • Time-to-locate metrics

  • Retrospective-think-aloud recording

  • Usability survey metrics

  • Quotes and observed Behavior

Tools

  • Tobii

  • Google Form

  • Google Analytics

  • Google Excel

Key Challenge & How We Fixed It

When everything seemed smooth, we faced a huge frustration.

⚠️

⚠️

Screenshot from eye-tracking platform showing no media is available for data visualization

A primary challenge for this study was the lack of existing systemic data. Our research focused on new AI features, which primarily operated within modals. However, it was hard to track related activities on Google Analytics and Tobii (eye-tracking platform), which provided virtually no usable quantitative metrics to triangulate our findings. To compensate for this data constraint and ensure evidence triangulation, we strategically incorporated rich qualitative data alongside our visual insights:

  • Completion Time: To objectively measure efficiency and friction.

  • User Quotes (Verbatims): To capture the user's mental model and articulate their frustrations.

  • Heatmaps and Gaze Replay: To visualize the users' visual attention and navigation paths.

03 The Result

03 The Result

Overall Findings

The AI tools suffer from low discoverability and a lack of systemic clarity, negatively impacting the overall experience of the platform.

The AI tools suffer from low discoverability and a lack of systemic clarity, negatively impacting the overall experience of the platform.

Overall SUS Score

46.3

46.3

46.3

/ 100

/ 100

This labels the interface as somewhere between poor and okay.

Findings & Recommendations

GT’s AI tools are useful but feel hidden, confusing, and hard to trust. So, We Aimed to Improve Discoverability and Usability.

GT’s AI tools are useful but feel hidden, confusing, and hard to trust. So, We Aimed to Improve Discoverability and Usability.

  1. Discoverability Issues: Low Discoverability Across AI Editing & Generating Tools

Finding 1

While every participant was able to find the AI-related features, it took a long time, indicating low discoverability.

Participants struggled to distinguish between multiple “edit” entry points and often clicked the wrong touchpoints entirely. Many assumed the “Add” button controlled all forms of content creation, including " Generate with AI". The result was confusion, delayed task completion, and repeated backtracking.

Participant 5 didn’t fixate on the icon at all when tasked to “Edit with AI.”

Average Time

35 Sec

Participants found the “Edit with AI” button with an average time of 35s. As users are generally scanning the screen, this average time is too high when it comes to discoverability.

Participant 2 had high fixation on all of the icons across the side panel denoting the confusion she experienced while generating with AI.

“Since it was a task to create something new, I went to the add button.”

- Participant 2

Average Time

35 Sec

Participants found the “Edit with AI” button with an average time of 35s. As users are generally scanning the screen, this average time is too high when it comes to discoverability.

Recommendation 1

Consolidate all editing and generating functions into a single, predictable menu aligned with users’ mental models of content creation.

Before

"Add" and "Generate with AI" were separated into 2 tabs, and there were 3 edit icon buttons on tool bar.

After

Consolidate all edit features into a button with a pencil icon, and move "Generate with AI" under the "Add" tab.

  1. Difficulty Differentiating AI-Generated Content from Original Text

Finding 2

Due to a lack of visual clues, comparing the original and the newly generated content was hard.

Participants repeatedly expressed frustration when trying to identify what the AI changed. Many resorted to reading both versions line by line, which was slow and cognitively taxing.

Participant 8 was trying to read through all the content to identify differences.

“It is very mentally taxing to read through both things to try and see what’s changed.”

- Participant 7

“I had no idea where AI edited anything so I had to keep going back and forth”

- Participant 8

Recommendation 2

Consolidate all editing and generating functions into a single, predictable menu aligned with users’ mental models of content creation.

Before

One frame in the modal displays a version and users need to click the version control button to switch between versions.

After

Two versions are displayed side by side for easier visual comparison. Eye-tracking data showed frequent switching between versions after reading the content. The side-by-side layout reduces cognitive load on comparison.

  1. Unclear AI Terminology Confuses Users

Finding 3

While most participants successfully used the AI features, unclear AI terminology led to frustrations when completing tasks.

Users, even those familiar with AI concepts, were confused by the platform's specific jargon, leading to a lack of confidence in their selections.

Participant 3 selected all options from Bloom’s levels

“I don't know what Bloom's levels mean, so I just selected all of them.”

- Participant 3

“I'm not familiar with some of the terminology of these products.”

- Participant 7

Recommendation 3

Consolidate all editing and generating functions into a single, predictable menu aligned with users’ mental models of content creation.

Before

Participants lacked confidence when selecting options due to unclear activity definitions.

After

*Designed by me

Adjusting the activity types from a dropdown menu to a tab-based interface allows GT to include a detailed description for each type, ensuring users have a clear comprehension before making a selection.

1

Eye-Tracking Usability Testing

Observe where the issues are

2

System Usability Scale

Evaluate the usability of product

3

Retrospective-Think-Aloud

Understand user pain points and needs

4

Web Analytics

Learn behavioral data & Identify issues

04 The Conclusion

04 The Conclusion

Feedback

“Sort of confirms a lot of assumptions we had internally as well. So very reassuring to get this feedback.”

“Sort of confirms a lot of assumptions we had internally as well. So very reassuring to get this feedback.”

We delivered our findings to the GT team alongside annotated recordings, synthesized insights, and redesigned interface concepts. The client responded positively—particularly to the proposed consolidation of tools and improvements to AI-related terminology. GT confirmed that many of these recommendations align with their 2026 roadmap for merging AI and non-AI patterns to streamline the authoring experience.

Next Steps

Run A/B tests to validate the proposed design recommendations.

Run A/B tests to validate the proposed design recommendations.

The next step is to validate the hypothesis that displaying the original and newly generated content side-by-side will decrease the time spent on the "Edit with AI" modal and reduce clicks on the version control button.

To achieve this, I will execute a structured A/B test comparing the Control (A) the existing click-to-switch version against the Variation (B)the new side-by-side design.

Control (A)

One frame in the modal displays a version and users need to click the version control button to switch between versions.

Variation (B)

Two versions are displayed side by side and changes are highlighted for easier visual comparison. The layout reduces cognitive load on comparison.

Key Action Plan

Step 1. Internal Development and Review: We will create the Variation (B) design using current components and test it internally.

Step 2. Metric Setup: We will set up Optimizely and Google Analytics to track the specific metrics: Primary: Time on page for the modal , and Secondary: Clicks on the version control button.

Step 3. Formal Testing: The A/B test will be conducted with 400 participants (200 for Control A / 200 for Variant B) who have experience writing or editing educational materials, ideally sourced from the United States and France.

Step 4. Analysis: We will analyze the data from both Optimizely and Google Analytics to validate the hypothesis and generate the final recommendation for the client.

Key Takeaways

What I learned about the methodology

What I learned about the methodology

My greatest takeaway was the necessity of evidence triangulation, specifically leveraging Eye Tracking to overcome our quantitative data gap. I learned that since traditional system metrics failed to track user actions within modals, Eye Tracking was essential to quantify discoverability. It provided objective behavioral data, like the 35-second average time needed to locate the "Edit with AI" feature, that directly measured friction.

Crucially, I realized that combining this visual data with Retrospective Think-Aloud (RTA) provided the required depth. Eye Tracking showed where users looked, while RTA revealed why, ensuring our recommendations were grounded in both objective behavior and subjective intent.

Retrospective

Learning to navigate ambiguity: Finding clarity in a space with no right answers

Learning to navigate ambiguity: Finding clarity in a space with no right answers

Working with AI reminded me that design doesn’t always come with clear answers, especially when the product space is still evolving. I learned to navigate ambiguity by grounding decisions in user behavior rather than assumptions. When terminology was unclear or interactions didn’t match expectations, the solution wasn’t to find the “right” answer, but to explore possibilities until the patterns made sense. This project taught me to be comfortable with the unknown, to ask better questions, and to trust the process of uncovering clarity where none existed at the start.

Let’s get in touch!

myra.chen960193@gmail.com

Copyright 2025 by Myra Chen

Let’s get in touch!

myra.chen960193@gmail.com

Copyright 2025 by Myra Chen

Let’s get in touch!

myra.chen960193@gmail.com

Copyright 2025 by Myra Chen