I asked 6,000 people around the world how different AI models perform on UI/UX and coding. Here’s what I found

Unlocking AI Performance in UI/UX Design and Coding: Insights from a Global User Study

Over the past few months, I ran a crowdsourced study of how well different AI models handle user interface (UI) design, user experience (UX), and coding tasks. More than 5,000 users around the world took part and cast nearly 4,000 votes on which models perform best in these areas.

A Transparent Approach to Data and Analysis

All of the data collected, along with the AI model outputs used in this study, is open-source and freely accessible. The research is entirely independent and I earn no revenue from it; my goal is to share insights and help others navigate the evolving AI landscape.

Developing a Crowd-Sourced Benchmark for Creative and Technical Tasks

To make the comparisons meaningful, I built a public platform where users can generate websites, games, 3D models, and data visualizations with different AI models from a single interface. Each output is a one-shot generation, and outputs are evaluated side by side, which gives a practical view of each model’s strengths and weaknesses.
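
For readers curious how side-by-side votes could be turned into a ranking, here is a minimal sketch. It assumes each vote is recorded as a (preferred model, other model) pair and ranks models by simple win rate. The platform's actual scoring method is not described here, so the function names, the vote format, and the placeholder model names are all illustrative assumptions rather than the real implementation.

```python
from collections import defaultdict


def win_rates(votes):
    """Return {model: share of its side-by-side comparisons that it won}.

    `votes` is an iterable of (winner, loser) pairs. This pair format is an
    assumption made for illustration, not the platform's real schema.
    """
    wins = defaultdict(int)
    comparisons = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        comparisons[winner] += 1
        comparisons[loser] += 1
    return {model: wins[model] / comparisons[model] for model in comparisons}


def rank_models(votes):
    """Order models from highest to lowest win rate (ties broken by name)."""
    rates = win_rates(votes)
    return sorted(rates, key=lambda model: (-rates[model], model))


# Hypothetical votes with placeholder model names; not real study data.
sample_votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

if __name__ == "__main__":
    for model in rank_models(sample_votes):
        print(model, round(win_rates(sample_votes)[model], 3))
```

Win rate is the simplest possible aggregate. It ignores how strong each opponent was, so a rating scheme that accounts for matchups may give a different order when models are not compared equally often.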

Key Findings from the User Feedback

Here are some of the most notable insights from the data collected:

1. Top Performers in Coding and UI/UX Design

Among the models evaluated, Claude (by Anthropic) and DeepSeek lead in both coding accuracy and design quality. Users strongly preferred Claude Opus, which consistently scored well for interface development. The DeepSeek models, especially version 0, also performed strongly, but their slower processing speed makes Claude the more practical choice for real-world interface work. Grok emerged as a surprising dark horse, delivering competitive quality despite its lower online visibility.

2. The Underrated Power of Grok 3

Although it is less widely recognized than Claude or GPT-based models, Grok 3 is underrated. It ranks consistently in the top five and is notably faster than many competitors, which makes it a good option when you need rapid, reliable output.

3. Variability in Gemini 2.5-Pro’s Performance

Gemini 2.5-Pro presents a mixed picture. User feedback indicates it can produce high-quality UI/UX designs and coding solutions