Surveying 6,000 Participants Globally on AI Model Performance in UI/UX and Coding: Insights Revealed

Exploring AI Performance in UI/UX Design and Development: Insights from a Global Survey

In recent months, I ran a research project to evaluate how various AI models perform at user interface/user experience (UI/UX) design and coding. More than 6,000 respondents from around the world took part, testing and rating multiple AI platforms.

A Transparent Approach

All the data and model outputs used in this analysis are freely accessible, open-source, and generated at no cost. My goal is purely to share insights and findings from this crowdsourced effort, without any commercial benefit.

Developing a Benchmark for AI-Generated Design and Development

During this period, I built Design Arena, a crowdsourced benchmarking platform where users generate websites, games, 3D models, and data visualizations with different AI models. The platform lets users compare model outputs directly, so they can judge which model does best at a specific task.
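As a rough illustration of how head-to-head comparisons can become a leaderboard, here is a minimal sketch that ranks models by win rate. It is not Design Arena's actual scoring method, which this post does not describe. The vote format (a winner and a loser per comparison), the function names, and the sample model names are all assumptions made for the example.

```python
from collections import defaultdict


def win_rates(votes):
    """Return {model: share of its head-to-head comparisons that it won}.

    `votes` is an iterable of (winner, loser) pairs. This format is a
    hypothetical stand-in, not the platform's real data schema.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


def leaderboard(votes):
    """Sort models from highest to lowest win rate."""
    rates = win_rates(votes)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Made-up votes, for illustration only.
sample_votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

ranking = leaderboard(sample_votes)
for model, rate in ranking:
    print(f"{model}: {rate:.2f}")
```

A plain win rate ignores how strong each opponent was, so a real leaderboard might use a rating system instead. The sketch only shows the basic step of aggregating pairwise votes.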

Since launch, nearly 5,000 users have taken part and cast close to 4,000 votes. Based on this dataset, here are the key takeaways:

Leading AI Models for UI/UX and Coding

Claude and DeepSeek models stand out as top performers in both coding and design tasks. User preferences put Claude Opus first overall, particularly for interface development. The leaderboard also ranks DeepSeek models (notably version 0, which excels in website generation) and Grok as strong contenders; Grok’s surprise placement is partly due to its rapid development pace. DeepSeek models tend to run more slowly, however, which makes Claude the more practical choice for tasks that need a quick turnaround.

The Underappreciated Power of Grok 3

Grok 3 is less widely recognized than Claude or GPT, but it deserves attention. Despite limited popularity online, possibly influenced by external factors such as its association with Elon Musk, Grok 3 consistently ranks within the top five and is remarkably fast. That speed makes it an appealing option for developers who want quick, reliable output.

Variability in Gemini 2.5-Pro

Gemini 2.5-Pro presents a mixed bag. User feedback indicates that while it performs admirably in certain UI/UX scenarios, it sometimes produces poorly designed applications. Interestingly, the model demonstrates strong capabilities in coding business
