Exploring AI Model Performance in UI/UX Design and Coding: Insights from a Global Survey
In recent months, I conducted a comprehensive survey involving over 6,000 respondents worldwide to assess how various AI models perform in user interface/user experience (UI/UX) design and coding tasks. This research aims to provide valuable insights for developers, designers, and AI enthusiasts alike. All data collected and AI outputs generated are open-source and freely accessible; I do not profit from this endeavor, I simply wish to share the findings.
Developing a Crowdsourced Benchmark for AI-Generated Design
To facilitate this analysis, I created a platform called Design Arena, a collaborative benchmark where users can generate websites, games, 3D models, and data visualizations using different AI models. Participants can compare outputs directly and gauge which models excel in specific areas. To date, nearly 4,000 votes have been cast by approximately 5,000 active users, providing a robust dataset for evaluation.
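The exact method Design Arena uses to turn pairwise votes into a leaderboard is not described here, but head-to-head preference data of this kind is commonly aggregated with an Elo-style rating. The sketch below is a minimal, hypothetical illustration of that approach; the vote format, K-factor, starting rating, and model names are my assumptions for illustration, not the platform's actual implementation.

```python
from collections import defaultdict

# Hypothetical sketch: aggregating pairwise preference votes into an
# Elo-style leaderboard. Constants and data format are assumptions,
# not Design Arena's actual implementation.

K = 32           # update step size per vote
START = 1000.0   # initial rating assigned to every model

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def compute_leaderboard(votes: list[tuple[str, str]]) -> dict[str, float]:
    """votes is a list of (winner, loser) model-name pairs from head-to-head comparisons."""
    ratings: dict[str, float] = defaultdict(lambda: START)
    for winner, loser in votes:
        exp_win = expected_score(ratings[winner], ratings[loser])
        ratings[winner] += K * (1 - exp_win)
        ratings[loser] -= K * (1 - exp_win)
    # Sort descending by rating to produce the leaderboard order.
    return dict(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))

# Example with made-up votes and model names:
votes = [("claude-opus", "gpt"), ("deepseek", "llama"), ("claude-opus", "deepseek")]
print(compute_leaderboard(votes))
```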
Key Findings from the Survey
- Top Performers in Coding and Design: Claude and DeepSeek Lead the Pack
Among the evaluated models, Claude (particularly the Claude Opus variant) and DeepSeek stand out. Users overwhelmingly favored Claude for its versatility and quality, especially in interface implementation. The top eight positions on our leaderboard feature Claude models, with DeepSeek v0 making a strong showing (particularly in website generation) and Grok emerging as an unexpected contender due to its promising capabilities. Notably, while DeepSeek models produce high-quality results, they tend to operate slowly, making Claude the preferred choice for interactive development environments.
- Grok 3: An Underappreciated Powerhouse
Despite less online visibility, Grok 3 has proven to be a remarkably efficient model. It consistently ranks within the top five and delivers faster results than many peers, an impressive feat given its relatively low profile, possibly due to its association with Elon Musk and related controversies.
- Gemini 2.5-Pro: A Mixed Bag
The Gemini 2.5-Pro model received mixed reviews. Some users praised its UI/UX capabilities, while others reported that it occasionally produces poorly designed applications. Interestingly, despite this inconsistency, Gemini excels at coding business logic, making it a useful tool for specific workflows.
- Midfield Performers: GPT and Llama
OpenAI’s GPT models

