Exploring AI Performance in UI/UX Design and Coding: Insights from a Global Survey
In recent months, I embarked on a comprehensive research project to evaluate how different artificial intelligence models perform in tasks related to user interface/user experience (UI/UX) design and programming. By leveraging a crowd-sourced benchmarking platform, I gathered feedback from thousands of users worldwide to compare the capabilities of various AI tools in generating websites, games, 3D models, and data visualizations.
A Collaborative Approach to AI Benchmarking
The platform I developed, DesignArena.ai, allows users to quickly generate and compare outputs from multiple AI models across different creative and technical domains. Over the course of this project, nearly 4,000 votes were cast by approximately 5,000 participants, providing a rich dataset for analysis. It's important to note that all the data, model outputs, and demos shared are open-source and freely accessible; my goal is purely to share insights, not monetize.
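As a rough illustration of how head-to-head votes could be turned into a ranking, here is a minimal sketch. It assumes each vote is stored as a (winner, loser) pair and ranks models by simple win rate. The function names, the toy votes, and the scoring rule are all illustrative assumptions, not necessarily how DesignArena.ai computes its leaderboard.

```python
from collections import defaultdict


def win_rates(votes):
    """Return {model: wins / appearances} from (winner, loser) vote pairs."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


def leaderboard(votes):
    """Return model names sorted by win rate (highest first), ties by name."""
    rates = win_rates(votes)
    return sorted(rates, key=lambda model: (-rates[model], model))


# Hypothetical toy votes, made up for illustration only (not survey data).
toy_votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_c", "model_a"),
]

if __name__ == "__main__":
    for name in leaderboard(toy_votes):
        print(name, round(win_rates(toy_votes)[name], 3))
```

A plain win rate is easy to read but ignores opponent strength. A rating scheme such as Elo or Bradley-Terry would be a natural alternative if models do not face each other equally often.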
Key Findings from the Survey
- Top Performers for Coding and Design: Claude and DeepSeek
Among the evaluated models, Claude and DeepSeek consistently ranked highest for their ability to assist with coding and UI/UX design tasks. Notably, Claude Opus emerged as the most favored, thanks to its impressive interface and output quality. The DeepSeek family also performed well, especially the v0 version, which excels in web-related projects. However, a notable drawback with DeepSeek models is their relatively slow processing times, which could impact workflow efficiency if speed is a priority.
- Grok 3: A Hidden Gem
Despite being less visible than giants like Claude or GPT, Grok 3 stands out as an underrated contender. It consistently ranks within the top five and responds faster than many of its counterparts. Although it gets less online attention than other models, its performance suggests it's worth exploring for efficient development and design tasks.
- Assessing Gemini 2.5-Pro
The performance of Gemini 2.5-Pro varies. Some users report impressive outputs, especially in UI/UX design, while others get less favorable results, with the model often producing poorly structured applications. Its ability to code business logic is notable, but its inconsistent quality means it deserves careful evaluation before you integrate it into a workflow.
- OpenAI GPT and Meta's Llama: The Middle and the Back

