The next-gen ‘truth-seeking’ AI model

Photo of Elon Musk alongside other artificial intelligence researchers from xAI during the Grok 3 AI model launch showing reasoning benchmarks compared to other leading models from OpenAI, Google, and DeepSeek.

xAI unveiled its Grok 3 AI model on Monday, alongside new capabilities such as image analysis and refined question answering.

The company harnessed an immense data centre equipped with approximately 200,000 GPUs to develop Grok 3. According to xAI owner Elon Musk, this project utilised “10x” more computing power than its predecessor, Grok 2, with an expanded dataset that reportedly includes information from legal case filings.

Musk claimed that Grok 3 is a “maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically-correct.”

The Grok 3 rollout includes a family of models designed for different needs. Grok 3 mini, for example, prioritises faster response times over absolute accuracy. Most noteworthy, however, are the new reasoning-focused Grok 3 models.

Dubbed Grok 3 Reasoning and Grok 3 mini Reasoning, these variants aim to emulate human-like cognitive processes by “thinking through” problems. Comparable to models like OpenAI’s o3-mini and DeepSeek’s R1, these reasoning systems attempt to fact-check their responses—reducing the likelihood of errors or missteps.

Grok 3: The benchmark results

xAI asserts that Grok 3 surpasses OpenAI’s GPT-4o in certain benchmarks, including AIME and GPQA, which assess the model’s proficiency in tackling complex problems across mathematics, physics, biology, and chemistry.

An early version of Grok 3 is also currently leading on Chatbot Arena, a crowdsourced evaluation platform where users pit AI models against one another and rank their outputs. It is the first model to score above 1400 on the Arena.

BREAKING: @xAI early version of Grok-3 (codename “chocolate”) is now #1 in Arena! 🏆

Grok-3 is:
– First-ever model to break 1400 score!
– #1 across all categories, a milestone that keeps getting harder to achieve

Huge congratulations to @xAI on this milestone! View thread 🧵… https://t.co/p8z8lccNd5 pic.twitter.com/hShGy8ZN1o

— lmarena.ai (formerly lmsys.org) (@lmarena_ai) February 18, 2025
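To give a feel for how head-to-head votes can turn into a leaderboard score such as 1400, here is a minimal sketch of an Elo-style rating update. It is an illustration only: the article does not describe the Arena's exact scoring method, and the model names, starting rating of 1000, and `k` factor of 32 are assumptions chosen for the example.

```python
# Simplified Elo-style rating from pairwise votes.
# Assumption: this is NOT necessarily how Chatbot Arena computes its scores;
# model names, the 1000 starting rating, and k=32 are hypothetical.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head vote."""
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_score(rating_a, rating_b))
    # Whatever the winner gains, the loser gives up.
    return rating_a + delta, rating_b - delta


# Hypothetical votes: (model shown as A, model shown as B, did the user prefer A?)
votes = [
    ("model_a", "model_b", True),
    ("model_a", "model_c", True),
    ("model_b", "model_c", False),
]

ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
for a, b, a_won in votes:
    ratings[a], ratings[b] = update_ratings(ratings[a], ratings[b], a_won)

# Print the resulting leaderboard, highest rating first.
for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.1f}")
```

Under a scheme like this, a model's score rises only by winning votes, and wins against higher-rated opponents count for more, which is one reason a higher threshold gets progressively harder to cross.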

According to xAI, Grok 3 Reasoning outperforms its rivals on a variety of prominent benchmarks.

These reasoning models are already integrated into features available via the Grok app. Users can select commands like “Think” or activate the more computationally-intensive “Big Brain” mode for tackling particularly challenging questions.

xAI has positioned the reasoning models as ideal tools for STEM (science, technology, engineering, and mathematics) work, particularly mathematics, science, and coding challenges.

Guarding against AI distillation

Interestingly, not all of Grok 3’s internal processes are laid bare to users. Musk explained that some of the reasoning models’ “thoughts” are intentionally obscured to prevent distillation—a controversial practice where competing AI developers extract knowledge from proprietary models.

The practice was thrust into the spotlight in recent weeks after Chinese AI firm DeepSeek faced allegations of distilling OpenAI’s models to develop its latest model, R1.
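For readers unfamiliar with distillation, the sketch below shows the classic soft-label form of the idea: a "student" model is trained to match the output distribution of a "teacher" model. It is a generic illustration, not a description of what any company named here has done. The logits and the temperature value are made-up numbers, and distilling from a hosted model would in practice rely on its generated text (such as visible reasoning traces) rather than raw logits.

```python
import math

# Generic knowledge-distillation loss (soft-label form).
# Assumption: all logits and the temperature below are hypothetical values.

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's softened distribution to the teacher's."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 1.0, 0.1]       # hypothetical teacher logits for three tokens
good_student = [1.9, 1.1, 0.2]  # close to the teacher: small loss
poor_student = [0.1, 1.0, 2.0]  # far from the teacher: larger loss

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

Training the student to minimise a loss like this pulls its behaviour towards the teacher's. Hiding part of a model's output limits the teacher signal a competitor could collect.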

xAI’s new reasoning models serve as the foundation for a new Grok app feature called DeepSearch. The feature uses Grok models to scan the internet and Musk’s social platform, X, for relevant information before synthesising a detailed abstract in answer to user queries.

Accessing Grok 3 and committing to open-source

Access to the latest Grok model is currently tied to X’s subscription tiers. Premium+ subscribers, who pay $50 (~£41) per month, will receive priority access to the latest functionalities.

xAI is also introducing a SuperGrok subscription plan, reportedly priced at $30 per month or $300 per year. SuperGrok subscribers will benefit from enhanced reasoning capabilities, more DeepSearch queries, and unlimited image generation.

The company also teased upcoming features. Within a week, the Grok app is expected to introduce a voice mode—enabling users to interact with the AI through a synthesised voice similar to Gemini Live.

Musk further revealed plans to release Grok 3 models via an enterprise-ready API in the coming weeks, with DeepSearch functionality included.

Although Grok 3 is still fresh, xAI intends to open-source its predecessor in the coming months. Musk says xAI will continue to open-source the previous version of Grok once its successor has matured.

“When Grok 3 is mature and stable, which is probably within a few months, then we’ll open-source Grok 2,” explains Musk.

The ‘anti-woke’ AI model

Grok has long been marketed as unfiltered, bold, and willing to engage with queries that competitors might avoid. Musk previously described the AI as “anti-woke,” presenting it as a model unafraid to touch on controversial topics.

True to its promise, early models like Grok and Grok 2 embraced politically-charged queries, even veering into colourful language when prompted. Yet, these versions also revealed some biases when delving deep into political discourse.

“We’re working to shift Grok closer to politically-neutral,” said Musk.

However, whether Grok 3 achieves this goal remains to be seen. With such changes at play, analysts are already highlighting the potential societal impacts of introducing increasingly “truth-seeking” yet politically-sensitive AI systems.

With Grok 3, Musk and xAI have made a bold statement, pushing their technology forward while potentially fuelling debates around bias, transparency, and the ethics of AI deployment.

As competitors like OpenAI, Google, and DeepSeek refine their offerings, Grok 3’s success will hinge on its ability to balance accuracy, user demand, and societal responsibility.
