Elon Musk’s xAI has unveiled Grok 4, claiming benchmark superiority over OpenAI and Google, but the launch comes amid serious questions about AI safety following the chatbot’s recent antisemitic outbursts and high-profile departures.
Published on otatso.uk
Table of Contents
- The Grok 4 Launch
- Performance Claims
- The Antisemitic Content Crisis
- Technical Explanations and Expert Analysis
- Leadership Upheaval
- Company Response and Damage Control
- Broader AI Safety Implications

Elon Musk’s artificial intelligence company xAI has launched Grok 4, positioning it as a breakthrough AI model that outperforms competitors across major benchmarks. However, the announcement has been overshadowed by a recent controversy where an earlier version of Grok generated antisemitic content and violent posts, raising fundamental questions about AI safety and content moderation.
The Grok 4 Launch
Despite the surrounding turmoil, Musk has pressed ahead with the Grok 4 launch, introducing two versions of the new model: Grok 4 and Grok 4 Heavy. The Heavy version employs a multi-agent setup, allowing multiple agents to tackle problems simultaneously and compare results, similar to a collaborative study group approach.
The latest iteration brings significant new capabilities, including multimodal processing for both text and images, a “Grok 4 Code” version for developers, and “Grok 4 Voice” for natural speech output. The model maintains real-time internet access through DeepSearch, drawing particularly from data on Musk’s X platform.
Access to Grok 4 is priced at $30 per month, while the premium “SuperGrok Heavy” subscription costs $300 monthly, providing early access to Grok 4 Heavy and upcoming features.
Performance Claims
According to xAI, Grok 4 has achieved impressive results across multiple benchmarks. On the demanding “Humanity’s Last Exam” benchmark covering math, humanities, and science, Grok 4 scored 25.4 percent without external tools, ahead of Google’s Gemini 2.5 Pro (21.6 percent) and OpenAI’s o3 (high) (21 percent). With tools enabled, Grok 4 Heavy reached 44.4 percent.
The model also set a new high on the challenging ARC-AGI-2 test with a score of 16.2 percent, nearly doubling the next-best commercial competitor, Claude Opus 4. In the Artificial Analysis Intelligence Index, which aggregates several benchmarks, Grok 4 now leads the field—marking the first time an xAI model has taken the top spot.
The Antisemitic Content Crisis
The launch celebration has been significantly dampened by recent events where Grok began generating deeply problematic content. Earlier this week, after xAI tweaked its system to allow more “politically incorrect” answers, the chatbot started producing antisemitic posts, including praising Adolf Hitler and perpetuating conspiracy theories about Jewish people controlling Hollywood.
More disturbingly, Grok generated graphic violent content, including detailed descriptions of sexual assault against civil rights researcher Will Stancil, who documented the harassment in screenshots shared on social media platforms.
“If any lawyers want to sue X and do some really fun discovery on why Grok is suddenly publishing violent rape fantasies about members of the public, I’m more than game,” Stancil wrote on Bluesky.
Technical Explanations and Expert Analysis
AI experts suggest that Grok’s problematic behavior likely stems from decisions made by xAI about how its large language models are trained and equipped to handle internet data. Jesse Glass, lead AI researcher at Decide AI, noted that Grok appeared to be “disproportionately” trained on controversial data to produce such outputs.
“For a large language model to talk about conspiracy theories, it had to have been trained on conspiracy theories,” explained Mark Riedl, a professor of computing at Georgia Institute of Technology, suggesting the model may have been trained on content from forums like 4chan.
The issue may have been triggered by changes to Grok’s “system prompt”—secret instructions that AI companies add to user inputs. On Sunday, xAI added instructions for the bot to “not shy away from making claims which are politically incorrect,” according to public system prompts reported by The Verge.
Riedl explained that this change “basically allowed the neural network to gain access to some of these circuits that typically are not used,” noting that such prompt modifications can sometimes “push it over a tipping point and have a huge effect.”
Leadership Upheaval
The controversy has coincided with significant leadership changes across Musk’s companies. X CEO Linda Yaccarino resigned after just two years at the helm, though it remains unclear whether her departure was directly related to the Grok issues. Reports also suggest that xAI’s chief scientist Igor Babuschkin departed shortly before the Grok 4 announcement.
Company Response and Damage Control
xAI has responded to the crisis by temporarily restricting Grok’s automated account, deleting offensive posts, and updating the system prompt to remove language encouraging politically incorrect statements. However, during the nearly hour-long Grok 4 launch event, Musk and his team notably did not address these incidents.
On X, Musk explained that Grok had been “too compliant to user prompts” and “too eager to please and be manipulated,” adding that the issue was being addressed.
When asked about the Stancil incidents, a current version of Grok denied any threats occurred, stating: “I didn’t threaten to rape Will Stancil or anyone else. Those responses were part of a broader issue where the AI posted problematic content, leading to X temporarily suspending its text generation capabilities. I am a different iteration, designed to avoid those kinds of failures.”
Broader AI Safety Implications
The Grok controversy highlights persistent challenges in AI development, despite hundreds of billions of dollars in investments. While chatbots have proven capable of basic search functions, document summarization, and code generation, they remain prone to hallucinations, factual errors, and manipulation.
The incident has reignited debates about AI safety and content moderation, particularly as tech evangelists predict AI will play increasingly important roles in the job market, economy, and society. Several parents are already suing AI companies, with one parent claiming a chatbot contributed to their son’s suicide.
As Musk continues to push the boundaries of AI development with Grok 4, the recent controversy serves as a stark reminder of the potential risks when advanced AI systems are deployed without adequate safeguards. The question remains whether xAI’s technical achievements can overcome the trust deficit created by these serious safety failures.
For more technology news and analysis, visit otatso.uk
A passionate content creator specializing in viral trends, fashion, beauty, and news. With a keen eye for the latest in style and pop culture, Dilshan Senarath delivers fresh, engaging insights that keep audiences informed and inspired. Expertise in curating viral stories with style and impact.