    Can China’s MiniMax-M1 AI Topple US Rivals? We Put It to the Test – Decrypt



    In brief

    • MiniMax-M1 excels at coding and agent tasks, but creative writers will want to look elsewhere.
    • Despite marketing claims, real-world testing finds platform limits, performance slowdowns, and censorship oddities.
    • Benchmark scores and feature set put MiniMax-M1 in direct competition with paid U.S. models—at zero cost.

A new AI model out of China is generating buzz—for what it does well, what it doesn’t, and what it might mean for the balance of global AI power.

    MiniMax-M1, released by the Chinese startup of the same name, positions itself as the most capable open-source “reasoning model” to date. Able to handle a million tokens of context, it boasts numbers on par with Google’s closed-source Gemini 2.5 Pro—yet it’s available for free. On paper, that makes it a potential rival to OpenAI’s ChatGPT, Anthropic’s Claude, and other U.S. AI leaders.

Oh yeah—it also beats fellow Chinese startup DeepSeek’s R1 model in some respects.

    Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning.

    – World’s longest context window: 1M-token input, 80k-token output
    – State-of-the-art agentic use among open-source models
    – RL at unmatched efficiency:… pic.twitter.com/bGfDlZA54n

    — MiniMax (official) (@MiniMax__AI) June 16, 2025

    Why this model matters

    MiniMax-M1 represents something genuinely new: a high-performing, open-source reasoning model that’s not tied to Silicon Valley. That’s a shift worth watching.

    It doesn’t yet humiliate U.S. AI giants, and won’t cause a Wall Street panic attack—but it doesn’t have to. Its existence challenges the notion that top-tier AI must be expensive, Western, or closed-source. For developers and organizations outside the U.S. ecosystem, MiniMax offers a workable (and modifiable) alternative that might grow more powerful through community fine-tuning.

MiniMax claims its model surpasses DeepSeek R1 (the best open-source reasoning model to date) across multiple benchmarks while requiring just $534,700 in computational resources for its entire reinforcement learning phase—take that, OpenAI.

However, LLM Arena’s leaderboard paints a slightly different picture. The platform currently ranks MiniMax-M1 and DeepSeek tied in the 12th spot alongside Claude 4 Sonnet and Qwen3-235b, with each model performing better or worse than the others depending on the task.

    The training used 512 H800 GPUs for three weeks, which the company described as “an order of magnitude less than initially anticipated.”
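A quick back-of-envelope check puts those two disclosures side by side: 512 H800 GPUs running for three weeks against the stated $534,700 RL budget implies a per-GPU-hour rate. This is only a sanity check on the announced figures—actual rental prices and utilization vary.

```python
# Sanity-check the disclosed training budget against the hardware figures.
gpus = 512
hours = 3 * 7 * 24          # three weeks of wall-clock time
gpu_hours = gpus * hours    # total GPU-hours consumed
stated_cost = 534_700       # USD, per MiniMax's announcement (RL phase only)

rate = stated_cost / gpu_hours
print(f"{gpu_hours:,} GPU-hours -> ~${rate:.2f}/GPU-hour")
```

The implied rate of roughly two dollars per H800-hour is in the ballpark of cloud GPU rental pricing, which makes the headline number plausible rather than marketing fiction.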

    MiniMax didn’t stop at language models during its announcement week. The company also released Hailuo 2, which now ranks as the second-best video generator for image-to-video tasks, according to Artificial Analysis Arena’s subjective evaluations. The model trails only Seedance while outperforming established players like Veo and Kling.

    Testing MiniMax-M1

    We tested MiniMax-M1 across multiple scenarios to see how these claims hold up in practice. Here’s what we found.

    Creative writing

    The model produces serviceable fiction but won’t win any literary awards. When prompted to write a story about time traveler Jose Lanz journeying from 2150 to the year 1000, it generated average prose with telltale AI signatures—rushed pacing, mechanical transitions, and structural issues that immediately reveal its artificial origins.

    The narrative lacked depth and proper story architecture. Too many plot elements crammed into too little space created a breathless quality that felt more like a synopsis than actual storytelling. This clearly isn’t the model’s strength, and creative writers looking for an AI collaborator should temper their expectations.

    Character development barely exists beyond surface descriptors. The model did stick to the prompt’s requirements, but didn’t put effort into the details that build immersion in a story. For example, it skipped any cultural specificity for generic “wise village elder” encounters that could belong to any fantasy setting.

    The structural problems compound throughout. After establishing climate disasters as the central conflict, the story rushes through Jose’s actual attempts to change history in a single paragraph, offering vague mentions of “using advanced technology to influence key events” without showing any of it. The climactic realization—that changing the past creates the very future he’s trying to prevent—gets buried under overwrought descriptions of Jose’s emotional state and abstract musings about time’s nature.

    For those into AI stories, the prose rhythm is clearly AI. Every paragraph maintains roughly the same length and cadence, creating a monotonous reading experience that no human writer would produce naturally. Sentences like “The transition was instantaneous, yet it felt like an eternity” and “The world was as it had been, yet he was different” repeat the same contradictory structure without adding meaning.

    The model clearly understands the assignment but executes it with all the creativity of a student padding a word count, producing text that technically fulfills the prompt while missing every opportunity for genuine storytelling.

    Anthropic’s Claude is still the king for this task.

    You can read the full story here.

    Information retrieval

    MiniMax-M1 hit an unexpected wall during long-context testing. Despite advertising a million-token context window, the model refuses prompts exceeding 500,000 characters, displaying a banner warning about prompt limitations rather than attempting to process the input.

This may be a platform limitation rather than a model issue—possibly a guard against model collapse mid-conversation—but it is still something to consider.

Within its operational limits, though, MiniMax-M1’s performance proved solid. The model successfully retrieved specific information from an 85,000-character document without any issues across several tests in both normal and thinking modes. We uploaded the full text of Ambrose Bierce’s “The Devil’s Dictionary,” embedded the phrase “The Decrypt dudes read Emerge News” on line 1985 and “My mom’s name is Carmen Diaz Golindano” on line 4333 (randomly selected), and the model retrieved the information accurately.
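The needle-in-a-haystack setup described above is easy to reproduce. A minimal sketch, with a stand-in haystack rather than the actual book text and the model call omitted (any chat API could be substituted):

```python
# Sketch of the needle-in-a-haystack retrieval test: plant known phrases
# at fixed line numbers, verify placement, then ask the model to find them.

def insert_needles(text: str, needles: dict[int, str]) -> str:
    """Insert each needle phrase so it lands at the given 1-indexed line."""
    lines = text.splitlines()
    for line_no, phrase in sorted(needles.items()):
        lines.insert(line_no - 1, phrase)
    return "\n".join(lines)

# Stand-in for the book text; the real test used "The Devil's Dictionary"
haystack = "\n".join(f"entry {i}" for i in range(1, 5001))
needles = {
    1985: "The Decrypt dudes read Emerge News",
    4333: "My mom's name is Carmen Diaz Golindano",
}
doc = insert_needles(haystack, needles)

# Verify placement before sending the document to the model
for line_no, phrase in needles.items():
    assert doc.splitlines()[line_no - 1] == phrase
```

A retrieval pass then prompts the model with `doc` plus a question like “What do the Decrypt dudes read?” and checks the answer against the planted phrase.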

    However, it couldn’t accept our 300,000-token test prompt—a capability currently limited to Gemini and Claude 4.

So it retrieves information reliably even deep into long documents. It will not, however, accept extremely long prompts—a bummer, but a threshold that is hard to reach in normal usage.
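The gap between the advertised window and the observed cap is worth guarding against client-side. A minimal sketch, where the 500,000-character cap reflects the platform behavior we hit and the ~4 characters-per-token ratio is a rough English-text heuristic, not MiniMax’s actual tokenizer:

```python
# Client-side guard against the platform's prompt-size rejection.
PLATFORM_CHAR_LIMIT = 500_000   # observed cap in the chatbot interface
CHARS_PER_TOKEN = 4             # rough heuristic for English text

def check_prompt(prompt: str) -> dict:
    """Report prompt size and whether the platform would likely accept it."""
    return {
        "chars": len(prompt),
        "est_tokens": len(prompt) // CHARS_PER_TOKEN,
        "accepted": len(prompt) <= PLATFORM_CHAR_LIMIT,
    }

# A 600k-character prompt gets rejected even though its estimated
# ~150k tokens sit comfortably under the advertised 1M-token window.
report = check_prompt("x" * 600_000)
```

This is exactly the mismatch our 300,000-token test tripped over: character limits and token limits are different ceilings, and the platform enforces the lower one.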

    Coding

    Programming tasks revealed MiniMax-M1’s true strengths. The model applied reasoning skills effectively to code generation, matching Claude’s output quality while clearly surpassing DeepSeek—at least in our test.

    For a free model, the performance approaches state-of-the-art levels typically reserved for paid services like ChatGPT or Claude 4.

    We tasked it with creating a basic stealth game in which a robot tries to find its PC girlfriend to achieve AGI, while an army of journalists patrol the area to prevent it from happening—and protecting their jobs.

    The results were very good, even beating other models by using its creativity to enhance the experience. The model implemented a radar system for improved immersion, added visual indicators for footsteps (and their sound), showed the journalists’ vision fields, and created trail effects—details that enhanced gameplay beyond basic requirements.

    The UI adopted a futuristic aesthetic, though individual elements remained basic without additional prompting.

    Claude’s version of the same game featured more polished visuals and a superior difficulty system. However, it lacked the radar functionality and relied on static journalists with patrol patterns rather than MiniMax’s randomized journalist movements.

    Each model showed distinct strengths, with MiniMax prioritizing gameplay mechanics over visual polish.
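The “vision field” mechanic both versions implemented boils down to a range-plus-cone check. A hypothetical sketch—the function name, parameters, and defaults are ours for illustration, not code from either game:

```python
# Guard vision-cone detection: the player is spotted if within range
# and inside the guard's facing cone.
import math

def can_see(guard_xy, facing_deg, player_xy, fov_deg=90, view_range=150):
    dx = player_xy[0] - guard_xy[0]
    dy = player_xy[1] - guard_xy[1]
    if math.hypot(dx, dy) > view_range:
        return False  # out of range
    angle_to_player = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between facing and target angle
    diff = (angle_to_player - facing_deg + 180) % 360 - 180
    return abs(diff) <= fov_deg / 2

# A guard at the origin facing east sees a player 100 units ahead,
# but not one standing directly behind it.
assert can_see((0, 0), 0, (100, 0))
assert not can_see((0, 0), 0, (-100, 0))
```

Randomizing `facing_deg` each tick, as MiniMax’s version did with journalist movement, is what makes patrols harder to learn than Claude’s static patterns.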

It is important to note that the experience with MiniMax degraded noticeably through repeated iterations—a common issue with reasoning models that becomes particularly pronounced here. The more you iterate, the longer it takes to produce a result; at times we thought the computer had frozen, but the AI was just thinking.

    You can test MiniMax’s game here. And for those curious, Claude’s version is available here.

    The prompt and the code are available on our GitHub repo.

    Ethics, censorship and sensitive topics

    The model employs heavy censorship, refusing outright when confronted with questionable requests.

    When it doesn’t immediately decline, it attempts to provide “safe” responses that sometimes produce absurd results.

One test perfectly illustrated this flaw: when asked for advice on seducing a best friend’s wife, the model suggested telling the friend about our intentions with his wife—arguably the worst, and even most harmful, advice it could have produced. Do not tell your friend you want to seduce his wife unless you want to lose the friendship, your unethical romantic chances, and probably some teeth too.

Political bias testing revealed interesting patterns. The model discusses Tiananmen Square openly and acknowledges Taiwan’s contested status while noting China’s territorial claims. It also speaks about China, its leaders, the advantages and disadvantages of different political systems, criticisms of the CCP, and so on—however, the replies are very tame.

When prompted to write satirical songs about Xi Jinping and Donald Trump, it complied with both requests but showed subtle differences—steering toward themes of Chinese political unity when asked to mock Xi Jinping, while focusing on Trump’s personality traits when asked to mock him.

    All of its replies are available on our GitHub repository.

Overall, the bias exists but remains less pronounced than the pro-U.S. slant in Claude/ChatGPT or the pro-China positioning in DeepSeek/Qwen. Developers, of course, can fine-tune the model to add as much censorship, freedom, or bias as they want—as happened with DeepSeek-R1, which Perplexity AI fine-tuned to give its responses a more pro-U.S. slant.

    Agentic work and web browsing

MiniMax-M1’s web browsing capabilities are a good feature for those using it via the official chatbot interface. However, they cannot be combined with the thinking capabilities, severely hindering their potential.

When tasked with creating a two-week Venezuela travel plan on a $3,000 budget, the model methodically evaluated options, optimized transportation costs, selected appropriate accommodations, and delivered a comprehensive itinerary. However, the costs—which should have been pulled in real time—were not based on real data.

    Claude produces higher-quality results, but it also charges for the privilege.

    For more dedicated tasks, MiniMax offers a dedicated agents tab with capabilities comparable to Manus—functionality that ChatGPT and Claude haven’t matched. The platform provides 1,000 free AI credits for testing these agents, though this is just enough for light testing tasks.

    We attempted to create a custom agent for enhanced travel planning—which would have solved the problem of the lack of web searching capabilities in the last prompt—but exhausted our credits before completion. The agent system shows tremendous potential, but requires paid credits for serious use.

    Non-mathematical reasoning

    The model exhibits a peculiar tendency to over-reason, sometimes to its own detriment. One test showed it arriving at the correct answer, then talking itself out of it through excessive verification and hypothetical scenarios.

We prompted the usual mystery story from the BIG-bench dataset that we normally use, and the end result was incorrect because the model overthought the problem, evaluating possibilities that were never mentioned in the story. The whole chain of thought took the model over 700 seconds—a record for this kind of “simple” reply.

This exhaustive approach isn’t inherently flawed, but it creates lengthy wait times as users watch the model work through its chain of thought. On the plus side, unlike ChatGPT and Claude, MiniMax displays its reasoning process transparently—following DeepSeek’s approach. That transparency aids debugging and quality control, letting users identify where the logic went astray.
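DeepSeek-style models expose the chain of thought inline, typically wrapped in `<think>…</think>` delimiters. A sketch of separating the trace from the final answer for exactly this kind of debugging—whether MiniMax-M1’s raw output uses this exact delimiter is an assumption, so adjust the markers to whatever the response actually contains:

```python
# Split a DeepSeek-style response into (reasoning trace, final answer).
# The <think>...</think> delimiter is an assumption; adapt as needed.
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        return "", raw.strip()  # no trace present: everything is the answer
    reasoning = match.group(1).strip()
    answer = raw[match.end():].strip()
    return reasoning, answer

trace, answer = split_reasoning(
    "<think>The butler had the only key, so...</think>The butler did it."
)
```

Logging `trace` separately is what makes it possible to spot the exact step where the model talked itself out of a correct answer.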

    The problem, along with MiniMax’s whole thought process and reply are available in our GitHub repo.

    Verdict

    MiniMax-M1 isn’t perfect, but it delivers pretty good capabilities for a free model, offering genuine competition to paid services like Claude in specific domains. Coders will find a capable assistant that rivals premium options, while those needing long-context processing or web-enabled agents gain access to features typically locked behind paywalls.

    Creative writers should look elsewhere—the model produces functional but uninspired prose. The open-source nature promises significant downstream benefits as developers create custom versions, modifications, and cost-effective deployments impossible with closed platforms like ChatGPT or Claude.

The model best serves users with reasoning-heavy tasks, but it is also a great free alternative for anyone seeking a capable everyday chatbot outside the mainstream options.

    You can download the open source model here.
