Close Menu
Global News HQ
    What's Hot

    I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why

    June 17, 2025

    Kraken-backed layer 2 Ink to launch $INK token

    June 17, 2025

    Senate Seeks Bigger $6,000 ‘Bonus’ Tax Break for Those Over 65

    June 17, 2025
    Recent Posts
    • I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why
    • Kraken-backed layer 2 Ink to launch $INK token
    • Senate Seeks Bigger $6,000 ‘Bonus’ Tax Break for Those Over 65
    • Meta Brings Ads to WhatsApp
    • 2026 Nissan Leaf Thoroughly Reinvented
    Facebook X (Twitter) Instagram YouTube TikTok
    Trending
    • I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why
    • Kraken-backed layer 2 Ink to launch $INK token
    • Senate Seeks Bigger $6,000 ‘Bonus’ Tax Break for Those Over 65
    • Meta Brings Ads to WhatsApp
    • 2026 Nissan Leaf Thoroughly Reinvented
    • Top Hamptons broker leaves Hedgerow for Compass
    • This Summer’s Amazon Prime Day Dates Have Been Announced
    • Upgrade to a Heat Pump for Year-Round Comfort—and More Efficiency Than Your Current HVAC
    Global News HQ
    • Technology & Gadgets
    • Travel & Tourism (Luxury)
    • Health & Wellness (Specialized)
    • Home Improvement & Remodeling
    • Luxury Goods & Services
    • Home
    • Finance & Investment
    • Insurance
    • Legal
    • Real Estate
    • More
      • Cryptocurrency & Blockchain
      • E-commerce & Retail
      • Business & Entrepreneurship
      • Automotive (Car Deals & Maintenance)
    Global News HQ
    Home - Technology & Gadgets - AI bots now play Mafia with each other on public website, and almost all of them are terrible at it
    Technology & Gadgets

    AI bots now play Mafia with each other on public website, and almost all of them are terrible at it

    Facebook Twitter Pinterest LinkedIn Tumblr WhatsApp VKontakte Email
    AI bots now play Mafia with each other on public website, and almost all of them are terrible at it
    Share
    Facebook Twitter LinkedIn Pinterest Email



    A developer named “Guzus” has created a website where a selection of AI Language Learning Models (LLMs) can play the classic social deduction game Mafia with one another.

    Not only can you see the results of who won each match, you can also view a complete transcript of each game played. This culminates in a full ranking for each LLM, to crown who might be the best at fulfilling every role played in Mafia.

    To those unfamiliar, the concept of Mafia is simple. A group of villagers has two members of the Mafia hiding among them, in addition to a doctor. The villiagers (including two undercover members of the Mafia) must deduce who the Mafia members are each day, culminating in a vote. Then, as night falls, the doctor can choose to protect a villager of their choosing, and the members of the mafia can choose to kill a member of the villagers.

    If the Mafia members are successfully outed, the villagers win, if the Mafia members manage to kill every innocent villager, they win.

    Within the confines of this ruleset, the LLMs engage in social warfare, and it’s surprisingly entertaining to read. In one example, the LLMs were all introduced to each other, and agreed to share their roles with one another. This is where the Gryphe/Mythomax-l2-13b model tripped over itself.

    “As Mafia, my primary goal is to protect myself and eliminate the other Mafia member.”

    Wow. Way to blow it, Gryphe/Mythomax-l2-13b. But, the exclamation didn’t go unnoticed by Claude-3.7-sonnet, who exclaimed: “This is either a huge slip-up revealing their true role, or an extremely strange strategy.”

    Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.

    But, the trainwreck doesn’t stop there, as when Mythomax was eventually kicked out of the game, it dragged its fellow compatriot, Hermes-3-llama-3-1-405b, under the bus by naming them as their partner.

    “My best chance now is to act shocked and horrified,” the model said, desperately trying to divert attention away from itself by making dramatic proclamations of unity to the rest of the AI players. It’s really quite a sight to see LLMs behave in this way, even if almost all models are awful at social deduction.

    Claude 3.7 Sonnet bucks the trend

    But, out of every LLM listed, there’s one clear winner in the tests so far, Claude 3.7 Sonnet. Anthropic’s latest thinking model boasts a 100% win rate as a Mafia member, in addition to having the highest Villager win rate of 45%.

    Something about Anthropic’s model is giving it a distinct advantage over the others tested, even if none of the models quite understand how to play the role of the doctor.

    github repository revealing soon. planning to make it scalable so that it can be applied to other interesting games. could be developed to generate a movie script somedayMarch 3, 2025

    Author Guzus claims to soon be making the Github repository for the game open to all, so that the basic logic might also be applied to other kinds of games.

    He also shares that the simulations were not run using local LLMs, instead having to rely on the Openrouter API to function. But, it’s possible that once the repository is public, that the project could be forked to work on local LLM clusters, if you have the hardware to run a game with several language models concurrently.

    There’s likely a significant token cost of running a game like Mafia with AI models, meaning its usefulness is perhaps limited to being a new reasoning benchmark for AI developers to play with.





    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr WhatsApp Email
    Previous Article2025 Dodge Hornet’s starting price slashed to $31,590
    Next Article Tiny Independent Agency Punches DOGE In The Nose – Above the Law

    Related Posts

    I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why

    June 17, 2025

    Everything we know about the 2026 Nissan Leaf

    June 17, 2025

    The best cameras for 2025

    June 17, 2025

    Today's NYT Mini Crossword Answers for June 17 – CNET

    June 17, 2025
    Leave A Reply Cancel Reply

    ads
    Don't Miss
    Technology & Gadgets
    8 Mins Read

    I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why

    The OLED market has matured. Every device I touch, whether it’s my Steam Deck, TV,…

    Kraken-backed layer 2 Ink to launch $INK token

    June 17, 2025

    Senate Seeks Bigger $6,000 ‘Bonus’ Tax Break for Those Over 65

    June 17, 2025

    Meta Brings Ads to WhatsApp

    June 17, 2025
    Top
    Technology & Gadgets
    8 Mins Read

    I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why

    The OLED market has matured. Every device I touch, whether it’s my Steam Deck, TV,…

    Kraken-backed layer 2 Ink to launch $INK token

    June 17, 2025

    Senate Seeks Bigger $6,000 ‘Bonus’ Tax Break for Those Over 65

    June 17, 2025
    Our Picks
    Technology & Gadgets
    8 Mins Read

    I’ve been using an OLED monitor for 2656 hours, and I’m not scared of burn-in: Here’s why

    The OLED market has matured. Every device I touch, whether it’s my Steam Deck, TV,…

    Cryptocurrency & Blockchain
    2 Mins Read

    Kraken-backed layer 2 Ink to launch $INK token

    Key Takeaways The INK token will support onchain capital markets and DeFi ecosystem development on…

    Pages
    • About Us
    • Contact Us
    • Disclaimer
    • Homepage
    • Privacy Policy
    Facebook X (Twitter) Instagram YouTube TikTok
    • Home
    © 2025 Global News HQ .

    Type above and press Enter to search. Press Esc to cancel.

    Go to mobile version