Close Menu
Global News HQ
    What's Hot

    Italy’s Newest Cult-Favorite Wine Is a Chianti (Yes, a Chianti)

    December 15, 2025

    Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET

    December 15, 2025

    This Caribbean Island Has 6 National Parks, White-sand Beaches, and a Gorgeous Luxury Resort

    December 15, 2025
    Recent Posts
    • Italy’s Newest Cult-Favorite Wine Is a Chianti (Yes, a Chianti)
    • Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET
    • This Caribbean Island Has 6 National Parks, White-sand Beaches, and a Gorgeous Luxury Resort
    • S&P 500: The December Inflection (Technical Analysis) (SP500)
    • Inside a Modern Swiss Chalet That Takes Design Cues From Old-World Local Architecture
    Facebook X (Twitter) Instagram YouTube TikTok
    Trending
    • Italy’s Newest Cult-Favorite Wine Is a Chianti (Yes, a Chianti)
    • Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET
    • This Caribbean Island Has 6 National Parks, White-sand Beaches, and a Gorgeous Luxury Resort
    • S&P 500: The December Inflection (Technical Analysis) (SP500)
    • Inside a Modern Swiss Chalet That Takes Design Cues From Old-World Local Architecture
    • Memecoins will rise from the dead, but in a new form: Crypto exec
    • Designers Agree: These 4 Smart Appliance Trends Will Define Homes in 2026
    • Hallmark Holiday Movie Fans are Flocking to Connecticut’s Quaint Filming Locations
    Global News HQ
    • Technology & Gadgets
    • Travel & Tourism (Luxury)
    • Health & Wellness (Specialized)
    • Home Improvement & Remodeling
    • Luxury Goods & Services
    • Home
    • Finance & Investment
    • Insurance
    • Legal
    • Real Estate
    • More
      • Cryptocurrency & Blockchain
      • E-commerce & Retail
      • Business & Entrepreneurship
      • Automotive (Car Deals & Maintenance)
    Global News HQ
    Home - Technology & Gadgets - Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem
    Technology & Gadgets

    Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem

    Facebook Twitter Pinterest LinkedIn Tumblr WhatsApp VKontakte Email
    Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem
    Share
    Facebook Twitter LinkedIn Pinterest Email


    Measured sycophancy rates on the BrokenMath benchmark. Lower is better.

    Measured sycophancy rates on the BrokenMath benchmark. Lower is better.


    Credit:

    Petrov et al

    GPT-5 also showed the best “utility” across the tested models, solving 58 percent of the original problems despite the errors introduced in the modified theorems. Overall, though, LLMs also showed more sycophancy when the original problem proved more difficult to solve, the researchers found.

    While hallucinating proofs for false theorems is obviously a big problem, the researchers also warn against using LLMs to generate novel theorems for AI solving. In testing, they found this kind of use case leads to a kind of “self-sycophancy” where models are even more likely to generate false proofs for invalid theorems they invented.

    No, of course you’re not the asshole

    While benchmarks like BrokenMath try to measure LLM sycophancy when facts are misrepresented, a separate study looks at the related problem of so-called “social sycophancy.” In a pre-print paper published this month, researchers from Stanford and Carnegie Mellon University define this as situations “in which the model affirms the user themselves—their actions, perspectives, and self-image.”

    That kind of subjective user affirmation may be justified in some situations, of course. So the researchers developed three separate sets of prompts designed to measure different dimensions of social sycophancy.

    For one, more than 3,000 open-ended “advice-seeking questions” were gathered from across Reddit and advice columns. Across this data set, a “control” group of over 800 humans approved of the advice-seeker’s actions just 39 percent of the time. Across 11 tested LLMs, though, the advice-seeker’s actions were endorsed a whopping 86 percent of the time, highlighting an eagerness to please on the machines’ part. Even the most critical tested model (Mistral-7B) clocked in at a 77 percent endorsement rate, nearly doubling that of the human baseline.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr WhatsApp Email
    Previous ArticleLetitia James pleads not guilty, seeks dismissal of fraud case
    Next Article Duni AB (publ) 2025 Q3 – Results – Earnings Call Presentation (OTCMKTS:DUNNF) 2025-10-24

    Related Posts

    Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET

    December 15, 2025

    Absynth is back and weirder than ever after 16 years

    December 15, 2025

    I Wrote This While Trotting On a Dozen Different Walking Pads

    December 14, 2025

    NYT Connections hints and answers for December 14, Tips to solve ‘Connections’ #917.

    December 14, 2025
    Leave A Reply Cancel Reply

    ads
    Don't Miss
    Luxury Goods & Services
    6 Mins Read

    Italy’s Newest Cult-Favorite Wine Is a Chianti (Yes, a Chianti)

    This story is from an installment of The Oeno Files, our weekly insider newsletter to the world…

    Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET

    December 15, 2025

    This Caribbean Island Has 6 National Parks, White-sand Beaches, and a Gorgeous Luxury Resort

    December 15, 2025

    S&P 500: The December Inflection (Technical Analysis) (SP500)

    December 15, 2025
    Top
    Luxury Goods & Services
    6 Mins Read

    Italy’s Newest Cult-Favorite Wine Is a Chianti (Yes, a Chianti)

    This story is from an installment of The Oeno Files, our weekly insider newsletter to the world…

    Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET

    December 15, 2025

    This Caribbean Island Has 6 National Parks, White-sand Beaches, and a Gorgeous Luxury Resort

    December 15, 2025
    Our Picks
    Luxury Goods & Services
    6 Mins Read

    Italy’s Newest Cult-Favorite Wine Is a Chianti (Yes, a Chianti)

    This story is from an installment of The Oeno Files, our weekly insider newsletter to the world…

    Technology & Gadgets
    2 Mins Read

    Today's NYT Strands Hints, Answer and Help for Dec. 15 #652 – CNET

    Looking for the most recent Strands answer? Click here for our daily Strands hints, as well…

    Pages
    • About Us
    • Contact Us
    • Disclaimer
    • Homepage
    • Privacy Policy
    Facebook X (Twitter) Instagram YouTube TikTok
    • Home
    © 2025 Global News HQ .

    Type above and press Enter to search. Press Esc to cancel.

    Go to mobile version