A Reasoning AI Beat a ChatGPT-Style Model 95% to 34%

A team at Tufts University has shown a different kind of artificial intelligence in action. On a classic step-by-step logic puzzle, their "neuro-symbolic" system found the correct solution 95% of the time, against 34% for a large, ChatGPT-style model built the conventional way. The gap comes down to how each one thinks. A standard model predicts the statistically likely next move from patterns in its training data. The neuro-symbolic model reasons through the problem with explicit logic, the way a person breaks a task into steps. The result points to a second path for AI, one built on reasoning.

Key takeaways

  1. Two different ways to think. A standard large model, like the one behind ChatGPT, works by predicting the statistically likely answer from enormous training sets. A neuro-symbolic model pairs that pattern-learning with symbolic reasoning, applying rules and breaking a problem into logical steps.
  2. The reasoning approach won the test. On the Tower of Hanoi puzzle, the neuro-symbolic system hit a 95% success rate against 34% for a conventional model. On puzzle variants it had never seen, it scored 78% while the conventional models failed every attempt.
  3. Scale is one path, architecture is another. Most AI progress lately has come from making models bigger. This shows that combining learning with logic can outperform a larger model on structured problems, using a fraction of the energy.

How a Standard Model Thinks

The AI most people have used, ChatGPT and its cousins, is built on a large neural network trained on enormous amounts of data. It works by prediction: given everything it has seen, it estimates the most statistically likely next word, or next action. That makes it fluent and flexible, and it powers most of today's AI products.

The weakness shows up when a task requires exact, multi-step logic. Because the model is guessing the likely answer rather than working it out, it can produce confident mistakes. As Tufts professor Matthias Scheutz puts it, these models "act on statistical results from large training sets of similar scenarios, but that can lead to errors." The same applies to the robotics version tested here, a Vision-Language-Action model, which is built on the same statistical foundation as a chatbot.

How Neuro-Symbolic AI Thinks

Neuro-symbolic AI keeps the neural network for what it is good at, perception and learning, and adds a second ingredient: symbolic reasoning. Symbols are the rules, categories and logical steps that traditional computer science is built on. Adding them to a neural network lets the system work through a problem's logic using explicit rules. The system still uses a neural network, but one far smaller than a large language model, and it does not run on top of a giant LLM the way a chatbot does. The symbolic side carries the step-by-step logic, so the whole system avoids the weight of a giant model.

Scheutz describes it as "a combination of conventional neural network AI with symbolic reasoning similar to the way humans break down tasks and concepts into steps and categories." In practice, the symbolic layer "can apply rules that limit the amount of trial and error during learning and get to a solution much faster."
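
To make the idea of rules that limit trial and error concrete, here is a minimal Python sketch. It is not the Tufts team's code. It writes the two Tower of Hanoi rules (only the top disk on a peg may move, and never onto a smaller disk) as an explicit check, then uses that check to discard illegal moves before any are tried. The names is_legal and legal_moves, and the way a puzzle state is stored, are assumptions made for illustration.

```python
from itertools import permutations

# A state is a tuple of three pegs. Each peg lists its disks from bottom
# to top, and a larger number means a larger disk. (Illustrative layout.)

def is_legal(state, src, dst):
    """Symbolic rule: move only a top disk, and never onto a smaller disk."""
    if not state[src]:
        return False              # nothing on the source peg to move
    if not state[dst]:
        return True               # any disk may go onto an empty peg
    return state[src][-1] < state[dst][-1]

def legal_moves(state):
    """Return every (src, dst) peg pair the rule allows in this state."""
    pegs = range(len(state))
    return [(s, d) for s, d in permutations(pegs, 2) if is_legal(state, s, d)]

start = ((3, 2, 1), (), ())       # all three disks stacked on the first peg
candidates = list(permutations(range(3), 2))
allowed = legal_moves(start)
print(len(candidates), "candidate moves,", len(allowed), "legal:", allowed)
```

From the starting position this prints "6 candidate moves, 2 legal: [(0, 1), (0, 2)]". A learner that consults the rule first never has to spend attempts on the four moves that could not work.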

The results from the Tufts team, presented at the ICRA 2026 robotics conference in Vienna this month, show the effect. On the Tower of Hanoi, a puzzle that demands a precise sequence of moves, the neuro-symbolic system reached 95% against 34% for the conventional model. On unfamiliar versions of the puzzle, it managed 78% while the standard approach failed every time. Because it works through the logic of the task, the neuro-symbolic system can handle a version it was not trained on, which is where the larger models broke down.
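
The point about unseen versions can be illustrated with the textbook recursive solution to the Tower of Hanoi. The sketch below is that classical algorithm, not the system the Tufts team built, and the function names solve and replay are chosen here for illustration. Because the procedure is a rule rather than a memorised pattern, the same few lines solve the puzzle for any number of disks, and a separate checker replays the moves to confirm none of them breaks the rules.

```python
def solve(n, src=0, aux=1, dst=2):
    """Yield (src, dst) moves that transfer n disks from peg src to peg dst."""
    if n == 0:
        return
    yield from solve(n - 1, src, dst, aux)  # park the n-1 smaller disks on aux
    yield (src, dst)                        # move the largest disk to dst
    yield from solve(n - 1, aux, src, dst)  # bring the smaller disks onto it

def replay(n, moves):
    """Apply moves to a fresh n-disk puzzle; raise ValueError if one is illegal."""
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk at the bottom of peg 0
    for s, d in moves:
        if not pegs[s] or (pegs[d] and pegs[s][-1] > pegs[d][-1]):
            raise ValueError(f"illegal move {s}->{d}")
        pegs[d].append(pegs[s].pop())
    return pegs

for n in (3, 5, 8):                         # sizes chosen only as examples
    moves = list(solve(n))
    final = replay(n, moves)
    solved = final == [[], [], list(range(n, 0, -1))]
    print(n, "disks:", len(moves), "moves, solved:", solved)
```

The number of moves is 2^n - 1 for n disks, so a single early slip leaves a long chain of later moves to go wrong, which is why the puzzle is a demanding test of step-by-step accuracy.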

Why This Matters Beyond the Lab

The dominant story in AI has been scale: more data, more chips, bigger models. This research points to a second factor besides size. Giving a model the ability to reason lets it get better results on structured problems from a far smaller, cheaper system. The two approaches have different strengths, and the most capable systems will likely combine them.

The efficiency is a footnote here. The neuro-symbolic model used about 1% of the conventional model's energy to train and roughly 5% to run. Scheutz notes that today's approach is often wasteful, pointing out that Google's AI summaries can use up to 100 times the energy of serving a normal set of search results. As data centres draw more electricity for ever-larger models, an approach that does more with less helps on that front.

This also suggests a shift in where AI can run. Today's large models carry billions of parameters and lean on banks of GPUs in a data centre, both to train and, often, to answer. An approach that needs far less data, far less training compute and little energy to run does not carry that weight. It is reasonable to expect that systems built this way could increasingly run on local hardware, potentially a regular laptop or even a mobile device, rather than a remote server. The Tufts team reported energy and training-time figures rather than a consumer-machine benchmark, so this is a direction the results point to, not a measured result.

What It Means for South Africans

For anything that touches money, this distinction matters. Payments, trading and the emerging world of AI agents that hold and move funds on-chain all demand systems that get the answer right and can show their working. An AI that reasons by explicit rules, and whose logic can be inspected, suits that world better than a black-box model that guesses well most of the time. It is the same instinct behind a smart contract, where the rules are fixed, transparent and verifiable.

There is a local angle too. A reasoning-first approach asks less of the chips and the grid, which lowers the barrier for South African builders and financial firms working under real power and budget constraints, and could let more of that work run on local hardware rather than rented cloud. Much of the attention in AI goes to which company can build the biggest model. For these builders, an approach that does more with less is at least as relevant.

If that shift holds, the implications run deeper. AI that runs on a personal device, rather than a company's servers, would keep the model in the hands of the person using it, which brings personalised, decentralised AI closer to reach. That lines up with the principle behind cryptocurrency. The aim is decentralised money and decentralised intelligence, both held by individuals instead of being centralised, controlled and regulated by governments or a few large companies. The argument against letting one authority control the money supply applies just as well to the intelligence people come to rely on.


Cape Crypto (FSP 53746) provides information, not financial advice. Crypto assets are volatile and you can lose money. Don't invest more than you can afford to lose. Past performance is not indicative of future results.

Sources