
AI's inner workings baffle experts at major summit

Even top boffins are stumped by AI's inner workings, a hot topic at NeurIPS. Discover why this baffling complexity is sparking serious debate.

Anonymous · 5 min read

AI Snapshot

The TL;DR: what matters, fast.

AI's rapid growth has led to record attendance at the NeurIPS conference, highlighting its shift from an academic pursuit to a major industry.

Despite these advancements, leading AI researchers admit they have only a limited understanding of how advanced AI models operate internally, creating an interpretability conundrum.

The pursuit of AI interpretability is a nascent field, with companies offering large prizes to accelerate progress in understanding how these complex systems function.

Who should pay attention: AI researchers | AI developers | Regulators | Educators

What changes next: Debate is likely to intensify regarding AI interpretability.

The annual Neural Information Processing Systems (NeurIPS) conference, a cornerstone in the AI research calendar, recently drew a record 26,000 attendees to San Diego. This significant increase, double the attendance from just six years ago, underscores AI's explosive growth and its transformation from an academic niche to a global industrial powerhouse. Founded in 1987, NeurIPS has historically focused on neural networks and their computational, neurobiological, and physical underpinnings. Now, these networks form the bedrock of advanced AI systems, propelling the conference into the mainstream.

Despite this rapid expansion and the proliferation of highly specialised topics, a fundamental question dominated discussions: how do frontier AI systems actually work?

The Interpretability Conundrum

A surprising point of consensus among leading AI researchers and CEOs is how little they understand about the internal workings of today's most advanced models. The pursuit of deciphering a model's internal structure is known as interpretability. Shriyash Upadhyay, an AI researcher and co-founder of Martian, an interpretability-focused company, highlighted the nascent state of the field. He compared it to the early days of physics, when fundamental questions about the existence and measurability of particles like electrons were still being posed. Similarly, AI researchers are still grappling with what it truly means for a system to be interpretable. Martian has even launched a US$1 million (£790,000) prize to accelerate progress in this area.
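To make the scale of the problem concrete, here is a minimal sketch (our own toy illustration, not code from Martian or the conference): a four-neuron network learns XOR perfectly, yet even its handful of trained weights resists straightforward human reading.

```python
# Toy sketch (not from the article): a tiny network trained on XOR ends up
# with weights that solve the task but are not human-readable -- a miniature
# version of the interpretability problem.
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two-layer network: 2 inputs -> 4 hidden units -> 1 output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(20_000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass for mean squared error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print("predictions:", out.ravel().round(2))  # should approach [0, 1, 1, 0]
# The raw weights "explain" the behaviour only in the loosest sense.
print("hidden-layer weights:\n", W1.round(2))
```

Scaling that reading problem from a dozen weights to hundreds of billions is, in essence, what interpretability researchers are attempting.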

Paradoxically, while the core mechanisms of large language models (LLMs) remain somewhat opaque, demand for them is soaring, with companies like OpenAI experiencing unprecedented growth. For a measure of that demand, see our report: OpenAI CEO issues "code red" as Gemini hits 200M users.

Diverging Approaches to Understanding AI

The conference revealed a split in interpretability strategies among major AI firms. Google's team, for instance, announced a significant pivot. Neel Nanda, a Google interpretability leader, stated that ambitious goals like "near-complete reverse-engineering" are currently out of reach. Google is instead focusing on more practical, impact-driven methods, aiming for tangible results within a decade. This shift acknowledges the rapid pace of AI development and the limited success of earlier, more comprehensive reverse-engineering attempts.

In contrast, OpenAI's head of interpretability, Leo Gao, declared a commitment to a deeper, more ambitious form of interpretability, aiming for a full understanding of neural network operations. This suggests a willingness to tackle the complexity head-on, even if success isn't guaranteed in the short term. The challenge is substantial; some experts, like Adam Gleave from FAR.AI, are sceptical that deep learning models can ever be fully reverse-engineered in a way that's comprehensible to humans. He believes these models are inherently too complex for a simple explanation.

Despite this, Gleave remains optimistic about making meaningful progress in understanding model behaviour at various levels. This understanding, even if incomplete, is crucial for developing more reliable and trustworthy AI systems. The growing interest in AI safety and alignment within the machine learning community is a positive sign, though Gleave observed that sessions dedicated to increasing AI capabilities still dwarfed those focused on safety.

The Challenge of AI Measurement

Beyond understanding how AI models work, researchers are also grappling with inadequate methods for evaluating and measuring their capabilities. Sanmi Koyejo, a computer science professor at Stanford University and leader of the Trustworthy AI Research Lab, pointed out that current measurement tools are insufficient for assessing complex concepts like intelligence and reasoning in modern AI. Many existing benchmarks were designed for earlier AI models, focusing on specific, narrower tasks. There's an urgent need for new, reliable tests that can accurately gauge the general behaviour and advanced capabilities of today's AI. This is particularly true for AI applications in specialised fields, such as biology, where evaluation methods are still in their infancy. Ziv Bar-Joseph, a professor at Carnegie Mellon University and founder of GenBio AI, described the current state of biological AI evaluation as "extremely, extremely early stages."
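To illustrate Koyejo's point (a toy example of our own, not one presented at NeurIPS): a benchmark with a skewed answer distribution can be "passed" by a baseline that does no reasoning whatsoever, which is why scores on older, narrower tests say little about general capability.

```python
# Toy sketch (ours, not from the conference): a narrow benchmark can be
# saturated by a shortcut that involves no reasoning at all -- one reason
# benchmarks built for earlier, narrower models mis-measure frontier AI.

# Hypothetical yes/no benchmark in which 90% of answers are "yes".
benchmark = [(f"question {i}", "no" if i % 10 == 0 else "yes") for i in range(100)]

def majority_baseline(question: str) -> str:
    """A 'model' that ignores its input and always answers the majority class."""
    return "yes"

correct = sum(majority_baseline(q) == label for q, label in benchmark)
print(f"majority-baseline accuracy: {correct / len(benchmark):.0%}")  # 90%
# A 90% score here reveals nothing about intelligence or reasoning --
# the kind of gap new evaluation methods would need to close.
```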

Despite these challenges, the practical applications of AI continue to advance, impacting various sectors, including creative industries and business. For example, AI is increasingly used for tasks like creating eye-catching YouTube thumbnails and generating viral TikTok shorts.

AI as a Catalyst for Scientific Discovery

Even without a complete understanding of their inner workings, AI systems are proving to be powerful tools for accelerating scientific research. As Upadhyay noted, "People built bridges before Isaac Newton figured out physics." This analogy highlights that practical application often precedes full theoretical comprehension.

For the fourth consecutive year, an offshoot of NeurIPS focused specifically on AI's role in scientific discovery. Ada Fang, a PhD student researching AI in chemistry at Harvard, called this year's event a "great success." She emphasised that despite the diverse scientific domains, the underlying challenges and ideas in applying AI to science are deeply shared. The increasing interest in AI for scientific discovery is palpable, with experts like Jeff Clune, a computer science professor at the University of British Columbia, observing a dramatic shift in enthusiasm. He highlighted the "through the roof" interest in creating AI that can learn, discover, and innovate for science, a stark contrast to a decade ago when the field was largely overlooked. This growing momentum suggests AI is poised to tackle some of humanity's most pressing scientific problems.

The rapid evolution of AI necessitates a continuous re-evaluation of how we understand and assess these powerful tools. While interpretability and robust measurement remain significant hurdles, the sheer potential of AI to drive innovation, particularly in scientific research, is undeniable. For more on the broader implications of AI, consider how it's shaping future work through human-AI skill fusion and its impact on various industries, as detailed in our analysis: By Year-End We Will Have Built 100+ Agents Across Three Industries — Here Are the Takeaways.

For further reading on the research and challenges in AI interpretability, a comprehensive overview can be found in this paper: Explainable AI: A Review of Machine Learning Interpretability Methods.


