AI in ASIA
Business

Splunk bets on agentic AI to deliver self-healing IT systems

Splunk embeds agentic AI into observability platforms, transforming passive monitoring into self-healing IT infrastructure that diagnoses and repairs itself.

Intelligence Desk · 4 min read

AI Snapshot

The TL;DR: what matters, fast.

Splunk embeds agentic AI into observability platforms for autonomous IT healing

AI systems monitoring becomes critical as enterprises deploy agents across Asia-Pacific

GPU costs and AI model performance require real-time observability solutions

Splunk Transforms IT Observability Into Self-Healing Infrastructure

For years, observability platforms have been digital mirrors, reflecting the health of applications and infrastructure through dashboards full of metrics and charts. Splunk is betting that the next chapter requires these mirrors to become intelligent partners that diagnose problems, decide on a response, and even carry out repairs.

The company is embedding agentic AI into its Observability Cloud and AppDynamics, transforming passive monitoring into proactive intervention. This shift comes as enterprises grapple with AI agents, large language models, and complex multi-cloud environments where traditional dashboards feel increasingly inadequate.

"Agentic AI is reshaping what it takes for organisations to build and maintain a leading observability practice. We are delivering the only solution that can process, analyse and transform machine data from across all these environments into trusted inputs for LLMs, RAG pipelines, copilots and AI agents." - Kamal Hathi, SVP and GM of Splunk at Cisco

AI Systems Need Their Own Watchers

The most compelling aspect of Splunk's upgrade extends observability into AI systems themselves. Enterprises deploying AI agents across financial services in Singapore or digital commerce in Indonesia need to monitor whether those agents perform consistently, securely, and cost-effectively.

When an AI model starts hallucinating or consuming GPU cycles beyond budget, Splunk's platform is designed to detect the problem and alert teams in real time. This is critical in Asia's fast-growing digital markets, where a banking chatbot that drifts off script or a customer service bot that spikes compute costs hurts both margins and customer trust.
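
To make the cost side of this concrete, here is a minimal sketch of a budget check over GPU usage samples. It is not Splunk's API or implementation: the `GpuUsageSample` type, the `find_budget_breaches` function, the service names and the prices are all hypothetical, chosen only to illustrate the idea of flagging a service the moment its cumulative spend crosses a limit.

```python
from dataclasses import dataclass


@dataclass
class GpuUsageSample:
    """One usage reading for a service (hypothetical schema, not a Splunk type)."""
    service: str              # e.g. "chatbot"; illustrative name only
    gpu_hours: float          # GPU-hours consumed in this interval
    cost_per_gpu_hour: float  # assumed price, in any consistent currency


def find_budget_breaches(samples, budgets):
    """Return (service, cumulative_spend) at the first sample where a
    service's running spend exceeds its budget. Each service is reported once."""
    spend = {}
    flagged = set()
    breaches = []
    for sample in samples:
        cost = sample.gpu_hours * sample.cost_per_gpu_hour
        spend[sample.service] = spend.get(sample.service, 0.0) + cost
        limit = budgets.get(sample.service)
        over_budget = limit is not None and spend[sample.service] > limit
        if over_budget and sample.service not in flagged:
            flagged.add(sample.service)
            breaches.append((sample.service, spend[sample.service]))
    return breaches
```

In a real deployment the samples would arrive as a stream and the breach would trigger an alert rather than be returned in a list; the batch form here simply keeps the sketch short.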

"As AI becomes more embedded in business operations, monitoring tools need to get smarter and provide real-time insights into whether models are delivering results efficiently and securely. Performance and cost have become critical metrics." - Patrick Lin, SVP and GM of observability at Splunk

The regional implications are significant. As documented in Singapore's first agentic AI governance framework, Asian governments and enterprises are deploying AI agents at an unprecedented pace, making robust observability essential.

By The Numbers

  • GPU demand in Japan and South Korea outstrips supply by 300%, making cost monitoring critical
  • 75% of enterprise AI pilots in Asia never reach production due to infrastructure challenges
  • Analyst fatigue affects 68% of security teams across Asia-Pacific due to rising incident volumes
  • AI-related downtime costs enterprises an average of $12,000 per minute in lost revenue
  • Multi-cloud environments generate 40% more telemetry data than traditional infrastructure

Infrastructure Becomes the AI Chokepoint

While AI agents capture headlines, the underlying infrastructure often determines success or failure. GPU shortages, cloud service quotas, and accelerator costs create daily headaches for teams scaling AI workloads. Splunk's proactive monitoring of infrastructure bottlenecks and cost spikes positions it as the guardian of this invisible plumbing.

This resonates particularly in markets like Japan and South Korea, where GPU cluster demand vastly exceeds supply. Early detection of consumption issues helps enterprises avoid both outages and unexpected bills.

The competitive landscape includes Datadog, Elastic Security, and Microsoft Sentinel, all investing in AI-enhanced detection. However, Splunk differentiates through agentic AI triage that prioritises and explains high-risk alerts, reducing analyst fatigue across resource-constrained Asian markets.

| Observability Approach | Traditional | AI-Enhanced | Agentic AI |
|---|---|---|---|
| Response Time | Hours to days | Minutes to hours | Real-time to minutes |
| Root Cause Analysis | Manual investigation | Automated suggestions | Autonomous diagnosis |
| Problem Prevention | Reactive only | Pattern-based alerts | Predictive intervention |
| Cost Management | Post-incident reports | Threshold monitoring | Dynamic optimisation |

From IT Function to Enterprise Intelligence Layer

Splunk's ambition extends beyond IT monitoring towards becoming the intelligence layer connecting infrastructure, AI, and business outcomes. As organisations across Asia scale AI adoption, observability shifts from technical uptime concerns to customer satisfaction, regulatory compliance, and strategic agility.

This transformation particularly impacts sectors where customer trust evaporates quickly. A few minutes of disruption in fintech apps in Jakarta or logistics platforms in Shenzhen can mean lost revenue and damaged reputation. Understanding what agentic AI actually means becomes crucial for enterprises considering autonomous IT management.

Key capabilities of the upgraded platform include:

  • Real-time AI model performance monitoring with drift detection
  • Automated root cause analysis for complex, multi-system incidents
  • Cost optimisation recommendations for GPU and cloud resource usage
  • Predictive maintenance alerts before system degradation occurs
  • Cross-team visibility connecting technical metrics to business outcomes
  • Security monitoring for AI agents and LLM interactions
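
As an illustration of the first capability, drift detection can be sketched as a comparison between a recent window of a model quality metric and its historical baseline. The function below is an assumption-laden example, not Splunk's method: `detect_drift`, its parameters and the z-score threshold of 3.0 are hypothetical choices, and a production system would likely use more robust statistics.

```python
from statistics import mean, stdev


def detect_drift(baseline, recent, z_threshold=3.0):
    """Flag drift when the mean of `recent` is far from the `baseline` mean.

    Uses a simple z-score on the recent-window mean. `baseline` needs at
    least two values so that a sample standard deviation exists.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    recent_mean = mean(recent)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as drift.
        return recent_mean != mu
    # Standard error of the mean for a window of len(recent) observations.
    standard_error = sigma / len(recent) ** 0.5
    return abs(recent_mean - mu) / standard_error > z_threshold
```

For example, a baseline answer-accuracy series hovering around 0.90 followed by a recent window around 0.80 would be flagged, while a recent window that stays near 0.90 would not.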

"Leaders often struggle with juggling a patchwork of tools that don't always talk to each other, which can slow down teams and make it hard to get a clear picture of what's going on. We are addressing this by creating a unified observability experience and using AI to accelerate problem detection and root cause analysis." - Kamal Hathi, SVP and GM of Splunk at Cisco

The Trust Question in Self-Healing Systems

Splunk's vision of self-healing IT systems raises fundamental questions about enterprise readiness. The concept of handing over infrastructure keys to agentic AI represents a significant leap from current practices, especially in risk-averse sectors like banking and government services.

The company positions observability as moving beyond ITOps and engineering teams towards organisational resilience. This connects to broader trends in event-driven agentic AI reinventing ERP systems, where autonomous systems increasingly handle business-critical functions.

"Observability isn't just for ITOps and engineering teams. By sharing insights across teams, organisations can better align product development with real customer needs, improving satisfaction and driving business success beyond just technical performance." - Patrick Lin, SVP and GM of observability at Splunk

How does agentic AI differ from traditional monitoring tools?

Traditional tools alert teams to problems, while agentic AI diagnoses root causes, recommends fixes, and can even implement solutions automatically. It shifts from reactive alerts to proactive problem prevention.

Can agentic AI observability handle complex multi-cloud environments?

Yes, Splunk's system processes telemetry data across hybrid and multi-cloud infrastructures, providing unified visibility and analysis regardless of where applications and services are deployed.

What happens if the agentic AI system itself fails?

Splunk maintains fallback mechanisms and human oversight controls. The system is designed to degrade gracefully, reverting to traditional monitoring approaches while maintaining core observability functions.
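
The graceful-degradation pattern described here can be sketched in a few lines. This is a generic illustration under stated assumptions, not Splunk's design: `handle_incident`, the `remediate` and `notify` callables and the status strings are all hypothetical names.

```python
def handle_incident(incident, remediate, notify):
    """Try automated remediation first; if the agent fails for any reason,
    fall back to a plain alert so a human can take over.

    `remediate` and `notify` are caller-supplied callables (hypothetical
    stand-ins for an agent and an alerting channel).
    """
    try:
        action = remediate(incident)
        return ("auto_remediated", action)
    except Exception as exc:
        # Degrade to traditional monitoring behaviour: alert, do not act.
        notify(f"Automated remediation failed for {incident!r}: {exc}")
        return ("escalated_to_human", str(exc))
```

The design choice worth noting is that the failure path never attempts a second automated action: once the agent errors, control returns to people.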

How does AI observability handle data privacy and security concerns?

The platform includes built-in security monitoring for AI agents and LLMs, tracking data access patterns and flagging potential breaches or policy violations in real time.

Is this technology ready for enterprise production environments?

Splunk has integrated these capabilities into existing Observability Cloud and AppDynamics platforms, suggesting production readiness. However, enterprises should pilot gradually in non-critical environments first.

As enterprises in Asia's digital markets consider autonomous IT management, the technology's sophistication appears to match growing infrastructure complexity. The question isn't whether AI can handle observability tasks, but whether organisations trust it enough to act autonomously on critical systems. For those exploring building their own agentic AI solutions, Splunk's approach offers insights into enterprise-grade implementation.

The AIinASIA View: Splunk's agentic AI observability represents a logical evolution from passive monitoring to active intervention. While the technology appears sound, adoption will depend heavily on enterprise risk tolerance and gradual trust-building. Asian markets, with their rapid AI deployment and infrastructure constraints, provide ideal testing grounds for self-healing systems. Success here could accelerate global adoption of autonomous IT management. We expect cautious but growing enterprise interest, particularly in sectors where downtime costs exceed automation risks.

The shift from reflection to resilience positions observability as a core enterprise capability rather than merely an IT function. As AI becomes more deeply embedded in business operations, the stakes around system reliability continue to rise. In fast-moving Asian digital markets, where customer expectations and competitive pressure leave little room for system failures, autonomous observability may become less a luxury and more a necessity.

The real test will be whether enterprises, especially those managing sensitive data and critical services, are prepared to trust agentic AI systems to diagnose and fix problems before human teams even know something went wrong. What's your view on letting AI manage your IT infrastructure autonomously? Share it in the comments below.

This is a developing story

We're tracking this across Asia-Pacific and may update with new developments, follow-ups and regional context.

This article is part of the Enterprise AI 101 learning path.

Latest Comments (3)

Maggie Chan@maggiec
AI
17 October 2025

"shifting observability from reflection to action" - this is exactly where the rubber meets the road for us. We're drowning in data, but translating that into actionable, automated compliance steps without constant human oversight is the real challenge. excited to see if splunk can actually deliver on that promise for smaller operations too, not just the big enterprises.

Oliver Thompson@olivert
AI
15 October 2025

@olivert: Rather spot on this, the idea of dashboards moving beyond passive reflection is crucial. We've seen firsthand how much human time goes into sifting through alerts that often just state the obvious. Pre-empting problems, as Splunk suggests, would be a real boon for incident response teams.

Dr. Farah Ali@drfahira
AI
24 September 2025

while Splunk’s aim for self-healing systems is technologically ambitious, my primary concern circles back to who benefits from this automation. if such advanced observability becomes the standard, will it further deepen the digital divide for organisations in the Global South that lack the resources or infrastructure to adopt these complex, proprietary solutions? we must ensure these leaps forward are inclusive.
