By Tal Shahaf
There are many promises surrounding AI agents, the most dazzling technology in the field of artificial intelligence. But there's a problem: they don't really work the way they're supposed to.
Thousands of companies around the world are investing heavily in developing these AI tools, yet hesitate to deploy them because they cannot guarantee the agents won't make critical errors, such as giving incorrect answers, offering unwarranted discounts, or otherwise harming customers.
The cure for these growing pains may lie in technology developed by the Israeli company Traceloop, which is building tools that detect and handle AI agent errors before they cause real damage.
Traceloop, publicly revealed for the first time last week, announced it has completed a $6.1 million seed funding round, led by Sorenson Capital and Ibex Investors, with participation from Samsung NEXT, Y Combinator and Grand Ventures. Additional investors include Olivier Pomel, CEO of Datadog; Milin Desai, CEO of Sentry; and Shay Banon, co-founder of Elastic.
Nir Gazit, the company’s CEO, claims there’s a gap between the expectations of companies developing AI agents and the actual reality: “Companies think they can ask the AI questions about all the company’s information and it will answer, but it doesn’t work well enough.”
“You ask a few questions, it responds, you think, ‘This is amazing, it's so smart,’ and then the press releases go out and the agent is deployed. But soon after, you realize it’s a failure, because you don't actually know what customers will ask, and things don’t work as they should,” he explains.
It's astounding to discover how limited the ability to track an AI agent's performance really is: companies collect data and only then find out the agent is making mistakes. Fixes come through trial and error and often take a long time. “That's where we come in,” says Gazit. “We tell companies: ‘I'll help you understand where the agent is working well, where it's messing up, and how to improve outcomes.’”
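The article does not describe how Traceloop's own tooling is built, but as a rough illustration of what "agent observability" means in practice, the sketch below uses the standard OpenTelemetry Python API to record each LLM call as a trace span with its prompt and response attached, so an agent's decisions can be reviewed after the fact. The function and attribute names are purely illustrative, not Traceloop's product.

```python
# Illustrative sketch only: records an agent's LLM calls as OpenTelemetry spans
# so prompts, responses, and errors can be inspected later.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Print spans to the console; a real deployment would export to an observability backend.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-observability-demo")

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical)."""
    return "Refund approved."

def traced_agent_step(prompt: str) -> str:
    # Wrap the model call in a span and attach the prompt and response as attributes,
    # so every agent decision leaves an auditable trace.
    with tracer.start_as_current_span("llm.agent_step") as span:
        span.set_attribute("llm.prompt", prompt)
        response = call_llm(prompt)
        span.set_attribute("llm.response", response)
        return response

if __name__ == "__main__":
    traced_agent_step("Customer asks for a refund on order #1234. What should we do?")
```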
Trying to rein in AI agents can turn out to be a deeply frustrating task, as Gazit describes: “When you write a line of code, it always does what it’s supposed to do. An AI agent, on the other hand, you have to ask nicely and hope for the best. For example: ‘Please don’t refund the customer unless you’ve checked first.’ Sometimes it’ll check, sometimes it won’t. Why? God knows.”
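To make Gazit's example concrete, here is a minimal, purely hypothetical Python sketch of the alternative he is hinting at: instead of asking the model nicely to "check before refunding" and hoping for the best, the check is enforced deterministically in code before any refund the agent proposes is executed. None of the names here come from Traceloop; this is only an illustration of the idea.

```python
# Hypothetical guardrail sketch: the "check first" rule lives in code, not in the prompt.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    kind: str            # e.g. "refund" or "reply"
    amount: float = 0.0
    order_id: str = ""

def order_was_verified(order_id: str) -> bool:
    """Stand-in for a real lookup against the order system."""
    return order_id in {"1234"}

def execute(action: ProposedAction) -> str:
    # The refund only happens if the verification the prompt merely requested
    # has actually been performed, and large refunds go to a human.
    if action.kind == "refund":
        if not order_was_verified(action.order_id):
            return "blocked: refund proposed without a verified order"
        if action.amount > 100:
            return "escalated: refund above limit sent for human review"
        return f"refunded {action.amount} on order {action.order_id}"
    return "no-op"

print(execute(ProposedAction(kind="refund", amount=40.0, order_id="9999")))
```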
Indeed, one of the main insights that emerges from constantly testing AI is just how immature the technology still is, and how far the claims made about its capabilities are from reality, such as the promise that AI agents will integrate seamlessly with human workers in shared teams.
“I don’t think we’ll be there within the next ten years,” says Gazit. “This is bordering on science fiction. In some cases, agents work very well, but that’s very far from what’s being described. People sometimes build castles in the air based on these promises, and that’s just not the case.”
"Today’s developers need to learn how to work with AI tools. Those who do will be amazing, better than they were a year ago. Those who don’t probably won’t have a job.”
In a world where AI writes code, do we even need open-source code, painstakingly written by thousands of developers? “Absolutely. AI succeeds in about 80% of cases and is good for certain uses, but the other 20%, it simply doesn’t know how to write. We get tons of code contributions from people around the world, and it’s very easy for me to identify code that was assisted by AI. It’s immediately clear when code is AI-generated, no one has reviewed it, and it rarely works as intended. As a software engineer, I still spend 30% of my time actually writing code.”
You're saying something explosive here: that the programming profession is far from disappearing. “Definitely. I think the world is just changing. When I was in university, I learned the C programming language. Today everyone writes in Python, so if I hadn't learned Python, I'd be unemployed. In the same way, today's developers need to learn how to work with AI tools. Those who do will be amazing, better than they were a year ago. Those who don't probably won't have a job.”
So all this talk about juniors not finding work because AI is replacing them, isn’t true? “I don’t know. We just hired two junior developers.”
Traceloop was founded at the end of 2022 by Nir Gazit (CEO) and Gal Kleinman (CTO). The two met while serving in Unit 81 of the IDF's Intelligence Corps. Afterward, Gazit joined Google, where he led a team developing AI tools on top of internal LLMs, and then served as Chief Software Architect at Fiverr. Kleinman led the development of Fiverr's machine learning platform.
Over the coming year, the company plans to expand its technology, including support for AI agents that can truly converse with customers, not just generate text. “That’s also a space we want to enter. It’s a world moving so fast, and we’ve been in a constant sprint for two years now,” explains Gazit.
How much of this market consists of companies already deploying AI agents with customers, and how much is companies simply experimenting with the technology? “First of all, there's a lot you don't see. If you go to companies like Monday, HiBob, or Fiverr, you'll see heavy use of LLMs, both in customer-facing areas and internally. Now it's reaching consumer-facing systems, not just tech users. We signed a contract with a U.S. pension fund that uses agents to analyze hundreds of thousands of customers' medical documents to understand them. This is already working in the field.”
As someone with your finger on the pulse, what concerns you most about AI? “There's a lot of noise, and it's hard to tell what's real. It's incredibly easy to use AI, and I don't think people realize that. When I worked at Google and built models, it was like an art form, something for the elite. Today, you barely need anything, and it opens the door for everyone. That's why there's so much noise. AI is complex, unpredictable, and people don't understand what they have in their hands.”
Do you think the big AI companies are doing a good and responsible job developing AI models? “I think so. And I think all the talk about AI safety mostly serves them. These days, it’s not always clear which model is better, so each company wants to make noise and say, ‘We’re closest to AGI, our models are going to go wild and take over the world.’ But it’s nonsense, just publicity.”
Is it possible we’ve already hit the AI ceiling, and it won’t get any smarter than it is today? “GPT-3 succeeded with two tricks: First, it was big, and people saw that when you add parameters to a model, it gets better. Then the technology improved so it was actually possible to build models with more parameters. But today, that’s starting to stall. We’ve hit a ceiling in terms of the amount of data we have, and we can’t really build better models anymore.”
Maybe it's just a matter of time, and eventually we'll see AGI that's way smarter than us. "I think it will happen with the next leap. I don't know when that will be; it could be tomorrow, or in 30 years. But right now, it's not happening."
According to Aharon Rinberg, a partner at Ibex Investors, which co-led the funding round: “I'm reminded of a quote from the late U.S. President Ronald Reagan: ‘Trust, but verify.’ It's no secret that LLMs represent a step-function improvement in how humans interact with data. But their confidence, and their potential for inaccuracy, make AI agents that much more dangerous. IBM, Cisco, Dynatrace, and others already rely on Traceloop's core technology for agent observability and verification, ensuring that AI agents function as intended. I expect the adoption of verification tools to outpace that of LLMs themselves.”