Agentic AI represents the next evolutionary step beyond generative AI, enabling autonomous systems with enhanced reasoning and interaction capabilities to tackle complex tasks. However, this progress introduces significant challenges in communication reliability, goal management, and system design. Recent research reveals critical limitations and proposes innovative solutions like ADAS and Google MASS to optimize agent performance.


Emergent Behavior Risks

As agentic systems scale, they exhibit unprogrammed behaviors arising from agent interactions. While beneficial for novel solutions, these behaviors risk oscillations, inefficiencies, or harmful outputs. Robust guardrails, including role-based access controls, resource limits, and validation rules, are essential to maintain alignment with organizational goals. Without these, systems may deviate unpredictably during execution.

Multi-Turn Conversation Failures

Studies from Microsoft and Salesforce demonstrate a 39% average performance drop when AI assistants handle multi-turn conversations. Key issues include:

  • Premature conclusions:

Agents fixate on early assumptions without course-correcting.

  • Information neglect:

Critical mid-conversation details are overlooked.

  • Unreliability spike:

Performance inconsistency increases by 112% compared to single-turn tasks. Even state-of-the-art models (GPT-4o, Claude 3.7, Gemini 2.5) show 30-40% degradation in extended dialogues. Technical tweaks like temperature reduction fail to resolve these issues, only upfront information delivery mitigates errors.

Goal Dilution in Subgoal Breakdown

When decomposing objectives into subgoals, agents frequently lose coherence. Salesforce identifies this as "jagged intelligence", inconsistent execution where agents excel in isolated tasks but fail in integrated workflows. For example:

  • Subgoals may conflict without centralized oversight.
  • Agents overlook interdependencies between subtasks.
  • Partial solutions compound errors across workflow stages.

Salesforce Research: Building Trustworthy Agents

Salesforce addresses these challenges through three pillars:

  1. Foundational Research
  • SIMPLE Benchmark: 225 reasoning questions quantifying LLM jaggedness.
  • SFR-Embedding Models: State-of-the-art text/code embeddings improving RAG accuracy.
  1. Guardrails and Testing
  • CRMArena: Simulates real CRM scenarios to evaluate agent reliability.
  • SFR-Guard Models: Enforce policy compliance and toxicity detection.
  1. Workflow Integration

Agents iteratively refined via customer feedback loops to ensure enterprise-grade consistency.


Automated Design Solutions

ADAS: Self-Improving Agents

Automated Design of Agentic Systems (ADAS) enables meta-agents to autonomously design, test, and refine specialized agents. Its iterative process:

  1. Generates candidate agents for a task.
  2. Simulates human-like feedback on correctness/efficiency.
  3. Refines code through debugging and optimization.

ADAS-discovered agents outperform manual designs and transfer seamlessly across models (e.g., GPT-3.5 to Claude). Crucially, they identify novel patterns like chained reasoning steps for complex problem-solving.

Google MASS: Optimizing Multi-Agent Systems

Multi-Agent System Search (MASS) is a three-stage framework optimizing prompts and topologies:

Stage

Function

Impact

Block-Level Prompt

Tunes individual agent instructions

Boosts local task aptitude

Topology Search

Identifies optimal agent connections

Reduces redundant interactions

Workflow-Level Tune

Refines prompts for global coordination

Enhances cross-agent collaboration

MASS outperforms prior frameworks (AFlow, ADAS) by 6-8% in reasoning, QA, and coding tasks. It reduces computational costs by focusing on high-impact design spaces and enables real-time adjustments for dynamic environments.


Conclusion

Agentic AI’s potential is tempered by communication fragility and goal-management flaws. Salesforce’s research provides critical tools for benchmarking and securing agents, while ADAS and MASS represent paradigm shifts in automated design. These innovations move us toward systems where agents collaboratively adapt to complexity, without sacrificing reliability. However, native multi-turn reliability remains essential for enterprise-scale adoption.


Relevant Sources and related Posts

Read also my previous related posts

Play Podcast

https://soundcloud.com/digital-age-switzerland/agentic-ai-advancements-challenges-in-communication-optimizatio?si=96e22a04caa7453a88393b0a3689f44b&utm\source=clipboard&utm\medium=text&utm\campaign=social\sharing