How AI Agents Reason: ReAct, Chain-of-Thought & Planning Patterns
A complete guide to AI agent reasoning patterns: ReAct, Chain-of-Thought, Tree of Thoughts, ReWOO, and Reflexion — how each works, benchmark results, and when to use each one.
Frequently Asked Questions
What is the ReAct pattern in AI agents?
ReAct (Reason + Act) is an agent architecture that interleaves reasoning traces ("thought: I need to search for X") with tool actions and their observations, forming a thought-action-observation loop. Introduced at ICLR 2023 by Yao et al., it outperformed standalone Chain-of-Thought on HotPotQA, FEVER, and WebShop. It is the default agent pattern in LangChain and LangGraph.
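The thought-action-observation loop can be sketched in a few lines. This is a minimal illustration, not LangChain's or LangGraph's actual implementation: `fake_llm`, `search`, and the `Action: tool[input]` parsing convention are all stand-ins for a real model, tool set, and output format.

```python
def fake_llm(transcript):
    # Stub model: emits a search action first, then a final answer.
    # A real agent would send the transcript to an LLM here.
    if "Observation:" not in transcript:
        return ("Thought: I need to look up the capital of France.\n"
                "Action: search[capital of France]")
    return "Thought: The observation answers the question.\nFinal Answer: Paris"

def search(query):
    # Stand-in for a real search tool.
    return "Paris is the capital of France."

TOOLS = {"search": search}

def react(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = fake_llm(transcript)          # Reason: model emits thought + action
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        # Act: parse "Action: tool[input]" and run the tool.
        action = step.split("Action:")[1].strip()
        name, arg = action.split("[", 1)
        observation = TOOLS[name](arg.rstrip("]"))
        # Observe: feed the result back into the transcript.
        transcript += f"\nObservation: {observation}"
    return None

answer = react("What is the capital of France?")  # → "Paris"
```

The key property is that each tool result is appended to the transcript before the next model call, so the model can condition its next thought on what it just observed.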
What is Chain-of-Thought prompting for AI agents?
Chain-of-Thought (CoT) prompting elicits step-by-step reasoning from a language model by including explicit reasoning steps in the prompt or by simply appending "Let's think step by step" (zero-shot CoT, Kojima et al. 2022). CoT is the foundation all other reasoning patterns build on — it gives agents the ability to reason within a single step before committing to an action.
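Both CoT variants are pure prompt transformations. A minimal sketch, with an illustrative worked example in the few-shot prompt; a real setup would send these strings to a model:

```python
# Few-shot CoT: include a worked example with explicit reasoning steps.
FEW_SHOT_EXAMPLE = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def few_shot_cot(question):
    # Show worked reasoning, then pose the new question.
    return FEW_SHOT_EXAMPLE + f"Q: {question}\nA:"

def zero_shot_cot(question):
    # Zero-shot CoT (Kojima et al. 2022): just append the trigger phrase.
    return f"Q: {question}\nA: Let's think step by step."
```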
When should I use Tree of Thoughts instead of ReAct?
Use Tree of Thoughts for tasks with well-defined intermediate states that can be evaluated — mathematical puzzles, structured planning, logic problems. GPT-4 + ToT solved 74% of Game of 24 tasks vs. 4% with standard CoT. Avoid ToT for open-ended tasks where intermediate quality is hard to score, and be aware it costs 10–100x more tokens than CoT.
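The core of ToT is a search over partial solutions, pruned by a value function. The sketch below uses beam search with toy `propose` and `score` functions (trivial arithmetic toward a target sum) so it runs standalone; in real ToT both would be LLM calls that generate and evaluate candidate thoughts.

```python
def propose(state):
    # Stand-in for "generate next thoughts": extend the state with one digit.
    return [state + [d] for d in range(3)]

def score(state):
    # Stand-in value function: partial solutions closer to a sum of 4 score higher.
    return -abs(sum(state) - 4)

def tree_of_thoughts(depth=3, beam=2):
    frontier = [[]]  # start from the empty partial solution
    for _ in range(depth):
        # Expand every frontier state, then keep only the top-`beam` candidates.
        candidates = [s for state in frontier for s in propose(state)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

best = tree_of_thoughts()  # a length-3 state whose digits sum to 4
```

The token cost cited above comes directly from this structure: every node expansion and every evaluation is a separate model call, so cost grows with branching factor times depth.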
What is Reflexion in AI agents?
Reflexion (Shinn et al., NeurIPS 2023) lets agents improve through verbal self-feedback without retraining. After a failed attempt, the agent writes a natural-language reflection on what went wrong, stores it in episodic memory, and uses it on the next try. It achieved 91% pass@1 on HumanEval coding benchmarks — surpassing GPT-4's prior SOTA of 80%.
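The try → test → reflect → retry loop can be sketched as follows. `attempt` and `reflect` are deliberately simple stubs standing in for LLM calls (the stub "agent" fails until a reflection about its off-by-one bug lands in memory), so the loop is runnable end to end:

```python
def attempt(task, memory):
    # Stub coding agent: returns a correct solution only once the relevant
    # reflection is in episodic memory; otherwise returns a buggy one.
    if any("off-by-one" in note for note in memory):
        return lambda n: list(range(1, n + 1))   # correct
    return lambda n: list(range(n))              # buggy first try

def run_tests(solution):
    # Self-evaluation signal: did the candidate pass the unit test?
    return solution(3) == [1, 2, 3]

def reflect(task):
    # Stub verbal self-feedback; a real agent would generate this with an LLM.
    return "Last attempt had an off-by-one error: start the range at 1."

def reflexion(task, max_trials=3):
    memory = []  # episodic memory of reflections, kept across trials
    for _ in range(max_trials):
        solution = attempt(task, memory)
        if run_tests(solution):
            return solution
        memory.append(reflect(task))  # learn from failure, no weight updates
    return None

solver = reflexion("return the first n positive integers")
```

Note that no model weights change between trials; all learning lives in the natural-language memory passed back into the next attempt.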
Which reasoning pattern is most token-efficient?
ReWOO (Xu et al. 2023) is the most token-efficient reasoning pattern for multi-step tasks. By separating the planning phase from execution, it reduces token usage by ~5x vs. ReAct while matching or improving accuracy. On HotPotQA it achieved 42.4% accuracy using ~2,000 tokens vs. ReAct's 40.8% at ~10,000 tokens.
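The plan-then-execute separation can be sketched like this. The `#E` placeholder convention follows the ReWOO paper; the fixed plan and toy tools below are illustrative stand-ins for a single planner LLM call and real tool workers:

```python
def plan(question):
    # In real ReWOO this is ONE planner call that emits the whole plan up
    # front, with #E variables referencing evidence from earlier steps.
    return [
        ("#E1", "search", "birthplace of Marie Curie"),
        ("#E2", "lookup_country", "#E1"),
    ]

TOOLS = {
    # Toy workers standing in for real tools.
    "search": lambda q: "Warsaw",
    "lookup_country": lambda city: {"Warsaw": "Poland"}.get(city, "unknown"),
}

def execute(steps):
    # Workers run the plan with no further LLM calls, substituting earlier
    # evidence into later arguments.
    evidence = {}
    for var, tool, arg in steps:
        arg = evidence.get(arg, arg)
        evidence[var] = TOOLS[tool](arg)
    return evidence

def solve(question, evidence):
    # A final solver call would combine all evidence; here we return the last.
    return evidence["#E2"]

question = "What country was Marie Curie born in?"
answer = solve(question, execute(plan(question)))  # → "Poland"
```

The token savings come from this structure: ReAct re-sends the growing transcript on every step, while ReWOO pays for one planning call plus one solving call regardless of how many tools run in between.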