Prompt Caching for AI Agents: Save 70%+ on API Costs
Cut AI agent API costs by 70–90% with prompt caching. Real pricing data for OpenAI, Anthropic & Google. Code examples and proven patterns inside.
Frequently Asked Questions
What is prompt caching for AI agents?
Prompt caching stores the computed representation of your AI agent's system prompt so the LLM doesn't reprocess it on every request. Providers like Anthropic, OpenAI, and Google offer 50–90% discounts on cached input tokens. For agents with large system prompts, caching pays for itself on the second request: the first call pays a one-time cache-write cost, and every later hit reads at the discounted rate. See our [cost optimization guide](/blog/ai-agent-cost-optimization/) for more tactics.
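As a concrete illustration, here is a minimal sketch of what an Anthropic Messages API request body looks like with a cache breakpoint. The `cache_control` field follows Anthropic's documented parameter; the model name and prompt text are placeholders, and the body is built as a plain dict rather than sent through the SDK:

```python
# Sketch of an Anthropic Messages API request body with a cache breakpoint.
# The "cache_control" marker tells the API to cache everything up to and
# including that block, so the large system prompt is written to the cache
# once and read back at the discounted rate on subsequent requests.
LARGE_SYSTEM_PROMPT = "You are a support agent..."  # stands in for thousands of tokens

request_body = {
    "model": "claude-sonnet-4-20250514",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LARGE_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    "messages": [
        # Dynamic, per-request content goes AFTER the breakpoint so it
        # never invalidates the cached prefix.
        {"role": "user", "content": "Where is my order?"}
    ],
}
```

Everything before the breakpoint must be byte-identical across requests for the cache to hit, which is why the system prompt goes first and the user turn goes last.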
How much does prompt caching save on AI agent costs?
Real-world deployments report 41–80% total cost reduction from prompt caching alone. Anthropic charges 90% less for cached tokens, OpenAI gives 50–75% off, and Google Gemini charges 90% less for cached reads. The savings depend on your cache hit rate, which improves when you keep system prompts static and place dynamic content last.
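To see how hit rate drives the blended savings, here is a back-of-the-envelope calculator. The prices, prompt size, and hit rate below are illustrative assumptions, not quotes from any provider's price list; the 25% write premium mirrors Anthropic's cache-write surcharge:

```python
def effective_input_cost(tokens, base_price_per_mtok, hit_rate,
                         cached_discount, write_premium=0.0):
    """Blended per-request input cost for a cached system prompt.

    hit_rate: fraction of requests that hit the cache (0.0-1.0).
    cached_discount: 0.90 means cached tokens cost 10% of the base price.
    write_premium: extra cost paid on cache-miss writes (e.g. 0.25 = +25%).
    """
    base = tokens / 1_000_000 * base_price_per_mtok
    hit_cost = base * (1 - cached_discount)     # discounted cached read
    miss_cost = base * (1 + write_premium)      # full price plus write premium
    return hit_rate * hit_cost + (1 - hit_rate) * miss_cost

# Illustrative numbers: 8,000-token system prompt at $3/MTok,
# 95% cache hit rate, 90% cached-read discount, 25% write premium.
no_cache = 8_000 / 1_000_000 * 3.0                      # $0.024 per request
with_cache = effective_input_cost(8_000, 3.0, 0.95, 0.90, 0.25)
savings = 1 - with_cache / no_cache                     # ~0.84, i.e. ~84%
```

Under these assumptions the system-prompt portion of input cost drops by roughly 84%, which is why even a modest dip in hit rate (say, from rotating prompt text per request) noticeably erodes savings.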
Does prompt caching work automatically?
It depends on the provider. OpenAI caches automatically for any prompt over 1,024 tokens — no code changes needed. Anthropic requires explicit cache breakpoints via the API (or automatic mode with a single flag). Google Gemini offers both implicit (automatic) and explicit caching. Check our [prompt engineering guide](/blog/ai-agent-prompt-engineering/) for structuring prompts to maximize cache hits.
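Since OpenAI's automatic caching matches on exact prompt prefixes, the practical pattern is to keep the static prefix byte-identical across calls and append per-request data at the end. A minimal sketch, with placeholder prompt text:

```python
# Keep the static prefix identical on every call so the provider can
# cache it; put per-request data last so it never breaks the prefix match.
STATIC_SYSTEM = "You are a travel agent. Tools: ..."  # never changes between calls

def build_messages(user_query, user_context):
    return [
        {"role": "system", "content": STATIC_SYSTEM},  # cacheable prefix
        # Dynamic tail: changes every request, placed after the prefix.
        {"role": "user", "content": f"{user_context}\n\n{user_query}"},
    ]

a = build_messages("Book a flight", "Member tier: gold")
b = build_messages("Cancel my trip", "Member tier: silver")
# The shared system message is the part the provider can serve from cache.
assert a[0] == b[0]
```

The same ordering rule helps on every provider: injecting timestamps, user IDs, or retrieved documents into the system prompt itself turns every request into a cache miss.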
What is the minimum prompt size for caching?
Anthropic requires 1,024–4,096 tokens minimum depending on the model. OpenAI requires at least 1,024 tokens. Google Gemini requires 1,024 tokens for implicit caching and 32,768 tokens for explicit caching. Most AI agent system prompts (with tool definitions) easily exceed these thresholds.
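A quick way to sanity-check whether your prompt clears a threshold is a character-based estimate. The 4-characters-per-token figure is a rough heuristic for English text, not a provider-accurate count; use the provider's tokenizer (e.g. `tiktoken` for OpenAI) for real numbers:

```python
def rough_token_count(text):
    """Crude estimate: ~4 characters per token for English prose.
    Only for ballpark checks; use the provider's tokenizer for accuracy."""
    return len(text) // 4

# Stand-in for a real system prompt with tool definitions (~6,000 chars).
system_prompt = "You are an agent. " * 350
meets_openai_min = rough_token_count(system_prompt) >= 1_024   # OpenAI/implicit Gemini
meets_gemini_explicit = rough_token_count(system_prompt) >= 32_768
```

If a prompt falls just under a minimum, padding it to qualify rarely pays off; the thresholds exist because caching tiny prompts saves almost nothing.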
How long do cached prompts last?
Anthropic caches last 5 minutes by default (refreshed on each hit), with an optional 1-hour TTL at higher write cost. OpenAI caches persist for 5–10 minutes of inactivity and always clear within 1 hour. Google Gemini's explicit caches have a configurable TTL (default 1 hour) with per-hour storage fees.