AI Agent Cost Optimization: Cut Spending by 60–80% (Proven Tactics)
7 PROVEN tactics to cut AI agent costs by 60–80%. Prompt caching, model routing, batch APIs, context compression & more. Real numbers inside.
Frequently Asked Questions
How much can you reduce AI agent costs?
Most teams can cut AI agent spending by 60–80% using a combination of prompt caching (saves 50–90% on repeated prefixes), model routing (uses cheaper models for 70% of tasks), and batch APIs (50% discount on non-urgent work). The exact savings depend on your traffic patterns and how much of your workload is routine vs. complex.
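The three tactics compound, since each discount applies to the spend left over after the previous one. A rough estimator, where every share and discount below is an illustrative assumption rather than a provider quote:

```python
def remaining_fraction(share: float, discount: float) -> float:
    """Fraction of spend left after applying `discount` to `share` of it."""
    return 1 - share * discount

# Illustrative workload: half of spend is cacheable input at 75% off,
# 70% of tasks route to a model ~90% cheaper, 30% of work takes the
# 50% batch discount.
factor = (remaining_fraction(0.5, 0.75)    # prompt caching
          * remaining_fraction(0.7, 0.9)   # model routing
          * remaining_fraction(0.3, 0.5))  # batch API

print(f"Remaining spend: {factor:.1%}")  # roughly 20% of baseline, i.e. ~80% savings
```

Under these assumptions the tactics stack to roughly 80% savings; with less cacheable traffic or fewer routable tasks, the same math lands nearer the 60% end of the range.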
What is prompt caching for AI agents?
Prompt caching stores the processed version of your system prompt so the LLM doesn't recompute it on every request. OpenAI, Anthropic, and Google all offer automatic or explicit caching that charges 50–90% less for cached input tokens. The key requirement is keeping your system prompt static — any change invalidates the cache. See our [prompt engineering guide](/blog/ai-agent-prompt-engineering/) for how to structure prompts for maximum cache hits.
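The economics are easy to model. Here is a back-of-the-envelope sketch, assuming an illustrative $3 per million input tokens and cached tokens billed at 10% of full price (real rates and cache-write surcharges vary by provider):

```python
def input_cost(system_tokens: int, user_tokens: int, requests: int,
               price_per_mtok: float = 3.00, cached_multiplier: float = 0.10,
               cache_enabled: bool = True) -> float:
    """Total input-token cost across `requests` calls sharing one system prompt.

    With caching, the first request pays full price to populate the cache;
    subsequent requests pay the discounted rate for the cached prefix.
    """
    if cache_enabled:
        prefix_tokens = system_tokens * (1 + (requests - 1) * cached_multiplier)
    else:
        prefix_tokens = system_tokens * requests
    total_tokens = prefix_tokens + user_tokens * requests
    return total_tokens / 1_000_000 * price_per_mtok

# 5,000-token system prompt, 500-token user messages, 1,000 requests:
print(input_cost(5_000, 500, 1_000, cache_enabled=False))  # $16.50
print(input_cost(5_000, 500, 1_000))                       # ~$3.01
```

At these assumed rates the static prefix drops from dominating the bill to a rounding error, which is why a single edit that invalidates the cache shows up immediately in spend.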
What is model routing and how does it save money?
Model routing directs each agent task to the cheapest model capable of handling it. Simple classification or formatting tasks go to small, fast models (GPT-4.1 Nano, Haiku), while complex reasoning goes to larger models (Claude Opus, GPT-5). In practice, routing roughly 70% of routine tasks to a cheaper model yields better ROI than running everything on the most expensive model.
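The simplest router is a rules-based dispatcher keyed on task type. The task labels, model names, and per-million-token prices below are illustrative assumptions, not real tiers:

```python
# Hypothetical routing table: routine task types go to the small model.
CHEAP_TASKS = {"classify", "extract", "format", "short_summary"}
PRICE_PER_MTOK = {"small-model": 0.10, "large-model": 5.00}  # illustrative

def route(task_type: str) -> str:
    """Pick the cheapest model assumed capable of the task type."""
    return "small-model" if task_type in CHEAP_TASKS else "large-model"

def task_cost(task_type: str, tokens: int) -> float:
    """Input cost for one task under the routing table above."""
    return tokens / 1_000_000 * PRICE_PER_MTOK[route(task_type)]

print(route("classify"))              # small-model
print(route("multi_step_reasoning"))  # large-model
```

Teams often graduate from static rules to an LLM-based classifier that does the routing itself, but a lookup table like this captures most of the savings with none of the added latency.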
Is it worth self-hosting an LLM to reduce AI agent costs?
Self-hosting makes sense at high volume (10,000+ daily requests) where API costs exceed infrastructure costs, or when privacy requirements rule out cloud APIs. For most teams, API-based optimization (caching, routing, batching) delivers 60–80% savings without the operational overhead of managing GPU infrastructure. For a self-hosted option, see [GoGogot](https://go-go-got.com) — a lightweight open-source AI agent that runs on a $5 VPS.
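You can sanity-check the break-even point with rough arithmetic. The infrastructure and per-request prices below are placeholders, not quotes:

```python
def break_even_daily_requests(api_cost_per_request: float,
                              monthly_infra_cost: float,
                              days_per_month: int = 30) -> float:
    """Daily request volume above which self-hosting beats the API on cost."""
    return monthly_infra_cost / (api_cost_per_request * days_per_month)

# e.g. a $600/month GPU server vs $0.002 per API request:
print(break_even_daily_requests(0.002, 600))  # 10000.0 requests/day
```

Note this ignores the engineering time to run GPU infrastructure, which usually pushes the real break-even point well above the raw number.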
How do I measure AI agent cost per task?
Track three metrics per agent task: total input tokens, total output tokens, and number of LLM calls. Multiply by your provider's per-token pricing to get cost per task. Tools like LangSmith, Helicone, and OpenRouter's dashboard make this automatic. Our [AI agent cost guide](/blog/ai-agent-cost/) breaks down real pricing across every major provider.
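The per-task math itself is simple multiplication. A minimal sketch, with placeholder prices expressed per million tokens:

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Dollar cost of one agent task from its token counts."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# e.g. 12,000 input + 800 output tokens at an assumed $3 / $15 per MTok:
print(cost_per_task(12_000, 800, 3.00, 15.00))  # 0.048
```

Multiply by number of LLM calls when an agent loops, and watch input tokens closely: multi-step agents re-send their growing context on every call, so input costs typically dwarf output costs.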