How to reduce tokens used by AI agents and improve output quality!

I would like to share some useful tricks I have learned today.

1. Manage the context window well

The context window is the current chat session with Copilot agents. It includes your prompts, system instructions, and the conversation history. The longer the conversation, the more tokens each request costs, because the agent carries the past context along even when it is no longer relevant.
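To make the cost concrete, here is a minimal sketch of the arithmetic. It assumes a simplified model in which every turn re-sends the whole history as input and it ignores model replies and any caching; the 500-token turn size is made up for illustration.

```python
def cumulative_input_tokens(turn_sizes):
    """Total input tokens when every turn re-sends the full history (simplified assumption)."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new message joins the history
        total += history  # the whole history is sent as input on this turn
    return total


# Ten turns of 500 tokens each, all in one long session
one_session = cumulative_input_tokens([500] * 10)

# The same ten turns split into two fresh sessions of five turns each
two_sessions = 2 * cumulative_input_tokens([500] * 5)

print(one_session, two_sessions)
```

Under these assumptions the single long session sends 27500 input tokens and the two shorter sessions send 15000 in total, which is why starting fresh pays off once old context stops being useful.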

2. Choose the right model for the right task

High-reasoning models like Opus and Codex are great for planning, architecture, and handling complex bugs.

Second-tier models like Sonnet are great for implementation based on the plan from the previous steps.

Lower-tier models like Haiku are enough for small refactoring.

--> An important thing to note: because agents load the conversation history, low accuracy at the beginning compounds in every later step. That's why the strongest models are worth using in the early stages, such as research and planning, where output quality matters most.
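The tiers above can be written down as a small lookup table. This is only a sketch: the task names and the lowercase tier labels are my own placeholders, not real model IDs or any Copilot setting.

```python
# Hypothetical mapping from task type to model tier (labels are placeholders, not API model IDs)
MODEL_FOR_TASK = {
    "research": "opus",
    "planning": "opus",
    "architecture": "opus",
    "complex_bug": "opus",
    "implementation": "sonnet",
    "small_refactor": "haiku",
}


def pick_model(task_type: str) -> str:
    """Return the tier for a task, defaulting to the strongest one.

    The default is deliberate: an early mistake compounds, so an
    unknown task is treated as if it needed the best reasoning.
    """
    return MODEL_FOR_TASK.get(task_type, "opus")


print(pick_model("planning"), pick_model("small_refactor"))
```

Defaulting to the strongest tier costs more tokens on unknown tasks, but it matches the point above that a weak start is the expensive mistake.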

3. Prompt skill

Prompt skill also plays an important role. Be precise, cut unnecessary context, and provide only as much context as the task requires.

Add stop signals. For example, tell the agent to stop working once a certain goal is achieved, because there may be irrelevant instructions somewhere in the workflow, such as in the system instructions or a previous conversation, that would keep it going.

4. Divide big tasks into different phases

Research -> Plan -> Implementation

-> If you do everything in one session, it carries irrelevant context along. For example, in the research phase the agent loads a bunch of files, but many of them may not be needed in the implementation phase. This wastes tokens.

What to do: start a new context window between phases. This eliminates irrelevant context, so it saves tokens and improves output quality at the same time. What I currently do is manually copy and paste each step into a new window, or add all the steps to a plan.md and then tell Copilot to refer to it and implement. Maybe I can learn a better way in the future!
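The plan.md handoff can be sketched in a few lines. The function names, the checkbox layout, and the prompt wording are all my own assumptions. The prompt also carries a stop signal, as suggested in tip 3.

```python
from pathlib import Path


def render_plan(goal, steps):
    """Render the plan as a numbered checklist (layout is an assumption)."""
    lines = [f"# Plan: {goal}", ""]
    lines += [f"{i}. [ ] {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


def save_plan(path, goal, steps):
    """Write the plan to disk so a fresh session can read it."""
    Path(path).write_text(render_plan(goal, steps), encoding="utf-8")


def handoff_prompt(plan_path, step_number):
    """Short prompt for a new session: one step only, with an explicit stop signal."""
    return (
        f"Read {plan_path} and implement step {step_number} only. "
        "Stop when that step is done and its tests pass."
    )


print(render_plan("Add login", ["Create route", "Add tests"]))
print(handoff_prompt("plan.md", 1))
```

The new session then starts with only the plan and one short prompt, instead of every file that was opened during research.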

5. Add a lot of tests

Tests are a good thing to include in the agent's checklist. They help bring the agent back on track when a test fails somewhere.
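As a small illustration, here is a sketch of tests acting as a checklist. `slugify` is a made-up function standing in for whatever the agent was asked to implement, and `run_checks` is a tiny hand-rolled runner used here in place of a real test framework.

```python
def slugify(title: str) -> str:
    """Hypothetical function the agent was asked to implement."""
    return "-".join(title.lower().split())


def test_lowercases_and_joins():
    assert slugify("Hello World") == "hello-world"


def test_collapses_whitespace():
    assert slugify("  a   b ") == "a-b"


def run_checks(checks):
    """Run each check and return the names of the ones that failed."""
    failed = []
    for check in checks:
        try:
            check()
        except AssertionError:
            failed.append(check.__name__)
    return failed


failed = run_checks([test_lowercases_and_joins, test_collapses_whitespace])
print("all checks passed" if not failed else f"fix these first: {failed}")
```

A list of failed test names is a short, precise message to hand back to the agent, much cheaper than describing the problem in prose.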

6. Keep the instructions and skills up to date

Maintain a concise, human-written copilot-instructions.md

Only add skills for capabilities the agent wouldn't otherwise have. A React.js skill, for example, is probably unnecessary because the agent may already know it.

7. Be careful with MCP

MCP servers may burn a lot of tokens because they can load unnecessary context

8. Use output-compression tools

Like https://github.com/rtk-ai/rtk, which compresses command outputs (I haven't tried it myself, but you can try :)
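To show the general idea, here is a minimal sketch that keeps only the first and last lines of a long command output. This is not how rtk works (I haven't looked at its internals); the function name and the default line counts are my own assumptions.

```python
def compress_output(text: str, head: int = 5, tail: int = 5) -> str:
    """Keep the first `head` and last `tail` lines and summarize the rest."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # already short enough, leave it untouched
    omitted = len(lines) - head - tail
    kept = (
        lines[:head]
        + [f"... ({omitted} lines omitted) ..."]
        + lines[len(lines) - tail:]  # explicit start index so tail=0 keeps nothing
    )
    return "\n".join(kept)


long_output = "\n".join(str(i) for i in range(100))
print(compress_output(long_output, head=2, tail=2))
```

Head and tail are a crude heuristic. Build and test logs usually put the command at the top and the error summary at the bottom, but anything important in the middle would be lost.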

9. Run "/chronicle:tips"

Run the command regularly to analyze your Copilot sessions and find improvements

Key skills for the future

- Analytical skills: the true value is not in coding itself, but in analytical skill and the ability to learn any domain quickly

- Understand architecture: DDD, Hexagonal, CQRS, Event-Driven Design

- Iteration on prompts and agent configs: improve instructions over time (as context engineering), use /chronicle:tips

Hung Hoang
