How to reduce the tokens used by AI agents and improve output quality


I would like to share some useful tricks I have learned today. 

1. Manage the context window well

The context window is everything the agent sees in the current chat session with Copilot: your prompts, the system instructions, and the conversation history. The longer the conversation, the more tokens each turn costs, because the agent carries the earlier context along even when it is no longer relevant.
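To make the cost growth concrete, here is a minimal sketch. It assumes that every request resends the full history and that one token is roughly four characters; both are simplifications, and real agents may cache or summarize history. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def cumulative_input_tokens(turns: list[str]) -> int:
    """Total input tokens if every turn resends all earlier turns."""
    total = 0
    history_tokens = 0
    for turn in turns:
        history_tokens += estimate_tokens(turn)
        total += history_tokens  # each request carries the whole history so far
    return total


turns = ["x" * 400] * 10  # ten turns of roughly 100 tokens each
one_long_session = cumulative_input_tokens(turns)
two_short_sessions = cumulative_input_tokens(turns[:5]) * 2
print(one_long_session, two_short_sessions)
```

Under these assumptions, the same ten turns cost noticeably less when split into two fresh sessions, because the second session does not drag the first five turns along.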

2. Choose the right model for the right task

High-reasoning models like Opus and Codex are great for planning, architecture, and handling complex bugs.

Second-tier models like Sonnet are great for implementation based on the plan from the previous steps.

Lower-tier models like Haiku are good for small refactorings.

--> An important thing to note: because agents load the conversation history, low accuracy at the beginning compounds in later steps. That's why the strongest models are worth using in the early stages, such as research and planning, where output quality matters most.
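The tiering above can be written down as a simple lookup. This is only a sketch: the task labels and the lowercase model names are hypothetical placeholders, not identifiers from any real API, and falling back to the strongest tier for unknown tasks is my own assumption.

```python
# Hypothetical mapping from task type to model tier, following the tiers above.
MODEL_BY_TASK = {
    "planning": "opus",
    "architecture": "opus",
    "complex_bug": "opus",
    "implementation": "sonnet",
    "small_refactor": "haiku",
}


def pick_model(task_type: str) -> str:
    """Return the model tier for a task; unknown tasks get the strongest tier."""
    return MODEL_BY_TASK.get(task_type, "opus")


for task in ("planning", "implementation", "small_refactor", "unknown_task"):
    print(task, "->", pick_model(task))
```

Defaulting to the strongest model is the cautious choice here, since an early mistake is more expensive than a few extra tokens.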

3. Prompt skill

Prompt skill also plays an important role. Be precise, cut unnecessary context, and provide only as much context as the task requires.

Add stop signals: for example, tell the agent to stop working once a certain goal is achieved, because there may be irrelevant instructions elsewhere in the workflow, such as in the system instructions or the previous conversation.
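One way to make this a habit is a small prompt template that always names the goal, the allowed context, and the stop condition. The sketch below is illustrative only; the function name, the file paths, and the wording of the prompt are all assumptions.

```python
def build_prompt(goal: str, context_files: list[str], stop_condition: str) -> str:
    """Build a short prompt with explicit context and an explicit stop signal."""
    lines = [
        f"Goal: {goal}",
        "Only use these files as context:",
        *[f"- {path}" for path in context_files],
        f"Stop as soon as: {stop_condition}",
        "Do not make any changes beyond this goal.",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    goal="Fix the null check in the order validator",
    context_files=["src/orders/validator.py", "tests/test_validator.py"],  # hypothetical paths
    stop_condition="all tests in tests/test_validator.py pass",
)
print(prompt)
```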

4. Divide big tasks into different phases 

Research -> Plan -> Implementation

-> If you do everything in one session, it will carry irrelevant context. For example, in the research phase the agent loads a bunch of files, but many of them may not be needed in the implementation phase. This leads to wasted tokens.

What to do: start a new context window between phases. This eliminates irrelevant context, so it can save tokens and improve output quality at the same time. What I currently do is manually copy and paste each step into a new window, or add all the steps to a plan.md and then tell Copilot to refer to it and implement. Maybe I can learn a better way in the future!
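The plan.md handoff can be sketched in a few lines. This does not call Copilot; it only shows the idea of writing the plan to a file and starting the next phase from a short prompt that points at that file. The helper names and the example steps are hypothetical.

```python
import tempfile
from pathlib import Path


def save_plan(steps: list[str], plan_path: Path) -> None:
    """Write the plan as a numbered list so a fresh session can follow it."""
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    plan_path.write_text("\n".join(numbered) + "\n", encoding="utf-8")


def implementation_prompt(plan_path: Path) -> str:
    """First prompt of the new session: it refers to the plan, not the old chat."""
    return (
        f"Read {plan_path.name} and implement the steps in order. "
        "Stop when the last step is done."
    )


with tempfile.TemporaryDirectory() as workdir:
    plan_path = Path(workdir) / "plan.md"
    save_plan(["Add the Order entity", "Add the repository", "Write tests"], plan_path)
    plan_text = plan_path.read_text(encoding="utf-8")
    prompt = implementation_prompt(plan_path)

print(plan_text)
print(prompt)
```

The new session then starts with only the plan and one short prompt, instead of the whole research conversation.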

5. Add a lot of tests 

Tests are a good addition to the agent's checklist. They help bring the agent back on track when a test fails somewhere.

6. Keep the instructions and skills up to date 

Maintain a concise, human-written copilot-instructions.md.

Only add skills for capabilities the agent wouldn't already have. For example, a React.js skill is probably unnecessary, because the agent may already know it.

7. Be careful with MCP 

MCP servers may burn a lot of tokens, since they may load unnecessary context.

8. Use output compression tools

For example, https://github.com/rtk-ai/rtk compresses command outputs (I haven't tried it myself, but you can try :)
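To show the general idea of output compression, here is a minimal sketch that keeps only the first and last lines of a long command output. This is not how rtk works (I have not checked its internals); it is a generic truncation approach with a hypothetical function name.

```python
def compress_output(output: str, head: int = 5, tail: int = 5) -> str:
    """Keep the first `head` and last `tail` lines and summarize the rest."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output  # short outputs are passed through unchanged
    omitted = len(lines) - head - tail
    marker = f"... ({omitted} lines omitted) ..."
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


log = "\n".join(f"line {i}" for i in range(100))
short = compress_output(log)
print(short)
```

The start of a command output usually shows what ran and the end usually shows the result or the error, so the middle is often the cheapest part to drop.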

9. Run "/chronicle:tips" 

Run the command regularly to analyze your recent Copilot sessions and find improvements.


Key skills for the future

- Analytical skills: coding itself is not the true value; what matters is analytical skill and the ability to learn any domain quickly

- Understanding architecture: DDD, Hexagonal, CQRS, Event-Driven Design

- Iterating on prompts and agent configs: improve instructions over time (as context engineering), use /chronicle:tips
