abax
stats and token estimates for your repos
03-04-2025 / [post]

Coding with large language models is changing the world (I'm not being hyperbolic, actually!), particularly as context windows continue to grow. Recently, I was working on an especially large codebase in Cursor and wanted to better understand its size, along with a rough estimate of how many tokens I'd be using if I did something like QA over the whole thing.
But there wasn't an easy way to do that. So I made Abax, a simple shell script for statistics about your codebase, including line counts, character counts, and (again, very rough) token estimations.
Sidenote: Abax (ἄβαξ) is the ancient Greek word from which the Latin abacus is derived. The first record of an abacus may have been in Sumer between 2700 and 2300 BCE. It consisted of a table of columns which delimited the orders of magnitude of the Sumerian sexagesimal (base 60) number system (wiki).
Anyway, Abax gives you detailed stats about your repo:
- Line and character counts by file type (customizable exclusions)
- Rough token estimations (customizable ratio)
- Automatic handling of binary files
Here's how to try it:
# Download the script
curl -o abax.sh https://raw.githubusercontent.com/mohamm-ad/abax/main/abax.sh
# Make it executable
chmod +x abax.sh
# Run it from the root of your repo
./abax.sh
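To give a feel for what a "very rough" token estimate means, here is a minimal sketch of the general idea: count the bytes in a file and divide by a characters-per-token ratio. This is not Abax's actual code. The function names `count_chars` and `estimate_tokens` are hypothetical, and the default ratio of 4 is only a common rule of thumb for English text, so treat it as an assumption and adjust it for your own content.

```shell
#!/bin/sh
# Illustrative sketch only; not taken from abax.sh.

# count_chars FILE
# Prints the size of FILE in bytes. wc -c counts bytes, which equals the
# character count only for single-byte encodings such as plain ASCII.
count_chars() {
    wc -c < "$1" | tr -d ' '
}

# estimate_tokens CHARS [CHARS_PER_TOKEN]
# Prints a rough token estimate. The default ratio of 4 is an assumed
# rule of thumb, not a property of any particular tokenizer.
estimate_tokens() {
    chars=$1
    ratio=${2:-4}
    # Integer division, rounded up so a tiny non-empty file is not
    # reported as 0 tokens.
    echo $(( (chars + ratio - 1) / ratio ))
}

# Example (assumes a file named README.md exists):
#   estimate_tokens "$(count_chars README.md)"
```

Real tokenizers vary by model and by content (code usually tokenizes differently from prose), so a number like this is best for order-of-magnitude checks, such as whether a repo could plausibly fit in a context window.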
Feel free to raise issues or fork the repo here.