State-of-the-art lossless text compression using neural language models. Achieves 2-3.5x better compression than gzip, xz, and zip on English prose (see benchmarks below).
Nacrith combines the predictive power of neural networks with the mathematical precision of arithmetic coding
Powered by SmolLM2-135M, capturing grammar, semantics, and world knowledge for superior compression
Mathematically optimal encoding that assigns shorter bit sequences to likely tokens
Compresses below classical entropy bounds by understanding deep linguistic structure
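To make the arithmetic-coding idea concrete, here is a minimal, dependency-free sketch (a toy model, not Nacrith's actual coder) that narrows the coding interval symbol by symbol and reports the ideal code length, -log2 of the final interval width: likely symbols shrink the interval only slightly and therefore cost only a fraction of a bit.

```python
from math import log2

def ideal_code_length(symbols, prob_model):
    """Narrow the arithmetic-coding interval [low, high) for each symbol and
    return the ideal code length in bits, -log2 of the final interval width.
    prob_model(prefix) maps each possible next symbol to its probability."""
    low, high = 0.0, 1.0
    for i, sym in enumerate(symbols):
        probs = prob_model(symbols[:i])
        width = high - low
        cursor = low
        # Lay the candidate symbols out on [low, high) in a fixed order and
        # keep the sub-interval of the symbol that actually occurred.
        for s in sorted(probs):
            if s == sym:
                low, high = cursor, cursor + width * probs[s]
                break
            cursor += width * probs[s]
    return -log2(high - low)

def toy(prefix):
    # Toy distribution: 'e' is likely, 'z' is rare. Nacrith would instead ask
    # the language model for the next-token distribution given the prefix.
    return {"e": 0.7, "t": 0.2, "z": 0.1}

print(ideal_code_length(["e"], toy))            # ~0.51 bits for a likely symbol
print(ideal_code_length(["z"], toy))            # ~3.32 bits for a rare symbol
print(ideal_code_length(["e", "e", "t"], toy))  # ~3.35 bits for three symbols
```

A real encoder additionally has to emit a binary number that lies inside the final interval and manage finite precision; the sketch only shows why better predictions translate directly into fewer bits.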
Tested on English prose of varying sizes. GPU: NVIDIA GTX 1050 Ti
| Sample | Original | gzip | xz | zip | Nacrith |
|---|---|---|---|---|---|
| small | 3.0 KB | 1.4 KB (46.8%) | 1.5 KB (50.2%) | 1.5 KB (49.9%) | 424 B (13.7%) |
| medium | 50.1 KB | 19.6 KB (39.2%) | 18.3 KB (36.6%) | 19.7 KB (39.3%) | 7.4 KB (14.8%) |
| large | 100.5 KB | 39.0 KB (38.9%) | 35.5 KB (35.3%) | 39.1 KB (38.9%) | 15.5 KB (15.4%) |



Benchmarks were run on a low-end NVIDIA GTX 1050 Ti — with a modern GPU, compression and decompression would be significantly faster.
The model uses ~1.3 GB of VRAM during compression/decompression, so any CUDA-capable GPU with at least 2 GB of VRAM will work. Falls back to CPU if no GPU is available.
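Illustratively, that fallback can be as simple as the following sketch (not necessarily the project's exact code; the 2 GB threshold mirrors the requirement above):

```python
import torch

def pick_device(min_vram_bytes: int = 2 * 1024**3) -> torch.device:
    """Prefer a CUDA GPU with at least ~2 GB of VRAM, otherwise use the CPU."""
    if torch.cuda.is_available():
        total_vram = torch.cuda.get_device_properties(0).total_memory
        if total_vram >= min_vram_bytes:
            return torch.device("cuda")
    return torch.device("cpu")

print(pick_device())  # cuda on a suitable GPU, cpu otherwise
```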
The deep connection between prediction and compression
Traditional compressors (gzip, xz, zip): pattern matching on raw bytes within a sliding window. They exploit only local, literal repetitions.
Neural language model: captures semantic and syntactic structure. It understands that after "The President of the United", "States" is extremely likely, even without any recent repetition.
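That claim is easy to check with the transformers library. The sketch below (assuming the SmolLM2-135M checkpoint is the one published on the Hugging Face Hub as HuggingFaceTB/SmolLM2-135M) prints the model's probability for " States" after "The President of the United", together with the ideal code length -log2 p an arithmetic coder would spend on that token.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"  # assumed Hub id for SmolLM2-135M
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

prompt = "The President of the United"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Probability of the first token of " States" as the continuation.
states_id = tokenizer.encode(" States", add_special_tokens=False)[0]
p = next_token_probs[states_id].item()
print(f"P(' States' | prompt) = {p:.3f}")
print(f"ideal code length     = {-math.log2(p):.2f} bits")
```

The less the model is surprised by the next token, the fewer bits the arithmetic coder needs; summing -log2 p over every token of a document gives, up to a few bits of overhead, its compressed size.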
Nacrith compresses well below the classical Shannon entropy bounds
| Method | Size | bits/byte |
|---|---|---|
| Original | 100.5 KB | 8.0000 |
| Shannon 0th-order limit | 59.5 KB | 4.7398 |
| Shannon 1st-order limit | 44.2 KB | 3.5213 |
| Shannon 2nd-order limit | 34.4 KB | 2.7373 |
| gzip -9 | 39.0 KB | 3.1082 |
| xz -9 | 35.5 KB | 2.8257 |
| Nacrith | 15.5 KB | 1.2355 |
Nacrith achieves 1.24 bits/byte, 74% below the 0th-order Shannon limit and 55% below the 2nd-order limit. Those limits only account for short-range byte statistics, so a model that exploits long-range linguistic structure can compress well past them. This is state-of-the-art compression performance on English prose.
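For reference, the "Shannon limit" rows are k-th order entropy estimates of the file's byte stream; a short sketch like the one below reproduces that calculation (the benchmark filename is hypothetical).

```python
from collections import Counter
from math import log2

def order_k_entropy(data: bytes, k: int) -> float:
    """Estimate the k-th order Shannon entropy of `data` in bits per byte:
    the average uncertainty of each byte given the k preceding bytes."""
    context_counts = Counter()   # counts of k-byte contexts
    joint_counts = Counter()     # counts of (context, next byte) pairs
    for i in range(k, len(data)):
        ctx = data[i - k:i]
        context_counts[ctx] += 1
        joint_counts[(ctx, data[i])] += 1
    total = sum(joint_counts.values())
    entropy = 0.0
    for (ctx, _), n in joint_counts.items():
        p_joint = n / total               # p(context, symbol)
        p_cond = n / context_counts[ctx]  # p(symbol | context)
        entropy -= p_joint * log2(p_cond)
    return entropy

if __name__ == "__main__":
    text = open("large.txt", "rb").read()  # hypothetical benchmark file
    for k in range(3):
        print(f"order-{k} limit: {order_k_entropy(text, k):.4f} bits/byte")
```

Run on the 100.5 KB benchmark file, the order-0/1/2 values should roughly match the 4.74, 3.52, and 2.74 bits/byte rows in the table above.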
Quick installation and usage guide
```bash
# Clone the repository
git clone https://github.com/st4ck/nacrith.git
cd nacrith

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install torch transformers accelerate pytest
```

```bash
# Compress a file
python cli.py compress input.txt output.nc

# Decompress a file
python cli.py decompress output.nc restored.txt

# Run benchmarks
python benchmark.py
```

Experience state-of-the-art compression that truly understands your text