Caching
The FileCacher provides SQLite-based response caching optimized for crawler workloads. It stores HTTP responses as BLOBs in a single database with namespace-based separation, optional zstd compression (Node.js 22.15+), and LRU eviction to cap disk and memory usage.
```ts
import { FileCacher } from 'rezo/crawler';
```

Architecture
- Single SQLite database — All domains share one database with `namespace` as a column, preventing file descriptor exhaustion when crawling many domains
- WAL journal mode — Write-Ahead Logging enables non-blocking reads during writes
- BLOB storage — Responses stored as raw binary, avoiding the 33% overhead of base64 encoding
- WITHOUT ROWID table — Primary key is `(namespace, key)` for fast composite lookups
- Memory-mapped I/O — 128MB mmap for fast reads on supported systems
- LRU eviction — Oldest entries evicted when `maxEntries` is exceeded
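The 33% figure for base64 comes directly from its encoding ratio (4 output characters per 3 input bytes). A standalone Node.js snippet, separate from the FileCacher API, makes this concrete:

```ts
// Standalone illustration (not part of the FileCacher API): why raw BLOB
// storage avoids the ~33% size overhead of base64-encoding responses.
const raw = Buffer.alloc(300, 0xab);    // 300 bytes of binary response data
const encoded = raw.toString('base64'); // base64 emits 4 chars per 3 bytes

console.log(raw.length);     // 300
console.log(encoded.length); // 400 -> 33% larger
```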
Creating a FileCacher
```ts
const cache = await FileCacher.create({
  cacheDir: '/tmp/my-crawler/cache', // Database directory (default: '/tmp/rezo-crawler/cache')
  dbFileName: 'cache.db',            // Database filename (default: 'cache.db')
  ttl: 86400000,                     // Default TTL in ms (default: 7 days)
  compression: true,                 // Enable zstd compression (default: false)
  maxEntries: 100000                 // Max entries before LRU eviction (default: 100,000)
});
```

Compression
When `compression: true` is set, responses are compressed with zstd before storage. This requires Node.js 22.15+, which ships native `zlib.zstdCompressSync()` and `zlib.zstdDecompressSync()`. On older Node.js versions, compression is silently disabled and data is stored uncompressed.
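The fallback behavior can be sketched with a runtime feature check — this is a minimal illustration of the pattern described above, not FileCacher's actual internals (the `maybeCompress`/`maybeDecompress` names are made up for this example):

```ts
import * as zlib from 'node:zlib';

// Detect native zstd support (shipped in Node.js 22.15+); on older
// runtimes, fall back to storing the bytes uncompressed.
const hasZstd = typeof (zlib as any).zstdCompressSync === 'function';

function maybeCompress(data: Buffer): Buffer {
  return hasZstd ? (zlib as any).zstdCompressSync(data) : data;
}

function maybeDecompress(data: Buffer): Buffer {
  return hasZstd ? (zlib as any).zstdDecompressSync(data) : data;
}

const original = Buffer.from(JSON.stringify({ html: '<html>...</html>', status: 200 }));
const roundTrip = maybeDecompress(maybeCompress(original));
console.log(roundTrip.equals(original)); // true on any Node.js version
```

Either path round-trips losslessly, which is why the version difference can stay invisible to callers.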
Storing Responses
set()
Store a single value with optional TTL and namespace:
```ts
// Store a response
await cache.set('https://example.com/page1', {
  html: '<html>...</html>',
  status: 200,
  headers: { 'content-type': 'text/html' }
});

// Store with custom TTL (1 hour)
await cache.set('https://example.com/page1', responseData, 3600000);

// Store in a domain namespace
await cache.set('https://example.com/page1', responseData, undefined, 'example.com');
```

Values are JSON-serialized before storage, so any JSON-compatible type works.
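Since values pass through JSON serialization, non-JSON types are transformed on the way in. A standalone snippet (independent of the cache itself) shows the edge cases to watch for:

```ts
// Standalone illustration of the JSON round-trip that stored values undergo.
const value = {
  status: 200,
  fetchedAt: new Date('2024-01-01T00:00:00Z'), // Dates survive only as ISO strings
  body: undefined                              // undefined properties are dropped
};

const roundTrip = JSON.parse(JSON.stringify(value));
console.log(typeof roundTrip.fetchedAt); // 'string'
console.log('body' in roundTrip);        // false
console.log(roundTrip.status);           // 200
```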
setMany()
Store multiple entries in a single transaction for better throughput:
```ts
await cache.setMany([
  { key: 'https://example.com/page1', value: response1 },
  { key: 'https://example.com/page2', value: response2, ttl: 3600000 },
  { key: 'https://example.com/page3', value: response3 }
], 'example.com');
```

Retrieving Cached Data
get()
Retrieve a cached value. Returns `null` if the key does not exist or has expired:
```ts
const data = await cache.get<{ html: string; status: number }>(
  'https://example.com/page1',
  'example.com'
);

if (data) {
  console.log(data.html);
  console.log(data.status);
} else {
  // Not cached or expired
}
```

Expired entries are automatically deleted on access.
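The expire-on-read behavior can be sketched with a plain in-memory map — a minimal model of the pattern described above, not FileCacher's actual implementation:

```ts
// Minimal in-memory sketch of lazy TTL expiry: entries are checked and
// deleted when read, rather than by a background timer.
interface Entry { value: unknown; expiresAt: number }

const store = new Map<string, Entry>();

function set(key: string, value: unknown, ttl: number): void {
  store.set(key, { value, expiresAt: Date.now() + ttl });
}

function get(key: string): unknown | null {
  const entry = store.get(key);
  if (!entry) return null;
  if (Date.now() >= entry.expiresAt) {
    store.delete(key); // expired entry removed on access
    return null;
  }
  return entry.value;
}

set('fresh', 'hit', 60000);
set('stale', 'miss', -1); // already expired

console.log(get('fresh'));       // 'hit'
console.log(get('stale'));       // null
console.log(store.has('stale')); // false -> deleted on access
```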
has()
Check if a key exists and is not expired without retrieving the full value:
```ts
const exists = await cache.has('https://example.com/page1', 'example.com');
```

hasMany()
Batch check multiple keys:
```ts
const cachedUrls = await cache.hasMany(
  ['https://example.com/page1', 'https://example.com/page2', 'https://example.com/page3'],
  'example.com'
);
// Returns Set<string> of keys that exist and are not expired
```

Deletion and Cleanup
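A typical use of the returned `Set` is pruning a frontier of URLs before fetching. The snippet below is self-contained: the `cached` set is hand-built here to stand in for a `hasMany()` result.

```ts
// Standalone sketch: filter a candidate list down to URLs that still
// need fetching, using a Set like the one hasMany() returns.
const candidates = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3'
];
const cached = new Set(['https://example.com/page1', 'https://example.com/page3']);

const toFetch = candidates.filter((url) => !cached.has(url));
console.log(toFetch); // ['https://example.com/page2']
```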
delete()
Remove a single entry:
```ts
await cache.delete('https://example.com/page1', 'example.com');
```

clear()
Clear entries by namespace or all entries:
```ts
// Clear one namespace
await cache.clear('example.com');

// Clear everything
await cache.clear();
```

cleanup()
Remove all expired entries across all namespaces:
```ts
const removed = await cache.cleanup();
console.log(`Removed ${removed} expired entries`);
```

LRU Eviction
When maxEntries is configured and the entry count exceeds the limit, the cacher performs LRU eviction:
- First, all expired entries are deleted
- If still over the limit, the oldest 10% of entries (by `createdAt`) are removed
- Eviction runs asynchronously on the next event loop tick to avoid blocking writes
```ts
const cache = await FileCacher.create({
  maxEntries: 50000 // Keep at most 50,000 entries
});

// After 50,001 inserts:
// - Expired entries are purged
// - If still over limit, oldest 5,000 entries are evicted
```

Set `maxEntries: 0` to disable eviction (unlimited growth — use with caution).
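The two-step policy can be modeled on a plain array — a sketch of the logic described above, not FileCacher's actual code, assuming "10%" means 10% of `maxEntries` (consistent with the 5,000-of-50,000 example):

```ts
// In-memory sketch of two-step eviction: purge expired rows first, then
// drop the oldest 10% of maxEntries (by createdAt) if still over the limit.
interface Row { key: string; createdAt: number; expiresAt: number }

function evict(rows: Row[], maxEntries: number, now: number): Row[] {
  // Step 1: drop expired entries
  let kept = rows.filter((r) => r.expiresAt > now);
  // Step 2: if still over the limit, drop the oldest entries
  if (kept.length > maxEntries) {
    const toRemove = Math.ceil(maxEntries * 0.1);
    kept = [...kept].sort((a, b) => a.createdAt - b.createdAt).slice(toRemove);
  }
  return kept;
}

const rows: Row[] = Array.from({ length: 12 }, (_, i) => ({
  key: `k${i}`,
  createdAt: i,
  expiresAt: i < 2 ? 0 : 1000 // first two rows are already expired
}));

const remaining = evict(rows, 8, 500);
console.log(remaining.length); // 9: 10 unexpired rows minus ceil(8 * 0.1) = 1 oldest
console.log(remaining[0].key); // 'k3' -> 'k2' was the oldest unexpired row
```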
Statistics
```ts
const stats = await cache.stats();
// { count: 45000, expired: 1200, namespaces: 5 }

// Stats for a specific namespace
const nsStats = await cache.stats('example.com');
// { count: 8000, expired: 300, namespaces: 5 }
```

Closing
Always close the cacher when done to checkpoint the WAL and release the database connection:
```ts
await cache.close();
console.log(cache.isClosed);     // true
console.log(cache.directory);    // '/tmp/my-crawler/cache'
console.log(cache.databasePath); // '/tmp/my-crawler/cache/cache.db'
```

Integration with Crawler
The crawler automatically creates and manages a FileCacher when `enableCache: true` is set:
```ts
import { Crawler } from 'rezo/crawler';

const crawler = new Crawler({
  baseUrl: 'https://example.com',
  enableCache: true,
  cacheTTL: 86400000, // 24 hours
  cacheDir: './my-cache'
});

// First visit fetches from network and caches
await crawler.visit('https://example.com/page1');

// Second visit serves from cache (if within TTL)
await crawler.visit('https://example.com/page1');
```