Memory Management
Long-running crawls accumulate memory through DOM parsing, response caching, and result collection. The MemoryMonitor class tracks V8 heap usage in real-time and provides status levels, reports, and automatic callbacks to prevent out-of-memory crashes.
import { MemoryMonitor } from 'rezo/crawler'; Creating a MemoryMonitor
const monitor = new MemoryMonitor({
warningRatio: 0.7, // Warning at 70% heap usage (default: 0.7)
criticalRatio: 0.85, // Critical at 85% heap usage (default: 0.85)
checkInterval: 10000 // Check every 10 seconds (default: 10000ms)
}); The monitor reads the V8 heap size limit at construction time via v8.getHeapStatistics() and uses it as the baseline for threshold calculations.
Memory Status Levels
The monitor reports three status levels:
| Status | Condition | Meaning |
|---|---|---|
ok | Usage below 70% of max heap | Normal operation |
warning | Usage between 70% and 85% | Memory pressure detected, consider reducing concurrency |
critical | Usage above 85% of max heap | High risk of OOM, pause work and force GC |
Manual Check
const status = monitor.check();
// 'ok' | 'warning' | 'critical'
if (status === 'critical') {
queue.pause();
monitor.forceGC();
// Wait and resume
} Memory Reports
getReport()
Get a detailed memory report:
const report = monitor.getReport(); The MemoryReport interface:
interface MemoryReport {
status: 'ok' | 'warning' | 'critical';
heapUsedMB: number; // Current heap usage in MB
heapTotalMB: number; // Total heap allocated in MB
heapLimitMB: number; // V8 max heap limit in MB
usagePercent: number; // Heap used as percentage of limit
externalMB: number; // External memory (Buffers, etc.) in MB
rssMB: number; // Resident Set Size in MB
} Example output:
const report = monitor.getReport();
console.log(report);
// {
// status: 'warning',
// heapUsedMB: 1450,
// heapTotalMB: 1800,
// heapLimitMB: 2048,
// usagePercent: 70.8,
// externalMB: 45,
// rssMB: 1920
// } getUsagePercent()
Quick check of heap usage as a percentage:
const percent = monitor.getUsagePercent();
console.log(`Heap usage: ${percent.toFixed(1)}%`); Forced Garbage Collection
forceGC()
Triggers a full garbage collection cycle. Requires Node.js to be started with --expose-gc:
const success = monitor.forceGC();
if (success) {
console.log('GC triggered');
} else {
console.log('GC not available (start node with --expose-gc)');
} # Start Node.js with GC exposed
node --expose-gc crawl.ts Automatic Monitoring
startAutoMonitor()
Start a background interval that checks memory and triggers callbacks on status transitions:
monitor.startAutoMonitor({
onWarning: (report) => {
console.warn(`Memory warning: ${report.usagePercent}%`);
// Reduce concurrency to slow memory growth
queue.concurrency = 5;
},
onCritical: (report) => {
console.error(`Memory critical: ${report.usagePercent}%`);
// Pause all work and force GC
queue.pause();
monitor.forceGC();
},
onRecovered: (report) => {
console.log(`Memory recovered: ${report.usagePercent}%`);
// Resume normal operation
queue.resume();
queue.concurrency = 20;
}
}); Callback behavior:
onWarningfires when status transitions towarningfrom any other stateonCriticalfires when status transitions tocriticalfrom any other stateonRecoveredfires when status transitions tookfromwarningorcritical- Callbacks only fire on transitions, not on every check interval
The monitor interval uses unref() so it does not keep the process alive.
stopAutoMonitor()
Stop the background monitoring interval:
monitor.stopAutoMonitor(); Concurrency Recommendations
getRecommendedConcurrency()
Get a recommended concurrency based on current memory pressure:
const recommended = monitor.getRecommendedConcurrency(
currentConcurrency, // Your current concurrency setting
5 // Minimum concurrency (default: 5)
);
queue.concurrency = recommended; | Status | Recommendation |
|---|---|
ok | Returns currentConcurrency unchanged |
warning | Returns max(minConcurrency, currentConcurrency * 0.5) |
critical | Returns minConcurrency |
Cleanup
monitor.destroy();
// Stops auto-monitoring and releases resources Integration with Crawler
A complete example of memory-aware crawling:
import { Crawler, MemoryMonitor } from 'rezo/crawler';
const monitor = new MemoryMonitor({
warningRatio: 0.7,
criticalRatio: 0.85,
checkInterval: 5000
});
const crawler = new Crawler({
baseUrl: 'https://example.com',
concurrency: 30,
maxUrls: 100000
});
// Adaptive concurrency based on memory pressure
monitor.startAutoMonitor({
onWarning: (report) => {
console.warn(`[memory] Warning at ${report.usagePercent}% - reducing concurrency`);
// Access the crawler's internal queue to adjust concurrency
},
onCritical: (report) => {
console.error(`[memory] Critical at ${report.usagePercent}% - pausing`);
monitor.forceGC();
},
onRecovered: (report) => {
console.log(`[memory] Recovered at ${report.usagePercent}% - resuming`);
}
});
// Periodic reporting
setInterval(() => {
const report = monitor.getReport();
console.log(`[memory] ${report.status} - ${report.heapUsedMB}MB / ${report.heapLimitMB}MB (${report.usagePercent}%)`);
}, 30000);
await crawler.visit('https://example.com');
await crawler.done();
monitor.destroy();