Wget

Link Conversion

After downloading a website, the saved HTML and CSS files still point at the original server, either through absolute URLs (https://example.com/style.css) or through site-root paths (/style.css). The LinkConverter rewrites references to files that were downloaded into relative paths (./style.css), so the mirrored site works when opened from the local filesystem.

This implements wget’s -k / --convert-links functionality.

Quick Start

Enable link conversion with the fluent API:

import { Wget } from 'rezo/wget';

await new Wget()
  .pageRequisites()
  .convertLinks()
  .outputDir('./offline-site')
  .get('https://example.com/');

// All HTML and CSS files now use relative paths
// Open offline-site/index.html in your browser; links resolve to the local copies

How It Works

Link conversion runs as a post-processing step after all downloads complete:

  1. The Downloader builds a URL map: Map<originalUrl, localFilePath>
  2. LinkConverter reads each downloaded HTML and CSS file
  3. For every URL found in the file, it checks the URL map
  4. If the URL was downloaded, it’s replaced with a relative path from the current file to the target file
  5. If the URL was not downloaded, it remains as an absolute URL
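The five steps above can be sketched for a single URL as follows. This is a minimal illustration, not the LinkConverter source: `convertUrl` and the `UrlMap` type are hypothetical names, the map is assumed to be keyed by absolute URLs without query strings or fragments, and re-appending the fragment to the relative path is an assumption.

```typescript
import * as path from 'node:path';

// Hypothetical URL map: absolute URL (no query, no fragment) -> local file path.
type UrlMap = Map<string, string>;

// Steps 3-5 for one URL found in `currentFile`, which was downloaded from `pageUrl`.
function convertUrl(
  rawUrl: string,
  pageUrl: string,
  currentFile: string,
  urlMap: UrlMap,
): string {
  // Resolve site-root and relative references against the page URL.
  const absolute = new URL(rawUrl, pageUrl);
  // Look the URL up without its query string or fragment.
  const target = urlMap.get(absolute.origin + absolute.pathname);
  if (target === undefined) {
    // Step 5: not downloaded, so emit the absolute URL.
    return absolute.href;
  }
  // Step 4: path from the current file's directory to the downloaded file.
  const rel = path.posix.relative(path.posix.dirname(currentFile), target);
  const prefixed = rel.startsWith('../') ? rel : `./${rel}`;
  // Assumption: keep in-page anchors such as #section.
  return prefixed + absolute.hash;
}
```

Resolving every reference to an absolute URL first means absolute, site-root, and already-relative links all go through the same map lookup.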

HTML Conversion

The converter processes these elements and attributes in HTML:

Element                                     Attribute
a, area, link                               href
script, img, source, video, audio, track    src
img, source                                 srcset
video                                       poster
iframe, frame, embed                        src
object                                      data
form                                        action
input                                       src

It also converts:

  • Inline styles (style="background: url(...)")
  • <style> tag contents (CSS url() and @import)
  • srcset descriptors (multi-resolution images)
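A srcset value is a comma-separated list of candidates, each a URL optionally followed by a width (`480w`) or density (`2x`) descriptor, so each URL has to be rewritten while its descriptor is kept. The sketch below assumes a hypothetical `mapUrl` callback that returns either a relative path or the original URL. It splits naively on commas, so URLs that themselves contain commas (such as some `data:` URIs) are out of scope.

```typescript
// Hypothetical callback: returns the rewritten URL, or the input unchanged.
type UrlMapper = (url: string) => string;

// Rewrite each candidate URL in a srcset value, preserving its descriptor.
function rewriteSrcset(srcset: string, mapUrl: UrlMapper): string {
  return srcset
    .split(',')
    .map((candidate) => candidate.trim())
    .filter((candidate) => candidate.length > 0)
    .map((candidate) => {
      // First token is the URL; anything after it is the descriptor.
      const [url, ...descriptor] = candidate.split(/\s+/);
      return [mapUrl(url), ...descriptor].join(' ');
    })
    .join(', ');
}
```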

CSS Conversion

In standalone CSS files, the converter rewrites:

  • @import url('...') and @import '...' rules
  • url() functions in property values (backgrounds, fonts, cursors, etc.)
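Rewriting a stylesheet comes down to finding the two reference forms listed above. The regex-based sketch below assumes the same kind of hypothetical `mapUrl` callback. A production converter would more likely use a CSS tokenizer, since these patterns ignore comments and escaped characters.

```typescript
// Hypothetical callback: returns the rewritten URL, or the input unchanged.
type UrlMapper = (url: string) => string;

function rewriteCss(css: string, mapUrl: UrlMapper): string {
  return css
    // url(...) in property values, which also covers @import url(...)
    .replace(/url\(\s*(['"]?)([^'")]+)\1\s*\)/g,
      (_match, quote: string, url: string) => `url(${quote}${mapUrl(url.trim())}${quote})`)
    // @import "..." and @import '...' written without url()
    .replace(/@import\s+(['"])([^'"]+)\1/g,
      (_match, quote: string, url: string) => `@import ${quote}${mapUrl(url)}${quote}`);
}
```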

Relative Path Calculation

Paths are calculated relative to the directory containing the file being modified:

File:   output/example.com/docs/guide.html
Asset:  output/css/main.css
Result: ../../css/main.css

File:   output/example.com/index.html
Asset:  output/example.com/images/logo.png
Result: ./images/logo.png
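Both results match what Node's `path.posix.relative` produces when measured from the directory of the file being edited. This sketch (the `relativeLink` name is hypothetical) uses POSIX separators because the result is written into URLs, and it adds a `./` prefix for targets in the same directory or below, as in the second example.

```typescript
import * as path from 'node:path';

// Relative link from the file being edited to a downloaded asset.
function relativeLink(fromFile: string, toFile: string): string {
  const rel = path.posix.relative(path.posix.dirname(fromFile), toFile);
  // Targets outside the current directory already start with "../".
  return rel.startsWith('../') ? rel : `./${rel}`;
}
```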

Content-Type Awareness

The converter determines which files to process using the MIME type recorded during download, not file extensions. This is important because many web applications serve HTML from paths like /login, /dashboard.aspx, or /api/page — none of which have .html extensions.

Files with these MIME types are processed:

  • text/html, application/xhtml+xml — treated as HTML
  • text/css — treated as CSS

If MIME type information is unavailable, the converter falls back to file extensions (.html, .htm, .css).
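The selection rule can be expressed as a small classifier. `classifyFile` is a hypothetical helper; it assumes the recorded MIME type may carry parameters such as `; charset=utf-8`, which are stripped before comparison, and it consults the extension only when no MIME type was recorded.

```typescript
type ConvertibleKind = 'html' | 'css' | null;

function classifyFile(filePath: string, mimeType?: string): ConvertibleKind {
  if (mimeType) {
    // Drop parameters like "; charset=utf-8" and normalize case.
    const essence = mimeType.split(';')[0].trim().toLowerCase();
    if (essence === 'text/html' || essence === 'application/xhtml+xml') return 'html';
    if (essence === 'text/css') return 'css';
    return null;
  }
  // Fallback: no MIME type recorded, so use the file extension.
  const lower = filePath.toLowerCase();
  if (lower.endsWith('.html') || lower.endsWith('.htm')) return 'html';
  if (lower.endsWith('.css')) return 'css';
  return null;
}
```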

URL Resolution

The converter handles multiple URL formats:

Absolute URLs

<!-- Before -->
<link href="https://example.com/css/style.css">

<!-- After (if downloaded) -->
<link href="./css/style.css">

<!-- After (if NOT downloaded) -->
<link href="https://example.com/css/style.css">

Site-Root Relative URLs

<!-- Before -->
<img src="/images/logo.png">

<!-- After (resolved against page URL, then converted) -->
<img src="./images/logo.png">

Already-Relative URLs

<!-- Before -->
<a href="../other-page.html">

<!-- Stays relative, may be adjusted for organized assets -->
<a href="../other-page.html">

Query Strings and Fragments

The converter strips query strings and fragments when looking up URLs in the download map, ensuring assets like /style.css?v=123 match the downloaded file.
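One way to build such a query- and fragment-free lookup key is with the WHATWG `URL` class, available globally in Node.js. `lookupKey` is a hypothetical name, and the sketch assumes the URL map stores its keys in the same normalized form.

```typescript
// Resolve a reference and drop its query string and fragment.
function lookupKey(rawUrl: string, baseUrl: string): string {
  const url = new URL(rawUrl, baseUrl);
  url.search = '';
  url.hash = '';
  return url.href;
}
```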

Organized Asset Paths

When organizeAssets is enabled, assets are moved to categorized folders. The converter accounts for this when computing relative paths. For example, if a site-root image /core/misc/tree.png was organized into images/tree.png, the converter predicts the organized path and generates the correct relative reference.
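Predicting an organized location needs only the asset's file name and its category. The sketch below is hypothetical throughout: the extension-to-folder table, the `predictOrganizedPath` name, and the assumption that category folders sit directly under a site root are illustrations chosen to reproduce the `images/tree.png` example, not the library's actual rules. It also ignores collisions between assets that share a file name.

```typescript
import * as path from 'node:path';

// Hypothetical extension -> folder table; the real categories may differ.
const CATEGORY_BY_EXT: Record<string, string> = {
  '.png': 'images', '.jpg': 'images', '.jpeg': 'images', '.gif': 'images',
  '.svg': 'images', '.webp': 'images',
  '.css': 'css',
  '.js': 'js',
  '.woff': 'fonts', '.woff2': 'fonts', '.ttf': 'fonts',
};

// Returns the predicted organized file path, or null for uncategorized files.
function predictOrganizedPath(urlPath: string, siteRoot: string): string | null {
  const folder = CATEGORY_BY_EXT[path.posix.extname(urlPath).toLowerCase()];
  if (folder === undefined) return null;
  return path.posix.join(siteRoot, folder, path.posix.basename(urlPath));
}
```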

Backup Original Files

Enable backups to preserve the original HTML/CSS before conversion:

await new Wget({
  recursive: { convertLinks: true },
  download: { backupConverted: true },
}).get('https://example.com/');

// Creates index.html.orig alongside the modified index.html

Conversion Statistics

The conversion phase returns detailed statistics:

interface ConversionStats {
  filesProcessed: number;   // Total files examined
  filesModified: number;    // Files that were actually changed
  linksConverted: number;   // Total URLs rewritten
  linksToRelative: number;  // URLs converted to relative paths
  linksToAbsolute: number;  // URLs left as absolute (not downloaded)
}

Special URLs

URLs using these schemes, as well as fragment-only anchors, are never converted:

  • data: — Data URIs
  • javascript: — JavaScript URIs
  • mailto: — Email links
  • tel: — Phone links
  • blob: — Blob URIs
  • # — Fragment-only anchors
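A pre-check like the following keeps these URLs out of the lookup entirely. `isSpecialUrl` is a hypothetical helper; matching case-insensitively and ignoring leading whitespace is an assumption about how attribute values are normalized.

```typescript
const SKIPPED_SCHEMES = ['data:', 'javascript:', 'mailto:', 'tel:', 'blob:'];

// True for URLs that should be left exactly as written.
function isSpecialUrl(rawUrl: string): boolean {
  const value = rawUrl.trim().toLowerCase();
  if (value.startsWith('#')) return true; // fragment-only anchor
  return SKIPPED_SCHEMES.some((scheme) => value.startsWith(scheme));
}
```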

Complete Example

Mirror a site with link conversion and organized assets:

import { Wget } from 'rezo/wget';

const stats = await new Wget({
  organizeAssets: true,
  download: { adjustExtension: true },
})
  .concurrency(5)
  .convertLinks()
  .pageRequisites()
  .domains('docs.example.com')
  .outputDir('./docs-mirror')
  .on('link-conversion', (event) => {
    if (event.phase === 'complete') {
      console.log(`Converted ${event.linksConverted} links in ${event.convertedFiles} files`);
    }
  })
  .get('https://docs.example.com/');