- Add comprehensive default CHROME_ARGS in config.json with 55+ flags
for deterministic rendering, security, performance, and UI suppression
- Update chrome_utils.js launchChromium() to read CHROME_ARGS and
CHROME_ARGS_EXTRA from environment variables (set by get_config())
- Add getEnvArray() helper to parse JSON arrays or comma-separated
strings from environment variables
- Separate args into three categories:
1. baseArgs: Static flags from CHROME_ARGS config (configurable)
2. dynamicArgs: Runtime-computed flags (port, sandbox, headless, etc.)
3. extraArgs: User overrides from CHROME_ARGS_EXTRA
- Add CHROME_SANDBOX config option to control --no-sandbox flag
Args are now configurable via:
- config.json defaults
- ArchiveBox.conf file
- Environment variables
- Per-crawl/snapshot config overrides
- Add _derive_persona_paths() in configset.py to automatically derive
CHROME_USER_DATA_DIR and CHROME_EXTENSIONS_DIR from ACTIVE_PERSONA
when not explicitly set. This allows plugins to use these paths
without knowing about the persona system.
- Update chrome_utils.js launchChromium() to accept userDataDir option
and pass --user-data-dir to Chrome. Also cleans up SingletonLock
before launch.
- Update killZombieChrome() to clean up SingletonLock files from all
persona chrome_user_data directories after killing zombies.
- Update chrome_cleanup() in misc/util.py to handle persona-based
user data directories when cleaning up stale Chrome state.
- Simplify on_Crawl__20_chrome_launch.bg.js to use CHROME_USER_DATA_DIR
and CHROME_EXTENSIONS_DIR from env (derived by get_config()).
Config priority flow:
ACTIVE_PERSONA=WorkAccount (set on crawl/snapshot)
-> get_config() derives:
CHROME_USER_DATA_DIR = PERSONAS_DIR/WorkAccount/chrome_user_data
CHROME_EXTENSIONS_DIR = PERSONAS_DIR/WorkAccount/chrome_extensions
-> hooks receive these as env vars without needing persona logic
- Move hardcoded default args from Python to config.json YTDLP_ARGS
- Add get_ytdlp_args() function to read from YTDLP_ARGS env var
- Keep format arg with max_size in code (depends on YTDLP_MAX_SIZE)
- YTDLP_ARGS can be overridden as JSON array in environment
Replace old `output` field with new fields across the codebase:
- output_str: Human-readable output summary
- output_json: Structured metadata (optional)
- output_files: Dict of output files with metadata
- output_size: Total size in bytes
- output_mimetypes: CSV of file mimetypes
Files updated:
- api/v1_core.py: Update MinimalArchiveResultSchema to expose new fields
- api/v1_core.py: Update ArchiveResultFilterSchema to search output_str
- cli/archivebox_extract.py: Use output_str in CLI output
- core/admin_archiveresults.py: Update admin fields, search, and fieldsets
- core/admin_archiveresults.py: Fix output_html variable name bug in output_summary
- misc/jsonl.py: Update archiveresult_to_jsonl() to include new fields
- plugins/extractor_utils.py: Update ExtractorResult helper class
The embed_path() method already uses output_files and output_str,
so snapshot detail page and template tags work correctly.
- All install hooks now respect their respective XYZ_BINARY env vars
(e.g., WGET_BINARY, CHROME_BINARY, YTDLP_BINARY, etc.)
- Support both absolute paths (/usr/bin/wget2) and binary names (wget2)
- Dynamic bin_name used in Dependency JSONL output
- Updated 11 install hooks to follow the new pattern
- Mark checklist items as complete in TODO_hook_architecture.md
- Rename 13 on_Crawl__00_validate_* hooks to on_Crawl__00_install_*
- This better reflects what these hooks actually do (check/install binaries)
- Update TODO_hook_architecture.md to reflect renamed hooks
- Remove mock-based tests from plugin tests (headers, singlefile, ublock, captcha2)
- Replace fake cache tests with real double-install tests that verify cache behavior
- Add SCHEMA_0_8 and seed_0_8_data() for testing 0.8.x data directory migrations
- Add TestMigrationFrom08x class with comprehensive migration tests:
- Snapshot count preservation
- Crawl record preservation
- Snapshot-to-crawl relationship preservation
- Tag preservation
- ArchiveResult status preservation
- CLI command verification after migration
- Add more CLI tests for add command (tags, multiple URLs, file input)
- All tests now use real functionality without mocking