ArchiveBox is an open-source, self-hosted web archiving system for the web and the desktop
Keep track of your URLs, pages, and websites with ArchiveBox
What is ArchiveBox?
ArchiveBox is a web-based, self-hosted archiving system that you can use to record and archive online links, web pages, and media pages in a single database.
With ArchiveBox you can save your collection, then share it or keep it private for your own use.
Moreover, it is open source and easy to install, configure, and use. Anyone can install it on their own server and start using it right away.
ArchiveBox comes with a command-line app that works directly from your terminal, a web application that works in all modern browsers, and a newly released desktop app (in alpha) for Windows, Linux, and macOS.
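As a rough illustration of the command-line side, the sketch below builds an `archivebox add` invocation from Python. The `add` subcommand and its `--depth` option belong to the ArchiveBox CLI; the helper names `build_add_command` and `archive`, the example URL, and the assumption that the working directory is an already initialized collection are illustrative only. Nothing is executed unless the `archivebox` executable is found on the PATH.

```python
import shutil
import subprocess


def build_add_command(urls, depth=0):
    """Return the argv list for an `archivebox add` call (hypothetical helper)."""
    if depth < 0:
        raise ValueError("depth must not be negative")
    if not urls:
        raise ValueError("at least one URL is required")
    return ["archivebox", "add", f"--depth={depth}", *urls]


def archive(urls, depth=0, cwd="."):
    """Run `archivebox add` in an existing collection directory, if installed."""
    if shutil.which("archivebox") is None:
        return None  # ArchiveBox is not installed, so nothing is executed
    return subprocess.run(build_add_command(urls, depth), cwd=cwd, check=False)


if __name__ == "__main__":
    # Only prints the command; call archive([...]) to actually run it.
    print(build_add_command(["https://example.com"]))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log what would be run before touching a real collection.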
ArchiveBox Features
- Free & open source, doesn’t require signing up online, stores all data locally
- Powerful, intuitive command line interface with modular optional dependencies
- Supports dozens of file formats: audio, video, text, documents, and other media
- Easy to set up, configure, and use
- Comes with strong search functionality
- Saves dozens of URLs (links) individually or in a batch
- A clutter-free web user interface
- Comprehensive documentation
- Active development
- Has a rich community
- Supports tags
- Screenshot support for links
- Automatically checks whether links are valid
- Takes a snapshot of your URLs/links
- Automatically checks page errors, stats, and headers
- Extracts URL headers, metadata, and embedded media
- Exports a URL to PDF or as a screenshot image
- Checks the readability of pages
- Extracts a wide variety of content out-of-the-box: media (youtube-dl), articles (readability), code (git), etc.
- Supports scheduled/real-time importing from many types of sources
- Uses standard, durable, long-term formats like HTML, JSON, PDF, PNG, and WARC
- Usable as a oneshot CLI, self-hosted web UI, Python API (BETA), REST API (ALPHA), or desktop app (ALPHA)
- Saves all pages to archive.org as well by default for redundancy (can be disabled for local-only mode)
- Advanced users: support for archiving content requiring login/paywall/cookies (see wiki security caveats!)
- SQLite support
- Import/export options
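Saving links in a batch usually starts with a plain list of URLs. The sketch below shows one possible way to tidy such a list before importing it: it drops blank lines, comments, duplicates, and anything that is not an absolute http(s) URL. The helper name `clean_url_batch` is hypothetical and not part of ArchiveBox; it uses only the Python standard library.

```python
from urllib.parse import urlsplit


def clean_url_batch(lines):
    """Return unique http(s) URLs from an iterable of lines, in first-seen order."""
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip anything that is not an absolute web URL
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    raw = [
        "https://example.com",
        "  https://example.com  ",
        "# a comment",
        "not a url",
        "http://example.org/page",
    ]
    print("\n".join(clean_url_batch(raw)))
```

The cleaned output, one URL per line, could then be written to a text file and fed to `archivebox add` on standard input from inside a collection directory.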
Platforms
- Web-based self-hosted
- Docker
- Linux
- macOS
- Windows
Get ArchiveBox
License
ArchiveBox is released as an open-source project under the MIT license.
Resources
- te-ArchiveBox.html
- https://archivebox.io/
- https://github.com/ArchiveBox
- https://reposhub.com/python/web-crawling/pira