Xidel is an open-source data extraction tool

Xidel is a command-line tool that downloads web pages and extracts data from HTML/XML pages or JSON APIs, using CSS 3 selectors, XPath 3.0, XQuery 3.0, JSONiq, or custom pattern-matching templates. It can also create new or transformed XML/HTML/JSON documents. It additionally serves as an example application of its author's Internet Tools.

It is a platform-independent package which runs on Windows, Linux, and macOS.

Features

  • Easy to set up and use
  • Zero configuration required
  • Works smoothly on Windows, Linux, macOS and Android
  • Well documented
  • Packed with dozens of examples
  • Lightweight package

Supported expression languages

  • CSS 3 selectors: to extract elements unchanged.
  • XPath 3.0: to extract values and compute with them.
  • XQuery 3.0: to create new documents from the extracted values and to build Turing-complete scripts.
  • Pattern matching: to extract several values at once, using an annotated copy of the input page as the template.
  • XPath 2.0/XQuery 1.0: compatibility mode for older XPath/XQuery versions.
  • JSONiq: to work with JSON APIs (deprecated in favor of XPath 3.1).
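
As a rough illustration of how the expression languages above map onto a command line, the following Python sketch assembles (but does not run) a Xidel invocation. The option names --xpath, --xquery and --css are recalled from the output of xidel --help and should be checked against the installed version; the helper name build_extract_command and the URL https://example.org are placeholders invented for this example.

```python
import shlex

# Option names as recalled from `xidel --help` (an assumption; verify locally).
KIND_TO_OPTION = {
    "xpath": "--xpath",
    "xquery": "--xquery",
    "css": "--css",
}


def build_extract_command(source, expression, kind="xpath"):
    """Return the argv list for one extraction; nothing is executed here."""
    if kind not in KIND_TO_OPTION:
        raise ValueError(f"unsupported expression kind: {kind}")
    return ["xidel", source, KIND_TO_OPTION[kind], expression]


# https://example.org is a placeholder URL, "//title" a sample XPath expression.
command = build_extract_command("https://example.org", "//title", kind="xpath")
print(shlex.join(command))
```

If Xidel is installed, the resulting list could be handed to subprocess.run; building the list separately keeps the expression free of shell-quoting problems.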

Following

  • HTTP codes: Redirects (30x) are followed automatically, with cookies preserved.
  • Links: It can follow (all) links on a page, meta refreshes, or any extracted value.
  • HTML Forms: It can fill in arbitrary data in the input elements and submit the form.
  • Arbitrary HTTP requests: In any query, you can call a function to make other requests.
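
The link-following behaviour described above might be driven as in the Python sketch below, which only builds the argument list. It assumes the --follow and --extract options as recalled from xidel --help, and it assumes that placing --follow before --extract makes the extraction apply to the followed pages. The helper name build_follow_command and the URL are hypothetical.

```python
import shlex


def build_follow_command(start_url, follow_expr, extract_expr):
    """Return an argv list that follows links, then extracts from each page.

    Nothing is executed here; the list is only assembled.
    """
    # Assumption: --follow precedes --extract so that the extraction is
    # applied to the followed pages rather than to the start page.
    return ["xidel", start_url, "--follow", follow_expr, "--extract", extract_expr]


# https://example.org is a placeholder; "//a" selects every link on the page.
command = build_follow_command("https://example.org", "//a", "//title")
print(shlex.join(command))
```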

Output formats

  • Adhoc: just prints the data in a human-readable format.
  • XML: encodes the data as XML.
  • HTML: encodes the data as HTML.
  • JSON: encodes the data as JSON.
  • bash/cmd: exports the data as shell variables.
  • fn:serialize: implements the W3C XQuery Serialization standard.
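
Selecting one of these output formats could look like the following Python sketch. The --output-format option and the value names listed in OUTPUT_FORMATS (in particular json-wrapped for JSON output) are recalled from xidel --help and are assumptions to verify locally; the file name page.html and the helper build_output_command are made up for illustration.

```python
import shlex

# Format names as recalled from `xidel --help` (an assumption; verify locally).
OUTPUT_FORMATS = ("adhoc", "xml", "html", "json-wrapped", "bash", "cmd")


def build_output_command(source, expression, output_format="adhoc"):
    """Return an argv list that extracts `expression` in the given format."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {output_format}")
    return [
        "xidel",
        source,
        "--extract",
        expression,
        f"--output-format={output_format}",
    ]


# "title:=//title" stores the page title in a variable named title, so the
# wrapped and shell formats have a name to attach the value to.
command = build_output_command("page.html", "title:=//title", "json-wrapped")
print(shlex.join(command))
```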

Connections

  • Protocols: HTTP/HTTPS, as well as local files or stdin.

License

Xidel is released under the GNU General Public License v3.0.

Resources