Xidel is an open-source data extraction tool

Xidel is a command-line tool to download and extract data from HTML/XML pages or JSON APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

It is a platform-independent package which runs on Windows, Linux, and macOS.

Features

  • Easy to set up and use
  • Zero configuration required
  • Works smoothly on Windows, Linux, macOS and Android
  • Well documented
  • Packed with dozens of examples
  • Lightweight package

Supported expressions

  • CSS 3 Selectors: to extract elements unchanged
  • XPath 3.0: to extract values and calculate things with them.
  • XQuery 3.0: to create new documents from the extracted values and to build Turing-complete scripts.
  • Pattern matching: to extract several values at once, using an annotated copy of the input page as the pattern.
  • XPath 2.0/XQuery 1.0: compatibility mode for old XPath/XQuery versions.
  • JSONiq: to work with JSON APIs (deprecated in favor of XPath 3.1).
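
As a rough illustration of how these expression kinds might be used, the sketch below builds a Xidel command line from Python. The helper names build_xidel_command and run_xidel are hypothetical and not part of Xidel. The --xpath, --css, --xquery and --output-format options are assumed from Xidel's documented command-line interface and should be checked against the help output of the installed version.

```python
import shutil
import subprocess


def build_xidel_command(source, expression, kind="xpath", output_format=None):
    """Build an argument list for a Xidel call (hypothetical helper).

    source        -- a URL or local file name to read
    expression    -- the query to evaluate on that input
    kind          -- "xpath", "css" or "xquery" (assumed option names)
    output_format -- optional value for --output-format, e.g. "xml"
    """
    if kind not in ("xpath", "css", "xquery"):
        raise ValueError(f"unsupported expression kind: {kind}")
    command = ["xidel", source, f"--{kind}={expression}"]
    if output_format is not None:
        command.append(f"--output-format={output_format}")
    return command


def run_xidel(command):
    """Run the command if a xidel binary is on PATH, otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the argument list; nothing is downloaded here.
    print(build_xidel_command("https://example.org", "//title"))
```

Passing the arguments as a list avoids shell quoting problems with characters such as quotes or brackets inside the expression. Calling run_xidel on the result would execute the query only when Xidel is actually installed.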

Following

  • HTTP Codes: Redirections like 30x are automatically followed, while keeping things like cookies.
  • Links: It can follow (all) links on a page, meta refreshes, or any extracted value.
  • HTML Forms: It can fill in arbitrary data in the input elements and submit the form.
  • Arbitrary HTTP requests: In any query, you can call a function to make other requests.

Output formats

  • Adhoc: just prints the data in a human-readable format.
  • XML: encodes the data as XML.
  • HTML: encodes the data as HTML.
  • JSON: encodes the data as JSON.
  • bash/cmd: exports the data as shell variables.
  • fn:serialize: implements the W3C XQuery Serialization standard.

Connections

  • HTTP/HTTPS, as well as local files or stdin.

License

Xidel is released under the GNU General Public License v3.0.
