Harvest News Like a Pro: Introducing News-Please, Your Open-Source Solution for News Extraction and Archiving

Hazem Abbas

Mar 21, 2024 — 1 min read


news-please is an open-source news crawler that extracts structured information from news websites. It uses libraries like scrapy, Newspaper, and readability, and can follow internal hyperlinks and read RSS feeds to fetch both recent and archived articles.

It also features a library mode for Python developers and can extract articles from the large news archive at commoncrawl.org.

Features

works out of the box: install with pip, add URLs of your pages, run
run news-please conveniently using its CLI mode
use it as a library within your own software
extract articles from commoncrawl.org's news archive
store extracted results in JSON files, PostgreSQL, Elasticsearch, or your own storage
simple but extensive configuration (if you want to tweak the results)
revisions: crawl articles multiple times and track changes
crawl and extract information given a list of article URLs, from the CLI or from within your own Python code
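
As a rough illustration of library mode, the sketch below collects an article's attributes into a plain dictionary. It assumes the `NewsPlease.from_url` entry point and attribute names such as `title` and `maintext` as described in the project README; the helper names `summarize` and `fetch_article` are hypothetical, and the demo at the bottom uses a stand-in object so it runs without network access.

```python
from types import SimpleNamespace

# Attribute names assumed from the news-please README; adjust if your version differs.
FIELDS = ("title", "description", "maintext", "image_url",
          "authors", "date_publish", "language")


def summarize(article):
    """Copy the known attributes of an article object into a plain dict."""
    return {name: getattr(article, name, None) for name in FIELDS}


def fetch_article(url):
    """Download and extract a single article (needs news-please and network access)."""
    # Imported lazily so the rest of this sketch runs without news-please installed.
    from newsplease import NewsPlease
    return NewsPlease.from_url(url)


# Stand-in object with made-up values, used here instead of a real download.
demo = SimpleNamespace(title="Example headline", language="en")
for key, value in summarize(demo).items():
    print(f"{key}: {value}")
```

With news-please installed, calling `summarize(fetch_article(url))` on a real article URL should yield a dictionary of the same shape, with missing attributes left as `None`.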

Extracted information

news-please extracts the following attributes from news articles. An example JSON file as extracted by news-please is available in the project repository.

headline
lead paragraph
main text
main image
name(s) of author(s)
publication date
language
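
To make the attribute list concrete, here is a minimal sketch that loads one extracted record into a small data class. The key names (`title`, `description`, `maintext`, `image_url`, `authors`, `date_publish`, `language`) are assumed to match the JSON that news-please writes, the `Article` class and `load_article` helper are hypothetical, and the sample record is made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Made-up record for illustration only; key names are assumed, not taken from real output.
SAMPLE_JSON = """
{
  "title": "Example headline",
  "description": "Example lead paragraph.",
  "maintext": "Example main text of the article.",
  "image_url": "https://www.example.com/image.jpg",
  "authors": ["Jane Doe"],
  "date_publish": "2024-03-21 10:00:00",
  "language": "en"
}
"""


@dataclass
class Article:
    title: Optional[str] = None          # headline
    description: Optional[str] = None    # lead paragraph
    maintext: Optional[str] = None       # main text
    image_url: Optional[str] = None      # main image
    authors: List[str] = field(default_factory=list)  # name(s) of author(s)
    date_publish: Optional[str] = None   # publication date
    language: Optional[str] = None       # language


def load_article(raw: str) -> Article:
    """Parse a JSON string, ignoring keys the data class does not model."""
    data = json.loads(raw)
    known = {k: v for k, v in data.items() if k in Article.__dataclass_fields__}
    # Treat a missing or null author list as empty.
    known["authors"] = known.get("authors") or []
    return Article(**known)


article = load_article(SAMPLE_JSON)
print(article.title, "|", ", ".join(article.authors), "|", article.language)
```

Ignoring unknown keys keeps the loader tolerant of any extra fields a given news-please version may emit.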

Install

$ pip3 install news-please
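
After installing, one way to confirm that the package is visible to your Python interpreter is to look up its installed version with the standard library. This is only a sketch; the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("news-please")
print(f"news-please {found}" if found else "news-please is not installed")
```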

License

Apache-2.0 License

Resources & Downloads

Source-code download
