DataparkSearch Is an Open Source Search Engine Written in C

Hazem Abbas

Apr 10, 2023

DataparkSearch Engine is an open-source, web-based search engine for finding information within a website, a group of websites, an intranet or a local system. It combines multi-protocol crawling, multilingual indexing and several ranking options in one package.

One of the key benefits of DataparkSearch Engine is that it supports various URL schemes, including http, https, ftp, nntp, and news. This means that users can search for information across a wide range of platforms, making it an ideal solution for those who need to search across multiple websites or systems.
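To make the scheme list concrete, here is a minimal sketch of how a crawl list could be filtered by URL scheme before indexing. It is an illustration only, not DataparkSearch's own code (which is written in C); the names `SUPPORTED_SCHEMES`, `is_indexable` and `group_by_scheme` are hypothetical.

```python
from urllib.parse import urlsplit

# Schemes the article lists as supported (hypothetical constant name).
SUPPORTED_SCHEMES = {"http", "https", "ftp", "nntp", "news"}


def is_indexable(url: str) -> bool:
    """Return True when the URL's scheme is one of the listed schemes."""
    return urlsplit(url).scheme.lower() in SUPPORTED_SCHEMES


def group_by_scheme(urls):
    """Group URLs by scheme, dropping those with unsupported schemes."""
    groups = {}
    for url in urls:
        scheme = urlsplit(url).scheme.lower()
        if scheme in SUPPORTED_SCHEMES:
            groups.setdefault(scheme, []).append(url)
    return groups


if __name__ == "__main__":
    candidates = [
        "https://example.com/docs",
        "ftp://example.com/pub/readme.txt",
        "news:comp.lang.c",
        "mailto:someone@example.com",  # not a listed scheme, so it is skipped
    ]
    print(group_by_scheme(candidates))
```

A real indexer would go on to fetch each group with a protocol-specific client; the sketch only shows the up-front filtering step.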

DataparkSearch Engine is also easy to use: its web-based interface lets users find the information they need quickly. This makes it a practical choice for businesses, organizations and individuals alike.

Overall, DataparkSearch Engine is a reliable and efficient search engine that is well-suited to a wide range of applications. Whether you need to search for information on a single website or across multiple platforms, this versatile search engine is sure to meet your needs.

Features

Support for http, https, ftp (passive mode), nntp and news URL schemes.
htdb virtual URL scheme for SQL database indexing.
Indexes text/html, text/xml, text/plain, audio/mpeg (mp3) and image/gif MIME types natively.
External parser support for other document types, including Microsoft Word, Excel, RTF, PowerPoint, Adobe Acrobat PDF and Flash.
Can index multilingual sites using content negotiation.
Can search all of the word forms using ispell affixes and dictionaries.
Synonym, acronym and abbreviation query expansion based on editable dictionaries, specified by language and charset.
Stop-word, synonym and acronym lists.
Options to query with all words, all words near each other, any words, or Boolean queries. A subset of VQL (Verity Query Language) is supported.
Popularity Rank based on a neural network model.
Results can be sorted by relevancy (using vector calculation), by popularity rank, either "Goo" (adding weight for incoming links) or "Neo" (neural network model), by last modified time, or by "importance" (a combination of relevancy and popularity rank).
Supports a wide range of character sets, with automated character set and language detection.
Offers an accent insensitive search option.
Provides phrase segmenting (tokenizing) for Chinese, Japanese, Korean and Thai.
Includes an indexer and a web CGI front-end, as well as a search module for Apache web server (mod_dpsearch).
Handles Internationalized Domain Names (IDN).
Summary Extraction Algorithm automatically sums up each document in several sentences.
Uses If-Modified-Since for efficient transfer of only changed files.
Can normalize URLs that contain session IDs and other unusual formats, including some JavaScript link decoding.
Can perform parallel and multi-threaded indexing for faster updating.
Flexible update scheduling, including options for checking some sections of a site more frequently.
Handles basic authentication (user name and password) and cookies.
Stores a compressed text version of the documents for extracting and viewing.
Can specify a default character set and language for a server or subdirectory, or a list of possible languages.
Noindex tags, including <NOINDEX> and Google's special comments, are treated as markers for content to include or exclude.
Can specify a content body tag.
Spellchecking for query words with aspell.
Flexible options and commands to customize search result pages.
Effective caching significantly reduces search times.
Query logging stores the query, query parameters and the number of results found.
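
The If-Modified-Since feature in the list above relies on a standard HTTP mechanism: the client sends the time of its last fetch, and the server answers 304 Not Modified with no body if nothing has changed. The following is a minimal sketch of that exchange from the client side, not DataparkSearch's actual implementation; `conditional_request` and `needs_reindex` are hypothetical helper names, and no network call is made.

```python
from email.utils import formatdate
from urllib.request import Request


def conditional_request(url: str, last_indexed: float) -> Request:
    """Build a GET request that asks for the body only if the document
    changed after last_indexed (a Unix timestamp)."""
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    http_date = formatdate(last_indexed, usegmt=True)
    return Request(url, headers={"If-Modified-Since": http_date})


def needs_reindex(status_code: int) -> bool:
    """A 304 response means the stored copy is still current."""
    return status_code != 304


if __name__ == "__main__":
    req = conditional_request("https://example.com/page.html", 0)
    # urllib stores header names in capitalized form ("If-modified-since").
    print(req.get_header("If-modified-since"))
    print(needs_reindex(304), needs_reindex(200))
```

If the request is sent with `urllib.request.urlopen`, a 304 reply surfaces as an `HTTPError` whose `code` is 304, so a crawler built on this sketch would catch that exception and skip the document.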

Tech Stack

DataparkSearch is written primarily in the C language.

License

It is released under the GNU General Public License (v2), making it freely available to anyone who needs it.
