What is Open Data?
Open Data is the idea that data should be free for anyone to use, share, redistribute, or republish. To qualify, the data should be legally open, meaning it is published under an open license or in the public domain with minimal restrictions, and technically open, meaning it is published in open technical formats that are available for download.
An Open Data license sets the legal foundation for using the published data, so open data must be licensed. The license must permit people to use the data freely, including transforming, redistributing, and republishing it, even for commercial purposes.
Choosing an Open Data license
- Creative Commons: CC-BY and CC0 (Public Domain Dedication)
- Open Data Commons Open Database License (ODbL)
- Open Data Commons Public Domain Dedication and Licence (PDDL)
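As a rough aid for choosing among the licenses listed above, the sketch below records two commonly cited properties of each one (whether attribution is required and whether derived databases must stay under the same terms) and filters on them. The `LicenseTerms` class and `suggest_licenses` function are hypothetical names for this illustration, and the table is a simplification, not legal advice.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LicenseTerms:
    """Simplified view of a license; hypothetical structure for illustration."""
    name: str
    attribution_required: bool
    share_alike: bool


# Simplified summary of the four licenses named in the list above.
LICENSES = {
    "CC0": LicenseTerms("Creative Commons CC0 (Public Domain Dedication)", False, False),
    "CC-BY": LicenseTerms("Creative Commons Attribution", True, False),
    "ODbL": LicenseTerms("Open Data Commons Open Database License", True, True),
    "PDDL": LicenseTerms("Open Data Commons Public Domain Dedication and Licence", False, False),
}


def suggest_licenses(want_attribution: bool, want_share_alike: bool) -> List[str]:
    """Return the license keys whose terms match the publisher's wishes."""
    return sorted(
        key
        for key, terms in LICENSES.items()
        if terms.attribution_required == want_attribution
        and terms.share_alike == want_share_alike
    )
```

For example, a publisher who wants no conditions at all would call `suggest_licenses(False, False)`, while one who wants credit and share-alike would call `suggest_licenses(True, True)`.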
What are open data portals?
A data portal is web software designed to publish data sets. Governments often use one to publish data sets and provide transparency to their citizens. A data portal lets an organization publish its data, organize it, and tag or classify it into categories. It often comes with data management tools as well as reporting and visualization tools such as maps and charts.
Open source vs commercial
Open source solutions often come with more modules and app integrations and are used at larger scale. They are also backed by strong communities of users, including experienced developers who sometimes contribute to the core product and to extensions and plugins.
Open source Data Portals
1- CKAN
CKAN is an open source data portal designed for publishing, sharing, and managing data sets. It provides many functions for managers and end users, such as full-text search, multilingual support, reporting tools, and a powerful API to access the data.
CKAN comes with many options for data scientists, including reporting and geospatial options for publishing and sharing location-based data. CKAN can be extended through plugins.
CKAN is the star of data portals, as it is a popular choice for governments and NGOs that publish and share their data sets.
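To give a feel for the API mentioned above, here is a minimal sketch that builds a request URL for CKAN's `package_search` action and pulls dataset titles out of the JSON reply. The portal address `https://data.example.org` and the helper names are assumptions for illustration. The code only builds the URL and parses a response, so fetching it (for instance with `urllib.request.urlopen`) is left to the reader.

```python
import json
from typing import List
from urllib.parse import urlencode


def package_search_url(portal_url: str, query: str, rows: int = 5) -> str:
    """Build a URL for the CKAN Action API dataset search (package_search)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> List[str]:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    # Fall back to the dataset's URL name when no title is set.
    return [d.get("title") or d.get("name", "") for d in payload["result"]["results"]]


# Hand-written sample reply in the shape the Action API uses.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "water-quality-2020", "title": "Water Quality 2020"},
            {"name": "air-quality", "title": ""},
        ],
    },
})

if __name__ == "__main__":
    print(package_search_url("https://data.example.org", "water quality", rows=2))
    print(dataset_titles(SAMPLE_RESPONSE))
```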
2- DKAN
DKAN is an open source data portal very similar to CKAN, though it comes with more data-oriented features, including scraping, data harvesting, visual data workflows, and advanced visualization options. It also has an integrated CMS/blog for publishing articles and posts, which makes it much more powerful for SEO.
DKAN is based on Drupal, the open source PHP-based CMS, which makes its installation and management a little different from CKAN, which is built on Python and PostgreSQL.
DKAN users are mainly government organizations and NGOs.
3- Socrata
Socrata is an open source data server for publishing and managing data sets. It features a powerful set of data management tools, including database management, data manipulation tools, reporting tools, visualization with advanced options, and customized financial analytics insights.
Socrata has two licenses: an open source license for the community edition and a commercial license for the enterprise edition.
Socrata has been used by several US states, including Oregon.
4- Dataverse
Dataverse is an open source data repository solution built to share and manage large data sets. It helps its users collect, organize, and publish their data sets on a collaborative platform.
Dataverse has more than 40 installations around the world, including NGOs, government organizations, and research centers.
5- Swirrl
Swirrl is a self-hosted open-source web application for data publishing. Its collaborative structure allows many users to contribute to data reporting, organization, analytics, and publishing.
Swirrl's features include location-based data publishing, full-text search, and tools for data manipulation, conversion, and merging. It comes with a simple user interface that eases management and improves the user experience. Swirrl has a developer-friendly API that allows users to access the data and integrate it into their reports and applications.
Though Swirrl was originally designed for government use, it has also been adopted by many research centers around the world and by the UK's National Health Service (NHS).
6- The DataTank
The DataTank is an open source project aimed at developers. It provides a fully functional system for converting data sets into a RESTful API.
The DataTank supports multiple source formats, including JSON, CSV, RDF, XLS, JSON-LD, and SHP files, and uses a MySQL database as its store.
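The core idea of turning a data file into an API resource can be sketched in a few lines. The example below is not The DataTank's own code: it is a standard-library illustration in which `csv_to_records` and `resource_response` are hypothetical helpers, and the `total`/`data` envelope is an assumed response shape.

```python
import csv
import io
import json
from typing import Dict, List, Optional


def csv_to_records(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dictionary per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def resource_response(csv_text: str, limit: Optional[int] = None, offset: int = 0) -> str:
    """Return the JSON body a REST endpoint could serve for this data set."""
    records = csv_to_records(csv_text)
    end = None if limit is None else offset + limit
    page = records[offset:end]
    # Assumed envelope: total row count plus the requested page of rows.
    return json.dumps({"total": len(records), "data": page})


SAMPLE_CSV = "city,population\nGhent,262000\nLeuven,101000\n"

if __name__ == "__main__":
    print(resource_response(SAMPLE_CSV, limit=1))
```

Note that the CSV values stay strings in this sketch. A real service would also need type detection, caching, and content negotiation.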
7- GeoServer
GeoServer is an open source server for sharing geospatial data. GeoServer uses the open source mapping library OpenLayers to publish interactive map-based reports for geographic data sets.
GeoServer has been around for years. It is used by several institutions and backed by a large community of developers. There are good books on installing, managing, and mastering GeoServer, which help new users and enrich the community.
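GeoServer publishes maps through standard OGC services such as WMS, so a client can request a rendered map image with a plain URL. The sketch below builds a WMS 1.1.1 `GetMap` request. The base URL `http://localhost:8080/geoserver` and the layer name `topp:states` are assumptions for illustration, so substitute the values of your own installation.

```python
from typing import Sequence
from urllib.parse import urlencode


def wms_getmap_url(
    base_url: str,
    layer: str,
    bbox: Sequence[float],
    width: int = 512,
    height: int = 512,
    srs: str = "EPSG:4326",
    image_format: str = "image/png",
) -> str:
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_x, min_y, max_x, max_y)."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value asks for the layer's default style
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "srs": srs,
        "format": image_format,
    }
    return f"{base_url.rstrip('/')}/wms?{urlencode(params)}"


if __name__ == "__main__":
    print(wms_getmap_url("http://localhost:8080/geoserver", "topp:states", (-125, 24, -66, 50)))
```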
8- Soda
Soda is a lightweight open-source Drupal distribution built to collect, manage, and distribute open data on a small scale. Soda is released under the GPLv3 license and is easy to install and manage. Because Soda is built on Drupal, it inherits most of Drupal's CMS features, which makes it a flexible, fully customizable content management system.
9- Truedat
Truedat is an open-source data publishing CMS built to provide a customizable interface for managing and publishing data with advanced reporting features. Truedat is designed to be friendly to designers and front-end developers, as it provides a fully customizable layout.
Truedat's features include tagging and classification, data reporting, front-end customization, SEO-ready tools, quality checks, data flow tracking, a data dictionary, and user administration.
10- Magda
Magda is an open-source project that helps experienced developers and users build a data ecosystem. The project started as a data aggregator but evolved to support publishing, managing, and sharing data sets.
Magda has a powerful search feature that supports full-text search, understands synonyms and acronyms, and offers advanced filters. It also comes with a rich API that makes it easy to use with external projects.
Magda is built on a modular architecture that allows developers to build their own extensions and add new features. You can see Magda in action at the Australian government data portal. Magda is still under heavy development, so follow the project for future releases.
Magda uses small services called minions to watch for data changes and updates and to perform a specific operation when a change occurs.
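The watcher pattern described above can be pictured with a small in-memory sketch. This is not Magda's actual API: `ChangeFeed`, `subscribe`, and `upsert` are hypothetical names, and a real deployment would deliver changes to separate services over the network instead of calling Python functions directly.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Handler = Callable[[Record], None]


class ChangeFeed:
    """Hypothetical in-memory registry that notifies handlers when a record changes."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []
        self._records: Dict[str, Record] = {}

    def subscribe(self, handler: Handler) -> None:
        """Register a minion-like handler to be called on every change."""
        self._handlers.append(handler)

    def upsert(self, record_id: str, record: Record) -> bool:
        """Store a record; notify handlers only if it is new or different."""
        if self._records.get(record_id) == record:
            return False  # nothing changed, so no handler runs
        self._records[record_id] = dict(record)
        for handler in self._handlers:
            handler({"id": record_id, **record})
        return True


if __name__ == "__main__":
    missing_license: List[str] = []

    def flag_missing_license(record: Record) -> None:
        # Example operation: note data sets published without a license.
        if not record.get("license"):
            missing_license.append(str(record["id"]))

    feed = ChangeFeed()
    feed.subscribe(flag_missing_license)
    feed.upsert("ds-1", {"title": "Bus stops"})
    feed.upsert("ds-2", {"title": "Parks", "license": "CC0"})
    print(missing_license)
```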
11- JKAN
JKAN is a lightweight open-source data portal built on the static site generator Jekyll. It is completely backend-free, which makes it lightweight but also limited when it comes to features that require a database.
JKAN can be deployed like any Jekyll site as static pages, even on GitHub Pages. It has many themes, and an experienced developer can create new ones with ease. JKAN is very easy to install, set up, manage, and update, which can make it preferable to backend-heavy solutions.
12- GeoNode: CMS for Geospatial Data
GeoNode is a CMS for geospatial datasets. It allows users to publish geospatial datasets with powerful visualization options and advanced search functionality, create interactive maps, and collaborate with other users. GeoNode is also a powerful platform for developing geospatial information systems (GIS) and for deploying spatial data infrastructures (SDI). GeoNode is built with Django, and it works very well with Django-based projects.
13- Hue: Build Data Portals over SQL
Hue is not a data portal. It is a developer tool for building data applications on top of SQL databases. Essentially, Hue is an SQL editor and assistant that helps developers visualize data and build their data apps and portals using datasets stored in SQL databases. Hue supports multiple sources, including Apache Hive, Apache Impala, Presto, and SQL databases like MySQL, PostgreSQL, Oracle, and BigQuery: basically any SQL-based database.
Hue has a powerful catalog with search, tagging options, and visual tools that allow the user to navigate, search, tag, import, or merge data. Hue also comes with advanced visualization features embedded in its dashboard. One of its most useful time-saving features is workflow building, scheduling, and automation.
14- Open Data Node (ODN)
Open Data Node (ODN) is an Open Data publishing platform. It integrates well with other open-source Open Data portals like CKAN and Socrata. Open Data Node comes with automated features that make data publishing, discovery, and manipulation easy for users and developers.
Open Data Node also provides utilities for developers to build their apps on top of the published data.
15- Open Data Catalog
Open Data Catalog is an open-source data portal/CMS originally built for Philadelphia's OpenDataPhilly.org. It is built with Django and uses a PostgreSQL database.
16- Open Geoportal (OGP) (Java)
Open Geoportal is an open-source web application for publishing and sharing geospatial datasets. Open Geoportal is released under GPL v3 and built with Java.
- Guide to Open Data Licensing