Libro: A Free, Open-Source Text Analysis Tool
Libro is an open-source, cross-platform application that provides simple yet comprehensive analysis of text files. It analyzes the text content and generates graphs and an analytic summary of the text extracted from the files. Libro was created in Brazil and so far supports two languages: English and Brazilian Portuguese.
Libro is written in Python and Free Pascal (Lazarus). Although the application is cross-platform, it does not ship a macOS or Linux package; users with development experience can still build and run it on those systems. The project was last updated on 2017-09-24.
Supported file formats
Libro has proven able to open large ePub and HTML files, and it opens text-based file formats with ease. Here is a list of supported file formats:
- Plain text files: .txt, .md
- HTML files: Hypertext Markup Language
- ePub: e-book file format
- ODT: OpenDocument Text format
Features:
- Lightweight
- Simple user-interface (UI)
- Runs on Windows, Linux, and macOS (only a Windows executable is provided)
- Counts words, sentences, characters, spaces, and syllables
- Ranks words by frequency
- Performs quantitative analysis of the text using the Shannon-Weaver information statistic and the Zipf power-law function
- Computes readability indices (Gunning Fog, Coleman–Liau, Automated Readability Index (ARI), SMOG grade, Flesch–Kincaid grade level, and Flesch Reading Ease)
- Generates a word-frequency plot
Statistics:
- Number of words
- Number of characters
- Number of alphanumerics
- Number of characters (without space)
- Number of different words
- Number of syllables
- Number of sentences
- The average number of words per sentence
- Readability index statistics
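As a rough illustration of how statistics like these can be computed, here is a minimal Python sketch. It is not Libro's actual code: the function name `text_statistics`, the regular expressions, and the sentence-splitting rule are assumptions made for this example, so Libro's own counts may differ. Syllable counting is left out because it needs language-specific rules.

```python
import re


def text_statistics(text):
    """Return a few basic counts for a piece of text (illustrative only)."""
    # Assumption: a word is a run of letters or digits, so "don't" counts as two words.
    words = re.findall(r"[^\W_]+", text)
    # Assumption: a sentence ends at one or more of ".", "!" or "?".
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "words": len(words),
        "characters": len(text),
        "characters_no_space": sum(1 for c in text if not c.isspace()),
        "alphanumerics": sum(1 for c in text if c.isalnum()),
        "different_words": len({w.lower() for w in words}),
        "sentences": len(sentences),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


if __name__ == "__main__":
    for name, value in text_statistics("The cat sat. The dog ran!").items():
        print(name, value)
```

Abbreviations such as "e.g." would be counted as extra sentences by this simple rule, which is one reason real tools tend to use more careful tokenization.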
Under the hood
Libro uses several formulas and analytical models to provide its text analysis:
- Zipf's law: a statistical distribution observed in certain data sets, such as the words in a linguistic corpus, in which the frequency of a word is inversely proportional to its rank. [ref]
- Shannon-Weaver information statistic: a measure of the entropy (the average information content) of the text, expressed in bits.
- Gunning Fog Index: based on the idea that short sentences written in plain English achieve a better score than long sentences written in complicated language. [ref]
- Flesch–Kincaid readability tests: designed to indicate how difficult a passage in English is to understand. [ref]
- SMOG grading: estimates the years of education a person needs to understand a piece of writing. [ref]
- Coleman–Liau index (also known as the Coleman–Liau readability formula): a readability test designed by linguists Meri Coleman and T. L. Liau to gauge the understandability of a text. [ref]
- Automated Readability Index (ARI): a readability test designed to assess the understandability of a text. [ref]
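The models listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Libro's implementation: all function names are hypothetical, the entropy is assumed to be computed over word frequencies (it could equally be computed over characters), and the Zipf fit is a plain least-squares line in log-log space. The readability functions use the commonly published coefficients and take pre-computed counts as input, so no syllable-counting heuristic is needed here.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive word occurrences."""
    return Counter(w.lower() for w in re.findall(r"[^\W_]+", text))


def shannon_entropy(freqs):
    """Entropy in bits per word: H = -sum(p * log2(p))."""
    total = sum(freqs.values())
    return -sum((n / total) * math.log2(n / total) for n in freqs.values())


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Zipf's law (frequency inversely proportional to rank) predicts a slope near -1.
    """
    counts = sorted(freqs.values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def automated_readability_index(characters, words, sentences):
    # characters = letters and digits only
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(letters, words, sentences):
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def flesch_reading_ease(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog_index(words, sentences, complex_words):
    # complex_words = words with three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_grade(polysyllables, sentences):
    # polysyllables = words with three or more syllables, normalized to 30 sentences
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


if __name__ == "__main__":
    freqs = word_frequencies("the cat and the dog and the bird")
    print("most common:", freqs.most_common(3))
    print("entropy (bits per word):", shannon_entropy(freqs))
    print("Zipf slope:", zipf_slope(freqs))
```

On very short texts the Zipf slope and the readability scores are not meaningful; they are intended for passages of at least a few paragraphs.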
Considerations
- Does not open patch files
- Does not open folders
- Requires admin privileges on Windows
- Requires experience to build and run on macOS and Linux
Download and run Libro
Libro provides a Windows executable that runs smoothly on Windows but requires admin privileges.
Build it from source
Libro is an open-source project whose code is released for free; anyone with enough development skills can build it for their own machine and operating system.
Use Libro on Linux or macOS
To use Libro on Linux or macOS without building it from source, you can use Wine, which runs the Windows executable on macOS and on Linux distributions such as Debian, Ubuntu, Linux Mint, Fedora, and MX Linux.
Developer's note
Libro is built with Free Pascal and Python. It is offered in two versions:
- Python version. Requirements: Python 2.6 or later, PyQt4 4.8 or later, BeautifulSoup 4.0 or later, Matplotlib 0.98 or later.
- Free Pascal/Lazarus version. Requirements: Free Pascal 3.0 or later, Lazarus 1.6 or later, HTMLViewer component 11.8 or later, HistoryFiles component 1.3 or later, Vector library version 050702.
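Before trying the Python version, it can help to check whether its third-party requirements are importable. The sketch below is only a convenience and is not part of Libro. It assumes a Python 3.4 or later interpreter (for `importlib.util.find_spec`) and assumes the usual import names `PyQt4`, `bs4`, and `matplotlib`; it does not check version numbers.

```python
import importlib.util

# Assumed import names for the packages listed in the requirements.
REQUIRED_MODULES = {
    "PyQt4": "PyQt4",
    "BeautifulSoup 4": "bs4",
    "Matplotlib": "matplotlib",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the labels of packages whose import name cannot be found."""
    return [
        label
        for label, import_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```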
License
Libro is released under the GNU General Public License version 3.0 (GPLv3).
Conclusion
Libro is a good fit for everyday computer users who want to analyze text files without running sophisticated software. For now it ships an executable build only for Windows, although it can be built for other operating systems. Libro is a lightweight package that is useful for students and for anyone who wants a fast text analysis tool.
References
- The Automated Readability Index (ARI) [src]
- Coleman–Liau Readability Formula [src]
- The Flesch–Kincaid readability [src]
- Gunning Fog formula [src]
- Zipf's law [src]