dhlab documentation#

dhlab is a Python library for doing qualitative and quantitative analyses of the digital texts from Nettbiblioteket (eng: “the online library”) at the National Library of Norway (NLN). Nettbiblioteket is the NLN’s digital collection of media publications.

On our official homepage (in Norwegian), you can view and run example Jupyter notebooks in your browser.

Installation with pip#

Install the latest version of dhlab in your (Unix) terminal with pip:

pip install -U dhlab
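
To confirm that the install succeeded, one option is to ask Python's standard library for the installed package version. This is a minimal sketch: `installed_version` is a hypothetical helper written for this example, not part of dhlab.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if dhlab is installed, otherwise None.
print(installed_version("dhlab"))
```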

Get started with some examples.

Functionality#

Analyses can be performed on both a single document and a larger corpus.

Here are some of the text mining and automatic analyses you can do with dhlab:

Build a corpus from bibliographic metadata about publications.
Retrieve word (token) frequencies from a corpus.
Fetch chunks of text (paragraphs) as bag of words from a specific publication.
Extract concordances and collocations.
Retrieve n-gram frequencies per year in a time period.
Extract occurrences of named entities.
Plot narrative graphs of word dispersions in a publication.
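
To make some of the terms above concrete, here is a small plain-Python sketch of token frequencies, concordances, and collocations. It does not use dhlab or its API, and it is not how dhlab computes these results. The function names, the `window` parameter, and the two-document corpus are all assumptions made up for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def token_frequencies(docs):
    """Count tokens across every document in a corpus (dict of id -> text)."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts


def concordance(text, keyword, window=2):
    """Return each hit of keyword with up to `window` tokens of context per side."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


def collocations(text, keyword, window=2):
    """Count the tokens that occur within `window` positions of keyword."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Hypothetical two-document corpus, invented for this example.
corpus = {
    "doc1": "The library lends books. The library also lends maps.",
    "doc2": "A digital library stores scanned books.",
}

freqs = token_frequencies(corpus)
conc = concordance(corpus["doc1"], "library")
colloc = collocations(corpus["doc1"], "library")

print(freqs["library"])  # frequency of "library" across the whole corpus
print(conc)              # keyword-in-context rows for doc1
print(colloc)            # neighbours of "library" in doc1
```

A concordance keeps the order of the surrounding words, which suits close reading. A collocation count discards that order and only tallies neighbours, which suits comparison across a larger corpus.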