
evaluate_documents

from dhlab.api.dhlab_api import evaluate_documents
evaluate_documents(wordbags=None, urns=None)

Count the occurrences of the words in each topic wordbag and aggregate them per topic, for each document in a list of URNs.

Parameters:
  • wordbags (dict) – a dictionary mapping topic keywords to lists of associated words. Example: {"natur": ["planter", "skog", "fjell", "fjord"], ... }

  • urns (list) – a list of uniform resource names (URNs) identifying the documents, for example: ["URN:NBN:no-nb_digibok_2008051404065", "URN:NBN:no-nb_digibok_2010092120011"]

Returns:

a pandas.DataFrame with the topics as columns, indexed by the dhlabids of the documents.

Return type:

DataFrame
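To make the shape of the computation concrete, here is a minimal offline sketch of the per-document aggregation described above. It is not the library's implementation: `evaluate_documents_locally` is a hypothetical helper, and it assumes each document is already available as a list of tokens keyed by an integer id standing in for its dhlabid, whereas the real function looks the documents up from their URNs through the DH-lab API. It also assumes that a topic's value is the plain sum of its words' counts, with exact, case-sensitive matching. The token lists and the "by" topic are made-up illustration data.

```python
from collections import Counter

import pandas as pd


def evaluate_documents_locally(wordbags, documents):
    """Hypothetical offline sketch, not the dhlab implementation.

    wordbags:  dict mapping a topic keyword to a list of associated words.
    documents: dict mapping a document id (a stand-in for a dhlabid) to a
               list of tokens. This replaces the server-side lookup by URN.
    Returns a DataFrame with one column per topic, indexed by document id,
    where each cell is the total count of that topic's words in the document.
    """
    rows = {}
    for doc_id, tokens in documents.items():
        counts = Counter(tokens)  # token -> frequency in this document
        # Sum the frequencies of every word in each topic's wordbag.
        rows[doc_id] = {
            topic: sum(counts[word] for word in words)
            for topic, words in wordbags.items()
        }
    # orient="index" makes each document a row; columns keep wordbag order.
    return pd.DataFrame.from_dict(rows, orient="index", columns=list(wordbags))


# Illustrative inputs only (hypothetical ids and tokens).
wordbags = {"natur": ["planter", "skog", "fjell", "fjord"], "by": ["gate", "hus"]}
documents = {
    1: ["skog", "og", "fjell", "ved", "fjord", "skog"],
    2: ["gate", "hus", "hus", "fjord"],
}
df = evaluate_documents_locally(wordbags, documents)
# Document 1: natur = 4 (skog x2, fjell, fjord), by = 0
# Document 2: natur = 1 (fjord), by = 3 (gate, hus x2)
```

Building the result as a dict of per-document rows and converting it once with `DataFrame.from_dict` mirrors the documented return shape: topics as columns, documents as the index.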