dhlab.text.conc_coll

Module Contents

Classes

  • Concordance – Wrapper for the concordance function

  • Collocations – Collocations for target word(s) in a corpus

  • Counts – Provide word frequency counts for a corpus; the corpus should not be too large

Functions

API

dhlab.text.conc_coll.find_hits(x)

class dhlab.text.conc_coll.Concordance(corpus=None, query=None, window=20, limit=500)

Bases: dhlab.text.dhlab_object.DhlabObj

Wrapper for the concordance function

Initialization

Get concordances for word(s) in a corpus

Parameters:
  • corpus – target corpus, defaults to None

  • query – word or list of words, defaults to None

  • window – how many tokens to consider around the target word, defaults to 20

  • limit – maximum number of hits returned, defaults to 500
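The way query, window and limit shape the result can be illustrated without the library. The following is a minimal pure-Python sketch, not dhlab's implementation: the function name concordance is hypothetical, the corpus is assumed to be a plain list of tokens, and window is assumed to count tokens on each side of the hit.

```python
from __future__ import annotations

from collections.abc import Iterable


def concordance(tokens: list[str], query: str | Iterable[str],
                window: int = 20, limit: int = 500) -> list[tuple[str, str, str]]:
    """Return (left context, hit, right context) for each hit (hypothetical sketch)."""
    # Accept a single word or a list of words, as the query parameter does
    targets = {query} if isinstance(query, str) else set(query)
    hits: list[tuple[str, str, str]] = []
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        # Assumption: `window` tokens on each side of the hit
        left = " ".join(tokens[max(0, i - window):i])
        right = " ".join(tokens[i + 1:i + 1 + window])
        hits.append((left, token, right))
        if len(hits) >= limit:  # stop once `limit` hits are collected
            break
    return hits
```

For example, with tokens = "the cat sat on the mat and the dog sat too".split(), concordance(tokens, "sat", window=2) returns [("the cat", "sat", "on the"), ("the dog", "sat", "too")]; adding limit=1 keeps only the first hit.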

show(n=10, style=True)

classmethod from_df(df)

Cast a DataFrame to a Concordance object

class dhlab.text.conc_coll.Collocations(corpus=None, words=None, before=10, after=10, reference=None, samplesize=20000, alpha=False, ignore_caps=False)

Bases: dhlab.text.dhlab_object.DhlabObj

Collocations for target word(s) in a corpus

Initialization

Create a collocations object

Parameters:
  • corpus (dh.Corpus, optional) – target corpus, defaults to None

  • words (str or list, optional) – target word(s), defaults to None

  • before (int, optional) – number of words to include before the target word, defaults to 10

  • after (int, optional) – number of words to include after the target word, defaults to 10

  • reference (pd.DataFrame, optional) – reference frequency list, defaults to None

  • samplesize (int, optional) – sample size used when computing collocations, defaults to 20000

  • alpha (bool, optional) – only include alphabetical tokens, defaults to False

  • ignore_caps (bool, optional) – ignore capitalization, defaults to False
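How before, after, alpha, ignore_caps and reference interact can be sketched in plain Python. This is a hypothetical illustration, not dhlab's code: collocations is an invented name, the reference frequency list is assumed to be a word-to-count dict rather than a DataFrame, samplesize is left out, and the relevance score (relative frequency in the context window divided by relative frequency in the reference) is one common choice assumed here, not necessarily the measure dhlab uses.

```python
from __future__ import annotations

from collections import Counter


def collocations(tokens: list[str], words: str | list[str], before: int = 10,
                 after: int = 10, reference: dict[str, int] | None = None,
                 alpha: bool = False, ignore_caps: bool = False) -> dict[str, float]:
    """Count tokens co-occurring with target words (hypothetical sketch)."""
    targets = {words} if isinstance(words, str) else set(words)
    if ignore_caps:
        # Compare and count in lowercase so "Dog" and "dog" are merged
        tokens = [t.lower() for t in tokens]
        targets = {w.lower() for w in targets}
    counts: Counter[str] = Counter()
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        # Context: `before` tokens to the left and `after` tokens to the right
        context = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
        counts.update(t for t in context if not alpha or t.isalpha())
    if reference is None:
        return dict(counts)
    # Assumed relevance measure: share in the windows / share in the reference;
    # words missing from the reference are skipped
    coll_total = sum(counts.values())
    ref_total = sum(reference.values())
    return {w: (c / coll_total) / (reference[w] / ref_total)
            for w, c in counts.items() if reference.get(w)}
```

A score above 1 means the word is over-represented near the target compared with the reference list, which is the usual reason for supplying one.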

show(sortby='counts', n=20)

keywordlist(top=200, counts=5, relevance=10)

classmethod from_df(df)

Cast a DataFrame to a Collocations object

Parameters:
  • df – DataFrame

Returns:

Collocations object

class dhlab.text.conc_coll.Counts(corpus=None, words=None)

Bases: dhlab.text.dhlab_object.DhlabObj

Provide word frequency counts for a corpus; the corpus should not be too large

Initialization

Get a frequency list for a corpus

Parameters:
  • corpus – target Corpus, defaults to None

  • words – list of words to be counted, defaults to None
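Since display_names shows record names as column titles, the result can be pictured as a table with one row per word and one column per document. A minimal sketch of building such a table in plain Python, assuming the corpus is a dict mapping document IDs to token lists; count_words is a hypothetical name, and nested dicts stand in for a DataFrame.

```python
from __future__ import annotations

from collections import Counter


def count_words(corpus: dict[str, list[str]],
                words: list[str] | None = None) -> dict[str, dict[str, int]]:
    """Build a word-by-document frequency table (hypothetical sketch)."""
    per_doc = {doc_id: Counter(tokens) for doc_id, tokens in corpus.items()}
    if words is None:
        # Default: every word that occurs anywhere in the corpus
        words = sorted(set().union(*per_doc.values()))
    # Counter returns 0 for missing words, so absent words get explicit zeros
    return {w: {doc_id: c[w] for doc_id, c in per_doc.items()} for w in words}
```

Passing a word list keeps the table small, which matters because every document contributes a column.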

sum()

Summarize corpus frequencies

Returns:

frequency list for the corpus
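Assuming sum() totals each word's counts across all documents (the docstring does not say which axis is summed), the operation reduces to the sketch below. It works on a word-to-document nested dict as a hypothetical stand-in for the frequency table; sum_counts is an invented name.

```python
from __future__ import annotations


def sum_counts(table: dict[str, dict[str, int]]) -> dict[str, int]:
    """Collapse a word-by-document table into one corpus-wide frequency list."""
    totals = {word: sum(per_doc.values()) for word, per_doc in table.items()}
    # Most frequent words first, as in a typical frequency list
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```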

display_names()

Display data with record names as column titles.

display_rel_names()

Display relative frequency data with record names as column titles.
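Relative frequencies are commonly obtained by dividing each count by the total for its document. The sketch below assumes that total is taken over the words present in the table; dhlab may instead normalize by full document length, so treat it as an illustration only. relative_frequencies is a hypothetical name, and nested dicts stand in for a DataFrame.

```python
from __future__ import annotations


def relative_frequencies(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Divide each count by its document's column total (hypothetical sketch)."""
    totals: dict[str, int] = {}
    for per_doc in table.values():
        for doc, count in per_doc.items():
            totals[doc] = totals.get(doc, 0) + count
    # Guard against empty documents to avoid division by zero
    return {word: {doc: (count / totals[doc] if totals[doc] else 0.0)
                   for doc, count in per_doc.items()}
            for word, per_doc in table.items()}
```

Under this assumption each document column sums to 1, which makes documents of different lengths comparable.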

classmethod from_df(df)

property counts

Legacy property for freq