dhlab.text.conc_coll
Module Contents
Classes
Concordance | Wrapper for concordance function
Collocations | Collocations
Counts | Provide counts for a corpus. The corpus should not be too large.
Functions
make_link |
find_hits |
API
- dhlab.text.conc_coll.make_link(row)
- dhlab.text.conc_coll.find_hits(x)
- class dhlab.text.conc_coll.Concordance(corpus=None, query=None, window=20, limit=500)
Bases: dhlab.text.dhlab_object.DhlabObj
Wrapper for concordance function
Initialization
Get concordances for word(s) in corpus
- Parameters:
corpus – Target corpus, defaults to None
query – word or list of words, defaults to None
window – how many tokens to consider around the target word, defaults to 20
limit – maximum number of hits to return, defaults to 500
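To make the roles of window and limit concrete, here is a minimal pure-Python sketch of a keyword-in-context search over a token list. It is not dhlab's implementation and does not call the library. The function name `concordance`, the sample text, and the reading of window as "tokens on each side of the hit" are assumptions made for illustration; the documentation above does not say whether the window is per side or in total.

```python
def concordance(tokens, query, window=20, limit=500):
    """Return up to `limit` (left, hit, right) context triples for `query`."""
    # Accept a single word or a list of words, like the `query` parameter.
    targets = {query} if isinstance(query, str) else set(query)
    hits = []
    for i, token in enumerate(tokens):
        if token in targets:
            # Assumption: `window` tokens are taken on EACH side of the hit.
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
            if len(hits) >= limit:
                break  # stop as soon as the hit limit is reached
    return hits


text = "the cat sat on the mat and the dog sat on the rug".split()
rows = concordance(text, "sat", window=2, limit=10)
for left, hit, right in rows:
    print(f"{left:>10} | {hit} | {right}")
```

A smaller window gives shorter context strings, and a smaller limit cuts the search off early. In this sketch, both reduce the amount of data returned.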
- show(n=10, style=True)
- classmethod from_df(df)
Typecast DataFrame to Concordance
- class dhlab.text.conc_coll.Collocations(corpus=None, words=None, before=10, after=10, reference=None, samplesize=20000, alpha=False, ignore_caps=False)
Bases: dhlab.text.dhlab_object.DhlabObj
Collocations
Initialization
Create collocations object
- Parameters:
corpus (dh.Corpus, optional) – target corpus, defaults to None
words (str or list, optional) – target word(s), defaults to None
before (int, optional) – words to include before, defaults to 10
after (int, optional) – words to include after, defaults to 10
reference (pd.DataFrame, optional) – reference frequency list, defaults to None
samplesize (int, optional) – description, defaults to 20000
alpha (bool, optional) – Only include alphabetical tokens, defaults to False
ignore_caps (bool, optional) – Ignore capitalized letters, defaults to False
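The before, after and alpha parameters can be illustrated with a small local sketch that counts the tokens around each occurrence of a target word. This is a sketch under stated assumptions, not dhlab's implementation. The function name `collocates` and the sample text are hypothetical, and the reference, samplesize and ignore_caps parameters are deliberately not modeled because their exact behavior is not described above.

```python
from collections import Counter


def collocates(tokens, words, before=10, after=10, alpha=False):
    """Count tokens found within `before`/`after` positions of each target."""
    targets = {words} if isinstance(words, str) else set(words)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token not in targets:
            continue
        # Collect the context on both sides, excluding the target itself.
        context = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
        if alpha:
            # Assumption: `alpha` keeps only purely alphabetical tokens.
            context = [t for t in context if t.isalpha()]
        counts.update(context)
    return counts


text = "the cat sat on the mat , and the dog sat on the rug .".split()
coll = collocates(text, "sat", before=2, after=1, alpha=True)
print(coll.most_common(3))
```

Using asymmetric values for before and after shows that the two sides of the context are controlled independently.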
- show(sortby='counts', n=20)
- keywordlist(top=200, counts=5, relevance=10)
- classmethod from_df(df)
Typecast DataFrame to Collocations
- Parameters:
df – DataFrame
- Returns:
Collocations
- class dhlab.text.conc_coll.Counts(corpus=None, words=None)
Bases: dhlab.text.dhlab_object.DhlabObj
Provide counts for a corpus. The corpus should not be too large.
Initialization
Get frequency list for Corpus
- Parameters:
corpus – target Corpus, defaults to None
words – list of words to be counted, defaults to None
- sum()
Summarize Corpus frequencies
- Returns:
frequency list for Corpus
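Summarizing per-record counts into one corpus-wide frequency list can be sketched locally as follows. This only illustrates the idea and is not dhlab's implementation. The record identifiers, the words and the helper name `corpus_frequencies` are all made up for the example.

```python
from collections import Counter

# Hypothetical per-record counts, keyed by an invented record identifier.
doc_counts = {
    "doc_a": Counter({"fjord": 3, "skog": 1}),
    "doc_b": Counter({"fjord": 2, "hav": 4}),
}


def corpus_frequencies(per_document):
    """Add up the word counts of every record into one frequency list."""
    total = Counter()
    for counts in per_document.values():
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total


totals = corpus_frequencies(doc_counts)
print(totals.most_common())
```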
- display_names()
Display data with record names as column titles.
- display_rel_names()
Display relfreq data with record names as column titles.
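The documentation does not define how "relfreq" is normalized. One common reading is each word's count divided by the total number of counted tokens in the same record. The sketch below shows that calculation under this assumption. The helper name `relative_frequencies` and the sample counts are hypothetical and do not come from dhlab.

```python
def relative_frequencies(counts):
    """Divide each count by the record total (assumed meaning of relfreq)."""
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero for an empty record
    return {word: n / total for word, n in counts.items()}


doc = {"fjord": 3, "skog": 1}
rel = relative_frequencies(doc)
print(rel)
```

Relative frequencies make records of different lengths comparable, which raw counts do not.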
- classmethod from_df(df)
- property counts
Legacy property for freq