Python - Sorting FreqDist in NLTK with get vs get()
I am playing around with NLTK and the FreqDist module.
    import nltk
    from nltk.corpus import gutenberg
    print(gutenberg.fileids())
    from nltk import FreqDist
    fd = FreqDist()
    for word in gutenberg.words('austen-persuasion.txt'):
        fd[word] += 1
    newfd = sorted(fd, key=fd.get, reverse=True)[:10]
I have a question regarding the sort portion. When I run the code as shown, it sorts the FreqDist object properly. However, when I pass get() instead of get, I encounter the following error:
    Traceback (most recent call last):
      File "c:\python34\nlp\nlp.py", line 21, in <module>
        newfd = sorted(fd, key=fd.get(), reverse=True)[:10]
    TypeError: get expected at least 1 arguments, got 0
Why is get right and get() wrong? I was under the impression that get() should be correct, but I guess not.
Essentially, the FreqDist object in NLTK is a sub-class of native Python's collections.Counter, so let's see how Counter works:
A Counter is a dictionary that stores the elements of a list as its keys and the counts of those elements as its values:
    >>> from collections import Counter
    >>> Counter(['a','a','b','c','c','c','d'])
    Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
    >>> c = Counter(['a','a','b','c','c','c','d'])
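To connect this back to the loop in the question, here is a minimal sketch showing that incrementing counts by hand gives the same result as passing the list to the constructor. It uses a plain Counter in place of FreqDist, and the variable names (`words`, `manual`, `direct`) are only illustrative.

```python
from collections import Counter

words = ['a', 'a', 'b', 'c', 'c', 'c', 'd']

# Counting by hand, the way the question's loop does with FreqDist.
manual = Counter()
for word in words:
    manual[word] += 1  # a missing key starts at 0, so there is no KeyError

# Letting the constructor do the counting.
direct = Counter(words)

print(manual == direct)  # True
print(direct['c'])       # 3
print(direct['z'])       # 0, because missing keys count as zero
```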
To get a list of the elements sorted by their frequency, you can use the .most_common() function. It returns a list of (element, count) tuples sorted by count, highest first.
    >>> c.most_common()
    [('c', 3), ('a', 2), ('b', 1), ('d', 1)]
And in reverse:
    >>> list(reversed(c.most_common()))
    [('d', 1), ('b', 1), ('a', 2), ('c', 3)]
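Since the question only wants the most frequent items, it may help to know that most_common() accepts an optional count. The sketch below uses the same small Counter as above with a count of 2; for the question's case the count would be 10.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# Ask for only the two most frequent elements.
top_two = c.most_common(2)
print(top_two)  # [('c', 3), ('a', 2)]

# Keep just the elements and drop the counts.
top_two_keys = [element for element, count in top_two]
print(top_two_keys)  # ['c', 'a']
```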
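Like a dictionary, you can iterate through a Counter object and it will return the keys (the outputs here are from Python 2; in Python 3, .keys() and .items() return view objects rather than lists):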
    >>> [key for key in c]
    ['a', 'c', 'b', 'd']
    >>> c.keys()
    ['a', 'c', 'b', 'd']
You can also use the .items() function to get (key, value) tuples:
    >>> c.items()
    [('a', 2), ('c', 3), ('b', 1), ('d', 1)]
Alternatively, if you only need the keys sorted by their counts, see "Transpose/Unzip Function (inverse of zip)?":
    >>> k, v = zip(*c.most_common())
    >>> k
    ('c', 'a', 'b', 'd')
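The zip(*...) idiom can look opaque at first, so here is a small sketch of what it does to a list of pairs. The list `pairs` is written out by hand to mirror the most_common() output above.

```python
# A list of (element, count) pairs, as most_common() would return it.
pairs = [('c', 3), ('a', 2), ('b', 1), ('d', 1)]

# The * unpacks the list so that zip receives each pair as a separate
# argument; zip then groups the first items together and the second
# items together.
keys, counts = zip(*pairs)
print(keys)    # ('c', 'a', 'b', 'd')
print(counts)  # (3, 2, 1, 1)

# Zipping the two tuples again restores the original pairs.
rebuilt = list(zip(keys, counts))
print(rebuilt == pairs)  # True
```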
Going back to the question of .get vs .get(): the former is the function itself, while the latter is a call to that function, which requires a key of the dictionary as its argument:
    >>> c = Counter(['a','a','b','c','c','c','d'])
    >>> c
    Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
    >>> c.get
    <built-in method get of Counter object at 0x7f5f95534868>
    >>> c.get()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: get expected at least 1 arguments, got 0
    >>> c.get('a')
    2
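As a runnable version of the same point, the sketch below stores the method in a variable (the name `lookup` is only illustrative) and shows that writing key=c.get() fails before sorted() even starts, because Python evaluates c.get() first. The exact wording of the error message varies between Python versions, so it is printed rather than assumed.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# Without parentheses we get a reference to the bound method.
lookup = c.get
print(callable(lookup))  # True
print(lookup('a'))       # 2
print(lookup('z'))       # None, the default of dict.get for a missing key

# With parentheses the method is called immediately with no arguments,
# which raises TypeError before sorted() is ever invoked.
failed = False
try:
    sorted(c, key=c.get())
except TypeError as error:
    failed = True
    print("TypeError:", error)
```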
When invoking sorted(), the key=... parameter is not a key of the list/dictionary you're sorting. It is a function that sorted calls on each element to compute the value it should sort by.
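To illustrate what sorted() does with its key function, here is a short sketch. The first part uses len on a made-up list of words; the second shows that key=c.get behaves the same as an explicit lambda that looks up each element's count.

```python
from collections import Counter

# sorted() calls the key function once per element and orders the
# elements by the returned values.
fruits = ['pear', 'fig', 'banana']
by_length = sorted(fruits, key=len)
print(by_length)  # ['fig', 'pear', 'banana']

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# Passing the method itself...
by_method = sorted(c, key=c.get, reverse=True)

# ...is equivalent to spelling out the lookup in a lambda.
by_lambda = sorted(c, key=lambda element: c[element], reverse=True)

print(by_method == by_lambda)  # True
print(by_method[:2])           # ['c', 'a']
```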
So these two are the same; both return the values stored under the keys:
    >>> [c.get(key) for key in c]
    [2, 3, 1, 1]
    >>> [c[key] for key in c]
    [2, 3, 1, 1]
And when sorting, the values are used as the sorting criteria, so these achieve the same output (apart from the order of elements with tied counts):
    >>> sorted(c, key=c.get)
    ['b', 'd', 'a', 'c']
    >>> v, k = zip(*sorted((c.get(key), key) for key in c))
    >>> list(k)
    ['b', 'd', 'a', 'c']
    >>> sorted(c, key=c.get, reverse=True) # highest to lowest
    ['c', 'a', 'b', 'd']
    >>> v, k = zip(*reversed(sorted((c.get(key), key) for key in c)))
    >>> k
    ('c', 'a', 'd', 'b')
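The last result above puts 'd' before 'b' because the (count, key) tuples are compared on the key when the counts tie, and reversing flips that order too. If a predictable order for ties matters, one possible approach (a sketch, not the only option) is to sort on a tuple of the negated count and the element itself:

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# Sorting (count, element) tuples in descending order also reverses the
# alphabetical order of elements that share a count.
pairs_desc = sorted(((c[element], element) for element in c), reverse=True)
print(pairs_desc)  # [(3, 'c'), (2, 'a'), (1, 'd'), (1, 'b')]

# Negating the count sorts high counts first while keeping ties in
# ascending alphabetical order.
ranked = sorted(c, key=lambda element: (-c[element], element))
print(ranked)      # ['c', 'a', 'b', 'd']

# Slicing then gives the top N, as in the question's [:10].
print(ranked[:2])  # ['c', 'a']
```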