python - Sorting FreqDist in NLTK with get vs get()


I am playing around with NLTK and the FreqDist module.

import nltk
from nltk.corpus import gutenberg
print(gutenberg.fileids())
from nltk import FreqDist
fd = FreqDist()

for word in gutenberg.words('austen-persuasion.txt'):
    fd[word] += 1

newfd = sorted(fd, key=fd.get, reverse=True)[:10]

I have a question regarding the sort portion. When I run the code this way, it sorts the FreqDist object. When I run it with get() instead of get, I encounter the following error:

Traceback (most recent call last):
  File "c:\python34\nlp\nlp.py", line 21, in <module>
    newfd = sorted(fd, key=fd.get(), reverse=True)[:10]
TypeError: get expected at least 1 arguments, got 0

Why is get right and get() wrong? I was under the impression that get() should be correct, but I guess it is not.

Essentially, the FreqDist object in NLTK is a subclass of native Python's collections.Counter, so let's see how Counter works:

A Counter is a dictionary that stores the elements of a list as its keys and the counts of those elements as its values:

>>> from collections import Counter
>>> Counter(['a','a','b','c','c','c','d'])
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c = Counter(['a','a','b','c','c','c','d'])

To get a list of elements sorted by their frequency, you can use the .most_common() function, which returns (element, count) tuples sorted by count, highest first.

>>> c.most_common()
[('c', 3), ('a', 2), ('b', 1), ('d', 1)]

And in reverse order:

>>> list(reversed(c.most_common()))
[('d', 1), ('b', 1), ('a', 2), ('c', 3)]

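Like a dictionary, you can iterate through a Counter object and it will return its keys (the list shown for c.keys() below is Python 2 output; Python 3 returns a view object instead):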

>>> [key for key in c]
['a', 'c', 'b', 'd']
>>> c.keys()
['a', 'c', 'b', 'd']

You can also use the .items() function to get tuples of keys and their values:

>>> c.items()
[('a', 2), ('c', 3), ('b', 1), ('d', 1)]

Alternatively, if you only need the keys sorted by their counts, see "Transpose/Unzip Function (inverse of zip)?":

>>> k, v = zip(*c.most_common())
>>> k
('c', 'a', 'b', 'd')

Going back to the question of .get vs .get(): the former is the function itself, while the latter is a call to that function, and the call requires a key of the dictionary as its parameter:

>>> c = Counter(['a','a','b','c','c','c','d'])
>>> c
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c.get
<built-in method get of Counter object at 0x7f5f95534868>
>>> c.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get expected at least 1 arguments, got 0
>>> c.get('a')
2
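To make the distinction concrete, here is a minimal sketch that uses a plain collections.Counter in place of a FreqDist. The variable names (lookup, count_a, and so on) are illustrative and not from the original code. It shows that c.get can be stored and called later, while c.get() is evaluated on the spot and fails before sorted() would ever see it.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# c.get is the bound method itself: nothing is looked up yet.
lookup = c.get
is_callable = callable(lookup)

# Calling the stored method later is the same as calling c.get(...) directly.
count_a = lookup('a')
count_missing = lookup('z', 0)  # second argument is the default for absent keys

# c.get() with no argument raises immediately, at the point of the call.
try:
    c.get()
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(is_callable, count_a, count_missing, error_name)
```

This is why key=fd.get() in the question never reaches sorted(): Python evaluates the argument expression first, and that evaluation is what raises the TypeError.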

When invoking sorted(), the key=... parameter inside the sorted function is not the key of the list/dictionary you're sorting, but the key function that sorted should apply to each element to decide the sort order.
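One way to see what sorted() does with its key argument is to wrap c.get in a function that records every call. The traced_get helper below is hypothetical and exists only for this illustration; the sketch assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])
seen = []

def traced_get(element):
    # Hypothetical wrapper around c.get that records each call made by sorted().
    seen.append(element)
    return c.get(element)

# sorted() calls traced_get once per element and orders elements by the results.
result = sorted(c, key=traced_get)

print(result)
print(sorted(seen))
```

After the call, seen holds each key of the Counter exactly once, which shows that sorted() itself supplies the argument to the key function.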

So these two are the same; both return the values of the keys:

>>> [c.get(key) for key in c]
[2, 3, 1, 1]
>>> [c[key] for key in c]
[2, 3, 1, 1]
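The two lookups agree for keys that are present, but they behave differently for absent keys, which is worth knowing before swapping one for the other. The sketch below is a small illustration with a plain Counter; the key 'z' is an arbitrary absent key chosen for the example.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# For keys that are present, both lookups give the same counts.
via_get = [c.get(key) for key in c]
via_index = [c[key] for key in c]

# For an absent key they differ: indexing a Counter yields 0, get() yields None.
missing_index = c['z']
missing_get = c.get('z')

# A lambda that indexes the Counter is an equivalent sort key to c.get.
by_lambda = sorted(c, key=lambda key: c[key])
by_get = sorted(c, key=c.get)

print(via_get == via_index, missing_index, missing_get, by_lambda == by_get)
```

Since sorted(c, ...) only ever passes keys that exist in c, the difference does not matter for the sort in the question.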

And when sorting, the values are used as the sorting criterion, so these approaches produce the same ordering (apart from the order of ties between equal counts):

>>> sorted(c, key=c.get)
['b', 'd', 'a', 'c']
>>> v, k = zip(*sorted((c.get(key), key) for key in c))
>>> list(k)
['b', 'd', 'a', 'c']
>>> sorted(c, key=c.get, reverse=True) # highest to lowest
['c', 'a', 'b', 'd']
>>> v, k = zip(*reversed(sorted((c.get(key), key) for key in c)))
>>> k
('c', 'a', 'd', 'b')
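Applied back to the original question, the top-n selection can be sketched as follows. A plain Counter built from a short made-up sentence stands in for the FreqDist over gutenberg.words('austen-persuasion.txt'), so the example runs without the NLTK corpus; because FreqDist is a Counter subclass, the same calls are expected to work on fd, though that is not verified here.

```python
from collections import Counter

# Hypothetical sample text standing in for the Gutenberg corpus words.
words = "the cat and the dog and the bird".split()
fd = Counter(words)

# Approach from the question: sort the keys by their counts, keep the top n.
top_sorted = sorted(fd, key=fd.get, reverse=True)[:2]

# Shortcut: most_common(n) returns the n highest (element, count) pairs.
top_pairs = fd.most_common(2)
top_keys = [word for word, count in top_pairs]

print(top_sorted)
print(top_pairs)
```

most_common(n) also keeps the counts alongside the words, which the sorted(...) version discards.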
