WordFrequency.frink

Download or view WordFrequency.frink in plain text format


/** This implements the "Word Frequency" example on Rosetta Code:
     https://rosettacode.org/wiki/Word_frequency

    There are a few interesting things to note:

    * Frink has a Unicode-aware function, wordList[str], which intelligently
      enumerates through the words in a string (and correctly handles compound
      words, accented characters, etc.)  It returns words, spaces, and
      punctuation marks, so this program filters out any results that do not
      contain alphanumeric characters.
  
    * The file fetched from Project Gutenberg is supposed to be encoded in
      UTF-8, but the servers incorrectly report it as Windows-1252 encoded,
      so this program explicitly reads it as UTF-8.

    * Frink has a Unicode-aware lowercase function, lc[str], which correctly
      handles accented characters.

    * This program uses the dictionary.increment[key, num] method to help
      count words.
*/


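// Count each word (lowercased), keeping only the results from wordList that
// contain at least one alphanumeric character.  The file is read as UTF-8.
d = new dict
for w = select[wordList[read["https://www.gutenberg.org/files/135/135-0.txt", "UTF-8"]], %r/[[:alnum:]]/ ]
   d.increment[lc[w], 1]

// Sort the [word, count] pairs by count, largest first, and print the top 10.
println[joinln[first[reverse[sort[array[d], byColumn[1]]], 10]]]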


/** Also see this one-liner that works in newer versions of Frink */

// formatTable[first[countToArray[select[wordList[lc[normalizeUnicode[read["https://www.gutenberg.org/files/135/135-0.txt", "UTF-8"]]]], %r/[[:alnum:]]/ ]], 10], "right"]




This is a program written in the programming language Frink.
For more information, view the Frink Documentation or see More Sample Frink Programs.
