This weekend on /r/programming someone posted a nice introduction to tries using Python. A while ago, I had implemented a mini web server to do completion and correction of words using Ternary Search Tries in Haskell, and since the trie post generated a lot of interest I decided to post mine too.
Then, someone else posted a blog article commenting on the readability of Python and Haskell based on my web server code and the trie example, concluding that the Python version was much more readable.
I personally prefer writing and reading Haskell, but I think that the comparison is not fair since the data structures compared are very different. Thus, I quickly coded an Haskell version of the code in the original blog post, so that you can compare code that does, more or less, the same thing.
The only difference is that I don’t store the current word in each node, since
it is not necessary—the current word can be kept track of when traversing the
path. Also, I’m using lists instead of sets when returning the completions, but
changing that would be trivial using Data.Set
.
Moreover, the Haskell version has the advantage of working with any list, so for example it would work with list of integers as well as Strings. This could be further abstracted to “unconsable” data types (ListLike offers that and more) but that’s not that relevant here.
So, here’s the original Python code:
"""
A fast data structure for searching strings with autocomplete support.
"""
class Trie(object):
def __init__(self, value=None):
self.children = {}
self.value = value
self.flag = False # Flag to represent that a word ends at this node
def add(self, char):
= self.value + char if self.value else char
val self.children[char] = Trie(val)
def insert(self, word):
= self
node for char in word:
if char not in node.children:
node.add(char)= node.children[char]
node = True
node.flag
def find(self, word):
= self
node for char in word:
if char not in node.children:
return None
= node.children[char]
node return node.value
def all_prefixes(self, wlist):
= set()
results if self.flag:
self.value)
results.add(if not self.children: return results
return reduce(lambda a, b: a | b,
for
[node.all_prefixes() in self.children.values()]) | results
node
def autocomplete(self, prefix):
= self
node for char in prefix:
if char not in node.children:
return set()
= node.children[char]
node return node.all_prefixes()
And this is the Haskell version:
module Trie
Trie
(
, empty
, insert
, find
, completewhere
)
import Data.Map (Map)
import qualified Data.Map as Map
-- | At each node, we store a 'Bool' indicating if we are at the end of
-- a word.
data Trie a = Trie (Map a (Trie a)) Bool
empty :: Ord a => Trie a
= Trie Map.empty False
empty
insert :: Ord a => [a] -> Trie a -> Trie a
Trie tries _) =
insert [] (Trie tries True
@(firstChar : rest) (Trie tries wordEnd) =
insert wordcase Map.lookup firstChar tries of
Nothing ->
Trie (Map.insert firstChar empty tries) wordEnd)
insert word (Just trie ->
Trie (Map.insert firstChar (insert rest trie) tries) wordEnd
find :: Ord a => [a] -> Trie a -> Bool
Trie _ wordEnd) =
find [] (
wordEnd: rest) (Trie tries _) =
find (firstChar maybe False (find rest) (Map.lookup firstChar tries)
complete :: Ord a => [a] -> Trie a -> [[a]]
Trie tries wordEnd) =
complete [] (if wordEnd then [[]] else []) ++
(concat [map (char :) (complete [] trie) | (char, trie) <- Map.toList tries]
: rest) (Trie tries _) =
complete (firstChar maybe [] (map (firstChar :) . complete rest) (Map.lookup firstChar tries)
I didn’t read the Python version too carefully so if I missed something let me know!
You can discuss this post on reddit.
Update: after the reddit discussion, I made the variable names a bit clearer.