Tries
This is my first post in the Data Structures category. In this post, I discuss a Python implementation of a widely used, but less widely taught, data structure  trie. For a great introduction to tries, check out this video.
The Data Structure
We begin by creating a class Node
that will represent each node in the trie. Each node has zero or more children.
class Node:
def __init__(self):
self.children = {}
self.is_complete_word = False
self.num_words = 0
The Node
class contains three members:

children
: To store a node’s children, we use a dictionary. This makes it easier to map a node’s value (which is a single character), to an object of classNode
.Tip: Some implementations of trie may store more than one character in a single node to save space. In this implementation, we assume that only a single character will be stored in each node.

is_complete_word
: The booleanis_complete_word
is used to indicate whether a certain node represents a complete word. For example, consider the following trie: This trie contains three complete words  CAT, CAP and SEA  and seven nodes in total. For nodes containing values T, P and A,is_complete_word
will be set toTrue
to indicate that these nodes represent complete words. 
num_words
: The integernum_words
denotes the number of complete words that can be obtained by traversing all children of some node T. For example, for nodeC
in the trie above,num_words = 2
, because we obtain two complete words  CAT and CAP  by traversing all children ofC
. To optimize this retrieval, instead of traversing all children of a node T to find the number of complete words starting with T, we store this information for each node whenever a complete word is inserted into the trie.
Insert Word
To insert a word into our trie data structure, we create a recursive method add_word
. The input to this method is (1) word
 the word to be added, and (2) trie
 the trie in which the word is to be added.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
def add_word(word, trie):
if not word:
return trie
elif len(word) == 1: # All letters have now been processed
trie.is_complete_word = True
if word[0] not in trie.children: # Letter doesn't exist, so create
trie.children[word[0]] = Node()
else: # Letter exists, increment word counter
trie.children[word[0]].num_words += 1
add_word(word[1:], trie.children[word[0]]) # Process remaining letters
return trie
Our base condition is if word
is empty, in which case we simply return the input trie
. Line 4 checks if word
only contains a single character, in which case all letters have been processed, so we mark the current node as a complete word. As the video by McDowell describes, when adding a new word into a trie, we examine each character in the word in turn, and traverse the trie depthwise until we find the insertion point where to insert the remaining characters. Line 7 checks if the current character word[0]
does not exist as a child of the current node trie
, and if true, creates a new Node
to represent this character. If, however, the character already exists, we simply increment num_words
to indicate that the number of complete words starting with word[0]
has now increased by one. On line 12, we call add_word
recursively to process the remaining characters. Finally, we return trie
in which the input word
has been added.
Find Partial
The find partial operation enables us to find the number of words that start with a certain combination of letters. For example, in the trie discussed earlier (reproduced below), we may wish to find the number of words starting with CA (hint: the answer is 2).
Recall that, for each node, we store an integer value  num_words
 which represents the number of words starting with the character in that node. This significantly simplifies our find_word
method, shown below.
1
2
3
4
5
6
7
def find_word(word, trie):
if not word:
return trie.num_words
elif word[0] in trie.children:
return find_word(word[1:], trie.children[word[0]])
return 0
As before for add_word
, we write a recursive find_word
method that takes in two parameters: (1) word
 the word to find in the trie, and (2) trie
 the trie to find word
in. The output is an integer representing the number of words starting with word
.
Conclusion
That completes my Python implementation of a basic trie! There are several optimizations that can be applied to this implementation, and this post shall be updated frequently in the near future. Let me know what you think about the implementation in the comments below!
Note: This post is a work in progress, and I may modify and/or refine it over time.
Comments