Tries

8 minute read

This is my first post in the Data Structures category. In this post, I discuss a Python implementation of a widely used, but less widely taught, data structure - trie. For a great introduction to tries, check out this video.

The Data Structure

We begin by creating a class Node that will represent each node in the trie. Each node has zero or more children.

class Node:
  def __init__(self):
    self.children = {}
    self.is_complete_word = False
    self.num_words = 0

The Node class contains three members:

  1. children: To store a node’s children, we use a dictionary. This makes it easier to map a node’s value (which is a single character), to an object of class Node.

    Tip: Some implementations of trie may store more than one character in a single node to save space. In this implementation, we assume that only a single character will be stored in each node.

  2. is_complete_word: The boolean is_complete_word is used to indicate whether a certain node represents a complete word. For example, consider the following trie: This trie contains three complete words - CAT, CAP and SEA - and seven nodes in total. For nodes containing values T, P and A, is_complete_word will be set to True to indicate that these nodes represent complete words.

  3. num_words: The integer num_words denotes the number of complete words that can be obtained by traversing all children of some node T. For example, for node C in the trie above, num_words = 2, because we obtain two complete words - CAT and CAP - by traversing all children of C. To optimize this retrieval, instead of traversing all children of a node T to find the number of complete words starting with T, we store this information for each node whenever a complete word is inserted into the trie.

Insert Word

To insert a word into our trie data structure, we create a recursive method add_word. The input to this method is (1) word - the word to be added, and (2) trie - the trie in which the word is to be added.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def add_word(word, trie):
  if not word:
    return trie
  elif len(word) == 1: # All letters have now been processed
    trie.is_complete_word = True

  if word[0] not in trie.children: # Letter doesn't exist, so create
    trie.children[word[0]] = Node()
  else: # Letter exists, increment word counter
    trie.children[word[0]].num_words += 1

  add_word(word[1:], trie.children[word[0]]) # Process remaining letters

  return trie

Our base condition is if word is empty, in which case we simply return the input trie. Line 4 checks if word only contains a single character, in which case all letters have been processed, so we mark the current node as a complete word. As the video by McDowell describes, when adding a new word into a trie, we examine each character in the word in turn, and traverse the trie depthwise until we find the insertion point where to insert the remaining characters. Line 7 checks if the current character word[0] does not exist as a child of the current node trie, and if true, creates a new Node to represent this character. If, however, the character already exists, we simply increment num_words to indicate that the number of complete words starting with word[0] has now increased by one. On line 12, we call add_word recursively to process the remaining characters. Finally, we return trie in which the input word has been added.

Find Partial

The find partial operation enables us to find the number of words that start with a certain combination of letters. For example, in the trie discussed earlier (reproduced below), we may wish to find the number of words starting with CA (hint: the answer is 2).

Recall that, for each node, we store an integer value - num_words - which represents the number of words starting with the character in that node. This significantly simplifies our find_word method, shown below.

1
2
3
4
5
6
7
def find_word(word, trie):
  if not word:
    return trie.num_words
  elif word[0] in trie.children:
    return find_word(word[1:], trie.children[word[0]])

  return 0

As before for add_word, we write a recursive find_word method that takes in two parameters: (1) word - the word to find in the trie, and (2) trie - the trie to find word in. The output is an integer representing the number of words starting with word.

Conclusion

That completes my Python implementation of a basic trie! There are several optimizations that can be applied to this implementation, and this post shall be updated frequently in the near future. Let me know what you think about the implementation in the comments below!

Note: This post is a work in progress, and I may modify and/or refine it over time.

Comments