Trie Data Structure Implementation

Trie Data Structure Implementation using a multi-dimensional array…well really a List of Arrays.

One day I was reading about the Trie data structure in one of my favorite programming books. The – “Competitive Programmer’s Handbook” by Antti Laaksonen which is available for free. This is a great book if you a preparing for a difficult coding interview and already have some experience. The Trie section stated that you could implement it using a multi-dimensional array but gave very little insight into how that might work. I was intrigued so I decided to see if I could find an implementation elsewhere…(read Googled it). After about 20 min of searching I could only find ways to implement it using Node classes. This is not what I was looking for so I decided to work it out for myself.

What the Trie should look like…

If you are unfamiliar with a Trie data structure it is a tree type structure that stores strings in a way that allows for O(N) insert, and O(N) Lookups, and is useful for spell checkers or auto-complete functionality. An example of what we are looking for is below. We mark the Root node with a “^”. We indicate the end of a word with an asterisk “*”.

Trie Data Structure Implementation
Trie Data Structure Implementation

The first thing that stuck out to me was the “Array” implementation mentioned in the book. Since arrays are fixed data structures it implies that I might need to know some info ahead of time. If we look at the example above, we notice that we either need to know what strings we will be storing or at least how many characters. In order to not worry about writing code to dynamically resize the structure I opted to use a List of arrays implementation.

What Methods are Needed

Before we get to the code, lets figure out what methods we will need and what they should return.

  • Need a method to initialize the structure.
  • Need a method to add words to the trie.
  • Need a method to check if a word exists in the trie.
  • Need a method to get all words from trie.
  • Need a method to get all words based on a prefix.

I think this is a good place to start there might be some helper methods used in the process as well but no need to call them out now.

Trie Implementation

First lets start by creating a class and initializing the root node, and thinking about what else is needed. The trie will be storing English characters all in lower case. The Enghish alphabet has 26 letters and we will need one additional space to store the asterisks for total length of 27 characters.

    public class Trie
    {

        List<int[]> _trie = new List<int[]>();
        int TrieRoot = 0;
        int WordEnd = (int)'*';
        int AlphabetLength = 26;

        public Trie(int alphabetLength = 0)
        {
            InitTrie(alphabetLength);
        }

        private void InitTrie(int alphabetLength)
        {
            //initilize the List of arrays we will be using.
            _trie = new List<int[]>();
            if (alphabetLength > 0) this.AlphabetLength = alphabetLength;
            //set the initial element in the list, this will be our root element
            _trie.Add(GetFilledArray());
        }

        private int[] GetFilledArray()
        {
            //create an integer array to map to characters of the string.
            int[] temp = new int[AlphabetLength + 1];
            //fill the array with -1 so we know what characters are in use later on.
            Array.Fill(temp, -1);
            return temp;
        }

    }


A lot of boiler plate initialization code but it should give some idea of how the implementation will progress. First we are using an integer array to store the letters. It uses what I like to call character arithmetic to keep track of what letters have already been stored. If this is unfamiliar; it boils down to the fact that ever character in the ascii table can be represented by an decimal number. Quick example, take letter ‘c’ which has a decimal value of 99 in the ascii table if we subtract ‘a'(decimal 97) from ‘c’ we get 2. This will be the index in the integer array representing the letter ‘c’.

Lets work on adding the AddEntry method

This method will add a word to the data structure.

        
        public void AddEntry(string s)
        {
            //store everything as lower case
            s = s.ToLower();
            int[] root = GetRoot();
            int idx = 0;
            
            while(idx < s.Length)
            {
                //if we have the default value in the array then we have not added the letter yet
                if(root[s[idx] - 'a'] == -1)
                {
                    //add a new entry into the list and fill it with an empty array.
                    _trie.Add(GetFilledArray());
                    //get the amount of nodes currently in the trie
                    int nextIdx = _trie.Count - 1;
                    //set the current letters index to the new node index.
                    root[s[idx] - 'a'] = nextIdx;
                    //update the root node to the newly created node.
                    root = _trie[root[s[idx] - 'a']];
                }else
                {
                    //keep updating the root until we have found a letter that is missing or run to the end.
                    root = _trie[root[s[idx] - 'a']];
                }
                idx++;
            }
            //we finished adding the word so set the end word index
            root[AlphabetLength] = WordEnd;                      
        }

        private int[] GetRoot()
        {
            return _trie[TrieRoot];
        }


The implementation is starting to come together. Words can be added now and it seems that moving through the list is very similar to how navigating a linked list works. However, it’s still unusable in my opinion, just adding words does nothing really, there needs to be a way to get what is stored out.

Getting data out of the Trie

Lets work on that portion. To me since this is a tree structure it seems like recursion might be a good way to tackle this. Thinking thought the algorithm for each node in the tree we need to see if any letters are used. If there are then we need to follow that chain by doing the exact same thing again… the code below sums up this approach.

        public List<string> GetAllWords()
        {
            List<string> words = new List<string>();
            StringBuilder sb = new StringBuilder();
            GetWordsHelper(_trie, GetRoot(), words, sb);
            return words;
        }

        

        public void GetWordsHelper(List<int[]> trie,int[] node, List<string> words, StringBuilder currentWord)
        {
            //if the final node in the array has been set to * then we know all the letters that have been 
            //processed so far make up a word.  Add it to the list, and continue.
            if(node[AlphabetLength] == WordEnd)
            {
                words.Add(currentWord.ToString());
            }
            //skip the last letter (*) in each array.
            for(int i = 0; i < node.Length - 1; i++)
            {
                //only need to look for letters that are currently used.
                if(node[i] != -1)
                {
                    //determine what the original letter was based on the index add it to the current word.
                    currentWord.Append((char)('a' + i));
                    //recurse down following the nodes.
                    GetWordsHelper(trie, getNextNode(node[i]), words, currentWord);
                    //as we are coming back up out of the recursion pop letters off the current word.
                    currentWord = currentWord.Remove(currentWord.Length - 1, 1);
                }
            }
        }

        private int[] getNextNode(int i)
        {
            return _trie[i];
        }

Now that there is a way to get all the words in the trie out. All that is really left is to get words out that have the same prefix. Leveraging what is already there there is a pretty easy way to do this. Just need one additional helper method and a little modification to the GetAllWords() method, updates are below.

Get Words with Prefix

        public List<string> GetAllWords(string prefix)
        {
            List<string> words = new List<string>();
            prefix = prefix.ToLower();
            //instead of an empty stringbuild seed it with the prefix.
            StringBuilder sb = new StringBuilder(prefix);
            //calling the GetEndNodeInPrefix will determine where to start looking in the trie to build words.
            //then just simple recurse down from the node that is returned from GetEndNodeInPrefix.
            GetWordsHelper(_trie, GetEndNodeInPrefix(prefix), words, sb);
            return words;
        }

        private int[] GetEndNodeInPrefix(string prefix)
        {
            int idx = 0;
            //start at the root
            int[] root = GetRoot();
            while(idx < prefix.Length)
            {
                int charIdx = (int)(prefix[idx] -'a');
                //if we can not match any additinal letters from the prefix then just return what we have so far.
                if (root[charIdx] == -1) return root;
                root = getNextNode(root[charIdx]);
                idx++;
            }
            //if we matched all letters then return the next node to start looking at.  
            return root;
        }


And that is a quick and rough setup for a Trie. Disclaimer there are some assumptions being made in the code. Alpha numeric strings will not work there is no check to verify that only lower case English characters are used. There is no end of word verification, meaning that if the word “adds” is added and you call ContainsWord(“add”) its going to return true but you will not see it on the list of words. Full implementation is below for convenience.

Full Implementation Code

    public class Trie
    {

        List<int[]> _trie = new List<int[]>();
        int TrieRoot = 0;
        int WordEnd = (int)'*';
        int AlphabetLength = 26;

        public Trie(int alphabetLength = 0)
        {
            InitTrie(alphabetLength);
        }

        private void InitTrie(int alphabetLength)
        {
            //initilize the List of arrays we will be using.
            _trie = new List<int[]>();
            if (alphabetLength > 0) this.AlphabetLength = alphabetLength;
            //set the initial element in the list, this will be our root element
            _trie.Add(GetFilledArray());
        }


        public void AddEntry(string s)
        {
            //store everything as lower case
            s = s.ToLower();
            int[] root = GetRoot();
            int idx = 0;
            
            while(idx < s.Length)
            {
                //if we have the default value in the array then we have not added the letter yet
                if(root[s[idx] - 'a'] == -1)
                {
                    //add a new entry into the list and fill it with an empty array.
                    _trie.Add(GetFilledArray());
                    //get the amount of nodes currently in the trie
                    int nextIdx = _trie.Count - 1;
                    //set the current letters index to the new node index.
                    root[s[idx] - 'a'] = nextIdx;
                    //update the root node to the newly created node.
                    root = _trie[root[s[idx] - 'a']];
                }else
                {
                    //keep updating the root until we have found a letter that is missing or run to the end.
                    root = _trie[root[s[idx] - 'a']];
                }
                idx++;
            }
            //we finished adding the word so set the end word index
            root[AlphabetLength] = WordEnd;                      
        }

        public bool ContainsWord(string s)
        {
            s = s.ToLower();
            int[] root = GetRoot();
            int sIdx = 0;
            while(sIdx < s.Length)
            {
                int charIdx = s[sIdx] - 'a';
                if (root[charIdx] == -1) return false;
                root = getNextNode(root[charIdx]);
                sIdx++;
            }
            return true;
        }

        public List<string> GetAllWords(string prefix)
        {
            List<string> words = new List<string>();
            prefix = prefix.ToLower();
            //instead of an empty stringbuild seed it with the prefix.
            StringBuilder sb = new StringBuilder(prefix);
            //calling the GetEndNodeInPrefix will determine where to start looking in the trie to build words.
            //then just simple recurse down from the node that is returned from GetEndNodeInPrefix.
            GetWordsHelper(_trie, GetEndNodeInPrefix(prefix), words, sb);
            return words;
        }

        public List<string> GetAllWords()
        {
            List<string> words = new List<string>();
            StringBuilder sb = new StringBuilder();
            GetWordsHelper(_trie, GetRoot(), words, sb);
            return words;
        }

        

        public void GetWordsHelper(List<int[]> trie,int[] node, List<string> words, StringBuilder currentWord)
        {
            //if the final node in the array has been set to * then we know all the letters that have been 
            //processed so far make up a word.  Add it to the list, and continue.
            if(node[AlphabetLength] == WordEnd)
            {
                words.Add(currentWord.ToString());
            }
            for(int i = 0; i < node.Length - 1; i++)
            {
                //only need to look for letters that are currently used.
                if(node[i] != -1)
                {
                    //determine what the original letter was based on the index add it to the current word.
                    currentWord.Append((char)('a' + i));
                    //recurse down following the nodes.
                    GetWordsHelper(trie, getNextNode(node[i]), words, currentWord);
                    //as we are coming back up out of the recursion pop letters off the current word.
                    currentWord = currentWord.Remove(currentWord.Length - 1, 1);
                }
            }
        }

        private int[] GetEndNodeInPrefix(string prefix)
        {
            int idx = 0;
            //start at the root
            int[] root = GetRoot();
            while(idx < prefix.Length)
            {
                int charIdx = (int)(prefix[idx] -'a');
                //if we can not match any additinal letters from the prefix then just return what we have so far.
                if (root[charIdx] == -1) return root;
                root = getNextNode(root[charIdx]);
                idx++;
            }
            //if we matched all letters then return the next node to start looking at.  
            return root;
        }



        private int[] GetFilledArray()
        {
            //create an integer array to map to characters of the string.
            int[] temp = new int[AlphabetLength + 1];
            //fill the array with -1 so we know what characters are in use later on.
            Array.Fill(temp, -1);
            return temp;
        }

        private int[] getNextNode(int i)
        {
            return _trie[i];
        }

        private int[] GetRoot()
        {
            return _trie[TrieRoot];
        }