Find order of letters
Problem
Find the lexicographic order of letters from given lexicographic order of words.
Problem variation
A stone scripture has been found from some historical ruins. Archeologists guessed it is a dictionary and the words are ordered based on some predefined order of letters. But there is no direct evidence of which letter comes after what. Write an algorithm to find out the ordering of the letters from the ordering of the words.
Solution
From the ordering of any two words we can find ordering of a letter. For example if cat comes before dog, then we can say that c is before d. If cab comes before cat, we can say b comes before t. In this way from n words we can find n*(n-1)/2 combinations of words. From which we can find at most n*(n-1)/2 relations. We can represent these relations as edges of a directed graph where vertices are individual letters.
We represent the graph as an incidence matrix. Any other representation will also do. Now if we do topological sorting of this directed graph, we will get the ordering of the vertices. Which is nothing but the ordering of the letters. For topological sorting we try to find out a vertex which has no inbound edges. That is the vertex with least ordering. Then we remove all the outbound edges from that vertex and continue the process by finding next vertex with no inbound edges. In this way we find the complete ordering of the letters. During the process at any time if we can’t find any vertex with zero inbound edge, that means there is a conflict in the ordering of the words and hence the ordering of the letters can’t be done.
Code
import java.util.HashMap; public class FindOrderOfLetter { public static void main(String[] args) { String[] words = { "car", "cat", "cbr", "deer", "egg", "god", "rabe", "race", "rat", "tar" }; char[] letters = getLetterOrdering(words); if (letters == null) System.out.println("not possible"); else { for (char ch : letters) System.out.print(ch + ","); } } private static char[] getLetterOrdering(String[] words) { HashMap<Character, Integer> characters = new HashMap<Character, Integer>(); for (String word : words) { for (int i = 0; i < word.length(); ++i) { char character = word.charAt(i); if (!characters.keySet().contains(character)) { characters.put(character, characters.size()); } } } boolean[][] adjacency = new boolean[characters.size()] [characters.size()]; for (int i = 0; i < words.length - 1; ++i) { for (int j = i + 1; j < words.length; ++j) { String prevWord = words[i]; String nextWord = words[j]; for (int k = 0; k < Math.min(prevWord.length(), nextWord.length()); ++k) { char prevCharacter = prevWord.charAt(k); char nextCharacter = nextWord.charAt(k); if (prevCharacter != nextCharacter) { adjacency[characters.get(prevCharacter)][characters .get(nextCharacter)] = true; break; } } } } char[] result = new char[characters.size()]; int resultIndex = 0; while (!characters.isEmpty()) { char lowChar = ' '; for (Character nextCharacter : characters.keySet()) { int nextIndex = characters.get(nextCharacter); boolean lowest = true; for (Character prevCharacter : characters.keySet()) { int prevIndex = characters.get(prevCharacter); if (adjacency[prevIndex][nextIndex]) { lowest = false; break; } } if (lowest) { lowChar = nextCharacter; result[resultIndex++] = nextCharacter; break; } } if (lowChar == ' ') { return null; } else { characters.remove(lowChar); lowChar = ' '; } } return result; } }