用Java中的Trie确定电话号码前缀



我正在尝试创建一个快速搜索功能来确定电话号码的前缀。

我正在加载前缀数据从数据库到内存作为TreeMap,其中关键是前缀和值是一个对象包含关于这个前缀(国家等)的信息。

这是TreeMap的填充方式:

private static TreeMap<String, PrefixData> prefixMap = new TreeMap<>();
//EXAMPLE of DB query
    try {
        Class.forName("org.postgresql.Driver");
        Connection connection = DriverManager.getConnection(dbURL, dbUser, dbPass);
        connection.setAutoCommit(false);
        Statement stmt = connection.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM countries_prefixes");
        //Looping resultset
        while (rs.next()) {
            //TODO Review fields that must be stored in memory
            String country = rs.getString("name");
            //Populating map with data object (keeping nr prefix as key and object as value)
            prefixMap.put(rs.getString("country_prefix"), new PrefixData(country));
        }
        rs.close();
        stmt.close();
        connection.close();
    } catch (ClassNotFoundException | SQLException e) {
        System.out.println(e.toString());
    }

假设我有电话号码,我想检查:37251845632;35844021546;34651478966等等…

有些前缀是1位数长,有些是2位数长,有些是3位数长,等等…

我创建了一个循环,可以运行:

//TODO Try some other search methods (Tries)
    //array for prefix length priority
    int[] sequence = {3, 4, 2, 1, 5, 6};
    //Performing search from the map
    for (int i = 0; i < sequence.length; i++) {
        //Extracting prefix from phone nr
        String prefix = phoneNr.substring(1, sequence[i] + 1);
        if (prefixMap.containsKey(prefix)) {
            PrefixData pdata = prefixMap.get(prefix);
            System.out.println(String.format("Found matching key [%s] in TreeMap", prefix));
            System.out.println(String.format("[NAME: %s] [REGEX: %s] ", pdata.getCountryName(), pdata.getNrRegex()));
            //Validate number format with regex
            if (pdata.getNrRegex().trim() != null && !pdata.getNrRegex().trim().isEmpty()) {
                System.out.println("Regex for number validation is present!");
                if (phoneNr.matches(pdata.getNrRegex().replaceAll("^/|/$", ""))) {
                    System.out.println("NUMBER IS VALID!");
                } else {
                    System.out.println("INVALID NUMBER!");
                }
            }
            return pdata;
        }
    }
    return null;
    }

现在循环运行良好,但速度较慢。我听说过try,它更快,但我不明白如何在我的场景中实现它。

任何帮助都是感激的!

正如我所说,循环可以工作,但这不是实现我的目标的好方法。

所以我做了一点研究,想出了一个解决方案,那就是使用前缀树(Trie)实现。

关于Trie的一些阅读可以在这里找到。

现在是Trie的实现部分。我知道找到已经编写和测试过的代码会更快,所以我在这里找到了Google实现。这里是Vladimir Kroz的实现。

做了一些小的修改,这就是解决方案。我将提供两种解决方案:

<<p> Prefixmap接口/strong>
package tiesImpl;
/**
 * Maps string prefixes to values. For example, if you {@code put("foo", 1)},
 * {@code get("foobar")} returns {@code 1}. Prohibits null values.
 *
 * <p>Use instead of iterating over a series of string prefixes calling
 * {@code String.startsWith(prefix)}.
 *
 * @author crazybob@google.com (Bob Lee)
 * @param <T>
 */
public interface PrefixMap<T> {
  /**
   * Maps prefix to value.
   *
     * @param prefix
     * @param value
   * @return The previous value stored for this prefix, or null if none.
   * @throws IllegalArgumentException if prefix is an empty string.
   */
  T put(CharSequence prefix, T value);
  /**
   * Finds a prefix that matches {@code s} and returns the mapped value.
   *
   * If multiple prefixes in the map match {@code s}, the longest match wins.
   *
     * @param s
   * @return value for prefix matching {@code s} or {@code null} if none match.
   */
  T get(CharSequence s);
}
<<p> PrefixTrie类/strong>
package tiesImpl;
/*
 * Copyright (C) 2007 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.LinkedHashMap;
import java.util.Map;
/**
 * Trie implementation supporting CharSequences as prefixes.
 *
 * Prefixes are sequences of characters, and the set of allowed characters is
 * specified as a range of sequential characters. By default, any seven-bit
 * character may appear in a prefix, and so the trie is a 128-ary tree.
 *
 * @author crazybob@google.com (Bob Lee)
 * @author mharris@google.com (Matthew Harris)
 * @param <T>
 */
public class PrefixTrie<T> implements PrefixMap<T> {
  // The set of allowed characters in prefixes is given by a range of
    // consecutive characters.  rangeOffset denotes the beginning of the range,
    // and rangeSize gives the number of characters in the range, which is used as
    // the number of children of each node.
private final char rangeOffset;
private final int rangeSize;
private final Node<T> root;
/**
 * Constructs a trie for holding strings of seven-bit characters.
 */
public PrefixTrie() {
    rangeOffset = '';
    rangeSize = 128;
    root = new Node<>(rangeSize);
}
/**
 * Constructs a trie for holding strings of characters.
 *
 * The set of characters allowed in prefixes is given by the range
 * [rangeOffset, lastCharInRange], inclusive.
 *
 * @param firstCharInRange
 * @param lastCharInRange
 */
public PrefixTrie(char firstCharInRange, char lastCharInRange) {
    this.rangeOffset = firstCharInRange;
    this.rangeSize = lastCharInRange - firstCharInRange + 1;
    if (rangeSize <= 0) {
        throw new IllegalArgumentException("Char range must include some chars");
    }
    root = new Node<>(rangeSize);
}
/**
 * {@inheritDoc}
 *
 * @param prefix
 * @param value
 * @throws IllegalArgumentException if prefix contains a character outside
 * the range of legal prefix characters.
 */
@Override
public T put(CharSequence prefix, T value) {
    if (value == null) {
        throw new NullPointerException();
    }
    Node<T> current = root;
    for (int i = 0; i < prefix.length(); i++) {
        int nodeIndex = prefix.charAt(i) - rangeOffset;
        try {
            Node<T> next = current.next[nodeIndex];
            if (next == null) {
                next = current.next[nodeIndex] = new Node<>(rangeSize);
            }
            current = next;
        } catch (ArrayIndexOutOfBoundsException e) {
            throw new IllegalArgumentException(
                    "'" + prefix.charAt(i) + "' is not a legal prefix character.");
        }
    }
    T oldValue = current.value;
    current.value = value;
    return oldValue;
}
/**
 * {@inheritDoc}
 * @param s
 */
@Override
public T get(CharSequence s) {
    Node<T> deepestWithValue = root;
    Node<T> current = root;
    for (int i = 0; i < s.length(); i++) {
        int nodeIndex = s.charAt(i) - rangeOffset;
        if (nodeIndex < 0 || rangeSize <= nodeIndex) {
            return null;
        }
        current = current.next[nodeIndex];
        if (current == null) {
            break;
        }
        if (current.value != null) {
            deepestWithValue = current;
        }
    }
    return deepestWithValue.value;
}
/**
 * Returns a Map containing the same data as this structure.
 *
 * This implementation constructs and populates an entirely new map rather
 * than providing a map view on the trie, so this is mostly useful for
 * debugging.
 *
 * @return A Map mapping each prefix to its corresponding value.
 */
public Map<String, T> toMap() {
    Map<String, T> map = newLinkedHashMap();
    addEntries(root, new StringBuilder(), map);
    return map;
}
/**
 * Adds to the given map all entries at or below the given node.
 *
 * @param node
 * @param builder A StringBuilder containing the prefix for the given node.
 * @param map
 */
private void addEntries(Node<T> node,
        StringBuilder builder,
        Map<String, T> map) {
    if (node.value != null) {
        map.put(builder.toString(), node.value);
    }
    for (int i = 0; i < node.next.length; i++) {
        Node<T> next = node.next[i];
        if (next != null) {
            builder.append((char) (i + rangeOffset));
            addEntries(next, builder, map);
            builder.deleteCharAt(builder.length() - 1);
        }
    }
}
private static class Node<T> {
    T value;
    final Node<T>[] next;
    @SuppressWarnings("unchecked")
    Node(int numChildren) {
        next = new Node[numChildren];
    }
}
/**
 * Creates a {@code LinkedHashMap} instance.
 *
 * @param <K>
 * @param <V>
 * @return a newly-created, initially-empty {@code LinkedHashMap}
 */
public static <K, V> LinkedHashMap<K, V> newLinkedHashMap() {
    return new LinkedHashMap<>();
}
}

Vladimir Kroz实现:Trie class

package tiesImpl;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
/**
 * Prefix table based on Trie structure. Allows to perform incremental lookup
 * and match based on search key prefixes (classic example - determine phone
 * area code for given phone number)
 *
 * @param <V> a type of value object to be stored along with prefix (e.g when
 * key is a country name, the value could be a name of the country)
 *
 * @author Vladimir Kroz
 * https://vkroz.wordpress.com/2012/03/23/prefix-table-trie-implementation-in-java/
 */
public class Trie<V> {
Entry<V> entry;
char key;
Map<Character, Trie<V>> children;
public Trie() {
    this.children = new HashMap<>(10);
    entry = new Entry<>();
}
/**
 * non-public, used by _put()
 */
Trie(char key) {
    this.children = new HashMap<>(10);
    this.key = key;
    entry = new Entry<>();
}
public void put(String key, V value) {
    _put(new StringBuffer(key), new StringBuffer(""), value);
}
void _put(StringBuffer remainder, StringBuffer prefix, V value) {
    if (remainder.length() > 0) {
        char keyElement = remainder.charAt(0);
        Trie<V> t = null;
        try {
            t = children.get(keyElement);
        } catch (IndexOutOfBoundsException e) {
        }
        if (t == null) {
            t = new Trie<>(keyElement);
            children.put(keyElement, t);
        }
        prefix.append(remainder.charAt(0));
        t._put(remainder.deleteCharAt(0), prefix, value);
    } else {
        this.entry.value = value;
        this.entry.prefix = prefix.toString();
    }
}
/**
 * Retrieves element from prefix table matching as a prefix to provided
 * key. E.g. if key is "37251656565" and prefix table has node "372" then
 * this call will return the value of "372"
 *
 * @param key a string which starts with prefix to be searched in the table
 * (e.g. phone number)
 * @return an Object associated with matching prefix (i.e if key is a phone
 * number it may return a corresponding country name)
 */
public V get(String key) {
    return _get(new StringBuffer(key), 0);
}
/**
 * Returns true if key has matching prefix in the table
 *
 * @param key
 * @return
 */
public boolean hasPrefix(String key) {
    return this.get(key) != null;
}
V _get(StringBuffer key, int level) {
    if (key.length() > 0) {
        Trie<V> t = children.get(key.charAt(0));
        if (t != null) {
            //FYI: modified code to return deepest with value instead of returning null if prefix doesn't have corresponding value.
            V result = t._get(key.deleteCharAt(0), ++level);
            return result == null ? entry.value : result;
        } else {
            return (level > 0) ? entry.value : null;
        }
    } else {
        return entry.value;
    }
}
@Override
//For debugging
public String toString() {
    Iterator<Character> it = children.keySet().iterator();
    StringBuffer childs = new StringBuffer();
    while (it.hasNext()) {
        Character _key = it.next();
        childs.append(String.format("n%sn",
                //Adding a tab to the beginning of every line to create a visual tree
                String.format("%s: %s", _key, children.get(_key)).replaceAll("(?m)(^)", "t")));
    }
    return String.format("Trie [entry=%s, children=%s]", entry, childs);
}
static public class Entry<V> {
    String prefix;
    V value;
    public Entry() {
    }
    public Entry(String p, V v) {
        prefix = p;
        value = v;
    }
    public String prefix() {
        return prefix;
    }
    public V value() {
        return value;
    }
    @Override
    public String toString() {
        return "Entry [prefix=" + prefix + ", value=" + value + "]";
    }
}
}

最后测试部分

package tiesImpl;
/**
 * Class for testing different implementations of prefix tree (Trie).
 *
 * @author lkallas
 */
public class TriesTest {
private static final PrefixTrie<String> googleTrie = new PrefixTrie<>();
private static final Trie<String> krozTrie = new Trie<>();
public static void main(String[] args) {
    //Inserting prefixes to Google implementation of Trie
    googleTrie.put("7", "Russia");
    googleTrie.put("77", "Abkhazia");
    googleTrie.put("746", "Some unknown country");
    //Inserting prefixes to Vladimir Kroz implementation of Trie
    krozTrie.put("7", "Russia");
    krozTrie.put("77", "Abkhazia");
    krozTrie.put("746", "Some unknown country");
    System.out.println("Google version of get: " + googleTrie.get("745878787"));
    System.out.println("Vladimir Kroz version of get: " + krozTrie.get("745878787"));
}
}

希望这个答案对其他人也有所帮助!干杯!

最新更新