Class SymSpell

java.lang.Object
org.languagetool.rules.spelling.symspell.implementation.SymSpell
All Implemented Interfaces:
Serializable

public class SymSpell extends Object implements Serializable
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    (package private) class 
    SymSpell.SegmentedSuggestion
    static enum 
    SymSpell.Verbosity
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
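    private Map<String,Long>
    belowThresholdWords
    private int
    compactMask
    private long
    countThreshold
    private static int
    defaultCompactLevel
    private static int
    defaultCountThreshold
    private static int
    defaultInitialCapacity
    private static int
    defaultMaxEditDistance
    private static int
    defaultPrefixLength
    private Map<Integer,String[]>
    deletes
    private EditDistance.DistanceAlgorithm
    distanceAlgorithm
    private int
    initialCapacity
    private int
    maxDictionaryEditDistance
    private int
    maxLength
    private static long
    N
    private int
    prefixLength
    private Map<String,Long>
    words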
  • Constructor Summary

    Constructors
    Constructor
    Description
    SymSpell(int initialCapacity, int maxDictionaryEditDistance, int prefixLength, int countThreshold)
    Creates a new instance of SymSpell. Specifying an accurate initialCapacity is not essential, but it can speed up processing by reducing the need to restructure data as the dictionary grows.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    commitStaged(SuggestionStage staging)
    Commits staged dictionary additions.
    boolean
    createDictionary(String corpus)
    Loads multiple dictionary words from a file containing plain text.
    boolean
    createDictionaryEntry(String key, long count, SuggestionStage staging)
    Creates or updates an entry in the dictionary. For every word, deletes with an edit distance of 1..maxEditDistance are created and added to the dictionary.
    private boolean
    deleteInSuggestionPrefix(String delete, int deleteLen, String suggestion, int suggestionLen)
     
    private HashSet<String>
    edits(String word, int editDistance, HashSet<String> deleteWords)
     
    private HashSet<String>
    editsPrefix(String key)
     
    private int
    getStringHash(String s)
     
    boolean
    loadDictionary(BufferedReader br, int termIndex, int countIndex)
    Loads multiple dictionary entries from a buffered reader of word/frequency count pairs. Merges with any dictionary data already loaded.
    boolean
    loadDictionary(InputStream corpus, int termIndex, int countIndex)
    Loads multiple dictionary entries from an input stream of word/frequency count pairs. Merges with any dictionary data already loaded. This is useful for loading the dictionary data from an asset file in Android.
    boolean
    loadDictionary(String corpus, int termIndex, int countIndex)
    Loads multiple dictionary entries from a file of word/frequency count pairs. Merges with any dictionary data already loaded.
    List<SuggestItem>
    lookup(String input, SymSpell.Verbosity verbosity)
    Finds suggested spellings for a given input word, using the maximum edit distance specified during construction of the SymSpell dictionary.
    List<SuggestItem>
    lookup(String input, SymSpell.Verbosity verbosity, int maxEditDistance)
    Finds suggested spellings for a given input word.
    List<SuggestItem>
    lookupCompound(String input)
     
    List<SuggestItem>
    lookupCompound(String input, int maxEditDistance)
     
    private String[]
    parseWords(String text)
     
    void
    purgeBelowThresholdWords()
     
    SymSpell.SegmentedSuggestion
    wordSegmentation(String input)
    Finds suggested spellings for a multi-word input String (supports word splitting/merging).
    SymSpell.SegmentedSuggestion
    wordSegmentation(String input, int maxEditDistance)
    Finds suggested spellings for a multi-word input String (supports word splitting/merging).
    SymSpell.SegmentedSuggestion
    wordSegmentation(String input, int maxEditDistance, int maxSegmentationWordLength)
    Finds suggested spellings for a multi-word input String (supports word splitting/merging).

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • defaultMaxEditDistance

      private static int defaultMaxEditDistance
    • defaultPrefixLength

      private static int defaultPrefixLength
    • defaultCountThreshold

      private static int defaultCountThreshold
    • defaultInitialCapacity

      private static int defaultInitialCapacity
    • defaultCompactLevel

      private static int defaultCompactLevel
    • initialCapacity

      private int initialCapacity
    • maxDictionaryEditDistance

      private int maxDictionaryEditDistance
    • prefixLength

      private int prefixLength
    • countThreshold

      private long countThreshold
    • compactMask

      private int compactMask
    • distanceAlgorithm

      private EditDistance.DistanceAlgorithm distanceAlgorithm
    • maxLength

      private int maxLength
    • deletes

      private Map<Integer,String[]> deletes
    • words

      private Map<String,Long> words
    • belowThresholdWords

      private Map<String,Long> belowThresholdWords
    • N

      private static long N
  • Constructor Details

    • SymSpell

      public SymSpell(int initialCapacity, int maxDictionaryEditDistance, int prefixLength, int countThreshold)
      Creates a new instance of SymSpell. Specifying an accurate initialCapacity is not essential, but it can speed up processing by reducing the need to restructure data as the dictionary grows.
      Parameters:
      initialCapacity - The expected number of words in the dictionary.
      maxDictionaryEditDistance - The maximum edit distance for doing lookups.
      prefixLength - The length of word prefixes used for spell checking.
      countThreshold - The minimum frequency count for dictionary words to be considered correct spellings.
      Compact level (not a parameter of this constructor; compare the defaultCompactLevel field) - Degree of favoring lower memory use over speed (0 = fastest, most memory; 16 = slowest, least memory).
  • Method Details

    • createDictionaryEntry

      public boolean createDictionaryEntry(String key, long count, SuggestionStage staging)
      Creates or updates an entry in the dictionary. For every word, deletes with an edit distance of 1..maxEditDistance are created and added to the dictionary. Every delete entry has a suggestions list, which points to the original term(s) it was created from. The dictionary may be dynamically updated (word frequency and new words) at any time by calling createDictionaryEntry.
      Parameters:
      key - The word to add to the dictionary.
      count - The frequency count for the word.
      staging - Optional staging object to speed up adding many entries by staging them to a temporary structure.
      Returns:
      True if the word was added as a new correctly spelled word; false if the word was added as a below-threshold word or updated an existing correctly spelled word.
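      The delete generation described above can be illustrated with a small standalone sketch. It is not the library's code: the class DeleteVariants and its methods are hypothetical names, and the sketch ignores prefixLength and the suggestions lists, keeping only the idea that a word of edit distance limit n contributes every string reachable by removing 1..n characters.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of SymSpell: enumerates delete variants of a word.
final class DeleteVariants {

    /** Returns every string obtained from word by removing 1..maxEditDistance characters. */
    static Set<String> deletes(String word, int maxEditDistance) {
        Set<String> result = new HashSet<>();
        collect(word, 0, maxEditDistance, result);
        return result;
    }

    private static void collect(String word, int depth, int maxEditDistance, Set<String> out) {
        // Stop at the distance limit; single-character words yield no deletes in this sketch.
        if (depth >= maxEditDistance || word.length() <= 1) {
            return;
        }
        for (int i = 0; i < word.length(); i++) {
            String shorter = word.substring(0, i) + word.substring(i + 1);
            // add() returns false for variants already seen, so each one is expanded only once.
            if (out.add(shorter)) {
                collect(shorter, depth + 1, maxEditDistance, out);
            }
        }
    }

    public static void main(String[] args) {
        // Prints the six delete variants of "abc" within distance 2 (set order is unspecified).
        System.out.println(deletes("abc", 2));
    }
}
```

      Because a variant's depth is fixed by how many characters it has lost, expanding each variant only on first sight does not miss any deeper deletes.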
    • loadDictionary

      public boolean loadDictionary(String corpus, int termIndex, int countIndex)
      Loads multiple dictionary entries from a file of word/frequency count pairs. Merges with any dictionary data already loaded.
      Parameters:
      corpus - The path+filename of the file.
      termIndex - The column position of the word.
      countIndex - The column position of the frequency count.
      Returns:
      True if file loaded, or false if file not found.
    • loadDictionary

      public boolean loadDictionary(InputStream corpus, int termIndex, int countIndex)
      Loads multiple dictionary entries from an input stream of word/frequency count pairs. Merges with any dictionary data already loaded. This is useful for loading the dictionary data from an asset file in Android.
      Parameters:
      corpus - An input stream to dictionary data.
      termIndex - The column position of the word.
      countIndex - The column position of the frequency count.
      Returns:
      True if file loaded, or false if file not found.
    • loadDictionary

      public boolean loadDictionary(BufferedReader br, int termIndex, int countIndex)
      Loads multiple dictionary entries from a buffered reader of word/frequency count pairs. Merges with any dictionary data already loaded.
      Parameters:
      br - A buffered reader to dictionary data.
      termIndex - The column position of the word.
      countIndex - The column position of the frequency count.
      Returns:
      True if file loaded, or false if file not found.
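      To make the termIndex / countIndex convention concrete, here is a minimal sketch of reading word/frequency pairs by column position. It is an illustration only: FrequencyListReader is a hypothetical class, whitespace-separated columns are an assumption (this page does not state the delimiter), and summing counts for repeated words is one possible reading of "merges".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of SymSpell: parses word/frequency lines by column index.
final class FrequencyListReader {

    static Map<String, Long> read(BufferedReader reader, int termIndex, int countIndex) {
        Map<String, Long> counts = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: columns are separated by runs of whitespace.
                String[] columns = line.trim().split("\\s+");
                if (columns.length <= Math.max(termIndex, countIndex)) {
                    continue; // too few columns on this line
                }
                try {
                    long count = Long.parseLong(columns[countIndex]);
                    // Repeated words have their counts added together.
                    counts.merge(columns[termIndex], count, Long::sum);
                } catch (NumberFormatException e) {
                    // Lines whose count column is not a number are skipped.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String data = "the 100\nof 60\nthe 5\n";
        System.out.println(read(new BufferedReader(new StringReader(data)), 0, 1));
    }
}
```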
    • createDictionary

      public boolean createDictionary(String corpus)
      Loads multiple dictionary words from a file containing plain text.
      Parameters:
      corpus - The path+filename of the file.
      Returns:
      True if file loaded, or false if file not found.
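      Building a dictionary from plain text amounts to splitting the text into words and counting them. The sketch below shows that idea under stated assumptions: PlainTextCounter is a hypothetical class, words are taken to be runs of letters, and text is lower-cased first. The tokenization actually performed by the private parseWords method is not documented on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SymSpell: derives word frequencies from running text.
final class PlainTextCounter {

    // Assumption: a word is a maximal run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (matcher.find()) {
            // Each occurrence raises the word's frequency count by one.
            counts.merge(matcher.group(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The cat saw the other cat."));
    }
}
```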
    • purgeBelowThresholdWords

      public void purgeBelowThresholdWords()
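      No description is given for purgeBelowThresholdWords. The sketch below shows one plausible reading, based on the countThreshold, words and belowThresholdWords fields and on the return-value contract of createDictionaryEntry: words accumulate counts in a pending map until they reach the threshold, and purging discards whatever is still pending. ThresholdDictionary and its method names are hypothetical, and the purge behavior in particular is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not part of SymSpell: count-threshold bookkeeping for dictionary words.
final class ThresholdDictionary {
    private final long countThreshold;
    private final Map<String, Long> words = new HashMap<>();
    private final Map<String, Long> belowThresholdWords = new HashMap<>();

    ThresholdDictionary(long countThreshold) {
        this.countThreshold = countThreshold;
    }

    /** Returns true only when key becomes a new word at or above the threshold. */
    boolean add(String key, long count) {
        if (count <= 0) {
            return false; // non-positive counts are ignored in this sketch
        }
        Long known = words.get(key);
        if (known != null) {
            words.put(key, saturatingAdd(known, count));
            return false; // updated an existing correctly spelled word
        }
        long total = saturatingAdd(belowThresholdWords.getOrDefault(key, 0L), count);
        if (total < countThreshold) {
            belowThresholdWords.put(key, total);
            return false; // still a below-threshold word
        }
        belowThresholdWords.remove(key);
        words.put(key, total);
        return true;
    }

    /** Assumption: purging drops all words that have not yet reached the threshold. */
    void purgeBelowThreshold() {
        belowThresholdWords.clear();
    }

    boolean isWord(String key) {
        return words.containsKey(key);
    }

    int pendingCount() {
        return belowThresholdWords.size();
    }

    // Adds two non-negative counts without overflowing past Long.MAX_VALUE.
    private static long saturatingAdd(long a, long b) {
        return Long.MAX_VALUE - a > b ? a + b : Long.MAX_VALUE;
    }
}
```

      With a threshold of 3, adding "cat" three times with count 1 returns false, false, then true; a later purge has no effect on "cat" because it has already been promoted.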
    • commitStaged

      public void commitStaged(SuggestionStage staging)
      Commits staged dictionary additions. Use this when you write your own process to load multiple words into the dictionary and, as part of that process, first create a SuggestionStage object and pass it to createDictionaryEntry calls.
      Parameters:
      staging - The SuggestionStage object storing the staged data.
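      The benefit of staging can be sketched as follows. The deletes field is a Map<Integer,String[]>, and growing an array one element at a time means one copy per addition; collecting additions in lists and merging them once per key avoids that. The internals of SuggestionStage are not shown on this page, so StagingSketch, Stage and commit below are hypothetical stand-ins rather than the library's design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not part of SymSpell: stage additions in lists, commit to arrays in bulk.
final class StagingSketch {

    /** Temporary, cheaply appendable structure (stand-in for a staging object). */
    static final class Stage {
        private final Map<Integer, List<String>> pending = new HashMap<>();

        void add(int deleteHash, String suggestion) {
            pending.computeIfAbsent(deleteHash, k -> new ArrayList<>()).add(suggestion);
        }
    }

    /** Merges all staged suggestions into the array-valued map, then empties the stage. */
    static void commit(Stage stage, Map<Integer, String[]> deletes) {
        for (Map.Entry<Integer, List<String>> entry : stage.pending.entrySet()) {
            String[] existing = deletes.getOrDefault(entry.getKey(), new String[0]);
            List<String> added = entry.getValue();
            // One array copy per key, regardless of how many suggestions were staged for it.
            String[] merged = Arrays.copyOf(existing, existing.length + added.size());
            for (int i = 0; i < added.size(); i++) {
                merged[existing.length + i] = added.get(i);
            }
            deletes.put(entry.getKey(), merged);
        }
        stage.pending.clear();
    }
}
```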
    • lookup

      public List<SuggestItem> lookup(String input, SymSpell.Verbosity verbosity)
      Finds suggested spellings for a given input word, using the maximum edit distance specified during construction of the SymSpell dictionary.
      Parameters:
      input - The word being spell checked.
      verbosity - The value controlling the quantity/closeness of the returned suggestions.
      Returns:
      A List of SuggestItem objects representing suggested correct spellings for the input word, sorted by edit distance, and secondarily by count frequency.
    • lookup

      public List<SuggestItem> lookup(String input, SymSpell.Verbosity verbosity, int maxEditDistance)
      Finds suggested spellings for a given input word.
      Parameters:
      input - The word being spell checked.
      verbosity - The value controlling the quantity/closeness of the returned suggestions.
      maxEditDistance - The maximum edit distance between input and suggested words.
      Returns:
      A List of SuggestItem objects representing suggested correct spellings for the input word, sorted by edit distance, and secondarily by count frequency.
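      The result ordering (edit distance first, then count frequency) can be demonstrated with a brute-force stand-in. This sketch does not use the symmetric-delete index and is not how SymSpell finds candidates: it compares the input against every dictionary word with plain Levenshtein distance, which may differ from the configured distanceAlgorithm, and it assumes that higher counts rank first among equal distances. BruteForceLookup and Suggestion are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not part of SymSpell: exhaustive lookup with the documented sort order.
final class BruteForceLookup {

    static final class Suggestion {
        final String term;
        final int distance;
        final long count;

        Suggestion(String term, int distance, long count) {
            this.term = term;
            this.distance = distance;
            this.count = count;
        }
    }

    static List<Suggestion> lookup(String input, Map<String, Long> dictionary, int maxEditDistance) {
        List<Suggestion> found = new ArrayList<>();
        for (Map.Entry<String, Long> entry : dictionary.entrySet()) {
            int distance = levenshtein(input, entry.getKey());
            if (distance <= maxEditDistance) {
                found.add(new Suggestion(entry.getKey(), distance, entry.getValue()));
            }
        }
        // Closest suggestions first; among equal distances, the more frequent word first.
        Comparator<Suggestion> byDistance = Comparator.comparingInt(s -> s.distance);
        Comparator<Suggestion> byCountDescending = (a, b) -> Long.compare(b.count, a.count);
        found.sort(byDistance.thenComparing(byCountDescending));
        return found;
    }

    /** Classic two-row dynamic-programming Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```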
    • lookupCompound

      public List<SuggestItem> lookupCompound(String input, int maxEditDistance)
    • lookupCompound

      public List<SuggestItem> lookupCompound(String input)
    • deleteInSuggestionPrefix

      private boolean deleteInSuggestionPrefix(String delete, int deleteLen, String suggestion, int suggestionLen)
    • parseWords

      private String[] parseWords(String text)
    • edits

      private HashSet<String> edits(String word, int editDistance, HashSet<String> deleteWords)
    • editsPrefix

      private HashSet<String> editsPrefix(String key)
    • getStringHash

      private int getStringHash(String s)
    • wordSegmentation

      public SymSpell.SegmentedSuggestion wordSegmentation(String input)
      Finds suggested spellings for a multi-word input String (supports word splitting/merging).
      Parameters:
      input - The String being spell checked.
      Returns:
      The word-segmented String, the word-segmented and spelling-corrected String, the edit distance sum between the input String and the corrected String, and the sum of word occurrence probabilities in log scale (a measure of how common and probable the corrected segmentation is).
    • wordSegmentation

      public SymSpell.SegmentedSuggestion wordSegmentation(String input, int maxEditDistance)
      Finds suggested spellings for a multi-word input String (supports word splitting/merging).
      Parameters:
      input - The String being spell checked.
      maxEditDistance - The maximum edit distance between input and corrected words (0 = no correction, segmentation only).
      Returns:
      The word-segmented String, the word-segmented and spelling-corrected String, the edit distance sum between the input String and the corrected String, and the sum of word occurrence probabilities in log scale (a measure of how common and probable the corrected segmentation is).
    • wordSegmentation

      public SymSpell.SegmentedSuggestion wordSegmentation(String input, int maxEditDistance, int maxSegmentationWordLength)
      Finds suggested spellings for a multi-word input String (supports word splitting/merging).
      Parameters:
      input - The String being spell checked.
      maxEditDistance - The maximum edit distance between input and corrected words (0 = no correction, segmentation only).
      maxSegmentationWordLength - The maximum word length that should be considered.
      Returns:
      The word-segmented String, the word-segmented and spelling-corrected String, the edit distance sum between the input String and the corrected String, and the sum of word occurrence probabilities in log scale (a measure of how common and probable the corrected segmentation is).
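      The "sum of word occurrence probabilities in log scale" can be made concrete with a simplified segmenter. The sketch below covers only the segmentation-only case (no spelling correction, so no edit distance sum) and is not the library's algorithm: it is a textbook dynamic program that picks the split with the highest log-probability sum. SegmentationSketch and Result are hypothetical names, and the penalty for unknown parts is an assumed heuristic that grows with part length.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Map;

// Hypothetical sketch, not part of SymSpell: segmentation only, maximizing log10 probabilities.
final class SegmentationSketch {

    static final class Result {
        final String segmented;
        final double logProbSum;

        Result(String segmented, double logProbSum) {
            this.segmented = segmented;
            this.logProbSum = logProbSum;
        }
    }

    static Result segment(String input, Map<String, Long> counts, int maxWordLength) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        if (total <= 0 || maxWordLength < 1) {
            throw new IllegalArgumentException("need a non-empty dictionary and maxWordLength >= 1");
        }
        int n = input.length();
        double[] best = new double[n + 1]; // best[i]: top score for the first i characters
        int[] start = new int[n + 1];      // start[i]: where the last word of that split begins
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int begin = Math.max(0, end - maxWordLength); begin < end; begin++) {
                String part = input.substring(begin, end);
                Long count = counts.get(part);
                // Assumed heuristic: unknown parts are penalized more the longer they are.
                double logP = count != null
                        ? Math.log10((double) count / total)
                        : Math.log10(10.0 / (total * Math.pow(10.0, part.length())));
                double score = best[begin] + logP;
                if (score > best[end]) {
                    best[end] = score;
                    start[end] = begin;
                }
            }
        }
        // Walk the recorded split points backwards to rebuild the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int end = n; end > 0; end = start[end]) {
            words.addFirst(input.substring(start[end], end));
        }
        return new Result(String.join(" ", words), best[n]);
    }

    public static void main(String[] args) {
        Map<String, Long> counts = Map.of("the", 500L, "quick", 50L, "brown", 40L, "fox", 30L);
        System.out.println(segment("thequickbrownfox", counts, 10).segmented);
    }
}
```

      A log-probability sum closer to zero indicates a segmentation made of more frequent words, which is why it serves as a measure of how plausible the result is.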