Class SymSpell
java.lang.Object
org.languagetool.rules.spelling.symspell.implementation.SymSpell
- All Implemented Interfaces:
Serializable
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
(package private) class
static enum
-
Field Summary
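Fields
Modifier and Type, Field, Description
belowThresholdWords
private int compactMask
private long countThreshold
private static int defaultCompactLevel
private static int defaultCountThreshold
private static int defaultInitialCapacity
private static int defaultMaxEditDistance
private static int defaultPrefixLength
deletes
private EditDistance.DistanceAlgorithm distanceAlgorithm
private int initialCapacity
private int maxDictionaryEditDistance
private int maxLength
private static long N
private int prefixLength
words
-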
Constructor Summary
Constructors
Constructor, Description
SymSpell(int initialCapacity, int maxDictionaryEditDistance, int prefixLength, int countThreshold)
  Create a new instance of SymSpell.
-
Method Summary
Modifier and Type, Method, Description
void commitStaged(SuggestionStage staging)
  Commit staged dictionary additions.
boolean createDictionary(String corpus)
  Load multiple dictionary words from a file containing plain text.
boolean createDictionaryEntry(String key, long count, SuggestionStage staging)
  Create/Update an entry in the dictionary.
private boolean deleteInSuggestionPrefix(String delete, int deleteLen, String suggestion, int suggestionLen)
edits
editsPrefix(String key)
private int getStringHash
boolean loadDictionary(BufferedReader br, int termIndex, int countIndex)
  Load multiple dictionary entries from a buffered reader of word/frequency count pairs.
boolean loadDictionary(InputStream corpus, int termIndex, int countIndex)
  Load multiple dictionary entries from an input stream of word/frequency count pairs.
boolean loadDictionary(String corpus, int termIndex, int countIndex)
  Load multiple dictionary entries from a file of word/frequency count pairs.
lookup(String input, SymSpell.Verbosity verbosity)
  Find suggested spellings for a given input word, using the maximum edit distance specified during construction of the SymSpell dictionary.
lookup(String input, SymSpell.Verbosity verbosity, int maxEditDistance)
  Find suggested spellings for a given input word.
lookupCompound(String input)
lookupCompound(String input, int maxEditDistance)
private String[] parseWords(String text)
void purgeBelowThresholdWords()
wordSegmentation(String input)
  Find suggested spellings for a multi-word input String (supports word splitting/merging).
wordSegmentation(String input, int maxEditDistance)
  Find suggested spellings for a multi-word input String (supports word splitting/merging).
wordSegmentation(String input, int maxEditDistance, int maxSegmentationWordLength)
  Find suggested spellings for a multi-word input String (supports word splitting/merging).
-
Field Details
-
defaultMaxEditDistance
private static int defaultMaxEditDistance -
defaultPrefixLength
private static int defaultPrefixLength -
defaultCountThreshold
private static int defaultCountThreshold -
defaultInitialCapacity
private static int defaultInitialCapacity -
defaultCompactLevel
private static int defaultCompactLevel -
initialCapacity
private int initialCapacity -
maxDictionaryEditDistance
private int maxDictionaryEditDistance -
prefixLength
private int prefixLength -
countThreshold
private long countThreshold -
compactMask
private int compactMask -
distanceAlgorithm
private EditDistance.DistanceAlgorithm distanceAlgorithm -
maxLength
private int maxLength -
deletes
-
words
-
belowThresholdWords
-
N
private static long N
-
-
Constructor Details
-
SymSpell
public SymSpell(int initialCapacity, int maxDictionaryEditDistance, int prefixLength, int countThreshold)
Create a new instance of SymSpell. Specifying an accurate initialCapacity is not essential, but it can help speed up processing by alleviating the need for data restructuring as the size grows.
Parameters:
initialCapacity - The expected number of words in dictionary.
maxDictionaryEditDistance - Maximum edit distance for doing lookups.
prefixLength - The length of word prefixes used for spell checking.
countThreshold - The minimum frequency count for dictionary words to be considered correct spellings.
A fifth description follows, with no matching parameter in this signature: Degree of favoring lower memory use over speed (0 = fastest, most memory; 16 = slowest, least memory).
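The role of countThreshold can be illustrated with a minimal, self-contained sketch. It is not the SymSpell class: the class name CountThresholdSketch and its methods are hypothetical, and the rule that counts accumulate across calls until the threshold is reached is an assumption based on the parameter description and on the words and belowThresholdWords field names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of count-threshold bookkeeping; not the library's code.
class CountThresholdSketch {
    private final long countThreshold;
    // Words whose accumulated count has reached the threshold.
    private final Map<String, Long> words = new HashMap<>();
    // Words seen so far with an accumulated count below the threshold.
    private final Map<String, Long> belowThresholdWords = new HashMap<>();

    CountThresholdSketch(long countThreshold) {
        this.countThreshold = countThreshold;
    }

    /** Returns true only when the word newly becomes a correct spelling. */
    boolean add(String key, long count) {
        if (count <= 0) {
            return false; // assumption: non-positive counts are ignored
        }
        Long known = words.get(key);
        if (known != null) {
            words.put(key, known + count); // update an existing correct word
            return false;
        }
        long total = belowThresholdWords.getOrDefault(key, 0L) + count;
        if (total < countThreshold) {
            belowThresholdWords.put(key, total); // still below the threshold
            return false;
        }
        belowThresholdWords.remove(key); // promote to a correct spelling
        words.put(key, total);
        return true;
    }

    boolean isCorrectSpelling(String key) {
        return words.containsKey(key);
    }

    public static void main(String[] args) {
        CountThresholdSketch dictionary = new CountThresholdSketch(3);
        System.out.println(dictionary.add("cat", 1));
        System.out.println(dictionary.add("cat", 2));
        System.out.println(dictionary.isCorrectSpelling("cat"));
    }
}
```

With a threshold of 3, the first call in main leaves "cat" below the threshold and the second promotes it.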
-
-
Method Details
-
createDictionaryEntry
public boolean createDictionaryEntry(String key, long count, SuggestionStage staging)
Create/Update an entry in the dictionary. For every word, deletes with an edit distance of 1..maxEditDistance are created and added to the dictionary. Every delete entry has a suggestions list, which points to the original term(s) it was created from. The dictionary may be dynamically updated (word frequency and new words) at any time by calling createDictionaryEntry.
Parameters:
key - The word to add to dictionary.
count - The frequency count for word.
staging - Optional staging object to speed up adding many entries by staging them to a temporary structure.
Returns:
True if the word was added as a new correctly spelled word, or false if the word is added as a below threshold word, or updates an existing correctly spelled word.
-
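The delete generation described for createDictionaryEntry can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the class DeleteSketch is hypothetical, it ignores prefixLength, and it never produces the empty string.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: all strings obtained by removing 1..maxEditDistance characters.
class DeleteSketch {
    static Set<String> deletes(String word, int maxEditDistance) {
        Set<String> result = new HashSet<>();
        collect(word, 0, maxEditDistance, result);
        return result;
    }

    private static void collect(String word, int distance, int maxEditDistance, Set<String> result) {
        // Stop at the distance limit; assumption: single characters are not reduced further.
        if (distance >= maxEditDistance || word.length() <= 1) {
            return;
        }
        for (int i = 0; i < word.length(); i++) {
            String delete = word.substring(0, i) + word.substring(i + 1);
            // Recurse only on deletes not seen before to avoid repeated work.
            if (result.add(delete)) {
                collect(delete, distance + 1, maxEditDistance, result);
            }
        }
    }

    public static void main(String[] args) {
        // Prints the six deletes of "abc" within distance 2 (iteration order is not fixed).
        System.out.println(deletes("abc", 2));
    }
}
```

In a dictionary built this way, each delete would map back to the term(s) it came from, which is what the suggestions list in the description refers to.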
loadDictionary
public boolean loadDictionary(String corpus, int termIndex, int countIndex)
Load multiple dictionary entries from a file of word/frequency count pairs. Merges with any dictionary data already loaded.
Parameters:
corpus - The path+filename of the file.
termIndex - The column position of the word.
countIndex - The column position of the frequency count.
Returns:
True if file loaded, or false if file not found.
-
loadDictionary
public boolean loadDictionary(InputStream corpus, int termIndex, int countIndex)
Load multiple dictionary entries from an input stream of word/frequency count pairs. Merges with any dictionary data already loaded. This is useful for loading the dictionary data from an asset file in Android.
Parameters:
corpus - An input stream to dictionary data.
termIndex - The column position of the word.
countIndex - The column position of the frequency count.
Returns:
True if file loaded, or false if file not found.
-
loadDictionary
public boolean loadDictionary(BufferedReader br, int termIndex, int countIndex)
Load multiple dictionary entries from a buffered reader of word/frequency count pairs. Merges with any dictionary data already loaded.
Parameters:
br - A buffered reader to dictionary data.
termIndex - The column position of the word.
countIndex - The column position of the frequency count.
Returns:
True if file loaded, or false if file not found.
-
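How termIndex and countIndex select columns from word/frequency data can be sketched with a small stand-alone parser. The class FrequencyFileSketch is hypothetical; splitting on whitespace, skipping malformed lines, and summing counts for repeated words are assumptions of this sketch and are not stated in the documentation above.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of reading word/frequency pairs by column position.
class FrequencyFileSketch {
    static Map<String, Long> parse(BufferedReader reader, int termIndex, int countIndex) {
        Map<String, Long> counts = new HashMap<>();
        reader.lines().forEach(line -> {
            // Assumption: columns are separated by whitespace.
            String[] columns = line.trim().split("\\s+");
            if (columns.length <= Math.max(termIndex, countIndex)) {
                return; // not enough columns on this line
            }
            try {
                long count = Long.parseLong(columns[countIndex]);
                // Assumption: repeated words have their counts added together.
                counts.merge(columns[termIndex], count, Long::sum);
            } catch (NumberFormatException e) {
                // Skip lines whose count column is not a number.
            }
        });
        return counts;
    }

    public static void main(String[] args) {
        String data = "the 100\ncat 7\nthe 5\n";
        Map<String, Long> counts = parse(new BufferedReader(new StringReader(data)), 0, 1);
        System.out.println(counts.get("the"));
        System.out.println(counts.get("cat"));
    }
}
```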
createDictionary
public boolean createDictionary(String corpus)
Load multiple dictionary words from a file containing plain text.
Parameters:
corpus - The path+filename of the file.
Returns:
True if file loaded, or false if file not found.
-
purgeBelowThresholdWords
public void purgeBelowThresholdWords() -
commitStaged
public void commitStaged(SuggestionStage staging)
Commit staged dictionary additions. Used when you write your own process to load multiple words into the dictionary, and as part of that process, you first created a SuggestionStage object and passed it to createDictionaryEntry calls.
Parameters:
staging - The SuggestionStage object storing the staged data.
-
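The stage-then-commit pattern can be sketched in isolation. Everything here is hypothetical (the StagingSketch and Stage classes, the map layout, and the array-merging strategy); it only illustrates why buffering additions in a temporary structure and merging once can be cheaper than growing the permanent structure entry by entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of staging additions and committing them in bulk.
class StagingSketch {
    /** Temporary buffer: delete string -> suggestions collected so far. */
    static final class Stage {
        private final Map<String, List<String>> pending = new HashMap<>();

        void add(String delete, String suggestion) {
            pending.computeIfAbsent(delete, k -> new ArrayList<>()).add(suggestion);
        }
    }

    // Permanent index with compact arrays that are rebuilt only on commit.
    private final Map<String, String[]> deletes = new HashMap<>();

    void commit(Stage stage) {
        for (Map.Entry<String, List<String>> entry : stage.pending.entrySet()) {
            String[] existing = deletes.get(entry.getKey());
            List<String> added = entry.getValue();
            int oldLength = existing == null ? 0 : existing.length;
            String[] merged = new String[oldLength + added.size()];
            if (existing != null) {
                System.arraycopy(existing, 0, merged, 0, oldLength);
            }
            for (int i = 0; i < added.size(); i++) {
                merged[oldLength + i] = added.get(i);
            }
            deletes.put(entry.getKey(), merged); // one array allocation per key
        }
        stage.pending.clear();
    }

    String[] suggestionsFor(String delete) {
        String[] found = deletes.get(delete);
        return found == null ? new String[0] : found;
    }

    public static void main(String[] args) {
        StagingSketch index = new StagingSketch();
        Stage stage = new Stage();
        stage.add("ct", "cat");
        stage.add("ct", "cut");
        index.commit(stage);
        System.out.println(index.suggestionsFor("ct").length);
    }
}
```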
lookup
lookup(String input, SymSpell.Verbosity verbosity)
Find suggested spellings for a given input word, using the maximum edit distance specified during construction of the SymSpell dictionary.
Parameters:
input - The word being spell checked.
verbosity - The value controlling the quantity/closeness of the returned suggestions.
Returns:
A List of SymSpell.SuggestItem objects representing suggested correct spellings for the input word, sorted by edit distance, and secondarily by count frequency.
-
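To make "maximum edit distance" concrete, here is a minimal sketch of a bounded edit distance check. It uses plain Levenshtein distance, which may differ from the algorithm selected by the distanceAlgorithm field, and the class name and the convention of returning -1 when the limit is exceeded are assumptions of this sketch.

```java
// Hypothetical sketch: Levenshtein distance with an upper bound.
class EditDistanceSketch {
    /** Returns the Levenshtein distance, or -1 if it exceeds maxEditDistance. */
    static int distance(String a, String b, int maxEditDistance) {
        // The length difference is a lower bound on the distance.
        if (Math.abs(a.length() - b.length()) > maxEditDistance) {
            return -1;
        }
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        int result = previous[b.length()];
        return result <= maxEditDistance ? result : -1;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting", 3)); // prints 3
        System.out.println(distance("kitten", "sitting", 2)); // prints -1
    }
}
```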
lookup
lookup(String input, SymSpell.Verbosity verbosity, int maxEditDistance)
Find suggested spellings for a given input word.
Parameters:
input - The word being spell checked.
verbosity - The value controlling the quantity/closeness of the returned suggestions.
maxEditDistance - The maximum edit distance between input and suggested words.
Returns:
A List of SymSpell.SuggestItem objects representing suggested correct spellings for the input word, sorted by edit distance, and secondarily by count frequency.
-
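The ordering of the returned list can be sketched with a comparator. The Suggestion class below is a hypothetical stand-in for SymSpell.SuggestItem, and the directions (smaller distance first, then higher count first) are assumptions, since the description names the sort keys but not their direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of ordering suggestions by distance, then by frequency.
class SuggestionOrderSketch {
    static final class Suggestion {
        final String term;
        final int distance;
        final long count;

        Suggestion(String term, int distance, long count) {
            this.term = term;
            this.distance = distance;
            this.count = count;
        }
    }

    static List<Suggestion> sorted(List<Suggestion> suggestions) {
        Comparator<Suggestion> byDistanceThenCount =
                Comparator.comparingInt((Suggestion s) -> s.distance)
                        .thenComparing(Comparator.comparingLong((Suggestion s) -> s.count).reversed());
        List<Suggestion> copy = new ArrayList<>(suggestions); // leave the input untouched
        copy.sort(byDistanceThenCount);
        return copy;
    }

    public static void main(String[] args) {
        List<Suggestion> result = sorted(Arrays.asList(
                new Suggestion("bat", 1, 5),
                new Suggestion("chat", 2, 100),
                new Suggestion("cat", 1, 50)));
        for (Suggestion s : result) {
            System.out.println(s.term + " " + s.distance + " " + s.count);
        }
    }
}
```

Under these assumptions, "cat" (distance 1, count 50) comes before "bat" (distance 1, count 5), and both come before "chat" (distance 2).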
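lookupCompound
lookupCompound(String input)
-
lookupCompound
lookupCompound(String input, int maxEditDistance)
-
deleteInSuggestionPrefix
private boolean deleteInSuggestionPrefix(String delete, int deleteLen, String suggestion, int suggestionLen)
-
parseWords
private String[] parseWords(String text)
-
edits
-
editsPrefix
editsPrefix(String key)
-
getStringHash
-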
wordSegmentation
wordSegmentation(String input)
Find suggested spellings for a multi-word input String (supports word splitting/merging).
Parameters:
input - The String being spell checked.
Returns:
The word-segmented String, the word-segmented and spelling-corrected String, the edit distance sum between the input String and the corrected String, and the sum of word occurrence probabilities in log scale (a measure of how common and probable the corrected segmentation is).
-
wordSegmentation
wordSegmentation(String input, int maxEditDistance)
Find suggested spellings for a multi-word input String (supports word splitting/merging).
Parameters:
input - The String being spell checked.
maxEditDistance - The maximum edit distance between input and corrected words (0 = no correction/segmentation only).
Returns:
The word-segmented String, the word-segmented and spelling-corrected String, the edit distance sum between the input String and the corrected String, and the sum of word occurrence probabilities in log scale (a measure of how common and probable the corrected segmentation is).
-
wordSegmentation
public SymSpell.SegmentedSuggestion wordSegmentation(String input, int maxEditDistance, int maxSegmentationWordLength)
Find suggested spellings for a multi-word input String (supports word splitting/merging).
Parameters:
input - The String being spell checked.
maxEditDistance - The maximum edit distance between input and corrected words (0 = no correction/segmentation only).
maxSegmentationWordLength - The maximum word length that should be considered.
Returns:
The word-segmented String, the word-segmented and spelling-corrected String, the edit distance sum between the input String and the corrected String, and the sum of word occurrence probabilities in log scale (a measure of how common and probable the corrected segmentation is).
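The "sum of word occurrence probabilities in log scale" can be illustrated with a small scoring sketch. The class SegmentationScoreSketch is hypothetical; using base-10 logarithms, estimating each probability as count divided by a corpus size, and rejecting unknown words are all assumptions of this sketch and not a description of the library's scoring.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: score a candidate segmentation by summed log probabilities.
class SegmentationScoreSketch {
    static double logProbabilitySum(List<String> words, Map<String, Long> counts, long corpusSize) {
        double sum = 0.0;
        for (String word : words) {
            Long count = counts.get(word);
            if (count == null || count <= 0) {
                // Assumption: this sketch does not model unknown words.
                throw new IllegalArgumentException("unknown word: " + word);
            }
            // Adding logarithms corresponds to multiplying the word probabilities.
            sum += Math.log10((double) count / corpusSize);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("the", 100L);
        counts.put("cat", 10L);
        counts.put("thecat", 1L);
        long corpusSize = 1000L;
        double split = logProbabilitySum(Arrays.asList("the", "cat"), counts, corpusSize);
        double joined = logProbabilitySum(Arrays.asList("thecat"), counts, corpusSize);
        // Values closer to zero indicate a more probable segmentation.
        System.out.println(split + " vs " + joined);
    }
}
```

A segmentation routine would compare such scores across candidate splits, together with the edit distance sum, to pick the most plausible result.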
-