Class SynthDictionaryBuilder
java.lang.Object
org.languagetool.tools.DictionaryBuilder
org.languagetool.tools.SynthDictionaryBuilder
Create a Morfologik binary synthesizer dictionary from plain text data.
Field Summary
Fields
Modifier and Type | Field | Description
private static final String | POLISH_IGNORE_REGEX | It makes sense to remove all forms from the synthesizer dictionary whose POS tags indicate "unknown form", "foreign word", etc., as they only take up space.
Constructor Summary
Constructors
SynthDictionaryBuilder(File infoFile)
Method Summary
Modifier and Type | Method
(package private) File | build
 | collectTags(File plainTextDictFile)
 | getIgnoreItems(File file)
private @Nullable Pattern | getPosTagIgnoreRegex(File infoFile)
private File | getTagFile(File tempFile)
static void | main
private File | reverseLineContent(File plainTextDictFile, Set<String> itemsToBeIgnored, Pattern ignorePosRegex)
private void | writePosTagsToFile(File plainTextDictFile, File tagFile)
Methods inherited from class DictionaryBuilder
addFreqData, buildDict, buildFSA, convertTabToSeparator, getOption, getOutputFilename, hasOption, isOptionTrue, readFreqList, setOutputFilename
Field Details
POLISH_IGNORE_REGEX
It makes sense to remove all forms from the synthesizer dictionary whose POS tags indicate "unknown form", "foreign word", etc., as they only take up space. Probably nobody will ever use them.
Constructor Details
SynthDictionaryBuilder
SynthDictionaryBuilder(File infoFile) throws IOException
Throws:
IOException
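The constructor takes only the `.info` file. Morfologik keeps dictionary metadata in such files as Java properties (keys such as `fsa.dict.separator` and `fsa.dict.encoding`), and the inherited `getOption`/`hasOption` methods suggest the builder reads its options from there. The sketch below assumes exactly that and parses an info file with `java.util.Properties`; the class name `InfoFileSketch` and the example values are hypothetical, not taken from this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: parse Morfologik-style .info metadata as Java properties.
// Assumption: the builder's options come from the info file in this form.
public class InfoFileSketch {

  static Properties loadInfo(Reader reader) throws IOException {
    Properties props = new Properties();
    props.load(reader);  // key=value lines; lines starting with '#' are comments
    return props;
  }

  public static void main(String[] args) throws IOException {
    // Example values only; a real .info file ships with each language's dictionary.
    String info = "# example metadata\n"
        + "fsa.dict.separator=+\n"
        + "fsa.dict.encoding=utf-8\n";
    Properties props = loadInfo(new StringReader(info));
    System.out.println(props.getProperty("fsa.dict.separator"));  // +
    System.out.println(props.getProperty("fsa.dict.encoding"));   // utf-8
  }
}
```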
Method Details
main
build
getIgnoreItems
Throws:
FileNotFoundException
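`getIgnoreItems(File file)` declares only `FileNotFoundException`, which is consistent with reading the file through, for example, `java.util.Scanner`, whose `File` constructors throw exactly that. A minimal sketch of such a reader, assuming a hypothetical format of one item per line with blank lines and `#` comments skipped; the real file format and what counts as an "item" are not documented on this page, and `IgnoreItemsSketch`/`readIgnoreItems` are illustrative names.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of an ignore-list reader; the file format is an assumption.
public class IgnoreItemsSketch {

  static Set<String> readIgnoreItems(File file) throws FileNotFoundException {
    Set<String> items = new HashSet<>();
    // Scanner(File, String) throws FileNotFoundException if the file is missing.
    try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
      while (scanner.hasNextLine()) {
        String line = scanner.nextLine().trim();
        if (!line.isEmpty() && !line.startsWith("#")) {  // skip blanks and comments
          items.add(line);
        }
      }
    }
    return items;
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("ignore-items", ".txt");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), Arrays.asList("# comment", "foo", "", "bar"), StandardCharsets.UTF_8);
    System.out.println(new TreeSet<>(readIgnoreItems(tmp)));  // [bar, foo]
  }
}
```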
getPosTagIgnoreRegex
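`getPosTagIgnoreRegex(File infoFile)` returns a `@Nullable Pattern`, and the only ignore pattern documented here is `POLISH_IGNORE_REGEX`, which suggests a language-specific choice where `null` means "filter nothing". A sketch under that assumption: the language is guessed from the info file name, and both the file-name check and the regex value (modeled on the Polish tags `ign` for unknown and `xxx` for foreign forms) are hypothetical stand-ins, since the constant's real value is not shown on this page.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical sketch of choosing a POS-tag ignore pattern per language.
public class PosTagIgnoreRegexSketch {

  // Stand-in value; the real POLISH_IGNORE_REGEX is not shown on this page.
  private static final String POLISH_IGNORE_REGEX = "^(ign|xxx)$";

  // Returns null when no forms should be filtered (mirrors the @Nullable return type).
  static Pattern posTagIgnoreRegex(File infoFile) {
    // Assumption: the language is recognizable from the .info file name.
    if (infoFile.getName().startsWith("polish")) {
      return Pattern.compile(POLISH_IGNORE_REGEX);
    }
    return null;
  }

  public static void main(String[] args) {
    Pattern p = posTagIgnoreRegex(new File("polish_synth.info"));
    System.out.println(p.matcher("ign").find());              // true
    System.out.println(p.matcher("subst:sg:nom:m3").find());  // false
    System.out.println(posTagIgnoreRegex(new File("english_synth.info")));  // null
  }
}
```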
reverseLineContent
private File reverseLineContent(File plainTextDictFile, Set<String> itemsToBeIgnored, Pattern ignorePosRegex) throws IOException
Throws:
IOException
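A synthesizer answers the opposite question to a tagger: given a lemma and a POS tag, which inflected form? That is presumably why `reverseLineContent` rewrites each line of the plain text dictionary before compilation, dropping ignored items and forms whose tag matches `ignorePosRegex`. A minimal in-memory sketch, assuming tab-separated `form`, `lemma`, `tag` input and `lemma|tag<TAB>form` output; the column order, the `|` joiner, and matching ignore items against whole lines are all assumptions, and the real method works on files rather than lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: turn tagger-style entries into synthesizer-style entries.
public class ReverseLineContentSketch {

  static List<String> reverseLines(List<String> lines, Set<String> itemsToBeIgnored,
                                   Pattern ignorePosRegex) {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      // Assumption: ignore items are compared with whole input lines.
      if (line.isEmpty() || itemsToBeIgnored.contains(line)) {
        continue;
      }
      String[] parts = line.split("\t");
      if (parts.length != 3) {
        throw new IllegalArgumentException("Expected form<TAB>lemma<TAB>tag: " + line);
      }
      String form = parts[0];
      String lemma = parts[1];
      String tag = parts[2];
      // A null pattern means nothing is filtered by POS tag.
      if (ignorePosRegex != null && ignorePosRegex.matcher(tag).find()) {
        continue;
      }
      result.add(lemma + "|" + tag + "\t" + form);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList(
        "houses\thouse\tNNS",
        "ergo\tergo\tFW",  // foreign-word tag, filtered below
        "house\thouse\tNN");
    // Prints "house|NNS -> houses", then "house|NN -> house".
    for (String line : reverseLines(input, Collections.emptySet(), Pattern.compile("^FW$"))) {
      System.out.println(line.replace("\t", " -> "));
    }
  }
}
```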
getTagFile
writePosTagsToFile
Throws:
IOException
collectTags
Throws:
IOException
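The signatures of `collectTags(File plainTextDictFile)` and `writePosTagsToFile(File plainTextDictFile, File tagFile)` suggest that the set of POS tags occurring in the dictionary is also written to a separate tag file. A sketch of that step, assuming the tag is the third tab-separated column and that the tag file holds one distinct tag per line in sorted order; the `Path`-based helpers `collectTags`/`writePosTags` and this file layout are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of collecting POS tags and writing them to a tag file.
public class PosTagsSketch {

  // Assumed input line format: form<TAB>lemma<TAB>tag; other lines are skipped.
  static Set<String> collectTags(Path plainTextDict) throws IOException {
    Set<String> tags = new TreeSet<>();  // de-duplicated and sorted
    for (String line : Files.readAllLines(plainTextDict, StandardCharsets.UTF_8)) {
      String[] parts = line.split("\t");
      if (parts.length == 3) {
        tags.add(parts[2]);
      }
    }
    return tags;
  }

  // Writes one tag per line, overwriting tagFile.
  static void writePosTags(Path plainTextDict, Path tagFile) throws IOException {
    Files.write(tagFile, collectTags(plainTextDict), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dict = Files.createTempFile("dict", ".txt");
    Path tags = Files.createTempFile("tags", ".txt");
    Files.write(dict, Arrays.asList("houses\thouse\tNNS", "house\thouse\tNN", "cats\tcat\tNNS"),
        StandardCharsets.UTF_8);
    writePosTags(dict, tags);
    System.out.println(Files.readAllLines(tags, StandardCharsets.UTF_8));  // [NN, NNS]
    Files.delete(dict);
    Files.delete(tags);
  }
}
```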