This package provides infrastructure to make text datasets available within R, even when they are too large to store within an R package or are licensed in such a way that prevents them from being included in OSS-licensed packages.
Do you want to add a new dataset to the textdata package?
- Create an R file named prefix_*.R in the R/ folder, where * is the name of the dataset. Supported prefixes include
  - dataset
  - lexicon
- Inside that file, create three functions named download_*(), process_*() and dataset_*().
  - The download_*() function should take 1 argument named folder_path. It has 2 tasks. First, it should check whether the file is already downloaded; if it is, the function should return invisible(). If the file isn't at the path, the function should download the file to that path.
  - The process_*() function should take 2 arguments, folder_path and name_path. folder_path denotes the path to the file returned by download_*(), and name_path is the path where the polished data should live. The main point of process_*() is to turn the downloaded file into a .rds file containing a tidy tibble.
  - The dataset_*() function should wrap load_dataset().
- Add the process_*() function to the named list process_functions in the file process_functions.R.
- Add the download_*() function to the named list download_functions in the file download_functions.R.
- Update the print_info list in the info.R file.
- Add dataset_*.R to the @include tags in download_functions.R.
- Add the dataset to README.Rmd.
- Add the dataset to _pkgdown.yml.
- Add an entry to the NEWS.md file.

What are the guidelines for adding datasets?
- Use word instead of words for column names.
- For datasets that come with both a testing and a training dataset, let the user pick which one to retrieve with a split argument, similar to how dataset_ag_news() does it.
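
As a rough illustration of the download_*() and process_*() contract and of the split guideline, here is a minimal sketch for a hypothetical dataset called "example". The URL, the file names, and the helper example_file_name() are assumptions made for this sketch and are not part of textdata. It also assumes folder_path is a directory that holds the raw file. To stay free of dependencies it saves a plain data frame; a real contribution would save a tidy tibble, and its dataset_*() function would wrap the package's load_dataset().

```r
# Placeholder URL for the hypothetical "example" dataset (assumption).
example_url <- "https://example.com/example.csv"

download_example <- function(folder_path) {
  file_path <- file.path(folder_path, "example.csv")
  # Task 1: if the raw file is already there, do nothing.
  if (file.exists(file_path)) {
    return(invisible())
  }
  # Task 2: otherwise download it to that path.
  utils::download.file(example_url, destfile = file_path)
}

process_example <- function(folder_path, name_path) {
  file_path <- file.path(folder_path, "example.csv")
  data <- utils::read.csv(file_path, stringsAsFactors = FALSE)
  # A real contribution would convert to a tibble here,
  # e.g. with tibble::as_tibble(data), before saving.
  saveRDS(data, name_path)
}

# Hypothetical helper: choose a file by split, defaulting to "train".
example_file_name <- function(split = c("train", "test")) {
  split <- match.arg(split)
  paste0("example_", split, ".rds")
}

# Usage without network access: pre-seed the raw file, so
# download_example() returns early instead of downloading.
folder <- tempfile("textdata-sketch")
dir.create(folder)
writeLines(c("word,value", "good,1", "bad,-1"),
           file.path(folder, "example.csv"))

download_example(folder)
rds_path <- file.path(folder, example_file_name("train"))
process_example(folder, rds_path)
example_data <- readRDS(rds_path)
```

Using match.arg() means an unsupported split value fails early with an error, and calling the helper with no argument falls back to the first choice.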