library(textdata)

This package provides infrastructure to make text datasets available within R, even when they are too large to store within an R package or are licensed in such a way that prevents them from being included in OSS-licensed packages.
Do you want to add a new dataset to the textdata package?
- Create an R file named prefix_*.R in the R/ folder, where * is the name of the dataset. Supported prefixes include dataset and lexicon.
- Inside that file, create three functions named download_*(), process_*() and dataset_*().
- The download_*() function should take one argument named folder_path. It has two tasks. First, it should check whether the file is already downloaded, and if so it should return invisible(). If the file isn’t at the path, it should download the file to that path.
- The process_*() function should take two arguments, folder_path and name_path. folder_path denotes the path to the file returned by download_*(), and name_path is the path where the polished data should live. The main point of process_*() is to turn the downloaded file into a .rds file containing a tidy tibble.
- The dataset_*() function should wrap load_dataset().
- Add the process_*() function to the named list process_functions in the file process_functions.R.
- Add the download_*() function to the named list download_functions in the file download_functions.R.
- Update the print_info list in the info.R file.
- Add dataset_*.R to the @include tags in download_functions.R.
- Add the dataset to README.Rmd.
- Add the dataset to _pkgdown.yml.
- Add an entry to the NEWS.md file.

What are the guidelines for adding datasets?
- Use word instead of words for column names.
- For datasets that come with both a testing and a training dataset, let the user pick which one to retrieve with a split argument, similar to how dataset_ag_news() does it.
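
As a rough illustration of the steps and guidelines above, here is a minimal sketch for a hypothetical dataset called "example". The dataset name, the file names and the URL are placeholders, not part of textdata. The sketch assumes the tibble package is installed and that the two named lists are keyed by dataset name. Because the arguments of load_dataset() are not described here, dataset_example() uses a simplified stand-in for it; a real dataset_*() function would wrap load_dataset() instead.

```r
# Sketch for a hypothetical dataset "example" with a train and a test split.
# All file names and the URL below are placeholders.

download_example <- function(folder_path) {
  file_paths <- file.path(
    folder_path,
    c("example_train.csv", "example_test.csv")
  )
  # Task 1: nothing to do if the raw files are already on disk.
  if (all(file.exists(file_paths))) {
    return(invisible())
  }
  # Task 2: otherwise download them to that path.
  for (file_path in file_paths) {
    download.file(
      url = paste0("https://example.com/", basename(file_path)), # placeholder
      destfile = file_path
    )
  }
  invisible()
}

process_example <- function(folder_path, name_path) {
  # name_path looks like ".../example_train.rds"; find the matching raw file.
  raw_file <- sub("\\.rds$", ".csv", basename(name_path))
  raw <- read.csv(file.path(folder_path, raw_file), stringsAsFactors = FALSE)
  # Store the polished data as a tibble in an .rds file.
  saveRDS(tibble::as_tibble(raw), name_path)
}

# Registration in the named lists (assumed to be keyed by dataset name).
download_functions <- list(example = download_example)
process_functions <- list(example = process_example)

dataset_example <- function(dir, split = c("train", "test")) {
  # Let the user pick the split; match.arg() rejects anything else.
  split <- match.arg(split)
  name_path <- file.path(dir, paste0("example_", split, ".rds"))
  # Simplified stand-in for load_dataset(): download and process only
  # when the polished file does not exist yet, then read it back.
  if (!file.exists(name_path)) {
    download_functions[["example"]](dir)
    process_functions[["example"]](dir, name_path)
  }
  readRDS(name_path)
}
```

With the raw files in place, dataset_example(dir, split = "test") would return the test split as a tibble, and calling it without split would default to the training split.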