The latest development version of the package is always on GitHub. To install it from GitHub (this requires the devtools package):
library(devtools)
install_github("themains/rdomains")
To install the package from CRAN, type in:
install.packages("rdomains")
Next, load the package:
library(rdomains)
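If you run the same script on several machines, a small guard can install rdomains from CRAN only when it is missing and then load it. This is a minimal sketch; `ensure_package()` is a hypothetical helper, not part of rdomains, and its `installer` argument exists only so the install step can be swapped out.

```r
# Hypothetical helper (not part of rdomains): install a package from CRAN only
# if it is not already installed, then attach it.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)  # only reached when the package is missing
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Usage:
# ensure_package("rdomains")
```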
To get the category of a domain from Shallalist (the service has been discontinued, so the package uses archived data), first download the archived data:
get_shalla_data()
Then get the category:
shalla_cat("http://www.google.com")
## domain_name shalla_category
## 1 google.com searchengines
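The output above reports `google.com` for the input `http://www.google.com`, so URLs are reduced to a domain name before the lookup. If you want to normalize URLs yourself (for example, to deduplicate a list before classifying it), a minimal base-R sketch follows. `url_to_domain()` is hypothetical, not the package's own implementation: it only strips the scheme, port, path and a leading `www.`, and does not handle other subdomains or public-suffix rules.

```r
# Hypothetical helper (not the rdomains implementation): reduce URLs to bare
# domain names by stripping the scheme, port/path/query, and a leading "www.".
url_to_domain <- function(urls) {
  d <- tolower(urls)
  d <- sub("^[a-z][a-z0-9+.-]*://", "", d)  # drop scheme such as http://
  d <- sub("[/?#:].*$", "", d)              # drop port, path, query, fragment
  d <- sub("^www\\.", "", d)                # drop a leading www.
  d
}

url_to_domain(c("http://www.google.com", "https://github.com/themains/rdomains"))
# returns c("google.com", "github.com")
```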
To get the category of a domain from DMOZ, first download the archived, parsed CSV file:
get_dmoz_data()
Then get the category:
dmoz_cat("http://www.google.com")
To get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:
adult_ml1_cat("http://www.google.com")
## domain_name category
## 1 google.com 0.3133728
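`adult_ml1_cat()` returns a probability rather than a label. Turning it into a yes/no flag requires a cutoff; the sketch below uses 0.5 purely as an assumption, not a value recommended by the package, and `flag_adult()` is a hypothetical helper. It also assumes the `category` column is numeric, as in the output above.

```r
# Hypothetical post-processing (not part of rdomains): add a logical column
# marking rows whose probability meets an assumed cutoff.
flag_adult <- function(results, cutoff = 0.5) {
  results$is_adult <- results$category >= cutoff
  results
}

# Input mirrors the output shown above.
scores <- data.frame(domain_name = "google.com", category = 0.3133728,
                     stringsAsFactors = FALSE)
flag_adult(scores)$is_adult
# returns FALSE

# Usage with the package:
# flag_adult(adult_ml1_cat("http://www.google.com"))
```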
To get a domain's category from VirusTotal, start by getting an API key from VirusTotal.
The package queries the VirusTotal API v3:
virustotal_cat("http://www.google.com")
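VirusTotal API keys come with request quotas, so looking up many domains in a tight loop can fail. A minimal sketch of a throttled loop follows. `throttled_lookup()` and its default 15-second `delay` are assumptions (set the delay from your own quota), and it assumes each lookup returns a data frame that can be stacked with `rbind()`.

```r
# Hypothetical wrapper (not part of rdomains): apply `lookup` to each domain,
# sleeping `delay` seconds between calls and skipping lookups that error.
throttled_lookup <- function(domains, lookup, delay = 15) {
  results <- vector("list", length(domains))
  for (i in seq_along(domains)) {
    if (i > 1) Sys.sleep(delay)  # pause before every call after the first
    results[[i]] <- tryCatch(
      lookup(domains[[i]]),
      error = function(e) {
        message("Lookup failed for ", domains[[i]], ": ", conditionMessage(e))
        NULL  # NULL entries are dropped by rbind() below
      }
    )
  }
  do.call(rbind, results)
}

# Usage (needs a valid VirusTotal key):
# throttled_lookup(c("http://www.google.com", "http://www.github.com"), virustotal_cat)
```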
Get domain categorization using OpenAI’s GPT models. You’ll need an OpenAI API key:
# Set your API key
Sys.setenv(OPENAI_API_KEY = "your-api-key-here")
# Classify domains
openai_cat("google.com")
## domain_name openai_category
## 1 google.com technology
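Since the key is read from an environment variable, it can help to check that it is set before classifying a long list of domains. `require_api_key()` below is a hypothetical helper, not part of rdomains.

```r
# Hypothetical helper: return the value of an API-key environment variable,
# or stop with a readable error if it is unset or empty.
require_api_key <- function(var) {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop(sprintf("Environment variable %s is not set.", var), call. = FALSE)
  }
  invisible(key)
}

# Usage:
# require_api_key("OPENAI_API_KEY")
# openai_cat("google.com")
```

To keep the key out of scripts, you can instead put a line such as `OPENAI_API_KEY=your-api-key-here` in your `~/.Renviron` file, which R reads at startup.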
You can also specify custom categories:
openai_cat(c("amazon.com", "github.com"),
           categories = c("ecommerce", "technology", "social", "other"))
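Language-model output is not guaranteed to stay within the categories you pass. A minimal, hypothetical check that warns about and drops rows outside the allowed set is sketched below. It assumes the result has an `openai_category` column as in the output above; the example data frame is made up for illustration.

```r
# Hypothetical check (not part of rdomains): keep only rows whose category is
# in `allowed`, warning about any unexpected values.
check_categories <- function(results, column, allowed) {
  bad <- !(results[[column]] %in% allowed)
  if (any(bad)) {
    warning("Unexpected categories: ",
            paste(unique(results[[column]][bad]), collapse = ", "))
  }
  results[!bad, , drop = FALSE]
}

# Made-up input with the same columns as openai_cat() output.
example <- data.frame(domain_name = c("a.com", "b.com"),
                      openai_category = c("technology", "news"),
                      stringsAsFactors = FALSE)
allowed <- c("ecommerce", "technology", "social", "other")
kept <- suppressWarnings(check_categories(example, "openai_category", allowed))
kept$domain_name
# returns "a.com"
```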
Get domain categorization using Anthropic’s Claude models. You’ll need an Anthropic API key:
# Set your API key
Sys.setenv(ANTHROPIC_API_KEY = "your-api-key-here")
# Classify domains
claude_cat("facebook.com")
## domain_name claude_category
## 1 facebook.com social
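With both language-model classifiers set up, you may want to see where they agree. A hypothetical sketch that joins two result tables on `domain_name` follows. It assumes the column names shown in the outputs above, and the input data frames are made up for illustration.

```r
# Hypothetical helper (not part of rdomains): join two classifiers' outputs
# by domain and mark whether their labels agree.
compare_labels <- function(a, b, col_a, col_b) {
  merged <- merge(a, b, by = "domain_name")  # inner join on shared domains
  merged$agree <- merged[[col_a]] == merged[[col_b]]
  merged
}

# Made-up inputs shaped like openai_cat() and claude_cat() output.
gpt <- data.frame(domain_name = c("a.com", "b.com"),
                  openai_category = c("social", "news"),
                  stringsAsFactors = FALSE)
cl <- data.frame(domain_name = c("a.com", "b.com"),
                 claude_category = c("social", "technology"),
                 stringsAsFactors = FALSE)
compare_labels(gpt, cl, "openai_category", "claude_category")$agree
# returns c(TRUE, FALSE)
```

Assuming both functions return data frames, the same helper could be applied as `compare_labels(openai_cat(domains), claude_cat(domains), "openai_category", "claude_category")`.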