rdomains: Get the category of content hosted by a domain

Install and Load the package

The latest development version of the package is always available on GitHub. To install it from GitHub:

# install.packages("devtools")  # if devtools is not already installed
devtools::install_github("themains/rdomains")

To install the package from CRAN, type in:

install.packages("rdomains")

Next, load the package:

library(rdomains)

Shalla

To get the content category from Shallalist (the service has been discontinued, so the package relies on archived data), first download the archived data:

get_shalla_data()

And then, get the category using:

shalla_cat("http://www.google.com")
##   domain_name shalla_category
## 1  google.com   searchengines
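The output above shows the input URL reduced to `google.com` before the lookup. As a rough illustration of that kind of normalization, the base R sketch below strips the scheme, path, port, and a leading `www.` from a URL. The helper name `to_domain` is hypothetical and this is not the package's actual implementation; it only shows the general idea.

```r
# Hypothetical helper, not part of rdomains: reduce a URL to a bare domain.
to_domain <- function(urls) {
  x <- tolower(trimws(urls))
  x <- sub("^[a-z][a-z0-9+.-]*://", "", x)  # drop a scheme such as http://
  x <- sub("[/?#].*$", "", x)               # drop path, query, and fragment
  x <- sub(":[0-9]+$", "", x)               # drop a port number
  x <- sub("^www\\.", "", x)                # drop a leading www.
  x
}

to_domain(c("http://www.google.com", "https://GitHub.com/themains/rdomains"))
## [1] "google.com" "github.com"
```

Normalizing inputs this way before a lookup can make it easier to match results back to the original list of URLs.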

DMOZ

To get the content category from DMOZ, first download the archived, parsed CSV file:

get_dmoz_data()

And then, get the category using:

dmoz_cat("http://www.google.com")

ML

To get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:

adult_ml1_cat("http://www.google.com")
##   domain_name  category
## 1  google.com 0.3133728
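Note that the `category` column in the output above holds a probability rather than a label. If a binary flag is needed, one option is to apply a cutoff. The sketch below uses a mock data frame copied from the output above instead of a live call, and the cutoff of 0.5 is an arbitrary assumption for illustration, not a value recommended by the package.

```r
# Mock of the output shown above; a real run would call adult_ml1_cat() instead.
res <- data.frame(domain_name = "google.com", category = 0.3133728)

# Assumption: 0.5 is an arbitrary cutoff chosen only for this example.
threshold <- 0.5
res$adult_flag <- res$category >= threshold

res$adult_flag
## [1] FALSE
```

The right cutoff depends on how costly false positives and false negatives are for the task at hand.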

VirusTotal

Start by getting the API key from VirusTotal.

The package uses the VirusTotal API v3 for comprehensive domain analysis:

virustotal_cat("http://www.google.com")
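Calls to a remote API can fail, for example because of a missing key, a quota limit, or a network problem. When looking up many domains, it may help to catch errors per domain so that one failure does not stop the whole run. The sketch below is a generic pattern in base R; `safe_lookup` and `fake_lookup` are hypothetical names, and `fake_lookup` is only a stand-in for a real lookup function such as `virustotal_cat()`.

```r
# Hypothetical helper: call a lookup function per domain and keep going on errors.
safe_lookup <- function(domains, lookup_fn, pause = 0) {
  results <- lapply(domains, function(d) {
    out <- tryCatch(lookup_fn(d), error = function(e) NULL)  # NULL marks a failure
    Sys.sleep(pause)  # optional delay between requests, in seconds
    out
  })
  names(results) <- domains
  results
}

# Stand-in for a real lookup function; it fails on purpose for one input.
fake_lookup <- function(d) {
  if (d == "bad.example") stop("request failed")
  data.frame(domain_name = d, stringsAsFactors = FALSE)
}

res <- safe_lookup(c("google.com", "bad.example"), fake_lookup)

# Domains whose lookup failed and may need a retry
failed <- names(res)[vapply(res, is.null, logical(1))]
failed
## [1] "bad.example"
```

In real use, `lookup_fn` would be the package function of interest and `pause` could be raised to stay within whatever request limits apply to the account.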

OpenAI GPT Models

Get domain categorization using OpenAI’s GPT models. You’ll need an OpenAI API key:

# Set your API key
Sys.setenv(OPENAI_API_KEY = "your-api-key-here")

# Classify domains
openai_cat("google.com")
##   domain_name openai_category
## 1  google.com      technology
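Because the key is read from an environment variable, it can be useful to check that the variable is set before making any calls. The following is a minimal sketch in base R; `require_key` is a hypothetical helper and not part of the package.

```r
# Hypothetical helper: stop early with a clear message if a key is missing.
require_key <- function(var) {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  invisible(key)
}

Sys.setenv(OPENAI_API_KEY = "your-api-key-here")  # placeholder value
require_key("OPENAI_API_KEY")                     # returns the key invisibly
```

To keep the key out of scripts, it can instead be stored in `~/.Renviron`, which R reads at startup.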

You can also specify custom categories:

openai_cat(c("amazon.com", "github.com"), 
           categories = c("ecommerce", "technology", "social", "other"))

Anthropic Claude

Get domain categorization using Anthropic’s Claude models. You’ll need an Anthropic API key:

# Set your API key
Sys.setenv(ANTHROPIC_API_KEY = "your-api-key-here")

# Classify domains
claude_cat("facebook.com")
##    domain_name claude_category
## 1 facebook.com          social
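The example outputs shown for the different sources share a `domain_name` column, so results from several sources can be combined into one table. The sketch below uses mock data frames copied from the example outputs in this document rather than live calls, and merges them with base R's `merge()`.

```r
# Mock results copied from the example outputs shown in this document
shalla <- data.frame(domain_name = "google.com",
                     shalla_category = "searchengines")
claude <- data.frame(domain_name = "facebook.com",
                     claude_category = "social")

# all = TRUE keeps domains that appear in only one source (filled with NA)
combined <- merge(shalla, claude, by = "domain_name", all = TRUE)
combined
```

Domains covered by only one source get `NA` in the other source's column, which makes gaps in coverage easy to spot.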