Package 'okf'


Title: Open Knowledge Format (OKF) Ingestion
Version: 0.5.2
Description: Read, validate, and load Open Knowledge Format (OKF) bundles (a directory of markdown files with YAML frontmatter) into a portable DuckDB catalog, build the concept graph, render to HTML, and optionally embed concept bodies for semantic search. Deterministic and agent-free: the same bundle always yields the same catalog, graph, and render, with no LLM calls in the core. Conformant and permissive per the OKF v0.1 specification.
License: Apache License (≥ 2)
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Imports: yaml, DBI, duckdb, digest, jsonlite, utils
Suggests: httr2, commonmark, testthat (≥ 3.0.0)
Config/testthat/edition: 3
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2026-06-23 21:20:30 UTC; tsj_j
Author: Travis Jakel [aut, cre]
Maintainer: Travis Jakel <travis.s.jakel@gmail.com>
Repository: CRAN
Date/Publication: 2026-06-30 10:40:07 UTC

okf: Open Knowledge Format (OKF) Ingestion

Description

Read, validate, and load Open Knowledge Format (OKF) bundles into a portable DuckDB catalog, build the concept graph, and optionally embed concept bodies for semantic search. Conformant and permissive per the OKF v0.1 specification.

Author(s)

Maintainer: Travis Jakel travis.s.jakel@gmail.com


Concepts that link to a given concept ("linked from" / backlinks).

Description

Concepts that link to a given concept ("linked from" / backlinks).

Usage

okf_backlinks(con, path)

Arguments

con

An open DuckDB connection to an okf catalog.

path

Bundle-relative concept path.

Value

Character vector of source concept paths (resolved inbound links).


Split a concept body into chunks on paragraph boundaries.

Description

Split a concept body into chunks on paragraph boundaries.

Usage

okf_chunk_body(body, target_chars = 600L)

Arguments

body

Concept body text.

target_chars

Approximate maximum chunk size in characters.

Value

Character vector of chunks.
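The rule "split on paragraph boundaries, up to an approximate size" can be read as greedy paragraph packing. The base-R sketch below shows that reading. 'chunk_paragraphs()' is a hypothetical helper, not okf's source. The choices that a blank line separates paragraphs and that a single oversized paragraph becomes its own chunk are assumptions.

```r
# Hypothetical sketch (not okf's source): greedy packing of paragraphs into
# chunks of at most ~target_chars characters.
chunk_paragraphs <- function(body, target_chars = 600L) {
  # Paragraphs are separated by a blank (or whitespace-only) line.
  paras <- trimws(strsplit(body, "\n[[:space:]]*\n")[[1]])
  paras <- paras[nzchar(paras)]
  chunks <- character(0)
  current <- ""
  for (p in paras) {
    candidate <- if (nzchar(current)) paste0(current, "\n\n", p) else p
    if (nchar(candidate) <= target_chars || !nzchar(current)) {
      # Grow the chunk; an oversized lone paragraph is kept whole.
      current <- candidate
    } else {
      chunks <- c(chunks, current)  # close the chunk, start a new one
      current <- p
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}
```

Paragraphs are never split mid-text, so a chunk can exceed 'target_chars' only when one paragraph alone is longer than the target.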


Deterministic community labels via synchronous label propagation.

Description

Operates on the undirected resolved-link graph. Each node starts in its own community; nodes iteratively adopt the most common label among neighbours, ties broken by the lexicographically smallest label (so the result is fully reproducible – no randomness). Isolated nodes keep their own label.

Usage

okf_clusters(con, max_iter = 50L, include_reserved = FALSE)

Arguments

con

An open DuckDB connection to an okf catalog.

max_iter

Maximum propagation sweeps.

include_reserved

Include reserved concepts ('index.md'/'log.md') as nodes – useful for graph visualization, where 'index.md' is the hub.

Value

A data.frame with 'path' and integer 'cluster' (1-based, stable order).
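The propagation rule above is sketched in base R below. It uses synchronous sweeps, the lexicographically smallest label on ties, and a C-locale sort so the result is reproducible. 'label_propagation()' and its 'src'/'dst' edge columns are hypothetical. The sketch also counts each node's own current label as a vote. That is an assumed guard against the endless label swapping that plain synchronous propagation shows on two-node or bipartite structures, and it may not match okf.

```r
# Hypothetical sketch (not okf's source): deterministic synchronous label
# propagation over an undirected edge list (columns src, dst). Assumption:
# a node's own current label also gets a vote, preventing label swapping.
label_propagation <- function(nodes, edges, max_iter = 50L) {
  nodes <- sort(unique(nodes), method = "radix")  # C-locale order, reproducible
  adj <- setNames(vector("list", length(nodes)), nodes)
  for (i in seq_len(nrow(edges))) {
    a <- edges$src[i]
    b <- edges$dst[i]
    if (a == b) next                     # ignore self-links
    adj[[a]] <- c(adj[[a]], b)           # undirected: record both directions
    adj[[b]] <- c(adj[[b]], a)
  }
  labels <- setNames(nodes, nodes)       # every node starts in its own community
  for (iter in seq_len(max_iter)) {
    new <- labels
    for (n in nodes) {
      votes <- table(c(labels[[n]], labels[adj[[n]]]))
      best <- names(votes)[votes == max(votes)]
      new[[n]] <- sort(best, method = "radix")[1]  # tie -> smallest label
    }
    if (identical(new, labels)) break    # converged
    labels <- new                        # synchronous: all nodes update at once
  }
  # 1-based cluster ids in order of first appearance along the sorted paths.
  data.frame(path = nodes, cluster = match(labels, unique(labels)))
}
```

An isolated node only ever votes for itself, so it keeps its own label, as the description requires.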


Assemble an index-first, link-following slice of a bundle as one markdown blob for direct LLM consumption.

Description

This is the OKF "LLM wiki" consumption primitive (after Karpathy): hand the agent 'index.md' plus the relevant concept(s) and their link neighborhood to read directly. It uses only the concept graph: **no embeddings, no vector search**. With 'start', it walks the (undirected) link graph from that concept out to 'depth' hops; without 'start', it packs all concepts. Output is capped at roughly 'max_tokens' (estimated at ~4 characters per token).

Usage

okf_context(
  con,
  start = NULL,
  depth = 1L,
  max_tokens = 8000L,
  include_index = TRUE
)

Arguments

con

An open DuckDB connection to an okf catalog.

start

Optional concept path to center the neighborhood on.

depth

Link-graph radius around 'start' (ignored when 'start' is NULL).

max_tokens

Approximate output budget.

include_index

Prepend 'index.md' (the map) when present.

Value

A list with 'text' (the markdown blob), 'included'/'omitted' concept paths, and 'est_tokens'.
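The two steps described for okf_context() are a link-graph walk out to 'depth' hops and packing under a ~4 characters/token budget. A base-R sketch follows. 'neighborhood()' and 'pack_context()' are hypothetical helpers. Skipping a concept that does not fit, rather than truncating it, is an assumed policy and not documented okf behaviour.

```r
# Hypothetical sketch (not okf's source).
# Undirected breadth-first walk from `start`, at most `depth` hops.
neighborhood <- function(start, edges, depth = 1L) {
  seen <- start
  frontier <- start
  for (d in seq_len(depth)) {
    nxt <- c(edges$dst[edges$src %in% frontier],   # outbound links
             edges$src[edges$dst %in% frontier])   # inbound links
    frontier <- setdiff(unique(nxt), seen)
    if (length(frontier) == 0L) break
    seen <- c(seen, frontier)
  }
  seen
}

# Greedy packing under a character budget of ~4 chars per token.
pack_context <- function(paths, bodies, max_tokens = 8000L) {
  budget <- max_tokens * 4L
  used <- 0L
  included <- character(0)
  for (p in paths) {
    cost <- nchar(bodies[[p]])
    if (used + cost > budget) next   # assumption: skip, don't truncate
    included <- c(included, p)
    used <- used + cost
  }
  list(included = included, omitted = setdiff(paths, included),
       est_tokens = ceiling(used / 4))
}
```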


Health / maintenance report for an ingested OKF catalog.

Description

Combines the validation findings already stored in the catalog (missing type, broken links, orphans, non-ISO timestamps, ...) with maintenance checks (duplicate titles and, when 'now' is supplied, future or stale timestamps), and computes a health 'score': the percentage of non-reserved concepts with zero findings. Fully deterministic.

Usage

okf_doctor(con, now = NULL, stale_days = NULL)

Arguments

con

An open DuckDB connection to an okf catalog.

now

Optional ISO-8601 "current time" enabling stale/future-timestamp checks (kept explicit so the function stays deterministic; the CLI passes the wall clock).

stale_days

Optional integer; with 'now', flag timestamps older than this many days.

Value

A list with 'score', 'n_concepts', 'n_healthy', 'n_error', 'n_warn', 'by_rule' (named counts), and 'issues' (a data.frame of path/severity/rule/message).
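The score definition ("the percentage of non-reserved concepts with zero findings") comes down to a short set computation. 'health_score()' below is a hypothetical helper. Treating only root-level 'index.md'/'log.md' as reserved, and returning NA for a bundle with no non-reserved concepts, are both assumptions.

```r
# Hypothetical sketch of the documented health score.
health_score <- function(concepts, issues, reserved = c("index.md", "log.md")) {
  pool <- setdiff(concepts, reserved)        # non-reserved concepts
  if (length(pool) == 0L) return(NA_real_)   # assumption: undefined when empty
  healthy <- setdiff(pool, issues$path)      # concepts with no finding at all
  100 * length(healthy) / length(pool)
}
```

Several findings on the same concept still count it only once as unhealthy.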


Apply only unambiguously-safe maintenance fixes to a bundle's source files.

Description

Two mechanical, deterministic repairs (never invents content):

Edits files in place. Anything ambiguous is left for [okf_doctor()] to report.

Usage

okf_doctor_fix(root)

Arguments

root

A bundle directory path.

Value

A data.frame of changes ('path', 'kind', 'before', 'after'); zero rows if nothing was safely fixable.


Chunk and embed concept bodies into the catalog for semantic search.

Description

Populates 'okf_chunk' with one row per chunk plus its embedding vector and the concept's 'content_hash'. By default replaces all chunks. With 'incremental = TRUE', only concepts whose 'content_hash' differs from what was last embedded are re-embedded (and removed concepts' chunks are dropped) – the expensive embedder calls are skipped for unchanged concepts.

Usage

okf_embed(con, embedder = NULL, target_chars = 600L, incremental = FALSE)

Arguments

con

An open DuckDB connection to an okf catalog.

embedder

An embedder function; defaults to [okf_ollama_embedder()].

target_chars

Approximate chunk size in characters.

incremental

Re-embed only concepts whose content changed.

Value

The number of chunks (re)written by this call, as an integer (returned invisibly).
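The 'incremental = TRUE' bookkeeping amounts to comparing each concept's current 'content_hash' with the hash stored alongside its embedded chunks. 'embed_plan()' is a hypothetical helper that works on named character vectors (path -> hash). It only decides what to re-embed and what to drop; it does not call an embedder or touch the catalog.

```r
# Hypothetical sketch of the incremental re-embed plan (not okf's source).
# `current`, `embedded`: named character vectors, path -> content_hash.
embed_plan <- function(current, embedded) {
  common <- intersect(names(current), names(embedded))
  list(
    reembed = sort(c(setdiff(names(current), names(embedded)),      # added
                     common[current[common] != embedded[common]])),  # changed
    drop    = sort(setdiff(names(embedded), names(current)))         # removed
  )
}
```

Unchanged concepts appear in neither list, which is where the saved embedder calls come from.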


Extract markdown link targets from a concept body (OKF cross-links, sec. 4).

Description

Extract markdown link targets from a concept body (OKF cross-links, sec. 4).

Usage

okf_extract_links(body)

Arguments

body

Concept body text.

Value

Character vector of raw link targets (as written).
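For the common inline form '[text](target)', link extraction can be sketched with a single regular expression. 'extract_links()' is a hypothetical helper. It assumes inline links only: no reference-style links, no link titles, no nested brackets, and image links ('![alt](src)') are excluded. okf's own parser may handle more cases.

```r
# Hypothetical sketch: raw targets of inline markdown links, as written.
extract_links <- function(body) {
  # (?<!!) skips images; [^)[:space:]]+ is the destination up to ")".
  pattern <- "(?<!!)\\[[^]]*\\]\\([^)[:space:]]+\\)"
  hits <- regmatches(body, gregexpr(pattern, body, perl = TRUE))[[1]]
  sub("^\\[[^]]*\\]\\((.*)\\)$", "\\1", hits, perl = TRUE)
}
```

Targets are returned exactly as written (anchors and '../' included); resolving them against the bundle is a separate step.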


Materialize an OKF bundle from a directory, git URL, or tar/zip archive.

Description

Local directories are used in place. Git URLs (GitHub, GitLab, or Bitbucket URLs, URLs ending in '.git', or 'git@' SSH addresses) are shallow-cloned. Tar/zip archives (a local path or an 'http(s)' URL) are downloaded if remote, then extracted. The caller MUST invoke the returned 'cleanup()' when done to remove any temporary files.

Usage

okf_fetch(source, subdir = NULL, branch = NULL)

Arguments

source

A directory path, git URL, or tar/zip path/URL.

subdir

Optional bundle path within the cloned/extracted tree.

branch

Optional git branch or tag (git sources only).

Value

A list with 'dir' (the resolved bundle directory), 'source_kind' ('"dir"'/'"git"'/'"tar"'/'"zip"'), and 'cleanup' (a function).
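The source-kind rules listed above can be sketched as an ordered set of pattern tests. 'classify_source()' is hypothetical, and the exact patterns okf uses are assumptions. The sketch checks archive extensions before git hosts, so a '.zip' download hosted on GitHub is treated as an archive.

```r
# Hypothetical sketch of source-kind detection (not okf's source).
classify_source <- function(source) {
  if (dir.exists(source)) return("dir")
  s <- tolower(source)
  # Archives first, so a hosted ".zip" download is not mistaken for a repo.
  if (grepl("\\.zip$", s)) return("zip")
  if (grepl("\\.(tar|tar\\.gz|tgz|tar\\.bz2|tar\\.xz)$", s)) return("tar")
  if (startsWith(s, "git@") || grepl("\\.git/?$", s) ||
      grepl("^https?://(www\\.)?(github\\.com|gitlab\\.com|bitbucket\\.org)/", s))
    return("git")
  stop("unrecognised bundle source: ", source)
}
```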


Render the concept graph as one self-contained interactive HTML page.

Description

A force-directed graph drawn on a '<canvas>' with hand-rolled vanilla JS (no CDN, no framework) – pan, zoom, drag, type-to-search, nodes coloured by community ([okf_clusters()]). Clicking a node navigates to its rendered '.html' (relative), so dropping 'graph.html' into an [okf_html()] site root makes the graph a live map of the site. Fully offline; embeds the node/edge model as JSON.

Usage

okf_graph_html(con, out, site_title = NULL)

Arguments

con

An open DuckDB connection to an okf catalog.

out

Output '.html' file path.

site_title

Optional page title; defaults to the bundle directory name.

Value

The output path (invisibly).


Export the concept graph as portable JSON (nodes and edges).

Description

Returns a JSON object with 'nodes' and 'edges'. Nodes carry 'id' (path), 'type', 'title', 'tags', 'cluster' (from [okf_clusters()]), and 'href' (the rendered '.html' path). Edges are resolved links with 'source' and 'target' fields. Feeds any external graph visualizer – the same "core is a contract" idea as the DuckDB catalog.

Usage

okf_graph_json(con, pretty = TRUE)

Arguments

con

An open DuckDB connection to an okf catalog.

pretty

Pretty-print the JSON.

Value

A JSON string, suitable for writing directly to a file.


Render the concept graph as a Mermaid 'graph LR' diagram.

Description

A text diagram for embedding directly in markdown (READMEs, docs, GitHub renders it natively) – the lightweight complement to the interactive [okf_graph_html()]. Node ids are sanitized; labels are the concept titles.

Usage

okf_graph_mermaid(con)

Arguments

con

An open DuckDB connection to an okf catalog.

Value

A Mermaid diagram as a single string (wrapped in a '```mermaid' fenced block).
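A minimal base-R sketch of the output shape: a 'graph LR' header, one labelled node per concept, and one arrow per resolved link, wrapped in a mermaid fence. 'to_mermaid()' is hypothetical. The id-sanitising rule (every non-alphanumeric character becomes '_') is an assumption; note that it can make two distinct paths collide.

```r
# Hypothetical sketch (not okf's source): Mermaid "graph LR" text.
to_mermaid <- function(paths, titles, edges) {
  id <- function(p) gsub("[^A-Za-z0-9]", "_", p)  # assumed sanitising rule
  # A double quote would end a Mermaid label, so swap it for a single quote.
  label <- gsub('"', "'", titles, fixed = TRUE)
  lines <- c("graph LR",
             sprintf('  %s["%s"]', id(paths), label),
             sprintf("  %s --> %s", id(edges$src), id(edges$dst)))
  paste(c("```mermaid", lines, "```"), collapse = "\n")
}
```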


Render an ingested OKF catalog to HTML for viewing.

Description

Two modes. As a navigable **site** ('single = FALSE', the default), writes one self-contained '.html' per concept under 'out/' (mirroring the bundle's directory tree) plus an 'index.html' landing page; internal '.md' links are rewritten to '.html'. As a **single file** ('single = TRUE'), writes one self-contained '.html' at path 'out', with each concept an anchored '<section>' and intra-bundle links rewritten to in-page anchors. No JavaScript; CSS is inlined so the output is portable. Reserved concepts ('index.md', 'log.md') are rendered too. Bodies are rendered with the commonmark package; broken/orphan links are surfaced in a per-page footer badge from the validation findings.

Usage

okf_html(con, out, single = FALSE, site_title = NULL)

Arguments

con

An open DuckDB connection to an okf catalog (from [okf_ingest()]).

out

Output directory (site mode) or output '.html' file path (single).

single

Emit one self-contained file instead of a per-concept site.

site_title

Optional title for the landing page / single-file header; defaults to the bundle directory name.

Value

A list with 'files' (paths written), 'n_concepts', and 'mode' (invisibly).
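In site mode, internal '.md' links become '.html' links. Applied to raw link targets, that rewrite can be sketched as follows. 'rewrite_md_links()' is hypothetical: okf applies the rewrite to rendered output. Leaving any target with a URL scheme (such as 'https:') untouched is an assumption.

```r
# Hypothetical sketch of site-mode link rewriting on raw link targets.
rewrite_md_links <- function(targets) {
  external <- grepl("^[A-Za-z][A-Za-z0-9+.-]*:", targets)  # has a URL scheme
  out <- targets
  # "x.md" -> "x.html", keeping any "#anchor" suffix.
  out[!external] <- sub("\\.md(#.*)?$", ".html\\1", targets[!external])
  out
}
```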


Link-impact ("ripple") of a concept.

Description

Reports direct 'outbound' (concepts it links to), direct 'inbound' (concepts linking to it, i.e. backlinks), and 'transitive' – every concept that can reach it by following resolved links (what a change here could ripple to).

Usage

okf_impact(con, path)

Arguments

con

An open DuckDB connection to an okf catalog.

path

Bundle-relative concept path.

Value

A list with 'path', 'outbound', 'inbound', 'transitive' (all sorted character vectors).
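The three sets can be computed from a resolved-edge table. 'outbound' and 'inbound' are one-hop lookups, and 'transitive' is a reverse breadth-first walk over inbound edges until no new concept appears. 'link_impact()' and its 'src'/'dst' columns are hypothetical, and excluding the concept itself from 'transitive' when it lies on a cycle is an assumption.

```r
# Hypothetical sketch of the documented impact sets (not okf's source).
link_impact <- function(path, edges) {
  outbound <- sort(unique(edges$dst[edges$src == path]))
  inbound  <- sort(unique(edges$src[edges$dst == path]))
  # Transitive: everything that can reach `path` along resolved links.
  seen <- character(0)
  frontier <- path
  while (length(frontier) > 0L) {
    up <- unique(edges$src[edges$dst %in% frontier])
    frontier <- setdiff(up, c(seen, path))   # cycle-safe: never revisit
    seen <- c(seen, frontier)
  }
  list(path = path, outbound = outbound, inbound = inbound,
       transitive = sort(seen))
}
```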


Ingest an OKF bundle into a DuckDB catalog.

Description

Reads, validates, and loads the bundle into the 'okf_bundle', 'okf_concept', 'okf_link', and 'okf_validation' tables of a (file or in-memory) DuckDB database.

Usage

okf_ingest(
  root,
  db_path = ":memory:",
  ingested_at = NULL,
  bundle_id = NULL,
  source_kind = "dir",
  subdir = NULL,
  branch = NULL,
  incremental = FALSE
)

Arguments

root

A bundle directory path, a git URL, a tar/zip path or URL, or a bundle list from [okf_read()]. Non-directory sources are fetched via [okf_fetch()] and cleaned up afterwards.

db_path

DuckDB path; defaults to in-memory '":memory:"'.

ingested_at

Optional ISO-8601 timestamp; defaults to the current time.

bundle_id

Optional stable bundle id.

source_kind

How the bundle was obtained (e.g. '"dir"'); auto-set for fetched sources.

subdir

Optional bundle path within a cloned/extracted source.

branch

Optional git branch or tag (git sources only).

incremental

Only rewrite concepts whose 'content_hash' changed since a prior ingest of the same bundle into 'db_path' (added and removed concepts are handled too); links and validation are always recomputed because they are graph-global. Falls back to a full load if the bundle is not already present. The 'summary' then includes 'changed'/'added'/'removed'/'cached' counts.

Value

A list with the open 'con', the 'bundle_id', and a 'summary' (counts, conformance, link totals). The caller owns/closes 'con'.


Build the concept graph (resolved and broken links) for a bundle.

Description

Build the concept graph (resolved and broken links) for a bundle.

Usage

okf_links(rd)

Arguments

rd

A bundle as returned by [okf_read()].

Value

A data.frame with 'src_path', 'dst_raw', 'dst_path', 'resolved'.


Build an embedder backed by a local Ollama embeddings model.

Description

An embedder is a function of 'texts' returning a list of numeric vectors. Swap in any such function (e.g. an OpenAI client) for [okf_embed()] / [okf_rag()].

Usage

okf_ollama_embedder(
  model = "nomic-embed-text",
  url = Sys.getenv("OLLAMA_URL", "http://localhost:11434")
)

Arguments

model

Ollama embedding model name.

url

Ollama base URL (defaults to the 'OLLAMA_URL' env var or localhost).

Value

A function 'texts -> list(numeric)'. Requires the httr2 package.
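Because an embedder is just a function from 'texts' to a list of numeric vectors, any function with that contract can be swapped in. The hypothetical 'make_hash_embedder()' below is an offline stand-in, for example for exercising okf_embed()/okf_rag() in tests without an Ollama server. It is a hashed bag-of-words: deterministic, but its vectors carry no real semantics.

```r
# Hypothetical offline embedder honouring the `texts -> list(numeric)` contract.
make_hash_embedder <- function(dims = 64L) {
  function(texts) {
    lapply(texts, function(t) {
      v <- numeric(dims)
      words <- tolower(unlist(strsplit(t, "[^[:alnum:]]+")))
      words <- words[nzchar(words)]
      for (w in words) {
        # Bucket each word by the sum of its code points (deterministic).
        b <- sum(utf8ToInt(w)) %% dims + 1L
        v[b] <- v[b] + 1
      }
      v
    })
  }
}
```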


Parse the YAML frontmatter and body of a single OKF concept file.

Description

Parse the YAML frontmatter and body of a single OKF concept file.

Usage

okf_parse_file(path)

Arguments

path

Path to a markdown file.

Value

A list with 'meta' (parsed frontmatter, or 'NULL'), 'body', and 'err' ('NA' on success, else '"no_frontmatter"', '"unclosed_frontmatter"', or '"yaml_parse_error"').
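The frontmatter split behind these error codes can be sketched on a character vector of lines, such as the result of readLines(). 'split_frontmatter()' is hypothetical and returns the raw YAML text instead of parsed 'meta'. A real parser would then pass that text to a YAML parser (e.g. yaml::yaml.load()) and report '"yaml_parse_error"' on failure. Returning an empty body for unclosed frontmatter is an assumption.

```r
# Hypothetical sketch of the frontmatter split (not okf's source).
split_frontmatter <- function(lines) {
  if (length(lines) == 0L || trimws(lines[1]) != "---")
    return(list(yaml = NULL, body = paste(lines, collapse = "\n"),
                err = "no_frontmatter"))
  close <- which(trimws(lines[-1]) == "---")
  if (length(close) == 0L)
    return(list(yaml = NULL, body = "", err = "unclosed_frontmatter"))
  end <- close[1] + 1L                    # index of the closing "---"
  list(yaml = paste(lines[seq_len(end - 2L) + 1L], collapse = "\n"),
       body = paste(lines[-seq_len(end)], collapse = "\n"),
       err  = NA_character_)
}
```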


Query helpers over an ingested OKF catalog.

Description

Query helpers over an ingested OKF catalog.

Usage

okf_concepts(con)

okf_graph_df(con)

okf_findings(con)

okf_search(con, term)

Arguments

con

An open DuckDB connection to an okf catalog.

term

Search term for [okf_search()] (matched against concept bodies).

Value

A data.frame: concepts ([okf_concepts]), link edges ([okf_graph_df]), validation findings ([okf_findings]), or body matches ([okf_search]).


Semantic search over an embedded catalog.

Description

Embeds 'query' and returns the top-k most cosine-similar chunks (via DuckDB's native 'list_cosine_similarity'). Run [okf_embed()] first.

Usage

okf_rag(con, query, embedder = NULL, k = 5L)

Arguments

con

An open DuckDB connection to an embedded okf catalog.

query

Query string.

embedder

An embedder function; defaults to [okf_ollama_embedder()].

k

Number of results to return.

Value

A data.frame with 'path', 'title', 'chunk_id', 'score', 'text'.
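okf_rag() delegates scoring to DuckDB, but the ranking is easy to check in base R: cosine similarity between the query vector and each chunk vector, then the top 'k'. 'top_k_cosine()' is a hypothetical reference sketch for sanity-checking results, not the package's query. Breaking ties by original chunk order is an assumption.

```r
# Hypothetical base-R reference for cosine top-k ranking.
top_k_cosine <- function(query_vec, chunk_vecs, k = 5L) {
  cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
  scores <- unname(vapply(chunk_vecs, cosine, numeric(1), b = query_vec))
  ord <- order(-scores, seq_along(scores))  # best first; ties keep input order
  head(data.frame(chunk_id = ord, score = scores[ord]), k)
}
```

A zero-length (all-zero) vector yields NaN here, which order() places last.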


Read an OKF bundle from a directory into an in-memory representation.

Description

Read an OKF bundle from a directory into an in-memory representation.

Usage

okf_read(root, bundle_id = NULL, source_kind = "dir")

Arguments

root

Path to the bundle directory.

bundle_id

Optional stable id; defaults to a hash of the root path.

source_kind

How the bundle was obtained (e.g. '"dir"').

Value

A list with 'bundle_id', 'root', 'okf_version', 'source_kind', 'concepts' (parsed per-file records), and 'known' (all concept paths).


Resolve a markdown link target to a bundle-relative concept path.

Description

Resolve a markdown link target to a bundle-relative concept path.

Usage

okf_resolve_link(raw, src_rel, known)

Arguments

raw

Raw link target.

src_rel

Bundle-relative path of the linking concept.

known

Character vector of all known concept paths in the bundle.

Value

The resolved bundle-relative path, or 'NA' if it does not resolve.
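Resolution can be sketched in four steps: drop external URLs, strip any '#anchor' or '?query', normalise './' and '../' against the linking concept's directory, then accept the result only if it is a known concept. 'resolve_link()' is hypothetical. Treating a leading '/' as bundle-root-relative, and returning NA for paths that climb above the bundle root, are assumptions.

```r
# Hypothetical sketch of link resolution (not okf's source).
resolve_link <- function(raw, src_rel, known) {
  if (grepl("^[A-Za-z][A-Za-z0-9+.-]*:", raw)) return(NA_character_)  # URL
  target <- sub("[#?].*$", "", raw)            # strip anchor / query
  if (!nzchar(target)) return(NA_character_)   # pure in-page anchor
  parts <- if (startsWith(target, "/")) character(0) else
    head(strsplit(src_rel, "/", fixed = TRUE)[[1]], -1L)  # source directory
  for (seg in strsplit(target, "/", fixed = TRUE)[[1]]) {
    if (seg == "" || seg == ".") next
    if (seg == "..") {
      if (length(parts) == 0L) return(NA_character_)  # escapes the bundle
      parts <- head(parts, -1L)
    } else {
      parts <- c(parts, seg)
    }
  }
  path <- paste(parts, collapse = "/")
  if (path %in% known) path else NA_character_
}
```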


Validate a bundle against the OKF v0.1 conformance rules (permissively).

Description

Hard rules (severity 'error'): parseable frontmatter, non-empty 'type'. Soft findings (severity 'warn'): missing recommended fields, non-ISO timestamps, broken links. Never rejects the bundle – returns findings.

Usage

okf_validate(rd)

Arguments

rd

A bundle as returned by [okf_read()].

Value

A data.frame with 'path', 'severity', 'rule', 'message'.
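The permissive model ("never reject, return findings") is sketched below for one hard rule and two soft ones, using the documented 'path'/'severity'/'rule'/'message' shape. 'validate_concepts()', the rule names, the 'title'/'updated' field names, and the ISO-8601 pattern are all hypothetical. okf's rule set is larger.

```r
# Hypothetical sketch of permissive validation (not okf's rule set).
validate_concepts <- function(meta_list) {
  iso <- "^\\d{4}-\\d{2}-\\d{2}([T ]\\d{2}:\\d{2}(:\\d{2}(\\.\\d+)?)?(Z|[+-]\\d{2}:?\\d{2})?)?$"
  findings <- data.frame(path = character(0), severity = character(0),
                         rule = character(0), message = character(0))
  add <- function(df, path, severity, rule, message)
    rbind(df, data.frame(path = path, severity = severity,
                         rule = rule, message = message))
  for (path in names(meta_list)) {
    m <- meta_list[[path]]
    type <- m[["type"]]                   # [[ ]] avoids $'s partial matching
    if (!is.character(type) || length(type) != 1L || !nzchar(trimws(type)))
      findings <- add(findings, path, "error", "missing_type",
                      "frontmatter has no non-empty 'type'")
    if (is.null(m[["title"]]))
      findings <- add(findings, path, "warn", "missing_title",
                      "recommended field 'title' is absent")
    ts <- m[["updated"]]
    if (!is.null(ts) && !all(grepl(iso, as.character(ts), perl = TRUE)))
      findings <- add(findings, path, "warn", "non_iso_timestamp",
                      "'updated' is not an ISO-8601 timestamp")
  }
  findings
}
```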