---
title: "Introduction to rbiodatacr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to rbiodatacr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  message  = FALSE,
  warning  = FALSE
)
```

## Overview

`rbiodatacr` is an R client for querying
[BIODATACR](https://biodiversidad.go.cr), the national biodiversity
information platform of Costa Rica managed by the Technical Office of
CONAGEBIO. The platform is built on the
[Atlas of Living Australia (ALA)](https://www.ala.org.au/) API
infrastructure.

```{r setup}
library(rbiodatacr)
library(dplyr)
library(sf)
library(ggplot2)
```

---

## 1. Taxonomic search

Before downloading occurrence records, use `bdcr_species_search()` to
verify that the species name is recognized by BIODATACR and to retrieve
its taxonomic identifier (GUID).

```{r species-search}
bdcr_species_search("Panthera onca")
```

The function may return more than one row when both the species and
subspecies are registered. The `guid` column contains the unique
identifier for each taxonomic concept — useful for precise queries.

---

## 2. Counting records

Use `bdcr_count()` to check how many occurrence records are available
before downloading.

```{r count-single}
bdcr_count("Panthera onca")
```

For multiple species at once use `bdcr_count_batch()`, which returns a
tidy tibble with one row per species.

```{r count-batch}
species <- c(
  "Tapirus bairdii",
  "Panthera onca",
  "Ara ambiguus",
  "Bradypus variegatus"
)

conteos <- bdcr_count_batch(species)
conteos
```

---

## 3. Downloading occurrence records

`bdcr_occurrences()` downloads records for a single species and returns
a tibble with 15 fields relevant for biodiversity analysis.

```{r occurrences-single}
df_jaguar <- bdcr_occurrences("Panthera onca", rows = 100)
glimpse(df_jaguar)
```

For multiple species use `bdcr_occurrences_batch()`, which returns a
named list of tibbles — one per species.

```{r occurrences-batch}
spp_with_data <- filter(conteos, n_records >= 10)

lista_occ <- bdcr_occurrences_batch(
  taxa = spp_with_data$taxon,
  rows = 100
)

# Number of records per species
purrr::map_int(lista_occ, nrow)
```

---

## 4. Quality control

`bdcr_quality_check()` adds a `quality_flag` column to the occurrences
tibble. Possible flags are:

| Flag | Condition |
|---|---|
| `"ok"` | No issues detected |
| `"no_coords"` | Missing coordinates |
| `"geospatial_issue"` | `geospatialKosher == FALSE` |
| `"taxonomic_issue"` | `taxonomicKosher == FALSE` |
| `"old_record"` | Year before minimum threshold (default 1950) |

```{r quality-check}
df_qc <- bdcr_quality_check(df_jaguar)

count(df_qc, quality_flag, sort = TRUE)
```

Keep only clean records:

```{r quality-filter}
df_clean <- filter(df_qc, quality_flag == "ok",
                         !is.na(decimalLatitude),
                         !is.na(decimalLongitude))
nrow(df_clean)
```

---

## 5. Mapping occurrence records

Convert the clean tibble to an `sf` object and plot the records over
Costa Rica.

```{r map, fig.width = 7, fig.height = 6}
# Convert to sf
df_sf <- st_as_sf(
  df_clean,
  coords = c("decimalLongitude", "decimalLatitude"),
  crs    = 4326
)

# Load Costa Rica national boundary included in rbiodatacr
# Source: GADM (gadm.org), level 0 = country boundary
data(cr_outline)

# Map
ggplot() +
  geom_sf(data = cr_outline, fill = "gray95", color = "gray50") +
  geom_sf(data = df_sf, color = "#E63946", size = 2, alpha = 0.7) +
  labs(
    title    = "Panthera onca — BIODATACR occurrence records",
    subtitle = paste0(nrow(df_sf), " clean records"),
    caption  = "Source: BIODATACR (biodiversidad.go.cr)",
    x = "Longitude",
    y = "Latitude"
  ) +
  theme_minimal()
```

---

## 6. Complete workflow

```{r workflow}
# 1. Check availability
species <- c("Tapirus bairdii", "Panthera onca",
             "Ara ambiguus",    "Bradypus variegatus")

conteos <- bdcr_count_batch(species)

# 2. Download species with enough data
con_datos <- filter(conteos, n_records >= 10)

lista_occ <- bdcr_occurrences_batch(
  taxa = con_datos$taxon,
  rows = 200
)

# 3. Quality control
lista_limpia <- purrr::map(lista_occ, bdcr_quality_check)

# 4. Consolidate and filter
df_final <- bind_rows(lista_limpia, .id = "taxon") |>
  filter(quality_flag == "ok",
         !is.na(decimalLatitude),
         !is.na(decimalLongitude))

# 5. Summary
df_final |>
  count(taxon, sort = TRUE) |>
  rename(clean_records = n)
```