---
title: "Introduction to irpfR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to irpfR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `irpfR` package provides a high-level interface for downloading and cleaning the Personal Income Tax (IRPF) open data published by the Brazilian Federal Revenue (Receita Federal do Brasil, RFB).
This vignette demonstrates the typical workflow: discovering available datasets, understanding their attributes through built-in metadata, and downloading cleaned data for analysis.

## 1. Discovering Available Data

The Brazilian Federal Revenue publishes its data in several "sections" (e.g., assets, debts, income brackets). To list all sections currently supported by the package, use `get_sections()`:

```{r}
library(irpfR)

# List available datasets
sections <- get_sections()
head(sections)
```

## 2. Inspecting Metadata

Government CSV files often have cryptic column names and complex tax definitions. To understand the contents of a section before downloading it, use `get_metadata()`:

```{r}
# Get descriptions for the "Assets and Rights" (Bens e Direitos) section
metadata <- get_metadata("bens_e_direitos")
head(metadata)
```

## 3. Downloading and Cleaning Data

The core function of the package is `get_irpf()`. It automates several steps:

* **Download**: Connects to the RFB servers and handles the file transfer.
* **Encoding**: Fixes character-encoding problems (UTF-8 vs. Latin-1) common in Brazilian government files.
* **Tidying**: Converts "wide" tables into a "long" (tidy) format, making them ready for `ggplot2` or `dplyr`.
* **Smart Scaling**: Financial values, published in millions of BRL, are converted to absolute BRL, while counts (such as the number of taxpayers) remain raw integers.

```{r}
# Download data for "Assets and Rights"
df_bens <- get_irpf("bens_e_direitos")

# The resulting data is tidy
# Columns: ano_calendario, atributo, valor
head(df_bens)
```
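The tidying and scaling steps described above happen inside `get_irpf()`. To make the transformation concrete, the following base-R sketch applies the same idea to a small made-up table. It is not the package's implementation: the attribute names (`valor_total`, `qtd_declarantes`), the values, and the rule deciding which attributes are monetary are all assumptions chosen for illustration.

```{r}
# Illustrative sketch only: a made-up "wide" table with one row per
# attribute and one column per calendar year. Attribute names and values
# are hypothetical, not taken from RFB files.
wide <- data.frame(
  atributo = c("valor_total", "qtd_declarantes"),
  `2021` = c(1500.5, 320000),
  `2022` = c(1620.25, 335000),
  check.names = FALSE
)

year_cols <- setdiff(names(wide), "atributo")

# Wide -> long: stack the year columns into (ano_calendario, atributo, valor)
long <- data.frame(
  ano_calendario = as.integer(rep(year_cols, each = nrow(wide))),
  atributo = rep(wide$atributo, times = length(year_cols)),
  valor = unlist(wide[year_cols], use.names = FALSE)
)

# Hypothetical scaling rule: monetary attributes are published in
# millions of BRL and are multiplied by 1e6; counts are left unchanged
monetary <- long$atributo %in% c("valor_total")
long$valor[monetary] <- long$valor[monetary] * 1e6

long
```

In everyday work, `tidyr::pivot_longer()` is a common choice for the reshaping step; base R is used here only to keep the sketch dependency-free.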

## Why use irpfR?
Reading raw files directly from the government portal can be challenging because of inconsistent decimal marks (commas instead of points), non-standard missing-value markers (such as `-` or `*`), and varying numerical scales. `irpfR` encapsulates all these rules, so researchers can focus on the economic analysis rather than on data cleaning.
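For readers curious about what these rules involve, here is a minimal base-R sketch of the decimal-mark and missing-value handling. `parse_br_number()` is a hypothetical helper written for this illustration, not a function exported by `irpfR`, and it assumes the raw values carry no thousands separators.

```{r}
# Hypothetical helper (not part of irpfR): treat "-" and "*" as missing,
# then turn a decimal comma into a decimal point before converting.
# Assumes values contain no thousands separators.
parse_br_number <- function(x, na_markers = c("-", "*")) {
  x <- trimws(x)
  x[x %in% na_markers] <- NA
  x <- sub(",", ".", x, fixed = TRUE)  # "1234,56" -> "1234.56"
  as.numeric(x)
}

# Made-up raw strings mixing comma and point decimal marks
raw <- c("1234,56", "-", "*", "78.9")
parse_br_number(raw)
```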

