Type: Package
Title: Explorer of Indonesian Population Pyramids from Harmonized and Non-Harmonized Census Data
Version: 1.0.2
Date: 2025-10-09
Description: Provides harmonized and non-harmonized population pyramid datasets from the Indonesian population censuses (1971–2020), along with tools for visualization and an interactive 'shiny'-based explorer application. Data are processed from IPUMS International (1971–2010) and the Population Census 2020 (BPS Indonesia).
License: GPL-3
Depends: R (>= 4.1)
Imports: shiny (>= 0.13.0), shinythemes, shinyWidgets, shinyjs, dplyr, tidyr, DT, ggplot2, ggthemes, scales, networkD3
Suggests: tibble
Encoding: UTF-8
RoxygenNote: 7.3.2
URL: https://github.com/aripurwantosp/censuspyrID
BugReports: https://github.com/aripurwantosp/censuspyrID/issues
NeedsCompilation: no
Packaged: 2025-10-09 06:19:49 UTC; ari_prasojo2
Author: Ari Purwanto Sarwo Prasojo [aut, cre], Puguh Prasetyoputra [aut], Nur Fitri Mustika Ayu [aut]
Maintainer: Ari Purwanto Sarwo Prasojo <ari.prasojo18@gmail.com>
Repository: CRAN
Date/Publication: 2025-10-15 19:20:14 UTC

censuspyrID: Explorer of Indonesian Population Pyramids from Harmonized and Non-Harmonized Census Data

Description

Provides harmonized and non-harmonized population pyramid datasets from the Indonesian population censuses (1971–2020), along with tools for visualization and an interactive 'shiny'-based explorer application. Data are processed from IPUMS International (1971–2010) and the Population Census 2020 (BPS Indonesia).

Author(s)

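Maintainer: Ari Purwanto Sarwo Prasojo <ari.prasojo18@gmail.com>

Authors:

  • Puguh Prasetyoputra

  • Nur Fitri Mustika Ayu

See Also

Useful links:

  • https://github.com/aripurwantosp/censuspyrID

  • Report bugs at https://github.com/aripurwantosp/censuspyrID/issues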


Build Age-Profile Plot by Sex

Description

Create a line plot of population age profiles (5-year age groups) for a given province and year, with optional logarithmic scale. The plot is faceted by sex.

Usage

ageprof(data, log_scale = FALSE, color = "Fresh and bright")

Arguments

data

A data frame of population data for a specific province and year, containing at least the variables: pop (population count), sex (coded as 1 = male, 2 = female), age5 (5-year age groups).

log_scale

Logical; whether to use a logarithmic scale for the Y-axis. Default is FALSE.

color

Character; the name of a Canva color palette available in ggthemes::canva_palettes. Default is "Fresh and bright".

Details

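The function produces an age-profile line chart of population by 5-year age group, with one panel per sex. When log_scale = TRUE, the Y-axis uses a logarithmic scale.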

Value

A ggplot2 object representing the age-profile plot, faceted by sex.

See Also

pyr_single(), load_pop_data(), pop_data_by_reg(), pop_data_by_year(), get_code_label()

Examples

## Not run: 
# Example: age profile for Indonesia, 2020
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) # Indonesia
ageprof(data_idn)

# Example with log scale
ageprof(data_idn, log_scale = TRUE)

## End(Not run)


Build Area Plot of Population Trends by Broad Age Group

Description

This function builds an area plot showing the proportion of population distributed across three broad age groups (young, working-age, old) over census years. The plot can be displayed separately by sex or combined.

Usage

area_trends(data, sex = 1, color = "Fresh and bright")

Arguments

data

A data frame containing population trends data for a specific region over years. Must include variables year, sex, age5, and pop.

sex

Integer indicating which sex to include in the plot:

  • 1 = All sexes

  • 2 = Male

  • 3 = Female

  • 4 = Male+Female

Default is 1 (all sexes).

color

Character string specifying the palette name from ggthemes::canva_palettes. Default is "Fresh and bright".

Details

The function aggregates population into three broad age groups: young, working-age, and old.

It then calculates the proportion of each age group within each sex and year. The result is plotted as a stacked area chart, optionally faceted by sex.
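The aggregation and proportion steps can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes the conventional cut points 0-14, 15-64, and 65+ (the manual does not state them), uses a hypothetical helper column age_lo for the lower bound of each 5-year group, and runs on made-up counts.

```r
# Toy data: one sex, two census years, made-up counts (not census figures).
# `age_lo` (lower bound of the 5-year group) is a hypothetical helper column.
toy <- data.frame(
  year   = rep(c(1971, 2020), each = 4),
  sex    = 1,
  age_lo = rep(c(0, 10, 30, 70), times = 2),
  pop    = c(40, 30, 25, 5, 20, 20, 50, 10)
)

# Assumed cut points: 0-14 (young), 15-64 (working-age), 65+ (old).
toy$group <- cut(toy$age_lo, breaks = c(-Inf, 14, 64, Inf),
                 labels = c("young", "working-age", "old"))

# Sum population within year x sex x group, then convert to proportions
# within each year x sex cell.
agg <- aggregate(pop ~ year + sex + group, data = toy, FUN = sum)
agg$prop <- agg$pop / ave(agg$pop, agg$year, agg$sex, FUN = sum)
agg[order(agg$year, agg$group), ]
```

Within each year and sex the proportions sum to 1, which is what lets a stacked area chart fill the full height of the panel.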

Value

A ggplot2 object showing the population area trends.

See Also

pyr_trends(), load_pop_data(), pop_data_by_reg(), get_code_label()

Examples

## Not run: 
# Example: area trends for Indonesia
data_idn <- load_pop_data(harmonized = TRUE, smoothing = 1) |>
  pop_data_by_reg(0) #Indonesia
area_trends(data_idn, sex = 1) #All sexes
area_trends(data_idn, sex = 2) #Male
area_trends(data_idn, sex = 3) #Female
area_trends(data_idn, sex = 4) #Male+Female

## End(Not run)


Explore Harmonized and Non-Harmonized Population Pyramids from Indonesia’s Censuses (1971–2020)

Description

Launches censuspyrID Explorer, a Shiny application for visualizing harmonized and non-harmonized population pyramids from Indonesia’s population censuses (1971–2020).

Usage

censuspyrID_explorer(host = NULL, ...)

Arguments

host

Character string passed to shiny::runApp() as the host. The signature default is NULL; the documented default host is "0.0.0.0".

...

Additional arguments passed to runApp.

Details

The application provides interactive tools to explore demographic structures across provinces and census years. See the Help menu within the application for a navigation guide.

Value

The function launches the Shiny application. It does not return a value.

Examples

## Not run: 
censuspyrID_explorer()

## End(Not run)


Prepare Population Data for Tabular Display

Description

Prepares population data for tabular display (e.g., in reports or Shiny apps). The function reshapes the data by sex, adds total population, and computes the sex ratio, while also attaching province names and labels.

Usage

data_for_table(data, reg_code, harmonized = TRUE)

Arguments

data

A data frame containing population data for a specific province and year. Must include columns: year, province_id, sex, age5, and pop.

reg_code

Integer or character. Province code used to retrieve the province name.

harmonized

Logical. If TRUE (default), province codes are treated as harmonized; if FALSE, non-harmonized codes are used.

Details

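The function performs the following steps:

  • Reshapes the data to wide format by sex.

  • Adds the total population.

  • Computes the sex ratio.

  • Attaches the province name and labels.

Value

A data frame in wide format containing population by sex, the total population, the sex ratio, and the province name and labels.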
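The reshaping described for this function can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the output column names (pop_male, pop_female, pop_total, sex_ratio) are hypothetical, the sex ratio is taken as males per 100 females (the usual demographic convention, not confirmed by this manual), and the counts are made up.

```r
# Toy long-format data for one province-year; counts are made up.
long <- data.frame(
  age5 = rep(1:3, times = 2),
  sex  = rep(c(1, 2), each = 3),   # 1 = male, 2 = female
  pop  = c(105, 100, 95, 100, 98, 97)
)

# Spread sex into columns (one row per age group).
male   <- long[long$sex == 1, c("age5", "pop")]
female <- long[long$sex == 2, c("age5", "pop")]
wide <- merge(male, female, by = "age5", suffixes = c("_male", "_female"))

# Add the total and the sex ratio (assumed: males per 100 females).
wide$pop_total <- wide$pop_male + wide$pop_female
wide$sex_ratio <- 100 * wide$pop_male / wide$pop_female
wide
```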

See Also

load_pop_data(), pop_data_by_year(), get_code_label()

Examples

## Not run: 
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) #Indonesia
tab <- data_for_table(data_idn, reg_code = 0, harmonized = TRUE)
head(tab)

## End(Not run)


Retrieve Reference Codes and Labels

Description

This function returns reference tables for codes and labels used in the package. It can provide mappings for census years, sex, age groups, and province codes (harmonized or non-harmonized).

Usage

get_code_label(what = 4)

Arguments

what

Integer indicating which reference table to return (default is 4):

  • 1 = Census year and label

  • 2 = Sex code and label

  • 3 = Age (5-year group) code and label

  • 4 = Harmonized province code and label

  • 5 = Non-harmonized province code and label

Details

The function retrieves data from the internal reference object ref_label, which stores standardized coding schemes and their associated labels.

Value

A data frame (or tibble) containing codes and labels for the selected reference category.

Examples

# Get harmonized province codes and labels
get_code_label(4)

# Get sex codes and labels
get_code_label(2)


Check Province Expansion Status

Description

This function checks whether a given province code corresponds to a province that has been expanded (i.e., administratively split or modified).

Usage

is_expanded(reg_code)

Arguments

reg_code

Non-harmonized province code (character or numeric).

Details

The function looks up the internal dataset prov_coverage. Expansion status is determined by the field expanded.

Value

A logical value: TRUE if the province has been expanded, FALSE otherwise.
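The lookup can be pictured as a match against a coverage table. The sketch below is hypothetical throughout: the table, its codes, its flags, and the function name is_expanded_sketch are stand-ins for illustration and do not describe real provinces or the internal prov_coverage dataset.

```r
# Hypothetical stand-in for the internal coverage table; codes and flags are
# illustrative only and do not describe real provinces.
coverage <- data.frame(
  province_id = c(1001, 1002, 1003),
  expanded    = c(TRUE, FALSE, TRUE)
)

# Return TRUE when the code is flagged as expanded, FALSE when it is not,
# and NA when the code is not in the table.
is_expanded_sketch <- function(reg_code, table = coverage) {
  hit <- table$expanded[match(as.character(reg_code),
                              as.character(table$province_id))]
  unname(hit)
}

is_expanded_sketch(1001)
is_expanded_sketch("1002")
```

Comparing codes as character strings is one way to accept both numeric and character input, as the reg_code argument allows.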

See Also

get_code_label()

Examples

# Example: check expansion status of a province
get_code_label(5)   # returns the table of non-harmonized province codes
is_expanded(1400)   # returns TRUE/FALSE for Riau province


Load Population Data

Description

Load census population data with options for harmonization and smoothing. Returns population counts by year, province, sex, and five-year age group, with raw or smoothed estimates depending on the selected method.

Usage

load_pop_data(harmonized = TRUE, smoothing = 1)

Arguments

harmonized

Logical. If TRUE (default), load harmonized data (hpop5). If FALSE, load non-harmonized data (ypop5).

smoothing

Integer. Smoothing method applied to population counts:

  • 1: none (raw)

  • 2: Arriaga

  • 3: Karup–King–Newton (KKN)

Details

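Data are retrieved from the internal census datasets:

  • hpop5: harmonized data (harmonized = TRUE)

  • ypop5: non-harmonized data (harmonized = FALSE)

The smoothing argument selects which population counts are returned:

  • 1: unsmoothed counts (ns)

  • 2: counts smoothed with the Arriaga method (arriaga)

  • 3: counts smoothed with the Karup–King–Newton method (kkn)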

Value

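A tibble that includes at least the columns year, province_id, sex, age5, and pop (the population count under the selected smoothing method).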
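One way to picture the smoothing option is as a choice among the three count columns of the bundled datasets (ns, arriaga, kkn). The base-R sketch below is an assumption about the general shape of that step, not the package's code: pick_counts is a hypothetical helper, the renaming of province_id_h to province_id is assumed, and the numbers are made up.

```r
# Toy rows shaped like the bundled datasets (columns ns, arriaga, kkn);
# the numbers are made up.
toy <- data.frame(
  year = 2020, province_id_h = 0, sex = 1, age5 = 1:3,
  ns = c(100, 90, 95), arriaga = c(98, 94, 93), kkn = c(99, 93, 93)
)

# Map the integer code to a count column, then expose it as `pop`.
pick_counts <- function(data, smoothing = 1) {
  col <- c("ns", "arriaga", "kkn")[smoothing]
  out <- data[, setdiff(names(data), c("ns", "arriaga", "kkn")), drop = FALSE]
  names(out)[names(out) == "province_id_h"] <- "province_id"  # assumed rename
  out$pop <- data[[col]]
  out
}

pick_counts(toy, smoothing = 2)
```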

See Also

pop_data_by_year(), pop_data_by_reg(), pop5

Examples

## Not run: 
# Load harmonized, raw (unsmoothed) population data
load_pop_data(harmonized = TRUE, smoothing = 1)

# Load non-harmonized, Arriaga-smoothed population data
load_pop_data(harmonized = FALSE, smoothing = 2)

## End(Not run)


Population Counts in 5-Year Age Groups from Indonesian Censuses

Description

Population counts in 5-year age groups at the provincial level (subnational level 1), derived from a series of Indonesian population censuses. Data are available in two versions:

  • hpop5: harmonized data

  • ypop5: non-harmonized data

Both datasets are processed from census samples provided by IPUMS International (1971–2010) and the Population Census 2020. Data processing steps include prorating to allocate missing attributes and smoothing using multiple demographic methods (Arriaga and Karup–King–Newton).
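As an illustration of prorating in general, the sketch below distributes records with an unknown attribute across the known categories in proportion to the known counts. Whether the datasets were prorated in exactly this way is not stated here, so treat it as a generic example on made-up numbers.

```r
# Made-up counts: three known age groups plus records with unknown age.
known   <- c("0-4" = 50, "5-9" = 30, "10-14" = 20)
unknown <- 10

# Allocate the unknown count in proportion to the known distribution.
prorated <- known + unknown * known / sum(known)
prorated
sum(prorated)  # equals sum(known) + unknown
```

The total is preserved, and each group's share of the population is unchanged by the allocation.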

Format

Each dataset is a tibble (data frame) with the following variables:

year

Census year.

province_id_h

Harmonized province identifier (in hpop5).

province_id_y

Non-harmonized province identifier (in ypop5).

sex

Sex code.

age5

Age group in 5-year intervals.

ns

Unsmoothed population count.

arriaga

Population count smoothed with the Arriaga method.

kkn

Population count smoothed with the Karup–King–Newton method.

Source

Ruggles, S., Cleveland, L., Lovaton, R., Sarkar, S., Sobek, M., Burk, D., Ehrlich, D., Heimann, Q., Lee, J., & Merrill, N. (2025). Integrated Public Use Microdata Series, International: Version 7.7 (dataset). Minneapolis, MN: IPUMS. doi:10.18128/D020.V7.7

Badan Pusat Statistik (BPS). (2020). Jumlah Penduduk Menurut Wilayah, Kelompok Umur, dan Jenis Kelamin, di INDONESIA – Sensus Penduduk 2020. Retrieved September 4, 2025, from https://sensus.bps.go.id/topik/tabular/sp2020/3

References

Siegel, J. S., Swanson, D. A., & Shryock, H. S. (Eds.). (2004). The methods and materials of demography (2nd ed.). Elsevier/Academic Press.

Aburto, J. M., Kashnitsky, I., Pascariu, M., & Riffe, T. (2022). Smoothing with DemoTools. Available at: https://timriffe.github.io/DemoTools/articles/smoothing_with_demotools.html#references-1

Examples

library(dplyr)

# Harmonized data
data(hpop5)
glimpse(hpop5)
head(hpop5)

# Non-harmonized data
data(ypop5)
glimpse(ypop5)
head(ypop5)


Filter Population Data by Province

Description

Filter population data based on a specified province ID. This function is intended for use with population datasets loaded via load_pop_data(), but can work with any data frame that includes a province_id column.

Usage

pop_data_by_reg(data, reg)

Arguments

data

A data frame or tibble containing population data. Must include a column named province_id.

reg

Integer or character. The province ID to filter by.

Value

A tibble (or data frame) containing only rows for the specified province.

See Also

load_pop_data(), pop_data_by_year()

Examples

# Load harmonized data
dat <- load_pop_data(harmonized = TRUE, smoothing = 1)

# Filter data for province ID 0 (Indonesia)
pop_data_by_reg(dat, reg = 0)

Filter Population Data by Year

Description

Filter population data for a specific census year. This function is intended for use with population datasets loaded via load_pop_data(), but can work with any data frame that contains a year column.

Usage

pop_data_by_year(data, yr)

Arguments

data

A data frame or tibble containing population data. Must include a column named year.

yr

Integer or numeric. The census year to filter by.

Value

A tibble (or data frame) containing only rows from the specified year.
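Both filters amount to row subsetting on a single column, so they can be chained in either order. The sketch below uses base R and hypothetical stand-in functions (by_year, by_reg) on made-up data; it does not call the package.

```r
# Toy data with the two required columns; values are made up.
toy <- data.frame(
  year        = c(2000, 2000, 2020, 2020),
  province_id = c(0, 11, 0, 11),
  pop         = c(10, 4, 12, 5)
)

# Hypothetical stand-ins written with base-R subsetting.
by_year <- function(data, yr)  data[data$year == yr, , drop = FALSE]
by_reg  <- function(data, reg) data[data$province_id == reg, , drop = FALSE]

# The two filters commute: either order selects the same single row.
a <- by_reg(by_year(toy, 2020), 0)
b <- by_year(by_reg(toy, 0), 2020)
a
b
```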

See Also

load_pop_data(), pop_data_by_reg()

Examples

# Load harmonized data first
dat <- load_pop_data(harmonized = TRUE, smoothing = 1)

# Filter for the 2000 census year
pop_data_by_year(dat, 2000)

Print Population Summary Statistics

Description

Generate and print a formatted summary of population counts, percentages, sex ratio, and dependency ratios from a given dataset of population data for a specific province and year.

Usage

pop_summary(data)

Arguments

data

A data frame of population data for a specific province and year, containing at least the variables: pop (population count), sex (coded as 1 = male, 2 = female), age5 (5-year age groups).

Details

The function calculates:

  • Population counts and percentages

  • The sex ratio

  • Dependency ratios

Results are printed directly to the console in a formatted table.

Value

This function does not return an object. It prints formatted summary statistics to the console.
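The indicators named above can be computed as follows. This sketch assumes the conventional definitions (sex ratio as males per 100 females; dependency ratios per 100 persons aged 15-64, with 0-14 as young and 65+ as old). The package's exact definitions and output layout are not confirmed by this manual, and the counts are made up.

```r
# Made-up counts by sex and broad age group (not census figures).
toy <- data.frame(
  sex   = rep(c(1, 2), each = 3),            # 1 = male, 2 = female
  group = rep(c("0-14", "15-64", "65+"), 2),
  pop   = c(26, 68, 6, 24, 66, 10)
)

# Sex ratio: males per 100 females.
by_sex <- tapply(toy$pop, toy$sex, sum)
sex_ratio <- 100 * by_sex[["1"]] / by_sex[["2"]]

# Dependency ratios per 100 persons of working age (15-64).
by_group <- tapply(toy$pop, toy$group, sum)
young_dr <- 100 * by_group[["0-14"]] / by_group[["15-64"]]
old_dr   <- 100 * by_group[["65+"]]  / by_group[["15-64"]]
total_dr <- young_dr + old_dr

round(c(sex_ratio = sex_ratio, young = young_dr, old = old_dr,
        total = total_dr), 1)
```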

See Also

load_pop_data(), pop_data_by_reg(), pop_data_by_year(), get_code_label()

Examples

## Not run: 
# Example: population summary for Indonesia, 2020
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) # Indonesia
pop_summary(data_idn)

## End(Not run)


Build a Single Population Pyramid

Description

Create a population pyramid for a given dataset (specific province and year), either in absolute counts or in proportions, with customizable color palettes.

Usage

pyr_single(data, use_prop = FALSE, color = "Fresh and bright")

Arguments

data

A data frame containing population data for a specific province and year. Must include variables sex, age5, and pop.

use_prop

Logical, default FALSE. If TRUE, the pyramid will be shown in proportions instead of absolute counts.

color

Character string naming the color palette for the pyramid, taken from ggthemes::canva_palettes. Default is "Fresh and bright".

Value

A ggplot2 object representing the population pyramid.
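The data preparation behind a proportional pyramid can be sketched without any plotting code. The example below makes two assumptions that this manual does not confirm: proportions are taken over the total population of both sexes, and male values are negated so that male bars extend to the left of zero. The column names prop and bar are hypothetical, and the counts are made up.

```r
# Made-up counts for one province-year.
toy <- data.frame(
  age5 = rep(1:3, times = 2),
  sex  = rep(c(1, 2), each = 3),   # 1 = male, 2 = female
  pop  = c(30, 20, 10, 25, 10, 5)
)

# Assumed denominator: total population of both sexes.
toy$prop <- toy$pop / sum(toy$pop)

# Common pyramid trick: negate the male values so the male bars extend left
# of zero and the female bars extend right.
toy$bar <- ifelse(toy$sex == 1, -toy$prop, toy$prop)
toy
```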

See Also

ageprof(), pyr_trends(), load_pop_data(), pop_data_by_reg(), pop_data_by_year(), get_code_label()

Examples

## Not run: 
# Example data for Indonesia, 2020
data_idn <- pop_data_by_year(load_pop_data(), 2020) |>
  pop_data_by_reg(0) #Indonesia

# Absolute count pyramid
pyr_single(data_idn)

# Proportional pyramid with different palette
pyr_single(data_idn, use_prop = TRUE, color = "Professional and modern")

## End(Not run)

Build Population Pyramid Trend Plots

Description

Create trend plots of population pyramids over multiple census years for a given region. Users can choose between a grid layout of pyramids or an overlay of age profiles across years.

Usage

pyr_trends(data, mode = 1, use_prop = FALSE, color = "Fresh and bright")

Arguments

data

A data frame of population data for a specific region across census years, containing at least: year, age5, sex, and pop.

mode

Integer; visualization mode: 1 for a grid of pyramids, 2 for overlaid age profiles. Default is 1.

use_prop

Logical; whether to show proportions instead of absolute counts. Default is FALSE.

color

Character; the name of a Canva color palette available in ggthemes::canva_palettes. Default is "Fresh and bright".

Details

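Two visualization modes are available:

  • mode = 1: a grid layout of population pyramids.

  • mode = 2: an overlay of age profiles across census years.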

Population counts can be displayed either as absolute numbers (default, in thousands) or as proportions (use_prop = TRUE).

Value

A ggplot2 object representing the population pyramid trend plot.

See Also

area_trends(), load_pop_data(), pop_data_by_reg(), get_code_label()

Examples

## Not run: 
# Example: pyramid trends for Indonesia
data_idn <- load_pop_data(harmonized = TRUE, smoothing = 1) |>
  pop_data_by_reg(0) #Indonesia
pyr_trends(data_idn, mode = 1)  # grid layout

# Overlay mode with proportions
pyr_trends(data_idn, mode = 2, use_prop = TRUE)

## End(Not run)


Create Region (Province) List

Description

Internal helper function to generate a list of provinces, depending on whether harmonized or non-harmonized coding is used.

Usage

reg_list(harmonized = TRUE)

Arguments

harmonized

Logical. If TRUE (default), use harmonized province codes. If FALSE, use non-harmonized province codes.

Details

Value

A named vector, where values are province IDs and names are corresponding province labels.


Get Province Name from Code

Description

Internal helper function to retrieve the province name corresponding to a given province code. Works with either harmonized or non-harmonized codes.

Usage

reg_name(code, harmonized = TRUE)

Arguments

code

Integer or character. The province code to look up.

harmonized

Logical. If TRUE (default), the function searches using harmonized province codes. If FALSE, it uses non-harmonized province codes.

Details

This function relies on the internal object ref_label, which must contain the reference tables for harmonized and non-harmonized province codes and labels.

Value

A character string with the corresponding province name.


Get Smoothing Method Name

Description

Internal helper function to return the name of the smoothing method based on the provided smoothing code.

Usage

smooth_name(smoothing = 1)

Arguments

smoothing

A numeric value indicating the smoothing method:

  • 1 = non-smoothed ("ns")

  • 2 = Arriaga ("arriaga")

  • 3 = Karup–King–Newton ("kkn")

Value

A character string with the name of the smoothing method.
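The code-to-name mapping listed above can be written as a small switch. The function name smooth_name_sketch and the error for unknown codes are illustrative assumptions; the package's own helper may behave differently for invalid input.

```r
# Hypothetical stand-in: translate the integer code into the short method name.
smooth_name_sketch <- function(smoothing = 1) {
  switch(as.character(smoothing),
         "1" = "ns",
         "2" = "arriaga",
         "3" = "kkn",
         stop("smoothing must be 1, 2, or 3"))
}

smooth_name_sketch(3)
```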


Get Census Year Coverage for a Province

Description

This function determines the range of census years available for a given province. Coverage depends on whether harmonized or non-harmonized codes are used, and in the case of non-harmonized data, whether the province has experienced administrative expansion (pemekaran).

Usage

year_range(reg_code = NULL, harmonized = TRUE, before_expand = TRUE)

Arguments

reg_code

Character or numeric. Province code. Required if harmonized = FALSE.

harmonized

Logical. If TRUE (default), returns harmonized coverage (1971–2020). If FALSE, uses non-harmonized coverage.

before_expand

Logical. Only relevant if harmonized = FALSE and the province has expanded. If TRUE (default), returns coverage before expansion; if FALSE, returns coverage after expansion.

Details

Value

An integer vector of census years, with labels as names.

See Also

is_expanded(), get_code_label()

Examples

## Not run: 
# Harmonized coverage (1971–2020)
year_range(harmonized = TRUE)

# Non-harmonized coverage for a province (before expansion)
get_code_label(5)   # returns the table of non-harmonized province codes
year_range(reg_code = 1400, harmonized = FALSE, before_expand = TRUE)

# Non-harmonized coverage for a province (after expansion)
year_range(reg_code = 1400, harmonized = FALSE, before_expand = FALSE)

## End(Not run)