| Type: | Package |
| Title: | Beta Factor Model |
| Version: | 0.2.11 |
| Author: | Guangbao Guo [aut, cre], Jiahui Feng [aut] |
| Maintainer: | Guangbao Guo <ggb11111111@163.com> |
| Description: | Provides tools for factor analysis in financial and econometric settings under Beta factor models. It includes functions to simulate factor-model data with Beta-distributed idiosyncratic components (e.g., standard Beta, scaled Beta, and truncated Beta distributions) and to conduct model diagnostic assessments such as likelihood ratio tests for factor number selection and goodness-of-fit tests for Beta distribution assumptions. Estimation routines encompass maximum likelihood estimation for finite-dimensional Beta factor models, regularized Beta factor analysis for high-dimensional datasets, and shrinkage-based estimation for robust Beta factor loading recovery in noisy or incomplete data environments. The package's methodological framework is detailed in Guo G. (2023) <doi:10.1007/s00180-022-01270-z>. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.5.0) |
| Suggests: | testthat (≥ 3.0.0), spelling, betareg, zoib |
| NeedsCompilation: | no |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| Imports: | MASS, psych, stats |
| Language: | en-US |
| Packaged: | 2026-05-12 12:44:40 UTC; Administrator |
| Repository: | CRAN |
| Date/Publication: | 2026-05-18 18:20:13 UTC |
California Alcohol Use Data
Description
County-level data on monthly alcohol use among California students in grades 7 to 11, 2008 to 2010.
The response variable Percentage is a proportion (0 < Percentage < 1), suitable for zero-inflated beta regression.
Usage
AlcoholUse
Format
A data frame with the following 6 variables:
- Percentage
numeric: proportion of students who drank alcohol (a value between 0 and 1)
- Grade
factor: student grade level
- Gender
factor: student gender
- MedDays
numeric: mid-point of days bucket
- Days
numeric: days bucket
- County
factor: county identifier
Source
http://www.kidsdata.org
Examples
data(AlcoholUse)
str(AlcoholUse)
Generate Beta Factor Model Data
Description
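Simulates an n x p data matrix from a Beta factor model with m factors and returns it together with the true factor loadings, the unique variances, and two sampling-adequacy diagnostics. The distribution_type argument selects the distribution used to generate the data.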
Usage
BFM(n, p, m, mub, phib, distribution_type)
Arguments
n |
Sample size. |
p |
Number of variables (data dimensionality). |
m |
Number of factors. |
mub |
Mean parameter of the Beta distribution (numeric scalar or vector, 0 < mub < 1). |
phib |
Precision parameter of the Beta distribution (positive numeric scalar or vector). |
distribution_type |
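Character string naming the distribution type used to generate the data (for example, "Elliptical Distribution", as in the example below). |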
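The arguments mub and phib describe each Beta distribution by its mean and precision rather than by its two shape parameters. Assuming the common mean-precision parameterization of Ferrari and Cribari-Neto (2004), where shape1 = mu * phi and shape2 = (1 - mu) * phi, the base-R sketch below converts between the two forms and checks the implied moments by simulation. The helper beta_shapes is hypothetical and not part of this package, and whether BFM uses exactly this mapping internally is an assumption.

```r
# Hypothetical helper (not part of this package): convert a Beta mean mu and
# precision phi into rbeta()'s shape parameters, assuming the mean-precision
# parameterization shape1 = mu * phi, shape2 = (1 - mu) * phi.
beta_shapes <- function(mu, phi) {
  stopifnot(all(mu > 0 & mu < 1), all(phi > 0))
  list(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

mu  <- 0.3
phi <- 20
s <- beta_shapes(mu, phi)   # shape1 = 6, shape2 = 14

# Check by simulation: E[Y] = mu and Var[Y] = mu * (1 - mu) / (1 + phi)
set.seed(1)
y <- rbeta(1e5, s$shape1, s$shape2)
c(mean = mean(y), var = var(y), var_theory = mu * (1 - mu) / (1 + phi))
```

Under this parameterization a larger phib shrinks the variance around the same mean mub, which is why phib is called a precision parameter.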
Value
A list containing:
data |
Generated BFM data matrix (n rows, p columns). |
A |
A matrix representing the factor loadings. |
D |
Diagonal matrix of unique variances. |
kmo |
Kaiser-Meyer-Olkin sampling adequacy measure. |
bartlett |
Bartlett's test of sphericity. |
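The kmo and bartlett components are standard checks of whether a correlation matrix is suitable for factor analysis. Because the package imports psych, which provides KMO() and cortest.bartlett(), these values are presumably computed there, but that is an assumption. The sketch below computes both from a correlation matrix in base R using the textbook formulas: the overall KMO compares squared off-diagonal correlations with squared partial correlations, and Bartlett's statistic is -(n - 1 - (2p + 5)/6) log|R| with p(p - 1)/2 degrees of freedom. The function names kmo_overall and bartlett_sphericity are hypothetical.

```r
# Overall KMO: sum of squared off-diagonal correlations divided by that sum
# plus the sum of squared off-diagonal partial correlations.
kmo_overall <- function(R) {
  Rinv <- solve(R)
  P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))  # partial correlations
  off <- row(R) != col(R)                            # off-diagonal mask
  r2 <- sum(R[off]^2)
  p2 <- sum(P[off]^2)
  r2 / (r2 + p2)
}

# Bartlett's test of sphericity: H0 says the population correlation matrix is I.
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  stat <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(chisq = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Six variables driven by one common factor, so both diagnostics should
# indicate strong shared structure.
set.seed(42)
n <- 500
f <- rnorm(n)
X <- sapply(1:6, function(j) 0.8 * f + rnorm(n, sd = 0.6))
R <- cor(X)
kmo_overall(R)
bartlett_sphericity(R, n)
```

KMO values close to 1 and a very small Bartlett p-value both suggest that factor analysis is appropriate; values near 0.5 or a non-significant Bartlett test suggest little common structure.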
Examples
n <- 1000
p <- 10
m <- 5
mub <- runif(p, 0.2, 0.8)
phib <- runif(p, 5, 30)
dist_type <- "Elliptical Distribution"
X <- BFM(n, p, m, mub, phib, dist_type)
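For intuition about the kind of data BFM returns, here is a minimal sketch of one plausible Beta factor model: X = F A' + E, where F holds standard normal factor scores, A is a p x m loading matrix, and column j of the idiosyncratic term E is Beta-distributed with mean mub[j] and precision phib[j], centered to mean zero. This data-generating process is an assumption made for illustration; it does not reproduce BFM's internal algorithm or its distribution_type options, and simulate_bfm_sketch is a hypothetical name.

```r
# Hypothetical sketch of one Beta factor model; NOT the package's BFM() algorithm.
simulate_bfm_sketch <- function(n, p, m, mub, phib) {
  mub  <- rep_len(mub, p)                       # allow scalar or length-p input
  phib <- rep_len(phib, p)
  A  <- matrix(runif(p * m, -1, 1), nrow = p)   # p x m loadings (assumed uniform)
  Fs <- matrix(rnorm(n * m), nrow = n)          # n x m standard normal factor scores
  # Column j of E ~ Beta(mub[j] * phib[j], (1 - mub[j]) * phib[j])
  E <- vapply(seq_len(p), function(j)
    rbeta(n, mub[j] * phib[j], (1 - mub[j]) * phib[j]), numeric(n))
  E <- sweep(E, 2, mub)                         # center each column at its Beta mean
  D <- diag(mub * (1 - mub) / (1 + phib), nrow = p)  # implied unique variances
  list(data = Fs %*% t(A) + E, A = A, D = D)
}

set.seed(2023)
sim <- simulate_bfm_sketch(n = 5000, p = 10, m = 5,
                           mub = runif(10, 0.2, 0.8), phib = runif(10, 5, 30))
dim(sim$data)   # 5000 x 10
# Largest gap between the sample covariance and the model-implied A A' + D
max(abs(cov(sim$data) - (sim$A %*% t(sim$A) + sim$D)))
```

Because the factors and the Beta errors are independent in this sketch, the population covariance of X is A A' + D, so the printed gap should shrink as n grows.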
Household Food Expenditure Data
Description
A dataset from Griffiths, Hill, and Judge (1993) on household food expenditure, income, and household size.
The response variable food is a proportion (0 < food < 1), suitable for beta regression.
Usage
FoodExpenditure
Format
A data frame with 38 rows and 3 variables:
- food
numeric: proportion of household income spent on food
- income
numeric: household income (in thousands of dollars)
- persons
numeric: number of persons living in the household
Source
Griffiths, W. E., Hill, R. C., & Judge, G. G. (1993). Learning and Practicing Econometrics. Wiley.
Examples
data(FoodExpenditure)
str(FoodExpenditure)
Gasoline Yield Data from Prater (1956)
Description
A dataset containing 32 observations on gasoline yield under different experimental conditions.
The response variable yield is a proportion (0 < yield < 1), making it suitable for beta regression.
Usage
GasolineYield
Format
A data frame with 32 rows and 6 variables:
- yield
numeric: proportion of crude oil converted to gasoline
- batch
factor: batch, a unique combination of the conditions gravity, pressure, and temp10 (10 levels)
- temp
numeric: temperature (degrees Fahrenheit) at which all gasoline has vaporized
- gravity
numeric: crude oil gravity (degrees API)
- pressure
numeric: vapor pressure of crude oil (lbf/in²)
- temp10
numeric: temperature (degrees Fahrenheit) at which 10 percent of the crude oil has vaporized
Source
Prater (1956), as cited in Ferrari and Cribari-Neto (2004). Beta Regression for Modelling Rates and Proportions. https://www.jstor.org/stable/4110074
Examples
data(GasolineYield, package = "betareg")
str(GasolineYield)
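In beta regression (Ferrari and Cribari-Neto, 2004), the mean of a response in (0, 1) is linked to covariates, commonly through a logit link, while a precision parameter phi controls the dispersion. The base-R sketch below fits that model by maximum likelihood with optim() on simulated data, so it runs without any dataset; in practice the betareg package (listed in Suggests) provides betareg() for this. The functions beta_reg_nll and fit_beta_reg and the simulated covariate are hypothetical.

```r
# Negative log-likelihood of a beta regression with logit mean link and a
# constant precision estimated on the log scale (theta = c(beta, log(phi))).
beta_reg_nll <- function(theta, y, X) {
  k   <- ncol(X)
  mu  <- plogis(drop(X %*% theta[1:k]))
  phi <- exp(theta[k + 1])
  -sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}

fit_beta_reg <- function(y, X) {
  # Start from the logit of the overall mean, zero slopes, and phi = 10
  start <- c(qlogis(mean(y)), rep(0, ncol(X) - 1), log(10))
  opt <- optim(start, beta_reg_nll, y = y, X = X, method = "BFGS")
  list(coef = opt$par[seq_len(ncol(X))],
       phi  = exp(opt$par[ncol(X) + 1]),
       convergence = opt$convergence)
}

# Simulated proportions with true coefficients (-0.5, 1.2) and phi = 30
set.seed(1956)
n  <- 2000
x  <- runif(n, -1, 1)
X  <- cbind(1, x)
mu <- plogis(-0.5 + 1.2 * x)
y  <- rbeta(n, mu * 30, (1 - mu) * 30)
fit <- fit_beta_reg(y, X)
fit$coef   # expected to be close to c(-0.5, 1.2)
fit$phi    # expected to be close to 30
```

Estimating log(phi) instead of phi keeps the optimizer in an unconstrained space, so no bounds are needed.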
Reading Skills Data
Description
A dataset from Smithson and Verkuilen (2006) on reading accuracy, dyslexia status, and IQ scores.
The response variable accuracy is a proportion (0 < accuracy < 1), suitable for beta regression.
Usage
ReadingSkills
Format
A data frame with 44 rows and 4 variables:
- accuracy
numeric: proportion of correct responses in a reading task
- accuracy1
numeric: transformed accuracy measure
- dyslexia
factor: dyslexia status (levels: "yes", "no")
- iq
numeric: IQ score
Source
Smithson, M. & Verkuilen, J. (2006). A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. https://psycnet.apa.org/doi/10.1037/1082-989X.11.1.54
Examples
data(ReadingSkills)
str(ReadingSkills)
Calculate Errors for Factor Analysis Estimates
Description
This function calculates the Mean Squared Error (MSE) and relative error for factor loadings and uniqueness estimates.
Usage
calculate_errors(data, A, D, estimation_results)
Arguments
data |
Matrix of BFM data. |
A |
Matrix of true factor loadings. |
D |
Matrix of true uniquenesses (diagonal matrix). |
estimation_results |
A list containing the estimates A_hat (estimated factor loadings) and D_hat (estimated uniquenesses). |
Value
A named vector containing:
MSEA |
Mean Squared Error for factor loadings. |
MSED |
Mean Squared Error for uniqueness estimates. |
LSA |
Relative error for factor loadings. |
LSD |
Relative error for uniqueness estimates. |
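The manual does not state the exact formulas behind MSEA, MSED, LSA, and LSD. A common choice, assumed here rather than taken from the package source, is the mean squared error over entries and the squared Frobenius-norm error relative to the squared norm of the truth, with uniquenesses compared on the diagonal. The hypothetical helper error_metrics_sketch computes these four quantities; its values need not match calculate_errors exactly.

```r
# Hypothetical helper; the error definitions below are assumptions.
error_metrics_sketch <- function(A, D, A_hat, D_hat) {
  # Mean squared error over all entries
  mse <- function(est, truth) mean((est - truth)^2)
  # Relative error: ||est - truth||_F^2 / ||truth||_F^2
  rel <- function(est, truth) {
    est <- as.matrix(est); truth <- as.matrix(truth)
    norm(est - truth, "F")^2 / norm(truth, "F")^2
  }
  d <- diag(D); d_hat <- diag(D_hat)   # uniquenesses live on the diagonal
  c(MSEA = mse(A_hat, A), MSED = mse(d_hat, d),
    LSA  = rel(A_hat, A), LSD  = rel(d_hat, d))
}

set.seed(123)
p <- 5
A <- matrix(runif(p * 2, -1, 1), nrow = p)
D <- diag(runif(p, 1, 2))
A_hat <- A + matrix(rnorm(p * 2, sd = 0.1), nrow = p)   # noisy loading estimates
D_hat <- D                                              # exact uniquenesses
error_metrics_sketch(A, D, A_hat, D_hat)                # MSED and LSD are 0 here
```

Because factor loadings are identified only up to rotation and sign, an estimate is usually aligned with the true loadings (for example by a Procrustes rotation) before such errors are computed.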
Examples
set.seed(123)
n <- 10
p <- 5
A <- matrix(runif(p * p, -1, 1), nrow = p)
D <- diag(runif(p, 1, 2))
data <- matrix(runif(n * p), nrow = n)
estimation_results <- list(A_hat = A, D_hat = D)
errors <- calculate_errors(data, A, D, estimation_results)
print(errors)