| Type: | Package |
| Title: | Visualizing Probabilities, Frequencies, and Conditional Independence for Categorical Variates |
| Version: | 1.0.0 |
| URL: | https://github.com/rwoldford/eikosograms, https://rwoldford.github.io/eikosograms/ |
| BugReports: | https://github.com/rwoldford/eikosograms/issues |
| Description: | An eikosogram (ancient Greek for probability picture) divides the unit square into rectangular regions whose areas, sides, and widths represent various probabilities associated with the values of one or more categorical variates. Rectangle areas are joint probabilities, widths are always marginal (though possibly joint margins, i.e. marginal joint distributions of two or more variates), and heights of rectangles are always conditional probabilities. Eikosograms embed the rules of probability and are useful for introducing elementary probability theory, including axioms, marginal, conditional, and joint probabilities, and their relationships (including Bayes' theorem as a completely trivial consequence). They provide advantages over Venn diagrams for this purpose, particularly in distinguishing probabilistic independence, mutually exclusive events, coincident events, and associations. They also are useful for identifying and understanding conditional independence structure. Eikosograms can be thought of as mosaic plots when only two categorical variates are involved; the layout is quite different when there are more than two variates. Only one categorical variate, designated the "response", presents on the vertical axis and all others, designated the "conditioning" variates, appear on the horizontal. In this way, conditional probability appears only as height and marginal probabilities as widths. The eikosogram is ideal for response models (e.g. logistic models) but equally useful when no variate is distinguished as the response. In such cases, each variate can appear in turn as the response, which is handy for assessing conditional independence in discrete graphical models (i.e. "Bayesian networks" or "BayesNets"). The eikosogram and its value over Venn diagrams in teaching probability is described in W.H. Cherry and R.W. Oldford (2003) https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/paper.pdf, its value in exploring conditional independence structure and relation to graphical and log-linear models is described in R.W. Oldford (2003) https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/independence/paper.pdf, and a number of problems, puzzles, and paradoxes that are easily explained with eikosograms are given in R.W. Oldford (2003) https://math.uwaterloo.ca/~rwoldfor/papers/eikosograms/examples/paper.pdf. |
| License: | GPL-3 |
| Depends: | R (≥ 3.5.0) |
| Imports: | grid, stats |
| Suggests: | knitr, rmarkdown, gridExtra |
| VignetteBuilder: | knitr, rmarkdown |
| Encoding: | UTF-8 |
| LazyLoad: | yes |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-11 20:39:37 UTC; rwoldford |
| Author: | Wayne Oldford [aut, cre], Erle Holgersen [aut], Ben Lafreniere [aut], Tianlu Zhu [aut] |
| Maintainer: | Wayne Oldford <rwoldford@uwaterloo.ca> |
| Repository: | CRAN |
| Date/Publication: | 2026-01-11 21:00:02 UTC |
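The Description above states that rectangle areas are joint probabilities, widths are marginal probabilities, and heights are conditional probabilities. The following is a minimal base-R sketch of that arithmetic, using the HairEyeColor table from the datasets package with Hair as the response and Eye as the single conditioning variate. It does not call the package itself; the object names (tab, joint, widths, heights, areas, bayes) are illustrative only.

```r
# Two-way table of counts: rows are Hair (response), columns are Eye (conditioning).
tab <- margin.table(HairEyeColor, c(1, 2))

joint   <- prop.table(tab)              # joint probabilities P(Hair, Eye)
widths  <- colSums(joint)               # marginal probabilities P(Eye)
heights <- prop.table(tab, margin = 2)  # conditional probabilities P(Hair | Eye)

# Area of each rectangle = height * width = P(Hair | Eye) * P(Eye).
areas <- sweep(heights, 2, widths, `*`)

# Dividing the same areas by their row totals gives P(Eye | Hair),
# which is Bayes' theorem read directly off the picture.
bayes <- sweep(areas, 1, rowSums(areas), `/`)

round(heights, 3)
round(widths, 3)
```

Each column of heights sums to one, and areas reproduces joint, which is the relationship the eikosogram draws.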
Generic method for creating an eikosogram
Description
Generic method for creating an eikosogram
Usage
eikos(
y,
x = NULL,
data = NULL,
marginalize = NULL,
main = "",
main_size = 16,
ylabs = TRUE,
ylab_rot = 0,
yname_size = 12,
yvals_size = 12,
yaxs = TRUE,
yprobs = NULL,
yprobs_size = 8,
xlabs = TRUE,
xlab_rot = 0,
xname_size = 12,
xvals_size = 12,
xaxs = TRUE,
xprobs = NULL,
xprobs_size = 8,
vertical_xprobs = TRUE,
ispace = list(bottom = 8, left = 2, top = 2, right = 5),
legend = FALSE,
col = NULL,
bottomcol = "steelblue",
topcol = "snow2",
lcol = "black",
draw = TRUE,
newpage = TRUE,
lock_aspect = TRUE
)
Arguments
y |
Either the name of a variable in the data set (eikos.default), or a formula of such variables (eikos.formula). |
x |
name(s) of any conditional variable(s) (horizontal axis). Should be NULL if a formula is given. |
data |
data frame or table |
marginalize |
variable(s) to marginalize on, or NULL if none. Marginalized variables still appear in plot. |
main |
title of plot |
main_size |
font size of title (in points) |
ylabs |
logical, whether y labels should appear or not. |
ylab_rot |
rotation of y labels |
yname_size |
font size of vertical axis names (in points) |
yvals_size |
font size of labels for values of y variable (in points) |
yaxs |
logical, whether y axis should appear or not. |
yprobs |
probabilities to be shown on y-axis. NULL if they should be calculated from the data. |
yprobs_size |
font size of labels for vertical (y-axis) probabilities (in points) |
xlabs |
logical, whether x labels should appear or not. |
xlab_rot |
rotation of x labels |
xname_size |
font size of horizontal axis names (in points) |
xvals_size |
font size of labels for values of x variables (in points) |
xaxs |
logical, whether x axis should appear or not. |
xprobs |
probabilities to be shown on x-axis. NULL if they should be calculated from the data. |
xprobs_size |
font size of labels for horizontal probabilities (in points) |
vertical_xprobs |
logical, whether probabilities on x axis should be rotated vertically. |
ispace |
list of four items (bottom, left, top, right) indicating the margins separating the text around the diagram. Each value is a positive integer giving a measure in "points". |
legend |
logical, whether to include legend |
col |
a vector of colours to match the response values. If NULL (the default), the colours are constructed as a smooth transition from 'bottomcol' to 'topcol' via 'grDevices::colorRampPalette'. |
bottomcol |
bottom colour |
topcol |
top colour |
lcol |
colour of lines |
draw |
logical, whether to draw eikosogram. |
newpage |
logical, whether to draw on a newpage. |
lock_aspect |
logical, whether to force entire plot to 1:1 aspect ratio. |
Examples
eikos("Hair", "Eye", data=HairEyeColor, legend = TRUE)
eikos(gear ~ cyl, data = mtcars)
eikos(Admit ~ Gender + Dept, data = UCBAdmissions,
yaxs = FALSE, xaxs = FALSE,
lock_aspect = FALSE,
xlab_rot = 90, xvals_size = 8,
ispace = list(bottom = 15))
Create a new eikosogram
Description
Return a grid graphic object (grob) and draw an eikosogram if draw = TRUE.
Usage
## Default S3 method:
eikos(
y,
x = NULL,
data = NULL,
marginalize = NULL,
main = "",
main_size = 16,
ylabs = TRUE,
ylab_rot = 0,
yname_size = 12,
yvals_size = 12,
yaxs = TRUE,
yprobs = NULL,
yprobs_size = 8,
xlabs = TRUE,
xlab_rot = 0,
xname_size = 12,
xvals_size = 12,
xaxs = TRUE,
xprobs = NULL,
xprobs_size = 8,
vertical_xprobs = TRUE,
ispace = list(bottom = 8, left = 2, top = 2, right = 5),
legend = FALSE,
col = NULL,
bottomcol = "steelblue",
topcol = "snow2",
lcol = "black",
draw = TRUE,
newpage = TRUE,
lock_aspect = TRUE
)
Arguments
y |
Either the name of a variable in the data set (eikos.default), or a formula of such variables (eikos.formula). |
x |
name(s) of any conditional variable(s) (horizontal axis). Should be NULL if a formula is given. |
data |
data frame or table |
marginalize |
variable(s) to marginalize on, or NULL if none. Marginalized variables still appear in plot. |
main |
title of plot |
main_size |
font size of title (in points) |
ylabs |
logical, whether y labels should appear or not. |
ylab_rot |
rotation of y labels |
yname_size |
font size of vertical axis names (in points) |
yvals_size |
font size of labels for values of y variable (in points) |
yaxs |
logical, whether y axis should appear or not. |
yprobs |
probabilities to be shown on y-axis. NULL if they should be calculated from the data. |
yprobs_size |
font size of labels for vertical (y-axis) probabilities (in points) |
xlabs |
logical, whether x labels should appear or not. |
xlab_rot |
rotation of x labels |
xname_size |
font size of horizontal axis names (in points) |
xvals_size |
font size of labels for values of x variables (in points) |
xaxs |
logical, whether x axis should appear or not. |
xprobs |
probabilities to be shown on x-axis. NULL if they should be calculated from the data. |
xprobs_size |
font size of labels for horizontal probabilities (in points) |
vertical_xprobs |
logical, whether probabilities on x axis should be rotated vertically. |
ispace |
list of four items (bottom, left, top, right) indicating the margins separating the text around the diagram. Each value is a positive integer giving a measure in "points". |
legend |
logical, whether to include legend |
col |
a vector of colours to match the response values. If NULL (the default), the colours are constructed as a smooth transition from 'bottomcol' to 'topcol' via 'grDevices::colorRampPalette'. |
bottomcol |
bottom colour |
topcol |
top colour |
lcol |
colour of lines |
draw |
logical, whether to draw eikosogram. |
newpage |
logical, whether to draw on a newpage. |
lock_aspect |
logical, whether to force entire plot to 1:1 aspect ratio. |
Examples
eikos("Hair", "Eye", data=HairEyeColor, legend = TRUE)
eikos("Hair", "Eye", data=HairEyeColor,
legend = TRUE, ylabs = FALSE,
yname_size = 16, yvals_size = 8)
eikos("Hair", "Eye", data=HairEyeColor,
legend = TRUE, ylabs = FALSE,
yprobs = seq(0.2, 1, .2))
eikos("Eye", "Hair", data=HairEyeColor, yprobs = seq(0,1,0.25),
yname_size = 20, xname_size = 20,
col = c("sienna4", "steelblue", "darkkhaki", "springgreen3"),
lcol = "grey10",
lock_aspect = FALSE)
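Since the Description says a grid graphic object is returned, the eikosogram can be built without drawing and rendered later. The following is a hedged sketch that assumes the eikosograms package is installed and that the returned object can be passed to grid::grid.draw(); the variable name g is illustrative.

```r
library(grid)

# Guard the call so the sketch is harmless when the package is not installed.
if (requireNamespace("eikosograms", quietly = TRUE)) {
  # Build the eikosogram but do not draw it yet.
  g <- eikosograms::eikos("Hair", "Eye", data = HairEyeColor,
                          legend = TRUE, draw = FALSE)

  # Draw it later on a fresh page of the current graphics device.
  grid.newpage()
  grid.draw(g)
}
```

Holding on to the grob in this way is useful when several eikosograms are to be arranged on one page, for example with the suggested gridExtra package.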
Draw eikosogram using a formula to identify response and conditioning variates
Description
Draw eikosogram using a formula to identify response and conditioning variates
Usage
## S3 method for class 'formula'
eikos(
y,
x = NULL,
data = NULL,
marginalize = NULL,
main = "",
main_size = 16,
ylabs = TRUE,
ylab_rot = 0,
yname_size = 12,
yvals_size = 12,
yaxs = TRUE,
yprobs = NULL,
yprobs_size = 8,
xlabs = TRUE,
xlab_rot = 0,
xname_size = 12,
xvals_size = 12,
xaxs = TRUE,
xprobs = NULL,
xprobs_size = 8,
vertical_xprobs = TRUE,
ispace = list(bottom = 8, left = 2, top = 2, right = 5),
legend = FALSE,
col = NULL,
bottomcol = "steelblue",
topcol = "snow2",
lcol = "black",
draw = TRUE,
newpage = TRUE,
lock_aspect = TRUE
)
Arguments
y |
Either the name of a variable in the data set (eikos.default), or a formula of such variables (eikos.formula). |
x |
name(s) of any conditional variable(s) (horizontal axis). Should be NULL if a formula is given. |
data |
data frame or table |
marginalize |
variable(s) to marginalize on, or NULL if none. Marginalized variables still appear in plot. |
main |
title of plot |
main_size |
font size of title (in points) |
ylabs |
logical, whether y labels should appear or not. |
ylab_rot |
rotation of y labels |
yname_size |
font size of vertical axis names (in points) |
yvals_size |
font size of labels for values of y variable (in points) |
yaxs |
logical, whether y axis should appear or not. |
yprobs |
probabilities to be shown on y-axis. NULL if they should be calculated from the data. |
yprobs_size |
font size of labels for vertical (y-axis) probabilities (in points) |
xlabs |
logical, whether x labels should appear or not. |
xlab_rot |
rotation of x labels |
xname_size |
font size of horizontal axis names (in points) |
xvals_size |
font size of labels for values of x variables (in points) |
xaxs |
logical, whether x axis should appear or not. |
xprobs |
probabilities to be shown on x-axis. NULL if they should be calculated from the data. |
xprobs_size |
font size of labels for horizontal probabilities (in points) |
vertical_xprobs |
logical, whether probabilities on x axis should be rotated vertically. |
ispace |
list of four items (bottom, left, top, right) indicating the margins separating the text around the diagram. Each value is a positive integer giving a measure in "points". |
legend |
logical, whether to include legend |
col |
a vector of colours to match the response values. If NULL (the default), the colours are constructed as a smooth transition from 'bottomcol' to 'topcol' via 'grDevices::colorRampPalette'. |
bottomcol |
bottom colour |
topcol |
top colour |
lcol |
colour of lines |
draw |
logical, whether to draw eikosogram. |
newpage |
logical, whether to draw on a newpage. |
lock_aspect |
logical, whether to force entire plot to 1:1 aspect ratio. |
Examples
eikos(Eye ~ Hair + Sex, data=HairEyeColor)
eikos(Hair ~ ., data=HairEyeColor,
yaxs = FALSE, ylabs = FALSE,
legend = TRUE,
col = c("black", "sienna4",
"orangered", "lightgoldenrod" ))
eikos(Hair ~ ., data=HairEyeColor, xlab_rot = 30,
yprobs = seq(0.1, 1, 0.1),
yvals_size = 10,
xvals_size = 8,
ispace = list(bottom = 10),
bottomcol = "grey30", topcol = "grey70",
lcol = "white")
eikos(Hair ~ ., data=HairEyeColor, xlab_rot = 30,
marginalize = "Eye",
yvals_size = 10,
xvals_size = 8,
ispace = list(bottom = 10),
bottomcol = "grey30", topcol = "grey70",
lcol = "white")
eikos(Hair ~ ., data=HairEyeColor, xlab_rot = 30,
marginalize = c("Eye", "Sex"),
yvals_size = 10,
xvals_size = 8,
ispace = list(bottom = 10),
bottomcol = "grey30", topcol = "grey70",
lcol = "white")
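The package Description notes that, when no variate is distinguished as the response, each variate can appear in turn as the response. The following is a sketch of that idea using the formula interface shown in the examples above. The names vars and forms are illustrative, and the drawing step assumes the eikosograms package is installed.

```r
# One formula per variate, each taking its turn as the response.
vars  <- names(dimnames(HairEyeColor))   # "Hair", "Eye", "Sex"
forms <- lapply(vars, function(v) as.formula(paste(v, "~ .")))

# Draw one eikosogram per formula (skipped if the package is unavailable).
if (requireNamespace("eikosograms", quietly = TRUE)) {
  for (f in forms) {
    eikosograms::eikos(f, data = HairEyeColor, xlab_rot = 30,
                       xvals_size = 8, ispace = list(bottom = 10))
  }
}
```

In each resulting picture, equal heights across the conditioning columns would suggest that the response does not depend on those variates.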
Create eikosogram data frame
Description
Eikos helper function used to convert data.
Usage
eikos_data(y, x, data, marginalize = NULL)
Arguments
y |
response variable. |
x |
conditional variables. |
data |
data frame or table to be converted. |
marginalize |
name of variable to marginalize on, NULL if none. |
Create eikosogram legend
Description
Eikos helper function used to create legend.
Usage
eikos_legend(
labels,
title = NULL,
yname_size = 12,
yvals_size = 12,
col,
margin = unit(2, "points"),
lcol = "black"
)
Arguments
labels |
labels to be included in legend |
title |
if non-NULL a string to give as the legend title |
yname_size |
font size of vertical axis names (in points) |
yvals_size |
font size of labels for values of y variable (in points) |
col |
colours of the legend entries |
margin |
unit specifying margin between legend entries |
lcol |
line colour |
eikos helper function. Returns grob with x axis labels.
Description
eikos helper function. Returns grob with x axis labels.
Usage
eikos_x_labels(
x,
data,
margin = unit(10, "points"),
xname_size = 12,
xvals_size = 10,
lab_rot = 0
)
Arguments
x |
vector of conditional variables |
data |
data frame from eikos_data. |
margin |
unit specifying margin |
xname_size |
font size for x axis variable names (in points) |
xvals_size |
font size of labels for values of x variables (in points) |
lab_rot |
integer indicating the rotation of the label, default is horizontal |
Value
gList with x labels and x-axis names as grob frames.
Create grob with eikosogram x-axis probabilities
Description
Creates x axis grob to be placed on eikosogram. Called by eikos functions.
Usage
eikos_x_probs(
data,
xprobs = NULL,
xprobs_size = 8,
margin = unit(2, "points"),
rotate = TRUE
)
Arguments
data |
data frame from eikos_data object |
xprobs |
vector of probabilities to be shown. NULL if they should be calculated from the data. |
xprobs_size |
font size of labels for horizontal probabilities (in points) |
margin |
unit specifying margin between x axis and eikosogram |
rotate |
logical, whether probabilities should be rotated vertically. |
Value
textGrob with x-axis probabilities.
eikos helper function. Returns grob with y axis labels.
Description
eikos helper function. Returns grob with y axis labels.
Usage
eikos_y_labels(
y,
data,
margin = unit(2, "points"),
yname_size = 12,
yvals_size = 10,
lab_rot = 0
)
Arguments
y |
response variable |
data |
data frame from eikos_data. |
margin |
unit specifying margin |
yname_size |
font size for y axis variable names (in points) |
yvals_size |
font size of labels for values of y variable (in points) |
lab_rot |
integer indicating the rotation of the label, default is horizontal |
Value
grobFrame with response variable labels and axis text
Create grob with eikosogram y-axis probabilities
Description
Creates y axis grob to be placed on eikosogram. Called by eikos functions.
Usage
eikos_y_probs(data, yprobs, yprobs_size = 8, margin = unit(2, "points"))
Arguments
data |
data frame from eikos_data object |
yprobs |
vector of probabilities to be shown. NULL if they should be calculated from the data. |
yprobs_size |
font size of labels for vertical (y-axis) probabilities (in points) |
margin |
unit specifying margin between y axis and eikosogram |
Value
textGrob with y-axis probabilities.
Mining medical records (fictional)
Description
An entirely artificially constructed data set and context designed for classroom discussion and analysis.
A medical data mining context is given in detail below. In light of the context, interesting scientific questions will arise as to the data collection, and how the results should, or should not, be interpreted. It should also raise questions on what might be done in any follow-up studies.
Instructors might choose to invent their own context.
Format
A data frame with 16 rows and 5 variables (providing the counts for a 2x2x2x2 contingency table).
- Age
A two level factor recording one of two age groups: "20-39" or "40-59".
- Sex
A two level factor recording sex: "Male" or "Female".
- Treatment
A two level factor recording the treatment received: "A" or "B".
- Outcome
A two level factor recording patient outcome after treatment: "Recovered" or "Died".
- Freq
The frequency count of patients having that combination of factors.
Details
One fictional context (constructed in March 2020) for this data set is given below (in the PPDAC style of MacKay and Oldford (2000)).
Problem:
A disease epidemic has broken out in the population of some country. Adults under the age of 60 are thought to be particularly vulnerable. Both men and women contract the disease and need to be treated. Those who go untreated die within 5 days of contracting the disease.
The medical community has tried two quite different approaches to treat patients having the disease – call these 'Treatment A' and 'Treatment B'. For the health of the country, it is important to determine which of these two treatments is more effective.
Plan:
To investigate which is the better treatment, it was decided to mine the medical records from another country of those who had contracted the disease and had been treated with one of the two treatments. Patients treated with either A or B can survive the disease and recover fully; some, however, still die.
Electronic medical records available from several of the more populous districts are accessible. These can be searched to provide records from patients that have received treatment. It was decided that there should be the same number of records drawn for each treatment.
Moreover, concern was raised that the investigation have gender balance (i.e. equal numbers of males and females). So, to make sure that both sexes were equally represented, it was also decided that the number of female patients would be the same as the number of male patients.
Finally, it was desirable to detect even small differences in success rates of the two treatments since small differences could mean many more lives being saved. A sample size of about n = 3,000 was decided on.
Records would be collected until 3,000 were found: 1500 treated with 'A', 1500 treated with 'B', and equal numbers of males and females.
Data:
In this stage, the plan is executed. Instead of 1500 records of treatment 'A' and 'B', 1600 of each were found. The number of males and females was kept equal (now 1600 of each sex).
The process was to search the records in order, selecting those first encountered to get 1600 for each treatment and 1600 of each sex. Many records might be discarded whenever one quota was met and the search continued to meet the other quotas. It was also noticed that the patient's age was available for each record, so that the effect of treatment on younger and older adults might also be considered.
The counts which fell into the various categories were assembled into the data presented here.
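The quota design described above (1600 records per treatment and 1600 per sex, spread over a 2x2x2x2 layout) can be checked with xtabs. The following is a sketch only: the data frame records is a hypothetical stand-in with the same 16-row, 5-variable layout, and its uniform count of 200 per cell is a placeholder, not the counts in the package's data set.

```r
# Hypothetical stand-in with the documented layout (16 rows, 5 variables).
records <- expand.grid(
  Age       = c("20-39", "40-59"),
  Sex       = c("Male", "Female"),
  Treatment = c("A", "B"),
  Outcome   = c("Recovered", "Died"),
  stringsAsFactors = TRUE
)
records$Freq <- 200L   # placeholder counts, NOT the package's data

# Quota checks: totals per treatment and per sex.
by_treatment <- xtabs(Freq ~ Treatment, data = records)
by_sex       <- xtabs(Freq ~ Sex, data = records)

# The full 2x2x2x2 contingency table of counts.
full_table <- xtabs(Freq ~ Age + Sex + Treatment + Outcome, data = records)

by_treatment
by_sex
```

Applied to the real data frame, the same two xtabs calls would be expected to return 1600 for each treatment and for each sex, as the Data stage describes.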
Author(s)
R.W. Oldford
References
R.J. MacKay and R.W. Oldford 2000, 'Scientific Method, Statistical Method, and the Speed of Light', Statistical Science, Volume 15, No. 3, pp. 254-278. <doi:10.1214/ss/1009212817>
Tuberculosis 1910 death rates in New York and in Richmond
Description
Tuberculosis 1910 death rates in New York and in Richmond
Format
A 2 x 2 x 2 table of counts involving three binary variables
- Group
One of two racial groups: "White" or "Colored"
- City
One of two U.S. cities: "New York" or "Richmond"
- Total
Either total deaths from tuberculosis or total population in 1910: "Deaths" or "Population"
Details
These are historical data taken from page 449 of Cohen and Nagel's 1934 "An Introduction to Logic and Scientific Method". For this reason, the original names of the racial groups have been retained.
The data are of special historical interest in Statistics because they are one of the earliest recorded instances of a real Simpson's paradox (Simpson 1951) occurring in practice (see Blyth 1972). Preserving this historical context, the questions posed by Cohen and Nagel (1934) are also recorded here using their own words. The data and questions appear at the back of their book as exercises on "Chapter XVI: Statistical Methods".
In their table, Cohen and Nagel (1934, p. 449) include the "death rates from tuberculosis in Richmond, Virginia, and in New York City in 1910". These rates (in number per 100,000) are easily calculated and so have been excluded from the table given here.
In their words, Cohen and Nagel (1934, p. 449) pose the following two questions as exercises (*emphasis* is theirs):
"a. Does it follow that tuberculosis caused a greater mortality in Richmond than in New York?
b. Notice that the death rate for whites and that for Negroes were *lower* in Richmond than in New York, although the *total* death rate was *higher*. Are the two populations compared really *comparable*, that is, homogeneous?"
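The death rates per 100,000 mentioned above are deaths divided by population, computed within each group and city and again for each city as a whole. The following is a sketch for a table laid out as in the Format section (Group x City x Total). The counts in toy are made-up numbers chosen to show the reversal; they are not the Cohen and Nagel figures, and the names toy, group_rates, city_totals and city_rates are illustrative.

```r
# Toy 2 x 2 x 2 table with the documented dimension names.
# The counts are invented for illustration only.
toy <- array(
  c(20, 6, 1, 50,            # Deaths:     (White, Colored) x (New York, Richmond)
    1000, 100, 100, 1000),   # Population: (White, Colored) x (New York, Richmond)
  dim = c(2, 2, 2),
  dimnames = list(Group = c("White", "Colored"),
                  City  = c("New York", "Richmond"),
                  Total = c("Deaths", "Population"))
)

# Death rate per 100,000 within each group and city.
group_rates <- 1e5 * toy[, , "Deaths"] / toy[, , "Population"]

# Death rate per 100,000 for each city, pooling the two groups.
city_totals <- apply(toy, c("City", "Total"), sum)
city_rates  <- 1e5 * city_totals[, "Deaths"] / city_totals[, "Population"]

round(group_rates)
round(city_rates)
```

In this toy table the Richmond rate is lower than the New York rate within each group yet higher overall, which is the pattern question b asks the reader to explain.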
Author(s)
R.W. Oldford
Source
"An Introduction to Logic and Scientific Method" by Morris R. Cohen and Ernest Nagel, (1934), Harcourt, Brace and Company, New York.
References
Blyth, Colin R. 1972. On Simpson's Paradox and the Sure-Thing Principle. Journal of the American Statistical Association, 67, pp.364-366.
Cohen, Morris R.; Nagel, Ernest. 1934. An Introduction to Logic and Scientific Method. Harcourt, Brace and Company. New York.
Simpson, E.H. 1951. The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society, Series B, 13, pp. 238-241.