Decisions based on complex criteria, like a medical diagnosis or an ethical investment, often require a lot of effort. Answering each question involves gathering evidence, which can be expensive, invasive, or time-consuming. The goal is to reach a correct and confident conclusion by asking the fewest possible questions.
The andorR
package provides a strategy for this by
calculating an influence index for each unanswered
question. This index quantifies how much a question, if answered, is
likely to help resolve the entire tree. It provides a logical path to
the conclusion, ensuring you’re always working on the most impactful
piece of the puzzle. This process also embraces
uncertainty, allowing you to provide answers with
varying levels of confidence and to revisit uncertain answers later to
strengthen the final result.
The optimisation process works in two stages. First, it assigns
indices to the parent (non-leaf) nodes. Second, it uses those parent
indices to calculate the final influence_index
for each
unanswered leaf (question).
true_index
and
false_index
Every AND
or OR
node is assigned two values
that represent the probability of an answer propagating up through that
node. The calculation depends on the node’s logical rule and the number
of its unanswered children (\(n\)).
AND
node to become
TRUE
, all its children must be TRUE
. Answering
one child TRUE
makes little progress. However, answering
just one child FALSE
immediately makes the AND
node FALSE
(this is called a short-circuit). Therefore:
OR
node to become TRUE
, only one child needs
to be TRUE
. To make it FALSE
, all children
must be FALSE
. Therefore:
influence_index
A leaf’s total influence is its potential to affect the final
outcome. This is calculated by multiplying the indices of all its
ancestors up to the root for both the TRUE
and
FALSE
pathways, and then summing them.
The formula is: \[Influence = (\prod \text{ancestor true_indices}) + (\prod \text{ancestor false_indices})\]
This score represents the combined probability of the leaf’s answer propagating up to the root and helping to resolve the final conclusion. The leaf with the highest score is the most strategic question to investigate next, assuming one is neutral about the final outcome of the tree.
The list of prioritised questions is calculated using the
get_highest_influence()
function. This takes an optional
parameter sort_by
which can have the following values:
influence_if_true
of a question is equal to 1,
then answering that single question with TRUE will resolve the entire
tree with TRUEinfluence_if_false
of a question is equal to 1,
then answering that single question with FALSE will resolve the entire
tree with FALSE.The strategy used in the interactive analysis can also be set using
the sort_by
parameter.
Let’s walk through the process using the ethical
investment tree included with this package.
We start by loading the package and the ethical
dataset,
then use load_tree_df()
to build the tree object.
The update_tree()
function is a convenient wrapper that
runs all the necessary calculations in the correct order: it calculates
the logical state of the tree, assigns the true_index
and
false_index
to all parent nodes, and then calculates the
influence_index
for all unanswered leaves.
We can now print the tree to see the calculated indices. Notice that
parent nodes have a true_index
and
false_index
, while only the unanswered leaves have an
influence_index
.
print(dtree, "rule", true = "true_index", false = "false_index", influence = "influence_index")
#> levelName rule true false influence
#> 1 Invest in Company X AND 0.2500000 1.0000000 NA
#> 2 ¦--Financial Viability AND 0.5000000 1.0000000 NA
#> 3 ¦ ¦--Profitability and Growth Signals OR 1.0000000 0.3333333 NA
#> 4 ¦ ¦ ¦--FIN1 NA NA 0.4583333
#> 5 ¦ ¦ ¦--FIN2 NA NA 0.4583333
#> 6 ¦ ¦ °--FIN3 NA NA 0.4583333
#> 7 ¦ °--Solvency and Stability AND 0.5000000 1.0000000 NA
#> 8 ¦ ¦--FIN4 NA NA 1.0625000
#> 9 ¦ °--FIN5 NA NA 1.0625000
#> 10 ¦--Acceptable Environmental Stewardship OR 1.0000000 0.5000000 NA
#> 11 ¦ ¦--Has a Clean Current Record AND 0.3333333 1.0000000 NA
#> 12 ¦ ¦ ¦--ENV1 NA NA 0.5833333
#> 13 ¦ ¦ ¦--ENV2 NA NA 0.5833333
#> 14 ¦ ¦ °--ENV3 NA NA 0.5833333
#> 15 ¦ °--Has a Credible Transition Pathway OR 1.0000000 0.3333333 NA
#> 16 ¦ ¦--ENV4 NA NA 0.4166667
#> 17 ¦ ¦--ENV5 NA NA 0.4166667
#> 18 ¦ °--ENV6 NA NA 0.4166667
#> 19 ¦--Demonstrable Social Responsibility OR 1.0000000 0.5000000 NA
#> 20 ¦ ¦--Shows Excellent Internal Culture OR 1.0000000 0.2500000 NA
#> 21 ¦ ¦ ¦--SOC1 NA NA 0.3750000
#> 22 ¦ ¦ ¦--SOC2 NA NA 0.3750000
#> 23 ¦ ¦ ¦--SOC3 NA NA 0.3750000
#> 24 ¦ ¦ °--SOC4 NA NA 0.3750000
#> 25 ¦ °--Has a Positive External Impact AND 0.3333333 1.0000000 NA
#> 26 ¦ ¦--SOC5 NA NA 0.5833333
#> 27 ¦ ¦--SOC6 NA NA 0.5833333
#> 28 ¦ °--SOC7 NA NA 0.5833333
#> 29 °--Strong Corporate Governance AND 0.2000000 1.0000000 NA
#> 30 ¦--GOV1 NA NA 1.0500000
#> 31 ¦--GOV2 NA NA 1.0500000
#> 32 ¦--GOV3 NA NA 1.0500000
#> 33 ¦--GOV4 NA NA 1.0500000
#> 34 °--GOV5 NA NA 1.0500000
To find the best question to ask first, we use
get_highest_influence()
.
get_highest_influence(dtree, top_n = 3)
#> name question
#> 1 FIN4 Debt-to-Equity ratio is below the industry average.
#> 2 FIN5 Company generates strong and positive free cash flow.
#> 3 GOV1 The board of directors is majority independent and diverse.
#> influence_if_true influence_if_false influence_index
#> 1 0.0625 1 1.0625
#> 2 0.0625 1 1.0625
#> 3 0.0500 1 1.0500
In this initial state, FIN4, FIN5,
and GOV1 (and other leaves under AND
nodes) have the highest influence. This is because they belong to
AND
groups directly under the root. A single
FALSE
answer to any of them has a high probability of
making a major branch of the tree FALSE
, thus contributing
significantly to the final answer. The tool correctly identifies these
as the most efficient questions to investigate first.
Let’s focus on some of the Financial questions:
# Get a full list of all question statuses
all_questions <- get_questions(dtree)
# Let's look at the initial influence of FIN1, FIN2, and FIN4
all_questions %>%
filter(name %in% c("FIN1", "FIN2", "FIN4")) %>%
knitr::kable()
name | question | answer | confidence | influence_if_true | influence_if_false | influence_index |
---|---|---|---|---|---|---|
FIN1 | Company demonstrates consistent, high revenue growth. | NA | NA | 0.1250 | 0.3333333 | 0.4583333 |
FIN2 | Company maintains a high net profit margin for its industry. | NA | NA | 0.1250 | 0.3333333 | 0.4583333 |
FIN4 | Debt-to-Equity ratio is below the industry average. | NA | NA | 0.0625 | 1.0000000 | 1.0625000 |
Now, let’s provide an answer and see how the priorities change. We
will answer FIN1
(which is under an OR
node)
as TRUE
. This should resolve its parent node,
“Profitability and Growth Signals”.
# Answer a question and then run update_tree() again
set_answer(dtree, "FIN1", TRUE, 5)
#> Answer for leaf 'FIN1' set to: TRUE with confidence 5/5
dtree <- update_tree(dtree)
Now that the tree has been recalculated, let’s check the influence scores for the same leaves again.
# Get the new list of all question statuses
all_questions_new <- get_questions(dtree)
# Look at the new influence of FIN2 and FIN4
all_questions_new %>%
filter(name %in% c("FIN1", "FIN2", "FIN4")) %>%
knitr::kable()
name | question | answer | confidence | influence_if_true | influence_if_false | influence_index |
---|---|---|---|---|---|---|
FIN1 | Company demonstrates consistent, high revenue growth. | TRUE | 5 | NA | NA | NA |
FIN2 | Company maintains a high net profit margin for its industry. | NA | NA | NA | NA | NA |
FIN4 | Debt-to-Equity ratio is below the industry average. | NA | NA | 0.125 | 1 | 1.125 |
FIN2: The influence index for FIN2
is now NA
. This is because its parent node (“Profitability
and Growth Signals”) was an OR
node. Since we answered its
sibling FIN1
as TRUE
, the parent node is now
resolved as TRUE
. Answering FIN2
can no longer
change the outcome of that branch, so its influence has correctly become
irrelevant.
FIN4: Conversely, the influence index for
FIN4
has increased. Because the
“Profitability” branch is now resolved, the final outcome of the
higher-level “Financial Viability” AND
node now depends
entirely on resolving the “Solvency and Stability” branch. This makes
the questions under it, like FIN4
, more critical than they
were before.
This dynamic re-prioritization is the core feature of the
andorR
optimisation algorithm.
Once enough questions have been answered to reach a TRUE
or FALSE
conclusion at the root, the focus of the analysis
shifts. The goal is no longer to find the answer, but to increase the
confidence in the answer you have. andorR
provides a separate function for this phase of the analysis.
For a detailed guide, see the “Confidence Boosting and Sensitivity
Analysis” vignette:
vignette("confidence-boosting", package = "andorR")
```