-
Notifications
You must be signed in to change notification settings - Fork 0
/
chap-stratified-evaluation.qmd
268 lines (199 loc) · 16.1 KB
/
chap-stratified-evaluation.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
```{r echo = FALSE, cache = FALSE}
source("utils.R", local = TRUE)
```
# Stratified Evaluation {#chap-stratified-evaluation}
In the realm of audit sampling, the practice of stratified sampling is a powerful technique that enables auditors to enhance the accuracy and representativeness of their samples. Stratified sampling is a strategy employed in audit sampling where the population is divided into distinct subgroups or strata based on relevant characteristics. These characteristics could include geographical location, department, business unit, or any other factor that might influence the distribution of errors or misstatements. By segregating the population into strata, auditors can ensure that their sample captures the diversity of the entire population. This chapter delves into the intricacies of stratified evaluation, exploring three distinct approaches—no pooling, complete pooling, and partial pooling—each with its own advantages and trade-offs. Knowledge of these three approaches is crucial for auditors aiming to optimize their statistical evaluation and generate robust population estimates.
Consider an example of auditing expense claims in a large organization. Instead of treating all expense claims as a uniform entity, auditors can stratify the claims based on the departments they belong to. This approach ensures that the audit sample includes a representative mix of claims from different departments, thereby reducing the risk of overlooking specific areas of concern. Another example of such a situation would be a group audit where the audited organization consists of different components or branches. Stratification is relevant for the group auditor if they must form an opinion on the group as a whole because they must aggregate the samples taken by the component auditors.
In general, there are three approaches to evaluating a stratified sample: no pooling, complete pooling, and partial pooling. No pooling assumes no similarities between strata, which means that all strata are analyzed independently. Complete pooling assumes no difference between strata, which means that all data is aggregated and analyzed as a whole. Finally, partial pooling assumes differences and similarities between strata, which means that information can be shared between strata. Partial pooling (i.e., hierarchical modeling) is a powerful technique that can result in more efficient population and stratum estimates.
![There are three approaches to evaluating a stratified audit sample: no pooling, complete pooling and partial pooling. Image available under a [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/legalcode) license.](img/book_doors.png){#fig-ss-evaluation-1 fig-align="center" width=65%}
As a data example, consider the `retailer` data set that comes with the package. The organization in question consists of 20 branches across the country. In each of the 20 strata, a component auditor has taken a statistical sample and reported the outcomes to the group auditor.
```{r}
data(retailer)
head(retailer)
```
The number of units per stratum in the population can be provided with `N.units` to weigh the stratum estimates to determine population estimate. This is called poststratification. If `N.units` is not specified, each stratum is assumed to be equally represented in the population.
## No pooling
No pooling (`pooling = "none"`, default) assumes no similarities between strata. This means that the prior distribution specified through `prior` is applied independently for each stratum. This allows for independent estimates for the misstatement in each stratum but also results in a relatively high uncertainty in the population estimate. Assuming a binomial likelihood and a beta($\alpha$, $\beta$) prior on $\theta$ (these choices may differ among analysts), the statistical model applied in the no pooling approach is the following:
\begin{align}
k_s &\sim \text{Binomial}(n_s, \theta_s)\\
\theta_s &\sim \text{Beta}(\alpha, \beta)\\
\theta &\leftarrow \frac{\sum \theta_s N_s}{N}
\end{align}
The call below evaluates the sample using a Bayesian stratified evaluation procedure, in which the stratum estimates are poststratified to arrive at the population estimate. Since the posterior distribution is determined via sampling it is important to use `set.seed()` to make the results reproducible.
```{r}
set.seed(1)
result_np <- evaluation(
materiality = 0.05,
method = "binomial",
n = retailer[["samples"]],
x = retailer[["errors"]],
N.units = retailer[["items"]],
alternative = "two.sided",
pooling = "none",
prior = TRUE
)
result_np
```
In this case, the output of the `summary()` function shows that the estimate of the misstatement in the population is 5.98 percent, with the 95 percent credible interval ranging from 4.28 percent to 8.22 percent. The stratum estimates can be visualized using the `plot()` function in combination with `type = "estimates`, see @fig-stratified-ss-np-estimates. Estimation plots display stratum estimates and their uncertainties, revealing the differences and overlaps between strata. As the figure shows, the stratum estimates differ substantially from each other but are relatively uncertain.
```{r label="fig-stratified-ss-np-estimates", fig.cap="Estimates of the population and stratum misstatement under the no pooling model."}
plot(result_np, type = "estimates")
```
Posterior distribution plots provide insights into how the prior beliefs evolve after considering the data, showcasing the gradual convergence of information. The prior and posterior distribution for the population misstatement can be requested via the `plot()` function, see @fig-stratified-ss-np-posterior.
```{r label="fig-stratified-ss-np-posterior", fig.cap="Prior and posterior distribution for the population misstatement under the no pooling model."}
plot(result_np, type = "posterior")
```
## Complete pooling
Complete pooling (`pooling = "complete"`) assumes no differences between strata. This has the advantages that data from all strata can be aggregated, which decreases the uncertainty in the population estimate compared to the no pooling approach. However, the disadvantage of this approach is that it does not facilitate the distinction between between strata, as every stratum receives the same estimate equal to that of the population, see @fig-stratified-ss-cp-estimates. Assuming a binomial likelihood and a beta($\alpha$, $\beta$) prior on $\theta$, the statistical model applied in the complete pooling approach is the following:
\begin{align}
k &\sim \text{Binomial}(n, \theta)\\
\theta &\sim \text{Beta}(\alpha, \beta)
\end{align}
The call below evaluates the sample using a Bayesian stratified evaluation procedure, in which the strata are assumed to be the same.
```{r}
result_cp <- evaluation(
materiality = 0.05,
method = "binomial",
n = retailer[["samples"]],
x = retailer[["errors"]],
N.units = retailer[["items"]],
alternative = "two.sided",
pooling = "complete",
prior = TRUE
)
result_cp
```
For example, the output of the `summary()` function shows that the estimate of the misstatement in the population is 4.47 percent, with the 95 percent credible interval ranging from 3.74 percent to 5.34 percent. Since the data is aggregated, the stratum estimates contain relatively little uncertainty. However, the probability of misstatement in stratum 20 (many misstatements) under this assumption is the same as that of stratum 15 (few misstatements).
```{r label="fig-stratified-ss-cp-estimates", fig.cap="Estimates of the population and stratum misstatement under the complete pooling model."}
plot(result_cp, type = "estimates")
```
The prior and posterior distribution for the population misstatement can be requested via the `plot()` function, see @fig-stratified-ss-cp-posterior.
```{r label="fig-stratified-ss-cp-posterior", fig.cap="Prior and posterior distribution for the population misstatement under the complete pooling model."}
plot(result_cp, type = "posterior")
```
## Partial pooling
Finally, partial pooling (`pooling = "partial"`) assumes differences and similarities between strata. This allows the auditor to differentiate between strata, while also sharing information between the strata to reduce uncertainty in the population estimate. Assuming a binomial likelihood and a beta prior for $\theta$, the statistical model applied in the partial pooling approach is the following:
\begin{align}
k_s &\sim \text{Binomial}(n_s, \theta_s)\\
\theta_s &\sim \text{Beta}(\phi \nu, (1 - \phi) \nu)\\
\phi &\sim \text{Beta}(\alpha, \beta)\\
\nu &\sim \text{Pareto}(1, \frac{3}{2})\\
\theta &\leftarrow \frac{\sum \theta_s N_s}{N}
\end{align}
![Partial pooling takes into account the hierarchical structure in the data. Image available under a [CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/legalcode) license.](img/book_pyramid.png){#fig-ss-evaluation-2 fig-align="center" width=65%}
The call below evaluates the sample using a Bayesian stratified evaluation procedure, in which the stratum estimates are poststratified to arrive at the population estimate.
```{r}
set.seed(1)
result_pp <- evaluation(
materiality = 0.05,
method = "binomial",
n = retailer[["samples"]],
x = retailer[["errors"]],
N.units = retailer[["items"]],
alternative = "two.sided",
pooling = "partial",
prior = TRUE
)
result_pp
```
In this case, the output shows that the estimate of the misstatement in the population is 4.32 percent, with the 95 percent credible interval ranging from 3.23 percent to 5.39 percent. Note that this population estimate is substantially less uncertain than that of the no pooling approach. @fig-stratified-ss-pp-estimates visualizes the population and stratum estimates. Note that, like in the no pooling approach, the stratum estimates are different from each other but lie closer together and are less uncertain.
```{r label="fig-stratified-ss-pp-estimates", fig.cap="Estimates of the population and stratum misstatement under the partial pooling model."}
plot(result_pp, type = "estimates")
```
The prior and posterior distribution for the population misstatement can be requested via the `plot()` function, see @fig-stratified-ss-pp-posterior.
```{r label="fig-stratified-ss-pp-posterior", fig.cap="Prior and posterior distribution for the population misstatement under the partial pooling model."}
plot(result_pp, type = "posterior")
```
## Evaluation using data
To illustrate these concepts using data, let's consider the `allowances` dataset included in the package, which contains 3500 financial statement line items with book values (`bookValue`) and, for illustrative purposes, audited (true) values (`auditValue`) across different branches. Since the focus of this chapter is the evaluation stage in the audit, the sample is already indicated in the data set. The performance materiality in this example is set to five percent.
```{r}
data(allowances)
head(allowances)
```
Evaluating a stratified sample using data requires specification of the `data`, `values`, `values.audit` and `strata` arguments in the `evaluation()` function. In this case, the units are monetary and calculated by aggregating the book values of the items in each stratum.
```{r}
N.units <- aggregate(allowances[["bookValue"]], list(allowances[["branch"]]), sum)$x
```
### Classical Evaluation
Using classical evaluation, auditors can apply stratified evaluation to assess the population misstatement rate. The estimates obtained under this approach reflect independent evaluation of each stratum, potentially leading to a relatively high uncertainty in the overall population estimate. The statistical model for this evaluation using the Poisson likelihood is relatively simple:
\begin{align}
t_s &\sim \text{Poisson}(n_s\theta_s)\\
\theta &\leftarrow \frac{\sum \theta_s N_s}{N}
\end{align}
The call below evaluates the `allowances` sample using a classical stratified evaluation procedure, in which the stratum estimates are poststratified to arrive at the population estimate.
```{r}
set.seed(1)
result_dnpc <- evaluation(
materiality = 0.05,
data = allowances,
N.units = N.units,
values = "bookValue",
values.audit = "auditValue",
strata = "branch",
times = "times",
alternative = "two.sided",
pooling = "none"
)
result_dnpc
```
In this case, the output shows that the estimate of the misstatement in the population is 14.72 percent, with the 95 percent confidence interval ranging from 12.55 percent to 18.28 percent. The precision of the population estimate is 5.73 percent. The stratum estimates can be seen in the output of the `summary()` function and are visualized in @fig-stratified-np-estimates below.
```{r label="fig-stratified-np-estimates", fig.cap="Estimates of the population and stratum misstatement under the no pooling model."}
plot(result_dnpc, type = "estimates")
```
### Bayesian Evaluation
Bayesian inference can improve upon the estimates of the classical approach by pooling information between strata where possible. The statistical model for this evaluation using the multilevel model with the use of taints is relatively complex:
\begin{align}
t_i,s &\sim \text{Beta}(\theta_s \kappa_s, (1 - \theta_s) \kappa_s)\\
\theta_s &\sim \text{Beta}(\phi \nu, (1 - \phi) \nu)\\
\kappa_s &\sim \text{Normal}(\mu, \sigma)^{+}\\
\phi &\sim \text{Beta}(\alpha, \beta)\\
\nu &\sim \text{Pareto}(1, \frac{3}{2})\\
\mu &\sim \text{Normal}(1, 100)^{+}\\
\sigma &\sim \text{Normal}(0, 10)^{+}\\
\theta &\leftarrow \frac{\sum \theta_s N_s}{N}
\end{align}
The call below evaluates the `allowances` sample using a Bayesian multilevel stratified evaluation procedure, in which the stratum estimates are poststratified to arrive at the population estimate.
```{r}
set.seed(1)
result_dnpb <- evaluation(
materiality = 0.05,
method = "binomial",
data = allowances,
N.units = N.units,
values = "bookValue",
values.audit = "auditValue",
strata = "branch",
times = "times",
alternative = "two.sided",
pooling = "partial",
prior = TRUE
)
result_dnpb
```
The output shows that the estimate of the misstatement in the population is 16.59 percent, with the 95 percent credible interval ranging from 15.72 percent to 17.57 percent. The precision of the population estimate is 1.85 percent, which is substantially lower than that of the classical approach. The stratum estimates can be seen in the output of the `summary()` function and are visualized in @fig-stratified-pp-estimates below.
```{r label="fig-stratified-pp-estimates", fig.cap="Estimates of the population and stratum misstatement under the partial pooling model."}
plot(result_dnpb, type = "estimates")
```
The prior and posterior distribution for the population misstatement can be requested via the `plot()` function, see @fig-stratified-pp-posterior.
```{r label="fig-stratified-pp-posterior", fig.cap="Prior and posterior distribution for the population misstatement under the partial pooling model."}
plot(result_dnpb, type = "posterior")
```
Stratified evaluation is a pivotal tool in an auditor's arsenal, allowing for the analysis of diverse populations with varying characteristics. By embracing the principles of no pooling, complete pooling, and partial pooling, auditors can tailor their evaluation strategies to balance independence and shared information, resulting in more accurate and reliable population estimates. The combination of these approaches with real-world data offers auditors a comprehensive toolkit to enhance the quality and efficiency of their evaluations.
## Practical Exercises
1. Evaluate a stratified sample of $n_s$ = [30, 40, 50] items containing $k_s$ = [0, 1, 2] misstatements. Use the classical approach.
::: {.content-visible when-format="html"}
<details>
<summary>Click to reveal answer</summary>
To evaluate a stratified sample using the classical approach, the `evaluation()` function can be used with the default arguments.
```{r}
evaluation(n = c(30, 40, 50), x = c(0, 1, 3), method = "binomial")
```
</details>
:::
::: {.content-visible when-format="pdf"}
\clearpage
## Answers to the Exercises
1. To evaluate a stratified sample using the classical approach, the `evaluation()` function can be used with the default arguments.
```{r}
evaluation(n = c(30, 40, 50), x = c(0, 1, 3), method = "binomial")
```
:::