Exploiting `segments` in a way that scales #558

jl5000 · 2024-08-05T15:53:39Z

Related to #451, the use of the segments parameter is a powerful way of grouping and multiplying your validation tests. Whilst a custom label is useful for seeing the segments at a glance, heavy use of segments can produce very large validation reports that are difficult to navigate and parse.
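
For readers unfamiliar with the parameter, here is a minimal sketch of per-step segmentation as it works today. The toy data frame and its column names are made up for illustration; `segments = vars(...)` is the documented form for segmenting on every unique value of a column.

```r
library(pointblank)

# Toy data (illustrative only): three months, two rows each
df <- data.frame(
  segment = rep(month.name[1:3], each = 2),
  val = c(1, -1, 2, 3, -4, 5)
)

# One call to col_vals_gt() is expanded into one validation step
# per unique value of `segment`
agent <- create_agent(tbl = df) %>%
  col_vals_gt(columns = vars(val), value = 0, segments = vars(segment)) %>%
  interrogate()
```

Each validation function currently needs its own `segments` argument, which is what motivates the request for a global setting.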

As a starting point, it would be useful to set a global segmentation scheme in create_agent() (much like actions), which will then apply to every validation function. Perhaps a way of overriding this for specific validation checks (e.g. setting to NULL) would be needed too.

When it comes to organising this in the report, would it be possible to split the HTML output into sections for each segment (or even better, tabs)? My use case is monthly reports spanning years, and I want to be able to see the issues within specific months.

I have a feeling I am just scratching the surface of what could be done here, and would be keen to hear others' thoughts.


yjunechoe · 2024-08-06T01:21:00Z

This is a creative idea, and my personal view is that such scaling should be possible (albeit not immediately obvious or convenient) without baking additional behaviors into create_agent() and get_agent_report(). I'd prefer to avoid that, as these functions are already complex as is. See also #489.

Given a larger "global" data frame and an agent that applies to it, you could start by splitting the data by segment yourself, then iterate over those splits to produce segment-level agents by re-interrogating the global agent on each split using the set_tbl() + interrogate() combo. This gives you a list of segment-level agents that all run the same validations. From there, you can generate a report for each agent in the list and arrange the reports in HTML using {htmltools}.

An example:

library(pointblank)

# ---- The familiar interface

# Data frame with segments
set.seed(1)
df <- data.frame(segment = forcats::fct_inorder(month.name[1:3]), val = rnorm(300))

# Global agent
agent <- df %>% 
  create_agent() %>% 
  col_vals_gt(val, 0)

# ---- Data split strategy

# Split data by segment to iterate over
df_split <- split(df, ~ segment)

# Re-interrogate the `agent` on each data split
segment_agents <- lapply(df_split, function(x) {
  agent %>% 
    set_tbl(x, label = unique(x$segment)) %>% 
    interrogate()
})

# Grab segment-level agent reports
segment_agent_reports <- lapply(segment_agents, get_agent_report)

# A simple rendering of reports inside a single div
do.call(htmltools::div, unname(segment_agent_reports)) %>% 
  print(browse = TRUE)
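
To get closer to the per-segment sections asked about in the original post, one option is to wrap each report in a collapsible HTML `<details>` element. This is only a sketch: `section_reports()` is a hypothetical helper, not part of pointblank, and the demo uses stand-in paragraphs so that it runs with {htmltools} alone.

```r
library(htmltools)

# Hypothetical helper (not a pointblank function): wraps each element of a
# named list in a collapsible <details> section titled with its name
section_reports <- function(reports) {
  sections <- lapply(names(reports), function(nm) {
    tags$details(
      tags$summary(nm),
      reports[[nm]]
    )
  })
  tagList(sections)
}

# Stand-in content so the sketch runs without pointblank
demo_reports <- list(
  January = p("January report goes here"),
  February = p("February report goes here")
)

page <- section_reports(demo_reports)
html <- as.character(page)
```

With the objects from the example above, `section_reports(segment_agent_reports)` should work the same way, assuming the reports render inside htmltools tags as they do in the single-div version. True tabs would need extra CSS or JavaScript, which is outside this sketch.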

jl5000 · 2024-08-06T21:11:20Z

Yes, I've been doing something similar to this, although your method is probably more elegant! Perhaps then it's not worth complicating the functions further.

jl5000 added the Type: ★ Enhancement label Aug 5, 2024

jl5000 assigned rich-iannone Aug 5, 2024

yjunechoe added the Type: ⁇ Question label Sep 10, 2024
