Exploiting segments in a way that scales #558

Open
jl5000 opened this issue Aug 5, 2024 · 2 comments

@jl5000

jl5000 commented Aug 5, 2024

Related to #451, the `segments` parameter is a powerful way of grouping and multiplying your validation tests. Whilst a custom label makes it possible to see the segments at a glance, heavy use of segments can produce very large validation reports that are difficult to navigate and parse.

As a starting point, it would be useful to set a global segmentation scheme in `create_agent()` (much like `actions`) that then applies to every validation function. A way of overriding this for specific validation checks (e.g. setting it to `NULL`) would probably be needed too.

When it comes to organising this in the report, would it be possible to split the HTML output into sections for each segment (or even better, tabs)? My use case is monthly reports spanning years, and I want to be able to see the issues within specific months.

I have a feeling I am just scratching the surface of what could be done here, and would be keen to hear others' thoughts.

@yjunechoe
Collaborator

yjunechoe commented Aug 6, 2024

This is a creative idea. My personal view is that such scaling should already be possible (albeit not immediately obvious or convenient) without baking additional behaviors into `create_agent()` and `get_agent_report()`; I'd rather avoid that, as these functions are already complex. See also #489.

Given a larger "global" data frame and an agent that applies to it, you could start by splitting the data by segment yourself, then iterate over those splits to produce segment-level agents by re-interrogating the global agent on each split with the `set_tbl()` + `interrogate()` combo. This gives you a list of segment-level agents that share the same validation steps. From there, you can generate a report for each agent in the list and arrange the reports in HTML using {htmltools}.

An example:

```r
library(pointblank)

# ---- The familiar interface

# Data frame with segments (the three month names are recycled over 300 rows)
set.seed(1)
df <- data.frame(segment = forcats::fct_inorder(month.name[1:3]), val = rnorm(300))

# Global agent
agent <- df %>%
  create_agent() %>%
  col_vals_gt(val, 0)

# ---- Data split strategy

# Split data by segment to iterate over
df_split <- split(df, ~ segment)

# Re-interrogate the `agent` on each data split
segment_agents <- lapply(df_split, function(x) {
  agent %>%
    set_tbl(x, label = unique(x$segment)) %>%
    interrogate()
})

# Grab segment-level agent reports
segment_agent_reports <- lapply(segment_agents, get_agent_report)

# A simple rendering of reports inside a single div
do.call(htmltools::div, unname(segment_agent_reports)) %>%
  print(browse = TRUE)
```

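The single `div()` simply stacks the reports one after another. To get closer to the per-segment sections asked for in the original post, each report could be wrapped in its own `<section>` headed by the segment name. This is a minimal sketch: `arrange_segment_sections()` is a hypothetical helper (not part of pointblank or htmltools), and it assumes the report list is named by segment, which is what `lapply()` over the output of `split()` gives you.

```r
library(htmltools)

# Hypothetical helper, not part of pointblank or htmltools.
# `reports` is assumed to be a *named* list of HTML-renderable objects
# (e.g. agent reports), where each name is the segment value.
arrange_segment_sections <- function(reports) {
  # Wrap each report in a <section> with the segment name as an <h2> heading
  sections <- Map(
    function(name, report) {
      tags$section(
        h2(name),
        report
      )
    },
    names(reports),
    reports
  )
  # Drop names so the sections are passed to div() as plain children
  do.call(div, unname(sections))
}
```

Usage would mirror the original rendering step, e.g. `arrange_segment_sections(segment_agent_reports) %>% print(browse = TRUE)`. Real tabs would need an extra UI layer on top of this (for example {bslib}), which this sketch does not attempt.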

@jl5000
Author

jl5000 commented Aug 6, 2024

Yes, I've been doing something similar to this, although your method is probably more elegant! Perhaps then it's not worth complicating the functions further.
