feat: Dataflow analysis framework #1476
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1476 +/- ##
==========================================
- Coverage 85.79% 85.68% -0.12%
==========================================
Files 135 142 +7
Lines 24687 25928 +1241
Branches 21623 22864 +1241
==========================================
+ Hits 21180 22216 +1036
- Misses 2405 2573 +168
- Partials 1102 1139 +37
* DFContext: reinstate fn hugr(), drop AsRef requirement (fixes stack overflow)
* test_tail_loop_iterates_twice: use tail_loop_builder_exts, fix from #1332(?)
* Fix only-one-DataflowContext asserts using Arc::ptr_eq
…text interprets load_constant
hugr-passes/src/dataflow/datalog.rs
Outdated
    Output,
}

ascent::ascent! {
It'd be helpful to have documentation on the relations, especially since the fields aren't named.
Yes, I wondered if I should do that, then was lazy. Thanks for keeping me honest :). I've kept these terse in format, but I hope the meaning is clear; shout if you think it's worth spending more space.
hugr-passes/src/dataflow/datalog.rs
Outdated
ascent::ascent! {
    pub(super) struct AscentProgram<V: AbstractValue, C: DFContext<V>>;
    relation context(C);
There are a lot of copies of C: one for every element of every relation, and then again in the indexes. Even when C is just an Arc, that presumably adds up to a substantial overhead. This could be addressed by using ascent_run in a function where the context is in scope. We could then also avoid the hacky Hash implementations on contexts.
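To make the cost concrete, here is a std-only Rust sketch (not ascent code). Ctx, RowWithCtx and RowWithoutCtx are hypothetical stand-ins, not hugr-passes types. Putting an Arc of the context into every relation tuple adds a pointer to each row and one reference-count increment per copy, whereas capturing the context from the enclosing scope, as ascent_run permits, leaves the tuples context-free.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a dataflow context; not the real DFContext trait.
pub struct Ctx {
    pub name: String,
}

/// A relation row that carries the context in every tuple.
pub type RowWithCtx = (Arc<Ctx>, u32, u32);

/// The same row when the context is captured from the enclosing scope instead.
pub type RowWithoutCtx = (u32, u32);

/// Builds `n` rows, each holding its own clone of the Arc (one refcount bump per row).
pub fn rows_with_ctx(ctx: &Arc<Ctx>, n: u32) -> Vec<RowWithCtx> {
    (0..n).map(|i| (Arc::clone(ctx), i, i + 1)).collect()
}

/// Builds `n` rows without any context; code that needs the context borrows it from scope.
pub fn rows_without_ctx(n: u32) -> Vec<RowWithoutCtx> {
    (0..n).map(|i| (i, i + 1)).collect()
}

fn main() {
    let ctx = Arc::new(Ctx { name: "example".to_string() });
    let with = rows_with_ctx(&ctx, 1000);
    let without = rows_without_ctx(1000);
    // One reference from `ctx` itself plus one per row.
    println!("{}: strong_count = {}", ctx.name, Arc::strong_count(&ctx)); // example: strong_count = 1001
    println!(
        "{} rows, {} bytes each with ctx vs {} rows, {} bytes each without",
        with.len(),
        std::mem::size_of::<RowWithCtx>(),
        without.len(),
        std::mem::size_of::<RowWithoutCtx>()
    );
}
```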
Yup. My bad here: I never looked at the top-level doc for ascent to see ascent_run. Thanks :-), that is about 100x better. (It led to moving a few things around, and separating DFContext from HugrView.)
hugr-passes/src/dataflow/datalog.rs
Outdated
io_node(c, pred, out_n, IO::Output),
_cfg_succ_dest(c, cfg, succ, dest),
node_in_value_row(c, out_n, out_in_row),
if let Some(fields) = out_in_row.unpack_first(succ_n, df_block.sum_rows.get(succ_n).unwrap().len()),
This grabs the row of input values for the node and only looks at the first. It could get the first value directly by using in_wire_value. There are some other places where this is done as well.
unpack_first doesn't only look at the first: it unpacks the first, but then appends the rest too. (Roughly, this is the row-level transformation that happens when you enter a conditional, end a loop iteration, or go down a control-flow edge.)
I've updated the doc on unpack_first, but perhaps I should rename it. Any other ideas on how to make this clearer?
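A minimal sketch of that row-level transformation, assuming a much simpler value type than the real one: Val and this free-standing unpack_first are hypothetical. Unlike hugr-passes' PartialValue, the first element here is a sum with a single known tag rather than a lattice value that may be one of several variants.

```rust
/// Hypothetical, simplified value: a sum with a known tag and fields, or a leaf.
#[derive(Clone, Debug, PartialEq)]
pub enum Val {
    Sum(usize, Vec<Val>),
    Leaf(i64),
}

/// Unpacks the first element of `row` as variant `tag` (which must have `len` fields)
/// and appends the remaining elements of the row after those fields.
/// Returns None if the first element cannot be that variant.
pub fn unpack_first(row: &[Val], tag: usize, len: usize) -> Option<Vec<Val>> {
    let (first, rest) = row.split_first()?;
    match first {
        Val::Sum(t, fields) if *t == tag && fields.len() == len => {
            let mut out = fields.clone();
            out.extend_from_slice(rest);
            Some(out)
        }
        _ => None,
    }
}

fn main() {
    // E.g. the row reaching a conditional: a predicate (tag 1, two fields) plus one other input.
    let row = vec![Val::Sum(1, vec![Val::Leaf(10), Val::Leaf(20)]), Val::Leaf(30)];
    // Case 1 sees the two unpacked fields followed by the passed-through input.
    println!("{:?}", unpack_first(&row, 1, 2)); // Some([Leaf(10), Leaf(20), Leaf(30)])
    // Case 0 is unreachable for this row.
    println!("{:?}", unpack_first(&row, 0, 2)); // None
}
```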
hugr-passes/src/dataflow/datalog.rs
Outdated
lattice out_wire_value(C, Node, OutgoingPort, PV<V>);
lattice in_wire_value(C, Node, IncomingPort, PV<V>);
lattice node_in_value_row(C, Node, ValueRow<V>);
For an input port connected directly to an output port, this stores three copies of the value, which ascent then needs to keep in sync with lattice operations. We could be more economical like this: let a "link" refer to a group of ports that are all connected. We can then store the association of node and port to link, and the value of the link. That would reduce the number of times that values are copied.
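A std-only sketch of the suggested representation, under assumptions: nodes and ports are plain integers, a link is identified by its source (node, outgoing port), and Flat is a hypothetical flat lattice over i64 rather than PV<V>. Each value is stored once per link, so a single join is seen by every connected input port.

```rust
use std::collections::HashMap;

/// Hypothetical flat lattice: Bottom < Known(v) < Top.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Flat {
    Bottom,
    Known(i64),
    Top,
}

impl Flat {
    /// Least upper bound of two flat-lattice elements.
    pub fn join(self, other: Flat) -> Flat {
        match (self, other) {
            (Flat::Bottom, x) | (x, Flat::Bottom) => x,
            (Flat::Known(a), Flat::Known(b)) if a == b => Flat::Known(a),
            _ => Flat::Top,
        }
    }
}

/// A link is identified by its source: (node, outgoing port).
pub type Link = (u32, u32);

#[derive(Default)]
pub struct LinkStore {
    /// (node, incoming port) -> the link feeding that port.
    in_wire_link: HashMap<(u32, u32), Link>,
    /// One value per link, shared by all ports on it.
    link_values: HashMap<Link, Flat>,
}

impl LinkStore {
    /// Records an edge from (src, out_port) to (dst, in_port).
    pub fn connect(&mut self, src: u32, out_port: u32, dst: u32, in_port: u32) {
        self.in_wire_link.insert((dst, in_port), (src, out_port));
    }

    /// Joins `v` into the value of the link leaving (node, out_port).
    pub fn join_out(&mut self, node: u32, out_port: u32, v: Flat) {
        let slot = self.link_values.entry((node, out_port)).or_insert(Flat::Bottom);
        *slot = slot.join(v);
    }

    /// Reads the value arriving at (node, in_port) via its link (Bottom if none yet).
    pub fn in_value(&self, node: u32, in_port: u32) -> Flat {
        self.in_wire_link
            .get(&(node, in_port))
            .and_then(|l| self.link_values.get(l))
            .copied()
            .unwrap_or(Flat::Bottom)
    }
}

fn main() {
    let mut store = LinkStore::default();
    // Node 0's output 0 feeds input 0 of both node 1 and node 2.
    store.connect(0, 0, 1, 0);
    store.connect(0, 0, 2, 0);
    store.join_out(0, 0, Flat::Known(7));
    // Both consumers see the single stored value.
    println!("{:?} {:?}", store.in_value(1, 0), store.in_value(2, 0)); // Known(7) Known(7)
}
```

The trade-off is the one raised in the reply below: values are copied less often, at the cost of an extra lookup through the port-to-link map.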
I think this is worth noting as a future optimization but not doing yet / until benchmarks demand, as I think the extra level of indirection will reduce clarity. But to clarify (as I haven't written the comment yet), something like...
relation link_values(Link, PV<V>);
relation out_wire_link(Node, OutgoingPort, Link);
relation in_wire_link(Node, IncomingPort, Link);
relation node_in_value_row(Node, Vec<Link>);
the last three are fairly trivial, modulo the definition of Link (possibly the topologically-least (Node, In/Out, PortIndex)?). But then, each (say) out_wire_value(n, OutgoingPort::from(p.index()), v) <-- .....
becomes
link_values(lnk, v) <-- .....,
out_wire_link(n, OutgoingPort::from(p.index()), lnk)
?
That didn't seem too bad, hmmm. I can add a comment to that effect, or make a follow-up PR that I'd keep separate (in case we ever need to revert for debugging, say) if you think that's worth it?
Ok, so I had a bit more of a go. Defining a Link as a (Node, OutgoingPort) (it'd be better to use a Wire but there), I defined:
relation node_in_value_preds(Node, Vec<(Node, OutgoingPort)>);
which could be filled straightforwardly from the edges of the graph (an element being a node, and then the source of each of said node's inputs). The best I could come up with for a standard-control-flow "untuple the first element and append the rest" was then....
for (succ_n, succ) in hugr.output_neighbours(*pred).enumerate(),
output_child(pred, out_n),
_cfg_succ_dest(cfg, succ, dest),
- node_in_value_row(out_n, out_in_row),
- if let Some(fields) = out_in_row.unpack_first(succ_n, df_block.sum_rows.get(succ_n).unwrap().len()),
- for (out_p, v) in fields.enumerate();
+ node_in_value_preds(out_n, out_in_row),
+ for (idx, (out_node, out_port)) in out_in_row.iter().enumerate(),
+ out_wire_value(out_node, out_port, v1),
+ let first_tuple_len = df_block.sum_rows.get(succ_n).unwrap().len(),
+ for (out_p, v) in if idx==0 {
+ v1.variant_values(succ_n, first_tuple_len).unwrap_or_default()
+ .into_iter().enumerate().collect()
+ } else {
+ vec![(idx - 1 + first_tuple_len, v1.clone())]
+ };
which, unless there's some cunning way to make datalog procedures that I'm missing, would be needed about 5 times (bad, but not terrible). However, I got even more stuck on the call to propagate_leaf_op, where it's not sufficient to define in-values to the leaf op one IncomingPort at a time: we need to reassemble them into a Vec to pass the whole lot together to propagate_leaf_op. AFAICS this would require defining a custom ascent aggregator, and at that point I'm thinking: well, we could, but no, let's not do this now, and let's hope we never have to.
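For reference, what such an aggregator would need to compute can be sketched outside ascent in plain Rust. assemble_rows and the Option<i64> stand-in for PV<V> are hypothetical, not part of ascent or hugr-passes; the point is only that per-port facts must be grouped by node and reassembled into a port-indexed row before a whole-row function like propagate_leaf_op can be called.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: groups per-port facts (node, port, value) into one row per node,
/// indexed by port. `arity` gives each node's number of input ports; ports with no fact
/// yet stay `None`, standing in for the lattice's bottom element. Assumes at most one
/// fact per (node, port), as a lattice relation provides after joining.
pub fn assemble_rows(
    facts: &[(u32, usize, i64)],
    arity: &BTreeMap<u32, usize>,
) -> BTreeMap<u32, Vec<Option<i64>>> {
    let mut rows: BTreeMap<u32, Vec<Option<i64>>> = arity
        .iter()
        .map(|(&node, &n)| (node, vec![None; n]))
        .collect();
    for &(node, port, value) in facts {
        // Facts for unknown nodes or out-of-range ports are ignored.
        if let Some(slot) = rows.get_mut(&node).and_then(|row| row.get_mut(port)) {
            *slot = Some(value);
        }
    }
    rows
}

fn main() {
    // Node 5 has three inputs (port 1 not yet known); node 6 has one.
    let arity: BTreeMap<u32, usize> = BTreeMap::from([(5, 3), (6, 1)]);
    let facts: [(u32, usize, i64); 3] = [(5, 0, 10), (5, 2, 30), (6, 0, 1)];
    let rows = assemble_rows(&facts, &arity);
    println!("{:?}", rows); // {5: [Some(10), None, Some(30)], 6: [Some(1)]}
}
```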
Intended as a development of #1157, with significant changes:
* Constant-folding and ValueHandle are now stripped out; these will follow in a second PR.
* Everything is now in hugr-passes.
* The underlying domain of values is abstracted over a trait AbstractValue (ValueHandle will implement this), which represents non-Sum values.
* datalog uses PartialValue wrapped around the AbstractValue to represent (partial) Sums and make it into a BoundedLattice.
* The old PV is gone (PartialValue directly implements BoundedLattice).
* Interpretation of leaf (extension) ops is handled by the DFContext trait, although MakeTuple and Untuple are handled by the framework (really, prelude MakeTuple is just core Tag, and Untuple is a single-Case Conditional with passthrough wires). The framework handles routing of sums through these ops and all containers, and also loading constants (with the DFContext handling non-Sum leaf Values).
* Various refactoring of handling values (inc. in datalog): variant_values + as_sum + more use of rows rather than indexing (this got rid of a bunch of unwraps and so on), and significant refactoring of join/meet (with no unsafe).
* I've managed to refactor tests not to use ValueHandle etc., as they only deal with sum/loop/conditional routing after all. dataflow/test.rs uses about the simplest possible TestContext, which provides zero information after any leaf op, so we only get the framework-provided handling of Tag/MakeTuple/etc.

propolutate_out_wires should either be renamed, or I'm even wondering if we can kill it (or make it private, i.e. only for tests) by adding a method to set root-node inputs (i.e. outputs of the appropriate child Input node, or grandchild for a CFG, though that raises issues if the Entry block has actual predecessors, i.e. cycles). Reworking this to set input wires rather than output wires would have some benefits here. Indeed, the interface might be to pass input PartialValues into Machine::run.
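A heavily simplified, std-only sketch of the value representation described in the list above. The AbstractValue trait with a single join method, this PartialValue enum, and its join and variant_values rules are assumptions for illustration; the real hugr-passes types (including BoundedLattice and meet) differ in detail. Possible sum variants are tracked per tag, so joining sums with different tags yields a partial sum that may be either, and each variant's fields can still be routed.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for AbstractValue: the domain of non-Sum values.
pub trait AbstractValue: Clone + PartialEq + std::fmt::Debug {
    /// Least upper bound of two values, or None if it is not representable (i.e. Top).
    fn join(&self, other: &Self) -> Option<Self>;
}

/// Simplified PartialValue: Bottom below everything, Top above everything.
#[derive(Clone, Debug, PartialEq)]
pub enum PartialValue<V> {
    Bottom,
    Value(V),
    /// Each possible tag, with the (partial) values of that variant's fields.
    PartialSum(BTreeMap<usize, Vec<PartialValue<V>>>),
    Top,
}

impl<V: AbstractValue> PartialValue<V> {
    /// A sum known to have one particular tag, e.g. the result of a Tag or MakeTuple op.
    pub fn variant(tag: usize, fields: Vec<Self>) -> Self {
        PartialValue::PartialSum(BTreeMap::from([(tag, fields)]))
    }

    /// Join (least upper bound) in this simplified lattice.
    pub fn join(&self, other: &Self) -> Self {
        match (self, other) {
            (PartialValue::Bottom, x) | (x, PartialValue::Bottom) => x.clone(),
            (PartialValue::Top, _) | (_, PartialValue::Top) => PartialValue::Top,
            (PartialValue::Value(a), PartialValue::Value(b)) => {
                a.join(b).map_or(PartialValue::Top, PartialValue::Value)
            }
            (PartialValue::PartialSum(a), PartialValue::PartialSum(b)) => {
                let mut merged = a.clone();
                for (tag, fields) in b {
                    match merged.get_mut(tag) {
                        None => {
                            merged.insert(*tag, fields.clone());
                        }
                        // Same tag but a different arity: inconsistent, so give up.
                        Some(existing) if existing.len() != fields.len() => {
                            return PartialValue::Top;
                        }
                        // Same tag: join the fields pointwise.
                        Some(existing) => {
                            for (e, f) in existing.iter_mut().zip(fields) {
                                *e = e.join(f);
                            }
                        }
                    }
                }
                PartialValue::PartialSum(merged)
            }
            // A non-Sum value and a sum have no common upper bound below Top.
            _ => PartialValue::Top,
        }
    }

    /// Field values if this may be variant `tag` with `len` fields (cf. variant_values).
    pub fn variant_values(&self, tag: usize, len: usize) -> Option<Vec<Self>> {
        match self {
            PartialValue::Top => Some(vec![PartialValue::Top; len]),
            PartialValue::PartialSum(m) => m.get(&tag).filter(|f| f.len() == len).cloned(),
            PartialValue::Bottom | PartialValue::Value(_) => None,
        }
    }
}

/// Example non-Sum domain: a known integer constant.
#[derive(Clone, Debug, PartialEq)]
pub struct Const(pub i64);

impl AbstractValue for Const {
    fn join(&self, other: &Self) -> Option<Self> {
        (self == other).then(|| self.clone())
    }
}

fn main() {
    type PV = PartialValue<Const>;
    let left = PV::variant(0, vec![PV::Value(Const(1))]);
    let right = PV::variant(1, vec![]);
    // The join may be either variant; each variant keeps its own field values.
    let joined = left.join(&right);
    println!("{:?}", joined.variant_values(0, 1)); // Some([Value(Const(1))])
    println!("{:?}", joined.variant_values(1, 0)); // Some([])
}
```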