
Enhancement possibility? --> Pipe cache #14

Open
rickhg12hs opened this issue Sep 11, 2020 · 3 comments
@rickhg12hs

There is some overhead to create pipes. For some use cases it may be advantageous to cache pipes or even partial pipes. Would it be possible to cache pipes automatically? ... or by some switch, etc.?

Here you can see the "penalty" associated with creating pipes.

$ ipython3
Python 3.7.5 (default, Oct 17 2019, 12:21:00) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.18.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from pipetools import pipe, X, foreach

In [2]: def my_func(count=10000, predef=False):
   ...:     if not predef:
   ...:         for k in range(count):
   ...:             a = range(10) > pipe | foreach(X**2) | sum
   ...:     else:
   ...:         my_pipe = pipe | foreach(X**2) | sum
   ...:         for k in range(count):
   ...:             a = range(10) > my_pipe
   ...:     return a
   ...: 

In [3]: %timeit my_func()
202 ms ± 8.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit my_func(predef=True)
59.5 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit for k in range(10000): a=sum([x**2 for x in range(10)])
29.9 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
@0101
Owner

0101 commented Sep 14, 2020

It might be possible. It still wouldn't be as fast as doing it manually, because there would be some overhead of creating a key and looking it up in the cache.

Also it's quite easy to do it explicitly, as in your predef example. If you know you'll be reusing a pipe a few thousand times, you might as well give it a name. So I'm not sure it's worth adding caching that would complicate the code and potentially introduce some tricky bugs.

@rickhg12hs
Author

It would be nice if there was a way to do incremental data analysis where intermediate results could be cached and checked/viewed. For example,

In [6]: big_data_list > pipe | transformation_that_takes_a_long_time
[1234.5678,
...
 9876.5432]

Then hit "up-arrow" on the keyboard and just add the next transformation/aggregation at the end of the previous line...

In [7]: big_data_list > pipe | transformation_that_takes_a_long_time | another_long_transformation
[0.098,
...
 0.987]

... where big_data_list > pipe | transformation_that_takes_a_long_time isn't recalculated.

Is there an easy way to do this without manually storing intermediate analysis results?

@0101
Owner

0101 commented Sep 15, 2020

Well that's another story - I thought you only wanted to cache the pipe itself, not the result of calling it. For this you'd have to make some sort of cached_pipe object that would behave that way, or control it by some flag (because that would not be a good default behavior). Also in case of big_data_list it might be tricky to create a good cache key. At the moment I don't see any easy way to accomplish this, but it could be a nice coding exercise. I'm open to ideas if you have any.
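For the result-caching idea, a hedged sketch using the standard library's `functools.lru_cache` (here `slow_transform` is a hypothetical stand-in for a slow step; nothing in this block is a pipetools API). It also shows the hashability constraint behind the "good cache key" concern: a list like `big_data_list` has to be frozen into a tuple before it can serve as a key.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def slow_transform(data):
    # Stand-in for transformation_that_takes_a_long_time.
    # lru_cache requires hashable arguments, so callers must pass
    # a tuple, not a list - this is the tricky cache-key problem.
    return tuple(x * 2 for x in data)

frozen = tuple([1, 2, 3])   # freeze the list into a hashable key
slow_transform(frozen)      # computed on the first call
slow_transform(frozen)      # served from the cache on the second
```

A `cached_pipe` object in pipetools would essentially need to wrap each step this way, plus decide how to freeze arbitrary inputs, which is why there is no obvious general solution.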

Also there's usually some sort of magic variable (in IPython it's _) which always holds the result of the previous expression. So you can do:

In [6]: big_data_list > pipe | transformation_that_takes_a_long_time
...
In [7]: _ > pipe | another_long_transformation # or another_long_transformation(_)
...

Which might be a decent workaround for you.
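Outside IPython, the same workaround is simply binding the intermediate result to a name. A small sketch with hypothetical stand-ins for the transformations in the example above:

```python
# Hypothetical stand-ins for the slow steps in the example.
def transformation_that_takes_a_long_time(data):
    return [x * 10.0 for x in data]

def another_long_transformation(data):
    return [x / 4.0 for x in data]

big_data_list = [1, 2, 3]

# Pay for the expensive step once and keep the result...
intermediate = transformation_that_takes_a_long_time(big_data_list)

# ...then iterate freely on the cheap follow-up step without recomputing it.
final = another_long_transformation(intermediate)
```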
