Expose unpack_stream on structures #28

Open
harlowja opened this issue Nov 26, 2015 · 6 comments

@harlowja

For unpacking a structure from a stream, it would be very much appreciated if an equivalent method were exposed on structure objects.

Currently all that exists is:

@six.add_metaclass(StructureMeta)
class Structure(object):
    ...

    def unpack(self, data, trailing=False):
        ...

But to unpack from a stream, something like the following is needed:

@six.add_metaclass(StructureMeta)
class Structure(object):
    ...

    def unpack(self, data, trailing=False):
        ...

    def unpack_stream(self, stream, trailing=False):
        ...
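
A hypothetical usage sketch of the proposed method (MyMessage, received_bytes, and the unpack_stream call itself are all illustrative; the method does not exist yet):

import io

# Illustrative only: how the proposed unpack_stream might be called if it
# were exposed on Structure. MyMessage stands in for a Structure subclass.
received_bytes = b"\x00\x05hello"   # raw bytes from some transport
stream = io.BytesIO(received_bytes)

msg = MyMessage()
msg.unpack_stream(stream)           # proposed API, not yet available
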
@posborne (Contributor)

Exposing this would be possible but I'm debating if it is a good idea. The current, canonical way to do this is to use StreamProtocolHandler (http://digidotcom.github.io/python-suitcase/latest/api.html#suitcase.protocol.StreamProtocolHandler). This was done for a few reasons:

  1. It removes the need for suitcase to worry about how to read bytes out of a stream. That isn't always as simple as calling read(): is the stream blocking or non-blocking? What do you do with errors? These concerns are handled in user code, which calls feed(data) (a sketch of this pattern follows the list). Internally, we have some support code we use for building TCP/serial clients/servers.
  2. Separating this logic from the Structure itself allows for other algorithms to be used for determining how to move through a stream to find the Structure. The current implementation has some features like leading Magic field sequences (as is typical of SOF bytes in serial protocols) that might not be wanted for every use case.
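
A minimal sketch of that feed-based pattern, assuming the StreamProtocolHandler(message_class, callback) constructor and the LengthField/VariableRawPayload fields from the linked API docs; the MyMessage structure and its fields are illustrative:

from suitcase.structure import Structure
from suitcase.fields import UBInt16, LengthField, VariableRawPayload
from suitcase.protocol import StreamProtocolHandler

# An illustrative message: a two-byte length prefix followed by a payload.
class MyMessage(Structure):
    length = LengthField(UBInt16())
    payload = VariableRawPayload(length)

def on_packet(packet):
    # Called by the handler each time a complete MyMessage has been parsed.
    print(packet.payload)

handler = StreamProtocolHandler(MyMessage, on_packet)

# Feed whatever bytes happen to be available; partial data is buffered
# internally until a complete structure can be parsed and dispatched.
handler.feed(b"\x00\x05hello")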

If you have a stream where you know all required bytes will be available (or which is blocking, and you expect failures to be handled acceptably well by suitcase), you can use Packer (http://digidotcom.github.io/python-suitcase/latest/api.html#suitcase.structure.Packer) directly, which implements unpack_stream:

Structure._packer.unpack_stream(stream)

This violates encapsulation, so if there is a use case, I think I would be inclined to expose unpack_stream on Structure directly. I'll need to think a bit more about how to work in the trailing field stuff that has been added recently. The current Packer.unpack_stream doesn't deal with that directly.
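
For instance, a hedged sketch of that direct path, assuming all of the bytes for one message are already buffered in memory (MyMessage as in the earlier sketch; _packer is the private internal referenced above):

import io

# All of the bytes for exactly one MyMessage, already read off the wire.
stream = io.BytesIO(b"\x00\x05hello")

msg = MyMessage()
# Private internal for now, as noted above; unpack_stream consumes only
# the bytes the structure's fields require from the stream.
msg._packer.unpack_stream(stream)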

@harlowja (Author)

I'm fine with exposing Structure._packer.unpack_stream(stream).

The reason is that I have a similar state machine in an open review, and suitcase (with an exposed Structure._packer) looks like it could do the same thing more elegantly, but my library needs to be able to do it in a streaming (or non-streaming) manner.

https://review.openstack.org/#/c/244376/20/taskflow/utils/executor_utils.py (see the Reader class there and its feed method). The features of that Reader class that I need/like are the following (a rough sketch of such a reader appears after this list):

  • Can stream endlessly (and dispatch when an object has been fully created).
  • Can stream up to a limit (e.g., a limit of 1 combined with a feed routine, as in the Channel class; see its _do_recv method in that same executor_utils.py file, which uses this to read field by field).
  • Exposes how many bytes are needed for the next read (so that with a blocking channel you can determine how many bytes to fetch).
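
A rough, hypothetical sketch of that kind of reader (the Reader, feed, and bytes_needed names mirror the description above, not the actual taskflow code; a fixed frame size is assumed for simplicity):

class Reader(object):
    """Hypothetical feed-style reader: buffer bytes, dispatch complete frames."""

    def __init__(self, dispatch_func, frame_size, limit=None):
        self._dispatch = dispatch_func   # called with each complete frame
        self._frame_size = frame_size    # fixed frame size, for simplicity
        self._limit = limit              # max frames to dispatch, or None
        self._buffer = b""
        self._dispatched = 0

    @property
    def bytes_needed(self):
        # How many more bytes the caller should fetch before the next
        # complete frame can be dispatched.
        return max(0, self._frame_size - len(self._buffer))

    def feed(self, data):
        self._buffer += data
        while len(self._buffer) >= self._frame_size:
            if self._limit is not None and self._dispatched >= self._limit:
                break
            frame = self._buffer[:self._frame_size]
            self._buffer = self._buffer[self._frame_size:]
            self._dispatch(frame)
            self._dispatched += 1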

@harlowja (Author)

The other example in that executor_utils.py file that might be useful is def handle_read(self), which uses asyncore (https://docs.python.org/2/library/asyncore.html) to get notified when data is available (the streaming case); a minimal sketch of that pattern is below.
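
A minimal sketch of that pattern, assuming a feed-style handler like the ones above (the ProtocolClient name is illustrative):

import asyncore

class ProtocolClient(asyncore.dispatcher):
    """Illustrative: push received bytes into a feed-style handler."""

    def __init__(self, sock, handler):
        asyncore.dispatcher.__init__(self, sock)
        self.handler = handler  # anything with a feed(bytes) method

    def handle_read(self):
        # Called by asyncore when data is available on the socket.
        data = self.recv(4096)
        if data:
            self.handler.feed(data)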

@harlowja (Author)

Btw, if you are going to recommend just using feed, there needs to be a way to know how many more bytes are needed to feed something. That is needed to, say, read from a socket and know how many bytes to read for feeding (so that the next object gets dispatched)...

That's also why the code linked above has:

  @property
  def bytes_needed(self):

@posborne (Contributor)

Thanks for the context @harlowja. That always helps. I'll probably move forward with exposing unpack_stream. The rest here is mostly bikeshedding.

Btw, if you are going to recommend just using feed, there needs to be a way to know how many more bytes are needed to feed something

Unfortunately, this is not possible in the general case for all Structures. Fields have a bytes_required that they must be able to provide before the field is read (so we can read the right number of bytes). Even then, however, an exception is made for greedy payloads that are bound by the size of the containing packet (take, for instance, a UDP datagram) or by some other sequence that marks the end of the structure (an EOF marker). An attempt could be made to return None when the length is not determinable from the available information, but I would need to be further convinced of the utility of such functionality.
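
To illustrate the two cases, a sketch; the exact field names (Payload, LengthField) and the greedy, no-length-provider form of Payload are assumptions about the suitcase field API:

from suitcase.structure import Structure
from suitcase.fields import UBInt16, LengthField, Payload

class LengthPrefixed(Structure):
    # Once the length field has been read, the bytes remaining for the
    # payload are known, so a "bytes needed" answer exists at every point.
    length = LengthField(UBInt16())
    payload = Payload(length)

class GreedyTail(Structure):
    # A greedy payload is bounded only by the containing packet (e.g. a
    # UDP datagram) or an EOF marker, so no up-front byte count exists.
    kind = UBInt16()
    payload = Payload()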

The idea with feed is that you would do non-blocking reads (as would be the case with asyncore) of whatever max chunk size makes sense for your application and feed all of those bytes into the parser and let it handle the rest. If you need to do synchronous IO, you can feed one byte at a time. The OS will probably buffer for you anyway so it's not all that bad (though there would definitely be some extra copies within suitcase at the moment [although for small string buffers in CPython, maybe not]).
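
A sketch of the synchronous variant, assuming a StreamProtocolHandler and MyMessage as in the earlier sketch, plus an illustrative blocking socket named sock:

# Blocking, synchronous variant: read one byte at a time from a blocking
# socket and feed it to the handler until a full packet is dispatched.
received = []
handler = StreamProtocolHandler(MyMessage, received.append)

while not received:
    byte = sock.recv(1)   # sock: an illustrative connected, blocking socket
    if not byte:
        break             # connection closed before a full packet arrived
    handler.feed(byte)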

@harlowja (Author)

"feed one byte at a time" :(
