JPEG parser #5
Comments
and formatted more nicely:

class FastReader(Construct):
    def _parse(self, stream, context):
        return stream.read()

    def _build(self, obj, stream, context):
        stream.write(obj)

SegBody = Struct(None,
    UBInt16('size'),
    Field('data', lambda ctx: ctx['size'] - 2),
)

Seg = Struct('seg',
    Literal('\xff'),
    Byte('kind'),
    Switch('body', lambda c: c['kind'],
        {
            SOS: FastReader('data'),
        },
        default=Embed(SegBody),
    )
)

JPEG = Struct('jpeg',
    Literal('\xff\xd8'),
    GreedyRange(Seg),
)
Hi, I'm not sure about the FastReader, as I still don't grok that section of Construct yet. There is a PascalString, in construct.macros, which takes a length_field as a kwarg. An example usage:
Thanks for your comments. Let me know if you have any patches you wish to contribute.
Hi - the issue with the PascalString is that its length field doesn't count the bytes that make up the length field itself. In several protocols we get fields like 0x0004babe, where the length (4) includes the first 2 bytes.
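To make the 0x0004babe case concrete, here is a minimal sketch in plain Python 3 using only the standard library, not Construct itself. The names `parse_inclusive`, `build_inclusive` and `LENGTH_FIELD_SIZE` are made up for illustration, and the 16-bit big-endian prefix is an assumption taken from the example above.

```python
import io
import struct

LENGTH_FIELD_SIZE = 2  # assumption: 16-bit big-endian length prefix


def parse_inclusive(stream):
    """Read one field whose length prefix also counts its own two bytes."""
    (total,) = struct.unpack(">H", stream.read(LENGTH_FIELD_SIZE))
    if total < LENGTH_FIELD_SIZE:
        raise ValueError("inclusive length is smaller than the prefix itself")
    payload = stream.read(total - LENGTH_FIELD_SIZE)
    if len(payload) != total - LENGTH_FIELD_SIZE:
        raise ValueError("truncated field")
    return payload


def build_inclusive(payload):
    """Inverse of parse_inclusive: add the prefix size back into the length."""
    return struct.pack(">H", len(payload) + LENGTH_FIELD_SIZE) + payload
```

Parsing subtracts the prefix size and building adds it back, so the two directions stay symmetric.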
@akvadrako: this could be done like so
on the other hand, your straightforward solution is better. as per your
you would be able to build anything you want, but you'll never be able to parse it back.
I suggested a variant to PascalString because length+data is common in network protocols and apparently JPEG too. FastReader is the best we can do with construct's internals. Your example wouldn't work with RepeatUntil and Range either. I'm not sure it should - since constructs need to know about future constructs and you'll get ambiguity:
Probably better to make a FastReadUntil('BOUNDARY').
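As a rough illustration of the read-until-boundary idea, the sketch below uses plain Python 3 on a seekable stream rather than a Construct class. `read_until` and `chunk_size` are hypothetical names, and this is only one way such a reader might behave. Because the boundary is given explicitly, the reader does not need to know anything about the constructs that follow it.

```python
import io


def read_until(stream, boundary, chunk_size=4096):
    """Return the bytes before `boundary`; leave the stream just past it."""
    buf = b""
    while True:
        idx = buf.find(boundary)
        if idx != -1:
            # We may have read past the boundary; step back so the stream
            # is positioned immediately after it.
            stream.seek(idx + len(boundary) - len(buf), io.SEEK_CUR)
            return buf[:idx]
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("boundary not found before EOF")
        buf += chunk
```

Reading in chunks keeps it close to a single `stream.read()` in cost. Re-scanning the whole buffer on every pass is kept for simplicity and could be tightened.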
Length + data is perfectly serviced by PascalString; the case where the length of the length is included in the length is actually rather uncommon though. Maybe a new String subclass is needed for it. As far as "fast" reading, why not examine other optimizations first? There are optimization opportunities in Construct core, I think.
@MostAwesomeDude: no need to subclass, it would be much simpler to just define an InclusivePascalString "macro" that takes care of subtracting/adding the size of the length field from the length.
@akvadrako: your "fast" reader isn't any faster than the plain old Field except that it doesn't check the length. since this greedy construct can only appear once at the end of a data structure, i don't suppose it would make much difference in terms of speed. also, my tests back in the day showed that psyco can speed up parsing tenfold. on the other hand, as you said, it poses a problem of breaking the symmetry between parsing and building... but i think it's inherent to the pattern and there isn't any real solution.
it's much faster - construct is unusable for parsing JPEG images without it - where 99% of the data is an unbounded blob at the end of the file.
if you're using
what do you mean, though, that 99% of the file is a blob? doesn't it have an internal structure? if so, i assume you have no real interest in it, so you may want to use OnDemand.
Yes, you are correct. OnDemand doesn't help though, because it requires a known length.
well, i just had an idea: assuming you're working on a file/stringIO, you can write a construct that simply returns the remaining length till EOF. e.g.
and then you could combine it with
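The remaining-length idea can be sketched in plain Python 3, assuming a seekable stream such as a file or BytesIO. `remaining_length` is a hypothetical helper, not part of Construct.

```python
import io
import os


def remaining_length(stream):
    """Number of bytes between the current position and EOF."""
    here = stream.tell()
    stream.seek(0, os.SEEK_END)   # jump to the end to learn the total size
    end = stream.tell()
    stream.seek(here)             # restore the original position
    return end - here
```

The result is a plain integer, which is what a length-dependent field needs. It does not work on non-seekable inputs such as sockets or pipes.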
This parses EXIF and JFIF files. This is my first construct, and while making it I noticed I was missing a couple of things:
class FastReader(Construct):
    def _parse(self, stream, context):
        return stream.read()

SegBody = Struct(None,
    UBInt16('size'),
    Field('data', lambda ctx: ctx['size'] - 2),
)

Seg = Struct('seg',
    Literal('\xff'),
    Byte('kind'),
    Switch('body', lambda c: c['kind'],
        {
            SOS: FastReader('data'),
        },
        default=Embed(SegBody),
    )
)

JPEG = Struct('jpeg',
    Literal('\xff\xd8'),
    GreedyRange(Seg),
)
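
For readers without Construct at hand, the same layout can be walked in plain Python 3. This is only a sketch of what the declaration above describes, not a replacement for it. `parse_jpeg` is a made-up name, and `SOS = 0xDA` is assumed to be the value behind the `SOS` constant used above (it is the JPEG start-of-scan marker code).

```python
import io
import struct

SOS = 0xDA  # assumption: start-of-scan marker code behind the SOS constant


def parse_jpeg(stream):
    """Return a list of (kind, data) pairs; the SOS entry holds the rest."""
    if stream.read(2) != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    segments = []
    while True:
        head = stream.read(2)
        if len(head) < 2:
            break                      # ran out of input
        if head[0] != 0xFF:
            raise ValueError("expected a 0xFF marker prefix")
        kind = head[1]
        if kind == SOS:
            # Like FastReader: take everything up to EOF in one read.
            segments.append((kind, stream.read()))
            break
        # The size field counts its own two bytes, hence the "- 2".
        (size,) = struct.unpack(">H", stream.read(2))
        segments.append((kind, stream.read(size - 2)))
    return segments
```

As in the Construct version, everything from the SOS marker onward is kept as one opaque blob, so the scan header and the trailing end-of-image marker end up inside that blob.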