Skip to content

Commit

Permalink
remove pointless SVPV mortal alloc in S_strip_spaces()/prototype parser
Browse files Browse the repository at this point in the history
Valid, parseable, and sane prototypes, are tiny in char len and often fit
on 1 hand. Original commit referred to mitigating sloppy XS code with
random white space. Majority of XS code will have perfect clean prototype
strings. Do not SV Alloc, PV Alloc, MEXTENDPUSH, memcpy(), and alot more
free()s in scope _dec(), for clean strings. Even for dirty but parsable
prototypes, they will be tiny. Therefore use a tiny stack buffer for
dirty semi-hot path to remove overhead. Fuzzing, junk, abuse, can OOM die
in newSV()/malloc() if needed, same as in prior version of the code.

Use newSV(len) and POK_off, SV head is private to us, and a waste to
bookkeep SVPV details. SAVEFREEPV() was not used, because previous code
did mortal, and not SAVEFREEPV(), so keep using mortal. One day after
testing by public, maybe SAVEFREEPV() is smarter choice here, but KISS
in this commit.

The size of the stack buffers should probably be 8 or 16 bytes to cover
legit protoype strings. I made the buffers larger, simply, because I can,
and there is no machine code size difference on x86/x64 between 16 and
the numbers picked.

The numbers are from 2 different binary analysis tools of perl541.dll on
x64 Windows, -O1, MSVC 2022. The numbers are "width" or "size" or
"overhead" in bytes of the C stack frames, of the 3 callers of
S_strip_spaces. My rational is, by keeping width of the C stack frame,
under 0xFF bytes, x86_op+stk_reg+imm8 encoding is emitted by CCs. Instead
of x86_op+stk_reg+imm32 encoding which is larger.

So the math is 0xFF-current_frame_size-(5*ptrs). -(5*ptrs) accounts for
future P5P C auto vars or changes to the C code of the 3 callers, and
whatever GCC vs Clang vs each CC build number uniqueness, so x86/x64 CCs
only use stk_reg+imm8_offset instructions and never resort to writing out
32b offsets in machine code. As a guesstimate "/2" the stack frame width
for i386 CPUs.

Perl_cv_ckproto_len_flags(), has 6 args, therefore its inelligible for
Win64's 4 register __fastcall ABI, and args 5 and 6, must be read off the
C stack per ABI. So even if small U8-U64 C auto vars, are at the "top" of
the C stack, and reached with +imm8 operands, obv the CC still has to
write 2 "lone" read(+imm32) ops to read arg 5 and 6. There are tricks to
optimize out +imm32 to reach incoming args, but thats for a CC vendor talk.

Anyways, 0x10/16 or 0x20/32 is the realistic buffer size, the higher
lengths here, are simply because, in theory, #1 avoid malloc() always,
#2 no perf, runtime, or machine code size diff between 0x20 and my numbers.
2 different tools were used, I picked the "larger" numbers C stack size
report number, to make the cleanedproto buffers even smaller, so this
"CC only uses +imm8 op to r/w C stack" optimization lasts for years, not
1 build number of GCC/VC/LLVM.

statistics

Perl_ck_entersub_args_proto 0x88/0x48
yyl_subproto 0x28/0x20
Perl_cv_ckproto_len_flags 0x68/0x30

S_strip_spaces() was added in

d16269d 6/24/2013 5:58:46 PM
Remove spaces from a (copy of) a proto when used. The logic that *CUT*
  • Loading branch information
bulk88 committed Oct 10, 2024
1 parent 82c4939 commit aa0a0e9
Show file tree
Hide file tree
Showing 3 changed files with 59 additions and 18 deletions.
50 changes: 36 additions & 14 deletions inline.h
Original file line number Diff line number Diff line change
Expand Up @@ -261,25 +261,47 @@ Perl_CvDEPTH(const CV * const sv)
Since we can't enforce the spacelessness at assignment time, this routine
provides a temporary copy at parse time with spaces removed.
I<orig> is the start of the original buffer, I<len> is the length of the
prototype and will be updated when this returns.
prototype and will be updated when this returns. I<buf_out> is an optional
small C stack buffer, large enough for len+1 bytes. If I<buf_out> is NULL,
a temporary self-freeing new buffer (mortal but subject to change), will be
malloc-ed and returned. I<buf_out> by design should be NULL for large, and
probably invalid, junk, fuzzing, or syntax error, prototype strings. Caller
should bounds check and compare C stack buffer I<buf_out> size vs prototype
len and pass NULL for I<buf_out> to signal that I<malloc> type memory must
be alloced if needed. Note, this function will return the I<orig> ptr, as an
optimization if the string is already clean of spaces.
*/

#ifdef PERL_CORE
PERL_STATIC_INLINE char *
S_strip_spaces(pTHX_ const char * orig, STRLEN * const len)
S_strip_spaces(pTHX_ char * buf_out, const char * orig, STRLEN * const ptolen)
{
SV * tmpsv;
char * tmps;
tmpsv = newSVpvn_flags(orig, *len, SVs_TEMP);
tmps = SvPVX(tmpsv);
while ((*len)--) {
if (!isSPACE(*orig))
*tmps++ = *orig;
orig++;
}
*tmps = '\0';
*len = tmps - SvPVX(tmpsv);
return SvPVX(tmpsv);
STRLEN len = *ptolen;
const char * orig_p = orig;
const char * orig_end = orig + len;

while (orig_p < orig_end) {
if(isSPACE(*orig_p)) { /* no memchr()! think memspn() */
char * tmps;
char * tmps_p;
if(buf_out)
tmps = buf_out;
else
tmps = SvPVX(sv_2mortal(newSV(len+1)));
tmps_p = tmps+(orig_p-orig);
Move(orig, tmps, (Size_t)(orig_p-orig), char);
while (orig_p < orig_end) {
if(!isSPACE(*orig_p))
*tmps_p++ = *orig_p;
orig_p++;
}
*tmps_p = '\0';
*ptolen = tmps_p - tmps;
return tmps;
}
orig_p++;
} /* ptolen untouched */
return (char *)orig;
}
#endif

Expand Down
20 changes: 17 additions & 3 deletions op.c
Original file line number Diff line number Diff line change
Expand Up @@ -9977,6 +9977,11 @@ void
Perl_cv_ckproto_len_flags(pTHX_ const CV *cv, const GV *gv, const char *p,
const STRLEN len, const U32 flags)
{
/* protos > ~8ch are likely junk. See git for why 0x68. */
struct { /* struct recovers 15-19 alignment bytes from C compiler */
char p [(0xff-(0x68/(PTRSIZE>=8?1:2))-(PTRSIZE*5))/2];
char cv [(0xff-(0x68/(PTRSIZE>=8?1:2))-(PTRSIZE*5))/2];
} cleanbuf;
SV *name = NULL, *msg;
const char * cvp = SvROK(cv)
? SvTYPE(SvRV_const(cv)) == SVt_PVCV
Expand All @@ -9994,8 +9999,12 @@ Perl_cv_ckproto_len_flags(pTHX_ const CV *cv, const GV *gv, const char *p,
return;

if (p && cvp) {
p = S_strip_spaces(aTHX_ p, &plen);
cvp = S_strip_spaces(aTHX_ cvp, &clen);
p = S_strip_spaces(aTHX_
(plen <= sizeof(cleanbuf.p)-1 ? cleanbuf.p : NULL),
p, &plen);
cvp = S_strip_spaces(aTHX_
(clen <= sizeof(cleanbuf.cv)-1 ? cleanbuf.cv : NULL),
cvp, &clen);
if ((flags & SVf_UTF8) == SvUTF8(cv)) {
if (plen == clen && memEQ(cvp, p, plen))
return;
Expand Down Expand Up @@ -14362,6 +14371,8 @@ OP *
Perl_ck_entersub_args_proto(pTHX_ OP *entersubop, GV *namegv, SV *protosv)
{
STRLEN proto_len;
/* protos > ~8ch are likely junk. See git for why 0x88. */
char protocleanbuf [0xff-(0x88/(PTRSIZE>=8?1:2))-(PTRSIZE*5)];
const char *proto, *proto_end;
OP *aop, *prev, *cvop, *parent;
int optional = 0;
Expand All @@ -14375,7 +14386,10 @@ Perl_ck_entersub_args_proto(pTHX_ OP *entersubop, GV *namegv, SV *protosv)
if (SvTYPE(protosv) == SVt_PVCV)
proto = CvPROTO(protosv), proto_len = CvPROTOLEN(protosv);
else proto = SvPV(protosv, proto_len);
proto = S_strip_spaces(aTHX_ proto, &proto_len);
proto = S_strip_spaces(aTHX_
(proto_len <= sizeof(protocleanbuf)-1
? protocleanbuf : NULL),
proto, &proto_len);
proto_end = proto + proto_len;
parent = entersubop;
aop = cUNOPx(entersubop)->op_first;
Expand Down
7 changes: 6 additions & 1 deletion toke.c
Original file line number Diff line number Diff line change
Expand Up @@ -6281,11 +6281,16 @@ yyl_colon(pTHX_ char *s)
static int
yyl_subproto(pTHX_ char *s, CV *cv)
{
/* protos > ~8ch are likely junk. See git for why 0x28. */
char protocleanbuf [0xff-(0x28/(PTRSIZE>=8?1:2))-(PTRSIZE*5)];
STRLEN protolen = CvPROTOLEN(cv);
const char *proto = CvPROTO(cv);
bool optional;

proto = S_strip_spaces(aTHX_ proto, &protolen);
proto = S_strip_spaces(aTHX_
(protolen <= sizeof(protocleanbuf)-1
? protocleanbuf : NULL),
proto, &protolen);
if (!protolen)
TERM(FUNC0SUB);
if ((optional = *proto == ';')) {
Expand Down

0 comments on commit aa0a0e9

Please sign in to comment.