Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spaces are incorrectly elided in strings defined by macros #5

Open
bcantrill opened this issue Aug 1, 2023 · 4 comments
Open

spaces are incorrectly elided in strings defined by macros #5

bcantrill opened this issue Aug 1, 2023 · 4 comments

Comments

@bcantrill
Copy link

Take the following:

$ cat wtf.cpp
#define WTF     "    this    is     a    string     with      spaces"

"    this    is     a    string     with      spaces"
WTF

This is the kind of output we would expect:

$ cat wtf.cpp | /opt/gcc-12/bin/cpp
# 0 "<stdin>"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "<stdin>"


"    this    is     a    string     with      spaces"
"    this    is     a    string     with      spaces"

In antiquarian CPP, however:

$ cat wtf.cpp | ./cpp/src/cpp
# 1 "" 


"    this    is     a    string     with      spaces"
" this is a string with spaces"

This is due to fixing TritonDataCenter/smartos-live#171, and in particular, this code:

cpp/src/cpp.c

Lines 1307 to 1318 in f75fd24

/*
* If we're mid-paste, compress spaces to a
* single space
*/
while (isspace(*vp)) {
if (!isspace(vp[1])) {
*vp = ' ';
break;
} else {
vp--;
}
}

Unforunately, this behavior is causing a DTrace test suite failure, as macros in https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/dtrace/test/tst/common/json/tst.general.d contain spaces that are being incorrectly elided.

It's unclear how hard this would be to fix or if the cure is worse than the disease on this one. If DTrace is the only consumer (but it sounds like as is also still a consumer?), it would probabably be be worth moving to an ANSI-compatible CPP in /usr/lib.

@jclulow
Copy link
Member

jclulow commented Aug 2, 2023

@bcantrill I suspect it would be valuable to investigate the preprocessor from the Portable C Compiler as a potential replacement. I've built a cpp binary from the pcc-20230801 snapshot and attached it here:

cpp.gz

It seems to be small (234KiB!) and self-contained (unlike, say, the preprocessor from GCC or Clang), and supports C99.

Could you replace your /usr/lib/cpp with the attached binary and see how the tests fare?

@jclulow
Copy link
Member

jclulow commented Aug 2, 2023

@bcantrill
Copy link
Author

Well, this has been an antiques roadshow. So, the PCC cpp gets different failures -- which you can ultimately reduce to (e.g.) this:

#include <sys/machtypes.h>

BEGIN
{
        exit(0);
}

This fails with the PCC cpp:

# dtrace -I/usr/include -x cpppath=/data/cpp.pcc -Cs ./machtypes.d 
dtrace: failed to compile script ./machtypes.d: "/usr/include/sys/machtypes.h", line 59: type redeclared: struct _label_t

But this works with both Sun cpp and GCC cpp:

# dtrace -I/usr/include -x cpppath=/data/SUNWdtrt-deps/gcc-12/bin/cpp -Cs ./machtypes.d 
<command-line>: warning: "__STDC__" redefined
dtrace: script './machtypes.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
 77      1                           :BEGIN 

Here is the output of Sun cpp running on the script (blank lines removed):

# cat ./machtypes.d | /data/cpp.orig
# 1 "" 
# 1 "/usr/include/sys/machtypes.h" 1
typedef struct _label_t { long val[6]; } label_t;
typedef unsigned char   lock_t;         
# 67 "/usr/include/sys/machtypes.h" 
# 2 "" 2
BEGIN
{
        exit(0);
}

Here is the output of GCC cpp:

# 0 "<stdin>"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "<stdin>"
# 1 "/usr/include/sys/machtypes.h" 1 3 4
# 59 "/usr/include/sys/machtypes.h" 3 4
# 59 "/usr/include/sys/machtypes.h" 3 4
typedef struct _label_t { long val[8]; } label_t;
typedef unsigned char lock_t;
# 2 "<stdin>" 2
# 3 "<stdin>"
BEGIN
{
 exit(0);
}

By contrast, here is the output of PCC cpp on the same:

# 1 "<stdin>"
# 1 "/usr/include/sys/machtypes.h"
# 59 "/usr/include/sys/machtypes.h"
typedef struct _label_t { long val[6]; } label_t;
typedef unsigned char   lock_t;          
# 1 "<stdin>"
BEGIN
{
exit(0);
}

In particular, note the difference in control line: both Sun cpp and GCC cpp have (at least) a third integer present. Here is the explanation of this from the code that parses it in libdtrace: https://github.com/illumos/illumos-gate/blob/69764de32bf29a19d2a8b8832adafc4bd4f934f4/usr/src/lib/libdtrace/common/dt_pragma.c#L380-L391

And here is the explanation from GCC about its meaning: https://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html

So this directive is -- for the moment anyway -- load bearing for DTrace; we will need to either find a way to emit the directive from PCC's cpp or for DTrace to not be dependent upon it in order to use PCC's cpp as a potential replacement.

@richlowe
Copy link
Member

richlowe commented Aug 2, 2023

I might have dealt with this in the work that is encapsulated by: #1

An ancient effort I don't remember to make this cpp and the Sun one as bug-for-bug compatible as I could muster (I don't remember how much I could muster, either).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants