Cannot build IANA tz database 2022b #149

Open
deborahgoldsmith opened this issue Aug 11, 2022 · 10 comments

@deborahgoldsmith

The latest commit (as of this report) of onetrueawk cannot build release 2022b of the IANA tz database.

Steps to reproduce

  1. Use latest commit from this repository; build awk and install in $PATH
  2. Check out tag 2022b from https://github.com/eggert/tz
  3. make rearguard_tarballs

Result:

awk: syntax error at source line 110 source file ziguard.awk
 context is
	      stdoff_column = 2 * >>>  / <<< ^Zone/ + 1
awk: illegal statement at source line 110 source file ziguard.awk
awk: illegal statement at source line 110 source file ziguard.awk
make: *** [main.zi] Error 2

Regression:
Works in FreeBSD 13.0 (earlier commit of onetrueawk, version 20190529)
Works with gawk among others

@johnhawkinson

A more compact one-liner test case:

jhawk@lrr ~ % echo foo | /usr/bin/awk '{stdoff_column = 2 * /^Zone/ + 1}'  
/usr/bin/awk: syntax error at source line 1
 context is
	{stdoff_column = 2 * /^Zone/ >>>  + <<<  1}
/usr/bin/awk: illegal statement at source line 1

Versus gawk:

jhawk@lrr ~ % echo foo | gawk '{stdoff_column = 2 * /^Zone/ + 1}'          
jhawk@lrr ~ % 
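
The same one-liner can be turned into a quick probe for whichever awk implementations are installed. This is only a sketch: the helper name `awk_accepts_bare_ere` and the list of candidate binaries are assumptions for illustration, not part of the report.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given awk binary
# parses a bare ERE used as an arithmetic operand, and fails otherwise.
awk_accepts_bare_ere() {
    echo foo | "$1" '{ stdoff_column = 2 * /^Zone/ + 1 }' >/dev/null 2>&1
}

# Probe each candidate that is actually installed; the list is illustrative.
for candidate in awk gawk mawk; do
    command -v "$candidate" >/dev/null 2>&1 || continue
    if awk_accepts_bare_ere "$candidate"; then
        echo "$candidate: accepts 2 * /^Zone/ + 1"
    else
        echo "$candidate: rejects 2 * /^Zone/ + 1"
    fi
done
```

The output depends on which implementations are present, so none is shown here.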

@guyharris

The current Single UNIX Specification page for awk says

When an ERE token appears as an expression in any context other than as the right-hand of the '~' or "!~" operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of:

$0 ~ /ere/

I presume that /^Zone/ in 2 * /^Zone/ + 1 is an "ERE token".

That spec speaks of "ERE tokens", which appear to be of the form "/ere/", but I don't see any specification of what an "ERE token" is in the spec.
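
To illustrate the quoted rule in a context that awk implementations generally accept, here is a small sketch that uses a bare ERE as a pattern and compares it with the spelled-out `$0 ~ /ere/` form. The function names and the sample input lines are made up for this example.

```shell
# Count lines matching /^Zone/ using a bare ERE as the pattern.
count_zone_lines_bare() {
    awk '/^Zone/ { n++ } END { print n + 0 }'
}

# Same count with the match against $0 written out explicitly.
count_zone_lines_explicit() {
    awk '$0 ~ /^Zone/ { n++ } END { print n + 0 }'
}

# Illustrative input only; not taken from the tz database.
sample='Zone Europe/Paris
Rule EU
Zone America/New_York'

printf '%s\n' "$sample" | count_zone_lines_bare      # prints 2
printf '%s\n' "$sample" | count_zone_lines_explicit  # prints 2
```

Both forms report the same count, which is what the spec text leads one to expect for `2 * /^Zone/ + 1` as well.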

@johnhawkinson

> That spec speaks of "ERE tokens", which appear to be of the form "/ere/", but I don't see any specification of what an "ERE token" is in the spec.

See Lexical Conventions in the spec:

  1. The token ERE represents an extended regular expression constant. An ERE constant shall begin with the <slash> character. Within an ERE constant, a <backslash> character shall be considered to begin an escape sequence as specified in the table in XBD File Format Notation. In addition, the escape sequences in Escape Sequences in awk shall be recognized. The application shall ensure that a <newline> does not occur within an ERE constant. An ERE constant shall be terminated by the first unescaped occurrence of the <slash> character after the one that begins the ERE constant. The extended regular expression represented by the ERE constant shall be the sequence of all unescaped characters and values of escape sequences between, but not including, the two delimiting <slash> characters.
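
As a small sketch of the termination rule quoted above: an escaped slash inside the ERE does not end the constant, only the first unescaped one does. The function name `match_zone_name` and the input lines are assumptions chosen for illustration.

```shell
# The \/ is an escaped slash inside the ERE; the constant ends at the
# final, unescaped / that follows New_York.
match_zone_name() {
    awk '/^Zone America\/New_York/ { print "matched: " $2 }'
}

# Illustrative input only.
printf 'Zone America/New_York\nZone Europe/Paris\n' | match_zone_name
# prints: matched: America/New_York
```

Because `/` also serves as the division operator, the lexer has to decide from context whether a slash starts an ERE constant, which is where an expression like `2 * /^Zone/ + 1` becomes delicate.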

@millert
Contributor

millert commented Aug 11, 2022

Placing the regex in parens makes the yacc grammar happy and appears to produce the correct result.
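
A minimal sketch of that workaround, assuming a POSIX shell and whichever awk is first in `$PATH`. The function name `stdoff_parens`, the added `print`, and the sample input are made up for illustration.

```shell
# Parenthesize the ERE so it is parsed as a complete expression before
# being used as an operand of the multiplication.
stdoff_parens() {
    awk '{ stdoff_column = 2 * (/^Zone/) + 1; print stdoff_column }'
}

# Illustrative input only.
printf 'Zone Europe/Paris\nfoo\n' | stdoff_parens
# prints 3 for the Zone line and 1 for the other line
```

The match yields 1 or 0, so the expression evaluates to 3 on lines starting with `Zone` and 1 elsewhere.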

@millert
Contributor

millert commented Aug 11, 2022

The obvious fix is to simply add:

        | re

to the end of the term rule in awkgram.y but that does increase the shift/reduce and reduce/reduce conflicts. The tests still pass though ;-)

@plan9
Collaborator

plan9 commented Aug 11, 2022

hi deborah, thanks for the report.
I have a FreeBSD 13 machine at hand, and the 20190529 release gives the same error. I have tested earlier versions as well.
So it is not a change in our release of awk that now breaks the IANA tz database build.

@eggert

eggert commented Aug 12, 2022

> 20190529 release gives the same error.

Yes, unfortunately I tested 20190529 incorrectly and so I mistakenly told Deborah that 20190529 did not have the bug. Sorry about that. I.e., this is not a regression (though it is still a bug).

For what it's worth, Solaris 10 /usr/bin/nawk (which has a version string saying "Oct 11, 1989") has the same bug. And similarly for Solaris 10 /usr/bin/awk (aka "oawk"), which has no version string but is even older. Evidently the bug has been around for a while.

@plan9
Collaborator

plan9 commented Aug 12, 2022

> Evidently the bug has been around for a while.

yep, I've tested all those, including sol 8/10 nawk, awk, as well as MKS awk which solaris shipped as sys5 awk.

@plan9
Collaborator

plan9 commented Aug 26, 2022

@millert the obvious fix gives us 225 reduce/reduce conflicts. We can remove re from the | re | term combinations; that reduces the reduce/reduce conflicts somewhat, which is better but not great. To be continued.

@arnoldrobbins
Collaborator

@plan9 The grammar is definitely an area where "Here there be dragons." Tread very, very, carefully.
