ag pattern [file...]
The AG utility shall search input files, selecting lines matching a pattern. An input line shall be selected if the pattern, treated as a subset of Extended Regular Expressions as described by POSIX.2 Standard section 2.8.4, matches any part of the line; a null pattern shall match every line. Each selected input line shall be written to the standard output.
Regular expression matching shall be based on text lines, which are
separated by a <newline>
character. A regular expression cannot
contain a <newline>
and there is no way for a pattern to match a
<newline>
found in the input.
None.
-
pattern
Specify one pattern to be used during the search for input. See Extended Regular Exprssions below on what constitutes a valid pattern. -
file
A pathname of a file to be searched for the pattern. If no file operands are specified or the file operand is dash (-
), the standard input shall be used.
The standard input shall be used only if no file operands are specifed. See Input Files.
The input files shall be text files.
No environment variables are used.
Default.
For each selected input line from a file, a single output line shall be written:
"%s:%s\n", file, line
Standard input will have the filename dash (-
).
Standard error is used for diagnostic messages only.
None.
An ERE ordinary character, a special character preceded by a backslash, or a period shall match a single character. A bracket expression shall match a single character. An ERE matching a single character enclosed in parentheses shall match the same as the ERE without parentheses would have matched.
An oridinary character is an ERE that matches itself. An ordinary
character is any character in the supported character set, except for
the ERE special characters listed in 7.1.1.2. The interpretation of
an ordinary character preceded by a backslash (\
) is the ordinary
character.
An ERE special character has special properties in certain contexts. Outside of those contexts, or when preceded by a backslash, such a character shall be an ERE that matches the special character itself. The extended regular expression special characters and the contexts in which they have their special meaning are:
. [ \ (
The period, left-bracket, backslash, and left-
parenthesis are special except when used in a
bracket expression. Outside a bracket
expression, a left-parenthesis immediately
followed by a right-parenthesis will be
treated as a null expression.
)
The right-parenthesis is special when matched
with a preceding left-parenthesis, both
outside of a bracket expression.
* ?
The asterisk and question-mark are special
except when used in a bracket expression.
Results are undefined if this character
appears first in an ERE, or immediately
following a vertial-line, circumflex, or
left-parenthesis.
|
The vertical-line is special except when used
in a bracket expression. A vertical-line
appearing first or last in an ERE or
immediately following a vertical-line or
left-parenthesis, or preceding a
right-parenthesis, produces a null expression.
^
The circumflex shall be special when used as
an anchor or, as the first character of a
bracket expression.
$
The dollar-sign shall be special when used as
an anchor.
A period (.
), when used outside of a bracket expression, is an ERE
that shall match any character in the supported character set except
NUL.
A bracket expression (an expression enclosed in square brackets, []), is an ERE that matches a single character contained in the nonempty set of characters represented by the bracket expression.
The following rules and definitions apply to bracket expressions:
-
A bracket expression is either a matching list expression or a nonmatching list expression. It consists of one or more characters and/or range expressions. The right-bracket (
]
) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list [after an initial circumflex (^
), if any]. Otherwise, it shall terminate the bracket expression. The special characters period (.
), asterisk (*
), left-bracket ([
), and backslash (\
) shall lose their special meaning within a bracket expression. -
A matching list expression specifies a list that shall match any one of the expressions represented in the list. The first character of the list shall not be the circumflex. For example, [abc] is an ERE that matches any of a, b, or c.
-
A nonmatching list expression begins with a circumflex (
^
), and specifies a list that shall match any character except for the expressions represented in the list after the leading circumflex. For example,[^abc]
is an ERE that matches any character except a, b, or c. The circumflex shall have this special meaning only when it occurs first in the list, immediately following the left-bracket. -
A range expression represents the set of characters that fall between two elements in the current collation sequence, inclusive. It shall be expressed as the starting point and the ending point separated by a hyphen (
-
).In the following, all examples assume the collation sequence specified for the POSIX Locale (a.k.a ASCII Character Set), unless another collation sequence is sepecifically defined by the hardware.
The starting range point and the ending range point shall be a collating element (character). The ending range point shall collate higher than the starting range point; otherwise the expression shall be treated as invalid.
The interpretation of a range expression where the ending range point is the starting range point of a subsequent range expression is undefined.
The hyphen character shall be treated as itself if it occurs first (after an initial
^
, if any) or last in the list, or as an ending range point in a range expression. As examples, the expressions[-ac]
and[ac-]
are equivalent and match any of the characters a, c, or -; the expressions[^-ac]
and[^ac-]
are equivalent and match ant character except a, c, -; the expression[%--]
matches any of the characters between%
and-
inclusive; the expression[--@]
matches any of the characters between-
and@
inclusive; and the expression[a--@
] is invalid, because the lettera
follows the symbol-
in the POSIX Locale.
The following rules shall be used to construct EREs matching multiple characters from EREs matching a single character:
-
A concatenation of EREs shall match the concatenation of the character sequences matched by each component of the ERE. A concatenation of EREs enclosed in parentheses shall match whatever the concatenation without parentheses matches. For example, both the ERE cd and the ERE (cd) are matched by the third and forth character of the string abcdefabcdef.
-
When an ERE matching a single character, or a concatention of EREs enclosed in parentheses is followed by the special character asterisk (
*
), together with that asterisk it shall match what zero or more consecutive occurences of the ERE would match. For example, the EREb*c
matches the first character in the string cabbbcde, and the ERE b*cd matches the third through seventh characters in the string cabbbcdebbbbbbcdbc. And,[ab]*
and[ab][ab]
are equivalent when matching the string ab. -
When an ERE matching a single character or an ERE enclosed in parentheses is followed by the special character question-mark (
?
), together with that question mark it shall match what zero or one consecutive occurences of the ERE would match. For example, the EREb?c
matches the second character in the string acabbbcde.
The behaviour of multiple adjacent duplication symbols (*, ?) produces undefined results.
Two EREs separated by the special character vertical-line (|
) shall
match a string that is matched by either. For example, the ERE
a((bc)|d)
matches the string abc and the string ad. Single
characters, or expressions matching single characters, separated by
the vertical-line and enclosed in parentheses, shall be treated as an
ERE matching a single character.
The order of precedence shall be as shown in the table below, from high to low:
escaped characters \<special character>
bracket expression [ ]
grouping ( )
single-character-ERE duplication * ?
concatenation
anchoring ^ $
alteration |
For example, the ERE abba|cde matches either the string abba or the string cde (because concatenation has a higher order of precedence than alteration).
An ERE can be limited to matching strings that begin or end a line; this is called anchoring. The circumflex and dollar-sign special characters shall be considered ERE anchors in the following contexts:
-
A circumflex (
^
) outside a bracket expression shall anchor the (sub)expression it begins to the beginning of a string; such a (sub)expression can match only a sequence starting at the first character of a string. For example, the EREs ^ab and (^ab) match ab in the string abcdef, but fail to match in the string cdefab, and the ERE a^b is valid, but can never match because the character a prevents the expression ^b from matching the first character of a string. -
A dollar-sign (
$
) outside a bracket expression shall anchor the (sub)expression it ends to the end of a string; such a (sub)expression can match only a sequence ending at the last character of a string. For example, the EREs ef$ and (ef$) match ef in the string abcdef, but fail to match in the string cdefab, and the ERE e$f is valid, but can never match because the character f prevents the expression e$ from matching the last character of a string.
The following exit codes will be returned:
- 0 Successful match.
- 1 No match.
- 2 Usage error.
- 3 General error.
AG can be built on any system providing at least K&R C. It has been tested on
- SunOS with GCC
- ATARI Mega ST with Sozobon C
- PC clone with Turbo C and WatCom C
- Interactive UNIX System V/386 release 3.2
For most machines, the compile command line should be
cc -O -o ag ag.c
PC class machines may require that medium or large memory model be used and that ARRAY be set to 256. For Watcom C the command line would be
wcl -ml -DARRAY=256 ag.c
The value ARRAY represents the size of the compiled pattern array and the size of the array used to record matched states. The default value choosen should handle most patterns. ARRAY can be overridden on the command line with -DARRAY=nn, where 255 < nn < INT_MAX.
-
Aho, Sethi, and Ullman, "Compilers - Principles, Techniques, and Tools", Addison-Wesley, 86, ISBN 0-201-10088-6, chapter 3
-
Kernighan and Plauger, "Software Tools in Pascal". Addison-Wesley, 81, ISBN 0-201-10342-7, chapter 5
-
Webb Miller, "A Software Tools Sampler", Prentice Hall, 87, ISBN 0-13-822305-X, chaper 4
-
POSIX.2 Standard, 2.8.4 Extended Regular Expressions, 4.28 Grep
The expression (a*)*
compiles but loops forever.
There is no check for trailing backslash (\
) in the pattern.
There is no check for unbalanced brackets. Omitting a closing bracket will generate a "Patern too long" error, which is not the real error.
Public Domain 1992, 1993 by Anthony Howe. No warranty.