# @(#)POSIX 8.1 (Berkeley) 6/6/93
# $FreeBSD: stable/10/usr.bin/sed/POSIX 168417 2007-04-06 08:43:30Z yar $
# $MidnightBSD$

Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or criticism towards the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

1. 32V and BSD-derived implementations of sed strip the text
arguments of the a, c and i commands of their initial blanks,
i.e.

    #!/bin/sed -f
    a\
            foo\
            \  indent\
            bar

produces:

    foo
      indent
    bar

POSIX does not specify this behavior, as the System V versions of
sed do not do this stripping. The argument against stripping is
that it is difficult to write text with leading blanks if they
are stripped. The argument for stripping is that it is difficult
to write readable sed scripts unless indentation is allowed and
ignored, and leading whitespace is obtainable by entering a
backslash in front of it. This implementation follows the BSD
historic practice.

2. Historical versions of sed required that the w flag be the last
flag to an s command as it takes an additional argument. This
is obvious, but not specified in POSIX.

3. Historical versions of sed required that whitespace follow a w
flag to an s command. This is not specified in POSIX. This
implementation permits whitespace but does not require it.

4. Historical versions of sed permitted any number of whitespace
characters to follow the w command. This is not specified in
POSIX. This implementation permits whitespace but does not
require it.
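
A minimal sketch of notes 2 to 4, assuming a POSIX shell with mktemp
available. The variable names (tmp, out, written) are chosen for this
illustration only. The w flag comes last because everything after it,
up to the end of the line, is taken as the file name.

```shell
# Illustrative names only: tmp, out and written are not part of sed.
tmp=$(mktemp)
# g precedes w; the text after "w " up to end of line is the file name.
out=$(printf 'aaa\nbbb\n' | sed "s/a/X/gw $tmp")
written=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"       # the normal output stream: XXX, then bbb
printf '%s\n' "$written"   # only the line where the substitution succeeded: XXX
```

Writing the flags in the other order (wg) would not be an error in most
implementations; it would simply name a different output file.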
57 |
|
58 |
5. The rule for the l command differs from historic practice. Table |
59 |
2-15 includes the various ANSI C escape sequences, including \\ |
60 |
for backslash. Some historical versions of sed displayed two |
61 |
digit octal numbers, too, not three as specified by POSIX. POSIX |
62 |
is a cleanup, and is followed by this implementation. |
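
As a small illustration of the POSIX form of the l command, the sketch
below feeds one line containing a tab, a backslash and the control
character with octal value 001. The variable name out is only for this
example.

```shell
# Input line: a, TAB, b, backslash, c, byte 001.
out=$(printf 'a\tb\\c\001\n' | sed -n l)
printf '%s\n' "$out"
# Prints: a\tb\\c\001$
```

The tab and backslash appear as the C escapes \t and \\, the control
character as a three-digit octal escape, and $ marks the end of line.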
63 |
|
64 |
6. The POSIX specification for ! does not specify that for a single |
65 |
command the command must not contain an address specification |
66 |
whereas the command list can contain address specifications. The |
67 |
specification for ! implies that "3!/hello/p" works, and it never |
68 |
has, historically. Note, |
69 |
|
70 |
3!{ |
71 |
/hello/p |
72 |
} |
73 |
|
74 |
does work. |
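
The working form can be tried directly. This is a sketch with made-up
input lines; out is a name used only here.

```shell
# Print lines matching /hello/, except on input line 3.
out=$(printf 'hello 1\nworld 2\nhello 3\nhello 4\n' | sed -n '3!{
/hello/p
}')
printf '%s\n' "$out"
# Prints "hello 1" and "hello 4"; "hello 3" is skipped by the 3! address.
```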

7. POSIX does not specify what happens with consecutive ! commands
(e.g. /foo/!!!p). Historic implementations allow any number of
!'s without changing the behavior. (It seems logical that each
one might reverse the behavior.) This implementation follows
historic practice.

8. Historic versions of sed permitted commands to be separated
by semicolons, e.g. sed -ne '1p;2p;3q' printed the first two
lines of a file and quit on reading the third. This is not
specified by POSIX. Note, the ; command separator is not allowed
for the commands a, c, i, w, r, :, b, t, # and at the end of a
w flag in the s command. This implementation follows historic
practice and implements the ; separator.
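
A short sketch of the ; separator, comparing it with the equivalent
script given as separate -e arguments. The variable names joined and
split are illustrative. Note that with -n the q command at line 3 exits
without printing that line.

```shell
# Same three commands, once separated by ; and once by separate -e options.
joined=$(printf 'one\ntwo\nthree\nfour\n' | sed -n '1p;2p;3q')
split=$(printf 'one\ntwo\nthree\nfour\n' | sed -n -e '1p' -e '2p' -e '3q')
printf '%s\n' "$joined"
# Prints "one" and "two"; both forms give the same result.
```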

9. Historic versions of sed terminated the script if EOF was reached
during the execution of the 'n' command, i.e.:

    sed -e '
    n
    i\
    hello
    ' </dev/null

did not produce any output. POSIX does not specify this behavior.
This implementation follows historic practice.
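
The same point can be seen with a one-line input, where n hits EOF on
the first cycle. This is a sketch; out is an illustrative name, and
whether the pattern space is still printed before exiting may depend on
the implementation and its POSIX mode.

```shell
# n finds no next line, so the script ends and the s command never runs.
out=$(printf 'only\n' | sed -e 'n' -e 's/only/changed/')
printf '%s\n' "$out"
# The word "changed" never appears in the output.
```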

10. Deleted.

11. Historical implementations do not output the change text of a c
command in the case of an address range whose first line number
is greater than the second (e.g. 3,1). POSIX requires that the
text be output. Since the historic behavior doesn't seem to have
any particular purpose, this implementation follows the POSIX
behavior.

12. POSIX does not specify whether address ranges are checked and
reset if a command is not executed due to a jump. The following
program behaves differently depending on whether the 'c' command
is triggered at the third line, i.e. whether the text is output
even though line 3 of the input never logically encounters that
command.

    2,4b
    1,3c\
        text

Historic implementations did not output the text in the above
example. Therefore it was believed that a range whose second
address was never matched extended to the end of the input.
However, the current practice adopted by this implementation,
as well as by those from GNU and Sun, is as follows: the text
from the 'c' command still isn't output because the second address
isn't actually matched; but the range is reset after all if its
second address is a line number. In the above example, only the
first line of the input will be deleted.

13. Historical implementations allow an output suppressing #n at the
beginning of -e arguments as well as in a script file. POSIX
does not specify this. This implementation follows historical
practice.
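
A sketch of #n given through -e, assuming an implementation that
follows the historical practice described above. The names with_n and
plain are illustrative. A comment that is not exactly #n has no such
effect.

```shell
# "#n" as the first -e argument acts like the -n option.
with_n=$(printf 'a\nb\n' | sed -e '#n' -e '1p')
# "# n" is an ordinary comment, so default output stays on.
plain=$(printf 'a\nb\n' | sed -e '# n' -e '1p')
printf '%s\n' "$with_n"   # a
printf '%s\n' "$plain"    # a, a, b
```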

14. POSIX does not explicitly specify how sed behaves if no script is
specified. Since the sed Synopsis permits this form of the command,
and the language in the Description section states that the input
is output, it seems reasonable that it behave like the cat(1)
command. Historic sed implementations behave differently for "ls |
sed", where they produce no output, and "ls | sed -e#", where they
behave like cat. This implementation behaves like cat in both cases.

15. The POSIX requirement to open all w files at the beginning makes
sed behave nonintuitively when the w commands are preceded by
addresses or are within conditional blocks. By default, this
implementation follows historic practice and POSIX; it also
provides the -a option, which opens the files only when they are
needed.
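
The default behavior in note 15 can be observed with a w command whose
address never matches. This is a sketch assuming mktemp -d is
available; dir, created and empty are illustrative names, and the -a
option is not used.

```shell
dir=$(mktemp -d)
# The address /nomatch/ never selects a line, so nothing is ever written.
printf 'a\nb\n' | sed -n "/nomatch/w $dir/never.txt"
# The file is nevertheless created, because w files are opened up front.
if [ -e "$dir/never.txt" ]; then created=yes; else created=no; fi
if [ -s "$dir/never.txt" ]; then empty=no; else empty=yes; fi
rm -rf "$dir"
printf 'created=%s empty=%s\n' "$created" "$empty"
```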

16. POSIX does not specify how escape sequences other than \n and \D
(where D is the delimiter character) are to be treated. This is
reasonable; however, it also doesn't state that the backslash is
to be discarded from the output regardless. A strict reading of
POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
As historic sed implementations always discarded the backslash,
this implementation does as well.

17. POSIX specifies that an address can be "empty". This implies
that constructs like ",d" or "1,d" and ",5d" are allowed. This
is not true for historic implementations or this implementation
of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
white space, but no mention is made of trailing white space.
Historic implementations of sed assigned different locations to
the labels "x" and "x ". This is not useful, and leads to subtle
programming errors, but it is historic practice and changing it
could theoretically break working scripts. This implementation
follows historic practice.

19. Although POSIX specifies that reading from files that do not exist
from within the script must not terminate the script, it does not
specify what happens if a write command fails. Historic practice
is to fail immediately if the file cannot be opened or written.
This implementation follows historic practice.
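
A sketch contrasting the two cases. The paths under /nonexistent are
assumed not to exist on the system, and out, status and wfail are
illustrative names.

```shell
# r on a missing file: the script carries on and exits successfully.
out=$(printf 'x\n' | sed 'r /nonexistent/file')
status=$?
# w to a file that cannot be opened: sed fails.
if printf 'x\n' | sed 'w /nonexistent/dir/out' >/dev/null 2>&1
then wfail=no
else wfail=yes
fi
printf 'out=%s status=%s wfail=%s\n' "$out" "$status" "$wfail"
```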

20. Historic practice is that the \n construct can be used for either
string1 or string2 of the y command. This is not specified by
POSIX. This implementation follows historic practice.
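
For instance, \n in string2 turns each blank into a newline. This is a
small sketch; out is an illustrative name.

```shell
# y maps every space to a newline, splitting the line into words.
out=$(printf 'a b c\n' | sed 'y/ /\n/')
printf '%s\n' "$out"
# Prints a, b and c on separate lines.
```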

21. Deleted.

22. Historic implementations of sed ignore the RE delimiter characters
within character classes. This is not specified in POSIX. This
implementation follows historic practice.

23. Historic implementations handle empty REs in a special way: the
empty RE is interpreted as if it were the last RE encountered,
whether in an address or elsewhere. POSIX does not document this
behavior. For example the command:

    sed -e /abc/s//XXX/

substitutes XXX for the pattern abc. The semantics of "the last
RE" can be defined in two different ways:

    1. The last RE encountered when compiling (lexical/static scope).
    2. The last RE encountered while running (dynamic scope).

While many historical implementations fail on programs depending
on scope differences, the SunOS version exhibited dynamic scope
behavior. This implementation does dynamic scoping, as this seems
the most useful and in order to remain consistent with historical
practice.
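
The sketch below runs the example above and then a script whose result
depends on the scoping rule. The label name sub and the variable names
are illustrative. Under dynamic scope the empty RE in the second script
refers to whichever address matched last at run time; under static
scope it would always mean /b/.

```shell
# Empty RE reuses /abc/ from the address.
simple=$(printf 'abcabc\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$simple"   # XXXabc

# Scope-dependent case: both addresses branch to the same s//X/.
scoped=$(printf 'a\nb\n' | sed -e '/a/b sub' -e '/b/b sub' -e ':sub' -e 's//X/')
printf '%s\n' "$scoped"
# Dynamic scope rewrites both lines to X; static scope would leave "a" alone.
```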