Benutzer:Andreas Plank/Sed
Remember the rules:
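sed works through the input line by line and runs every command on each line before it reads the next one. It does not run one command over the whole input before starting the next command. That is:
read input line 1 → command 1, command 2, … → read input line 2 → command 1, command 2, …
What it does not do is:
command 1 on all input lines → command 2 on all input lines → …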
This gets complicated once you process several lines at a time. In that case I recommend writing the multi-line commands first and the commands that work on single lines after them. If you do it the other way round, with single-line commands first and multi-line commands after them, then
- sed reads one input line, applies e.g. the command N (which joins lines as current-input-line\nnext-input-line), and applies all subsequent sed commands to this joined line
- the next input line is not this «next-input-line» but the line after it, so commands placed before N never run on «next-input-line»!
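The effect of N on the read cycle can be checked with a short sketch. It assumes GNU sed and an input with an even number of lines; the sample lines a to d are made up for illustration.

```shell
# N joins the lines pairwise, so four input lines give only two cycles.
out=$(printf 'a\nb\nc\nd\n' | sed 'N; s/\n/+/')
printf '%s\n' "$out"

# A single-line command placed before N only ever sees lines 1 and 3.
marked=$(printf 'a\nb\nc\nd\n' | sed 's/^/> /; N')
printf '%s\n' "$marked"
```

The first command prints a+b and c+d. In the second one only a and c get the "> " prefix, because b and d were appended by N after the substitution had already run.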
Text snippets for the sed running under Linux
#### file options
# -e, --expression=… → execute
# -f, --file= → file: script file
# -i, --in-place → insert into file: edit file in place
# -l 40, --line-length 40 → specify the desired line-wrap length for the “l” command
# -n, --quiet, --silent → nothing, i.e. quiet: print only what is printed explicitly (e.g. with “p”)
# -r, --regexp-extended → extended regular expressions
# -s, --separate → separate: consider files as separate
# -u, --unbuffered → unbuffered = load minimal amounts of data from the input files and flush the output buffers more often
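A minimal sketch of two of these options, assuming GNU sed. The file behind tmpfile is a throwaway example created with mktemp, not something from the notes above.

```shell
# -n suppresses automatic printing; only the explicit p prints.
second=$(printf 'one\ntwo\nthree\n' | sed -n -e '2p')

# -i edits the file itself instead of writing to standard output.
tmpfile=$(mktemp)
printf 'old text\n' > "$tmpfile"
sed -i -e 's/old/new/' "$tmpfile"
edited=$(cat "$tmpfile")
rm -f "$tmpfile"

printf '%s\n%s\n' "$second" "$edited"
```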
#### actions
# a → append action (after)
# $a append after last line
# c → change: You can replace the current line with the ‘c’ action
# d → delete action
# i → insert action (before)
# 1i insert before 1st line
# n → next: read the next line
# p → print action
# q → quit immediately without further processing
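The insert, append, and change actions can be tried together in one call. This sketch assumes GNU sed, which accepts the text on the same line as the command; the input words are made up.

```shell
# 1i inserts before line 1, $a appends after the last line,
# /two/c replaces every line that matches "two".
out=$(printf 'one\ntwo\nthree\n' | sed -e '1i first' -e '$a last' -e '/two/c TWO')
printf '%s\n' "$out"
```

This should print first, one, TWO, three, last, each on its own line.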
#### actions in detail
# d and D → delete
# d command deletes the current pattern space, skips the remaining commands, reads the next input line into the pattern space, and restarts execution at the first sed command. This is called starting a new "cycle."
# D command deletes the first portion of the pattern space, up to and including the first newline character, leaves the rest of the pattern space alone, and restarts the cycle without reading a new input line.
# n and N → next
# n command will print out the current pattern space (unless the "--quiet, --silent or -n" flag is used), empty the current pattern space, and read in the next line of input.
# N command does *not* print out the current pattern space and does *not* empty the pattern space. It reads in the next line, but appends a new line character along with the input line itself to the pattern space:
# first-whole-line ┬→ first-whole-line\nsecond-whole-line
# second-whole-line ┘
# h and H → Hold
# h command copies the pattern buffer into the hold buffer. The pattern buffer is unchanged.
# H command allows you to combine several lines in the hold buffer. It acts like the "N" command as lines are appended to the buffer, with a "\n" between the lines. You can save several lines in the hold buffer, and print them only if a particular pattern is found later.
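As a small sketch of the hold buffer, the following call collects every line with H and only prints on the last line. It assumes GNU sed and additionally uses x (exchange hold and pattern buffer); the sample input is made up.

```shell
# H appends "\n" + current line to the hold buffer on every cycle.
# On the last line ($) the buffers are swapped, the newlines become
# commas, the leading comma (from the first H) is dropped, and p prints.
joined=$(printf 'a\nb\nc\n' | sed -n 'H; ${x; s/\n/,/g; s/^,//; p;}')
printf '%s\n' "$joined"
```

The result is the single line a,b,c.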
#### addresses for instance with p → print
'1,10p' # line 1 to 10
'/beginRE/,/endRE/p' # reg. expr: beginRE to endRE
'10~2p' # at line 10 then each 2nd line
'$=' # on the last line “$” print “=” the line number, i.e. the number of lines
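The address forms above can be checked against numbered input. This is a sketch assuming GNU sed (the first~step form is a GNU extension) and the coreutils seq command; the begin/end markers are made up.

```shell
# first~step: line 10, then every 2nd line after it
stepped=$(seq 1 12 | sed -n '10~2p')

# regex range: from the first matching line to the closing match
ranged=$(printf 'x\nbegin\ny\nend\nz\n' | sed -n '/begin/,/end/p')

# $= prints the number of the last line
count=$(seq 1 7 | sed -n '$=')

printf '%s\n---\n%s\n---\n%s\n' "$stepped" "$ranged" "$count"
```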
#### examples
# sorted lines → delete duplicate lines
sed '$!N; /^\(.*\)\n\1$/!P; D' temp2.txt > temp3.txt
# delete:
sed -e '1,10d' # line 1-10
sed -e '11,$d' # line 11 to end of file
sed -e '10~2d' # delete every 2nd line starting from 10
# extract:
sed -n -e '1,10p'
# quit
sed -e '10q' # print lines 1-10, then quit
# commands on multiple lines or with -e again:
sed -e '1,4d
6,9d'
sed -e '1,4d' -e '6,9d'
# delete lines with debug + print lines with foo
sed -n -e '/debug/d' -e '/foo/p'
# no […]: print lines with at least one character that is neither “[” nor “]”
sed -n '/[^][]\+/p'
# pipes
gcc sourcefile.c 2>&1 | sed -n -e '/warning:/,/error:/p'
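The duplicate-line one-liner from the examples above can be tested without a file. This sketch assumes GNU sed and feeds it already sorted, made-up input through a pipe instead of temp2.txt.

```shell
# $!N appends the next line (except on the last line),
# P prints the first line only if it differs from the second,
# D drops the first line and restarts without reading new input.
unique=$(printf 'a\na\nb\nc\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$unique"
```

Each of a, b, and c is printed once.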
Sed help
### Command line syntax ###############
#         file ↘
# sed -f sed_replacements.sed old_file.txt > new_file.txt
#  insert ↘  ↙ file
# sed -i -f sed_replacements.sed overwritten.txt
### Regular expressions ###############
# note the different (default) regular expression syntax !!!
# summarised: + ? ( { | are literal characters by default, not operators!
# (..) → \(..\) reference
# ?    → \?     0 or 1
# .+   → .\+    1 or many
# .*   → .*     0 or many
# [..] → [..]   character definition range
# {..} → \{..\} repetition count
# |    → \|     means “or”
### Search and replace ################
#  ↙ search    ↙ global scope
# s/search/replace/g;
# s/\(search\)/\U\1\E/g; # finds “search” → “SEARCH” in \U (upper case) \E stops transformation
# s/\(SEARCH\)/\L\1\E/g; # finds “SEARCH” → “search” in \L (lower case) \E stops transformation
### Search and replace (address) ######
# /address/s/search/replace/g
# NOT-matched or everything except “address”
# /address/!s/search/replace/g
# address can be: 1,$ (1st line to the end) or a search pattern
### multiline search with “address” ###
#                  “address”
#┌─────────────────┴─────────────────────┐
#   first line   append \n   second line
#┌───────┴────────┐ ↓ ┌────────┴────────┐
/first line pattern/N;/second line pattern/{
  # do something with the found pattern
  #   search        replace      global scope
  #     ↓              ↓         ↓
  s@first line\nsecond line@replace pattern@g
}
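The rules from this help block can be tried out on the command line. The following is a sketch assuming GNU sed (\+, \U, \E, and -r are GNU features); all input strings are made up.

```shell
# Basic syntax: "+" is literal, "\+" means one or many.
literal=$(echo 'a+ b' | sed 's/a+/X/')
basic=$(echo 'aaa b' | sed 's/a\+/X/')
# With -r the unescaped "+" is the operator.
extended=$(echo 'aaa b' | sed -r 's/a+/X/')

# \U upper-cases the reference until \E.
upper=$(echo 'search me' | sed 's/\(search\)/\U\1\E/g')

# Negated address: substitute everywhere except on lines matching "keep".
negated=$(printf 'keep x\nchange x\n' | sed '/keep/!s/x/y/g')

# Multiline search: append the second line with N, then match across "\n".
joined=$(printf 'first line\nsecond line\n' |
  sed '/first line/N;/second line/{s@first line\nsecond line@replaced@g;}')

printf '%s\n' "$literal" "$basic" "$extended" "$upper" "$negated" "$joined"
```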
Replacements in a MediaWiki XML export
See Sed help.
######### MediaWiki Export ##########
# don't replace!!
# &nbsp; → &nbps;
# <br /> → <br/>
#####################################
# set username
# syntax explained
#       first line      and append \n and      second line
#┌───────────┴───────────────────┐ ↓ ┌───────────┴──────────────┐
/\(<username>\).\+\(<\/username>\)/N;/\(.\+<id>\)[0-9]\+\(<\/id>\)/{
  # do something with the found pattern
  # replace user name and user ID now with "\n"
  s@\(<username>\).\+\(</username>\n.\+<id>\)[0-9]\+\(</id>\)@\1Your Name\2123\3@g
}
# comment
s@\(<comment>\).\+\(</comment>\)@\1all commas to semicolons (Stichworte, Ort)\2@g
# delete revision
/\(<revision>\)/N;/\( \+<id>\)[0-9]\+\(<\/id>\)/{
  s@\(<revision>\n \+<id>\)[0-9]\+\(<\/id>\)@\1\2@g
}
# timestamp (-2 hours)
s@\(<timestamp>\).\+\(</timestamp>\)@\12011-09-08T10:12:11Z\2@g
# do your replacements here
/^|\(Stichworte\|Ort\) *=.\+$/{
  s@,@;@g
}
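The user name replacement can be tried on a small fragment before running it on a real export. This is a reduced sketch assuming GNU sed; the XML fragment, the name "Example User", and the ID 123 are made-up placeholders, and the indentation before <id> matters because the pattern expects at least one character in front of it.

```shell
xml='      <contributor>
        <username>Old Name</username>
        <id>42</id>
      </contributor>'

# Append the line after <username>, then replace name and ID across "\n".
out=$(printf '%s\n' "$xml" | sed \
  -e '/<username>.\+<\/username>/N' \
  -e '/<id>[0-9]\+<\/id>/s@\(<username>\).\+\(</username>\n.\+<id>\)[0-9]\+\(</id>\)@\1Example User\2123\3@')
printf '%s\n' "$out"
```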
All of the instructions shown above only work if the search pattern can find its match within a single line. That is simply how sed works: it only matches within individual lines. To search across several lines, you have to join all of those lines with N so that they become a single line in the so-called pattern space. Then you perform the replacements and let sed print the result. This is achieved with labels that you can jump back to. If, in the case of <text…</text>, you want to create a redirect that needs the page title, it looks like this:
# beware!! it corrupts the history!!!
# create a REDIRECT based on the title
/<title>.*<\/title>/ h # save the found title to the hold space
/<text/ { # start at <text
  :labelTextStart # set a marker/label to cycle back later on <text
  N # append new lines
  /<\/text>/!b labelTextStart # if it is not </text> cycle back to labelTextStart
  # all <text…</text> is now in ONE single line!!
  s@<text.*</text>@@ # replace it
  x # exchange hold space and pattern space (get the title)
  s@<title>\(.*\)<\/title>@<text xml:space="preserve">[[REDIRECT: New page \1]]</text>@
}
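The label loop can be watched on a tiny input. This is a shortened sketch of the script above, assuming GNU sed; the label is shortened to a, the page content is made up, and the replacement text "see \1" is only a stand-in for the real redirect markup.

```shell
page='<title>Foo</title>
<text>
old content
</text>'

# h saves the title line; the loop at label "a" appends lines with N
# until </text> shows up; then the block is emptied, x fetches the
# title from the hold space, and the title is rewritten into new text.
out=$(printf '%s\n' "$page" | sed \
  -e '/<title>.*<\/title>/h' \
  -e '/<text/{' -e ':a' -e 'N' -e '/<\/text>/!ba' \
  -e 's@<text.*</text>@@' -e 'x' \
  -e 's@<title>\(.*\)</title>@<text>see \1</text>@' -e '}')
printf '%s\n' "$out"
```

The three lines from <text> to </text> collapse into the single line <text>see Foo</text>, while the title line is printed unchanged.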