User:Andreas Plank/Sed: Difference between revisions

From Offene Naturführer
(One intermediate revision by the same user is not shown)
Line 1: Line 1:
__TOC__
Remember the rules:
{| class="vertical-align-top"
Line 19: Line 21:
|}
  
This gets complicated, if you do multiline stuff upon that input line. In case you do multi line stuff, I recommend to first write the multi-line-command-stuff and ''then'' the other command stuff that does not apply to multiple lines. If you do it vice versa and write first not-multi-line command stuff and ''then'' multi-line-command stuff, then
== Hints for Multi-Line Substitutions ==

So, <abbr title="Stream EDitor">sed</abbr> reads input line by input line, but this gets complicated if you run multi-line commands on an input line. In that case I recommend writing the multi-line commands first and ''then'' the commands that do not apply to multiple lines, OR making sure that the multi-line block also contains the other replacements (if needed). If you do it the other way round and write the single-line commands first and the multi-line commands ''afterwards'', then
 
# sed reads one input line and applies e.g. command '''N''' (that is: join lines as <span style="background-color:#FCAF3E;">current-input-line</span><span style="color:orange;">'''\n'''</span><span style="background-color:#729FCF;">next-input-line</span>), then applies all subsequent sed commands to this joined line
# the next input line is then not this «<span style="background-color:#729FCF;">next-input-line</span>» but the line ''after'' it!

To illustrate this mishmash, let’s see what happens when we do simple replacements on the sequence 1, 2, 3, 4, 5:
{| class="booktable vertical-align-top"
|-
! perhaps “strange” behaviour !! perhaps expected behaviour
|-
| style="width:50%" | <syntaxhighlight lang="bash" style="font-size:smaller">
# from the sequence 1, 2, ... 5
seq 5 | sed 's@4@\\4@;
      # put a backslash before a 4: but is it really applied?
/^3/{ # find a line starting with 3
  N;  # append the (N)ext line after a \n
  s@\n@\\was_newline@
      # replace \n by \was_newline
};

s@^@# @;
      # prepend a comment character for bash (just for fun)
'
# output:
# 1
# 2
# 3\was_newline4
# 5
</syntaxhighlight>

Sed reads input line by input line, and the problem appears at input line 3:
# input line 1 (nothing to do)
# input line 2 (nothing to do)
# input line 3: <code>s@4@\\4@</code> finds no 4 here, but then <code>/^3/</code> matches and …
#* '''N''' appends the next input line after a \n (after one N, <code>3\n4</code> is in the pattern space)
#* <code>s@\n@\\was_newline@</code> turns <tt>3\n4</tt> into <tt>3\was_newline4</tt>
# input line 5 (nothing to do): the next input line is already the 5th!

So the command <code>s@4@\\4@;</code> is never applied to the 4: it had already run while only <tt>3</tt> was in the pattern space, and input line 4 was then consumed by '''N'''.
| style="width:50%" | <syntaxhighlight lang="bash" style="font-size:smaller">
# from the sequence 1, 2, ... 5
seq 5 | sed ' # we moved
      # the s@4@\\4@; to after the multi-line block …
/^3/{ # find a line starting with 3
  N;  # append the (N)ext line after a \n
  s@\n@\\was_newline@
      # replace \n by \was_newline
};
s@4@\\4@;
      # put a backslash before a 4: now really applied
s@^@# @;
      # prepend a comment character for bash (just for fun)
'
# output:
# 1
# 2
# 3\was_newline\4
# 5
</syntaxhighlight>

Here: almost the same as on the left, but after <code>s@\n@\\was_newline@</code> has turned <tt>3\n4</tt> into <tt>3\was_newline4</tt>, the command <code>s@4@\\4@;</code> is applied to this joined line, so the 4 gets its backslash as expected. In detail:
# input line 1 (nothing to do)
# input line 2 (nothing to do)
# input line 3: <code>/^3/</code> matches and …
#* '''N''' appends the next input line after a \n (after one N, <code>3\n4</code> is in the pattern space)
#* <code>s@\n@\\was_newline@</code> turns <tt>3\n4</tt> into <tt>3\was_newline4</tt>
#* <code>s@4@\\4@</code> turns <tt>3\was_newline4</tt> into <tt>3\was_newline\4</tt>
# input line 5 (nothing to do): the next input line is already the 5th! Same as on the left
|}
 +
 +
== Text snippets for the <abbr title="Stream EDitor">sed</abbr> running under Linux ==
 +
<source lang="bash">
#### file options
# -e, --expression=… → execute: add the script to the commands to be executed
# -f, --file=… → file: read the commands from a script file
# -i, --in-place → insert into file: edit the file in place
# -l 40, --line-length=40 → specify the desired line-wrap length for the “l” command
# -n, --quiet, --silent → nothing, i.e. quiet: suppress automatic printing of the pattern space
# -r, --regexp-extended → extended regular expressions
# -s, --separate → separate: consider files as separate rather than as one continuous stream
# -u, --unbuffered → unbuffered: load minimal amounts of data from the input files and flush the output buffers more often

#### actions
# a → append action (after)
#    $a append after last line
# c → change: replace the current line with the ‘c’ action
# d → delete action
# i → insert action (before)
#    1i insert before 1st line
# n → next: print the pattern space (unless -n is given), then read the next input line
# p → print action
# q → quit without processing further input (the current pattern space is still printed unless -n is given)

#### ACTIONS
# d and D → delete
#  d command deletes the current pattern space and starts a new "cycle": the next input line is read into the pattern space and execution restarts at the first sed command.
#  D command deletes the first portion of the pattern space, up to and including the first newline character, leaves the rest of the pattern space alone, and restarts the cycle without reading new input.
# n and N → next
#  n command will print out the current pattern space (unless the "--quiet, --silent or -n" flag is used), empty the current pattern space, and read in the next line of input.
#  N command does *not* print out the current pattern space and does *not* empty the pattern space. It reads in the next line, but appends a newline character along with the input line itself to the pattern space:
#    first-whole-line  ┬→ first-whole-line\nsecond-whole-line
#    second-whole-line  ┘
# h and H → hold
#  h command copies the pattern buffer into the hold buffer. The pattern buffer is unchanged.
#  H command appends the pattern buffer to the hold buffer, so you can combine several lines there. It acts like the "N" command in that lines are appended with a "\n" between them. You can save several lines in the hold buffer, and print them only if a particular pattern is found later.

#### addresses, for instance with p → print
  '1,10p' # lines 1 to 10
  '/beginRE/,/endRE/p' # reg. expr.: from a line matching beginRE to a line matching endRE
  '10~2p' # line 10, then every 2nd line after it (GNU extension)
  '$='    # at the last line “$”, print “=” the line number
#### examples
# sorted lines → delete duplicate lines
  sed '$!N; /^\(.*\)\n\1$/!P; D' temp2.txt > temp3.txt
# delete:
  sed -e '1,10d' # lines 1-10
  sed -e '11,$d' # line 11 to end of file
  sed -e '10~2d' # delete every 2nd line starting from 10
# extract:
  sed -n -e '1,10p'
# quit
  sed -e '10q' # print lines 1-10, then quit
# commands on multiple lines or with -e again:
  sed -e '1,4d
    6,9d'
  sed -e '1,4d' -e '6,9d'
# delete lines with debug + print lines with foo
  sed -n -e '/debug/d' -e '/foo/p'
# print only lines that contain neither [ nor ]
  echo -e "line one\nli[ne] two" | sed --silent --expression '/[][]\+/!{ p; }'
  echo -e "line one\nli[ne] two" | sed -ne '/[][]\+/!{ p; }'
# pipes
  gcc sourcefile.c 2>&1 | sed -n -e '/warning:/,/error:/p'
  gcc sourcefile.c 2>&1 | sed --silent --expression '/warning:/,/error:/p'
# swap upper case and lower case
  echo qWeRtzzuiPÜ | sed --regexp-extended --expression 's@([[:lower:]]?)([[:upper:]]?)@\U\1\L\2@g' # or
  echo qWeRtzzuiPÜ | sed -re 's@([[:lower:]]?)([[:upper:]]?)@\U\1\L\2@g' # or
  echo qWeRtzzuiPÜ | sed  --expression 's@\([[:lower:]]\?\)\([[:upper:]]\?\)@\U\1\L\2@g'
  echo qWeRtzzuiPÜ | sed  -e 's@\([[:lower:]]\?\)\([[:upper:]]\?\)@\U\1\L\2@g'
</source>
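
To make the difference between '''n''', '''N''' and the '''N'''/'''P'''/'''D''' trio from the notes above tangible, here is a small sketch. It assumes GNU sed; the sample inputs and the variable names (<code>n_out</code>, <code>N_out</code>, <code>deduped</code>) are made up for illustration only.

```shell
# n: print nothing here (because of -n), replace the pattern space by the next line.
# Together with p this prints every second line.
n_out=$(seq 4 | sed -n 'n; p')
printf '%s\n' "$n_out"

# N: keep the current line and append the next one after a \n,
# so pairs of lines can be joined.
N_out=$(seq 4 | sed 'N; s@\n@+@')
printf '%s\n' "$N_out"

# Delete adjacent duplicate lines:
#   $!N               append the next line, except on the last line
#   /^\(.*\)\n\1$/!P  if both lines differ, (P)rint the first one
#   D                 delete the first line and restart with the remainder
deduped=$(printf 'a\na\nb\nc\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')
printf '%s\n' "$deduped"
```

The duplicate filter only compares neighbouring lines, which is why the snippet above asks for sorted input.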
 +
 +
== Sed help ==
 +
 +
 +
<span style='color:#888786;'>### Command line syntax ###############</span>
 +
<span style='color:#888786;'>#  file ↘</span>
 +
<span style='color:#888786;'>#    sed -f sed_replacements.sed old_file.txt &gt; new_file.txt</span>
 +
<span style='color:#888786;'># insert ↘    ↙ file</span>
 +
<span style='color:#888786;'>#    sed -i -f sed_replacements.sed overwritten.txt</span>
 +
 +
<span style='color:#888786;'>### Regular expressions ###############</span>
 +
<span style='color:#888786;'># note the different (default) regexpr !!!</span>
 +
<span style='color:#888786;'># summarised: in the default (basic) regexp + ? ( { | are literal characters → escape them with \ to get the operator!</span>
 +
<span style='color:#888786;'>#  (..) → \(\)    reference </span>
 +
<span style='color:#888786;'>#  ?    → \?      0 or 1</span>
 +
<span style='color:#888786;'>#  .+  → .\+    1 or many</span>
 +
<span style='color:#888786;'>#  .*  → .*      0 or many</span>
 +
<span style='color:#888786;'>#  [..] → [..]    character definition range</span>
 +
<span style='color:#888786;'>#  {..} → \{..\}</span>
 +
<span style='color:#888786;'>#  | → \| means “or”</span>
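
The escaping rules listed above can be checked with a tiny sketch like the following. It assumes GNU sed (where <code>\+</code> is available in basic regular expressions); the input strings and variable names are invented for the example.

```shell
# Basic regular expressions (the default): groups and + need a backslash.
bre=$(echo 'aab' | sed 's@\(a\+\)b@[\1]@')
printf '%s\n' "$bre"

# Extended regular expressions (-r): the same operators without backslash.
ere=$(echo 'aab' | sed -r 's@(a+)b@[\1]@')
printf '%s\n' "$ere"

# Without the backslash, + is just an ordinary character in a basic regexp.
literal=$(echo 'a+b' | sed 's@a+b@matched@')
printf '%s\n' "$literal"
```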
 +
 +
<span style='color:#888786;'>### Search and replace ################</span>
 +
<span style='color:#888786;'>#  ↙ search          ↙ global scope</span>
 +
<span style='color:#888786;'># s/search/replace/g;</span>
 +
<span style='color:#888786;'># s/\(search\)/\U\1\E/g; # finds “search” → “SEARCH” in \U (upper case) \E stops transformation</span>
 +
<span style='color:#888786;'># s/\(SEARCH\)/\L\1\E/g; # finds “SEARCH” → “search” in \L (lower case) \E stops transformation</span>
 +
 +
<span style='color:#888786;'>### Search and replace (address) ######</span>
 +
<span style='color:#888786;'>#  /address/s/search/replace/g</span>
 +
<span style='color:#888786;'># NOT-matched or everything except “address”</span>
 +
<span style='color:#888786;'>#  /address/!s/search/replace/g</span>
 +
<span style='color:#888786;'># address can be: 1,$ (1st line to the end) or a search pattern</span>
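
A minimal sketch of the address forms just described, assuming GNU sed; the two sample lines and the variable names are placeholders chosen for this illustration.

```shell
# Substitute only on lines that match the address ...
only=$(printf 'keep x\nskip x\n' | sed '/^keep/s@x@y@g')
printf '%s\n' "$only"

# ... or on every line that does NOT match it (note the ! after the address).
except=$(printf 'keep x\nskip x\n' | sed '/^keep/!s@x@y@g')
printf '%s\n' "$except"

# A line range is an address too: from line 2 to the last line ($).
range=$(seq 3 | sed '2,$s@^@line @')
printf '%s\n' "$range"
```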
 +
 +
<span style='color:#888786;'>### multiline search with “address” ###</span>
 +
<span style='color:#888786;'>#              “address”</span>
 +
<span style='color:#888786;'>#┌─────────────────┴─────────────────────┐</span>
 +
<span style='color:#888786;'>#  first line  append \n  second line</span>
 +
<span style='color:#888786;'>#┌───────┴────────┐ ↓  ┌────────┴────────┐ </span>
 +
<span style='color:#0000ff;'>/</span><span style='color:#8f6a32;'>first line pattern</span><span style='color:#0000ff;'>/</span><b>N</b>;<span style='color:#0000ff;'>/</span><span style='color:#8f6a32;'>second line pattern</span><span style='color:#0000ff;'>/</span>{
 +
  <span style='color:#888786;'># do something with the found pattern</span>
 +
  <span style='color:#888786;'>#    search                replace      global scope</span>
 +
  <span style='color:#888786;'>#      ↓                      ↓          ↓</span>
 +
  <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#8f6a32;'>first line</span><span style='color:#ff80e0;'>\n</span><span style='color:#8f6a32;'>second line</span><span style='color:#0000ff;'>@</span><span style='color:#9c0f0f;'>replace pattern</span><span style='color:#0000ff;'>@</span><span style='color:#0057ae;'>g</span>
 +
}
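
The template above could be filled in like this. It is only a sketch under the assumption of GNU sed; the three input lines (alpha, beta, gamma) and the replacement text are made up.

```shell
# /^alpha$/N     when the first-line pattern matches, append the next line
# /\nbeta$/{…}   only continue if the appended second line matches as well
merged=$(printf 'alpha\nbeta\ngamma\n' | sed '/^alpha$/N; /\nbeta$/{
  s@alpha\nbeta@alpha-and-beta@g
}')
printf '%s\n' "$merged"
```

The second address is tested against the joined pattern space, so the second line has to be matched after the <code>\n</code>.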
 +
 +
 +
== Replacements in a MediaWiki XML export ==

See [[#Sed_help|Sed help]].
 +
 +
  <span style='color:#888786;'>######### MediaWiki Export ##########</span>
 +
  <span style='color:#888786;'>#  don't replace!!</span>
 +
  <span style='color:#888786;'>#  &amp;amp;nbsp; → &amp;nbsp;</span>
 +
  <span style='color:#888786;'>#  &amp;lt;br /&amp;gt; → &lt;br/&gt;</span>
 +
  <span style='color:#888786;'>#####################################</span>
 +
 
 +
  <span style='color:#888786;'># set username</span>
 +
  <span style='color:#888786;'># syntax explained</span>
 +
  <span style='color:#888786;'>#      first line          and append \n and second line</span>
 +
  <span style='color:#888786;'>#┌───────────┴───────────────────┐ ↓  ┌───────────┴──────────────┐ </span>
 +
  <span style='color:#0000ff;'>/</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'>&lt;username&gt;</span><span style='color:#ff80e0;'>\).\+\(</span><span style='color:#8f6a32;'>&lt;</span><span style='color:#ff80e0;'>\/</span><span style='color:#8f6a32;'>username&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>/</span><b>N</b>;<span style='color:#0000ff;'>/</span><span style='color:#ff80e0;'>\(.\+</span><span style='color:#8f6a32;'>&lt;id&gt;</span><span style='color:#ff80e0;'>\)[</span><span style='color:#8f6a32;'>0-9</span><span style='color:#ff80e0;'>]\+\(</span><span style='color:#8f6a32;'>&lt;</span><span style='color:#ff80e0;'>\/</span><span style='color:#8f6a32;'>id&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>/</span>{
 +
    <span style='color:#888786;'># do something with the found pattern</span>
 +
    <span style='color:#888786;'># replace user name and user ID now with &quot;\n&quot;</span>
 +
    <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'>&lt;username&gt;</span><span style='color:#ff80e0;'>\).\+\(</span><span style='color:#8f6a32;'>&lt;/username&gt;</span><span style='color:#ff80e0;'>\n.\+</span><span style='color:#8f6a32;'>&lt;id&gt;</span><span style='color:#ff80e0;'>\)[</span><span style='color:#8f6a32;'>0-9</span><span style='color:#ff80e0;'>]\+\(</span><span style='color:#8f6a32;'>&lt;/id&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>@</span><span style='color:#ff80e0;'>\1</span><span style='color:#9c0f0f;'>Ihr Name</span><span style='color:#ff80e0;'>\2</span><span style='color:#9c0f0f;'>123</span><span style='color:#ff80e0;'>\3</span><span style='color:#0000ff;'>@</span><span style='color:#0057ae;'>g</span>
 +
  }
 +
 
 +
  <span style='color:#888786;'># comment</span>
 +
  <span style='color:#0000ff;'>/</span><span style='color:#e85752;'>&lt;comment&gt;</span><span style='color:#0000ff;'>/</span>,<span style='color:#0000ff;'>/</span><span style='color:#e85752;'>&lt;</span><span style='color:#00c5cc;'>\/</span><span style='color:#e85752;'>comment&gt;</span><span style='color:#0000ff;'>/</span>{
 +
    <span style='color:#644a9b;'>:label_add_newlines</span>
 +
      <b>N</b>; <span style='color:#888786;'># add newlines as '\n'</span>
 +
    <i><span style='color:#c5b399;'># if line contains not (!) '&lt;/comment&gt;' go (b)ack to label_add_newlines</span></i>
 +
    <span style='color:#0000ff;'>/</span><span style='color:#e85752;'>&lt;</span><span style='color:#00c5cc;'>\/</span><span style='color:#e85752;'>comment&gt;</span><span style='color:#0000ff;'>/</span><b><span style='color:#880088;'>!</span></b><b>b</b> <span style='color:#644a9b;'>label_add_newlines</span>
 +
    <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#00c5cc;'>\(</span><span style='color:#e85752;'>&lt;comment&gt;</span><span style='color:#00c5cc;'>\).\+\(</span><span style='color:#e85752;'>&lt;/comment&gt;</span><span style='color:#00c5cc;'>\)</span><span style='color:#0000ff;'>@</span><span style='color:#00c5cc;'>\1</span><span style='color:#e85752;'>new comment</span><span style='color:#00c5cc;'>\2</span><span style='color:#0000ff;'>@</span><span style='color:#0057ae;'>g</span>;
 +
  }
 +
 
 +
  <span style='color:#888786;'># delete revision</span>
 +
  <span style='color:#0000ff;'>/</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'>&lt;revision&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>/</span><b>N</b>;<span style='color:#0000ff;'>/</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'> </span><span style='color:#ff80e0;'>\+</span><span style='color:#8f6a32;'>&lt;id&gt;</span><span style='color:#ff80e0;'>\)[</span><span style='color:#8f6a32;'>0-9</span><span style='color:#ff80e0;'>]\+\(</span><span style='color:#8f6a32;'>&lt;</span><span style='color:#ff80e0;'>\/</span><span style='color:#8f6a32;'>id&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>/</span>{
 +
  <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'>&lt;revision&gt;</span><span style='color:#ff80e0;'>\n</span><span style='color:#8f6a32;'> </span><span style='color:#ff80e0;'>\+</span><span style='color:#8f6a32;'>&lt;id&gt;</span><span style='color:#ff80e0;'>\)[</span><span style='color:#8f6a32;'>0-9</span><span style='color:#ff80e0;'>]\+\(</span><span style='color:#8f6a32;'>&lt;\/id&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>@</span><span style='color:#ff80e0;'>\1\2</span><span style='color:#0000ff;'>@</span><span style='color:#0057ae;'>g</span>
 +
  }
 +
 
 +
  <span style='color:#888786;'># timestamp  (-2 hours)</span>
 +
  <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'>&lt;timestamp&gt;</span><span style='color:#ff80e0;'>\).\+\(</span><span style='color:#8f6a32;'>&lt;/timestamp&gt;</span><span style='color:#ff80e0;'>\)</span><span style='color:#0000ff;'>@</span><span style='color:#ff80e0;'>\1</span><span style='color:#9c0f0f;'>2011-09-08T10:12:11Z</span><span style='color:#ff80e0;'>\2</span><span style='color:#0000ff;'>@</span><span style='color:#0057ae;'>g</span>
 +
 
 +
  <span style='color:#888786;'># do your replacements here</span>
 +
  <span style='color:#0000ff;'>/</span><span style='color:#ff80e0;'>^</span><span style='color:#8f6a32;'>|</span><span style='color:#ff80e0;'>\(</span><span style='color:#8f6a32;'>Stichworte</span><span style='color:#ff80e0;'>\|</span><span style='color:#8f6a32;'>Ort</span><span style='color:#ff80e0;'>\)</span><span style='color:#8f6a32;'> </span><span style='color:#ff80e0;'>*</span><span style='color:#8f6a32;'>=</span><span style='color:#ff80e0;'>.\+$</span><span style='color:#0000ff;'>/</span>{
 +
    <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#8f6a32;'>,</span><span style='color:#0000ff;'>@</span><span style='color:#9c0f0f;'>;</span><span style='color:#0000ff;'>@</span><span style='color:#0057ae;'>g</span>
 +
  }
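
The label loop used for the comment block can be tried on a tiny input. This is a sketch assuming GNU sed, not the script above verbatim: it tests for the closing tag ''before'' each '''N''', so that a comment sitting on one single line is handled too. The sample XML and the label name <code>collect</code> are invented.

```shell
xml='<comment>old
text</comment>
<id>1</id>'

replaced=$(printf '%s\n' "$xml" | sed '/<comment>/{
  :collect
  # as long as the closing tag is missing: append the next line and loop
  /<\/comment>/!{
    N
    b collect
  }
  # the whole comment is in the pattern space now (. also matches \n)
  s@\(<comment>\).*\(</comment>\)@\1new comment\2@
}')
printf '%s\n' "$replaced"
```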
 +
All of the commands shown above ''only work'' if the search pattern can find its match within ''a single'' line. That is simply how {{Abk.|sed}} works: it only matches within individual lines. To search across several lines, you have to join all of those lines with '''N''' so that they become one line in the so-called pattern space. Then you perform the replacements and have the result printed. This is achieved with ''labels'', which you can branch back to. If, in the case of <code>&lt;text…&lt;/text></code>, you want to create a redirect that needs the title, it looks like this:
 +
<span style='color:#888786;'># beware!! it corrupts the history!!!</span>
 +
<span style='color:#888786;'># create a REDIRECT based on the title</span>
 +
<span style='color:#0000ff;'>/</span><span style='color:#8f6a32;'>&lt;title&gt;</span><span style='color:#ff80e0;'>.*</span><span style='color:#8f6a32;'>&lt;</span><span style='color:#ff80e0;'>\/</span><span style='color:#8f6a32;'>title&gt;</span><span style='color:#0000ff;'>/</span> <b>h</b> <span style='color:#888786;'># save the found title to the hold space</span>
 +
 +
<span style='color:#0000ff;'>/</span><span style='color:#8f6a32;'>&lt;text</span><span style='color:#0000ff;'>/</span> { <span style='color:#888786;'># start at <text</span>
 +
<span style='color:#452886;'>:labelTextStart</span> <span style='color:#888786;'># set a marker/label to cycle back later on <text</span>
 +
  <b>N</b>  <span style='color:#888786;'># append new lines</span>
 +
  <span style='color:#0000ff;'>/</span><span style='color:#8f6a32;'>&lt;</span><span style='color:#ff80e0;'>\/</span><span style='color:#8f6a32;'>text&gt;</span><span style='color:#0000ff;'>/</span><b><span style='color:#880088;'>!</span></b><b>b</b> <span style='color:#452886;'>labelTextStart</span> <span style='color:#888786;'># if it is not &lt;/text&gt; cycle back to labelTextStart</span>
 +
  <span style='color:#888786;'># all &lt;text…&lt;/text&gt; is now in ONE single line!!</span>
 +
  <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#8f6a32;'>&lt;text</span><span style='color:#ff80e0;'>.*</span><span style='color:#8f6a32;'>&lt;/text&gt;</span><span style='color:#0000ff;'>@@</span> <span style='color:#888786;'># replace it</span>
 +
  <b>x</b> <span style='color:#888786;'># exchange hold space and pattern space (get the title)</span>
 +
  <b>s</b><span style='color:#0000ff;'>@</span><span style='color:#8f6a32;'>&lt;title&gt;</span><span style='color:#ff80e0;'>\(.*\)</span><span style='color:#8f6a32;'>&lt;\/title&gt;</span><span style='color:#0000ff;'>@</span><span style='color:#9c0f0f;'>&lt;text xml:space=&quot;preserve&quot;&gt;[[REDIRCET: New page </span><span style='color:#ff80e0;'>\1</span><span style='color:#9c0f0f;'>]]&lt;/text&gt;</span><span style='color:#0000ff;'>@</span>
 +
}
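
How a value saved with '''h''' can be reused on later lines is perhaps easier to see on a small input. The following sketch assumes GNU sed and uses the standard '''G''' command (append the hold space to the pattern space after a \n), which the script above does not use; the input lines and the variable name are made up.

```shell
tagged=$(printf 'title: Foo\nbody 1\nbody 2\n' | sed -n '
/^title: /{ s@^title: @@; h; d; }
# every other line: append the saved title, then put it in front
G
s@\(.*\)\n\(.*\)@\2: \1@
p')
printf '%s\n' "$tagged"
```

In contrast to '''x''', which swaps both buffers, '''G''' leaves the hold space untouched, so the title stays available for every following line.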
 +
 +
== Split a file into separate single files according to its content: multiple <nowiki>{{Metadata | … | … }}</nowiki> ==
 +
 +
{{Ombox | text=Bear in mind that your line ends should be “\n” only (i.e. Linux/Unix), not “\r\n” (carriage return plus line feed, common in Windows text files), because {{abbr.|sed}} behaves strangely on “\r\n” line ends.<br/>
You can convert all “\r\n” to “\n” by (s)ubstituting the carriage return at each line end:<br/><code>sed 's@\r$@@' sourcefile.txt</code>
}}
 +
 +
Let’s say we want to split all templates of <nowiki>{{Metadata | … | … }}</nowiki> contained in a single file to separate files according to the content of the very template text data. So we try to find and use the <span style="background-color:#EDD400;">highlighted text data</span>:
 +
<span style="background-color:#EDD400;"><nowiki>{{Metadata</nowiki></span>
 +
| Type = StillImage
 +
| Description =<nowiki>{{Metadata/Description</nowiki> | language code=en | content=Drabkina 1999: A selection of 114 drawings, redrawn by M. Drabkina after the publication by Punithalingam (1988): ''E. Punithalingam: Ascochyta. II. Species on monocotyledons (excluding grasses), cryptogams and gymnosperms.'' Mycological Papers, 159 (1988), pp. 1–235.<nowiki>}}</nowiki>
 +
| Provider Page = Julius Kühn Institute – Federal Research Institute for Cultivated Plants
 +
| Title = Ascochyta acori
 +
| Resource ID = 362574154
 +
| Copyright Statement = © M. Drabkina 1999
 +
| License Statement = Creative Commons non-commercial, by attribution, share-alike license (version 2.5)
 +
| Creators = M. Drabkina;
 +
| Metadata Creator = G. Hagedorn
 +
| Language = zxx
 +
| Metadata Language = en
 +
| Original Creation Date = 12.10.2001
 +
| Country Codes = global
 +
| Subject Category = Deuteromycota
 +
| Taxonomic Coverage = Ascochyta
 +
| Scientific Names = Ascochyta acori Oudem.
 +
| Taxon Count = 1
 +
| Vernacular Names =
 +
| General Keywords =
 +
| Best Quality URI = <nowiki>http://</nowiki>212.201.100.117/storage/Fungi/Coelos/Punithalingam%20Ascochyta/edt/<span style="background-color:#EDD400;">AscP88-002</span>.png
 +
| Best Quality Availability = online (free)
 +
<span style="background-color:#EDD400;"><nowiki>}}</nowiki></span>
 +
 +
<span style="background-color:#EDD400;"><nowiki>{{Metadata</nowiki></span>
 +
| Type = StillImage
 +
| Description =<nowiki>{{Metadata/Description</nowiki> | language code=en | content=Drabkina 1999: A selection of 114 drawings, redrawn by M. Drabkina after the publication by Punithalingam (1988): ''E. Punithalingam: Ascochyta. II. Species on monocotyledons (excluding grasses), cryptogams and gymnosperms.'' Mycological Papers, 159 (1988), pp. 1–235.<nowiki>}}</nowiki>
 +
| Provider Page = Julius Kühn Institute – Federal Research Institute for Cultivated Plants
 +
| Title = Ascochyta acori
 +
| Resource ID = 601700422
 +
| Copyright Statement = © M. Drabkina 1999
 +
| License Statement = Creative Commons non-commercial, by attribution, share-alike license (version 2.5)
 +
| Creators = M. Drabkina;
 +
| Metadata Creator = G. Hagedorn
 +
| Language = zxx
 +
| Metadata Language = en
 +
| Original Creation Date = 12.10.2001
 +
| Country Codes = global
 +
| Subject Category = Deuteromycota
 +
| Taxonomic Coverage = Ascochyta
 +
| Scientific Names = Ascochyta acori Oudem.
 +
| Taxon Count = 1
 +
| Vernacular Names =
 +
| General Keywords =
 +
| Best Quality URI = <nowiki>http://</nowiki>212.201.100.117/storage/Fungi/Coelos/Punithalingam%20Ascochyta/edt/<span style="background-color:#EDD400;">AscP88-001</span>.png
 +
| Best Quality Availability = online (free)
 +
<span style="background-color:#EDD400;"><nowiki>}}</nowiki></span>
 +
 +
Now we extract those <span style="background-color:#EDD400;">highlighted text data</span>, modify them a little, and finally generate a sed command line that writes exactly the line range we found into a file. This sed syntax should look like:
 +
<syntaxhighlight lang="bash">
 +
sed "42,64w filename.txt" # from line 42 to (,) line 64 (w)rite to file “filename.txt”
 +
</syntaxhighlight>
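
The '''w''' command can be tried in isolation first. This is a minimal sketch assuming GNU sed and <code>mktemp</code>; the temporary directory and the file name <code>part.txt</code> are placeholders.

```shell
tmpdir=$(mktemp -d)

# write input lines 3 to 5 to a file; --silent keeps the terminal quiet
seq 10 | sed --silent "3,5w $tmpdir/part.txt"

part=$(cat "$tmpdir/part.txt")
printf '%s\n' "$part"
rm -r "$tmpdir"
```

Everything after the <code>w</code> up to the end of the line is taken as the file name, so it has to be the last thing on its line in the script.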
 +
 +
Note that BASH lets you chain commands with a pipe: <code>command | next-command | next-command</code> passes the output of each command on as input to the next one. So multiple modifications can be chained. With sed this can be done as follows:
 +
<syntaxhighlight lang="bash">
# store $source_metadata_file
source_metadata_file="Drabkina 1999: selected drawings after Punithalingam (1988): Ascochyta on monocotyledons.mwt";
dataset_directory="Drabkina_1999"
# proceed with reading the source file, apply sed commands to it and process its output further (after |) ...
sed --regexp-extended --silent '
/^\{\{Metadata$/,/^\}\}$/ { # find lines from only “<line-start>{{Metadata<line-end>” to “<line-start>}}<line-end>”
  /^\{\{Metadata$/{  # at line “<line-start>{{Metadata<line-end>”
    =; # print line number (=)
  };
  /Best Quality URI/{ # at line with Best Quality URI
    # extract the file basename, strip the extension and (s)ubstitute it by that of a mediawiki-text file (.mwt)...
    s@.*Best Quality URI.*/([^/]+)\.[a-z]{3,4}$@\1.mwt@g;
    p; # (p)rint result (new file basename)
  };
  /^\}\}$/{ # at line “<line-start>}}<line-end>”
    =; # print the line number (=)
  };
  ######################
  # we should get an output like:
  # 18
  # AscP88-002.mwt
  # 40
  # 42
  # AscP88-001.mwt
  # 64
  # ...
  ######################
  # now this output can be processed further: chain the next BASH command by |
}' "$source_metadata_file" | sed --regexp-extended --silent "# process the preceding output further
/^[0-9]+\$/{ # at “<line-start>any number/numbers<line-end>”
  N; # append (N)ext line by \n
  N; # append (N)ext line by \n
  # substitute “<line-start>any-numbers (\n)ewline anything (\n)ewline any-numbers<line-end>”
  # by a comment what it will do and the very sed command ...
  s@^([0-9]+)\n(.+)\n([0-9]+)\$@echo 'from line \1 to \3 write to file ./$dataset_directory/\2'\nsed --silent '\1,\3w ./$dataset_directory/\2' '$source_metadata_file'@;
  p; # (p)rint
};" > split_file_at_metadata.sh
</syntaxhighlight>
 +
 +
Check the file <code>split_file_at_metadata.sh</code>. Its content should look like this:
<syntaxhighlight lang="bash">
######################
head split_file_at_metadata.sh
######################
echo 'from line 18 to 40 write to file ./Drabkina_1999/AscP88-002.mwt'
sed --silent '18,40w ./Drabkina_1999/AscP88-002.mwt' 'Drabkina 1999: selected drawings after Punithalingam (1988): Ascochyta on monocotyledons.mwt'
echo 'from line 42 to 64 write to file ./Drabkina_1999/AscP88-001.mwt'
sed --silent '42,64w ./Drabkina_1999/AscP88-001.mwt' 'Drabkina 1999: selected drawings after Punithalingam (1988): Ascochyta on monocotyledons.mwt'
</syntaxhighlight>
 +
 +
Create the target directory, enable execution rights for the file <code>split_file_at_metadata.sh</code> and execute it:
<syntaxhighlight lang="bash">
mkdir -p Drabkina_1999 # the w command does not create missing directories
chmod ugo+x split_file_at_metadata.sh
./split_file_at_metadata.sh # execute the commands above
</syntaxhighlight>
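
To test the two-step pipeline without the real data set, it can be run on a shortened sample first. The following is only a sketch assuming GNU sed: the two mini templates, the <code>example.org</code> URLs and the file name <code>input.mwt</code> are hypothetical, and the generated commands are only collected in a variable, not executed.

```shell
sample='{{Metadata
| Best Quality URI = http://example.org/img/Foo-001.png
}}

{{Metadata
| Best Quality URI = http://example.org/img/Foo-002.png
}}'

# step 1: line number of the start, new file name, line number of the end
listing=$(printf '%s\n' "$sample" | sed --regexp-extended --silent '
/^\{\{Metadata$/,/^\}\}$/{
  /^\{\{Metadata$/=
  /Best Quality URI/{
    s@.*/([^/]+)\.[a-z]{3,4}$@\1.mwt@
    p
  }
  /^\}\}$/=
}')
printf '%s\n' "$listing"

# step 2: join each group of three lines and turn it into a sed call
commands=$(printf '%s\n' "$listing" | sed --regexp-extended --silent '
/^[0-9]+$/{
  N
  N
  s@^([0-9]+)\n(.+)\n([0-9]+)$@sed --silent "\1,\3w \2" input.mwt@
  p
}')
printf '%s\n' "$commands"
```

Inspecting <code>$listing</code> first makes it easier to spot templates that lack a “Best Quality URI” line, because then the groups of three no longer line up.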
 +
 +
</syntaxhighlight>
 +
|
 +
|-
 +
| Beispiel || Beispiel
 +
|}
  
 
== Text snippets for the <abbr title="Stream EDitor">sed</abbr> running under Linux ==
 

Latest revision as of 13:09, 23 November 2020

Remember the rules:

sed reads input line by input line first, not command line by command line first, that is:
  1. read one input line first and
  2. apply all the commands to just this line
  3. read the next input line
  4. apply all the commands to just this line
What it does not do, is:
  1. read one command line
  2. apply this to all input lines
  3. read the next command line
  4. apply this to all input lines
If you experience very strange and unexpected search-and-replace behaviour, you might have carriage return characters (\r) in your text file. Remove them first: cat /path/to/my-textfile.txt | sed 's@\r@@g' | sed '# continue sed syntax expressions'
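
Both points can be checked with a short sketch, assuming GNU sed (where <code>\r</code> is understood in the search pattern); the sample inputs and the variable names are made up for illustration.

```shell
# Each input line passes through ALL commands before the next line is read:
# both substitutions hit "a" first, then both hit "b".
order=$(printf 'a\nb\n' | sed 's@$@ first@; s@$@ second@')
printf '%s\n' "$order"

# Strip the carriage return of Windows line ends first, then continue as usual.
clean=$(printf 'one\r\ntwo\r\n' | sed 's@\r$@@' | sed 's@^@> @')
printf '%s\n' "$clean"
```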

Hints for Multi-Line Substitutions

So, sed reads input line by input line, but this gets complicated if you run multi-line commands on an input line. In that case I recommend writing the multi-line commands first and then the commands that do not apply to multiple lines, OR making sure that the multi-line block also contains the other replacements (if needed). If you do it the other way round and write the single-line commands first and the multi-line commands afterwards, then

  1. sed reads one input line and applies e.g. command N (that is: join lines as current-input-line\nnext-input-line), then applies all subsequent sed commands to this joined line
  2. the next input line is then not this «next-input-line» but the line after it!

To illustrate this mishmash let’s see what happens when we do simple replacements on a sequence of 1, 2, 3 to 5:

Left: perhaps “strange” behaviour. Right: perhaps expected behaviour.
# from sequence 1, 2, ... 5  
seq 5 | sed 's@4@\\4@; 
      # add slash when 4 is found really?
/^3/{ # find starting 3 in a line
  N;  # append (N)ext line via \n
  s@\n@\\was_newline@
      # replace \n to \\was_newline
};


s@^@# @;
      # add a comment character for bash (just for fun)
'
# 1
# 2
# 3\was_newline4
# 5

Sed reads input line by input line, and the problem appears at input line 3:

  1. input line 1 (nothing to do)
  2. input line 2 (nothing to do)
  3. input line 3: s@4@\\4@ finds no 4 here, but then /^3/ matches and …
    • N appends the next input line after a \n (after one N, 3\n4 is in the pattern space)
    • s@\n@\\was_newline@ turns 3\n4 into 3\was_newline4
  4. input line 5 (nothing to do): the next input line is already the 5th!

So the command s@4@\\4@; is never applied to the 4: it had already run while only 3 was in the pattern space, and input line 4 was then consumed by N.


# from sequence 1, 2, ... 5  
seq 5 | sed ' # we moved
      # the s@4@\\4@; to afterwards …
/^3/{ # find starting 3 in a line
  N;  # append (N)ext line via \n
  s@\n@\\was_newline@
      # replace \n to \\was_newline
};
s@4@\\4@;
      # add slash when 4 is found for real
s@^@# @;
      # add a comment character for bash (just for fun)
'
# 1
# 2
# 3\was_newline\4
# 5

Here: almost the same as on the left, but after s@\n@\\was_newline@ has turned 3\n4 into 3\was_newline4, the command s@4@\\4@; is applied to this joined line, so the 4 gets its backslash as expected. In detail:

  1. input line 1 (nothing to do)
  2. input line 2 (nothing to do)
  3. input line 3: /^3/ matches and …
    • N appends the next input line after a \n (after one N, 3\n4 is in the pattern space)
    • s@\n@\\was_newline@ turns 3\n4 into 3\was_newline4
    • s@4@\\4@ turns 3\was_newline4 into 3\was_newline\4
  4. input line 5 (nothing to do):  next input line is now the 5th already! Same as left
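
The second option mentioned at the start of this section, repeating the replacement inside the multi-line block, could look like the following sketch. It assumes GNU sed; the variable name `result` is only illustrative.

```shell
# The single-line replacement stays in front, but it is repeated
# inside the block so that the appended line 4 is handled as well.
result=$(seq 5 | sed 's@4@\\4@
/^3/{
  N
  s@\n@\\was_newline@
  s@4@\\4@
}
s@^@# @')
printf '%s\n' "$result"
# prints:
# # 1
# # 2
# # 3\was_newline\4
# # 5
```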

Text snippets for sed running under Linux

#### file options
# -e, --expression=… → expression: add the given script to the commands to execute
# -f, --file= → file: read the commands from a script file
# -i, --in-place → insert into file: edit the file in place
# -l 40, --line-length 40 → specify the desired line-wrap length for the “l” command
# -n, --quiet, --silent → nothing, i.e. quiet: no automatic printing of the pattern space
# -r, --regexp-extended → extended regular expressions
# -s, --separate → separate: consider files as separate instead of one continuous stream
# -u, --unbuffered → unbuffered = load minimal amounts of data from the input files and flush the output buffers more often

#### actions 
# a → append action (after)
#     $a append after last line
# c → change: You can replace the current line with the ‘c’ action
# d → delete action 
# i → insert action (before)
#     1i insert before 1st line
# n → next: read the next line of input into the pattern space
# p → print action
# q → quit immediately without further processing

#### actions in detail
# d and D → delete
#   d command deletes the current pattern space and immediately starts a new "cycle": the remaining commands are skipped, the next line is read into the pattern space, and execution starts again at the first sed command.
#   D command deletes the first portion of the pattern space, up to the first newline character, leaving the rest of the pattern space alone; the cycle then restarts with this rest, without reading a new line of input.
# n and N → next
#   n command will print out the current pattern space (unless the "--quiet, --silent or -n" flag is used), empty the current pattern space, and read in the next line of input.
#   N command does *not* print out the current pattern space and does *not* empty the pattern space. It reads in the next line, but appends a new line character along with the input line itself to the pattern space:
#     first-whole-line   ┬→ first-whole-line\nsecond-whole-line
#     second-whole-line  ┘
# h  and H → Hold
#   h command copies the pattern buffer into the hold buffer. The pattern buffer is unchanged.
#   H command allows you to combine several lines in the hold buffer. It acts like the "N" command as lines are appended to the buffer, with a "\n" between the lines. You can save several lines in the hold buffer, and print them only if a particular pattern is found later.
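
The difference between n, N and H can be tried out on a four-line input. This is a small sketch assuming GNU sed; the variable names are only illustrative.

```shell
input=$(printf 'a\nb\nc\nd')

# n: replace the pattern space with the next line;
# with -n, "n;p" therefore prints every second line
every_second=$(printf '%s\n' "$input" | sed -n 'n;p')

# N: append the next line, so two input lines share one pattern space
pairs=$(printf '%s\n' "$input" | sed 'N; s/\n/-/')

# H: collect every line in the hold space; at the last line ($)
# e(x)change the buffers, turn the newlines into commas and print
csv=$(printf '%s\n' "$input" | sed -n 'H; ${ x; s/\n/,/g; s/^,//; p; }')

printf '%s\n' "$every_second" "$pairs" "$csv"
# prints:
# b
# d
# a-b
# c-d
# a,b,c,d
```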



#### addresses for instance with p → print
  '1,10p' # line 1 to 10
  '/beginRE/,/endRE/p' # reg. expr: beginRE to endRE
  '10~2p' # at line 10 then each 2nd line
  '$='    # at the last line “$” print “=” the line number (i.e. count the lines)
#### examples
# sorted lines → delete duplicate lines (like uniq)
  sed '$!N; /^\(.*\)\n\1$/!P; D' temp2.txt > temp3.txt
# delete: 
  sed -e '1,10d' # line 1-10
  sed -e '11,$d' # line 11 to end of file
  sed -e '10~2d' # delete every 2nd line starting from 10
# extract:
  sed -n -e '1,10p'
# quit 
  sed -e '10q' # quit after line 10 (prints lines 1 to 10)
# commands on multiple lines or with -e again:
  sed -e '1,4d  
    6,9d'
  sed -e '1,4d' -e '6,9d'
# delete lines with debug + print lines with foo
  sed -n -e '/debug/d' -e '/foo/p'
# print only lines that contain neither [ nor ]
  echo -e "line one\nli[ne] two" | sed --silent --expression '/[][]\+/!{ p; }'
  echo -e "line one\nli[ne] two" | sed -ne '/[][]\+/!{ p; }'
# pipes
  gcc sourcefile.c 2>&1 | sed -n -e '/warning:/,/error:/p'
  gcc sourcefile.c 2>&1 | sed --silent --expression '/warning:/,/error:/p'
# upper case to lower case and vice versa
  echo qWeRtzzuiPÜ | sed --regexp-extended --expression 's@([[:lower:]]?)([[:upper:]]?)@\U\1\L\2@g' # or
  echo qWeRtzzuiPÜ | sed -re 's@([[:lower:]]?)([[:upper:]]?)@\U\1\L\2@g' # or
  echo qWeRtzzuiPÜ | sed  --expression 's@\([[:lower:]]\?\)\([[:upper:]]\?\)@\U\1\L\2@g'
  echo qWeRtzzuiPÜ | sed  -e 's@\([[:lower:]]\?\)\([[:upper:]]\?\)@\U\1\L\2@g'
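
The address forms listed above can be checked quickly against a numbered sequence. This is a sketch assuming GNU sed (first~step is a GNU extension) and the seq utility; the variable names are only illustrative.

```shell
# line range: lines 2 to 4
range=$(seq 10 | sed -n '2,4p')
# regex range: from the first line matching 3 to the next line matching 5
regex_range=$(seq 10 | sed -n '/3/,/5/p')
# first~step: line 4, then every 3rd line
stepped=$(seq 10 | sed -n '4~3p')
# last line ($) combined with "=": prints the number of lines
count=$(seq 10 | sed -n '$=')
# negated address (!): delete everything except lines 1 to 3
kept=$(seq 10 | sed '1,3!d')

printf '%s\n' "$stepped" "$count"
# prints:
# 4
# 7
# 10
# 10
```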

Sed help

### Command line syntax ###############
#   file ↘
#    sed -f sed_replacements.sed old_file.txt > new_file.txt
# insert ↘    ↙ file
#    sed -i -f sed_replacements.sed overwritten.txt

### Regular expressions ###############
# note the different (default) regular expression syntax !!!
# summarised: in the default (basic) syntax an unescaped + ? ( { | is a literal character, not an operator!
#   extended → basic (default)
#   (..) → \(..\)  group / back-reference
#   ?    → \?      0 or 1
#   .+   → .\+     1 or many
#   .*   → .*      0 or many
#   [..] → [..]    bracket expression (character set or range)
#   {..} → \{..\}  repetition count
#   | → \| means “or”

### Search and replace ################
#  ↙ search          ↙ global scope
# s/search/replace/g;
# s/\(search\)/\U\1\E/g; # finds “search” → “SEARCH” in \U (upper case) \E stops transformation
# s/\(SEARCH\)/\L\1\E/g; # finds “SEARCH” → “search” in \L (lower case) \E stops transformation

### Search and replace (address) ######
#   /address/s/search/replace/g
# NOT-matched or everything except “address”
#   /address/!s/search/replace/g
# address can be: 1,$ (1st line to the end) or a search pattern

### multiline search with “address” ###
#               “address”
#┌─────────────────┴─────────────────────┐
#   first line   append \n  second line
#┌───────┴────────┐ ↓  ┌────────┴────────┐ 
/first line pattern/N;/second line pattern/{
  # do something with the found pattern
  #     search                replace       global scope
  #       ↓                      ↓          ↓
  s@first line\nsecond line@replace pattern@g
}
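
A short sketch of the points above, assuming GNU sed: the same pattern in basic and in extended syntax, the \U…\E case conversion, and the multi-line template filled with concrete text. The sample strings and variable names are only illustrative.

```shell
# basic syntax (default): the operators need a backslash
bre=$(echo 'foo bar' | sed 's/\(foo\|bar\)\+/<&>/g')
# extended syntax (-E, the same as --regexp-extended): no backslashes
ere=$(echo 'foo bar' | sed -E 's/(foo|bar)+/<&>/g')

# \U switches the replacement to upper case, \E stops the conversion
upper=$(echo 'search me' | sed 's/\(search\)/\U\1\E/')

# multi-line template: append the next line when the first pattern matches,
# then replace across the embedded \n
multi=$(printf 'first line\nsecond line\nthird line\n' |
  sed '/first line/N; /second line/{ s@first line\nsecond line@replaced@; }')

printf '%s\n' "$bre" "$ere" "$upper" "$multi"
# prints:
# <foo> <bar>
# <foo> <bar>
# SEARCH me
# replaced
# third line
```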


Replacements in a MediaWiki XML export

See Sed help.

 ######### MediaWiki Export ##########
 #  don't replace these escaped entities!!
 #  &amp;nbsp; → &nbsp;
 #  &lt;br /&gt; → <br/>
 #####################################
 
 # set username
 # syntax explained
 #       first line           and append \n and second line
 #┌───────────┴───────────────────┐ ↓  ┌───────────┴──────────────┐ 
 /\(<username>\).\+\(<\/username>\)/N;/\(.\+<id>\)[0-9]\+\(<\/id>\)/{
   # do something with the found pattern
   # replace user name and user ID now with "\n"
   s@\(<username>\).\+\(</username>\n.\+<id>\)[0-9]\+\(</id>\)@\1Your Name\2123\3@g
 }
 
 # comment
 /<comment>/{
   :label_add_newlines
   # as long as the pattern space does not (!) contain '</comment>':
   # append the (N)ext line via '\n' and go (b)ack to label_add_newlines
   # (checking before N keeps a one-line <comment>…</comment> intact)
   /<\/comment>/!{
     N
     b label_add_newlines
   }
   s@\(<comment>\).\+\(</comment>\)@\1new comment\2@g;
 }
 
 # delete the revision ID (empty the <id>…</id> that follows <revision>)
 /\(<revision>\)/N;/\( \+<id>\)[0-9]\+\(<\/id>\)/{
 s@\(<revision>\n \+<id>\)[0-9]\+\(<\/id>\)@\1\2@g
 }
 
 # timestamp  (-2 hours)
 s@\(<timestamp>\).\+\(</timestamp>\)@\12011-09-08T10:12:11Z\2@g
 
 # do your replacements here
 /^|\(Stichworte\|Ort\) *=.\+$/{
   s@,@;@g
 }
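
To see the user name rule and the keyword rule at work, they can be run against a tiny hand-made sample. This is a sketch assuming GNU sed; the XML fragment, the replacement values and the variable names are made up for illustration and are not taken from a real export.

```shell
# hypothetical fragment of an export
sample='<contributor>
  <username>Old Name</username>
  <id>42</id>
</contributor>'

# join the <username> line with the following <id> line, then replace both values
replaced=$(printf '%s\n' "$sample" | sed '
/<username>.\+<\/username>/N
/<id>[0-9]\+<\/id>/{
  s@\(<username>\).\+\(</username>\n.\+<id>\)[0-9]\+\(</id>\)@\1Your Name\2123\3@
}')
printf '%s\n' "$replaced"

# commas become semicolons only in the template parameters Stichworte and Ort
keywords=$(echo '|Ort = Berlin, Potsdam' | sed '/^|\(Stichworte\|Ort\) *=.\+$/s@,@;@g')
untouched=$(echo '|Title = A, B' | sed '/^|\(Stichworte\|Ort\) *=.\+$/s@,@;@g')
printf '%s\n' "$keywords" "$untouched"
# the last two lines printed are:
# |Ort = Berlin; Potsdam
# |Title = A, B
```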

All the instructions shown above only work if the search pattern can find its target within a single line. That is simply how sed works: it matches within individual lines only. To search across several lines, you have to join all these lines with N, so that they become one line in the so-called pattern space. Then you perform the replacements and let sed print the result. This is achieved with labels that you can jump back to. If, in the case of <text…</text>, you want to create a redirect that needs the title, it looks as follows:

# beware!! it corrupts the history!!! 
# create a REDIRECT based on the title
/<title>.*<\/title>/ h # save the found title to the hold space

/<text/ { # start at <text
# set a marker/label to cycle back to later
:labelTextStart
  # as long as </text> is not in the pattern space: append the (N)ext line and cycle back
  # (checking before N keeps the line after a one-line <text…</text> intact)
  /<\/text>/!{
    N
    b labelTextStart
  }
  # all <text…</text> is now in ONE single line!!
  s@<text.*</text>@@ # replace it
  x # exchange hold space and pattern space (get the title)
  s@<title>\(.*\)<\/title>@<text xml:space="preserve">[[REDIRECT: New page \1]]</text>@
}
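
A sketch of the same hold-space technique on a made-up page, assuming GNU sed. The loop checks for </text> before appending a line, so a one-line <text> element would be handled too. The sample XML and the variable names are only illustrative; the replacement text is copied from the snippet above.

```shell
# hypothetical page with a text element that spans two lines
sample='<page>
  <title>Old page</title>
  <text xml:space="preserve">line one
line two</text>
</page>'

redirected=$(printf '%s\n' "$sample" | sed '
/<title>.*<\/title>/h
/<text/{
  :labelTextStart
  /<\/text>/!{
    N
    b labelTextStart
  }
  s@<text.*</text>@@
  x
  s@<title>\(.*\)</title>@<text xml:space="preserve">[[REDIRECT: New page \1]]</text>@
}')
printf '%s\n' "$redirected"
# prints:
# <page>
#   <title>Old page</title>
#   <text xml:space="preserve">[[REDIRECT: New page Old page]]</text>
# </page>
```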

Split a file into separate single files according to its content: multiple {{Metadata | … | … }}

Let’s say we want to split a file that contains several {{Metadata | … | … }} templates into separate files, one per template, named after data found inside each template. For this we look for the highlighted text data (the opening {{Metadata line, the file name at the end of the Best Quality URI line, and the closing }} line):

{{Metadata
| Type = StillImage
| Description ={{Metadata/Description | language code=en | content=Drabkina 1999: A selection of 114 drawings, redrawn by M. Drabkina after the publication by Punithalingam (1988): E. Punithalingam: Ascochyta. II. Species on monocotyledons (excluding grasses), cryptogams and gymnosperms. Mycological Papers, 159 (1988), pp. 1–235.}}
| Provider Page = Julius Kühn Institute – Federal Research Institute for Cultivated Plants
| Title = Ascochyta acori
| Resource ID = 362574154
| Copyright Statement = © M. Drabkina 1999
| License Statement = Creative Commons non-commercial, by attribution, share-alike license (version 2.5)
| Creators = M. Drabkina;
| Metadata Creator = G. Hagedorn
| Language = zxx
| Metadata Language = en
| Original Creation Date = 12.10.2001
| Country Codes = global
| Subject Category = Deuteromycota
| Taxonomic Coverage = Ascochyta
| Scientific Names = Ascochyta acori Oudem.
| Taxon Count = 1
| Vernacular Names = 
| General Keywords = 
| Best Quality URI = http://212.201.100.117/storage/Fungi/Coelos/Punithalingam%20Ascochyta/edt/AscP88-002.png
| Best Quality Availability = online (free)
}}

{{Metadata
| Type = StillImage
| Description ={{Metadata/Description | language code=en | content=Drabkina 1999: A selection of 114 drawings, redrawn by M. Drabkina after the publication by Punithalingam (1988): E. Punithalingam: Ascochyta. II. Species on monocotyledons (excluding grasses), cryptogams and gymnosperms. Mycological Papers, 159 (1988), pp. 1–235.}}
| Provider Page = Julius Kühn Institute – Federal Research Institute for Cultivated Plants
| Title = Ascochyta acori
| Resource ID = 601700422
| Copyright Statement = © M. Drabkina 1999
| License Statement = Creative Commons non-commercial, by attribution, share-alike license (version 2.5)
| Creators = M. Drabkina;
| Metadata Creator = G. Hagedorn
| Language = zxx
| Metadata Language = en
| Original Creation Date = 12.10.2001
| Country Codes = global
| Subject Category = Deuteromycota
| Taxonomic Coverage = Ascochyta
| Scientific Names = Ascochyta acori Oudem.
| Taxon Count = 1
| Vernacular Names = 
| General Keywords = 
| Best Quality URI = http://212.201.100.117/storage/Fungi/Coelos/Punithalingam%20Ascochyta/edt/AscP88-001.png
| Best Quality Availability = online (free)
}}

Now we extract the highlighted text data, modify them a little, and finally build a sed command line that writes exactly the lines between the line numbers found into a file. This sed syntax should look like:

sed "42,64w filename.txt" # from line 42 to (,) line 64 (w)rite to file “filename.txt”
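
The w command can be tried in isolation first. This is a small sketch assuming GNU sed, seq and mktemp; the file names numbers.txt and part.txt are only illustrative.

```shell
workdir=$(mktemp -d)   # scratch directory, removed at the end
seq 10 > "$workdir/numbers.txt"

# -n suppresses the normal output; lines 4 to 6 are (w)ritten to part.txt
sed -n "4,6w $workdir/part.txt" "$workdir/numbers.txt"

part=$(cat "$workdir/part.txt")
printf '%s\n' "$part"
# prints:
# 4
# 5
# 6
rm -r "$workdir"
```

Everything after the w up to the end of the line is taken as the file name, so w has to be the last command on its line.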

Note that BASH allows you to chain commands with pipes, command | next-command | next-command, so that the output of one command becomes the input of the next. In this way several modifications can be applied one after another. With sed it can be done as follows:

# store the file name in $source_metadata_file
source_metadata_file="Drabkina 1999: selected drawings after Punithalingam (1988): Ascochyta on monocotyledons.mwt";
dataset_directory="Drabkina_1999"
# proceed with reading the source file, apply sed commands to it and process its output further (after |) ...
sed --regexp-extended --silent '
/^\{\{Metadata$/,/^\}\}$/ { # find lines of only “<line-start>{{Metadata<line-end>” to  “<line-start>}}<line-end>”
  /^\{\{Metadata$/{   # at line “<line-start>{{Metadata<line-end>”
    =; # print line number (=) 
  };
  /Best Quality URI/{ # at line with Best Quality URI
    # extract the file basename, strip the extension and (s)ubstitute it with that of a MediaWiki text file (.mwt)...
    s@.*Best Quality URI.*/([^/]+)\.[a-z]{3,4}$@\1.mwt@g; 
    p; # (p)rint result (new file basename)
  };
  /^\}\}$/{# at line  “<line-start>}}<line-end>” 
    =; # print the line number (=)
  };
  ######################
  # we should get an output like:
  # 18
  # AscP88-002.mwt
  # 40
  # 42
  # AscP88-001.mwt
  # 64
  # ...
  ######################
  # now this output can be processed further: chain the next BASH command with |
}' "$source_metadata_file" | sed --regexp-extended --silent "# process the preceding output further
/^[0-9]+\$/{ # at “<line-start>any number/numbers<line-end>”
  N; # append (N)ext line by \n
  N; # append (N)ext line by \n
  # substitute “<line-start>any-numbers (\n)ewline anything (\n)ewline any-numbers<line-end>”
  # by an echo that says what will be done, followed by the sed command itself ...
  s@^([0-9]+)\n(.+)\n([0-9]+)\$@echo 'from line \1 to \3 write to file ./$dataset_directory/\2'\nsed --silent '\1,\3w ./$dataset_directory/\2' '$source_metadata_file'@;
  p; # (p)rint
};" > split_file_at_metadata.sh

Check the file split_file_at_metadata.sh. Its content should look like this:

######################
head split_file_at_metadata.sh
######################
echo 'from line 18 to 40 write to file ./Drabkina_1999/AscP88-002.mwt'
sed --silent '18,40w ./Drabkina_1999/AscP88-002.mwt' 'Drabkina 1999: selected drawings after Punithalingam (1988): Ascochyta on monocotyledons.mwt'
echo 'from line 42 to 64 write to file ./Drabkina_1999/AscP88-001.mwt'
sed --silent '42,64w ./Drabkina_1999/AscP88-001.mwt' 'Drabkina 1999: selected drawings after Punithalingam (1988): Ascochyta on monocotyledons.mwt'

Create the target directory (the w command of sed does not create directories), make the file split_file_at_metadata.sh executable and run it:

mkdir -p Drabkina_1999
chmod ugo+x split_file_at_metadata.sh
./split_file_at_metadata.sh # execute above commands
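
The whole procedure can be rehearsed on a toy file before it is run on real data. The sketch below assumes GNU sed and mktemp. Unlike the script above, it generates a sed script (split.sed) with one w command per template instead of a shell script, so the source file is read only once. All file names, directory names and sample data are made up for illustration.

```shell
workdir=$(mktemp -d)   # scratch directory, removed at the end
mkdir "$workdir/parts"

# toy input: two templates with hypothetical image names a.png and b.jpeg
cat > "$workdir/source.mwt" <<'EOF'
{{Metadata
| Title = First
| Best Quality URI = http://example.org/img/a.png
}}

{{Metadata
| Title = Second
| Best Quality URI = http://example.org/img/b.jpeg
}}
EOF

(
  cd "$workdir" || exit 1
  # pass 1 prints: start line, target file name, end line
  # pass 2 joins these three lines into one "from,to w file" command
  sed -E -n '
  /^\{\{Metadata$/,/^\}\}$/{
    /^\{\{Metadata$/=
    /Best Quality URI/{ s@.*/([^/]+)\.[a-z]{3,4}$@\1.mwt@; p; }
    /^\}\}$/=
  }' source.mwt | sed -E -n '
  /^[0-9]+$/{
    N; N
    s@^([0-9]+)\n(.+)\n([0-9]+)$@\1,\3w parts/\2@p
  }' > split.sed
  # apply the generated commands to the source file
  sed -n -f split.sed source.mwt
)

commands=$(cat "$workdir/split.sed")
first_part=$(cat "$workdir/parts/a.mwt")
second_part=$(cat "$workdir/parts/b.mwt")
printf '%s\n' "$commands"
# prints:
# 1,4w parts/a.mwt
# 6,9w parts/b.mwt
rm -r "$workdir"
```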

Source: Offene Naturführer, Das Wiki zu Bestimmungsfragen: Benutzer:Andreas Plank/Sed (last modified: 23 November 2020 12:09:34). Retrieved on 23 December 2024, 10:27 from https://offene-naturfuehrer.de/web/Benutzer:Andreas_Plank/Sed