Showing posts with label sed. Show all posts

Tuesday, October 12, 2010

Using back references in SED

Today I have several CSV files, each with a header on the first line and a date in the first field in the format mm/dd/yyyy. That field had to be converted to the format yyyy-mm-dd.

Here is a Perl solution, followed by a SED one:

#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
        chomp;
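        # Pass the header (a line that does not start with a digit) through unchanged.
        if (/^[^0-9]/) {
                print "$_\n";
                next;
        }
        # Split the date (first field) from the rest of the record.
        my @Fields = split /,/, $_, 2;
        my ($Month, $Day, $Year) = split /\//, $Fields[0];
        # No space after the comma, so the CSV layout is preserved.
        print "$Year-$Month-$Day,$Fields[1]\n";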
}
__END__

sed -i.bak 's/^\([0-9]\+\)\/\([0-9]\+\)\/\([0-9]\+\)/\3-\1-\2/' *.csv

Yesterday on my way home I realized that the Perl script was horrible; a one-liner is enough, like the one below:

perl -i.bak -pe 's!^(\d+)/(\d+)/(\d+)!$3-$1-$2!' *.csv
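A minimal sketch for trying the back-reference substitution on sample data before running it in place on real files. The sample records and variable names are made up for illustration. The -E option (extended regular expressions, accepted by GNU and BSD sed) is swapped in for the escaped \( \) and \+ of the original SED command, so this is a variation and not the command used in the post.

```shell
# Hypothetical sample data: one header line plus two records.
sample='date,symbol,price
10/12/2010,ABC,1.50
09/29/2010,XYZ,2.75'

# Each parenthesised group is captured; \1, \2 and \3 in the replacement
# refer to month, day and year in the order they were matched.
# Using | as the delimiter avoids escaping the slashes in the date.
converted=$(printf '%s\n' "$sample" |
  sed -E 's|^([0-9]+)/([0-9]+)/([0-9]+)|\3-\1-\2|')

printf '%s\n' "$converted"
# date,symbol,price
# 2010-10-12,ABC,1.50
# 2010-09-29,XYZ,2.75
```

The header is left alone because it does not start with a digit, so the pattern never matches it.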

Wednesday, September 29, 2010

SED instead of AWK

Some days ago I was confronted with a simple task: we have around 50 CSV files that should be concatenated into one file. Every file has a header on its first line, and all of those headers except the first one should be dropped.
My solution was to use AWK. Although my practical experience with SED is very limited, I knew from the beginning that the problem could be solved with SED. Today, six days after the AWK solution, I found this SED solution:

sed -n '1p;/^set,/d;p' *.csv > all

The first command of the SED program prints the first line of the input stream. Because SED treats all the listed files as one continuous stream, that is the header of the first file only. The second command deletes any line that starts with the header text. The magic here is that the d command skips the rest of the program and starts it again with the next line of input, so a header never reaches the last command, which prints every line that gets that far.

After that first solution I found this one, which is simpler. I should devote more time to studying SED!

sed -n '1p;/^set,/!p' *.csv > all
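A small sketch, under assumed sample data, that runs both variants on two throwaway files to check that they produce the same result. The file names, the record contents and the variable names are hypothetical; only the "set," header prefix comes from the commands above.

```shell
# Two hypothetical CSV files in a scratch directory, each with the same header.
tmpdir=$(mktemp -d)
printf 'set,value\na,1\nb,2\n' > "$tmpdir/one.csv"
printf 'set,value\nc,3\n'      > "$tmpdir/two.csv"

# Variant 1: print line 1, delete headers (d restarts the cycle), print the rest.
with_d=$(sed -n '1p;/^set,/d;p' "$tmpdir"/*.csv)

# Variant 2: print line 1, then print every line that is NOT a header.
with_not=$(sed -n '1p;/^set,/!p' "$tmpdir"/*.csv)

printf '%s\n' "$with_not"
# set,value
# a,1
# b,2
# c,3

rm -r "$tmpdir"
```

Both variants rely on the header being the very first line of the first file; a data line that happens to start with "set," would be dropped as well.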

http://www.gnu.org/software/sed/manual/sed.html

Friday, August 27, 2010

Counting lines in a file (61960627 lines)

sed -n '$=' file.txt

real    1m9.237s
user    1m8.602s
sys     0m0.631s

perl -ne 'END { print "$.\n" }' file.txt

real    0m13.876s
user    0m13.245s
sys     0m0.630s

awk 'END { print NR }' file.txt

real    0m8.866s
user    0m8.257s
sys     0m0.608s

wc -l file.txt

real    0m2.550s
user    0m1.677s
sys     0m0.873s

wc file.txt

real    3m4.875s
user    3m3.970s
sys     0m0.895s
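Before comparing timings it may be worth confirming that the commands agree on the count. The following is only a sketch on a three-line scratch file (the file contents and variable names are made up); it covers the sed, awk and wc -l variants from the list above.

```shell
# Hypothetical three-line test file.
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

sed_count=$(sed -n '$=' "$tmpfile")             # line number of the last line
awk_count=$(awk 'END { print NR }' "$tmpfile")  # records read at end of input
# Reading from stdin suppresses the file name; tr strips the padding
# that some wc implementations add.
wc_count=$(wc -l < "$tmpfile" | tr -d ' ')

echo "sed=$sed_count awk=$awk_count wc=$wc_count"
# sed=3 awk=3 wc=3

rm "$tmpfile"
```

One difference to keep in mind: wc -l counts newline characters, so a final line without a trailing newline is not counted by wc, while sed and awk still count it.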

Thursday, July 22, 2010

Substitute \001 in FIX messages using SED

sed 's/\x01/|/g' /var/log/trail.log

\001 does not work here because GNU sed writes octal escapes as \o001; \x01 is the hexadecimal spelling of the same byte.
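A short sketch of the substitution on a made-up FIX fragment, so it can be tried without the real log file. The message content and variable names are hypothetical. The sed line assumes GNU sed, since \x01 is a GNU extension; the tr line is offered as a portable alternative and is not part of the original post.

```shell
# Hypothetical FIX fragment; printf's octal escape \001 emits the SOH delimiter byte.
msg=$(printf '8=FIX.4.2\0019=65\00135=A\001')

# GNU sed: \x01 matches the byte with hexadecimal value 01.
readable_sed=$(printf '%s\n' "$msg" | sed 's/\x01/|/g')

# Portable alternative: tr accepts the octal form \001 directly.
readable_tr=$(printf '%s\n' "$msg" | tr '\001' '|')

printf '%s\n' "$readable_tr"
# 8=FIX.4.2|9=65|35=A|
```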