Thursday, September 30, 2010

Insert data into MySQL from a CSV file

LOAD DATA INFILE '/var/lib/mysql/btdata.csv' INTO TABLE btpl.parset FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' (idsystem, name, dataurl, description);

http://dev.mysql.com/doc/refman/5.1/en/load-data.html

mysqlimport — A Data Import Program

The mysqlimport client provides a command-line interface to the LOAD DATA INFILE SQL statement. Most options to mysqlimport correspond directly to clauses of LOAD DATA INFILE syntax.

Invoke mysqlimport like this:

shell> mysqlimport [options] db_name textfile1 [textfile2 ...]

Command invocation for a CSV file exported from Excel:

mysqlimport --fields-terminated-by=, --ignore-lines=1 -u root -p rolling mbf_symbols.csv

For each text file named on the command line, mysqlimport strips any extension from the file name and uses the result to determine the name of the table into which to import the file's contents. For example, files named patient.txt, patient.text, and patient all would be imported into a table named patient.

http://dev.mysql.com/doc/refman/5.1/en/mysqlimport.html

Creating a filename using the output of a command in MSDOS

From Microsoft Windows XP Documentation:

Parsing output

You can use the for /F command to parse the output of a command by making the filenameset between the parentheses a back-quoted string. It is treated as a command line, which is passed to a child Cmd.exe; the output is captured into memory and parsed as if it were a file.

For example, a batch file like the one that follows:


@echo off
rem Creates the yyyymmdd.csv file from the statement text file
rem The date for the filename is taken from the statement text file
rem with a perl program called getdate.pl that returns the date through
rem standard output
FOR /F "usebackq" %%d IN (`getdate.pl statement.txt`) DO @statement.pl statement.txt > %%d.csv

Wednesday, September 29, 2010

Get public advisories from the National Hurricane Center

#!/usr/bin/perl
# Download the public advisory from National Hurricane Center


use strict;
use warnings;


use Net::FTP;


my $FtpServer = 'ftp.nhc.noaa.gov';
my $RemoteDir = '/pub/products/nhc/public';
my $LocalFileName = '/home/janeiros/chk/txt/nhc.txt';


# Get the modification time of the local file
my $LocalFileMDTM = (stat($LocalFileName))[9];


my $Ftp = Net::FTP->new($FtpServer, Passive => 1, Debug => 0)
        or die "Couldn't no connect to $FtpServer: $!\n";


$Ftp->login('anonymous', 'jesus.aneiros@gmail.com')
        or die "Couldn't logon to $FtpServer: $!\n";


$Ftp->cwd($RemoteDir);


$Ftp->ascii();


# Get the file name of the last advisory
my $RemoteFileName = ($Ftp->ls())[-1];


# Get the last modification time for the remote file
my $RemoteFileMDTM = $Ftp->mdtm($RemoteFileName);


# Get the file only if it is older than the local
if (!defined($LocalFileMDTM) || $RemoteFileMDTM > $LocalFileMDTM) {
        $Ftp->get($RemoteFileName, $LocalFileName);
}


$Ftp->quit;


__END__

SED instead of AWK

Some days ago I was presented with the following problem:
Two days ago I was confronted with a simple task: we have around 50 CSV files that should be concatenated into one file. Every file has a header on the first line that should be eliminated, except for the first one.
My solution was to use AWK. Although my practical experience with SED is very limited, I knew from the beginning that the problem could be solved with the SED editor. Today, six days after the AWK solution, I found this SED solution:

sed -n '1p;/^set,/d;p' *.csv > all

The first command of the SED program prints the first line of the input stream, then execution continues with the second command, which deletes a line when it contains the header. The magic here is that the d command stops processing the rest of the program's commands and starts the program again, reading the next line from the input stream. The last command always prints the line.

After the first solution I found this one, which is simpler. I should devote more time to studying SED!

sed -n '1p;/^set,/!p' *.csv > all
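
A quick way to sanity-check either command with a couple of throwaway files (the set, header and the file names are only assumptions mirroring the examples above):

printf 'set,a,b\n1,2,3\n' > f1.csv           # first file: header plus one data row
printf 'set,a,b\n4,5,6\n' > f2.csv           # second file: same header, another data row
sed -n '1p;/^set,/!p' f1.csv f2.csv          # prints the header once, then 1,2,3 and 4,5,6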

http://www.gnu.org/software/sed/manual/sed.html

Thursday, September 23, 2010

Create WAR file from Eclipse project

  1. Open the command line window.
  2. Go to the war directory of the project.
  3. Run the following command:
    • jar cvf name_for_deploy.war .
  4. Deploy the war with Tomcat or Glassfish.
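
A hedged sketch of steps 3 and 4 for Tomcat (the WAR name is taken from the command above; the $CATALINA_HOME path is an assumption, and copying a WAR into webapps relies on Tomcat's default auto-deployment):

cd /path/to/project/war                            # step 2: the project's war directory (path is an example)
jar cvf name_for_deploy.war .                      # step 3: package the directory contents
cp name_for_deploy.war "$CATALINA_HOME/webapps/"   # step 4: Tomcat picks up the WAR and deploys it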

Avoid that grep command in grep output

Hello everyone,

I'm planning to send at least one message a day to the list with something interesting about Linux. This is the first one.

One week ago I bought the book "The Official Ubuntu Server Book" (2nd Edition). Being a command line fan, I went directly to the section "Cool tips and tricks". I started reading and found out that the explanation of the first trick is, in a certain way, wrong.

The idea behind the trick is to avoid the appearance of grep in the resulting list for a command like:

janeiros@harlie:~$ ps ax | grep bash
13235 tty1     S      0:00 -bash
13256 tty1     S+     0:00 /bin/bash /usr/bin/startx
16911 pts/0    Ss+    0:00 -bash
17005 pts/1    Ss     0:00 -bash
17152 pts/1    S+     0:00 grep --color=auto bash

As a "trick", the authors suggest running the command in this form:

janeiros@harlie:~$ ps ax | grep [b]ash

How come, when I try the trick, it is OK the first time and wrong the second one?

janeiros@harlie:~$ ps ax | grep [b]ash
13235 tty1     S      0:00 -bash
13256 tty1     S+     0:00 /bin/bash /usr/bin/startx
16911 pts/0    Ss+    0:00 -bash
17005 pts/1    Ss     0:00 -bash

You can see grep is gone.

And now it's back:

janeiros@harlie:~$ ps ax | grep [b]ash
13235 tty1     S      0:00 -bash
13256 tty1     S+     0:00 /bin/bash /usr/bin/startx
16911 pts/0    Ss+    0:00 -bash
17005 pts/1    Ss     0:00 -bash
17185 pts/1    S+     0:00 grep --color=auto bash

Of course, I did something in between that provoked the change! What was it? Let's see if somebody can find out what it was. The authors' explanation of the trick is not exact!

The explanation is right there at the command line!

If nobody finds out I'll tell you later today.

By the way there is an alternative to the trick but it's longer.

The outputs of the commands weren't edited at all.

And by the way, something the authors didn't mention: in Linux we can avoid the whole grep filtering with a simple ps -C:

janeiros@harlie:~$ ps -C bash
  PID TTY          TIME CMD
13235 tty1     00:00:00 bash
17798 pts/0    00:00:00 bash

Of course, the output is a little bit different than the one from ps ax.

Hello,

What causes the trick to fail the second time is the existence, in the current directory, of a file named bash.

You can test it by going to a directory where you have write permission, running the command a first time (ps -ef | grep [b]ash), then running touch bash, and finally running ps -ef | grep [b]ash again to see it fail.
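
Here is that test as a transcript-style sketch (any directory where you can create files will do; /tmp is just an example):

cd /tmp                      # a scratch directory where we can create files
ps -ef | grep [b]ash         # no file named bash here yet: the grep line stays hidden
touch bash                   # create a file whose name matches the glob pattern [b]ash
ps -ef | grep [b]ash         # the shell expands [b]ash to bash, so the grep line shows up again
rm bash                      # clean up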

The reason the command fails is something called "file name expansion" at the shell level.

In the book, the authors say, and I quote:

"This works because of the power of regular expressions. When I surround the first character with brackets, I’m telling grep to search within a character class."

That's partially true, 'cause they are forgetting that the shell is in the middle doing some "massage" to the argument; that massage is called "file name expansion." When the shell sees an argument with one of the special characters * or ? or [, it considers it a pattern and replaces it with an alphabetically sorted list of matching filenames. If there is no filename that fits the pattern, the argument is left unchanged (depending on the shell option nullglob).

So, if you try to use the trick but a file that fits the pattern exists in the directory, the shell is going to do the substitution and feed the command (grep) with the filename, bash in this case, instead of [b]ash. One solution to avoid the pattern scanning is to use double quotes around the argument.

Below is a small script that lets you test the whole thing:

#!/bin/bash

echo "$1"       # Don't forget to type the double quotes!

##END##

You can run the script with the [b]ash argument both without the file bash in the directory and with the file in it, and see the argument that the script receives from the shell.
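
And a quick sketch of the double-quote fix mentioned above (test.sh is just a placeholder name for the script; the quotes stop the file name expansion even when a file named bash is present):

touch bash                   # the troublesome file is present
./test.sh [b]ash             # unquoted: the script prints "bash" (the expanded file name)
./test.sh "[b]ash"           # quoted: the script prints "[b]ash", the literal pattern
ps ax | grep "[b]ash"        # the same quoting keeps the grep process out of the list
rm bash                      # clean up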

Have a good night.
-- 
J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674

Servers, clients, logs and shell arithmetics

Hello everyone,

Let's say that we have a client that connects to a server through the network. The application's protocol specifies that the client should maintain a session and send a message (heartbeat) every 30 seconds to keep the session alive. We have a log file with lines like the one below:

[2010-09-16 08:41:01.966] CLD -> HeartBeat-Sent

If we want to see if everything is OK, we could take the last line of the file and examine the timestamp against the local time of the machine. I decided to take only the minutes and seconds and create a number like 1401, and compare that number against the local time in the same format. The part of the shell script that does the comparison is below:

if [ $(( ${TIME_HERE} - ${TIME_THERE} )) -gt ${INTERVAL} ]
then
    STATUS="ERROR"
else
    STATUS="OK"
fi
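
For context, here is one hedged way the two values might be obtained (the log file path, the cut column offsets and the date format are assumptions, not taken from the original script):

# Minutes and seconds of the last heartbeat, e.g. "4101" from "[2010-09-16 08:41:01.966] ...",
# and the local time in the same mmss format.
LOGFILE=/var/log/client_heartbeat.log
TIME_THERE=$(tail -1 "$LOGFILE" | cut -c16-17,19-20)
TIME_HERE=$(date +%M%S)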

There is a potential runtime error in that comparison. Any idea what could be wrong? Assume both machines are time-synchronized to millisecond resolution.

The problem in that statement is related to the way the shell interprets integers. When just one of the operands in the expression (( ${TIME_HERE} - ${TIME_THERE} )) is something like 0008, the shell cries out.

The reason for that cry is that the shell thinks the number is an octal number, so the 8 is not a valid digit in that base (0..7). You can see it with a command like:

janeiros@harlie:~$ echo "$(( 1000 - 0008 ))"
-bash: 1000 - 0008: value too great for base (error token is "0008")

You can also verify that the shell is doing octal arithmetic with something like this:

janeiros@harlie:~$ echo "$(( 1000 - 0011 ))"
991

All this could be solved by using the expr command, which treats both operands as decimal numbers (note the result is 989, not 991):

janeiros@harlie:~$ echo "$(expr 1000 - 0011)"
989
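
Another option, not used above, is a bash-only sketch that forces base-10 interpretation with the 10# prefix, so the leading zeros become harmless:

TIME_HERE=1000
TIME_THERE=0008
echo $(( 10#$TIME_HERE - 10#$TIME_THERE ))   # prints 992 instead of failing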

Octal numbers bring back good memories for me, from the time I was in college and got in touch with the first computer in my life, around the beginning of the Eighties. That computer was so primitive that it didn't have a boot loader; well, it didn't have a disk either! So you had to load a small set of instructions to tell the paper tape reader to read the monitor program, 'cause the machine didn't even have an operating system! That set of instructions was in the form of octal numbers that you had to load into the machine as a binary-digit representation of the octal numbers. To do that you had a series of switches that you could flip ON/OFF (0, 1).

Later I got in touch with UNIX (1991), Interactive UNIX to be more precise (http://en.wikipedia.org/wiki/INTERACTIVE_UNIX). In UNIX I discovered that my useless knowledge of octal representation was in fact very useful when working with absolute file-permission representation in chmod!

Have a good day!

--
J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674

AWK

Hello everyone,

Two days ago I was confronted with a simple task: we have around 50 CSV files that should be concatenated into one file. Every file has a header on the first line that should be eliminated, except for the first one. To solve the problem I used a different approach from the one I will present to you today: I streamed the files through sed, deleting all the headers and putting the result into a temp file, and then added a header to that file. But today I was thinking that it could be a good opportunity to use AWK [1] for the task and write to the list about it.

Let's see how one of its creators, Alfred Aho, defines an AWK program: "An AWK program is a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed."

So we could create a program in AWK that prints the header only once (a very easy task with the use of a flag), discarding all the other headers and printing all the other lines. This is the solution that I found:


/^set,/ { if (!printed) { print; printed = 1 } next }
        { print }

And it was run this way:

awk -f p *.csv > all

p is the file with the AWK program.
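
If you prefer not to keep a separate program file, the same program can be passed inline (only the quoting is new; the program text is the one shown above):

awk '/^set,/ { if (!printed) { print; printed = 1 } next } { print }' *.csv > all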

The first line of the program matches a header: it prints it only the first time and sets a flag variable, then next stops processing the current record and reads the next one, starting with the first pattern again. The second line of the program prints every record (line) that reaches it; the headers never do, because of next.

These are the times for a total of 160K lines:

real    0m0.116s
user    0m0.060s
sys     0m0.052s

Have a good night and remember, TIMTOWTDI.


--
J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674

Tuesday, September 21, 2010

Outlook, REGEX and Attachments

Sub SaveAttachment(MyMail As MailItem)
    Dim strID As String
    Dim objMail As Outlook.MailItem
    Dim strFileName As String
    Dim objAttachments As Outlook.Attachments
    Dim strFilePath As String
  
    strFilePath = "C:\Files\"
  
    strID = MyMail.EntryID
    Set objMail = Application.Session.GetItemFromID(strID)
  
    ' Extract the date from the Subject and create the filename
    strFileName = REDate(objMail.Subject) & ".txt"
  
    Set objAttachments = objMail.Attachments
  
    ' Save the file with the name it was attached
    objAttachments.Item(1).SaveAsFile strFilePath & _
        objAttachments.Item(1).FileName
      
    ' Save the file with a name in the format yyyymmdd taken from the Subject line
    objAttachments.Item(1).SaveAsFile strFilePath & _
        strFileName

End Sub

Function REDate(strData As String) As String
    Dim RE As Object, REMatches As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = False
        .IgnoreCase = True
        .Pattern = "[0-9]{8}"
    End With
  
    Set REMatches = RE.Execute(strData)
    REDate = REMatches(0)

End Function

Perl, REGEX, modifiers and arrays

Suppose you have a file like this:

 1* 2* 3 4 5 6
10 20 30* 40 50* 60
100 200 300 400* 500 600
1000 2000* 3000 4000 5000* 6000*

You need to extract, from each line, the group of numbers that have a * attached; the result should look like this:

1 2
30 50
400
2000 5000 6000

Could you find a language in which you can obtain the same result with fewer lines of code than in Perl?

#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
    chomp;

    # capture every group of digits followed by a literal *
    my @Data = /(\d+)\*/g;

    print "@Data\n";
}

__END__


You can do it even with a one-liner:

janeiros@harlie:/media/disk$ perl -ne '@Data = /(\d+)\*/g; print "@Data\n"' data.txt
1 2
30 50
400
2000 5000 6000

Things to note from this example:

- The context: Perl's assignment is sensitive to the left-hand side of the assignment (the array @Data puts the match in list context).
- The modifiers for the REGEXes, like the g in this case.
- The grouping for the extraction: the parentheses.

--
J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674

MySQL and REGEX

"A regular expression is a powerful way of specifying a pattern for a complex search." (Excerpt from MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/regexp.html)

The financial institutions use letters to identify the months of the year in the following way:

F - January
G - February
H - March
J - April
K - May
M - June
N - July
Q - August
U - September
V - October
X - November
Z - December

So you can have a date in the form F10 which identifies January 2010 or J11 which identifies April 2011.

Let's say you have a prices table in a MySQL database and you want to check that the values in the field contract in that table are correct according to the following rules:

- The first character in the field should be a letter valid for a month (F - January, G - February, etc).
- The second and third chars in the field should each be a decimal digit (0..9).

You need to specify a query in SQL for MySQL that identifies all the records that have a wrong contract according to the above rules.

There are several ways of doing this in MySQL; one could be a query like the following:

SELECT * FROM prices
WHERE substr(contract, 1, 1) NOT IN ('F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z')
OR substr(contract, 2, 1) NOT IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
OR substr(contract, 3, 1) NOT IN (0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

Another query could be:

SELECT * FROM prices
WHERE substr(contract, 1, 1) NOT IN ('F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z')
OR substr(contract, 2, 1) NOT BETWEEN 0 AND 9
OR substr(contract, 3, 1) NOT BETWEEN 0 AND 9;

As you can see, the queries are long, so this is where you can use the power of REGEX (regular expressions), which MySQL implements as an extension to the SQL language.

Using REGEX the query will be:

SELECT * FROM prices
WHERE contract NOT REGEXP '^[FGHJKMNQUVXZ][0-9]{2}';

The REGEX is negated (NOT REGEXP), so the query returns the rows whose contract field does not start (the ^ symbol) with one of the letters F, G, H, etc. (the part specified between square brackets, which is called a class) followed by 2 digits (the class [0-9] with the number of occurrences, the {2} part).

As you can see REGEX provides a concise and flexible way for matching strings of text against a certain pattern.

Have a good day.

--

J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674

Sunday, September 19, 2010

By the evidence


"When I think back ... of the serval cat and a baboon that I had as pets in my childhood days−and that eventually I had to house in large cages−it makes me sad. It makes me sadder still, however, and also very angry, when I think of the innumerable adult animals and birds deliberately caught and locked up for the so-called 'pleasure' and 'education' of thoughtless human beings. ... surely there are today so many first-class films ... that the cruelty of keeping wild creatures in zoos should no longer be tolerated."
From L.S.B. Leakey, By the Evidence, Chapter 4.

Animal welfare

“Thousands of people who say they 'love' animals sit down once or twice a day to enjoy the flesh of creatures who have been utterly deprived of everything that could make their lives worth living and who endured the awful suffering and the terror of the abattoirs--and the journey to get there--before finally leaving their miserable world, only too often after a painful death.”
Jane Goodall

Friday, September 10, 2010

God is Able

"What I do know is that I am clinging to God's power to help me through these days of grief and sorrow. This power is my hope for the future. I am presently weak, but I know that God is strong and is able to carry me through. How do I know? The Bible tells me so. I am supported by God's comforting presence that I experience in loving acts of kindness from God's people."


Janet Huff (Illinois, USA)


Taken from the 9/10/2010 Daily Devotional of the Upper Room.

Music I like to listen to

Wednesday, September 8, 2010

Using MySQL temporary table

1. Create a temporary table with the settlements table structure:

CREATE TEMPORARY TABLE sett LIKE settlements;

2. Insert into the temporary table the data from the settlements for a certain date:

INSERT INTO sett (date,symbol,contract,price)
SELECT date,symbol,contract,price
FROM pltracker.settlements
WHERE date = '2010-09-03';

2a. You can do the previous two steps in only one:

CREATE TEMPORARY TABLE sett
SELECT date,symbol,contract,price
FROM settlements
WHERE date = '2010-09-03';

3. Update the date of the data in the temporary table to the specific date of the new settlements:

UPDATE sett
SET date = '2010-09-06'
WHERE date = '2010-09-03';

3a. Since you are updating all the data in the table you can use this SQL statement instead:

UPDATE sett
SET date = '2010-09-06';

4. Insert data back into the settlements table:

INSERT INTO settlements (date,symbol,contract,price)
SELECT *
FROM sett;

5. Check that the new data is in place in the settlements table:

SELECT * FROM settlements
WHERE date = '2010-09-06';

6. The temporary table will be automatically dropped when you close the session, or you can drop it like any other table:

DROP TABLE sett;

Thursday, September 2, 2010

Listing Multicast group memberships

In Windows XP:

C:\>netsh interface ip show joins


Interface Addr   Multicast Group
---------------  ---------------
172.17.17.223    224.0.0.1
172.17.17.223    239.255.255.250

In Solaris and other UNIXes:


$ netstat -gn
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      224.0.0.1
lo              1      ff02::1
eth0            1      ff02::1:ff09:ac30
eth0            1      ff02::1