Showing posts with label REGEX.

Sunday, February 28, 2021

Capturing two numbers on a line with Perl and REGEX, one of them optional

#!/usr/bin/perl

use strict;
use warnings;

sub get_long_short {
   my ($number, $spaces1, $spaces2) = @_;
   my ($long, $short) = ('', '');

   if (length($spaces1) > length($spaces2))
   {
      $short = $number;
   } else {
      $long = $number;
   }

   return ($long, $short);
}

while (<DATA>) {

   chomp();

   if (my ($b1, $n1, $n2, $b2, $s) = /
      ^(\s*)            # First spaces ($b1)
      ([\d,.]+)         # First number always exists ($n1)
      (?:               # Don't want to capture spaces
      \s+               # in front
      ([\d,.]+)         # Second number ($n2)
      )?                # Second number is optional
      (\s+)             # Space always exists ($b2)
      (\w{2,3})$        # Two- or three-character symbol ($s)
      /x)
   {
      my ($long, $short);

      if (defined($n2)) {
         $long = $n1;
         $short = $n2;
      } else {
         ($long, $short) = get_long_short($n1, $b1, $b2);
      }

      $long =~ s/,//g;
      $short =~ s/,//g;

      print("$long|$short|$s\n");

   } else {
      print("NOT MATCHED: $_\n");
   }
}

__END__
long short symbol
12 3  NG
1,234 1,222 CL
1,333       PL
123.4 9,088     HNG
123.4 9,088     BBBB
       90.65 HO
   1   RB
    2  RB
     3 RB
4      RB
  100,000.00     CL
  CL
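If you want to try the same logic outside Perl, here is a rough Python port. It is a sketch, not the author's code. It uses the same pattern with re.VERBOSE (Python's counterpart of Perl's /x) and the same rule for a lone number: if it is indented more than it is separated from the symbol, it goes in the short column; otherwise it goes in the long one. The names LINE_RE and parse_line are made up for illustration.

```python
import re

# Same pattern as the Perl script; re.VERBOSE plays the role of Perl's /x.
LINE_RE = re.compile(r"""
    ^(\s*)              # leading spaces (b1)
    ([\d,.]+)           # first number, always present (n1)
    (?:\s+([\d,.]+))?   # optional second number (n2)
    (\s+)               # spaces before the symbol (b2)
    (\w{2,3})$          # two- or three-character symbol (s)
""", re.VERBOSE)


def parse_line(line):
    """Return (long, short, symbol) with commas removed, or None if no match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    b1, n1, n2, b2, symbol = m.groups()
    if n2 is not None:
        long_, short = n1, n2
    elif len(b1) > len(b2):
        # A lone number indented more than it is spaced from the symbol is "short"
        long_, short = "", n1
    else:
        long_, short = n1, ""
    return long_.replace(",", ""), short.replace(",", ""), symbol


for row in ["12 3  NG", "1,333       PL", "       90.65 HO", "123.4 9,088     BBBB"]:
    parsed = parse_line(row)
    print("|".join(parsed) if parsed else "NOT MATCHED: " + row)
```

The loop prints the rows in the same long|short|symbol format as the Perl script.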

Wednesday, August 13, 2014

Excel Regular Expressions

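Sub RegEx_Tester()
    Dim objRegExp_1 As Object
    Dim regExp_Matches As Object
    
    Dim strToSearch As String
    
    ' Late binding: no reference to the VBScript regular expressions library needed
    Set objRegExp_1 = CreateObject("vbscript.regexp")
    objRegExp_1.Global = True
    objRegExp_1.IgnoreCase = True
    ' Anchored: letters, "@", letters, then a literal ".com" at the end.
    ' (Inside [] a comma is a literal comma, and an unescaped "." matches any character.)
    objRegExp_1.Pattern = "^[a-z]+@[a-z]+\.com$"
    
    strToSearch = "ABC@xyz.com"
    
    Set regExp_Matches = objRegExp_1.Execute(strToSearch)
    
    If regExp_Matches.Count = 1 Then
        MsgBox "This string matches the simple e-mail pattern."
    Else
        MsgBox "This string does not match the simple e-mail pattern."
    End If
    
End Sub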
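Here is a small Python comparison that shows why the pattern "[a-z,A-Z]*@[a-z,A-Z]*.com" accepts too much. For these particular constructs, Python's re behaves like VBScript's RegExp. Inside square brackets the comma is a literal comma. The unescaped dot matches any character. Without ^ and $, the pattern can match just part of the string. The anchored pattern is an illustrative tightening, not a full e-mail validator. The names LOOSE, STRICT and check are hypothetical.

```python
import re

# Pattern as first written: literal commas in the class, "." matches anything,
# and nothing is anchored.
LOOSE = re.compile(r"[a-z,A-Z]*@[a-z,A-Z]*.com", re.IGNORECASE)
# Tighter sketch: at least one letter on each side and a literal ".com" at the end.
STRICT = re.compile(r"^[a-z]+@[a-z]+\.com$", re.IGNORECASE)


def check(s):
    """Return (matches LOOSE, matches STRICT) for a candidate string."""
    return bool(LOOSE.search(s)), bool(STRICT.search(s))


for s in ["ABC@xyz.com", "ABC@xyzXcom", "@.com", "a,b@c.com"]:
    print(s, check(s))
```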

Thursday, March 1, 2012

Wireshark match operator

matches   Does the protocol or text string match the given Perl regular expression?

fix.MsgType == "D" and fix.SecurityDesc matches "CLT|NGT"


The "matches" operator allows a filter to apply to a specified Perl-compatible regular expression (PCRE). The "matches" operator is only implemented for protocols and for protocol fields with a text string representation.
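Here is a rough Python analogue of the filter above, for experimenting with the expression outside Wireshark. In FIX, MsgType D is a New Order - Single. The sketch assumes that the PCRE given to "matches" may match anywhere in the field value (it is not anchored), so "CLT|NGT" accepts any SecurityDesc that contains either string. Packets are modelled as hypothetical dicts of decoded FIX fields. This is not a Wireshark API.

```python
import re

# The regex from the filter, searched for anywhere in the field value.
SECURITY_DESC_RE = re.compile(r"CLT|NGT")


def fix_filter(msg):
    """Python analogue of: fix.MsgType == "D" and fix.SecurityDesc matches "CLT|NGT"

    `msg` is a hypothetical dict of decoded FIX fields, e.g.
    {"MsgType": "D", "SecurityDesc": "CLTZ0"}. A missing field makes the test false.
    """
    return (msg.get("MsgType") == "D"
            and "SecurityDesc" in msg
            and SECURITY_DESC_RE.search(msg["SecurityDesc"]) is not None)


print(fix_filter({"MsgType": "D", "SecurityDesc": "NGTF1"}))
```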

More information on PCRE can be found in the pcrepattern(3) man page (Perl Regular Expressions are explained in http://perldoc.perl.org/perlre.html).

Tuesday, September 21, 2010

Outlook, REGEX and Attachments

Sub SaveAttachment(MyMail As MailItem)
    Dim strID As String
    Dim objMail As Outlook.MailItem
    Dim strFileName As String
    Dim strDate As String
    Dim objAttachments As Outlook.Attachments
    Dim strFilePath As String
  
    strFilePath = "C:\Files\"
  
    strID = MyMail.EntryID
    Set objMail = Application.Session.GetItemFromID(strID)
  
    Set objAttachments = objMail.Attachments
    ' Nothing to save if the message has no attachments
    If objAttachments.Count = 0 Then Exit Sub
  
    ' Save the file with the name it was attached with
    objAttachments.Item(1).SaveAsFile strFilePath & _
        objAttachments.Item(1).FileName
      
    ' Extract the date from the Subject; skip the second copy if there is none
    strDate = REDate(objMail.Subject)
    If strDate <> "" Then
        ' Save the file again, named yyyymmdd.txt after the date in the Subject line
        strFileName = strDate & ".txt"
        objAttachments.Item(1).SaveAsFile strFilePath & strFileName
    End If

End Sub

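Function REDate(strData As String) As String
    Dim RE As Object, REMatches As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = False
        .IgnoreCase = True
        .Pattern = "[0-9]{8}"   ' eight consecutive digits, e.g. 20100921
    End With
  
    Set REMatches = RE.Execute(strData)
    ' Return the first match, or an empty string if the Subject has no date
    If REMatches.Count > 0 Then
        REDate = REMatches(0).Value
    Else
        REDate = ""
    End If

End Function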
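REDate accepts any eight digits in the Subject. Here is a sketch of the same extraction in Python, so it can be tested outside Outlook. It adds one check that the macro does not have: each candidate is passed to datetime.strptime, so an eight-digit number that is not a valid yyyymmdd date is skipped. The helper name date_from_subject and the sample subjects are hypothetical.

```python
import re
from datetime import datetime

# Same pattern as REDate: eight consecutive digits.
DATE_RE = re.compile(r"[0-9]{8}")


def date_from_subject(subject):
    """Return the first 8-digit run that is a real yyyymmdd date, or None."""
    for m in DATE_RE.finditer(subject):
        try:
            datetime.strptime(m.group(0), "%Y%m%d")
        except ValueError:
            continue  # e.g. a reference number such as 12345678
        return m.group(0)
    return None


date = date_from_subject("Prices 20100921")
print(date + ".txt" if date else "no date in subject")
```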

Perl, REGEX, modifiers and arrays

Suppose you have a file like this:

 1* 2* 3 4 5 6
10 20 30* 40 50* 60
100 200 300 400* 500 600
1000 2000* 3000 4000 5000* 6000*

You need to extract, from each line, the numbers that are marked with a *. The result should look like this:

1 2
30 50
400
2000 5000 6000

Can you find a language in which you get the same result with fewer lines of code than in Perl?

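#!/usr/bin/perl

use strict;
use warnings;

# Read the files named on the command line (or STDIN) line by line
while (<>) {
    chomp;

    # Assigning to an array puts the match in list context, so with /g it
    # returns every capture: the digits immediately followed by a *
    my @Data = /(\d+)\*/g;

    # An array interpolated in double quotes is joined with spaces
    print "@Data\n";
}

__END__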


You can do it even with a one-liner:

janeiros@harlie:/media/disk$ perl -ne '@Data = /(\d+)\*/g; print "@Data\n"' data.txt
1 2
30 50
400
2000 5000 6000

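From this example, note:

- Context: what a match returns depends on the left-hand side of the assignment. Assigning to an array (list context) returns the captured groups.
- Modifiers: the g modifier makes the match find every occurrence on the line, not just the first.
- Grouping: the parentheses mark what to extract (the digits, without the *).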
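To compare with the question above, here is the same extraction sketched in Python. With a single group, re.findall returns every capture of that group, which plays the role of Perl's /g match in list context. re.search stops at the first match, like a match without /g. The function names are made up for illustration.

```python
import re

# One group: the digits in front of a literal "*".
STARRED_RE = re.compile(r"(\d+)\*")


def starred_numbers(line):
    # findall returns every capture of the group, like a /g match in list context
    return STARRED_RE.findall(line)


def first_starred(line):
    # search stops at the first match, like a match without /g
    m = STARRED_RE.search(line)
    return m.group(1) if m else None


for line in [" 1* 2* 3 4 5 6", "10 20 30* 40 50* 60",
             "100 200 300 400* 500 600", "1000 2000* 3000 4000 5000* 6000*"]:
    print(" ".join(starred_numbers(line)))
```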

--
J. E. Aneiros
GNU/Linux User #190716 en http://counter.li.org
perl -e '$_=pack(c5,0105,0107,0123,0132,(1<<3)+2);y[A-Z][N-ZA-M];print;'
PK fingerprint: 5179 917E 5B34 F073 E11A  AFB3 4CB3 5301 4A80 F674

MySQL and REGEX

"A regular expression is a powerful way of specifying a pattern for a complex search." (Excerpt from MySQL manual: http://dev.mysql.com/doc/refman/5.1/en/regexp.html)

Financial institutions use single letters to identify the months of the year (the futures contract month codes), as follows:

F - January
G - February
H - March
J - April
K - May
M - June
N - July
Q - August
U - September
V - October
X - November
Z - December

So a contract month can be written as F10, which identifies January 2010, or J11, which identifies April 2011.
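The letter-plus-year scheme above can be decoded with a simple lookup. Here is a Python sketch. It assumes, as in the two examples, that the two digits are a year in the 2000s. The names MONTH_CODES and decode_contract are hypothetical.

```python
# Month letters from the list above.
MONTH_CODES = {
    "F": "January", "G": "February", "H": "March", "J": "April",
    "K": "May", "M": "June", "N": "July", "Q": "August",
    "U": "September", "V": "October", "X": "November", "Z": "December",
}


def decode_contract(code):
    """Decode a contract such as 'F10' into ('January', 2010).

    Assumes the two digits are a year in the 2000s, as in the examples.
    Raises ValueError for an unknown month letter or a non-numeric year.
    """
    letter, yy = code[:1], code[1:3]
    if letter not in MONTH_CODES or len(yy) != 2 or not all(c in "0123456789" for c in yy):
        raise ValueError(f"not a contract code: {code!r}")
    return MONTH_CODES[letter], 2000 + int(yy)


print(decode_contract("J11"))
```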

Let's say you have a prices table in a MySQL database and you want to check that the values in its contract field are correct according to the following rules:

- The first character must be a valid month letter (F - January, G - February, etc.).
- The second and third characters must be decimal digits (0..9).

You need a MySQL query that finds all the records whose contract breaks these rules.

There are several ways to do this in MySQL. One is a query like the following:

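SELECT * FROM prices
WHERE substr(contract, 1, 1) NOT IN ('F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z')
OR substr(contract, 2, 1) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
OR substr(contract, 3, 1) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9');

The digits are quoted because substr() returns a string. If the list holds unquoted numbers, MySQL compares the values as numbers, so a letter such as 'A' (or an empty string) becomes 0 and slips through the check.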

Another option is:

SELECT * FROM prices
WHERE substr(contract, 1, 1) NOT IN ('F', 'G', 'H', 'J', 'K', 'M', 'N', 'Q', 'U', 'V', 'X', 'Z')
OR substr(contract, 2, 1) NOT BETWEEN '0' AND '9'
OR substr(contract, 3, 1) NOT BETWEEN '0' AND '9';

As you can see, these queries are long. This is where regular expressions (REGEX), which MySQL implements as an extension to SQL, can help.

Using REGEX, the query becomes:

SELECT * FROM prices
WHERE contract NOT REGEXP '^[FGHJKMNQUVXZ][0-9]{2}';

The REGEX is negated (NOT REGEXP), so the query returns the rows of the prices table whose contract does not start (the ^ symbol) with one of the letters F, G, H, etc. followed by two digits. The letters between square brackets form a character class, and the two digits come from the class [0-9] with the repetition count {2}.
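To test the pattern before running it against the table, here is a Python sketch that applies the same expression to a list of sample values. The sample contracts and the names CONTRACT_RE and invalid_contracts are hypothetical. Python's re is case-sensitive here, whereas MySQL's REGEXP may ignore case depending on the column's character set and collation. Because the pattern has no $ at the end, a value such as H123 passes. This matches the substr() queries, which also look only at the first three characters.

```python
import re

# Same pattern as the query: a month letter, then two digits, at the start.
CONTRACT_RE = re.compile(r"^[FGHJKMNQUVXZ][0-9]{2}")


def invalid_contracts(contracts):
    """Python analogue of: WHERE contract NOT REGEXP '^[FGHJKMNQUVXZ][0-9]{2}'"""
    return [c for c in contracts if CONTRACT_RE.search(c) is None]


print(invalid_contracts(["F10", "J11", "A10", "FA1", "Z9", "H123"]))
```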

As you can see, REGEX provides a concise and flexible way to match strings of text against a pattern.

Have a good day.


Wednesday, May 5, 2010

Validate contract in C# using REGEX

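using System;
using System.Text.RegularExpressions;

class ContractValidator
{
    static void Main()
    {
        // A month letter followed by exactly one digit
        string rePattern = "^[FGHJKMNQUVXZ][0-9]$";

        Console.Write("Enter a String to Test for Contract: ");
        string strToTest = Console.ReadLine();

        Regex regexPattern = new Regex(rePattern);
        Console.WriteLine(regexPattern.IsMatch(strToTest ?? "") ? "OK" : "KO");

        // Wait for Enter before exiting so the console window stays open
        Console.ReadLine();
    }
}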