Friday, 15 July 2011

Perl RegEx Parse block of notes on 10 digit number -



Perl RegEx Parse block of notes on 10 digit number -

ok, here's thing. have note in old sql server text format. puts notes record in 1 big blob of data. need take blob of text , parse out create 1 row each note entry separate columns timestamp, user, , note text. way do can think of utilize regex locate unix timestamp each note , parse on that. know there split function parsing on delimiters, removes delimiter. need parse on \d{10} retain 10 digit number. here sample data.

create table test_table ( job_number number, notes varchar2(4000) ) insert test_table values (12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text') (12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text') (12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

i see 1 record each note this.

job_number dttm user notes_text ---------- ---------- ---- ---------- 12345 1234567890 username notes text notes text notes text notes text 12345 5468204562 username notes text notes text notes text notes text 12345 1025478510 username notes text notes text notes text notes text 12346 2345678901 username notes text notes text notes text notes text 12346 1523024512 username notes text notes text notes text notes text 12346 1578451236 username notes text notes text notes text notes text 12347 2345678902 username notes text notes text notes text notes text 12347 2365201214 username notes text notes text notes text notes text 12347 1202154215 username notes text notes text notes text notes text

thank help can provide

text::parsewords can handle quoted strings , split on comma. can skip ahead in input using flip-flop operator 1 .. /values/. particular skip method may need revised.

then matter of parsing strings, can done splitting using lookahead assertion , capturing various entries in each substring. regex in split:

my @entries = split /(?<!^)(?=\d{10})/, $data;

has negative lookbehind assertion avoid matching @ start of string ^, , lookahead assertion match 10 numbers. split @ numbers , maintain them.

the data file handle used demonstration, replace <data> <> utilize argument file name.

use strict; utilize warnings; utilize text::parsewords; $format = "%-12s %-12s %-10s %s\n"; # format printing @headers = qw(job_number dttm user notes_text); printf $format, @headers; printf $format, map "-" x length, @headers; # print underline while (<data>) { next while 1 .. /values/; # skip info s/^\(|\)$//g; # remove parentheses ($job, $data) = quotewords('\s*,\s*',0, $_); # parse string @entries = split /(?<!^)(?=\d{10})/, $data; # split entries $entry (@entries) { # parse each entry ($dttm, $user, $notes) = $entry =~ /^(\d+)\s+(\s+)\s+(.*)/; printf $format, $job, $dttm, $user, $entry; } } __data__ create table test_table ( job_number number, notes varchar2(4000) ) insert test_table values (12345, '1234567890 username notes text notes text notes text notes text 5468204562 username notes text notes text notes text notes text 1025478510 username notes text notes text notes text notes text') (12346, '2345678901 username notes text notes text notes text notes text 1523024512 username notes text notes text notes text notes text 1578451236 username notes text notes text notes text notes text') (12347, '2345678902 username notes text notes text notes text notes text 2365201214 username notes text notes text notes text notes text 1202154215 username notes text notes text notes text notes text')

output:

job_number dttm user notes_text ---------- ---- ---- ---------- 12345 1234567890 username 1234567890 username notes text notes text notes text notes text 12345 5468204562 username 5468204562 username notes text notes text notes text notes text 12345 1025478510 username 1025478510 username notes text notes text notes text notes text 12346 2345678901 username 2345678901 username notes text notes text notes text notes text 12346 1523024512 username 1523024512 username notes text notes text notes text notes text 12346 1578451236 username 1578451236 username notes text notes text notes text notes text 12347 2345678902 username 2345678902 username notes text notes text notes text notes text 12347 2365201214 username 2365201214 username notes text notes text notes text notes text 12347 1202154215 username 1202154215 username notes text notes text notes text notes text

regex perl

No comments:

Post a Comment