Creating Logical Search Vectors

Peter Merritt, APL

We are all well aware of the power of manipulating logical vectors. However, those happy-go-lucky users out there just keep pushing the boundaries (and their luck). Some time back, we were presented with a semi-formatted character vector (from several source files) containing code-delimited sets of data, something like:


“...OpenCodeF$ text number number more text other ClosingCodeF$ stuff stuff number OpenCodeB$ number ClosingCodeB$ number text stuff OpenCodeE$ stuff number stuff CloseCodeE$ text text...”


And so on. As you can see, the ‘enclosed data’ sets (each running from an opening code to its matching closing code) could occur anywhere within the text, in any order (the files were the best that could be achieved as extracts of long-defunct – but suddenly vital – database files).

Having run a pre-parser to identify the variable positions of each data item and specific opening and closing pairs, what we had to do was amend the resulting logical selection vector of items to ‘fill-in’ the selection between each pair of open/close codes, but leaving anything outside the ‘pair’ not selected.

So, if after the initial pairs were identified you had a logical set of intermediate answers like:


0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0

What you should end up with after running your brilliant program is this:


0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 0

Easy? Sure. Oh, but what about unmatched pairs of logical hits, essentially the ‘end of file’ problem? Check your solution against this:


HITS = 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0

ANS <= Program (data=HITS) (unmatched=0)

ANS

0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0

ANS <= Program (data=HITS) (unmatched=1)

ANS

0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1

There are of course several different approaches which could be followed (depending on the language in use), but as an ‘APL shop’ we have the advantage of being able to try things in developer or interactive mode. The following is an example of an APL ‘Defined Function’:


DF_lvfill←{    ⍝ fills in logical vector between pairs of start/end markers
    x←(≠\⍵)∨⍵    ⍝ parity scan marks each group; ∨⍵ adds the closing flags
    (⍺=1):x    ⍝ left argument: 1 if trailing 'odd flags' are required
    (⌽~∧\⌽x)∧x    ⍝ if not, turn them off
}

HITS←0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 1 0 1 0

unmatched←1

ans←unmatched DF_lvfill HITS

ans

0 0 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1
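To see why this works, it helps to trace the steps one at a time in the session. The sketch below reads the function with scans (`≠\` and `∧\`) and uses a short made-up vector `v` (not from the client data) holding one complete pair followed by a single unmatched flag.

```apl
      v←0 1 0 0 1 0 1 0        ⍝ hypothetical data: one open/close pair, then an unmatched flag
      ≠\v                      ⍝ running parity: 1 from each 'open' up to just before its 'close'
0 1 1 1 0 0 1 1
      x←(≠\v)∨v                ⍝ OR the flags back in so each 'close' is selected too
      x
0 1 1 1 1 0 1 1
      ∧\⌽x                     ⍝ reverse, then and-scan: marks the trailing run of 1s
1 1 0 0 0 0 0 0
      (⌽~∧\⌽x)∧x               ⍝ negate, reverse back, and switch that trailing run off
0 1 1 1 1 0 0 0
```

One thing worth checking against your own data: the last step switches off any trailing run of 1s in `x`, so a properly closed group whose closing flag is the very last item (for example `0 1 0 1`) would be cleared as well when the left argument is 0.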

The left argument exists because the client was supplying different source files, only some of which had a ‘closing’ flag for the last group, so we had to tell the function whether an ‘unpaired’ set of flags was OK for any given file… But that solution was some time ago now, so I also tried a different approach earlier this morning, just to see if it is any better (but mainly because I’m a geek):


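df_fill←{
    high←⌈/scan←+\⍵    ⍝ running count of flags, and its maximum
    ⍵∨(2|scan)∧(high≠scan)    ⍝ fill where the count is odd, except after the last flag
}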
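The same kind of trace shows what the intermediate variables hold. This is only a sketch, reading `scan` as a plus-scan (`+\`) and again using a short made-up vector `v` with one complete pair followed by an unmatched flag.

```apl
      v←0 1 0 0 1 0 1 0        ⍝ hypothetical data: one pair, then an unmatched flag
      scan←+\v                 ⍝ running count of flags seen so far
      scan
0 1 1 1 2 2 3 3
      high←⌈/scan              ⍝ total number of flags
      high
3
      2|scan                   ⍝ odd count: inside a group, or after an unmatched flag
0 1 1 1 0 0 1 1
      high≠scan                ⍝ 0 from the last flag onwards
1 1 1 1 1 1 0 0
      v∨(2|scan)∧(high≠scan)   ⍝ keep the fill before the last flag, then OR the flags back in
0 1 1 1 1 0 1 0
```

Note that on this input the unmatched flag itself stays set but nothing after it is filled, which is not the same as either setting of the left argument in DF_lvfill (`0 1 1 1 1 0 0 0` or `0 1 1 1 1 0 1 1`). For data where every flag is paired, the two approaches agree.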

Note: I have named and created variables for the intermediate info just for clarity; obviously they could easily go on one line – unless we want to compile them, of course… It would be interesting to run timers on the solutions. Anyway, this was a real-world problem, but it is useful in its own right as an exercise in manipulating logical vectors. If you would like to try any of these solutions (or simply have a go with APL for free), please check out the following links:

http://www.tryapl.org/

Or for more wide-ranging discussions, check out: http://www.dyalog.com/mastering-dyalog-apl.htm

And https://sites.google.com/site/baavector/
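For anyone wanting to follow up on the one-line and timing remarks, here is a rough sketch rather than a benchmark. It assumes a local Dyalog installation with the `dfns` workspace (the source of the `cmpx` timing utility), restates both functions with explicit scans, and introduces two names of my own: `df_fill1` for a one-line variant and `big` for random test data.

```apl
⍝ The two functions from the article, restated with explicit scans
DF_lvfill←{x←(≠\⍵)∨⍵ ⋄ ⍺=1:x ⋄ (⌽~∧\⌽x)∧x}
df_fill←{high←⌈/scan←+\⍵ ⋄ ⍵∨(2|scan)∧(high≠scan)}

⍝ Hypothetical one-line variant: s is assigned on the right, then reused to its left
df_fill1←{⍵∨(2|s)∧(⌈/s)≠s←+\⍵}

big←1=?100000⍴10             ⍝ random Boolean test data, roughly one flag in ten
(df_fill big)≡df_fill1 big   ⍝ 1 if the one-liner agrees with the named version

'cmpx'⎕CY'dfns'              ⍝ copy the timing utility from the dfns workspace
cmpx '1 DF_lvfill big' 'df_fill big' 'df_fill1 big'
```

The one-liner relies on APL evaluating the right argument of `≠` before its left, so `s` already exists when `⌈/s` and `2|s` are reached. Timings will vary with the interpreter version and the density of flags, so treat any numbers as indicative only.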