Sunday, March 20, 2016

match incremental input with predefined dictionary

Requirement:

1) a dictionary with many words (String) without duplicates
2) an input feed with many continuous characters (String)
3) input keep growing (append new character) if any word in the dictionary is a prefix (including equal) of the input stream, in this case the word need to be recorded, and continue 3); if none is a prefix of the input stream, then input terminate, return the result

Looks like a Trie can be used for such task, for every iteration when input grows, check if we can find input stream from the dictionary (converted to a Trie); however, it's not easy to implement efficient and pure functional Trie in Haskell. Thus here the alternative.

The idea is for every iteration, we keep reduce the candiates, if any candiates has a full match, we then could return the candiates; if there's no candidates, we then could let the caller terminate input stream.

candidate is defined as:

-- type Candidate = (String, [String])

The first part is the prefix already matched, second part is the substring of the candidates with prefix (first) removed;

result is aggregated by a Writer Monad;
return value could be either one of:

data MatchStatus = More    -- we have candidates, keep growing input stream;
                 | Done    -- No more candidate, terminate input
                   deriving Show


The monadic code makes the logic quite handy.

incr c = do
  (matched, rems) <- br="" get="">  let rems' = fmap tail . filter (\xs -> if null xs then False else if (head xs) /= c then False else True) $ rems
      matched' = c:matched
  put (matched', rems')
  when (any null rems') ( tell (Seq.singleton (reverse matched')) )
  return $! if null rems' then Done else More
<- br="" get="">
<- br="" get=""><- br="" get="">
-- dict1 = ["xde", "xd", "efgh"]
-- runIdentity . runRWST (incr 'x' >> incr 'd' >> incr 'e' >> incr 'f') 0 $ ("", dict1)

--   (Done,("fedx",[]),fromList ["xd","xde"])

No comments:

Post a Comment