Sunday, March 20, 2016

match incremental input with predefined dictionary

Requirement:

1) a dictionary with many words (String) without duplicates
2) an input feed with many continuous characters (String)
3) input keep growing (append new character) if any word in the dictionary is a prefix (including equal) of the input stream, in this case the word need to be recorded, and continue 3); if none is a prefix of the input stream, then input terminate, return the result

Looks like a Trie can be used for such task, for every iteration when input grows, check if we can find input stream from the dictionary (converted to a Trie); however, it's not easy to implement efficient and pure functional Trie in Haskell. Thus here the alternative.

The idea is for every iteration, we keep reduce the candiates, if any candiates has a full match, we then could return the candiates; if there's no candidates, we then could let the caller terminate input stream.

candidate is defined as:

-- type Candidate = (String, [String])

The first part is the prefix already matched, second part is the substring of the candidates with prefix (first) removed;

result is aggregated by a Writer Monad;
return value could be either one of:

data MatchStatus = More    -- we have candidates, keep growing input stream;
                 | Done    -- No more candidate, terminate input
                   deriving Show


The monadic code makes the logic quite handy.

incr c = do
  (matched, rems) <- br="" get="">  let rems' = fmap tail . filter (\xs -> if null xs then False else if (head xs) /= c then False else True) $ rems
      matched' = c:matched
  put (matched', rems')
  when (any null rems') ( tell (Seq.singleton (reverse matched')) )
  return $! if null rems' then Done else More
<- br="" get="">
<- br="" get=""><- br="" get="">
-- dict1 = ["xde", "xd", "efgh"]
-- runIdentity . runRWST (incr 'x' >> incr 'd' >> incr 'e' >> incr 'f') 0 $ ("", dict1)

--   (Done,("fedx",[]),fromList ["xd","xde"])

Saturday, March 19, 2016

C interview quiz: how to reverse a string

So I heard someone was asked how to reverse a C string maybe in a coding interview. seems pretty easy? At least it's very easy come into some solution like this:

char* reverse(char* s, int n)
{
 int i;
 char ch;

 for (i = 0; i < n / 2; i++) {
   ch = s[i];
   s[i] = s[n - i -1];
   s[n -i - 1] = ch;
 }

 return s;
}

Well, this might can pass the coding test, for a serious C programmer, I don't think that's the right solution. The problem is as a low level C programmer, you need think like machines, machines do terribly when handling with bytes, but much better with words. Keep this in mind, we could find a much better solution (even it's longer):

static inline void swap64(long long* x, long long* y)
{
 long long t = *x;
 *x = *y;
 *y = t;
}

static inline void swap32(int* x, int* y)
{
 int t = *x;
 *x = *y;
 *y = t;
}

static inline void swap16(short* x, short* y)
{
 short t = *x;
 *x = *y;
 *y = t;
}

static inline void swap8(char* x, char* y)
{
 char t = *x;
 *x = *y;
 *y = t;
}

static inline char* fastrev(char* s, int i, int j)
{
 while (i + 16 <= j) {
   long long *l1 = (long long*)(s+i);
   long long *l2 = (long long*)(s+j-8);
   *l1 = bswap_64(*l1);
   *l2 = bswap_64(*l2);
   swap64(l1, l2);
   i += 8;
   j -= 8;
 }
 while(i + 8 <= j) {
   int* i1 = (int*)(s+i);
   int* i2 = (int*)(s+j-4);
   *i1 = bswap_32(*i1);
   *i2 = bswap_32(*i2);
   swap32(i1, i2);
   i += 4;
   j -= 4;
 }
 while(i + 4 <= j) {
   short* i1 = (short*)(s+i);
   short* i2 = (short*)(s+j-2);
   *i1 = bswap_16(*i1);
   *i2 = bswap_16(*i2);
   swap16(i1, i2);
   i += 2;
   j -= 2;
 }
 while (i < j) {
   char* c1 = s+i;
   char* c2 = s+j-1;
   swap8(c1, c2);
   i++;
   j--;
 }

 return s;
}

char* reverse(char* s, int n)
{
 return fastrev(s, 0, n);
}


(I didn't use swap macro with the well-known `xor` trick, because `xor` on non-words doesn't necessarily faster).

The 2nd solution is 3 times faster on my computer when revering 10^6 bytes string repeated 1000 times (have to do this, otherwise the bottle neck is I/O). Main function:

int main(int argc, char* argv[])
{
 char* line;
 size_t n;
 ssize_t k;
 int loop, loopcnt = 1000;
 int nt;

 scanf("%d\n", &nt);

 while(nt--) {
   k = getline(&line, &n, stdin);
   line[k-1] = '\0';
   loop = loopcnt;
   while(loop--) {
     reverse(line, k-1);
   }
   printf("%s\n", reverse(line, k-1));
 }

 free(line);
 return 0;
}
Timing Result:

[wangbj@nuc tmp]$ ls -l /tmp/i5.txt   
-rw-r--r-- 1 wangbj users 1000003 Mar 19 00:43 /tmp/i5.txt
[wangbj@nuc tmp]$ time cat /tmp/i5.txt | ./rev > /dev/null       

real    0m0.637s
user    0m0.621s
sys     0m0.015s
[wangbj@nuc tmp]$ time cat /tmp/i5.txt | ./rev2 > /dev/null

real    0m0.206s
user    0m0.197s
sys     0m0.015s

While there are so much big tech asking algorithm questions (problem is 90% of exactly the same questions can be found online with answers..), reverse a string might not a laughable problem at all!

Thursday, March 10, 2016

Binary tree right side view with Haskell


data BTree a = Leaf | Branch a !(BTree a) !(BTree a) deriving Show

expand Leaf = return (mempty, [])
expand (Branch a l r) = return (Alt (Just a), [r, l])

rightSideView = mapM (getAlt . mconcat) . init . levels . runIdentity . unfoldTreeM_BF expand