The qualification round for Facebook Hacker Cup happened last weekend. I was initially tricked by one of the problems. I thought I had solved it - but the next day while browsing some blogs about the problems, my wife discovered a mistake I had made. Fortunately, I had solved the other problem correctly to qualify for the next round.

The problem appears simple enough. You have been given a set of string and you have to find a "

That brute force approach is horribly inefficient. So the quick solution that comes to mind is, just sort all the strings in ascending order, and then iterate over them to concatenate them one after another. That will work with our example. That's what I did. So what's the issue ?

The problem appears simple enough. You have been given a set of string and you have to find a "

**" of these strings. An example makes it perfectly clear. Assume that the only characters in the strings are lowercase characters from 'a' to 'z'.***lexicographically lowest possible concatenation*If the input isOf all the possible ways in which you can concatenate the strings, this will be the first one if you arrange them in ascending order. Conceptually, this is equivalent to saying, find all theabc xyz pqr. The correct output should beabcpqrxyz.

**N!**ways in which the strings can be combined, sort them in ascending order and output the first one.That brute force approach is horribly inefficient. So the quick solution that comes to mind is, just sort all the strings in ascending order, and then iterate over them to concatenate them one after another. That will work with our example. That's what I did. So what's the issue ?

__Problem with natural sort and concat__

Consider the following example case where the input is just these 2 strings[xa xab]

Since"xa" < "xab"a simple sort and append method will producexaxab

This is not correct, as the other possible concatenation is the correct answer.

Becausexabxa < xaxab

And that's the problem.

For any 2 strings S1 and S2, S1 < S2 does NOT imply (S1.S2) < (S2.S1).

This happens only if one string is a starting substring of another. There is NO problem with natural sort and concat in following cases.

For[abc pqrabc]the answer is still in natural order =abcpqrabcwhich is <pqrabcabc

For[abcd abczpqr]the answer is still in natural order =abcdabczpqrwhich is <abczpqrabcd

Even when we have a starting substring, the simple natural sort and concat can be correct.

For[xa xaz]answer is still in natural order =xaxaz.

**Correct solution**

My new solution is still a sort and append. But when the strings are being sorted, simple comparison **Correct solution**

**"if S1 < S2"**should not be used. The correct comparison is

**"if (S1.S2) < (S2.S1)"**. So the strings should be placed in ascending order using the comparison of their concatenations.

Generally speaking, before concatenating the strings should be ordered

[S0 S1 S2 ... Sn]

such that

for any k, the concatenation (S[k-1].S[k]) < (S[k].S[k-1])

Then the entire concatenation would be lowest.

That sounds intuitively correct. Still, since I was tricked once, I just want to make sure that this approach is indeed correct.

Let's say while sorting we decide to order the strings as

To prove that above ordring is indeed correct, we need to make sure that our comparison is transitively correct.

**Informal Proof**

That sounds intuitively correct. Still, since I was tricked once, I just want to make sure that this approach is indeed correct.**Informal Proof**

Let's say while sorting we decide to order the strings as

**[S1 S2]**because**(S1.S2) < (S2.S1)**. Then we check string**S3**, and since**(S2.S3) < (S3.S2)**we decide to put it last as**[S1 S2 S3]**. This will give us**S1.S2.S3**as our answer. It's correct based on comparison of [**S1 S2]**and**[S2 S3].**But what about the comparison of**[S1 S3]**? How do we know that**S3**should not be put ahead of**S1**giving us the answer**S3.S1.S2**?To prove that above ordring is indeed correct, we need to make sure that our comparison is transitively correct.

Prove that, if (S1.S2) < (S2.S1) and (S2.S3) < (S3.S2), then it implies (S1.S2) < (S3.S1)

The only interesting case in our ordered list

**[S0 S1 S2 ... Sn]**is when the natural sorting order is violated for all 3 strings. Any ordering**[S1 S2 S3]**generated by our algorithm is trivially correct**if S1 < S2**. There is no point comparing**S3**to**S1**here. In fact we can simply combine such string pairs into one bigger string, and keep reducing the number of strings till we have a violation of the natural order in at least 3 successive strings.Consider the natual sorting order[xa xab xabc]. But for us, the correct order is[xabc xab xa]. This is the only interesting case. We want to make sure our derived answerxabcxabxais really the lowest.

Another example.

Consider the natual sorting order [xa xab xabz]. But for us, the correct order is[xab xabz xa]. This isNOTintersting as the order for first 2 strings[xab xabz]is just natural ordering. We can reduce this to[xabxabz xa]and now our solution is trivially correct.

To repeat, the natural sorting order of

**[S1 S2]**will be violated only if entire**S1**is a substring of**S2**, and**(S2.S1)****< (S1.S2)**.So let's consider the case generally and name the substrings asMaybe I made it too complicated. But I could not figure out a simpler proof. I can also prove it a bit more formally using induction, but I don't think it's simpler. In any case, the solution is really very simple to implement, and that's what matters !X, YandZ.

S1 = Z.Y.XS2 = Z.YS3 = Z

ClearlyS3 < S2 < S1using natural order, but solution using our comparison came out to be in exactly the opposite order as[S1 S2 S3]. Let's prove it correct.

Also our algorithm decided thatS2.S3 < S3.S2- meaningZ.Y.Z < Z.Z.YRemoving identical substring from the beginignof both, gives usY.Z < Z.Yand this impliesY < Z.

NowS3.S1will beZ.Z.Y.Xand compare this toS1.S2which isZ.Y.X.Z.Y. Since we know thatZ > Y, it follows thatZ.Z > Z.Y. Therefore(S3.S1)is indeed> (S1.S2)and henceS3should not be placed ahead ofS1.

Thank you. Your blog really helped me pass my testcases. (y)

ReplyDelete