
ORACLCS - Editorial


PROBLEM LINK:

Contest
Practice

Author: Praveen Dhinwa
Tester: Sergey Kulik
Editorialist: Kevin Atienza

PREREQUISITES:

longest common subsequence

PROBLEM:

You are given $n$ strings, each of length $m$ and consisting only of the characters a or b. Find another such string such that the longest common subsequence (LCS) of all $n+1$ strings is minimized. Output only the length of this smallest LCS.

QUICK EXPLANATION:

The smallest LCS is always a constant string, i.e. either aaa...a or bbb...b. Thus the answer is the minimum number of as or bs in any string.

EXPLANATION:

First, we introduce a little bit of notation. Let $s$ be a string of length, say, $m$. We write this as $|s| = m$. For $1 \le i \le m$, we denote by $s[i]$ the $i$th character of $s$, and for $0 \le i \le m$, by $s_i$ the prefix of $s$ consisting of the first $i$ characters. Thus, $s_m = s$, and $s_0 = \varepsilon$, the empty string. As an example, suppose $s$ is codechef. Then $s_4$ is code and $s[4]$ is e.

Longest common subsequence

The problem of finding the longest common subsequence (LCS) is quite standard, and there is a well known (dynamic programming) algorithm for it: Suppose we want to find the length of the LCS of two strings of length $m$, say $s$ and $t$. Let $f(a,b)$ be the length of the LCS of $s_a$ and $t_b$. Then the length of the LCS of $s$ and $t$ themselves is $f(m,m)$. The following gives one way to compute $f$ recursively:

  • As base cases, we have $f(a,b) = 0$ if $a = 0$ or $b = 0$, because in these cases, one of the strings is empty.
  • If $a > 0$, $b > 0$ and $s[a] = t[b]$, then $f(a,b) = f(a-1,b-1) + 1$.
  • If $a > 0$, $b > 0$ and $s[a] \not= t[b]$, then $f(a,b) = \max(f(a-1,b), f(a,b-1))$.

By tabulating all $(m+1)^2$ possible sets of arguments to $f$ and filling the table in the right order, one can achieve an $O((m+1)^2)$, or $O(m^2)$, time algorithm to compute (the length of) the LCS of two strings.
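
For concreteness, here is a minimal sketch of this tabulation in Python (the function name is ours):

def lcs_length(s, t):
    # f[a][b] = length of the LCS of the prefixes s[:a] and t[:b]
    f = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for a in range(1, len(s) + 1):
        for b in range(1, len(t) + 1):
            if s[a - 1] == t[b - 1]:  # s[a] = t[b] in the 1-indexed notation above
                f[a][b] = f[a - 1][b - 1] + 1
            else:
                f[a][b] = max(f[a - 1][b], f[a][b - 1])
    return f[len(s)][len(t)]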

This algorithm can be extended to more strings. However, the complexity becomes greater and greater the more strings you add. For example, for three strings, you would need a three-dimensional array for $f$, so the running time is $O(m^3)$. In general, to compute the LCS of $n$ strings this way would require $O(nm^n)$ time, which is not polynomial-time. In fact, the problem of computing the LCS of a set of strings is NP-complete!

It's not even clear how to brute-force the problem to pass the first subtask. One idea might be to try all $2^m$ binary strings of length $m$, and for each candidate, compute its LCS together with the $n$ input strings using the algorithm above. But that is an LCS of $n+1$ strings, so a single LCS computation already takes $O(nm^{n+1})$ time, and it would have to be repeated for all $2^m$ candidates, which is hopelessly slow for the worst case $m = n = 14$.

A faster way would be to simply try all $2^{m+1} - 1$ possible LCSes, i.e. for each string of length at most $m$, check if it is a subsequence of all strings, and get the longest. Checking subsequences can be done in linear time, so this algorithm runs in $O(mn 2^m)$, which is much more acceptable.
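
The linear-time subsequence check can look like this (a sketch; the helper name is ours):

def is_subsequence(p, s):
    # greedily match characters of p against s from left to right: O(|s|) time
    it = iter(s)
    return all(ch in it for ch in p)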

However, since there are $2^m$ strings to try, this procedure will be called $2^m$ times and so the overall complexity is $O(mn 4^m)$, which is too large even for the smallest subtask!

$n = 1$

Let's start small. Suppose $n = 1$, that is, suppose there is only one string of length $m$. Our goal is to find another such string such that the length of the LCS is minimized. Clearly this is a special case of the current problem and so we must be able to solve this case.

As an example, suppose the string we have is bbbbaaaaaa. Let's try to concoct a string that minimizes the LCS.

Let's try a simple candidate: aaaaaaaaaa. What is the LCS? Obviously it's aaaaaa, and the length is $6$. So we now have an upper bound for the answer: $6$. But can we do better?

To do better, the first thought is that you can't use too many as, because otherwise you might have aaaaaa as a common subsequence again, so we'd have failed to improve on our bound. In fact, why don't we try avoiding any as at all? Let's try the string bbbbbbbbbb. This way, the LCS is bbbb, which only has length $4$. We now have a better upper bound! But again, can we do better?

In order to do better, similar to the a case, we can't use more than three bs, otherwise bbbb will be a common subsequence, and we wouldn't have improved on our best solution. This means that we can use at most 3 bs. But if there are only up to $3$ bs, then there are at least 7 as, which means aaaaaa will be a common subsequence. And this is longer than bbbb! Thus, we have just shown that it's impossible to improve on the smallest LCS of $4$, and in fact the smallest LCS for this case is $4$.

The nice thing about this argument is that it easily works for any other string! More specifically:

  • Suppose our string has $a$ as and $b$ bs. (Thus $a + b = m$.)
  • We can force the LCS to be $\min(a,b)$ in length, by using the string aaa...a if $a \le b$, and bbb...b otherwise.
  • But we can show that we can't do any better than $\min(a,b)$. Because suppose we found another string of length $m$ with LCS less than $\min(a,b)$. This string cannot have $\min(a,b)$ as, or $\min(a,b)$ bs, otherwise the LCS would have been $\min(a,b)$ (or longer). Thus, there are less than $\min(a,b)$ as and bs in this string. But this means that the total length of the string is less than $\min(a,b) + \min(a,b) \le a + b = m$, a contradiction.
  • Thus, the answer is $\min(a,b)$.

Thus, we have just shown that answer for a single string is $\min(a(s),b(s))$, where $a(s)$ and $b(s)$ are the number of as and bs in $s$, respectively.

$n > 1$

Now, what if $n > 1$? Similarly to the above, it's immediately clear that we have the following upper bound: $$\min_{\text{string $s$}} \min(a(s),b(s))$$ where $s$ runs across all $n$ strings in the input. This LCS is achieved by trying the strings aaa...a and bbb...b. Now, can we do better this time?

In fact, we can easily adjust the argument above to show that this is the smallest LCS we can get!

  • We can force the LCS to be $C = \min_s \min(a(s),b(s))$ in length, by using the string aaa...a or bbb...b, whichever yields the smaller LCS. Note that $a(s) + b(s) = m$ for any string $s$.
  • But we can show that we can't do any better than $C$. Because suppose we found another string of length $m$ with LCS less than $C$. This string cannot have $C$ as, or $C$ bs, otherwise the LCS would have been $C$ (or longer). Thus, there are less than $C$ as and bs in this string. But this means that the total length of the string is less than $C + C \le a(s) + b(s) = m$ (for any string $s$), a contradiction.
  • Thus, the answer is $C = \min_s \min(a(s),b(s))$.

The value $\min_s \min(a(s),b(s))$, can be computed by simply counting the as and bs in each string! This runs in linear time in the input.

Here's an implementation in Python:

for cas in xrange(input()):                    # number of test cases
    A = B = 10**9
    for i in xrange(input()):                  # read the n strings
        s = raw_input()
        A = min(A, sum(c == 'a' for c in s))   # fewest a's in any string
        B = min(B, sum(c == 'b' for c in s))   # fewest b's in any string
    print min(A, B)

Here's a shorter (and codegolfier) implementation:

from collections import Counter
for cas in xrange(input()):
    print min(min(Counter(raw_input()+'ab').values()) for i in xrange(input()))-1

Bonus question: Does the above algorithm still work if the alphabet were larger than $\{a,b\}$?

Time Complexity:

$O(nm)$

AUTHOR'S AND TESTER'S SOLUTIONS:

Setter
Tester
Editorialist


WAYPA - Editorial


PROBLEM LINK:

Contest
Practice

Author: Fedor Korobeinikov
Tester: Sergey Kulik
Editorialist: Kevin Atienza

PREREQUISITES:

String hashing, polynomial hashing, tree centroid decomposition, binary search

PROBLEM:

Given an unrooted tree with $N$ nodes, where each edge is labeled with a single digit, what is the length of the longest simple path in the tree such that the sequence of digits along the path is palindromic?

(The original problem asks for the maximum number of nodes in such a path, but here we'll define "length" as the maximum number of edges in such a path. The answer is simply one plus this number.)

QUICK EXPLANATION:

Binary search on the solution (performing two binary searches, one for even answers and another for odd answers). Thus, we want to know for a given length $l$ whether there is a palindromic path of length $l$.

Root the tree at its centroid (so that each child has at most half the size of the whole tree). Then, recursively check for each subtree whether there is a length $l$ palindromic path. Finally, we want to check whether there is a palindromic path passing through the root.

For each node $x$, compute the polynomial hash $H(x)$ of the string from $x$ up to the root, and the hash $H_{\text{rev}}(x)$ of the reverse of that string. Then, for each node $x$ of depth $d$, let $y$ be its $(l-d)$th ancestor. There is a palindromic string of length $l$ starting from $x$ and passing through the root if the following things hold:

  • $H(y) = H_{\text{rev}}(y)$ (because the string from $y$ to the root represents the middle $2d-l$ characters, so it must also be palindromic).
  • There exists a node $z$ from another subtree such that the hash of the string from $y$ to $x$ is equal to $H(z)$. (This can be checked quickly with the use of sets, after some preprocessing)

If these conditions hold for some node $x$, then we declare that there is a palindromic path of length $l$ passing through the root.

EXPLANATION:

Binary search

As with all problems involving minimizing something, one thing that immediately comes to mind is binary search. The general idea is: Suppose $\phi(x)$ is some statement, and we want to find the largest $x$ such that $\phi(x)$ is true. Let $L < R$ be two numbers such that $\phi(L)$ is true and $\phi(R)$ is false. Then we can perform binary search on the range $[L,R]$ until we find the maximum $x$. But this only works if $\phi(x)$ is monotonic in the following way: $\phi(x) \implies \phi(x-1)$ for all $x$.

In our case, the statement $\phi(x)$ is: "There exists a palindromic path of length $x$ in the tree." A lower bound could be $L = 1$ (every path of length $1$ is palindromic) and an upper bound $R = N$ (the longest path in any tree has $N-1$ edges). However, this statement is not monotonic. For example, the string babab contains palindromic substrings of length $1$, $3$ and $5$ but not of length $2$ and $4$. This means binary search will not work.

But $\phi(x)$ is almost monotonic; specifically it satisfies the following for all $x$: $\phi(x) \implies \phi(x-2)$. This condition allows us to perform two binary searches, one for even $x$ and odd $x$ (because the statement becomes monotonic when only considering $x$s with the same parity), and the largest $x$ such that $\phi(x)$ is true is simply the larger result of the two!

Thus, what we need to do now is to compute $\phi(x)$. In other words:

Given $x$, is there a palindromic path of length $x$?

Finding a palindromic path of length $x$

We're trying to implement the function $f(T,x)$, which returns true if there is a palindromic path of length $x$ in the tree $T$.

Let's consider rooting $T$ at, say, node $1$. We can consider two kinds of paths: those that pass through the root (node $1$) and those that don't. A path that doesn't pass through the root is contained in one subtree, so we can simply call $f(T',x)$ recursively for each subtree $T'$. Thus, we now only need to consider paths that pass through the root.

A path through the root consists of two edge-disjoint paths from two nodes $a$ and $b$ to the root, where $a$ and $b$ belong to different subtrees. When is such a path palindromic of length $x$? Let $d_a$ and $d_b$ be the depths (distance from the root) of $a$ and $b$. Without loss of generality, assume $d_a \ge d_b$. The first condition would be that $d_a + d_b = x$, because the total length of the path is just $d_a + d_b$.

Now, let $c$ be the $d_b$th ancestor of $a$. The second condition would be that the path from $c$ to the root must be palindromic, because this path represents the "middle $d_a - d_b$ characters" of the path $a \rightsquigarrow b$. A final condition would be that the path from $a$ to $c$ must be the same string as the path from $b$ to the root. Once all these conditions are satisfied, the path $a \rightsquigarrow b$ is palindromic.

We can now determine whether there is a palindromic path through the root with length $x$. For a node $a$, let $d_a$ be its depth, and $c$ be its $(x-d_a)$th ancestor. Then there is a palindromic path starting at $a$ if the following conditions are satisfied:

  • The path from $c$ to the root is palindromic
  • There is a node $b$ belonging to a different subtree such that the path from $a$ to $c$ is the same string as the path from $b$ to the root.

If any node $a$ satisfies this condition, then we have found our palindromic path of length $x$ through the root. Otherwise, we can see that there is no such path.

Now, how fast this runs depends on how we implement the following operations:

  • Given a node ($c$), check whether the path from that node to the root is palindromic.
  • Given some path from a node to one of its ancestors ($a \rightsquigarrow c$), check whether there is a path from a node in a different subtree ($b$) to the root but with the same string.
  • Given $a$, find its $(x-d_a)$th ancestor.

It also depends on the shape of the tree. If the tree is highly degenerate, then we expect to have lots of heavy recursive calls, and the running time will be slow. Our algorithm works best with balanced trees. We'll deal with this problem later.

String processing

Two of the operations we want to implement above require some sort of way to compare strings with each other. For example, consider the first operation. A palindromic string is simply a string whose reverse is equal to itself. Thus, checking whether the path to the root is palindromic is simply checking whether that path is equal to its reverse. Clearly the second operation requires string comparison too.

Comparing strings naïvely takes $\Theta(N)$ time, and since we need to perform the operations above for each node, the overall running time is at least $\Theta(N^2)$, which is unacceptable. Clearly we need to be able to compare strings without iterating through all their characters.

One common way to check whether two strings are "equal" is to hash them with some sort of hash function and check if their hashes are equal. A hash function is simply a function that maps things from a set of objects to a smaller set of objects. Now, since hashes belong to a smaller set of objects, inevitably there will be collisions, or two distinct objects that have the same hash. Thus, this method of comparing strings is not completely correct. However, if our hashes are "random" enough in some precise way, then it might be hard to generate two things with the same hash, and we can just assume that we won't encounter collisions during our algorithm.

In our case, we can use hashing to compare two strings. One common way to hash strings to integers is via the polynomial hash: Let $c_0c_1c_2c_3\ldots c_{k-1}$ be the string. Then the polynomial hash of this string is the following: $$(c_0' + c_1'B + c_2'B^2 + \cdots + c_{k-1}'B^{k-1}) \bmod M$$ where $B$ and $M$ are fixed constants, and where each character $c$ is assigned to some distinct (nonzero) number $c'$. For this to be "random" enough, one way would be to pick $M$ to be some large prime, say $10^9+7$, and $B$ to be some number with a large (multiplicative) period modulo $M$.

The main advantage of the polynomial hash is the ease with which the hash can be updated when a new character is appended to the beginning or end of the string. For example, if $H$ is the hash of a string of length $k$, then appending a new character $c$ at the beginning results in the new hash $(c' + H\cdot B)\bmod M$, while appending it at the end results in $(H + c'B^k) \bmod M$. This turns out to be very useful for us since we're hashing many related strings in a tree.

For each node $a$, we define $H(a)$ to be the hash of the string from $a$ to the root. We also define $H_{\text{rev}}(a)$ to be the hash of the reversal of this string. Using $H(a)$ and $H_{\text{rev}}(a)$, we can now implement the first operation: The path from $a$ to the root is palindromic "if and only if" $H(a) = H_{\text{rev}}(a)$! (Note that we quote "if and only if" because we are assuming that we won't encounter collisions in our hash function.)

Calculating $H(a)$ and $H_{\text{rev}}(a)$ is easy enough because of the properties of the polynomial hash. If $a$ is the root (i.e. $a = 1$), we have $H(a) = H_{\text{rev}}(a) = 0$, because the empty string hashes to $0$. If $a$ is not the root,

  • Let $d$ be the depth of $a$,
  • Let $p$ be the parent of $a$, and
  • Let $c$ be the digit at the edge $(p,a)$.

Then we have $H(a) = (c' + H(p)\cdot B)\bmod M$ and $H_{\text{rev}}(a) = (H_{\text{rev}}(p) + c'B^{d-1})\bmod M$. Thus, with a single pass on the tree, these hashes can be computed in overall $O(N)$ time (assuming powers of $B$ have been precomputed).
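
Here is a sketch of that single pass, assuming the tree is stored as an adjacency list adj[u] of (child, digit) pairs and that each digit is mapped to the nonzero value digit + 1 (the names and representation are ours):

M = 10**9 + 7
B = 131   # a base with a large multiplicative period modulo M

def compute_hashes(adj, root):
    n = len(adj)
    H = [0] * n      # H[u]: hash of the digit string from u up to the root
    Hrev = [0] * n   # Hrev[u]: hash of the reverse of that string
    depth = [0] * n
    Bpow = [1] * (n + 1)
    for i in range(1, n + 1):
        Bpow[i] = Bpow[i - 1] * B % M
    stack = [(root, -1)]
    while stack:                          # iterative DFS
        u, parent = stack.pop()
        for v, digit in adj[u]:
            if v == parent:
                continue
            c = digit + 1                 # map digits to nonzero values
            depth[v] = depth[u] + 1
            H[v] = (c + H[u] * B) % M                         # prepend c
            Hrev[v] = (Hrev[u] + c * Bpow[depth[v] - 1]) % M  # append c
            stack.append((v, u))
    return H, Hrev, depth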

Now, let's try to implement the second operation: Given some path from a node to its ancestor ($a \rightsquigarrow c$), check whether there is a path from a node in a different subtree ($b$) to the root with the same string. We can still use the $H(a)$s for this task, but we will need additional tricks. First, the hash of the string for the path $a \rightsquigarrow c$ can be obtained from $H(a)$ and $H(c)$ as follows: $$(H(a) - H(c)B^{d_a - d_c})\bmod M$$ Let $V$ be this value. Thus, the problem is now to find a node $b$ in a different subtree such that $H(b) = V$. Here's one way to do it:

  • Collect all $H(b)$s in a set $S$, for all nodes $b$ from other subtrees.
  • Check whether $V$ is in $S$.

This obviously works, but constructing $S$ takes time. $O(N)$ time in fact, which is pretty slow. But the thing is, many of these $S$'es are the same for many nodes. Specifically, the set $S$ is the same for all nodes belonging to the same subtree.

Thus, one way to optimize this operation would be to process the nodes subtree by subtree. Before processing any subtree, first construct the multiset $S$ of all $H(b)$s for all nodes. Then for a given subtree:

  • Remove the $H(a)$s for all nodes $a$ in the subtree from the multiset.
  • Perform the operation above for all nodes in this subtree.
  • When you're done, put the $H(a)$s back in the multiset.

Clearly, the $S$ that we use for all operations is correct. But with this scheme, every $H(a)$ is removed and added at most once, so the overall running time is $O(N\cdot k)$, where $k$ is the cost of an insertion and deletion in the multiset! (For example $k = O(\log N)$ for balanced trees.)
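
In miniature, the remove-query-restore pattern with a Counter as the multiset looks like this (a toy demonstration on plain integers, not the full solution):

from collections import Counter

subtree_hashes = [[3, 5, 8], [5, 9], [2, 3]]   # hashes grouped by subtree
S = Counter(h for group in subtree_hashes for h in group)

for group in subtree_hashes:
    for h in group:            # remove this subtree's hashes
        S[h] -= 1
    present = S[5] > 0         # example query: does 5 occur in another subtree?
    for h in group:            # put them back afterwards
        S[h] += 1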

Getting the $(x-d)$th ancestor of every node.

Next, we also need to get a very specific ancestor of every node in order for the algorithm above to work. This is the standard level ancestor problem, and there are many common ways to do this, such as using power-of-two ancestor pointers, and things called ladders.

But ours is an offline version of the problem (all the queries are known beforehand), and there is in fact a simpler way to do it. Simply run a DFS, but always maintain a stack of nodes toward the root: as you go down and up the tree, you push and pop nodes on this stack. The nice thing about it is that we have the sequence of nodes to the root every time we encounter a new node, and if we use a dynamic array, we can get the $(x-d_a)$th ancestor of node $a$ with a single array lookup!
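
A sketch of this offline trick (our own code; want[u] holds the depth of the ancestor requested for node u, or -1 if there is no query):

def answer_ancestor_queries(adj, root, want):
    ans = {}
    path = []                       # path[d] = ancestor at depth d of the current node
    stack = [(root, -1, False)]
    while stack:
        u, parent, leaving = stack.pop()
        if leaving:
            path.pop()              # going back up: u leaves the root path
            continue
        path.append(u)
        if 0 <= want[u] < len(path):
            ans[u] = path[want[u]]  # the promised single array lookup
        stack.append((u, parent, True))
        for v in adj[u]:
            if v != parent:
                stack.append((v, u, False))
    return ans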

Overall, this step requires a total of $O(N)$ time.

Recursion

From the above, we can now check for the existence of a length-$x$ palindromic path passing through the root. With recursion, we can check for all other paths. However, the speed of our algorithm greatly depends on the shape of the tree: the more imbalanced the tree, the slower our algorithm is! Thus, we must find a way to somehow reduce the overhead of recursion.

Thankfully, there is a simple way to do it: via centroid decomposition. The idea is to not root the tree arbitrarily, rather, we root it at a special node called the centroid. A centroid is a node such that when the tree is rooted at it, there is no heavy subtree, or a subtree with more than half the number of nodes in the original tree.

If we root the tree and all subsequent trees in all recursion levels at their centroids, then we guarantee that the depth of the recursion is at most $O(\log N)$, because you can only halve $N$ at most $1 + \log_2 N$ times. Thus, this allows us to perform the above step just a few times, which gives us a fast algorithm as long as we can find the centroid of a tree quickly.

How do you find a centroid of a tree? First, root the tree arbitrarily, and compute the sizes of all subtrees (including the tree itself). Now, remember that a tree of $N$ nodes can only have at most one heavy subtree. (Why?) So the algorithm is this: start at the root. Then if there is a heavy subtree rooted at, say $r$, then rotate the tree to the new root $r$ (and update the sizes), and repeat until you find a root with no heavy subtrees, in which case you now have a centroid. It can be shown that this algorithm always halts.
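
Here is a sketch of finding a centroid in Python (our own code; instead of literally rotating the root, it uses the equivalent observation that once we step into a heavy subtree, the side we came from is guaranteed to be light):

def find_centroid(adj):
    n = len(adj)
    parent = [-1] * n
    order = []
    seen = [False] * n
    seen[0] = True
    stack = [0]
    while stack:                    # iterative DFS from an arbitrary root
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                stack.append(v)
    size = [1] * n
    for u in reversed(order):       # subtree sizes: children before parents
        if parent[u] != -1:
            size[parent[u]] += size[u]
    u = 0
    while True:
        heavy = None
        for v in adj[u]:            # look for a heavy child subtree
            if v != parent[u] and 2 * size[v] > n:
                heavy = v
                break
        if heavy is None:
            return u                # no heavy subtree: u is a centroid
        u = heavy                   # the side we came from now has < n/2 nodes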

We now have the solution to the problem! The overall complexity is $O(N \log^3 N)$: $O(N \log N)$ per step, performed on $O(\log N)$ levels of recursion, and $O(\log N)$ times due to binary search.

If you use a hash set instead, the running time is expected $O(N \log^2 N)$.

Implementation notes

Here we describe a few more things worth mentioning.

On the hash function

First, the above algorithm assumes that we won't ever encounter collisions in our algorithm. However, if we choose $M = 10^9 + 7$ or some other prime of a similar magnitude, then we run into the famous birthday paradox, which roughly says that you don't need very many people to have a very high chance of a collision of birthdays. (In fact, only 23 people are required to have a more than 50% chance of two people having the same birthday!) In our case, this means that there is quite a high chance that there will be collisions in our hash function, purely because of the number of strings we are considering (${10^5 \choose 2} \approx 5\times 10^9$ pairs in the worst case).

One way to reduce the chance of collision would be to use a much higher modulus, say an 18-digit prime. But in languages without big integer arithmetic, this can be quite hard to implement.

Another way would be to use two relatively prime moduli, $M_1$ and $M_2$, each around 1 billion, and perform two hashes, one for each modulus. This enlarges our hash space similarly but without requiring big integer arithmetic! (This is equivalent to using the modulus $M_1\cdot M_2$, but implicitly expressing the hash modulo $M_1$ and $M_2$ separately via the Chinese remainder theorem.) Thus, instead of a single integer, the hash of our string is now a pair of 32-bit integers. (In fact, you can express it as a single 64-bit integer by storing the two numbers in the 32 higher- and lower-order bits.)

Optimizations

Being a hard problem, the time limit for this problem was intentionally set a bit tighter than usual. Thus, one must really ensure to optimize their implementation of the algorithm. Here are a few details:

  • Instead of binary searching the whole answer and then performing the algorithm, you can instead push the binary search part inwards. This is because many parts of the algorithm such as rooting the tree at the centroid and computing the hashes take a lot of time. By pushing the binary search inwards, these things will be done less frequently (specifically, by a $\log N$ factor).
  • When starting a new binary search, use the best result so far as your lower bound. This can shave off a few steps in later binary searches!
  • Even though hash sets seem theoretically better than tree sets in some sense, it isn't necessarily so in practice, because it depends on the library implementation. For example, in C++, the tree set (std::set) seems to be 10 times faster than the unordered set (std::unordered_set), which is quite surprising (or maybe I'm just missing something in my implementation, that's why it's slow.)
  • Sometimes, using less memory results in faster running time. This is because using less memory means more things can be stored in the cache, and this helps increase cache hits. So if you have a Node/Tree class, you could try to use the smallest possible data type for each variable in the class to reduce memory usage.
  • In general, try to be more friendly to your cache. For example, don't use parallel arrays when implementing your tree, especially if you need multiple fields at once many times.

Time Complexity:

$O(N \log^3 N)$ or $O(N \log^2 N)$

AUTHOR'S AND TESTER'S SOLUTIONS:

Setter
Tester
Editorialist

CHCINEMA - Editorial


PROBLEM LINK:

Contest
Practice

Author: Andrii Omelianenko
Tester: Sergey Kulik
Editorialist: Kevin Atienza

PREREQUISITES:

Binary search, greedy algorithms

PROBLEM:

In a cinema hall there are $N$ rows with $M$ seats each. There are armrests at each side of every seat, but there is only one armrest between two adjacent seats. $L$ people need just the left armrest, $R$ need just the right one, $Z$ need none and $B$ need both. What is the maximum number of people that can attend the show?

QUICK EXPLANATION:

If $L + R + Z \ge MN$, then the answer is $MN$. Otherwise, define $f(b)$ to be true if all $L+R$ people who need one armrest plus $b$ people who need both armrests can all attend the show together, otherwise it's false. $f(b)$ is true iff the following two conditions are true:

  • $L + R + b + (b - N) \le MN$
  • $b \le N \left\lceil \frac{M}{2} \right\rceil$

If $B_{\text{max}}$ is the largest $B$ such that $f(B)$ is true, then the answer is simply $\min(L+R+Z+B_{\text{max}}, MN)$. $B_{\text{max}}$ can be computed with binary search.

Note that an $O(1)$ solution is possible.

EXPLANATION:

To ease the discussion, we'll name the four kinds of people:

  • A l-person needs just the left armrest.
  • A r-person needs just the right armrest.
  • A b-person needs both armrests.
  • A z-person needs no armrests.

Solution

Clearly trying all configurations of people is too slow (except for small subtasks), because there are so many configurations! We must find a way to prune the search. One way would be to discover some properties that the optimal solution must satisfy, to reduce the number of configurations.

Immediately we can observe a few things. First, notice that only the b-people actually cause any trouble. This is because:

Observation 1: $l$ l-people, $r$ r-people and $z$ z-people can always fill a row of length $l+r+z$.

The only restriction is that an r-person can't be seated directly to the left of an l-person (they would fight over the shared armrest). One way to seat them would be to place the l-people, then the r-people, then the z-people, in that order. There are many other ways!

As an immediate consequence, in the original problem:

Observation 2: If $L + R + Z \ge MN$, then one can fill the whole theater completely.

Thus, when $L + R + Z \ge MN$, the solution is simply $MN$, and so in the following discussion we can assume that $L + R + Z < MN$.

Now, in this case, we are forced to place b-people, because there simply aren't enough l-, r- and z-people to fill all seats. As above, there are still too many configurations to try, but we can reduce them significantly with a new observation: since the b-people are the root of all our troubles, we'll want to minimize the number of b-people in the theater. In other words, we want to seat the l-, r- and z-people first. But is this always possible? The following says it is:

Observation 3: If $L + R + Z < MN$, then there's always an optimal seating arrangement such that all l-, r- and z-people are seated.

The idea is to start with an optimal arrangement. Suppose that in this arrangement not all l-, r- and z-people are seated. Then we can always replace any b-person with a l-, r- or z-person and we will still be left with a valid seating arrangement, and we can repeat until all l-, r- and z-people are placed. Replacing a b-person with anyone else is okay because a b-person has all the requirements of all other kinds of people.

Thus, we can restrict ourselves to finding optimal arrangements having a seat for all l-, r- and z-people. This also implies that the answer is $L+R+Z+B_{\text{max}}$, where $B_{\text{max}}$ is the largest number of b-people that can be placed in the theater along with all the other $L+R+Z$ people. Our goal now is to find $B_{\text{max}}$.

Let's define the function $f(b)$: $f(b)$ is true if you can place all l-, r- and z-people together with $b$ b-people in the theater, otherwise it's false. The first thing to note about $f$ is the following:

Observation 4: If $b_1 > b_2$, then $f(b_1) \implies f(b_2)$.

This simply means that if you can place $b_1$ b-people, then you can also place $b_2$. This is true: simply remove $b_1 - b_2$ of the b-people, and the whole configuration is still valid!

Observation 4 is very useful, because it allows us to compute $B_{\text{max}}$ with binary search:

  • Let $b_l = 0$ and $b_r = B + 1$. Clearly $f(b_l)$ is true and $f(b_r)$ is false.
  • While $b_r - b_l > 1$, do the following. Let $b_m = \left\lfloor \frac{b_l + b_r}{2} \right\rfloor$. If $f(b_m)$ is true, set $b_l := b_m$, otherwise set $b_r := b_m$.
  • $B_{\text{max}}$ is now $b_l$.

The reason the second step works is that if we find a number $b_m$ such that $f(b_m)$ is false, then we can essentially ignore all higher $b$s since they are all false by Observation 4. But if $f(b_m)$ is true, then we can ignore all lower $b$s since we're searching for the maximum $b$ such that $f(b)$ is true.

What remains is to compute $f(b)$ itself. One obvious requirement would be $$L + R + Z + b \le MN,$$ otherwise there aren't enough seats to place all people.

Next, let's only consider the b-people for now. Can we place $b$ b-people by themselves in the theater? What is the maximum number of b-people that can be placed in the theater? Clearly, no two b-people can be seated together, so around half of the seats will be empty. In fact, in a row of $M$ seats, one can only place up to $\left\lceil \frac{M}{2} \right\rceil$ b-people (where $\left\lceil x \right\rceil$ is the ceiling function). Thus, the maximum number of b-people we can place in the theater with $N$ rows is simply $N \left\lceil \frac{M}{2} \right\rceil$, and the second requirement for $f(b)$ to be true is that $$b \le N \left\lceil \frac{M}{2} \right\rceil.$$

Next, can we place the other people along with the $b$ b-people? The z-people can be placed anywhere, so we first focus on the l- and r- people. Is it possible to place the l- and r-people with $b$ b-people in the theater? Not necessarily:

Observation 5: In a row containing only l-, r- and b- people, one of the seats between any two b-people must be empty.

To see this, consider two b-people with no other b-people in between. Clearly, there must be at least one seat between them, because b-people can't be seated together. Let $c$ be the number of seats in between. Now, suppose all these seats are filled with l- and r-people. Clearly there are $c-1$ armrests between the b-people, not including the ones they use. But there are $c$ l- or r- people using at least $c$ armrests, which is more than available. Thus, at least one must be empty.

A natural consequence is that if there are $b$ b-people in a row (containing no z-people), then there must be at least $b-1$ empty seats; thus $L+R+2b-1\le M$ must hold for that row. But can we always make sure that there are exactly $b-1$ empty seats? Yes!

Observation 6: One can place $L$ l-people, $R$ r-people and $b>0$ b-people in a row of $M$ seats in such a way that there are exactly $b-1$ empty seats. In other words, a seating arrangement exists if and only if $L+R+2b-1\le M$.

The idea is simple. Use the first $L$ seats to seat the l-people, then the next $2b-1$ seats to seat the b-people (leaving spaces in between), and finally the next $R$ seats for the r-people.

Now, what if there are $N$ rows? Well, clearly the number of empty, "wasted" seats depends on the number of gaps between b-people among all rows, and our goal now is to minimize this number. Since there are $\max(0,b-1)$ gaps in a row with $b$ b-people, one way is to distribute the $b$'s as evenly as possible. This way, we can calculate the minimum number of gaps given $b$ b-people:

  • If $b \le N$, then the number of gaps is $0$, because we can place all b-people in distinct rows.
  • If $b > N$, then the number of gaps is $b - N$, because after placing $N$ b-people in distinct rows, every b-person subsequently seated increases the number of gaps by one.

Thus, a third requirement for $f(b)$ to be true would be:
$$L + R + b + (b - N) \le MN$$ The $(b - N)$ term is the number of gaps that are generated by the b-people overall. Notice that if $b \le N$, this statement is easily seen to be true.

Another way of interpreting $L + R + b + (b - N) \le MN$ would be the following: In a row of $M$ seats, there are $M+1$ armrests, so there are $(M+1)N$ armrests in total. However, each l- and r- person uses one armrest, and each b-person uses two, so overall $L + R + 2b$ armrests are used. Thus, it must be the case that $L + R + 2b \le (M+1)N$, which is equivalent to $L + R + b + (b - N) \le MN$.

Interestingly, we now have all the sufficient requirements for $f(b)$ to be true!

Observation 7: $f(b)$ is true if and only if all these three conditions are satisfied:

  • $L + R + Z + b \le MN$
  • $b \le N \left\lceil \frac{M}{2} \right\rceil$
  • $L + R + b + (b - N) \le MN$

We've just shown above that these are all necessary conditions. To see that they are sufficient, suppose these are all satisfied. Then:

  • First, distribute the $b$ b-people as evenly as possible among the rows.
  • Next, distribute the l-people and r-people among the rows in any way (as long as the condition in Observation 6 is satisfied in each row).
  • Use the layout described in the proof of Observation 6 to seat the people in each row.
  • Finally, fill in all remaining seats with the z-people.

Note that the three conditions guarantee that all these steps are possible! Since $f(b)$ can now be calculated in $O(1)$ time, we can now calculate the answer in $O(\log B)$-time with binary search!
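
Putting Observation 7 and the binary search together gives a short solution; here is a sketch in Python (the function and variable names are ours):

def f(b, N, M, L, R, Z):
    # the three conditions of Observation 7
    return (L + R + Z + b <= M * N and
            b <= N * ((M + 1) // 2) and
            L + R + b + (b - N) <= M * N)

def max_attendance(N, M, L, R, Z, B):
    if L + R + Z >= M * N:
        return M * N
    lo, hi = 0, B + 1              # f(lo) is true; hi is treated as false
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if f(mid, N, M, L, R, Z):
            lo = mid
        else:
            hi = mid
    return L + R + Z + lo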

A constant-time solution

Even though the binary search solution is fast enough to pass all subtasks, it's actually possible to compute the answer in $O(1)$ time. The key is to manipulate the conditions for $f(b)$ to be true. After a simple rearrangement of terms, they are seen to be equivalent to the following:

  • $b \le MN - L - R - Z$
  • $b \le N \left\lceil \frac{M}{2} \right\rceil$
  • $b \le \left\lfloor \frac{(M+1)N - L - R}{2} \right\rfloor$

These three statements can be combined into a single statement: $$b \le \min\left(MN - L - R - Z, N \left\lceil \frac{M}{2} \right\rceil, \left\lfloor \frac{(M+1)N - L - R}{2} \right\rfloor\right)$$ But this means that the maximum $b$ satisfying this inequality is simply the value at the right-hand side! Thus, we have no more need for binary search and we can simply compute $B_{\text{max}}$ to be $\min(B, \text{[right hand side of the above]})$. This gives us the following one liner in Python (not including input code):

# note: >> binds more loosely than +/- in Python, so "a - b >> 1" means
# (a - b) >> 1, and "M + 1 >> 1" means (M + 1) >> 1, i.e. the ceiling of M/2
print min(N * M, Z + L + R + min(B, N * (M + 1) - L - R >> 1, N * (M + 1 >> 1)))

Time Complexity:

$O(\log B)$ or $O(1)$

AUTHOR'S AND TESTER'S SOLUTIONS:

Setter
Tester
Editorialist

CHEFGIRL - Editorial


PROBLEM LINK:

Contest
Practice

Author: Dmytro Berezin
Tester: Sergey Kulik
Editorialist: Kevin Atienza

PREREQUISITES:

Dynamic programming, directed graphs

PROBLEM:

There is a directed acyclic graph representing people with the following properties:

  • All nodes have at most one incoming edge.
  • All nodes have at most one outgoing edge, except node $1$.

Person $1$ (node $1$) holds $32$ secrets, $[1,32]$. If there is an edge from $a$ to $b$, then $a$ tells secrets to $b$. But each edge has a range assigned to it, specifying the range of secrets that can be told "through" this edge.

If a node doesn't have an outgoing edge, she tells all the secrets she knows to Chef.

The following operation can be performed: take some edge, and extend its range by one (to the left or the right). What is the minimum number of operations for Chef to know all $32$ secrets?

QUICK EXPLANATION:

The graph consists of paths all starting from node $1$, which are otherwise disjoint aside from sharing node $1$.

For each such path, and each range $[i,j]$, compute the minimum number of operations to make sure all secrets $[i,j]$ can be told through that path.

For all ranges $[i,j]$, compute $\text{cost}(i,j)$, defined as the minimum number of operations to make sure the secrets $[i,j]$ reach Chef through a single path, among all paths.

Let $\text{best}(k)$ be the minimum $\sum_{[i,j] \in P} \text{cost}(i,j)$ among all partitions $P$ of $[1,k]$ into subranges. (For example $P = \{[1,4],[5,9],[10,15]\}$ partitions $[1,15]$.) The answer is $\text{best}(32)$, and the $\text{best}(k)$s can be computed with dynamic programming, using the following recurrence: $$\text{best}(k) = \min_{1 \le j \le k} \left[\text{best}(j-1) + \text{cost}(j,k) \right]$$

EXPLANATION:

The property that all nodes must have at most one incoming edge and at most one outgoing edge (except node $1$) limits the possible shapes the input graph can take. Essentially, there can't be any sort of branching into or out of any node except node $1$. This means that our graph is really just a bunch of disjoint paths all starting from node $1$. (The problem statement doesn't in fact exclude the possibility of there being paths that don't start at node $1$, but it seems these paths don't appear in the input.) This makes things much easier for us, and also suggests looking at each path individually.

Consider a path from node $1$ to some node with no outgoing edge. Thus, when a secret reaches the end of this path, it also reaches Chef. In fact, the secrets that can pass through this path are those that belong in the intersection of the ranges assigned to all the edges in the path. But the intersection of two ranges is another range! This means that the set of secrets that can be relayed in a path is always some range (possibly empty). In other words, if secrets $a$ and $b$ can be relayed through a path, then all secrets between $a$ and $b$ can also be.

This tells us that the way in which the secrets $[1,32]$ are relayed to Chef is by relaying subranges of $[1,32]$ through different paths until all secrets have reached Chef. For example, the subrange $[11,24]$ might be relayed through one path, the subrange $[28,29]$ through another path (or maybe the same path), and $[23,26]$ in yet another, etc. But for all secrets to be sent, the subranges must cover the subrange $[1,32]$.

Thus, it makes sense for us to compute the minimum number of operations to send the secrets $[i,j]$ through some path, say $p$. Let's denote this by "$\text{cost}(p,i,j)$". For $[i,j]$ to be able to pass through this path, we only need to make sure that $[i,j]$ is contained in the ranges assigned to each edge of the path. It's easy to see that this is necessary and sufficient. So for every range $[a,b]$, we want to extend it so that it contains $[i,j]$, which is equivalent to saying that $a \le i \le j \le b$. It's easy to see that the minimum number of steps we need to accomplish this is $\max(0,a-i) + \max(0,j-b)$:

  • $\max(0,a-i)$ steps (extending $[a,b]$ to the left) are needed to ensure that $a \le i$.
  • $\max(0,j-b)$ steps (extending $[a,b]$ to the right) are needed to ensure that $j \le b$.

Therefore, for a single path $p$ and range $[i,j]$ we can easily compute $\text{cost}(p,i,j)$ by summing this value for all edges in $p$.
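
In code, this per-path computation is just a sum (a sketch; representing a path by the list of $[a,b]$ ranges on its edges is our own convention):

def path_cost(path, i, j):
    # minimum number of extensions so that every edge range on the path
    # contains [i, j]
    return sum(max(0, a - i) + max(0, j - b) for (a, b) in path)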

Now, we want to relay the secrets $[1,32]$, and we have multiple paths to use. As said above, we know that we will send subranges through different paths, but things are a bit trickier because we don't know yet where to send each subrange through so that the total cost is minimized. Thankfully, there are a couple of observations that will make our search easier:

  • We only need to send every secret at most once, so we can in fact assume that the set of subranges that cover $[1,32]$ are disjoint. (Note that this doesn't mean that there is only one way for a particular secret to reach Chef.) For example, suppose we send the subrange $[8,15]$ through one path and $[13,20]$ through another. These two subranges overlap, so we can replace the second with the smaller subrange $[16,20]$. Clearly, $\text{cost}(p,16,20)$ cannot be greater than $\text{cost}(p,13,20)$. (Why?)
  • It doesn't make sense to send two disjoint subranges in the same path. Why? Consider for example sending $[3,6]$ and $[10,15]$ through some path $p$ (assuming $[7,9]$ are sent through other paths). However, remember from above that if two secrets can be relayed through a path, then all secrets in between them can also be, too. Thus, secrets in the range $[7,9]$ can also be relayed here too (without incurring any additional cost)! This means we can assume instead that we are relaying $[3,15]$ through $p$.
  • Finally, it doesn't matter which path we send each subrange $[i,j]$ through. We only want the path that minimizes the cost. For example, if $\text{cost}(p_1,12,25) < \text{cost}(p_2,12,25)$, then we can essentially ignore path $p_2$ when sending $[12,25]$ because there's a way to send it with a smaller cost. (namely, through $p_1$) Thus, we can define the function $\text{cost}(i,j)$ to be the minimum $\text{cost}(p,i,j)$ among all paths $p$.

With these observations, the problem is now reduced to this:

What is the minimum $\sum_{[i,j] \in P} \text{cost}(i,j)$ among all partitions $P$ of $[1,32]$ into subranges?

But this is easily solved with dynamic programming: Let $\text{best}(k)$ be the minimum $\sum_{[i,j] \in P} \text{cost}(i,j)$ among all partitions $P$ of $[1,k]$ into subranges. (For example $\{[1,4],[5,9],[10,15]\}$ partitions $[1,15]$.) Then the answer is $\text{best}(32)$, and $\text{best}(k)$ has the following recurrence: $$\text{best}(k) = \min_{1 \le j \le k} \left[\text{best}(j-1) + \text{cost}(j,k) \right]$$ The $j$ in this formula represents the leftmost endpoint of the rightmost range, $[j,k]$. The minimum total cost for the remaining subranges is $\text{best}(j-1)$.

The base case is $\text{best}(0) = 0$. Thus, after computing the $\text{cost}(i,j)$s for all ranges $[i,j]$, $1 \le i \le j \le 32$, the answer can now be computed easily using this DP solution!
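
A sketch of this DP in Python, assuming a table cost[i][j] has already been filled in for all $1 \le i \le j \le 32$ (the table layout and names are ours):

INF = float('inf')

def best_total(cost):
    best = [0] + [INF] * 32          # best[k]: cheapest partition of [1, k]
    for k in range(1, 33):
        for j in range(1, k + 1):    # [j, k] is the rightmost subrange
            best[k] = min(best[k], best[j - 1] + cost[j][k])
    return best[32]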

Time Complexity:

$O(NS^3)$ or $O(NS^2)$ where $S$ is the number of secrets. ($S = 32$)

AUTHOR'S AND TESTER'S SOLUTIONS:

Setter
Tester
Editorialist

WA in LEBOMBS

LONG CHALLENGE RATINGS still wrong


The long challenge ratings are still in need of an update after DEC15.

They still reflect the wrong results, where the challenge problem TANKS wasn't taken into account. To convince yourself that this is the case, have a look at the gain/loss column on the first page of the ratings. Everybody who had 900 points besides the challenge problem has a gain of approximately 800 points. When taking TANKS into account, this should look different.

RECIPE - Editorial


PROBLEM LINKS

Practice
Contest

DIFFICULTY

EASY

EXPLANATION

The problem here is to divide all numbers by some constant so that the divisions leave no remainder. We produce the smallest result by dividing by a number that is as large as possible, that is, the greatest common divisor. The greatest common divisor can be computed efficiently by Euclid's algorithm, but in this case it was fast enough to simply check all numbers from 1 to 1000 for divisibility.
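
For example, a sketch in Python (the function names are ours):

def gcd(a, b):
    # Euclid's algorithm
    while b:
        a, b = b, a % b
    return a

def reduce_recipe(quantities):
    g = quantities[0]
    for q in quantities[1:]:
        g = gcd(g, q)
    return [q // g for q in quantities]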

SETTER'S SOLUTION

Can be found here.

TESTER'S SOLUTION

Can be found here.

Awesome resource for DS and Algorithms


Hi everyone,

I just want to share this link, which collects links and resources on different topics of Competitive Programming, Data Structures and Algorithms:

http://vicky002.github.io/AlgoWiki/

There is also a Google Chrome extension for ongoing and upcoming programming challenges, which is very nice and useful.

One more great and useful thing in this AlgoWiki is Spoj-Toolkit, a tool for SPOJ users to match the outputs of their SPOJ problem solutions against the correct outputs it provides.

Also, the book Competitive Programming by Steven Halim is completely about competitive programming. You can find its e-book at the link below.

https://www.dropbox.com/s/1u9hv8pvmx1lc6a/Competitve%20Programming.pdf?dl=0

Also, below is one more link from the CodeChef forum which has almost all topics for data structures and algorithms.

http://discuss.codechef.com/questions/48877/data-structures-and-algorithms

Graph Algorithms in Competitive Programming

Learn Competitive Programming

Must Known algorithms for online programming contests

How to improve and how to train: Kuruma's personal experience and humble request

List of all algorithms needed for ACM-ICPC

Use of STL libraries in C++

DYNAMIC PROGRAMMING USEFUL LINKS:

Good resources or tutorials for dynamic programming besides the TopCoder tutorial

Mastering DP

Dynamic Programming

Examples of basic DP problems

All the best!!! :)


BUGATACK - Editorial


PROBLEM LINK:

Contest
Practice

Author: Utkarsh Lath
Tester: Kevin Atienza
Editorialist: Kevin Atienza

PREREQUISITES:

Dynamic programming, combinatorics, graphs, Floyd-Warshall algorithm

PROBLEM:

The Floyd-Warshall algorithm can compute the transitive closure of a graph using $N$ passes of a certain augmentation method. Suppose we only run the first $N-K$ such passes. What is the number of simple, undirected, unweighted graphs where the transitive closure is still computed correctly?

QUICK EXPLANATION:

After the $k$th pass of Floyd-Warshall, $A[i][j]$ contains $1$ if $i$ and $j$ are connected via a path whose intermediate nodes have indices $\le k$. Therefore, the question is simply: how many simple, undirected, unweighted graphs are there such that connectivity can be determined using only the first $N-K$ vertices as intermediate nodes?

Let's call a node whose index is $\le N-K$ a good node, and $> N-K$ a bad node. We can show that the following are necessary for the transitive closure to be computed using the first $N-K$ passes:

  • Consider a bad node $x$ that is not connected to any good node. Then any other bad node adjacent to $x$ is also not adjacent to a good node. Furthermore, for any two bad nodes $y$ and $z$ adjacent to $x$, $y$ and $z$ are also adjacent to each other. In other words, all nodes connected to $x$ form a complete graph of bad nodes.

  • Consider a bad node $x$ that is connected to some good node. Then for any two good nodes $r$ and $s$ connected to $x$, there is a path from $r$ to $s$ only passing through good nodes. In other words, all good nodes connected to $x$ form a connected graph of good nodes.

  • Consider a bad node $x$ that is connected to some good node, and let $S$ be the set of all good nodes connected to $x$ (by the above, the nodes in $S$ form a connected graph). If $y$ is another bad node that is connected to $x$, then $y$ must be connected to some node in $S$.

Amazingly, these three conditions are sufficient! Therefore, we can count the graphs satisfying the above properties to come up with the following counting formula:

$$F(N,K) = \sum_{b=1}^K {K-1 \choose b-1} F(N-b,K-b) + \sum_{b=1}^K \sum_{g=1}^{N-K} {N-K \choose g} {K-1 \choose b-1} F(N-g-b,K-b) C_g (2^g-1)^b 2^{b(b-1)/2}$$ with base case $$F(N,0) = 2^{N(N-1)/2}$$

The meanings of certain things in this formula are:

  • $F(N,K)$ is the number of solutions.
  • $C_g$ is the number of connected graphs of size $g$.

EXPLANATION:

For a given graph, the Floyd-Warshall algorithm computes the shortest path between any pair of nodes in $O(N^3)$ time, where $N$ is the number of nodes. This can be modified to produce the transitive closure of the graph, and in fact this is the "Floyd-Warshall algorithm" presented in the problem. The transitive closure of a graph is the graph with the same set of nodes for which there is an edge from $i$ to $j$ if and only if there is a path from $i$ to $j$ in the original graph. In terms of undirected graphs, this is true if and only if $i$ and $j$ are connected.

The question is: assuming we run only the first $N-K$ passes of the algorithm, how many simple undirected graphs of $N$ labelled nodes are there such that the modified algorithm still returns the transitive closure? In order to answer the question, we must understand how exactly this algorithm works. The following is the pseudocode of the algorithm:

for (int p = 1; p <= N; p++) {   // pass p: allow node p as an intermediate node
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= N; j++) {
            conn[i][j] = conn[i][j] || (i != j && conn[i][p] && conn[p][j]);
        }
    }
}

The algorithm works in $N$ passes. After the $p$th pass, conn[i][j] is true if and only if $i \not= j$ and there is a path from $i$ to $j$ whose intermediate nodes are in $\{1, 2, \ldots, p\}$ (before the first pass, conn[i][j] is true if and only if there is an edge between $i$ and $j$). This means that after $N$ passes, conn[i][j] will be true if and only if $i \not= j$ and there is a path from $i$ to $j$. It should be intuitive how the inner two loops compute conn[i][j] properly.

Now, what happens if we only run the first $N-K$ passes? This means that we are only allowing the nodes $\{1, 2, \ldots, N-K\}$ to be intermediate nodes. Let's call these the good nodes, and the nodes $\{N-K+1, \ldots, N\}$ the bad nodes. If we want the modified algorithm to still correctly compute the transitive closure, we need to ensure that there is no pair of nodes $(i,j)$ such that all paths from $i$ to $j$ pass through some bad node. This means that, among other things, we must ensure that the following holds in our graph:

  • Consider a bad node $x$ that is not connected to any good node. Then any other bad node adjacent to $x$ is also not adjacent to a good node. Furthermore, for any two bad nodes $y$ and $z$ adjacent to $x$, $y$ and $z$ are also adjacent to each other (otherwise the algorithm doesn't give the correct result for the pair $(y,z)$). In other words, all nodes connected to $x$ form a complete graph of bad nodes.

  • Consider a bad node $x$ that is connected to some good node. Then for any two good nodes $r$ and $s$ connected to $x$, there is a path from $r$ to $s$ only passing through good nodes (otherwise the algorithm doesn't give the correct result for the pair $(r,s)$). In other words, all good nodes connected to $x$ form a connected graph of good nodes.

  • Consider a bad node $x$ that is connected to some good node, and let $S$ be the set of all good nodes connected to $x$ (by the above, the nodes in $S$ form a connected graph). If $y$ is another bad node that is connected to $x$, then $y$ must be connected to some node in $S$ (otherwise, the algorithm doesn't give the correct result for the pair $(y,s)$ for any node $s$ in $S$).

Amazingly, the things we mentioned above exhaust all possibilities for pairs in which the modified algorithm returns the wrong answer! In other words, the answer is the number of simple undirected graphs satisfying all the above properties. The general picture is described in the following:

The green and red nodes are the good and bad nodes, respectively. The bad nodes are grouped into two types: those connected to some good node and those not connected to any good node. Notice that in the first type, the bad nodes form a bunch of complete graphs, while in the second type, they are attached to a single set of connected good nodes.

This becomes apparent after a while of thinking. We can also prove it formally:


Proof:

Let's call a graph friendly if the modified "Floyd-Warshall algorithm" (the one run only for the first $N-K$ passes) correctly returns the transitive closure.

We argued above that these properties are necessary for a graph to be friendly, so we only have to prove that they are also sufficient. To do this, we have to show that any pair of nodes $(a,b)$ for which the modified algorithm returns an incorrect answer violates one of the properties above.

Let's consider first the case that both $a$ and $b$ are good nodes. In this case, all paths from $a$ to $b$ must pass through some bad node, which violates the second property above. If $a$ is good and $b$ is bad (or vice versa), then all paths from $a$ to $b$ pass through some other bad node. This means that $b$ is not adjacent to any good node connected to $a$, violating the third property above.

The only remaining case is when both $a$ and $b$ are bad nodes. There are two cases: whether there is a good node connected to $a$ or not. If there is any good node connected to $a$, then in order for the modified algorithm to be wrong, any path from $a$ to $b$ must pass through some other bad node. But this means that the good nodes connected to $a$ do not form a connected graph, violating the second property above. If there aren't any good nodes connected to $a$, then the path from $a$ to $b$ must consist entirely of bad nodes, but if the first property above must hold, then you can conclude that there must be an edge from $a$ to $b$, which is a contradiction. Therefore, the first property must be violated.


This is great, since we know what the graphs we are counting look like! The properties above make it simple enough to count the graphs: Let $F(N,K)$ be the solution for a given $(N,K)$ pair. Then we have:

Consider the smallest-indexed bad node, say $x$. Suppose there are $g$ good nodes and $b$ bad nodes that are connected to $x$ ($0 \le g \le N-K$, $1 \le b \le K$). Note that there are ${N-K \choose g}{K-1 \choose b-1}$ ways to select all of these nodes, and there are (recursively) $F(N-g-b,K-b)$ ways to select the edges for the remaining nodes.

If $g = 0$, then there are no good nodes, and by the first property, the $b$ bad nodes must form a complete graph. There is only one way to build that complete graph on $b$ nodes.

If $g > 0$, then the last two properties apply. First, we must ensure that the $g$ good nodes form a connected graph by themselves. Let $C_g$ be this number, i.e. the number of simple undirected connected graphs on $g$ labelled nodes. Next, we must ensure that each of the $b$ bad nodes is connected to at least one of the $g$ good nodes. There are $2^g-1$ ways to select a nonempty subset of the good nodes, so there are $(2^g-1)^b$ choices for all bad nodes. Finally, we can freely add edges between the bad nodes, and there are $2^{b(b-1)/2}$ ways to choose that. Therefore, we have the following recurrence for $F(N,K)$:

$$F(N,K) = \sum_{b=1}^K {K-1 \choose b-1} F(N-b,K-b) + \sum_{b=1}^K \sum_{g=1}^{N-K} {N-K \choose g} {K-1 \choose b-1} F(N-g-b,K-b) C_g (2^g-1)^b 2^{b(b-1)/2}$$

For the base case $K = 0$ (i.e. there are no bad nodes), all graphs are counted, therefore: $$F(N,0) = 2^{N(N-1)/2}$$

Precomputing all the $F(n,k)$'s for $k \le n \le N$ that fit within the bounds of the problem takes $O(N^4)$ time!

Finally, we need to discuss how to compute $C_g$, the number of connected graphs with $g$ nodes. First, note that there are $2^{g(g-1)/2}$ graphs on $g$ nodes. Let's consider the connected component that node $g$ belongs to. Assume that this connected component contains $k$ nodes ($1 \le k \le g$). How many such graphs are there? There are ${g-1 \choose k-1}$ ways to choose these nodes. There are also $C_k$ ways to choose the edges among the $k$ nodes so that they're connected, and there are $2^{(g-k)(g-k-1)/2}$ ways to select the edges among the remaining nodes. Therefore, we have the following equality: $$2^{g(g-1)/2} = \sum_{k=1}^g {g-1 \choose k-1} C_k 2^{(g-k)(g-k-1)/2}$$

By a simple rearrangement, we get the following recurrence for $C_g$: $$C_g = 2^{g(g-1)/2} - \sum_{k=1}^{g-1} {g-1 \choose k-1} C_k 2^{(g-k)(g-k-1)/2}$$

Thus all the $C_g$'s for $g \le N$ can be computed in $O(N^2)$ time.
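
A sketch of this precomputation in Python (our own code; we compute the counts modulo a prime such as $10^9 + 7$, as the raw numbers grow astronomically):

MOD = 10**9 + 7

def connected_graph_counts(N):
    # C[g] = number of connected labelled graphs on g nodes, modulo MOD
    choose = [[0] * (N + 1) for _ in range(N + 1)]   # Pascal's triangle
    for n in range(N + 1):
        choose[n][0] = 1
        for k in range(1, n + 1):
            choose[n][k] = (choose[n - 1][k - 1] + choose[n - 1][k]) % MOD
    pow2 = [pow(2, g * (g - 1) // 2, MOD) for g in range(N + 1)]
    C = [0] * (N + 1)
    for g in range(1, N + 1):
        C[g] = pow2[g]               # all graphs on g nodes ...
        for k in range(1, g):        # ... minus the disconnected ones
            C[g] = (C[g] - choose[g - 1][k - 1] * C[k] * pow2[g - k]) % MOD
    return C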

Time Complexity:

$O(N^4)$ preprocessing

AUTHOR'S AND TESTER'S SOLUTIONS:

To be uploaded soon.

DMCS - Editorial


PROBLEM LINK:

Practice
Contest

Author: Ke Bi
Tester: Antoniuk Vasyl and Misha Chorniy
Editorialist: Pushkar Mishra

DIFFICULTY:

Hard

PREREQUISITES:

linear recurrence, matrix exponentiation

PROBLEM:

Given a chocolate of size 2 * 2 * 2 * 2 * n, find the number of ways of breaking it into 8n small pieces, each of size 1 * 1 * 1 * 1 * 2. Print the answer modulo 10^9 + 7.

EXPLANATION:

Think about a smaller problem
Let us first solve a version of this problem in a lower dimension: fix the chocolate size to $2 \times n$ and the piece size to $1 \times 2$. Let dp[i][mask] denote the number of ways of breaking the chocolate up to coordinate i (in the 2nd dimension), where mask denotes the configuration of the last row. Since a single row contains 2 cells, mask can take $2^2$ possible values. We can obtain a recurrence for dp by filling the mask in all possible ways and making transitions to the next state.

Now, we can note that dp[i][mask] depends only on dp[i - 1][mask']. We can represent this relation by a constant matrix of size $2^2 \times 2^2$, so we can find dp[n][mask = (last row is full)] by matrix exponentiation.
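For concreteness, here is a minimal, self-contained sketch of this lower-dimensional version (counting tilings of a $2 \times n$ board by $1 \times 2$ pieces), assuming answers are wanted modulo $10^9+7$. The profile mask records which cells of the next column are already covered; the $2^2 \times 2^2$ transition matrix is built by enumerating all ways to complete one column, and is then raised to the $n$-th power:

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
typedef vector<vector<ll>> Mat;
const ll MOD = 1000000007; // assumption: substitute the statement's modulus

Mat mul(const Mat &A, const Mat &B) {
    int n = A.size();
    Mat C(n, vector<ll>(n, 0));
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            if (A[i][k])
                for (int j = 0; j < n; j++)
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % MOD;
    return C;
}

// fill the cells of one column from top to bottom; 'cur' marks cells of this
// column already covered (by pieces protruding from the previous column),
// 'nxt' marks cells of the next column covered by pieces placed now
void fillColumn(int row, int cur, int nxt, Mat &T, int from) {
    if (row == 2) { T[from][nxt] = (T[from][nxt] + 1) % MOD; return; }
    if (cur & (1 << row)) { fillColumn(row + 1, cur, nxt, T, from); return; }
    // a horizontal piece, protruding into the next column
    fillColumn(row + 1, cur | (1 << row), nxt | (1 << row), T, from);
    // a vertical piece, fully inside this column
    if (row + 1 < 2 && !(cur & (1 << (row + 1))))
        fillColumn(row + 2, cur | (3 << row), nxt, T, from);
}

int main() {
    ll n;
    cin >> n;
    Mat T(4, vector<ll>(4, 0)), R(4, vector<ll>(4, 0));
    for (int m = 0; m < 4; m++) fillColumn(0, m, 0, T, m);
    for (int i = 0; i < 4; i++) R[i][i] = 1;
    for (ll e = n; e; e >>= 1, T = mul(T, T))
        if (e & 1) R = mul(R, T);
    // dp[0][empty profile] = 1; the answer is dp[n][empty profile]
    cout << R[0][0] << '\n'; // prints 1, 2, 3, 5, ... for n = 1, 2, 3, 4
}

The actual problem is this same computation with rows of $2 \times 2 \times 2 \times 2$ cells, i.e. $2^{16}$ profiles, plus the state reductions described below.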

Extending to a higher dimension
In the case of the higher dimension, a row has dimension $2 \times 2 \times 2 \times 2$, so there are $2^{16}$ possible masks. The transitions can be obtained in a similar way. However, the matrix now has size $2^{16} \times 2^{16}$, which is far too large to apply the matrix exponentiation algorithm.

We can make a few observations to reduce the number of states. Some of the configurations (masks) of the last row are equivalent, and some are useless (they can never lead to a solution):

  1. Some masks are the same w.r.t. reflection and rotation.
  2. If a mask has an odd number of 1s, its dp value must be 0.
  3. Color the cells like a chessboard. If the number of black cells with a 1 is not equal to the number of white cells with a 1, the dp value must be 0.

After these modifications, the number of states reduces dramatically, to around 561 (see the author's solution for the implementation details). Now the time complexity is around $561^3 \log(10^9)$ per test case, which will lead to TLE. We need to optimize this.

Linear recurrence
We can notice that the matrix is constant (independent of the input $n$). So we can write a linear recurrence of the form f[n] = coef[1] f[n - 1] + coef[2] f[n - 2] + ... + coef[561] f[n - 561], where each coef[i] is a constant.

Finding the simplest recurrence relation
It would be better if f[n] depended on only a few of the previous terms rather than the current 561. So, given the sequence of 561 numbers f[1], f[2], f[3], ..., f[561], we need to find the smallest order k for which a linear recurrence relation fits the sequence.

A method using Gaussian elimination
Let us say that we have the sequence 1, 2, 3, 11, 18, 53, 105.

Now, we want to check whether this sequence satisfies a 2-recursive formula (i.e., f[n] depends on the previous 2 values f[n - 1] and f[n - 2]).

We take a look at the determinant

Det[{{1, 2, 3}, {2, 3, 11}, {3, 11, 18}}]

The value of the determinant is not zero, which means the sequence does not have a 2-recursive formula.

Now, let us check whether this is a 3-recursive formula.

Det[{{1, 2, 3, 11}, {2, 3, 11, 18}, {3, 11, 18, 53}, {11, 18, 53, 105}}]

The Det is 0.

Therefore, this has a 3-recursive formula.

So, in general, to check whether a sequence satisfies a $k$-recursive formula, we test whether the following $(k+1) \times (k+1)$ determinant vanishes (the $3 \times 3$ and $4 \times 4$ determinants above are exactly the $k = 2$ and $k = 3$ cases):

$$\det\begin{pmatrix} f[1] & f[2] & \cdots & f[k+1] \\ f[2] & f[3] & \cdots & f[k+2] \\ \vdots & \vdots & \ddots & \vdots \\ f[k+1] & f[k+2] & \cdots & f[2k+1] \end{pmatrix}$$

So we need the first $2k+1$ values of the function $f$ to run this test.

For our problem, it turns out that the sequence satisfies a 71-recursive formula. We then obtain the 71 coefficients via the same Gaussian elimination machinery used for evaluating the determinants above.

Now, each test case can be solved in $O(71^3 \log N)$ time by exponentiating the $71 \times 71$ companion matrix of this recurrence.

We can calculate the values f[1], ..., f[71 * 2] using the slow method described earlier, and use them to calculate the desired coefficients.
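For reference, here is a self-contained sketch of the companion-matrix evaluation, which with $k = 71$ gives the $O(71^3 \log N)$ step described above. It assumes the coefficients coef[0..k-1] and the initial values f[1..k] have already been computed (by the slow method plus Gaussian elimination), and that answers are wanted modulo $10^9+7$:

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
typedef vector<vector<ll>> Mat;
const ll MOD = 1000000007; // assumption: substitute the statement's modulus

Mat mul(const Mat &A, const Mat &B) {
    int n = A.size();
    Mat C(n, vector<ll>(n, 0));
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            if (A[i][k])
                for (int j = 0; j < n; j++)
                    C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % MOD;
    return C;
}

// evaluate f(n) for f[m] = coef[0]*f[m-1] + ... + coef[k-1]*f[m-k],
// given the initial values f0 = (f[1], ..., f[k])
ll kth_term(ll n, const vector<ll> &coef, const vector<ll> &f0) {
    int k = coef.size();
    if (n <= k) return f0[n - 1];
    Mat T(k, vector<ll>(k, 0)), R(k, vector<ll>(k, 0));
    for (int j = 0; j < k; j++) T[0][j] = coef[j]; // first row encodes the recurrence
    for (int i = 1; i < k; i++) T[i][i - 1] = 1;   // sub-diagonal shifts the state
    for (int i = 0; i < k; i++) R[i][i] = 1;       // identity
    for (ll e = n - k; e; e >>= 1, T = mul(T, T))
        if (e & 1) R = mul(R, T);
    // the state vector at time k is (f[k], f[k-1], ..., f[1])
    ll ans = 0;
    for (int j = 0; j < k; j++)
        ans = (ans + R[0][j] * f0[k - 1 - j]) % MOD;
    return ans;
}

int main() {
    // sanity check with Fibonacci: f[m] = f[m-1] + f[m-2], f[1] = f[2] = 1
    cout << kth_term(10, {1, 1}, {1, 1}) << '\n'; // prints 55
}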

Another solution using the Cayley-Hamilton theorem
Given a recurrence relation of the form f[n] = coef[1] f[n - 1] + coef[2] f[n - 2] + ... + coef[k] f[n - k], we can find f(n) in $O(k^2 \log n)$.

You can see this link for details.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist

COOLING - Editorial


PROBLEM LINKS

Practice
Contest

DIFFICULTY

EASY

EXPLANATION

The following greedy algorithm always finds the optimum answer:

  1. Choose the lightest pie not yet on a rack.

  2. If none of the remaining racks can hold this pie, stop.

  3. Of all the racks that can hold this pie, place the pie on the rack with the smallest weight limit.

  4. If there are no more pies, stop. Otherwise, go back to step 1.
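Sorting both arrays turns this greedy into a two-pointer scan: walk over the racks in increasing order of weight limit and hand each one the lightest unplaced pie, if that pie fits. A minimal C++ sketch, assuming the input is $N$ followed by the $N$ pie weights and the $N$ rack limits (adjust the parsing to the actual statement):

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<int> pie(n), rack(n);
    for (auto &x : pie) cin >> x;
    for (auto &x : rack) cin >> x;
    sort(pie.begin(), pie.end());
    sort(rack.begin(), rack.end());
    // racks with limits below the current lightest pie can never be used,
    // so skipping them is safe; placing the lightest pie on the smallest
    // usable rack is exactly the greedy above
    int placed = 0;
    for (int j = 0; j < n && placed < n; j++)
        if (pie[placed] <= rack[j])
            placed++;
    cout << placed << '\n'; // the number of pies that can be placed
}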

TESTER'S SOLUTION

Can be found here.

KINGSHIP - Editorial


PROBLEM LINK:

Practice
Contest

Author:Shiplu Hawlader
Tester:Tasnim Imran Sunny
Editorialist:Lalit Kundu

DIFFICULTY:

SIMPLE-EASY

PREREQUISITES:

Connected Graph
Tree

PROBLEM:

Connect all the $N$ ($\le 10^5$) cities to make a connected graph at minimum cost. The cost of adding an edge is the product of the populations of the two cities it connects. All populations are given in the input.

QUICK EXPLANATION:

The graph formed should finally be a tree. Connecting all the other nodes to the node with the smallest population results in the minimum cost.


EXPLANATION:

We need to add edges such that all the cities become connected. Connected is defined as "it should be possible from any city to reach any other city by a sequence of edges".

There is no connected graph on $N$ nodes with fewer than $N-1$ edges, and a connected graph with exactly $N-1$ edges is known as a tree.
So we will finally be constructing a tree with $N-1$ edges.
Suppose we create some arbitrary tree with $N-1$ edges. Now pick an edge $(i, j)$ where neither $P_i$ nor $P_j$ is the minimum population; remove that edge and instead connect nodes $x$ and $j$, where $P_x$ is the minimum.

The cost changes by $(P_x \cdot P_j) - (P_i \cdot P_j)$, which is less than zero since $P_x < P_i$.


So all nodes should be connected to the node with the smallest population.
Pseudo Code:

n = input
value array = input
sort(value)   // so that value[0] is the smallest population
ans = 0
for i = 1 to N-1:
    ans += value[i]*value[0]
print ans
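A direct C++ translation of the pseudocode; note that the product of two populations can overflow 32-bit integers, hence long long throughout. The input format ($N$ followed by the $N$ populations) is an assumption:

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<long long> p(n);
    for (auto &x : p) cin >> x;
    long long mn = *min_element(p.begin(), p.end());
    long long ans = 0;
    for (long long x : p) ans += x * mn; // connect everything to the hub...
    ans -= mn * mn;                      // ...except the hub itself
    cout << ans << '\n';
}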

AUTHOR'S AND TESTER'S SOLUTIONS:

Author's solution can be found here.
Tester's solution can be found here.

SPALNUM - Editorial


PROBLEM LINK:

Practice
Contest

Author:Sergey Kulik
Tester:Yanpei Liu
Editorialist:Pawel Kacprzak

DIFFICULTY:

SIMPLE

PREREQUISITES:

Ad hoc, Palindrome

PROBLEM:


Let a palindromic number be a number whose decimal digits form a palindrome. For example, $1, 22, 414, 5335$ are palindromic numbers, while $13$ and $453$ are not. Your task is to find the sum of all palindromic numbers in a range $[L, R]$ inclusive. In one test file, you have to solve this task for at most 100 test cases.

QUICK EXPLANATION:


Precompute the sum of palindromic numbers not greater than $K$, for $1 \leq K \leq 10^5$, and store these values in an array. Answer a single test case $[L, R]$ using the precomputed sums for $R$ and $L - 1$.

EXPLANATION:


Let's first consider solving the problem for a single test case. We are given two numbers $L$ and $R$, and we have to compute the sum of palindromic numbers from $L$ to $R$ inclusive. If we can check whether a number $N$ is palindromic, then we can iterate over all numbers $N$ in the range $[L, R]$ and add $N$ to the result if and only if $N$ is palindromic. How to check if a number $N$ is palindromic? It is pretty straightforward: we can list the digits of $N$ from right to left, and check if that sequence is a palindrome by comparing corresponding digits. A pseudocode of that method can look like this:

// we assume that N > 0
bool is_palindromic(N):
    digits = [ ]
    while N > 0:
        digits.append(N % 10)
        N /= 10
    i = 0
    j = digits.size() - 1
    while i < j:
        if digits[i] != digits[j]:
            return False
        i += 1
        j -= 1
    return True


This check runs in $O(\log(N))$ time, because the decimal representation of $N$ has $O(\log(N))$ digits.

Being able to perform the palindromic check, we can accumulate the result iterating over all integers in range $[L, R]$. A pseudocode for it might look like this:

res = 0
for N = L to R:
    if is_palindromic(N):
        res += N


This method works in $O((R - L) \cdot \log(R))$ time for a single test case, but since we have to handle up to $100$ of them and a range $[L, R]$ can have up to $10^5$ elements, this method will pass only the first subtask and will time out on the second.

How to speed it up?


The crucial observation here is that, during the whole computation described above, we might check whether a number $N$ is palindromic many times! This is not good, but fortunately, there is a common technique to avoid that.


Often, when we are asked many times to compute some result for objects in a range $[A, B]$, we can do the following:

Let $F[N] := \texttt{the result for a range } [0, N]$

If we are able to compute $F[N]$ for all possible $N$, then the answer for a single query $[A, B]$ equals $F[B] - F[A - 1]$, because $F[B]$ contains the result for all numbers not greater than $B$, so if we subtract from it $F[A - 1]$, i.e., the result for all numbers smaller than $A$, we get the result for all numbers in the range $[A, B]$.

If you did not know this technique, please remember it, because it is very useful.

Using the above method, we can precompute:

$S[N] := \texttt{sum of palindromic numbers not greater than } N$

in the following way:

S[0] = 0
for N = 1 to 100000:
    S[N] = S[N - 1]
    if is_palindromic(N):
        S[N] += N

The above method runs in $O(10^5 \cdot \log(10^5))$ time, and we can then use the $S$ table to answer any single query $[L, R]$ as $S[R] - S[L - 1]$ in constant time.
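Putting the pieces together, a complete C++ sketch might look as follows (the input format of $T$ test cases, each a pair $L$ $R$, follows the usual convention but should be checked against the statement):

#include <bits/stdc++.h>
using namespace std;

bool is_palindromic(int n) {
    int digits[8], len = 0;
    while (n > 0) { digits[len++] = n % 10; n /= 10; }
    for (int i = 0, j = len - 1; i < j; i++, j--)
        if (digits[i] != digits[j]) return false;
    return true;
}

int main() {
    const int MAXV = 100000;
    vector<long long> S(MAXV + 1, 0); // S[n] = sum of palindromic numbers <= n
    for (int n = 1; n <= MAXV; n++)
        S[n] = S[n - 1] + (is_palindromic(n) ? n : 0);

    int t;
    cin >> t;
    while (t--) {
        int l, r;
        cin >> l >> r;
        cout << S[r] - S[l - 1] << '\n';
    }
}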


AUTHOR'S AND TESTER'S SOLUTIONS:


Author's solution can be found here.
Tester's solution can be found here.

CHEFTMA -- Editorial


PROBLEM LINK:

Practice
Contest

Author:Dmytro Berezin
Tester:Antoniuk Vasyl and Misha Chorniy
Editorialist:Pushkar Mishra

DIFFICULTY:

Simple

PREREQUISITES:

Greedy, In-built data structures

PROBLEM:

In brief: for each day $i$, $A[i]$ tasks were assigned and $B[i]$ of them were completed. There are also buttons, with values given by the arrays $C$ and $D$; each button can be used at most once, and a button of value $x$ can clear $x$ incomplete tasks of a single day, provided that day has at least $x$ incomplete tasks. Minimize the total number of incomplete tasks remaining.

EXPLANATION:

Subtask 1
The given constraints are very small, so a brute force solution easily works: try all possible buttons in all possible ways and take the minimum over all such arrangements. The total complexity of this approach is $\mathcal{O}(2^N \cdot 2^M \cdot 2^K)$, which passes under the given constraints.

Subtask 2
There are many observations to make here which reduce this problem substantially:

  • For the given $A[i]$ and $B[i]$ values, we are only interested in the value $A[i]-B[i]$. Basically, we can modify each $A[i]$ to $A[i]-B[i]$. From this point onwards in this editorial, array $A$ is assumed to be containing the modified values, i.e., $A[i]-B[i]$.

  • The black and white buttons fundamentally do the same things when dealing with the modified values of the array $A$. Thus, we put all the values from the two arrays, $C$ and $D$, in a common multiset called $buttons$. A multiset is a set which allows duplicates to exist.

Now, before we go on to the formal algorithm, let's think about a smaller problem. Take two values from the array $A$ (just as a reminder, the values are the modified ones, not the ones taken as input), say $A[i]$ and $A[j]$, and assume $A[j] > A[i]$. Now take two values $x$ and $y$ from the $buttons$ multiset, with $x < y < A[j]$. Which button should we use for $A[j]$? Intuition tells us that it is beneficial to use $y$, for two reasons. First, using $y$ allows us to complete more tasks on the $j^{th}$ day than using $x$ would. Second, suppose $y > A[i]$ and $x < A[i]$. If we use $x$ on $A[j]$, we won't be able to use $y$ on $A[i]$ at all; a better strategy is to use $x$ on $A[i]$ and $y$ on $A[j]$. And what if both $x$ and $y$ are less than $A[i]$? Then we can use either button on $A[i]$ and the other on $A[j]$, because either way we reduce the sum of total incomplete tasks by the same amount.

This gives us our greedy algorithm: for a particular value $A[i]$, use the button $x$ such that $x$ hasn't been used up till now and $x$ is the largest value less than or equal to $A[i]$ in the $buttons$ multiset. The incomplete tasks left are then $A[i] - x$; add this to the accumulator variable which is finally returned as the answer. $x$ is removed from the multiset, since one button can't be used twice. The proof of this greedy algorithm has been given above and can be stated more formally as an "exchange argument". You can find more about proofs of greedy algorithms here.

The multiset data structure can be found in almost all major programming languages. Writing one on your own requires knowledge of balanced binary search trees. All multiset operations take $\mathcal{O}(\log N)$ time.
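The core of the greedy as a hedged C++ sketch: process the days in decreasing order of their (modified) values, and for each day grab the largest unused button that fits, via multiset::upper_bound. Input parsing is omitted since it depends on the exact statement:

#include <bits/stdc++.h>
using namespace std;

// a: the modified values A[i] - B[i]; btns: all button values (the arrays C
// and D thrown together into one multiset)
long long minIncomplete(vector<long long> a, multiset<long long> btns) {
    sort(a.rbegin(), a.rend()); // handle the days with more incomplete tasks first
    long long ans = 0;
    for (long long v : a) {
        auto it = btns.upper_bound(v); // first button strictly greater than v
        if (it == btns.begin()) {      // every remaining button is too big
            ans += v;
            continue;
        }
        --it;                          // the largest button with value <= v
        ans += v - *it;
        btns.erase(it);                // each button can be used at most once
    }
    return ans;
}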

The editorialist's program follows the editorial. Please see for implementation details.

OPTIMAL COMPLEXITY:

$\mathcal{O}(N\log N)$ per test case.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist

CHINFL -- Editorial


PROBLEM LINK:

Practice
Contest

Author:Vitalii Kozhukhivskyi
Tester:Antoniuk Vasyl and Misha Chorniy
Editorialist:Pushkar Mishra

DIFFICULTY:

Medium

PREREQUISITES:

DP

PROBLEM:

There are $N$ kiosks (numbered 0 to $N-1$) which exchange Peppercorns for Antarctican Dollars and vice versa, at rates that differ per kiosk and vary each second. You have $D$ Peppercorns in the beginning, and you have $M$ seconds to perform exchanges. You need to maximise the Peppercorns you have after $M$ seconds.

EXPLANATION:

Subtasks 1 and 2
The question clearly hints towards a standard DP. For the given constraints, there exists no brute force solution. Thus, we directly move towards formulating a DP. Let us first define some arrays. Let $buying[i][j]$ give the buying rate at the $i^{th}$ second at kiosk number $j$. Similarly, let $selling[i][j]$ give the selling rate at the $i^{th}$ second at kiosk number $j$.

Now, we need to think of the states and overlapping subproblems for the DP. Let us have $DP[time][kiosk][currency]$. What does this denote? $DP[t][i][c]$ denotes the maximum money that you can have in currency $c$ if you are at kiosk $i$ at the end of $t$ seconds. How many possible values does each state have? $t$ ranges from 0 to $M$, $i$ goes from 0 to $N-1$, and $c$ can have 2 possible values: let $c = 0$ denote Peppercorns and $c = 1$ denote Antarctican Dollars.

We now need to formulate the recurrence for this DP. Here we provide a pseudocode of the recurrence:

for i = 0 to N-1
{
    //at time 0, chef can start at any kiosk with
    //D Peppercorns and 0 Antarctic Dollars

    dp[0][i][0] = D; dp[0][i][1] = 0;
}

for t = 1 to M
{
    for i = 0 to N-1
    {
        //chef can simply wait at this kiosk for another second
        dp[t][i][0] = dp[t-1][i][0];
        dp[t][i][1] = dp[t-1][i][1];

        //or chef approached this kiosk from some other kiosk
        //at distance 'dis'
        for dis = 1 to N-1
        {
            if(i+dis < N && t-dis >= 0)
            {
                //chef can travel from kiosk (i+dis) to this one;
                //the walk takes dis seconds.
                dp[t][i][0] = max(dp[t][i][0], dp[t-dis][i+dis][0]);
                dp[t][i][1] = max(dp[t][i][1], dp[t-dis][i+dis][1]);
            }

            //similarly for the analogous case of kiosk (i-dis)
            if(i-dis >= 0 && t-dis >= 0)
            {
                dp[t][i][0] = max(dp[t][i][0], dp[t-dis][i-dis][0]);
                dp[t][i][1] = max(dp[t][i][1], dp[t-dis][i-dis][1]);
            }
        }

        //the other option chef has is to exchange currency at this
        //particular kiosk. Since exchanging takes 1 second, we exchange
        //the money we had 1 second ago, i.e., at time (t-1).
        //recall that 1 = Antarctican Dollars, 0 = Peppercorns
        dp[t][i][0] = max(dp[t][i][0], dp[t-1][i][1]*buying[t-1][i]);
        dp[t][i][1] = max(dp[t][i][1], dp[t-1][i][0]/selling[t-1][i]);
    }
}

//return the maximum of Peppercorn currencies
//over all kiosks at time M. If this value is
//greater than 10^18, then output Quintillionnaire
return max of dp[M][0..N-1][0]

This algorithm runs in $\mathcal{O}(MN^2)$. This is sufficient for the first two subtasks.

Subtask 3
We need to somehow optimize the above algorithm. For that, we need to make a crucial observation. The innermost loop with variable $dis$ goes from 1 to $N-1$, but a little thought shows that $dis$ only needs to take the value 1. In other words, we just need to check the immediate neighbours of a kiosk instead of checking all the kiosks. Why is that so?

Let us say there are three kiosks $i$, $j$, $k$ such that $j = i-1$ and $k = j-1$. Suppose that at time $t$ we have $dp[t][k][0] = v_1$ and $dp[t][j][0] = v_2$ with $v_1 > v_2$. Now, when choosing the best possible value of $dp[t+1][i][0]$, we needn't check kiosk $k$ (i.e., $dp[t-1][i-2][0]$) explicitly. This is because if kiosk $k$ holds a value larger than the neighbours of $i$, which by our assumption ($v_1 > v_2$) it does, then by the recurrence given in the pseudocode, $dp[t][i-1][0]$ already contains at least that value. In other words, if a kiosk is at distance $dis$ from $i$, we needn't check the cell $dp[t+1-dis][i-dis][0]$, since had that cell contained a large amount, chef could have walked with that amount from kiosk $(i-dis)$ to kiosk $(i-1)$ in the meantime (i.e., in $dis-1$ seconds). Hence, we only need to check the neighbours of a kiosk, because we are dealing with a DP which has maximisation in its recurrence.

This observation brings the complexity of the algorithm down to $\mathcal{O}(MN)$ which is sufficient for all the subtasks.

The editorialist's program follows the editorial. Please see for implementation details.

OPTIMAL COMPLEXITY:

$\mathcal{O}(MN)$

SAMPLE SOLUTIONS:

Author
Tester
Editorialist


SEAKAM -- Editorial


PROBLEM LINK:

Practice
Contest

Author:Sergey Nagin
Tester:Antoniuk Vasyl and Misha Chorniy
Editorialist:Pushkar Mishra

DIFFICULTY:

Medium

PREREQUISITES:

Bitmask DP, Combinatorics

PROBLEM:

Given is a graph with $N$ nodes which is almost complete, i.e., all but $M$ edges are present. Count the number of permutations $P[1..N]$ such that if the permutation was considered to be the order in which nodes are visited in the graph then it is a valid traversal. In other words, there exists an edge from $P[1]$ to $P[2]$, from $P[2]$ to $P[3]$, ..., i.e., from $P[i]$ to $P[i+1]$ for $i = 1$ to $N-1$.

EXPLANATION:

Subtask 1
Simply iterate over all permutations of the nodes and count the valid ones by checking the presence of an edge between $P[i]$ and $P[i+1]$ for each $i$. This approach takes $\mathcal{O}(N! \cdot N)$ time, which is fine for this subtask.

Subtask 2, 3, 4
Most of the counting questions can be solved using DP. Let us see how we can think about this particular problem in terms of optimal sub-structure and overlapping sub-problems, i.e., the two necessary and sufficient requirements of a DP.

Let us think of the problem in a way that leads us to a DP. We can see that the number of missing edges is extremely small: just 7 at most. In other words, the graph is an almost complete one. At most 7 missing edges means we just need to worry about the arrangement of at most 14 nodes. The other nodes can be shuffled in any manner, since they don't affect the correctness of a permutation.

Let us call the set of nodes which have some edge missing $faulty\_nodes$. What matters is the relative placement of the nodes of this set. Let $x$ and $y$ be two nodes in this set. If $x$ and $y$ HAVE an edge between them, i.e., edge $(x, y)$ is not amongst the $M$ missing edges, then we needn't worry about how $x$ and $y$ are placed relative to each other in a permutation: even if they appear adjacent to each other, that won't make the permutation invalid. Our only concern is that nodes which share a missing edge should not be adjacent to each other in a permutation.

But what happens if $x$ and $y$ do not share an edge? We have to make sure they don't appear adjacent to each other. How do we ensure that? We basically need to make sure that there always exists at least one node $z$ between them such that $z$ has an edge to both $x$ and $y$. In other words, $z$ behaves as a bridge.

This gives us a hint towards the solution, i.e., the DP. We know that there only exist at maximum 14 nodes in the $faulty\_nodes$ set. This means that we can iterate over all possible relative placements of the faulty nodes. Not clear? Let us take an example. Let the given graph have 5 nodes $n_1$, $n_2$, $n_3$, $n_4$, $n_5$. Of these 5, let us assume that edges $(n_2, n_3)$ and $(n_3, n_4)$ are missing, i.e., $n_2$, $n_3$, $n_4$ are faulty nodes.

In this case, there are 6 (i.e., 3!) possible relative placements of the faulty nodes in a permutation. These are $(n_4, n_2, n_3)$, $(n_4, n_3, n_2)$, $(n_2, n_4, n_3)$, $(n_2, n_3, n_4)$, $(n_3, n_4, n_2)$ and $(n_3, n_2, n_4)$. Note that these are relative placements of the faulty nodes only. This means that the remaining nodes, i.e., $n_1$ and $n_5$, can be placed anywhere between these faulty nodes to form a permutation $P$ of the graph. If each adjacent pair $P[i]$, $P[i+1]$ then has an edge between them, the permutation is a valid one. Now let us pick one of the 6 relative arrangements and try to add the two remaining nodes to it.

Let's take $(n_2, n_4, n_3)$. Let us see where we can fit the nodes $n_1$ and $n_5$ so as to form valid permutations. We must place one of them between $n_4$ and $n_3$, since those two can't be together: there must be a node between them to act as a bridge. So one valid placement is $(n_1, n_2, n_4, n_5, n_3)$. Another can be made by swapping $n_1$ and $n_5$, and yet another by moving $n_1$ around in $(n_1, n_2, n_4, n_5, n_3)$, giving something like $(n_2, n_4, n_5, n_1, n_3)$.

Now we have a way to count the number of valid permutations: first take a relative arrangement of the faulty nodes; then, between each pair of adjacent faulty nodes that share a missing edge, we must place at least one normal node to act as a bridge. This way, we will be able to count all the valid permutations.

One way to go about this is to cycle through all the arrangements of the faulty nodes. For a particular arrangement, we check which pairs of adjacent faulty nodes need a bridge element between them. If a pair $x$, $y$ needs one, we "tie" a normal node $z$ to the front of $y$, treating the pair $z$, $y$ as one element. Now we arrange the "modified" faulty nodes (modified as in tied to normal nodes where required) in the $(N -$ number of normal nodes used as bridges$)$ available positions. The number of ways to do this, multiplied by the factorial of the number of normal nodes (because any of the normal nodes can serve as a bridge, and they can be shuffled around), gives the number of valid permutations for this particular relative arrangement. But this method has a problem: there can be a total of $14!$ possible arrangements of the faulty nodes, which is impractical. We have to reduce the number of cases to consider.

This is where the DP comes into the picture. Let's maintain a DP with three states: $DP[mask][first\_node][number\_tied]$. What do these states indicate? $mask$ tells us which of the faulty nodes have been arranged; $first\_node$ is the first node of the arrangement; and $number\_tied$ stores the number of normal nodes that have been used as bridges, i.e., tied to a faulty node. The state $mask$ goes from $1$ to $2^{14}-1$, while $first\_node$ and $number\_tied$ each have at most 14 possible values.

Thus, $DP[mask][first\_node][number\_tied]$ gives the number of arrangements of the faulty nodes whose bit is 1 in the number mask such that the first node of the arrangement is $first\_node$ and the $number\_tied$ is the number of normal nodes used as bridges.

We can now formulate the recurrence:

let k = faulty_nodes.size()
let ans = 0 //counter variable
let normal_nodes = N - k;

for mask = 1 to (2^k-1)
{
    for first_node = 0 to k-1
    {
        for number_tied = 0 to k-1
        {
            //appending a new node to the beginning of the arrangement that
            //is given by mask.
            for new_first = 0 to k-1
            {
                if(new_first bit in mask == 0)
                {
                    new_mask = bitwise_or(mask, 2^new_first)
                    missing_edge = 1 if edge (new_first, first_node) missing else 0

                    //adding to the count of sequences starting with new_first and
                    //containing faulty nodes as the ones in the new_mask
                    dp[new_mask][new_first][number_tied + missing_edge] += dp[mask][first_node][number_tied];

                    //note that if the edge (new_first, first_node) is missing,
                    //we will have to tie an element to first_node, i.e.,
                    //increasing number_tied by 1.
                }
            }

            if(mask == (2^k - 1))
            {
                //if all the faulty nodes have been arranged then we can
                //count the number of valid permutations for this
                //particular arrangement

                total_objects = N - number_tied;
                val = dp[mask][first_node][number_tied];

                //getting all ways of arranging items
                get_ways = (total_objects choose k) * (val);

                //multiplying the number of ways of permuting normal nodes.
                get_ways = (get_ways * factorial[normal_nodes]);

                //adding to the counter variable
                ans = (ans + get_ways);
            }
        }
    }
}

return ans;

The editorialist's program follows the editorial. Please see for implementation details. The (n choose k) mod m has been calculated using inverse factorials since m in this case is a sufficiently large prime.
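For completeness, a minimal sketch of the inverse-factorial technique mentioned above, assuming the prime modulus $10^9+7$; the bound MAXN should be adjusted to the problem's limits:

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const ll MOD = 1000000007;
const int MAXN = 100005; // assumption: adjust to the real bound on N

ll fact[MAXN], inv_fact[MAXN];

ll pw(ll b, ll e) {
    ll r = 1;
    for (b %= MOD; e; e >>= 1, b = b * b % MOD)
        if (e & 1) r = r * b % MOD;
    return r;
}

void precompute() {
    fact[0] = 1;
    for (int i = 1; i < MAXN; i++) fact[i] = fact[i - 1] * i % MOD;
    inv_fact[MAXN - 1] = pw(fact[MAXN - 1], MOD - 2); // Fermat: a^(p-2) = a^(-1) mod p
    for (int i = MAXN - 2; i >= 0; i--)
        inv_fact[i] = inv_fact[i + 1] * (i + 1) % MOD;
}

ll nCk(int n, int k) {
    if (k < 0 || k > n) return 0;
    return fact[n] * inv_fact[k] % MOD * inv_fact[n - k] % MOD;
}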

OPTIMAL COMPLEXITY:

$\mathcal{O}(M^32^M)$ per test case.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist

Fast string input


What is the fastest way to read strings in C and C++?

I know there are many ways to read integers quickly, but I was not able to find anything that helps with strings, especially char arrays.

StopStalk: Tool to maintain your algorithmic progress


Hello Coders,

Hope you are having a great time coding hard. Here I present to you a Utility tool - StopStalk which will encourage you to keep your algorithmic progress going by coding with your friends and improve.

It retrieves your friends' recent submissions from various competitive websites (CodeChef, Codeforces, Spoj, HackerEarth and HackerRank for now) and shows them to you in one place. It includes lots of other features, like user streak notifications, a global leaderboard, submission filtering, searching problems by tags, and a lot more. You can send friend requests to your friends on StopStalk, or you can also add a custom user. Register here - StopStalk

The goal is to draw more and more people into algorithmic programming and help them maintain their streak. Also, the project is completely open source - Github

Feel free to contribute. :)

We would be happy to hear from you - bugs, enhancements, feature-requests, appreciation, etc. In the end - Stop Stalking and start StopStalking! Happy coding!

PS: We update our database every 24 hrs but just for today we will update it every 3 hrs. So don’t miss this opportunity to register.

FUZZYADD - Editorial


PROBLEM LINK:

Contest
Practice

Author:Kevin Atienza
Tester:Istvan Nagy and Kevin Atienza
Editorialist:Kevin Atienza

PREREQUISITES:

Probability, expected value, probability distribution, amortized analysis

PROBLEM:

You are given a sequence $V_1, \ldots, V_N$. We define $S_0 = 0$ and $S_i = S_{i-1} + V_i$. However, whenever the sum of two numbers being added exceeds $999999$, the sum is instead a random integer chosen uniformly from $0$ to $999999$. What is the expected value of $S_i$ for each $1 \le i \le N$?

QUICK EXPLANATION:

Let $P_i(v)$ be the probability that $S_i$ is $v$, for $0 \le v \le 999999$. Initially we have $P_0(0) = 1$ and $P_0(v) = 0$ for $v > 0$.

When we have a new value $V_i$, we need to update this array. We have the following: $$P_i(v) = \begin{cases} \displaystyle\frac{\sum_{j=1}^{V_i} P_{i-1}(1000000-j)}{1000000} + P_{i-1}(v - V_i) & \text{if } v \ge V_i \\ \displaystyle\frac{\sum_{j=1}^{V_i} P_{i-1}(1000000-j)}{1000000} & \text{if } v < V_i \end{cases}$$ In other words, the array is shifted $V_i$ steps to the right, and the $V_i$ entries that overflow are redistributed into the whole array uniformly. Also, the expected value of $S_i$ is almost the expected value of $S_{i-1}$ plus $V_i$, except that we have to adjust for the entries that overflowed. The things we need to implement are the shifting operation and the computation of $U_V = \displaystyle\sum_{j=1}^{V} P_{i-1}(1000000-j)$ for a given $V$.

To implement this efficiently, instead of creating the $P$ array explicitly, we instead represent it as a list of runs: $(L, p)$, which means a length of $L$ elements with value $p$. For example, initially, we have $[(1,1), (999999,0)]$. Then to update the representation, we remove a "chunk" (i.e. a series of runs) of size $V_i$ from the end of this list, use them to compute $U_{V_i}$, add the constant $\frac{U_{V_i}}{1000000}$ to all remaining elements, and append a new run at the beginning. The expected value of $S_i$ can also be updated efficiently.

Instead of adding a constant value to all elements of the array explicitly (which can be slow), we instead hold a variable $T$ which is to be added to all values of the array, and simply increment $T$. Then as we access each run $(L,p)$, we use instead the value $p + T$.

Now, in the worst case there can be up to $N$ runs removed in a single operation. However, each run is only removed once from the array, so the amortized cost is $O(1)$ and the overall complexity $O(N)$ :)

One final hurdle is precision: the error accumulates quickly enough because $T$ only increases. To fix this, we simply explicitly add $T$ to all values of the list from time to time to make the error growth smaller. If we decide to add this every $K$ operations, then the complexity becomes $O(N^2/K)$. Choosing something like $K = \sqrt{N}$ would yield a running time $O(N\sqrt{N})$.

There is also a way to solve this with a segment tree with no subtractions, and therefore possibly less precision error.

EXPLANATION:

The definition of the expected value of some random variable $X$ is the sum of the possible values of $X$ multiplied by their probabilities. We know that the possible values of each $S_i$ are simply $[0,1,\ldots,999999]$. For $i \ge 0$ and $0 \le v \le 999999$, let $P_i(v)$ be the probability that $S_i = v$. Thus, the expected value of $S_i$ is by definition $$E[S_i] = \sum_{v=0}^{10^6-1} v\cdot P_i(v)$$ To compute each $E[S_i]$, we will need to compute $P_i(v)$ for all $i$ and $v$ quickly. For the base case $i = 0$, we have $P_0(0) = 1$ and $P_0(v) = 0$ for $v > 0$ (i.e. $S_0$ is equal to $0$ with probability $1$). Also, the values of $P_i$ depend on the values of $P_{i-1}$ because of the equation $S_i = S_{i-1} + V_i$ (where the "$+$" is our special "addition"). Thus, we will try to compute $P_i$ from $P_{i-1}$.

Suppose we have all values from $P_{i-1}$, and now we will add $V_i$ to $S_{i-1}$. These probabilities will change. Let's try to compute $P_i(v)$. If $v < V_i$, then the only way to arrive at the "sum" $v$ is when the addition $S_{i-1} + V_i$ overflowed. But this can only happen for the values $S_{i-1} \ge 10^6 - V_i$. Also, if this is true, then there is only a $\frac{1}{10^6}$ chance of actually getting the value $v$ (because all sums are equally likely). Therefore, we find that for $v < V_i$:
$$P_i(v) = \sum_{s\ge 10^6-V_i} P_{i-1}(s)\cdot\frac{1}{10^6} = \frac{1}{10^6}\left(\sum_{s\ge 10^6-V_i} P_{i-1}(s)\right)$$ On the other hand, if $v \ge V_i$, then there's another way to arrive at $v$, and that is when $S_{i-1}$ was $v - V_i$ at the beginning. This occurs with probability $P_{i-1}(v - V_i)$, so we find that for $v \ge V_i$:
$$P_i(v) = \frac{1}{10^6} \left(\sum_{s\ge 10^6-V_i} P_{i-1}(s)\right) + P_{i-1}(v - V_i)$$ Combining these two, we get: $$P_i(v) = \begin{cases} \displaystyle\frac{1}{10^6}\left(\sum_{s\ge 10^6-V_i} P_{i-1}(s)\right) + P_{i-1}(v - V_i) & \text{if } v \ge V_i \\ \displaystyle\frac{1}{10^6}\left(\sum_{s\ge 10^6-V_i} P_{i-1}(s)\right) & \text{if } v < V_i \end{cases}$$

Thus, in order to compute the answer, we must solve two problems:

  • Maintaining and updating the list of probabilities $P_i$ as we increase $i$, and
  • Quickly computing the expected value given these probabilities.

The update

We can restate the transformation from the array $P_{i-1}$ to $P_i$. Let $P$ be an array of length $10^6$, such that $P[v]$ is initially $P_{i-1}(v)$. Then:

  1. Remove the last $V_i$ elements of $P$. Let $t$ be the sum of these removed elements.
  2. Add $V_i$ elements at the beginning of $P$, each of value $0$.
  3. Add $\frac{t}{10^6}$ to all values of $P$.

(The first two steps taken together can also be seen as a right shift.) In addition, we also need to maintain the value $\sum_v v\cdot P[v]$, so we can get $E[S_i]$ easily. Initially, $P[0] = 1$ and $P[v] = 0$ for $v > 0$.

In the second operation, we add a contiguous run of $0$s at the beginning, which suggests to us that $P$ contains long runs of data. (A run is a contiguous subsequence in which the same value occurs.) The third operation doesn't do anything to change that: It may change the values of each run, but it doesn't affect the structure / lengths of the runs at all. This hints that we can represent $P$ as a list of runs. A run itself can be represented by two numbers $(l,v)$: its length and its value. Initially, we have two runs: $(1,1)$ and $(10^6-1,0)$. Also, initially $\sum_v v\cdot P[v] = 0$.

Now, let's implement the operations above on this representation of $P$.

First operation

Removing the last $l$ elements is straightforward:

  • Take the last run $(l_k,v_k)$ in the list. If $l_k \le l$, then remove it from the list, decrement $l$ by $l_k$, and repeat this step.
  • Otherwise, the last run satisfies $l_k > l$. In this case, simply decrement $l_k$ by $l$. (Don't remove it.)

Along the way, we must be able to update the value of $\sum_v v\cdot P[v]$, so when removing a run $(l,v)$ from the back of the list, we must adjust it. If the total length of the runs is $L$, then the first index of this run in $P$ must be $L - l$, so we need to decrement $\sum_v v\cdot P[v]$ by $$\sum_{i=L-l}^{L-1} i\cdot v = v\left[\frac{L(L-1)}{2} - \frac{(L-l)(L-l-1)}{2}\right]$$ A similar adjustment can be done when decrementing the length of the last run. (As needed in the second step.)

Second operation

Now, what about adding a run of $0$s at the beginning? Suppose we want to add $l$ zeroes. These zeroes don't contribute to $\sum_v v\cdot P[v]$, but the shifting of the remaining elements affects it. Specifically, the change in value is equal to $$\begin{align*} \sum_{v=0}^{L-1} (v+l)\cdot P[v] - \sum_{v=0}^{L-1} v\cdot P[v] &= \sum_{v=0}^{L-1} l\cdot P[v] \\ &= l\cdot \sum_{v=0}^{L-1} P[v] \end{align*}$$ Thus, we also need to maintain $\sum_v P[v]$ in addition to $\sum_v v\cdot P[v]$.

Third operation

Finally, let's tackle the third operation. We wish to add a value $t$ to all runs in the sequence. We don't want to walk through the list and add $t$ individually to the values of the runs because that would be slow. Instead, what we can do is to maintain another variable "$\text{add}$" separately, indicating the value that we want to add to all runs' values. To add $t$ to all runs, we simply increment "$\text{add}$" by $t$. Then, when we access a given run $(l,v)$, we just say its value is $v+\text{add}$.

We also need to remember to update $\sum_v P[v]$ and $\sum_v v\cdot P[v]$, but this is easy. If $L$ is the length of $P$ so far, then the change in $\sum_v P[v]$ is equal to $L\cdot t$, and the change in $\sum_v v\cdot P[v]$ is equal to $\frac{L(L-1)}{2}\cdot t$.

Pseudocode

In case some details aren't clear, the following is a pseudocode you may study:

runs = []
sumPv = 0   // represents sum(P[v])
sumvPv = 0  // represents sum(v*P[v])
L = 0       // length of P
add = 0     // the number to add to all runs

def init():
    // initializes the structure
    runs = [(1,1), (999999,0)]
    sumPv = 1.0
    sumvPv = 0.0
    L = 1000000
    add = 0.0

def remove_from_back(l):
    // remove 'l' elements from the back
    // returns the total of values removed

    if l == 0: // do nothing
        return 0

    Let 'run' be the last run in 'runs'

    // adjust
    l_rem = min(l, run.l)
    real_v = run.v + add // the real value
    sumPv -= real_v * l_rem
    sumvPv -= real_v * (L*(L-1)/2 - (L-l_rem)*(L-l_rem-1)/2)
    L -= l_rem

    if run.l > l:
        run.l -= l // decrement the run, and we're done.
        return l * real_v
    else:
        pop the back of 'runs'
        return run.l * real_v + remove_from_back(l - run.l) // continue removing

def add_zeroes_in_front(l):
    // add 'l' zeroes in front

    add (l,-add) in front of 'runs'
        // remember that "add" will be added to all values, so to add real zeroes,
        // we need to add "-add" in front

    sumvPv += l * sumPv
    L += l

def add_value_to_all(t):
    // add 't' to the value of all runs

    sumPv += L * t
    sumvPv += L*(L-1)/2 * t
    add += t

We maintain the list of runs called "runs" (which can be implemented as a deque, or a sliding array), and four values:

  • sumPv representing $\sum_v P[v]$
  • sumvPv representing $\sum_v v\cdot P[v]$
  • L representing the length of $P$
  • add representing $\text{add}$

One noteworthy part of this code is in add_zeroes_in_front. Instead of adding a run with value 0, we use the value "-add", because add will be added to it later on. In remove_from_back, we use real_v during computation to update sumPv and sumvPv.

After performing the three operations, we can extract $E[S_i]$ as sumvPv.

Running time

Let's analyze the running time of the algorithm. Type 2 and 3 operations run in $O(1)$, while a type 1 operation potentially runs in time linear in the number of runs, so this seems slow. However, notice that each run can only be pushed and popped at most once, so the total running time of all type 1 operations is actually proportional to the number of runs added to the structure. But each type 2 operation adds at most $1$ run, so after $N$ operations there can only be at most $N$ runs, and thus the running time for $N$ operations is $O(N)$. (In other words, the amortized running time of each operation is $O(1)$.) This passes the time limit!

Precision

Although the approach is mathematically correct, it suffers from precision problems. This is because we update sumPv and sumvPv by adding and subtracting large numbers to it (e.g. $\frac{L(L-1)}{2}$ is quite large), so loss of significance happens. There are a few ways around it:

  • You can use more significant data types. A long double seems to be enough to maintain precision.
  • You can reset "$\text{add}$" occasionally, by actually adding this value to all runs, then setting it to $0$. The more often we do this, the more we maintain precision, but it comes at the cost of slower running time. Choose a frequency that still passes the time limit and maintains enough precision.
  • You can find a different approach altogether. An algorithm that doesn't use subtraction is desirable, because no catastrophic cancellation occurs. A solution involving segment trees is possible, though the running time is the slightly slower $O(N \log N)$.

Time Complexity:

$O(N \log N)$ or $O(N)$

AUTHOR'S AND TESTER'S SOLUTIONS:

Will be posted soon

RGAME -- Editorial


PROBLEM LINK:

Practice
Contest

Author:Abhra Dasgupta
Tester:Antoniuk Vasyl and Misha Chorniy
Editorialist:Pushkar Mishra

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Ad Hoc, Observation

PROBLEM:

Given are $N+1$ numbers $A_0$ to $A_N$. These numbers come in as a stream, in order from $A_0$ to $A_N$. Each incoming number can be placed at either end of the sequence built so far. The score of a gameplay is calculated as per the given rules. Output the sum of the scores of all possible different gameplays.

EXPLANATION:

Subtask 1
Subtask 1 can be solved by directly simulating the given problem. In other words, we can append each incoming number at either end and compute the score for each possible arrangement. There can be at most $2^N$ different sequences, so the time complexity is $\mathcal{O}(2^N)$ per test case. This is sufficient for this subtask.

Subtask 2
Let us start by thinking of simpler sequences in which patterns are easier to observe. A trick is to take something like 1, 1, 1, ..., 1, 5 as the sequence; the 5 can then be shuffled around to different positions to observe how many times each position is taken into account.

Nevertheless, we are going to take a more mathematical approach in this editorial. Let's see what happens when the $k^{th}$ number, i.e., $A[k]$, appears in the stream. It can be appended to either of the two ends of the already existing sequence. But how many already existing sequences are there? Clearly, $2^{k-1}$. Say for now that $A[k]$ is appended to the right end. Now, consider some $p^{th}$ number $A[p]$ coming after $A[k]$. In how many gameplays does $A[p]$ get multiplied by $A[k]$? For $A[p]$ to be multiplied by $A[k]$, all numbers coming in between these two must not go to the side that $A[k]$ is on, i.e., they must all be put on the left in each of the $2^{k-1}$ sequences where $A[k]$ has been appended on the right. If this happens, then when $A[p]$ comes and is placed on the right, it is multiplied by $A[k]$. The $(p+1)^{th}$ up to $N^{th}$ numbers can be arranged in any order after that. So how many gameplays in total contain the product of $A[k]$ and $A[p]$? Clearly, $2^{k-1} \cdot 2^{N-p}$. Thus, the total value that gets added to the answer is $(A[k] \cdot 2^{k-1}) \cdot (A[p] \cdot 2^{N-p})$.

We now have a way to calculate the required answer. Below is the pseudocode of the same.

let possible_future_prod[i] = A[i] * 2^(N-i)

let ans = 0; //accumulator variable
for i = 0 to N-1
{
    ways_to_arrange_prefix = 2^(i-1); //if i = 0, then 1

    //multipying A[i] with the number of possible prefixes
    partial_prod = (ways_to_arrange_prefix * A[i]);

    //iterating over elements coming after i
    for j = i+1 to N
    {
        total_prod = partial_prod * possible_future_prod[j];

        //adding total_prod to the accumulator variable
        ans += total_prod;
    }
}

//recall, we had only taken the case when an element is
//appended to the right.
//for taking symmetrical cases into account, multiply by 2.
return 2*ans

This algorithm runs in $\mathcal{O}(N^2)$.

Subtask 3
The algorithm stated in subtask 2 can be made $\mathcal{O}(N)$ by precalculating the suffix sums of the $possible\_future\_prod$ array. Once we have the $suffix\_sum$ array, the inner loop given above in the pseudocode can be reduced to:

//calculating the suffix_sum array
suffix_sum[N] = possible_future_prod[N]
for i = N-1 downto 0
    suffix_sum[i] = possible_future_prod[i] + suffix_sum[i+1];

let ans = 0; //accumulator variable
for i = 0 to N-1
{
    ways_to_arrange_prefix = 2^(i-1); //if i = 0, then 1

    //multipying A[i] with the number of possible prefixes
    partial_prod = (ways_to_arrange_prefix * A[i]);

    //calculating the sum that can be achieved by
    //multiplying A[i] with numbers coming after it
    total_prod = (partial_prod * suffix_sum[i+1]);

    //adding total_prod to the accumulator variable
    ans += total_prod;
}

//for taking symmetrical cases into account, multiply by 2
return 2*ans
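A complete C++ sketch of the $\mathcal{O}(N)$ solution follows. The modulus $10^9+7$ is an assumption (the editorial doesn't restate it), as is the input format of $N$ followed by the $N+1$ values; check both against the statement:

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const ll MOD = 1000000007; // assumption: substitute the statement's modulus

int main() {
    int n;
    cin >> n;
    vector<ll> a(n + 1); // the stream A[0..N]
    for (auto &x : a) cin >> x;

    vector<ll> p2(n + 1, 1);
    for (int i = 1; i <= n; i++) p2[i] = p2[i - 1] * 2 % MOD;

    // suf[i] = sum over p = i..N of A[p] * 2^(N-p)
    vector<ll> suf(n + 2, 0);
    for (int i = n; i >= 0; i--)
        suf[i] = (suf[i + 1] + a[i] % MOD * p2[n - i]) % MOD;

    ll ans = 0;
    for (int k = 0; k < n; k++) {
        ll pre = (k == 0 ? 1 : p2[k - 1]) * (a[k] % MOD) % MOD;
        ans = (ans + pre * suf[k + 1]) % MOD;
    }
    cout << 2 * ans % MOD << '\n'; // factor 2 for the left/right symmetry
}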

The editorialist's program follows the editorial. Please see for implementation details.

OPTIMAL COMPLEXITY:

$\mathcal{O}(N)$ per test case.

SAMPLE SOLUTIONS:

Author
Tester
Editorialist
