PROBLEM LINK:
Div1
Div2
Practice
Setter-Praveen Dhinwa
Tester-Misha Chorniy
Editorialist-Abhishek Pandey
DIFFICULTY:
EASY
PRE-REQUISITES:
Strings, Basic Math, Simulation.
PROBLEM:
Given a string $S$ , with only characters being $'a'$ and $'b$ $'$, find number of good prefixes in the string $T$ which is made by concatenating $N$ copies of $S$, i.e.-
$T = \underbrace{S + S + \dots + S}_{N\text{ times}}$
A prefix is a good prefix if and only if number of occurrences of $'a'$ is strictly greater than the number of occurrences of $'b'$.
QUICK EXPLANATION:
We make cases for this problem. If frequency of $'a'$ is equal to frequency of $'b'$ in string, then calculating answer is simple- for we can simply use $Ans=G(S)*N$ where $G(S)$ denotes number of Good Prefixes in $S$.
If frequency of $'a'$ is more than to frequency of $'b'$ , we notice that after at most $1000$ appendations, each new appendation will contribute maximum possible value only, which is $K$ ($K$=Length of string) .
Similarly we can see that if frequency of $'a'$ is less than to frequency of $'b'$, after $1000$ appendations, each new appendation will contribute $0$ to answer.
So if frequency of character $'a'$ is not equal to that of $'b'$, we make a string $T$ by appending $S$ to it $1000$ times, and then use mathematical formulas to find contribution of every other appendation.
EXPLANATION:
As usual, we will be discussing Editorialist's and tester's approach. Editorialist's approach is based on same lines as that of setter- and hence, will form main part of editorial. We will discuss better optimized solutions (tester's approach) and common mistakes of contestants at the bonus part of editorial :)
Before that, please first get acquainted with the notations I will be using.
- $G(S)$= Number of Good Prefix in string $S$
- $freq(a,S)$= Frequency of character 'a' in string $S$.
- $freq(b,S)$= Frequency of character 'b' in string $S$.
- ${S}^{p}$= String formed by concatenation $S$ $p$ times.
1. Editorialist's/Setter's Solution-
If you want, you can scroll down to refer to my solution side by side while reading the editorial.
The first thing to note are the constraints. $1\le N \le {10}^{9}$ while $1\le |S| \le {10}^{3}$.
We start by calculating number of Good Prefixes in $S$ , i.e. $G(S)$. Along with that, we also find $freq(a,S)$ and $freq(b,S)$. Now we make cases-
a. When $freq(a,S) == freq(b,S)$-
In this case, there is no affect of previous appendations of $S$ on incoming/new appendations of $S$. In other words, when $freq(a,S) \neq freq(b,S)$, then there is some change in value of $(freq(a,S)-freq(b,S))$ after each appendation of $S$ to $T$, which affect the contribution of any future appendation of $S$ towards $G(T)$. In most basic words, it means that since $freq(a,S) \neq freq(b,S)$, then number of $'b'$ in $T$ will not increase at same rate as number of $'a'$ at each appendation, which will affect contribution of next appendation to the final answer.
This is not the case when $freq(a,S) == freq(b,S)$. Hence, each appendation of $S$ will contribute $G(S)$ towards the final answer. There are $N$ appendations of $S$ and we already found $G(S)$ by simple looping earlier. Hence, for this case, $Answer=G(S)*N$
b. When $freq(a,S) < freq(b,S)$-
In this case, because $ freq(a,S) < freq(b,S)$, each new appendation of $S$ to $T$ will decrease $(freq(a,S)-freq(b,S))$. In other words, the number of $'b'$ in $T$ will increase more quickly than number of $'a'$, which will reduce the contribution of every future appendation of $S$ towards final answer.
We see that, after every appendation, the contribution of next appendation must reduce by $at$ $least$ $1$ , ($\because freq(a,S) < freq(b,S)$). Remember that length of $S$ is only upto $1000$. This simply means, that after at most $1000$ appendations, the contribution of incoming appendation reduces to $0$. (In fact, $1000$ is a loose upper bound, we can go closer to the real value, but for sake of explaining lets take it $1000$ right now :) )
So, we form the string $T$ by appending string $S$ $min(N,1000)$ times as-
$T = \underbrace{S + S + \dots + S}_{min(N,1000)\text{ times}}$
and our answer is equal to $G(T)$ as any future appendation has $0$ contribution towards final answer.
$Answer=G(T)$
b. When $freq(a,S) > freq(b,S)$-
Clearly, each new appendation of $S$ to $T$ will increase $(freq(a,S)-freq(b,S))$. Thus, the number of $'a'$ in $T$ will increase more quickly than number of $'b'$, which will increase the contribution of every future appendation of $S$ towards final answer.
The maximum possible contribution of an appendation to our answer can be $|S|$, i.e. the length of string $S$. Again, as seen above, after $at$ $most$ $1000$ appendations, the next appendation will reach its maximum contribution. Hence, we do same as above, we form $T$ by appending string $S$ $min(N,1000)$ times as-
$T = \underbrace{S + S + \dots + S}_{min(N,1000)\text{ times}}$
Now each future appendation will have maximum contribution, equal to $|S|$. Hence, our answer will be-
$Answer=G(T)+|S|*max(0,N-1000)$
Now, what is the time complexity? How many of you would agree if I said-
$Time$ $Complexity=$ $O(min(N,1000)*|S|) \equiv O(1000*|S|) \equiv O(|S|)$
Is it correct?
.
.
.
(PS: The above time complexity is a lie :) )
SOLUTION:
Setter
Tester - $O(|S|Log(|S|))$
Tester - $O(|S|)$
Editorialist - $O(???)$
CHEF VIJJU'S CORNER:
1.Most of the contestants did a good job in framing the simulation part correctly, checking for overflows, and coming at right expressions. But they $still$ got a $TLE$. And let me reveal the culprit here! It was-
$T=S+T$
Yes, thats right! This expression should had been $T+=S$. The $+$ operator takes time comparable to $O(|S|+|T|)$ while the $+=$ symbol takes time comparable to $O(|S|)$. Treat it like "Creating a new array with characters of $T$ and $S$ $v/s$ Adding $|S|$ characters to back of char-array $T$ ." For reference, have a look at this solution. Replace the $+$ operator with $+=$ and see the time diffference!
2. Tester's $O(|S|log(|S|))$ solution is discussed here. When $freq(a,S)> freq(b,S)$, what he does is, keep a track of $freq(a,S)- freq(b,S)$ at various points of the string, and then sort them. Now, we keep a variable $curr$ which stores the count excessive $'a$ $'$ we get after each appendation. He uses this to check at most how many prefixes more he can get by the condition $have[ptr - 1] + cur > 0$. Depending on that, he adds $ans += (int)have.size() - ptr;$ to the answer. In other words, tries to go checks the least value in of $freq(a,S)- freq(b,S)$ which $curr$, number of excessive $'a'$ $s$ can afford, and adds to answer accordingly. Same approach used for $freq(a,S)< freq(b,S)$ case, which is left for the readers to work out :).
3. Testers $O(|S|)$ solution is a bit similar. Firstly, Misha eliminated sorting, and replaced it with a cumulative summation'd prefix array. Meaning, he stores the frequency of each value of $freq(a,S)- freq(b,S)$ and does a cumulative sum of that array. Again, $curr$ holds $freq(a,S)- freq(b,S)$. Now, his expression is $ans += cnt[2 * SHIFT - 1] - cnt[SHIFT - cur];$, meaning "Max possible contribution - Contributions which cannot be afforded due to low curr". The solution is same in essence to the first one, but he cleverly got rid of sorting, leading to a much simpler $O(|S|)$ approach.
4. Now you have seen two optimized approaches, I will have a question for you. Have a detailed look at the time complexity I mentioned. No doubt, my solution does ,plain and simple, have a complexity of at least $O(min(N,1000)*|S|)$ , but dont you get something fishy? There is a, very common, and intentional error involved here :). Note that, there is no sense in the $"1000"$ I kept in $O(min(N,1000)*|S|)$. Ask yourself, what is this $1000?$ It is nothing but to denote $|S|$!! Hence, the real complexity would be $O(min(N,|S|)*|S|) \equiv O({|S|}^{2})$. Dont be like this editorialist when computing time complexity xD.
5. The string in this question was made up of only $2$ characters. Replace $'a'$ and $'b'$ with $1$ and $0$ and you have a binary string. Some questions on similar topics-