...or the dark side of binary search, how it relates to reinforcement learning, and why it's all so cool

[Cover image: a blindfolded man searching]

<aside> 🖊️ Authors: Valery Khamenya, Maksim Dzemidovich Updated: 23.08.2022

</aside>

Introduction

In real-world problems, you sometimes need to estimate, quickly and in real time, a real-valued parameter hidden from observation, and to estimate it from nothing but a poor binary response from the environment: essentially, an "undershoot/overshoot" or "hot/cold" answer.

It's even worse (and even more realistic) if you know nothing about the parameter you're guessing. That is, a priori the sought parameter is virtually unbounded on the real number line. And it's worse still when the sought parameter doesn't stand still while you're trying to guess it, but in the meantime jumps back and forth, on its own unknown business, to any cosmic distance. Note that there is no smoothness or differentiability to dream of here; we are not even hoping for continuity. What a difficult optimization problem for an analytical solution! However, we will get to the full-fledged random process in the second part of this article.

Valery's remark:

When this problem landed on my desk around 2010, I naively thought: "Empirically, it's easy! It's just like an ordinary binary search on a segment, except that outside the segment, instead of halving, you have to double. Also, instead of keeping the ends of the search segment, one could work with a single vector step, which should be doubled if we're going in the right direction. If we're going in the wrong direction, it should be halved with a simultaneous sign reversal toward the correct direction: to the right on an undershoot, to the left on an overshoot. The convergence will probably be exponential, and the number of steps logarithmic, with appropriate disclaimers."
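The first half of the remark, a segment search that doubles outward until the target is bracketed, can be sketched as follows. All names here, including the `oracle` callback that stands in for the environment's undershoot/overshoot response, are our illustration, not code from the original experiments. For a fixed constant this bracketed form is just exponential (galloping) search plus ordinary bisection, and it does converge; it is the segment-free, vector-step variant from the second half of the remark that the rest of this part examines.

```python
def bracketed_guess(oracle, lo=-1.0, hi=1.0, iters=100):
    """Bisection extended with outward doubling, driven only by a
    binary response: oracle(x) is -1 if x undershoots the hidden
    value h, 0 if x == h, and +1 if x overshoots it."""
    # Grow the bracket (doubling its width) until it contains h.
    while oracle(lo) > 0:                 # even lo overshoots: go left
        lo, hi = lo - 2 * (hi - lo), lo
    while oracle(hi) < 0:                 # even hi undershoots: go right
        lo, hi = hi, hi + 2 * (hi - lo)
    # Ordinary binary search on the bracket [lo, hi].
    for _ in range(iters):
        mid = (lo + hi) / 2
        r = oracle(mid)
        if r == 0:
            return mid                    # exact hit
        if r < 0:
            lo = mid                      # mid undershoots h
        else:
            hi = mid                      # mid overshoots h
    return (lo + hi) / 2
```

For example, `bracketed_guess(lambda x: (x > 1234.567) - (x < 1234.567))` homes in on `1234.567` even though the starting bracket `[-1, 1]` is nowhere near it.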

Here the thoughtful reader might think: "All you had to do was take a simple Kalman filter. Well, maybe tweak it a bit, and that's it!" Take your time, thoughtful reader: there is a suspicion that you can't take this problem with your bare Kalman hands.

Nothing of the kind! This thing didn't converge at all! It is worth emphasizing that it did not converge even in the gentle case where the conceived number does not change while it is being guessed.

Even more interestingly, convergence did appear when less beautiful multipliers were used instead of doubling and halving.

Anyway, some 10 years later I dug this problem out, and Maksim Dzemidovich and I set about catching our parameters.

Part 1. Guessing a real constant

So, let's start with our gentle case: some real constant is conceived, and we try to guess (estimate) it from a poor binary response from the environment (undershoot/overshoot, hot/cold). This corresponds to a stationary stochastic process with zero variance and a mean equal to the conceived constant.

Search Non-Convergence with Dichotomy and Doubling of a Step

What did the original search algorithm with dichotomy and step doubling look like?

Let

$h$ be our unknown search parameter (and so far it does not even change over time),

$x_t$ be our estimate of the sought parameter $h$ at step $t$,

$u_t$ be the response of the environment to our estimate $x_t$ of the parameter $h$ ("$-1$" for undershoot, "$1$" for overshoot),

$\delta_t$ be the amount by which $x_t$ is changed at step $t$, and

$a$ be some positive constant, $a\in\mathbb{R}, a>0$

Then the original model evolved by performing the following updates sequentially in a loop:

$$ \begin{equation}u_{t+1}=\begin{cases}-1, &x_t<h\\0,&x_t=h\\1, &x_t>h\end{cases}\end{equation} $$

$$ \begin{equation}\delta_{t+1}=\begin{cases}a\delta_t, &u_{t+1}\delta_t<0\\0,&u_{t+1}=0\\-\frac{\delta_t}{a}, &u_{t+1}\delta_t>0\end{cases} \end{equation} $$

$$ \begin{equation}x_{t+1} = x_t + \delta_{t+1}\end{equation} $$

In the initial naive idea, $a$ was equal to $2$. That is, when, while moving to the right, we got an "undershoot", i.e. we needed to accelerate, the step was doubled. When we needed to react to an "overshoot", the step was reversed and halved. When moving to the left, everything is similar: we just swap the roles of "undershoot" and "overshoot".
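This loop is easy to reproduce. Below is a minimal Python sketch of updates (1)-(3); the function name and the particular initial values are our illustration, not the original experimental code.

```python
def simulate(h, x0, delta0, a=2.0, steps=200):
    """Iterate updates (1)-(3): the environment answers
    u = sign(x - h); while the answer says "keep going" the step is
    multiplied by a, and on an overshoot it is reversed and divided
    by a.  Returns the trajectory of estimates x_t."""
    x, delta = x0, delta0
    traj = [x]
    for _ in range(steps):
        u = (x > h) - (x < h)     # +1 overshoot, 0 exact hit, -1 undershoot
        if u == 0:
            break                 # guessed exactly; delta would become 0
        if u * delta < 0:         # still approaching h: accelerate
            delta = a * delta
        else:                     # overshot h: reverse and slow down
            delta = -delta / a
        x += delta
        traj.append(x)
    return traj

# With a = 2 and a generic starting step, the error does not die out:
# the estimate falls into a short cycle around h.
cycle = simulate(h=0.0, x0=1.0, delta0=-0.7)

# With a specially chosen step, every move happens to overshoot, so the
# step is halved each time and the very same rule converges geometrically.
lucky = simulate(h=0.0, x0=1.0, delta0=-0.75)
```

In our runs, the first trajectory keeps looping through roughly the same four values near $h$ instead of settling on it, while the second converges. Which regime you land in depends delicately on the initial step, which is one way to see why the naive $a=2$ scheme is so unreliable.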