Smart Search using ‘stored success’

I was doing a course in Dynamic Programming. I dug into it back in 1995, when I started working, but back then it was too difficult for me. There was no StackExchange in 1995; DP was university material, and there were only some C++ examples online. I worked in VB, mostly with object collections, and I had no idea about memoisation with arrays; I was no good with arrays. I broke my brain over it.

In 1999 I developed an accounting system, and its main processing routine took 4 hours (on a Pentium 1). I had to match 1,800 cost centers to 1,200 banking statements each month. Each cost center had 6 characteristics and each payment 5, so the maximum number of matching operations was 1,200 × 5 × 1,800 × 6 = 64.8 million. That run always fell in the last three days of the month, and I sometimes had to run corrections and regenerate the report, and it took 4 hours every time… that was too long…

I could use a stronger computer with a faster CPU and more and better memory and maybe speed it up 10%, or use SQL optimisation and gain another 5%, but that just didn’t achieve enough.

I had a weird idea: game AI! I got a manual on basic AI techniques in games; games were way ahead in applied AI techniques back then. It covered basic subjects like binary space partitioning and search space minimisation. And I found a routine stripped from the Doom engine that revolved around reusing ‘stored success’.

The idea is very simple: you store the last matching characteristic with the cost center. You did the 5×6 matching operations and out came the ‘winner’, so you stored the winning characteristic with the cost center. The contracts (and hence matching criteria like the contract code) only changed maybe once a year, so next time you could be over 90% sure it would match again on the same criterion; 11 out of 12 months it stays the same.
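
In code, the core of it looks something like the sketch below. This is just a minimal TypeScript illustration with made-up field and criterion names (the original was a VB routine against a database): each cost center remembers the criterion that matched last time, and the matcher tries that one before falling back to the full scan.

```typescript
// Minimal sketch of per-record 'stored success'. Criterion and field names are
// hypothetical; the real system was a VB/database routine.
type Criterion = "contractCode" | "postalCode" | "address" | "iban" | "amount";

interface CostCenter {
  id: number;
  fields: Record<Criterion, string>;
  lastWinner?: Criterion; // the criterion that matched last run
}

interface Statement {
  id: number;
  fields: Partial<Record<Criterion, string>>;
}

const ALL_CRITERIA: Criterion[] = ["contractCode", "postalCode", "address", "iban", "amount"];

function matchesOn(cc: CostCenter, st: Statement, c: Criterion): boolean {
  return st.fields[c] !== undefined && st.fields[c] === cc.fields[c];
}

// Try the stored winner first; only fall back to the full scan when it fails.
function findWinner(cc: CostCenter, st: Statement): Criterion | null {
  if (cc.lastWinner && matchesOn(cc, st, cc.lastWinner)) return cc.lastWinner;
  for (const c of ALL_CRITERIA) {
    if (matchesOn(cc, st, c)) {
      cc.lastWinner = c; // remember the new winner for the next run
      return c;
    }
  }
  return null;
}
```

The only ‘memory’ this needs is one extra column on the cost center table to hold the last winning criterion.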

It is mostly about chance, probability. “You are most likely to find the match here…”

Then you group the results by matching characteristic, and the next time you search you start with the most frequently successful one. I matched 600 times on postal code, making it the number 1 matching criterion. You first tried to match the 1,200 banking records on postal code and could immediately match say 580-590 statements in that first pass, then skip to the next most successful criterion, the address perhaps, or the contract code. That way you could on average immediately match 1,100 out of 1,200 statements and only had to do the elaborate 5×6 matching for the remaining 100 payment records against 1,800 cost centers: 5.4 million matching operations (plus about 1,200 per criterion processed in the quick passes), roughly 8% of the initial total operations. A sketch of the ordering step follows below.
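
Here is a hypothetical TypeScript fragment of that ordering step, not the original code: tally how often each criterion was the stored winner in the last run, and run the cheap single-criterion passes in that order, leaving only the leftovers for the full 5×6 routine.

```typescript
// Sketch of ordering the quick passes by past success.
// Criterion names and counts are illustrative, not the original system's.
type Criterion = "postalCode" | "address" | "contractCode" | "iban" | "amount";

// winners = the stored 'winning' criterion of every cost center from the last run.
function criteriaBySuccess(winners: Criterion[]): Criterion[] {
  const counts = new Map<Criterion, number>();
  for (const w of winners) counts.set(w, (counts.get(w) ?? 0) + 1);
  // Most frequently successful criterion first, e.g. postal code with 600 wins.
  return [...counts.keys()].sort((a, b) => counts.get(b)! - counts.get(a)!);
}

// Example: postal code won 600 times, address 280, contract code 120.
const winners: Criterion[] = [
  ...Array<Criterion>(600).fill("postalCode"),
  ...Array<Criterion>(280).fill("address"),
  ...Array<Criterion>(120).fill("contractCode"),
];
console.log(criteriaBySuccess(winners)); // ["postalCode", "address", "contractCode"]
```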

And whenever a different criterion turned out to be the match, you again stored that ‘winner’, and the next run would try it first. It automatically optimised itself.

In practice it meant a 97% reduction in average matching operations (compared to the maximum search), which brought the overall processing time down to 20 minutes. I could start the routine before the lunch break and pick up the results after lunch. That works a lot better; machines do not need lunch.

That isn’t classical DP, but it works on the same principle as memoisation: reusing ‘stored success’.
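
For contrast, classical memoisation caches the results of a pure function so each sub-problem is only computed once. A textbook sketch (Fibonacci, nothing to do with the payment matcher):

```typescript
// Classical memoisation: cache results of a pure function so each
// sub-problem is computed once instead of exponentially many times.
function memoFib(): (n: number) => bigint {
  const cache = new Map<number, bigint>();
  const fib = (n: number): bigint => {
    if (n < 2) return BigInt(n);
    if (!cache.has(n)) cache.set(n, fib(n - 1) + fib(n - 2));
    return cache.get(n)!;
  };
  return fib;
}

console.log(memoFib()(90)); // 2880067194370816120n, computed in linear time
```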

DRY: Don’t Repeat Yourself.

I am getting back on my feet, and one thing I had pending was another confrontation with DP.

FreeCodeCamp has a free course in JavaScript. Most other courses are in Java, as Java is more backend-oriented and JS is mainly used on the frontend.

They explain things with basic graphics, and that works for me; I suck at formulae and shorthand stuff, but the graphics make sense to my brain. “Explain it to me like I am 5 years old.”

DP, or rather optimisation in general, is common in (and even inherent to) any business. If you want to work as a general-purpose (database) coder for companies, it really helps to know DP and DRY.
