Finding array pivot points and measuring complexity
Abstract
First, the news: I’m now on the job market, and C++ beckons. Secondly, some fun problems to consider.
This week, I’ve been job-hunting like crazy, dashing between loads of interviews. I am happy that one of the first places I heard back from offered me a job on the spot, and they are easily in the top three or four companies I applied to in preference (with none strictly higher). So, it looks probable I will be working on an enterprise server product here in Cambridge next year.
The interviews have been dragging me back down to earth and forcing me to brush up on my rusty programming skills. C++ in particular has been a triumph in theory and troublesome in practice. I present here two interesting problems encountered in the last week.
1. A fun game
The most accessible problem is here first. There is a hat, with the numbers 1 to 100 on slips of paper. Drawing a slip costs £1, but gives you the option of claiming the amount on it in winnings, or putting it back. Once you claim your winnings, the game is over. What is your strategy, and how much do you win? I make the average winnings 1135/13 exactly (about £87.31), though I am not confident that all my floors and ones are in the right place. (Knowing you can ignore the special case of pay-off being an integer might make your sketch easier.) On average, you accept the 50/7-th number drawn.
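If you don’t trust the arithmetic (I only half do), a Monte Carlo check is quick to write. A minimal sketch, under one reading of the rules: draws are with replacement, every draw costs £1 (including the one you finally keep), and the strategy is a fixed threshold, putting back anything below k:

#include <cstdio>
#include <cstdlib>
#include <ctime>

// Monte Carlo sketch of the hat game. Assumptions: slips are drawn
// with replacement, every draw costs 1 (including the slip you keep),
// and we play a fixed threshold strategy: put back anything below k.
double averageWinnings(int k, int trials) {
    long long total = 0;
    for (int i = 0; i < trials; ++i) {
        int draw;
        do {
            --total;                 // pay 1 for this draw
            draw = rand() % 100 + 1; // uniform on 1..100
        } while (draw < k);
        total += draw;               // claim the slip; game over
    }
    return (double)total / trials;
}

int main() {
    srand(time(NULL));
    for (int k = 82; k <= 92; ++k)
        printf("k = %d: %.3f\n", k, averageWinnings(k, 1000000));
}

Sweeping k puts the peak around k = 87, worth about £86.4 under this payment model, roughly a pound shy of my 1135/13; whether you pay for the slip you keep is exactly the kind of off-by-one I am not confident about.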
2. Finding array pivots
Google had a very interesting process: they sent me an online test at a certain website rhyming with agility (to prevent Google picking this up; the terminology here is mine too). The problems were pretty easy, but I fiffed and faffed with the STL so much that I nearly ran out of thinking time. The most interesting problem was as follows, because it spawned some thoughts and questions.
Posed: given an array as a vector<int>, return the index of a pivot element, one which is no smaller than everything to its left and no larger than everything to its right, or -1 if there is none. For example, in [3, 1, 2, 5, 6] index 3 is a pivot: 5 is at least as big as 3, 1 and 2, and no bigger than 6. You answer in a bare web editor, so you are expected to reach for vector<>, iterators, and so on from memory. If you know the various STL containers and their methods reasonably well, you would do fine (I could certainly score much higher now that I have written 100 lines of C++ in the last twelve months instead of 0).
I present my code here, for various reasons.
#include <vector>
#include <set>
#include <list>
#include <sstream>
#include <iostream>
#include "cycle-min.h"
#include <ctime>
#include <cstdlib>

using namespace std;

/*
 * Our naive method. Not great: does a load of repetitive
 * searching and ends up with O(n^2) complexity.
 */
int arrayPivotNaive(vector<int> &A) {
    int n = A.size();
    if (n < 1) return -1;
    int lmax = A[0];
    for (int i = 0; i < n; ++i) {
        if (A[i] < lmax) continue; // not >= everything to its left
        lmax = A[i];
        bool rmin = true;
        for (int j = i + 1; j < n; ++j) {
            if (A[i] > A[j]) { // something smaller to its right
                rmin = false;
                break;
            }
        }
        if (rmin) return i;
    }
    return -1;
}

/*
 * What I coded up in the test (there with an error: I skipped the
 * erase on the continue path, leaving stale left-hand elements in
 * the multiset; corrected here). Still O(n log n) performance,
 * which I thought was about right.
 */
int arrayPivotBetter(vector<int> &A) {
    int n = A.size();
    if (n < 1) return -1;
    int lmax = A[0];
    multiset<int> rmin(A.begin() + 1, A.end()); // n logn right there.
    for (int i = 0; i < n - 1; ++i) {
        if (A[i] >= lmax) {
            lmax = A[i];
            if (A[i] <= *rmin.begin()) return i;
        }
        rmin.erase(rmin.find(A[i + 1])); // log n; line is run n times
    }
    if (A[n - 1] >= lmax) return n - 1;
    return -1;
}

/*
 * Clearly the optimal answer: O(n) time.
 */
int arrayPivotBest(vector<int> &A) {
    int n = A.size();
    if (n < 1) return -1;
    // Backwards pass: indices no larger than everything to their
    // right are the only possible pivots.
    list<int> possPivots;
    int rmin = A[n - 1];
    for (int i = n - 1; i >= 0; --i) {
        if (A[i] > rmin) continue;
        rmin = A[i];
        possPivots.push_front(i);
    }
    // Forwards pass: the first possible pivot which is also no
    // smaller than everything to its left wins.
    int lmax = A[0];
    for (int i = 0; i < n; ++i) {
        bool poss = false;
        if (possPivots.front() == i) {
            poss = true;
            possPivots.pop_front();
        }
        if (A[i] < lmax) continue;
        lmax = A[i]; // track the running maximum
        if (poss) return i;
    }
    return -1;
}

void addTest(vector< vector<int> > &tests, string test) {
    stringstream ss(test);
    vector<int> t;
    int next;
    while (ss >> next) t.push_back(next);
    tests.push_back(t);
}

void testComplexity(int (*fun)(vector<int>&)) {
    vector< pair<int, double> > samples;
    long length = 1;
    double t;
    clock_t ts;
    srand(time(NULL));
    do {
        length *= 2;
        vector<int> test;
        // It is hard to pick random input which is not 'kind'; by
        // inspection, it is clear that the vector [1,2,...,n-1,0]
        // selects worst-case time for the naive algorithm, and
        // unsorted hits the multiset one badly.
        for (int i = 0; i < length; ++i)
            test.push_back(i == length - 1 ? 0 : i + 1); // rand()%length + 1
        ts = clock();
        ticks tick0 = getticks();
        (*fun)(test);
        t = elapsed(getticks(), tick0);
        ts = clock() - ts;
        samples.push_back(pair<int, double>(length, t));
        cout << length << "," << t << endl;
    } while (length < 67108865 && ts < 30 * CLOCKS_PER_SEC);
}

int main() {
    vector< vector<int> > tests;
    // Some basic conformance tests:
    addTest(tests, "");
    addTest(tests, "1");
    addTest(tests, "1 2");
    addTest(tests, "2 1");
    addTest(tests, "3 5 4");
    addTest(tests, "3 4 2");
    addTest(tests, "3 4 3");
    addTest(tests, "3 4 4");
    addTest(tests, "3 4 5");
    addTest(tests, "3 3 2");
    addTest(tests, "3 3 3");
    addTest(tests, "3 3 4");
    addTest(tests, "3 2 1");
    addTest(tests, "3 2 2");
    addTest(tests, "3 2 3");
    addTest(tests, "3 2 4");
    addTest(tests, "3 1 2");

    cout << "***** Conformance tests *****\n*****\n";
    for (vector< vector<int> >::iterator it = tests.begin(), end = tests.end();
         it != end; ++it) {
        cout << (*it).size() << ": ";
        for (vector<int>::iterator val = (*it).begin(), valend = (*it).end();
             val != valend; ++val) {
            cout << *val << " ";
        }
        cout << " | " << arrayPivotNaive(*it)
             << " " << arrayPivotBetter(*it)
             << " " << arrayPivotBest(*it) << " " << endl;
    }

    // Now we want to check the complexity:
    cout << "\n\n***** Complexity tests *****\n*****\n";
    testComplexity(&arrayPivotNaive);
    testComplexity(&arrayPivotBetter);
    testComplexity(&arrayPivotBest);
}
To test, I wrote up the obvious answer in a few minutes; no particular C++ knowledge required: it’s just for loops. I don’t know why I fixated on the next algorithm, but it’s the one I went for during the test. The third one came to me just after and is optimal.
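To see the optimal one at work, take [3, 1, 2, 5, 6] again: the backwards pass keeps exactly the indices whose values are no bigger than everything to their right, namely 1, 2, 3 and 4; the forwards pass then walks the running maximum (3, 3, 3, 5, 6) and returns the first kept index whose value is no smaller than everything before it, which is index 3 (the 5).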
One of the nice things about ***ility is that it marks your code according to various criteria, one of which is how the algorithm scales. As practice, I started to write a test harness to see if I could do it too.
(Incidentally, as Google fodder: I had to hunt for a while to find an accurate way of timing how long a program takes to execute in C++, using high-resolution timers better than the ordinary ctime functions. The answer is extremely easy: Boost has it, or better still there is a file cycle.h to download which does the magic of detecting your compiler and grabbing the data from your processor’s most accurate register.)
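The pattern is tiny. A minimal sketch, assuming the copy of cycle.h distributed with FFTW (other copies float around); it defines a ticks type, getticks() and elapsed():

#include <cstdio>
#include "cycle.h" // FFTW's cycle.h: a ticks type, getticks(), elapsed()

int main() {
    ticks t0 = getticks();
    for (volatile int i = 0; i < 1000000; ++i) {} // code under test
    printf("%.0f ticks\n", elapsed(getticks(), t0)); // ticks, not seconds
}

Note that elapsed gives you a difference in ticks, not seconds; for comparing growth rates, that is all you need.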
Now, this is the open point: how do you actually work out empirically the performance of an algorithm? It takes more than just random data to work out the worst-case running time (because a random array is very unlikely to have any pivots, and the naïve algorithm works this out in linear time). Even assuming you could find a way to heuristically stress-test some code as a black box and work out what sort of input it favoured, there is another question I am considering: given some data, can we actually tell the difference between, say, Θ(n) and Θ(n log n)?
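The polynomial half of that question is a straight line on a log-log plot: fit log t against log n and read off the slope. A minimal sketch (the timings here are fabricated for an imaginary O(n^2) algorithm, and fitExponent is my name for it, nothing standard):

#include <cmath>
#include <cstdio>
#include <vector>
#include <utility>
using namespace std;

// Least-squares fit of log t against log n: for t ~ C * n^k the slope
// of the regression line estimates the exponent k.
double fitExponent(const vector< pair<long, double> > &samples) {
    int m = samples.size();
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < m; ++i) {
        double x = log((double)samples[i].first); // log n
        double y = log(samples[i].second);        // log t
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    return (m * sxy - sx * sy) / (m * sxx - sx * sx);
}

int main() {
    // Fabricated timings for an imaginary O(n^2) algorithm.
    vector< pair<long, double> > samples;
    for (long n = 1024; n <= 65536; n *= 2)
        samples.push_back(make_pair(n, (double)n * n / 1e6));
    printf("estimated exponent: %.2f\n", fitExponent(samples)); // ~2.00
}

Feeding it the (length, time) pairs printed by testComplexity would give the exponent directly; an n log n algorithm, though, shows up as a slope only a shade above 1.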
The different polynomial times can be distinguished very easily, but
as
discussed here,