Whatever you choose to do, you can always improve speed by iterating a few times over sub-sampled images.
For example:
sub sample to 64x64,
find candidate matches
put candidates into a list of candidate locations.
sub sample to 128x128,
for the candidate location list from the previous iteration, search an area around the candidate locations
for these areas, put the best match into a new list
sub sample to 256x256,
for the candidate location list from the previous iteration, search a SMALLER area around the candidate locations
for these areas, put the best match into a new list
sub sample to 512x512,
for the candidate location list from the previous iteration, search an EVEN SMALLER area around the candidate locations
for these areas, put the best match into a new list
...
And so on and so forth, until you're up to max rez, and you're just fine-aligning within a couple pixels.
The difference in speed between this approach and just working in full rez all the time is dramatic (orders of magnitude).
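Here's a rough sketch of the loop above in Python with NumPy, just to make the idea concrete. It is not tuned code, and a few things are my own assumptions rather than anything fixed: the function names, the box-filter sub-sampling, the sum-of-squared-differences score (lower is better), and the `factors`, `keep` and `radius` defaults. The radius is fixed in pixels of the current level, so in full-rez terms the searched area shrinks at every step.

```python
import numpy as np

def downsample(img, factor):
    """Box-filter sub-sample by an integer factor (crops any remainder)."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def ssd(image, template, y, x):
    """Sum of squared differences with the template's top-left at (y, x)."""
    th, tw = template.shape
    patch = image[y:y + th, x:x + tw]
    return float(((patch - template) ** 2).sum())

def best_in_window(image, template, cy, cx, radius):
    """Best (score, y, x) within +/- radius of (cy, cx)."""
    max_y = image.shape[0] - template.shape[0]
    max_x = image.shape[1] - template.shape[1]
    # Clamp the centre so the window always holds at least one valid position.
    cy = min(max(cy, 0), max_y)
    cx = min(max(cx, 0), max_x)
    best = None
    for y in range(max(0, cy - radius), min(max_y, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(max_x, cx + radius) + 1):
            s = ssd(image, template, y, x)
            if best is None or s < best[0]:
                best = (s, y, x)
    return best

def coarse_to_fine(image, template, factors=(8, 4, 2, 1), keep=5, radius=2):
    """Return (score, y, x) of the best match at the finest level."""
    # Coarsest level: exhaustive search, keep the best few candidates.
    f = factors[0]
    img_s, tpl_s = downsample(image, f), downsample(template, f)
    max_y = img_s.shape[0] - tpl_s.shape[0]
    max_x = img_s.shape[1] - tpl_s.shape[1]
    scored = sorted(
        (ssd(img_s, tpl_s, y, x), y, x)
        for y in range(max_y + 1)
        for x in range(max_x + 1)
    )
    candidates = scored[:keep]
    prev = f
    # Finer levels: only search a small window around each earlier candidate.
    for f in factors[1:]:
        img_s, tpl_s = downsample(image, f), downsample(template, f)
        scale = prev // f  # how much coordinates grow between levels
        refined = [
            best_in_window(img_s, tpl_s, y * scale, x * scale, radius)
            for _, y, x in candidates
        ]
        candidates = sorted(set(refined))  # drop duplicates, best first
        prev = f
    return candidates[0]
```

Call it as `score, y, x = coarse_to_fine(image, template)` with two 2-D float arrays. Only the coarsest level is searched exhaustively; every later level costs just `keep` small windows, which is where the speed-up comes from.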
You can take a similar approach with your templates. If you want to find an object in any orientation, start with a small set of images covering just a few poses.
Match those to each spot, then if you've got a decent match score, look up (or generate) other similar poses, and try to match those.
And repeat again with more similar poses to your best matches from the last iteration, each time generating poses that are less and less different.
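The pose refinement can be sketched the same way. This is only an illustration for a single rotation angle, with made-up names: `score_fn` is a hypothetical callback (higher is better) that in a real system would rotate or look up the template for that angle and match it at the candidate spot, and `coarse_step`, `levels`, `keep` and `min_score` are assumed knobs, not anything standard.

```python
def refine_pose(score_fn, coarse_step=45.0, levels=5, keep=2, min_score=None):
    """Coarse-to-fine search over a rotation angle in degrees.

    score_fn(angle) -> float, higher is better (hypothetical callback).
    Returns (score, angle), or None if no pose reaches min_score.
    """
    # Start with a handful of widely spaced poses.
    n = int(round(360.0 / coarse_step))
    step = coarse_step
    scored = sorted(((score_fn(i * step), i * step) for i in range(n)),
                    reverse=True)
    for _ in range(levels):
        best = scored[:keep]
        if min_score is not None and best[0][0] < min_score:
            return None  # no decent match at this spot, so stop early
        step /= 2.0
        # Generate poses similar to the best ones, half as far apart as before.
        nearby = {(a + d) % 360.0 for _, a in best for d in (-step, 0.0, step)}
        scored = sorted(((score_fn(a), a) for a in nearby), reverse=True)
    return scored[0]
```

With the defaults this evaluates 8 coarse poses plus at most 6 per level, instead of the 256 poses a flat sweep at the final 1.40625 degree spacing would need.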
In 3D this is a bit easier because you can use 'spin images' to compute a signature of a set of points that is pose-independent (and, to an extent, resolution-independent).
http://www.cs.cmu.edu/~dhuber/projects/ ... indep.html
Maybe you could re-work that method for 2D, but I've never done it...
Computer vision is actually one of the most fun and satisfying things I've ever worked on. I hope you have as much fun with it as I did.
-scheherazade