tests, run on a GeForce 6600 AGP with 128 MB on-card memory, 512 MB system memory, Athlon XP 1.667 GHz

HDTV source resolution 1140x1080 for simple, 1140x912 for the rest; target resolution 1790x1140 for both

-simple -none	...	4.3 msec
-simple -gray	...	4.4 msec
-simple -skin	...	9.2 msec

-geom -none	...	5.8 msec
-geom -gray	...	5.9 msec
-geom -skin	...	8.9 msec

(tessellation 128*128)
-polynet -none	...	2.0 msec
-polynet -gray	...	2.0 msec
-polynet -skin	...	5.3 msec

(tessellation 128*128, perspective correction)
-polynetpersp -none	...	1.6 msec
-polynetpersp -gray	...	1.7 msec		(!!wow!!)
-polynetpersp -skin	...	5.1 msec

-persp -none	...	7.3 msec
-persp -gray	...	7.3 msec
-persp -skin	...	11 msec

	The transformation shaders in the ami demo can be subdivided into two categories. The first is a group of fragment shaders that take the texture with the mirror image as source and draw the transformed result directly. To do this, we draw a fullscreen quad with proper texture coordinates, which are then used as input for the calculations. This way of calculating is not very effective because of computational redundancy: the distance from the center of the mirror is a function of the y-coordinate in the resulting image, whereas the angle of the half-axis from which the pixel is read is a function of the corresponding x-coordinate, so in theory we would only need to transform all x-coordinates and all y-coordinates once to get the very same results. That would, however, require additional textures and context switches, so we settled for the simpler solution.
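The redundancy described above can be sketched on the CPU. This is a minimal illustration, not the demo's actual shader code: the mapping functions, parameter names, and mirror geometry are assumptions chosen only to show that the per-pixel evaluation repeats work that two 1-D passes would do once per row and once per column.

```python
import math

# Illustrative parameters (not taken from the demo source); the target
# resolution matches the table above.
W, H = 1790, 1140          # target resolution
CX, CY = 0.5, 0.5          # assumed mirror center in texture coordinates
R_MIN, R_MAX = 0.1, 0.5    # assumed radial range of the mirror image

def radius_for_row(y):
    """Distance from the mirror center depends only on the output row."""
    return R_MIN + (R_MAX - R_MIN) * (y / (H - 1))

def angle_for_column(x):
    """Angle of the half-axis depends only on the output column."""
    return 2.0 * math.pi * (x / W)

def texcoord(x, y):
    """What the fragment shader evaluates per pixel: W*H evaluations,
    although only W distinct angles and H distinct radii exist."""
    r, a = radius_for_row(y), angle_for_column(x)
    return (CX + r * math.cos(a), CY + r * math.sin(a))

# The separable alternative: two 1-D tables, only W + H evaluations total.
angles = [angle_for_column(x) for x in range(W)]
radii = [radius_for_row(y) for y in range(H)]

def texcoord_separable(x, y):
    r, a = radii[y], angles[x]
    return (CX + r * math.cos(a), CY + r * math.sin(a))
```

Both paths yield identical texture coordinates; the separable one just avoids recomputing the same sine/cosine inputs for every pixel in a row or column.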
Into the second category fall those shaders that calculate the transformed texture coordinates in a vertex shader (i.e. not for every pixel on screen), with the results linearly interpolated across each primitive. Of course, the transformations we are working with are anything but linear, so we can't get away with a fullscreen quad any longer; we need a reasonably fine subdivision of it. Our implementation uses a vertex buffer object to store the vertices of a big quad strip (in the fast memory of the graphics card), which we draw instead of the fullscreen quad. It is not an ideal solution, but as an advantage we can use the anisotropic filtering provided by the hardware. (An anisotropic filter takes the sampling density into account, which results in elliptic samples (multiple points are sampled) rather than circular ones.) This was considered quite a good solution, because with the quad tessellated into 128 * 128 sub-quads and perspective-correct projection, the whole transformation doesn't take much more than 1.5 milliseconds (see the table).
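The tessellated grid that replaces the fullscreen quad can be sketched as below. This is only an assumed layout (horizontal quad strips, normalized coordinates), not the demo's real vertex format; in the demo the resulting vertices live in a VBO and the texture coordinates are computed per vertex.

```python
N = 128  # sub-quads per side, as in the table above

def grid_strips(n):
    """Return one list of (x, y) vertices per horizontal quad strip,
    ordered for GL_QUAD_STRIP-style drawing (bottom/top vertex pairs)."""
    strips = []
    for j in range(n):
        strip = []
        for i in range(n + 1):
            x = i / n
            strip.append((x, j / n))        # bottom edge vertex
            strip.append((x, (j + 1) / n))  # top edge vertex
        strips.append(strip)
    return strips

strips = grid_strips(N)
# 128 strips, each with 2 * (128 + 1) = 258 vertices
```

With this layout the vertex shader runs roughly 128 * 258 times per frame instead of the fragment shader doing the full transformation 1790 * 1140 times, which is where the speedup in the table comes from.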
Looking back, that is quite a lot of time saved, so we are reconsidering transforming the x and y coordinates separately. We could then cache the results in a texture until the parameters of the transformation change (and by omitting the position and radius of the mirror from those pre-computed values, we could possibly avoid most of the recalculations).
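The proposed caching scheme might look like the following sketch, with plain lists standing in for the 1-D textures. The class and parameter names are illustrative assumptions, not the demo's real ones; the point is only that the per-axis tables are rebuilt solely when the transformation parameters actually change.

```python
import math

class SeparableTransformCache:
    """Cache per-axis lookup tables (stand-ins for 1-D textures) keyed by
    the transformation parameters; rebuild only when the key changes."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self._key = None
        self._angles = None
        self._radii = None

    def tables(self, r_min, r_max):
        key = (r_min, r_max)
        if key != self._key:  # parameters changed: recompute both tables
            self._angles = [2.0 * math.pi * x / self.width
                            for x in range(self.width)]
            self._radii = [r_min + (r_max - r_min) * y / (self.height - 1)
                           for y in range(self.height)]
            self._key = key
        return self._angles, self._radii

cache = SeparableTransformCache(1790, 1140)
a1, r1 = cache.tables(0.1, 0.5)
a2, r2 = cache.tables(0.1, 0.5)  # same parameters: cache hit, no rebuild
a3, r3 = cache.tables(0.1, 0.6)  # changed parameter: tables are rebuilt
```

Keeping the mirror position and radius out of the cached values, as the note above suggests, would make the cache key smaller still, so per-frame mirror movement would not invalidate the tables at all.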

