WG211/M23Kiselyov
From WG 2.11
The Mysteries of AXPY (Oleg Kiselyov)
AXPY is one of the vector operations of the Basic Linear Algebra Subprograms (BLAS): the scaled vector addition aX+Y, where a is a scalar and X and Y are vectors. It is a perfect target for classical optimizations like partial loop unrolling and scalar promotion. (AXPY is also embarrassingly parallel; however, this talk focuses on single-thread performance.) These optimizations are indeed carried out, by hand, in OpenBLAS, regarded as one of the two fastest BLAS implementations. One can make a case for automatic code generation to reduce the tedium of applying such optimizations, given that there are many platforms and several AXPY varieties to optimize: SAXPY, DAXPY, CAXPY. This is the traditional elevator pitch for metaprogramming in HPC.
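To make the two optimizations concrete, here is a minimal sketch in C of a double-precision AXPY on unit-stride vectors, first as a plain loop and then partially unrolled by four with the array elements promoted to local scalars. The function names `daxpy_ref` and `daxpy_unrolled`, the unroll factor, and the no-aliasing assumption expressed by `restrict` are illustrative choices, not taken from OpenBLAS or from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: y[i] = a * x[i] + y[i], unit stride. */
void daxpy_ref(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hand-optimized sketch (hypothetical, for illustration only):
 * the loop body is unrolled by 4, and each x[i], y[i] is loaded
 * into a local scalar before use (scalar promotion).
 * Assumes x and y do not overlap, as stated by `restrict`. */
void daxpy_unrolled(size_t n, double a,
                    const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    size_t i = 0;
    size_t main_end = n - n % 4;   /* largest multiple of 4 not above n */

    for (; i < main_end; i += 4) {
        double x0 = x[i],     y0 = y[i];
        double x1 = x[i + 1], y1 = y[i + 1];
        double x2 = x[i + 2], y2 = y[i + 2];
        double x3 = x[i + 3], y3 = y[i + 3];

        y[i]     = a * x0 + y0;
        y[i + 1] = a * x1 + y1;
        y[i + 2] = a * x2 + y2;
        y[i + 3] = a * x3 + y3;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The tedium referred to above is visible even at this scale: the unrolled body, the remainder loop, and the choice of unroll factor would have to be repeated and retuned for each element type and platform.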
How does this story correspond to real life, in this day and age?