WG211/M14Puschel: Difference between revisions
|  Created page with "Raising the level of abstraction is a key concern of software engineering, and libraries (either used directly or as a target of a program generation system) are a successful tec..." | 
| (No difference) | 
Revision as of 16:06, 12 January 2015
Raising the level of abstraction is a key concern of software engineering, and libraries (either used directly or as a target of a program generation system) are a successful technique to raise programmer productivity and to improve software quality. Unfortunately successful libraries may contain functions that may not be general enough. For example, many numeric performance libraries contain functions that work on one- or higher-dimensional arrays. A problem arises if a program wants to invoke such a function on a non-contiguous subarray (e.g., in C the column of a matrix or a subarray of an image). If the library developer did not foresee this scenario, the client program must include explicit copy steps before and after the library function call, incurring a possibly high performance penalty. A better solution would be an enhanced library function that allows for the desired access pattern. Exposing the access pattern allows the compiler to optimize for the intended usage scenario(s). As we do not want the library developer to generate all interesting versions manually, we present a tool that takes a library function written in C and generates such a customized function for typical accesses. We describe the approach, discuss limitations, and report on the performance. As example access patterns we consider those most common in numerical applications: striding and block striding, general permutations, as well as scaling. We evaluate the tool on various library functions including filters, scans, reductions, sorting, FFTs, and linear algebra operations. The automatically generated custom version is in most cases significantly faster than using individual steps, offering speed-ups that are typically in the range of 1.2–1.8x
Joint work with Benjamin Hess and Thomas R. Gross