Thursday, August 16. 2012
I am now acting on an older idea that I proposed on this blog: discussing programming and software engineering topics related to HPC in general and Madagascar in particular. Feedback in the comments is more than welcome :-)
Today's discussion is about the necessity of light object orientation in many geophysical applications.
Like medicine, object orientation can be very useful, but when used inappropriately or in large quantities can lead to severe problems.
One case that calls for this medicine is when the same algorithm applies to 2-D and 3-D data, with minimal changes other than adding an extra loop.
Geophysicists writing code usually either create two versions of the program, or write only a 3-D version that also handles 2-D data.
Creating two versions of the program leads to severe maintenance issues in the long term. The potential for bugs and for the two versions to drift apart is enormous. There are also other variations (anisotropy, alternative kernels, etc.), so the amount of code duplication can easily explode.
Writing a 3-D version that also works on 2-D data is a better option. However, this usually requires contortions to introduce length-1 axes upfront in some of the inputs, and may create outputs with different dimensions for 2-D vs. 3-D, breaking workflows. The biggest obstacle, however, is that the 3-D code is more verbose, harder to understand and keep in mind in its entirety, and more prone to bugs. This is why people first implement a feature in the 2-D program, test it, understand it, and only then port it to the 3-D program. The simplicity of 2-D programs also comes in handy when explaining the algorithm to other people.
The way to have your cake and eat it too is through overloading and light object orientation. This makes it possible to write a program that looks like a 2-D program, but also works on 3-D data.
Neither C nor F90 is truly object-oriented, but both have user-defined structures and static global variables that can be used to remember whether the problem is 2-D or 3-D. These features should be enough to work out a solution for this light OO problem. To convert pairs of programs that already exist in both 2-D and 3-D versions, one would first define structures that have both x and y members (e.g. offset.x and offset.y), then convert both programs to use those instead of the native types (e.g. offset_x and offset_y). Next, a test for whether the data is 2-D or 3-D is implemented, and the result is stored in a global variable so that it can be accessed by various procedures without being passed as an explicit argument. Then, code sequences that have extra operations for 3-D are moved into functions that are in effect overloaded: they call a 3-D operator for 3-D data, and a 2-D operator for 2-D data. In the end, the 2-D and 3-D main programs should look the same, and what is left is removing one file, renaming the other, and keeping only one version.
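The conversion steps above can be sketched in C roughly as follows. This is only a minimal illustration, not code from Madagascar: the names pair, geom_init, is3d and offset_sq are hypothetical, and the sketch assumes that data is treated as 3-D whenever the second spatial axis is longer than one sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lightweight "object": a quantity with x and y members,
   e.g. offset.x and offset.y. In 2-D the y member is simply ignored. */
typedef struct {
    float x;
    float y;
} pair;

/* File-scope flag that remembers whether the problem is 2-D or 3-D.
   It is set once and read by the dispatching functions below, so it
   never has to be passed around as an explicit argument. */
static int is3d = 0;

/* Assumed test for dimensionality: 3-D if the y axis has more than
   one sample (ny is the length of that axis). */
void geom_init(size_t ny)
{
    assert(ny >= 1);
    is3d = (ny > 1);
}

/* 2-D operator: only the x component contributes. */
static float offset_sq_2d(pair h)
{
    return h.x * h.x;
}

/* 3-D operator: the same computation plus the extra y term. */
static float offset_sq_3d(pair h)
{
    return h.x * h.x + h.y * h.y;
}

/* "Overloaded" entry point: the main program calls this one name and
   never needs to know which version runs. */
float offset_sq(pair h)
{
    return is3d ? offset_sq_3d(h) : offset_sq_2d(h);
}
```

A main program would call geom_init once after reading the axis lengths, and from then on use offset_sq without any 2-D or 3-D branching of its own. The global flag keeps call sites short, at the cost of hidden state; passing the flag inside a structure would be the more conservative alternative.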
What do you think is the best way to avoid code duplication in geophysical applications?
IWAVE is built in this "light OO" fashion, plus a bit more. The backbone consists of structs realizing key FD types and associated functions for manipulating them - lightweight "objects", with static members acting as private data, as you describe. IWAVE also uses a form of runtime polymorphism (inheritance): for the central structs, the corresponding functions are accessed via function pointers which are members of the struct, and must be initialized by calling a "constructor", so that these "class methods" are accessible. This setup realizes an abstract type with concrete subtypes. IWAVE uses it to abstract a large part of the essential parallel FD code so that it can be shared by all apps of the class.
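The function-pointer arrangement described in this comment can be sketched in C as below. This is not IWAVE's actual API: the names fd_kernel, fd_kernel_make and lap are hypothetical, and the sketch assumes unit grid spacing, a second-order Laplacian, and that the caller only evaluates interior points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "abstract type": grid sizes plus a method pointer. */
typedef struct fd_kernel {
    size_t n1, n2, n3;   /* axis lengths; n3 == 1 means 2-D */
    /* "class method": discrete Laplacian of u at flat index i */
    float (*lap)(const struct fd_kernel *self, const float *u, size_t i);
} fd_kernel;

/* Concrete 2-D subtype: 5-point stencil. */
static float lap2d(const fd_kernel *k, const float *u, size_t i)
{
    return u[i - 1] + u[i + 1] + u[i - k->n1] + u[i + k->n1]
           - 4.0f * u[i];
}

/* Concrete 3-D subtype: 7-point stencil, two extra neighbours. */
static float lap3d(const fd_kernel *k, const float *u, size_t i)
{
    size_t s = k->n1 * k->n2;   /* stride along the third axis */
    return u[i - 1] + u[i + 1] + u[i - k->n1] + u[i + k->n1]
           + u[i - s] + u[i + s] - 6.0f * u[i];
}

/* "Constructor": must be called before the method pointer is usable.
   It picks the concrete implementation once, at run time. */
fd_kernel fd_kernel_make(size_t n1, size_t n2, size_t n3)
{
    assert(n1 > 0 && n2 > 0 && n3 > 0);
    fd_kernel k = { n1, n2, n3, (n3 > 1) ? lap3d : lap2d };
    return k;
}
```

Code that only sees an fd_kernel calls k.lap(&k, u, i) and stays identical for both dimensionalities; the choice of stencil is made once in the constructor rather than at every call site.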
I also believe that there is a role for genuine, gold-plated OO in contexts where great differences in abstraction level exist. The main opportunity in our business is simulation-driven optimization, also known as inversion.
I am not sure if I properly understand what "light" object orientation is. It is the term "light" that confuses me.
Thanks a lot.
Sorry for the confusing term. By it I mean classes and methods that are short and simple, with no complex relationships or inheritance hierarchies, just one step up from classic procedural programming, so they are easy to understand, write and debug even for a novice.