Difference between revisions of "GSOC2009"

From Madagascar
(31 intermediate revisions by 2 users not shown)

Latest revision as of 09:57, 15 March 2009

Welcome to Madagascar's Google Summer of Code Page

Google Summer of Code is a program that offers student developers stipends to write code for various open source projects. Google will be working with several open source, free software, and technology-related groups to identify and fund several projects over a three-month period.


Madagascar, an open source project, is a leading participant in the Open Research movement. As described on Wikipedia, the central theme of open research is to make clear accounts of the methodology, along with data and results extracted therefrom, freely available via the internet. This permits massively distributed collaboration.

Madagascar's design is based on a few simple and powerful principles.

From the coder's point of view, Madagascar is written in C and Python. The C code is a very loosely coupled set of Unix-style filters, each transforming standard input (stdin) into standard output (stdout). The Python code mostly implements a custom build system on top of SCons, a rule-based build system.
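
To illustrate the filter idea in a self-contained way, the sketch below models each program as a function that consumes a stream of samples and yields a new stream. A processing sequence then becomes a simple composition, much like a shell pipe. This is a plain-Python analogy, not Madagascar's API: real Madagascar programs are separate C executables. The names `scale`, `clip`, and `pipeline` are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A "filter" maps an input stream of samples to an output stream,
# mirroring a Unix program that reads stdin and writes stdout.
Filter = Callable[[Iterable[float]], Iterator[float]]


def scale(factor: float) -> Filter:
    """Hypothetical filter: multiply every sample by a constant."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        for x in stream:
            yield x * factor
    return run


def clip(limit: float) -> Filter:
    """Hypothetical filter: clamp every sample to [-limit, limit]."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        for x in stream:
            yield max(-limit, min(limit, x))
    return run


def pipeline(*filters: Filter) -> Filter:
    """Compose filters left to right, like `f1 | f2 | f3` in a shell."""
    def run(stream: Iterable[float]) -> Iterator[float]:
        return reduce(lambda s, f: f(s), filters, iter(stream))
    return run


if __name__ == "__main__":
    process = pipeline(scale(2.0), clip(3.0))
    print(list(process([0.5, 1.0, 2.0, -4.0])))  # [1.0, 2.0, 3.0, -3.0]
```

Each stage sees only a stream. As a result, stages can be reordered or reused in other sequences without modification, which is the composability the paragraph above attributes to the filter design.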

Seismic data processing consists of a sequence of steps. Madagascar's filter-based design allows such sequences to be easily composed and abstracted. A key advantage of the Madagascar system is that the computational pipeline is also treated as a build system. Modifying an intermediate step automatically reruns only the computations that depend on it and skips those that are still up to date. This mirrors a conventional build system, which recompiles modules whose source code has changed while reusing modules that are newer than their source. Madagascar extends this model all the way from raw data to publication.
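
The rebuild-only-what-changed behavior can be sketched with content signatures, similar in spirit to the content-based signatures SCons uses by default to decide whether a target is up to date. The toy `Pipeline` class below is hypothetical, not Madagascar or SCons code. Each step records a hash of its action and its input contents. On a later run, a step is re-executed only if that hash has changed.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Action = Callable[..., bytes]


class Pipeline:
    """Toy dependency-tracking pipeline (hypothetical, for illustration only).

    A target is rebuilt only when the signature computed from its action
    and the contents of its sources differs from the one recorded when it
    was last built.
    """

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}      # contents of every "file"
        self.signatures: Dict[str, str] = {}  # last signature of each target
        self.steps: List[Tuple[str, List[str], Action]] = []
        self.runs: List[str] = []             # log of targets actually rebuilt

    def flow(self, target: str, sources: List[str], action: Action) -> None:
        """Declare that `target` is produced from `sources` by `action`."""
        self.steps.append((target, sources, action))

    def _signature(self, sources: List[str], action: Action) -> str:
        # The action's name stands in for the command line a real build
        # system would hash; each source contributes a digest of its bytes.
        h = hashlib.md5(action.__name__.encode())
        for src in sources:
            h.update(hashlib.md5(self.data[src]).digest())
        return h.hexdigest()

    def build(self) -> None:
        """Run out-of-date steps; steps must be declared in dependency order."""
        for target, sources, action in self.steps:
            sig = self._signature(sources, action)
            if target in self.data and self.signatures.get(target) == sig:
                continue  # up to date: reuse the existing result
            self.data[target] = action(*(self.data[s] for s in sources))
            self.signatures[target] = sig
            self.runs.append(target)


def double(text: bytes) -> bytes:
    """Example step: double every integer in a whitespace-separated list."""
    return b" ".join(str(2 * int(v)).encode() for v in text.split())


def total(text: bytes) -> bytes:
    """Example step: sum a whitespace-separated list of integers."""
    return str(sum(int(v) for v in text.split())).encode()


if __name__ == "__main__":
    p = Pipeline()
    p.data["raw"] = b"1 2 3"
    p.flow("doubled", ["raw"], double)
    p.flow("sum", ["doubled"], total)
    p.build()                  # first run: both steps execute
    p.data["raw"] = b"1  2 3"  # cosmetic edit to the raw data
    p.build()                  # "doubled" reruns with identical output, so "sum" is skipped
    print(p.runs)              # ['doubled', 'sum', 'doubled']
```

Because signatures depend on contents rather than timestamps, an upstream edit that leaves an intermediate result byte-for-byte unchanged does not force the downstream steps to rerun.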

This strategy is key to reproducibility. Madagascar maintains scripts that contain every transformation from raw data to the final publication-quality document. This supports repeatability and testing of scientific computations, advancing the collaborative nature of science in the same way that open source advances the collaborative nature of computing.

Directions in which Madagascar is expanding include visualization, parallelization, and user interfaces.

Project Ideas

See also the feature request tracker.

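Graphical User Interface (Mentor: Sergey Fomel)

  • Add an option to sfdoc to output spec files in the format defined for TKSU. This should make TKSU immediately applicable. Spec files can be generated automatically at compile time.
  • Rewrite TKSU in Python, possibly using TkInter.
  • See http://sourceforge.net/forum/forum.php?thread_id=1579059&forum_id=552249 for further discussion.
  • Investigate alternative solutions.
  • Skills and interests: GUI, Python.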

Data Visualization (Mentor: Vladimir Bashkardin)

  • Migrate the OpenGL-based 2D rendering code from GSEGYView to Madagascar and create an interactive viewer with zooming and panning.
  • Migrate the GLSL-based 3D rendering code from GSEGYView to Madagascar and create a viewer that supports pluggable shader programs.
  • Finish the 3D ray viewer.
  • Create alternatives to the sfgraph, sfgrey, and sfcontour programs that use the PLPLOT library instead of VPlot. Create "pens" that read the output of those programs and generate PS, PDF, and PNG files. Analyze the flexibility of PLPLOT and whether it can fully mimic VPlot's output (including animation).
  • Skills and interests: scientific visualization, C.
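
For the interactive 2D viewer, zooming and panning reduce to two tasks: maintaining a visible window in data coordinates and mapping that window to screen pixels. The sketch below is a hypothetical helper that is independent of GSEGYView and OpenGL. It zooms about the point under the cursor so that point stays fixed on screen, a common convention for interactive viewers. The `Viewport` class and its methods are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Viewport:
    """Visible window [x0, x1] x [y0, y1] in data coordinates (hypothetical)."""
    x0: float
    x1: float
    y0: float
    y1: float
    width: int   # screen width in pixels
    height: int  # screen height in pixels

    def to_screen(self, x: float, y: float) -> Tuple[float, float]:
        """Map a data point to pixels; pixel row 0 corresponds to y0 (top)."""
        sx = (x - self.x0) / (self.x1 - self.x0) * self.width
        sy = (y - self.y0) / (self.y1 - self.y0) * self.height
        return sx, sy

    def to_data(self, sx: float, sy: float) -> Tuple[float, float]:
        """Inverse of to_screen."""
        x = self.x0 + sx / self.width * (self.x1 - self.x0)
        y = self.y0 + sy / self.height * (self.y1 - self.y0)
        return x, y

    def pan(self, dx_pixels: float, dy_pixels: float) -> None:
        """Move the picture by the given pixel offset (e.g. a mouse drag)."""
        dx = dx_pixels / self.width * (self.x1 - self.x0)
        dy = dy_pixels / self.height * (self.y1 - self.y0)
        self.x0 -= dx
        self.x1 -= dx
        self.y0 -= dy
        self.y1 -= dy

    def zoom(self, factor: float, sx: float, sy: float) -> None:
        """Zoom by `factor` (>1 zooms in), keeping the point under (sx, sy) fixed."""
        cx, cy = self.to_data(sx, sy)
        self.x0 = cx + (self.x0 - cx) / factor
        self.x1 = cx + (self.x1 - cx) / factor
        self.y0 = cy + (self.y0 - cy) / factor
        self.y1 = cy + (self.y1 - cy) / factor
```

A rendering backend would only need the current window to set its projection. This keeps the interaction logic testable without a GPU.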

Interactive UI (Mentor: Michael Tobis)

  • Build and test Python wrappers around existing functions to create a novel environment that is both interactive and reproducible.
  • Help refactor existing SCons scripts to reduce coupling and increase clarity.
  • Integrate with the IPython/Sage environment.
  • Skills and interests: strong OOP and test-driven development skills, Python; SCons experience is a big plus.
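
One reading of "both interactive and reproducible" is a wrapper layer that runs each step immediately while also recording it, so an exploratory session can later be replayed as a script. The `Session` class below is a hypothetical sketch of that idea, not an existing Madagascar interface. Functions decorated with `Session.wrap` stand in for existing processing functions.

```python
import functools
import json
from typing import Any, Callable, Dict, List


class Session:
    """Record every wrapped call so an interactive session can be replayed."""

    def __init__(self) -> None:
        self.log: List[Dict[str, Any]] = []
        self.registry: Dict[str, Callable[..., Any]] = {}

    def wrap(self, func: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator: run the function immediately and append the call to the log."""
        self.registry[func.__name__] = func  # unwrapped version, used by replay

        @functools.wraps(func)
        def recorded(*args: Any, **kwargs: Any) -> Any:
            self.log.append({"func": func.__name__, "args": list(args), "kwargs": kwargs})
            return func(*args, **kwargs)
        return recorded

    def script(self) -> str:
        """Serialize the session; arguments must be JSON-serializable."""
        return json.dumps(self.log)

    def replay(self, script: str) -> List[Any]:
        """Re-execute a recorded session and return each call's result."""
        return [self.registry[c["func"]](*c["args"], **c["kwargs"])
                for c in json.loads(script)]


if __name__ == "__main__":
    session = Session()

    @session.wrap
    def gain(trace: List[float], factor: float) -> List[float]:
        """Hypothetical processing step: scale a trace."""
        return [factor * x for x in trace]

    print(gain([1.0, 2.0], factor=3.0))        # interactive use
    print(session.replay(session.script()))   # reproducible re-run
```

Replay calls the unwrapped functions, so re-running a saved session does not append duplicate entries to the log.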

Geophysics / Numerical Analysis (Mentor: Paul Sava)

  • Implement an optimal algorithm for parallel transposes of 4-D or 5-D arrays, up to a few tens of terabytes in volume, on a multi-node Linux cluster.
  • As a bonus, FFT one of the transposed dimensions.
  • Implement a hardware-adaptive transpose algorithm for a single-node SMP machine with 8 or more processors. Investigate transfer speeds, cache sizes, memory layout, etc., and make the algorithm adapt to the hardware. Bonus for out-of-core capabilities.
  • Implement 3-D seismic data header storage using the fastest open-source database, then compare header I/O times with the classic approach of a simple table. What is the fastest way to implement a large database, given that all the values it holds are booleans, integers, and floats?
  • Skills and interests: numerical methods, scientific computation, parallel computing.
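
The transpose projects revolve around reordering data so that a formerly slow axis becomes the fast one. A common building block, assumed here rather than taken from existing Madagascar code, is a cache-blocked (tiled) transpose. The matrix is processed in `tile` × `tile` blocks so that the reads and writes for each block stay within a small region of memory. In a hardware-adaptive version, `tile` would be tuned to the measured cache sizes. A 4-D or 5-D transpose that swaps two axes could apply the same kernel to each 2-D slice spanned by those axes. Pure Python only illustrates the access pattern; a production version would be written in C.

```python
from typing import List


def transpose_tiled(a: List[float], rows: int, cols: int, tile: int = 32) -> List[float]:
    """Transpose a row-major rows x cols matrix stored in a flat list.

    Returns the cols x rows transpose, also row-major. Work proceeds in
    tile x tile blocks; `tile` is the parameter a hardware-adaptive
    version would tune to the cache size.
    """
    if len(a) != rows * cols:
        raise ValueError("array size does not match rows * cols")
    if tile < 1:
        raise ValueError("tile must be a positive integer")
    out = [0.0] * (rows * cols)
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            # Transpose one block; blocks on the edges may be smaller.
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    out[j * rows + i] = a[i * cols + j]
    return out


if __name__ == "__main__":
    m = [float(v) for v in range(6)]  # 2 x 3: [[0, 1, 2], [3, 4, 5]]
    print(transpose_tiled(m, 2, 3, tile=2))  # [0.0, 3.0, 1.0, 4.0, 2.0, 5.0]
```

The result does not depend on `tile`; only the memory access order changes. This is what makes the block size a safe target for automatic, hardware-specific tuning.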