Hardware Canucks > HARDWARE CANUCKS COMMUNITY > HardwareCanucks F@H Team

June 27, 2009, 08:52 PM
LCB001
Folding Captain
 
Join Date: Feb 2008
Location: Aylmer QC.
Posts: 1,774


How Does FAH Code Development And Sysadmin Get Done + Twitter

For those that haven't seen this yet,

From the news blog of Prof. Vijay Pande located here: Folding@home

This is a long but interesting read and gives a good idea about what is coming down the pipe concerning the new F@H clients as well as what is happening with the servers...
Quote:

June 17, 2009

How does FAH code development and sysadmin get done?

One of the more common questions I get asked is how we do our client/server/core programming and backend system administration. Others were also curious about updates on various core projects. So, I thought it made sense to answer both in one post, since the answers are related. This will be a bit of a long answer to several short questions, but hopefully it will give some insight into how we do what we do.
First, some history. When we started in 2001, I personally wrote most of the code (client, server, scientific code integration, etc.), with some help from a summer student (Dr. Jarod Chapman) and some help from Adam Beberg on general distributed computing issues and the use of his Cosm networking library. I was just starting out as a professor then, with a relatively small group (4 people at the time), so it was common for the leader of the lab to do a lot of hands-on work.

As time went on, the group matured and grew, with increasing funding from NIH and NSF, reaching about 10 people in 2005. At this point, many of the duties were given to different people in the lab: server code development was performed by (now Prof.) Young Min Rhee and later by Dr. Guha Jayachandran. Client development was done by Siraj Khaliq, then Guha, then with help from several people (including Adam Beberg as well as volunteers, such as Uncle Fungus). Core development was done by Dr. Rhee, (now Prof.) Michael Shirts, and others.

This model worked reasonably well, with each team member giving a significant, but not overly onerous, amount of his/her time (e.g. 10% to 20%) to FAH development. These key developers were able to add a lot of functionality, both to aid the science and to improve the donor experience.

However, in time, this model became unscalable and unsustainable, because the individual developers graduated (in academic research, the work is done by graduate students or postdoctoral scholars, neither of whom typically stays longer than about 3-5 years). While the original team was able to build a powerful and complex system, maintaining that system through new generations of students and postdocs became unsustainable. The code was well known to its original authors, but its complexity made maintenance by new developers difficult and error-prone.

In parallel with these efforts in code development, our server backend was also maturing. We went from having a few small servers (10GB hard drives!) to a very large, enterprise-style backend with hundreds of terabytes of storage. This too became a major challenge for the scientific group to manage.

A new plan. Therefore, in 2007, I started a new plan to migrate these duties (code development and system administration) to professional programmers and system administrators. Today, most of FAH code development is done by professional programmers, and in time I expect all of it will be done that way. The desire to start with a clean code base led to new projects, such as the v5 server code, second generation GPU code (GPU2), second generation SMP code (SMP2), and a new client (v7 client in the works), all of which have been developed from a clean slate.

There are some differences in how donors will see the fruits of these efforts. I have found that while the programmers write much cleaner code (much more modular, systematic, and maintainable), development is typically slower. Where the scientific group can often make a given change in, say, a month, the professional programmers may take 2 or 3. What we get for that extra time is more cleanly written code, no hacks, and a plan for long-term sustainability (clean code, well-documented code, high-level programming practices, etc.). Some projects are still done by the scientific staff (e.g. Dr. Peter Kasson continues to do great things with the SMP client as well as work towards SMP2), but I expect that in time this will all be done by programmers.

Analogously, sysadmin has been moved to a professional group at Stanford. They are similarly more careful and methodical, but slower to respond as a result. My hope is that as we migrate away from our older legacy hardware and they set up clean installs with the v5 server code, the problem of servers needing restarts will be greatly reduced. This infrastructure changeover has been much slower than I expected, in part due to the practices the sysadmin team uses to avoid hackish tricks and to keep a well-organized, uniform framework across all of the servers (e.g. scripting and automating common tasks).

One important piece of good news is that the people we've got are very good. I'm very happy to be working with some very strong programmers, including Peter Eastman, Mark Friedrichs, and Chris Bruns (GPU2/OpenMM code), Scott Legrand and Mike Houston (contacts at NVIDIA and ATI, respectively, for GPU2 issues), and Joe Coffland and his coworkers (v5 server, Protomol Core, Desmond/SMP2 core, v7 client). System administration is also now done professionally, via Miles Davis' admin group at Stanford Computer Science. Also, since she has help desk experience, Terri Fedelin (who does University admin duties for me personally) has been working on the forum helping triage issues.

Where are we now? Much of their work is behind the scenes and we generally only talk about big news when we're ready to release, but if you're curious, you can see some of it publicly, such as tracking GPU2 development via the OpenMM project (http://simtk.org/home/openmm) and the Gromacs/SMP2 core via the http://gromacs.org CVS (look for updates involving threads, since what is new about SMP2 is the use of threads instead of MPI). You can also follow more of the nitty-gritty details on my Twitter feed (Vijay Pande (vijaypande) on Twitter), where I plan to give more day-to-day updates, albeit in a simpler (and less grammatically correct) form; the hope is to have more frequent updates, even if they are smaller and simpler.

As the GPU2 code base has matured in functionality, GPU2 core development has been mainly bug fixes, which is a good thing. SMP2 has been in in-house testing for a while and I expect it will still take a few weeks. The main issue is making sure we get good scalability with thread-based solutions, removing bottlenecks, etc. The SMP2 initiative led to two different cores, one for the Desmond code from D. E. Shaw Research and another for a Gromacs variant (a variant of the A4 core). We have been testing both in single CPU-core format (the A4 Gromacs core is a single-core version of what will become SMP2) and we hope to release a set of single-core Desmond jobs in a week or two. If those look good, multiple-core versions via threads (not MPI) will follow thereafter.

The v5 system rollout is continuing, with the plan to run a v5 infrastructure (set up by the new sysadmins) in parallel with our current one, and have the science team migrate new projects to the new infrastructure. The v5 code has been running for a while in a few tests and we expect one of the GPU servers to migrate this week, with one or two servers migrating every week after that. The new code does not crash or hang the way the v3/v4 code does (that code hung under high load and the process had to be killed), so we expect much more robust behavior from it. Also, Joe Coffland has been great about responding with code changes and bug fixes.

So, the upshot of this new scheme is that donors will likely see more mature software, which also means slower revisions of the cores, both because fewer revisions are needed in the new model (a lot of issues are simplified by the cleaner code base) and because each revision now involves a lot of internal QA and testing and more careful, methodical programming.

The long-term upshot for FAH is better and more sustainable software. It's taking time to get it done, but based on the results so far (e.g. GPU2 vs GPU), I think it has been worth the wait (though we still have a fair way to go before we see all of the fruits of this work).



And for the real addicts, you can now follow the action on Twitter:
Quote:

June 16, 2009

My Twitter experiment

Several people have been pushing me to blog more, but it's hard for me to find the time, especially since I'm often sitting in meetings all day long with various Pande Group members, FAH members, the Simbios exec committee, or on other University business (e.g. in my capacity as Chair of Biophysics). In those situations I don't have my laptop out, but I can often sneak a look at my phone to check email and make brief responses.

So, people have suggested that I try Twitter to keep the flow of information going to donors. I've set up an account here:

http://twitter.com/vijaypande


This won't replace my blog, but augment it. The blog will be where big, major announcements are posted. The Twitter feed will be more about small updates and day-to-day items.

I say this is an experiment since I'm not sure whether this is a good way to get people information or whether anyone cares about all of these details (although judging from feedback from Folding@home donors in the forum, it seems like some may be interested).

Anyway, I'll give it a shot to see if it can help the flow of information to donors about the science, code development, and infrastructure issues in Folding@home, and about science at Stanford from my perspective.



__________________
Folding For Team 54196
