Beowulf Clusters?

rodsfree ([H]ard|Gawd, joined Dec 13, 2004, 1,417 messages)
Anybody know if a Beowulf cluster would churn out the WUs faster?

Just wondering, 'cause the hardwear looks exactly the same as a farm.
 
Probably not. It's the same mentality as thinking three 1GHz machines will be 'faster' than one 3GHz machine. With Beowulf clusters all of the processors work together as one, if I'm correct in my assumption.

You'd be better off loading fold-server on them and letting them all do their own work.

 
No, they won't, because IIRC a Beowulf distributes the workload across the nodes. The F@H program cannot take advantage of that kind of parallelism. If I am wrong, I am sure someone will correct me.
 
The hardware is the same in a farm / cluster; it's the software that's different.

In a farm, a single work unit goes to a single blade.
In a cluster, a single work unit is broken down into parts and then sent to all the blades.
Because you cannot break the F@H work units into smaller packets, a cluster will not work for F@H.
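
If you want the farm behaviour on the same boxes, you just start an ordinary console client on each blade so each one pulls and crunches its own work unit. A rough sketch of the idea (the node names, client binary and directory are only placeholders for whatever your blades actually run):

Code:
#! /bin/sh

# Sketch only: start one independent F@H console client per blade.
# NODES, CLIENT_DIR and the ./fahconsole name are placeholders --
# change them to match your own farm.
NODES="blade01 blade02 blade03 blade04"
CLIENT_DIR=/opt/foldingathome

for node in $NODES
do
  # each blade runs its own client in its own directory,
  # so each blade downloads and works on its own work unit
  ssh "$node" "cd $CLIENT_DIR && nohup ./fahconsole < /dev/null >> fah.log 2>&1 &"
done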

Luck.......:D
 
rodsfree said:
the hardwear looks exactly the same as a farm.
What, like denim overalls or something?

The problem with doing a Beowulf cluster is that the client will only be able to work on one processor at a time. There is a benefit, but it's not the one you'd usually associate with Beowulf clusters: you can add/remove machines without worrying about losing the data on them. Beowulf clusters are usually designed so that all the data lives in a few central places, so you can add/remove machines without losing that particular WU.

That's how I understand it, anyway. Believe me, this and folding on graphics cards are the two most mentioned things over at the folding forums. You're not the first one to think of them, but that doesn't make it a bad idea.
 
Stanford has said they are planning and working on a cluster client for F@H. What type of cluster it's for I'm not sure... but Team 32 will be swift in producing a Linux distro to take advantage of it if and when Stanford releases it.
 
The main workhorse of my points is a 16-processor cluster of 3.06GHz Xeons. I run individual processes on each node and have scripts to automatically shut them down when production work gets distributed to them. If they are really looking at a cluster version, I'd be interested.

Currently it is no faster than 16 individual machines. I'm not sure what they'd improve on by distributing it, unless they have some huge work units that are too big for an individual machine. The worst thing about it all is that the client can't detect the amount of RAM the machines have, so it assumes 1MB (which obviously isn't true) and won't let me run the big work units. There's some problem with the 64-bit memory extensions and high memory that prevents detection.

 
slowbiznatch said:
I wouldn't mind taking a look at your scripts (if you don't mind). I'm seriously considering loading something up on the clusters we have here before we ship them out to the customers. Of course, we would only use this for testing purposes here and always remember to delete it before shipping the final product out...

Here's what we could work with right now:
http://www.hardforum.com/showthread.php?t=880078

Yeah, but even a couple of days of running on that would be :eek:
 
I'm working on a Beowulf setup at work... the compute nodes are all HP Proliant DL360G4s... it's half done with 80 or so nodes installed and another 4 racks full of computers in storage... too bad there is no internet connectivity for it or I would have to "burn in" the systems with F@H. ;)
 
The script I use is based on "lw" by Fred Wheeler. I used this script back in the distributed.net days when it actually seemed they were not spinning their wheels aimlessly. His original post of this script is archived here: http://lists.distributed.net/pipermail/rc5/1997-October/033262.html

The modifications I've made are dependent on the finstall program: http://www.vendomar.ee/~ivo/finstall

You'll need to change the location of the folding script, which handles all the starting/stopping of the client(s) on your machine. There is no need to duplicate all the work and flexibility included in that package.

The original version was designed to be more versatile, but I wanted something just for this specific purpose. I've probably not cleaned everything extraneous from Fred Wheeler's version, but it is functional, and anyone should be able to modify this to fit their needs for folding.

Code:
#! /bin/sh

# fahlw - folding at home load watcher version 1.0
#
# Usage: fahlw
#
# If there is any significant machine usage, FaH is stopped with
# the folding script.  This is dependent upon the finstall package.
#
# When the machine has no significant usage, the batch process is
# restarted.
#
# The 1, 5 and 15 minute average machine loads are found using
# uptime(1) and a long sed command.  The load is converted from a
# decimal to an integer by removing the decimal point.  This is done
# because test(1) ([] in sh if statements) can only compare integers.
# A load average of 1.23 is represented as 123.
#
# The machine load is checked by fahlw every minute.
#
# Based on code by:
# Fred Wheeler (wheeler at ipl.rpi.edu)
# Oct 6, 1997
#
# Modified for FaH by Paul Comfort (pc at null dot net)
# Feb 2005

# Modify these values to fit your environment:
# start if all load averages are below these values, in hundredths
ld_run1=150
ld_run5=170
ld_run15=170
# stop if any load average is above these values, in hundredths
ld_stop1=350
ld_stop5=300
ld_stop15=290

# how much output to print, 0: none, 1: starts/stops, 2: each check
verbose=1

# exit with error status if signaled
trap "exit 1" 1 2 15

# print a message on exit
trap 'echo lw: killed `date`' 0

# initial state of the batch process is assumed to be stopped
batch_state=s

# infinite loop
while true
do

  # extract three 3 digit integers representing the load
  cmd=`uptime | sed \
    -e 's/.*load average:\(.*\)/\1/' \
    -e 's/\.//g' \
    -e 's/,//g' \
    -e 's/ *\([0-9]*\) *\([0-9]*\) *\([0-9]*\)/ld1=\1;ld5=\2;ld15=\3;/'`
  eval $cmd

  # get the hour of the day and the day of the week
  hour=`date +%H`
  day=`date +%w`

  # default new batch process state is the old batch process state
  new_batch_state=$batch_state

  # decide if batch process state should be changed
  if [ $batch_state = r ]; then
    if [ $ld1 -ge $ld_stop1 -o $ld5 -ge $ld_stop5 -o $ld15 -ge $ld_stop15 ] ; then
      new_batch_state=s
    fi
  elif [ $batch_state = s ]; then
    if [ $ld1 -le $ld_run1 -a $ld5 -le $ld_run5 -a $ld15 -le $ld_run15 ] ; then
      new_batch_state=r
    fi
  else
    echo lw: error, unknown batch process state name
    exit 1
  fi

  # find signal name
  if [ $new_batch_state = r ]; then
    signal=start
  elif [ $new_batch_state = s ]; then
    signal=stop
  else
    echo lw: error, unknown batch process new state name
    exit 1
  fi

  # determine if state has changed
  if [ $batch_state != $new_batch_state ]; then
    state_change=1
  else
    state_change=0
  fi

  if [ $state_change = 1 ]; then
    # control the folding process(es)
    # modify this location to fit your environment
    /usr/local/idle/foldingathome/folding $signal
    kill_status=$?
  fi


  # display state transition, load and time if verbose is set high enough
  if [ \( $state_change = 1 -a $verbose -ge 1 \) -o $verbose -ge 2 ]; then
    sc=${batch_state}-${new_batch_state}
    echo lw: sc:$sc ld:$ld1 $ld5 $ld15 hr:$hour dy:$day `date`
  fi

  # update batch process state
  batch_state=$new_batch_state

  # sleep for one minute
  sleep 60

done
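
In case it's useful, this is roughly how I kick fahlw off on each node. The node names, log file and install path below are just examples from my setup, so adjust them for your own:

Code:
#! /bin/sh

# Example only: launch fahlw on a list of nodes so it can babysit the
# client(s) on each one.  Node names and paths are placeholders.
for node in node01 node02 node03 node04
do
  ssh "$node" "nohup /usr/local/idle/foldingathome/fahlw < /dev/null >> /var/tmp/fahlw.log 2>&1 &"
done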
 