ServerLoadSpreading
Note: You are viewing an old revision of this page. View the current version.
large numbers of servers contacting a central host
some thoughts on using a perl hash function instead of a random delay
use B;
sub compute_hash
{
  my ($string,$range) = @_;
  my $hc = hex(B::hash $string);
  return $hc % $range + 1;
}
reference: http://www.perlmonks.org/index.pl?node_id=315876
how hashes really work: http://www.perl.com/pub/2002/10/01/hashes.html
calling the perl has function: http://www.perlmonks.org/?node_id=229982
more than you could ever want to know about hashing: http://burtleburtle.net/bob/hash/doobs.html
another hash function that probably isn't very good:
sub compute_hash
{
  my ($string,$low,$high) = @_;
  my $hc = 0;
  my $i = 0;
  my $range = ($high - $low) + 1;
  while ($i < length($string))
  {
    $hc = ((($hc) >> 27) | (($hc) << 5)) ^ ord(substr($string,$i++,1));
  }
  $hc = ((($hc) >> 1) | (($hc) << 31)) ^ ($hc);
  $hc = ((($hc) >> 3) | (($hc) << 29)) ^ ($hc);
  $hc = ((($hc) >> 5) | (($hc) << 27)) ^ ($hc);
  $hc = ((($hc) >> 7) | (($hc) << 25)) ^ ($hc);
  $hc = ((($hc) >> 9) | (($hc) << 23)) ^ ($hc);
  $hc = ((($hc) >> 11) | (($hc) << 21)) ^ ($hc);
  $hc = ((($hc) >> 13) | (($hc) << 19)) ^ ($hc);
  $hc = ((($hc) >> 15) | (($hc) << 17)) ^ ($hc);
  return $low + ($hc % $range);
}
need to do some distribution tests against a corpus of common hostnames or something like that.
why you should use a hash instead of a random delay? 1. Repeatable delay for each host. 2. maybe better guarantee of spread?
would be interesting to compare these delays to the spread generated by calling bash RANDOM.
#!/usr/bin/perl
# randomer.pl
# process input in form of
# <hostname> <delay>
#
#and transform into
#
# <delay> <count>
#
# suitable for frequency plotting.
use warnings;
use strict;
my %counter = ();
while(<>)
  {
    (my $delay) = (/^\S+\s+(\S+)/);
    $counter{$delay}++;
  }
foreach (sort {$a<=>$b} keys %counter)
  {
    print $_ . " " . $counter{$_} . "\n";
  }
results in output like this:
1 1 2 1 3 1 7 1 30 1 42 1 47 1 49 1 66 1 71 1 96 1 112 1 126 1 134 1 136 1 140 1
etc.
gnuplot control file:
set terminal png size 1200,300 font "/Library/Fonts/Andale Mono.ttf" 12 set output "spread.png" set xrange [0:86400] set yrange [0:4] set xlabel "delay (s)" set ylabel "# of hosts" set title "collisions using perl \"int(rand())\" for delay value - 9313 hosts, 86400s spread" set ytics 1 set xtics nomirror set ytics nomirror set key off plot "random-freq.txt" using 1:2 index 0 title "number of collisions" pt 6 lt 2
output:

collision counts:
8805 hosts no collisions
483 hosts one collision
25 hosts two collisions
no hosts more than two collisions
 
      
