2012-06-26

Waiting around for something to happen in Go

I was faced with what turned out to be a synchronization problem with a simple solution using Go.  How do you get multiple goroutines to wait for a single event to occur?  The answer is... close a channel.
package main import ( "fmt" ) func main() { count := make(chan int) hold := make(chan int) num := 16 for i := 0; i < num; i++ { go func(i int) { fmt.Printf("%d ", i) count <- 1 <-hold fmt.Printf("%d ", i) count <- 1 }(i) } for reported := 0; reported != num; reported += <-count { } fmt.Printf("\n") close(hold) for reported := 0; reported != num; reported += <-count { } fmt.Printf("\n") } 
This program outputs something like:
0 2 1 4 5 3 8 9 7 6 12 13 14 11 10 15 1 5 4 3 2 0 8 6 13 14 12 7 9 11 10 15 
The exact order will depend on the order in which the goroutines are executed.  The program uses two channels called count and hold. The count channel is used by the 16 goroutines to each inform the main program that they have performed a printing task.  The main routine counts the number of messages it has seen on the count channel so it knows when all the goroutines have completed their first print.

The main routine then closes the hold channel.  All of the goroutines are trying to receive something on that channel.  When it is closed the receive fails.

The goroutines then all print out their id number again and terminate.  The program waits for them to finish by receiving another 16 messages on the count channel.

2012-06-25

What the final Turing Machine Google Doodle does

If you solved the puzzles in the Google Doodle commemorating Alan Turing's 100th birthday and clicked on the rabbit you'd get to see the machine performing this program starting with a blank tape.


Translating that into pseudo-code you obtain the following:
 one do if one blank do right while not blank one right zero while not blank left end one right else blank do right while not blank one while not blank left end zero right end end 
There one, zero and blank each write those symbols to the same; when used in an if or while they test for the presence of a symbol.  Right and left move the tape one square left or right.

The two branches of the main if there are doing very similar things.  They both blank out the current square, then move right until they find blank space and write 1 0 or 1 depending on whether the original square examined in the if was 1 or 0.  Then the tape is moved back to the blank and the original number 1 or 0 replaced.

So, it's extending the sequence on the tape by 1 0 or 1 depending on whether the number it sees is 1 or 0.

Each time around the outer loop the tape changes as follows:
1 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 1 0 1 0 1 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 1 1 1 0 1 0 1 1 0 1 1 0 1 0 1 1 0 1 0 1 1 0 1 1 0 
This program appears to be calculating the rabbit sequence 1, 10, 101, 10110, ... and writing it out onto the tape.

And here's a little Perl program the simulates the Turing machine in the Google Doodle:
use strict; use warnings; my @tape; my $pos = 0; one(); do { dump_tape(); if ( is_one() ) { blank(); do { right() } while ( !is_blank() ); one(); right(); zero(); while ( !is_blank() ) { left() } one(); } else { blank(); do { right() } while ( !is_blank() ); one(); while ( !is_blank() ) { left() } zero(); } right(); } while(1); sub zero { x('0') } sub one { x('1') } sub blank { x(' ') } sub x { $tape[$pos] = $_[0] } sub is_blank { is(' ') } sub is_one { is('1') } sub is { $tape[$pos] eq $_[0] } sub right { if ( $pos == $#tape ) { push @tape, (' '); } $pos += 1; } sub left { $pos -= 1 } sub dump_tape { print join( ' ', @tape ), "\n"; sleep(1); } 

2012-06-22

Is there a 'Moore's Law' for web pages?

I came across an article I wrote in 1999 for The Guardian entitled Cut you modem's flying time which mentions that at the time the HTML of The Guardian's home page was 18kB.  Today the home page is more like 250kB.

I was curious about the growth pattern so using the Internet Archive I downloaded the home page of The Guardian for every year available from 1996 to 2011 (plus the current page) and compared the sizes of the HTML of the front page.  Here's the raw data:

 Year Bytes ---- ----- 1996 5381 1997 11140 1998 10435 1999 39013 2000 97746 2001 70933 2002 92995 2003 81833 2004 92637 2005 92078 2006 108445 2007 118300 2008 186670 2009 184271 2010 181221 2011 192592 2012 253748 

This excludes anything other than the raw HTML of / on The Guardian.  Clearly, it's grown a lot, but curious about the pattern I decided to see what curve would fit by using Wolfram Alpha.  Linear, quadratic and cubic fits were all about an R^2 of 0.90.  But an exponential fitted with R^2 of 0.97.

The exponential that generates that curve is 28985.6 * (1.134292^x) (x being the year counting 1996 as 0).  For comparison, Moore's Law is n * 1.414213^x (doubling every two years; I don't have an estimate for n).
For that exponential doubling takes a bit more than 5 years.
I wonder if there's a 'Moore's Law' for web sites.  Are we seeing exponential growth in the HTML used?  And what happens if we take into account the other assets?  And what's the relationship between this and bandwidth consumed on the Internet?
Discuss.

New Scientist's Instant Expert on Alan Turing now free to non-subscribers

Tomorrow is the 100th anniversary of the birth of Alan Turing and earlier this year I was asked to write a special supplement for New Scientist covering his life and work.  The supplement was in the June 1 edition of the physical magazine and available on line to subscribers only.

This morning a gentle nudge of New Scientist persuaded them to make the entire thing open to anyone who registers on their site.  You don't have to pay, just register.  The complete thing is here.

The individual articles in the special are:
A big thank you to New Scientist for making this available to all.

2012-06-13

Autograms

The following is a list of words that can be anagrammed into other words.  It is ordered by the number of possible words that can be made from a single word (e.g. dog and god have the same letters, trap, part and prat have the same letters).  It shows all the words that have 6 or more possible anagrams.

I particularly like that canter can turn into trance, recant, nectar, cretan, tanrec, and creant.

10 angor argon goran grano groan nagor orang organ rogan ronga
10 elaps lapse lepas pales salep saple sepal slape spale speal
9 ester estre reest reset steer stere stree terse tsere
9 caret carte cater crate creat creta react recta trace
8 asteer easter eastre reseat saeter seater staree teaser
8 laster lastre rastle relast resalt salter slater stelar
8 armet mater metra ramet tamer terma trame trema
8 leapt palet patel pelta petal plate pleat tepal
8 arist astir sitar stair stria tarsi tisar trias
7 canter creant cretan nectar recant tanrec trance
7 alem alme lame leam male meal mela
7 acinar arnica canari carian carina crania narica
7 argel ergal garle glare lager large regal
7 ante aten etna neat taen tane tean
7 least setal slate stale steal stela tales
7 aril lair lari liar lira rail rial
7 alert alter artel later ratel taler telar
7 abel able albe bale beal bela blae
7 dater derat detar drate rated trade tread
7 easting gainset genista ingesta seating signate teasing
7 arpent enrapt entrap panter parent pretan trepan
7 entrail latiner latrine ratline reliant retinal trenail
7 darter dartre redart retard retrad tarred trader
7 albeit albite baltei belait betail bletia libate
7 alien aline anile elain elian laine linea
7 atle laet late leat tael tale teal
7 aldern darnel enlard lander lenard randle reland
7 lepra paler parel parle pearl perla relap
7 emir imer mire reim remi riem rime
6 aitch chait chati chita taich tchai
6 acher arche chare chera rache reach
6 reins resin rinse risen serin siren
6 cerotin cointer cotrine cretion noticer rection
6 gater grate great greta retag targe
6 degrain deraign deringa gradine grained reading
6 angler arleng garnel largen rangle regnal
6 alban balan banal laban nabal nabla
6 caliver caviler claiver clavier valeric velaric
6 aster serta stare strae tarse teras
6 alerse leaser reales resale reseal sealer
6 agnel angel angle genal glean lagen
6 amen enam mane mean name nema
6 apert pater peart prate taper terap
6 petrous posture proetus proteus septuor spouter
6 angrite granite ingrate tangier tearing tigrean
6 pearlet pleater prelate ptereal replate repleat
6 amil amli lima mail mali mila
6 alevin alvine valine veinal venial vineal
6 altair atrail atrial lariat latria talari
6 sawt staw swat taws twas wast
6 balder bardel bedlar bedral belard blader
6 halse leash selah shale sheal shela
6 danuri diurna dunair durain durani durian
6 asper parse prase spaer spare spear
6 grein inger nigre regin reign ringe
6 corset cortes coster escort scoter sector
6 ates east eats sate seat seta
6 beround bounder rebound unbored unorbed unrobed
6 actor corta croat rocta taroc troca
6 derange enraged gardeen gerenda grandee grenade
6 asterin eranist restain stainer starnie stearin
6 clethra latcher ratchel relatch talcher trachle
6 esker keres reesk seker skeer skere
6 acrolein arecolin colinear cornelia creolian lonicera
6 agroan angora anogra arango argoan onagra
6 inset neist snite stein stine tsine
6 evil levi live veil vile vlei
6 anis nais nasi nias sain sina
6 palster persalt plaster psalter spartle stapler
6 abord bardo board broad dobra dorab
6 aspen panse snape sneap spane spean
6 mate meat meta tame team tema
6 roset rotse soter stero store torse
6 poter prote repot tepor toper trope
6 resaw sawer seraw sware swear warse
6 earthen enheart hearten naether teheran traheen
6 owser resow serow sower swore worse

Here's the code that I used to generate the list
 use strict; use warnings; my %proper; open W, "</usr/share/dict/propernames"; while (<W>) { chomp; $proper{lc($_)} = 1; } close W; my %words; open W, "</usr/share/dict/words"; while (<W>) { chomp; my $w = lc($_); if ( !defined( $proper{$w} ) ) { my $l = join('', sort split('', $w)); if ( !defined( $words{$l}{words}{$w} ) ) { $words{$l}{count} += 1; $words{$l}{words}{$w} = 1; } } } close W; foreach my $w (sort { $words{$b}{count} <=> $words{$a}{count} } keys %words) { print "$words{$w}{count} " . join(' ', sort keys %{$words{$w}{words}}), "\n"; }

2012-06-08

One way to fix your rubbish password database

Suppose for a moment that you are running a web site that has a poorly implemented password database that stores password hashes using something like SHA1(password) or MD5(password) or versions of that with some salt.  (This is what LinkedIn, last.fm and eHarmony were all doing and undoubtedly many, many other companies are in the same position).

And suppose you realize that if your password database gets disclosed your users are in danger and you'd really like to change the system used to something much more secure like bcrypt or scrypt.

Here's how you can do that in one go and secure your password database.

1. Suppose you have a database that contains password hashes for the n users of your site, and for each user you have salt si and hash hi (where hi was computed with some algorithm such as SHA1 or MD5).  (Note that the rest of these instructions work if there's no salt, just ignore it).

2. Suppose that you've chosen to use scrypt.  For each user you first create a new random salt value s'i and then compute a new hash h'i using the formula scrypt(s'i, hi) and store the new salt and hash in the database.  You throw away the old weak hash hi and forget it ever existed. So you are left with two salt values and a new hash.  (I've ignored the other scrypt parameters which are left to the reader to determine).

3. When user i logs in and presents password p you use your old weak hash algorithm (let's suppose it was md5(salt, password)) to compute a hash for comparison as follows: scrypt(s'i,  md5(si, p)) and compare that with the h'i stored in the database.

4. If, like last.fm, you were also allowing third-parties to authorize users by presenting the old hash value instead of the password then you can still use this scheme.  When the third-party presents hash h for user i you calculate scrypt(s'i,  h) and do the comparison.

5. If step 4 is not needed then you can go further when a user logs in.  Once the user has logged in successfully with password p you can completely eliminate any trace of the old weak hash by choosing a new random salt value s''i and computing scrypt(s''i,  p) and storing that in the database.

This has the effect of immediately making your password database more secure if stolen without any effort on the part of your users.

2012-06-04

Animated solution to the "Never gonna give you up" program problem

I mentioned the fun "smallest program to output the lyrics of 'Never gonna give you up'" the other day.  It being a bank holiday weekend I've come up with my own solution which has 589 bytes in Perl.

It works by building a dictionary from which to compress the lyrics and then there's a simple decompressor.  Like many of the solutions this works by having the dictionary be indexed by single bytes and be built into a single string that expands to the result.

Here's the compressor.
use strict; use warnings; my $n =<<EOF; We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of You wouldn't get this from any other guy I just wanna tell you how I'm feeling Gotta make you understand Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you We've known each other for so long Your heart's been aching but You're too shy to say it Inside we both know what's been going on We know the game and we're gonna play it And if you ask me how I'm feeling Don't tell me you're too blind to see Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you (Ooh, give you up) (Ooh, give you up) (Ooh) Never gonna give, never gonna give (Give you up) (Ooh) Never gonna give, never gonna give (Give you up) We've know each other for so long Your heart's been aching but You're too shy to say it Inside we both know what's been going on We know the game and we're gonna play it I just wanna tell you how I'm feeling Gotta make you understand Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you EOF my @b = map(chr, (reverse 129..254)); foreach my $i (reverse 2..173) { my $found = 1; while($found) { my %saving; $found = 0; foreach my $j (0..length($n)-$i) { my $s = substr($n, $j, $i); my $t = $n; $t =~ s/\Q$s\E/0/g; $saving{$s} = length($n) - length($t) - $i - 1; } my $top = (sort { $saving{$b} <=> $saving{$a} } keys %saving)[0]; if ($saving{$top} > 0) { my $nb = shift(@b); $n =~ s/\Q$top\E/$nb/g; $n = $top . "$nb$n"; $found = 1; } } } print $n;
It works by looking for all strings starting at length 173 (which I determined by looping and corresponds to the entire chorus) and working backwards. It computes the space saved by extracting that string and replacing it with a dictionary entry and removes the strings that provide the most space saving. When there are no more left for a specific length it moves onto the next length (one smaller). The result is 514 byte string containing a bunch of high-ASCII characters like this:
 gÐngÑveÒe ÓthÔouÕo Önd× yÕØÖloÙiÑ Út's Û(OohÜe'reÝ YÕÞ knowß,ÐivàÚoá I'm â tÖsãtell ä a× åØ æeåçay it è äértØêß ëÐonna ì oÔer íæupîmakeæïÛbeen ðëÔÓñÕ'rÓtoÖòeî) óeÒrìô We'Òßõ NôöóÜ÷ôgiÒø howâfeeliÑ ù Nøú÷)ú, nø (Givû I just wannaéyÕùGotta ïu×ersta× þü eachífor sÙÑÞr hearðachÚbut YòshyãèInsidÓwÓboÔëwhaðgoán WeñgamçwÝìplèýúîöletædownörun arÕ×ådeseêöïcryösayÐoodbyeöäa liçhuê þWÝ nÖstraÑers tÙÒÞñrulesåsÖdÖI A full commitmenÛwhatâÔinkáfÞ wÕldn'tÐet Ôis from anyíguyüõnýA× ifæask meùDon'témÓyòbli×ãee þþ Üà÷àûûóõýüþþ 
To understand the compression look at the start. The first string to be decompressed will be " g", then "ng", then "ve" and so on. The high-ASCII characters are both the dictionary separators (between entries) and the characters to be replaced with the corresponding entry in the dictionary. Thus each occurrence of "Ð" is replaced with " g". A complete program to decompress to the lyrics is as follows:
$b=<<E; gÐngÑveÒe >ÓthÔouÕo Önd× yÕØÖloÙiÑ Út's Û(OohÜe'reÝ YÕÞ knowß,ÐivàÚoá I'm â tÖsãtell ä a× åØ æeåçay it è äértØêß ëÐonna ì oÔer íæupîmakeæïÛbeen ðëÔÓñÕ'rÓtoÖòeî) óeÒrìô We'Òßõ NôöóÜ÷ôgiÒø howâfeeliÑ ù Nøú÷)ú, nø (Givû I just wannaéyÕùGotta ïu×ersta× þü eachífor sÙÑÞr hearðachÚbut YòshyãèInsidÓwÓboÔëwhaðgoán WeñgamçwÝìplèýúîöletædownörun arÕ×ådeseêöïcryösayÐoodbyeöäa liçhuê þWÝ nÖstraÑers tÙÒÞñrulesåsÖdÖI A full commitmenÛwhatâÔinkáfÞ wÕldn'tÐet Ôis from anyíguyüõnýA× ifæask meùDon'témÓyòbli×ãee þþ Üà÷àûûóõýüþþ E map{($a,$b)=split($c=chr,$b,2);$b=~s/$c/$a/g}(208..254);print$b 
The final line does all the work progressively splitting off the dictionary entry and then using a regular expression substitution to transform the string. There's a bunch of Perl abuse going on. Notice there's no ; at the end of the program or in the { } block. The $t=chr works because chr reads the $_ variable set from the map and it both sets $c and returns the set value as an argument to split. Also notice how its valid Perl to do print$b with no space. Since it's fairly likely that some of those characters got messed up in the web version, here's a Base64 encoded version of the program for people to test with.
JGI9PDxFOwogZ9BuZ9F2ZdJlINN0aNRvddVvINZu ZNcgedXY1mxv2WnRINp0J3Mg2yhPb2jcZSdyZd0K WdXeIGtub3ffLNBpduDab+EgSSdtIOIgdNZz43Rl bGwg5CBh1yDl2CDmZeXnYXkgaXQK6CDk6XJ02Orf IOvQb25uYSDsIG/UZXIg7eZ1cO5tYWtl5u/bYmVl biDw69TT8dUnctN0b9byZe4pCvNl0nLs9ApXZSfS 3/UKTvT289z39Gdp0vggaG934mZlZWxp0Qr5Ck74 +vcp+iwgbvgKKEdpdvsKSSBqdXN0IHdhbm5h6XnV +UdvdHRhIO9112Vyc3Rh1wr+/CBlYWNo7WZvciBz 2dHeciBoZWFy8GFjaNpidXQKWfJzaHnj6Eluc2lk 03fTYm/U63doYfBnb+FuCldl8Wdhbed33exwbOj9 +u72bGV05mRvd272cnVuIGFy1dflZGVzZer272Ny efZzYXnQb29kYnll9uRhIGxp52h16gr+V90gbtZz dHJh0WVycyB02dLe8XJ1bGVz5XPWZNZJCkEgZnVs bCBjb21taXRtZW7bd2hhdOLUaW5r4WbeIHfVbGRu J3TQZXQg1GlzIGZyb20gYW557Wd1efz1bv1B1yBp ZuZhc2sgbWX5RG9uJ3TpbdN58mJsadfjZWUK/v4K 3OD34Pv78/X9/P7+CkUKbWFweygkYSwkYik9c3Bs aXQoJGM9Y2hyLCRiLDIpOyRiPX5zLyRjLyRhL2d9 KDIwOC4uMjU0KTtwcmludCRiCg==
And to make it totally clear what's happening, here's a little animation of the decompression process.

2012-06-01

Compression of the lyrics of "Never gonna give you up" using the Bentley/McIlroy algorithm

There's a fun StackExchange post about the smallest program to output the lyrics to Never gonna give you up making the rounds today.  I'm not entering the competition, but I thought it would be interesting to look at from a compression perspective.  I've been working a lot with the Bentley/McIlroy algorithm for long string compression and this is a good example to illustrate its output.

Here are the original lyrics (1871 bytes):
We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
We've known each other for so long
Your heart's been aching but
You're too shy to say it
Inside we both know what's been going on
We know the game and we're gonna play it
And if you ask me how I'm feeling
Don't tell me you're too blind to see
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
(Ooh, give you up)
(Ooh, give you up)
(Ooh)
Never gonna give, never gonna give
(Give you up)
(Ooh)
Never gonna give, never gonna give
(Give you up)
We've know each other for so long
Your heart's been aching but
You're too shy to say it
Inside we both know what's been going on
We know the game and we're gonna play it
I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
And here's the compressed version (note that the actual compressed file is a 611 byte binary file, here I've altered it for readability).  You see either the text itself, or a reference to preceding text (which has potentially been expanded first).  The references are in the form where p is the position (counting from zero at the start of the decompressed text) and n the number of characters: it means 'copy n characters from position p to here'.
We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up<204,13>let you down<204,13>run around<45,5>desert you<20
4,13><184,9>cry<204,13>say goodbye<204,13>tell a lie<45,5>hu<285,7>
We've<30,5>n each<129,7>for so long
Your heart's been aching but
You're too shy to say it
Inside we both<30,6>wha<422,9>going on
We<30,10>game<45,5>we're<210,7>play it
And if you ask me<161,17>Don't tell m<220,5>'re too blind to see
<340,13><217,161><229,12><634,161>(Ooh,<633,12>)<967,24>)<340,13>give, n<230,11>
give
(G<635,10><1004,57><377,11><389,160><139,239><229,12><634,333>
The first <204,13> is the reference back to "Never gonna " which is used many times.  The <45,5> is a reference to the first instance of " and ".  Notice how "Never gonna make you cry" is made up from a reference to "Never gonna " and then "make you ".
But it's towards the end where large blocks of text are repeated in the lyrics that the reference scheme really comes into its own.   The end references are to large repeated blocks, but notice how those blocks themselves first have to be decompressed in the early parts of the lyrics.
On this short text we get a compression to 33% of the original size.  But gzip still wins getting the text down to 402 bytes (22%).  Bentley/McIlroy works better with longer blocks of text where there's repetition of long strings over long distances.