Wednesday, December 31, 2008

hashes and collisions

ok, so there's been a few blurbs about hashes and collisions lately...

this is something that caught my eye back in the day in CS class...

i am not a math guy at all (stats breaks my brain), so i am not at all qualified to speak on this topic. a hash function like md5 or sha1 or whatever takes an arbitrary sized input and reduces it to a pseudo-unique string of a certain size. so take the following md5 values:

echo "r" | md5sum
72cfd272ace172fa35026445fbef9b03
echo "rw" | md5sum
bc3f381953be1f16b956a9d394cf969f
echo "rwnin" | md5sum
023c306a26488624aaa2b3028779cfb0

so each input gets a "unique" output, but the issue is that a one, two, five, or five thousand character input always gets a 32 character output. so if you input a single character, or the entire text of hamlet, or any (or every) subset of hamlet possible you will always get a 32 character output with md5.

as i said before, i'm not so good with math, but there is a fundamental problem here. a 32 character hex value can represent approx 3.4x10^38 values. that's a ton!!! BUT. that huge number of values is used to represent *all arbitrary (infinite) values*...

and that's the problem. so even sha512 gives you a fixed length output. ultimately you know that collisions in such a system are possible. they may be mathematically unlikely, but they are inevitable.

so it's kinda frustrating to read the vuln advisories which say "oh, most people stopped using md5 so this isn't an issue", because a few years ago there were advisories which said "we stopped using 3DES so this isn't an issue".

if we decide to place our trust in hash based certificates (which is our trust in the tubes, at the end of the day), we need to accept that someone might get lucky and fake a CA cert. the haters may say "oh well that's super unlikely". well, i guess they are the same people who say "it's stupid to buy a lottery ticket! do you know the odds!?!"

well guess what, every week or three, some lucky bastard wins the lottery. and some unlucky bastard gets struck by lightning. so don't be surprised if someone finds a collision for your hash algorithm.

1 comment:

ShawnM said...

Happy with this post except for your justification for buying lottery tickets. Like you said, you suck at stats.