# Had an idea - looking for a math buff to tell me if it's possible with today's technology.

Valdis.Kletnieks at vt.edu Valdis.Kletnieks at vt.edu
Wed May 18 23:42:43 UTC 2011

```On Thu, 19 May 2011 00:26:26 BST, Heath Jones said:
> I wonder if this is possible:
>
> - Take a hash of the original file. Keep a counter.
> - Generate data in some sequential method on sender side (for example simply
> starting at 0 and iterating until you generate the same as the original
> data)
> - Each time you iterate, take the hash of the generated data. If it matches
> the hash of the original file, increment counter.
> - Send the hash and the counter value to recipient.
> - Recipient performs same sequential generation method, stopping when
> counter reached.

MD5 is a 128 bit hash.

2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456 - you're welcome
to iterate that many times to find a duplicate. You may get lucky and get a hit
in the first trillion or so attempts - but you may get unlucky and not get a
hit until the *last* few trillion attempts. On average you'll have to iterate
about half that huge number before you get a hit.

And it's lossy - if you hash all the possible 4K blocks with MD5, you'll find
that each of those 2^128 hashes has been hit about 256 times - and no
indication in the hash of *which* of the 256 colliding 4K blocks you have on
this iteration.  (The only reason that companies can do block-level
de-duplication by saving a hash as an index to one copy shared by all blocks
with the same hash value is because you have a *very small* fraction of the
possibilities covered, so if you saved a 4K block of data from somebody's
system32 folder under a given MD5 hash, it's *far* more likely that another
block with that same hash is from another copy of another identical system32
folder, than it is an actual accidental collision.)

Protip:  A good hash function is by definition one-way - given the data, it's
easy to generate the hash - but reversing it to find the "pre-image" (the data
that *generated* the hash) is massively difficult.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 227 bytes
Desc: not available
URL: <http://mailman.nanog.org/pipermail/nanog/attachments/20110518/92e3234b/attachment.sig>
```