Had an idea - looking for a math buff to tell me if it's possible

Justin Cook jhcook at gmail.com
Thu May 19 00:10:04 UTC 2011


Why is this on nanog?

Yes, it is "possible", but the CPU time required would be absurd compared to
just sending the data across the network.

Attempting this with even a small file would be laughable: passwords are
only a few bytes long, and brute-forcing even those takes a significant
amount of time.

-- 
Justin Cook
On 19 May 2011 01:03, "Heath Jones" <hj1980 at gmail.com> wrote:
> My point here is that it IS possible to transfer just a hash and a counter
> value and effectively generate identical data at the remote end.
> The limit that will be hit is the difficulty of generating and comparing
> hash values with current processing power.
>
> I'm proposing iterating through generated data up until the actual data.
> It's not even a storage issue: once you have incremented the data, you
> don't need to store old data or hash values - just the counter. No massive
> hash tables.
>
> It's a CPU issue.
>
> Heath
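For concreteness, here is a minimal Python sketch of the scheme as described.
The use of MD5 and the big-endian counter enumeration are assumptions, and
the enumeration deliberately skips strings with leading zero bytes to keep
the sketch short:

    import hashlib

    def nth_candidate(i: int) -> bytes:
        # Assumed enumeration order: the big-endian bytes of a counter.
        # (This skips strings with leading zero bytes; a real scheme would
        # enumerate every string of every length.)
        return i.to_bytes((i.bit_length() + 7) // 8 or 1, "big")

    def sender(original: bytes):
        # Count how many earlier candidates collide with the hash before
        # the enumeration reaches the original data itself.
        target = hashlib.md5(original).digest()
        collisions = 0
        i = 0
        while True:
            candidate = nth_candidate(i)
            if hashlib.md5(candidate).digest() == target:
                if candidate == original:
                    return target, collisions
                collisions += 1
            i += 1

    def recipient(target: bytes, collisions: int) -> bytes:
        # Replay the same enumeration, skipping `collisions` matches.
        seen = 0
        i = 0
        while True:
            candidate = nth_candidate(i)
            if hashlib.md5(candidate).digest() == target:
                if seen == collisions:
                    return candidate
                seen += 1
            i += 1

Both ends agree on the enumeration, so the counter disambiguates between
colliding inputs - but, as the reply below points out, the loop runs on the
order of 2^128 iterations for real data.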
>
> On 19 May 2011 00:42, <Valdis.Kletnieks at vt.edu> wrote:
>
>> On Thu, 19 May 2011 00:26:26 BST, Heath Jones said:
>> > I wonder if this is possible:
>> >
>> > - Take a hash of the original file. Keep a counter.
>> > - Generate data in some sequential method on the sender side (for
>> >   example, simply starting at 0 and iterating until you generate the
>> >   same as the original data)
>> > - Each time you iterate, take the hash of the generated data. If it
>> >   matches the hash of the original file, increment the counter.
>> > - Send the hash and the counter value to the recipient.
>> > - Recipient performs the same sequential generation method, stopping
>> >   when the counter is reached.
>>
>> MD5 is a 128-bit hash.
>>
>> 2^128 is 340,282,366,920,938,463,463,374,607,431,768,211,456 - you're
>> welcome to iterate that many times to find a duplicate. You may get lucky
>> and get a hit in the first trillion or so attempts - but you may get
>> unlucky and not get a hit until the *last* few trillion attempts. On
>> average, you'll have to iterate about half that huge number before you
>> get a hit.
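To put a number on that: a back-of-the-envelope calculation, assuming a
(generous) rate of 10^9 MD5 hashes per second:

    # Expected work is about half the 2**128 hash space.
    EXPECTED_TRIES = 2**127
    RATE = 10**9                          # hashes/second (assumed)
    SECONDS_PER_YEAR = 60 * 60 * 24 * 365

    years = EXPECTED_TRIES / (RATE * SECONDS_PER_YEAR)
    print(f"{years:.2e} years")           # ~5.39e+21 years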
>>
>> And it's lossy - if you hash all the possible 4K blocks with MD5, you'll
>> find that each of those 2^128 hashes has been hit about 2^32640 times
>> (there are 2^32768 possible 4096-byte blocks, spread over only 2^128 hash
>> values) - and there's no indication in the hash of *which* of those
>> colliding 4K blocks you have on this iteration. (The only reason that
>> companies can do block-level de-duplication by saving a hash as an index
>> to one copy shared by all blocks with the same hash value is that you
>> have a *very small* fraction of the possibilities covered - so if you
>> saved a 4K block of data from somebody's system32 folder under a given
>> MD5 hash, it's *far* more likely that another block with that same hash
>> is from another copy of an identical system32 folder than that it's an
>> actual accidental collision.)
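The counting argument behind this is simple pigeonholing - 4096-byte blocks
have vastly more possible values than a 128-bit hash can distinguish:

    BLOCK_BITS = 4096 * 8    # bits in a 4K block
    HASH_BITS = 128          # MD5 output size

    # On average, 2**(32768 - 128) = 2**32640 distinct 4K blocks
    # map onto every single MD5 value.
    print(f"2**{BLOCK_BITS - HASH_BITS} blocks per hash value")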
>>
>> Protip: A good hash function is by definition one-way - given the data,
>> it's easy to generate the hash - but reversing it to find the "pre-image"
>> (the data that *generated* the hash) is massively difficult.
>>
>>
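As a toy illustration of that asymmetry, the sketch below (assumed names,
Python) brute-forces a pre-image; it only succeeds quickly because the
search space is restricted to three lowercase letters - every extra byte of
input multiplies the work by 256:

    import hashlib
    from itertools import product

    def find_preimage(target_hex, alphabet=b"abcdefghijklmnopqrstuvwxyz",
                      max_len=3):
        # Exhaustively search short strings for one whose MD5 matches.
        for length in range(1, max_len + 1):
            for combo in product(alphabet, repeat=length):
                candidate = bytes(combo)
                if hashlib.md5(candidate).hexdigest() == target_hex:
                    return candidate
        return None

    target = hashlib.md5(b"dog").hexdigest()   # forward: instant
    print(find_preimage(target))               # inverse: exhaustive search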


