Had an idea - looking for a math buff to tell me if it's possible with today's technology.

Dorn Hetzel dorn at hetzel.org
Wed May 18 20:18:38 UTC 2011


On Wed, May 18, 2011 at 4:07 PM, Landon Stewart <lstewart at superb.net> wrote:

> Lets say you had a file that was 1,000,000,000 characters consisting of
> 8,000,000,000bits.  What if instead of transferring that file through the
> interwebs you transmitted a mathematical equation to tell a computer on the
> other end how to *construct* that file.  First you'd feed the file into a
> cruncher of some type to reduce the pattern of 8,000,000,000 bits into an
> equation somehow.  Sure this would take time, I realize that.  The equation
> would then be transmitted to the other computer where it would use its
> mad-math-skillz to *figure out the answer* which would theoretically be the
> same pattern of bits.  Thus the same file would emerge on the other end.
>
> The real question here is how long would it take for a regular computer to
> do this kind of math?
>
> The real question is whether this is possible.  And the short answer is No,
at least not in general.

Now if your file has patterns that make it compressible, you can make it
smaller, but not
all files can be compressed this way, at least not in a way that makes them
smaller.

To understand why, consider the case of a file of one byte, or 8 bits.
 There are 256 possible
files of this size, 00000000, 00000001, 00000010, ..., 11111101, 11111110,
1111111.

Since each code we send must generate a unique file (or what's the point, we
need 256 different codes
to represent each possible file), but the shortest general way to write 256
different codes is
still 8 bits long.  Now, we can use coding schemes and say that the one-bit
value 1 represents 11111111
because that file happens a lot.  Then we could use 01 to represent
something else, but
we can't use 1 at the beginning again because we couldn't tell that from the
file named by 1.

Bottom line, for some codes to be shorter than the file they represent,
others must be longer...

So if files have a lot of repetition, you can get a win, but for "random"
data, not so much :(



More information about the NANOG mailing list