Wanted: volunteers with bandwidth/storage to help save climate data

Fri Dec 16 20:30:44 UTC 2016

I seriously doubt that there's going to be a witchhunt even close to as well
funded as anti-torrent DMCA-wielding piracy hunters, and it's not even nearly
the same as keeping a copy of wikileaks.aes, or satellite photos of
Streisand's campus, or photos of Elian Gonzalez, a copy of deCSS, the cyberSitter
killfile, etc ("we've been here before.").

The issue will be 1000s of half copies from differing dates, sometimes with no
timestamps or other metadata, no SHA256 sums, etc etc. It's going to be a records
management nightmare. Remember, these agencies won't all be shut down on Jan 20th,
which would have made that the universal timestamp date. Some of them may even
be encouraged to continue producing data, possibly even cherry-picked or otherwise
tainted. Others will carry on quietly, without the administration noticing.
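
For what it's worth, a rough sketch of the kind of per-copy manifest that would
head off that nightmare, assuming Python 3 and only the standard library. The
function names, the JSON layout and the source_url field are made up for
illustration; none of this is a format the archiving groups have agreed on:

```python
import hashlib
import json
import time
from pathlib import Path

UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB archives need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, source_url):
    """Record SHA-256, size and mtime for every file under root.

    source_url is a hypothetical field: where this copy was fetched from.
    """
    root = Path(root)
    files = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        files[path.relative_to(root).as_posix()] = {
            "sha256": sha256_file(path),
            "bytes": stat.st_size,
            "mtime_utc": time.strftime(UTC_FORMAT, time.gmtime(stat.st_mtime)),
        }
    return {
        "source_url": source_url,
        "fetched_utc": time.strftime(UTC_FORMAT, time.gmtime()),
        "files": files,
    }


def write_manifest(manifest, out_path):
    """Write the manifest as sorted, indented JSON so copies diff cleanly."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
```

Write the manifest when you fetch the data, not later. A hash taken months
afterwards only proves what your copy looks like then.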

I'm glad some serious orgs are getting into it - U of T, archive.org, wikipedia, etc.
We'll have at least a few repos that cross-agree on provenance, date, SHA256, etc.
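
Cross-agreement is cheap to check once every repo publishes its checksums. A
minimal sketch, assuming each mirror's list has already been loaded into a dict
of path -> SHA-256 hex digest. The cross_check name, the quorum parameter and
the mirror names in the usage note are hypothetical:

```python
from collections import Counter, defaultdict


def cross_check(manifests, quorum=2):
    """Split files into those the mirrors agree on and those they don't.

    manifests: {mirror_name: {relative_path: sha256_hex}}
    A file is 'agreed' when at least `quorum` mirrors hold it and every
    mirror holding it reports the same digest. Everything else lands in
    'unverified' along with the per-digest vote counts.
    """
    votes = defaultdict(Counter)
    for files in manifests.values():
        for path, digest in files.items():
            votes[path][digest] += 1

    agreed, unverified = {}, {}
    for path, counter in votes.items():
        digest, count = counter.most_common(1)[0]
        if len(counter) == 1 and count >= quorum:
            agreed[path] = digest
        else:
            unverified[path] = dict(counter)
    return agreed, unverified
```

Usage would be something like cross_check({"uoft": a, "archive_org": b,
"basement_nas": c}). Note this only shows that the copies match each other,
not that any of them matches what the agency originally published.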

Only once jackboots are knocking on doors ("where's the icecore sample data, Lebowski!")
will we really have to consider the quality levels of the other repos. Not that
they shouldn't be kept too, of course.

Remember, this is only one piece of the puzzle. The scientists can do as much
data collecting as they want -- if the political side of the process wants to make
'mentioning climate change' illegal in state bills or other policies or department
missions, that's far more effective than rm'ing a buncha datasets.

http://abcnews.go.com/US/north-carolina-bans-latest-science-rising-sea-level/story?id=16913782

Nonetheless - mirror everything everywhere always...

/kc

On Fri, Dec 16, 2016 at 02:05:01PM -0500, Steven Miano said:
  >It would seem like the more copies the better, seemingly chunking this data
  >up and using .torrent files may be a way to both (a) ensure the integrity
  >of the data, and (b) enable an additional method to ensure that there are
  >enough copies being replicated (initial seeders would hopefully retain the
  >data for as long as possible)...
  >
  >On Fri, Dec 16, 2016 at 12:24 PM, Ken Chase <math at sizone.org> wrote:
  >
  >> University of Toronto's Robarts Library is hosting an all-day party tomorrow
  >> of
  >> people to surf and help identify datasets, survey and get size and details,
  >> authenticate copies, etc.
  >>
  >> fb event: https://www.facebook.com/events/1828129627464671/
  >>
  >> /kc
  >>
  >> On Fri, Dec 16, 2016 at 06:42:46PM +0200, DaKnOb said:
  >>   >We are currently working on a scheme to successfully authenticate and
  >> verify the integrity of the data. Datasets in https://climate.daknob.net/
  >> are compressed to a .tar.bz2 and then hashed using SHA-256. The final file
  >> with all checksums is then signed using a set of PGP keys.
  >>   >
  >>   >We are still working on a viable way to verify the authenticity of
  >> files before there are tons of copies lying around and there's a working
  >> group in the Slack team I sent previously where your input is much needed!
  >>   >
  >>   >Thanks,
  >>   >Antonios
  >>   >
  >>   >> On 16 Dec 2016, at 18:30, Ken Chase <math at sizone.org> wrote:
  >>   >>
  >>   >> Surfing through the links - any hints on how big these datasets are?
  >> Everyone's got
  >>   >> a few TB to throw at things, but fewer of us have spare PB to throw
  >> around.
  >>   >>
  >>   >> There's some random #s on the goog doc sheet for sizes (100's of TB
  >> for the
  >>   >> landsat archive seems credible), and there's one number that destroys
  >>   >> credibility of the sheet (100000000000 GB (100 ZB)) for the EPA
  >> archive.
  >>   >>
  >>   >> The other page has many 'TBA' entries for size.
  >>   >>
  >>   >> Not sure what level of player one needs to be to be able to serve a
  >> useful
  >>   >> segment of these archives. I realize some of the datasets are tiny
  >> (<GB)
  >>   >> but which ones are most important vs size (ie the win-per-byte ratio)
  >> isn't indicated.
  >>   >> (I know it's early times.)
  >>   >>
  >>   >> Also I hope they've SHA512'd the datasets for authenticity before all
  >> these
  >>   >> myriad copies being flungabout are 'accused' of being manipulated 'to
  >> promote
  >>   >> the climate change agenda' yadda.
  >>   >>
  >>   >> Canada: time to step up! (Can't imagine the Natl Research Council
  >> would do so
  >>   >> on their mirror site, too much of a gloves-off slap in the face to
  >> Trump.)
  >>   >>
  >>   >> /kc
  >>   >>
  >>   >>
  >>   >> On Fri, Dec 16, 2016 at 06:02:46PM +0200, DaKnOb said:
  >>   >>> If you're interested, there's also a Slack team:
  >> climatemirror.slack.com
  >>   >>>
  >>   >>> You can find more info about that here:
  >>   >>>
  >>   >>> - https://climate.daknob.net/
  >>   >>> - http://climatemirror.org/
  >>   >>> - http://www.ppehlab.org/datarefuge
  >>   >>>
  >>   >>> Thank you for your help!
  >>   >>>
  >>   >>>
  >>   >>>> On 16 Dec 2016, at 17:58, Rich Kulawiec <rsk at gsp.org> wrote:
  >>   >>>>
  >>   >>>> This is a short-term (about one month) project being thrown together
  >>   >>>> in a hurry...and it could use some help.  I know that some of
  >>   >>>> you have lots of resources to throw at this, so if you have an
  >>   >>>> interest in preserving a lot of scientific research data, I've set
  >>   >>>> up a mailing list to coordinate IT efforts to help out.  Signup via
  >>   >>>> climatedata-request at firemountain.net or, if you prefer Mailman's
  >> web
  >>   >>>> interface, http://www.firemountain.net/mailman/listinfo/climatedata
  >>   >>>> should work.
  >>   >>>>
  >>   >>>> Thanks,
  >>   >>>> ---rsk
  >>   >>>>
  >>   >>>
  >>   >>
  >>
  >> --
  >> Ken Chase - math at sizone.org Guelph Canada
  >>
  >
  >
  >
  >-- 
  >Miano, Steven M.
  >http://stevenmiano.com

--
Ken Chase - math at sizone.org