Day 1899 / Hugging Face Hub prefers zip archives because they support streaming
Random nugget from "Document to compress data files before uploading" (Issue #5687, huggingface/datasets):
- gz, to compress individual files
- zip, to compress and archive multiple files; zip is preferred rather than tar because it supports streaming out of the box
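The issue doesn't spell out why zip streams well. My reading (an assumption, not something stated there) is that zip keeps an index of its members and compresses each one separately, so a reader can open a single file and consume it piece by piece. A minimal sketch with the standard-library zipfile module; the member names and the first_lines helper are made up for illustration:

```python
import io
import zipfile

# Build a small archive in memory; the member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("train.csv", "id,text\n1,hello\n2,world\n")
    archive.writestr("test.csv", "id,text\n3,foo\n")


def first_lines(archive_bytes, member, limit):
    """Read at most `limit` lines from one member of a zip archive."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # open() returns a file-like object that decompresses as it is read,
        # so stopping early avoids decompressing the rest of the member.
        with archive.open(member) as handle:
            for raw in handle:
                lines.append(raw.decode("utf-8").rstrip("\n"))
                if len(lines) >= limit:
                    break
    return lines


print(first_lines(buffer.getvalue(), "test.csv", 2))
```

Only the requested member is opened here; train.csv is never decompressed.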
(Streaming: https://huggingface.co/docs/datasets/v2.4.0/en/stream TL;DR: for very large datasets, don’t download the entire dataset; pass streaming=True to load_dataset().)
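A rough sketch of what that looks like in practice. The peek_streamed_rows helper and its defaults are my own illustration, and the repo id is a placeholder you would swap for a real dataset; it assumes the datasets package is installed and that the dataset has the requested split:

```python
from itertools import islice


def peek_streamed_rows(repo_id, split="train", limit=3):
    """Return the first `limit` rows of a Hub dataset without a full download."""
    # Imported inside the function so the sketch loads without `datasets`.
    from datasets import load_dataset

    # streaming=True yields an iterable dataset that fetches rows lazily.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, limit))
```

Calling something like peek_streamed_rows("some-user/some-dataset") would then pull just a few rows over the network rather than the whole dataset.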