[Cyberduck-trac] [Cyberduck] #10278: Optimize Checksum Calculation

Cyberduck trac at cyberduck.io
Thu Mar 15 09:14:57 UTC 2018


#10278: Optimize Checksum Calculation
----------------------------+-------------------------
    Reporter:  allklier     |      Owner:
        Type:  enhancement  |     Status:  new
    Priority:  normal       |  Milestone:
   Component:  core         |    Version:  6.4.1
    Severity:  normal       |   Keywords:
Architecture:               |   Platform:  macOS 10.12
----------------------------+-------------------------
 Two suggestions to optimize checksum calculation while uploading to S3.

 I frequently upload very large files (75-100GB) to S3, and the checksum
 calculation adds a significant delay to a time-sensitive workflow. I was
 just uploading a 75GB file, and the checksum calculation took 10min before
 the actual upload started. The upload itself took 32min, so the checksum
 pass adds a time penalty of roughly 31%, which is significant and very
 unfortunate.

 - Compute the checksum during the upload rather than in a separate
 pre-calculation pass. Yes, the checksum then provides less redundancy,
 because the file is read only once, but errors are more likely during the
 upload than during a local disk read.
 - The code that reads the file for checksum calculation seems slow. My
 primary storage (RAID5) supports read bandwidth in excess of 400MB/s, yet
 during checksum calculation the read speed never exceeds 120MB/s, so the
 calculation is limited by the code, not by I/O bandwidth.
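
 To illustrate both suggestions, here is a minimal sketch in Python (not
 Cyberduck's actual code). It hashes each chunk as it is read for upload, so
 the file is read only once, and it uses a large configurable read size. The
 names HashingReader and upload_with_checksum, the 8MiB chunk size, and the
 use of SHA-256 are all assumptions made for illustration; the "sink" object
 stands in for the network connection. Whether S3 accepts a digest that is
 only known after the body has been sent depends on the request signing
 mode, which this sketch does not address.

```python
import hashlib
import io


class HashingReader:
    """Wrap a binary file object and update a digest as bytes are read."""

    def __init__(self, fileobj, algorithm="sha256"):
        self._fileobj = fileobj
        self._digest = hashlib.new(algorithm)

    def read(self, size=-1):
        chunk = self._fileobj.read(size)
        # Feed every chunk handed to the caller into the running digest.
        self._digest.update(chunk)
        return chunk

    def hexdigest(self):
        return self._digest.hexdigest()


def upload_with_checksum(source, sink, chunk_size=8 * 1024 * 1024):
    """Copy source to sink in chunk_size pieces and return the hex digest.

    `sink` is a hypothetical stand-in for the upload connection; anything
    with a write(bytes) method works.
    """
    reader = HashingReader(source)
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        sink.write(chunk)
    return reader.hexdigest()


if __name__ == "__main__":
    payload = b"x" * (20 * 1024 * 1024)
    sink = io.BytesIO()
    digest = upload_with_checksum(io.BytesIO(payload), sink)
    # The single-pass digest matches a separate hash of the same bytes.
    print(digest == hashlib.sha256(payload).hexdigest())
```

 With this shape the checksum costs no extra read pass; the chunk size can
 be tuned to see whether larger reads bring throughput closer to what the
 storage can deliver.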

--
Ticket URL: <https://trac.cyberduck.io/ticket/10278>
Cyberduck <https://cyberduck.io>
Libre FTP, SFTP, WebDAV, S3 & OpenStack Swift browser for Mac and Windows

