FAST AND SECURE INTEGRITY VERIFICATION FOR HIGH-SPEED DATA TRANSFERS
The amount of data generated by scientific and commercial applications is growing at an ever-increasing pace. This data is often moved between geographically distributed sites for purposes such as collaboration and archival, leading to a significant increase in data transfer rates. The surge in data transfer rates, combined with the proliferation of scientific applications that cannot tolerate data corruption, paved the way for the development of end-to-end integrity verification techniques that protect data transfers against silent data corruption. Integrity verification minimizes the likelihood of silent data corruption by comparing the checksums of files at the source and destination servers using secure hash algorithms such as MD5 and SHA-256. However, it imposes a significant performance penalty due to the overhead of checksum computation. In this dissertation, we propose the Fast Integrity VERification (FIVER) algorithm, which overlaps checksum computation and data transfer operations to reduce the overhead of integrity verification. Experimental results show that FIVER brings the overhead of end-to-end integrity verification down from 60% with state-of-the-art solutions to below 10% by executing transfer and checksum operations concurrently. Moreover, the caching policy of operating systems poses a risk of permanent data loss in the case of an unexpected power outage even when end-to-end integrity verification is used, because existing implementations complete the verification process while data is still in memory. FIVER addresses this issue by forcing dirty data in memory to be flushed to disk before the integrity verification terminates. By doing so, FIVER ensures that integrity verification completes only after the data is safely written to disk, so any data loss caused by a power outage will trigger retransmission upon system recovery.
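The two ideas above, overlapping checksum computation with the transfer and flushing dirty pages before declaring verification complete, can be illustrated with a minimal single-host sketch. This is not the FIVER implementation; the block size, queue depth, and the use of a local file copy to stand in for a network transfer are all assumptions made for illustration.

```python
import hashlib
import os
import queue
import threading

BLOCK_SIZE = 64 * 1024  # hypothetical block size, not FIVER's actual value

def transfer_with_overlap(src_path, dst_path):
    """Copy src to dst while a parallel thread hashes each block,
    so checksum computation overlaps the transfer instead of
    running as a separate pass afterward."""
    hash_queue = queue.Queue(maxsize=8)  # bounded to limit memory use
    digest = hashlib.sha256()

    def hasher():
        # Consume blocks and fold them into the running digest.
        while True:
            block = hash_queue.get()
            if block is None:  # sentinel: transfer finished
                break
            digest.update(block)

    worker = threading.Thread(target=hasher)
    worker.start()

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(BLOCK_SIZE)
            if not block:
                break
            hash_queue.put(block)  # hashing proceeds concurrently...
            dst.write(block)       # ...with the transfer of the same block
        dst.flush()
        os.fsync(dst.fileno())  # force dirty pages to disk before we
                                # allow verification to complete

    hash_queue.put(None)
    worker.join()
    return digest.hexdigest()
```

The `os.fsync` call mirrors the flush-before-completion behavior described above: only after the data is durably on disk does the function report a checksum, so a crash before that point leaves the transfer visibly incomplete rather than silently lost.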
Finally, we present a blockchain-based ledger architecture that stores the checksums of frequently accessed scientific datasets to further minimize the overhead of integrity verification. In the proposed architecture, the checksum of a file is calculated and pushed to a private blockchain when the file is first created, so that future transfers do not require the data source to recalculate it. We find that blockchain-based integrity verification reduces transfer time by up to 50% when the data source is the bottleneck of the integrity verification process.
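The caching idea can be sketched as follows, with a plain append-only in-memory map standing in for the private blockchain (the `ChecksumLedger` class and `file_id` scheme are illustrative assumptions, not the dissertation's actual ledger API). The point is only the control flow: the source hashes a file at most once, and repeated transfers reuse the recorded checksum.

```python
import hashlib

class ChecksumLedger:
    """Toy stand-in for the private blockchain: an append-only map
    from file identifier to the checksum recorded at creation time."""
    def __init__(self):
        self._entries = {}

    def record(self, file_id, checksum):
        if file_id in self._entries:
            raise ValueError("ledger entries are immutable")
        self._entries[file_id] = checksum

    def lookup(self, file_id):
        return self._entries.get(file_id)

def source_checksum(ledger, file_id, path):
    """Return the ledger's cached checksum when one exists, so the
    data source skips recomputation on repeated transfers."""
    cached = ledger.lookup(file_id)
    if cached is not None:
        return cached
    with open(path, "rb") as f:
        checksum = hashlib.sha256(f.read()).hexdigest()
    ledger.record(file_id, checksum)  # push once, at first use
    return checksum
```

In a real deployment the immutability enforced here by the `record` guard would come from the blockchain itself, which is what makes the cached checksums trustworthy enough to replace source-side recomputation.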