Why can't a file larger than 5 GB be uploaded without segmentation?

Asked by Ankit Mittal

Hi All,

We can upload a file smaller than 5 GB without segmentation, so why can't a file larger than 5 GB be uploaded without segmentation?

Regards,
Ankit

Question information

Language: English
Status: Solved
For: OpenStack Object Storage (swift)
Assignee: No assignee
Solved by: gholt
Best gholt (gholt) said :
#1

Each object uploaded ends up as a file (plus however many replicas) on a file system at some point. Objects larger than 5G (and even 5G objects on smaller clusters) can cause imbalances in storage distribution. Imagine you have a 3-replica cluster with 100 drives, all well balanced, and then you upload a 100G object. Now you'll have 3 drives that each hold 100G more than any other. That's 5% of a 2T drive.
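
The arithmetic in that example can be checked with a small sketch. The function name and the decimal units (1T = 1000G) are assumptions made for illustration; nothing here comes from Swift itself:

```python
def extra_fill_fraction(object_gb: float, drive_tb: float) -> float:
    """Fraction of one drive's capacity taken up by a single replica of the object."""
    drive_gb = drive_tb * 1000  # assumption: decimal units, 1T = 1000G
    return object_gb / drive_gb


replicas = 3
drives = 100
fraction = extra_fill_fraction(object_gb=100, drive_tb=2)
# Only `replicas` of the `drives` receive a copy; the rest are untouched.
print(f"{replicas} of {drives} drives each gain {fraction:.0%} of their capacity")
# prints "3 of 100 drives each gain 5% of their capacity"
```

The other 97 drives gain nothing, which is where the imbalance comes from.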

Now, if all you're using a cluster for is 100G files, then it won't matter since, over time, everything will balance out. But usually large objects are less common than smaller ones, and can therefore cause "lumpiness" in distributed storage.

There are, of course, ways to solve this. One way is to auto-split objects greater than a certain size, but then you have the added complexity of the splitting itself, plus the added complexity of tracking all those pieces, ensuring overwrites clean up old pieces atomically, etc.

When originally writing Swift, we did our best to stay on the side of code simplicity to keep the "core technology" as bug free and reliable as possible.

In this particular case, it should be easy enough to split such large objects into segments on the client side. Additionally, such splitting can allow you to upload several segments in parallel, gaining more speed through use of more drives at once in your cluster.
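
As a rough illustration of client-side splitting with parallel uploads, here is a minimal sketch. It is not Swift's client code: `upload_segment` is a hypothetical callable that you would supply (for example, something that issues one PUT per segment), and the `<file name>/<index>` naming scheme is also just an assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple


def segment_ranges(total_size: int, segment_size: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (index, offset, length) for each segment of a file of total_size bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    for index, offset in enumerate(range(0, total_size, segment_size)):
        yield index, offset, min(segment_size, total_size - offset)


def read_segment(path: str, offset: int, length: int) -> bytes:
    """Read one segment into memory (a real client would likely stream it instead)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def upload_in_segments(
    path: str,
    segment_size: int,
    upload_segment: Callable[[str, bytes], None],  # hypothetical uploader, supplied by the caller
    workers: int = 4,
) -> List[str]:
    """Split the file at `path` and upload its segments in parallel.

    Returns the segment names in order, so the caller can record which
    pieces make up the original object.
    """
    base = os.path.basename(path)
    total = os.path.getsize(path)

    def work(segment: Tuple[int, int, int]) -> str:
        index, offset, length = segment
        name = f"{base}/{index:08d}"  # assumed naming scheme; zero-padded so names sort in order
        upload_segment(name, read_segment(path, offset, length))
        return name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even though uploads run concurrently.
        return list(pool.map(work, segment_ranges(total, segment_size)))
```

With a 12G file and a 5G segment size this would produce three segments (5G, 5G and 2G), each of which stays under the single-object limit and can land on different drives.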

I think there's a push to allow transparent segmenting in the Swift proxy as well. You'd lose the speed benefit but have a simpler client; the proxy would be more complex, but at least that code would only be on an "off" execution path. Even so, users would still need to understand that splitting was occurring and that they'd have a set of segment objects somewhere in their Swift containers.

Ankit Mittal (ankitm-r) said :
#2

thanks gholt

Ankit Mittal (ankitm-r) said :
#3

Thanks gholt, that solved my question.