Multi-part Upload to S3 programmatically in .Net using C#
Uploading large files, batches of thousands of files, or continuous backups to S3 can be impractical through the AWS Console. Apart from Storage Gateway, a simpler option is to write a few lines of code. A single object can be up to 5 TB in size, but a multi-part upload can have at most 10,000 parts.
The same code can be extended with your own logic, for example choosing the target bucket based on certain conditions or prepending a prefix to the key.
This blog post shows how to upload a file in smaller chunks that S3 reassembles on the server, and it discusses some useful related methods.
The following are the requirements:
- A user with programmatic access. One can be created in the IAM console by enabling programmatic access. Download the CSV file and write some code to read the 3rd and 4th columns from the second line; these are the Access Key ID and the Secret Access Key. If you need to store them, store them encrypted to keep the Secret Access Key away from prying eyes. Also, restrict the permissions to just what is needed through IAM policies, and rotate the keys on a regular schedule, for example every week, month, or three months.
- The region of the bucket
- The name of the bucket
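Reading the two keys from the downloaded CSV could look roughly like the following. This is a minimal sketch, assuming the layout described above (a header line followed by one data line whose 3rd and 4th columns hold the Access Key ID and the Secret Access Key) and no quoted commas inside the values. The CredentialCsv class and its method names are hypothetical and not part of the AWS SDK.

```csharp
using System;
using System.IO;

public static class CredentialCsv
{
    // Parses the text of the credentials CSV and returns the 3rd and 4th
    // columns of the second line. Assumes plain comma-separated values.
    public static (string AccessKeyId, string SecretAccessKey) Parse(string csvText)
    {
        string[] lines = csvText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length < 2)
            throw new FormatException("Expected a header line and one data line.");

        string[] columns = lines[1].Split(',');
        if (columns.Length < 4)
            throw new FormatException("Expected at least four columns.");

        return (columns[2].Trim(), columns[3].Trim());
    }

    // Convenience wrapper that reads the CSV file from disk.
    public static (string AccessKeyId, string SecretAccessKey) Load(string path)
        => Parse(File.ReadAllText(path));
}
```

The returned values can then be passed to BasicAWSCredentials in place of hard-coded constants.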
The code is very straightforward:
1) Instantiate AmazonS3Client
2) Initiate Multi-Part Upload
3) Upload the file as chunks until completed
4) Finalize the file
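// Namespaces used by the snippets in this post: Amazon, Amazon.Runtime, Amazon.S3, Amazon.S3.Model
var s3Client = new AmazonS3Client(new BasicAWSCredentials(ACCESS_KEY, SECRET_ACCESS_KEY), RegionEndpoint.GetBySystemName(REGION)); // REGION is the bucket's region, e.g. "us-east-1"
The above code instantiates an s3Client with the Access Key ID, the Secret Access Key, and the region of the bucket.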
var response = await s3Client.InitiateMultipartUploadAsync(new InitiateMultipartUploadRequest
{
    BucketName = bucketName,
    Key = keyName,
    StorageClass = S3StorageClass.DeepArchive
});
The above code initiates a multi-part upload. The bucket name, the key, and the storage class are specified here. The key is the name of the file, including any prefixes that give it a hierarchical look, for example "Organization/Department/FileName.ext". S3 stores objects in a flat fashion, i.e. the entire string is the key, but the prefixes and the "/" separator are shown as a hierarchy in the S3 console. Always verify that the response has an HTTP status code of 200 before proceeding. To make the code more robust against network connection issues, you can wait a few milliseconds or seconds and retry once or twice before throwing or displaying an error. For synchronous code, remove the await and read the Result property.
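The wait-and-retry idea described above can be sketched as a small generic helper. This is only an illustration under simple assumptions: a fixed delay between attempts, and every exception treated as retryable. The Retry class and RunAsync method are hypothetical names, not part of the AWS SDK.

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Runs the operation up to maxAttempts times, waiting `delay` after each
    // failed attempt. The exception from the last attempt reaches the caller.
    public static async Task<T> RunAsync<T>(Func<Task<T>> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Not the last attempt yet: wait, then loop and try again.
                await Task.Delay(delay);
            }
        }
    }
}
```

For example, `await Retry.RunAsync(() => s3Client.InitiateMultipartUploadAsync(request), 3, TimeSpan.FromSeconds(2))` would try the call up to three times, where request stands for the InitiateMultipartUploadRequest shown earlier.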
The response object contains a string property named UploadId. This value is needed for every subsequent call, so for larger files it is preferable to store it in some kind of durable storage, such as a database. That way, a network failure after hours of uploading does not force you to start over. There are other ways of finding broken files or parts, using the List methods that are mentioned briefly at the end of this blog post.
long filePosition = 0;
long contentLength = new FileInfo(filePath).Length;
long partSize = 5 * (long)Math.Pow(2, 20); // 5MB, the minimum part size
List<UploadPartResponse> uploadResponses = new List<UploadPartResponse>();
// Part numbers start at 1; each iteration uploads the next partSize bytes of the file.
for (int i = 1; filePosition < contentLength; i++)
{
    var result = await s3Client.UploadPartAsync(new UploadPartRequest
    {
        BucketName = bucketName,
        Key = keyName,
        UploadId = response.UploadId,
        PartNumber = i,
        PartSize = partSize,
        FilePosition = filePosition,
        FilePath = filePath
    });
    if (result.HttpStatusCode == HttpStatusCode.OK)
    {
        uploadResponses.Add(result);
    }
    else
    {
        // Retry logic etc... A part that is skipped here leaves a gap in the file.
    }
    filePosition += partSize;
}
filePath is the path to the file which needs to be uploaded.
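The partSize variable holds the number of bytes in 5 MB (5 × 2^20 bytes). Each part can be from 5MB to 5GB in size, except the last part, which can be smaller. Because at most 10,000 parts are allowed, 5MB parts only cover files up to about 52 GB (5 × 2^20 × 10,000 bytes); larger files need a larger part size.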
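Choosing a part size that respects both limits can be sketched as follows. This is a minimal sketch that assumes the limits mentioned in this post (parts of 5MB to 5GB, at most 10,000 parts); the PartSizing class and its members are hypothetical helpers, not SDK types.

```csharp
using System;

public static class PartSizing
{
    public const long MinPartSize = 5L * 1024 * 1024;         // 5 * 2^20 bytes
    public const long MaxPartSize = 5L * 1024 * 1024 * 1024;  // 5 * 2^30 bytes
    public const int MaxParts = 10_000;

    // Returns the smallest allowed part size that splits a file of
    // contentLength bytes into at most MaxParts parts.
    public static long ChoosePartSize(long contentLength)
    {
        if (contentLength < 0)
            throw new ArgumentOutOfRangeException(nameof(contentLength));

        // Ceiling division: bytes per part if exactly MaxParts parts were used.
        long needed = (contentLength + MaxParts - 1) / MaxParts;
        long partSize = Math.Max(MinPartSize, needed);
        if (partSize > MaxPartSize)
            throw new ArgumentException("File is too large for a multi-part upload.", nameof(contentLength));

        return partSize;
    }

    // Number of parts produced by the given part size (the last part may be smaller).
    public static long CountParts(long contentLength, long partSize)
        => (contentLength + partSize - 1) / partSize;
}
```

In the upload loop, `long partSize = PartSizing.ChoosePartSize(contentLength);` could replace the fixed 5MB value, so that small files keep the minimum part size and large files stay within 10,000 parts.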
We create an empty list of UploadPartResponse to hold the responses. For large files, and for robustness, these can be stored in persistent storage such as a database. The string property ETag of each response is needed to complete the upload.
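As one possible way to persist the UploadId and the ETags between runs, the state can be written to a small JSON file. This sketch uses a local file in place of a database purely for illustration, and the UploadedPart, UploadState and UploadStateStore types are hypothetical names introduced here, not SDK classes.

```csharp
#nullable enable
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public class UploadedPart
{
    public int PartNumber { get; set; }
    public string ETag { get; set; } = "";
}

public class UploadState
{
    public string BucketName { get; set; } = "";
    public string Key { get; set; } = "";
    public string UploadId { get; set; } = "";
    public List<UploadedPart> Parts { get; set; } = new List<UploadedPart>();
}

public static class UploadStateStore
{
    // Overwrites the state file with the current progress of the upload.
    public static void Save(string path, UploadState state)
        => File.WriteAllText(path, JsonSerializer.Serialize(state));

    // Returns the saved state, or null when no state file exists yet.
    public static UploadState? Load(string path)
        => File.Exists(path)
            ? JsonSerializer.Deserialize<UploadState>(File.ReadAllText(path))
            : null;
}
```

Calling Save after every successful part means an interrupted upload can later be resumed from the recorded UploadId and the part numbers that are already stored.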
Once the entire file is uploaded as chunks, the file can be finalized as follows:
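var completeRequest = new CompleteMultipartUploadRequest
{
    BucketName = bucketName,
    Key = keyName,
    UploadId = response.UploadId
};
// S3 needs the part number and ETag of every uploaded part to assemble the file.
completeRequest.AddPartETags(uploadResponses);
var completeUploadResponse = await s3Client.CompleteMultipartUploadAsync(completeRequest);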
The CompleteMultipartUploadRequest carries a list of PartETag in its PartETags property, which tells S3 which parts to merge to complete the file upload. The two properties of PartETag that matter here are PartNumber and ETag. The PartNumber is the one specified in the UploadPartRequest, and the ETag comes from the UploadPartResponse.
Some other interesting methods on the s3Client object are:
1) AbortMultipartUploadAsync - Cancels a pending multi-part upload for any reason.
2) ListMultipartUploadsAsync - Gets a list of multi-part uploads that have been initiated but neither completed nor aborted.
3) ListPartsAsync - Lists the parts of a specific multi-part upload.
There are several more methods on the s3Client object, but the ones above are the most relevant to this article. Also, the list API calls return at most 1,000 entries per call. There are ways to obtain the next page, but that is beyond the scope of this blog post. Maybe I will write a separate post about it.
Happy development :)