Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB
To work with data in the cloud, you can upload it to Amazon® Simple Storage Service (Amazon S3™) and then access the data in Amazon S3 from MATLAB® or from workers in your cluster.
You can either read or write Amazon S3 data from your MATLAB session. Your MATLAB session can be anywhere, including your local machine, MATLAB Online™, or your cloud resource in Cloud Center.
Set Up Access to Amazon S3 Bucket
You must set up Amazon Web Services (AWS®) credentials to work with remote data in Amazon S3. These AWS credentials must also have the required read and write policies. If you are creating resources on Cloud Center, you can add AWS access either before creating the resource or while the resource is running.
Add AWS Access Before Creating Cloud Resources in Cloud Center
If you are creating a MATLAB or MATLAB Parallel Server™ resource in Cloud Center, you can set up access to read data from:
The S3 buckets in the AWS account linked to your Cloud Center account from which you are creating a cloud resource
Public S3 buckets
To set up read access to an S3 bucket, add the required AWS Identity and Access Management (IAM) policy by setting Additional IAM Policies (Optional) to the value arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess when creating your cloud resource.
To also set up write access to the S3 buckets in the AWS account linked to your Cloud Center account from which you are creating the cloud resource, enter the value arn:aws:iam::aws:policy/AmazonS3FullAccess instead.
Add AWS Access in Your MATLAB Session
You must use an AWS session token to set up access if any of these conditions apply:
You already started your cloud resources in Cloud Center.
Your MATLAB session is on your local machine or MATLAB Online.
You want to access an S3 bucket in an AWS account that you did not use to create a cloud resource in Cloud Center.
If you are a root user for the AWS account, follow these steps. Otherwise, contact your AWS account administrator. If you are provided with a long-term token, skip to step 3. If you are provided an AWS session token instead, skip to step 6. Ensure that your token has the S3 read (AmazonS3ReadOnlyAccess) or write (AmazonS3FullAccess) policy that you need.
1. Create an identity and access management (IAM) user using the AWS account that contains the S3 bucket. For more information, see Creating an IAM User in Your AWS Account.
2. Generate an access key to receive a long-term access token that includes an access key ID (AWS_ACCESS_KEY_ID) and a secret access key (AWS_SECRET_ACCESS_KEY). For more information, see Managing Access Keys for IAM Users. Ensure that this access key has the required S3 access policies. This access key allows you to generate an AWS session token.
3. Download and install the AWS Command Line Interface (AWS CLI) tool on the machine with your MATLAB instance. This tool supports commands specific to AWS in your system terminal.
4. In the system terminal, enter this command to set up the AWS CLI. You are prompted to enter the details of your long-term access token.
aws configure
5. To obtain an AWS session token, enter this command in your system terminal. This command generates a session token that is valid for one hour. The session token includes an AWS access key ID, an AWS secret access key, and an AWS session token. Note that the keys in this session token are different from those in the long-term access token.
aws sts get-session-token --duration-seconds 3600
Tip
Instead of using the AWS CLI, you can use AWS CloudShell. For details about CloudShell, see Getting started with AWS CloudShell. For more information about session tokens, see Request temporary security credentials.
6. Once you have your session token, specify your AWS access key ID, secret access key, region of the bucket, and session token as system environment variables in your MATLAB Command Window using the setenv (MATLAB) command.
setenv("AWS_ACCESS_KEY_ID","YOUR_AWS_ACCESS_KEY_ID")
setenv("AWS_SECRET_ACCESS_KEY","YOUR_AWS_SECRET_ACCESS_KEY")
setenv("AWS_DEFAULT_REGION","YOUR_AWS_DEFAULT_REGION")
setenv("AWS_SESSION_TOKEN","YOUR_AWS_SESSION_TOKEN")
To increase the security of your code and make your code safer to share, you can store your credentials in your MATLAB vault as secrets and then reference them in your code. For more information, see Keep Sensitive Information Out of Code (MATLAB).
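As a sketch of that secrets-based approach (assuming the setSecret and getSecret functions available in recent MATLAB releases; the secret names below are hypothetical, not required values):

```matlab
% One-time setup: store each credential in the MATLAB vault.
% You are prompted to enter the secret value interactively.
setSecret("awsAccessKeyId")        % hypothetical secret name
setSecret("awsSecretAccessKey")
setSecret("awsSessionToken")

% In your scripts, reference the secrets instead of literal keys.
setenv("AWS_ACCESS_KEY_ID",getSecret("awsAccessKeyId"))
setenv("AWS_SECRET_ACCESS_KEY",getSecret("awsSecretAccessKey"))
setenv("AWS_DEFAULT_REGION","us-east-1")   % replace with your bucket region
setenv("AWS_SESSION_TOKEN",getSecret("awsSessionToken"))
```

This keeps the key material out of scripts you share or commit to source control.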
7. If you are using MATLAB Parallel Server on Cloud Center, configure your cloud cluster to access S3 services. After you create a cloud cluster, configure your cluster profile with your AWS credentials. In your MATLAB session, in the Environment section on the MATLAB Home tab, select Parallel > Create and Manage Clusters. In the Cluster Profile Manager, select your cloud cluster profile. Scroll to the EnvironmentVariables property and add these environment variable names: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION, and AWS_SESSION_TOKEN. For more details, see Set Environment Variables on Workers (Parallel Computing Toolbox).
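If you prefer to configure this programmatically, a minimal sketch (assuming a Parallel Computing Toolbox cluster object with the documented EnvironmentVariables property; the profile name is hypothetical):

```matlab
% Load the cloud cluster profile and list the variable names to copy
% from the client to the workers. The values are read from the client
% environment when a pool starts, so call setenv on the client first.
c = parcluster("MyCloudClusterProfile");   % hypothetical profile name
c.EnvironmentVariables = ["AWS_ACCESS_KEY_ID","AWS_SECRET_ACCESS_KEY", ...
    "AWS_DEFAULT_REGION","AWS_SESSION_TOKEN"];
saveProfile(c)   % persist the change to the cluster profile
```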
Verify Access to AWS Credentials from Your MATLAB Session
Download and install the AWS Command Line Interface tool. In your MATLAB session, check whether you already have access to an AWS account.
!aws sts get-caller-identity
If you have access to an AWS account in your MATLAB session, this command returns your AWS account number and other details of your AWS account.
Download Data Sets to Local Machine
To follow along with the examples on this page, you can download these MathWorks® data sets to your local machine. Follow these steps to get started.
The Example Food Images data set contains 978 photographs of food in nine classes. You can download this data set to your local machine using this command in MATLAB.
fprintf("Downloading Example Food Image data set ... ")
filename = matlab.internal.examples.downloadSupportFile('nnet', ...
    'data/ExampleFoodImageDataset.zip');
fprintf("Done.\n")
unzip(filename,"MyLocalFolder/FoodImageDataset");
To obtain the Traffic Signal Work Orders data set on your local machine, use this command.
fprintf("Downloading Traffic Signal Work Orders data set ... ")
zipFile = matlab.internal.examples.downloadSupportFile("textanalytics", ...
    "data/Traffic_Signal_Work_Orders.zip");
fprintf("Done.\n")
unzip(zipFile,"MyLocalFolder/TrafficDataset");
Upload Data to Amazon S3 from Local Machine
This section shows you how to upload data sets from your local machine to your Amazon S3 bucket. Later sections on this page show you how to work with remote image and text data.
You can upload data to Amazon S3 by using the AWS S3 web page. For more efficient file transfers to and from Amazon S3, use the AWS Command Line Interface tool.
To upload a data set from your local machine to your Amazon S3 bucket, follow these steps.
1. Create a bucket for your data using the following command in your MATLAB Command Window. Replace MyCloudData with the name of your Amazon S3 bucket.
!aws s3 mb s3://MyCloudData
2. Upload your data using the following command in your MATLAB Command Window.
!aws s3 cp mylocaldatapath s3://MyCloudData --recursive
For example, to upload the Example Food Images data set from your local machine to your Amazon S3 bucket, use this command.
!aws s3 cp MyLocalFolder/FoodImageDataset s3://MyCloudData/FoodImageDataset/ --recursive
To upload the Traffic Signal Work Orders data set from your local machine to your Amazon S3 bucket, use this command.
!aws s3 cp MyLocalFolder/TrafficDataset s3://MyCloudData/TrafficDataset/ --recursive
Access Data from Amazon S3 in MATLAB
After you store your data in Amazon S3, you can use Data Import and Export (MATLAB) functions to read data from or write data to the Amazon S3 bucket. MATLAB functions that support a remote location in their filename input arguments allow access to remote data. To check whether a specific function allows remote access, refer to its function page.
Note
If you are in a MATLAB session in MATLAB Parallel Server in Cloud Center, save the images to the /shared/persisted folder on your headnode so that all worker nodes across the cluster can access the folder. This location is optimal because each worker does not have to download the data individually.
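For example, a sketch of copying a data set from S3 to the shared folder, run once from a MATLAB session on the headnode (the bucket name and folder follow the examples used earlier on this page; replace them with your own):

```matlab
% Copy the image data set from the S3 bucket to the cluster's shared
% folder so every worker can read it without downloading it again.
!aws s3 cp s3://MyCloudData/FoodImageDataset /shared/persisted/FoodImageDataset --recursive
```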
For example, you can use imread (MATLAB) to read images from an Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.
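A minimal sketch of such a call, assuming the Example Food Images data set uploaded earlier (the subfolder and file name are hypothetical placeholders for a file in your bucket):

```matlab
% imread accepts an s3:// URL when your AWS credentials are set as
% environment variables in the current MATLAB session.
img = imread("s3://MyCloudData/FoodImageDataset/sushi/image1.jpg");  % hypothetical file
imshow(img)
```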
To write data into the Amazon S3 bucket, you can similarly use Data Import and Export (MATLAB) functions that support write access to remote data. To check if a specific function allows remote access, refer to its function page.
Read Data from Amazon S3 in MATLAB Using Datastores
To access large data sets in Amazon S3 from your MATLAB client or your cluster workers, you can use datastores. A datastore is a repository for collections of data that are too large to fit in memory. Datastores allow you to read and process data stored in multiple files in a remote location as a single entity.
For example, use an imageDatastore (MATLAB) to read images from an Amazon S3 bucket.
Create an imageDatastore object that points to the URL of the Amazon S3 bucket. Replace s3://MyCloudData with the URL of your Amazon S3 bucket.
imds = imageDatastore("s3://MyCloudData/FoodImageDataset/", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
Read the first image from Amazon S3 by using the readimage (MATLAB) function.
img = readimage(imds,1);
Display the image by using the imshow (MATLAB) function.
imshow(img)
To use datastores to read files or data of other formats, see Getting Started with Datastore (MATLAB).
For a step-by-step example that shows how to train a convolutional neural network using data stored in Amazon S3, see Train Network in the Cloud Using Automatic Parallel Support (Deep Learning Toolbox).
Write Data to Amazon S3 from MATLAB Using Datastores
You can use datastores to write data from MATLAB or cluster workers to Amazon S3. For example, use a tabularTextDatastore (MATLAB) object to read tabular data from Amazon S3 into a tall array, preprocess the data, and then write it back to Amazon S3.
Create a datastore object that points to the URL of the Amazon S3 bucket.
ds = tabularTextDatastore("s3://MyCloudData/TrafficDataset/Traffic_Signal_Work_Orders.csv");
Read the tabular data into a tall array and preprocess the data by removing missing entries and sorting the data.
tt = tall(ds);
tt = sortrows(rmmissing(tt));
Write the data back to Amazon S3 by using the write (MATLAB) function.
write("s3://MyCloudData/TrafficDataset/preprocessedData/",tt);
To read your tall data back, use the datastore (MATLAB) function.
ds = datastore("s3://MyCloudData/TrafficDataset/preprocessedData/");
tt = tall(ds);
To use datastores to write files or data of other formats, see Getting Started with Datastore (MATLAB).