Importing Private Cloud Assets from AWS S3

How to import assets in private buckets on AWS S3 for labeling on Ango Hub

Administrators and project managers can import assets to Ango Hub from AWS private S3 buckets.

The main difference between importing data from cloud storage services and using drag-and-drop is the files' location. When importing assets with drag-and-drop, the assets get copied to a folder within Ango Hub. This means that, if you are using Hub's cloud version, your assets are uploaded to Ango AI's own cloud storage. When importing assets from your own cloud storage, however, your assets are left in your own storage and are never copied anywhere else.

There are two ways to allow Ango Hub to access private bucket data: by creating an IAM user for Ango Hub, or by setting up IAM delegation. Both methods give Hub the same access once configured; they differ only in how they are set up.

Configure CORS

The CORS policy on your bucket allows Ango Hub to send requests to your cloud storage, and lets your storage explicitly accept requests coming from Hub. This step is required for Hub to connect to your private bucket.

The steps to follow to set up CORS with Ango Hub can be found here.

Connect your Cloud Storage by creating an IAM User for Ango Hub

Once you've set up CORS for your bucket, you will need to create a connection between Hub and the bucket itself.

To link your bucket to Hub with the IAM User method, you will need an Access Key ID and a Secret Access Key. To obtain them, create an IAM user in your AWS management console and generate an access key for it. More information on how to do this can be found here.

Your Access Key ID is a 20-character string of uppercase letters and digits. Your Secret Access Key is a 40-character string that may also contain characters such as / and +.
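If you want to catch copy-and-paste mistakes before entering the keys in Hub, a quick shape check is enough. The Python sketch below is a heuristic only: the function name looks_like_aws_keys is hypothetical, and the character sets are assumptions based on the formats described above. It cannot tell whether the keys are valid or whether the IAM user can read your bucket.

```python
import re

# Assumed shapes (heuristic): Access Key IDs are 20 uppercase letters/digits;
# Secret Access Keys are 40 characters that may also include "/" and "+".
ACCESS_KEY_ID_RE = re.compile(r"[A-Z0-9]{20}")
SECRET_ACCESS_KEY_RE = re.compile(r"[A-Za-z0-9/+]{40}")


def looks_like_aws_keys(access_key_id: str, secret_access_key: str) -> bool:
    """Return True if both strings have the expected length and characters.

    Surrounding whitespace (a common copy-and-paste artifact) is ignored.
    This does not contact AWS, so it cannot confirm the keys actually work.
    """
    return bool(
        ACCESS_KEY_ID_RE.fullmatch(access_key_id.strip())
        and SECRET_ACCESS_KEY_RE.fullmatch(secret_access_key.strip())
    )


if __name__ == "__main__":
    # AWS's documented example credentials (not real keys).
    print(looks_like_aws_keys("AKIAIOSFODNN7EXAMPLE",
                              "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```

The sample values are the placeholder credentials AWS uses in its own documentation; replace them with the keys you generated.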

Once you have obtained the two strings, go to your organization's page, then click on Storages and Add Storage.

From the dialog that pops up, provide your connection with a unique name, pick your provider, and input the aforementioned keys. From the "Access Type" selector, pick "IAM User".

After clicking OK, your bucket will be linked to Hub, and it will show up in your list of integrations.

Integrations belong to your organization, not your user.

If you want to be able to access your non-public cloud data in multiple organizations, you will have to integrate Hub with AWS in every one of them.

Connect your Cloud Storage by using IAM Delegation

Ensure you have first set up the CORS policy for your bucket correctly by following the steps on this page.

If you do not wish to or cannot create a new IAM user for Ango, you may use IAM delegation to connect your private bucket to Ango Hub.

Initialize Ango Hub integration

  1. From Ango Hub, click on Organization in the top bar.

  2. Enter the Storages tab.

  3. Click on Add Storage. The "Add Storage" dialog will appear. Keep this tab open, as you will enter information here later.

Set up IAM role and policy on AWS S3

  4. In a new tab, open your AWS S3 console.

  5. Navigate to the bucket(s) you would like to connect to Hub. For each bucket, take note of its Amazon Resource Name (ARN), which has the form arn:aws:s3:::your-bucket-name.

  6. From the "IAM" section of the AWS dashboard, navigate to the "Policies" sub-section, then click on "Create Policy".

  7. Click on "JSON".

  8. In the Policy Editor, paste the following JSON. In the "Resource" property, paste the bucket ARNs you noted in step 5. Each ARN must be entered twice: once by itself, and once with a trailing /*. The plain bucket ARN lets Hub list the bucket and read its location (s3:ListBucket, s3:GetBucketLocation), while the /* form lets Hub read the objects inside it (s3:GetObject).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::your-bucket-name/*",
                "arn:aws:s3:::your-bucket-name",
                "arn:aws:s3:::your-second-optional-bucket-name/*",
                "arn:aws:s3:::your-second-optional-bucket-name"
            ]
        }
    ]
}
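
| Property | Description |
| --- | --- |
| Version | The version of the IAM policy language. Use 2012-10-17, the current version. |
| Effect | Specifies that the actions listed in Action are allowed. |
| Action | The actions that will be allowed. Here, Ango Hub is given read-only access: listing the bucket (s3:ListBucket), reading its region (s3:GetBucketLocation), and downloading objects (s3:GetObject). |
| Resource | The buckets, and the objects within them, that Ango Hub will be able to access. |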
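When connecting several buckets, generating the policy can be less error-prone than editing ARNs by hand. The Python sketch below builds the same policy as above from a list of bucket names, adding both the bucket ARN and its /* form for each one. The helper names bucket_arn and read_only_policy are hypothetical, and the sketch assumes your buckets live in the standard aws partition (ARNs in other partitions, such as AWS GovCloud, use a different prefix).

```python
import json


def bucket_arn(bucket_name: str) -> str:
    """S3 bucket ARNs in the standard partition have the form arn:aws:s3:::<name>."""
    return f"arn:aws:s3:::{bucket_name}"


def read_only_policy(bucket_names: list[str]) -> dict:
    """Build the read-only permissions policy for the given buckets.

    Each bucket appears twice in Resource: the object ARN with a trailing /*
    (needed by s3:GetObject) and the bucket ARN itself (needed by
    s3:ListBucket and s3:GetBucketLocation).
    """
    resources = []
    for name in bucket_names:
        arn = bucket_arn(name)
        resources.extend([f"{arn}/*", arn])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": resources,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket names; replace them with your own.
    policy = read_only_policy(["your-bucket-name", "your-second-optional-bucket-name"])
    print(json.dumps(policy, indent=4))
```

Paste the printed JSON into the Policy Editor in place of the sample.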

  9. Click on the "Next" button at the bottom of the screen.

  10. In the "Policy Name" field, enter a name for the new policy.

  11. Click on "Create Policy" at the bottom.

  12. From the IAM section of the AWS dashboard, enter the "Roles" sub-section and click on "Create Role".

  13. Select "Custom trust policy" as the trusted entity type.

  14. In the text area that appears, paste the following JSON:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::397578311638:user/angohub-delegate"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "your-external-id-for-the-role-here"
                }
            }
        }
    ]
}

The "Principal -> AWS" property must be pasted as-is and it is the same for all users: arn:aws:iam::397578311638:user/angohub-delegate

In the sts:ExternalId property, replace the placeholder with an external ID of your choice. Ango Hub will present this value whenever it assumes the role, so take note of it: you will need it when finalizing the integration with Ango Hub.
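Any external ID that AWS accepts will work, as long as you enter the same value in Ango Hub later. If you do not have one in mind, the Python sketch below generates a random one and prints the trust policy with it filled in. The helper names new_external_id and trust_policy are hypothetical; the principal ARN is the one given above. Hex output is used on the assumption that it stays within the characters AWS allows for external IDs.

```python
import json
import secrets

# Ango Hub's delegate principal, as given in this guide (same for all users).
ANGO_HUB_PRINCIPAL = "arn:aws:iam::397578311638:user/angohub-delegate"


def new_external_id(num_bytes: int = 16) -> str:
    """Generate a random external ID made of lowercase hex characters."""
    return secrets.token_hex(num_bytes)


def trust_policy(external_id: str) -> dict:
    """Build the custom trust policy for the given external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": ANGO_HUB_PRINCIPAL},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    external_id = new_external_id()
    print("External ID (save this for Ango Hub):", external_id)
    print(json.dumps(trust_policy(external_id), indent=4))
```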

  15. Click on "Next".

  16. From the "Add Permissions" page that appears, select the checkbox next to the policy you created in step 11.

  17. Click on "Next".

  18. In the "Role name" field, enter a name for this role.

  19. Click on "Create Role".

  20. From the "Roles" sub-section, click on the role you have just created and copy its ARN. You will need it later.

Finalize Ango Hub integration

  21. Go back to the Ango Hub tab you opened in step 3.

  22. In the "Add Storage" dialog, enter a unique name for your storage and the region where the bucket is located, select "IAM Delegated" in the "Access Type" selector, then paste the Role ARN and the External ID you noted earlier.

  23. Click on "OK" to finalize the integration.
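Before clicking OK, you may want to sanity-check the values you are about to paste. The Python sketch below checks their shape only. The function name check_delegation_inputs is hypothetical, and it assumes Hub expects a region code such as eu-central-1 (the same form used in asset URLs) and a standard IAM role ARN of the form arn:aws:iam::<12-digit account ID>:role/<role name>. It cannot confirm that Hub is actually able to assume the role.

```python
import re

# Assumed formats (shape checks only):
#   region code:  e.g. eu-central-1, us-east-1
#   role ARN:     arn:aws:iam::<12-digit account ID>:role/<optional path/><role name>
#   external ID:  2-1224 characters from letters, digits and _=,.@:/-
REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")
ROLE_ARN_RE = re.compile(r"arn:aws:iam::\d{12}:role/[A-Za-z0-9_+=,.@/-]+")
EXTERNAL_ID_RE = re.compile(r"[A-Za-z0-9_=,.@:/-]{2,1224}")


def check_delegation_inputs(region: str, role_arn: str, external_id: str) -> list[str]:
    """Return a list of problems with the "Add Storage" values (empty if none)."""
    problems = []
    if not REGION_RE.fullmatch(region.strip()):
        problems.append(f"region {region!r} does not look like a region code")
    if not ROLE_ARN_RE.fullmatch(role_arn.strip()):
        problems.append(f"role ARN {role_arn!r} does not look like an IAM role ARN")
    if not EXTERNAL_ID_RE.fullmatch(external_id.strip()):
        problems.append("external ID has an unexpected length or character")
    return problems


if __name__ == "__main__":
    # Placeholder values; replace them with your own.
    print(check_delegation_inputs(
        "eu-central-1",
        "arn:aws:iam::123456789012:role/ango-hub-access",
        "your-external-id-for-the-role-here",
    ))
```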

Preparing the JSON

After connecting your bucket, prepare a JSON file containing, for each asset, its full URL and an external ID, plus, optionally, its width and height if the asset is an image.

Ensure your URLs are percent-encoded using UTF-8.

If your filenames contain spaces, for example, ensure they are encoded as %20, and not as pluses (+).

Ensure the region information is present in the URLs you provide in the JSON.

The URL must be in the format https://<bucket-name>.s3.<region>.amazonaws.com/<path-to-file>

If you do not provide region information, Hub will assume the region is us-east-1.
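The Python sketch below builds an asset URL in this format. It relies on urllib.parse.quote, which percent-encodes the object key as UTF-8 and encodes spaces as %20; quote_plus would produce + instead, which is what the note above warns against. The function name s3_asset_url is hypothetical, and the region is a required argument so that it is never omitted by accident.

```python
from urllib.parse import quote


def s3_asset_url(bucket: str, key: str, region: str) -> str:
    """Build https://<bucket-name>.s3.<region>.amazonaws.com/<path-to-file>.

    quote() encodes the key as UTF-8, turns spaces into %20 (not "+"),
    and leaves "/" unencoded so folder separators in the key survive.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


if __name__ == "__main__":
    print(s3_asset_url("bucket-name", "file 2.jpg", "eu-central-1"))
```

For the key file 2.jpg, this yields the second URL in the sample JSON.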

The width and height fields are only necessary if you plan on exporting polygon annotations as image bitmasks. In most cases, width and height information will not be necessary.

This is what the JSON should look like:

[
  {
   "data":"https://bucket-name.s3.eu-central-1.amazonaws.com/file.jpg",
   "externalId":"cute-cat.jpg",
   "width":603,
   "height":450
  },
  {
   "data":"https://bucket-name.s3.eu-central-1.amazonaws.com/file%202.jpg",
   "externalId":"cute-dog.jpg",
   "width":1800,
   "height":1200
  }
]
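For large imports, it can help to check the file before uploading it. The Python sketch below loads an asset list and flags entries whose data URL is not in the regional format shown above, that contain a literal + (a likely sign of a space encoded the wrong way), that lack an externalId, or that give only one of width and height. The function name check_assets and its checks are assumptions derived from this guide, not Hub's own validation rules.

```python
import json
import re
import sys

# Expected shape, per this guide:
# https://<bucket-name>.s3.<region>.amazonaws.com/<path-to-file>
URL_RE = re.compile(r"https://[^/]+\.s3\.[a-z0-9-]+\.amazonaws\.com/\S+")


def check_assets(assets: list[dict]) -> list[str]:
    """Return human-readable problems found in an asset list (empty if none)."""
    problems = []
    for i, asset in enumerate(assets):
        url = asset.get("data", "")
        if not URL_RE.fullmatch(url):
            problems.append(f"asset {i}: 'data' is not a regional S3 URL")
        elif "+" in url:
            # Heuristic: a raw "+" often means a space was encoded as "+".
            problems.append(f"asset {i}: '+' in URL; encode spaces as %20")
        if not asset.get("externalId"):
            problems.append(f"asset {i}: missing 'externalId'")
        if ("width" in asset) != ("height" in asset):
            problems.append(f"asset {i}: give 'width' and 'height' together")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_assets.py assets.json  (file names are placeholders)
    with open(sys.argv[1], encoding="utf-8") as f:
        issues = check_assets(json.load(f))
    print("\n".join(issues) or "No problems found.")
```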

There is no limit on the number of assets you can import into a project this way, and no restriction on file size.

Uploading the JSON to Ango Hub

From your project’s dashboard, enter the Assets tab and click on Add Data.

A dialog will pop up. Click on “Upload Data URL” at the top.

Click on Advanced Configs and, from Storage Method, pick the storage integration you created earlier.

Drag the JSON file you would like to upload to the box in the center. Alternatively, click on the box to open your system’s file explorer and select it from there.

Click on the Close button. Your assets will show up in the Assets tab.

It is also possible to upload assets through our API.
