/**
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
* SPDX-License-Identifier: Apache-2.0.
*/
#pragma once
#include

/**
 * Describes the S3 data source.
 * See Also: AWS API Reference
 */
/**
 * If you choose S3Prefix, S3Uri identifies a key name prefix. Amazon SageMaker
 * uses all objects that match the specified key name prefix for model training.
 *
 * If you choose ManifestFile, S3Uri identifies an object that is a manifest file
 * containing a list of object keys that you want Amazon SageMaker to use for
 * model training.
 *
 * If you choose AugmentedManifestFile, S3Uri identifies an object that is an
 * augmented manifest file in JSON lines format. This file contains the data you
 * want to use for model training. AugmentedManifestFile can only be used if the
 * Channel's input mode is Pipe.
 */
/**
 * Depending on the value specified for the S3DataType, identifies either a key
 * name prefix or a manifest. For example:
 *
 * A key name prefix might look like this: s3://bucketname/exampleprefix
 *
 * A manifest might look like this: s3://bucketname/example.manifest
 *
 * A manifest is an S3 object which is a JSON file consisting of an array of
 * elements. The first element is a prefix, which is followed by one or more
 * suffixes. SageMaker appends the suffix elements to the prefix to get the full
 * set of S3Uri values. Note that the prefix must be a valid non-empty S3Uri,
 * which precludes users from specifying a manifest whose individual S3Uri
 * values are sourced from different S3 buckets.
 *
 * The following code example shows a valid manifest format:
 *
 *   [ {"prefix": "s3://customer_bucket/some/prefix/"},
 *     "relative/path/to/custdata-1",
 *     "relative/path/custdata-2",
 *     ...
 *     "relative/path/custdata-N"
 *   ]
 *
 * This JSON is equivalent to the following S3Uri list:
 *
 *   s3://customer_bucket/some/prefix/relative/path/to/custdata-1
 *   s3://customer_bucket/some/prefix/relative/path/custdata-2
 *   ...
 *   s3://customer_bucket/some/prefix/relative/path/custdata-N
 *
 * The complete set of S3Uri values in this manifest is the input data for the
 * channel for this data source. The object that each S3Uri points to must be
 * readable by the IAM role that Amazon SageMaker uses to perform tasks on your
 * behalf.
 */
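/**
 * The prefix-plus-suffix expansion described above can be sketched as follows.
 * This is only an illustration, assuming the expansion is plain string
 * concatenation as the equivalent S3Uri list suggests. ExpandManifest is a
 * hypothetical helper and is not part of the AWS SDK; it also does not parse
 * the manifest JSON, so the prefix and suffixes are passed in directly.
 *
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of the AWS SDK: joins a manifest prefix with
// each relative suffix to produce the full list of S3 URIs.
std::vector<std::string> ExpandManifest(const std::string& prefix,
                                        const std::vector<std::string>& suffixes) {
    // The description above requires a non-empty prefix.
    assert(!prefix.empty());

    std::vector<std::string> uris;
    uris.reserve(suffixes.size());
    for (const std::string& suffix : suffixes) {
        // Plain concatenation: no separator is inserted, so the prefix is
        // expected to end with '/' as in the documented example.
        uris.push_back(prefix + suffix);
    }
    return uris;
}
```
 *
 * Because every resulting URI starts with the same prefix, all objects named
 * by one manifest necessarily live in the same bucket.
 */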
/**
 * If you want Amazon SageMaker to replicate the entire dataset on each ML
 * compute instance that is launched for model training, specify
 * FullyReplicated.
 *
 * If you want Amazon SageMaker to replicate a subset of data on each ML compute
 * instance that is launched for model training, specify ShardedByS3Key. If
 * there are n ML compute instances launched for a training job, each instance
 * gets approximately 1/n of the number of S3 objects. In this case, model
 * training on each machine uses only the subset of training data.
 *
 * Don't choose more ML compute instances for training than available S3
 * objects. If you do, some nodes won't get any data and you will pay for nodes
 * that aren't getting any training data. This applies in both File and Pipe
 * modes. Keep this in mind when developing algorithms.
 *
 * In distributed training, where you use multiple ML compute EC2 instances, you
 * might choose ShardedByS3Key. If the algorithm requires copying training data
 * to the ML storage volume (when TrainingInputMode is set to File), this copies
 * 1/n of the number of objects.
 */
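/**
 * To make the "approximately 1/n" figure and the empty-node warning concrete,
 * the sketch below counts how many objects each instance would receive. It
 * assumes objects are dealt out round-robin; the text above does not state the
 * actual assignment rule SageMaker uses, so treat this as an illustration only.
 * ObjectsPerInstance is a hypothetical helper, not part of the AWS SDK.
 *
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: per-instance object counts under an assumed
// round-robin split of objectCount objects across instanceCount instances.
std::vector<std::size_t> ObjectsPerInstance(std::size_t objectCount,
                                            std::size_t instanceCount) {
    assert(instanceCount > 0);

    // Every instance gets the integer share of the objects...
    std::vector<std::size_t> counts(instanceCount, objectCount / instanceCount);
    // ...and the first (objectCount % instanceCount) instances get one more.
    for (std::size_t i = 0; i < objectCount % instanceCount; ++i) {
        ++counts[i];
    }
    return counts;
}
```
 *
 * With 10 objects and 4 instances the counts are 3, 3, 2, 2. With 2 objects and
 * 4 instances they are 1, 1, 0, 0, which is the case the warning above
 * describes: two instances are billed but receive no training data.
 */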
/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline const Aws::Vector

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline bool AttributeNamesHasBeenSet() const { return m_attributeNamesHasBeenSet; }

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline void SetAttributeNames(const Aws::Vector

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline void SetAttributeNames(Aws::Vector

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline S3DataSource& WithAttributeNames(const Aws::Vector

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline S3DataSource& WithAttributeNames(Aws::Vector

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline S3DataSource& AddAttributeNames(const Aws::String& value) { m_attributeNamesHasBeenSet = true; m_attributeNames.push_back(value); return *this; }

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline S3DataSource& AddAttributeNames(Aws::String&& value) { m_attributeNamesHasBeenSet = true; m_attributeNames.push_back(std::move(value)); return *this; }

/**
 * A list of one or more attribute names to use that are found in a specified
 * augmented manifest file.
 */
inline S3DataSource& AddAttributeNames(const char* value) { m_attributeNamesHasBeenSet = true; m_attributeNames.push_back(value); return *this; }

private:

S3DataType m_s3DataType;
bool m_s3DataTypeHasBeenSet;

Aws::String m_s3Uri;
bool m_s3UriHasBeenSet;

S3DataDistribution m_s3DataDistributionType;
bool m_s3DataDistributionTypeHasBeenSet;

Aws::Vector