|
113 | 113 | "source": [
|
114 | 114 | "Set up the IAM role. This role gives SageMaker FeatureStore access to your S3 bucket. \n",
|
115 | 115 | "\n",
|
116 |
| - "*Note that in this example we use the default SageMaker role, assuming it has both `AmazonSageMakerFullAccess` and `AmazonSageMakerFeatureStoreAccess` managed policies. If not, please make sure to attach them to the role before proceeding.*" |
| 116 | + "<div class=\"alert alert-block alert-warning\">\n", |
| 117 | + "<b>Note:</b> In this example we use the default SageMaker role, assuming it has both <b>AmazonSageMakerFullAccess</b> and <b>AmazonSageMakerFeatureStoreAccess</b> managed policies. If not, please make sure to attach them to the role before proceeding.\n", |
| 118 | + "</div>" |
117 | 119 | ]
|
118 | 120 | },
|
119 | 121 | {
|
|
135 | 137 | "source": [
|
136 | 138 | "## Inspect Dataset\n",
|
137 | 139 | "\n",
|
138 |
| - "The provided dataset is a sampled version of [IEEE fraud detection dataset](https://www.kaggle.com/c/ieee-fraud-detection/data) with 2000 transactions. The dataset has two tables: identity and transactions. They can both be joined by the 'TransactionId' column. The transaction table contains information about a particular transaction such as amount, credit or debit card while the identity table contains information about the user such as device type and browser. The transaction must exist in the transaction table, but might not always be available in the identity table.\n", |
| 140 | + "The provided dataset is a synthetic dataset with two tables: identity and transactions. They can both be joined by the `TransactionId` column. The transaction table contains information about a particular transaction such as amount, credit or debit card while the identity table contains information about the user such as device type and browser. The transaction must exist in the transaction table, but might not always be available in the identity table.\n", |
139 | 141 | "\n",
|
140 |
| - "The objective of the model is to predict if a transaction is fraudulent or not given the transaction record. We have decided to use 18 columns from both tables to train this model.\n", |
141 |
| - "\n", |
142 |
| - "Note that the sampled data is in the SageMaker public S3 bucket." |
| 142 | + "The objective of the model is to predict if a transaction is fraudulent or not, given the transaction record." |
143 | 143 | ]
|
144 | 144 | },
|
145 | 145 | {
|
|
155 | 155 | "\n",
|
156 | 156 | "s3_client = boto3.client('s3', region_name=region)\n",
|
157 | 157 | "\n",
|
158 |
| - "fraud_detection_bucket_name = 'sagemaker-featurestore-fraud-detection'\n", |
159 |
| - "identity_file_key = 'sampled_identity.csv'\n", |
160 |
| - "transaction_file_key = 'sampled_transactions.csv'\n", |
| 158 | + "fraud_detection_bucket_name = 'sagemaker-sample-files'\n", |
| 159 | + "identity_file_key = 'datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_identity.csv'\n", |
| 160 | + "transaction_file_key = 'datasets/tabular/fraud_detection/synthethic_fraud_detection_SA/sampled_transactions.csv'\n", |
161 | 161 | "\n",
|
162 | 162 | "identity_data_object = s3_client.get_object(Bucket=fraud_detection_bucket_name, Key=identity_file_key)\n",
|
163 | 163 | "transaction_data_object = s3_client.get_object(Bucket=fraud_detection_bucket_name, Key=transaction_file_key)\n",
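
Reviewer note: the revised "Inspect Dataset" text says the two tables join on `TransactionId`, and that every transaction appears in the transaction table but may be absent from the identity table. That describes a left join from transactions to identity. The sketch below illustrates the idea with the standard library only, so it runs without the notebook's dependencies. It is a minimal illustration, not the notebook's own code: the column names `TransactionAmt` and `DeviceType`, the inline sample rows, and the helper names are hypothetical, and the key name is taken from the prose rather than checked against the actual CSV header.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def left_join(transactions, identity, key="TransactionId"):
    """Keep every transaction row; attach identity columns when a match exists."""
    identity_by_key = {row[key]: row for row in identity}
    identity_columns = [c for c in (identity[0].keys() if identity else []) if c != key]
    joined = []
    for row in transactions:
        match = identity_by_key.get(row[key])
        merged = dict(row)
        for col in identity_columns:
            # Transactions without an identity record get None for identity columns.
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


# Hypothetical sample data: transaction 2 has no identity record.
transactions_csv = "TransactionId,TransactionAmt\n1,10.0\n2,25.5\n"
identity_csv = "TransactionId,DeviceType\n1,mobile\n"

joined = left_join(read_rows(transactions_csv), read_rows(identity_csv))
for row in joined:
    print(row)
```

In the notebook itself, the same text could presumably be obtained by decoding the `Body` of each `get_object` response before parsing, and a dataframe library's left merge would serve the same purpose.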
|
|