Question 1

An organization wants to simplify the management of cross-account data permissions by using a centralized service to define and enforce fine-grained access control at the database, table, and column levels. Which service is best suited for this?

Accepted Answer

AWS Lake Formation

Answer

AWS Identity and Access Management (IAM)

Answer

Amazon S3 Bucket Policies

Answer

Amazon Redshift Spectrum

Question 2

A data engineer needs to join data from an Amazon S3-based data lake with a dimensional table stored in an Amazon Redshift cluster without moving the S3 data into Redshift permanently. Which feature should they use?

Accepted Answer

Amazon Redshift Spectrum

Answer

Amazon Redshift Federated Query

Answer

Amazon Athena

Answer

AWS Glue DataBrew

Question 3

Which AWS Glue component is used to automatically discover data formats and retrieve schemas to populate the Data Catalog?

Accepted Answer

Glue Crawler

Answer

Glue Job

Answer

Glue Workflow

Answer

Glue Trigger

Question 4

A streaming application requires real-time data ingestion and processing with a retention period of 24 hours. The engineering team wants to use a managed service that can scale shards to handle throughput. Which service should they choose?

Accepted Answer

Amazon Kinesis Data Streams

Answer

Amazon Kinesis Data Firehose

Answer

Amazon SQS

Answer

Amazon MSK Connect

Question 5

A data engineer is designing a pipeline where an S3 event triggers a process to validate the schema of an uploaded CSV file. The process is lightweight and runs in less than 30 seconds. What is the most cost-effective compute option?

Accepted Answer

AWS Lambda

Answer

Amazon EC2 Reserved Instances

Answer

Amazon EMR

Answer

AWS Glue ETL Jobs
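
The lightweight validation described in this question could be sketched as follows. This is a minimal illustration, not a complete Lambda function: the column names in `EXPECTED_COLUMNS` and both helper names are hypothetical, and fetching the object body from S3 is left out so the snippet stays self-contained.

```python
import csv
import io
from urllib.parse import unquote_plus

# Hypothetical expected header; replace with the real schema.
EXPECTED_COLUMNS = ["order_id", "customer_id", "amount"]


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def header_matches(csv_bytes, expected=EXPECTED_COLUMNS):
    """Check that the first CSV row equals the expected column list."""
    reader = csv.reader(io.StringIO(csv_bytes.decode("utf-8")))
    header = next(reader, None)  # None when the file is empty
    return header == list(expected)
```

A handler would call `s3_objects_from_event` on the incoming event, download each object, and pass the bytes to `header_matches`.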

Question 6

You need to monitor AWS Glue job failures and receive an email notification whenever a job fails. Which combination of services provides this functionality?

Accepted Answer

Amazon CloudWatch Events (EventBridge) and Amazon SNS

Answer

AWS CloudTrail and Amazon SES

Answer

Amazon CloudWatch Logs and AWS Lambda

Answer

AWS Glue Workflows and Amazon Redshift
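
As an illustration of the accepted answer, an EventBridge rule for failed Glue jobs is driven by an event pattern like the one below, with an SNS topic as the rule target. The `matches` helper is a deliberately simplified local stand-in for EventBridge's matching, included only so the pattern can be exercised without AWS access; the job name in the usage example is hypothetical.

```python
import json

# Event pattern that selects Glue job runs ending in the FAILED state.
GLUE_FAILURE_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern leaf is a list of allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching sub-document.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    sample = {
        "source": "aws.glue",
        "detail-type": "Glue Job State Change",
        "detail": {"jobName": "nightly-etl", "state": "FAILED"},
    }
    print(json.dumps(GLUE_FAILURE_PATTERN, indent=2))
    print(matches(GLUE_FAILURE_PATTERN, sample))
```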

Question 7

To optimize performance for an Amazon Redshift cluster, a data engineer wants to ensure that data is distributed across nodes based on a common join key. Which distribution style should be used?

Accepted Answer

KEY distribution

Answer

EVEN distribution

Answer

AUTO distribution

Answer

ALL distribution

Question 8

A company requires all data at rest in Amazon S3 to be encrypted using keys that are rotated annually and managed by a dedicated security team within AWS. Which encryption method meets this with the least operational overhead?

Accepted Answer

SSE-KMS

Answer

SSE-C

Answer

Client-Side Encryption

Answer

SSE-S3

Question 9

A complex ETL pipeline involves multiple dependencies where Glue Jobs must only run after specific S3 objects are created and a Lambda function succeeds. What is the most robust way to orchestrate this?

Accepted Answer

AWS Step Functions

Answer

Amazon S3 Event Notifications

Answer

Cron jobs on Amazon EC2

Answer

AWS Glue Triggers

Question 10

In an architectural review, you are asked to reduce the cost of a long-running Amazon EMR cluster used for daily batch processing. The workload is fault-tolerant. Which instance configuration is best?

Accepted Answer

Use Spot Instances for Task nodes

Answer

Use Reserved Instances for all nodes

Answer

Use On-Demand Instances for Master nodes

Answer

Use Spot Instances for Master nodes

Question 11

A data engineer needs to optimize the performance of an Amazon Athena query that scans a large dataset in Amazon S3. The dataset is currently stored in a single flat CSV file. Which combination of strategies will result in the greatest reduction in data scanned and improved query speed?

Accepted Answer

Convert the data to Apache Parquet format and implement Hive-style partitioning.

Answer

Compress the CSV file using GZIP and increase the Athena DPU limit.

Answer

Store the data in Amazon DynamoDB and use Athena Federated Query.

Answer

Split the large CSV into multiple smaller CSV files within the same S3 prefix.

Question 12

A data engineer is designing a data lake on Amazon S3 and needs to ensure that any PII data is automatically identified and classified as it is uploaded. The organization also needs a dashboard to visualize the risk levels of their data across several S3 buckets. Which AWS service is specifically designed for this purpose?

Accepted Answer

Amazon Macie

Answer

Amazon Inspector

Answer

AWS Glue DataBrew

Answer

Amazon GuardDuty

Question 13

An Amazon Redshift database contains a large table with historical sales data. A data engineer notices that queries filtering by 'transaction_date' are becoming slower as the table grows to include millions of rows. Which optimization strategy should be applied to the table to improve the performance of these specific queries?

Accepted Answer

Define 'transaction_date' as the Sort Key for the table.

Answer

Create a secondary index on the 'transaction_date' column.

Answer

Set the 'transaction_date' as the Distribution Key using DISTSTYLE KEY.

Answer

Alter the table to use DISTSTYLE ALL to replicate data across all nodes.

Question 14

A data engineering team monitors an Amazon Redshift cluster and notices that a specific query is consuming a disproportionate amount of memory, slowing down other critical reporting tasks. They want to automatically abort any query that runs for longer than 60 seconds to ensure consistent performance. Which feature should they use?

Accepted Answer

Redshift Query Monitoring Rules (QMR) within Workload Management (WLM)

Answer

AWS Step Functions with a Wait state

Answer

Amazon Redshift Spectrum scheduled tasks

Answer

AWS Lambda function triggered by CloudWatch Logs

Question 15

A data engineer is designing a highly available and fault-tolerant ingestion pipeline that collects clickstream data from a website and writes it to an Amazon S3 data lake in near real-time. The engineer must ensure that the data is converted to Parquet format before it reaches S3 to reduce query costs in Amazon Athena. Which solution implements this with the least operational overhead?

Accepted Answer

Use Amazon Kinesis Data Firehose with its built-in data format conversion feature.

Answer

Use AWS Glue ETL jobs scheduled every minute to read raw JSON from S3 and write it back as Parquet.

Answer

Ingest data via Amazon MSK and use a custom Kafka Connect sink to write Parquet files to S3.

Answer

Send data to Amazon Kinesis Data Streams and use an AWS Lambda function to manually convert and compress each record.

Question 16

A data engineer is designing a disaster recovery strategy for an Amazon Redshift cluster. The business requirement specifies that the data must be available in a secondary AWS Region if the primary Region experiences an outage. The Recovery Point Objective (RPO) is less than 24 hours. Which approach provides this capability with the minimum operational effort?

Accepted Answer

Enable cross-region snapshot copy for the Redshift cluster.

Answer

Use AWS Glue to run an hourly extract, transform, and load (ETL) job into a Redshift cluster in the secondary Region.

Answer

Configure Amazon Redshift Spectrum to point to a global Amazon Aurora database.

Answer

Set up an Amazon S3 Cross-Region Replication (CRR) and manually rebuild the cluster from S3 data during a disaster.

Question 17

A fintech company needs to process millions of stock market transactions per second with sub-second latency and store the raw data in an Amazon S3-based data lake for historical analysis. Which architecture provides the most scalable and cost-effective solution?

Accepted Answer

Ingest data using Amazon Kinesis Data Streams, process it with Amazon Kinesis Data Analytics for real-time insights, and use Amazon Kinesis Data Firehose to batch and write data to Amazon S3.

Answer

Send transactions to an Amazon SQS queue, trigger an AWS Lambda function for each message, and write individual files to Amazon S3.

Answer

Ingest data into an Amazon RDS instance, then use an AWS Glue crawler to move the data to Amazon S3 every hour.

Answer

Use Amazon MSK to stream data directly to Amazon Redshift, then use UNLOAD to move data to Amazon S3.

Question 18

An e-commerce company wants to implement a weekly ETL process that transforms nested JSON logs from S3 into Parquet format for Amazon Redshift Spectrum. They need to minimize costs and manage job dependencies. Which approach is best?

Accepted Answer

Use AWS Glue ETL jobs with Job Bookmarks enabled and orchestrate the workflow using AWS Glue Workflows.

Answer

Run an Amazon Athena 'INSERT INTO' query every hour and manually monitor the execution.

Answer

Use a persistent Amazon EMR cluster running 24/7 with Apache Spark and Cron jobs for scheduling.

Answer

Use AWS Step Functions to trigger individual AWS Lambda functions that process the data in memory for up to 15 minutes.

Question 19

A healthcare provider needs to perform complex analytical queries on 500 TB of historical patient records. The data is currently in Amazon S3. The solution must provide high performance for complex joins while keeping storage costs low. What is the most architecturally sound approach?

Accepted Answer

Use Amazon Redshift with RA3 instances and Redshift Managed Storage, using Redshift Spectrum to query infrequently accessed cold data on S3.

Answer

Load all 500 TB into an Amazon RDS PostgreSQL database using Multi-AZ for performance.

Answer

Use Amazon DynamoDB with Global Secondary Indexes to perform the complex joins and aggregations.

Answer

Import all data into Amazon OpenSearch Service and use the SQL plugin for complex relational joins.

Question 20

An IoT company receives sensor readings from 100,000 devices via Amazon Kinesis Data Streams. They need to perform a sliding window calculation to find the average temperature every 5 minutes. Which service should they use for this streaming transformation?

Accepted Answer

Amazon Kinesis Data Analytics

Answer

AWS Glue DataBrew

Answer

Amazon S3 Select

Answer

Amazon Redshift Spectrum
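
To make the sliding window idea concrete, here is a plain-Python sketch of a 5-minute moving average. It is not Kinesis Data Analytics code; the class name is hypothetical, and the sketch assumes readings arrive in timestamp order with timestamps in seconds.

```python
from collections import deque


class SlidingAverage:
    """Average of readings whose timestamp falls within the last window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.readings = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record a reading and return the average over the current window."""
        self.readings.append((timestamp, value))
        self.total += value
        # Evict readings that have slid out of the window.
        cutoff = timestamp - self.window_seconds
        while self.readings and self.readings[0][0] <= cutoff:
            _, old_value = self.readings.popleft()
            self.total -= old_value
        return self.total / len(self.readings)
```

A managed streaming service performs the same bookkeeping per key and at scale, which is why it is preferred over hand-rolled consumers for this requirement.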

Question 21

A SaaS provider needs to move data from an Amazon Aurora database to an Amazon S3 data lake. The data contains PII that must be masked before storage. Which solution is most efficient for a recurring daily schedule?

Accepted Answer

AWS Glue ETL with the 'Detect PII' transform, scheduled via a Glue Trigger.

Answer

Use S3 Batch Operations to run a Python script on every object after it is uploaded to the bucket.

Answer

Copy the database snapshots to S3 and use Amazon Macie to mask the data after it is stored.

Answer

Use Amazon Kinesis Video Streams to capture database changes and mask them using Amazon Rekognition.

Question 22

A data engineer needs to join two tables in Amazon Redshift: a massive 'Sales' table (10 billion rows) and a small 'Products' table (1,000 rows). Which distribution style for the 'Products' table will result in the best query performance?

Accepted Answer

DISTSTYLE ALL

Answer

DISTSTYLE EVEN

Answer

DISTSTYLE AUTO

Answer

DISTSTYLE KEY
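
The distribution and sort options that appear in these Redshift questions are declared as table attributes in the DDL. The helper below is a small sketch that assembles such statements as strings; the function name and all table and column names are hypothetical, and the generated SQL is not executed here.

```python
def create_table_sql(name, columns, diststyle, distkey=None, sortkey=None):
    """Build a Redshift CREATE TABLE statement with distribution/sort attributes."""
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    parts = [f"CREATE TABLE {name} (\n  {cols}\n)", f"DISTSTYLE {diststyle}"]
    if distkey:
        parts.append(f"DISTKEY ({distkey})")
    if sortkey:
        parts.append(f"SORTKEY ({sortkey})")
    return "\n".join(parts) + ";"


if __name__ == "__main__":
    # Large fact table: co-locate rows by join key, sort by the filter column.
    print(create_table_sql(
        "sales",
        [("sale_id", "BIGINT"), ("product_id", "INT"), ("transaction_date", "DATE")],
        "KEY",
        distkey="product_id",
        sortkey="transaction_date",
    ))
    # Small dimension table: replicate a full copy to every node.
    print(create_table_sql(
        "products",
        [("product_id", "INT"), ("name", "VARCHAR(100)")],
        "ALL",
    ))
```

With the small table replicated everywhere, the join against the large table needs no redistribution of rows at query time.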

Question 23

A company wants to store data in Amazon S3 for long-term archiving. The data is rarely accessed but must be available within minutes if requested. Which S3 storage class is most cost-effective for this specific requirement?

Accepted Answer

S3 Glacier Flexible Retrieval

Answer

S3 Glacier Deep Archive

Answer

S3 Standard

Answer

S3 Intelligent-Tiering

Question 24

To optimize the cost of an Amazon Redshift cluster used only for business hours (9 AM to 5 PM), which feature should the data engineer implement?

Accepted Answer

Pause and Resume

Answer

Elastic Resize

Answer

Query Queuing

Answer

Concurrency Scaling

Question 25

A media company wants to crawl their S3 data lake to populate a Data Catalog. The data is stored in folders partitioned by 'year', 'month', and 'day'. How does this partitioning benefit Amazon Athena queries?

Accepted Answer

It reduces the amount of data scanned by filtering only the relevant S3 prefixes, lowering costs.

Answer

It increases the maximum file size limit in S3.

Answer

It allows Athena to perform updates and deletes on individual rows.

Answer

It automatically encrypts the data using AES-256.
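
Partition pruning can be pictured as prefix filtering over object keys. The sketch below assumes a Hive-style layout (`year=.../month=.../day=...`) under a hypothetical `logs` prefix; both function names are illustrative and the key list stands in for an S3 listing.

```python
def partition_prefix(base, year, month=None, day=None):
    """Build a Hive-style prefix such as logs/year=2024/month=01/day=15/."""
    prefix = f"{base.rstrip('/')}/year={year:04d}/"
    if month is not None:
        prefix += f"month={month:02d}/"
        if day is not None:
            prefix += f"day={day:02d}/"
    return prefix


def prune(keys, prefix):
    """Keep only the object keys that live under the given partition prefix."""
    return [key for key in keys if key.startswith(prefix)]
```

A query that filters on `year`, `month`, and `day` only needs the objects returned by `prune`, so everything outside that prefix is never read or billed.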

Question 26

A developer needs to orchestrate a complex pipeline involving AWS Glue, Amazon EMR, and AWS Lambda with sophisticated error handling and conditional branching. Which service is most suitable?

Accepted Answer

AWS Step Functions

Answer

S3 Event Notifications

Answer

AWS Glue Triggers

Answer

Amazon EventBridge
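
A rough idea of what such an orchestration looks like in Amazon States Language, written here as a Python dictionary. This is a sketch under assumptions: the function name `validate-input`, the job name `transform-job`, and the `valid` field in the Lambda response are all hypothetical. The `transition_targets` helper only checks that every transition points at a defined state.

```python
import json

FAILURE_CATCH = [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}]

STATE_MACHINE = {
    "Comment": "Sketch: validate with Lambda, branch, then run a Glue job",
    "StartAt": "ValidateInput",
    "States": {
        "ValidateInput": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "validate-input", "Payload.$": "$"},
            # Assumes the Lambda returns a JSON body with a boolean "valid".
            "ResultSelector": {"valid.$": "$.Payload.valid"},
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 5,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }],
            "Catch": FAILURE_CATCH,
            "Next": "IsValid",
        },
        "IsValid": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "RunGlueJob"}
            ],
            "Default": "NotifyFailure",
        },
        "RunGlueJob": {
            "Type": "Task",
            # The .sync suffix makes the state wait for the job run to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "transform-job"},
            "Catch": FAILURE_CATCH,
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "PipelineFailed",
            "Cause": "Validation or Glue job failed",
        },
    },
}


def transition_targets(definition):
    """Collect every state name that some transition points to."""
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        for field in ("Next", "Default"):
            if field in state:
                targets.add(state[field])
        for rule in state.get("Choices", []) + state.get("Catch", []):
            targets.add(rule["Next"])
    return targets


if __name__ == "__main__":
    print(json.dumps(STATE_MACHINE, indent=2))
```

The `Retry`, `Catch`, and `Choice` fields are what give Step Functions the error handling and conditional branching that simple triggers and event notifications lack.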

Question 27

A global IoT company needs to ingest data from 50,000 sensors. The requirement is to ensure the data is ordered by the 'DeviceID' for downstream processing in a custom consumer application, and it must support a retention period of 7 days for potential re-processing. Which Kinesis configuration is most appropriate?

Accepted Answer

Use Amazon Kinesis Data Streams and set 'DeviceID' as the Partition Key, while increasing the stream retention period to 168 hours.

Answer

Use Amazon Kinesis Video Streams to capture the sensor metadata and set the metadata TTL to 7 days.

Answer

Use Amazon Kinesis Data Firehose with an AWS Lambda transformation to sort data by DeviceID before delivery to S3.

Answer

Use Amazon Kinesis Data Streams with a single shard to ensure total order across all devices, regardless of the Partition Key.
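
Per-device ordering follows from how partition keys are routed: Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains that value. The sketch below assumes the shards split the hash space evenly, which need not hold after resharding; the function name is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)
    return hash_value * shard_count // HASH_SPACE
```

Because the mapping is deterministic, every record with the same 'DeviceID' lands on the same shard, and records within a shard are read in the order they were written.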

Question 28

A SaaS provider uses an Amazon Redshift cluster to store subscription and usage data for its clients. Several times a day, large-scale reporting queries cause performance degradation for the real-time dashboards used by support staff. The company wants a cost-effective solution to ensure that reporting queries do not impact the dashboard performance without manually resizing the cluster. Which feature should they implement?

Accepted Answer

Enable Workload Management (WLM) with Concurrency Scaling to automatically add transient capacity when reporting queries begin to queue.

Answer

Execute the reporting queries using Amazon Redshift Spectrum against a copy of the data stored in Amazon S3.

Answer

Use Elastic Resize to permanently add more nodes to the cluster before the reporting jobs start.

Answer

Migrate the entire dataset to an Amazon RDS Aurora Global Database to handle the read capacity.

Question 29

A healthcare analytics company manages an Amazon S3 data lake containing sensitive patient records. They utilize an AWS Glue ETL job to transform these records into a schema suitable for analysis. Due to strict compliance requirements, the company must ensure that any sensitive data (PII) is automatically identified and masked during the ETL process before the data is written to the destination. Which approach is the most efficient and native way to achieve this using AWS Glue?

Accepted Answer

Use the 'Detect Sensitive Data' transform within the AWS Glue ETL job to identify PII and apply a 'Masking' action on those specific columns.

Answer

Configure an S3 Object Lambda to intercept every read request and apply a custom Python scrubbing script.

Answer

Store the data in Amazon RDS and use SQL triggers to check for PII patterns using regular expressions.

Answer

Enable Amazon Macie on the destination bucket to delete any objects that contain unmasked PII after the Glue job completes.

Question 30

A global IoT company collects telemetry data from 1 million sensors every minute and stores it in an Amazon S3 data lake. A data engineer needs to join this high-volume sensor data with a small 50 MB 'Device Metadata' CSV file stored in S3 to filter for specific regions. The join must be performed using an AWS Glue ETL Spark job. Which optimization technique will most significantly improve the join performance and reduce data shuffling across the cluster?

Accepted Answer

Read the metadata file, convert it to a Spark DataFrame, and apply a broadcast join hint so the metadata is kept in memory on all worker nodes.

Answer

Partition the sensor data in S3 using the 'Region' column before running the Glue job.

Answer

Convert the 50 MB metadata CSV into a 10 TB Amazon Redshift table and perform a federated query.

Answer

Increase the number of Glue Workers to G.2X to provide more local disk space for shuffling.
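
The reason a broadcast join avoids shuffling can be shown without Spark: the small table becomes an in-memory lookup that every worker holds, and the large dataset is streamed past it. The plain-Python sketch below illustrates the idea only; the function and field names are hypothetical. In Spark itself, the equivalent hint is `pyspark.sql.functions.broadcast` applied to the small DataFrame.

```python
def broadcast_join(sensor_rows, metadata_rows, regions):
    """Join streamed sensor rows against a small in-memory metadata lookup."""
    # The "broadcast" side: small enough to copy to every worker as a dict.
    lookup = {
        meta["device_id"]: meta
        for meta in metadata_rows
        if meta["region"] in regions
    }
    # The large side is scanned once and never repartitioned by join key.
    for row in sensor_rows:
        meta = lookup.get(row["device_id"])
        if meta is not None:
            yield {**row, "region": meta["region"]}
```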

Question 31

A data engineer needs to ingest a continuous stream of sensor data and transform it in near real-time before loading it into Amazon S3 for long-term storage. Which combination of services provides the most efficient managed solution with minimal custom code?

Accepted Answer

Kinesis Data Firehose with an AWS Lambda function for transformation

Answer

Amazon RDS with an AWS Glue Job scheduled every hour

Answer

Kinesis Data Streams with a custom consumer application on Amazon EC2

Answer

AWS Glue Crawlers writing directly to an Amazon Redshift cluster

Question 32

A company wants to store data that is accessed infrequently but must be available immediately (within milliseconds) when requested. Which Amazon S3 storage class is the most cost-effective for this scenario?

Accepted Answer

S3 Standard-Infrequent Access (S3 Standard-IA)

Answer

S3 Glacier Flexible Retrieval

Answer

S3 Glacier Deep Archive

Answer

S3 Intelligent-Tiering

Question 33

Which AWS Glue component is primarily responsible for scanning data sources, identifying data formats, and populating the AWS Glue Data Catalog with table definitions?

Accepted Answer

Glue Crawler

Answer

Glue Workflow

Answer

Glue DataBrew

Answer

Glue Job

Question 34

A data engineer is designing a pipeline where an AWS Glue Job needs to read data from an Amazon S3 bucket. What is the most secure way to grant the Glue Job the necessary permissions?

Accepted Answer

Create an IAM Role with the required S3 permissions and attach it to the Glue Job

Answer

Enable public access on the S3 bucket

Answer

Store an IAM User's Access Key and Secret Key in the Glue script

Answer

Provide the Glue Job with the root account credentials

Question 35

When comparing Amazon RDS and Amazon Redshift for a data engineering project, which statement best describes the primary use case for Redshift?

Accepted Answer

It is a column-oriented database optimized for complex analytical queries (OLAP) on large datasets

Answer

It is a row-oriented database optimized for high-volume transactional processing (OLTP)

Answer

It is a managed file storage service used for big data processing

Answer

It is a NoSQL database designed for sub-millisecond document retrieval

Question 36

A data pipeline requires a streaming service that allows multiple consumers to read the same data stream independently and supports data retention for up to 365 days. Which service should be used?

Accepted Answer

Kinesis Data Streams

Answer

Kinesis Data Firehose

Answer

AWS Glue Crawlers

Answer

Amazon SQS

Question 37

In AWS Glue, what is the primary function of a 'Script' within a Glue Job?

Accepted Answer

To contain the ETL logic that transforms the data from source to target

Answer

To store the metadata and schema definitions of the database

Answer

To manage the encryption keys for the S3 buckets

Answer

To schedule the intervals at which the crawler runs

Question 38

Which S3 storage class provides the lowest cost for long-term archival of data that only needs to be retrieved once or twice a year, with a retrieval time of 12 hours?

Accepted Answer

S3 Glacier Deep Archive

Answer

S3 Glacier Instant Retrieval

Answer

S3 One Zone-IA

Answer

S3 Standard

Question 39

What happens if an AWS Glue Crawler finds a change in the schema of the source data compared to the existing table in the Data Catalog?

Accepted Answer

It can be configured to update the table definition with the new schema

Answer

It stops the execution of all associated Glue Jobs

Answer

It automatically deletes the data in the source bucket

Answer

It creates a new IAM Role for the data owner

Question 40

A company needs to implement a data lake on AWS. They want to ensure that access to specific S3 folders used by different data engineering teams is strictly controlled based on the principle of least privilege. Which AWS service is primarily used to define these access permissions?

Accepted Answer

AWS Identity and Access Management (IAM)

Answer

AWS Glue DataBrew

Answer

Amazon CloudWatch

Answer

Amazon Athena

Question 41

An analytics company needs to provide SQL-based querying capabilities on data stored in an Amazon S3 data lake without the overhead of managing infrastructure or loading the data into a database. Which service is best suited for this serverless ad-hoc analysis?

Accepted Answer

Amazon Athena

Answer

AWS Glue DataBrew

Answer

Amazon RDS

Answer

Amazon Redshift

Question 42

A data engineer is designing a disaster recovery plan for a data warehouse. They need a cost-effective storage solution for database backups that are used less than once a year, but the data must be stored across multiple Availability Zones for high durability. Which S3 storage class meets these requirements?

Accepted Answer

S3 Glacier Deep Archive

Answer

S3 One Zone-IA

Answer

S3 Intelligent-Tiering

Answer

S3 Standard

Question 43

A data engineer needs to ingest real-time data into an S3 bucket and prefers a managed service that can automatically scale and requires no manual management of shards or consumer applications. Which service is the best fit?

Accepted Answer

Amazon Kinesis Data Firehose

Answer

Amazon Managed Streaming for Apache Kafka (MSK)

Answer

Amazon Kinesis Data Streams

Answer

AWS Glue Extractors

Question 44

A data engineer is building a pipeline where an AWS Glue Job must access a database in a private subnet of an Amazon VPC. Which AWS Glue component must be configured to allow the job to communicate with the database?

Accepted Answer

Glue Connection

Answer

Glue Data Catalog

Answer

Glue Trigger

Answer

Glue Crawler

Question 45

A data engineer is designing a pipeline where streaming data flows into Amazon Kinesis. They need a custom-built consumer application to process the stream with a response time of less than 200 milliseconds. Which service should be selected to provide the required low-latency, dedicated throughput for multiple independent consumers?

Accepted Answer

Kinesis Data Streams with Enhanced Fan-out

Answer

Kinesis Data Streams with standard consumers

Answer

Kinesis Data Firehose with S3 destination

Answer

AWS Glue Streaming ETL jobs

Question 46

A data engineer needs to move data from an on-premises Oracle database to an Amazon S3 bucket on a weekly basis using an AWS Glue Job. Because the database is behind a corporate firewall, the Glue Job must securely access the on-premises network. Which IAM configuration is essential for the Glue Job's execution role to achieve this?

Accepted Answer

The role must have a policy that allows 'ec2:CreateNetworkInterface' and 'ec2:DescribeNetworkInterfaces' to run in a VPC.

Answer

The role must be granted 's3:PutObject' access to the corporate firewall's internal logs.

Answer

The role must have the 'AdministratorAccess' permission and the 'AWSGlueServiceRole' policy attached.

Answer

The role must include a trust relationship with the on-premises Oracle Identity Manager.

Question 47

A data engineer needs to select an Amazon Redshift feature to handle unpredictable, short bursts of high-volume queries from a data visualization tool without impacting the performance of standard extract, transform, and load (ETL) operations. Which feature should be used?

Accepted Answer

Concurrency Scaling

Answer

Redshift Spectrum

Answer

Classic Resize

Answer

AQUA (Advanced Query Accelerator)

Question 48

A data engineer is designing a data lake and needs to decide between using an Amazon S3 bucket versus an Amazon RDS instance for raw data storage. Which factor most strongly favors choosing S3 over RDS for storing large volumes of unstructured satellite imagery data?

Accepted Answer

S3 is an object store that provides virtually unlimited scalability for any file type at a lower cost per GB.

Answer

S3 provides built-in SQL indexing that is faster than RDS for complex relational schema lookups.

Answer

S3 is a block storage service that allows for faster random-access writes to database files.

Answer

S3 requires less security configuration since it does not support IAM roles or bucket policies.

Question 49

A data engineer is configuring an AWS Glue Job to process sensitive customer data. To ensure that the job can only be triggered by specific automated events and that the data processed is encrypted at rest using a customer-managed key, which combination of AWS features should be implemented?

Accepted Answer

Use AWS Glue Triggers for automation and AWS Key Management Service (KMS) for encryption

Answer

Use Kinesis Data Streams for automation and AWS Secrets Manager for encryption

Answer

Use Amazon RDS Multi-AZ for automation and S3 Server-Side Encryption (SSE-S3) for encryption

Answer

Use AWS Glue Crawlers for automation and IAM User Groups for encryption

Question 50

Which AWS service provides a centralized console for managing fine-grained access control, such as column-level security, for data stored in an S3-based data lake?

Accepted Answer

AWS Lake Formation

Answer

Amazon GuardDuty

Answer

AWS Secrets Manager

Answer

Amazon CloudWatch

Question 51

Which IAM policy element is used to grant a user permission to perform a specific action only if the request is encrypted via TLS?

Accepted Answer

Condition

Answer

Effect

Answer

Resource

Answer

Principal
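
For illustration, a policy statement that uses the Condition element with the `aws:SecureTransport` key might be built like this. The bucket name and the helper name are hypothetical; the policy is only serialized here, not attached to anything.

```python
import json


def tls_only_statement(bucket):
    """Allow object reads only when the request was sent over TLS."""
    return {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
        # aws:SecureTransport is "true" when the request used TLS.
        "Condition": {"Bool": {"aws:SecureTransport": "true"}},
    }


POLICY = {
    "Version": "2012-10-17",
    "Statement": [tls_only_statement("example-bucket")],
}

if __name__ == "__main__":
    print(json.dumps(POLICY, indent=2))
```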

Question 52

To track which user deleted a specific Amazon S3 bucket, which AWS service should a data engineer consult for API call history?

Accepted Answer

AWS CloudTrail

Answer

Amazon Inspector

Answer

AWS Trusted Advisor

Answer

AWS Config

Question 53

What is the primary purpose of a KMS Key Policy?

Accepted Answer

To define which users or roles have permission to use or manage the KMS key

Answer

To automatically rotate passwords for RDS databases

Answer

To set the billing limit for encryption operations

Answer

To monitor the network traffic of an EC2 instance

Question 54

Which encryption method requires the client to encrypt data before sending it to Amazon S3?

Accepted Answer

Client-Side Encryption

Answer

SSE-KMS

Answer

SSE-S3

Answer

SSE-C

Question 55

How can a data engineer implement data masking for sensitive PII data in Amazon Redshift without changing the physical data?

Accepted Answer

Dynamic Data Masking

Answer

IAM Role Assumption

Answer

S3 Lifecycle Policies

Answer

VPC Peering

Question 56

What is the benefit of using an IAM Role instead of a permanent IAM User Access Key for an application running on EC2?

Accepted Answer

It provides temporary security credentials that rotate automatically

Answer

It converts all data to ciphertext automatically

Answer

It eliminates the need for any VPC security groups

Answer

It allows for faster data transfer speeds

Question 57

In AWS Lake Formation, what does 'LF-TBAC' stand for regarding access control?

Accepted Answer

Lake Formation Tag-Based Access Control

Answer

Linked Frame Type-Based Access Console

Answer

Local Form Token-Based Authentication Check

Answer

Logical File Transparent Access Code

Question 58

When using SSE-KMS for S3 encryption, which log records the usage of the key to decrypt an object?

Accepted Answer

AWS CloudTrail

Answer

Amazon Athena Audit

Answer

VPC Flow Logs

Answer

S3 Access Logs

Question 59

Which type of KMS key is managed by AWS services and cannot be deleted by the customer?

Accepted Answer

AWS Managed Keys

Answer

Imported Material Keys

Answer

Customer Managed Keys

Answer

Asymmetric Keys

Question 60

Which specific Lake Formation permission must be granted to a user to allow them to view only a subset of columns within a Glue Data Catalog table?

Accepted Answer

Select permission with column-level filtering

Answer

Alter permission with row-level security

Answer

Drop permission using IAM conditional keys

Answer

Describe permission on the database level

Question 61

An AWS Data Engineer needs to ensure that all data moving between an on-premises data center and Amazon S3 is protected from interception. Which approach specifically addresses security for 'data in transit'?

Accepted Answer

Using SSL/TLS to interact with AWS service endpoints

Answer

Implementing S3 Object Lock in compliance mode

Answer

Using KMS to encrypt the data at rest

Answer

Enabling AES-256 encryption on the S3 bucket

Question 62

A data engineer needs to prevent an IAM user from viewing sensitive 'SSN' columns in an Amazon Athena query while allowing them to see all other data. What is the most efficient way to enforce this using AWS Lake Formation?

Accepted Answer

Apply a Data Cell Filter to exclude the specific column for that user

Answer

Create a separate S3 bucket for every single column

Answer

Encrypt the 'SSN' column with a KMS key the user cannot access

Answer

Remove the 's3:GetObject' permission from the user's IAM policy

Question 63

Which specific AWS KMS key type allows a data engineer to use the same key material across multiple AWS Regions to simplify the decryption of replicated data sets?

Accepted Answer

Multi-Region keys

Answer

Asymmetric keys

Answer

Symmetric AWS Managed keys

Answer

CloudHSM backed keys

Question 64

An AWS Data Engineer needs to ensure that a specified IAM role can only access an Amazon S3 bucket if the request originates from a private corporate network. Which policy element should they use to enforce this security governance?

Accepted Answer

A Condition block with 'aws:SourceIp'

Answer

A Resource block with 'arn:aws:s3:::bucket-name/*'

Answer

An Action block with 's3:GetBucketLocation'

Answer

An Effect block set to 'Allow'
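
A sketch of such a statement, together with a small local check of what the condition means. The CIDR block `203.0.113.0/24` is a documentation range used as a placeholder for the corporate network's public egress addresses, and `source_ip_allowed` is a hypothetical helper that mimics only this one condition, not full IAM policy evaluation.

```python
import ipaddress

STATEMENT = {
    "Effect": "Allow",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bucket-name/*",
    # Placeholder CIDR; substitute the corporate network's address range.
    "Condition": {"IpAddress": {"aws:SourceIp": ["203.0.113.0/24"]}},
}


def source_ip_allowed(statement, source_ip):
    """Return True when source_ip falls inside any CIDR listed in the condition."""
    cidrs = statement["Condition"]["IpAddress"]["aws:SourceIp"]
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)
```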
