
Reliable Google Cloud Certified Professional-Data-Engineer Dumps PDF Jan 19, 2023 Recently Updated Questions
Pass Your Google Professional-Data-Engineer Exam with Correct 270 Questions and Answers
Introduction
Data engineers are responsible for finding trends in data sets and developing algorithms that make raw data more useful to the enterprise. This IT role requires a significant set of technical skills, including a deep knowledge of SQL database design and multiple programming languages. Data engineers collect, transform, and visualize data. The Data Engineer designs, builds, maintains, and troubleshoots data processing systems with a particular emphasis on the security, reliability, fault tolerance, scalability, fidelity, and efficiency of such systems.
NEW QUESTION 77
You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline? (Choose two.)
- A. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery
- B. Increase the number of max workers
- C. Use a larger instance type for your Cloud Dataflow workers
- D. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
- E. Change the zone of your Cloud Dataflow pipeline to run in us-central1
Answer: B,C
NEW QUESTION 78
Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your world-wide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channels log data. How should you set up the log data transfer into Google Cloud?
- A. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.
- B. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
- C. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.
- D. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
Answer: C
NEW QUESTION 79
Which of these are examples of a value in a sparse vector? (Select 2 answers.)
- A. [0, 1]
- B. [0, 0, 0, 1, 0, 0, 1]
- C. [1, 0, 0, 0, 0, 0, 0]
- D. [0, 5, 0, 0, 0, 0]
Answer: A,C
Explanation:
Categorical features in linear models are typically translated into a sparse vector in which each possible value has a corresponding index or id. For example, if there are only three possible eye colors you can represent 'eye_color' as a length 3 vector: 'brown' would become [1, 0, 0], 'blue' would become [0, 1, 0] and 'green' would become [0, 0, 1]. These vectors are called "sparse" because they may be very long, with many zeros, when the set of possible values is very large (such as all English words).
[0, 0, 0, 1, 0, 0, 1] is not a sparse vector because it has two 1s in it. A sparse vector contains only a single 1.
[0, 5, 0, 0, 0, 0] is not a sparse vector because it has a 5 in it. Sparse vectors only contain 0s and 1s.
Reference: https://www.tensorflow.org/tutorials/linear#feature_columns_and_transformations
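The eye-color encoding described in the explanation can be sketched in a few lines of plain Python. This is only an illustration: the helper `one_hot` and the list `EYE_COLORS` are hypothetical names, not part of TensorFlow or any other library.

```python
def one_hot(value, vocabulary):
    """Return a vector with a single 1 at the index of `value` in `vocabulary`."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1
    return vector


# Hypothetical vocabulary taken from the eye-color example in the explanation.
EYE_COLORS = ["brown", "blue", "green"]

encoded = {color: one_hot(color, EYE_COLORS) for color in EYE_COLORS}
print(encoded["brown"])  # [1, 0, 0]
print(encoded["blue"])   # [0, 1, 0]
print(encoded["green"])  # [0, 0, 1]
```

With a vocabulary of many thousands of values, the same function would return a very long vector that is almost entirely zeros, which is the sense of "sparse" used in the explanation.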
NEW QUESTION 80
Which of the following statements is NOT true regarding Bigtable access roles?
- A. To give a user access to only one table in a project, you must configure access through your application.
- B. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
- C. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
- D. You can configure access control only at the project level.
Answer: B
Explanation:
For Cloud Bigtable, you can configure access control at the project level. For example, you can grant the ability to:
Read from, but not write to, any table within the project.
Read from and write to any table within the project, but not manage instances.
Read from and write to any table within the project, and manage instances.
NEW QUESTION 81
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set.
You want to increase the AUC of the model. What should you do?
- A. Train a classifier with deep neural networks, because neural networks would always beat SVMs
- B. Deploy the model and measure the real-world AUC; it's always higher because of generalization
- C. Perform hyperparameter tuning
- D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC
Answer: C
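One way to see why the option about scaling predictions cannot help: AUC depends only on how the scores rank positives against negatives, so multiplying every score by a positive factor leaves it unchanged. The sketch below uses a hand-written pairwise AUC and made-up labels and scores. Both are assumptions for illustration, not output from any real model.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up validation labels and classifier scores.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

base_auc = auc(labels, scores)
scaled_auc = auc(labels, [3.0 * s for s in scores])  # positive scaling keeps the ranking
print(base_auc == scaled_auc)  # True
```

Changing the model's hyperparameters, by contrast, can change the ranking itself, which is what moves the AUC.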
NEW QUESTION 82
You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required. You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)
- A. MySQL
- B. Cassandra
- C. MongoDB
- D. Redis
- E. HBase
- F. HDFS with Hive
Answer: C,E,F
NEW QUESTION 83
Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use?
(Choose three.)
- A. Supervised learning to determine which transactions are most likely to be fraudulent.
- B. Unsupervised learning to predict the location of a transaction.
- C. Unsupervised learning to determine which transactions are most likely to be fraudulent.
- D. Clustering to divide the transactions into N categories based on feature similarity.
- E. Supervised learning to predict the location of a transaction.
- F. Reinforcement learning to predict the location of a transaction.
Answer: C,D,E
NEW QUESTION 84
Which of the following is NOT true about Dataflow pipelines?
- A. Dataflow pipelines use a unified programming model, so can work both with streaming and batch data sources
- B. Dataflow pipelines can be programmed in Java
- C. Dataflow pipelines can consume data from other Google Cloud services
- D. Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner
Answer: D
Explanation:
Dataflow pipelines can also run on alternate runtimes like Spark and Flink, as they are built using the Apache Beam SDKs.
Reference: https://cloud.google.com/dataflow/
NEW QUESTION 85
You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.
What should you do?
- A. Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files.
- B. Increase the size of your Parquet files so that each is at least 1 GB.
- C. Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.
- D. Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.
Answer: D
NEW QUESTION 86
Your United States-based company has created an application for assessing and responding to user actions. The primary table's data volume grows by 250,000 records per second. Many third parties use your application's APIs to build the functionality into their own frontend applications. Your application's APIs should comply with the following requirements:
* Single global endpoint
* ANSI SQL support
* Consistent access to the most up-to-date data
What should you do?
- A. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.
- B. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.
- C. Implement BigQuery with no region selected for storage or processing.
- D. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.
Answer: D
NEW QUESTION 87
All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.
- A. after
- B. once
- C. only if
- D. before
Answer: D
Explanation:
In a Cloud Bigtable architecture all client requests go through a front-end server before they are sent to a Cloud Bigtable node.
The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, which is a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster.
When additional nodes are added to a cluster, you can increase the number of simultaneous requests that the cluster can handle, as well as the maximum throughput for the entire cluster.
NEW QUESTION 88
You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud.
You want to support transactions that scale horizontally. You also want to optimize data for range queries on non-key columns. What should you do?
- A. Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.
- B. Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.
- C. Use Cloud Spanner for storage. Add secondary indexes to support query patterns.
- D. Use Cloud SQL for storage. Add secondary indexes to support query patterns.
Answer: C
Reference: https://cloud.google.com/solutions/data-lifecycle-cloud-platform
NEW QUESTION 89
You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?
- A. BigQuery cannot be used as a sink
- B. Only batch
- C. Only streaming
- D. Both batch and streaming
Answer: D
Explanation:
When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job. When you apply a BigQueryIO.Write transform in streaming mode or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
Reference: https://cloud.google.com/dataflow/model/bigquery-io
NEW QUESTION 90
You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change its data type to the TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive. What should you do?
- A. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
- B. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.
- C. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.
- D. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
- E. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column TS for each row. Reference the column TS instead of the column DT from now on.
Answer: D
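The underlying conversion, from an epoch time stored as a string to a real timestamp that supports duration arithmetic, can be sketched in plain Python. This is not BigQuery SQL. It assumes the DT strings hold whole seconds since the Unix epoch, which the question does not state, and `parse_epoch` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def parse_epoch(dt_string):
    """Convert an epoch-seconds string (assumed format) to a UTC datetime."""
    return datetime.fromtimestamp(int(dt_string), tz=timezone.utc)


# Hypothetical DT values for three clicks in one user session.
clicks = ["1673000000", "1673000450", "1673000900"]

timestamps = [parse_epoch(s) for s in clicks]
session_seconds = (max(timestamps) - min(timestamps)).total_seconds()
print(session_seconds)  # 900.0
```

Doing this conversion once and storing the result avoids repeating the cast in every later query, which is the cost trade-off the question asks about.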
NEW QUESTION 91
You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?
- A. Create a table on Cloud Spanner, and insert and delete rows with the job information
- B. Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them
- C. Create an API using App Engine to receive and send messages to the applications
- D. Create a table on Cloud SQL, and insert and delete rows with the job information
Answer: B
Explanation:
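Cloud Pub/Sub transmits data in real time and scales automatically. Publishers and subscribers are decoupled, so new applications can be added through additional subscriptions without affecting existing ones.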
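The decoupling that a topic with subscriptions provides can be illustrated with a toy in-memory model. The `Topic` class below is a hypothetical sketch, not the Cloud Pub/Sub client API, and it ignores delivery guarantees, acknowledgements, and scaling.

```python
from collections import deque


class Topic:
    """Toy in-memory topic; NOT the Cloud Pub/Sub client API."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        """Create a named subscription and return its private message queue."""
        queue = deque()
        self._subscriptions[name] = queue
        return queue

    def publish(self, message):
        """Deliver a copy of the message reference to every subscription."""
        for queue in self._subscriptions.values():
            queue.append(message)


jobs = Topic()
runner_queue = jobs.subscribe("job-runners")
audit_queue = jobs.subscribe("audit-log")  # a later-added application

# The job generator publishes without knowing who consumes.
jobs.publish({"job_id": 1})
jobs.publish({"job_id": 2})

# The job runner drains its own queue; the audit queue is unaffected.
processed = [runner_queue.popleft()["job_id"] for _ in range(len(runner_queue))]
print(processed)         # [1, 2]
print(len(audit_queue))  # 2
```

Each subscription has its own queue, so adding the audit consumer changes nothing for the job runners.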
NEW QUESTION 92
You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?
- A. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert.
- B. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
- C. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
- D. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert.
Answer: B
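The difference between a sliding window (1 hour, evaluated every 5 minutes) and a fixed 1-hour window can be sketched without any streaming framework. The code below is plain Python over per-minute message counts, not Apache Beam or Kafka IO. The traffic pattern and the helper `sliding_window_rates` are assumptions for illustration.

```python
def sliding_window_rates(counts_per_minute, window_minutes=60, period_minutes=5):
    """Return (window_end_minute, average messages per second) for each full window."""
    results = []
    for end in range(window_minutes, len(counts_per_minute) + 1, period_minutes):
        window = counts_per_minute[end - window_minutes:end]
        rate = sum(window) / (window_minutes * 60)
        results.append((end, rate))
    return results


THRESHOLD = 4000  # messages per second, from the question

# Hypothetical traffic: 60 minutes at 5000 msg/s, then 60 minutes at 3000 msg/s.
counts = [5000 * 60] * 60 + [3000 * 60] * 60

sliding_alerts = [end for end, rate in sliding_window_rates(counts) if rate < THRESHOLD]
fixed_alerts = [
    end
    for end, rate in sliding_window_rates(counts, period_minutes=60)
    if rate < THRESHOLD
]

print(sliding_alerts[0])  # 95
print(fixed_alerts[0])    # 120
```

In this made-up scenario the sliding window raises the alert at minute 95, while the fixed window cannot raise it before minute 120, which is why a moving average calls for sliding windows.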
NEW QUESTION 93
You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?
- A. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
- B. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert.
- C. Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
- D. Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert.
Answer: A