GCP Flowcharts (Cheatsheet for ACE and PCA GCP certification)

Please note that I have no association with any training companies or third parties linking through to this post. This post is freely available to help folks understand Google Cloud!

So it’s easy to find the one you want. This single post also allows me to maintain an up to date collection from one place.

Once I have more than 1 flowchart for a topic/ area I will create a new heading ,for now those singletons are under misc.

Attribution: All graphics & flowcharts apart from ones I drew myself & Sara’s cheerfully copied from the Google Cloud platform or blog site

Latest additions - May 2021: Authenticating service accounts & choosing private access options ( Security)

😀

Compute

Which compute option ?

Even with the increasing popularity of serverless options traditional Compute options are very much in demand. I know I know I’m using traditional and including App engine & Kubernetes but even k8s is 5 years old now ( at the time of writing June 2019) so I think I can get away with that :-) So choosing a traditional compute option flowchart is still very much valid

GCP has a continuum of compute options which can be graphically depicted as:

alt_text

It may be obvious at either end of the continuum which option you choose but the decision becomes less straigh tforward in the middle so flowchart to the rescue :

alt_text

The compute flowchart with accompanying words can be found here and a nice table comparing the compute options is here.

Which Serverless (compute) Option?

If you want access to compute power where you just want to write the code and not have to worry about the underlying infrastructure then the serverless options are for you. Basically GCP takes care of the servers that are actually lurking way underneath the abstraction for you as well as the provisioning ( scaling up & down ) .

alt_text

GKE by itself is not serverless as fits this description as you still have to define and configure way too much it’s not just a here’s my code and here you go through but it does provide the platform for a serverless platform as you can see in the flow chart. But the sharp eyed amongst you may have noticed that Apo Engine can be considered a serverless service although it’s also included in the what I call traditional compute option

The flow chart and words about GCP serverless options can be found here There’s also a product comparison table

Sizing & scoping GKE clusters to meet your use case

Determining the number of GKE ( Google kubernetes engine) clusters and the size of the clusters required for your workloads requires looking at a number of factors. The article Choose size and scope of Kubernetes engine discusses these factors. Alas it’s sadly lacking a flowchart so I’ve addressed that for you ( maybe at some point the article will include a flowchart ). I know it seems I have created 2 mini charts but then it was a post about sizng & scoping your GKE clusters !

alt_text

The words discussing the decision points are all in the article

Serverless Scaling Strategies

Write code, deploy it and the scaling will happen automagically for you thats the usp of “serverless” . That may be mostly true if your full stack auto scales but in a lot of cases that isn’t the case and suddenly you do need to start worrying about backend services such as a database for example that has rate and connection limits. To help you with architecting your serverless applications built with GCP so they scale effectively my colleague @ptone wrote about 6 strategies you can adopt here . And yes he included a flowchart for your delectation to help you figure out which strategy is the right one for your use case : alt_text

If after admiring that flowchart you want to dive deeper into rate limiting techniques using GCP there’s this

Storage and Data

What Storage type?

Data data data data data! ( Sung to the 60’s Batman theme music) . I struggle to think of any application where data isn’t a thing . The myriad ways you can store your data is probably after considering the security controls needed the most important decision you need to make. Google Cloud has your back with some useful tables (I love tables too) which can be found here and here’s a complementary flowchart to help you decide which storage option fits your use case

alt_text

How to select the appropriate way to transfer data sets to GCP for your use case

Transferring large data sets to GCP ( or indeed any cloud) means that you have to consider two initial questions How much data do you need to transfer? and how long have you got to get that data to GCP? In this case we are really focusing on getting large volumes of data to Cloud Storage. This then leads onto the other questions that you need to consider to allow you to determine what transfer method may meet your use case . How are you connected to GCP? How much bandwidth is actually available between your source and GCP? The article on Transferring big data sets to GCP discusses the information you need to determine the connectivity required and what methods to choose. It has a flowchart and the one below is a slightly modified version of the one found in the article.

alt_text

Choosing a Cloud Storage class for your use case

Cloud Storage (GCS) is a fantastic service which is suitable for a variety of use cases. The thing is it has different classes and each class is optimised to address different use cases. All the storage classes offer low latency (time to first byte typically tens of milliseconds) and high durability. You can use the same APiIs , lifecycle rules etc . Basically the classes differ by their availability, minimum storage durations, and charges for storage and access.

There are 4 classes that you need to care about .

Multi regional — geo redundant storage optimised for storing data that is frequently accessed (“hot” objects) for example web site serving and multi media streaming.

Regional — Data can be stored at lower cost, with the trade-off of data being stored in a specific regional location, instead of having redundancy distributed over a large geographic area. This is ideal for when you need the data to be close to the computing resources that process the data say for when using Dataproc.

Nearline — Nearline Storage is ideal for data you plan to read or modify on average once a month or less. Nearline Storage data stored in multi-regional locations is redundant across multiple regions, providing higher availability than Nearline Storage data stored in regional locations. This is great for backups . You should be carrying out regular DR fire drills at least once a month which includes recovering your data from your backups !

Coldline- a very low cost, highly durable storage service. It is the best choice for data that you plan to access at most once a year, due to its slightly lower availability, 90-day minimum storage duration, costs for data access, and higher per-operation costs. This is ideal for long term archiving use cases

Here’s a flow chart that helps you decide which storage class is appropriate for your use case when you don’t feel like reading too many words to figure out your choices ( which after all is what flowcharts are for ) .

alt_text

For an overview of the GCS storage classes see here

Data processing - Cloud Dataflow versus Cloud Dataproc

If you have lots of files that need processing you may already be familiar with the Hadoop /Spark ecoystem and you would probably use GCP’s Cloud Dataproc as the path of least resistance. But GCP also has a unified batch & stream service Cloud Dataflow which is their managed Apache beam . Cloud Dataflow is a service unlike Dataproc where you don’t need to worry about the compute so it’s a “serverless” service because GCP takes care of provisioning and managing the compute on your behalf. GCP have created a handy flowchart for you which can be found on both the Cloud Dataflow & Cloud Dataproc landing pages with more words than I have here.

alt_text

Security

How to manage encryption keys

GCP has a continuum of ways for you to manage your encryption keys graphically depicted as

alt_text

Yes I know that the continuum graphic alone is probably all you need but when the announcement for the KMS service was made they produced a flow chart and I Just had to include it here

alt_text

The words that go with the above can be found here and a nice table that compliments the flow chart can be found here at the Encryption at rest landing page . ( Everything you ever wanted to know about Encryption at rest on GCP and more !)

Which Authentication option ?

I was torn about keeping this one in this list but in the end I decided to keep it as it was still valid and the flowchart below it on using GCP’s Identity platform complemented rather than replaced it. This is one of my own flowcharts as at the time I wrote the original medium post GCP didn’t have one for this yet!! Then in Dec 2nd 2017: Neal Mueller responded to my hint about wanting a GCP flowchart for Authentication and it’s so much prettier than my version 😊 so I updated the flowchart below with the prettier version! Thanks Neal.

So just to make sure we are on the same page authentication identifies who you are ! This flowchart is focused on whether its identity — > application ( deployed on GCP) or identity — > direct access to GCP

alt_text

and as I haven’t written the words to go with this flowchart I’ve left you a few links instead:

Firebase Authentication

Service Accounts

GAE User authentication options

Cloud IoT using JSON Web Tokens

Cloud Identity

Need an identity mgt product?

How you manage your identities depends on the use case. Need to manage users who will have direct access to GCP resources versus users who need access to an application that you’re hosting on GCP? Different requirements and thus different solutions required. Here’s a flowchart to help you figure out out the right solution for your use case.

alt_text

I will get round to updating this flowchart one day to reflect the name change from CICP to identity platform . The words that go with the flowchart can be found here.

Securing your GKE end points

Arguably this flowchart could be catalogued under Compute but as it’s about securing end points under security it goes. The idea for this flowchart arose after my team had the discussion re what option would be appropriate for what use case when you want to secure your end points using GKE. So thanks team for the inspiration for this one

When a GKE operator wants to serve content from GKE and secure it they have a number of ways of addressing this depending on the use case as shown in this flow chart:

alt_text

APi’s exposed outside of your GKE cluster then use Apigee edge which provides a way to manage your API’ss acting as a proxy to them. It can provide services such as security e.g is that call to your API authorized.

If you are looking at service to service security within the cluster then Istio is the mesh for you

if you are wanting to authenticate access to your web apps it depends on whether they are internal users or external. For internal users then Cloud IAP is where you need to stop and have a look while for end users Identity platform is the stop you need.

You can also use Istio and Apigee together. Istio can secure the communication between services, provide observability, etc while Apigee can provide external authentication, quotas and overall API policy management.

There are nuances particularly with istio which and I quote my team mate James “the lines blur a bit when looking at Istio” but starting from here isn’t a bad place to start from

Authenticating service accounts

Depending on your use case the way you configure service accounts to authenticate to Google cloud to access resources differs

The article: Best practices for using and managing service accounts

Identifies four ways you can approach authentication to meet specific use cases.

Attached service accounts - you attach the service account to the underlying compute resource. By attaching the service account, you enable the application to obtain tokens for the service account and to use the tokens to access Google Cloud APIs and resources.

For kubernetes use workload identity - Create a dedicated service account for each Kubernetes pod that requires access to Google APIs or resources. This limits the scope of access to the pod level rather than the node. For each Kubernetes pod that requires access to Google APIs or resources you create a Kubernetes service account and attach it to the pod . Workload Identity is used to create a mapping between the service accounts and their corresponding Kubernetes service accounts.

Running your application on premises or another cloud no problem you can use Workload identity federation. Workload identity federation lets you create a one-way trust relationship between a Google Cloud project and an external identity provider. Once you’ve established the trust, applications can use credentials issued by the trusted identity provider to impersonate a service account. By using workload identity federation, you can let applications use the authentication mechanisms that the external environment provides ( e.g AD FS,AWS temporary credentials ) and you avoid having to store and manage service account keys.

There are always cases where you need to do the thing we really don’t want to do and in this case it’s having to download service account keys so as loathe as I am to mention this option due to the risks involved with downloading service account keys there are just some situations it cannot be avoided . If you must do this then I would suggest using Vault secrets engine to manage service account keys. And no entry here is complete without its flowchart:

alt_text

Choosing Private access options

Accessing Google APIs and services from non public routable IP addresses is a very common configuration requirement and as you would expect there are various ways to achieve this using Google Cloud. What configuration you use ultimately boils down to having to be concerned about three things

  • If your source is on premises or it’s a Google cloud resource
  • If you need to only access resources that are supported by VPC service controls
  • Is your Google cloud source serverless or not

If your source is on premises you need to connect to a Google cloud VPC network by using Cloud VPN or Cloud Interconnect . If you need to restrict access to only those services supported by VPC Service controls you need to configure DNS, firewall rules, and routes to use one of the Private Google Access-specific domains and VIPs

Use restricted.googleapis.com when you only need access to Google APIs and services that are supported by VPC Service Controls. See the list of supported services here .

Use private.googleapis.com if you need access to any google apis or service that is not restricted by you configuring VPC Service Controls.

By configuring private google access on the subnet of your Google cloud VPC network where you have VMs without external IP addresses the VMs can also use Private Google Access-specific domains and VIPs to access Google Cloud services and APIs

An alternative configuration for VMs without external IPs is to use a private service connect endpoint in your VPC network. There are some cool use cases that this can be used with from your on-premises network as well. For example you can use your own wide-area networking instead of Google’s, to control data movement by you managing which Cloud Interconnect attachment (VLAN) is used to send traffic to Google APIs.

If your serverless environment needs to access resources in your VPC network via internal IP addresses then use Serverless VPC access . This enables you to connect from Cloud Run, Cloud Functions or App Engine Standard directly to your VPC network.

Here’s the obligatory flowchart (I used Excalidraw)

alt_text

Networking

Which Network Tier?

GCP’s network even if I say so myself is fantastic but it’s recognised that not every use case needs to optimize for performance and cost may be the driver. So welcome to Network tiers.

alt_text

You can see the funky animated gif for the above image here

alt_text

The words that go with the above can be found here . There are some useful tables there too.

Choosing a Load balancer

Load balancing is great it allows you to treat a group of compute resources as a single entity providing an entry point that has in the case of GCP load balancing services a single anycast IP address. Combining GCP Load balancers with autoscaling you can scale the resources up and down according to metrics you configure. There are loads more cool features but you get the idea. So what type of load balancing service do you need? Layer 7, layer 4, global , regional? Maybe you need an internal load balancer well there’s a flowchart for helping you decide ( Okay you knew that was coming didn’t you? 😃)

alt_text

Here are the words to go with the flowchart. Once you have figured out what load balancing option is likely to address your needs have a look at the load balancing overview page as a first stop before diving in.

Choosing the floating IP address pattern that maps to your use case

Floating IP’s are a way to allow you to move an IP address from one server to another . Typically this pattern is usually required for HA deployments or for disaster recovery scenarios. For example where you have one active server or appliance such as databases with a non serving replica /hot standby . When you have to swap to the secondary server you point the floating IP to it. This negates the need to update clients to use an alternative IP to point to the alternative server. The article On best practices for floating IP addresses has a list of uses cases for on premises and provides a number of options for implementing the pattern for Compute engine instances and yes has a flowchart to help you choose the solution for your use case . Here’s the flow chart

alt_text

Options for connecting to other clouds from GCP

Whatever the reasons ( They range from having processing in one place and data somewhere else, to distributing processing across clouds, through to DR etc) people want to be able to connect to other clouds from GCP.

GCP have written a great article describing the various patterns that can be employed and yes they have a flowchart to help you decide which pattern is the right one for your use case which I share here for your delectation:

alt_text

The article with this flowchart and a walk through of the different patterns can be found here

Data Analytics

ML or SQL ?

Always wanted to know whether you really need to use ML or whether a SQL query will suffice well Sara Robinson tweeted this flow chart

alt_text

From https://twitter.com/SRobTweets/status/1053273512079699968

She then wrote some words to augment the flowchart here and then wrote some more words walking you through figuring out if ML is a good fit for your prediction task. A SQL query may be all you need. Use the right tool for the job . I love these two posts well I do get to look at the flowchart twice !

Running Juypter notebooks on Google cloud

Jupyter notebooks are used to create and share documents that contain live code, equations, visualizations and narrative text. Their use for data science use cases is ubiquitous. Depending on your use case you need to make a decision re exactly how you manage them on Google cloud to meet the balance between the controls administrators need to apply to meet the principles of least privilege by using a hub to manage user profiles centrally , yet allowing users of the notebooks to do their jobs without restrictive controls getting in the way as they see it!

It’s a delicate balancing act and then to add to that you need to figure out what product is suitable to run your notebooks on:

This all starts to feel confusing but by starting with the question of whether the users of the notebooks need to use spark you can quickly determine what configuration meets your use case.

The article Extending AI Platform Notebooks to Dataproc and Google Kubernetes Engine has a handy flow chart that basically starts with that question and a comprehensive walkthrough to help you figure out what is the right configuration for you to run Jupyter notebooks for your use case.

alt_text

Misc

Hybrid & multi-cloud logging & monitoring patterns

Hybrid and multi-cloud architectures are here to stay and looking at ways to manage those is key to not having to wipe the tears of ops/ sysadmin staff dealing with the operational overhead. It’s important to have a consistent logging and monitoring approach not only to give a single pane of glass but to simplify the admin of managing applications in two environments. This guide discusses architectural patterns for logging & monitoring in hybrid or multi cloud environments and it’s flow chart helps navigate your choices between a centralised logging approach no matter where your apps are deployed versus a segregated approach.

alt_text

What annotations(labels) should you use for which use case

GCP has a number of ways of annotating or labelling( this can get slightly over loaded hence the use of the word annotation) resources. Each annotation has different functionality and scope, they are not mutually exclusive and you will often use a combination of them to meet your requirements so I wrote a post with added flow chart to help you navigate which annotation(s) to use for what use case. Here’s the flow chart :

alt_text

Professional Cloud Architect Practice Exam

1 - Because you do not know every possible future use for the data TerramEarth collects, you have decided to build a system that captures and stores all raw data in case you need it later. How can you most cost-effectively accomplish this goal?

  •  A. Have the vehicles in the field stream the data directly into BigQuery.
  •  B. Have the vehicles in the field pass the data to Cloud Pub/Sub and dump it into a Cloud Dataproc cluster that stores data in Apache Hadoop Distributed File System (HDFS) on persistent disks.
  •  C. Have the vehicles in the field continue to dump data via FTP, adjust the existing Linux machines, and use a collector to upload them into Cloud Dataproc HDFS for storage
  •  D. Have the vehicles in the field continue to dump data via FTP, and adjust the existing Linux machines to immediately upload it to Cloud Storage with gsutil.

Feedback

A is not correct because TerramEarth has cellular service for 200,000 vehicles, and each vehicle sends at least one row (120 fields) per second. This exceeds BigQuery's maximum rows per second per project quota[1]. Additionally, there are 20 million total vehicles, most of which perform uploads when connected by a maintenance port, which drastically exceeds the streaming project quota further.

B is not correct because although Cloud Pub/Sub is a fine choice for this application, Cloud Dataproc is probably not. The question posed asks us to optimize for cost. Because Cloud Dataproc is optimized for ephemeral, job-scoped clusters[2], a long-running cluster with large amounts of HDFS storage could be very expensive to build and maintain when compared to managed and specialized storage solutions like Cloud Storage[3].

C is not correct because the question asks us to optimize for cost, and because Cloud Dataproc is optimized for ephemeral, job-scoped clusters[2], a long-running cluster with large amounts of HDFS storage could be very expensive to build and maintain when compared to managed and specialized storage solutions like Cloud Storage[3].

D is correct because several load-balanced Compute Engine VMs would suffice to ingest 9 TB per day, and Cloud Storage is the cheapest per-byte storage offered by Google. Depending on the format, the data could be available via BigQuery immediately, or shortly after running through an ETL job. Thus, this solution meets business and technical requirements while optimizing for cost.


2 - Today, TerramEarth maintenance workers receive interactive performance graphs for the last 24 hours (86,400 events) by plugging their maintenance tablets into the vehicle. The support group wants support technicians to view this data remotely to help troubleshoot problems. You want to minimize the latency of graph loads. How should you provide this functionality?

  •  A. Execute queries against data stored in a Cloud SQL.
  •  B. Execute queries against data indexed by vehicle_id.timestamp in Cloud Bigtable.
  •  C. Execute queries against data stored on daily partitioned BigQuery tables
  •  D. Execute queries against BigQuery with data stored in Cloud Storage via BigQuery federation.

Feedback

A is not correct because Cloud SQL provides relational database services that are well-suited to OLTP workloads, but not storage and low-latency retrieval of time-series data.

B is correct because Cloud Bigtable is optimized for time-series data. It is cost-efficient, highly available, and low-latency. It scales well. Best of all, it is a managed service that does not require significant operations work to keep running.

C is not correct because BigQuery is fast for wide-range queries, but it is not as well-optimized for narrow-range queries as Cloud Bigtable is. Latency will be an order of magnitude shorter with Cloud Bigtable for this use.

D is not correct because the objective is to minimize latency, and although BigQuery federation offers tremendous flexibility, it doesn't perform as well as native BigQuery storage[2], and will have longer latency than Cloud Bigtable for narrow-range queries.


3 - Your agricultural division is experimenting with fully autonomous vehicles. You want your architecture to promote strong security during vehicle operation. Which two architecture characteristics should you consider?

  •  A. Use multiple connectivity subsystems for redundancy.
  •  B. Require IPv6 for connectivity to ensure a secure address space.
  •  C. Enclose the vehicle’s drive electronics in a Faraday cage to isolate chips.
  •  D. Use a functional programming language to isolate code execution cycles.
  •  E. Treat every microservice call between modules on the vehicle as untrusted.
  •  F. Use a Trusted Platform Module (TPM) and verify firmware and binaries on boot.

Feedback

A is not correct because this improves system durability, but it doesn't have any impact on the security during vehicle operation.

B is not correct because IPv6 doesn't have any impact on the security during vehicle operation, although it improves system scalability and simplicity.

C is not correct because it doesn't have any impact on the security during vehicle operation, although it improves system durability.

D is not correct because merely using a functional programming language doesn't guarantee a more secure level of execution isolation. Any impact on security from this decision would be incidental at best.

E is correct because this improves system security by making it more resistant to hacking, especially through man-in-the-middle attacks between modules.

F is correct because this improves system security by making it more resistant to hacking, especially rootkits or other kinds of corruption by malicious actors.


4 - Which of TerramEarth’s legacy enterprise processes will experience significant change as a result of increased Google Cloud Platform adoption?

  •  A. OpEx/CapEx allocation, LAN change management, capacity planning
  •  B. Capacity planning, TCO calculations, OpEx/CapEx allocation
  •  C. Capacity planning, utilization measurement, data center expansion
  •  D. Data center expansion,TCO calculations, utilization measurement

Feedback

A is not correct because LAN change management processes don't need to change significantly. TerramEarth can easily peer their on-premises LAN with their Google Cloud Platform VPCs, and as devices and subnets move to the cloud, the LAN team's implementation will change, but the change management process doesn't have to.

B is correct because all of these tasks are big changes when moving to the cloud. Capacity planning for cloud is different than for on-premises data centers; TCO calculations are adjusted because TerramEarth is using services, not leasing/buying servers; OpEx/CapEx allocation is adjusted as services are consumed vs. using capital expenditures.

C is not correct because measuring utilization can be done in the same way, often with the same tools (along with some new ones). Data center expansion is not a concern for cloud customers; it is part of the undifferentiated heavy lifting that is taken care of by the cloud provider.

D is not correct because data center expansion is not a concern for cloud customers; it is part of the undifferentiated heavy lifting that is taken care of by the cloud provider. Measuring utilization can be done in the same way, often with the same tools (along with some new ones).


5 - You analyzed TerramEarth’s business requirement to reduce downtime and found that they can achieve a majority of time saving by reducing customers’ wait time for parts. You decided to focus on reduction of the 3 weeks’ aggregate reporting time. Which modifications to the company’s processes should you recommend?

  •  A. Migrate from CSV to binary format, migrate from FTP to SFTP transport, and develop machine learning analysis of metrics.
  •  B. Migrate from FTP to streaming transport, migrate from CSV to binary format, and develop machine learning analysis of metrics.
  •  C. Increase fleet cellular connectivity to 80%, migrate from FTP to streaming transport, and develop machine learning analysis of metrics.
  •  D. Migrate from FTP to SFTP transport, develop machine learning analysis of metrics, and increase dealer local inventory by a fixed factor.

Feedback

A is not correct because machine learning analysis is a good means toward the end of reducing downtime, but shuffling formats and transport doesn't directly help at all.

B is not correct because machine learning analysis is a good means toward the end of reducing downtime, and moving to streaming can improve the freshness of the information in that analysis, but changing the format doesn't directly help at all.

C is correct because using cellular connectivity will greatly improve the freshness of data used for analysis from where it is now, collected when the machines are in for maintenance. Streaming transport instead of periodic FTP will tighten the feedback loop even more. Machine learning is ideal for predictive maintenance workloads.

D is not correct because machine learning analysis is a good means toward the end of reducing downtime, but the rest of these changes don't directly help at all.


6 - Your company wants to deploy several microservices to help their system handle elastic loads. Each microservice uses a different version of software libraries. You want to enable their developers to keep their development environment in sync with the various production services. Which technology should you choose?

  •  A. RPM/DEB
  •  B. Containers
  •  C. Chef/Puppet
  •  D. Virtual machines

Feedback

A is not correct because although OS packages are a convenient way to distribute and deploy libraries, they don't directly help with synchronizing. Even with a common repository, the development environments will probably deviate from production.

B is correct because using containers for development, test, and production deployments abstracts away system OS environments, so that a single host OS image can be used for all environments. Changes that are made during development are captured using a copy-on-write filesystem, and teams can easily publish new versions of the microservices in a repository.

C is not correct because although infrastructure configuration as code can help unify production and test environments, it is very difficult to make all changes during development this way.

D is not correct because virtual machines run their own OS, which will eventually deviate in each environment, just as now.


7 - Your company wants to track whether someone is present in a meeting room reserved for a scheduled meeting. There are 1000 meeting rooms across 5 offices on 3 continents. Each room is equipped with a motion sensor that reports its status every second. You want to support the data upload and collection needs of this sensor network. The receiving infrastructure needs to account for the possibility that the devices may have inconsistent connectivity. Which solution should you design?

  •  A. Have each device create a persistent connection to a Compute Engine instance and write messages to a custom application.
  •  B. Have devices poll for connectivity to Cloud SQL and insert the latest messages on a regular interval to a device specific table.
  •  C. Have devices poll for connectivity to Cloud Pub/Sub and publish the latest messages on a regular interval to a shared topic for all devices.
  •  D. Have devices create a persistent connection to an App Engine application fronted by Cloud Endpoints, which ingest messages and write them to Cloud Datastore.

Feedback

A is not correct because having a persistent connection does not handle the case where the device is disconnected.

B is not correct because Cloud SQL is a relational database and not the best fit for sensor data. Additionally, the frequency of the writes has the potential to exceed the supported number of concurrent connections.

C is correct because Cloud Pub/Sub can handle the frequency of this data, and consumers of the data can pull from the shared topic for further processing.

D is not correct because having a persistent connection does not handle the case where the device is disconnected.


8 - Your company wants to try out the cloud with low risk. They want to archive approximately 100 TB of their log data to the cloud and test the analytics features available to them there, while also retaining that data as a long-term disaster recovery backup. Which two steps should they take?

  •  A. Load logs into BigQuery.
  •  B. Load logs into Cloud SQL.
  •  C. Import logs into Stackdriver.
  •  D. Insert logs into Cloud Bigtable.
  •  E. Upload log files into Cloud Storage.

Feedback

A is correct because BigQuery is the fully managed cloud data warehouse for analytics and supports the analytics requirement.

B is not correct because Cloud SQL does not support the expected 100 TB. Additionally, Cloud SQL is a relational database and not the best fit for time-series log data formats.

C is not correct because Stackdriver is optimized for monitoring, error reporting, and debugging instead of analytics queries.

D is not correct because Cloud Bigtable is optimized for read-write latency and analytics throughput, not analytics querying and reporting.

E is correct because Cloud Storage provides the Coldline storage class to support long-term storage with infrequent access, which would support the long-term disaster recovery backup requirement.


9 - You set up an autoscaling instance group to serve web traffic for an upcoming launch. After configuring the instance group as a backend service to an HTTP(S) load balancer, you notice that virtual machine (VM) instances are being terminated and re-launched every minute. The instances do not have a public IP address. You have verified that the appropriate web response is coming from each instance using the curl command. You want to ensure that the backend is configured correctly. What should you do?

  •  A. Ensure that a firewall rule exists to allow source traffic on HTTP/HTTPS to reach the load balancer.
  •  B. Assign a public IP to each instance, and configure a firewall rule to allow the load balancer to reach the instance public IP.
  •  C. Ensure that a firewall rule exists to allow load balancer health checks to reach the instances in the instance group.
  •  D. Create a tag on each instance with the name of the load balancer. Configure a firewall rule with the name of the load balancer as the source and the instance tag as the destination.

Feedback

A is not correct because the issue to resolve is the VMs being terminated, not access to the load balancer.

B is not correct because this introduces a security vulnerability without addressing the primary concern of the VM termination.

C is correct because health check failures lead to a VM being marked unhealthy and can result in termination if the health check continues to fail. Because you have already verified that the instances are functioning properly, the next step would be to determine why the health check is continuously failing.

D is not correct because the source of the firewall rule that allows load balancer and health check access to instances is defined IP ranges, and not a named load balancer. Tagging the instances for the purpose of firewall rules is appropriate but would probably be a descriptor of the application, and not the load balancer.


10 - Your organization has a 3-tier web application deployed in the same network on Google Cloud Platform. Each tier (web, API, and database) scales independently of the others. Network traffic should flow through the web to the API tier, and then on to the database tier. Traffic should not flow between the web and the database tier. How should you configure the network?

  •  A. Add each tier to a different subnetwork.
  •  B. Set up software-based firewalls on individual VMs.
  •  C. Add tags to each tier and set up routes to allow the desired traffic flow.
  •  D. Add tags to each tier and set up firewall rules to allow the desired traffic flow.

Feedback

A is not correct because the subnetwork alone will not allow and restrict traffic as required without firewall rules.

B is not correct because this adds complexity to the architecture and the instance configuration.

C is not correct because routes still require firewall rules to allow traffic as requests. Additionally, the tags are used for defining the instances the route applies to, and not for identifying the next hop. The next hop is either an IP range or instance name, but in the proposed solution the tiers are only identified by tags.

D is correct because as instances scale, they will all have the same tag to identify the tier. These tags can then be leveraged in firewall rules to allow and restrict traffic as required, because tags can be used for both the target and source.


11 - Your organization has 5 TB of private data on premises. You need to migrate the data to Cloud Storage. You want to maximize the data transfer speed. How should you migrate the data?

  •  A. Use gsutil
  •  B. Use gcloud.
  •  C. Use GCS REST API.
  •  D. Use Storage Transfer Service.

Feedback

A is correct because gsutil gives you access to write data to Cloud Storage.

B is not correct because gcloud is the command-line interface for common platform tasks and does not include accessing Cloud Storage.

C is not correct because the data size would require a resumable upload, and that does not meet the requirement of maximizing the data transfer speed.

D is not correct because Storage Transfer Service is for importing online data, not on-premises. Your data source can be an Amazon Simple Storage Service (Amazon S3) bucket, an HTTP/HTTPS location, or a Cloud Storage bucket.


12 - You are designing a mobile chat application. You want to ensure that people cannot spoof chat messages by proving that a message was sent by a specific user. What should you do?

  •  A. Encrypt the message client-side using block-based encryption with a shared key.
  •  B. Tag messages client-side with the originating user identifier and the destination user.
  •  C. Use a trusted certificate authority to enable SSL connectivity between the client application and the server.
  •  D. Use public key infrastructure (PKI) to encrypt the message client-side using the originating user’s private key.

Feedback

A is not correct because although this would encrypt the message, it does not validate either the client or the server.

B is not correct because a malicious actor could spoof the user identifier and destination user information.

C is not correct because SSL only requires the server to have a signed certificate and does not require validating the client.

D is correct because PKI requires that both the server and the client have signed certificates, validating both the client and the server.


13 - You are designing a large distributed application with 30 microservices. Each of your distributed microservices needs to connect to a database backend. You want to store the credentials securely. Where should you store the credentials?

  •  A. In the source code
  •  B. In an environment variable
  •  C. In a key management system
  •  D. In a config file that has restricted access through ACLs

Feedback

A is not correct because storing credentials in source code and source control is discoverable, in plain text, by anyone with access to the source code. This also introduces the requirement to update code and do a deployment each time the credentials are rotated.

B is not correct because consistently populating environment variables would require the credentials to be available, in plain text, when the session is started.

C is correct because key management systems generate, use, rotate, encrypt, and destroy cryptographic keys and manage permissions to those keys.

D is not correct because instead of managing access to the config file and updating manually as keys are rotated, it would be better to leverage a key management system. Additionally, there is increased risk if the config file contains the credentials in plain text.


14 - Mountkirk Games wants to set up a real-time analytics platform for their new game. The new platform must meet their technical requirements. Which combination of Google technologies will meet all of their requirements?

  •  A. Kubernetes Engine, Cloud Pub/Sub, and Cloud SQL
  •  B. Cloud Dataflow, Cloud Storage, Cloud Pub/Sub, and BigQuery
  •  C. Cloud SQL, Cloud Storage, Cloud Pub/Sub, and Cloud Dataflow
  •  D. Cloud Pub/Sub, Compute Engine, Cloud Storage, and Cloud Dataproc

Feedback

A is not correct because Cloud SQL is the only storage listed, is limited to 10 TB of storage, and is better suited for transactional workloads. Mountkirk Games needs queries to access at least 10 TB of historical data for analytic purposes.

B is correct because:

  • Cloud Dataflow dynamically scales up or down, can process data in real time, and is ideal for processing data that arrives late using Beam windows and triggers.
  • Cloud Storage can be the landing space for files that are regularly uploaded by users’ mobile devices.
  • Cloud Pub/Sub can ingest the streaming data from the mobile users.
  • BigQuery can query more than 10 TB of historical data.

C is not correct because Cloud SQL is the only storage listed, is limited to 10TB of storage, and is better suited for transactional workloads. Mountkirk Games needs queries to access at least 10 TB of historical data for analytic purposes.

D is not correct because Mountkirk Games needs the ability to query historical data. While this might be possible using workarounds, such as BigQuery federated queries for Cloud Storage or Hive queries for Cloud Dataproc, these approaches are more complex. BigQuery is a simpler and more flexible product that fulfills those requirements.


15 - Mountkirk Games has deployed their new backend on Google Cloud Platform (GCP). You want to create a thorough testing process for new versions of the backend before they are released to the public. You want the testing environment to scale in an economical way. How should you design the process?

  •  A. Create a scalable environment in GCP for simulating production load.
  •  B. Use the existing infrastructure to test the GCP-based backend at scale.
  •  C. Build stress tests into each component of your application and use resources from the already deployed production backend to simulate load.
  •  D. Create a set of static environments in GCP to test different levels of load—for example, high, medium, and low.

Feedback

A is correct because simulating production load in GCP can scale in an economical way.

B is not correct because one of the pain points about the existing infrastructure was precisely that the environment did not scale well.

C is not correct because it is a best practice to have a clear separation between test and production environments. Generating test load should not be done from a production environment.

D is not correct because Mountkirk Games wants the testing environment to scale as needed. Defining several static environments for specific levels of load goes against this requirement.


  •  A. Cloud Storage, Cloud Dataflow, Compute Engine
  •  B. Cloud Storage, App Engine, Cloud Load Balancing
  •  C. Container Registry, Google Kubernetes Engine, Cloud Load Balancing
  •  D. Cloud Functions, Cloud Pub/Sub, Cloud Deployment Manager

Feedback

A is not correct because Mountkirk Games wants to set up a continuous delivery pipeline, not a data processing pipeline. Cloud Dataflow is a fully managed service for creating data processing pipelines.

B is not correct because a Cloud Load Balancer distributes traffic to Compute Engine instances. App Engine and Cloud Load Balancer are parts of different solutions.

C is correct because:

  • Google Kubernetes Engine is ideal for deploying small services that can be updated and rolled back quickly. It is a best practice to manage services using immutable containers.
  • Cloud Load Balancing supports globally distributed services across multiple regions. It provides a single global IP address that can be used in DNS records. Using URL Maps, the requests can be routed to only the services that Mountkirk wants to expose.
  • Container Registry is a single place for a team to manage Docker images for the services.

D is not correct because you cannot reserve a single frontend IP for cloud functions. When deployed, an HTTP-triggered cloud function creates an endpoint with an automatically assigned IP.


16 - Your customer is moving their corporate applications to Google Cloud Platform. The security team wants detailed visibility of all resources in the organization. You use Resource Manager to set yourself up as the org admin. What Cloud Identity and Access Management (Cloud IAM) roles should you give to the security team?

  •  A. Org viewer, Project owner
  •  B. Org viewer, Project viewer
  •  C. Org admin, Project browser
  •  D. Project owner, Network admin

Feedback

A is not correct because Project owner is too broad. The security team does not need to be able to make changes to projects.

B is correct because:

  • Org viewer grants the security team permissions to view the organization's display name.
  • Project viewer grants the security team permissions to see the resources within projects.

C is not correct because Org admin is too broad. The security team does not need to be able to make changes to the organization.

D is not correct because Project owner is too broad. The security team does not need to be able to make changes to projects.


17 - To reduce costs, the Director of Engineering has required all developers to move their development infrastructure resources from on-premises virtual machines (VMs) to Google Cloud Platform. These resources go through multiple start/stop events during the day and require state to persist. You have been asked to design the process of running a development environment in Google Cloud while providing cost visibility to the finance department. Which two steps should you take?

  •  A. Use persistent disks to store the state. Start and stop the VM as needed.
  •  B. Use the --auto-delete flag on all persistent disks before stopping the VM.
  •  C. Apply VM CPU utilization label and include it in the BigQuery billing export.
  •  D. Use BigQuery billing export and labels to relate cost to groups.
  •  E. Store all state in local SSD, snapshot the persistent disks, and terminate the VM.
  •  F. Store all state in Cloud Storage, snapshot the persistent disks, and terminate the VM.

Feedback

A is correct because persistent disks will not be deleted when an instance is stopped.

B is not correct because the --auto-delete flag has no effect unless the instance is deleted. Stopping an instance does not delete the instance or the attached persistent disks.

C is not correct because labels are used to organize instances, not to monitor metrics.

D is correct because exporting daily usage and cost estimates automatically throughout the day to a BigQuery dataset is a good way of providing visibility to the finance department. Labels can then be used to group the costs based on team or cost center.

E is not correct because the state stored in local SSDs will be lost when the instance is stopped.

F is not correct because there is no need for persistent disks or snapshots if the state is to be stored in Cloud Storage.


18 - Your company has decided to make a major revision of their API in order to create better experiences for their developers. They need to keep the old version of the API available and deployable, while allowing new customers and testers to try out the new API. They want to keep the same SSL and DNS records in place to serve both APIs. What should they do?

  •  A. Configure a new load balancer for the new version of the API.
  •  B. Reconfigure old clients to use a new endpoint for the new API.
  •  C. Have the old API forward traffic to the new API based on the path.
  •  D. Use separate backend services for each API path behind the load balancer.

Feedback

A is not correct because configuring a new load balancer would require a new or different SSL and DNS records which conflicts with the requirements to keep the same SSL and DNS records.

B is not correct because it goes against the requirements. The company wants to keep the old API available while new customers and testers try the new API.

C is not correct because it is not a requirement to decommission the implementation behind the old API. Moreover, it introduces unnecessary risk in case bugs or incompatibilities are discovered in the new API.

D is correct because an HTTP(S) load balancer can direct traffic reaching a single IP to different backends based on the incoming URL.


19 - The database administration team has asked you to help them improve the performance of their new database server running on Compute Engine. The database is used for importing and normalizing the company’s performance statistics. It is built with MySQL running on Debian Linux. They have an n1-standard-8 virtual machine with 80 GB of SSD zonal persistent disk. What should they change to get better performance from this system in a cost-effective manner?

  •  A. Increase the virtual machine’s memory to 64 GB.
  •  B. Create a new virtual machine running PostgreSQL.
  •  C. Dynamically resize the SSD persistent disk to 500 GB.
  •  D. Migrate their performance metrics warehouse to BigQuery.

Feedback

A is not correct because increasing the memory size will not improve persistent disk throughput.

B is not correct because the DB administration team is requesting help with their MySQL instance. Migration to a different product should not be the solution when other optimization techniques can still be applied first.

C is correct because persistent disk performance is based on the total persistent disk capacity attached to an instance and the number of vCPUs that the instance has. Incrementing the persistent disk capacity will increment its throughput and IOPS, which in turn improve the performance of MySQL.

D is not correct because the DB administration team is requesting help with their MySQL instance. Migration to a different product should not be the solution when other optimization techniques can still be applied first.