Author: Eswara Pendli
In a recent press release, Mulesoft announced the release of a new BigQuery connector in Anypoint Exchange and released.
“BigQuery connector in Anypoint Exchange will help line-of-business owners combine datasets — from disparate sources like Salesforce and Google Analytics — to make informed decisions. This means line-of-business users, like marketers, can derive better real-time data insights around segmentation and personalization by leveraging Google Cloud’s strengths and BigQuery’s advanced querying capabilities.”
In this series-1, we will see what is BigQuery, Features of BigQuery and Configuration of BigQuery in Mule using REST API for now.
Why?
- A serverless, highly scalable, and cost-effective cloud data warehouse designed to help us make informed decisions quickly, so we can transform our business with ease
- Accelerate time-to-value with a fully managed and serverless cloud data warehouse that is easy to set up and manage and doesn’t require a database administrator.
- ● Quickly analyze gigabytes to petabytes of data using ANSI SQL (American National Standards Institute – SQL) at blazing-fast speeds, and with zero operational overhead
- Have peace of mind with BigQuery’s robust security, governance, and reliability controls that offer high availability and a 99.9% uptime SLA. Data is encrypted by default and includes support for customer-managed encryption keys.
All Features:
- Serverless:
We can focus on data and analysis rather than worrying about upgrading, securing, or managing the infrastructure.
- Real-Time Analytics:
BigQuery’s high-speed streaming insertion API provides a powerful foundation for real-time analytics, making our latest business data immediately available for analysis.
- High availability:
BigQuery automatically provides highly durable, replicated storage in multiple locations with no extra charge and no additional setup.
- Standard SQL:
BigQuery supports a standard SQL dialect that is ANSI:2011 compliant, which reduces the need for code rewrites. BigQuery also provides ODBC and JDBC drivers at no cost to ensure our current applications can interact with its powerful engine.
- Storage and compute separation:
With this, we have the option to choose the storage and processing solutions that make sense for our business and control access and costs for each.
- Automatic backup and easy restore:
BigQuery automatically replicates data and keeps a seven-day history of changes, allowing us to easily restore and compare data from different times.
- Data transfer service:
The BigQuery DTS automatically transfers data from external data sources, like Google Marketing Platform, Google Ads, YouTube, and partner SaaS applications to BigQuery on a scheduled and fully managed basis. Users can also easily transfer data from Teradata and Amazon S3 to BigQuery.
- Big data ecosystem integration:
With Dataproc and Dataflow, BigQuery provides integration with the Apache big data ecosystem, allowing existing Hadoop/Spark and Beam workloads to read or write data directly from BigQuery using the Storage API.
- Flexible pricing models:
Pricing lets us pay only for the storage and compute that we use.
- Data governance and security:
BigQuery provides strong security and governance controls with fine-grained Identity and Access Management, and our data is always encrypted at rest and in transit.
- Geo-expansion:
BigQuery gives us the option of geographic data control (in US, Asia, and European locations), without the headaches of setting up and managing clusters and other computing resources in-region.
- Foundation for AI:
Integrations with AI Platform Prediction and TensorFlow enable us to train powerful models on structured data in minutes with just SQL.
- Rich monitoring and logging with Stackdriver:
BigQuery provides rich monitoring, logging, and alerting through Stackdriver Audit Logs and it can serve as a repository for logs from any application or service using Stackdriver Logging.
- Public datasets:
Google Cloud Public Datasets offer a powerful data repository of more than 100 high-demand public datasets from different industries. Google provides free storage for all public datasets, and customers can query up to 1 TB of data per month at no cost.
- Commercial datasets:
Commercial data providers host their data offerings directly in BigQuery and Cloud Storage, which means that once you license a dataset from one of their partners, you can access this data immediately and process it in place without having to store or move any data.
Pre-Requisites:
- We need to have google cloud account/sandbox account.
Getting Started / Setup Dataset / Table:
- Here we are using BigQuery web UI in the Cloud Console.
- BigQuery web UI used to complete tasks like running queries, loading data, and exporting data.
- Dataset:
Datasets are top-level containers that are used to organize and control access to our tables and views. A table or view must belong to a dataset and it contained within a specific project, so we need to create at least one dataset before loading data into BigQuery.
Demo
Setup Database at GCP (Google Cloud Platform):
- In the Cloud Console, on the project selector page, select or create a Cloud project. (https://console.cloud.google.com/projectselector2/home/dashboard?_ga=2.120024392.1366931291.1581598739-1744192159.1581411477)
- BigQuery provides a sandboxif we do not want to provide a credit card or enable billing for our project.
- The BigQuery web UI provides an interface to query tables, including public datasetsoffered by BigQuery.
- Once Project setup is done. Go to https://console.cloud.google.com/home/dashboard?_ga=2.120024392.1366931291.1581598739-1744192159.1581411477&project=navigation-api-demo&folder=&organizationId=
Or
BigQuery Navigation:
- Choose the project name and click on create dataset:

- In the section, provide relevant dataset name and choose other options as required:

- Next, create table under <test_dataset>:

- Here, we can select multiple options for schema:

- We will be creating a table using ‘test_csv’ data with the Upload option. Select the ‘Browse’ button and upload relevant test file:

- Now the table has been created with ‘testData_DOB.csv’ attached to it.
- Here, we can run multiple queries at editor section:

- Like this, we can set up our database with high SLA @serverless.
- Now we want to utilize this database for our integration purpose.
- We can use BigQuery REST API to connect it from external sources.
- We can do multiple operations using this REST API.
Configuration / Connection Establishment for MuleSoft:
Here are the few operations supported by BigQuery REST API:
** Base URI of BigQuery API is: “https://www.googleapis.com:443/”



- Here we will see simple setup / configuration from Anypoint Studio:


- Following is the BigQuery REST service configuration:

- Here is how we can connect to BigQuery API using OAuth 2.0 Configuration:


- Following is the URL’s need to configure:
Callback URL: http://localhost:8081/callback
Local Authorization URL: http://localhost:8081/web
Authorization URL: https://accounts.google.com/o/oauth2/auth
Scopes: https://www.googleapis.com/auth/{scopeName}
Token URL: https://oauth2.googleapis.com/token
Client_ID: {Your Client ID}
Client_Secret: {Your Client Secret}
- Once we are done, Now go to the following location by choosing relevant project nameand Click on “Credentials” and update Authorized redirect URIs as specified below: https://console.cloud.google.com/apis/credentials/consent?project=navigation-api-demo

- Here is the location, where we can see all credential details:

- Once the setup is done. Let’s deploy our application locally.
- After successful deployment in local, hit following URL on Google Chrome (as Postman facing challenge to obtain OAuth 2.0 Token for Google) to get Authorization code / Access Tokenand Choose relevant Gmail account:
http://localhost:8081/web/


- Copy the code value and Use it for the OAuth 2.0 Authorization.

- Trigger the request URL by make use of code value as Access Token and Click on Preview Request as follows:

### BigQuery is fully-managed, we don’t need to deploy any resources, such as disks and virtual machines. ###
Stay tuned for Series-2 !!
References:
- https://www.mulesoft.com/press-center/runtime-fabric-google-cloud-gcp
- https://cloud.google.com/bigquery/docs/reference/rest/
- https://cloud.google.com/bigquery/docs/quickstarts/quickstart-web-ui?hl=en_US
- https://cloud.google.com/bigquery?hl=en_US
- https://cloud.google.com/bigquery/docs/interacting-with-bigquery?hl=en_US