MuleSoft + BigQuery Series 1

Author: Eswara Pendli

In a recent press release, Mulesoft announced the release of a new BigQuery connector in Anypoint Exchange and released.

“BigQuery connector in Anypoint Exchange will help line-of-business owners combine datasets — from disparate sources like Salesforce and Google Analytics — to make informed decisions. This means line-of-business users, like marketers, can derive better real-time data insights around segmentation and personalization by leveraging Google Cloud’s strengths and BigQuery’s advanced querying capabilities.”

In this series-1, we will see what is BigQuery, Features of BigQuery and Configuration of BigQuery in Mule using REST API for now.

Why?

  • A serverless, highly scalable, and cost-effective cloud data warehouse designed to help us make informed decisions quickly, so we can transform our business with ease 
  • Accelerate time-to-value with a fully managed and serverless cloud data warehouse that is easy to set up and manage and doesn’t require a database administrator.
  • ●     Quickly analyze gigabytes to petabytes of data using ANSI SQL (American National Standards Institute – SQL) at blazing-fast speeds, and with zero operational overhead
  • Have peace of mind with BigQuery’s robust security, governance, and reliability controls that offer high availability and a 99.9% uptime SLA. Data is encrypted by default and includes support for customer-managed encryption keys.

All Features:

  • Serverless: 

We can focus on data and analysis rather than worrying about upgrading, securing, or managing the infrastructure.

  • Real-Time Analytics: 

BigQuery’s high-speed streaming insertion API provides a powerful foundation for real-time analytics, making our latest business data immediately available for analysis.

  • High availability:

BigQuery automatically provides highly durable, replicated storage in multiple locations with no extra charge and no additional setup.

  • Standard SQL:

BigQuery supports a standard SQL dialect that is ANSI:2011 compliant, which reduces the need for code rewrites. BigQuery also provides ODBC and JDBC drivers at no cost to ensure our current applications can interact with its powerful engine.

  • Storage and compute separation:

With this, we have the option to choose the storage and processing solutions that make sense for our business and control access and costs for each.

  • Automatic backup and easy restore:

BigQuery automatically replicates data and keeps a seven-day history of changes, allowing us to easily restore and compare data from different times.       

  • Data transfer service:

The BigQuery DTS automatically transfers data from external data sources, like Google Marketing Platform, Google Ads, YouTube, and partner SaaS applications to BigQuery on a scheduled and fully managed basis. Users can also easily transfer data from Teradata and Amazon S3 to BigQuery.

  • Big data ecosystem integration:

With Dataproc and Dataflow, BigQuery provides integration with the Apache big data ecosystem, allowing existing Hadoop/Spark and Beam workloads to read or write data directly from BigQuery using the Storage API.

  • Flexible pricing models:

Pricing lets us pay only for the storage and compute that we use.

  • Data governance and security:

BigQuery provides strong security and governance controls with fine-grained Identity and Access Management, and our data is always encrypted at rest and in transit.

  • Geo-expansion:

BigQuery gives us the option of geographic data control (in US, Asia, and European locations), without the headaches of setting up and managing clusters and other computing resources in-region.

  • Foundation for AI:

       Integrations with AI Platform Prediction and TensorFlow enable us to train powerful models on structured data in minutes with just SQL.

  • Rich monitoring and logging with Stackdriver:

       BigQuery provides rich monitoring, logging, and alerting through Stackdriver Audit Logs and it can serve as a repository for logs from any application or service using Stackdriver Logging.

  • Public datasets:

       Google Cloud Public Datasets offer a powerful data repository of more than 100 high-demand public datasets from different industries. Google provides free storage for all public datasets, and customers can query up to 1 TB of data per month at no cost.

  • Commercial datasets:

       Commercial data providers host their data offerings directly in BigQuery and Cloud Storage, which means that once you license a dataset from one of their partners, you can access this data immediately and process it in place without having to store or move any data.

Pre-Requisites:

  • We need to have google cloud account/sandbox account.

Getting Started / Setup Dataset / Table:

  • Here we are using BigQuery web UI in the Cloud Console.
  • BigQuery web UI used to complete tasks like running queries, loading data, and exporting data.
  • Dataset: 

Datasets are top-level containers that are used to organize and control access to our tables and views. A table or view must belong to a dataset and it contained within a specific project, so we need to create at least one dataset before loading data into BigQuery.

Demo

Setup Database at GCP (Google Cloud Platform):

  1. In the Cloud Console, on the project selector page, select or create a Cloud project. (https://console.cloud.google.com/projectselector2/home/dashboard?_ga=2.120024392.1366931291.1581598739-1744192159.1581411477)
  2. BigQuery provides a sandboxif we do not want to provide a credit card or enable billing for our project.
  3. The BigQuery web UI provides an interface to query tables, including public datasetsoffered by BigQuery.
  1. Once Project setup is done. Go to https://console.cloud.google.com/home/dashboard?_ga=2.120024392.1366931291.1581598739-1744192159.1581411477&project=navigation-api-demo&folder=&organizationId=

                         Or

BigQuery Navigation:

https://console.cloud.google.com/bigquery?_ga=2.120024392.1366931291.1581598739-1744192159.1581411477&project=navigation-api-demo

  1. Choose the project name and click on create dataset:
  1. In the section, provide relevant dataset name and choose other options as required:
  1. Next, create table under <test_dataset>:
  1. Here, we can select multiple options for schema:
  1. We will be creating a table using ‘test_csv’ data with the Upload option. Select the ‘Browse’ button and upload relevant test file:
  1. Now the table has been created with ‘testData_DOB.csv’ attached to it.
  2. Here, we can run multiple queries at editor section:
  1. Like this, we can set up our database with high SLA @serverless.
  2. Now we want to utilize this database for our integration purpose.
  3. We can use BigQuery REST API to connect it from external sources.
  4. We can do multiple operations using this REST API.

Configuration / Connection Establishment for MuleSoft:

Here are the few operations supported by BigQuery REST API:

** Base URI of BigQuery API is: “https://www.googleapis.com:443/”

  1. Here we will see simple setup / configuration from Anypoint Studio:
  • Following is the BigQuery REST service configuration:
  • Here is how we can connect to BigQuery API using OAuth 2.0 Configuration:
  • Following is the URL’s need to configure:

Callback URL: http://localhost:8081/callback

Local Authorization URL: http://localhost:8081/web

Authorization URL: https://accounts.google.com/o/oauth2/auth

Scopes: https://www.googleapis.com/auth/{scopeName}

Token URL: https://oauth2.googleapis.com/token

Client_ID: {Your Client ID}

Client_Secret: {Your Client Secret}

  • Here is the location, where we can see all credential details:
  • Once the setup is done. Let’s deploy our application locally.
  • After successful deployment in local, hit following URL on Google Chrome (as Postman facing challenge to obtain OAuth 2.0 Token for Google) to get Authorization code / Access Tokenand Choose relevant Gmail account:

http://localhost:8081/web/

  • Copy the code value and Use it for the OAuth 2.0 Authorization.
  1. Trigger the request URL by make use of code value as Access Token and Click on Preview Request as follows:

### BigQuery is fully-managed, we don’t need to deploy any resources, such as disks and virtual machines. ###

Stay tuned for Series-2 !!

References:

We use cookies on this site to enhance your user experience. For a complete overview of how we use cookies, please see our privacy policy.