Authors: Pragati Srivastava & Pranit Jain
We have used a python script to transfer data from the SQL server to Snowflake.
Introduction to MSSQL
What is SQL Server?
It is a software developed by Microsoft, which is implemented from the specification of RDBMS, also an ORDBMS. Even though it’s platform dependent, both GUI and command based software, it supports SQL language, an IBM product, non-procedural, common database, and case-insensitive language.
Usage of SQL Server
- To create databases.
- To maintain databases.
- To analyze the data through SQL Server Analysis Services (SSAS).
- To generate reports through SQL Server Reporting Services (SSRS).
- To carry out ETL operations through SQL Server Integration Services (SSIS).
SQL Server Components
- SQL Server works in client-server architecture; hence it supports two types of components − (a) Workstation and (b) Server.
- Workstation components are installed in every device/SQL Server operator’s machine.
These are just interfaces to interact with Server components. Example: SSMS, SSCM, Profiler, BIDS, SQLEM, etc.
- Server components are installed in a centralized server.
Example: SQL Server, SQL Server Agent, SSIS, SSAS, SSRS, SQL browser, SQL Server full-text search, etc.
Advantages of Instances
- To install different versions in one machine.
- To reduce cost.
- To maintain production, development, and test environments separately.
- To reduce temporary database problems.
- To separate security privileges.
- To maintain a standby server.
Introduction to Snowflake
Snowflake Data Warehouse
Snowflake Data Warehouse is a fully managed, cloud data warehouse available to customers in the form of Software-as-a-Service (SaaS) or Database-as-a-Service (DaaS). The phrase ‘fully.
Managed’ means users shouldn’t be concerned about back-end work like server installation, maintenance, etc. Its unique architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures, which allows complete relational database support on both structured as well as semi-structured data (CSV, JSON, ORC, Avro, Parquet, XML), and now it is supporting to unstructured data such as Images, Videos, etc.
Some specific benefits include: –
- Modern security– To keep your data safe, take advantage of features like “always-on” encryption. Perfect for industries that deal with sensitive information.
- Advanced analytics– By allowing concurrent, safe, regulated access to data, you can democratize analytics and migrate to real-time data streams.
- High scalability– An almost infinite number of workloads can be spun up and down. You may now save as much as you want and tackle resource-intensive undertakings.
- Easy manageability– It’s simple to get started with a data warehouse like Snowflake, and you don’t need a large crew to manage your low-level infrastructure.
- Strong accessibility– You’ll be able to access your data storage systems from anywhere, 24/7.
Prerequisites:
We need Python, MySQL, snowflake-connector, pyodbc for connecting PPython with SQL server, pandas library.
Introduction to Python
What is Python?
Python is a high-level, general-purpose, and very popular programming language. Python programming language (latest Python 3) is being used in web development, Machine Learning applications, along all cutting-edge technology in the Software Industry.
Why Python?
- Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
- Python has syntax that allows developers to write programs with fewer lines than some other programming languages.
- Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick.
- Python can be treated in a procedural way, an object-oriented way, or a functional way.
Introduction to PYODBC
What is Pyodbc for?
Pyodbc is an open-source Python module that makes accessing ODBC databases simple. It implements the DB API 2.0 specification. Using pyodbc, you can easily connect Python.
Applications to data sources with an ODBC driver. The ODBC driver manager and driver will connect, typically over a network, to the database server.
Introduction to Pandas
What is Pandas?
A Python Pandas data frame is more than an array data structure. Pandas is a powerful tool that lets you:
- Convert JSON, CSV, array, dictionaries, and other data to row and column format
- Work with them using names instead of indexes (you can still opt for indexes)
In short, Pandas is sort of like a spreadsheet, but once you work with using code, not Microsoft Excel. The biggest benefits: –
- Pandas makes extremely complicated data transformations easy and natural.
- It includes a wealth of math, analytics, and other functions.
Key Features of Pandas
- Fast and efficient DataFrame object with default and customized indexing.
- Tools for loading data into in-memory data objects from different file formats.
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of date sets.
- Label-based slicing, indexing, and subsetting of large data sets.
- Columns from a data structure can be deleted or inserted.
- Group by data for aggregation and transformations.
- High performance merging and joining of data.
- Time Series functionality.
Introduction to Snowflake Connector
The Snowflake Connector for Python provides an interface for developing Python applications to connect to Snowflake and perform all standard operations. It provides a programming alternative to developing applications in Java or C/C++ using the Snowflake JDBC or ODBC drivers.
The connector is a native, pure Python package that has no dependencies on JDBC or ODBC. It can be installed using pip on Linux, macOS, and Windows platforms where Python 3.6, 3.7, 3.8, or 3.9 is installed.
SnowSQL, the command line client provided by Snowflake, is an example of an application developed using the connector.
Install SQL Server 2017 Developer Edition
To install SQL Server, you need to download it from the Microsoft.com website via the following link: –
https://www.microsoft.com/en-us/sql-server/sql-server-downloads
Once the download completes, you double-click the file SQLServer2017-SSEI-Dev.exe to launch the installer.
- The installer asks you to select the installation type; choosing the Custom installation type allows you to step through the SQL Server installation wizard and select the features that you want to install.
- Specify the folder for storing the installation files that the installer will download, then click the Install button.
3. The installer starts downloading the install package for a while.
- Once the download completes, open the folder that stores the install package and double-click the SETUP.exe file.
5. The following window displays; select the installation option on the left.
6. Click the first link to launch a wizard to install SQL Server 2017.
7. Specify the edition you want to install, select Developer edition, and click the Next button.
8. Select the “I accept the license terms.” and click the Next button.
- Check the “Use Microsoft Update to check for updates (recommended)” to get the security and other important updates for the SQL Server and click the Next button.
- The installation checks for the prerequisites before installation. If no error is found, click the Next button.
- Select the features that you want to install. For now, you just need the Database Engine Services, just check the checkbox and click the Next button to continue.
12. Specify the name and install ID for the instance of the SQL Server and click the Next button.
- Specify the service account and collation configuration. Just use the default configuration and click the Next button.
- Specify the database engine security mode. First, choose Mixed Mode. Next, enter the password for the SQL Server system administrator (sa) account. Then, re-enter the same password to confirm it. After that, click the Add Current User button. Finally, click the Next button.
15. Verify the SQL Server 2017 features to be installed.
16. The installer starts the installation process.
17. Once it completes, the following window displays. Click the OK button.
18. Click the Close button to complete the installation.
Install Microsoft SQL Server Management Studio
To interact with SQL Servers, you need to install SQL Server Management Studio (SSMS). The SQL Server Management Studio is software for querying, designing, and managing SQL Server on your local computer or the cloud. It provides you with tools to configure, monitor, and administer SQL Server instances.
First, download the SSMS from the Microsoft website via the following link:
Second, double-click the installation file SSMS-Setup-ENU.exe to start installing. The installation process of SMSS is straightforward; you need to follow the screen sequence.
1. Click the Install button
2. Wait for a few minutes while the installer sets up the software.
3. Once setup is completed, click the Close button
Now, you should have a SQL Server 2017 and SQL Server Management Studio installed on your computer.
Integration of SQL Server with Snowflake:
Connecting to SQL Server:
First, we are connecting PPython to SQL Server using pyodbc. Pyodbc is an open-source Python module that makes accessing ODBC databases simple. It implements the DB API 2.0
Specification. Using pyodbc, you can easily connect Python applications to data sources with an ODBC driver.
Fetching data from SQL Server:
We are fetching data from the source table in a pandas data frame. And further, we will use this pandas data frame to write to the target table in Snowflake.
Connecting to Snowflake:
The Snowflake Connector for Python provides an interface for developing Python applications to connect to Snowflake and perform all standard operations. The connector is a native, pure Python package that has no dependencies on JDBC or ODBC. It can be installed using pip on Linux, macOS, and Windows platforms where Python 3.6, 3.7, 3.8, or 3.9 is installed.
The connector supports developing applications using the Python Database API v2 specification (PEP-249), including using the following standard API objects:
- Connection objects for connecting to Snowflake.
- Cursor objects for executing DDL/DML statements and queries.
For the connection string, please provide your username, password, and account of the warehouse you want to use and the database.
Writing data to Snowflake table:
We have used the write_panndas function to write to the Snowflake table.
Writes a Pandas DataFrame to a table in a Snowflake database. To write the data to the table, the function saves the data to Parquet files, uses the PUT command to upload these files to a temporary stage, and uses the COPY INTO command to copy the data from the files to the table. You can use some function parameters to control how the PUT and COPY INTO <table> statements are executed.
Checking whether data has been inserted successfully or not:
For checking whether data has been successfully inserted or not. We fetched the data from the Snowflake table using the fetch_pandas_all function. It will print the snowflake table fetched into a pandas data frame.
Conclusion
This python script can be used with any database as the source needs to change the connection details. You can pass any other connection details with its driver package in Python at the place of SQL server connection details. This script does not take much time to transfer data from the source database to the target database. We have tested 1 lakh records, and it completes it within a minute.