Migrating to the cloud is like moving to a new house. Where do you move to? What do you bring, what do you get rid of? Start fresh, or drag stuff you’re still using with you? Good questions. One question I get when moving a Microsoft Data platform to the cloud is: what tools do we use there? The classic Microsoft answer is: it depends. If starting completely fresh, you have a wide range of options and can target implementing a modern cloud architecture immediately. If you’re moving legacy environments, you’ll be constrained by the systems and business operations you have to support and move but will have the advantage of eventually moving to a modern cloud architecture down the road. If you’ve made a significant investment in, or possess the most knowledge of, a tool set and architecture—such your ETL architecture with SQL Server Integration Services (SSIS), there’s a good chance you will have to port the existing infrastructure to the cloud.
We’ll start focusing more on SSIS, but—as with looking to move to a new house—a lot of questions need to be asked first. These are common to most toolsets. Some key questions are as follows:
- What is the overall Total Cost of Ownership (TCO)?
- Am I starting new, or do I need to re-platform all (or a significant portion of) the ETL infrastructure?
- What is the timeframe I have to accomplish this?
- How familiar with cloud ETL tools is my staff?
- What is the current level of investment/number of ETL processes in play?
- Will my current toolset even work in the cloud environment?
- Am I doing an uplift or reengineer of the current environment?
- Can I discontinue or sunset processes or systems during the move?
- What is the overall strategic direction of the company? Will this dovetail-in with that direction?
- How much risk can I reasonably incur by reengineering my infrastructure?
If you’re rearchitecting your infrastructure, you’ll be evaluating everything. As with moving, you’ll throw stuff out, give stuff away, and box up what you need or feel you can’t leave behind. You’ll look at systems, toolsets, and new capabilities across the board to determine what should go, what can be left behind, what has to be rewritten, and where re-platforming should occur. This is a good opportunity to review and remove code/processes not utilized or do some lightweight rewriting to reduce or streamline certain parts of your code base—or re-engineer, if justified.
Most of the time, legacy environments will need to be moved. Hardware/data center costs are often the key driver in the move. For example, my CIO said leases are coming due for our servers and I need to move before the leases are up. An uplift scenario, or a partial uplift, is the most common scenario I see for supporting existing business operations. Position ourselves in the cloud as painlessly as possible and place us in a position to phase-in cloud capabilities without disrupting (or minimizing the disruption of) the business. Sound familiar? If it’s a complete rewrite, then, options abound but there are still time pressures. At some point you will have to cut the new environment over to production with minimal disruption to the business. And sometimes, you will have to just start small and trial-out scenarios that might work best for you. You will determine these based on your needs, and on what you can support.
So, if we’ve decided to go with the scenario of uplifting our existing ETL infrastructure—or the ETL that I need turns out to be so small that it doesn’t justify coming up on a new toolset—I can still look to leveraging SSIS as a toolset. New tools and capabilities will be available and coming into the mix, but the impact on how you phase them in is minimized by leveraging existing toolset knowledge, initially. I have used SSIS for 16+ years and love using it but find myself chomping at the bit for the new cloud capabilities, such as Polybase and Kafka, and repurposing/limiting SSIS usage moving forward when the code is ported. Once we’re in the cloud, you can immediately start bringing in new capabilities in a composite architecture for your ETL infrastructure and start sunsetting portions of your SSIS usage.
SSIS in the cloud. What options do I have?
- VM in the cloud (IaaS)
- Implement on a managed SQL Server cloud instance (PaaS)
- Implement under another ETL tool (PaaS)
- A VM in Azure, AWS, or Google Cloud. Basically, you still have your VM—similar to being on-premises, just in the cloud. This is the most expensive scenario, from a TCO point of view, but also one that’s the most familiar. You’ll have to fully manage the VM as you would in your data center. A SQL Server license is required to run SSIS, but most implementations I’ve seen have a lower-level edition (such as Standard Edition) with a lower resource configuration, as the data platform/database instance is best located elsewhere (i.e., managed database cloud instance) and can incur the high resource-cost better. SSISDB repository is also best served on the same VM as SSIS in a local SQL Server. A cloud VM, with SSIS on it, is also often shared with other non-cloud products that need a VM to re-platform to the cloud. SSAS MOLAP, and SSRS are good examples of this. This lower-level configuration readily allows a DEV/Test/PROD implementation with separate VMs to support. This option supports Package and Project deployment models. Third-party vendors have SSIS components that allow you to access cloud resources and extend SSIS.
- Azure SQL Managed Instance, and AWS RDS have the ability to support SSIS in a SSISDB repository on the cloud instance, and allow you to implement SSIS without a cloud VM. As a PaaS option, you only worry about your code, and not the platform, and have an excellent SLA available. Caveats exist in that you have to make sure the platform is still able to connect to file shares and data sources if existing packages are deployed from on-premises to the cloud instance. Since you don’t have local storage, you have to keep that in mind when it comes to package and cloud design. Cost-wise it’s more effective as you are leveraging the licensing and resource configuration of the managed instance. Project deployment model in SSIS for AWS is required, and recommended for Azure SQL Managed Instance, but it can also support a package-deployment model. AWS also provides the ability, via system-stored procedures, to copy files on S3 to a “local” drive that your SSIS packages can leverage.
- Lastly, Azure Data Factory (ADF) in Azure supports the ability to run your SSIS package within ADF with a SSIS runtime environment. As a PaaS option, you only worry about your code, and not the platform. This option will allow you to also extend your SSIS with the capabilities of ADF orchestrating SSIS and other ADF operations, such as native operations, and running data bricks, for instance. However, there is no robust scheduler, such as SQL Agent, so ADF has to orchestrate the SSIS execution, and may cause a sight variance in start time and SLA. You will also have to learn a new product (ADF), in addition to porting your SSIS environment and performing some level of coding to implement this scenario. Also, the Project deployment model in SSIS is required. ADF with Service Type of SSIS is required and costs more than an Azure VM with SQL Standard Edition. Keep an eye on pricing options and advertised SLA rates.
I would not be reluctant to migrate your SSIS to the cloud. However, with a change in architecture comes many questions. What is the long-term direction of my toolsets and how will the tools map into my strategic direction? What is the effort and cost? TCO is the final justification. If you’re re-platforming an existing ETL infrastructure to the cloud, I would recommend doing so in manageable efforts. Get the cloud infrastructure, and your presence in the cloud in place. Then move your SSIS there, make changes where appropriate (or required), and re-cast your business infrastructure when re-platformed. This way you can manage changes without significantly impacting your business. Disrupting your mission-critical business operations is the last thing we can justify. However, once you’re in the cloud, the tools and capabilities available to you are staggering. So, how are you planning to move your ETL/SSIS to the cloud? For help answering this question, please feel free to reach out to us at any time. We’d love to help you get started.
Business Intelligence Architect