Eighty-nine percent of organizations agree that the rate of change has accelerated in the past two years, and it’s not likely to slow down anytime soon. In an age of relentless change, companies must have a firm grasp on the data and insights that can help them evolve their customer, workforce, and operational strategies accordingly. At the core, this requires a readily scalable approach to data architecture, best enabled by public cloud architectures. Yet moving away from on-premises data storage and towards the public cloud is a tall order. It can be challenging to pick a public cloud platform that meets the needs of your business and makes the best use of the existing skill sets within your workforce. Using an expertise-guided approach to inform your selection is the best way forward. In this blog series, we distill the data pipeline architectures of the top cloud platforms available today. From there, we equip you with some practical criteria to inform the right selection for your organization.
For enterprise extract, transform, and load (ETL) needs, Amazon Web Services (AWS) offers AWS Glue: an ETL service that simplifies organizing, cleansing, validating, and formatting data for storage in a data warehouse or data lake. AWS Glue automatically discovers structured or unstructured data stored in data lakes on Amazon Simple Storage Service (S3), in data warehouses (Amazon Redshift), and in other databases (e.g., MySQL, Oracle, Microsoft SQL Server, and PostgreSQL). These databases can be hosted on Amazon Relational Database Service (RDS) or run on Amazon Elastic Compute Cloud (EC2) instances in a Virtual Private Cloud (VPC). AWS Glue is serverless, so there is no infrastructure to provision or manage. It takes care of everything needed to run your ETL jobs in a fully managed, scale-out fashion. This approach is cost-effective because you pay only for the compute resources your jobs consume while they run.
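As a rough illustration of the discovery step described above, the sketch below prepares a request for a Glue crawler that scans an S3 prefix and registers what it finds in a catalog database. The crawler name, IAM role ARN, database name, and bucket path are hypothetical placeholders, and the helper functions are our own. Only `create_crawler` and `start_crawler` are calls from the boto3 Glue client, and they need AWS credentials to run.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for the Glue create_crawler call.

    All values passed in are hypothetical placeholders for illustration.
    """
    return {
        "Name": name,
        "Role": role_arn,            # IAM role that Glue assumes to read the data
        "DatabaseName": database,    # catalog database that receives the discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def run_crawler(request):
    """Create and start the crawler. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without AWS access

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
# run_crawler(request)  # uncomment in an environment with AWS credentials
```

Keeping the request builder separate from the network call makes the configuration easy to review and test before anything is created in your account.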
Scaling Compute Power and Memory
AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady performance at the lowest possible cost. With Amazon EC2 Auto Scaling, you can be confident that you have enough Amazon EC2 instances to handle the load for your application. You organize instances into collections called Auto Scaling groups and attach scaling policies to them. Those policies trigger EC2 Auto Scaling to launch or terminate instances as demand on your application increases or decreases.
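To make the scaling behavior concrete, here is a deliberately simplified model of how a scaling policy might translate a load metric into a group size. It assumes a proportional rule (scale capacity by the ratio of the observed metric to its target) bounded by the group's minimum and maximum size. This is an illustrative assumption, not the actual EC2 Auto Scaling algorithm, which also involves CloudWatch alarms, cooldowns, and instance warm-up. The function name and parameters are hypothetical.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Estimate a new instance count from a load metric (simplified model).

    current:      instances currently running in the group
    metric_value: observed average metric, e.g. CPU utilization in percent
    target_value: the metric value the policy tries to maintain
    """
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    # Scale proportionally to how far the metric is from its target.
    proposed = math.ceil(current * metric_value / target_value)
    # Never go outside the bounds configured for the group.
    return max(min_size, min(max_size, proposed))


# Load is above target, so the group grows.
scale_out = desired_capacity(current=4, metric_value=75.0, target_value=50.0,
                             min_size=2, max_size=10)
# Load is below target, so the group shrinks.
scale_in = desired_capacity(current=4, metric_value=20.0, target_value=50.0,
                            min_size=2, max_size=10)
print(scale_out, scale_in)
```

The minimum and maximum bounds matter as much as the rule itself: they cap spend during spikes and keep a baseline of capacity when demand drops.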
AWS offers multiple services that allow you to store, govern, and analyze data, ultimately helping you operate and innovate in a more agile manner.
- Amazon Simple Storage Service (Amazon S3) is an object storage service that offers high scalability, data availability, security, and performance.
- Amazon Elastic File System (Amazon EFS) is an easy-to-use, fully managed file system that you can use with AWS Cloud services and on-premises resources.
- Amazon Elastic Block Store (EBS) is a block storage service known for its ease of use. It's intended to be used with Amazon Elastic Compute Cloud (EC2).
The AWS Global Cloud Infrastructure delivers over 175 services from data centers around the world. AWS has the networking capabilities required to run any workload in the cloud with security, availability, performance, global coverage, and manageability. These capabilities include:
- 76 availability zones (multiple, physically separated, and isolated).
- 100 Gbps of network bandwidth available from C5n instances.
- 24 AWS geographic regions with low latency, high throughput, and high redundancy.
- 216 points of presence, providing global coverage for your users.
Below is an example of a modern data warehouse architecture in AWS, showing how each of the above services comes into play:
AWS resolves many of the on-premises ETL shortcomings that companies typically experience by offering simple, scalable, and cost-effective cloud computing solutions. Below are a few key diagnostic criteria you can apply to validate whether AWS is the best choice for your business:
- Your organization currently uses AWS for other applications and tools.
- Your team has expertise in AWS's technology stack, making the transition to AWS public cloud smoother.
- A powerful data dictionary is one of your top criteria for selecting a serverless ETL tool. In that case, AWS Glue is an excellent choice. Glue works just as well with unstructured data as it does with structured and semi-structured data.
In the next blog of our series, we’ll explore the capabilities of Google Cloud Platform and arm you with helpful criteria to evaluate it within your organization’s unique context.
Click here to read parts one, three, and four in our series.