AWS re:Invent 2021 Recap: Data Engineering Announcements

This year, AWS re:Invent was special in many ways. It was the first onsite re:Invent after the pandemic forced the previous year’s conference to go virtual. It was also the first re:Invent with the new AWS CEO, Adam Selipsky, delivering the keynote. AWS made more than 120 announcements during re:Invent 2021, many of them introducing a new service or a new feature for an existing service.

In this blog post, we highlight the most important announcements from the perspective of a Data Engineer. We focus on two things: what new feature or tool was added, and what difference it makes to what we have been doing. If you are a Data Engineer, you won’t want to miss these.

  1. AWS introduces serverless preview for EMR and Redshift

    • What happened?

    AWS now provides a serverless option (in preview) for EMR and Redshift, so you can run data analytics without having to think about provisioning or maintaining clusters.

    • What difference does it make?

    For Redshift, it means that you don’t have to work through several manual configuration choices and can start querying (pre-loaded) sample data in a few clicks. Redshift Serverless also lets you query data directly in open formats such as Parquet in S3 data lakes, as well as data in other databases like RDS and Aurora. For EMR, it means a little more. EMR Serverless automatically provisions the resources required by your application and adjusts the allocation as the application’s needs change. And since you only pay for the resources you use, EMR Serverless is cost-effective.

  2. Amazon Kinesis introduces data streams On-Demand

    • What happened?

    Kinesis Data Streams On-Demand can now serve gigabytes of read and write throughput per minute without capacity planning.

    • What difference does it make?

    You don’t have to provision or manage servers (as this is serverless). You also pay based on throughput consumed rather than provisioned resources. When you choose on-demand capacity mode, the service automatically scales up or down depending on your workload.
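As a rough illustration of what choosing on-demand capacity mode looks like in practice, here is a minimal Python sketch that builds the parameters for the Kinesis `CreateStream` call. It assumes a boto3 version recent enough to accept `StreamModeDetails`; the helper names and the stream name `clickstream-events` are hypothetical and only serve the example.

```python
from typing import Any, Dict


def on_demand_stream_request(stream_name: str) -> Dict[str, Any]:
    """Build CreateStream parameters for on-demand capacity mode.

    No ShardCount is passed: in on-demand mode the service manages capacity.
    """
    if not stream_name:
        raise ValueError("stream_name must not be empty")
    return {
        "StreamName": stream_name,
        "StreamModeDetails": {"StreamMode": "ON_DEMAND"},
    }


def create_on_demand_stream(stream_name: str) -> None:
    """Send the request with boto3 (needs AWS credentials and network access)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("kinesis").create_stream(**on_demand_stream_request(stream_name))


if __name__ == "__main__":
    # "clickstream-events" is a hypothetical stream name.
    print(on_demand_stream_request("clickstream-events"))
```

The notable part is what is missing: there is no shard count to estimate up front, which is exactly the capacity planning step that on-demand mode removes.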

  3. AWS introduces Amazon MSK Serverless in public preview

    • What happened?

    AWS launched a new type of MSK cluster that makes it easy for developers to run Apache Kafka without thinking about provisioning resources or servers. It is available in public preview and comes with a pay-as-you-go pricing model.

    • What difference does it make?

    With this launch, it is easier for developers to get started with Apache Kafka. It supports native AWS integrations, so switching your existing applications to MSK would not be an issue. Moreover, with the throughput-based pricing model, there are no upfront costs. At the moment, it is only available in public preview in the US East (Ohio) region.

  4. AWS Lake Formation adds three new capabilities via Governed Tables

    • What happened?

    Firstly, Lake Formation introduced multi-table transaction support via governed tables. Secondly, these governed tables ensure your data storage is optimized for querying. And thirdly, Lake Formation introduced row- and cell-level permissions to enhance data security. Currently, only Amazon Athena, Amazon Redshift Spectrum and AWS Glue ETL scripts support querying governed tables.

    • What difference does it make?

    Multi-table transaction support means users don’t have to create custom error-handling methods for updates, and Lake Formation ensures a consistent view of the data. Storage optimizations are achieved in governed tables through data compaction and garbage collection. Also, you don’t need to worry about multiple S3 objects being populated by your upstream application.

  5. AWS Chatbot now supports management of AWS resources in Slack

    • What happened?

    Previously, you could only monitor AWS resources and retrieve diagnostics about them through Slack. Now you can also run AWS CLI commands from Slack.

    • What difference does it make?

    Customers can now manage AWS resources directly from their Slack channels. They can securely run AWS CLI commands to scale EC2 instances, run AWS Systems Manager runbooks, and change AWS Lambda concurrency limits. Customers can also configure channel permissions to match their security requirements.

  6. Amazon Athena now supports ACID transactions and introduces fine-grained security via Lake Formation

    • What happened?

    Athena introduced ACID transactions. This enables multiple concurrent users to make reliable, row-level modifications to their Amazon S3 data from Athena’s console, API, and ODBC and JDBC drivers. On top of this, Athena also gained fine-grained permissions for accessing data in these ACID-compliant tables.

    • What difference does it make?

    Using Lake Formation Data Filtering, administrators can now grant column-, row-, and cell-level permissions on their Amazon S3 data lake tables that are enforced when Athena users query this data. With ACID-compliant transactions, you can now make regulatory updates to your data in Athena without needing a custom record-locking solution. And with the newly added time travel capability, you can recover data that was recently deleted using just a SELECT statement.
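To make the idea of cell-level permissions more concrete, the following Python sketch builds a Lake Formation data cells filter that combines a row filter with a column list. It is only an illustration under stated assumptions: the helper names, the account ID `111122223333`, the database `sales_db`, the table `orders`, and the filter expression are all hypothetical, and the payload follows the `TableData` shape accepted by the boto3 `create_data_cells_filter` call.

```python
from typing import Any, Dict, Iterable


def data_cells_filter(
    catalog_id: str,
    database: str,
    table: str,
    name: str,
    row_filter: str,
    columns: Iterable[str],
) -> Dict[str, Any]:
    """Build the TableData payload for a Lake Formation data cells filter.

    Rows are restricted by `row_filter` and columns by `columns`; the
    combination of both is what yields cell-level access.
    """
    column_names = list(columns)
    if not column_names:
        raise ValueError("at least one column must be listed")
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_filter},
        "ColumnNames": column_names,
    }


def create_filter(table_data: Dict[str, Any]) -> None:
    """Register the filter with boto3 (needs AWS credentials and permissions)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)


if __name__ == "__main__":
    # All identifiers below are hypothetical examples.
    print(
        data_cells_filter(
            "111122223333", "sales_db", "orders", "eu_orders_basic",
            "region = 'EU'", ["order_id", "order_date", "amount"],
        )
    )
```

Once such a filter exists, an administrator would grant SELECT on it to a principal, and Athena would return only the matching rows and columns for that principal.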

  7. Amazon S3 adds new S3 Event Notifications

    • What happened?

    S3 Event Notifications can now trigger event-driven applications when objects are transitioned or expired in S3 buckets. You can send these notifications to SNS, EventBridge, SQS and Lambda.

    • What difference does it make?

    Using this feature, you can automatically track your data in DynamoDB tables or AWS Glue catalogs. These notifications are now available for S3 Lifecycle, S3 Intelligent-Tiering, object tags, and object access control lists.
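As a hedged sketch of how these new event types could be wired up, the Python snippet below builds a bucket notification configuration that routes lifecycle, Intelligent-Tiering, tagging and ACL events to an SQS queue. The helper names, the configuration ID and the queue ARN are hypothetical; the event type strings are the ones S3 uses for these notifications, and the payload matches what the boto3 `put_bucket_notification_configuration` call expects.

```python
from typing import Any, Dict, List

# Event types covering the sources named above.
NEW_EVENT_TYPES: List[str] = [
    "s3:LifecycleTransition",    # object moved to another storage class
    "s3:LifecycleExpiration:*",  # object expired by a lifecycle rule
    "s3:IntelligentTiering",     # object moved to an archive access tier
    "s3:ObjectTagging:*",        # tags added to or removed from an object
    "s3:ObjectAcl:Put",          # object ACL changed
]


def notification_config(queue_arn: str) -> Dict[str, Any]:
    """Build a NotificationConfiguration that sends the events to SQS."""
    return {
        "QueueConfigurations": [
            {
                "Id": "object-tracking-events",  # hypothetical identifier
                "QueueArn": queue_arn,
                "Events": list(NEW_EVENT_TYPES),
            }
        ]
    }


def apply_config(bucket: str, queue_arn: str) -> None:
    """Apply the configuration with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=notification_config(queue_arn),
    )


if __name__ == "__main__":
    # The queue ARN is a hypothetical example.
    print(notification_config("arn:aws:sqs:us-east-1:111122223333:object-events"))
```

A consumer reading from that queue could then update a DynamoDB table or a catalog entry whenever an object changes storage class or expires. Note that applying a notification configuration replaces the bucket's existing one, so existing rules would need to be merged in first.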

  8. AWS launches SQL Notebook support for Amazon Redshift

    • What happened?

    Redshift announced SQL notebook support to enable data analysts and data scientists to author queries more easily, organizing multiple SQL queries and annotations in a single document.

    • What difference does it make?

    You can combine your SQL queries in a single document in the notebook. You can also share this notebook with team members. Markdown cells in the notebook help you document your queries properly.

  9. AWS introduces Amplify Studio

    • What happened?

    Amplify Studio is a visual development environment that offers UI developers new features (in preview mode). Amplify Studio automatically translates designs made in Figma to human-readable React UI component code.

    • What difference does it make?

    Amplify Studio lets developers plug and play with fully customizable React UI components. It also enables developers to connect these components to a backend configuration via Amplify Studio. And all this comes with minimal coding.

  10. AWS launches Karpenter v0.5

    • What happened?

    Karpenter is a new Kubernetes cluster autoscaling project that helps you provision EC2 instances and Kubernetes pods in under a minute.

    • What difference does it make?

    Previously, customers needed to create EC2 Auto Scaling groups to support increasing workloads and improve cost efficiency. With Karpenter, this responsibility is removed from the customer’s shoulders. Karpenter scales automatically, adds or removes instances as required, and removes the overhead costs of over-provisioning and scaling down.

  11. Amazon Timestream introduces three new features

    • What happened?

    Amazon Timestream has added three new capabilities, namely, scheduled queries, multi-measure records, and magnetic storage writes to make time-series data processing faster.

    • What difference does it make?

    With scheduled queries, customers can schedule large queries that compute aggregates and roll-ups. Timestream takes care of processing the large source tables and creates a destination table (for easier reporting). With magnetic storage writes, customers no longer have to maintain a memory store with a long data retention period just to process late-arriving data. With these new features, Timestream has made it easier to analyse IoT (sensor) data and DevOps metrics.
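Multi-measure records are easiest to picture with an example. The Python sketch below builds a single Timestream record that carries several measures for one timestamp, in the shape accepted by the boto3 `write_records` call. The helper names, the dimension `device_id`, the measure name `sensor_reading`, and the sample values are hypothetical, and all measures are assumed to be doubles for simplicity.

```python
from typing import Any, Dict, List


def multi_measure_record(
    device_id: str, measures: Dict[str, float], timestamp_ms: int
) -> Dict[str, Any]:
    """Build one multi-measure record: many measures, one timestamp."""
    if not measures:
        raise ValueError("at least one measure is required")
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": "sensor_reading",  # hypothetical name for the group
        "MeasureValueType": "MULTI",
        "MeasureValues": [
            # Timestream expects values as strings plus an explicit type.
            {"Name": name, "Value": str(value), "Type": "DOUBLE"}
            for name, value in measures.items()
        ],
        "Time": str(timestamp_ms),
        "TimeUnit": "MILLISECONDS",
    }


def write(database: str, table: str, records: List[Dict[str, Any]]) -> None:
    """Write the records with boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("timestream-write").write_records(
        DatabaseName=database, TableName=table, Records=records
    )


if __name__ == "__main__":
    # Device name, readings and timestamp are hypothetical sample values.
    print(
        multi_measure_record(
            "sensor-1", {"temperature": 21.5, "humidity": 40.0}, 1638316800000
        )
    )
```

Compared with writing one record per measure, a sensor reading with several values becomes a single row, which tends to keep related values together for querying.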

  12. AWS introduces SDKs for Rust, Kotlin and Swift

    • What happened?

    AWS introduced SDKs for Rust, Kotlin and Swift so that programmers can interact with AWS from these languages while following best practices.

    • What difference does it make?

    If your team uses one of these languages, you can expect an easier interaction with AWS resources.

This was the last part of our AWS re:Invent 2021 recap. We hope you found it useful. Please feel free to reach out to us if you have questions or feedback. Thank you for the support!