AWS re:Invent 2021 Recap: Data Science Announcements

This year’s AWS re:Invent was special in several ways: it was the first onsite re:Invent since the pandemic forced the previous year’s conference to go virtual, and the first re:Invent at which AWS’s new CEO, Adam Selipsky, delivered a keynote.

AWS made more than 120 announcements during re:Invent 2021, many of them introducing a new service or new feature for an existing service.

In this blog post, we are only going to highlight the most important Data Science announcements that you definitely shouldn’t miss!

  1. Amazon SageMaker Studio Lab (currently in preview), a free, no-configuration ML service

    • What happened?

    Amazon SageMaker Studio Lab is a free machine learning service that provides a JupyterLab IDE with a choice of CPU or GPU backends. It is AWS’s challenger to Google Colab.

    • What difference does it make?

    Since it’s free and doesn’t require an AWS account, it is a good place to experiment without worrying about cost or identity management. SageMaker Studio Lab comes with its own limitations, such as time-limited GPU access (4 hours) and poor integration with other AWS services. As soon as you move past the experimentation phase, you therefore need other AWS services, such as SageMaker or SageMaker Studio, to deploy your model to production.

  2. Amazon SageMaker Inference Recommender

    • What happened?

    Amazon SageMaker Inference Recommender helps you choose the best available compute instance and configuration to deploy machine learning models for optimal inference performance and cost.

    • What difference does it make?

    Choosing an instance that matches your needs has always been challenging: an undersized instance leads to low performance and high latency, while an oversized one is unnecessarily expensive. This feature helps you find the right fit.
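
    The trade-off described above can be sketched in a few lines of Python. This is not the Inference Recommender API, only an illustration of the selection logic: among the candidates that meet your latency and throughput targets, pick the cheapest. The `BenchmarkResult` type, the `cheapest_adequate` helper, the instance names and all numbers are hypothetical placeholders, not real benchmarks or prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkResult:
    """One load-test result for a candidate instance (hypothetical structure)."""
    instance_type: str
    hourly_cost_usd: float
    p95_latency_ms: float
    max_requests_per_sec: float


def cheapest_adequate(
    results: List[BenchmarkResult],
    max_latency_ms: float,
    min_requests_per_sec: float,
) -> Optional[BenchmarkResult]:
    """Return the cheapest candidate that meets both targets, or None."""
    adequate = [
        r for r in results
        if r.p95_latency_ms <= max_latency_ms
        and r.max_requests_per_sec >= min_requests_per_sec
    ]
    return min(adequate, key=lambda r: r.hourly_cost_usd, default=None)


# Placeholder numbers for illustration only; not real benchmarks or prices.
results = [
    BenchmarkResult("small-instance", 0.10, 180.0, 20.0),
    BenchmarkResult("medium-instance", 0.25, 90.0, 60.0),
    BenchmarkResult("large-instance", 0.90, 40.0, 200.0),
]

choice = cheapest_adequate(results, max_latency_ms=100.0, min_requests_per_sec=50.0)
print(choice.instance_type if choice else "no instance meets the targets")
```

    Here the small candidate is too slow and the large one is more than the workload needs, so the medium one is selected. Inference Recommender automates the benchmarking step that produces numbers like these.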

  3. Amazon SageMaker Serverless Inference

    • What happened?

    Amazon SageMaker Serverless Inference is a new inference option that lets you deploy machine learning models for inference without configuring or managing the underlying infrastructure. It uses AWS Lambda under the hood to deploy your model.

    • What difference does it make?

    With SageMaker Serverless Inference, you pay only for the time your inference code runs and the amount of data processed, not for idle time. You also don’t need to worry about server configuration. Because it builds on Lambda, it has some limitations, such as a maximum memory of 6144 MB and the problem of ‘cold starts’. To learn more about serverless architecture, see our post on how to choose your serverless architecture.
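
    As a rough sketch of what "no server configuration" means in practice, the snippet below builds the request for a serverless endpoint configuration: instead of an instance type and count, you specify only a memory size and a concurrency limit. The helper name `build_serverless_endpoint_config`, the resource names and the default values are assumptions made for illustration, and the list of allowed memory sizes (1 GB steps up to the 6144 MB ceiling mentioned above) is also an assumption to verify against the current documentation. The snippet only builds and validates the request; it does not call AWS.

```python
# Assumed allowed memory sizes: 1 GB steps up to the 6144 MB ceiling.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_endpoint_config(config_name, model_name,
                                     memory_mb=2048, max_concurrency=5):
    """Build a request dict for a serverless endpoint configuration.

    Hypothetical helper: it validates the inputs and returns the
    arguments, but does not talk to AWS itself.
    """
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(
            f"memory_mb must be one of {ALLOWED_MEMORY_MB}, got {memory_mb}"
        )
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No instance type or instance count: only memory and concurrency.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# "demo-serverless-config" and "demo-model" are placeholder names.
request = build_serverless_endpoint_config(
    "demo-serverless-config", "demo-model", memory_mb=4096
)
print(request["ProductionVariants"][0]["ServerlessConfig"])

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**request)
```

    Validating the memory size locally catches a request above the ceiling before it ever reaches the service.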

  4. Amazon SageMaker Training Compiler

    • What happened?

    Amazon SageMaker Training Compiler is a new SageMaker feature that can accelerate the training of deep learning models by up to 50% through more efficient use of GPU instances. Popular deep learning frameworks such as PyTorch and TensorFlow are supported, and it can be used with minimal changes to your training script. SageMaker Training Compiler accelerates training by converting deep learning models from their high-level language representation into hardware-optimized instructions.

    • What difference does it make?

    It’s a good choice for models with considerable training times; as a rule of thumb, 30 minutes or more.

  5. Amazon SageMaker Studio Monitoring Spark jobs running on EMR

    • What happened?

    Amazon recently announced that SageMaker Studio notebooks can visually browse and connect to Amazon EMR clusters. With the built-in EMR integration, you can now do interactive data preparation and machine learning at petabyte scale within a single universal SageMaker Studio notebook.

    • What difference does it make?

    As the amount of data needed to train deep learning models grows, your data scientists need an easy way to use Spark on EMR. This new feature provides it.

  6. Amazon SageMaker Pipelines now integrates with SageMaker Model Monitor and SageMaker Clarify

    • What happened?

    Amazon SageMaker Pipelines is a machine learning service from AWS that helps you build end-to-end machine learning workflows. It now supports integration with Amazon SageMaker Model Monitor and Amazon SageMaker Clarify.

    • What difference does it make?

    You can easily incorporate model quality and bias detection in your ML workflow with these integrations. The increased automation can help reduce your operational burden in building and managing ML models.

  7. Amazon SageMaker Model Registry now supports endpoint visibility, custom metadata and model metrics

    • What happened?

    SageMaker Model Registry enables data scientists to catalog their ML models. It now also provides endpoint visibility from SageMaker Studio, and lets you store custom metadata and read/write a broad range of metrics for your models.

    • What difference does it make?

    This new feature helps data scientists to keep track of their models’ training more conveniently.

  8. Introducing Amazon Lex Automated Chatbot Designer (Preview)

    • What happened?

    AWS announced an automated chatbot designer for Amazon Lex. It uses machine learning to analyze conversation transcripts and build a chatbot or virtual assistant that can respond to users.

    • What difference does it make?

    AWS claims that this feature can reduce the time and resources needed to develop a chatbot from weeks to just a few hours.

In our next post, we’ll cover the most important announcements for DevOps Engineers.