What will you learn in the Data Engineering Master Class using AWS Analytics Services course?
Data Engineering is all about building Data Pipelines that move data from different sources into a Data Lake or Data Warehouse, and from the Data Lake or Data Warehouse out to downstream systems. In this course, I'll show you how to build Data Engineering Pipelines using the AWS Analytics stack, which includes services such as Glue, Elastic Map Reduce (EMR), Lambda Functions, Athena, Kinesis, Redshift, and many more.
Here are the steps at a high level that you'll be following during the course.
Setup Development Environment
Beginning with AWS
Storage - All you need to know about AWS S3 (Simple Storage Service)
User-Level Security - Managing Users, Roles, and Policies using IAM
Infrastructure - AWS EC2 (Elastic Compute Cloud)
Data Ingestion using AWS Lambda Functions (see the Lambda sketch below)
Development Lifecycle of PySpark Applications
Overview of AWS Glue Components
Installing the Spark History Server to support AWS Glue Jobs
Deep Dive into the AWS Glue Catalog (see the Glue Catalog sketch below)
Exploring AWS Glue Job APIs
AWS Glue Job Bookmarks
Getting Started with AWS EMR
Implementing Spark Applications using AWS EMR (see the PySpark sketch below)
Streaming Pipelines using AWS Kinesis
Consuming data ingested via AWS Kinesis from AWS S3 using boto3 (see the Kinesis sketch below)
Loading GitHub Data into Amazon DynamoDB (see the DynamoDB sketch below)
A Brief Overview of Amazon Athena
Amazon Athena using the AWS CLI
Amazon Athena using Python boto3 (see the Athena sketch below)
Getting Started with Amazon Redshift
Copying Data from AWS S3 into AWS Redshift Tables (see the Redshift sketch below)
Developing Applications using the AWS Redshift Cluster
AWS Redshift Tables with Distkeys and Sortkeys
AWS Redshift Federated Queries and Spectrum
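To make these steps a bit more concrete, a few minimal, illustrative sketches follow; the bucket, stream, table, and cluster names in them are placeholders I've assumed for illustration, not resources provided by the course. First, the Lambda-based ingestion step: a handler that takes the incoming event payload and lands it in S3 as a timestamped JSON object, assuming a hypothetical landing bucket.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Hypothetical landing bucket and prefix for ingested data.
LANDING_BUCKET = "my-data-lake-landing"
KEY_PREFIX = "github/events"


def lambda_handler(event, context):
    """Write the incoming event payload to S3 as a timestamped JSON object."""
    ts = datetime.now(timezone.utc).strftime("%Y/%m/%d/%H%M%S%f")
    key = f"{KEY_PREFIX}/{ts}.json"
    s3.put_object(
        Bucket=LANDING_BUCKET,
        Key=key,
        Body=json.dumps(event).encode("utf-8"),
    )
    return {"statusCode": 200, "body": f"s3://{LANDING_BUCKET}/{key}"}
```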
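For the Glue Catalog deep dive, a sketch of browsing the catalog with boto3: list the databases, then list the tables of one hypothetical database along with the S3 locations they point to.

```python
import boto3

glue = boto3.client("glue")

# List all databases registered in the Glue Data Catalog.
for page in glue.get_paginator("get_databases").paginate():
    for db in page["DatabaseList"]:
        print("database:", db["Name"])

# List the tables of a hypothetical database and where their data lives.
for page in glue.get_paginator("get_tables").paginate(DatabaseName="retail_db"):
    for table in page["TableList"]:
        location = table.get("StorageDescriptor", {}).get("Location", "n/a")
        print("table:", table["Name"], "->", location)
```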
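For the Spark-on-EMR step, a minimal PySpark application of the kind you would submit with spark-submit on an EMR cluster. The S3 paths and the created_at/type columns are assumptions about the ingested data, not fixed by the course.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal PySpark application that could be submitted to an EMR cluster with
# spark-submit; the S3 paths and column names are hypothetical.
spark = SparkSession.builder.appName("ghactivity-daily-counts").getOrCreate()

events = spark.read.json("s3://my-data-lake-landing/github/events/")

daily_counts = (
    events
    .withColumn("event_date", F.to_date("created_at"))
    .groupBy("event_date", "type")
    .count()
)

(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://my-data-lake-curated/github/daily_counts/")
)

spark.stop()
```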
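For the streaming pipeline, a sketch of both ends: a producer puts web server log records onto a Kinesis stream (which a Firehose delivery stream can land in S3), and a consumer reads the delivered objects back from S3 with boto3. The stream name, bucket, and prefix are hypothetical.

```python
import json

import boto3

kinesis = boto3.client("kinesis")
s3 = boto3.client("s3")

# Producer: put a log record onto a hypothetical Kinesis stream.
record = {"host": "web01", "status": 200, "path": "/index.html"}
kinesis.put_record(
    StreamName="web-server-logs",
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["host"],
)

# Consumer: read the objects that Firehose delivered to S3 for that stream.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-streaming-bucket", Prefix="logs/"):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket="my-streaming-bucket", Key=obj["Key"])["Body"].read()
        print(obj["Key"], len(body), "bytes")
```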
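For loading GitHub data into DynamoDB, a sketch using the boto3 resource API and a batch writer; the table name, key, and sample items are assumptions.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical table with "repo_id" as the partition key.
table = dynamodb.Table("github_repos")

repos = [
    {"repo_id": "1", "name": "spark", "language": "Scala"},
    {"repo_id": "2", "name": "airflow", "language": "Python"},
]

# batch_writer buffers PutItem requests and flushes them in batches.
with table.batch_writer() as batch:
    for repo in repos:
        batch.put_item(Item=repo)
```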
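For running Athena queries from Python, a sketch with boto3: start the query, poll until it reaches a terminal state, then fetch the result rows. The database, table, and S3 results location are hypothetical.

```python
import time

import boto3

athena = boto3.client("athena")

# Start a query; Athena writes results to the hypothetical S3 location.
query = athena.start_query_execution(
    QueryString="SELECT order_status, count(*) AS cnt FROM orders GROUP BY order_status",
    QueryExecutionContext={"Database": "retail_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query succeeds, fails, or is cancelled.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print the result rows (the first row holds the column headers).
if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```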
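For the Redshift steps, a sketch using the Redshift Data API from boto3: create a table with a distkey and sortkey, then COPY data into it from S3. The cluster identifier, database, user, IAM role ARN, bucket, and table definition are all assumptions.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical cluster, database, user, and IAM role with S3 read access.
CLUSTER = "my-redshift-cluster"
DATABASE = "dev"
DB_USER = "awsuser"
IAM_ROLE = "arn:aws:iam::123456789012:role/my-redshift-s3-role"

# Distribute rows by customer and keep them sorted by order date.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS orders (
    order_id BIGINT,
    order_date DATE,
    order_customer_id BIGINT,
    order_status VARCHAR(30)
)
DISTSTYLE KEY
DISTKEY (order_customer_id)
SORTKEY (order_date);
"""

# Bulk load the table from Parquet files staged in S3.
COPY_FROM_S3 = f"""
COPY orders
FROM 's3://my-data-lake-curated/orders/'
IAM_ROLE '{IAM_ROLE}'
FORMAT AS PARQUET;
"""

for sql in (CREATE_TABLE, COPY_FROM_S3):
    response = redshift_data.execute_statement(
        ClusterIdentifier=CLUSTER, Database=DATABASE, DbUser=DB_USER, Sql=sql
    )
    print("submitted statement:", response["Id"])
```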
Course Content:
- Data Engineering leveraging AWS Analytics services
- AWS Essentials like S3, IAM, EC2, and many more
- Understanding AWS S3 for cloud-based storage
- Learning the details of virtual machines in AWS, also known as EC2
- Managing AWS IAM users, groups, roles, and policies for RBAC (Role-Based Access Control)
- Managing Tables using AWS Glue Catalog
- Engineering Batch Data Pipelines using AWS Glue Jobs
- Automating Batch Data Pipelines using AWS Glue Workflows
- Running Queries using AWS Athena, a serverless query engine service
- Utilizing AWS Elastic Map Reduce (EMR) Clusters to build Data Pipelines
- Utilizing AWS Elastic Map Reduce (EMR) Clusters for reports and dashboards
- Data Ingestion via AWS Lambda Functions
- Scheduling using AWS EventBridge (see the EventBridge sketch after this list)
- Engineering Streaming Pipelines using AWS Kinesis
- Streaming Web Server logs using AWS Kinesis Firehose
- Overview of data processing using AWS Athena
- Running AWS Athena queries and commands using the AWS CLI
- Running AWS Athena queries using Python boto3
- Creating an AWS Redshift Cluster, creating tables, and performing CRUD operations
- Copying data from S3 into AWS Redshift Tables
- Understanding Distribution Styles and creating tables with Distkeys and Sortkeys
- Running queries against external RDBMS tables using AWS Redshift Federated Queries
- Running queries against Glue and Athena Catalog tables using AWS Redshift Spectrum
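For the scheduling item above, a minimal sketch of creating an EventBridge schedule rule with boto3 and pointing it at a Lambda function. The rule name and Lambda ARN are hypothetical, and in practice you would also grant EventBridge permission to invoke the function.

```python
import boto3

events = boto3.client("events")

# Hypothetical Lambda function that should run on a schedule.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ghactivity-ingest"

# Create (or update) a rule that fires every hour.
events.put_rule(
    Name="ghactivity-hourly",
    ScheduleExpression="rate(1 hour)",
    State="ENABLED",
)

# Point the rule at the Lambda function.
events.put_targets(
    Rule="ghactivity-hourly",
    Targets=[{"Id": "ghactivity-ingest", "Arn": LAMBDA_ARN}],
)
```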
Download Data Engineering Master Class using AWS Analytics Services from the links below NOW!