How to use Amazon EMR to process data with the broad ecosystem of Hadoop tools, such as Hive and Hue, and how to work with Amazon DynamoDB, Amazon Redshift, Amazon QuickSight, Amazon Athena, and Amazon Kinesis.
AWS IoT https://aws.amazon.com/iot/ can collect and handle large quantities of data coming from a variety of sources, and makes it easy to use AWS services like AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Machine Learning, and Amazon DynamoDB.
AWS DataSync https://aws.amazon.com/datasync/ is a data transfer service that makes it easy for you to automate moving data between on-premises storage and Amazon S3 or Amazon Elastic File System (Amazon EFS).
Amazon FSx for Lustre https://aws.amazon.com/fsx/lustre/ provides a high-performance file system optimized for fast processing of workloads such as machine learning, high performance computing (HPC), video processing, financial modeling, and electronic design automation (EDA).
Apache Spark Streaming http://spark.apache.org/streaming/ enables high-throughput, fault-tolerant, and scalable processing of live data streams. It divides the incoming data streams into batches before sending them to the Spark engine for processing.
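
The batching idea described above can be illustrated with a minimal sketch in plain Python. This is not the Spark API: the names `micro_batches` and `process_batch`, the `(timestamp, payload)` record shape, and the word-count job are all assumptions made for illustration. The sketch groups time-ordered records into fixed-length intervals and hands each group to a stand-in for the processing engine.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (arrival time in seconds, text payload).
Record = Tuple[float, str]


def micro_batches(
    records: Iterable[Record], interval: float
) -> Iterator[Tuple[int, List[str]]]:
    """Group time-ordered records into consecutive intervals.

    Yields (interval_index, payloads). Intervals that received no
    records are skipped in this sketch.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    current_index = None
    batch: List[str] = []
    for timestamp, payload in records:
        index = int(timestamp // interval)
        # A record from a later interval closes the current batch.
        if current_index is not None and index != current_index:
            yield current_index, batch
            batch = []
        current_index = index
        batch.append(payload)
    if batch:
        yield current_index, batch


def process_batch(payloads: List[str]) -> Counter:
    """Stand-in for the processing engine: count words in one batch."""
    counts: Counter = Counter()
    for line in payloads:
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    stream = [(0.1, "a b"), (0.9, "a"), (1.2, "b"), (3.5, "c c")]
    for index, payloads in micro_batches(stream, interval=1.0):
        print(index, dict(process_batch(payloads)))
```

The interval length here plays the role of a batch interval: a shorter interval lowers latency per result, while a longer one gives the engine more records to process at once.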
Amazon Managed Streaming for Apache Kafka (Amazon MSK) https://aws.amazon.com/msk/ is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data.
Data Lake Concepts and Building a Serverless Data Lake
Quick Start Data Lake with SnapLogic https://aws.amazon.com/quickstart/architecture/data-lake-with-snaplogic/ builds a data lake environment on AWS in about 15 minutes by deploying SnapLogic components and AWS services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift.