https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html
Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
- Master node: A node that manages the cluster by running software components that coordinate the distribution of data and tasks among the other nodes for processing. The master node tracks the status of tasks and monitors the health of the cluster. Every cluster has a master node, and it's possible to create a single-node cluster with only the master node.
- Core node: A node with software components that run tasks and store data in the Hadoop Distributed File System (HDFS) on your cluster. Multi-node clusters have at least one core node.
- Task node: A node with software components that only runs tasks and does not store data in HDFS. Task nodes are optional.
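The three node types above correspond to the `MASTER`, `CORE`, and `TASK` instance roles that the EMR API accepts in an instance group list. The following is a minimal sketch of that mapping, not a definitive configuration: the helper name `build_instance_groups` and the `m5.xlarge` instance type are assumptions made for illustration.

```python
from typing import Dict, List


def build_instance_groups(core_count: int = 0, task_count: int = 0,
                          instance_type: str = "m5.xlarge") -> List[Dict]:
    """Return an InstanceGroups list with one entry per node type in use.

    instance_type is an assumed default, not a value taken from the notes.
    """
    if core_count < 0 or task_count < 0:
        raise ValueError("node counts must be non-negative")
    # Multi-node clusters have at least one core node, so task nodes
    # without any core node are rejected here.
    if task_count > 0 and core_count == 0:
        raise ValueError("a multi-node cluster needs at least one core node")

    def group(name: str, role: str, count: int) -> Dict:
        return {"Name": name, "InstanceRole": role,
                "InstanceType": instance_type, "InstanceCount": count}

    # Every cluster has a master node.
    groups = [group("Master", "MASTER", 1)]
    if core_count:
        # Core nodes run tasks and store data in HDFS.
        groups.append(group("Core", "CORE", core_count))
    if task_count:
        # Task nodes only run tasks and are optional.
        groups.append(group("Task", "TASK", task_count))
    return groups
```

Calling `build_instance_groups()` with no arguments describes a single-node cluster that contains only the master node.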
- Create an Amazon S3 Bucket
- Create an Amazon EC2 Key Pair
- Open the Amazon EMR console and choose Create cluster
- On the Create Cluster - Quick Options page, accept the default values.
- Choose Create
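The console steps above can also be approximated programmatically with boto3's `run_job_flow` call. The sketch below is hedged: the release label, instance types, instance count, and default role names are assumptions that stand in for the console's Quick Options defaults, and the helper names are hypothetical. Building the request is kept separate from sending it so the request can be inspected without AWS credentials.

```python
from typing import Dict


def build_cluster_request(name: str, bucket: str, key_pair: str) -> Dict:
    """Build keyword arguments for the EMR run_job_flow call.

    All concrete values below are assumed defaults, not values from the notes.
    """
    return {
        "Name": name,
        "LogUri": f"s3://{bucket}/logs/",      # the S3 bucket created earlier
        "ReleaseLabel": "emr-6.15.0",          # assumed release
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # 1 master + 2 core (assumed)
            "Ec2KeyName": key_pair,             # the EC2 key pair created earlier
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # default EC2 instance profile
        "ServiceRole": "EMR_DefaultRole",       # default EMR service role
    }


def create_cluster(request: Dict) -> str:
    """Send the request to EMR and return the new cluster ID."""
    import boto3  # imported lazily so the builder works without boto3 installed

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

A call such as `create_cluster(build_cluster_request("My cluster", "amzn-s3-demo-bucket", "my-key-pair"))` would start a cluster, provided the bucket, key pair, and default roles already exist in the account.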
- Open the Amazon EMR console
- Choose Clusters
- Choose the Name of the cluster.
- Under Security and access, choose the Security groups for Master link.
- Choose ElasticMapReduce-master from the list.
- Choose Inbound, Edit to add a new inbound rule.
- Scroll down through the listing of default rules and choose Add Rule at the bottom of the list.
- For Type, select SSH. For Source, select My IP.
- Choose Save
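The inbound rule added in the steps above amounts to allowing TCP port 22 from a single IP address. Here is a minimal sketch of the same rule expressed for boto3's `authorize_security_group_ingress`; the helper names are hypothetical, and the sketch assumes an IPv4 address and a security group ID that you look up yourself.

```python
import ipaddress
from typing import Dict


def build_ssh_rule(my_ip: str) -> Dict:
    """Return an IpPermissions entry allowing SSH from one IPv4 address."""
    # IPv4Address raises a ValueError subclass if my_ip is not valid IPv4.
    address = ipaddress.IPv4Address(my_ip)
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,   # SSH
        "ToPort": 22,
        # /32 restricts the rule to exactly this address, like "My IP".
        "IpRanges": [{"CidrIp": f"{address}/32"}],
    }


def allow_ssh(group_id: str, my_ip: str) -> None:
    """Add the SSH rule to the given security group."""
    import boto3  # imported lazily so build_ssh_rule works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[build_ssh_rule(my_ip)]
    )
```

Restricting the source to a single `/32` address keeps SSH closed to every other address.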
- Click Step.
- Choose Add. The step appears in the console with a status of Pending.
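A step can also be submitted through boto3's `add_job_flow_steps` call, after which it starts in the `PENDING` state, matching the Pending status shown in the console. The notes do not say which step is added, so the step name and arguments in this sketch are purely hypothetical, as are the helper names.

```python
from typing import Dict, List

# Step states after which no further progress is expected.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"}


def build_step(name: str, args: List[str]) -> Dict:
    """Build a step definition that runs a command via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running on failure
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(args)},
    }


def is_finished(state: str) -> bool:
    """Return True once a step has reached a terminal state."""
    return state.upper() in TERMINAL_STATES


def add_step(cluster_id: str, step: Dict) -> str:
    """Submit the step to a running cluster and return its step ID."""
    import boto3  # imported lazily so the helpers above work without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

For example, `build_step("Example step", ["spark-submit", "s3://amzn-s3-demo-bucket/script.py"])` describes a Spark job whose script location is a placeholder.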
- Open the Amazon S3 console.
- Choose the Bucket name and then the folder that you set up earlier.
- Choose the file, and then choose Download to save it locally.
- Open the file.
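The download steps above can be sketched with boto3's S3 `download_file` call. This is an illustrative sketch only: the helper names are hypothetical, and the object URI you pass in depends on the bucket and folder you set up earlier.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Drop the leading slash so the key matches the object's name in S3.
    return parsed.netloc, parsed.path.lstrip("/")


def download_output(uri: str) -> str:
    """Download one object to the current directory and return its filename."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(uri)
    filename = posixpath.basename(key)
    boto3.client("s3").download_file(bucket, key, filename)
    return filename
```

The returned filename can then be opened locally, as in the last step of the list.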