Amazon EMR

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html

What is Amazon EMR?

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Additionally, you can use Amazon EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

Cluster and Nodes

  • Master node: A node that manages the cluster by running software components to coordinate the distribution of data and tasks among other nodes for processing. The master node tracks the status of tasks and monitors the health of the cluster. Every cluster has a master node, and it's possible to create a single-node cluster with only the master node.
  • Core node: A node with software components that run tasks and store data in the Hadoop Distributed File System (HDFS) on your cluster. Multi-node clusters have at least one core node.
  • Task node: A node with software components that only runs tasks and does not store data in HDFS. Task nodes are optional.
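The three node roles can be sketched as the `InstanceGroups` list that boto3's `run_job_flow` accepts under `Instances`. This is a minimal sketch, not a definitive setup: the helper name `build_instance_groups`, the group names, and the `m5.xlarge` instance type are assumptions chosen for illustration.

```python
def build_instance_groups(core_count=0, task_count=0, instance_type="m5.xlarge"):
    """Return an InstanceGroups list shaped for boto3's emr.run_job_flow.

    instance_type is an illustrative assumption; pick one that suits your workload.
    """
    # Multi-node clusters need at least one core node, so task nodes
    # without core nodes are rejected here.
    if task_count and not core_count:
        raise ValueError("task nodes require at least one core node")

    def group(name, role, count):
        return {
            "Name": name,
            "InstanceRole": role,  # one of MASTER, CORE, TASK
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    # Every cluster has a master node; on its own it forms a single-node cluster.
    groups = [group("Master", "MASTER", 1)]
    if core_count:
        groups.append(group("Core", "CORE", core_count))  # runs tasks and stores HDFS data
    if task_count:
        groups.append(group("Task", "TASK", task_count))  # runs tasks only, optional
    return groups


if __name__ == "__main__":
    for g in build_instance_groups(core_count=2, task_count=3):
        print(g["InstanceRole"], g["InstanceCount"])
```

Calling `build_instance_groups()` with no arguments describes a single-node cluster containing only the master node.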

Analyzing Big Data with Amazon EMR

  • Create an Amazon S3 bucket.
  • Create an Amazon EC2 key pair.
  • Open the Amazon EMR console and choose Create cluster.
  • On the Create Cluster - Quick Options page, accept the default values.
  • Choose Create.
  • Open the Amazon EMR console
  • Choose Clusters
  • Choose the Name of the cluster.
  • Under Security and access, choose the Security groups for Master link.
  • Choose ElasticMapReduce-master from the list.
  • Choose Inbound, Edit to add a new inbound rule.
  • Scroll down through the list of default rules and choose Add Rule at the bottom of the list.
  • For Type, select SSH. For Source, select My IP.
  • Choose Save
  • Click Step.
  • Choose Add. The step appears in the console with a status of Pending.
  • Open the Amazon S3 console.
  • Choose the Bucket name and then the folder that you set up earlier.
  • Choose the file, and then choose Download to save it locally.
  • Open the file.
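The inbound SSH rule added in the console steps above can also be expressed programmatically. The following is a hedged sketch of the equivalent `IpPermissions` entry for boto3's `authorize_security_group_ingress`; the helper names `ssh_ingress_rule` and `authorize_ssh`, the rule description text, and the example address `203.0.113.10` are assumptions for illustration, and the security group ID must come from your own ElasticMapReduce-master group.

```python
import ipaddress


def ssh_ingress_rule(my_ip):
    """Build an IpPermissions entry allowing SSH (TCP port 22) from one IPv4 address."""
    addr = ipaddress.ip_address(my_ip)  # raises ValueError on malformed input
    if addr.version != 4:
        # Kept IPv4-only for brevity in this sketch.
        raise ValueError("this sketch only handles IPv4 addresses")
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        # A /32 range limits access to the single address, like "My IP" in the console.
        "IpRanges": [{"CidrIp": f"{addr}/32", "Description": "SSH from my IP"}],
    }


def authorize_ssh(group_id, my_ip):
    """Apply the rule to a security group. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the rule builder works without boto3 installed

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[ssh_ingress_rule(my_ip)],
    )


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-range placeholder, not a real client address.
    print(ssh_ingress_rule("203.0.113.10"))
```

Separating the rule builder from the API call keeps the rule easy to inspect before anything is changed in the account.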