Command To Generate Machine Keys In Hadoop


Jul 13, 2013: How to create a Hadoop cluster on virtual machines running on the same laptop or desktop. The idea is to create three virtual machines using any software you want (I will use VirtualBox), network them so that they can ping each other, configure passwordless SSH across them, and then create the Hadoop cluster.

Create an SSH key pair. Use the ssh-keygen command to create public and private key files. The following command generates a 2048-bit RSA key pair that can be used with HDInsight: ssh-keygen -t rsa -b 2048. You are prompted for information during the key creation process, such as where the keys are stored and whether to use a passphrase.

Creating service principals and keytab files for Hadoop. The following instructions assume you are on the KDC machine and using the kadmin.local command-line administration utility. Using kadmin.local on the KDC machine allows you to create principals without first creating a separate 'admin' principal.

Oct 22, 2013: Here the second command generates a key pair with an empty passphrase. Note: An empty passphrase is not recommended, but it is used here so that you do not have to enter a passphrase every time Hadoop interacts with its nodes. Now that the key pair is generated, SSH access to the local machine must be enabled with the newly created key.

#Use SSH with Linux-based Hadoop on HDInsight from Windows


Secure Shell (SSH) allows you to remotely perform operations on your Linux-based HDInsight clusters using a command-line interface. This document provides information on connecting to HDInsight from Windows-based clients by using the PuTTY SSH client.

[AZURE.NOTE] The steps in this article assume you are using a Windows-based client. If you are using a Linux, Unix, or OS X client, see Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X.

##Prerequisites

  • PuTTY and PuTTYGen for Windows-based clients. These utilities are available from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

  • A modern web browser that supports HTML5.

OR

  • Azure CLI for Mac, Linux and Windows.

##What is SSH?

SSH is a utility for logging in to a remote server and executing commands on it. With Linux-based HDInsight, SSH establishes an encrypted connection to the cluster head node and provides a command line that you use to type in commands. Commands are then executed directly on the server.

###SSH user name

An SSH user name is the name you use to authenticate to the HDInsight cluster. When you specify an SSH user name during cluster creation, this user is created on all nodes in the cluster. Once the cluster is created, you can use this user name to connect to the HDInsight cluster head nodes. From the head nodes, you can then connect to the individual worker nodes.

###SSH password or Public key

An SSH user can use either a password or public key for authentication. A password is just a string of text you make up, while a public key is part of a cryptographic key pair generated to uniquely identify you.

A key is more secure than a password. However, it requires additional steps to generate, and you must keep the files containing the key in a secure location. Anyone who gains access to the key files gains access to your account, and if you lose the key files, you will not be able to log in to your account.

A key pair consists of a public key (which is sent to the HDInsight server) and a private key (which is kept on your client machine). When you connect to the HDInsight server using SSH, the SSH client uses the private key on your machine to authenticate with the server.

##Create an SSH key

Use the following information if you plan on using SSH keys with your cluster. If you plan on using a password, you can skip this section.

  1. Open PuTTYGen.

  2. For Type of key to generate, select SSH-2 RSA, and then click Generate.

  3. Move the mouse around in the area below the progress bar, until the bar fills. Moving the mouse generates random data that is used to generate the key.

    Once the key has been generated, the public key will be displayed.

  4. For added security, you can enter a passphrase in the Key passphrase field, and then type the same value in the Confirm passphrase field.

    [AZURE.NOTE] We strongly recommend that you use a secure passphrase for the key. However, if you forget the passphrase, there is no way to recover it.

  5. Click Save private key to save the key to a .ppk file. This key will be used to authenticate to your Linux-based HDInsight cluster.

    [AZURE.NOTE] You should store this key in a secure location, as it can be used to access your Linux-based HDInsight cluster.

  6. Click Save public key to save the key as a .txt file. This allows you to reuse the public key in the future when you create additional Linux-based HDInsight clusters.

    [AZURE.NOTE] The public key is also displayed at the top of PuTTYGen. You can right-click this field, copy the value, and then paste it into a form when creating a cluster using the Azure Portal.

##Create a Linux-based HDInsight cluster

When creating a Linux-based HDInsight cluster, you must provide the public key created previously. From Windows-based clients, there are two ways to create a Linux-based HDInsight cluster:

  • Azure Portal - Uses a web-based portal to create the cluster.

  • Azure CLI for Mac, Linux and Windows - Uses command-line commands to create the cluster.

Each of these methods will require the public key. For complete information on creating a Linux-based HDInsight cluster, see Provision Linux-based HDInsight clusters.

###Azure Portal

When using the Azure Portal to create a Linux-based HDInsight cluster, you must enter an SSH Username, and select to enter a PASSWORD or SSH PUBLIC KEY.

If you select SSH PUBLIC KEY, you can either paste the public key (displayed in the Public key for pasting into OpenSSH authorized_keys file field in PuTTYGen) into the SSH PublicKey field, or select Select a file to browse to the file that contains the public key.

This creates a login for the specified user, and enables either password authentication or SSH key authentication.

###Azure Command-Line Interface for Mac, Linux, and Windows

You can use the Azure CLI for Mac, Linux and Windows to create a new cluster by using the azure hdinsight cluster create command.

For more information on using this command, see Provision Hadoop Linux clusters in HDInsight using custom options.

##Connect to a Linux-based HDInsight cluster

  1. Open PuTTY.

  2. If you provided an SSH key when you created your user account, you must perform the following step to select the private key to use when authenticating to the cluster:

    In Category, expand Connection, expand SSH, and select Auth. Finally, click Browse and select the .ppk file that contains your private key.

  3. In Category, select Session. From the Basic options for your PuTTY session screen, enter the SSH address of your HDInsight server in the Host name (or IP address) field. The SSH address is your cluster name, then -ssh.azurehdinsight.net. For example, mycluster-ssh.azurehdinsight.net.

  4. To save the connection information for future use, enter a name for this connection under Saved Sessions, and then click Save. The connection will be added to the list of saved sessions.

  5. Click Open to connect to the cluster.

    [AZURE.NOTE] If this is the first time you have connected to the cluster, you will receive a security alert. This is normal. Select Yes to cache the server's RSA2 key to continue.

  6. When prompted, enter the user that you entered when you created the cluster. If you provided a password for the user, you will be prompted to enter it also.

[AZURE.NOTE] The above steps assume you are using port 22, which will connect to head node 0 on the HDInsight cluster. If you use port 23, you will connect to head node 1. For more information on the head nodes, see Availability and reliability of Hadoop clusters in HDInsight.

###Connect to worker nodes

The worker nodes are not directly accessible from outside the Azure datacenter, but they can be accessed from the cluster head node via SSH.

If you provided an SSH key when you created your user account, you must perform the following steps to use the private key when authenticating to the cluster if you want to connect to the worker nodes.

  1. Install Pageant from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html. This utility is used to cache SSH keys for PuTTY.

  2. Run Pageant. It will minimize to an icon in the status tray. Right-click the icon and select Add Key.

  3. When the browse dialog appears, select the .ppk file that contains the key, and then click Open. This adds the key to Pageant, which will provide it to PuTTY when connecting to the cluster.

    [AZURE.IMPORTANT] If you used an SSH key to secure your account, you must complete the previous steps before you will be able to connect to worker nodes.

  4. Open PuTTY.

  5. If you use an SSH key to authenticate, in the Category section, expand Connection, expand SSH, and then select Auth.

    In the Authentication parameters section, enable Allow agent forwarding. This allows PuTTY to automatically pass the key authentication through the connection to the cluster head node when you connect to worker nodes.

  6. Connect to the cluster as documented earlier. If you use an SSH key for authentication, you do not need to select the key - the SSH key added to Pageant will be used to authenticate to the cluster.

  7. After the connection has been established, use the following to retrieve a list of the nodes in your cluster. Replace ADMINPASSWORD with the password for your cluster admin account. Replace CLUSTERNAME with the name of your cluster.

    This will return information in JSON format for the nodes in the cluster, including host_name, which contains the fully qualified domain name (FQDN) for each node. The following is an example of a host_name entry returned by the curl command:

  8. Once you have a list of the worker nodes you want to connect to, use the following command from the PuTTY session to open a connection to a worker node:

    ssh USERNAME@FQDN

    Replace USERNAME with your SSH user name and FQDN with the FQDN for the worker node. For example, workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net.

    [AZURE.NOTE] If you use a password to authenticate your SSH session, you will be prompted to enter the password again. If you use an SSH key, the connection should finish without any prompts.

  9. Once the session has been established, the prompt for your PuTTY session will change from username@hn0-clustername to username@wn0-clustername to indicate that you are connected to the worker node. Any commands you run at this point will run on the worker node.

  10. Once you have finished performing actions on the worker node, use the exit command to close the session to the worker node. This will return you to the username@hn0-clustername prompt.
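
The node-list request in the steps above can be sketched as a pair of shell helpers to run from the head node session. This is a minimal sketch, not the documented command: it assumes the cluster exposes the Ambari REST API at CLUSTERNAME.azurehdinsight.net and that the cluster admin account is named admin. The function names are hypothetical.

```shell
# Build the Ambari REST URL that lists the hosts in a cluster
# (assumed endpoint layout: /api/v1/clusters/CLUSTERNAME/hosts).
hosts_url() {
  cluster_name="$1"
  echo "https://${cluster_name}.azurehdinsight.net/api/v1/clusters/${cluster_name}/hosts"
}

# Request the host list as JSON. "admin" is an assumed account name.
# Because no password follows the user name, curl prompts for it.
list_cluster_hosts() {
  cluster_name="$1"
  admin_user="${2:-admin}"
  curl --silent --show-error --user "$admin_user" "$(hosts_url "$cluster_name")"
}
```

For example, `list_cluster_hosts CLUSTERNAME` prompts for the admin password and prints the JSON that contains the host_name entries. Passing `admin:ADMINPASSWORD` as the second argument supplies the password inline instead, at the cost of leaving it in the shell history.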

##Add more accounts

If you need to add more accounts to your cluster, perform the following steps:

  1. Generate a new public key and private key for the new user account as described previously.

  2. From an SSH session to the cluster, add the new user with the following command:

    This will create a new user account, but will disable password authentication.

  3. Create the directory and files to hold the key by using the following commands:

  4. When the nano editor opens, copy and paste in the contents of the public key for the new user account. Finally, use Ctrl-X to save the file and exit the editor.

  5. Use the following command to change ownership of the .ssh folder and contents to the new user account:

  6. You should now be able to authenticate to the server with the new user account and private key.

##SSH tunneling

SSH can be used to tunnel local requests, such as web requests, to the HDInsight cluster. The request will then be routed to the requested resource as if it had originated on the HDInsight cluster head node.

[AZURE.IMPORTANT] An SSH tunnel is a requirement for accessing the web UI for some Hadoop services. For example, both the Job History UI and the Resource Manager UI can only be accessed using an SSH tunnel.

For more information on creating and using an SSH tunnel, see Use SSH Tunneling to access Ambari web UI, ResourceManager, JobHistory, NameNode, Oozie, and other web UI's.

##Next steps

Now that you understand how to authenticate by using an SSH key, learn how to use MapReduce with Hadoop on HDInsight.

#Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X


Secure Shell (SSH) allows you to remotely perform operations on your Linux-based HDInsight clusters using a command-line interface. This document provides information on using SSH with HDInsight from Linux, Unix, or OS X clients.

[AZURE.NOTE] The steps in this article assume you are using a Linux, Unix, or OS X client. These steps may be performed on a Windows-based client if you have installed a package that provides ssh and ssh-keygen, such as Bash on Ubuntu on Windows.

If you do not have SSH installed on your Windows-based client, use the steps in Use SSH with Linux-based HDInsight (Hadoop) from Windows for information on installing and using PuTTY.

##Prerequisites

  • ssh-keygen and ssh for Linux, Unix, and OS X clients. These utilities are usually provided with your operating system or are available through its package management system.

  • A modern web browser that supports HTML5.

OR

  • Azure CLI.


##What is SSH?

SSH is a utility for logging in to a remote server and executing commands on it. With Linux-based HDInsight, SSH establishes an encrypted connection to the cluster headnode and provides a command line that you use to type in commands. Commands are then executed directly on the server.

###SSH user name

An SSH user name is the name you use to authenticate to the HDInsight cluster. When you specify an SSH user name during cluster creation, this user is created on all nodes in the cluster. Once the cluster is created, you can use this user name to connect to the HDInsight cluster headnodes. From the headnodes, you can then connect to the individual worker nodes.

###SSH password or Public key

An SSH user can use either a password or public key for authentication. A password is just a string of text you make up, while a public key is part of a cryptographic key pair generated to uniquely identify you.

A key is more secure than a password. However, it requires additional steps to generate, and you must keep the files containing the key in a secure location. Anyone who gains access to the key files gains access to your account, and if you lose the key files, you will not be able to log in to your account.

A key pair consists of a public key (which is sent to the HDInsight server) and a private key (which is kept on your client machine). When you connect to the HDInsight server using SSH, the SSH client uses the private key on your machine to authenticate with the server.

##Create an SSH key

Use the following information if you plan on using SSH keys with your cluster. If you plan on using a password, you can skip this section.

  1. Open a terminal session and use the following command to see if you have any existing SSH keys:

    Look for the following files in the directory listing. These are common names for public SSH keys.

    • id_dsa.pub
    • id_ecdsa.pub
    • id_ed25519.pub
    • id_rsa.pub
  2. If you do not want to use an existing file, or you have no existing SSH keys, use the following to generate a new file:

    You will be prompted for the following information:

    • The file location - The location defaults to ~/.ssh/id_rsa.

    • A passphrase - You will be prompted to re-enter this.

      [AZURE.NOTE] We strongly recommend that you use a secure passphrase for the key. However, if you forget the passphrase, there is no way to recover it.

    After the command finishes, you will have two new files, the private key (for example, id_rsa) and the public key (for example, id_rsa.pub).
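
The two steps above can be sketched as shell helpers. This is an illustrative sketch rather than the article's exact commands: the function names are hypothetical, and the non-interactive flags (-N for the passphrase, -f for the output file) replace the prompts described in step 2. The key type and size follow the 2048-bit RSA example used elsewhere in this document.

```shell
# List any public keys already present in a directory (for example ~/.ssh).
list_public_keys() {
  ls "$1"/*.pub 2>/dev/null
}

# Create a 2048-bit RSA key pair in key_dir without interactive prompts.
# ssh-keygen still asks before overwriting an existing id_rsa file.
generate_key_pair() {
  key_dir="$1"
  passphrase="$2"
  mkdir -p "$key_dir"
  chmod 700 "$key_dir"   # keep the key directory private to the owner
  ssh-keygen -q -t rsa -b 2048 -N "$passphrase" -f "$key_dir/id_rsa"
}
```

Running `list_public_keys ~/.ssh` shows existing keys, and `generate_key_pair ~/.ssh "a long passphrase"` writes id_rsa and id_rsa.pub. Passing the passphrase as an argument is convenient for a sketch but exposes it to the shell history, so the interactive prompt is the safer choice in practice.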

##Create a Linux-based HDInsight cluster

When creating a Linux-based HDInsight cluster, you must provide the public key created previously. From Linux, Unix, or OS X clients, there are two ways to create an HDInsight cluster:

  • Azure Portal - Uses a web-based portal to create the cluster.

  • Azure CLI for Mac, Linux and Windows - Uses command-line commands to create the cluster.

Each of these methods will require either a password or a public key. For complete information on creating a Linux-based HDInsight cluster, see Provision Linux-based HDInsight clusters.

###Azure Portal

When using the Azure Portal to create a Linux-based HDInsight cluster, you must enter an SSH USER NAME, and select to enter a PASSWORD or SSH PUBLIC KEY.

If you select SSH PUBLIC KEY, you can either paste the public key (contained in the file with the .pub extension) into the SSH PublicKey field, or select Select a file to browse and select the public key file.

[AZURE.NOTE] The key file is simply a text file. The contents should appear similar to the following:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCelfkjrpYHYiks4TM+r1LVsTYQ4jAXXGeOAF9Vv/KGz90pgMk3VRJk4PEUSELfXKxP3NtsVwLVPN1l09utI/tKHQ6WL3qy89WVVVLiwzL7tfJ2B08Gmcw8mC/YoieT/YG+4I4oAgPEmim+6/F9S0lU2I2CuFBX9JzauX8n1Y9kWzTARST+ERx2hysyA5ObLv97Xe4C2CQvGE01LGAXkw2ffP9vI+emUM+VeYrf0q3w/b1o/COKbFVZ2IpEcJ8G2SLlNsHWXofWhOKQRi64TMxT7LLoohD61q2aWNKdaE4oQdiuo8TGnt4zWLEPjzjIYIEIZGk00HiQD+KCB5pxoVtp user@system

This creates a login for the specified user, by using the password or public key you provide.

###Azure Command-Line Interface for Mac, Linux and Windows

You can use the Azure CLI for Mac, Linux and Windows to create a new cluster by using the azure hdinsight cluster create command.

For more information on using this command, see Provision Hadoop Linux clusters in HDInsight using custom options.

##Connect to a Linux-based HDInsight cluster

From a terminal session, use the SSH command to connect to the cluster headnode by providing the address and user name:

  • SSH address - There are two addresses that may be used to connect to a cluster using SSH:

    • Connect to the headnode: The cluster name, followed by -ssh.azurehdinsight.net. For example, mycluster-ssh.azurehdinsight.net.

    • Connect to the edge node: If your cluster is R Server on HDInsight, the cluster will also contain an edge node that can be accessed using RServer.CLUSTERNAME.ssh.azurehdinsight.net, where CLUSTERNAME is the name of the cluster.

  • User name - The SSH user name you provided when you created the cluster.

The following example will connect to the primary headnode of mycluster as the user me:

ssh me@mycluster-ssh.azurehdinsight.net

If you used a password for the user account, you will be prompted to enter the password.

If you used an SSH key that is secured with a passphrase, you will be prompted to enter the passphrase. Otherwise, SSH will attempt to automatically authenticate by using one of the local private keys on your client.

[AZURE.NOTE] If SSH does not automatically authenticate with the correct private key, use the -i parameter and specify the path to the private key. The following example will load the private key from ~/.ssh/id_rsa:

ssh -i ~/.ssh/id_rsa me@mycluster-ssh.azurehdinsight.net

If you connect using the headnode address and do not specify a port, SSH defaults to port 22, which connects to the primary headnode on the HDInsight cluster. If you use port 23, you connect to the secondary headnode. For more information on the headnodes, see Availability and reliability of Hadoop clusters in HDInsight.
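
The address and port rules can be captured in a few shell helpers. This is a sketch with hypothetical function names. It relies only on the address pattern and port numbers stated in this section, plus the standard -p option of ssh.

```shell
# Print the SSH host name for a cluster's headnodes.
ssh_host() {
  echo "${1}-ssh.azurehdinsight.net"
}

# Map a headnode role to its SSH port: 22 for primary, 23 for secondary.
headnode_port() {
  case "$1" in
    primary) echo 22 ;;
    secondary) echo 23 ;;
    *) echo "unknown headnode role: $1" >&2; return 1 ;;
  esac
}

# Open a session on the chosen headnode (defaults to the primary).
connect_headnode() {
  user="$1"
  cluster="$2"
  role="${3:-primary}"
  port=$(headnode_port "$role") || return 1
  ssh -p "$port" "$user@$(ssh_host "$cluster")"
}
```

For example, `connect_headnode me mycluster secondary` runs ssh against port 23 of mycluster-ssh.azurehdinsight.net as the user me.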

###Connect to worker nodes

The worker nodes are not directly accessible from outside the Azure datacenter, but they can be accessed from the cluster headnode via SSH.


If you use an SSH key to authenticate your user account, you must complete the following steps on your client:

  1. Using a text editor, open ~/.ssh/config. If this file doesn't exist, you can create it by entering touch ~/.ssh/config in the terminal.

  2. Add the following to the file. Replace CLUSTERNAME with the name of your HDInsight cluster.

    This configures SSH agent forwarding for your HDInsight cluster.

  3. Test SSH agent forwarding by using the following command from the terminal:

    This should return information similar to the following:

    If nothing is returned, this indicates that ssh-agent is not running. Consult your operating system documentation for specific steps on installing and configuring ssh-agent, or see Using ssh-agent with ssh.

  4. Once you have verified that ssh-agent is running, use the following to add your SSH private key to the agent:

    If your private key is stored in a different file, replace ~/.ssh/id_rsa with the path to the file.
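
The agent-forwarding setup above can be sketched in shell as follows. The function names are hypothetical. The stanza assumes the standard OpenSSH client options Host and ForwardAgent, scoped to the cluster's SSH address so that forwarding is not enabled for other hosts.

```shell
# Append an agent-forwarding stanza for one cluster to an SSH client config file.
# The file defaults to ~/.ssh/config and is created if it does not exist.
add_forwarding_stanza() {
  cluster_name="$1"
  config_file="${2:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config_file")"
  printf 'Host %s-ssh.azurehdinsight.net\n  ForwardAgent yes\n' "$cluster_name" >> "$config_file"
}

# Report whether an ssh-agent socket is visible to this shell.
agent_status() {
  if [ -n "${SSH_AUTH_SOCK:-}" ]; then
    echo "ssh-agent socket: $SSH_AUTH_SOCK"
  else
    echo "ssh-agent does not appear to be running"
  fi
}
```

A typical sequence would be `add_forwarding_stanza CLUSTERNAME`, then `agent_status`, then `ssh-add ~/.ssh/id_rsa` once the agent is confirmed to be running.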

Use the following steps to connect to the worker nodes for your cluster.

[AZURE.IMPORTANT] If you use an SSH key to authenticate your account, you must complete the previous steps to verify that agent forwarding is working.

  1. Connect to the HDInsight cluster by using SSH as described previously.

  2. Once you are connected, use the following to retrieve a list of the nodes in your cluster. Replace ADMINPASSWORD with the password for your cluster admin account. Replace CLUSTERNAME with the name of your cluster.

    This will return information in JSON format for the nodes in the cluster, including host_name, which contains the fully qualified domain name (FQDN) for each node. The following is an example of a host_name entry returned by the curl command:

  3. Once you have a list of the worker nodes you want to connect to, use the following command from the SSH session to the server to open a connection to a worker node:

    ssh USERNAME@FQDN

    Replace USERNAME with your SSH user name and FQDN with the FQDN for the worker node. For example, workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net.

    [AZURE.NOTE] If you use a password to authenticate your SSH session, you will be prompted to enter the password again. If you use an SSH key, the connection should finish without any prompts.

  4. Once the session has been established, the terminal prompt will change from username@hn#-clustername to username@wk#-clustername to indicate that you are connected to the worker node. Any commands you run at this point will run on the worker node.

  5. Once you have finished performing actions on the worker node, use the exit command to close the session to the worker node. This will return you to the username@hn#-clustername prompt.
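
Once the JSON node list has been retrieved, the host_name values can be pulled out with standard text tools. The following is a rough sketch that assumes each entry appears as a "host_name" key followed by a quoted string. The function name is hypothetical, and the inline sample reuses the example FQDN from the steps above instead of real cluster output.

```shell
# Print each distinct host_name value found in a JSON node list read from stdin.
extract_host_names() {
  grep -o '"host_name" *: *"[^"]*"' | sed 's/.*: *"\(.*\)"/\1/' | sort -u
}

# Illustrative input only: a one-entry fragment shaped like the response.
printf '%s\n' '{ "Hosts" : { "host_name" : "workernode0.workernode-0-e2f35e63355b4f15a31c460b6d4e1230.j1.internal.cloudapp.net" } }' |
  extract_host_names
```

Each printed name can then be used as the FQDN when opening the SSH connection to a worker node. A JSON-aware tool would be more robust than pattern matching if one is installed on the headnode.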

###Connect to a Domain-joined HDInsight cluster

Domain-joined HDInsight integrates Kerberos with Hadoop in HDInsight. Because the SSH user is not an Active Directory domain user, this user account cannot run Hadoop commands directly from the SSH shell on a domain-joined cluster. You must run kinit first.

To run Hive queries on a Domain-joined HDInsight cluster using SSH

  1. Connect to a Domain-joined HDInsight cluster using SSH. For instructions, see Connect to a Linux-based HDInsight cluster.

  2. Run kinit. It will ask you for a domain user name and domain user password. For more information on configuring domain users for domain-joined HDInsight clusters, see Configure Domain-joined HDInsight clusters.

  3. Open the Hive console by entering hive at the prompt.

    Then you can run Hive commands.

##Add more accounts


  1. Generate a new public key and private key for the new user account, as described in the Create an SSH key section.

    [AZURE.NOTE] The private key should either be generated on a client that the user will use to connect to the cluster, or securely transferred to such a client after creation.

  2. From an SSH session to the cluster, add the new user with the following command:

    This will create a new user account, but will disable password authentication.

  3. Create the directory and files to hold the key by using the following commands:

  4. When the nano editor opens, copy and paste in the contents of the public key for the new user account. Finally, use Ctrl-X to save the file and exit the editor.

  5. Use the following command to change ownership of the .ssh folder and contents to the new user account:

  6. You should now be able to authenticate to the server with the new user account and private key.
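
The account-creation steps above might look like the following in shell. This is a sketch under stated assumptions: the function names are hypothetical, the adduser flags are those of the Debian/Ubuntu adduser tool, and the public key is appended from a file instead of being pasted into nano. The second function must run with root privileges on the headnode.

```shell
# Install a public key for the account whose home directory is home_dir.
install_public_key() {
  home_dir="$1"
  public_key_file="$2"
  mkdir -p "$home_dir/.ssh"
  chmod 700 "$home_dir/.ssh"                    # only the owner may enter
  cat "$public_key_file" >> "$home_dir/.ssh/authorized_keys"
  chmod 600 "$home_dir/.ssh/authorized_keys"    # only the owner may read or write
}

# Create the account with password login disabled, install the key,
# and hand ownership of the .ssh folder to the new user.
add_key_only_user() {
  new_user="$1"
  public_key_file="$2"
  adduser --disabled-password --gecos "" "$new_user" &&
    install_public_key "/home/$new_user" "$public_key_file" &&
    chown -hR "$new_user:$new_user" "/home/$new_user/.ssh"
}
```

For example, after copying the new user's public key to the headnode as newuser.pub, `add_key_only_user newuser newuser.pub` performs steps 2 through 5 in one call.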

##SSH tunneling

SSH can be used to tunnel local requests, such as web requests, to the HDInsight cluster. The request will then be routed to the requested resource as if it had originated on the HDInsight cluster headnode.

[AZURE.IMPORTANT] An SSH tunnel is a requirement for accessing the web UI for some Hadoop services. For example, both the Job History UI and the Resource Manager UI can only be accessed using an SSH tunnel.

For more information on creating and using an SSH tunnel, see Use SSH Tunneling to access Ambari web UI, ResourceManager, JobHistory, NameNode, Oozie, and other web UI's.
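
As a rough illustration of what such a tunnel involves, the sketch below opens a SOCKS proxy through the headnode with OpenSSH dynamic port forwarding. The function name and the local port 9876 are assumptions chosen for the example. The linked article remains the reference for the supported procedure.

```shell
# Open a SOCKS proxy on a local port that forwards traffic through the headnode.
# -C compresses traffic, -N runs no remote command, -D enables dynamic forwarding.
open_tunnel() {
  user="$1"
  cluster="$2"
  local_port="${3:-9876}"   # assumed default; any free local port works
  ssh -C -N -D "$local_port" "$user@${cluster}-ssh.azurehdinsight.net"
}
```

Running `open_tunnel USERNAME CLUSTERNAME` keeps the tunnel open until the command is interrupted. A browser configured to use localhost and the chosen port as a SOCKS proxy then sends its requests through the cluster headnode.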

##Next steps

Now that you understand how to authenticate by using an SSH key, learn how to use MapReduce with Hadoop on HDInsight.