Author: Manish Prabhu
Hadoop is a framework written in Java for storing and processing large data sets across a cluster of machines. It has three core components:
- HDFS (Hadoop Distributed File System): The file system used by Hadoop, which stores data in the form of blocks. It has two main components: the namenode and the datanode. The namenode stores the file system metadata, such as which datanodes hold each block. A datanode stores the actual data blocks.
- MapReduce: The processing framework for Hadoop, with two main phases: map and reduce. The input is divided into parts, and the map function processes each part. The reduce function combines the map results to generate the final output.
- YARN (Yet Another Resource Negotiator): The resource management component of Hadoop.
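The map and reduce phases can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not Hadoop's Java API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step stands in for the shuffle that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, group by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map calls run in parallel on the datanodes that hold the input blocks, which is why the map function here only ever sees one line at a time.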
1.1 Installing Hadoop on Windows
Step 1: Download and install Java 8 on your system.
Step 2: Download the Hadoop 3.1.0 archive (zip file) and extract it to C:\Users\DELL.
Step 3: Right-click This PC, click Properties, click Advanced system settings, and then click the Environment Variables button.
Step 4: Create a new user variable called JAVA_HOME and set it to the Java installation directory (the folder that contains the bin directory, not the bin directory itself).
Step 5: Create a new user variable called HADOOP_HOME and set it to the path of the extracted Hadoop directory.
Step 6: Edit the Path system variable. It should contain the bin directory under HADOOP_HOME as well as the bin directory under JAVA_HOME.
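Mistakes in these variables are a common reason for later steps to fail, so it can help to check them from a fresh command prompt. The sketch below assumes that JAVA_HOME and HADOOP_HOME point at the installation roots and that Path lists their bin subfolders. The function name check_hadoop_env is hypothetical.

```python
import os


def check_hadoop_env(env):
    """Return a list of problems found in the given environment mapping.

    An empty list means both variables are set and their bin folders
    appear on the Windows search path (entries separated by ';').
    """
    problems = []
    path_entries = [
        entry.strip().rstrip("\\/").lower()
        for entry in env.get("PATH", "").split(";")
        if entry.strip()
    ]
    for var in ("JAVA_HOME", "HADOOP_HOME"):
        home = env.get(var, "").strip().rstrip("\\/")
        if not home:
            problems.append(f"{var} is not set")
            continue
        # Windows paths are case-insensitive, so compare in lower case.
        if (home + "\\bin").lower() not in path_entries:
            problems.append(f"{var}\\bin is not on Path")
    return problems


if __name__ == "__main__":
    # os.environ reflects the variables of the current command prompt.
    for problem in check_hadoop_env(os.environ) or ["Environment looks fine"]:
        print(problem)
```

Environment variable changes only reach command prompts opened after the change, so run the check in a new window.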
Step 7: Go to the extracted hadoop-3.1.0 directory, open the etc/hadoop folder, and edit the core-site.xml file in any editor. The first time, the file contains only an empty <configuration> </configuration> element. Put the required configuration properties inside it.
Step 8: Go to the main Hadoop directory and create a data folder inside it. Inside the data folder, create two subfolders named namenode and datanode. Then go back to the etc/hadoop directory and edit hdfs-site.xml as follows:
Step 9: Edit mapred-site.xml file as follows:
Step 10: Edit yarn-site.xml file as follows:
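Steps 7 to 10 each fill an empty <configuration> element with property entries. As a minimal sketch of what the four files might contain on a single-node setup, the Python helper below renders them as text that can be pasted into the files. The property names are standard Hadoop settings, but every value is an assumption that may differ from your setup: the hdfs://localhost:9000 address, the replication factor of 1, and the data folder paths under C:/Users/DELL/hadoop-3.1.0. The helper render_site_xml is hypothetical and not part of Hadoop.

```python
from xml.sax.saxutils import escape

# Assumed location of the extracted Hadoop directory (see Step 2).
HADOOP_HOME = "C:/Users/DELL/hadoop-3.1.0"

# Assumed single-node values; adjust them to your own installation.
SITE_FILES = {
    "core-site.xml": {
        "fs.defaultFS": "hdfs://localhost:9000",
    },
    "hdfs-site.xml": {
        "dfs.replication": "1",
        "dfs.namenode.name.dir": f"file:///{HADOOP_HOME}/data/namenode",
        "dfs.datanode.data.dir": f"file:///{HADOOP_HOME}/data/datanode",
    },
    "mapred-site.xml": {
        "mapreduce.framework.name": "yarn",
    },
    "yarn-site.xml": {
        "yarn.nodemanager.aux-services": "mapreduce_shuffle",
        "yarn.nodemanager.aux-services.mapreduce.shuffle.class":
            "org.apache.hadoop.mapred.ShuffleHandler",
    },
}


def render_site_xml(properties):
    """Render a name/value mapping as a Hadoop <configuration> document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for filename, properties in SITE_FILES.items():
        print(f"--- {filename} ---")
        print(render_site_xml(properties))
```

The two directory properties in hdfs-site.xml are what tie the configuration to the namenode and datanode folders created in Step 8.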
Step 11: Edit the hadoop-env.cmd file and set the JAVA_HOME variable inside it to the main Java installation directory. If the path contains spaces, enclose it in double quotes.
Step 12: Download the Windows-compatible Hadoop files from https://github.com/s911415/apache-hadoop-3.1.0-winutils.
Step 13: Extracting this download gives you a bin directory. Copy all the files from it. Either paste them into the bin folder of the Hadoop directory, replacing the existing files, or rename the existing bin folder of the Hadoop directory and paste the complete bin folder in its place.
Step 14: Check the Java version using the command java -version.
Step 15: Check that Hadoop is installed properly using the command hadoop version.
Step 16: Open the command prompt and run the command hdfs namenode -format to format the namenode. This is done only once, when Hadoop is first installed. If you run it again, it deletes all the data.
Step 17: Give read/write permission on the namenode and datanode folders using the chmod command of the Windows utility (winutils) installed in the bin folder of the Hadoop directory.
If you do not give the appropriate permissions, the Hadoop daemons throw a permission denied exception when they run.
Step 18: Start HDFS and YARN by navigating to the sbin directory of Hadoop and running start-all.cmd, or run the two separate commands start-dfs.cmd and start-yarn.cmd. The start-dfs command opens two new windows: one shows the namenode starting and the other shows the datanode starting. The start-yarn command also opens two new windows: one shows the resource manager starting and the other shows the nodemanager starting.
Step 19: You can view the resource manager's current jobs, finished jobs, and so on at the following link: http://localhost:8088/cluster
Step 20: You can view HDFS information at the following link: http://localhost:9870/
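The two links from Steps 19 and 20 can also be checked from a script, which is handy when the daemons take a while to come up. This is a minimal sketch using only the Python standard library. The helper name is_reachable is hypothetical, and proxies are bypassed on purpose because both addresses are local.

```python
import urllib.error
import urllib.request

# The two web interfaces named in Steps 19 and 20.
HADOOP_UIS = {
    "YARN resource manager": "http://localhost:8088/cluster",
    "HDFS namenode": "http://localhost:9870/",
}

# An empty ProxyHandler disables any system proxy for these local requests.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with a success or redirect status."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    for name, url in HADOOP_UIS.items():
        state = "up" if is_reachable(url) else "not reachable"
        print(f"{name} ({url}): {state}")
```

If either interface is reported as not reachable, look at the command windows opened in Step 18 for the error message of the corresponding daemon.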
3.0 Integrating the HDFS Connector with MuleSoft
The HDFS connector in Mule 4 connects Hadoop and Mule applications, so that HDFS operations can be used inside Mule flows. To use this connector, import it from Anypoint Exchange. The operations that can be performed using the HDFS connector include:
- Make directories: It is used to create a folder inside HDFS.
Make directories connector configuration:
You can test it using Postman and verify it using the dashboard.
- Copy from local file: It is used to copy a file from the local file system to an HDFS directory.
Copy from local connector configuration:
Using the above connector, we can copy the abc.txt file from the desktop to the /abc/folder1 directory on HDFS. You can verify it using Postman and the user interface.
- Get metadata: This connector is used to get the metadata of a file.
Get metadata connector configuration:
You will get the metadata of the file abc.txt. You can verify it using Postman. The output is a JSON object.
- Delete file: This connector is used to delete a file from the HDFS file system.
Delete file connector configuration:
Delete file Postman output:
You can verify this operation in the dashboard. The abc.txt file will no longer be visible.
- Delete directory: This connector is used to delete a directory.
Delete directory connector configuration:
Delete directory Postman output:
You can verify whether the directory is deleted using the dashboard:
The folder1 directory is deleted from the abc folder.
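Besides the dashboard, the results of these connector operations can be cross-checked outside Mule through Hadoop's WebHDFS REST interface, which the namenode serves on the same port as its web UI. The sketch below only builds the request URLs. The operation names (MKDIRS, GETFILESTATUS, DELETE) and their HTTP methods come from WebHDFS, while the helper webhdfs_url, the base address, and the example paths are assumptions for illustration.

```python
from urllib.parse import quote, urlencode

# Assumed namenode web address, matching the link used in Step 20.
WEBHDFS_BASE = "http://localhost:9870/webhdfs/v1"

# HTTP method that WebHDFS expects for each operation used here.
WEBHDFS_METHODS = {
    "MKDIRS": "PUT",          # counterpart of "Make directories"
    "GETFILESTATUS": "GET",   # counterpart of "Get metadata"
    "DELETE": "DELETE",       # counterpart of "Delete file" / "Delete directory"
}


def webhdfs_url(path, op, user=None, **params):
    """Build the WebHDFS URL for one operation on an absolute HDFS path."""
    query = {"op": op}
    if user:
        # With simple authentication the caller is named via user.name.
        query["user.name"] = user
    query.update(params)
    return f"{WEBHDFS_BASE}{quote(path)}?{urlencode(query)}"


if __name__ == "__main__":
    examples = [
        ("/abc/folder1", "MKDIRS", {}),
        ("/abc/folder1/abc.txt", "GETFILESTATUS", {}),
        ("/abc/folder1", "DELETE", {"recursive": "true"}),
    ]
    for path, op, extra in examples:
        print(WEBHDFS_METHODS[op], webhdfs_url(path, op, **extra))
```

Each printed line can be sent from Postman with the listed method. GETFILESTATUS returns a JSON object describing the file, and it returns an error once the file or directory has been deleted.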