Joining Data with Streaming Using Python Code

Here are some commands for debugging the program locally before running it on the cluster. To see the output of the mapper, run:

cat join2*.txt | ./map.py | sort

To see the output of the reducer, run:

cat join2*.txt | ./map.py | sort | ./reducer.py

If the output is correct, launch the MapReduce job:

hadoop jar /Users/logankim/Documents/hadoop-2.7.1/hadoop-streaming.jar -input /user/input -output /user/output -mapper map.py -reducer reducer.py
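As a rough illustration of what the local pipeline above is checking, here is a minimal sketch of a join-style mapper and reducer, written as plain functions so the whole `map | sort | reduce` chain can be run in one process. The record format (`key,value` lines) and the function names `map_lines`, `reduce_lines`, and `run_local` are assumptions made for this example; they are not the actual contents of `map.py` or `reducer.py`.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper sketch: turn assumed 'key,value' records into 'key<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(",", 1)
        yield f"{key}\t{value}"


def reduce_lines(sorted_lines):
    """Reducer sketch: join all values sharing a key (input must be sorted by key)."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [value for _, value in group]
        yield f"{key}\t{','.join(values)}"


def run_local(lines):
    """Simulate `cat ... | ./map.py | sort | ./reducer.py` in a single process."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Hypothetical sample records, standing in for two input files.
    sample = ["a,1", "b,x", "a,2"]
    for row in run_local(sample):
        print(row)
```

In real streaming scripts, the mapper and the reducer would each read from `sys.stdin` and print to standard output; the functions above only mirror that flow so the logic can be tested without Hadoop.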

Generate data


The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically, both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
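The split, map, sort, and reduce flow described above can be imitated in a few lines of plain Python. This is only a single-process sketch of the idea, using word counting as the example job; the function names and the `chunk_size` parameter are hypothetical, and nothing here uses Hadoop itself or runs in parallel.

```python
from itertools import groupby


def split_into_chunks(records, chunk_size):
    """Split the input into independent chunks, one per (simulated) map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map task: emit a (word, 1) pair for every word in the chunk."""
    return [(word, 1) for line in chunk for word in line.split()]


def reduce_sorted(pairs):
    """Reduce task: sum the counts of each key; pairs must be sorted by key."""
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    }


def word_count(records, chunk_size=2):
    chunks = split_into_chunks(records, chunk_size)
    # Each chunk is mapped independently; a real cluster would do this in parallel.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    # The framework's sort step groups identical keys next to each other.
    mapped.sort(key=lambda kv: kv[0])
    return reduce_sorted(mapped)


if __name__ == "__main__":
    print(word_count(["a b", "b c", "a"]))
```

The sort between the two phases is what lets the reduce step see all values for a key together, which is the same reason the local debugging pipeline places `sort` between the mapper and the reducer.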

Download & Install Hadoop


LaTeX is a high-quality typesetting system. It includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents.

Here is a brief guide to installing LaTeX with Sublime Text on Mac OS. Once this is done, you should be able to build your LaTeX files and generate PDF files.

I was using a MacBook, so please make sure that you are also on a Mac before going on with the rest of this post.

Guide


Hexo is a fast, simple, and powerful blog framework powered by Node.js. With it, we can generate static web pages and host them on GitHub. I spent a few hours figuring out how it works, and I finally got it to build successfully. I ran into some problems along the way, so I would like to write a simple guide showing how I tackled them.


I was using a MacBook, so please make sure that you are also on a Mac before going on with the rest of this post.


Guide
