The MapReduce programming model consists of two phases: map and reduce.
The map phase breaks down into the following four steps:
- Parsing the input data format: InputFormat
- Per-record map processing: Mapper
- Local (map-side) reduce: Combiner
- Partitioning data by key: Partitioner
The reduce phase breaks down into the following four steps:
- Remote data copy (shuffle)
- Sorting data by key
- Reduce processing: Reducer
- Output data format: OutputFormat
Of these, the remote copy and sort steps are implemented by Hadoop itself; every other step can be customized.
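To see the data flow concretely, the same word-count logic can be simulated in plain Java with no Hadoop dependency. This is a sketch for illustration only; the class and method names here are made up, and a `HashMap` stands in for the shuffle/sort that the framework performs between the two phases:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {
    // "Map" each token to (word, 1), then "reduce" by summing the 1s per key.
    // In real MapReduce the framework shuffles and sorts the (word, 1) pairs
    // between the phases; here a HashMap plays that grouping role.
    public static Map<String, Integer> wordCount(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            counts.merge(word, 1, Integer::sum); // reduce step: sum the 1s
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("hello world hello"));
    }
}
```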
Below is the "Hello World" of Hadoop: the WordCount program.
The Mapper:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper<input key, input value, output key, output value>
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Context is the job's runtime context: it provides the output methods
    // and can also be used to pass parameters into the task.
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
```
The Reducer:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reducer<input key, input value, output key, output value>;
// the input key/value types must match the Mapper's output key/value types.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable count = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        count.set(sum);
        context.write(key, count);
    }
}
```
The main method:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(1);
        }
        String inputPath = args[0];
        String outputPath = args[1];

        // Job configuration
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        // Locate the jar containing this class so it can be shipped to the cluster
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
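In the canonical WordCount example the Reducer can also serve as the Combiner, because summing integers is associative and commutative. One extra line in the job setup enables this map-side aggregation (a configuration sketch, assuming the `job` object from the main method above):

```java
// IntSumReducer works unchanged as a Combiner: partial sums computed on
// the map side are merged correctly by the final Reducer, reducing the
// amount of data shuffled across the network.
job.setCombinerClass(IntSumReducer.class);
```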
Run it from the command line:

```
hadoop jar wordcount.jar com.ee.hadoop.mapred.WordCount /wordcount/input /wordcount/output
```

Note that the output directory must not already exist, or the job will fail with an error; to rerun the job, delete it first (e.g. with `hadoop fs -rm -r /wordcount/output`).