
Re: Trying to run the job that uses new hadoop api using oozie!

  • santhosh
    Message 1 of 7, Jan 2, 2011
      Hi,
      I used the code you provided to run under Oozie. The job runs successfully, but when I look at the output, it is the output of the map task; the reducer task is not starting.

      Thanks & Regards,
      Santhosh
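
      (Aside, a minimal sketch that is not part of the original message: when a
      map-reduce action's output looks like raw map output, two quick things to
      check are whether a reducer class was ever registered in the effective job
      configuration and whether the reduce count is zero. The path argument is
      hypothetical; point it at the job.xml the launcher materialized.)

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;

          public class CheckReducerConf {
            public static void main(String[] args) {
              Configuration conf = new Configuration(false);
              conf.addResource(new Path(args[0])); // hypothetical: the materialized job.xml
              // Hadoop 0.20 new-API key written by Job.setReducerClass;
              // null here means no new-API reducer was ever registered.
              System.out.println("mapreduce.reduce.class = " + conf.get("mapreduce.reduce.class"));
              // Reduce task count; 0 means the job ran map-only.
              System.out.println("mapred.reduce.tasks = " + conf.getInt("mapred.reduce.tasks", 1));
            }
          }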

      --- In Oozie-users@yahoogroups.com, Mohammad Islam <kamrul@...> wrote:
      >
      >
      > I think there is something related to the MR code and the associated MR action configuration.
      >
      >
      > In the meantime, would you please try the following, which our QA has verified, to make sure it is not some other setup issue.
      >
      > We plan to look into your use case next Monday if it is not resolved by then.
      >
      >
      > The MR code is the wordcount example that comes with Hadoop. The code is as follows:
      >
      > package org.apache.hadoop.examples;
      >
      > import java.io.IOException;
      > import java.util.StringTokenizer;
      >
      > import org.apache.hadoop.conf.Configuration;
      > import org.apache.hadoop.fs.Path;
      > import org.apache.hadoop.io.IntWritable;
      > import org.apache.hadoop.io.Text;
      > import org.apache.hadoop.mapreduce.Job;
      > import org.apache.hadoop.mapreduce.Mapper;
      > import org.apache.hadoop.mapreduce.Reducer;
      > import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      > import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
      > import org.apache.hadoop.util.GenericOptionsParser;
      >
      > public class WordCount {
      >
      >   public static class TokenizerMapper
      >       extends Mapper<Object, Text, Text, IntWritable> {
      >
      >     private final static IntWritable one = new IntWritable(1);
      >     private Text word = new Text();
      >
      >     public void map(Object key, Text value, Context context)
      >         throws IOException, InterruptedException {
      >       StringTokenizer itr = new StringTokenizer(value.toString());
      >       while (itr.hasMoreTokens()) {
      >         word.set(itr.nextToken());
      >         context.write(word, one);
      >       }
      >     }
      >   }
      >
      >   public static class IntSumReducer
      >       extends Reducer<Text, IntWritable, Text, IntWritable> {
      >     private IntWritable result = new IntWritable();
      >
      >     public void reduce(Text key, Iterable<IntWritable> values, Context context)
      >         throws IOException, InterruptedException {
      >       int sum = 0;
      >       for (IntWritable val : values) {
      >         sum += val.get();
      >       }
      >       result.set(sum);
      >       context.write(key, result);
      >     }
      >   }
      >
      >   public static void main(String[] args) throws Exception {
      >     Configuration conf = new Configuration();
      >     String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
      >     if (otherArgs.length != 2) {
      >       System.err.println("Usage: wordcount <in> <out>");
      >       System.exit(2);
      >     }
      >     Job job = new Job(conf, "word count");
      >     job.setJarByClass(WordCount.class);
      >     job.setMapperClass(TokenizerMapper.class);
      >     job.setCombinerClass(IntSumReducer.class);
      >     job.setReducerClass(IntSumReducer.class);
      >     job.setOutputKeyClass(Text.class);
      >     job.setOutputValueClass(IntWritable.class);
      >     FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
      >     FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
      >     System.exit(job.waitForCompletion(true) ? 0 : 1);
      >   }
      > }
      >
      >
      > MR action definition:
      > =====================
      >
      > <map-reduce xmlns="uri:oozie:workflow:0.1">
      >   <job-tracker>**FILL_IT_WITH_YOURVALUE**</job-tracker>
      >   <name-node>**FILL_IT_WITH_YOURVALUE**</name-node>
      >   <prepare>
      >     <delete path="**FILL_IT_WITH_YOURVALUE**" />
      >   </prepare>
      >   <configuration>
      >     <property>
      >       <name>mapred.mapper.new-api</name>
      >       <value>true</value>
      >     </property>
      >     <property>
      >       <name>mapred.reducer.new-api</name>
      >       <value>true</value>
      >     </property>
      >     <property>
      >       <name>mapred.mapper.class</name>
      >       <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
      >     </property>
      >     <property>
      >       <name>mapred.reducer.class</name>
      >       <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
      >     </property>
      >     <property>
      >       <name>mapred.map.tasks</name>
      >       <value>1</value>
      >     </property>
      >     <property>
      >       <name>mapred.input.dir</name>
      >       <value>**FILL_IT_WITH_YOURVALUE**</value>
      >     </property>
      >     <property>
      >       <name>mapred.output.dir</name>
      >       <value>**FILL_IT_WITH_YOURVALUE**</value>
      >     </property>
      >     <property>
      >       <name>mapred.job.queue.name</name>
      >       <value>**FILL_IT_WITH_YOURVALUE**</value>
      >     </property>
      >     <property>
      >       <name>mapreduce.job.acl-view-job</name>
      >       <value>*</value>
      >     </property>
      >     <property>
      >       <name>oozie.launcher.mapreduce.job.acl-view-job</name>
      >       <value>*</value>
      >     </property>
      >   </configuration>
      > </map-reduce>
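
      (Aside, a hedged check that is not in the original mail: the Outer$Inner
      values above must resolve on the launcher's classpath, since Class.forName
      expects exactly that dollar-separated binary-name form. A minimal sketch:)

          public class CheckClasses {
            public static void main(String[] args) throws ClassNotFoundException {
              // The same binary names used in the <value> elements above.
              Class.forName("org.apache.hadoop.examples.WordCount$TokenizerMapper");
              Class.forName("org.apache.hadoop.examples.WordCount$IntSumReducer");
            }
          }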
      >
      > On 12/30/10 9:08 PM, "santhosh" <gsanthosh_12@...> wrote:
      >
      > hi,
      >
      > my MR code is a simple wordcount job.
      > ------------------------------------
      >
      > public class WordCount extends Configured implements Tool {
      >
      >   public static class Map
      >       extends Mapper<LongWritable, Text, Text, IntWritable> {
      >     private final static IntWritable one = new IntWritable(1);
      >     private Text word = new Text();
      >
      >     public void map(LongWritable key, Text value, Context context)
      >         throws IOException, InterruptedException {
      >       String line = value.toString();
      >       StringTokenizer tokenizer = new StringTokenizer(line);
      >       while (tokenizer.hasMoreTokens()) {
      >         word.set(tokenizer.nextToken());
      >         context.write(word, one);
      >       }
      >     }
      >   }
      >
      >   public static class Reduce
      >       extends Reducer<Text, IntWritable, Text, IntWritable> {
      >     public void reduce(Text key, Iterable<IntWritable> values,
      >         Context context) throws IOException, InterruptedException {
      >       int sum = 0;
      >       for (IntWritable val : values) {
      >         sum += val.get();
      >       }
      >       context.write(key, new IntWritable(sum));
      >     }
      >   }
      > }
      >
      > 2. workflow.xml is:-
      > -------------------
      > <workflow-app xmlns='uri:oozie:workflow:0.1' name='map-reduce-wf'>
      >   <start to='hadoop1'/>
      >   <action name='hadoop1'>
      >     <map-reduce>
      >       <job-tracker>${jobTracker}</job-tracker>
      >       <name-node>${nameNode}</name-node>
      >       <prepare>
      >         <!-- <delete path="hdfs://localhost:8020/user/training/oozie/output"/> -->
      >       </prepare>
      >       <configuration>
      >         <property>
      >           <name>mapred.reducer.new-api</name>
      >           <value>true</value>
      >         </property>
      >         <property>
      >           <name>mapred.mapper.new-api</name>
      >           <value>true</value>
      >         </property>
      >         <property>
      >           <name>mapred.mapper.class</name>
      >           <value>org.myorg.WordCount$Map</value>
      >         </property>
      >         <property>
      >           <name>mapred.output.key.class</name>
      >           <value>org.apache.hadoop.io.Text</value>
      >         </property>
      >         <property>
      >           <name>mapred.output.value.class</name>
      >           <value>org.apache.hadoop.io.IntWritable</value>
      >         </property>
      >         <property>
      >           <name>mapred.reducer.class</name>
      >           <value>org.myorg.WordCount$Reduce</value>
      >         </property>
      >         <!-- <property>
      >           <name>mapred.input.format.class</name>
      >           <value>org.apache.hadoop.mapred.KeyValueInputFormat</value>
      >         </property>
      >         <property>
      >           <name>mapred.output.format.class</name>
      >           <value>org.apache.hadoop.mapred.TextOutputFormat</value>
      >         </property> -->
      >         <property>
      >           <name>mapred.input.dir</name>
      >           <value>${inputDir}</value>
      >         </property>
      >         <property>
      >           <name>mapred.output.dir</name>
      >           <value>${outputDir}</value>
      >         </property>
      >         <property>
      >           <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
      >           <value>true</value>
      >         </property>
      >       </configuration>
      >     </map-reduce>
      >     <ok to="end"/>
      >     <error to="fail"/>
      >   </action>
      >   <kill name="fail">
      >     <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      >   </kill>
      >   <end name="end" />
      > </workflow-app>
      >
      > I even tried uncommenting the mapred.input.format.class and mapred.output.format.class properties; it didn't help!
      >
      > Thanks & Regards,
      > Santhosh
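
      (Aside, a hedged reading that is not in the original thread: in the stack
      traces quoted below, the map frame is org.apache.hadoop.mapreduce.Mapper.map
      at Mapper.java:124, the base class rather than org.myorg.WordCount$Map. The
      base new-API Mapper is an identity function, so it forwards TextInputFormat's
      LongWritable byte offsets as keys, which matches the reported type mismatch
      and would mean the custom mapper class was never picked up. A minimal sketch
      of that default behavior:)

          import java.io.IOException;
          import org.apache.hadoop.mapreduce.Mapper;

          // Paraphrase of the base new-API Mapper's default map(): an identity
          // pass-through of whatever key/value the input format produced.
          public class IdentityLikeMapper<K, V> extends Mapper<K, V, K, V> {
            @Override
            protected void map(K key, V value, Context context)
                throws IOException, InterruptedException {
              // With TextInputFormat the key is a LongWritable byte offset, which
              // then collides with an output key class of Text.
              context.write(key, value);
            }
          }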
      >
      > --- In Oozie-users@yahoogroups.com, Mohammad Islam <kamrul@> wrote:
      > >
      > > Hi Santhosh,
      > > Could you please send your java code (after removing your business logic)?
      > > Also include your workflow.xml.
      > >
      > > Regards,
      > > Mohammad
      > >
      > >
      > > On 12/30/10 3:49 AM, "santhosh" <gsanthosh_12@> wrote:
      > >
      > > hi,
      > > Yes, I had read that thread. I'm using the additional new properties in the workflow.xml, but even after specifying them I'm still getting the same exception... What could be the reason?
      > > I can run the same job successfully without Oozie!
      > >
      > > Thanks & Regards,
      > > Santhosh
      > >
      > > the exception in map task is:-
      > > ------------------------------
      > > java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
      > > > at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:863)
      > > > at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:566)
      > > > at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      > > > at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
      > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
      > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
      > > > at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
      > > > at java.security.AccessController.doPrivileged(Native Method)
      > > > at javax.security.auth.Subject.doAs(Subject.java:396)
      > > > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
      > > > at org.apache.hadoop.mapred.Child.main(Child.java:211)
      > > > ... (the same stack trace is repeated for each of the remaining map tasks)
      > > >
      > >
      > > --- In Oozie-users@yahoogroups.com, Mohammad Islam <kamrul@> wrote:
      > > >
      > > > Hi Santhosh,
      > > > Looks like you are using the Hadoop 20 new API. The following email, sent a few days back, might give you a workaround and the associated warnings. Basically, you have to specify a couple of new properties in your workflow action definition.
      > > >
      > > > Regards,
      > > > Mohammad
      > > >
      > > > An old email sent a few days back:
      > > >
      > > > Since the new MR API (a.k.a. the Hadoop 20 API) is neither stable nor supported, the Hadoop team strongly recommends not using it. Instead, they recommend using the old API at least until Hadoop 0.22.x is released.
      > > >
      > > > The reasons behind this recommendation are as follows:
      > > > - You are guaranteed to need a rewrite once the API changes, so you would not be saving the cost of a rewrite.
      > > > - The API is not final and not mature. You would be taking on the risk and cost of testing the code, only to have it change on you in the future.
      > > > - There is a possibility of backward incompatibility, as the Hadoop 20 API is not approved. You would take on the risk of sorting out backward-incompatibility issues.
      > > > - There would not be any support effort if users run into a problem. You would take on the risk of maintaining unsupported code.
      > > >
      > > >
      > > > Having said that, there is a way of running MR jobs written with the 20 API in Oozie. Basically, you have to include the following properties in the MR action configuration:
      > > >
      > > > <property>
      > > >   <name>mapred.reducer.new-api</name>
      > > >   <value>true</value>
      > > > </property>
      > > > <property>
      > > >   <name>mapred.mapper.new-api</name>
      > > >   <value>true</value>
      > > > </property>
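
      (Aside, a hedged sketch that is not in the original mail: the two switches
      above can also be set programmatically on a Configuration, which is all the
      <property> elements amount to.)

          import org.apache.hadoop.conf.Configuration;

          public class NewApiFlags {
            public static Configuration withNewApi() {
              Configuration conf = new Configuration();
              // Same switches as the XML properties above, set in code.
              conf.setBoolean("mapred.mapper.new-api", true);
              conf.setBoolean("mapred.reducer.new-api", true);
              return conf;
            }
          }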
      > > >
      > > >
      > > > Regards,
      > > > Mohammad
      > > >
      > > > On 12/29/10 2:08 AM, "santhosh" <gsanthosh_12@> wrote:
      > > >
      > > > Hi,
      > > > I'm trying to run a job using Oozie 2.2.1+82 in which the mapper and reducer classes extend mapreduce.Mapper and mapreduce.Reducer respectively,
      > > > and my Hadoop version is 0.20.2,
      > > > but it's failing. When I look at the killed tasks in the TaskTracker web console,
      > > > I can see this error:
      > > >
      > > > java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
      > > > at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:863)
      > > > at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:566)
      > > > at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
      > > > at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
      > > > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      > > > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
      > > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
      > > > at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
      > > > at java.security.AccessController.doPrivileged(Native Method)
      > > > at javax.security.auth.Subject.doAs(Subject.java:396)
      > > > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
      > > > at org.apache.hadoop.mapred.Child.main(Child.java:211)
      > > >
      > > > for each of the map tasks.
      > > >
      > > > I can run the same job successfully without Oozie!
      > > >
      > > > I tried changing mapred.input.format.class to KeyValueTextInputFormat; it didn't help.
      > > > Can anyone tell me why I'm getting this exception?
      > > >
      > > > Thanks & Regards,
      > > > Santhosh
      > > >
      > >
      >
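
      (Closing aside, a hedged hypothesis that is not stated in the thread: under
      the Hadoop 0.20 new API, Job.setMapperClass and Job.setReducerClass write
      the keys mapreduce.map.class and mapreduce.reduce.class, not the old
      mapred.mapper.class / mapred.reducer.class names used in the workflows
      above. If the launcher only honors the new-API keys once the new-api flags
      are set, that would explain both the identity-mapper type mismatch and the
      reducer never starting. A minimal sketch showing which keys the new API
      writes:)

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.examples.WordCount;
          import org.apache.hadoop.mapreduce.Job;

          public class ShowNewApiKeys {
            public static void main(String[] args) throws Exception {
              Job job = new Job(new Configuration(), "word count");
              job.setMapperClass(WordCount.TokenizerMapper.class);
              job.setReducerClass(WordCount.IntSumReducer.class);
              // Prints the dollar-form binary names an action XML would need.
              System.out.println(job.getConfiguration().get("mapreduce.map.class"));
              System.out.println(job.getConfiguration().get("mapreduce.reduce.class"));
            }
          }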