Re: [Oozie-users] pre step for my m/r job

  • Alejandro Abdelnur
    Message 1 of 4 , Dec 7, 2010
      Hi Frank,

      You are correct: currently Oozie FS operations support exact path URIs
      (files or directories), not wildcards. You can move the complete
      directory.
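
      A minimal sketch of moving the complete directory, not a definitive
      workflow. It reuses the names from the snippet quoted below; the
      start, ok/error, kill and end nodes are added for illustration, and it
      assumes app1_processing_dir is a path relative to the name node root,
      like app1_input_dir.

      ```xml
      <workflow-app name="app1" xmlns="uri:oozie:workflow:0.2">
        <start to="app1-pre"/>
        <action name="app1-pre">
          <fs>
            <!-- Assumption: create the parent of the move target first;
                 mkdir also creates any missing parent directories. -->
            <mkdir path="${nameNode}/${app1_processing_dir}"/>
            <!-- Exact directory path as the source, no trailing wildcard. -->
            <move source="${nameNode}/${app1_input_dir}"
                  target="${nameNode}/${app1_processing_dir}/${wf:id()}"/>
          </fs>
          <ok to="end"/>
          <error to="fail"/>
        </action>
        <kill name="fail">
          <message>Move failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
      </workflow-app>
      ```

      After the move the input directory itself no longer exists, so whatever
      writes into it would have to recreate it.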

      Cheers.

      Alejandro

      On Wed, Dec 8, 2010 at 2:29 AM, Frank <frankmarit@...> wrote:
      > Hi,
      >
      > As a pre-step for running my map-reduce action, I'd like to move files from an input directory to a processing directory to make sure that if another instance of my job fires up they don't clobber each other.
      >
      > I was trying to do this with an fs action but it didn't work. Any suggestions? Here is the snippet from my workflow.xml:
      >
      > <workflow-app name='app1' xmlns="uri:oozie:workflow:0.2">
      >  <action name='app1-pre'>
      >    <fs>
      >       <move source='${nameNode}/${app1_input_dir}/*'
      >        target='${app1_processing_dir}/${wf:id()}'/>
      >    </fs>
      >  </action>
      > </workflow-app>
      >
      > Here is the error message:
      >
      > org.apache.oozie.action.ActionExecutorException: FS006: move, source path [hdfs://localhost:8020/user/me/input/*] does not exist
      >
      > There are definitely files in that directory but maybe this move only works with directories or specific files and not wildcards?
      >
      > Maybe there is a better way to do this?
      >
      > Thanks!
    • Frank
      Message 2 of 4 , Dec 7, 2010
        Thanks for the reply. Can you put in a change request for this feature? I don't really want to move the entire directory because a separate process will be adding files to the input directory at the same-ish time as this job will be grabbing files to process.

        I'm trying to do a drop box approach for my pipeline of jobs. Job1 looks for files in input directory, moves them to processing to be worked on by the job which writes its result to the output directory where job2 will pick it up and so forth. Maybe there is a better way to do this? I haven't looked at the coordinator stuff yet, but maybe this does what I want?

        Thanks!

      • Alejandro Abdelnur
        Message 3 of 4 , Dec 7, 2010
          Frank,

          In your approach, how do you ensure that all the files being moved are
          completely written and closed? Some of the files in the directory
          could be in the process of being written.

          An alternative approach would be for your dropbox-like process to
          keep rolling drop directories and, after a directory is rolled, to
          write a _SUCCESS file in it. At that point the coordinator would
          understand that the contents of the directory are complete, and it
          would start the corresponding workflow.

          For example, your dropbox-like process would roll directories every
          hour, /user/dropbox/$Y/$M/$D/$H. You could then have a coordinator
          job with a 1 hour frequency that looks for the corresponding hourly
          dir and kicks off the workflow when the _SUCCESS file is in there.
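
          A rough sketch of such a coordinator, under stated assumptions
          rather than a tested definition. The app name, the startTime,
          endTime and workflowAppPath properties, and the inputDir property
          passed to the workflow are all hypothetical. It also assumes each
          directory is named after the hour it started, so the action looks
          at the previous hour's directory.

          ```xml
          <coordinator-app name="dropbox-hourly" frequency="60"
                           start="${startTime}" end="${endTime}" timezone="UTC"
                           xmlns="uri:oozie:coordinator:0.1">
            <datasets>
              <!-- One dataset instance per hourly drop directory. -->
              <dataset name="dropbox" frequency="60"
                       initial-instance="${startTime}" timezone="UTC">
                <uri-template>${nameNode}/user/dropbox/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
                <!-- The instance counts as available once this file exists. -->
                <done-flag>_SUCCESS</done-flag>
              </dataset>
            </datasets>
            <input-events>
              <data-in name="input" dataset="dropbox">
                <!-- Assumption: the directory for the hour that just ended. -->
                <instance>${coord:current(-1)}</instance>
              </data-in>
            </input-events>
            <action>
              <workflow>
                <app-path>${workflowAppPath}</app-path>
                <configuration>
                  <property>
                    <!-- Hypothetical property read by the workflow. -->
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                  </property>
                </configuration>
              </workflow>
            </action>
          </coordinator-app>
          ```

          The frequency is in minutes. Each materialized action waits until
          the _SUCCESS flag appears in its directory before the workflow
          starts, so no file moves are needed to tell complete data from
          data still being written.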

          Hope this helps.

          Alejandro
