
Media Type for resource archives

  • Jan Algermissen
    Message 1 of 7 , Jan 10, 2014
      Hi,

      I am thinking about a media type for bundling a bunch of resources[1] into a single file. Along with these files I want to store a manifest.

      One option would be to simply use a ZIP-based format with a manifest file under a well-known name.

      The problem is that useful stream processing of such a file is only possible if the manifest is guaranteed to be the first entry when unzipping. Apparently it takes some stunts to control the ordering of ZIP entries, and who knows whether the other end uses a compatible implementation.

      The workaround would be to unpack to disk first and go from there. Not nice.
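A streaming consumer of a ZIP file only sees the local file headers, in the order they were written; the central directory that lists every entry sits at the end of the archive. A minimal Python sketch of reading just the first entry, assuming the producer deliberately wrote a hypothetical `manifest.json` first, used stored or deflated entries, and put the sizes in the local header rather than in a trailing data descriptor. `first_entry` is an illustrative name; `zipfile.ZipFile` is used only to build the test archive, since reading with it needs a seekable file.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte ZIP local file header, all fields little-endian.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50
FLAG_DATA_DESCRIPTOR = 0x08  # sizes follow the data instead of the header


def first_entry(stream):
    """Return (name, data) of the first entry, reading the stream forward only."""
    (signature, _version, flags, method, _time, _date, _crc,
     compressed_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack(
        stream.read(LOCAL_HEADER.size))
    if signature != LOCAL_SIGNATURE:
        raise ValueError("stream does not start with a local file header")
    if flags & FLAG_DATA_DESCRIPTOR:
        # The producer did not know the size up front, so we cannot split by length.
        raise ValueError("entry size is deferred to a data descriptor")
    name = stream.read(name_len).decode("utf-8")
    stream.read(extra_len)  # skip the extra field
    data = stream.read(compressed_size)
    if method == zipfile.ZIP_DEFLATED:
        data = zlib.decompress(data, -15)  # raw deflate stream, no zlib header
    elif method != zipfile.ZIP_STORED:
        raise ValueError(f"unsupported compression method {method}")
    return name, data


# Usage: the producer has to write the manifest first on purpose.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("manifest.json", b'{"entries": ["a.txt"]}')
    zf.writestr("a.txt", b"hello")
buf.seek(0)
name, data = first_entry(buf)
print(name, data)
```

Had the producer called `writestr` in the opposite order, `first_entry` would return `a.txt`, which is the ordering dependency described above.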

      A possible alternative would be to use a multipart format where I can simply require the manifest to be the first part. Then just zip that file or rely on transfer encoding to reduce the bytes on the wire.

      Nice things about that:
      - Ordering is guaranteed
      - Full support for per-part MIME headers
      - Content-Length enables fast splitting of the parts
      - cid: URIs make for natural, standard URI-references inside the file
      - Stream processing without temporary storage
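A minimal Python sketch of the multipart alternative, assuming the producer always writes a per-part Content-Length header (RFC 2046 does not require one) and puts the manifest in the first part. `write_bundle`, `iter_parts`, `cid`, the boundary, the Content-IDs and the manifest's JSON shape are all hypothetical. Because each body is read by length, the reader never scans part data for the boundary and handles one part at a time without temporary storage.

```python
import io
import json


def write_bundle(out, boundary, parts):
    """Write (headers, body) pairs as a multipart stream with a Content-Length per part."""
    for headers, body in parts:
        out.write(b"--" + boundary + b"\r\n")
        for name, value in headers.items():
            out.write(f"{name}: {value}\r\n".encode("ascii"))
        out.write(f"Content-Length: {len(body)}\r\n\r\n".encode("ascii"))
        out.write(body + b"\r\n")
    out.write(b"--" + boundary + b"--\r\n")  # closing delimiter


def iter_parts(stream, boundary):
    """Yield (headers, body) one part at a time, splitting by Content-Length."""
    delimiter = b"--" + boundary
    while True:
        line = stream.readline().rstrip(b"\r\n")
        if line == delimiter + b"--":
            return
        if line != delimiter:
            raise ValueError(f"expected boundary, got {line!r}")
        headers = {}
        while line := stream.readline().rstrip(b"\r\n"):
            name, _, value = line.decode("ascii").partition(":")
            headers[name.strip().lower()] = value.strip()
        body = stream.read(int(headers["content-length"]))
        if stream.read(2) != b"\r\n":
            raise ValueError("part body must be followed by CRLF")
        yield headers, body


def cid(headers):
    """Per RFC 2392, "cid:x" refers to the part whose Content-ID is <x>."""
    return "cid:" + headers.get("content-id", "").strip("<>")


# Usage: the manifest is the first part and refers to the others by cid: URI.
boundary = b"bundle-7f3a"
manifest = {"resources": ["cid:a@bundle", "cid:b@bundle"]}
buf = io.BytesIO()
write_bundle(buf, boundary, [
    ({"Content-Type": "application/json", "Content-ID": "<manifest@bundle>"},
     json.dumps(manifest).encode("utf-8")),
    ({"Content-Type": "text/plain", "Content-ID": "<a@bundle>"}, b"hello"),
    ({"Content-Type": "application/json", "Content-ID": "<b@bundle>"}, b'{"n": 1}'),
])
buf.seek(0)
parts = iter_parts(buf, boundary)
_, manifest_body = next(parts)  # available before any other part is read
wanted = set(json.loads(manifest_body)["resources"])
for headers, body in parts:
    print(cid(headers), headers["content-type"], len(body), cid(headers) in wanted)
```

For the bundle's own media type, multipart/related (RFC 2387) may be a natural base: its root part defaults to the first body part unless a `start` parameter names another, and it is the setting in which cid: URIs (RFC 2392) are typically used.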

      I am interested in reactions to the two alternatives or any ideas beyond that.

      Jan

      [1] Well, obviously their entities at some point in time
    • Craig McClanahan
      Message 2 of 7 , Jan 10, 2014
        We (Jive) use multipart/form-data in lots of use cases, such as uploading a document (in a well-defined JSON format) plus some associated attachments. However, we don't require the JSON data to be the first body part; instead, we search through the available body parts for it using some well-defined heuristics and treat all the other body parts as attachments.

        Craig McClanahan
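Searching for the document among the parts, rather than requiring a fixed position, can be sketched with Python's standard `email` package. The heuristic below (the first part whose type is application/json or ends in +json is the document; everything else is an attachment) is a hypothetical stand-in, not Jive's actual rules, and `split_form` is an illustrative name.

```python
import json
from email import policy
from email.parser import BytesParser


def split_form(content_type, body):
    """Split a multipart/form-data body into (document, attachments)."""
    # The parser needs the Content-Type header (with its boundary) in front of the body.
    message = BytesParser(policy=policy.default).parsebytes(
        b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body)
    document, attachments = None, []
    for part in message.iter_parts():
        ctype = part.get_content_type()
        payload = part.get_payload(decode=True)  # bytes, transfer encoding removed
        if document is None and (ctype == "application/json" or ctype.endswith("+json")):
            document = json.loads(payload)
        else:
            attachments.append((part.get_filename(), ctype, payload))
    return document, attachments


# Usage: the JSON document is deliberately not the first part.
body = (
    b"--b\r\n"
    b'Content-Disposition: form-data; name="file"; filename="notes.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\n"
    b"hello\r\n"
    b"--b\r\n"
    b'Content-Disposition: form-data; name="doc"\r\n'
    b"Content-Type: application/json\r\n\r\n"
    b'{"title": "Q1 report"}\r\n'
    b"--b--\r\n"
)
document, attachments = split_form('multipart/form-data; boundary="b"', body)
print(document, attachments)
```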


      • Jonathan Ballard
        Message 3 of 7 , Jan 11, 2014
          Look at QUIC, and consider this format and sequence:

          <script src=... type="application/json+quic" var="abc" index="123"/>
          <script src=... type="application/json+quic" var="abc" index="456"/>
          <script src=... type="application/json+quic" var="abc" index="789"/>

          Let the browser recognize the same manifest (...), and let the extra microdata wrap up the statement:

          var abc[123] = ... ;

          A slight modification, replacing "var"+"index" with "part", is another solution. I hope something like this makes it into HTML6.



        • Matt McClure
          Message 4 of 7 , Jan 11, 2014

            Got a link describing QUIC in more detail? My Googling failed me.

          • Jonathan Ballard
            Message 5 of 7 , Jan 12, 2014
              Here is one link for experiments: http://blog.chromium.org/2013/06/experimenting-with-quic.html

              Since IPv6, we can use Content-Length to describe parts of the entire UDP transfer. From that base, there are several ideas for describing the contents of the parts, especially with Content-Type. QUIC adds connection identifiers.

              In my previous post, even without a manifest, the connection identifier could let each <script> be sourced from the same single transfer.

              I once used a similar implementation that batched 10,000+ separate requests into packets split only by time, not by size. That was before QUIC.

              Notice the trade-off: this wraps up <script>s without jQuery overhead (or JS loaders).


            • Edward Summers
              Message 6 of 7 , Jan 13, 2014
                Hi Jan,

                Have you run across the WARC format yet [1]? It was built for serializing representations of resources in the Web archiving domain (Internet Archive, etc.), but it might be relevant to your use case. Basically, a WARC file is a concatenation of HTTP responses, but you can also layer in the requests that generated them, DNS lookups, etc. Each WARC record has an ID, so the record IDs amount to a manifest, and you can layer in arbitrary metadata if necessary.
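Each WARC record is a version line, named header fields (WARC-Type, WARC-Record-ID, WARC-Date and Content-Length are the mandatory ones), a blank line, a content block of Content-Length bytes, and two CRLFs. A minimal Python sketch, assuming WARC/1.0 and "resource" records (a block without the HTTP response envelope); it is not a full WARC implementation (no gzip members, header continuation lines or digests), and `write_record`, `iter_records` and the example.org URIs are hypothetical. Reading only each record's header fields yields the manifest of record IDs and target URIs.

```python
import io
import uuid
from datetime import datetime, timezone


def write_record(out, target_uri, content_type, block):
    """Append one WARC/1.0 'resource' record and return its WARC-Record-ID."""
    record_id = f"<urn:uuid:{uuid.uuid4()}>"
    fields = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),
    ]
    out.write(b"WARC/1.0\r\n")
    for name, value in fields:
        out.write(f"{name}: {value}\r\n".encode("utf-8"))
    out.write(b"\r\n" + block + b"\r\n\r\n")  # a record ends with two CRLFs
    return record_id


def iter_records(stream):
    """Yield (fields, block) per record, splitting on Content-Length."""
    while version := stream.readline():
        if version.rstrip(b"\r\n") != b"WARC/1.0":
            raise ValueError(f"unexpected version line {version!r}")
        fields = {}
        while line := stream.readline().rstrip(b"\r\n"):
            name, _, value = line.decode("utf-8").partition(":")
            fields[name.strip().lower()] = value.strip()  # names are case-insensitive
        block = stream.read(int(fields["content-length"]))
        if stream.read(4) != b"\r\n\r\n":
            raise ValueError("record does not end with two CRLFs")
        yield fields, block


# Usage: write two records, then list them without keeping any block around.
buf = io.BytesIO()
ids = [
    write_record(buf, "http://example.org/a", "text/plain", b"hello"),
    write_record(buf, "http://example.org/b", "application/json", b'{"n": 1}'),
]
buf.seek(0)
manifest = [(f["warc-record-id"], f["warc-target-uri"]) for f, _ in iter_records(buf)]
print(manifest)
```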

                WARC is ISO 28500:2009 [2], and ISO make you pay for the spec :-( But implementors generally know that the latest draft before it went to ISO is available for free from the Bibliothèque nationale de France, which (along with many other national libraries) also uses WARC for its Web archiving efforts [3]. ArchiveTeam also have a decent list of software packages that support WARC [4].

                The ResourceSync effort might also be of interest, in particular their idea of a Resource Dump [5], although I believe work on ResourceSync is ongoing and may be in flux. Last time I looked, ResourceSync added some extensions to Google Sitemaps that let you point at a file in a ZIP archive and list its media type, byte length, and hash, which sounds a bit like what you might want out of a manifest.

                I’d be interested to hear what you come up with, whether you use either of these options or not.

                //Ed

                [1] https://en.wikipedia.org/wiki/Web_ARChive
                [2] http://www.iso.org/iso/catalogue_detail.htm?csnumber=44717
                [3] http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
                [4] http://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
                [5] http://www.openarchives.org/rs/0.9.1/resourcesync#ResourceDump


              • Jørn Wildt
                Message 7 of 7 , Jan 13, 2014
                  Hmmm, and what about the TAR format? http://www.fileformat.info/format/tar/corion.htm

                  /Jørn
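A tar archive is a sequence of members, each a 512-byte header followed by data padded to 512-byte blocks, so, like multipart, it can be consumed front to back. A minimal Python sketch using the standard `tarfile` module in its streaming mode (`"r|"`), assuming the producer adds a hypothetical `manifest.json` as the first member; the reader rejects the archive otherwise. `build_tar` and `stream_bundle` are illustrative names.

```python
import io
import json
import tarfile


def build_tar(entries):
    """Write (name, bytes) entries into an uncompressed tar, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def stream_bundle(stream):
    """Read a tar front to back ("r|" never seeks backwards); manifest must come first."""
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        members = iter(tar)
        first = next(members, None)
        if first is None or first.name != "manifest.json":
            raise ValueError("manifest.json must be the first member")
        manifest = json.load(tar.extractfile(first))
        contents = {}
        for member in members:
            if member.isfile():
                # In stream mode only the current member's data can be read.
                contents[member.name] = tar.extractfile(member).read()
    return manifest, contents


archive = build_tar([("manifest.json", b'{"entries": ["a.txt"]}'), ("a.txt", b"A")])
manifest, contents = stream_bundle(io.BytesIO(archive))
print(manifest, contents)
```

One gap relative to multipart: a tar member header carries a name, size, mode and timestamps but no media type. PAX extended headers (`TarInfo.pax_headers`) could carry one under a custom key, but that would be a private convention.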


                  ------------------------------------

                  Yahoo Groups Links

                  <*> To visit your group on the web, go to:
                      http://groups.yahoo.com/group/rest-discuss/
