Re: [newsml] Re: NewsML MIME type

  • Rob Warner
    Message 1 of 13 , Sep 4, 2003
      This is a different case than data formats that are based on plain text.

      In the case of NewsML, the fact that it's based on XML means the content
      has a well-defined structure which, in and of itself, has utility,
      separate from the actual meaning of the data.

      This fact allows one to use powerful technologies such as XPointer &
      XLink to refer to elements within arbitrary XML documents, without
      having to understand the actual grammar in use. It also allows search
      engines to do much more flexible things with the data than if it were
      treated as either an opaque chunk of binary, or just plain text.

      Of course it's true that the type can be determined by inspection, but
      in an HTTP context that would require retrieving the whole document, and
      we all know that NewsML is not a terse grammar, so that would be an
      expensive operation.
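
To make "determined by inspection" concrete, here is a minimal sketch in Python. It assumes the document bytes are already in hand as a stream; the helper name `root_element_name` and the sample document are illustrative only and come from nowhere in this thread. The parser has to see at least the start of the body before it can report the root element, so over HTTP a GET for the document must already be under way.

```python
import io
import xml.etree.ElementTree as ET


def root_element_name(stream):
    """Return the tag of the root element, or None if the stream is not well-formed XML."""
    try:
        for _event, element in ET.iterparse(stream, events=("start",)):
            # The first "start" event is always the root element.
            return element.tag
    except ET.ParseError:
        return None
    return None


# Hypothetical sample; a real NewsML document is far larger.
sample = io.BytesIO(b'<?xml version="1.0"?><NewsML><NewsEnvelope/></NewsML>')
print(root_element_name(sample))
```

Note that this only answers "is it NewsML?" after body bytes have arrived, which is the cost being described.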

      If we have a suitably expressive MIME type, full body retrieval wouldn't
      be necessary - the headers can be retrieved separately, potentially
      saving a huge amount of extra bandwidth & processing.
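
A rough sketch of the header-only check, again in Python. The function names are hypothetical, and it assumes the server answers HEAD requests and sends an accurate Content-Type. The test for XML follows the RFC 3023 convention: text/xml, application/xml, or any subtype ending in +xml.

```python
import urllib.request


def is_xml_media_type(content_type):
    """True if a Content-Type value names an XML type under the '+xml' convention."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


def served_as_xml(url, timeout=10):
    """Issue a HEAD request (no body is transferred) and look only at Content-Type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_xml_media_type(response.headers.get("Content-Type", ""))


print(is_xml_media_type("text/vnd.IPTC.NewsML"))                  # False
print(is_xml_media_type("application/xhtml+xml; charset=utf-8"))  # True
```

With the currently registered type, a generic XML tool doing this check has no way to tell that the body is XML without fetching it.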

      Basically, if we don't have a MIME type ending in +xml we're effectively
      saying that XPointer, XLink & any other technology that exploits the
      fact that a document is XML cannot be used efficiently on NewsML,
      especially over slow links. That may limit the adoption of NewsML in
      certain circumstances (some developing countries for instance). Read §7
      of http://www.ietf.org/rfc/rfc3023.txt for more good reasons we should
      seriously consider assigning a new type.

      Quite apart from the technical arguments, there's the "everyone else is
      doing it" factor - a quick search picked up the following types that are
      already assigned:

      application/beep+xml
      application/cnrp+xml
      application/cpl+xml
      application/reginfo+xml
      application/vnd.criticaltools.wbs+xml
      application/vnd.irepository.package+xml
      application/vnd.liberty-request+xml
      application/vnd.llamagraphics.life-balance.exchange+xml
      application/vnd.mozilla.xul+xml
      application/vnd.pwg-xhtml-print+xml
      application/watcherinfo+xml
      application/xhtml+xml

      Some other, as-yet unregistered ones include image/svg+xml,
      application/sbml+xml, application/xenc+xml and application/mathml+xml.

      Clearly all these groups see value in identifying their content
      additionally as XML, not just some opaque datatype that can only be
      understood by special tools. I feel strongly that it makes sense for
      NewsML to follow suit.

      cheers,

      Rob

      On Wed, 2003-08-13 at 12:15, Jayson Lorenzen wrote:
      > Michael, thank you for responding. I am sorry I missed your previous
      > posting somehow and was under the impression the use of a new mimetype
      > had already been decided.
      > I agree that PDF, Postscript and RTF are plain text formats with an
      > internal structure, and these definitely need a mimetype, as opening a
      > Postscript file as text and expecting to programmatically figure out
      > that it is in fact Postscript could be difficult. However, NewsML is
      > XML, and the root element is going to say so.
      > Anyway, thanks for listening and I am glad you are considering all
      > options.
      >
      > Jayson Lorenzen
      > Business Wire
      >
      > >>> mdirector@... 08/13/03 01:39AM >>>
      > Jayson,
      >
      > as I pointed out in a previous posting the problem with MIME types
      > is that they are not only related to the "raw" format of data. PDF,
      > Postscript or RTF are in fact plain text formats with an internal
      > structure usually even asserting the type of structure in the
      > headline of the "text". In spite of this all these formats have
      > their specific MIME type.
      >
      > But as I already posted: we at IPTC will consider this.
      >
      > Michael
      > MD IPTC
      >
      >
      > --- In newsml@yahoogroups.com, Jayson Lorenzen
      > <jayson_lorenzen@y...> wrote:
      > > I think I agree with Bill here. If it is XML then why
      > > not call it XML, and if the user needs to know if it's
      > > NewsML or not, can't they look at the root element? It
      > > is <NewsML>, no?
      > >
      > > j
      > > --- Bill Kearney <ml_yahoo@i...> wrote:
      > > > > It seems the IPTC has already registered text/vnd.IPTC.NewsML:
      > > > > http://www.iana.org/assignments/media-types/text/ (along with
      > > > > text/vnd.IPTC.NITF).
      > > > >
      > > > > This has all the disadvantages of not being identified as XML, as
      > > > > listed in §7 of RFC 3023 http://www.ietf.org/rfc/rfc3023.txt.
      > > > >
      > > > > Would there be any support for a new registration for something like
      > > > > text/NewsML+xml, application/NewsML+xml or even
      > > > > text/vnd.IPTC.NewsML+xml, followed by official deprecation of the old
      > > > > registered types? I haven't yet read through the RFCs to determine if
      > > > > those types are actually valid, or which we might prefer.
      > > >
      > > > It would seem like those are the best choices to use. There's
      > > > nothing about a MIME type's prefix/suffix structure that any
      > > > processing environments know to 'care about'. Using hamsandwich/xml
      > > > won't help *anything* understand that the data is in XML form. The
      > > > MIME type strings are simply identifiers.
      > > >
      > > > So unless the IPTC registrations of those types are for a format of
      > > > data that's /not/ XML then they're perfectly appropriate for use
      > > > here. If, however, the format of data intended by the registrations
      > > > is NOT intended to be XML (plain text or binary) then it stands to
      > > > reason some new set would have to get created.
      > > >
      > > > Also note that using a MIME type requires telling the thing serving
      > > > up the data to properly send it using that type. Most web serving
      > > > environments and e-mail make it possible to properly specify the MIME
      > > > type. Apache uses AddType directives, for example. So don't think
      > > > it's only about file extension mapping.
      > > > -Bill Kearney
      > > > Syndic8.com
      > > >
      > > >
      >
      > To Post a message, send it to: newsml@...
      >
      > To Unsubscribe, send a blank message to:
      > newsml-unsubscribe@...
      >
      > Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.
    • Misha Wolf
      Message 2 of 13 , Sep 5, 2003
        I find Rob's arguments convincing.

        Misha Wolf
        Standards Manager
        Content Architecture Group

        Telephone +44 20 7542 6722
        Mobile +44 7990 56 6722
        Email misha.wolf@...
        Reuters Messaging misha.wolf.reuters.com@...


        -----Original Message-----
        From: Rob Warner [mailto:warnerrob@...]
        Sent: 05 September 2003 02:21
        To: newsml@yahoogroups.com
        Subject: Re: [newsml] Re: NewsML MIME type








        --------------------------------------------------------------- -
        Visit our Internet site at http://www.reuters.com

        Get closer to the financial markets with Reuters Messaging - for more
        information and to register, visit http://www.reuters.com/messaging

        Any views expressed in this message are those of the individual
        sender, except where the sender specifically states them to be
        the views of Reuters Ltd.