
Re: NewsIsFree.com changes.xml file?

  • mwkrus
    Message 1 of 24 , Dec 19, 2001
      Hi there,

      --- In syndic8@y..., Tara Calishain <calumet@m...> wrote:
      > Is Mike Krus out there? I understand from Daypop that he's doing a
      > changes.xml file which Daypop is using to grab new headline files.
      > Anyone know where I can get it, or am I off base?
      I'm here ;-)


      See:
      http://www.newsisfree.com/sources/changes/ for an HTML rendering
      and:
      http://www.newsisfree.com/HPE/xml/changes.xml for the XML
      (no RSS, will do if people want it)

      It's a little different from the weblogs.com XML file (hence the
      different version number). Each entry has three extra fields:
      - id: the id of the feed on NewsIsFree
      - s8: the id of the feed on Syndic8 (0 if no match is recorded in
      our db)
      - decay: the date of the oldest item for that feed in our database.
      It's to be computed in the same way as the "when" date (i.e.
      convert the file's update field to seconds, then subtract the value
      of the field for the feed) (this isn't clear, is it? ;-). That field
      is used by Daypop to clean its index when our links die.
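A minimal sketch of the "when" and "decay" arithmetic described above, in Python. The sample document is hypothetical: the element and attribute names (`weblogUpdates`, `weblog`, `updated`, `when`) are assumed to follow the weblogs.com layout, and `id`, `s8`, and `decay` are assumed to be attributes on each entry. The real NewsIsFree file may be laid out differently.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

# Hypothetical sample; names and layout are assumptions, not the real file.
SAMPLE = """<weblogUpdates updated="Wed, 19 Dec 2001 19:05:00 GMT">
  <weblog name="Example Feed" url="http://example.com/" when="120"
          id="42" s8="0" decay="86400"/>
</weblogUpdates>"""


def parse_changes(xml_text):
    """Turn the relative 'when' and 'decay' offsets into absolute dates."""
    root = ET.fromstring(xml_text)
    # The file-level update time is the reference point for every entry.
    updated = parsedate_to_datetime(root.attrib["updated"])
    entries = []
    for node in root.iter("weblog"):
        entries.append({
            "name": node.attrib["name"],
            "id": int(node.attrib["id"]),
            # s8 is 0 when no Syndic8 match is recorded; map that to None.
            "s8": int(node.attrib["s8"]) or None,
            # Both fields are seconds to subtract from the file update time.
            "changed_at": updated - timedelta(seconds=int(node.attrib["when"])),
            "oldest_item_at": updated - timedelta(seconds=int(node.attrib["decay"])),
        })
    return entries
```

An indexer along the lines of what Daypop does could then drop any stored link older than `oldest_item_at` for that feed.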


      Mike
    • Tara Calishain
      Message 2 of 24 , Dec 19, 2001
        At 02:05 PM 12/19/2001, you wrote:
        >Hi there,
        >
        >--- In syndic8@y..., Tara Calishain <calumet@m...> wrote:
        > > Is Mike Krus out there? I understand from Daypop that he's doing a
        > > changes.xml file which Daypop is using to grab new headline files.
        > > Anyone know where I can get it, or am I off base?
        >I'm here ;-)
        >
        >
        >See:
        >http://www.newsisfree.com/sources/changes/ for an HTML rendering
        >and:
        >http://www.newsisfree.com/HPE/xml/changes.xml for the XML
        >(no RSS, will do if people want it)
        >
        >It's a little different from the weblogs.com XML file (hence the
        >different version number). Each entry has three extra fields:
        >- id: the id of the feed on NewsIsFree
        >- s8: the id of the feed on Syndic8 (0 if no match is recorded in
        > our db)
        >- decay: the date of the oldest item for that feed in our database.
        > It's to be computed in the same way as the "when" date (i.e.
        > convert the file's update field to seconds, then subtract the value
        > of the field for the feed) (this isn't clear, is it? ;-). That field
        > is used by Daypop to clean its index when our links die.
        >
        >
        >Mike

        Thanks Mike. Google sent me an announcement that they're now
        offering a news page on their site. I asked them, "Why aren't you using
        the changes.xml feed from NewsIsFree, like Daypop is? That way you'd
        be assured of a fresh information listing."

        They wrote back that they'd never heard of it and would I please
        point them to it. So I'll send them this information and keep you posted.
        I'm really trying to get a major search engine to discover changes.xml
        files (not that Danny isn't a lovely human being because he is.)

        Best,

        Tara

        Tara Calishain / tara@...
        ----
        ResearchBuzz -- Search engine, database,
        and online collection news since 1998!
        http://www.researchbuzz.com
      • Tara Calishain
        Message 3 of 24 , Dec 19, 2001
          At 02:14 PM 12/19/2001, you wrote:
          >Wow.
          >
          >Talk about a mind bomb.
          >
          >I've gotta figure out a way to use this.

          I want to know why the major news search engines
          (RocketNews, FAST, Northern Light, Yahoo Daily,
          Moreover-Via-AltaVista) don't use changes.xml files if
          they're available. Yahoo has an excuse because they
          keep all their content on their sites (which allows for
          deep if not wide searching), and Moreover-via-
          AltaVista does do RSS scraping but they're picky
          about what they accept.

          But what about FAST and RocketNews and Northern
          Light? Hell, Excite could rebuild their news search
          based on RSS files if they wanted to (Excite NewsTracker
          is dead, unfortunately -- news searches on Excite are
          now handled by Dogpile Newscrawler.)

          Tara


          Tara Calishain / tara@...
          ----
          ResearchBuzz -- Search engine, database,
          and online collection news since 1998!
          http://www.researchbuzz.com
        • burton@openprivacy.org
          Message 4 of 24 , Dec 19, 2001
            -----BEGIN PGP SIGNED MESSAGE-----
            Hash: SHA1

            Julian Bond <julian_bond@...> writes:

            > In article <87pu5cdkou.fsf@...>, burton@...
            > writes
            > >You know that little icon that says "XML"? That should be a view-source icon.
            > >This is what we are doing for Reptile.
            >
            > This cannot be right, can it? the most common use of this icon at the
            > moment, is right click, copy shortcut followed by paste into my
            > favourite news reader. If you add "view source:" to the front of the
            > url, i'll just have to delete it again. If the xml is served from the
            > source with text/xml and I click on it, then my browser should try and
            > do something useful with it. Like display it.
            >
            > Is "view source:" really in the html4 and xhtml standards?

            Ah... no. I just think it is an accepted convention. AKA a de facto
            standard. Every major browser seems to support it.

            > This is an issue that is becoming more and more frequent. A link (maybe
            > presented with a button) that points at xml instead of http:, mailto:
            > etc. It's an issue that should be dealt with by the standards bodies not
            > individual implementations.

            Nah. Everyone is free to create URIs... just they have to prove that they are
            useful.

            I see your point though but I don't think it matters here. :)

            Kevin

            - --
            Kevin A. Burton ( burton@..., burton@..., burtonator@... )
            Location - San Francisco, CA, Cell - 415.595.9965
            Jabber - burtonator@..., Web - http://relativity.yi.org/

            Give a man a flame and keep him warm for the night. Set him on fire and keep
            him warm for the rest of his life.
            -----BEGIN PGP SIGNATURE-----
            Version: GnuPG v1.0.6 (GNU/Linux)
            Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt

            iD8DBQE8IQRxAwM6xb2dfE0RAqYkAKCNgZykkFJWEUKw2TJ1tdKVtYRHKwCgm2ZO
            oiLXJ9UbrI7ruC7Ul00Tk40=
            =q4eR
            -----END PGP SIGNATURE-----
          • burton@openprivacy.org
            Message 5 of 24 , Dec 19, 2001
              -----BEGIN PGP SIGNED MESSAGE-----
              Hash: SHA1

              Jeff Barr <jeff@...> writes:

              > The XML tab of the Syndic8 feedinfo page has a link called "Download
              > XML". It is
              > available only to logged-in users (because I don't want to be a
              > redistribution site for
              > the XML).
              >
              > It outputs the proper "text/xml" content-type to get IE to use its special
              > "XML editor" mode.

              Yeah. Mozilla *really* needs to do something different for text/xml content :(

              > I just hacked (and then unhacked) it to send view-source:, hoping that Mozilla
              > would do something cool with it, but nothing happened. So I will leave it the
              > way it started.

              hm. Works fine here. What version of mozilla? I think this worked for me but
              I only tested it on > 0.9.5
              <snip>

              - --
              Kevin A. Burton ( burton@..., burton@..., burtonator@... )
              Location - San Francisco, CA, Cell - 415.595.9965
              Jabber - burtonator@..., Web - http://relativity.yi.org/

              Yes I know my enemies, they're the teachers who taught me to fight me;
              compromise, conformity, assimilation, submission, ignorance, hypocrisy,
              brutality, The Elite. All of which are American Dreams.
              -----BEGIN PGP SIGNATURE-----
              Version: GnuPG v1.0.6 (GNU/Linux)
              Comment: Get my public key at: http://relativity.yi.org/pgpkey.txt

              iD8DBQE8IQTYAwM6xb2dfE0RAv6vAJ99PGySAtY7hH+2SJNUyKn8kQmAUwCfSp3P
              5m+f/r24v9q0btMFUoWfCZw=
              =Wx16
              -----END PGP SIGNATURE-----
            • Julian Bond
              Message 6 of 24 , Dec 20, 2001
                In article <163201c188c1$6f8e31a0$33a1dc40@murphy>, Dave Winer
                <dave@...> writes
                >Talk about a mind bomb.
                >I've gotta figure out a way to use this.
                >> From: mwkrus
                >> http://www.newsisfree.com/HPE/xml/changes.xml for the XML

                I've been pondering the implications here between the centralized
                approach of weblogs.com and NIF and the decentralized approach of the
                0.92 cloud element.

                The de-centralized version has the reader subscribe to each feed and
                then get a call back when each feed changes. It then goes and collects
                the new data, or reads it out of the callback. The biggest problem with
                this is that desktop readers are unlikely to have a public IP and hence
                are unlikely to be accessible to the feed source.

                Some time ago I suggested that there ought to be an RSS version of
                weblogs.com. I wonder if Dave misunderstood me when he started producing
                an RSS version of the changes.xml file. What I meant was that when a
                feed changes, it informs the aggregator (say my.userland.com) which then
                publishes a changes.xml file containing the URL of the feed XML. RSS
                readers could then use this to collect the new RSS from feeds which had
                changed as they now know the URL of the changed RSS. This is the
                centralized route with a central hub.

                There's a possible enhancement here that the Ping function that tells
                the central site that a feed has changed could contain the <item> tag(s)
                with the new content. This would save the central site having to collect
                the feed when most of the feed file will not have changed.

                The pattern here is the usual one of avoiding firewall/NAT problems by
                using a relay. And in this case the relay is the central changes.xml
                system.

                Taking this a stage further, secondary aggregators such as Daypop could
                subscribe to the central site and receive a Ping whenever changes.xml
                changed. This again could contain the new changed content. That way we
                avoid another layer of polling.
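The central-relay pattern described above can be sketched roughly as follows, in Python. Everything here is hypothetical (the `ChangesHub` class, its method names, and the in-memory storage): feed sources call `ping` when they change, optionally passing the new items along, and readers behind a firewall or NAT poll `changes_since` instead of waiting for a callback.

```python
import time


class ChangesHub:
    """Hypothetical in-memory relay: feeds ping it, readers poll it."""

    def __init__(self):
        # feed_url -> (timestamp of last change, list of new items)
        self._changes = {}

    def ping(self, feed_url, new_items=(), now=None):
        """Record that feed_url changed; new_items may carry the new <item> data."""
        stamp = time.time() if now is None else now
        self._changes[feed_url] = (stamp, list(new_items))

    def changes_since(self, since):
        """Return the feeds that changed after `since`, newest first."""
        recent = [
            (stamp, url, items)
            for url, (stamp, items) in self._changes.items()
            if stamp > since
        ]
        recent.sort(key=lambda entry: entry[0], reverse=True)
        return [
            {"url": url, "changed_at": stamp, "items": items}
            for stamp, url, items in recent
        ]
```

A reader would remember the time of its last poll and pass it as `since`, so each poll only returns feeds it has not yet seen change.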

                --
                Julian Bond email: julian_bond@...
                CV/Resume: http://www.voidstar.com/cv/
                WebLog: http://www.voidstar.com/
                M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
                ICQ:33679568 tag:So many words, so little time
              • Mike Krus
                Message 7 of 24 , Dec 20, 2001
                  Hi,

                  Julian Bond wrote:

                  > What I meant was that when a
                  > feed changes, it informs the aggregator (say my.userland.com) which then
                  > publishes a changes.xml file containing the URL of the feed XML. RSS
                  > readers could then use this to collect the new RSS from feeds which had
                  > changed as they now know the URL of the changed RSS. This is the
                  > centralized route with a central hub.
                  I don't think the changes.xml file should contain the urls to the
                  rss files. That file is read very often, so it should be kept as
                  small as possible. I extended it a little to add ids, to make
                  finding the actual rss feed easier. But its main purpose is
                  not to help find new links to RSS files.

                  The main problem, IMHO, is the other way around: if I have
                  an RSS file, where do I look for its update notifications?
                  Some time back, I suggested this:
                  ( http://groups.yahoo.com/group/syndication/message/2642 )

                  Maybe we can extend RSS .9* to include that place:
                  <channel>
                  ...
                  <updates>http://www.weblogs.com/changes.xml</updates>
                  </channel>

                  But didn't get much feedback.
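A minimal sketch of how a reader might use the proposed element, in Python. Note that `<updates>` is only the suggestion made above, not part of any published RSS version, and the sample feed and the `find_updates_url` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed carrying the proposed (non-standard) <updates> element.
RSS = """<rss version="0.92">
  <channel>
    <title>Example</title>
    <link>http://example.com/</link>
    <updates>http://www.weblogs.com/changes.xml</updates>
  </channel>
</rss>"""


def find_updates_url(rss_text):
    """Return the changes.xml URL named by the channel, or None if absent."""
    channel = ET.fromstring(rss_text).find("channel")
    if channel is None:
        return None
    node = channel.find("updates")
    if node is None or not node.text:
        return None
    return node.text.strip()
```

A reader that finds no `<updates>` element would simply fall back to polling the feed itself.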

                  I really like the idea of the changes.xml file because it's
                  easy to write, easy to use, and, as you rightly point out,
                  not affected by firewalls and NATs.


                  Mike

                  --
                  NewsIsFree http://www.newsisfree.com/
                  We serve 20000+ news feeds a day, for free. Please support us!
                  http://www.paypal.com/xclick/business=donations%40newsisfree.com
                • wkearney99
                  Message 8 of 24 , Dec 20, 2001
                    > The de-centralized version has the reader subscribe to each feed and
                    > then get a call back when each feed changes. It then goes and
                    > collects the new data, or reads it out of the callback. The biggest
                    > problem with this is that desktop readers are unlikely to have a
                    > public IP and hence are unlikely to be accessible to the feed
                    > source.

                    Shouldn't a client only register with a cloud for callbacks if it can
                    receive them? Shouldn't a firewalled machine avoid using a cloud
                    since it can't accept callbacks?

                    And shouldn't that same client be using an internal HTTP proxy?
                    This, of course, suggests that the feed creators take proxy caching
                    into account (use real values, not just cache avoidance).

                    > There's a possible enhancement here that the Ping function that
                    > tells the central site that a feed has changed could contain the
                    > <item> tag(s) with the new content. This would save the central
                    > site having to collect the feed when most of the feed file will
                    > not have changed.

                    This would assume several things. One is that the items themselves
                    can be requested separately from the entire feed. The other is that the
                    CPU overhead in parsing this would be less of a burden than dragging
                    the "entire" RSS file across the wire again. I have to think the CPU
                    overhead is worse. Grabbing just the changed items would require the
                    client to calculate where to start. Then the server would have to be
                    able to collect and deliver just those items. This places a
                    significant amount of new CPU processing on the part of the feed
                    server.

                    It *would* be helpful if the items themselves had some sort of
                    identifying information in them. An ID and/or a timestamp would be a
                    good start. The current methods of duplicate checking are less than
                    perfect.
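As a rough illustration of the duplicate-checking problem, here is a minimal Python sketch. It assumes items arrive as plain dictionaries and that an `id` key, where present, is an identifier like the one wished for above. Without one, it falls back to hashing the link and title, which is exactly the kind of imperfect check being described: an edited title looks like a new item.

```python
import hashlib


def item_key(item):
    """Prefer an explicit id; otherwise fingerprint the link and title."""
    explicit = item.get("id")
    if explicit:
        return "id:" + explicit
    raw = "\n".join(item.get(field, "").strip() for field in ("link", "title"))
    return "sha1:" + hashlib.sha1(raw.encode("utf-8")).hexdigest()


def new_items(seen, items):
    """Return the items not seen before, recording their keys in `seen`."""
    fresh = []
    for item in items:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh
```

The `seen` set would need to be persisted between polls for this to be useful in a real reader.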

                    -Bill Kearney
                  • Julian Bond
                    Message 9 of 24 , Dec 20, 2001
                      In article <9vski1+v236@...>, wkearney99
                      <wkearney99@...> writes
                      >Shouldn't a client only register with a cloud for callbacks if it can
                      >receive them? Shouldn't a firewalled machine avoid using a cloud
                      >since it can't accept callbacks?

                      Of course. But this situation makes callbacks impossible, hence the need
                      for a central relay of some sort.

                      >And shouldn't that same client be using an internal HTTP proxy?
                      >This, of course, suggests that the feed creators take proxy caching
                      >into account (use real values not just cache avoidance)

                      As a feed creator, I ought to be saving the feed as a local xml file.
                      Then it would automatically be cached along the way via normal http. But
                      I haven't coded that bit yet so it's being created out of the database
                      each time. I'm not alone in this. More work, sigh.
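For feeds that are generated from a database, one way to stay cache-friendly is to answer conditional requests by hand. The Python sketch below is an assumption-laden illustration, not a description of any site mentioned here: `respond` is a hypothetical helper that takes the time the feed content last changed (epoch seconds) and the client's `If-Modified-Since` header, and decides between a full response and a 304.

```python
from email.utils import formatdate, parsedate_to_datetime


def respond(feed_mtime, if_modified_since=None):
    """Return (status, headers) for a feed last changed at feed_mtime."""
    headers = {"Last-Modified": formatdate(feed_mtime, usegmt=True)}
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            # Unparseable header: ignore it and send the full feed.
            client_time = None
        # HTTP dates have one-second resolution, so compare whole seconds.
        if client_time is not None and int(feed_mtime) <= client_time:
            return 304, headers
    return 200, headers
```

Proxies and readers that send back the `Last-Modified` value they were given then get a 304 until the feed actually changes.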

                      >This would assume several things. One that the items themselves can
                      >be requested separate from the entire feed. The other is that the
                      >CPU overhead in parsing this would be less of a burden than dragging
                      >the "entire" RSS file across the wire again.

                      There may be a simplification here like only passing back the latest
                      entry. But then you're likely to miss updates occasionally. Oh well.

                      I still think there's a start here, where there is a changes.xml
                      somewhere that provides the URL of the xml feeds instead of the html
                      representations.

                      --
                      Julian Bond email: julian_bond@...
                      CV/Resume: http://www.voidstar.com/cv/
                      WebLog: http://www.voidstar.com/
                      M: +44 (0)77 5907 2173 T: +44 (0)192 0412 433
                      ICQ:33679568 tag:So many words, so little time
                    • wkearney99
                      Message 10 of 24 , Dec 20, 2001
                        > I still think there's a start here, where there is a changes.xml
                        > somewhere that provides the URL of the xml feeds instead of the html
                        > representations.

                        Right, on this we agree. How the clarity of this has escaped others
                        mystifies me.

                        -Bill Kearney
                      • Mike Krus
                        Message 11 of 24 , Dec 20, 2001
                          Hi,

                          wkearney99 wrote:

                          >>I still think there's a start here, where there is a changes.xml
                          >>somewhere that provides the URL of the xml feeds instead of the html
                          >>representations.
                          >
                          > Right, on this we agree. How the clarity of this has escaped others
                          > mystifies me.
                          I don't! changes.xml is NOT a place to discover new RSS files. It
                          can be used outside the context of RSS.

                          I think syndic8 is the perfect tool to find the matching RSS
                          file for any given source, just look up the service list.


                          Mike

                          --
                          NewsIsFree http://www.newsisfree.com/
                          We serve 20000+ news feeds a day, for free. Please support us!
                          http://www.paypal.com/xclick/business=donations%40newsisfree.com
                        • Mark Paschal
                          Message 12 of 24 , Dec 20, 2001
                            Julian Bond wrote:
                            > What I meant was that when a feed changes, it informs the aggregator
                            > (say my.userland.com) which then publishes a changes.xml file
                            > containing the URL of the feed XML.

                            In theory, is that not what the weblogs.com RSS "community" is for? (The
                            operative words being "in theory," since there's nothing there right now.)

                            http://newhome.weblogs.com/discuss/msgReader$44
                            http://newhome.weblogs.com/directory/11/outputFiles/communities/rss


                            Mark Paschal
                            markpasc@...
                          • wkearney99
                            Message 13 of 24 , Dec 26, 2001
                              Kevin may deserve some vindication here. That view-source URL
                              handler is actually mentioned:
                              http://www.w3.org/Addressing/schemes.html

                              It appears to be documented at:
                              http://developer.netscape.com/docs/manuals/js/client/jsref/location.htm#1193181

                              Whether this is a useful addition to Syndic8 remains open for
                              debate. I'd like to see links that allow opening the site and data
                              URLs while in the edit view. I realize these would only contain the
                              links as they existed at page generation, unless JavaScript
                              were used to pick up the live contents of the fields.

                              -Bill Kearney