Ennui of updatePeriod 'monthly'

  • Sean M. Burke
    Message 1 of 10 , Jan 12, 2004
      At 02:23 PM 2004-01-09, kelliottmccrea wrote:
      >[...]I wrote my notes down at:
      >http://laughingmeme.org/archives/000392.html [...]
      >http://groups.yahoo.com/group/rss-dev/message/6009 [...]

      OK, I've fretted and puzzled over this for a couple hours, and I've got
      existential dread toward updatePeriod=yearly and especially
      updatePeriod=monthly, and everything about them.

      My biggest worry is that updatePeriod=yearly and updatePeriod=monthly are
      unavoidably vague concepts, in that reasonable individuals could arrive at
      different interpretations of exactly what this could mean:

      <sy:updatePeriod>monthly</sy:updatePeriod>
      <sy:updateFrequency>1</sy:updateFrequency>
      <sy:updateBase>2000-01-10</sy:updateBase>

      I could interpret this to mean "publishes on/by the tenth of every month"
      (at/by midnight GMT). (I will tidily dodge the horrible issues of
      timezones in this post.)
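A minimal sketch of this first, calendar-based reading, assuming the update falls at 00:00 UTC on a fixed day of the month. The helper name `next_monthly_publish` is hypothetical, and the sketch only handles days 1 to 28, which exist in every month.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: the first "day $mday of the month, 00:00 UTC"
# that falls strictly after $now. Assumes 1 <= $mday <= 28.
sub next_monthly_publish {
    my ($now, $mday) = @_;
    my (undef, undef, undef, undef, $mon, $year) = gmtime($now);
    $year += 1900;

    # Try the publishing day in the current calendar month first.
    my $t = timegm(0, 0, 0, $mday, $mon, $year);
    return $t if $t > $now;

    # Otherwise roll over to the next month (and year, after December).
    ($mon, $year) = $mon == 11 ? (0, $year + 1) : ($mon + 1, $year);
    return timegm(0, 0, 0, $mday, $mon, $year);
}

# "Now" is taken from the sample run in this post: Mon Jan 12 23:42:28 2004.
my $now = 1_073_950_948;
print scalar(gmtime(next_monthly_publish($now, 10))), "\n";
```

Under this reading there is exactly one answer for the sample "now": the tenth of February 2004.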

      Or I could interpret this to mean "publishes at/by the start of every
      SecsInMonth interval, starting from 2000-01-10T00:00Z", where SecsInMonth
      is 24*60*60 times uh... how many days?

      A constant 30 days-to-a-month?
      A constant 30.5? (as given in Ian's post, above)
      A constant 31?
      A constant 365/12 ? (which is 30.416666...)
      A constant 365.25/12 ?
      The number of days in the current month?
      The number of days in the month when the RSS was last polled?
      Does it make a big difference?

      I'll leave aside computation with the last three possibilities and just
      consider the first four.

      The program attached below does all the math, but here are the variations
      for "when does the next publishing-interval start?" (i.e., "when's the next
      possible moment I can poll and expect potentially new content?"):

      Mon Jan 19 00:00:00 2004 given 30.0 days in a month
      Tue Jan 13 00:00:00 2004 given 30.5 days in a month
      Fri Feb 6 00:00:00 2004 given 31.0 days in a month
      Sun Feb 8 10:00:00 2004 given 30.4166... days in a month

      And mind you, that's just with an updateBase that's relatively
      recent. Push it back to 1970 (as is common, I think), i.e., 1970-01-10Z,
      and it spreads out even more:

      Wed Feb 11 00:00:00 2004 given 30.0 days in a month
      Thu Feb 5 00:00:00 2004 given 30.5 days in a month
      Fri Jan 23 00:00:00 2004 given 31.0 days in a month
      Sun Feb 1 10:00:00 2004 given 30.4166... days in a month

      This is so vague as to be useless. And mind you, I'm not using some
      bizarre degenerate case here -- just a plain "once a month" update schedule.

      What should we do? As an implementor, I'm quite tempted to just give up
      interpreting all cases of N*monthly or N*yearly (for at least small values
      of N) and just pretend they are "once weekly".

      Have I stumbled on a reductio ad absurdum of updatePeriod=monthly?


      ~~~
      This is the Perl program that does the figuring I use above:

      use strict;

      my $days_in_month = (365/12); # The part I change
      my $m2s = int($days_in_month * 24 * 60 * 60);

      print "Given $days_in_month days in month\n",
      " So a month interval is exactly $m2s seconds.\n";

      my $base = 947_462_400;
      print "Base: $base = " . gmtime($base) . "\n";
      my $now = time();
      print "Now: $now = " . gmtime($now),
      " (= ", $now-$base, " s since base)\n";

      my $start_of_current_interval
      = int( ($now-$base) / $m2s) * $m2s + $base;
      explain("Current interval starts: ", $start_of_current_interval);

      my $start_of_next_interval = $m2s + $start_of_current_interval;
      explain("Next interval starts: ", $start_of_next_interval);
      print "\n\n";

      sub explain {
        print shift(@_);
        local $_ = $_[0];
        my $pnum = ($_-$base) / $m2s;
        print "\n" . gmtime($_),
          " = $_ s\n = $pnum intervals since updatebase\n",
          " (scalar gmtime($pnum * $days_in_month * 24*60*60 + $base))\n";
        return;
      }

      __END__
      Output:

      Given 30 days in month
      So a month interval is exactly 2592000 seconds.
      Base: 947462400 = Mon Jan 10 00:00:00 2000
      Now: 1073950948 = Mon Jan 12 23:42:28 2004 (= 126488548 s since base)
      Current interval starts:
      Sat Dec 20 00:00:00 2003 = 1071878400 s
      = 48 intervals since updatebase
      (scalar gmtime(48 * 30 * 24*60*60 + 947462400))
      Next interval starts:
      Mon Jan 19 00:00:00 2004 = 1074470400 s
      = 49 intervals since updatebase
      (scalar gmtime(49 * 30 * 24*60*60 + 947462400))

      Given 30.5 days in month
      So a month interval is exactly 2635200 seconds.
      Base: 947462400 = Mon Jan 10 00:00:00 2000
      Now: 1073950913 = Mon Jan 12 23:41:53 2004 (= 126488513 s since base)
      Current interval starts:
      Sat Dec 13 12:00:00 2003 = 1071316800 s
      = 47 intervals since updatebase
      (scalar gmtime(47 * 30.5 * 24*60*60 + 947462400))
      Next interval starts:
      Tue Jan 13 00:00:00 2004 = 1073952000 s
      = 48 intervals since updatebase
      (scalar gmtime(48 * 30.5 * 24*60*60 + 947462400))

      Given 31 days in month
      So a month interval is exactly 2678400 seconds.
      Base: 947462400 = Mon Jan 10 00:00:00 2000
      Now: 1073950970 = Mon Jan 12 23:42:50 2004 (= 126488570 s since base)
      Current interval starts:
      Tue Jan 6 00:00:00 2004 = 1073347200 s
      = 47 intervals since updatebase
      (scalar gmtime(47 * 31 * 24*60*60 + 947462400))
      Next interval starts:
      Fri Feb 6 00:00:00 2004 = 1076025600 s
      = 48 intervals since updatebase
      (scalar gmtime(48 * 31 * 24*60*60 + 947462400))

      Given 30.4166666666667 days in month
      So a month interval is exactly 2628000 seconds.
      Base: 947462400 = Mon Jan 10 00:00:00 2000
      Now: 1073951210 = Mon Jan 12 23:46:50 2004 (= 126488810 s since base)
      Current interval starts:
      Fri Jan 9 00:00:00 2004 = 1073606400 s
      = 48 intervals since updatebase
      (scalar gmtime(48 * 30.4166666666667 * 24*60*60 + 947462400))
      Next interval starts:
      Sun Feb 8 10:00:00 2004 = 1076234400 s
      = 49 intervals since updatebase
      (scalar gmtime(49 * 30.4166666666667 * 24*60*60 + 947462400))


      --
      Sean M. Burke http://search.cpan.org/~sburke/
    • Ian Davis
      Message 2 of 10 , Jan 14, 2004
        On Mon, 12 Jan 2004 15:17:22 -0900, Sean M. Burke <sburke@...> wrote:

        > This is so vague as to be useless. And mind you, I'm not using some
        > bizarre degenerate case here -- just a plain "once a month" update
        > schedule.
        >
        > What should we do? As an implementor, I'm quite tempted to just give up
        > interpreting all cases of N*monthly or N*yearly (for at least small
        > values
        > of N) and just pretend they are "once weekly".

        Why give up? These elements are a hint as to when you can reasonably
        expect to fetch the content. Could everyone who has an aggregator that
        wakes up 2505600 seconds from now and fetches my feed please raise their
        hand now. Mine doesn't. In fact sometimes it can be a whole 3600 seconds
        out either way. Sometimes even my monthly print magazines come out 29 days
        or even 32 days after the previous one but on average I tend to get 12 a
        year...

        Here's an alternative algorithm that would make it more accurate, for
        those that need it:

        Find the number of integral "updatePeriod" time intervals between now and
        "updateBase". Add that number of "updatePeriod" time intervals to
        "updateBase". Find the number of seconds in an "updatePeriod" time
        interval. Divide that by the "updateFrequency" and add the result to the
        date calculated in the previous step. That's when you could fetch the feed
        next.

        I'll leave it up to the implementor to decide whether to fetch the feed at
        1am, 2am, 3:15am, 4:56am or even the next day.
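One possible reading of the steps above for updatePeriod=monthly, as a rough sketch only. It assumes that "integral intervals" means whole calendar months, that the length of a period is the actual length of the current calendar month-span, that now is not earlier than updateBase, and that the base day exists in every month (days 1 to 28). The helper names `add_months` and `next_fetch_monthly` are hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Add $n calendar months to an epoch time, in UTC.
sub add_months {
    my ($t, $n) = @_;
    my ($s, $mi, $h, $d, $mon, $y) = gmtime($t);
    my $total = ($y + 1900) * 12 + $mon + $n;
    return timegm($s, $mi, $h, $d, $total % 12, int($total / 12));
}

sub next_fetch_monthly {
    my ($base, $now, $frequency) = @_;
    my @b = gmtime($base);
    my @n = gmtime($now);

    # Step 1: number of whole calendar months between updateBase and now.
    my $months = ($n[5] - $b[5]) * 12 + ($n[4] - $b[4]);
    $months-- if add_months($base, $months) > $now;

    # Step 2: add that many months to updateBase.
    my $start = add_months($base, $months);

    # Step 3: seconds in this particular period (an assumption; the
    # description does not say how to measure it).
    my $period_sec = add_months($base, $months + 1) - $start;

    # Step 4: divide by updateFrequency and add to the step-2 date.
    return $start + int($period_sec / $frequency);
}

# updateBase 2000-01-10, "now" Mon Jan 12 23:42:28 2004, updateFrequency 1.
print scalar(gmtime(next_fetch_monthly(947_462_400, 1_073_950_948, 1))), "\n";
```

Taken literally, step 4 adds only one sub-interval to the start of the period, so with an updateFrequency above 1 the result can already lie in the past; a real implementation would presumably step forward until it passes "now".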

        Ian
      • Sean M. Burke
        Message 3 of 10 , Jan 14, 2004
          At 05:03 AM 2004-01-14, Ian Davis wrote:
          >Sometimes even my monthly print magazines come out 29 days
          >or even 32 days after the previous one but on average I tend to get 12 a
          >year...

          OK, how about we define a month as being 28 days? I'd be happy with
          that. That way, it would avoid the problem case of February (esp.
          non-leapyear ones).

          That is, to use round numbers just for the sake of clarity, suppose that we
          have a crontab'd process that generates a feed at midnight on the first of
          every month (via a "0 0 1 * * make_that_feed" line, say). If we assume
          a month is 30 days, then a client that happens to check toward the end of
          January (getting the January content) might think that there wouldn't be
          any point in checking until the beginning of March, totally missing the
          February content. But if we assume a month-period to be always 28
          days-worth-of-seconds, then this problem never arises.
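The arithmetic behind this corner case can be checked with a few lines of Perl. This is a simplified sketch: it assumes the client simply waits one full "month" after its last poll, and the date (midnight UTC on 2003-01-31, a non-leap year) is an illustrative choice, not something from the thread.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

use constant DAY_SEC => 24 * 60 * 60;

# The client polls at 2003-01-31T00:00Z and picks up the January content.
my $polled = timegm(0, 0, 0, 31, 0, 2003);

# Compare the next poll time under a 30-day and a 28-day "month".
for my $days_in_month (30, 28) {
    my $next = $polled + $days_in_month * DAY_SEC;
    print "$days_in_month-day month: next poll at ",
        scalar(gmtime($next)), "\n";
}
```

With a 30-day month the next poll lands on March 2, after the March 1 run has replaced the February content; with a 28-day month it lands on February 28, while the February content is still current.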

          Of course, this is a corner case, but it happens. RFC 1925 and all that.

          --
          Sean M. Burke http://search.cpan.org/~sburke/
        • Sean M. Burke
          Message 4 of 10 , Jan 15, 2004
            At 02:23 PM 2004-01-09, kelliottmccrea wrote:
            >I did a question and answer with Ian a while back trying to
            >understand mod_syndication. His answers were pretty clear, I wasn't
            >thrilled with them, but I wrote my notes down at:
            >http://laughingmeme.org/archives/000392.html
            >If those don't make sense, let me know, and I can expand on them.

            OK, I cooked up a draft implementation plus tests. I've not uploaded it to
            CPAN yet because I'd like feedback from folks on the list first.
            You can download the dist of it here:
            http://interglacial.com/temp/XML-RSS-Timing-1.01.tar.gz
            Or can browse it here:
            http://interglacial.com/temp/XML-RSS-Timing-101/

            I would very much appreciate it if folks could look at it in the next week or
            so. One of the LiveJournal developers has already told me he wants to
            deploy it as soon as it's ready.


            Long story short: it's lots of sanity-checking code, then code to implement
            skipHours and skipDays, but the sy:update* code is mostly just a wrapper
            around this:

            use constant HOUR_SEC => 60 * 60;
            use constant DAY_SEC => 60 * 60 * 24;
            use constant WEEK_SEC => 60 * 60 * 24 * 7;
            use constant MONTH_SEC => 60 * 60 * 24 * 7 * 28;
            use constant YEAR_SEC => 60 * 60 * 24 * 7 * 365;
            [...
            then some code to copy those into updatePeriod_sec as appropriate
            ...]
            $interval = int(
              ($updatePeriod_sec || 0)
              / ($updateFrequency || 1)
            );
            # So if we update 5 times daily, $interval is (DAY_SEC / 5) seconds

            return $lastPolled unless $interval; # sanity-check
            $base = $updateBase_sec || 0;
            $start_of_current_interval
              = int( ($lastPolled - $base) / $interval) * $interval + $base;
            $new_content_after = $start_of_current_interval + $interval;


            --
            Sean M. Burke http://search.cpan.org/~sburke/
          • Sean M. Burke
            Message 5 of 10 , Jan 18, 2004
              It occurred to me that neither in
              http://interglacial.com/temp/XML-RSS-Timing-101/ nor in
              http://laughingmeme.org/archives/000392.html is there any discussion of the
              ttl element. So, after a bit of thinking (usually a useful prelude to
              action!), I added an implementation of ttl to the XML::RSS::Timing module
              I've been working on. Basically, if it sees no update* values, it falls
              back on a ttl value for guidance on when next to poll the feed.
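The fallback described here might look roughly like the sketch below. This is not the module's actual interface: the function name `next_poll_time` and the hash keys are hypothetical, and the sy:update* branch is simplified to "last poll plus one interval" rather than aligning to updateBase. The one fixed point is that RSS 2.0 expresses ttl in minutes.

```perl
use strict;
use warnings;

# Hypothetical helper, not XML::RSS::Timing's real API.
# Keys: lastPolled (epoch seconds), interval_sec (derived from
# sy:update*, if present), ttl (minutes, as in RSS 2.0).
sub next_poll_time {
    my (%feed) = @_;

    # Prefer the sy:update* interval when the feed supplies one.
    return $feed{lastPolled} + $feed{interval_sec} if $feed{interval_sec};

    # Otherwise fall back on ttl, converting minutes to seconds.
    return $feed{lastPolled} + $feed{ttl} * 60 if $feed{ttl};

    # No guidance at all: nothing says we must wait.
    return $feed{lastPolled};
}

print next_poll_time(lastPolled => 1_073_950_948, ttl => 60), "\n";
```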

              And as I was writing tests for it (usually a useful postlude to action!), I
              uncovered what I hope were the only bugs in the module: I defined the
              constants MONTH_SEC and YEAR_SEC to seven times their correct value. Oops.
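For reference, dropping the stray factor of seven from the constants quoted in the earlier post gives the values below. This is a reconstruction from that snippet and from the 28-day month proposed earlier in the thread; whether version 1.02 defines them exactly this way is an assumption.

```perl
use strict;
use warnings;

use constant HOUR_SEC  => 60 * 60;
use constant DAY_SEC   => 60 * 60 * 24;
use constant WEEK_SEC  => 60 * 60 * 24 * 7;
use constant MONTH_SEC => 60 * 60 * 24 * 28;    # 28-day month, no extra "* 7"
use constant YEAR_SEC  => 60 * 60 * 24 * 365;   # 365-day year, no extra "* 7"

printf "month = %d s, year = %d s\n", MONTH_SEC, YEAR_SEC;
```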

              So I killed the old version's URLs mentioned in my earlier post. Here are
              the new URLs to the new and improved versions:
              http://interglacial.com/temp/XML-RSS-Timing-102/lib/XML/RSS/Timing.pm
              http://interglacial.com/temp/XML-RSS-Timing-1.02.tar.gz

              I welcome comments, thanks in advance, duty now for the future, &ct &ct, on
              this module which I hope will be a model implementation of how RSS/RDF
              readers should implement skip*, sy:update*, and ttl.

              --
              Sean M. Burke http://search.cpan.org/~sburke/