
## Ennui of updatePeriod 'monthly'

Message 1 of 10 , Jan 12, 2004
At 02:23 PM 2004-01-09, kelliottmccrea wrote:
>[...]I wrote my notes down at:
>http://laughingmeme.org/archives/000392.html [...]

OK, I've fretted and puzzled over this for a couple hours, and I've got
existential dread toward updatePeriod=yearly and especially
updatePeriod=monthly, and everything about them.

My biggest worry is that updatePeriod=yearly and updatePeriod=monthly are
unavoidably vague concepts, in that reasonable individuals could arrive at
different interpretations of exactly what this could mean:

<sy:updatePeriod>monthly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-10</sy:updateBase>

I could interpret this to mean "publishes on/by the tenth of every month"
(at/by midnight GMT). (I will tidily dodge the horrible issues of
timezones in this post.)

Or I could interpret this to mean "publishes at/by the start of every
SecsInMonth interval, starting from 2000-01-10T00:00Z", where SecsInMonth
is 24*60*60 times uh... how many days?

A constant 30 days-to-a-month?
A constant 30.5? (as given in Ian's post, above)
A constant 31?
A constant 365/12 ? (which is 30.416666...)
A constant 365.25/12 ?
The number of days in the current month?
The number of days in the month when the RSS was last polled?
Does it make a big difference?

I'll leave aside computation with the last three possibilities and just
consider the first four.

The program attached below does all the math, but here are the variations
for "when does the next publishing-interval start?" (i.e., "when's the next
possible moment I can poll and expect potentially new content?"):

Mon Jan 19 00:00:00 2004 given 30.0 days in a month
Tue Jan 13 00:00:00 2004 given 30.5 days in a month
Fri Feb 6 00:00:00 2004 given 31.0 days in a month
Sun Feb 8 10:00:00 2004 given 30.4166... days in a month

And mind you, that's just with an updateBase that's relatively
recent. Push it back to 1970 (as is common, I think), i.e., 1970-01-10Z,
and it spreads out even more:
Wed Feb 11 00:00:00 2004 given 30.0 days in a month
Thu Feb 5 00:00:00 2004 given 30.5 days in a month
Fri Jan 23 00:00:00 2004 given 31.0 days in a month
Sun Feb 1 10:00:00 2004 given 30.4166... days in a month
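
For anyone who wants to reproduce both sets of dates in one run, here is a minimal sketch. It assumes a fixed "now" of 1073950948 (the timestamp of the first run in the output at the end of this post) rather than time(), and the helper name next_interval_start is made up for illustration.

```perl
use strict;
use warnings;

# Start of the next fixed-length "month" interval after $now,
# counting whole intervals forward from $base (all times in epoch seconds).
sub next_interval_start {
    my ($base, $now, $days_in_month) = @_;
    my $m2s = int($days_in_month * 24 * 60 * 60);    # seconds per assumed month
    return int(($now - $base) / $m2s) * $m2s + $base + $m2s;
}

my $now = 1_073_950_948;    # Mon Jan 12 23:42:28 2004 GMT (assumed fixed "now")

for my $base (947_462_400, 777_600) {    # 2000-01-10 and 1970-01-10, midnight GMT
    print "Base: ", scalar gmtime($base), "\n";
    for my $days (30, 30.5, 31, 365 / 12) {
        printf "  %s  given %.4f days in a month\n",
            scalar gmtime(next_interval_start($base, $now, $days)), $days;
    }
}
```

The only thing that changes between the rows is the assumed month length, which is the whole problem.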

This is so vague as to be useless. And mind you, I'm not using some
bizarre degenerate case here -- just a plain "once a month" update schedule.

What should we do? As an implementor, I'm quite tempted to just give up
interpreting all cases of N*monthly or N*yearly (for at least small values
of N) and just pretend they are "once weekly".

Have I stumbled on a reductio ad absurdum of updatePeriod=monthly?

~~~
This is the Perl program that does the figuring I use above:

use strict;

my $days_in_month = (365/12); # The part I change
my $m2s = int($days_in_month * 24 * 60 * 60);

print "Given $days_in_month days in month\n",
  " So a month interval is exactly $m2s seconds.\n";

my $base = 947_462_400;
print "Base: $base = " . gmtime($base) . "\n";
my $now = time();
print "Now: $now = " . gmtime($now),
  " (= ", $now-$base, " s since base)\n";

my $start_of_current_interval
  = int( ($now-$base) / $m2s) * $m2s + $base;
explain("Current interval starts: ", $start_of_current_interval);

my $start_of_next_interval = $m2s + $start_of_current_interval;
explain("Next interval starts: ", $start_of_next_interval);
print "\n\n";

sub explain {
  print shift(@_);
  local $_ = $_[0];
  my $pnum = ($_-$base) / $m2s;
  print "\n" . gmtime($_),
    " = $_ s\n = $pnum intervals since updatebase\n",
    " (scalar gmtime($pnum * $days_in_month * 24*60*60 + $base))\n";
  return;
}

__END__
Output:

Given 30 days in month
So a month interval is exactly 2592000 seconds.
Base: 947462400 = Mon Jan 10 00:00:00 2000
Now: 1073950948 = Mon Jan 12 23:42:28 2004 (= 126488548 s since base)
Current interval starts:
Sat Dec 20 00:00:00 2003 = 1071878400 s
= 48 intervals since updatebase
(scalar gmtime(48 * 30 * 24*60*60 + 947462400))
Next interval starts:
Mon Jan 19 00:00:00 2004 = 1074470400 s
= 49 intervals since updatebase
(scalar gmtime(49 * 30 * 24*60*60 + 947462400))

Given 30.5 days in month
So a month interval is exactly 2635200 seconds.
Base: 947462400 = Mon Jan 10 00:00:00 2000
Now: 1073950913 = Mon Jan 12 23:41:53 2004 (= 126488513 s since base)
Current interval starts:
Sat Dec 13 12:00:00 2003 = 1071316800 s
= 47 intervals since updatebase
(scalar gmtime(47 * 30.5 * 24*60*60 + 947462400))
Next interval starts:
Tue Jan 13 00:00:00 2004 = 1073952000 s
= 48 intervals since updatebase
(scalar gmtime(48 * 30.5 * 24*60*60 + 947462400))

Given 31 days in month
So a month interval is exactly 2678400 seconds.
Base: 947462400 = Mon Jan 10 00:00:00 2000
Now: 1073950970 = Mon Jan 12 23:42:50 2004 (= 126488570 s since base)
Current interval starts:
Tue Jan 6 00:00:00 2004 = 1073347200 s
= 47 intervals since updatebase
(scalar gmtime(47 * 31 * 24*60*60 + 947462400))
Next interval starts:
Fri Feb 6 00:00:00 2004 = 1076025600 s
= 48 intervals since updatebase
(scalar gmtime(48 * 31 * 24*60*60 + 947462400))

Given 30.4166666666667 days in month
So a month interval is exactly 2628000 seconds.
Base: 947462400 = Mon Jan 10 00:00:00 2000
Now: 1073951210 = Mon Jan 12 23:46:50 2004 (= 126488810 s since base)
Current interval starts:
Fri Jan 9 00:00:00 2004 = 1073606400 s
= 48 intervals since updatebase
(scalar gmtime(48 * 30.4166666666667 * 24*60*60 + 947462400))
Next interval starts:
Sun Feb 8 10:00:00 2004 = 1076234400 s
= 49 intervals since updatebase
(scalar gmtime(49 * 30.4166666666667 * 24*60*60 + 947462400))

--
Sean M. Burke http://search.cpan.org/~sburke/
Message 2 of 10 , Jan 14, 2004
On Mon, 12 Jan 2004 15:17:22 -0900, Sean M. Burke <sburke@...> wrote:

> This is so vague as to be useless. And mind you, I'm not using some
> bizarre degenerate case here -- just a plain "once a month" update
> schedule.
>
> What should we do? As an implementor, I'm quite tempted to just give up
> interpreting all cases of N*monthly or N*yearly (for at least small
> values
> of N) and just pretend they are "once weekly".

Why give up? These elements are a hint as to when you can reasonably
expect to fetch the content. Could everyone who has an aggregator that
wakes up 2505600 seconds from now and fetches my feed please raise their
hand now. Mine doesn't. In fact sometimes it can be a whole 3600 seconds
out either way. Sometimes even my monthly print magazines come out 29 days
or even 32 days after the previous one but on average I tend to get 12 a
year...

Here's an alternative algorithm that would make it more accurate, for
those that need it:

Find the number of integral "updatePeriod" time intervals between now and
"updateBase". Add that number of "updatePeriod" time intervals to
"updateBase". Find the number of seconds in an "updatePeriod" time
interval. Divide that by the "updateFrequency" and add the result to the
date calculated in the previous step. That's when you could fetch the feed
next.

I'll leave it up to the implementor to decide whether to fetch the feed at
1am, 2am, 3:15am, 4:56am or even the next day.

Ian
Message 3 of 10 , Jan 14, 2004
At 05:03 AM 2004-01-14, Ian Davis wrote:
>Sometimes even my monthly print magazines come out 29 days
>or even 32 days after the previous one but on average I tend to get 12 a
>year...

OK, how about we define a month as being 28 days? I'd be happy with
that. That way, it would avoid the problem case of February (esp.
non-leap-year ones).

That is, to use round numbers just for the sake of clarity: suppose that we
have a crontab'd process that generates a feed at midnight on the first of
every month (via a "0 0 1 * * make_that_feed" line, say). If we assume
a month is 30 days, then a client that happens to check toward the end of
January (getting the January content) might think that there wouldn't be
any point in checking until the beginning of March, totally missing the
February content. But if we assume a month-period to be always 28
days-worth-of-seconds, then this problem never arises.
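
The February case can be checked with a few lines of arithmetic. This is a sketch under a simplifying assumption: the client just waits one assumed month after its last poll, and the last poll falls on 2003-01-30 (a hypothetical date in a non-leap year). The helper name is made up.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $DAY_SEC = 24 * 60 * 60;

# Month (0 = January) in which the next poll falls, if the client waits
# exactly $days_in_month days after polling at $polled.
sub month_of_next_poll {
    my ($polled, $days_in_month) = @_;
    my @t = gmtime($polled + $days_in_month * $DAY_SEC);
    return $t[4];
}

my $polled = timegm(0, 0, 0, 30, 0, 2003);    # 2003-01-30 00:00 GMT (assumed)

for my $days (30, 28) {
    printf "%d-day month: next poll on %s\n",
        $days, scalar gmtime($polled + $days * $DAY_SEC);
}
```

With 30 days the next poll lands on March 1, after the February content has already been replaced. With 28 days it lands on February 27.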

Of course, this is a corner case, but it happens. RFC 1925 and all that.

--
Sean M. Burke http://search.cpan.org/~sburke/
Message 4 of 10 , Jan 15, 2004
At 02:23 PM 2004-01-09, kelliottmccrea wrote:
>I did a question and answer with Ian a while back trying to
>understand mod_syndication. His answers were pretty clear, I wasn't
>thrilled with them, but I wrote my notes down at:
>http://laughingmeme.org/archives/000392.html
>If those don't make sense, let me know, and I can expand on them.

OK, I cooked up a draft implementation plus tests. I've not uploaded it to
CPAN yet because I'd like feedback from folks on the list first.
You can download the dist of it here:
Or can browse it here:

I would very much appreciate it if folks could look at it in the next week or
so. One of the LiveJournal developers has already told me he wants to
deploy it as soon as it's ready.

Long story short: it's lots of sanity-checking code, then code to implement
skipHours and skipDays, but the sy:update* code is mostly just a wrapper
around this:

use constant HOUR_SEC => 60 * 60;
use constant DAY_SEC => 60 * 60 * 24;
use constant WEEK_SEC => 60 * 60 * 24 * 7;
use constant MONTH_SEC => 60 * 60 * 24 * 7 * 28;
use constant YEAR_SEC => 60 * 60 * 24 * 7 * 365;
[...
then some code to copy those into updatePeriod_sec as appropriate
...]
$interval = int(
  ($updatePeriod_sec || 0)
  / ($updateFrequency || 1)
);
# So if we update 5 times daily, $interval is (DAY_SEC / 5) seconds

return $lastPolled unless $interval; # sanity-check
$base = $updateBase_sec || 0;
$start_of_current_interval
  = int( ($lastPolled - $base) / $interval) * $interval + $base;
$new_content_after = $start_of_current_interval + $interval;

--
Sean M. Burke http://search.cpan.org/~sburke/
Message 5 of 10 , Jan 18, 2004
It occurred to me that neither in
http://interglacial.com/temp/XML-RSS-Timing-101/ nor in
http://laughingmeme.org/archives/000392.html is there any discussion of the
ttl element. So, after a bit of thinking (usually a useful prelude to
action!), I added an implementation of ttl to the XML::RSS::Timing module
I've been working on. Basically, if it sees no update* values, it falls
back on a ttl value for guidance on when next to poll the feed.
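
A rough sketch of that fallback order, for illustration only: this is not the XML::RSS::Timing interface, and the function name and hash keys are hypothetical. It assumes ttl is given in minutes, as in RSS 2.0, and reuses the interval arithmetic from my earlier post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the epoch time after which polling again
# could be worthwhile. Keys: lastPolled (required), updatePeriod_sec,
# updateFrequency, updateBase_sec, ttl (minutes).
sub next_poll_time {
    my (%f) = @_;
    my $last = $f{lastPolled};

    # Prefer the sy:update* values when they yield a usable interval.
    if ($f{updatePeriod_sec}) {
        my $interval = int($f{updatePeriod_sec} / ($f{updateFrequency} || 1));
        if ($interval) {
            my $base = $f{updateBase_sec} || 0;
            return int(($last - $base) / $interval) * $interval
                + $base + $interval;
        }
    }

    # Otherwise fall back on ttl, converting minutes to seconds.
    return $last + $f{ttl} * 60 if $f{ttl};

    # No guidance at all: nothing says we have to wait.
    return $last;
}

print next_poll_time(lastPolled => 1_073_950_948, ttl => 60), "\n";
```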

And as I was writing tests for it (usually a useful postlude to action!), I
uncovered what I hope were the only bugs in the module: I defined the
constants MONTH_SEC and YEAR_SEC to seven times their correct value. Oops.
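
For reference, the corrected constants presumably look something like this once the stray factor of 7 is dropped. This is a sketch that assumes the 28-day month and 365-day year used in the earlier listing, and the released module may define them differently.

```perl
use strict;
use warnings;

use constant HOUR_SEC  => 60 * 60;
use constant DAY_SEC   => 24 * HOUR_SEC;
use constant WEEK_SEC  => 7 * DAY_SEC;
use constant MONTH_SEC => 28 * DAY_SEC;     # 28-day month, no extra factor of 7
use constant YEAR_SEC  => 365 * DAY_SEC;    # 365-day year, no extra factor of 7

printf "month = %d s (%d days), year = %d s (%d days)\n",
    MONTH_SEC, MONTH_SEC / DAY_SEC, YEAR_SEC, YEAR_SEC / DAY_SEC;
```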

So I killed the old version's URLs mentioned in my earlier post. Here are
the new URLs to the new and improved versions: