Re: [xenu-usergroup] link to wikipedia
- Tilman Hausherr wrote:
> On Mon, 26 Apr 2010 11:40:55 +0100, Jack Stringer wrote:
There are plenty of old admins, I can assure you :-)
>>>>> since checking links to Wikipedia seems to be a legitimate task
>>>>> for Xenu, shouldn't someone contact them and ask for the removal
>>>>> of the robots.txt exclusion? Or is there a reason that Xenu and
>>>>> Wikipedia don't work together smoothly, e.g. because of the
>>>>> redirects in Wikipedia?
>>>>> By the way,
>>>>> User-agent: Xenu
>>>>> Disallow: /
>>>>> is also contained in http://de.wikipedia.org/robots.txt.
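The effect of the two robots.txt lines quoted above can be checked offline. This is a minimal sketch using Python's standard `urllib.robotparser`; the catch-all `User-agent: *` group, the agent name "SomeOtherBot" and the sample URL are assumptions added for illustration and are not taken from Wikipedia's real file.

```python
from urllib.robotparser import RobotFileParser

# First group: the rule quoted in this thread.
# Second group: an assumed catch-all, added only so that another
# agent has something to match against.
ROBOTS_TXT = """\
User-agent: Xenu
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://de.wikipedia.org/wiki/Xenu"  # sample URL, assumption
    print("Xenu:", is_allowed(ROBOTS_TXT, "Xenu", url))
    print("SomeOtherBot:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

With `Disallow: /` the whole site is off limits for the named agent, while an empty `Disallow:` line places no restriction on the agents it covers.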
>> There are a couple of thousand users using Xenu. If they all started
>> sending requests to the Wikipedia site, the server would soon get bogged
>> down trying to deliver the pages. It's the same as those people using
>> website copying software. I have had my photography gallery go very
>> very slow at times just because someone is trying to hoover up the
>> What would be nice is to find out from Wikipedia what changes need
>> to be made to Xenu to make it nicer to their systems, e.g. some sort
>> of delay when getting pages from Wikipedia servers.
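The "delay when getting pages" idea suggested above could look roughly like the following. This is only a sketch of a per-host throttle, not something Xenu is known to implement; the class name `HostThrottle`, the `min_interval` parameter and the injectable `clock`/`sleep` hooks are all hypothetical names chosen for illustration.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until url's host may be contacted again.

        Returns the number of seconds slept (0.0 if none were needed).
        """
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last_request[host] = now + delay
        return delay


if __name__ == "__main__":
    throttle = HostThrottle(min_interval=0.5)
    for page in ("A", "B", "C"):
        slept = throttle.wait("http://en.wikipedia.org/wiki/" + page)
        print(page, "slept", round(slept, 2), "seconds")
```

A checker would call `throttle.wait(url)` immediately before each request. Requests to different hosts do not delay each other, so only a heavily linked site is slowed down.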
> Xenu is already "nice", i.e. it makes a HEAD request, not a GET
> request. My opinion is that the wikipedia software is crappy. The
> organisation is mostly concentrated on collecting money, enforcing
> censorship, altering history, and being busy with itself (many of the
> admins are just very intelligent kids with too much time), instead of
> delivering a high quality product by running a Continuous Improvement
> Tilman (holder of a scarlet letter from the wikipedia arb board :-))
The software is probably rough. It *is* still a charity, and due to the
mindless antics of loads of juvenile vandals, it needs a large team of
vandal fighters (not just admins; there are only 1000 regular ones) to keep
the pages more or less intact. English Wikipedia has around 150-200 page
changes per minute, and around 10% of those have to be reverted, so the
servers are already very busy, and I think allowing Xenu in would grind them
to a halt. If the Dutch mirrors go down, and I have to connect directly (from
the UK) to the USA servers, then it can take 30 seconds plus for a medium page
Process Safety & Development Specialist
Don't repeat history, unreported chemical lab/plant near misses at
http://www.crhf.org.uk Only two things are certain: The universe and
human stupidity; and I'm not certain about the universe. ~ Albert
- Now it does save/export and restore the milliseconds value.
On Sun, 18 Apr 2010 13:03:46 +0200, Tilman Hausherr wrote:
>Although Xenu isn't a SEO tool, it is being "misused" as such. A guy
>asked to get the duration in milliseconds, and google has recently
>announced that loading time of websites would be taken into
>A new beta version is here:
>This is just a test so you see how it looks and give feedback. The
>milliseconds value isn't saved in the .XEN file, nor in the export file.
>(This will be done at a later time). If you need the milliseconds
>feature, please test it and give feedback about whether this is usable,
>Below are all the changes since the last regular version. If you would like to
>support me, please test it and give feedback.
>24.2.2010: Check the domains of mail addresses (DNS lookup for MX
>7.12.2009: Include PARSETEST4 section in general release (convert
>characters >80H to %XX, for "international" URLs)
>19.12.2009: For "international" characters in local files: Use Unicode
>for local directory search, URL launch in browser, read/check local
>20.12.2009: But not for Windows 95/98/ME
>22.12.2009: add ".class" for applets if needed, replace "." with "/".
>27.12.2009: updated to NSIS 2.46
>10.1.2010: use version 6 list column sort arrows on XP and higher
>14.1.2010: added Description column
>15.1.2010: added warning when settings overwritten by profile
>16.1.2010: attempt at decoding .jar files for APPLET ARCHIVE thanks to
> - only one .jar archive per applet
> - no unicode in file names
> - name of archive must end with .jar
> - .jar file must be internal, or the class link will
> - .class "in Jar" property isn't saved in .XEN file
>(which prevents standard access in favor of waiting for .jar lookup)
>24.1.2010: added <video src=
>27.1.2010: improved list control divider double click (title is the
>26.2.2010: improved extra text in domain mail check
>13.3.2010: Get page body only if not redirection or redirection but no
>"Location:" in header
> (should make PARSETEST3 fix superfluous)
>30.3.2010: Abort box for ftp orphan search
>2.4.2010: [Options] Accept="*/*" (default value)
>14.4.2010: milliseconds in duration
> (in progress; missing: export, save/load)
>15.12.2009: PARSETEST4 section: replaced "> 80X" with ">= 80X"
>20.12.2009: added version check for Unicode Clipboard and Sitemap for
>Windows 95/98/ME (like 27.1.2009)
>21.12.2009: corrected broken banner links
>22.12.2009: tell "anchor occurs multiple times" only once per URL
>4.1.2010: remove stuff after "?" in mailto: due to Microsoft error in
>10.1.2010: fixed list column sort arrows wrongly displayed in unsorted
>columns (on 7, but not on XP)
>12.1.2010: fixed "//" bug in applet codebase in local url
>15.1.2010: disabled and unchecked "Inactive" checkbox after loading new
>18.1.2010: fixed title line of tab export
>20.1.2010: Don't assume URLs to be UTF-8, use current charset instead
> However: this solution isn't perfect, because the correct
>charset of a URL would be that of the referring URL
> But in most cases it will work, because URLs usually
>have the same charset
> Known bug: Root URL with exotic characters
>20.1.2010: Corrected exotic URLs in sitemap
>26.1.2010: Fixed % in file: URLs, only convert %XX
>27.1.2010: "Conversion to lowercase" option uses codepage for conversion
>31.1.2010: Fixed bug in report (max size + max size url), probably
>introduced on 15.1.2010
>15.3.2010: vNormalizeURL() with conversion to UTF8 prior to
> store URLs in UTF8, unless already ANSI or ISO-8859-1 (1252)
> vRemovePercents for display only
>3.4.2010: prevent reentrant calls to vDoIdle();
> set fileNotFound status if tmp URL content file deleted by
>10.4.2010: replaced "> 80X" with ">= 80X" in vAnsi2EntityEscaped()
>Yahoo! Groups Links
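The changelog above mentions converting characters >80H to %XX for "international" URLs, later corrected from "> 80X" to ">= 80X". A minimal sketch of that kind of escaping follows. It is not Xenu's code; the function name `escape_high_bytes` and the default encoding are assumptions for illustration.

```python
def escape_high_bytes(url, encoding="utf-8"):
    """Percent-escape every byte >= 0x80 and leave ASCII bytes untouched.

    The text is first encoded with `encoding` (an assumption here; the
    changelog discusses both the current charset and UTF-8), then each
    high byte is written as %XX with two uppercase hex digits.
    """
    parts = []
    for byte in url.encode(encoding):
        if byte >= 0x80:  # ">=" so that byte 0x80 itself is escaped too
            parts.append("%{:02X}".format(byte))
        else:
            parts.append(chr(byte))
    return "".join(parts)


if __name__ == "__main__":
    print(escape_high_bytes("http://example.org/caf\u00e9"))
    print(escape_high_bytes("http://example.org/caf\u00e9", "latin-1"))
    print(escape_high_bytes("\u20ac", "cp1252"))
```

The last call shows why the comparison matters: in Windows-1252 the euro sign is the single byte 0x80, which a strict "> 80H" test would leave unescaped.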
- On my local hard disk I have a folder containing hundreds of html files,
and an index.html file that contains links to all of them.
When I run xenu on the index file, it correctly reports no broken links,
but it also reports that all of the other files are orphans.
Why is this? What am I doing wrong?
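One way to cross-check a report like this independently of Xenu is to compare the files in the folder with the links in the index. The sketch below assumes a flat folder, an index named index.html and plain `<a href="...">` links; the function name `find_unlinked_files` is hypothetical, and this is not how Xenu itself determines orphans.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag


class _HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_unlinked_files(folder, index_name="index.html"):
    """Return names of .htm/.html files in folder that the index does not link to."""
    folder = Path(folder).resolve()
    index = folder / index_name
    collector = _HrefCollector()
    collector.feed(index.read_text(encoding="utf-8", errors="replace"))

    linked = {index}
    for href in collector.hrefs:
        target = unquote(urldefrag(href).url)  # drop #fragment, decode %XX
        if not target or "://" in target or target.startswith("mailto:"):
            continue  # skip same-page anchors, remote links and mail links
        linked.add((folder / target).resolve())

    html_files = {p.resolve() for p in folder.glob("*.htm*")}
    return sorted(p.name for p in html_files - linked)


if __name__ == "__main__":
    import sys
    for name in find_unlinked_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(name)
```

If this prints nothing while Xenu still reports orphans, the difference is likely to lie in how the link targets and the file names are compared (for example letter case or escaped characters), which is worth mentioning when sending the files in.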
- Don't know.... send it to me in a zip, and send me a .XEN file in a ZIP
too, at tilman at snafu dot de.
On Fri, 07 May 2010 07:59:19 +0530, Ven. S. Upatissa (g) wrote:
>On my local hard disk I have a folder containing hundreds of html files,
>and an index.html file that contains links to all of them.
>When I run xenu on the index file, it correctly reports no broken links,
>but it also reports that all of the other files are orphans.
>Why is this? What am I doing wrong?