1590 Re: [syndication] HTML Encoding BooHoo...

  • Morbus Iff
    May 23, 2001
      >>Now, the "reversion of entities" code in my RSS reader doesn't know about
      >>HTML - it just blindly reverts &lt; to < and so forth. Is the only solution
      >>to my problem to make the code understand all the possible HTML entities?
      >>Or is there something else?
      >There's a fair bit of code around that removes all tags except a subset
      >of "Allowable html". PHP even has this as a function built into the
      >scripting language.

      Yes, but that wouldn't solve my above problem (** and see earlier message).
      In this case, <XML> wasn't a tag, it was part of the actual <title>.
      Removing all HTML tags wouldn't affect the <XML>, cos that's not a valid
      HTML tag anyways... Right now, my reader:

      - loads in an XML file.
      - converts any encoded &lt;/&gt;'s to </> (to cover encoded HTML).
        This is a mass replacement, which causes the above problem.
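
      A minimal sketch of that mass replacement, in Python purely for
      illustration (the reader itself is Perl; the function name and the
      sample strings here are made up, not taken from the actual code):

```python
def revert_entities_blindly(text: str) -> str:
    """Mass-replace encoded angle brackets, with no knowledge of HTML."""
    return text.replace("&lt;", "<").replace("&gt;", ">")


# Hypothetical description carrying encoded HTML: reverting is what we want.
description = "Some &lt;b&gt;bold&lt;/b&gt; news"
# Hypothetical title where <XML> is literal text, not markup.
title = "Why &lt;XML&gt; matters"

print(revert_entities_blindly(description))  # Some <b>bold</b> news
print(revert_entities_blindly(title))        # Why <XML> matters
```

      The second result is the problem: the literal text now looks like an
      (unknown) tag to whatever renders it.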

      Ultimately, I don't want to remove tags (that's not a decision I'm willing
      to make for the users of my program, but it will be an option that they can
      choose from).

      In this case, it's not even an issue of allowable tags or not - it's an
      issue of preparing for people correctly encoding HTML (&lt;b&gt;) and not
      encoding HTML (<b>).
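
      One possible middle ground, sketched in Python under loud assumptions:
      revert an encoded tag only when its name is in a known list, and leave
      everything else encoded. The KNOWN_TAGS set and the function name are
      hypothetical, and the pattern only handles bare tags without attributes,
      so treat it as an idea rather than a fix:

```python
import re

# Illustrative subset only (an assumption); a real list would be much longer.
KNOWN_TAGS = {"a", "b", "i", "em", "strong", "p", "br"}

# Matches an encoded bare tag such as &lt;b&gt;, &lt;/b&gt; or &lt;br /&gt;.
ENCODED_TAG = re.compile(r"&lt;/?([A-Za-z][A-Za-z0-9]*)\s*/?&gt;")


def revert_known_tags(text: str) -> str:
    """Revert encoded angle brackets only around recognised tag names."""
    def repl(match: re.Match) -> str:
        if match.group(1).lower() in KNOWN_TAGS:
            # Strip the 4-character "&lt;" and "&gt;" and restore real brackets.
            return "<" + match.group(0)[4:-4] + ">"
        return match.group(0)  # unknown name: leave it encoded
    return ENCODED_TAG.sub(repl, text)


print(revert_known_tags("Some &lt;b&gt;bold&lt;/b&gt; news about &lt;XML&gt;"))
# Some <b>bold</b> news about &lt;XML&gt;
```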

      ** I eventually tracked the culprit to nothing in my code, but rather the
      XML::Simple Perl module, which seems to magick &lt;XML&gt; into <XML> all
      by itself. I'm still investigating, but seeing the file encoded, and then
      loading it through XML::Simple and Data::Dump[ing] it shows that it's
      autoconverted. Why, I'm not sure...
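
      A likely explanation: an XML parser is required to resolve the
      predefined entities (&lt;, &gt;, &amp; and friends) in character data
      before handing the text over, so the conversion is probably expected
      parser behavior rather than something specific to XML::Simple. A small
      sketch using Python's xml.etree.ElementTree as a stand-in (it is not
      XML::Simple, and the sample item is made up):

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment: the title text is literally "Why <XML> matters",
# so the file has to carry it entity-encoded.
rss = "<item><title>Why &lt;XML&gt; matters</title></item>"

title = ET.fromstring(rss).findtext("title")
print(title)  # Why <XML> matters
```

      If that holds for XML::Simple too, the text is already decoded once by
      the time the reader sees it, and a second blanket reversion is only
      needed for feeds that double-encode their HTML.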

      Morbus Iff
      .sig on other machine.