RE: [Clip] HTML to wikiHow conversion clip

  • John Shotsky
    Message 1 of 8 , Dec 1, 2009
      Phillip, it would be a little easier if you include one actual example of each line you want to convert, and then how
      they should appear after the conversion.



      Regards,

      John



      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Mr. Phillip Sand Hansel II
      Sent: Tuesday, December 01, 2009 2:21 PM
      To: ntb-clips@yahoogroups.com
      Subject: Re: [Clip] HTML to wikiHow conversion clip





      Greetings;

      I saw there was some discussion in the past on using NoteTab to make wiki documents, but could not find a clipbook on
      converting HTML to wikiHow syntax.

      I am trying to convert web pages to something wikiHow understands, and have come up with a substitution clip that works
      fairly well, but is probably not as elegant as possible. My humble approach attacks the start and end tags individually.


      WikiHow has its own simple syntax. There is a second level header denoted by == Header 2 ==.
      There are URL links denoted by [[link description]].
      There are image links denoted by [[Image: imagename.jpg|thumb| description]].
      There are also table markup tags, which are too complicated for me to translate at present (out of my scope).
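The three constructs listed above can be sketched as small formatting helpers. This is Python, not NoteTab clip code, and the function names are hypothetical. The `|` separator in the link form is an assumption taken from the expected-output example in message 2 of this thread, not from the wikiHow documentation.

```python
def header(text: str) -> str:
    """Second-level header, e.g. == Header 2 ==."""
    return f"== {text.strip()} =="


def link(target: str, description: str) -> str:
    """Link with a description (separator assumed from the thread's example)."""
    return f"[[{target} | {description}]]"


def image(name: str, description: str) -> str:
    """Thumbnail image link."""
    return f"[[Image:{name} |thumb| {description}]]"


print(header("Trebuchet Trials"))
print(link("http://members.iinet.net.au/~rmine/gctrebs.html", "Grey Company Trebuchet"))
print(image("buildbase.jpg", "Build a base"))
```

Keeping each construct in its own helper makes it easy to adjust spacing in one place if wikiHow turns out to be strict about it.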

      My clip so far looks like this... any suggestions appreciated.

      ; convert HTML to WikiHow markup
      ^!Jump DOC_START
      ; Convert various Headers to 2nd level hdr used by wikiHow.
      ^!Replace <H1> == SICHA
      ^!Replace <H2> == SICHA
      ^!Replace <H3> == SICHA
      ^!Replace <H4> == SICHA
      ^!Replace <H5> == SICHA
      ^!Replace </H1> == SICHA
      ^!Replace </H2> == SICHA
      ^!Replace </H3> == SICHA
      ^!Replace </H4> == SICHA
      ^!Replace </H5> == SICHA
      ;Convert List items to wikiHow ordered list flags (#)
      ^!Replace <LI> # SICHA
      ^!Replace </LI> SICHA
      ;Convert paragraphs to wikiHow ordered list flags? (#)
      ^!Replace <P> # SICHA
      ^!Replace </P> SICHA
      ;Convert links to wikiHow syntax
      ^!REPLACE <A HREF=" [[ SICHA
      ^!REPLACE </A> ]] SICHA

      ;Convert image links to wikiHow syntax
      ^!REPLACE <IMG SRC=" [Image: SICHA
      ; replace image attributes with nothing
      ^!Replace "WIDTH=\"...\"" >> "" WRS
      ^!Replace "HEIGHT=\"...\"" >> "" WRS
      ^!Replace "BORDER=\".*\"" >> "" WRS
      ^!Replace "ALT=\".+\"" >> "" WRS
      ;Fix straggler closing angle brackets?
      ^!Replace " > "" WRS
      ^!Replace " > "" WRS
      ^!Replace " > "" WRS
      ;And then add "strip HTML markup" menu item? Currently doing manually after above substitutions have been made.

      Mr. Phillip Sand Hansel II

    • Mr. Phillip Sand Hansel II
      Message 2 of 8 , Dec 1, 2009
        Sorry for not providing a better example. My goal is to convert bona-fide
        HTML into the simpler wiki format. My example clip works fairly well, but
        leaves some hand clean-up. I run a Toolbar Modify Change HTML tags to
        UpperCase first, run my script, then run a Toolbar Modify Strip HTML tags to
        get rid of "non-wiki supported" HTML tags that I may have missed.

        Since wiki auto-creates thumbnails, I chose to simply delete the WIDTH=123,
        HEIGHT=456, BORDER=3 image modifiers.

        Convert this...
        <H1>Trebuchet Trials</H1>
        <p>
        <ol>
        <li>The base can not exceed one meter in length and five decimeters in
        width.
        <li>The throwing arm can not exceed 1.5 meters in length.
        <li>The catapult must have a locking device.
        </ol>
        <p>
        <ul>
        <li>Estimate and obtain adequate amount of materials.
        <li>The design called for four eight foot long two-by-twos, so purchase
        five.
        <li>All 5 were used, with about two feet left over.
        </ul>
        <p>
        The <A HREF="http://members.iinet.net.au/~rmine/gctrebs.html">Grey Company
        Trebuchet</A> site...
        <p>
        <IMG SRC="buildbase.jpg" WIDTH="300" HEIGHT="225" BORDER="3" ALT="Build A
        Base">


        To this...
        == Trebuchet Trials ==

        #The base can not exceed one meter in length and five decimeters in width.
        #The throwing arm can not exceed 1.5 meters in length.
        #The catapult must have a locking device.

        *Estimate and obtain adequate amount of materials.
        *The design called for four eight foot long two-by-twos, so purchase five.
        *All 5 were used, with about two feet left over.

        The [[http://members.iinet.net.au/~rmine/gctrebs.html | Grey Company
        Trebuchet]] site...

        [[Image:buildbase.jpg |thumb| Build a base]]




        Mr. Phillip Sand Hansel II


      • Sheri
        Message 3 of 8 , Dec 2, 2009
          Hi Phillip,

          I gave it a quick stab, no guarantees. I'm sure documents will still need some cleanup after using it. There's at least one long line that will need to be joined after copying this clip from email or archives (the IMG line).

          Regards,
          Sheri

          ^!Replace "(?is)<H\d>(.+?)</H\d>" >> "== $1 ==" RAWS
          ^!Jump Doc_Start
          :olloop
          ^!Find "(?si)<ol>.+?</ol>" RS
          ^!Iferror olloopend
          ^!Replace "<li>" >> "#" RAHS
          ^!Jump Select_End
          ^!Goto olloop
          :olloopend
          ^!Jump Doc_start
          :ulloop
          ^!Find "(?si)<ul>.+?</ul>" RS
          ^!Iferror links
          ^!Replace "<li>" >> "*" RAHS
          ^!Jump Select_End
          ^!Goto ulloop
          :links
          ^!Replace "(?is)<A HREF=\x22(.+?)\x22>(.*?)</A>" >> "[[$1|$2]]" RAWS
          :images
          ^!Replace "(?is)<IMG SRC=\x22.*?ALT=\x22(.+?)\x22>" >> "[[Image:$1|thumb|$2]]" RAWS
          ^!Replace "(?is)<p>\R?" >> "*" RAWS
          :tags
          ^!Replace "(?i)<[^>]+>" >> "" RAWS
          ;end of clip
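The list handling in the clip above (find each `<ol>` or `<ul>` block, then replace `<li>` only inside that block) can be illustrated outside NoteTab. The following is a minimal Python sketch of the same idea using the `re` module. The function name `convert_lists` is hypothetical, and Python's regex engine is not identical to the PCRE engine NoteTab uses, so treat it as an illustration rather than a port of the clip.

```python
import re


def convert_lists(html: str) -> str:
    """Replace <li> with '#' inside <ol> blocks and '*' inside <ul> blocks."""

    def mark(marker: str):
        def repl(match: re.Match) -> str:
            # Only touch <li> tags inside the matched list block.
            return re.sub(r"(?i)<li>", marker, match.group(0))

        return repl

    # Lazy .+? with (?s) keeps each match to a single list block.
    html = re.sub(r"(?is)<ol>.+?</ol>", mark("#"), html)
    html = re.sub(r"(?is)<ul>.+?</ul>", mark("*"), html)
    return html


sample = "<ol>\n<li>First\n<li>Second\n</ol>\n<ul>\n<li>Item\n</ul>"
print(convert_lists(sample))
```

The `<ol>` and `<ul>` tags themselves are left in place here, just as in the clip, where the final tag-stripping step removes them.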
        • Mr. Phillip Sand Hansel II
          Message 4 of 8 , Dec 2, 2009
            Sheri:

            I knew that what I had mostly worked, but I also see that your approach is
            much cleaner and more direct.

            Thank you for leaving me a "character building challenge"; it helped me to
            more fully understand what the code was doing. :-)

            The Image replacement step did not work as expected; that caused me to read
            some help, but when I quickly got lost on the PCRE patterns section, I
            simply sat and stared at the code until I figured out some () were missing.
            I also added an extra set of double quotes (\x22 to close the imagename.jpg
            variable) and a middle variable, $2, which is the WIDTH & HEIGHT stuff I
            throw away.

            I changed...
            ^!Replace "(?is)<IMG SRC=\x22.*?ALT=\x22(.+?)\x22>" >> "[[Image:$1|thumb|$2]]" RAWS

            To...
            ^!Replace "(?is)<IMG SRC=\x22(.+?)\x22(.+?)ALT=\x22(.+?)\x22>" >> "[[Image:$1|thumb|$3]]" RAWS

            And then it did work as expected. Thank you for the improved method, and
            for making me grow.
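To make the capture-group change easier to see, here is a small Python sketch (not NoteTab clip code) that applies an equivalent of the corrected pattern to the sample image tag from message 2. Python writes back-references as `\1` and `\3` where the clip uses `$1` and `$3`, and its `re` module is only similar to PCRE, so this is an illustration under those assumptions.

```python
import re

tag = '<IMG SRC="buildbase.jpg" WIDTH="300" HEIGHT="225" BORDER="3" ALT="Build A Base">'

# The first attempt has a single capture group (the ALT text),
# so a replacement that refers to a second group has nothing to use.
first_try = re.compile(r'(?is)<IMG SRC=".*?ALT="(.+?)">')
print(first_try.groups)

# Corrected pattern: group 1 = file name, group 2 = attributes that are
# thrown away, group 3 = ALT text.
fixed = re.compile(r'(?is)<IMG SRC="(.+?)"(.+?)ALT="(.+?)">')
result = fixed.sub(r"[[Image:\1|thumb|\3]]", tag)
print(result)
```

Group 2 is captured only so that the attributes between the file name and `ALT` are consumed; a non-capturing `.+?` would do the same job and leave the ALT text as group 2.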

            I will test some more real life conversions of HTML files, and then perhaps
            create a wikiHow page on the topic. It is a useful tool if you've got HTML
            and want to share what it says with wikiFolk.


            Mr. Phillip Sand Hansel II


          • flo.gehrke
            Message 5 of 8 , Dec 3, 2009
              --- In ntb-clips@yahoogroups.com, "Sheri" <silvermoonwoman@...> wrote:
              >
              > I gave it a quick stab, no guarantees. I'm sure documents will still need some cleanup after using it. There's at least one long line that will need to be joined after copying this clip from email or archives (the IMG line).
              >
              > Regards,
              > Sheri
              >
              > ^!Replace "(?is)<H\d>(.+?)</H\d>" >> "== $1 ==" RAWS
              > ^!Jump Doc_Start...

              Hi Sheri,

              I would like to point out just some small details regarding the last 8 lines (I've added some line numbers for description)...

              1. :links
              2. ^!Replace "(?is)<A HREF=\x22(.+?)\x22>(.*?)</A>" >> "[[$1|$2]]" RAWS
              3. :images
              4. ^!Replace "(?is)<IMG SRC=\x22.*?ALT=\x22(.+?)\x22>" >> "[[Image:$1|thumb|$2]]" RAWS
              5. ^!Replace "(?is)<p>\R?" >> "*" RAWS
              6. :tags
              7. ^!Replace "(?i)<[^>]+>" >> "" RAWS
              8. ;end of clip


              Line #2: I think Phillip wants spaces around the '|' in the replacement.

              Line #4: For me, it doesn't capture a second substring, so '$2' remains empty and is literally output with the replacement.

              Line #5: Produces some asterisks which I can't see in Phillip's result (omitted in my proposal).

              What would you think of replacing the last lines with...

              :links
              ^!Replace "(?is)<A HREF=\x22(.+?)\x22>(.*?)</A>" >> "[[$1\x20|\x20$2]]" RAWS
              :images
              ; Next line extended
              ^!Replace "(?isx) <IMG\x20SRC=\x22 ([^\x22]+) \x22.+ALT=\x22 (.+) \x22>" >> "[[Image:$1\x20|thumb|\x20$2]]" AWRS
              :tags
              ^!Replace "(<[^>]+>\R)+" >> "\r\n" AWRS
              ; Join the image-line
              ^!Find "^\[\[Image:\C+\]" WRS
              ^!Menu Modify/Lines/Join Lines
              ^!Jump 1
              ; end of clip


              Probably, some more improvements are needed. For example, joining more lines?
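The two replacement changes proposed above (padding the `|` with spaces, and capturing the file name with a negated character class) can be tried outside NoteTab. Below is a minimal Python sketch under the assumption that Python's `re` behaves like NoteTab's PCRE engine for these simple patterns. The `\x20` escapes are written as literal spaces, and the variable names are hypothetical.

```python
import re

html = ('The <A HREF="http://members.iinet.net.au/~rmine/gctrebs.html">'
        'Grey Company Trebuchet</A> site...')
img = '<IMG SRC="buildbase.jpg" WIDTH="300" HEIGHT="225" BORDER="3" ALT="Build A Base">'

# Link: same pattern as before, but the replacement pads the "|" with spaces.
linked = re.sub(r'(?is)<A HREF="(.+?)">(.*?)</A>', r"[[\1 | \2]]", html)
print(linked)

# Image: [^"]+ stops at the quote that closes the file name, so only two
# groups are needed (file name and ALT text).
converted = re.sub(r'(?is)<IMG SRC="([^"]+)".+ALT="(.+)">', r"[[Image:\1 |thumb| \2]]", img)
print(converted)
```

One point to watch: the `.+` between the file name and `ALT` is greedy and, with `(?s)`, can cross line breaks, so a page containing several images may need the lazy form `.+?` to keep each match inside a single tag.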

              Regards,
              Flo
            • Sheri
              Message 6 of 8 , Dec 3, 2009
                --- In ntb-clips@yahoogroups.com, "flo.gehrke" <flo.gehrke@...> wrote:
                >
                > I would like to point out just some small details regarding the
                > last 8 lines (I've added some line numbers for description)...
                >

                LOL, no wonder Phillip says I gave him cause to stare at the documentation. Sorry, Phillip! ;)

                I saw that he'd been waiting a while for help, felt bad, and did what I could on the fly. I don't have time, but I'm glad if you and the rest of the group can help him further refine and improve it.

                Regards,
                Sheri
              • Mr. Phillip Sand Hansel II
                Message 7 of 8 , Dec 3, 2009
                  The first "off the cuff" effort was great and caused me to grow. It also caused me to upgrade to the latest Light version (I was proud that I had actually paid for 4.5 several years back... pride is funny.)

                  You've both been more than helpful, and the response was very prompt. I've incorporated Flo's suggestions and they work great.

                  Thank you (and the rest of the group) for being there; you've turned a repetitive editing process into a one-click fix. I think I understand enough to extend what I've got to other cases, should they arise.

                  Mr. Phillip Sand Hansel II

