What is the status of l1 regularization in vw?

  • Sanmi Koyejo
    Message 1 of 6, Jun 17, 2011
      I saw some discussion that indicated vw was in the process of implementing l1 regularization. What is the status?

      Regards,
      -Sanmi
    • John Langford
      Message 2 of 6, Jun 18, 2011
        I just added a version of l1 regularization for the online optimizers.
        You can see it by using:

        ./vw -d <file> --l1 0.01 --readable_model foo; cat foo

        Larger values of l1 will create sparser models. If the flag isn't
        specified (or you use --l1 0.0), then the old behavior occurs.
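
        For a quick sanity check of the sparsity effect (just a rough sketch;
        data.txt and the two --l1 values below are arbitrary placeholders),
        compare the number of lines left in each readable model:

        ./vw -d data.txt --l1 0.0001 --readable_model weak_l1; wc -l weak_l1
        ./vw -d data.txt --l1 0.1 --readable_model strong_l1; wc -l strong_l1

        The second run should leave noticeably fewer index:weight lines than
        the first.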

        -John

        On 06/17/2011 05:50 PM, Sanmi Koyejo wrote:
        >
        > I saw some discussion that indicated vw was in the process of
        > implementing l1 regularization. What is the status?
        >
        > Regards,
        > -Sanmi
        >
        >
      • Vaclav Petricek
        Message 3 of 6, Jun 20, 2011
          Thanks a lot John! I am giving L1 a spin right now. First observation
          is that vw segfaults when trying to load a model where all the weights
          have been squashed by a too large --l1 param.

          See below on how to replicate the segfault.

          Vaclav

          $ cat data.in
          1 |a A
          0 |a A
          1 |a A

          I. Segfault on empty model

          $ vwl1 -d data.in --l1 0.5 --readable_model model.readable -f model.binary
          using no cache
          Reading from data.in
          num sources = 1
          final_regressor = model.binary
          Num weight bits = 18
          learning rate = 10
          initial_t = 1
          power_t = 0.5
          learning_rate set to 10
          average since example example current current current
          loss last counter weight label predict features
          0.666667 0.666667 3 3.0 1.0000 0.0000 2

          finished run
          number of examples = 3
          weighted example sum = 3
          weighted label sum = 2
          average loss = 0.6667
          best constant = 0.5
          best constant's loss = 0.25
          total feature number = 6

          $ cat model.readable
          Version 5.1
          Min label:0.000000 max label:1.000000
          bits:18 thread_bits:0
          ngram:0 skips:0
          index:weight pairs:

          $ vwl1 -d data.in -t -i model.binary
          using no cache
          Reading from data.in
          num sources = 1
          Segmentation fault

          II. All is good when there are non-zero weights left:

          $ vwl1 -d data.in --l1 0.000001 --readable_model model.readable -f model.binary
          using no cache
          Reading from data.in
          num sources = 1
          final_regressor = model.binary
          Num weight bits = 18
          learning rate = 10
          initial_t = 1
          power_t = 0.5
          learning_rate set to 10
          average since example example current current current
          loss last counter weight label predict features
          0.997365 0.997365 3 3.0 1.0000 0.0031 2

          finished run
          number of examples = 3
          weighted example sum = 3
          weighted label sum = 2
          average loss = 0.9974
          best constant = 0.5
          best constant's loss = 0.25
          total feature number = 6

          $ cat model.readable
          Version 5.1
          Min label:0.000000 max label:1.000000
          bits:18 thread_bits:0
          ngram:0 skips:0
          index:weight pairs:
          116060:0.496636
          214560:0.496636

          $ vwl1 -d data.in -t -i model.binary
          using no cache
          Reading from data.in
          num sources = 1
          Num weight bits = 18
          learning rate = 10
          initial_t = 1
          power_t = 0.5
          only testing
          average since example example current current current
          loss last counter weight label predict features
          0.328894 0.328894 3 3.0 1.0000 0.9933 2

          finished run
          number of examples = 3
          weighted example sum = 3
          weighted label sum = 2
          average loss = 0.3289
          best constant = 0.5
          best constant's loss = 0.25
          total feature number = 6
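
          Until that's fixed, a crude guard (assuming the readable-model format
          shown above, where weight lines start with a numeric index) is to count
          the surviving weights before loading the binary model in test mode:

          $ grep -cE '^[0-9]+:' model.readable

          This prints 0 for the squashed model in case I and 2 in case II.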



          On Sat, Jun 18, 2011 at 8:11 PM, John Langford <jl@...> wrote:
          > I just added a version of l1 regularization for the online optimizers.
          > You can see it by using:
          >
          > ./vw -d <file> --l1 0.01 --readable_model foo; cat foo
          >
          > Larger values of l1 will create sparser models. If the flag isn't
          > specified (or you use --l1 0.0), then the old behavior occurs.
          >
          > -John
          >
          > On 06/17/2011 05:50 PM, Sanmi Koyejo wrote:
          >>
          >> I saw some discussion that indicated vw was in the process of
          >> implementing l1 regularization. What is the status?
          >>
          >> Regards,
          >> -Sanmi
          >>
          >>
          >
          >
          >
        • Sanmi
          Message 4 of 6, Jun 20, 2011
            Does this work with --conjugate_gradient? It ran successfully, but it was not clear.

            On Sat, Jun 18, 2011 at 10:11 PM, John Langford <jl@...> wrote:
            I just added a version of l1 regularization for the online optimizers. You can see it by using:

            ./vw -d <file> --l1 0.01 --readable_model foo; cat foo

            Larger values of l1 will create sparser models. If the flag isn't specified (or you use --l1 0.0), then the old behavior occurs.

            -John

            On 06/17/2011 05:50 PM, Sanmi Koyejo wrote:

            I saw some discussion that indicated vw was in the process of implementing l1 regularization. What is the status?

            Regards,
            -Sanmi






            --
            Regards,
            Sanmi
          • John Langford
            Message 5 of 6, Jun 20, 2011
              A silly model reading bug, now fixed. Thanks.

              -John

              On 06/20/2011 05:52 PM, Vaclav Petricek wrote:
              >
              > Thanks a lot John! I am giving L1 a spin right now. First observation
              > is that vw segfaults when trying to load a model where all the weights
              > have been squashed by a too large --l1 param.
              >
              > See below on how to replicate the segfault.
              >
              > Vaclav
              >
              > $ cat data.in
              > 1 |a A
              > 0 |a A
              > 1 |a A
              >
              > I. Segfault on empty model
              >
              > $ vwl1 -d data.in --l1 0.5 --readable_model model.readable -f model.binary
              > using no cache
              > Reading from data.in
              > num sources = 1
              > final_regressor = model.binary
              > Num weight bits = 18
              > learning rate = 10
              > initial_t = 1
              > power_t = 0.5
              > learning_rate set to 10
              > average since example example current current current
              > loss last counter weight label predict features
              > 0.666667 0.666667 3 3.0 1.0000 0.0000 2
              >
              > finished run
              > number of examples = 3
              > weighted example sum = 3
              > weighted label sum = 2
              > average loss = 0.6667
              > best constant = 0.5
              > best constant's loss = 0.25
              > total feature number = 6
              >
              > $ cat model.readable
              > Version 5.1
              > Min label:0.000000 max label:1.000000
              > bits:18 thread_bits:0
              > ngram:0 skips:0
              > index:weight pairs:
              >
              > $ vwl1 -d data.in -t -i model.binary
              > using no cache
              > Reading from data.in
              > num sources = 1
              > Segmentation fault
              >
              > II. All is good when there are non-zero weights left:
              >
              > $ vwl1 -d data.in --l1 0.000001 --readable_model model.readable -f
              > model.binary
              > using no cache
              > Reading from data.in
              > num sources = 1
              > final_regressor = model.binary
              > Num weight bits = 18
              > learning rate = 10
              > initial_t = 1
              > power_t = 0.5
              > learning_rate set to 10
              > average since example example current current current
              > loss last counter weight label predict features
              > 0.997365 0.997365 3 3.0 1.0000 0.0031 2
              >
              > finished run
              > number of examples = 3
              > weighted example sum = 3
              > weighted label sum = 2
              > average loss = 0.9974
              > best constant = 0.5
              > best constant's loss = 0.25
              > total feature number = 6
              >
              > $ cat model.readable
              > Version 5.1
              > Min label:0.000000 max label:1.000000
              > bits:18 thread_bits:0
              > ngram:0 skips:0
              > index:weight pairs:
              > 116060:0.496636
              > 214560:0.496636
              >
              > $ vwl1 -d data.in -t -i model.binary
              > using no cache
              > Reading from data.in
              > num sources = 1
              > Num weight bits = 18
              > learning rate = 10
              > initial_t = 1
              > power_t = 0.5
              > only testing
              > average since example example current current current
              > loss last counter weight label predict features
              > 0.328894 0.328894 3 3.0 1.0000 0.9933 2
              >
              > finished run
              > number of examples = 3
              > weighted example sum = 3
              > weighted label sum = 2
              > average loss = 0.3289
              > best constant = 0.5
              > best constant's loss = 0.25
              > total feature number = 6
              >
              > On Sat, Jun 18, 2011 at 8:11 PM, John Langford <jl@...> wrote:
              > > I just added a version of l1 regularization for the online optimizers.
              > > You can see it by using:
              > >
              > > ./vw -d <file> --l1 0.01 --readable_model foo; cat foo
              > >
              > > Larger values of l1 will create sparser models. If the flag isn't
              > > specified (or you use --l1 0.0), then the old behavior occurs.
              > >
              > > -John
              > >
              > > On 06/17/2011 05:50 PM, Sanmi Koyejo wrote:
              > >>
              > >> I saw some discussion that indicated vw was in the process of
              > >> implementing l1 regularization. What is the status?
              > >>
              > >> Regards,
              > >> -Sanmi
              > >>
              > >>
              > >
              > >
              > >
            • John Langford
              Message 6 of 6, Jun 20, 2011
                No, but we might get that working soon.

                -John

                On 06/20/2011 08:05 PM, Sanmi wrote:
                >
                > Does this work with --conjugate_gradient? It ran successfully, but it
                > was not clear.
                >
                >
                > On Sat, Jun 18, 2011 at 10:11 PM, John Langford <jl@...> wrote:
                >
                > I just added a version of l1 regularization for the online
                > optimizers. You can see it by using:
                >
                > ./vw -d <file> --l1 0.01 --readable_model foo; cat foo
                >
                > Larger values of l1 will create sparser models. If the flag isn't
                > specified (or you use --l1 0.0), then the old behavior occurs.
                >
                > -John
                >
                > On 06/17/2011 05:50 PM, Sanmi Koyejo wrote:
                >
                >
                > I saw some discussion that indicated vw was in the process of
                > implementing l1 regularization. What is the status?
                >
                > Regards,
                > -Sanmi
                >
                >
                >
                >
                >
                >
                > --
                > Regards,
                > Sanmi
                >