Thanks a lot John! I am giving L1 a spin right now. First observation is that vw segfaults when trying to load a model in which all the weights have been squashed to zero by a too-large --l1 param. See below for how to replicate the segfault.

Vaclav

$ cat data.in
1 |a A
0 |a A
1 |a A

I. Segfault on empty model

$ vwl1 -d data.in --l1 0.5 --readable_model model.readable -f model.binary
using no cache
Reading from data.in
num sources = 1
final_regressor = model.binary
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
learning_rate set to 10
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.666667   0.666667    3        3.0      1.0000   0.0000   2

finished run
number of examples = 3
weighted example sum = 3
weighted label sum = 2
average loss = 0.6667
best constant = 0.5
best constant's loss = 0.25
total feature number = 6

$ cat model.readable
Version 5.1
Min label:0.000000 max label:1.000000
bits:18 thread_bits:0
ngram:0 skips:0
index:weight pairs:

$ vwl1 -d data.in -t -i model.binary
using no cache
Reading from data.in
num sources = 1
Segmentation fault

II. All is good when there are non-zero weights left:

$ vwl1 -d data.in --l1 0.000001 --readable_model model.readable -f model.binary
using no cache
Reading from data.in
num sources = 1
final_regressor = model.binary
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
learning_rate set to 10
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.997365   0.997365    3        3.0      1.0000   0.0031   2

finished run
number of examples = 3
weighted example sum = 3
weighted label sum = 2
average loss = 0.9974
best constant = 0.5
best constant's loss = 0.25
total feature number = 6

$ cat model.readable
Version 5.1
Min label:0.000000 max label:1.000000
bits:18 thread_bits:0
ngram:0 skips:0
index:weight pairs:
116060:0.496636
214560:0.496636

$ vwl1 -d data.in -t -i model.binary
using no cache
Reading from data.in
num sources = 1
Num weight bits = 18
learning rate = 10
initial_t = 1
power_t = 0.5
only testing
average    since       example  example  current  current  current
loss       last        counter  weight   label    predict  features
0.328894   0.328894    3        3.0      1.0000   0.9933   2

finished run
number of examples = 3
weighted example sum = 3
weighted label sum = 2
average loss = 0.3289
best constant = 0.5
best constant's loss = 0.25
total feature number = 6
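
For what it's worth, my reading of the difference between the two runs (purely an illustration of a soft-thresholding / truncated-gradient style L1 step, not a claim about vw's exact update; the effective shrinkage in vw also depends on the learning-rate schedule, and the soft_threshold helper below is hypothetical) is that a shrinkage on the order of 0.5 truncates weights that only reach about 0.5 all the way to zero, while 1e-6 barely touches them:

#include <cmath>
#include <cstdio>

// Soft-threshold step: shrink a weight toward zero by `shrink`, clamping at zero.
// Large shrink values zero out small weights entirely.
float soft_threshold(float w, float shrink) {
  float mag = std::fabs(w) - shrink;
  return mag > 0.f ? std::copysign(mag, w) : 0.f;
}

int main() {
  float w = 0.496636f;  // roughly the surviving weights from run II
  std::printf("--l1 0.5  -> %f\n", soft_threshold(w, 0.5f));       // 0.000000
  std::printf("--l1 1e-6 -> %f\n", soft_threshold(w, 0.000001f));  // ~0.496635
  return 0;
}

With everything truncated to zero, the saved regressor ends up with no index:weight pairs at all, which is presumably the state the loader in part I does not handle.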

On Sat, Jun 18, 2011 at 8:11 PM, John Langford <jl@...> wrote:

> I just added a version of l1 regularization for the online optimizers.
> You can see it by using:
>
> ./vw -d <file> --l1 0.01 --readable_model foo; cat foo
>
> Larger values of l1 will create sparser models. If the flag isn't
> specified (or you use --l1 0.0), then the old behavior occurs.
>
> -John
>
> On 06/17/2011 05:50 PM, Sanmi Koyejo wrote:
>> I saw some discussion that indicated vw was in the process of
>> implementing l1 regularization. What is the status?
>>
>> Regards,
>> -Sanmi
