My Natural Language Processing program
- I've taken upon myself the task of creating a full-fledged natural
language processing program that could serve as a sort of "operating
system" in a computer that you talk to rather than point and click at.
My main interest lies in parsing and registering everyday human logic
followed by handling queries based upon what the program has been
I finished my first generation program several months ago, which can
be downloaded here: http://geocities.com/ai_project_1 .
The program starts out as a "clean slate" whereby you begin teaching
it basic facts that you can later query. Here is a "discussion" that
I held with it.
Me: Who do you know?
Computer: I do not know anyone.
Me: There is a crazy boy and a wild girl.
Me: He is named Joe.
Computer: Confirm the name of the boy is: Joe.
Computer: Thank you.
Me: Her name is Sally.
Computer: Confirm the name of the girl is: Sally.
Computer: Thank you.
Me: There is a green pool.
Me: Yesterday, Joe swam in it.
Me: He did that because he is crazy.
Me: There is a hilly park.
Me: Sally ran in it.
Me: She is 2 meters tall.
Me: The boy is 5 feet tall.
Me: Who do you know?
Computer: I know a boy named Joe and a girl named Sally.
Me: What is Joe like?
Computer: Joe is crazy.
Me: Why is he like that?
Computer: I do not know.
Me: Where is the boy?
Computer: As of 1-14-07 Joe was in the pool.
Me: What is he doing?
Computer: As of 1-14-07 Joe was swimming.
Me: Why did he swim?
Computer: Because Joe is crazy.
Me: Is Sally crazy?
Computer: No, Sally is not crazy.
Me: She is wild.
Computer: Yes, I know.
Me: Where is she?
Computer: As of 3 minutes ago Sally was in the park.
Me: Is Joe taller than Sally?
Me: What is the park like?
Computer: It is hilly.
Me: Where is it?
Computer: I do not know the location of the park.
Very basic stuff, but not bad for my first try at programming!
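For anyone who wants to see the "clean slate" idea in code, here is a minimal sketch in Python. It is not the program linked above. The names FactStore, Entity, what_is_like, has_trait and why are all made up for illustration, and the sketch assumes the sentences have already been parsed into facts. It only shows how stored facts could produce answers like the ones in the transcript.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    kind: str                                   # e.g. "boy", "girl", "pool"
    name: Optional[str] = None                  # proper name, once taught
    traits: set = field(default_factory=set)    # e.g. {"crazy"}
    reasons: dict = field(default_factory=dict) # fact -> stated cause


class FactStore:
    """Hypothetical store: starts empty and only knows what it is told."""

    def __init__(self):
        self.entities = []

    def add(self, kind, *traits):
        entity = Entity(kind, traits=set(traits))
        self.entities.append(entity)
        return entity

    def find(self, name):
        return next((e for e in self.entities if e.name == name), None)

    def what_is_like(self, name):
        entity = self.find(name)
        if entity is None or not entity.traits:
            return "I do not know."
        return f"{name} is {' and '.join(sorted(entity.traits))}."

    def has_trait(self, name, trait):
        # Assumption: anything not taught is treated as false.
        entity = self.find(name)
        if entity is not None and trait in entity.traits:
            return f"Yes, {name} is {trait}."
        return f"No, {name} is not {trait}."

    def why(self, name, fact):
        entity = self.find(name)
        cause = entity.reasons.get(fact) if entity else None
        return f"Because {name} is {cause}." if cause else "I do not know."


store = FactStore()
joe = store.add("boy", "crazy")      # "There is a crazy boy ..."
sally = store.add("girl", "wild")    # "... and a wild girl."
joe.name, sally.name = "Joe", "Sally"
joe.reasons["swam"] = "crazy"        # "He did that because he is crazy."

print(store.what_is_like("Joe"))          # Joe is crazy.
print(store.has_trait("Sally", "crazy"))  # No, Sally is not crazy.
print(store.why("Joe", "swam"))           # Because Joe is crazy.
print(store.why("Joe", "crazy"))          # I do not know.
```

Note that has_trait answers "No" for anything it was never told, which is how the transcript behaves for "Is Sally crazy?". Whether that is the right default is a design choice, not a given.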
I knew that I was going to need to start the whole thing over again if
I was going to be able to handle the complexity of statements that I
wanted, and also if it was going to be able to output more
naturalistic and varied responses.
My new program, so far, is able to parse statements of arbitrary
length and complexity:
"Joe and Sally walked in and around the park and playground"
"They are wild and crazy swam in the pool and ate the pizza"
"There is a boy girl and pool the boy and girl are wild and crazy they
swam in it because they are hot"
I intentionally left out the punctuation because the program is
intended to be able to parse text that is input from a speech
recognition program, which is punctuation free, of course.
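To make the first example concrete, here is a rough Python sketch of one way to expand coordinated words in an unpunctuated sentence into individual facts. It is only an illustration under heavy assumptions: the LEXICON table and the functions group_by_category and expand are hypothetical, and the sketch only handles the single shape "nouns verb prepositions nouns". It does not attempt the clause boundaries in the second and third examples.

```python
from itertools import product

# Hypothetical toy lexicon; a real system would need far more than this.
LEXICON = {
    "joe": "NOUN", "sally": "NOUN", "park": "NOUN", "playground": "NOUN",
    "walked": "VERB",
    "in": "PREP", "around": "PREP",
    "and": "CONJ", "the": "DET",
}


def group_by_category(sentence):
    """Split an unpunctuated sentence into runs of same-category words.

    Conjunctions and determiners are dropped, so "in and around" becomes
    one PREP group and "the park and playground" one NOUN group.
    """
    groups = []
    for word in sentence.lower().split():
        category = LEXICON[word]          # raises KeyError on unknown words
        if category in ("CONJ", "DET"):
            continue
        if groups and groups[-1][0] == category:
            groups[-1][1].append(word)
        else:
            groups.append((category, [word]))
    return groups


def expand(sentence):
    """Return one (subject, verb, preposition, object) tuple per combination."""
    groups = group_by_category(sentence)
    pattern = [category for category, _ in groups]
    if pattern != ["NOUN", "VERB", "PREP", "NOUN"]:
        raise ValueError(f"unsupported sentence shape: {pattern}")
    return list(product(*(words for _, words in groups)))


facts = expand("Joe and Sally walked in and around the park and playground")
for fact in facts:
    print(fact)
```

The one sentence yields eight separate facts (2 subjects x 1 verb x 2 prepositions x 2 objects), for example ('joe', 'walked', 'in', 'park'). Grouping by word category rather than by commas is what lets it work without punctuation.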
I would like to get in touch with others who have the same kind of
interest in natural language processing. I would like to find people
that I can work on this project with, either as a developer, a
beta-tester, or even an output response writer (as in the response to
a statement like "Hello" or "The boy is crazy").
I have tried to find other working examples of programs like mine. I
have tried those web-based chatbots, and they are horrifyingly awful.
I have looked in the Natural Language registry website, and found
nothing very exciting. Mainly, I am looking for inspiration so that I
can create the best NLP program possible.
Contact me at: ai_project_1 at yahoo dot you-know-what
Dude - have you been through here - http://www.alicebot.org/ ?
> -----Original Message-----
> From: firstname.lastname@example.org
> [mailto:email@example.com] On Behalf Of dkane75
> Sent: Friday, 26 January 2007 3:23 PM
> To: firstname.lastname@example.org
> Subject: [Artificial Intelligence Group] My Natural Language Processing
generic categories of meaning:
Objects bias (differentiating):
BLEND - wholeness, whole numbers
BOUND - partness, rational numbers
Relationships bias (integrating):
BOND - share space, irrational numbers
BIND - share time, imaginary numbers
From these come composites as reals, complex, quaternions, octonions. All
> Dude - have you been through here - http://www.alicebot.org/ ?
Yes, A.L.I.C.E. is a prime example of a "chatterbot" that is utterly
incapable of dealing with specific objects. It is a simple
pattern-matching engine that has precisely no ability to "learn" what
you tell it.
Here is a "conversation" I just had with A.L.I.C.E:
Me: There is a boy named Joe.
Alice: Tell me more about it.
Me: He is crazy
Alice: I see. crazy is he?
Me: What is Joe like?
Alice: Like a planet around a star.
It is easy to see, then, that A.L.I.C.E. fails miserably at a most
basic task of exhibiting "intelligence".
- I figure that the most challenging part of writing a program based on
LRT (Logic Recognition Technology) is simply the task of writing the
algorithms that allow one to refer to specific objects. What I mean
by this is that there are a million ways to refer to the same thing.
The easy way to refer to objects is through proper names and pronouns
(Mike is..., or He is...). The hard way, however, is to refer to
objects through characteristics, locations, or events that they have
been involved in. The hardest way of all is a combination of these
My LRT program (formerly known as "Natural Language Processing") is
now able to accurately parse and register the following series of
1. There is a boy named Joe, a girl named Sue, and a female dog named
2. There is a hill, a park, and a deep and warm pool
3. Joe, Sue and the dog are crazy and wild and swam in the pool
4. They are on the hill.
5. There is a boy named Mike, a girl named Betty, and a male dog named
6. Mike, Betty, and Annie are crazy and wild
7. They are on the hill.
8. The hill is in the park.
9. The crazy and wild boy and girl and female dog on the hill in the
park who swam in the deep and warm pool are funny.
If your next statement is "Joe is funny", the output will be something
like: CHAR_OBJ <C_1170887814.1> OF <1170887588_boy.MALE.1> ALREADY KNOWN.
Likewise, for the statement "Betty is funny", the output will be
something like: UPDATE <1170887716_girl.FEM.1> WITH CHAR_OBJ
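Here is a hedged Python sketch of the kind of lookup that statement 9 calls for: finding every known object that matches a description built from type, characteristics, location and a past event. It is not the actual LRT algorithm. Thing, resolve and assert_trait are hypothetical names, the IDs are simplified stand-ins for the real ones, and the dogs are left out because their names are cut off above. The nested location ("on the hill in the park") is also flattened to just "hill".

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    ident: str                                 # simplified, hypothetical ID
    kind: str
    name: str
    traits: set = field(default_factory=set)
    location: str = ""
    events: set = field(default_factory=set)   # e.g. {("swam", "pool")}


def resolve(things, kinds, traits=(), location=None, event=None):
    """Return every known thing that satisfies all parts of a description."""
    return [
        t for t in things
        if t.kind in kinds
        and set(traits) <= t.traits
        and (location is None or t.location == location)
        and (event is None or event in t.events)
    ]


def assert_trait(thing, trait):
    """Add a trait, reporting whether it was new or already known."""
    if trait in thing.traits:
        return f"{trait} OF {thing.ident} ALREADY KNOWN"
    thing.traits.add(trait)
    return f"UPDATE {thing.ident} WITH {trait}"


# Statements 1-8, people only.
swam = ("swam", "pool")
things = [
    Thing("boy.1", "boy", "Joe", {"crazy", "wild"}, "hill", {swam}),
    Thing("girl.1", "girl", "Sue", {"crazy", "wild"}, "hill", {swam}),
    Thing("boy.2", "boy", "Mike", {"crazy", "wild"}, "hill"),
    Thing("girl.2", "girl", "Betty", {"crazy", "wild"}, "hill"),
]

# Statement 9: the crazy and wild boy and girl on the hill who swam in
# the pool are funny.
referents = resolve(things, {"boy", "girl"}, ["crazy", "wild"], "hill", swam)
for thing in referents:
    thing.traits.add("funny")

by_name = {t.name: t for t in things}
joe_result = assert_trait(by_name["Joe"], "funny")
betty_result = assert_trait(by_name["Betty"], "funny")

print([t.name for t in referents])   # ['Joe', 'Sue']
print(joe_result)                    # funny OF boy.1 ALREADY KNOWN
print(betty_result)                  # UPDATE girl.2 WITH funny
```

All four people are crazy, wild and on the hill, so only the swimming event separates Joe and Sue from Mike and Betty. That is why "Joe is funny" comes back as already known while "Betty is funny" is an update.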
Because I am currently focusing on logic processing, I am leaving
aside writing "human-like" responses for later. Just pay attention to
the logical correctness of the output. You would not believe how
mind-bendingly difficult it was to write the algorithm that would
effectively parse sentence #9! In fact, I don't foresee any bigger
challenges related to the task of writing a full-featured LRT program
The thing that most NLP/LRT types get hung up on is the infinitely
diverse nature of the lexicals (nouns, adjectives, adverbs) that
inhabit our language. My feeling, however, is that these lexicals are
almost completely dependent on the structural words (verbs, copulas
and prepositions). A long time ago, I made the decision to stop
worrying about how large a "vocabulary" I should give my program.
In other words, I had to decide what the size of its "lexical domain"
should be. I chose to use only a few words that would allow me to
create an effective LRT prototype. Step by step, as the robustness of
the logic recognition increases, I will be able to consider new
lexical domains to branch out into.
Consider the following statement:
"The manically-depressed partial amputee from the Australian province
of New South Wales who recently delivered his thesis on the mating
rituals of Brazilian ring-tailed lemurs will soon get hitched to his
foxy, sassy, aristocratic beau of 20 years from Kuala Lumpur."
In order to get started on this project, I had to stop worrying about
the infinitely many ways of saying this very basic thing: "Soon, X
will marry Y". All of the characteristics of X and Y and all of the
events that X and Y have been through are in no way related to the
basic logic of the statement. Programming a computer to recognize
lexicals is a MUCH simpler task than programming it to recognize
Because I am now at the point where my program can recognize and
register the logic of the series of statements above, I will be able
to "teach it" about things such as the provinces of Australia and the
varieties of lemurs in Brazil, and on and on...
And only after doing this will I start to "teach it" how to respond in
a more human way.
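As a small illustration of the "Soon, X will marry Y" point, here is a Python sketch of a data structure that keeps the structural core of a statement apart from the lexical detail attached to X and Y. Referent, Proposition and core are hypothetical names, and the descriptors are entered by hand. The sketch does not show the hard part, which is deciding which words belong to which piece.

```python
from dataclasses import dataclass, field


@dataclass
class Referent:
    label: str                                    # placeholder such as "X"
    descriptors: list = field(default_factory=list)


@dataclass
class Proposition:
    subject: Referent
    relation: str
    obj: Referent
    time: str

    def core(self):
        """The structural skeleton, with every descriptor ignored."""
        return f"{self.time}, {self.subject.label} will {self.relation} {self.obj.label}"


x = Referent("X", [
    "manically-depressed",
    "partial amputee",
    "from New South Wales",
    "recently delivered his thesis",
])
y = Referent("Y", ["foxy", "sassy", "aristocratic", "from Kuala Lumpur"])

statement = Proposition(subject=x, relation="marry", obj=y, time="Soon")
print(statement.core())   # Soon, X will marry Y
```

New lexical domains would then only add entries to the descriptor lists, while the Proposition skeleton stays the same.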