translation lab

Parsing
Lab Notes on Latin-English Translation

Formal Syntax	Outlining a Solution	About Tokens
Words VS Tokens	About The Queue	Parser VS Translator
	The Parser:
Reading Tokens & Converting them to Descriptors	Parsing Statements	Dealing with Syntax
Translating Sentences	Pictures of the Objects

ALTERNATE (OLD VERSION) VIEW OF NounPhrase
ALTERNATE (OLD VERSION) VIEW OF VerbPhrase
ALTERNATE (OLD VERSION) VIEW OF Lists

Outline of solution:

Here's one approach:

The first step, and perhaps the most difficult is to identify those things in our solution that can be described as objects. To keep the solution as straight-forward as possible we will limit the number of objects we define.

Step 1: identify those things that are objects in this problem.

Each object:

- consists of .. (is/ has/ knows)
- operates on... (does)
- reports on... (output?)

Some possibilities:

maybe token (each "thing" in a sentence) *****
scanner or parser (the lexical analyzer)
the queue (where the parser will deposit each token)
the translator (the part that takes the tokens from the queue and translates them)

Step 2: lay out the main algorithm

- usual intro/ initializations
- get the arguments (command line arguments might work well here)
- open the appropriate file(s)

The Token: This is a do-dad that often gives people grief. Tokens are not magical entities. In our application each identifiable THING in our sentence is a token. Most of them are words, but some are punctuation. A token is simply a catch-all word, so we don't have to keep saying, "words and punctuation". In some applications there are more than two kinds of things in a sentence. Using tokens also gives us some leeway - we can identify other entities as tokens that are usually not considered to be part of a sentence but that we need in order to write a program to parse sentences. An example of this would be the end of the sentence - that is, the part that comes AFTER the punctuation (the EOLN marker).

The Queue

Words VS Tokens

Parser VS Translator

The Parser:
Reading Symbols and Converting them to Tokens:

each "token" will be a single word but you'll need to be able to skip white space between tokens.
you'll probably need the following information associated with each token:
- it's value ( "the", "cat", or "happy", etc.)
- it's type (see below)
you will probably end up with the following "kinds" of tokens:
- noun
- pronoun
- propernoun
- article
- conjunction
- adverb
- adjective
- verb
- punctuation
- end-of-statement

How to Get Started

Rough outline, , no error detection/recovery:
First Crack:
Let's take stock: we need to see what we already have to avoid duplication.

What do we already have?

1. that annoying Debug Class

we sort of know how it works and how to use it so we can afford to ignore this for now.

2. A Word Class

looks big and ugly - we'll come back to it later - for now let's just assume it is an object that deals with the words in the sentence.

3. A Dictionary Class

ditto - let's assume it has lists of words in different languages & ways to search for them

4. A ListElement Class

huh?
Maybe if we ignore it it won't matter (I have inside information and I know this to be true, but generally speaking, it is inefficient to get stuck on one part of a solution because we don't quite understand it. If we get stuck, we will try to continue on and see if we can come back later with more information and make sense of it then.)

5. A MyQueue Class

We sort of know what a queue is - let's look at this and confirm
- the goal is to understand this well enough that we can ignore the insides of this too

6. NotAWordException.java

We'll also ignore this for now.

The Queue: If we can figure out the public stuff of this Queue class we may not have to look at it further, so....
It uses things of type "Object". Our Word & Token classes are both Objects (by inheritance) so we can use this class to hold either Words or Tokens. To avoid confusion, and since I'm going to be putting Tokens in my queue I'll just pretend I see "Token" where ever there is "Object"
Things I can do with this queue:

1. get a reference to the Token at the front of the Queue (front)
2. put a Token in the Queue (add)
3. pop a Token off the Queue (pop)
4. see if the Queue is empty (empty)
5. display the entire contents of the Queue (without changing it) (display)

I also discover that the Queue class uses the ListElement class but I don't appear to need to know anything about the ListElement class in order to use the Queue class. Good. I'll continue to ignore the ListElement class. Time to move on.

Take a second look at the Words class - let's re-organize for our purposes:
It has:

1. a bunch of integer codes for the parts of speech (NOUN, VERB, etc.)
2. a bunch of booleans that will say if a Word is this particular part of speech (isNoun, etc.)
3. access methods for the word, it's type and it's transaltion
4. a static method that "translates" a type like "noun" into the integer code (NOUN)
5. a display function
6. some constructors

This makes it a little less intimidating. Continuing....

The Dictionary Class:

The constructor opens a text file, reads it, and builds a dictionary.
There is an add method that lets me add a word to the dictionary (it assumes English is first)
There is a find that uses a simple String (and ENGLISH or LATIN) and there is one that uses a Word - but it only uses the first part of the Word - it ignores the translation. There are actually a whole bunch of find functions - they all have the same basic goals, but they take different parameters and some behave a little differently from the others.
I can ignore the rest of the stuff in that class.

We have established what we HAVE.
Next step is to start figuring out what we NEED.
What do we need?
1. We need to be able to cope with lines of text read from the keyboard. We can use tio to help us or do our own. We only need something to read a line of text so it doesn't need to be complicated. This will do it:

public int getNewLine() {

System.out.print("Please enter your sentence. \nSeperate each thing with"

+ " at least one blank \n- " +

"including the punctuation at the end .\n");

try {

String line = in.readLine();

pos = 0;

max = line.length();

return line.length();

}

catch (IOException E) {

System.out.print("getNextLine::IO Error detected.\n\n");

max = 0;

return 0;

}
}

where in (used in "in.readLine()") is:

BufferedReader in =
new BufferedReader(new InputStreamReader(System.in));

We need try & catch because this object can throw an IOException and Java forces us to handle them.

What we are aiming for is something to "capture" words. We will assume words are seperated from each other by white space. This will also be true for the punctuation. The end-of-line will signify the end of the sentence even if we don't have the punctuation yet. For this we can make a Line Class. It should have:

a string : one whole line of input
a pos : an integer keeping track of where we are now in the line
max : the length of the line

It should be able to do:

getNewLine - prompt the user for a new line and set it up in the class
getNext - return the next non-blank string from the current line
moreLine : true until we see the end-of-line

We can write this and build a little test program to make sure it works. This should do it:

public static void main (String args[]) {

String s;

Line l = new Line();

try {

Debug.setFlag(true);

System.out.print("Here we go...\n");

l.getNewLine();

while (l.moreLine()){

s = l.getNext();

System.out.print("=" + s + "=\n");

}

System.out.print("Done.\n\n");

}

catch (Exception E) {

System.out.print("Uh-Oh!: " + E + "\n\n");

}
}

After this there is almost nothing left for the Token to do and if it weren't for that pesky punctuation, we could probably avoid using tokens altogether and deal directly with Words. Oh well - the Token class is easier now.

Now, What do we need?
There's that Token do-dad, the scanner (aka lexical analyzer, aka parser), and the translator.
Let's see if we can figure out how this will need to be packaged (as in, what does the main program look like and do?):

The main program will need:

the Dictionary
a queue of Tokens ( we will give the queue to the parser to fill up)
a parser (which needs to know about the queue and the dictionary and needs to use the line)
a translator (which needs the queue of tokens but not the dictionary)

Maybe if the parser is the only one that needs the dictionary it should be the one to own it? On the other hand, it might allow us more flexibility later if we gave the dictionary to the translator too. This opens up the possibility of providing the translator with a different dictionary at some other point.

So now, what does the main program look like?

public static void main (String args[]) {

try {
Dictionary dic = new Dictionary("List.txt");

Queue Q = new Queue();
Parser P = new Parser(dic,Q);
Translator T = new Translator(dic,Q);
Line l = new Line();

Debug.setFlag(true);

System.out.print("Here we go...\n");

P.go(l); // start the parser on the current line
Q.display(); // show the contents of the queue
T.go(); // do the translation
System.out.print("Done.\n\n");

}

catch (Exception E) {

System.out.print("Uh-Oh!: " + E + "\n\n");

}
}

The queue stuff is done. The translator will simply go through the queue, one token at a time and print out the translation to go with the word we have. That's not too hard so we can do that now and test it by putting a bunch of made-up words in the queue or we can leave it till later.

That pretty much just leaves the Parser. That's for the next section........


Translating Sentences: The translator will simply go through the queue, one token at a time and print out the translation that goes with the word we have. And TA-DA!!!! we've got it.


Pictures of the Objects: