Polish Postfix Converter
CONTENTS of THIS PAGE...
Introduction (just below)
Ian Craw's (University of Aberdeen) Excellent (!) notes.....
English Examples
Explanation
What is an Interpreter?
What is Parsing?
Language Definition
Formal Syntax
Outlining a Solution
The Parser: Reading Tokens & Converting them to Descriptors
Parsing Statements
Dealing with Expressions
Evaluating Expressions
Pictures of the Objects
Ian Craw's (University of Aberdeen) Excellent (!) notes.....
Some problems don't lend themselves to solutions described only in terms of objects. This is one of them.
Where to begin.....
At first glance this project may seem like an impossible task, but once it is properly broken down it's not so bad. So let's begin with some terminology:
A Lexical analyzer takes the "raw" words in a statement and breaks the statement up into its various components (tokens). The Parser then takes over; its job is to determine whether or not these tokens constitute a properly formed statement according to the rules of the language (rules of syntax). The final stage is to determine the meaning (semantics) of the raw words given their context. This is done by the Interpreter.
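To make the first of these stages concrete, here is a minimal lexical-analyzer sketch in Python for arithmetic statements of the kind this project deals with. The token set (numbers, the four operators, and brackets) is an assumption chosen for illustration, not a specification of the project's lexer:

```python
import re

# A minimal lexical analyzer: break a "raw" arithmetic statement into
# its component tokens. The token set here is an illustrative assumption:
# unsigned integers, + - * /, and round brackets. Whitespace is skipped
# implicitly because it matches nothing in the pattern.
def tokenize(statement):
    token_pattern = re.compile(r"\d+|[+\-*/()]")
    return token_pattern.findall(statement)

print(tokenize("(3 + 4) * 12"))   # ['(', '3', '+', '4', ')', '*', '12']
```

Note that the lexer knows nothing about syntax: it will happily tokenize nonsense like `") + ("`. Deciding whether the tokens form a well-formed statement is the parser's job.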
In natural languages (especially English) the rules of syntax are fairly well known, but the interpretation of the meanings of the words is often ambiguous, and the same statement can be broken up in various ways.
"British Left Waffles on Falkland Islands" can be interpreted at least two ways:
[British Left] [Waffles] [on Falkland Islands]
-or-
[British] [Left] [Waffles] [on Falkland Islands]
"Squad helps dog bite victim" can also be interpreted two ways:
[Squad] [helps] [dog bite victim]
-or-
[Squad] [helps] [dog] [bite] [victim]
"Enraged Cow Injures Farmer With Ax" can be interpreted two ways too:
[Enraged Cow] [Injures] [Farmer With Ax]
-or-
[Enraged Cow] [Injures] [Farmer] [With Ax]
Well, you get the picture. What we do when we analyze these statements is apply known rules of syntax to decompose the statement into its syntactic components (parts of speech) so we can apply the appropriate meaning (semantics) to each part. Before we even get to the semantics we must isolate each word (called a token) and classify it (as "subject", "verb", etc.). The resultant syntactic components are called descriptors. This allows us to check that the statement is syntactically correct (to parse it). We can conclude that the statement is syntactically correct when the descriptors match one of the known rules of syntax. In other words, if the statement can be reduced to the descriptors [subject] [verb] [object], then it is syntactically correct and we can proceed to the next step, namely to interpret the meaning.

Humans are wonderfully adapted to do all of these jobs, and once we "know" a language we can go through these steps without even being conscious of it. The process is more obvious when we are forced to deal with a language with which we are not that familiar.
Computers can interpret languages too, but in order to do this by computer we must find a way of eliminating ALL ambiguities. In human languages, we can adjust our "lexical analyzer" on the fly to resolve ambiguities as they come up. We are very good at using context for this ("It can't mean this because that makes no sense, so it must mean that"); computers are not. As a result, the rules of syntax for a computer language contain no ambiguities. Each language has a precisely outlined set of syntactic rules and an equally precise set of semantic rules to go along with them. With this done, we can now write a program that performs the steps outlined above automatically.
1. What is an Interpreter? (From The Computer Desktop Encyclopedia Copyright © 1981-2000 The Computer Language Company Inc. All rights reserved. Ver. 13.1, 1st Quarter 2000)
A high-level programming language translator that translates and runs the program at the same time. It translates one program statement into machine language, executes it, then proceeds to the next statement. This differs from regular executable programs that are presented to the computer as binary-coded instructions. Interpreted programs remain in the same source language format the programmer wrote in: as text files, not machine language files.
Interpreted programs run slower than their compiler counterparts. Whereas the compiler translates the entire program before it is run, interpreters translate a line at a time while the program is run. However, it is very convenient to write an interpreted program, since a single line of code can be tested interactively.
Interpreted programs must always be run with the interpreter. For example, in order to run a BASIC or dBASE program, the BASIC or dBASE interpreter must be in the target computer.
If a language can be both interpreted and compiled, a program may be developed with the interpreter for ease of testing and debugging and later compiled for production use.
Interpreters and Compilers
Unlike compiled languages that have been translated into machine language (right), interpreted languages must be translated at runtime. With languages such as dBASE and BASIC (middle), the interpreter translates the original source code. Languages such as Java and Visual Basic (left) use an intermediate language. The source code is compiled into "byte code," and the bytecode is interpreted at runtime.
2. What is Parsing? (a somewhat more formal explanation)
An algorithm or program to determine the syntactic structure of a sentence or string of symbols in some language. A parser normally takes as input a sequence of tokens output by a lexical analyzer. It may produce some kind of abstract syntax tree as output.
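As a sketch of what such an algorithm can look like, here is a small recursive-descent parser in Python that consumes a token sequence and produces a nested-tuple abstract syntax tree. The grammar it assumes (the usual expression/term/factor formulation, giving `*` and `/` higher precedence) is illustrative, not necessarily the one required by this project:

```python
# A tiny recursive-descent parser sketch. Assumed grammar (illustrative):
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
# Returns a nested tuple (operator, left-subtree, right-subtree).
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() == '(':
            eat()                      # consume '('
            node = expr()
            assert eat() == ')', "missing closing bracket"
            return node
        return int(eat())              # a number token

    def term():
        node = factor()
        while peek() in ('*', '/'):
            node = (eat(), node, factor())
        return node

    def expr():
        node = term()
        while peek() in ('+', '-'):
            node = (eat(), node, term())
        return node

    tree = expr()
    assert peek() is None, "trailing tokens after expression"
    return tree

print(parse(['3', '+', '4', '*', '2']))   # ('+', 3, ('*', 4, 2))
```

The tree shape makes the precedence visible: in `3 + 4 * 2` the multiplication ends up as a subtree of the addition, so it is evaluated first.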
3. Arithmetic Expression "Language" definition:
The Notation... (called BNF, or Backus-Naur Form)
Rules of Syntax:
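A typical BNF grammar for an arithmetic-expression language of this kind (shown here purely as an illustration; the nonterminal names and the standard precedence/bracket structure are assumptions, not necessarily the exact rules used in this project) looks like this:

```
<expression> ::= <term>   | <expression> "+" <term>   | <expression> "-" <term>
<term>       ::= <factor> | <term> "*" <factor>       | <term> "/" <factor>
<factor>     ::= <number> | "(" <expression> ")"
```

Putting `*` and `/` one level "deeper" than `+` and `-` is what encodes operator precedence directly into the grammar, and the `<factor>` rule is what makes bracketed sub-expressions legal anywhere a number is.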
More on "Backus-Naur Form" (the notation used above)
Backus-Naur Form (from FOLDOC, the on-line computer dictionary)
<language, grammar> (BNF, originally "Backus Normal Form") A formal metasyntax used to express context-free grammars. Backus Normal Form was renamed Backus-Naur Form at the suggestion of Donald Knuth.
BNF is one of the most commonly used metasyntactic notations for specifying the syntax of programming languages, command sets, and the like. It is widely used for language descriptions but seldom documented anywhere (how do you document a metasyntax?), so that it must usually be learned by osmosis (but see RFC 2234).
Consider this BNF for a US postal address:
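(A rendering of that BNF, reconstructed from the English translation that follows, so the exact nonterminal names are approximate:)

```
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part>      ::= <personal-part> <last-name> <opt-jr-part> <EOL>
                   | <personal-part> <name-part>
<personal-part>  ::= <first-name> | <initial> "."
<street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL>
<zip-part>       ::= <town-name> "," <state-code> <ZIP-code> <EOL>
```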
This translates into English as: "A postal-address consists of a name-part, followed by a street-address part, followed by a zip-code part. A personal-part consists of either a first name or an initial followed by a dot. A name-part consists of either: a personal-part followed by a last name followed by an optional "jr-part" (Jr., Sr., or dynastic number) and end-of-line, or a personal part followed by a name part (this rule illustrates the use of recursion in BNFs, covering the case of people who use multiple first and middle names and/or initials). A street address consists of an optional apartment specifier, followed by a street number, followed by a street name. A zip-part consists of a town-name, followed by a comma, followed by a state code, followed by a ZIP-code followed by an end-of-line."
Note that many things (such as the format of a personal-part, apartment specifier, or ZIP-code) are left unspecified. These lexical details are presumed to be obvious from context or specified somewhere nearby.
There are many variants and extensions of BNF, possibly containing some or all of the regexp wild cards such as "*" or "+". EBNF is a common one. In fact the example above isn't the pure form invented for the ALGOL 60 report. "[]" was introduced a few years later in IBM's PL/I definition but is now universally recognised. ABNF is another extension.
(1997-11-23)
AND.... if you were paying attention you might have noticed that we don't put the '(' ')' ANYWHERE in the Post-fix version. By our clever "re-arrangement" of the original operators and operands, we have eliminated the need for the brackets. The Post-fix can now be evaluated EXACTLY the same way. If this were a compiler, we would have spent a bit of extra time compiling and generating machine code, but what we gain is efficient evaluation (i.e. execution) of the expressions.
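That bracket-free evaluation needs nothing but a single stack: push operands as they arrive, and when an operator arrives, pop its two operands, apply it, and push the result back. A sketch in Python (illustrative only, not the project's required code):

```python
# Evaluate a postfix (Reverse Polish) token list with one stack.
# No brackets and no precedence rules are needed: the token order
# already encodes the order of evaluation.
def eval_postfix(tokens):
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()            # right operand is on top
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))   # operand: just push it
    return stack.pop()

# (3 + 4) * 2 in postfix is: 3 4 + 2 *
print(eval_postfix(['3', '4', '+', '2', '*']))   # 14.0
```

Note the pop order: the top of the stack is the *right* operand, which matters for `-` and `/`.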
You need to use two stacks: one is a regular stack, used for temporary holding of descriptors; the other will be a deque. The deque is where we will "build" the postfix expression. While building the expression it is treated as an ordinary stack, but once finished we will want to pull values off from the "bottom".
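The scheme just described can be sketched in Python, with a plain list as the operator stack and `collections.deque` as the structure holding the postfix expression. This is essentially the classic shunting-yard conversion; the function name and the precedence table are assumptions for illustration, not part of the assignment:

```python
from collections import deque

# Operator precedence (an assumption: the usual arithmetic rules).
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_postfix(tokens):
    op_stack = []      # regular stack: temporary holding of operators
    output = deque()   # the postfix expression being built; while building,
                       # we only append() to it, treating it as a stack
    for tok in tokens:
        if tok in PRECEDENCE:
            # Move higher/equal-precedence operators to the output first.
            while (op_stack and op_stack[-1] != '(' and
                   PRECEDENCE[op_stack[-1]] >= PRECEDENCE[tok]):
                output.append(op_stack.pop())
            op_stack.append(tok)
        elif tok == '(':
            op_stack.append(tok)
        elif tok == ')':
            while op_stack[-1] != '(':
                output.append(op_stack.pop())
            op_stack.pop()             # discard the '(' -- brackets vanish
        else:
            output.append(tok)         # operand goes straight to the output
    while op_stack:                    # flush the remaining operators
        output.append(op_stack.pop())
    return output

postfix = to_postfix(['(', '3', '+', '4', ')', '*', '2'])
# Once built, pull values off from the "bottom" with popleft():
print([postfix.popleft() for _ in range(len(postfix))])   # ['3', '4', '+', '2', '*']
```

The deque is exactly what makes the two phases work: `append()` gives stack-like behaviour while building, and `popleft()` reads the finished expression from the bottom, i.e. left to right.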