by Philip Altman
Nearly all of us, beginner and expert alike, started programming in Atari BASIC. This language was first supplied in a cartridge and now, with minor revisions, is built into the new XL machines. Now that you've learned to use BASIC, have you ever wondered how it works? In this series of articles, we'll dissect Atari BASIC, look at its internal structure and find out much of what is going on behind the READY prompt.
The information presented here comes from several years' experience with Ataris, disassembly of portions of the BASIC ROM and from several publications, including De Re Atari and Compute's First and Second Books of Atari. For the technically minded, those interested in the finer details, Compute's Atari BASIC Source Book is highly recommended. Included is the entire BASIC source code, amounting to more than 130 pages. Although the ROM code can't be modified, assembly language programmers may be interested in tracing through it.
Atari BASIC is a high level computer language. It allows us to communicate with the microprocessor using English-like statements, arithmetic expressions, logical operators (<, >, etc.), and so on. The heart of the Atari, the 6502 microprocessor, is only capable of executing a sequence of numbers representing a series of elementary instructions. BASIC is a program which translates for us. It is an example of an interpreted language, which means that the translation process occurs as the program is run. One advantage of Atari BASIC is that some pre-processing, according to a complex set of internal rules, occurs as lines are entered. This is how BASIC detects certain types of errors before a program is run. Compiled languages, in contrast, convert all program statements to machine code before execution. When error-free, a compiled program runs much faster, but programming can be difficult to learn and tedious. If mistakes are made, compiled programs usually have to be debugged, re-compiled, run again, and so on.
You wouldn't recognize a BASIC program if you examined it as it exists within the Atari. This is because BASIC statements, operators and other program data are each internally represented in tokenized, or numeric, form. These tokenized programs run faster and consume less memory. There are four types of tokens in BASIC programs, ranging from 14 to 255. Tokens 128 to 255 are reserved to identify variables (this is why BASIC allows only 128 unique variables). When a string constant is tokenized, it is preceded first by a 15, which identifies it as a string constant, and then by another number representing its length. The string's contents follow. Numeric constants are preceded by a 14. Remaining tokens represent statements (e.g., CLOSE, DIM) and operators (e.g., <, >, =). As program lines are entered, BASIC finds the various statements and operators in internal look-up tables from which their respective tokens are derived. These tables are located at addresses 42159 and 42979, respectively (42143 and 42974 for XLs). Program 1 will list these tokens and their corresponding statements/operators to the system printer or, if you have no printer, to the screen. Press BREAK when the listing is complete.
Not shown are two statement tokens: 54, representing the implied "LET" before a variable, and 55, the "ERROR" token. Note that among the operator tokens are several "=" signs, each reflecting a different use of this symbol. Finally, three tokens are non-printing. Two (56 and 57) are array left parentheses, stored with array names in the VNT (variable name table). The third (22) is the terminator token found at the end of most lines. Statements for which no tokens are found generate errors. When a program is run, BASIC uses these token values as indices into other tables which point to the appropriate execution instructions.
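The token ranges described above can be sketched in a few lines of Python (chosen here only for illustration; it is not the article's Program 1). The function names are hypothetical, and plain ASCII stands in for the Atari's own character set.

```python
# Token values as given in the article; the names are illustrative only.
NUMERIC_CONSTANT = 14
STRING_CONSTANT = 15
FIRST_VARIABLE = 128


def classify_token(token):
    """Return a rough category for one token byte."""
    if not 0 <= token <= 255:
        raise ValueError("a token must fit in one byte")
    if token >= FIRST_VARIABLE:
        # Variable tokens are numbered in order of definition, from 128.
        return "variable #%d" % (token - FIRST_VARIABLE)
    if token == NUMERIC_CONSTANT:
        return "numeric constant"
    if token == STRING_CONSTANT:
        return "string constant"
    return "statement or operator"


def tokenize_string_constant(text):
    """Encode a string constant: 15, then its length, then its characters."""
    data = text.encode("ascii")  # assumption: ASCII stands in for ATASCII
    if len(data) > 255:
        raise ValueError("length must fit in one byte")
    return bytes([STRING_CONSTANT, len(data)]) + data


if __name__ == "__main__":
    print(classify_token(128))
    print(list(tokenize_string_constant("HI")))
```

Since only tokens 128 through 255 are available for variables, the sketch also shows where the 128-variable limit comes from.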
BASIC keeps track of a tokenized program by expanding and contracting a set of five tables in memory. Each is pointed to by a pair of addresses in memory page zero (0-255), low byte first. The address of the variable name table, for example, is PEEK(130)+256*PEEK(131). These tables, along with their functions and pointers, are detailed below.
1. VARIABLE NAME TABLE (130,131) - a sequential list of all variable names starting with the first variable entered. Tokens are assigned in order as variables are defined, beginning with 128. Array names have a ( added at the end, and strings have a $. The last character of each variable name is also stored in inverse video (that is, with its high bit set). By examining this character, BASIC knows where each name ends and what type the variable is. The end of the VNT is marked by a dummy zero byte and pointed to by addresses 132 and 133. Program 2 is a short utility which will display the variable name table of a program, so long as it has no lines higher than 31999. Enter the lines and LIST the program to cassette or disk. To use it, load your program, ENTER the utility, and type G.32000 (short for GOTO 32000).
2. VARIABLE VALUE TABLE (134,135) - stores information on each variable in the same order. BASIC variables may be scalar (e.g., A=5), arrays (e.g., A(1)=5) or strings (e.g., A$(1,1)="5"). Each VVT entry consists of eight bytes. The first two are the same for every variable; the first indicates the type (scalar, array or string) and the second, the variable index (token-128). The remaining six bytes describe specific information about the variable. In the case of a scalar, they represent the six-byte numeric variable value, expressed in BCD (binary coded decimal). For arrays and strings, they include size and length data and a pointer into the string/array table (see below), telling BASIC where to find the current values of each of the string/array elements.
3. STATEMENT TABLE (136, 137) - stores program lines in tokenized form, in ascending numeric order. When a line is added to a program, the statement table is simply expanded to make room for the new tokens. For each new variable in the line, the VNT is enlarged and a new eight-byte VVT entry is created. The opposite occurs when a line is deleted, except that any variable names now unused are retained in the variable tables. Thus, after a while, the VNT may become cluttered with variable names no longer used in a program. BASIC has no direct means of purging these names. When a line is inserted within a program, BASIC must first determine if the line already exists. If so, the old tokens are replaced by the new tokens, updated VNT and VVT entries are created for any new variables, and ST bytes are moved up or down in memory as needed to accommodate the new line. If the inserted line is new, then BASIC puts it in its correct position by line number. Clearly, the jobs of entering lines and editing programs require a lot of byte manipulation, which BASIC accomplishes with quick memory management routines. The infamous lock-up bug, corrected in revision B (XL BASIC), resided in these routines. The last line in the statement table is the immediate mode line, assigned line number 32768. Your last instruction (LIST, RUN, etc.) is tokenized and stored here.
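To make the first two tables more concrete, here is a minimal Python sketch (not the article's Program 2) that walks a variable name table and splits one eight-byte variable value table entry. The function names are hypothetical. The BCD layout used in decode_bcd is an assumption the article does not spell out: a first byte holding the sign bit plus an excess-64 power-of-100 exponent, followed by ten BCD digits with the decimal point after the first pair.

```python
from decimal import Decimal


def parse_vnt(memory, start):
    """Walk the variable name table; return a list of (token, name) pairs."""
    names = []
    current = ""
    address = start
    while memory[address] != 0:        # the dummy zero byte marks the end
        byte = memory[address]
        current += chr(byte & 0x7F)    # strip the inverse-video (high) bit
        if byte & 0x80:                # inverse video: last character of a name
            names.append((128 + len(names), current))
            current = ""
        address += 1
    return names


def variable_kind(name):
    """Array names end in '(' and string names end in '$'."""
    if name.endswith("("):
        return "array"
    if name.endswith("$"):
        return "string"
    return "scalar"


def split_vvt_entry(entry):
    """Split one eight-byte entry into type byte, index and six data bytes."""
    if len(entry) != 8:
        raise ValueError("a VVT entry is eight bytes long")
    return {"type": entry[0], "index": entry[1], "data": bytes(entry[2:8])}


def decode_bcd(six):
    """Decode a scalar's six data bytes (assumed layout, see the text above)."""
    digits = 0
    for byte in six[1:]:
        digits = digits * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    if digits == 0:
        return Decimal(0)
    sign = -1 if six[0] & 0x80 else 1
    exponent = (six[0] & 0x7F) - 64    # power of 100
    return sign * Decimal(digits).scaleb(-8) * Decimal(100) ** exponent
```

For example, a table holding the names A, B$ and C( yields tokens 128, 129 and 130, and under the assumed layout the data bytes 64, 5, 0, 0, 0, 0 decode to 5.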
The following two tables are created when a program is executed:
4. STRING/ARRAY TABLE (140, 141) - a changing table which holds the current values of each element in the string/array variables. Entries in this table are created when the program is run and each string/array is dimensioned. Each string/array variable has a pointer in the VVT, telling BASIC where in the string/array table to locate its values.
5. RUNTIME STACK (142, 143) - used by BASIC mainly during execution of FOR/NEXT loops and GOSUB/RETURN/POPs. Important data is temporarily saved here, including exactly where in a program line to RETURN after completing a GOSUB. For FOR/NEXT loops, the variable token, loop limit and increment (step value) are stored.
There are three additional pointers to note. While BASIC programs and tables move higher in memory as they enlarge, the Operating System may need to move the screen display lower to accommodate graphics modes which require more memory. These pointers keep BASIC and the Operating System from conflicting with each other. In LOMEM (128, 129), the OS tells BASIC where it can begin building its tables. Without DOS, this is set to 1792 on power-up and reset. In APPMHI (14, 15), BASIC tells the OS the highest address it has used, so the OS knows just how low in memory it can move screen data without overwriting program bytes. Closely related is MEMTOP (741, 742), in which the OS tells BASIC how much memory it is using for the screen display.
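All of the pointers mentioned so far are read the same way: low byte first, then high byte times 256. The following Python sketch shows that calculation over a 64K memory image. The dictionary keys and function names are hypothetical labels, while the addresses are the ones given in the text.

```python
# Address of the low byte of each two-byte pointer named in the article.
POINTERS = {
    "LOMEM": 128,
    "VARIABLE_NAME_TABLE": 130,
    "VARIABLE_NAME_TABLE_END": 132,
    "VARIABLE_VALUE_TABLE": 134,
    "STATEMENT_TABLE": 136,
    "STRING_ARRAY_TABLE": 140,
    "RUNTIME_STACK": 142,
    "APPMHI": 14,
    "MEMTOP": 741,
}


def read_pointer(memory, address):
    """Equivalent of PEEK(address)+256*PEEK(address+1)."""
    return memory[address] + 256 * memory[address + 1]


def read_all_pointers(memory):
    """Return every pointer in POINTERS as a name-to-address mapping."""
    return {name: read_pointer(memory, low) for name, low in POINTERS.items()}


if __name__ == "__main__":
    memory = bytearray(65536)   # a blank stand-in for the Atari's memory
    memory[128] = 0             # LOMEM low byte
    memory[129] = 7             # LOMEM high byte: 7*256 = 1792
    print(read_pointer(memory, POINTERS["LOMEM"]))
```

With the two LOMEM bytes set to 0 and 7, the calculation gives 1792, the power-up value quoted above for a system without DOS.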
A tokenized line.
Program lines are stored in a specific format. Let's take a look at the anatomy of a tokenized BASIC line as it exists in the statement table. The first two bytes always contain the line number, stored in two-byte binary form, low byte first. To retrieve the decimal line number, multiply the high byte by 256 and add the low. Valid line numbers range from 0 to 32767. The next memory location contains the line length in bytes. Adding this value to the starting address of the line gives the address of the next program line. Since lines may have multiple statements (each separated by :), the following byte gives the offset, counted from the start of the line, at which the first statement ends. If there is only one statement, then the line length and statement offset bytes will be the same. Otherwise, the statement offset byte will be less. Each statement in a line starts with such a byte, representing the accumulated length of the line up to and including that statement. This means that the last statement must start with a byte that is equal to the overall length of the line. Next come the tokens actually comprising the statement. Most lines are terminated with an end-of-line token (22).
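The layout just described can be sketched in Python as follows. This is only an illustration under the assumptions stated in the comments, not the article's Program 3; the function names are hypothetical and the token values in the sample bytes are placeholders.

```python
def parse_line(memory, address):
    """Split one tokenized line into its number, length and statements."""
    number = memory[address] + 256 * memory[address + 1]   # low byte first
    length = memory[address + 2]
    statements = []
    position = 3                       # first statement's offset byte
    while position < length:
        next_offset = memory[address + position]
        if not position < next_offset <= length:
            raise ValueError("inconsistent statement offset byte")
        # Tokens lie between the offset byte and the start of the next statement.
        tokens = list(memory[address + position + 1:address + next_offset])
        statements.append(tokens)
        position = next_offset
    return {
        "number": number,
        "length": length,
        "statements": statements,
        "next_line": address + length,  # start address plus line length
    }


def walk_lines(memory, start):
    """Yield each line in turn, stopping after the immediate mode line."""
    address = start
    while True:
        line = parse_line(memory, address)
        yield line
        if line["number"] >= 32768:     # immediate mode line is stored last
            return
        address = line["next_line"]


if __name__ == "__main__":
    # Placeholder program: one two-statement line plus the immediate line.
    sample = bytes([20, 0, 9, 6, 40, 20, 9, 40, 22,
                    0, 128, 6, 6, 40, 22])
    for line in walk_lines(sample, 0):
        print(line["number"], line["length"], line["statements"])
```

Because each line carries its own length, the walker never has to search for an end-of-line marker; it simply adds the length byte to the current address.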
Now, enter Program 3. After entering it, LIST it to cassette or disk. Given a line number, the program displays the line's starting address in the statement table and its length. It then shows the tokens which make up the line in numeric and character format. In our next session, we'll use this program and what we've learned so far to analyze some simple BASIC lines. Then, we'll look at the structure and components of Atari BASIC and their interactions.