They that dally nicely with words may quickly make them wanton.
(Shakespeare: Twelfth Night III)
by Richard G. Lyons
The ability to easily manipulate alphanumeric characters was a major innovation in computer software. For any program to be considered “user friendly,” it must communicate with, and accept from the user, English words. Being able to reply to computer inquiries with “Yes” and “No” gives a user the impression of having a conversation with the machine. Furthermore, it is much more efficient for a computer to “interpret” our human language than it is for us to interpret the computer’s numerical language. String operations that delete, change, or insert groups of characters in a manuscript are the primary activity of Text Editors. All BASIC programmers soon realize the necessity of learning and understanding string operations. Although numerous articles have been published describing the ATARI 400/800 computer, they have dealt primarily with graphics capabilities. This article provides an expanded description of the ATARI BASIC String operations.
A string is an array of alphanumeric characters. These characters can consist of letters, numbers, punctuation marks, or even the special ATARI keyboard symbols. A string which contains no characters is called a “null string.” Examples of strings are:
"" (null string)
Note that each string was contained within quotation marks. The quotation marks inform the BASIC interpreter of the beginning and ending characters in a string. Consequently, quotation marks are illegal as string characters. Carriage Return (CR) is also invalid as a string character. Although most versions of BASIC restrict string lengths to 256 characters, ATARI BASIC permits strings to contain up to 32767 characters.
Since strings can be manipulated as variables they must have variable names. There are several conventions that must be followed when defining string variable names. A string name must be from 1 to 120 characters in length, begin with a letter, and end with a dollar sign ($). String names may not contain punctuation marks or ATARI special characters. Examples of string variable names and the direct definition of the contents of the strings are:
B:$="BNUM" ...illegal string name
RST$=""RESET"" ...illegal string characters
Although A$="ABCD" is a valid BASIC statement, it cannot be used alone. All strings must be “dimensioned” before they can be defined or manipulated. Strings are dimensioned using the DIM statement. DIM statements allocate memory storage locations, and establish the string names for string variables. For example:
10 DIM T$(12)
permits the programmer, at some later time, to define the contents of string T$ with up to twelve characters. Note that the DIM statement does not define the contents of string T$, but merely reserves twelve memory locations. Twelve 8-bit bytes of RAM memory are allocated in the above example. Consider the following program executed on an ATARI 800 with 48K bytes of memory:
10 DIM A$(32767)
20 DIM B$(32767)
This is a valid program. Lines 10 and 20 dimension the strings A$ and B$ by giving them the ability to consist of a maximum of 32767 characters. However, strings A$ and B$ are null strings since they each contain no characters. Their dimensioned lengths are 32767, but their “character lengths” (number of characters in a string) are zero. The above program can be executed, but if an attempt is made to fill the strings with characters, an error will result because of insufficient memory space.
ATARI BASIC permits several strings to be dimensioned in one statement. For example:
10 DIM A$(100),B$(200),C$(300)
It is common practice to put DIM statements at the beginning of a program, and to dimension string variables with a number that is larger than necessary.
There are several ways of defining a string in ATARI BASIC. The most direct method is a statement which identifies the characters in a string, such as:
A$="ABCD"
This statement merely defines string A$ as the string ABCD. Another definition technique involves the INPUT statement. In this technique, the user of the program is prompted to define a string. Consider the following program:
10 DIM NAME$(10)
20 PRINT "WHAT IS YOUR NAME"
30 INPUT NAME$
40 PRINT "YOUR NAME IS_";NAME$
In this case, line 20 prompts the user to define the string NAME$. Line 30 performs the definition. Should the user key in more than 10 characters, only the first 10 characters would be used to define NAME$. Throughout this article, blank spaces will be identified with the symbol _ as shown in line 40.
READ statements can be used to define strings. For example:
10 DIM A$(11),B$(11),C$(11),D$(11)
20 READ A$,B$,C$,D$
30 PRINT A$;B$;C$;D$
40 DATA STRING_,DEFINITION_,USING_,READ
Line 20 defines the four strings A$, B$, C$, and D$. READ statements help minimize programming effort by defining several strings using only one statement. Note that quotation marks must not be used in DATA statements containing strings. The most flexible (and complex) way of defining strings concerns the use of subscripts and substrings.
Subscripts are numbers, or variables, used to identify portions of a string. Substrings are strings of characters that are contained in a larger string. A single character can be considered a substring. Substrings are defined by applying subscripts to a larger string.
String names can have zero, one, or two subscripts. First, let’s consider string definitions without the use of subscripts:
Statement Printed Results
10 DIM A$(10),B$(10),C$(4)
20 A$="ABCDEF":PRINT A$ ...... ABCDEF
30 B$=A$:PRINT B$ ........ ABCDEF
40 C$=A$:PRINT C$ ............... ABCD
Line 20 directly defines string A$. Although A$ has a dimensioned length of 10, its character length is 6 since it contains only 6 characters. A$ occupies 6 bytes of memory. Line 30 defines string B$ by setting string B$ equal to the string A$. We’ll refer to string B$ as the destination string and string A$ as the source string. Line 40 illustrates an interesting characteristic of ATARI BASIC. Since string C$ has a dimensioned length of 4, only the first 4 characters of string A$ are used to define C$. Attempting to define a destination string with more characters than it can contain does not result in a software error!
Next, let’s consider string definition using a single subscript. String statements containing a single subscript take the form STRINGNAME$(s1), where s1 is the subscript. For example:
Statement Printed Results
10 DIM A$(10),B$(10)
20 A$="12345678":PRINT A$ ..... 12345678
30 B$=A$(4):PRINT B$ ........ 45678
40 B$=A$(5):PRINT B$ ........ 5678
50 B$(5)=A$(5):PRINT B$ ..... 56785678
60 B$=A$(0):PRINT B$ ........ ERROR
70 B$=A$(9):PRINT B$ ........ ERROR
The first subscript encountered in this program is the (4) in line 30. The term A$(4) is a substring of the larger string A$. Substring A$(4) is the string of characters starting with the 4th and extending to the last character of string A$. Line 30 defines string B$ to be equal to the substring A$(4), namely “45678”. Line 40 shows a similar definition with a subscript of 5. Line 50 illustrates the definition of a destination substring, B$(5), with a source substring A$(5). This operation combines the two substrings to define string B$. Combining strings, or substrings, is known as concatenation. In order to concatenate two strings without losing any characters, the subscript of the destination substring must be equal to one plus the character length of the current destination string. This principle is illustrated in line 50. Since the character length of B$ is 4 (before line 50 was executed), the destination subscript must be 5. Line 60 illustrates that zero is not a valid subscript. If a subscript exceeds the character length of a string, an error occurs as shown in line 70.
The use of double subscripts permits the execution of additional string operations. String statements containing double subscripts take the form STRINGNAME$(s1,s2). Consider the following statements:
Statement Printed Results
10 DIM S$(15),T$(15),Q$(15)
20 S$="ATASIC":PRINT S$ .. ATASIC
30 Q$=S$(3,5):PRINT Q$ ... ASI
40 T$="RI_BA":PRINT T$ ... RI_BA
50 S$(7,13)=T$:PRINT S$ .. ATASICRI_BA
60 S$=S$(1,6):PRINT S$ ... ATASIC
70 T$(6,9)=S$(4,6):PRINT T$ RI_BASIC
80 S$(4)=T$:PRINT S$ ..... ATARI_BASIC
90 Q$=S$(1):PRINT Q$ ..... ATARI_BASIC
100 Q$=S$(1,1):PRINT Q$ .. A
Line 30 defines string Q$ to be equal to the substring S$(3,5). Substring S$(3,5) is the string of characters starting with the 3rd character and ending with the 5th character of string S$, namely “ASI”. Line 50 shows the concatenation of strings S$ and T$. An example of string truncation is illustrated in line 60. A technique for isolating the first character of a string using double subscripts is shown in line 100. This double subscript method is useful for examining a user response. For example:
10 DIM A$(5)
20 PRINT "DO YOU WANT TO CONTINUE? YES OR NO"
30 INPUT A$
40 IF A$(1,1)="Y" THEN GOTO 60
This routine examines the first character of the user’s response to line 20. Any character in a string can be isolated by double subscripts when both subscripts are set equal to the appropriate character number. It would be impractical (not to mention tedious) to demonstrate all possible single and double subscript combinations for defining strings in this article. As with any programming technique, experience is the best teacher. Therefore, the reader is encouraged to experiment on his/her own.
String operations are greatly enhanced by the use of String Functions. Some of the brief descriptions of String Functions given in Chapter 7 of the ATARI BASIC Reference Manual require further explanation.
Variables in ATARI BASIC are either numbers or strings of characters. It is often convenient to treat a numeric variable as a string or to treat a string variable as a number. Numeric variables can be converted to strings, and string variables can be converted to numbers, by two String Functions, STR$ and VAL. The following statements show how the function STR$ converts a number to a string:
Statement Printed Results
10 DIM S$(20)
20 S$=STR$(17):PRINT S$ .. 17
30 P=2*S$:PRINT P .... ERROR
40 S$=STR$(10/3):PRINT S$ 3.33333333
50 S$=STR$(22E17):PRINT S$ 2.2E+18
60 PRINT S$(4,4) ....... E
Line 20 defines string S$ to be the two-character string “17”. Although S$ is equal to “17”, it is a string variable and, as line 30 shows, it is illegal to attempt to perform an arithmetic operation on a string. Lines 40 and 50 show two more examples of converting a number to a string. The VAL function converts a string into a number. For example:
Statement Printed Results
10 DIM S$(8):S$="25":PRINT S$ 25
20 PRINT SQR(S$) ............ ERROR
30 PRINT SQR(VAL(S$)) ........ 5
40 S$="36TT":PRINT SQR(VAL(S$)) 6
50 S$="X3.6TT":PRINT SQR(VAL(S$)) ERROR
60 PRINT 2*VAL(S$(2)) ....... 7.2
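Line 10 defines S$ as a two-character string, namely “25”. Line 20 is an illegal statement because you cannot perform an arithmetic operation on a string. The VAL function in line 30 converts string S$ to the numerical value of 25. Line 30 also performs a square root operation. Lines 40, 50, and 60 show that string S$ can contain non-numerical characters, but the VAL function converts only the numerical characters at the start of the string. Trailing non-numerical characters are ignored (line 40), a leading non-numerical character causes an error (line 50), and a subscript can be used to skip past that leading character (line 60).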
There are two additional String Functions that convert variables from string to numeric and vice versa. These two String Functions are ASC and CHR$. They deal primarily with obtaining the ATASCII decimal code of a character and obtaining the ATASCII character corresponding to a decimal number.
Let’s consider ASC(sexp) first:
Statement Printed Results
10 DIM A$(5):A$="VWXYZ"
20 N=ASC(A$):PRINT N ................ 86
30 N=ASC(A$(4)):PRINT N ............. 89
Note that if the string expression (sexp) is a string name, A$ in line 20, the ASC(A$) function returns the decimal ATASCII code for the first character in the string.
Line 20 sets a numeric variable N equal to the decimal code for the first character in string A$. The decimal code for any character in a string can be obtained if subscripts are used, as shown in line 30. The corresponding decimal code for ATASCII characters can be found in Appendix C of the ATARI BASIC Reference Manual.
The String Function CHR$ performs the opposite operation of ASC. CHR$ is used to obtain the ATASCII character whose corresponding code number is an integer from 0 to 255. CHR$ has the format: CHR$(aexp). The argument (aexp) can range from 0 to 65535. This range corresponds to values that can be contained in a 16-bit word. However, the CHR$ function only operates on the least significant 8 bits of the value (aexp). Consider the following examples:
10 PRINT CHR$(65)
15 REM PRINTS AS "A"
20 PRINT CHR$(577)
25 REM PRINTS AS "A"
30 PRINT CHR$(65.49)
35 REM PRINTS AS "A"
40 PRINT CHR$(65.5)
45 REM PRINTS AS "B"
50 PRINT CHR$(2.33)
55 REM PRINTS AS "B"
60 PRINT CHR$(-65)
65 REM ERROR (negative aexp)
70 PRINT CHR$(65,66)
75 REM ERROR (one character only)
Line 10 shows the most common form of CHR$; i.e., (aexp) is normally in the range of 0 to 255. If (aexp) is greater than 255, the BASIC interpreter subtracts some integer multiple of 256 from (aexp) to obtain a number in the range of 0 to 255. Line 20 shows that CHR$(577) is equivalent to CHR$(65), since 577-2x256=65. Lines 30 and 40 show how (aexp) is rounded to an integer. Lines 60 and 70 show two illegal forms of (aexp).
For emphasis, it is often advantageous to display a message in ATARI’s Inverse Video. (On a printer, the Inverse Video characters would appear as underlined alphanumeric characters.) A string can be converted to Inverse Video with the ASC and CHR$ Functions. The reader is encouraged to execute the following program:
10 DIM MSG$(11)
30 FOR X=1 TO LEN(MSG$)
50 NEXT X
60 PRINT MSG$
The loop, from lines 30 to 50, obtains the decimal code for each character in string MSG$, adds 128 to the code value, and then converts the new code back to an Inverse Video character.
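The loop body described above fits in a single statement. The following is only a sketch: the sample message WARNING in line 20 is an assumption chosen for illustration, and any text of up to 11 characters would do.

```basic
10 DIM MSG$(11)
20 MSG$="WARNING":REM SAMPLE MESSAGE (ASSUMED)
30 FOR X=1 TO LEN(MSG$)
35 REM ADD 128 TO THE CODE OF CHARACTER X
40 MSG$(X,X)=CHR$(ASC(MSG$(X,X))+128)
50 NEXT X
60 PRINT MSG$
```

Because CHR$ uses only the least significant 8 bits of its argument, running the loop a second time on the same string should return the message to normal video.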
Perhaps the most useful String Function is LEN. The format of this function is LEN(STRINGNAME$). LEN is used to obtain the character length of the string STRINGNAME$. For example:
10 DIM A$(10)
20 X=LEN(A$):PRINT X
25 REM PRINTS 0 (A$ is a null string)
30 A$="ABCD":PRINT LEN(A$)
35 REM PRINTS 4
40 A$(LEN(A$)+1)=A$:PRINT A$
45 REM PRINTS AS "ABCDABCD"
50 X=LEN(A$):PRINT X
55 REM PRINTS 8
Line 20 shows that the character length of the undefined string A$ is zero. String A$ is defined and LEN(A$) is printed in line 30. A straightforward technique for concatenation is shown in line 40. The subscript (LEN(A$)+1) will always point to the character position just beyond the last character in string A$. A good example of using LEN for concatenation is given on page 39 of the ATARI BASIC Reference Manual.
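The same subscript can be used to join two different strings with a blank between them. The sketch below is illustrative only: the names F$, L$, and N$ and their contents are assumptions, and the symbol _ stands for a blank space, as elsewhere in this article.

```basic
10 DIM F$(10),L$(10),N$(21)
20 F$="JOHN":L$="JONES"
30 N$=F$
40 N$(LEN(N$)+1)="_":REM APPEND ONE BLANK
50 N$(LEN(N$)+1)=L$:REM APPEND SECOND STRING
60 PRINT N$
65 REM PRINTS AS "JOHN_JONES"
```

N$ is dimensioned to 21 so that it can hold both ten-character strings plus the blank without truncation.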
Although the Logical Operators NOT, AND, and OR cannot be applied to strings directly, they can be used with the LEN function. For example:
10 DIM A$(10),B$(10)
20 X=NOT LEN(A$):PRINT X
25 REM PRINTS 1 (LEN(A$)=0)
30 A$="AB":PRINT A$
35 REM PRINTS AS "AB"
40 X=NOT LEN(A$):PRINT X
45 REM PRINTS 0
50 B$=NOT A$:PRINT B$
55 REM ERROR illegal logical operation
The ATARI BASIC Memory Management cannot concatenate strings that have a character length of some integer multiple of 256 (i.e. 256, 512, 768, etc.). The following routine uses the LEN function to guard against this problem:
10 REM * STRING LENGTH CHECK ROUTINE
20 DIM SPACE$(1)
30 SPACE$="_"
40 FOR I=1 TO 127
50 IF LEN(A$)=I*256 THEN A$(LEN(A$)+1)=SPACE$
60 IF LEN(B$)=I*256 THEN B$(LEN(B$)+1)=SPACE$
70 NEXT I
The routine checks the character lengths of the two (previously defined) strings A$ and B$. The loop, from lines 40 to 70, checks both strings to see if either has a length which is an exact multiple of 256. If either string does, lines 50 or 60 will add a space character to the string, enabling correct string manipulations later in the program.
There are several useful String Functions, found in other BASIC Interpreters, which are not available in ATARI BASIC:
LEFT$(A$,I) ....... Returns the LEFTmost I characters in string A$
RIGHT$(A$,I) ...... Returns the RIGHTmost I characters in string A$
MID$(A$,I,J) ...... Returns J characters, starting with the Ith character, of string A$
POS(A$,B$) ........ Determines the POSition of string B$ in string A$ and returns the POSition number
These additional String Functions can be implemented by ATARI BASIC String Functions as shown below:
Statement Printed Results
10 DIM A$(10),B$(10)
20 A$="ABCDE":PRINT A$ ..... ABCDE
30 REM ** LEFT$ FUNCTION **
40 LET I=3
50 DIM X$(LEN(A$))
60 X$=A$(1,I):PRINT X$ ........ ABC
70 REM ** RIGHT$ FUNCTION **
80 X$=A$(LEN(A$)-I+1)
85 PRINT X$ .................. CDE
90 REM ** MID$ FUNCTION **
100 LET J=2
110 X$=A$(I,I+J-1):PRINT X$ ... CD
120 REM ** POS FUNCTION **
130 LET B$="DE":PRINT B$ ...... DE
140 FOR I=1 TO LEN(A$)
150 IF A$(I,I+LEN(B$)-1)=B$ THEN 180
160 NEXT I
180 PRINT I .................... 4
Lines 50 and 60 implement the LEFT$ function, line 80 performs the RIGHT$ function, line 110 is the MID$ function, and lines 140-180 are the POS function. The above program is not as complex as it looks. All operations are based on previously discussed principles. This program provides a good test to see how much the reader has learned thus far.
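The POS search above assumes that B$ really does occur in A$. If it does not, the second subscript eventually runs past the end of A$. A more defensive variation is sketched below. It is only one possible approach, and the variable P and the convention that zero means "not found" are assumptions introduced for this example.

```basic
10 DIM A$(10),B$(10)
20 A$="ABCDE":B$="XY"
30 P=0:REM P=0 MEANS "NOT FOUND"
40 IF LEN(B$)=0 OR LEN(B$)>LEN(A$) THEN 80
45 REM STOP WHERE B$ WOULD NO LONGER FIT
50 FOR I=1 TO LEN(A$)-LEN(B$)+1
60 IF A$(I,I+LEN(B$)-1)=B$ THEN P=I:POP :GOTO 80
70 NEXT I
80 PRINT P
85 REM PRINTS 0; WITH B$="DE" IT PRINTS 4
```

The POP statement in line 60 discards the unfinished FOR loop before the program jumps out of it.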
As previously noted, the logical (or Boolean) operators NOT, AND, and OR cannot be applied directly to string variables. However, the following Relational Operators can be applied to string variables:
Relational Operator Explanation
< ....... less than
> ....... greater than
= ....... equal to
<= ....... less than or equal to
>= ....... greater than or equal to
<> ....... not equal to
When Relational Operators are applied to strings, the BASIC Interpreter converts the string’s characters to ATASCII decimal code numbers and then compares these numbers. Therefore, a character’s position in the ATASCII chart (Appendix C of the ATARI BASIC Reference Manual) will indicate its “relation” to any other character.
The execution of the following program will familiarize the reader with relational string comparisons.
10 DIM A$(20),B$(20)
30 PRINT "A$=";A$
40 PRINT "WHAT IS B$"
50 INPUT B$
60 PRINT :PRINT
70 PRINT "A$=";A$
80 PRINT "B$=";B$
90 IF A$<>B$ THEN PRINT "A$<>B$"
100 IF A$<B$ THEN PRINT "A$<B$"
110 IF A$=B$ THEN PRINT "A$=B$"
120 IF A$>B$ THEN PRINT "A$>B$"
130 PRINT :PRINT :GOTO 30
Line 40 prompts the user to define string B$, and then lines 90-120 apply Relational Operators to strings A$ and B$. The results of the string comparisons are then printed. The user is encouraged to input various strings for B$, such as B$=“ABCD” and B$=“ABCDEF”.
Since the decimal codes for the ATASCII alphabet are in numerical order, the Relational Operators are useful for sorting names in alphabetical order. In the above program, if string A$ is set equal to “JONES”, the user can define (Input) string B$ to be various last names and verify the alphabetical sorting.
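As a small illustration of alphabetical ordering, the following sketch asks for two names and prints them in order. The prompts and the dimensioned lengths are assumptions made for this example.

```basic
10 DIM A$(20),B$(20)
20 PRINT "FIRST NAME";:INPUT A$
30 PRINT "SECOND NAME";:INPUT B$
35 REM PRINT THE LOWER-VALUED NAME FIRST
40 IF A$<=B$ THEN PRINT A$:PRINT B$
50 IF A$>B$ THEN PRINT B$:PRINT A$
```

Since the comparison works on ATASCII codes, the ordering is only alphabetical when both names use the same case.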
Any Relational Operator expression, A$=B$ for example, will return a value of 1 (if the expression is True) or a 0 (if the expression is False). This allows a Logical Operator to be applied to string comparison expressions. Consider the following:
10 DIM A$(10),B$(20)
30 IF NOT (A$=B$) THEN PRINT "A$<>B$"
40 IF NOT A$=B$ THEN PRINT "A$<>B$"
The result of the string comparison (A$=B$) in line 30 returns the value 0. Therefore, the expression NOT (A$=B$) is equal to 1 which initiates the print operation. Line 40 illustrates that the string comparison expression need not be contained in parentheses.
The Special Purpose Function ADR(String-name$) permits a programmer to ascertain, and control, where strings are stored in RAM memory. Consider the following statements:
Statement Printed Results
10 DIM A$(22),B$(10)
30 PRINT ADR(A$) ......... 2164
40 PRINT ADR(B$) ......... 2186
50 PRINT ADR(A$(3)) ...... 2166
Line 30 shows that the block of 22 memory locations, reserved for string A$, starts at location 2164. Line 40 shows that string B$ starts at memory location 2186. Note that: ADR(B$)-ADR(A$)= ‘dimensioned length’ of A$. The memory location of a single character of a string, ADR(A$(3)) for example, can be obtained by the use of subscripts, as shown in line 50. The use of subscripts with ADR is only legal if the string has been previously defined. Delete line 20 and execute the above program to verify this restriction. The BASIC Memory Management will change the memory locations of strings dependent on the number of statements in a program. Add the following statements to the above program and note the values for ADR(A$) and ADR(B$) when the program is executed:
The Special Function ADR is useful if the starting location of a string is needed in a USR (User) machine language subroutine.
There are occasions when the programmer must control the memory location of a string. This can be accomplished with a 'filler string'. Assume the string B$ must start at memory location 3000 and consider the following routine:
Line 10 establishes the memory location of the first dimensioned string A$. Line 30 establishes the dimensioned length of the filler string F$. The expression (3000-ADR(A$)-1) sets the length of F$ to 814, so that F$ starts at location 2186 and extends to location 2999. String memory locations are established in the order that the strings are dimensioned. So, in the above example, the filler string F$ has to be dimensioned prior to B$.
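A routine consistent with this description might look like the sketch below. It rests on several assumptions: A$ is dimensioned to a single character, no other strings or arrays are dimensioned first, and ADR(A$) falls below location 2999 (which may not be true if, for example, a disk operating system is loaded). The length of B$ is arbitrary.

```basic
10 DIM A$(1)
20 REM F$ FILLS MEMORY UP TO LOCATION 2999
30 DIM F$(3000-ADR(A$)-1)
40 DIM B$(10)
50 PRINT ADR(B$)
55 REM SHOULD PRINT 3000
```

If ADR(A$) is already 2999 or higher, the expression in line 30 is zero or negative and the DIM statement will fail, so it is worth printing ADR(A$) first when adapting the routine.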
Although ATARI BASIC is not the most powerful BASIC available, it is sufficiently flexible to implement all typical string operations. ATARI BASIC is certainly more powerful than the ATARI BASIC Reference Manual indicates. Any reader willing to experiment with string functions and operations will readily become proficient in programming string manipulations in BASIC.