The idea behind p-code is very elegant: it is an "ideal" assembly language to be run on a virtual microprocessor. Concretely, it means that it requires an interpreter, the p-machine emulator or PME, to emulate this virtual microprocessor. The big advantage is that you can write p-code "assembly" programs (or compile higher languages into p-code) and run them on many different machines. The language was optimized so as to be easy to emulate and to run smoothly on any machine. Apart from the TI-99/4A there were emulators written for the Z80, Z8000, 6502, 6509, Motorola 6800 and 68000 (Apple computers), HP-87, PDP-11, Intel 8080 and 8086 (PCs) , and GA440 microprocessors (and what on earth is the latter?).
This article is split in two pages. The present one discusses the p-system operation, another describes the TI-99/4A p-code card and accompanying software.
p-code is a fully relocatable language: all addressing is done relative to the current instruction or to the beginning of a segment. This allows for a very flexible operating system, that can load program segments in memory, move them as needed, reload them from disk at a different location, etc. The main drawback is that it is slower than real assembly language since it is interpreted. On the other hand, it is fairly compact and lends itself easily to compilation of high-level languages (mainly Pascal).
A typical p-system implementation consists in several layers:
The p-machine emulator (PME)
Run-time service package (RSP)
BIOS
Operating system
File formats
Registers
Stack
Procedure nesting
Instructions
_Calls
_Data access
_Stack manipulation
_Arithmetic & logic
_Comparisons
_Flow control
_Arrays and Strings
_Sets
_Multitasking
_Miscellaneous
Standard procedures
_Summary
_Unit-related
_String-related
_Code pool management
_Concurency procedures
_Miscellaneous procedures
_Compiler usage
Faults and execution errors
_Faults
_Execution errors
Data representation
_Words (byte sex)
_Long integers
_Real numbers
Every microprocessor contains registers, i.e. a small amount of on-chip memory. These can be general purpose or dedicated registers. For instance, the PC register on a TMS9900 microprocessor points to the next instruction to be executed. By contrast, general-purpose registers can be used by the programmer for any purpose he fancies.
The p-machine emulates only special-purpose registers. To perform any calculation you must use the memory stack:. push a number on the stack, push another number, perform an addition, and pop the result from the stack. This is a bit of a pain, unless you a are a fan of Hewlett-Packard calculators and their reverse-polish notation. On the other hand, it sure makes the p-machine easy to implement on any kind of system.
The dedicated registers are implemented in the host system memory and hold various pointers and data required for proper operation of the p-machine. They are normally used only by the p-machine emulator itself, however there are two p-codes that let you fool around with them: LPR and SPR.
IPC
This register contains a pointer to the next p-code to be executed.
SP
Points to the word at the top of the stack. This pointer changes
after
execution of almost any p-code whenever something is push on, or poped
off the stack.
CURPROC
This register contains the number of the current procedure (0-255)
in the current segment.
MP
Contains a pointer to the current activation record (MSCW).
This is required to access local variables that are allocated on the
stack
when a procedure is called. Global variables are accessed via BASE, or
indirectly via EREC.
CURTASK
Points to the current task information block (TIB).
READYQ
Points to the TIB at the top of the list of tasks ready to run.
EREC
Points to the current environment record (E_Rec),
which changes every time a call is made to a procedure in a different
segment.
The E_Rec contains many usefull values that may also be copied into a
dedicated
register. This makes the implementation of such registers optional
since
a given p-machine emulator (a.k.a. PME) can access them indirecly via
EREC.
EVEC
Points to the current environment vectors. This is an exemple of a
redundant register, since its content can be found in the E_Rec.
However,
it should be implemented on any PME.
IORESULT
Contains the completion code resulting from the last I/O operation
(0-255). This is the only register that can be accessed direcly by the
system, as it is located in the SYSCOM memory area.
BASE
Contains a pointer to the global variables. This register is
optional
since its value can be found via EREC.
SIB
Contains a pointer to the current segment information block.
Optional,
since such a pointer can be found in the E_Rec.
CURSEG
Points to the current segment in memory. Optional since such a
pointer
is found in the SIB.
As the p-machine virtual processor does not comprise any general-purpose register, all data manipulation must be performed on the stack. This is why basically any p-code affects the stack. For instance, to add two numbers you would push each of them on the stack (e.g. with LCB, load constant byte), add them (with the ADI p-code), and retrieve the result from the stack (here, using the SRO p-code to store it in global variable # 5):
<-SP
<-SP | 34 | <-SP
<-SP | 10 | | 10 | | 44 | <-SP
|//////| |//////| |//////| |//////| |//////|
LCB 10 LCB 34 ADI SRO 5
The stack is used by the p-machine to implement local variables when a procedure is entered. The caller must also place the procedure parameters on the stack prior to the call, and reserve room for the return value if the procedure is a function.
From the structure of the p-code language, you can tell it was invented by Pascal programmers: provision is made for procedures that are local to other procedures (i.e. nested into them) and a lot of p-codes are devoted to access these procedures or their variables. Let me illustrate this with an exemple:
segment SEG1
int VARG1
proc GLOB1
proc LOC1
int VAR1
end-of-LOC1
proc LOC2
end-of-LOC2
end-of-GLOB1
proc GLOB2
proc LOC3
int VAR3
proc LOC31
int VAR31
Proc LOC311
end-of-LOC311
<<you are here>>
end-of-LOC31
end-of-LOC3
proc LOC4
int VAR4
end-of-LOC4
end-of-GLOB2
end-of-SEG1
In this example, GLOB1 and GLOB2 are global procedures, VARG1 is a global variable. LOC1 and LOC2 are procedures local to GLOB1, whereas LOC3 and LOC4 are local to GLOB2. LOC31 is a procedure local to LOC3 and LOC311 is local to LOC31. Note that at each level, there can be variables (in this case, integers) local to the procedure.
Assume the program has reached the point indicated by <<you are here>>, in procedure LOC31. From there, you could:
Similarly you can access the following variables:
OK, I hope you got the picture, let's procede with p-code instructions.
A p-code (pseudo-opcode) is always 1 byte long, but it can be followed by one or more arguments. There are several types of arguments:
For your convenience I have arranged the p-codes in several families:
These p-codes are used to call a procedure. Before executing such an instruction, you should prepare the stack with the adequate parameters, as illustrated below. It is the responsability of the called procedure to pop each parameter from the stack and to leave the adequate result on it (if any). Note the practice of reserving space for the result before the first parameter. This generally will be one or two words, but may be upto four words, depending of the format used for real numbers.
< <
| MSCW | | MSCW |
< | data | | data |
| param | | param | ... | param | <
< ... | 0 | | 0 | | result| | result|
|///////| |///////| |///////| |///////| |///////|
prepare CXG 2,31 called proc RPU caller can
for call calls a proc does its job return use result
Within the program stream, each procedure entry point consists of two data words that immediately precede the first p-code to be executed. These are exit_ic that points to some (optional) exit code to be execute upon returning from the procedure and data_size that indicated how many bytes must be reserve on stack to store the procedure local variables. This space is marked as "data" in the above illustration.
You may have wondered what MSCW stands for. That's "Mark Stack Control Word", also known as "activation record". A mark stack consists in five words:
MSSTAT
MSDYN
MSIPC
MSENV
MSPROC
MSPROC hold the procedure number of the caller, MSIPC is a segment-relative pointer to the point of call, MSENV contains a pointer to the E_Rec of the caller (i.e. its segment). MSDYN points at the MSCW of the caller and MSSTAT points at the MSCW of the lexical parent (i.e. the procedure into which the current procedure is nested, if any). The latter will be needed to access variables that are not local to the procedure.
When a call is performed, the PME allocates space on the stack and initialises the MSCW. It sets IPC to point at the first p-code in the procedure, and update the CURPROC, EREC and EVEC registers. If there aren't 40 words free on the stack after data and MSCW have been reserved, a stack fault is issued that will be handled by the system.
There are several p-code calls, depending where the procedure is located:
CGP <UB>
Call Global Procedure. The procedure number in passed with UB, an
unsigned
data byte included in the code stream, just after the CGP p-code itself.
CLP <UB>
Call Local Procedure, i.e. a procedure nested into the calling
procedure.
Again, UB is the procedure number and follows CLP in memory.
CIP <DB><UB>
Call Intermediary Procedure, i.e. a procedure which is a lexical
parent
of the current one. DB is a value from 1 to 127 that indicates how many
nesting levels should be walked back to the target procedure, from MSCW
to MSCW. UB is the procedure number.
SCIPn <UB>
Where n is 1 or 2. This is a short form of CIP, that calls
procedures
<UB>, and sets the MSSTAT field of MSCW to the lexical parent
(for
SCIP1) or grandparent (for SCIP2) of the calling procedure. It is
shorter
than CIP since the nesting level is part of the opcode, so it uses two
bytes instead of three.
CXG <UB1>,<UB2>
Call eXternal Global procedure, same as CGP but targets another
segment.
UB1 is the segment number. If it's not in memory yet, a segment fault
is
issued so that the system has a chance to load the segment. UB2 is the
procedure number.
SCXGn <UB>
Where n is in the range 1 to 8. This is a short form of CXG that
calls
global procedure UB in the segment indicated by n.
CXL <UB1>,<UB2>
Call eXternal Local procedure. UB1 is the segment number, UB2 the
procedure
number.
CXI <UB1>,<DB>,<UB2>
Call eXternal Intermediary procedure. UB1 is the segment number,
UB2
the procedure number. DB is the number of nesting levels to be walked
backwards
through the MSCWs to find the procedure.
CFP <UB>
Call Formal Procedure. This allows you to code a procedure call "on
the fly" rather than including it in the code stream. Three data words
must be placed on the stack prior to execution:
|proc-num |
|e-rec-ptr|
|stat-link|
|/////////|
Proc_num is the number of the procedure to be called, it will be
placed
in the CURPROC register.
E_Rec_Ptr points to the environment record of the segment to be used.
Stat_link is the value to be placed in the MSSTAT field of the MSCW.
In other words, procedure <proc_num> in the segement indicated by <e_rec_ptr> is called. I'm not sure what's the use of <UB>, it may be a typo in the manual...
Load Static Link. DB indicates the number of static links to travel. A pointer to the MSCW that is DB static links above the current one (1=parent, 2=grandparent, etc) is placed on the stack. This comes handy to prepare the stack for a CFP instruction.
RPU
Return from ProcedUre. This is the instruction used to return from
any procedure call. The various fields in the MSCW are used to restore
the caller's environment:
Just as you can call global, local and intermediary procedures, you can access global, local or intermediary-level variables. You can also access global variables in another segment. For constants, you can either access constants located in the constant pool of the current segment or include a constant value whithin the code stream.
Global | Local | Intermediary | Extended | Constant | |
---|---|---|---|---|---|
Load address | LAO | LLA, SLLA | LDA | LAE | LCO |
Load value | LDO, SLDO | LDL, SLDL | LOD, SLOD | LDE | LDC, LDCRL LDCB, LDCI, SLDC, LDCN |
Store value | SRO | STL, SSTL | STR | STE |
Let's start with global variables:
LAO <B>
Load glObal Address. Places the address of the global variable with
offset B (in the global MSCW pointed by BASE) on the stack.
LDO <B>
LoaD glObal. Places the value of the global variable B on the stack.
SLDOn
Where n is a number from 1 to 16. This is a short form of LDO, used
to load one of the first 16 global variable. The number of the variable
is part of the p-code, and thus does not require and extra byte (or
two)
in the code stream.
SRO <B>
StoRe glObal. Stores the word currently on the stack into global
variable
B.
Now let's see local variables. They are created on stack when a
procedure
is entered, between the calling parameters and the MSCW. They are
specifed
by their offset with respect to the MSCW.
LLA <B>
Load Local Address. Places on the stack the address of the local
variable
with offset B in the data area of the current procedure (i.e. on the
stack,
below the MSCW).
SLLAn
Where n is a number from 1 to 8. This is a short form of LLA, that
loads the address of the first eight local variables. It is shorter
than
LLA because the number of the variable is part of the p-code instead of
requiring an extra byte (or two).
LDL <B>
LoaD Local. Places on the stack the value of the local variable
with
offset B below the current MSCW.
SLDLn
Where n is a number from 1 to 16. This is a short form of LDL, that
loads one of the first sixteen local variables.
STL <B>
STore Local. The word at the top of the stack is poped and placed
into
the variable with offset <B> below the current MSCW.
SSTLn
Where n is a number from 1 to 8. Short form of STL where the number
of the local variable to modify is part of the p-code.
Intermediary variables are local to a ancestor of the current procedure
and are therefore present on stack, below the MSCW of the ancestor.
They
are accessed by walking up the chain of MSSTAT words in the MSCW of the
respective procedures.
LDA <DB>,<B>
Load intermeDiary Address. Places on the stack the address of
variable
B below the MSCW found <DB> levels above the current one. DB=0
means
the current procedure (i.e. local variables), DB=1 means the lexical
parent
in which the current procedure is nested, etc
LOD <DB>,<B>
LOad intermeDiate. Places on the stack the value of the variable
with
offset B below the MSCW found <DB> levels above the current one .
SLODn <B>
Where n is 1 or 2. Short form of LOD to access variables in the
lexical
parent (SLOD1) or grandparent (SLOD2) of the current procedure.
STR <DB>,<B>
SToRe intermediate. DB indicates the number of static links to
travel
to find the proper MSCW (0=current, 1=parent, etc). The variable with
offset
<B> in the target procedure will receive the word currently on
top
of the stack.
It is also possible to access global variables in another segment.
LAE <UB>,<B>
Load Address Extended. The address of the global variable with
offset
B in segment UB is placed on the stack.
LDE <UB>,<B>
LoaD Extended. Places the value of global variable B in segment UB
on the stack.
STE
STore Extended. Pops the word on top of the stack and stores it in
global variable B in segment UB.
Constant values may be placed in a dedicated memory area called the
code
pool, or included within the p-code stream.
LCO <B>
Load Constant Offset. B is the offset of a constant in the constant
pool of the current segment (i.e. the constant "index"). The
instructions places the address of this constant on the stack, in the
form
of a segment-relative offset.
LDC <UB1>,<B>,<UB2>
LoaD Constant. B is the offset of a constant in the constant pool
of
the current segment. The instructions pushes UB2 words on the stack,
starting
from this offset. UB1 is the mode, if it is 2 and the current segment
is
of the opposite byte sex than the host processor, bytes will be swapped
upon copy.
LDCRL <B>
LoaD Constant ReaL. The real constant at offset B in the real
sub-pool
(a specialized portion of the constant pool) is loaded on stack.
Depending
on the format used by the code file, a real number uses up two or four
words on the stack.
With the following p-codes, the constant is part of the code:
LDCB <UB>
LoaD Constant Byte. Places the value <UB> on stack.
LDCI <W>
LoaD Constant Integer. Places the value W on stack.
LDCN
LoaD Constant Nil. Places the "nil" value on the stack. The
value depends on the microprocessor running the PME. With the TI-99/4A
it is >0000.
SLDCn
Short LoaD Constant. Where n is a number from 0 to 31. These
p-codes
place a number from 0 (SLDC0) to 31 (SLDC31) on the stack.
Now that we have placed variables on the stack, we can manipulate it.
DUP1
Duplicate 1 word. Pushes on top of the stack an extra copy of the
word
at top of stack.
DUPR
Duplicate real. Make an extra copy of the real at top of stack (two
or four words depending on the real number format).
SWAP
Swaps the two words at the top of the stack.
FLT
FLoaT. Replaces the integer at top of stack by the corresponding
real
number.
TNC
TruNCate real. Assuming the stack contains a real number, it will
be
replaced with the corresponding integer after truncation of the decimal
part. If the real is not in the range -32768 to 32767 a floating point
execution error is issued (and that's bad news!).
RND
RouND. Replaces the real number on top of the stack with the
corresponding
integer rounded to the nearest integer. If the number is not in the
range
-32768 to +32767, an execution error occurs.
LDB
LoaD Byte. Assuming a byte-pointer is at the top of the stack, it
is
replaced with the value of this byte (in the form of a word whose high
byte is >00).
STB
STore Byte. Assuming the stack contains a byte pointer topped with
a byte value (i.e. a word with value 0 to 255), this value is assigned
to the pointed byte. The stack is cleared afterwards.
IND <B>
INDex. Assuming the word at top of stack is the address of a
record,
it is replaced by the value found at this address plus offset
<B>.
IND 0 is thus the equivalent of LDB for words.
SINDn
Where n is a number from 0 to 7. This is a short from of IND, used
to access the first 8 words of a record.
INC <B>
INCrement. Assuming the word on top of the stack is an address, it
is indexed by B (i.e. B is added to it).
STO
STOre. Assuming the stack contains an address topped with an
integer
value, this value is stored at the indicated address. Both are
subsequently
removed from the stack.
LDRL
LoaD Real. Assuming the top of the stack contains the address of a
real number, it is replaced with the value found at this address (two
or
four words depending on the real number format).
STRL
STore ReaL. Assuming the stack contains an address topped with a
real
number, this number will be stored at the indicated address. Both the
number
and the address are then removed from the stack.
LDM <UB>
LoaD Multiple. Assuming the stack contains an address, it is
replaced
with a block of UB words found at this address. A stack fault is issued
if less than UB+20 words are available on the stack.
STM <UB>
STore Multiple. Assuming the stack contains an address, topped with
a block of UB words, these words will be stored at the indicated
address.
The stack pointer is then adjusted to remove the block and the address
from the stack.
LDP
LoaD Packed. Assuming a packed-field pointer is at the top of the
stack,
it is replaced with the value of the field it points at. This value
will
be in the form of a right-justified word, padded on the left with "0"
bits.
STP
STore Packed. Assuming the stack contains a packed-field pointer,
topped
with an integer value, the value it stored into the pointed field.
Value
and pointer are removed from the stack.
MOVe. The word on top of the stack is the address of a source block of words. The word below it is the address of the destination. B is the number of words to be copied from the source to the destination. If UB=0 the source is a variable, otherwise it is a constant and the word on top of the stack is its offset in the current segment. If UB=2 and the current segment has the opposite byte sex from the host processor, bytes will be swapped upon copy.
OK, now that we have data placed on stack, we can perform math on them. Let's begin with integers:
ABI
ABsolute value Integer. Takes the absolute value of the integer
found
on top of the stack. Exception: -32768 stays as it is (since +32768
cannot
be represented with 16 bits).
NGI
NeGate Integer. Negates the integer on top of the stack. -32768
remains
unchanged.
DECI
DECrement Integer. Decrements by one the integer on top of the
stack.
Wraps around from -32768 to +32767.
INCI
INCrement Integer. Increments by one the integer on top of the
stack.
Wraps around from +32767 to -32768.
ADI
ADd Integer. Adds up the two integers on top of the stack and
replaces
them with the result. Overflow and underflow wrap around to the
opposite
sign.
SBI
SuBstract Integer. Substract the integer on top of the stack from
the
integer below it. Overflow and underflow wrap around.
MPI
MultiPly Integer. Multiply the two integers on the top of the stack
and replace them with the result. The result is calculated as a 32-bit
number, but only the least significant 16 bits will appear on stack
(i.e.
overflows result in crazy values).
DVI
DiVide Integer. Divides the second integer from the top by the
integer
on the top of the stack, and replaces them with the result. Truncation
is always toward zero. Divisions by zero cause an execution error.
MODI
MODulo Integer. Adds or substract the word on top of the stack from
the word below it until the remainder is in the desired range. Replaces
both integers with the remainder. If the top word is zero an execution
error occurs, if it is negative the result is undefined but no error
occurs.
And now, let's do the same for real numbers. Remember, depending on the
adopted format, a real number can be two or four words long:
ABR
ABsolute Real. The real number on top of the stack is replaced with
it absolute value.
NGR
NeGate Real. Negates the real number on top of the stack.
ADR
ADd Real. Adds the two reals on top of stack and replaces them with
the result. Undeflow defaults to zero, overflow causes an execution
error.
SBR
SuBstract Real. Substract the real number on top of the stack from
the real number under it and replaces them with the result. Undeflow
defaults
to zero, overflow causes an execution error.
MPR
MultiPly Real. Multiplies the top reals on top of the stack and
replaces
them with the result. Undeflow defaults to zero, overflow causes an
execution
error.
DVR
DiVide Real. Divides the second real from the top by the real above
it and replaces them with the result. Undeflow defaults to zero,
overflow
causes an execution error.
Note that there is not p-code for modulo real. Let's conclude this
section
with a few logical operations:
LNOT
Inverts the word on top of the stack, bitwise (i.e. "0" bits
are replaced with "1" and conversely).
BNOT
Same as LNOT, but only the least significant bit is kept. Thus the
result will always be >0000 (false) or >0001 (true) no matter how
many bits were set in the word.
LAND
The two words on top of the stack are "anded" bitwise and
replaced with the result. Bitwise: 0 and 0 = 0, 0 and 1 = 0, 1 and 0 =
0, 1 and 1 = 1.
LOR
The two words on top of the stack are "ored" bitwised and
replaced with the result. Bitwise: 0 or 0 = 0, 0 or 1 = 1, 1 or 0 = 1,
1 or 1 = 1.
NB. Sadly enough, there is no p-code for XOR operations (exclusive or).
The following p-code are used to make comparison between values placed on the stack. The result of the comparison is generally a boolean word (i.e. >0001 for "true" and >0000 for "false) that will replace these values on the stack. The boolean can subsequently be used for a conditional jump (see next section).
EQUI
EQUal Integer. Places "true" on the stack if the two integers
on top of the stack are identical, "false" otherwise.
NEQI
Not EQual Integer. Places "false" on the stack if the two
integers on top of the stack are identical, "true" otherwise.
GEQI
Greater or EQual Integer. Places "true" on the stack if the
second integer from the top of the stack is greater than or equal to
the
integer on top of the stack. This is a signed comparison, i.e. >8002
(a negative number) is smaller than >7030.
LEQI
Less Than or Equal Integer. Places "true" on the stack if
the second integer from the top of the stack is smaller than or equal
to
the integer on top of the stack. This is a signed comparison, i.e.
>FFFF
(minus one) is smaller than >0001.
GEUSW
Greater or Equal UnSigned Word. Places "true" on the stack
if the second integer from the top of the stack is greater than or
equal
to the integer on top of the stack. This is an unsigned comparison,
i.e.
>8002 is greater than >7030.
LEUSW
Lower or Equal UnSigned Word. Places "true" on the stack
if the second integer from the top of the stack is smaller than or
equal
to the integer on top of the stack. This is an unsigned comparison,
i.e.
>FFFF is greater than >0001.
EQREAL
EQual REALs. Places "true" on the stack if the two reals
on top of the stack are identical.
GEREAL
Greater or Equal REAL. Places "true" on the stack if the
second real number from the top of the stack is greater than or equal
to
the real number on top of the stack. This is an signed comparison, by
definition.
LEREAL
Less than or Equal REAL. Places "true" on the stack if the
second real number from the top of the stack is lower than or equal to
the real number on top of the stack. This is an signed comparison, by
definition.
The following operations deal with array of bytes. The stack can
contain
either the address of an array or its offset whithin the segment. Extra
parameters in the code stream decide which it is.
EQBYT <UB1>,<UB2>,<B>
The two words on top of the stack should contain either the
addresses
of two byte arrays, or their offset in the current segment. UB1 and UB2
determine whether the address (UBx = 0) or the offset (UBx <> 0)
is used: UB1 refers to the word on top of the stack, UB2 to the word
below.
B is the size of the array. The arrays are compared byte per byte in
the
natural order. If they are identical "true" is placed on the
stack.
GEBYT <UB1>,<UB2>,<B>
Greater or Equal BYTe array. Same as above, but the comparison
stops
as soon as a mismatch is encountered. It returns "true" if the
mismatched byte in the "bottom" array (pointed by the second
word from the top of the stack) is greater than a byte in the "top"
array, or if both arrays are identical.
LEBYT <UB1>,<UB2>,<B>
Less or Equal BYTe array. Same as above, but the comparison returns
"true" if the first mismatched byte in the bottom array is lower
than the corresponding byte in the top array (or if there is no
mismatch).
The following p-codes deal with strings. A string is essentially the
same
than an array of bytes, excepts that it begins with a length byte. Thus
string comparisons begin with the second byte. If both strings are
identical
(upto the end of the shorter one), the size bytes are compared.
EQSTR <UB1>,<UB2>
EQual STRing. The two words on top of the stack should contain
either
the addresses of two strings, or their offset in the current segment.
UB1
and UB2 determine wether the address (UBx = 0) or the offset (UBx
<>
0) is used: UB1 refers to the word on top of the stack, UB2 to the word
below. The strings are compared byte per byte from the second byte to
the
end of the shorter string. If they are identical, their first bytes
(the
size) are compared. If these are identical, "true" is placed
on the stack.
GESTR <UB1>,<UB2>
Greater or Equal STRing. Same as above, but the comparison stops as
soon as a mismatch is encountered. It returns "true" if the mismatched
byte in the "bottom" string (pointed by the second word from
the top of the stack) is greater than a byte in the "top" string
or if both arrays are identical.
LESTR <UB1>,<UB2>
Less or Equal String. Same as above, but the comparison returns
"true"
if the first mismatched byte in the string is lower than the
corresponding
byte in the top string (or if there is no mismatch).
The following p-codes deal with packed sets, and don't ask me why they
are abbreviated PWR (packed words? power?). A set consists in upto 255
words to be accessed on a bitwise basis, preceded with a length word
(that
contains the number of words in the set).
EQPWR
The two sets on top of the stack are compared upto the end of the
shorter
one. "True" is pushed on the stack of no mismatch is found (even
if the sets are not the same size).
GEPWR
"True" is pushed on the stack if the set on top of the stack
is a subset of the set below is. Which means that all the "1"
bits in the top set are also "1" is the bottom set (I think).
LEPWR
"True" is pushed on the stack if the set on top of the stack
is a superset of the set below it, "false" is pused otherwise.
Now that we have placed a boolean on top of the stack with a comparison, we can use the following p-code to perform conditional jumps. The address where to jump to is expressed as an offset in bytes, relative to the next instructio. This nifty feature allows a p-code program to be shifted in memory during execution without messing up all the addresses.
TJP <SB>
True JumP. Jumps by SB bytes if the boolean word on top of the
stack
is "true" (i.e. its last bit is 1).
FJP <SB>
False JumP. Jumps by SB bytes (relative to the next instruction) if
the boolean word on top of the stack is "false" (i.e. its rightmost
bit is 0).
FJPL <W>
False JumP Long. Same as above, but the offset is expressed as a
word,
which means that you can jump futher than away than -128 to +127 bytes.
(And why on hearth is there no "true jump long" ?).
UJP <SB>
Unconditional JumP. Jumps by SB bytes (-128 to +127 relative to the
next instruction). The stack is ignored and not affected.
UJPL <W>
Unconditional JumP Long. Same as above, but in the ranger -32768 to
32767 bytes relative to the next instruction.
Alternatively, we can compare integers and perform the jump with a
single
instruction:
EFJ <SB>
Equal False Jump. Jumps by SB bytes (relative to the next
instruction)
if the two integers on top of the stack are different.
NFJ <SB>
Not equal False Jump. These guys like double negations! Jumps by SB
bytes if it is false that the two words on top of the stack are not
equal.
In layman's terms this means that the jump if taken if the words are
identical.
Here is a more sophisticated form of jump:
XJP <B>
case JumP (what does the X stand for in "case"?). B is the
offset of a jump table in the constant pool. The integer on top of the
stack should be an index whithin that table. The corresponding word in
the table is used as a jump offset, relative to the next instruction. A
jump table begins with two words that indicate the minimum and the
maximum
index, if the integer on top of the stack is not in this range, no jump
is taken.
Example of a jump table created by the Pascal instruction:
case I of >0002 Minimum index (table begins at 2)
2: xxx ; >0005 Maximum index (table ends at 5)
5: xxx ; >FFFA Jump offset for I=2
3:xxx; >FFFE Jump offset for I=3
end; >0000 No jump for 4 (as it does not exist)
>FFFC Jump offset for I=5
Two more p-codes that can fall in the category of flow controlling
instructions:
BPT
BreakPoinT. A breakpoint execution error (number 16) is issued
unconditionally.
This will result in entering the debugger, if it is running. NB The
debugger
is not implemented on the TI-99/4A (rats!).
CHK
CHecK subrange bounds. The stack contains three integers, from top
to bottom: the upper limit, the lower limit and the test value. If the
test value does not fall whithing the specified range (as judged by
signed
comparisons) a value range execution error is issued. After the
instruction,
the two limits are removed but the test value remains on the stack.
Examples:
44 32 44
12 -21 12
18 0 1233
OK OK Error
Arrays can contain elements of any size, but all elements in the array must have the same size. If the size is not a multiple of a byte (e.g. if it is 10 bits, or 3 bits) elements can be squeezed together, regardless of byte boundaries. This is called a packed array, just like in Pascal.
A string is an array of characters terminated with a nul byte (>00).
Assign STRing. Copies the string whose address is at the top of the stack, into another string whose address is just below on the stack. UB2 is the size of the string. UB1 determines whether the source string is a variable (UB1=0) or a constant (in which case the word at top of stack is an offset in the constant pool). If the source string happens to be greater than UB2 bytes, a "string overflow" execution error occurs.
Copy Array Parameters. Pretty much like CSP, but the parameter copied is a packed array of characters (i.e. is not necessarily nul-terminated). Thus, there can be no check for size, and <B> characters are always copied to the buffer.
Copy String Parameter. This instruction is usefull to copy a function parameter into a local variable:
MyFunction( S1: string);
{
var S2: string;
}
For obvious reasons, string parameters are passed by pointers. The word on top of the stacks is the address of a two-word parameter descriptor. If the bottom word of the descriptor is NIL, the second word is the address of the string parameter (S1 in the above example). Alternatively, the bottom word can be a pointer to an E_Rec (environment record), in which case the second word is the offset of the string in the segment corresponding to this E_Rec. If this segment is not resident, a segment fault occurs.
The second word from the top of the stack is the address of a local string buffer (S2 in the exemple) into which the string parameter will be copied. UB indicates the size of this buffer. If the parameter string happens to be larger, a "string overflow" execution error occurs.
Check STRing index. The integer on top of the stack is an index inside a string whose address is in the word below it. If this index is less than 1, or greater than the dynamic length of the string, a "value range" execution error occurs.
IndeX Array. The integer on top of the stack in an index inside an array whose address is in the word below it. B is the size in words of an array element. These two words are removed from the stack and replaced with the address of the corresponding element. The algorithm is: Result = Address + (Index * <B> * atype), where atype is 1 for word-addressed computers and 2 for byte-addressed computers.
IndeX Packed array. The integer on top of the stack in an index
inside
a packed array (i.e. an array of bit-fields) whose address is in the
word
below it. UB1 is the number of elements per word, UB2 is the size in
bits
of an element . These two words are removed from the stack and replaced
with the corresponding element.
A set consits in 0 to 255 words of bit flags, preceded by a word that indicates the total number of words in the set. Bits are numbered from 0, starting with the rightmost bit of the first word. A bit set to "1" indicates that the corresponding number is present in the set, a "0" that it is not a member of the set.
Example:
The set [0,3,5,7] is represented as:
00000000 00000001 00000000 10101001
Size in words (=1) Bits: 7 5 3 0
The set on top of the stack is stripped of its length word, then expanded or compressed until is UB words in length. Compression removes the high words of the set (even if they do not contain zero!), expansion adds zero-words on top of the stack.
If, after the operation, less than 20 words remain free for the stack to grow, a stack fault is issued.
SubRangeSet. Two integers in the range 0 to 4097 should be on top of the stack (if not, a "value range" execution error occurs). If the top word is the smallest, the empty set is placed on the stack. Otherwise, a set is placed on the stack in which all bits between these two integers (included) are set as "1".
If less than 20 words remain available on the stack after execution, a stack fault is issued.
INcluded iN. Checks for set membership. The word on top of the stack is the index of a bit in the set below it. If this bit is "1", the stack will contain "true" after the operation, otherwise it will contain "false". Note that bit numbering starts from zero, with the rightmost bit of a word.
DIFference. Removes the two sets on top of the stack and replaces them a set corresponding to their bitwise difference. The difference is computed as: Set1 AND NOT Set2 ( 0 DIF 0 = 0, 0 DIF 1 = 0, 1 DIF 0 = 1, 1 DIF 1= 0).
INTersection. The intersection of the two sets on top of the stack is computed as Set1 AND Set2 and the resulting set replaces these two on the stack.
UNIon. The two sets on top of the stack are replaced with their bitwise union. The union is computed as: Set1 OR Set2.
Waits for a semaphore. The word on top of the stack is the address of this semaphore. If the semaphore value is greater than zero it is decremented and execution continues. If it is zero, the current tasks must wait for the semaphore to become available. To this end, the current TIB is placed in the semaphore's queue (and Hang_Ptr in the TIB will point to the semaphore). Then the p-system switches to another task.
Signals a semaphore. The word on top of the stack is the address of this semaphore. If the semaphore is less than zero (or if its queue is empty) it is incremented and execution continues. Otherwise, the first task waiting for this semaphore is removed from the semaphore queue and placed in the queue of tasks ready to run (and the Hang_Ptr in its TIB is set to NIL). If this task has a higher priority than the current one, the p-system switches to it.
These do not fit in any of the above categories.
Load Processor Register. The word on top of the stack is a register number. It will be replaced on the stack by the content of this register. The register numbers are:
-3: READYQ
-2: EVEC
-1: CURTASK
0: Wait-Q
1: Prior | Flags
2: SP_Low
3: SP_Up
4: SP
5: MP
6: (reserved)
7: IPC
8: Env
9: Proc_Num | TIB_IOresult
10: Hang_Ptr
11: M_Depend
Store Processor Registers. The word on top of the stack is stored in the register whose number is in the word below it. See LPR for the register numbers. If the number is 0 or greater, all registers are first saved in the TIB, then the adequate value is placed in the TIB, finally all registers are restored from the TIB.
Enter NATive code. Starts executing assembly language right after this p-code. It may be necessary to increment IPC to a word boundary on word-oriented processors (such as the TMS9900).
Same as NAT, but IPC will be incremented by B before entering assembly language (i.e. it acts as a jump). If B=0, the first execution begins with the byte right after B in the code stream. This allows place infomation for the native-code generators uptream of a stretch of assembly language.
No OPeration. Execution continues.
Where n is a number from 1 to 6. These p-codes are reserved for internal used by compilers. Trying to execute them generates an "unimplemented" execution error.
A number of standard procedures have been included whithin the PME, either for speed, or because they contain some assembly language. These procedures should be called with CXG or SCXG1. Most of them expect parameters on the stack, and some return a result on the stack. In any case, the initial parameters will be removed from the stack by the procedure itself. The standard procedures differ from user-written procedures in that they do not return with RPU. In a sense, they can be viewed as extra p-codes...
For consistency I listed them in their Pascal form, but you can easily convert the declarations into a stack description. For instance, with the following procedure
DUMMY(UNIT: integer, PTR:addr): Boolean
the stack would look like this
______ <
| addr |
| int | <
| 0 | | bool |
|//////| |//////|
Before After
To call a procedure from assembly or p-code (when using a p-code
assembler),
you should push zero on the stack if the function is returning a value,
then push the parameters in the order they appear in the Pascal
declaration,
and finally call the procedure. When it returns, make sure you pop the
return value from the stack, if there is one.
Number | Name | Parameters | Returns |
---|---|---|---|
4 | RELOCSEG | EREC: E_Rec | - |
14 | MOVESEG | SIB: addr POOL: addr OFFSET: integer |
- |
15 | MOVELEFT | SOURCE: byte_ptr DEST: byte_ptr LEN: integer |
- |
16 | MOVERIGHT | SOURCE: byte_ptr DEST: byte_ptr LEN: integer |
- |
18 | UNITREAD | UNIT: integer BUFFER: byte_ptr LEN: integer BLOCK: integer CTRL: integer |
- |
19 | UNITWRITE | UNIT: integer BUFFER: byte_ptr LEN: integer BLOCK: integer CTRL: integer |
- |
20 | TIME | HIWORD: addr LOWORD: addr |
- |
21 | FILLCHAR | DESTR: byte_ptr LEN: byte_ptr CHAR: integer |
- |
22 | SCAN | LEN: integer EXP: Boolean CHAR: byte SOURCE :byte_ptr MASK: word |
integer |
23 | IOCHECK | - | - |
24 | GETPOOLBYTES | DEST: addr POOL: addr OFFSET: integer NBYTES: integer |
- |
25 | PUTPOOLBYTES | SOURCE: addr POOL: addr OFFSET: integer NBYTES: integer |
- |
26 | FLIPSEGBYTES | EREC: addr OFFSET: integer NWORDS: integer |
- |
27 | QUIET | - | - |
28 | ENABLE | - | - |
29 | ATTACH | SEM: addr VECTOR: integer |
- |
30 | IORESULT | - | integer |
31 | UNITBUSY | UNIT: integer | Boolean |
32 | POWEROFTEN | POWER: integer | real |
33 | UNIWAIT | UNIT: integer | - |
34 | UNITCLEAR | UNIT: integer | - |
36 | UNITSTATUS | UNIT: integer STAT_REC: addr CTRL: integer |
- |
37 | IDSEARCH | SYMREC: addr BUFFER: addr |
integer |
38 | TREESEARCH | ROOT: addr FOUNDP: addr TARGET: addr |
integer |
39 | READSEG | EREC: addr | integer |
All these procedures place status information in IORESULT, upon return. See below for details.
UNIT is the unit number. The procedure initializes the unit and returns its status into the p-machine register IORESULT.
UNITSTATUS(UNIT: integer; STAT_REC:addr;
CTRL:integer)
UNIT is the unit number.
STAT_REC is the address of a status record.
If CTRL is 0 status is written, if CTRL is 1 status is read.
UNITREAD(UNIT:integer; BUFFER:byte_ptr;
LEN:integer;
BLOCK:integer; CTRL:integer)
This procedures reads a number of bytes from a unit into a buffer.
UNIT is the unit number
BUFFER is a pointer to the destination buffer
LEN is the numbe of bytes to read
BLOCK is the number of the desired data block within the unit
CTRL selects the input mode (see BIOS for details)
UNITWRITE(UNIT:integer; BUFFER:byte_ptr;
LEN:integer;
BLOCK:integer; CTRL:integer)
This procedures writes a number of bytes from a buffer into a unit.
UNIT is the unit number
BUFFER is a pointer to the source buffer
LEN is the numbe of bytes to write
BLOCK is the number of the target data block within the unit
CTRL selects the input mode (see BIOS for details)
This procedure waits until all I/O on this unit is completed.
UNITBUSY(UNIT:integer) :boolean
This procedure returns TRUE if there are any pending I/O operations in the unit, and FALSE otherwise.
This procedure returns the value of the IORESULT register.
This procedure verifies that the IORESULT register contains zero. If not, an I/O execution error is issued.
MOVELEFT(SOURCE:byte_ptr; DEST:byte_ptr; LEN:integer)
This procedure moves a number of bytes from the source string to the destination string, starting from the left (low-order) byte. If LEN is 0 or less, nothing happens.
MOVERIGHT(SOURCE:byte_ptr; DEST:byte_ptr;
LEN:integer)
This procedures moves a number of bytes from the source string to the destination string, starting from the right (high order) byte. If LEN is 0 or less, nothing happens.
FILLCHAR(DEST:byte_ptr; LEN:byte_ptr;
CHAR:integer)
The destination string will be filled with the LEN copies of character CHAR. If LEN is 0 or less, nothing happens.
SCAN(LEN:integer; EXP:Boolean; CHAR:byte;
SOURCE:byte_ptr;
MASK:word): integer
This procedure looks for a character in a string.
If EXP is 0, the procedure scans the string SOURCE for LEN bytes,
until
it finds the byte CHAR.
If EXP is 1, the procedure scans SOURCE for a byte different from CHAR.
LEN is the maximum number of bytes to scan. If it is negative, scanning
proceeds from right to left.
MASK is not used.
The function returns the offset of CHAR within SOURCE (or LEN if no match was found).
Relocates the segment pointed at by the environment record EREC. All types of relocations are performed.
MOVESEG(SIB:addr; POOL:addr; OFFSET:integer)
SIB is a pointer to a segment information block that should contain
the address where the segment should be moved.
OFFSET is the offset whithin the code pool of the segment to be moved.
POOL points to a code pool descriptor, the first two words of which are
a pointer to the base of the code pool.
The segment found at the specified offset whithin the code pool will be moved at the location specified in the SIB and relocated. Only segment-relative relocation is performed (as other types are not affected by the move).
GETPOOLBYTES(DEST:addr; POOL:addr; OFFSET:integer; NBYTES:integer)
This procedure copies a number of bytes from the code pool into a buffer.
DEST is the address of the destination buffer.
POOL is the address of a code pool descriptor, the first two words of
which
point to base of the pool.
OFFSET is the offset of the first byte to transfer, relative to the
base
of the pool.
NBYTES is the number of bytes to transfer.
PUTPOOLBYTES(DEST:addr; POOL:addr; OFFSET:integer; NBYTES:integer)
This procedures copies NBYTES bytes from buffer SOURCE into a code pool described in POOL, starting at offset OFFSET. It's the opposite of the above.
FLIPSEGBYTES(addr:EREC, int:OFFSET, int:NWORDS:int)
This procedures flips the bytes within a number of words of a segment.
EREC is the address of an environment record describing the segment.
OFFSET is the word offset where byte flipping should start.
NWORDS is the number of words whose bytes should be flipped.
This procedures reads a segment into memory.
EREC is the address of an environment record describing the segment. Inside the E_Rec is a pointer to the SIB (segment information block) that contains the address where the segment should be read.
The procedure returns the contents of the IORESULT register.
ATTACH(SEM:addr, VECTOR:integer)
This procedure associates a semaphore with a p-machine event. The semaphore will be signaled each time the desired event occurs.
SEM is the address of a semaphore. If it is NIL, the event be
detached
from any existing semaphore.
VECTOR is the number of a p-machine event vector. It should be in the
range
0 to 63, otherwise nothing happens. I have no idea what the events
are...
This procedure disables all p-machine events such that no attached semaphore is signalled until ENABLE is called.
This procedure re-enables the p-machine events that were disabled by QUIET.
TIME(HIWORD:addr; LOWORD:addr)
Saves the value of the p-system clock into the indicated words. This clock is a 32-bit number, incremented 60 times per seconds.
POWEROFTEN(POWER:integer): real
Returns a real number equals to 10 to the power of POWER. If POWER is negative or too big, a floating point execution error is issued.
These two procedures are used by the Pascal compiler. They have nothing to do in the PME, and were only placed here to speed up compilation.
TREESEARCH(ROOT:addr; FOUNDP:addr; TARGET:addr): integer
This procedure searches the symbol table tree for a target string. Each tree node is expected to have the following structure:
RECORD
Name: PACKED ARRAY[0..7] of CHAR;
Right_link: pointer;
Left_link: pointer;
END
ROOT is the address of the (sub)tree in the symbol table.
TARGET is the address of the string to look for, a packed array of 8
characters.
FOUNDP is the address where to put the result of the search, i.e. where
the string was found.
If the target string was found, the function returns 0 and places a
pointer to the leaf node that contains it into FOUNDP.
If the target was not found, FOUNP will point to the last node that was
scanned and the function return 1 of the target should be placed to the
right of this node, -1 if it should go to the left.
IDSEARCH(SYMREC:addr; BUFFER:addr)
This procedure scans a line of text for a Pascal keyword.
BUFFER is the address of the buffer to scan
SYMREC is the address of a record with the following structure:
RECORD
SYMCURSOR: integer;
SY: integer;
OP: integer;
ID: PACKED ARRAY[0..7] OF CHARS;
END
The procedure scans BUFFER, starting at position SYMCURSOR, looking for an identifier. An identifier is a string containing only letters, digits and underscore, that begins with a letter. Upon return the first 8 characters of the identifier will be placed in ID, using padding spaces if needed, and SYMCURSOR will be updated to point after the identifier. Finally, the function looks up a table of reserved words and placed the corresponding values in SY and OP.
Here are the reserved words recognised by the procedure, and the return values. The table is laid out in two columns for convenience.
ID | SY | OP | ID | SY | OP |
---|---|---|---|---|---|
AND | 39 | 2 | NOT | 38 | 15 |
ARRAY | 44 | 15 | OF | 11 | 15 |
BEGIN | 19 | 15 | OR | 40 | 7 |
CASE | 21 | 15 | PACKED | 43 | 15 |
CONST | 28 | 15 | PROCEDUR | 31 | 15 |
DIV | 39 | 3 | PROCESS | 56 | 15 |
DO | 6 | 15 | PROGRAM | 33 | 15 |
DOWNTO | 8 | 15 | REPEAT | 22 | 15 |
ELSE | 3 | 15 | RECORD | 45 | 15 |
END | 9 | 15 | SET | 42 | 15 |
EXTERNAL | 53 | 15 | SEGMENT | 33 | 15 |
FILE | 46 | 15 | SEPARATE | 54 | 15 |
FOR | 2 | 15 | THEN | 12 | 15 |
FORWARD | 34 | 15 | TO | 7 | 15 |
FUNCTION | 32 | 15 | TYPE | 29 | 15 |
GOTO | 26 | 15 | UNIT | 50 | 15 |
IF | 20 | 15 | UNTIL | 10 | 15 |
IMPLEMENT | 52 | 15 | USES | 49 | 15 |
IN | 41 | 14 | VAR | 30 | 15 |
INTERFAC | 51 | 15 | WHILE | 23 | 15 |
LABEL | 27 | 15 | WITH | 25 | 15 |
MOD | 39 | 4 | unknown | 0 | 15 |
Two types of problems can occur while executing p-codes: faults and execution errors. A fault is a benigne condition, that requires the assistance of the p-system to be fixed. After this is done, execution can continue. By contrast, an execution error is generally fatal and will result in terminating the program that caused it and reseting the p-system.
The PME only generates two types of faults: segment faults and stack faults.
Segment faults occurs when a p-code tries to access a segment that is not (yet) in memory. These p-codes are: CAP, CSP, CXL, SCXGn, CXG, CXI, CFP, RPU, SIGNAL and WAIT. The p-machine signals the semaphore Real_Sem to allert the p-system and the fault condition is handled by the high priority task Fault_Handler, which is part of the KERNEL unit. Typically, this will result in loading the required segment into memory.
Stack faults occurs when there is not enough room on the stack. Only p-codes that place multiple words on the stack check it for room (except for real numbers). These p-codes are: ADJ, CFP, CGP, SCIPn, CIP, CLP, CXL, SCXGn, CXG, LDC, and LDM. Here, Fault_Handler will try to compact the memory so as to free some more room for the stack.
Before signalling Real_Sem, the PME saves some values into the SYSCOM area for Fault_Handler to use. These are:
Offset Name Usage
14 Real_Sem Semaphore that Fault_Handler is waiting for
18 Fault_TIB Task info block of the faulting task
20 Fault_EREC E_Rec of the segment to be loaded (current E_rec for stack faults)
22 Fault_Words Number of words needed on stack (0 with segment faults)
24 Fault_Type >80 = PME segment fault, >81 = PME stack fault
Once the problem has been taken care of, control is returned to the p-machine emulator and the p-code that caused the fault is executed again.
When an execution error occurs, the PME calls the procedure Exec_Error, which is also part of the Kernel. This call is performed with CXG 1,2 after placing two zeros on the stack, followed by the error number.
Normally, Exec_Error will not return but rather reset the p-system, although it is possible to cause it to return (for instance from a debugger program).
The possible execution errors are:
Number | Name | Culprit | Description |
---|---|---|---|
1 | Value range error | CHK, CSTR | Index out of array bounds |
2 | No proc in seg table | CLP, CFP, CGP, SCIPn, CIP CXL, SCXGn, CXG, CXI |
Call to a proc whose dictionary entry is 0 (probably because it was't linked). |
5 | Integer overflow | Long integer routines | Long integer cannot be converted to int, or too large to be poped from stack. |
6 | Divide by Zero | DVI, MODI, DVR, long integer | Division by 0. |
8 | Interrupted by user | You! | <Break> key pressed. |
10 | I/O error | IOCHECK | IORESULT was not 0. |
11 | Unimplemented | Any unimplemented p-code | Illegal or unknown p-code. |
12 | Floating point error | LDCRL, LDRL, STRL, FLT, TNC, RND, ASBR, NGR, ADR, SBR, MPR, DVR, EQREAL, LEREAL, GEREAL, POWEROFTEN |
Floating point overflow (i.e. result cannot be expressed as a float). |
13 | String overflow | CSP, ASTR, long integer routines | Destination too small to hold the string. |
16 | Break point | BPT | This will enter the Debugger, if any. |
18 | Set too large | SRS | Set larger than maximum legal size. |
19 | Segment too large | READSEG | Segment too large to be read (this should never happen). |
The p-system was designed to run on any computer system, provided the proper PME is written. However, different computers handle numbers in a different way. Everybody agrees on the definition of a byte, but things become more complicated when it comes to larger numbers.
Almost everybody agrees that a word is a 16-bit value, i.e. is made of two bytes. However, there are two conventions on how to order the bytes within a word. For instance, suppose you want to place the number >1234 in memory, at address >A000-A001. There are two ways to do it:
+-----+ +-----+
| >34 | | >12 |
+-----+ >A001 +-----+ >A001
| >12 | | >34 |
+-----+ >A000 +-----+ >A000
Big-endian Little-endian
TI, Motorola Intel, PCs
With the first convention, the bytes are stored in "natural" order, i.e. the most significant byte comes first, when moving ahead in memory. With the second, the most significant byte goes into the highest address. TI uses the first convention, so does Motorola (and consequently Apple computers), whereas Intel uses the second and so do all PC clones.
As p-code is to be ported on any machine, a considerable effort was put into it so that it is independent of this so-called "byte sex". Since all p-codes are bytes, that part was easy. But trouble began with constants and addresses, which are word-oriented. The following solution was adopted: 1) The p-system always uses the byte sex of the machine it runs onto. 2) Each segment in a code file contains a flag indicating the byte sex used by the compiler that created it. If it does not match the current byte sex, the PME will flip bytes in the relevant portions of this segment.
Again, the definition of a long integer can vary from compiler to compiler. Therefore, in the PME, integers can have any size from 2 to 10 words (thats 160 bits, about 36 decimal digits!). When it's on the stack, an integer is always preceded by a length word, i.e. a number from 2 to 10, indicating the number of words that follow.
+-----+
|>0003| Size word
+-----+
|>1234|
+-----+
|>6635| 3-word Long
+-----+
|>8701|
+-----+
|/////|
However, when a long integer is assigned to a variable, or stored on disk (e.g. as a constant embedded in a program), the length word is stripped and the long integer is forced into the default "long" size on the current processor. If this conversion is not possible, an "integer overflow" execution error is issued.
To avoid that kind of problem as much as possible, the routine CVT should be used to incorporate long constants into a program. It will convert an integer (or any combination thereof) into a long integer constant of the size required by the target processor. Thefore the code file will not contain any long integer, which makes it fully portable. Long integers will be created on the fly, when the program is run on the target machine.
Examples:
To code Use
12 CVT(12)
225543 CVT(22554)*CVT(10) + CVT(3)
-873342 -(CVT(8733)*CVT(1000) + CVT(442))
As you can see, all the parameters of CVT are integers (-32768 to +32767) and can be coded as words. CVT does the dirty work in converting them to a long integer of the appropriate size.
The CVT routine actually calls a package of routines that perform long integer math. These are part of a Pascal unit called LONGOPS, that contains three procedures:
FREADDEC reads a long integer
FWRITEDEC writes a long integer
DECOPS performs several math operations.
A peculiarity of DECOPS is that it takes a variable number of parameters, depending on the operation to be performed. This is not correct Pascal, but it's sure more convenient this way! This is possible because DECOPS is actually an assembly routine, embedded into the Pascal unit. When calling DECOPS, the required parameters are pushed on the stack, followed by OP, the number of the operation to be executed (and by the return address, obviously).
Valid DECOPS operations are:
OP | Name | Parameters | Return | Description |
---|---|---|---|---|
0 | Adjust | SIZE: word LINT: long |
fixedlong | Strips the size word of LINT and converts it into SIZE words. If not possible: causes integer overflow. |
2 | Add | LINT1: long LINT2: long |
long | Returns LINT1 + LINT2. Can cause overflow if the result is more than 10 words. |
4 | Substract | LINT1: long LINT2: long |
long | Returns LINT1 - LINT2. Can cause overflow if the result is more than 10 words. |
6 | Negate | LINT: long | long | Returns -LINT |
8 | Multiply | LINT1: long LINT2: long |
long | Returns LINT1 * LINT2. Can cause overflow if the result is more than 10 words. |
10 | Divide | LINT1: long LINT2: long |
long | Returns LINT1 / LINT2. Can cause overflow or divide-by-zero errors. |
12 | Long to String | SIZE: word ADDR: word LINT: long |
- | Converts LINT into a string and places it at address ADDR. SIZE is the maximum number of characters in the string, if it is exceeded a string-overflow error occurs. |
14 | TOS-1 to Long | TOS: long INT: word |
TOS: long RESULT: long |
Converts integer INT into a long integer leaving another long intact on the top of the stack. |
16 | Compare | TYPE: word LINT1: long LINT2: long |
boolean | If TYPE = 0, returns LINT1 < LINT2 If TYPE = 1, returns LINT1 <= LINT2 If TYPE = 2, returns LINT1 >= LINT2 If TYPE = 3, returns LINT1 > LINT2 If TYPE = 4, returns LINT1 <> LINT2 If TYPE = 5, returns LINT1 = LINT2 |
18 | Int to Long | INT: word | long | Converts integer INT into a long integer. |
20 | Long to Int | LINT: long | word | Converts long integer LINT into an integer. Can cause overflow if the result doesn't fit in 16 bits. |
Encoding real numbers in floating point format so they can be ahndled by a digital computer is quite a challenge. Not surprisingly, different people came up with widely different schemes. (Texas Instruments did a pretty good job with its 8-byte radix 100 format). But p-code is supposed to run on any machine, so how can it standardize real numbers?
Well, actually it doesn't! Real numbers calculations are always performed in the format used by the machine on which the program runs. The appropriate floating point math routines are coded inside the PME.
However, there is still a problem when it comes to program files as there may be a need to include real constants within a program. To solve this problem, the p-system defines two "canonical" formats for real constants. Whithin a program file, all real-number constants are grouped together in a so-called "constant sub-pool". When loading the program into memory, the p-system converts canonical real numbers into the format required by the target computer.
For some reasons, there are two canonical formats in p-code: 3-words reals, and 3-to-6-words reals. They are unfortunately incompatible and all compilation units in a p-code program must use the same format. There is a word in each segment that indicates the real size being used by this segment.
The 3-words format is quite simple:
The second format, is just an extension of the above one: extra words can be added upto a total of 5 mantissa words, to accomodate more decimal figures. Just as above, a negative number in words 3 to 5 serves as a terminator and all the following words will be omited. The decimal point is always assumed to be at the end of the last word in use, and the digits in this word will be left justified.
Examples:
Real number Exponent Mantissa
1234.0 0 1234 -1
-1234.0 0 -1234 -1
12345678.0 0 1234 5678
12345678.0 0 1234 5678 -1
1.234 -3 1234 -1
12.0 -2 1200 -1
12345678.2233 -4 1234 5678 2233 -1
111122223333.4444555 -8 1111 2222 3333 44444 5550
1.0 E+70 73 1000 -1
Note the two different ways to code number 12345678: the first one is the 3-words format, the next is the 3-to-6 words format (in this case, it's 4-word long).
RSP stands for Run-time Service Package. It is the part of the PME that does not deal directly with interpreting p-codes, but rather handles various run-time chores. An important part of it is the RSP/IO that deals with I/O operations and provides a standardized interface with the underlying BIOS.
The RSP/IO implements the concept of "device units". A device is a peripheral that should be accessed in a standard manner, through a few predefined operations. Note how this resembles the concept of DSRs used by TI.
Devices
Procedures
_Ioresult
_Unitread
_Unitwrite
_Unitclear
_Unitstatus
_Unitbusy
_Unitwait
Special characters
Subsidiary volumes
Each device is refered in the p-system with a unique number, from 0 to 255. Some device numbers are pre-assigned by the system:
0 (reserved)
1: CONSOLE
2: SYSTERM
3: (reserved)
4: DSK1
5: DSK2
6: PRINTER
7: REMIN
8: REMOUT
9: DSK3
10: DSK4
11: DSK5
12: DSK6
13-127: additional disks, subsidiary volumes, user-defined serial devices
128-255: user-defined devices.
Extra units defined in the TI-99/4A p-system implementation:
14: OS Card GROMs, contain the system files for the main menu.
31: TAPE Cassette tape recorder.
32: TP Thermal printer.
Most of the procedures describes below will cause a completion code to be placed in the SYSCOM area, which is an area of memory shared by the p-system and the PME. You can retrieve this completion code with the procedure IORESULT. The codes are the following:
0: No error
1: Bad block, CRC error
2: Bad device number
3: Illegal I/O request
4: Data-com timeout
5: Volume went off-line
6: File is no longer in directory
7: Illegal filename
8: No room on volume
9: Volume not found
10: File not found
11: Duplicate file
12: Attempt to open an already opened file
13: Attempt to close a file that isn't open
14: Bad format on reading a real or an integer
15: Ring buffer overflow
16: Disk is write-protected
17: Illegal block number
18: Illegal buffer address
19: Bad text file size
20-127: (reserved)
This procedures reads a given number of bytes from a device into a memory buffer.
In Pascal, its declaration would look like this:
PROCEDURE UNITREAD(UNITNUMBER: INTEGER;
VAR DATA_AREA: PACKED ARRAY[1...NBYTES-1] OF [0...255];
NBYTES: INTEGER;
[LBLOCK: INTEGER;]
[CTRL: INTEGER;] );
Ok, this is not correct syntax, since Pascal does not allow optional parameters, nor variable-length arrays, but you get the idea. The parameters are the following:
UNITNUMBER: the number of the device unit as defined
above.
DATA_AREA: A pointer to the buffer where data is to be
stored. This buffer is an array of bytes of length NBYTES.
The
pointer is actually made up of two words: a word base and a byte
offset.
Byte-oriented machines just add the two numbers to get the buffer
address.
Word-oriented machines compute the buffer address by indexing byte-wise
from the word base.
NBYTES: Number of bytes to transfer.
LBLOCK: Logical block. This is an optional parameter
used
with disk devices. Each disk is considered as an array of logical
blocks
each 512 bytes in length (If this is not the real format of the disk,
the
BIOS must do the conversion, so that it appears so to the RSP). If this
parameter is not specified, it will be passed as 0.
CTRL: Optional control word parameter, will be 0 if
ommited.
Out of 16 bits, only 7 are currently defined. Note that a given unit
may
use only a few of them.
UUUr rrrr rrrr CSPA
U = user defined control-bits.
r = r eserved bits.
C = NOCLRF. If this bit is 0, a LF (ascii >0A) will be automatically
appended to each CR (ascii >0D).
S = NOSPEC. If this bit is 1, special characters handling is disabled.
P = PHYSSECT. If this bit is 1, the disk is accessed by physical
sectors
rather than by logical block. In this case, LBLOCK contains
the
sector number, and NBYTES must be zero. The whole sector
will
be transfered, whatever its size.
A = ASSYNC. If this bit is 1, access will be assynchroneous. This
should
never be the case, so leave this bit as 0.
In summary, the stack would look like this before the call:
<
| Ctrl word |
| Log Block |
| Num bytes |
| Byte offset |
| Word base |
| Unit number |
|/////////////|
After the call, all parameters will be poped from the stack by UNITREAD itself.
This procedure writes a given number of bytes to a device. Its format is exactly identical to that of UNITREAD.
This procedure is used to reset a device. At the RSP level, this means clearing any state flag. The device can then be reset to a known state by the BIOS, but this is a device-dependent operation. The Pascal declaration is:
PROCEDURE UNITCLEAR(UNITNUMBER: INTEGER);
This procedure is intended for a device to report its status into a buffer that can be upto 30 words in length. The meaning of these words will depend on the device. The Pascal declaration of UNITSTATUS is:
PROCEDURE UNITSTATUS(UNITNUMBER: INTEGER;
VAR STATUS_WORDS: ARRAY[0..29] OF INTEGERS;
CTRL: INTEGER);
UNITNUMBER is the unit number as defined above.
STATUS_WORDS is a pointer to a memory buffer that can
accomodate
upto 30 integers.
CTRL is control word. Only 1 bit is defined:
UUUr rrrr rrrr rrrD
U = user defined control-bits.
r = reserved bits.
D = direction. If the bit is 1, the status of the input channel is
queried.
If this bit is 0, the status of the ouput channel is queried.
This procedure was intended for assynchronous environments and will therefore always return "false" with the TI-99/4A system. Its Pascal declaration is:
FUNCTION UNITBUSY(UNITNUMBER: INTEGER): BOOLEAN;
This procedure is intended for assynchronous environments. On the TI-99/4A system, it returns immediately and can thus be considered as a NOP. Its Pascal format is:
PROCEDURE UNITWAIT(UNITNUMBER: INTEGER);
As already stated, the main responsability of the RSP/IO is to handle communications with the BIOS in a standardized manner. But in addition, the RSP/IO is also in charge of processing a few special characters: two during output (CR/LF and DLE), and two during input (EOF and ALPHALOCK).
With the p-system, lines in a text file end with a single CR character (>0D), but this has the meaning of "carriage return, plus line feed". Many devices consider these two functions as distinct: CR brings the cursor left, while LF (ascii >0A) moves one line down. The RSP/IO will thus append a LF to every CR while outputing a text file. This behaviour can be suppressed by setting the NOCRLF bit as 1 in the control word.
When a text file contains a long stretch of spaces,such as an indented line, they are compressed in two bytes: the first one is the DLE character (ascii >10), the second is the number of spaces plus 32 (so as to make it a printable character). For instance, a line indented by 8 would start with >10 >28. The RSP/IO must decompress such lines and send the appropriate number of spaces to the BIOS, instead of the DLE code.
Several devices, such as CONSOLE, PRINTER or REMIN send a special character, called EOF, to signal that the end-of-file has been reached. In the p-system, EOF doesn't have a fixed ascii value. Rather, it is a variable stored in the SYSCOM area (a portion of memory shared by the system and the PME). The RSP/IO must therefore compare each incoming character to the one in SYSCOM and, if they match, process it as an EOF. With the TI-99/4A, the value of EOF is generally >83, which corresponds to Ctrl-C.
The meaning of an EOF depends on the device: for CONSOLE the rest of the input buffer will be filled with zeros, for the printer and remote devices, the EOF character is copied into the buffer. In any case, input is terminated immediately.
Apart for the hardware "alpha-lock" key on the TI-99/4A keyboard, the p-system also provides a software-based alpha-lock. When the special ALPHALOCK character is received, all following lower-case characters ('a' through 'z') will be converted to upper-case. This will continue until another ALPHALOCK character is received. As with EOF, there is no fixed ascii value for this character, rather it is defined by a variable in the SYSCOM area (with the TI-99/4A, the ascii value of this variable is generally >0C, i.e. FCTN-6).
As mentionned above, it is possible to disable the handling of EOF and ALPHALOCK by setting the NOSPEC bit to 1 in the control word. Note however that this will also disable the handling by the BIOS of the other special characters (break, start/stop, flush and character masking).
The p-system offers the possibility of creating virtual drives, that are physically implemented as a partition of an actual drive. These are called subsidiary volumes. Another responsability of the RSP/IO is to intercept calls to a subsidiary volume and to translate them into the appropriate call to the physical unit on which the subsidiary volume is implemented. Concretely, this means changing the unit number and adjusting the block address.
The SYSCOM memory area contains a unit table with an entry for each unit in the system. For storage devices, this record contains a physical disk unit and a block offset. The RSP/IO looks-up in this table for the proper unit number to use and adds to block offset to the LBLOCK parameter passed by the caller. After some sanity check for absent devices or illegal block number, it calls the BIOS with the new values.
The BIOS (Basic Input Output Subsystem) is of course machine-specific. It is in charge of interfacing the computer hardware with the PME via the RSP/IO and with p-code programs via the RSP/IO or directly. It retains the basic definitions of units, but does not have to handle them in a standardized manner.
In general, each device driver in the BIOS will handle four operations: read, write, control and status. However, the parameters required for the corresponding function may vary according to the device.
All BIOS functions should return a completion code to the RSP/IO, so that it knows if anything went wrong. These codes are a subset of those that can be returned by the IORESULT routine of the RSP/IO. The following codes can be returned by the BIOS:
0: No error
1: CRC error
2: Bad device number (e.g. unimplemented use-defined device)
3: Illegal I/O request
4: Undefined hardware error (not the same as IORESULT)
9: Device not on line (or not implemented in BIOS)
15: Ring buffer overflow
16: Disk is write-protected
17: Illegal block number
18: Illegal buffer address
128-255: Hardware-specific error.
Console
Printer
Disk drives
Remote
Serial
User-defined
System
On the TI-99/4A, the default CONSOLE input channel is the keyboard, while the default output channel is the screen. Note however that it is possible to redirect each channel to another device (e.g. output to a printer or input from a file).
Only one parameter is needed for reading from the CONSOLE:
CONSOLE is expected to contain a type-ahead buffer, able to store from 1 to 32 characters (or more) received from the keyboard until they are passed to the RSP. If additional characters are entered from the keyboard once the buffer is full, the bell should ring.
Optionally, all characters input from the console can be masked with >7F, as the left-most bit is always 0 in ascci. However, to accomodate special characters, it is possible to mask the input with >FF, i.e. to leave the characters unchanged. The value of the mask is set by the SETUP utility and stored in the SYSCOM area, an area of memory shared by the PME and the p-system.
In addition, the BIOS recognises the following control characters that regulates the output channel:
Only one parameter is needed for reading from the CONSOLE:
But the BIOS must also process some characters that have a special meaning:
Optionally, some additional functions can be offered by the BIOS. There is no fixed ascii code for these control characters, their value is determined at startup time by entries in the SYSTEM.MISCINFO file.
The effect of other non-printable characters is not strictly defined in the p-system specifications.
The CONSOLE initialisation function should be passed two parameters:
The p-system stores the values for the special characters, the character mask, and some flags into an area of memory shared with the PME, called the SYSCOM area (this area is part of the Kernel segment). Upon initialisation, a pointer to SYSCOM is passed to the console BIOS. From this pointer, the offsets of the various control characters are the following:
>003A: NOBREAK (bit value >40)
>0052: EOF character
>0053: FLUSH character
>0054: BREAK character
>0055: STOP character
>005C: Character mask (>7F or >FF)
>005D: ALPHALOCK character
Another parameter passed to the console BIOS upon initialisation is the address of the "break" routine, to be called upon reception of a BREAK character. The console BIOS should store this address for further calls.
Upon initialisation, the console BIOS should reset any pending flushed or stopped output status and clear the type ahead buffer.
The status routine requires two parameter:
Only the first status word in the buffer is used:
>0000: Number of characters currently in the input or output buffer (depending on the channel tested). If there is no output buffer, this word always contain zero. If there is no input buffer this word will be 1 if a character is ready to be read, 0 otherwise.
Only one parameter:
So as to be as general as possible, the RSP sends characters to the printer one at a time. If the printer expects characters to be passed as whole lines, it is the BIOS job to buffer the characters until an EOL (end-of-line) is encountered. Valid EOL characters are:
Only one parameter:
If the printer is able to return data, the BIOS will pass them directly to the RSP. Otherwise, input operations result in an error code 3 ("illegal operation").
Upon initialisation, the character buffer is emptied and a new line (CR + LF) is performed. No parameter is passed.
As is the case for the console, the printer status function takes two parameters:
If the printer has no self-checking abilities, this function just returns 0. For the output channel, the BIOS returns the number of characters remaining to be printed, zero is interpreted by the RSP/IO as a signal that the printer is ready to print. A non-zero value indicates that sending a character to the printer may hang the system until the printer is ready.
Disks devices are considered by the RSP as a linear array of 512-byte records. The BIOS is in charge of converting this abstraction to actual parameters needed for disk access. Note that this feature can be overriden by setting the PHYSSECT bit in the control word so that the RSP itself can do hardware-level operations (i.e. pass the actual sector number).
This function takes five parameters:
On input operations, the BIOS will only transfer the required number of characters, even if this is not a complete sector.
This function takes the same five parameters as DISKREAD.
For output operations, if the number of bytes sent is not a multiple of 512, the remaining bytes are undefined. This means that the end of the "block" is likely to contain garbage. It is the responsability of the application program, to know where the valid data ends in the block.
Only one parameter is passed:
Initialisation brings the drive to a state where it is ready to read or write. Any buffered data that has not been saved to disk yet will be lost.
The status function takes three parameters:
For disks, the status report consists in 4 words:
>0000: the number of bytes currently in the input or output
buffer
(depending on the channel tested)
>0002: the number of bytes per sector.
>0004: the number of sectors per track.
>0006: the number of tracks per disk.
The BIOS is in charge of converting the real, physical sectors, into logical 512-bytes sectors, no matter what the actual sector size may be. This may require some tricky buffering if the physical sector size is greater than 512 bytes...
Generally, the BIOS will stay away from the first two physical sectors on the disk, since these are meant for the boot program. On the TI-99/4A, the BIOS fills the whole floppy with one huge file so that all disk operations from the p-system translate into file operations for the TI-99/4A DSRs. The first three sectors on the disk will be occupied by the disk directory, the list of file and the file descriptor record (FDR), which leaves 357 sectors of 256 bytes available for the p-system.
Provision was made for the p-system to access physical sectors directly, with DISKREAD and DISKWRITE. To this end, you must set bit 1 (weight >02) in the control word. The number of bytes should be set as 0 since the whole sector will be transfered, no mater what size it is. Finally, the physical sector number must be passed as the first parameter, instead of the logical sector number. (NB The p-system designers reserved the right for the "number of bytes" parameter to be combined with the sector number, so that one could access more than 65536 sectors. Which is why it should be 0 for now).
The correspondance between logical and physical sectors is for you to calculate, using the following formula:
PhysicalSector = (TrackNumber * SectorsPerTrack) + SectorNumber -1
TrackNumber = PhysicalSector / SectorsPerTrack
SectorNumber = ( PhysicalSector mod SectorsPerTrack) +1
Where "mod" is the modulo operation, i.e. the remainder of the rounded division. Note the addition/substraction of 1, due to the fact that physical sectors are numbered from one on, whereas logical sectors are numbered from zero on. No correction is necessary for tracks, since they are numbered from zero on.
This device is inteded to be a RS232 serial connection. The BIOS should not alter any character during transmission (e.g. not mask the most significant bit). The usual transfer rate is 9600 bauds.
Single parameter passed:
Since they are read one at a time, input characters are buffered as for the keyboard, in a "type-ahead" buffer.
Only one paramter is needed:
Bytes are sent one at a time by the RSP. The BIOS transmits them as such.
Initialisation brings REMOTE to a state where it is ready to both send or receive. No parameter is passed.
The status function takes two parameters:
Only the first word of the status buffer is used:
>0000: number of characters currently stored in the input (or output) buffer.
For instance, the serial COM port on a PC.
Two parameters are passed:
Two parameters are passed
One parameter is used:
Three parameters are required:
The BIOS implementation of a user-defined device is left to the user. The only requirments is that the function return a completion code when finished, and that an error code 2 ("illegal unit number") is returned if the UNITNUMBER parameter does not match any device. User-defined devices should have numbers greater than 127.
Five parameters are passed to this function. The BIOS can choose to ignore any of them, of course:
This function takes the same parameters as USERREAD.
Only one parameter is passed to this function:
What it does is up to the user.
This function is passed three parameters:
What it reports is up to the user that defined the device.
Although the operating system is not an I/O device, the RSP can still use the standard I/O function to address the system BIOS. Each function has a particular meaning:
This function is reserved for future extension. Calling it causes an HALT (i.e. exits the p-system). Five parameters are passed:
Reserved for future extension. Causes an HALT (i.e. exits the p-system). The same five parameter are used than with SYSREAD.
The function is passed two parameters:
Resets the clock value as >00000000.
Two parameters are required:
Returns 3 words in the status buffer:
>0000: Address of the last word (not the last byte! This must
be
an even address) of RAM accessible for the system.
>0002: The least significant word of a 32-bit time value. If not
clock
is implemented, this word is 0. Otherwise, it contains the number of
1/60th
of seconds since initialisation.
>0004: The most significant word of the time value or zero if the
internal
clock in not implemented.
The operating system is in charge of loading and running p-code programs in the host machine. One of its main tasks is to react to segment faults: each time the p-code program calls a segment that is not currently in memory, a segment fault is issued. The p-system kicks in, loads the required segment in the code pool, (possibly shifting other segment to make room, or even trashing one), then it returns control to the p-code program.
The p-system manages a special memory stack, called the heap, for its internal use. Under some circonstances, the user can also store data on the heap.
System units
Code Pool
Faults
Heap
SIBs
E-Recs and E-Vects
Tasks
Syscom area
File access
The p-system is written in Pascal. It consists in the following compilation units:
Units | Usage |
---|---|
HEAPOPS EXTRAHEAP PERMHEAP |
Heap operations |
SCREENOPS | Screen operations |
FILEOPS | File and directories operations |
PASCALIO EXTRAIO SOFTOPS |
File-level I/O |
SMALLCOMMAND COMMANDIO |
I/O redirection and chaining |
STRINGOPS | String operations |
OSUTILS | Conversion utilities |
CONCURRENCY | Concurrency management |
REALOPS | Floating point operations, real numbers I/O |
LONGOPS | Long integer operations |
GOTOXY | Screen cursor control (may be user-supplied) |
KERNEL | Resident part of the OS (segment 0). Code pool maintenance, fault handling, segments loading. |
GETCMD | Subsidiary segment of KERNEL ( swappable). Processes user input, builds run-time environment. |
USERPROG | Ditto. Contains user program (at power-up: builds the initial environment). Segment 15. |
INITIALIZE | Ditto. Processes SYSTEM.MISCINFO, locates code files, builds the device table. |
PRINTERROR | Ditto. Prints run-time error messages. |
The code pool is where p-code segments are placed for execution. It is managed by the system which can move or unload segments if more room is needed. The system always keeps the loaded segment as a continous block in memory. When new segments are loaded, they are appended at the end of this area. Note that assembly language segments, which are not relocatable, are not placed in the code pool , but rather on the heap.
Depending on the machine, the code pool can be placed in between the stack and the heap (internal pool), or have a memory area for itself (external pool). I'm not sure which is the case for the TI-99/4A.
The following variables are used by the code-pool management routines:
SP_Low: Only relevant for an internal pool. Pointer to the lowest possible bound of the stack, i.e. one word above the top of an internal code pool. SP_Low is located in the TIB (Task Information Block).
HeapTop. Only relevant for an internal pool. Pointer to the top of the heap. HeapTop is part of the HeapInfo record.
Seg_Pool. Pointer to the Pool_Descriptor. Seg_Pool is part of the SIB (Segment Information Block).
The Pool_Descriptor comprises the following fields:
Record
PoolBase : fulladdress;
PoolSize : integer;
MinOffset : memptr;
MaxOffset : memptr;
Resolution : integer;
PoolHead : SIB_Ptr;
Perm_SIB : SIB_Ptr;
Extended : boolean;
end;
PoolBase is a 32-bits address that points at the base of the
code pool.
PoolSize is the size in words of the code pool. Set by the SETUP
utility.
MinOffset is the lower boundary of the code pool.
MaxOffset is the upper boundary of the code pool. Same as SP_Low
for an internal pool.
Resolution is the offset in bytes for segment alignment, set in
SYSTEM.MISCINFO. Segments always start at an address which is a
multiple
of this number.
PoolHead points at the SIB at the base of the code pool.
Perm_SIB points at a SIB that is always resident (currently
GOTOXY).
Extended is TRUE if extended memory is used. Set in
SYSTEM.MISCINFO.
When a segment fault occurs, the pool managment routines must load the required segment(s) before returning to the PME. The routines successively try to:.
With an internal code pool, the managment routines must also respond to stack fault and heap faults. To free the necessary space, the routines try to:
There is a high priority task called FaultHandler in the OS, whose job is to handle memory fault. This process is started at boostrap time, but immediately waits for a semaphore. When a routine detects a fault it just signals the semaphore and FaultHandler takes over. FaultHandler deals with the fault (hopefully), then resumes waiting for the semaphore, which returns control to the faulty task.
The semaphore resides in the SYSCOM area and is declared as:
Fault_Message = RECORD
Fault_TIB : TIB_Ptr;
Fault_E_Rec: E_Rec_Ptr;
Fault_Words: integer;
Fault_Type : SegFault..Pool_Fault;
END;
Fault_Sem: RECORD
Real_Sem, Message_Sem: semaphore;
Message: Fault_Message;
END;
As you may have guessed, the record "Message" is used by the routine that generated the fault to pass information to FaultHandler. The contents of this record is pretty much self-explanatory, but just in case:
FaultType indicates the type of fault (segment, stack, heap,
pool, etc).
TIB_Ptr points to the Task Information Block of the faulting
task.
Fault_E_Rec points to the Environment record of the current
segment
or of the missing segment (for segment faults).
Fault_Words is the number of words needed (e.g. for stack
faults).
It's 0 for segment faults.
Faults can be signaled either by the PME or by the OS itself. The PME detects only stack and segment faults, which have been described in more details in the relevant chapter.
The OS routes all faults through the EXECERROR routine. It can issue a segment fault in only one occasion: when you try to use MEMLOCK to lock a segment that's not in memory. It can also issue heap faults, when you are using MARK, NEW, VARNEW or PERMNEW and there isn't enough memory on the heap.
As discussed above, FaultHandler then tries to move around segments in the code pool, and free some memory. When doing this, it must of course make sure that the current segment will not be swapped out of memory! In case of a stack fault created by calling a routine in a different segment, both segments must be locked in memory.
The heap is a separate stack used by the OS for its own purposes. Typically, the heap and the stack grow toward each other in memory, possibly with the code pool in between them. Which means that stack faults, and possibly segment faults, can affect the amount of memory available for the heap. Conversely, the heap management routines can issue a heap fault to request additional space.
The heap managment routines are the following:
Saves the location of the top-of-heap.
Cuts the heap at the position of the corresponding MARK, discarding any variable located in this space (except those declared with PERMNEW). MARK and RELEASE can be nested, and care should be taken to match them properly.
Allocates room for a variable on top of the heap. PTR is a pointer to a variable type, which allows NEW to know how many words should be reserved. If TYPE has variants, NEW allocates enough space for the largest variant, unless instructed otherwise.
De-allocates the space reserved by NEW for the variable pointed by PTR. Afterwards, PTR will contain NIL. If a specific variant of TYPE was requested with NEW, then it should also be requested in the corresponding DISPOSE.
Allocates NWORDS words on top of the heap and places the address of the first one in the variable pointed by P. Returns the number of words effectively allocated which should be equal to NWORDS, or to zero if there wasn't enough space.
De-allocates NWORDS words from the position pointed at by PTR and places NIL in PTR. It's extremely important that NWORDS has the exact same value than was unsed in the corresponding VARNEW call.
Allocates memory for a variable just like NEW, but this space can only be reclaimed with PERMDISPOSE: even RELEASE won't destroy it. This function is reserved for internal use by the OS (e.g to chain programs, so that the next command is still available after a program has released its heap allocation).
This is the only way to de-allocate space reserved by PERNEW. Again, this procedure is reserved for system use.
These procedures are located in three compilation units:
HEAPOPS contains MARK, RELEASE and NEW.
EXTRAHEAP contains DISPOSE, VARNEW , VARAVAIL, MEMLOCK and MEMSWAP.
PERMHEAP contains PERMNEW, PERMDISPOSE and PERMRELEASE.
The heap is managed by placing marks on it. There are two types of marks:
Both types of mark share the common type MemLink:
TYPE
MemLink = RECORD
Avail_list: MemPtr;
NWords: integer;
CASE Boolean OF
true: (Last_Avail, Prev_Mark : MemPtr);
END
With MARKed marks, the variant is true, and we have the following fields:
NWords is always 0 because these marks do not reserve any
space.
Prev_Mark points to the previous mark down the heap.The bottom
mark
has NIL in this field and the topmost mark is pointed at by the global
variable HeapInfo.TopMark. This allows to easily walk the whole chain
of
marks on the heap, for instance to remove a mark with RELEASE.
Last_Avail points to the top of the available space above the
mark.
Typically, this space will be bound by the next mark, or by an
allocated
variable (or by the code pool, in the case of the topmost mark on the
heap).
Avail_list points to a list of data marks, created by either NEW
or VARNEW. The first record in this list constitues the lowest
unallocated
space above the mark.
With data marks, the variant field is false, so Last_Avail and
Prev_Mark
don't exist:
NWords is the amount of space reserved for the variable, including the two words occupied by the mark record itself. Avail_list points to the memory-reserving mark for the next variable.
HeapInfo is an important variable used for heap maintenance. It has the following structure:
VAR
HeapInfo: RECORD
Lock: semaphore;
Topmark, HeapTop: MemPtr;
END
PermList: MemPtr;
Lock is a semaphore used by heap-modifying routines, to make
sure that the heap will only be modified by one process at a time.
TopMark point to the topmost mark installed by the MARK
subroutine,
as described above.
HeapTop points at the highest allocated space on the heap. It is
used by FaultHandler to determine how close the heap is from the stack
(or from the code pool, if it's placed in between the heap and the
stack).
PermList is another variable used for heap management. It points at the top of a separate list of memory-reserving marks used uniquely for variables allocated by PERMNEW. PermList can be NIL if no variable was allocated with PERMNEW.
Example:
Here is a example of heap structure. It contains three MARKed marks (red) linked together through their Prev_Mark fields (red arrows). In each, the Last_Avail field points to the highest portion of the heap that they are allowed to use for their data marks (black arrows).
The top mark's chain of data mark consist in only one mark, that reserves space for the integer Dummy.
The bottom mark does not point to any data chain any more, but there is some free space above it, indicating that some data may have been there, but disposed of.
The middle mark point to a chain of data marks (blue) linked together by their Avail_List fields (blue arrows). In this case, the chain contains only two data marks, that reserve memory for the integer Foo, and the 3-word long-integer Bar, respectively. Note that here also, there is some empty space between Foo and Bar, where variables may have been allocated, then disposed of.
: :
| STACK or CODE |
+------------------+ <---,
| //////////////// | |
| //////////////// | |
+------------------+ | <---HeapInfo.HeapTop
| Int: Dummy | |
+------------------+ |
| Avail_List = NIL | |
| NWords = 1 | |
+------------------+ <-, |
| Avail_List -----|---' |
| NWords = 0 | |
| Last_Avail -----|-----'
| Prev_Mark -----|-------,
+------------------+ |<---HeapInfo.TopMark
| Long: Bar | |
| ( 3 words ) | |
+------------------+ |
| Avail_List = NIL | |
| NWords = 3 | |
+------------------+<-,<-, |
| //////////////// | | | |
+------------------+ | | |
| Int: Foo | | | |
+------------------+ | | |
| Avail_List -----|--' | |
| NWords = 1 | | |
+------------------+<-, | |
| Avail_List -----|--' | |
| NWords = 0 | | |
| Last_Avail -----|-----' |
| Prev_Mark -----|----, |
+------------------+<-, |<-'
| //////////////// | | |
+------------------+ | |
| Avail_List = NIL | | |
| NWords = 0 | | |
| Last_Avail -----|--' |
| Prev_Mark = NIL | |
+------------------+<---'
HeapTop is set to point at the new top of heap, as soon as space on top of the heap is requested by one of the heap management routines. This way, if a stack fault occurs, there won't be any conflict between these routines and FaultHandler.
The OS uses the heap for its own purposes, but the heap is also available for your programs. To avoid conflicts, the system puts a mark on the heap ( called EMPTYHEAP) after your program's runtime environment has been built, but before the program is actually started. Once the program terminates, the system releases EMPTYHEAP, therefore destroying anything you may have placed on the heap (unless you used PERMNEWs, which you shouldn't). Note that your program's SIBs, E_Recs and E_Vects are part of the run-time environment and will therefore appear before the EMPTYHEAP mark. So do global data in your program, and in any units it uses.
Among other things, the OS maintains a disk directory on the heap. It is pointed at by the global variable SysCom^.GDirP, but it is meant to be invisible for you. So before any heap operation (except DISPOSE), the system will remove this directory with DISPOSE and make its space available again.
A segment is considered active if it may be used by the current program, whether it is currently loaded in memory or still on disk. Each active segment is described in a Segment Information Block (SIB) on the heap. When a program requires a segment that is not yet active, an SIB is created for it on the heap. All SIBs are arranged in a double-linked list and reference each others via their Prev_SIB and Next_SIB fields.
The structure of a SIB is the following:
RECORD
Seg_Pool : ^Pool_Des;
Seg_Base : memptr;
Seg_Refs : integer;
Time_Stamp : integer;
Link_Count : integer;
Residency : -1...maxint;
Seg_Name : PACKED ARRAY [0..7] OF CHAR;
Seg_Leng : integer;
Seg_Addr : integer;
Vol_Info : VIP;
Data_Size : integer;
Res_SIB : RECORD
Next_SIB : SIB_Ptr;
Prev_SIB : SIB_Ptr;
CASE boolean OF
TRUE: Next_Sort: SIB_Ptr;
FALSE: New_Loc: memptr;
END
M_Type : integer;
END
Seg_Pool points to the Pool_Descriptor for the pool in which
the segment resides. If the segment is on the heap (as is the case for
non-relocatable segments) or if there is no external code pool,
Seg_Pool
is NIL.
Seg_Base contains the byte offset of the segment within the code
pool, or the address of the segment on the heap. If the segment is not
in memory, Seg_Base is NIL.
Seg_Refs contains the number of outstanding calls to the
segment.
It is incremented each time a routine in the segment is called by an
external
segment, it is decremented when this routine returns to its external
caller.
Time_Stamp is used by the system to determined which segment is
the least recently used, when segments should be swapped out to make
room
in the code pool. The Syscom area
contains a
16-bit variable that is incremented every time a segment is exited.
This
variable is copied in the Time_Stamp field of the segment exited.
Link_Count indicates how many external variables reference the
SIB.
The operating system uses several such variables, including the
Prev_SIB
and Next_SIB fields of other SIBs. Link_Count is incremented each time
a pointer is set to the SIB and decremented when this pointer is
destroyed.
When Link_Count becomes zero, the SIB is removed from the heap.
Residency is 0 for segments that are swappable. Values greater
than
zero indicate that the segment is locked in memory. Residency is
incremented
each time a program locks the segment (with the MEMLOCK function), and
decremented each time it is unlocked (with the MEMSWAP function). A
value
of -1 indicates a segment that was position-locked at loading time and
cannot be made swappable.
Seg_Name contains the first 8 characters of the segment name.
Seg_Len is the number of code words in the segment, including
the
relocation lists but not the reference lists.
Seg_Addr contains the number of the segment's first block on
disk.
Vol_Info points to a volume information record that contains the
drive number and diskname for the disk where the segment resides.
Data_Size is the number of data words in the code segment's data
segment. This only applies to principal segments (for others Data_Size
is 0).
Res_SIB is used to maintain the code pool by waking the list of
SIBs. Its Prev_SIB field points to the previous SIB in the list,
Next_SIB
points to the next SIB in the list. An extra field is used by the code
pool management routine to store temporary values: either Sort_SIB or
New_Loc.
Many p-code instructions require a segment number as an argument. For instance, CXL 12,3 calls procedure number 3 in segment 12. However, there is no such thing as a library of all existing segments and their corresponding numbers. In fact, only one segment has a fixed number: KERNEL, which is segment number 0. For all others, the compiler assigns each segment an arbitrary number, from 2 to 255 (number 1 is reserved for the current segment). This may be a problem when trying to link units that were compiled independently: the same segment may have been given completely different numbers during the various compilations.
Rather than having the linker patch the code to match the segment numbers, the p-system designers went for a more flexible solution. Firstly, the compiler appends a segment reference list to the codefile it creates. This is a list of segment names, together with the numbers that the compiler has assigned to them. Then, when the file is loaded, the system creates an environment vector (or E_Vect) for each segment. An E_Vect is just an array of pointers to other segments. This way, the segment numbers used by the p-codes are just indexes into the E_Vect. To find the relevant segment, the PME just reads the pointer found at a given position in the E_Vect.
The structure of an E_Vect is the following:
RECORD
Vec_Length : integer;
Map : array [1..Vec_Length] of ^E_Rec;
END
Vec_Length indicates the number of entries in the E_Vect,
i.e.
the number of segments it references (including itself).
Map is an array of pointers to other segments, arranged
according
to the segment numbers assigned at compilation time.
If you thought that the pointers in the E_Vect would point at other segments' SIBs, you were wrong. They point at special stuctures, called "Environment Records" or E_Rec. Each segment has an E_Rec, that contains pointers to important structures in this segment. The structure of an E_Rec is the following:
RECORD
Env_Data : memptr;
Env_Vect : ^E_Vect;
Env_SIB : ^SIB;
CASE boolean OF
TRUE: ( Link_Count : integer;
Next_Rec : ^E_Rec;
)
END
Env_Data points at this segment's global data, allocated on
the
heap when the program is executed.
Env_Vect points at this segment's E_Vect, an array of pointers
to
other segments' E_Recs.
Env_SIB points at this segment's SIB, which
is
allocated on the heap when the program is executed.
The next two fields are only present in the E_Rec of the principal
segment
in a compilation unit:
Link_Count indicates the number of other units that are using
this
principal segment.
Next_Rec is used to maintain a chain of all active compilation
units.
It points to the next unit in the chain.
Example:
Let's use three segments, "Tic", "Tac" and "Toe". Tic is a principal segment that references Tac and Toe. Toe is a principal segment in another compilation unit, it also references Tac. Tac is a subsidiary segment, it does not reference any other segment. Note that each segment has an entry for itself in the E_Vect, but I omited these arrows, for clarity.
"Tic" E_Rec
+------------+
| Env_Data | E_Vect 1 2 3
+------------+ +-----+-----+-----+-----+
| Env_Vect |----->| 3 | Tic | Toe | Tac |
+------------+ +-----+-----+--|--+--|--+
| Env_SIB | | |
+------------+ | |
| Link_Count | ,---------|-----'
+------------+ | |
,--| Next_Rec | | |
| +------------+ | |
| | '-----,
| ,------------------------' |
| | |
| +----------------------------------, |
| | | |
| V "Tac" E_Rec | |
| +------------+ | |
| | Env_Data | E_Vect 1 | |
| +------------+ +-----+-----+ | |
| | Env_Vect |----->| 1 | Tac | | |
| +------------+ +-----+-----+ | |
| | Env_SIB | | |
| +------------+ | |
| | |
| ,----------------------------------|-----'
| | |
| V "Toe" E_Rec |
'->+------------+ |
| Env_Data | E_Vect |
+------------+ +-----+-----+--|--+
| Env_Vect |----->| 2 | Toe | Tac |
+------------+ +-----+-----+-----+
| Env_SIB | 1 2
+------------+
| Link_Count |
+------------+
| Next_Rec =0|
+------------+
Note that Tac is referenced as segment number 2 in Toe , but as
segment
number 3 in Tic. Therefore, the instruction
CXL Tac,5 (which calls procedure 5 in Tac ) will be compiled as >93,
>03, >05 in Tic, but as >93, >02, >05 in Toe.
When executing CXL Tac,5 in the segment Tic, the PME reads >93,>03,>05, gets the third enty in Tic's E_Vect and obtains a pointer to Tac's E_Rec. Then it locates procedure number 5 in Tac and branches to it.
Similarly, when encountering a CXL Tac,5 in segment Toe (which is coded >93,>02,>05), the PME gets entry number 2 in Toe's E_Vect, which is again a pointer to Tac. You get the idea?
Finally, note the chaining between the principal segments: Tic's Nec_Rec points to Toe (red arrow). The corresponding entry for Toe is NIL, since Toe is at the end of the list in our example.
The p-system is a multitasking operating system. That's right, it turns your TI-99/4A into a multitasking machine!
A task can be seen as a "line of execution" of a program. At any time there can only be one task running (since the TI-99/4A only has one processor), but there can be many more tasks pending. The p-system achieves multitasking by quickly switching from one task to the next, providing the illusion that all tasks run at the same time.
The "main task" is the thread of execution that starts the p-system, runs any utility or user program, then exits the p-system. Anywhere in this thread, subsidiary tasks can be launched that will execute "concurently" with the main task.
Three things are required for multitasking:
A TIB is a structure allocated on the heap into which the p-system saves some critical values from the present task, before it switches to another. For instance, it saves the current instruction pointer (IPC) and most of the other p-machine registers. These values will be restored when the system switches back to this task.
The structure of a TIB is the following:
RECORD
Regs: PACKED RECORD
Wait_Q : ^TIB;
Prior : byte;
Flags : byte;
SP_Low : memptr;
SP_Upr : memptr;
SP : memptr;
MP : ^MSCW;
Reserved : integer;
IPC : integer;
Env : ^E_Rec;
Proc_Num : byte;
TI_BIOResult: byte;
Hang_Ptr : ^Sem;
M_Depend : integer;
END
Main_Task : Boolean;
Start_MSCW : ^MSCW;
END
IPC is the p-code instruction pointer. It is a
segment-relative
byte pointer.
Proc_Num is the number of the currently executing procedure in
this
task.
MP points to the local activation record (MSCW) for the task, on
the stack.
SP is the p-machine stack pointer.
SP_Low is the lower bound of the stack for this task.
SP_Upr is the upper bound of the stack for this task. Note that
each subsidiary task has its own private stack, allocated on the heap
when
the task is started. By default this stack is 200 words, but this can
be
specified when starting the task. Only the main task uses the system
stack.
Env is a pointer to the task's E_Rec (from which you can find
the
the task's SIB).
TI_BIOResult is used to save an I/O result that is local to the
task.
M_Depend contains machine-dependent data maintained by the PME.
It is initialized as 0.
Prior is the task priority. It's a number from 0 (lowest
priority)
to 255 (highest priority).
Wait_Q is used when the task is in a waiting queue. It points to
the TIB of the next task in the list.
Hang_Ptr is used when a task is blocked by a semaphore. It
points
to the semaphore (it's NIL if the task is not blocked).
Flags are reserved for future use.
Main_Task is TRUE for root ("parent") tasks, FALSE otherwise.
Start_MSCW points to the MSCW of the routine that started this
task.
At any time, there is only one task running. All other tasks ready to run are placed in a waiting queue. When it's time to switch tasks, the p-system fetches the appropriate task from the queue, switches to it, and places the current task back into the queue.
If necessary, tasks can be synchronized by using semaphores. A semaphore is a variable that can be "grabbed" by a task. If another task attempts to grab the same semaphore it will hang, i.e. the task will be suspended until the original task has "released" the semaphore. At which point the hanging task will be able to grab the semaphore and resume execution.
Actually, a semaphore can be grabbed an arbitrary number of times before it causes a task to hang. Similarly, a semaphore may be required to be released several times before is actually frees the task(s) it has hanged. In most cases however, a "grab count" of one (mutual exclusion semaphore, or "mutex") will do the job.
Each semaphore has an associated queue of tasks waiting for it, similar to the queue of tasks ready to run. Note that this queue may be empty if no task is waiting for this semaphore. Hanging and releasing a task simply means moving it from the semaphore queue to the "ready-to-run" queue, or conversely.
To grab a semaphore, use the p-code WAIT. It decrements the semaphore "grab count" or hangs the current task if the count is already zero. Which means that the p-system places the current task in the semaphore queue and switches to another task from the "ready-to-run" list.
To release a semaphore, use the p-code SIGNAL. It fetches the task at the head of the semaphore queue and places it in the "ready-to-run" queue. A task switch will only occur if this task has a higher priority than the current one. If the semaphore queue is empty, the count is incremented and nothing else happens. The same thing happens if the semaphore count is negative: this allows to implement semaphores that must be signalled several times before hanged tasks can be released.
You can also use the standard procedure ATTACH to automatically link a semaphore to a preset PME event. When this event occurs, the semaphore will be signaled. The standard procedures QUIET and ENABLE allow to globally disable or enable event-attached signalisation.
The p-system comprises a compilation unit called CONCURENCY, which implements Pascal processes (a process is the Pascal word for a task). The unit contains 3 procedures called START, STOP, and BLK_EXIT. The Pascal compiler automatically calls START in the initialisation code of a process, and STOP in its exit code. The compiler also calls BLK_EXIT in the exit code of a main process.
These procedures make use of a global variable called Task_Info, which has the following structure:
RECORD
Lock : semaphore;
Task_Done : semaphore;
N_Tasks : integer;
END
Lock is a semaphore used to ensure that only one task at a
time
will be able to modify Task_Info.
Task_Done is a semaphore used to wait for the termination of
subsidiary
tasks.
N_Tasks is the number of subsiduary tasks that have been
created
by START.
START
This procedure is called to create a new task. It reserves space on the heap for the task's TIB, stack, and activation record, and then executes a WAIT p-code. This normally causes the p-system to switch to the new process (unless there is one with a higher priority pending). The new process performs its initialisation (i.e. copying its parameters) then executes a SIGNAL to return to START, which in turn returns to its caller.
STOP
This procedure records the termination of a task. It decrements Task_Info.N_Tasks and signals Task_Info.Task_Done. It then creates a dummy semaphore and waits for it, which causes a permanent task switch for the terminated process.
BLK_EXIT
This procedure is called by the main tasks exit code and waits for termination of all subsidiary processes. It waits for Task_Info.Task_Done until Task_Info.N_Tasks becomes zero. In other words, it waits until each subsidiary task has called STOP.
The SYSCOM area is a memory location where the p-system stores importants variables that can be shared by the PME, the RSP, the BIOS and the OS, as well as by your program. From the OS point of view, the SYSCOM area can be considered as a Pascal packed record and is declared as such in the interface of the KERNEL unit.
Low-level assembly routines on the other hand, must know the offset of each variable with respect the the SYSCOM^ pointer that is defined when the system powers up. The problem here is that some of the variables are bytes, so their location will vary according to the byte sex. For instance, here are the offsets of the some keyboard special codes:
Ctrl char Little endian Big endian
EOF >52 >53 }
FLUSH >53 >52 }
BREAK >54 >55 }
STOP/START >55 >54 }
CHARMASK >5C >5D }
ALPHALOCK >5D >5C }
NOBREAK >3A >3B
The pairs defined with braces are bytes in the same word. When the byte sex changes, the two bytes are flipped whithin the word.
We saw that the RSP/IO consider every unit as an array of blocks, an illusion implemented by the BIOS. The OS takes things a level higher and introduces the concept of files. There can be two types of files: standard files that follow the Pascal conventions and are accessed with the GET and PUT functions. And intereractive files that accept characters one at a time. From the OS point of view, both types of files share a common structure called the File Information Block, or FIB.
FIB = RECORD
FWindow: Window_P;
FEOF, FEOLN: Boolean;
FState: (FJanW, FNeedChar, FGotChar);
FRecSize: integer;
FLock: semaphore;
CASE FIsOPen: Boolean OF
true: (FIsBlkd: Boolean;
FUNIT: UNITNUM;
FVID: VID;
FRptCnt: integer;
FMaxBlk: integer;
FModified: Boolean;
FHeader: DirEntry;
CASE FSoftBuf: Boolean of
true: (FNxtByte: integer;
FMaxByte: integer;
FBufChngd: Boolean;
FBuffer: PACKED ARRAY[0..FBlkSize] OF CHAR
)
)
FWindow points to the current character in the file.
FEOF is the end-of-file flag.
FEOLN is the end-of-line flag.
FState is the file type: FJanW is the standard Pascal type
(Jensen
& Wirth), FNeedChar is an interactive file waiting for a character,
FGotChar is an interactive file with a character.
FRecSize is the size of a record, in bytes. It is 0 for
unentered
files, -1 for interactive and text files.
FLock is a semaphore used to ensure that only one process at a
time
can access a file.
FIsOpen is true when the file is opened, which makes more
fields
relevant:
FIsBlkd is true if the file resides on a storage device.
FVID is the number of the volume.
FRepCnt is the number of time the window value is valid
before another
GET is needed.
FNxtBlk is the next relative block to access.
FMaxBlk is the maximum number of blocks that can be accessed.
FModified is true if the file was modified and the last
access date
should be modified.
FHeader is a copy of the file's directory entry.
FSoftBuf is true if soft-buffered I/O is used, which is the
case
for all files on storage devices, except unentered files. If it is
true,
yet another set of fields become relevant:
FNxtByte is the next byte
FMaxByte is the last byte. These two are used for buffer
management.
FBufChanged is true if the buffer was modified and needs to
be saved.
FBuffer is of course the soft buffer itself.
On a disk or another storage device is a directory, which is nothing more than an array of 78 directory record. The first record (number 0) defines the directory itself, the others entries define files.
The structure of the directory record is:
DFirstBlk | |||
DLastBlk | |||
- | DFKind | ||
Lenght (7) | D | ||
I | R | ||
N | A | ||
M | E | ||
DeovBlk | |||
DNumFiles | |||
DLoadTime | |||
DLastBoot (Year) |
DLastBoot (Month) |
DLastBoot (Day) |
The directory entries for files look like:
DFirstBlk | |||
DLastBlk | |||
StatusBit | DFKind | ||
Length (15) | N | ||
A | M | ||
E | _ | ||
O | F | ||
_ | T | ||
H | E | ||
F | I | ||
L | E | ||
DLastByte | |||
DAccess (Year) |
DAccess (Month) |
DAccess (Day) |
This section describes the file formats used by the p-system.
Code files
_Segment dictionary
_Segment header
_Procedure dictionary
_Constant pool
_Relocation list
_Segment reference list
_Linker information
A text file should only contain ASCII characters. It starts with a 2-block header that is used by the Screen Oriented Editor. If any other program uses a text file, the OS ignores the header. When a new file is created, its header blocks are filled with zeros.
Text files always have an even number of block (so the smallest text file is 4 blocks long, including the header). Each pair of blocks makes up a page. Text does not wrap from one page to the next, if necessary the end of the page is filled with zeros.
Each page contains lines terminated by <return> (ascii >0D). Optionally, if a line starts with a number of spaces, these can be compressed to save disk space. In this case, the line will start with the blank compression code (ascii >10), followed by the number of spaces plus 32.
If you want to access text files from Pascal programs, use READ, READLN, WRITE and WRITELN, although GET and PUT also work and follow the Jansen&Wirth rules for text files.
Code files contain programs, or independently compiled chunks of programs called UNITs. These can contain both p-code and assembly language. In addition, the code file contains a lot of information on the code it carries, how to link the various units, etc.
All in all, the structure of a code file is quite complex. It consists in a segment dictionary, and a series of code segments. To make things worse, the structure of a code segment varies slightly between p-code and assembly segments.
The first block in a code file is the segment dictionary. It can describe upto 16 of the segments that are part of the file. If there are more than 16 segments, extra blocks can be added to the dictionary. These extra blocks will be interspersed with the segments within the code file. Each block contains a link to the following block, so that the chain can be travelled.
A dictionary block basically consists in 6 arrays, which each upto 16 entries:
Followed by some dictionary info.
Lets examine these arrays in detail:
Seg_Dict = RECORD
Disk_Info: ARRAY[0..15] OF
RECORD
Code_Addr: integer;
Code_Leng: integer;
END
Code_Address is the block where the segments starts, relative
to the beginning of the code file. Thus, segments always start on a
block
boundary.
Code_Length is the number of words in the segment, including the
relocation list, but not the reference list.
All unused entries in this array will be zeros.
Seg_Name: ARRAY[0..15] OF
PACKED ARRAY[0..7] OF CHAR;
These are the segment names, upto seven characters in length (truncated if necessary). Unused entries are filled with spaces.
Seg_Misc: ARRAY[0..15] OF
PACKED RECORD
Seg_Type: Seg_Types;
Filler: 0..31
Has_Link_Info: Boolean;
Relocatable: Boolean;
END
Seg_Types=(No_Seg, Prog_Seg, Unit_Seg, Proc_Seg, Seprt_Seg);
Seg_type is the segment type. It will be No_Seg if this
directory
entry is empty (e.g.if there are less than 16 segments). Prog_Seg
indicates
the outer segment of a program, Unit_Seg the outer segment of a
separately
compiled unit. Proc_Seg segments contain procedures belonging to a
program
or unit. Seprt_Seg segments contain assembly language.
Has_Link_Info indicates that the segment needs to be linked. The
necessary linker information is included whithin the segment, starting
on the next block boundary after the reference list.
Relocatable indicates whether the segment is dynamically
relocatable.If
it is, it can be loaded anywhere in the code pool, and moved around if
necessary. This is always the case for p-code segments. By contrast,
statically
relocatable segments will be loaded on the OS heap and locked there
throughout
their existence. Unless special care has been taken in writing them,
all
assembly segments are statically relocatable.
Filler just reserves the remaining 5 bits in the packed record
for
future extension.
Seg_Text: ARRAY[0..15] OF integers;
This array contains the number of the block where the segment interface (in plain text) begins. The interface can be anywhere whithin the segment, provided its starts on a block boundary. It's basically a chunk of text file and thus follows the text format, except that the number of blocks does not need to be even. Its size is specified in the Seg_Familly array. Note that only unit segments have an interface section, all other segments will have zero in this array.
Seg_Info: ARRAY[0..15] OF
PACKED RECORD
Seg_Num: 0..255;
M_Type: M_Types;
Filler: 0..1;
Major_Version: Versions;
END
M_Types = (M_Pseudo, M_6809, M_PDP_11, M_8080, M_Z_80, M_GA_440, M_6502, M_6800
M_9900, M_8086, M_Z8000, M_68000, M_HP87);
Versions = (Unknown, I, II, II_1, III, IV, V, VI, VII);
Seg_Num, surprisingly enough, is the segment number, in the
range
0 to 255.
M_Type define the language the segment contains. If it's p-code,
the type will be M_Pseudo, and the type of p-machine will be indicated
in Major_Version. If the segment contains assembly language, the
processor
for which it was written is indicated here.
Major_Version indicates which version of the p-machine the
p-code
was written for. So p-code is not that portable, after all...
Seg_Famly: ARRAY[0..15] OF
RECORD
CASE Seg_Types OF
Prog_Seg, Unit_Seg:
(Data_Size: integer;
Seg_Refs: integer;
Max_Seg_Num: integer;
Text_Size: integer);
Proc_Seg, Seprt_Seg:
(Prog_Name: PACKED ARRAY[0..7] OF CHARS);
END
This array contains information that depends on the segment type.
For
program and unit segments, we have:
Data_Size is the number of words in this segment's base data
segment.
Ref_Size is the number of words in the segment's reference list.
Max_Seg_Num is the highest segment number that was assigned
within
this compilation unit (regardless of whether the referenced segments
are
part of the code file or not).
Text_Size, as we just saw, is the number of words in the
interface
section of a unit segment. For program segments this entry is zero.
Procedure segments and assembly language segment have only one field
here:
Prog_Name is the name of their parent unit (truncated to 8
characters
if necessary). If it is unknown, which will generally be the case for
assembly
segments, this entry is filled with spaces.
After the 6 arrays are a few variables that contain dictionary related info.
Next_Dict: integer;
Filler: ARRAY[1..2] OF integer;
Checksum: integer;
Ped_Block;
Ped_Block_Count;
Part_Number: RECORD A, B: integer; END
Copy_Note: string[77];
Dict_Byte_Sex: integer;
END {of Seg_Dict}
Next_Dict is the number of the next dictionary block whithin
the file, if any. If there are no more blocks, it is zero.
Part_Number is an arbitrary number for the program, as defined
by
its programmer.
Copy_Note contains an optional copyright string.
Dict_Byte_Sex always contains one. According to the machine on
which
the file was created, it will be coded either as >0001 or as
>0100.
This allows the system to determine the byte sex of the dictionary.
Checksum, Ped_Block and Ped_Block_Count are used
by
an utility called QUICKSTART, that does not come with the TI-99/4A
p-code
system. For this reason, it is not described here.
Filler reserves the last words in the block for futur use.
And if you think all this is complicated, wait till we take a look at the structure of a code segment...
By order of appearance, here are the sections that may be contained within a code segment (many are optional):
Segment header
Procedure dictionary
Constant pool
Relocation list
Segment reference list
Linker information
Each segment starts with a mandatory header containing pointers to the various tables, and some segment-related info:
Proc_Dic: integer;
Reloc_List: integer;
Seg_Name: PACKED ARRAY[1..8] OF CHAR;
Byte_Sex: integer;
Cte_Pool: integer;
Real_Size: integer;
Part_Number: RECORD A, B: integer; END
Proc_Dic is a pointer to the procedure dictionary. It's an
offset
in words from the start of the segment.
Reloc_List is a pointer to the relocation list. Zero if there is
no relocation list.
Seg_Name is the name of the segment.
Byte_Sex is always 1. If the machine that loads the segment has
a different byte sex than the one which wrote it, Byte_Sex will be read
at 256 (>0100). This lets the OS know that bytes should be flipped
whithin
this segment. Of course, p-code is byte-sex independent, but constants
embedded within the code are not and their bytes may need to be
flipped.
So are the various pointers in this header and in the procedure
dictionary.
Cte_Pool is a pointer to the constant pool. It can be zero if
there
is no constant pool in this segment.
Real_Size is the size of floating point numbers used by this
segment.
Part_Number is a record of two integers reserved for future use
(e.g. to hold a version number).
The procedure dictionary is nothing more than a list of pointers to each and every procedure in the segment. The list grows downwards, from the address pointed at by Proc_Dic. In other words, procedure numbers can be considered as negative indices whithin the dictionary. The top word in the dictionary (index 0, pointed at by Proc_Dic) is the number of procedures in the dictionary. There may be empty entries in the dictionary, whose content is 0. These correspond to procedures that were declared as EXTERNAL or FORWARD routines.
+------------+ +-----------+
| # of procs | | exit code |<-,
+------------+ <-- Proc_Dic | | |
| Proc #1 | : : |
+------------+ | code | |
| Proc #2 | +-----------+ |
+------------+ | Data_Size | |
| Proc #3 ---|-----------------> +-----------+ |
+------------+ | Exit_IC --|--'
| Proc #4 | +-----------+
+------------+
| etc |
The procedures pointers are word offset relative to the start of the segment, just like Proc_Dic itself. Which means that they may need to be flipped if byte sexes do not match.
Procedures are aligned on word boundaries and always begins with two words.
Exit_IC: integer;
Data_Size: integer;
The pointer in the procedure dictionary points at Data_Size, which is immediately followed by the executable code, whether p-code, assembly or a mixture of both.
Data_Size is the number of words of local space that the procedure needs for its variables. It does not include the calling parameters that are assumed to be already on stack when the procedure is called. No local space is reserved for pure assembly routines. For mixed language routines, Data_Size is negative if the first instruction is in assembly.
Exit_IC points to the code that must be executed when a p-code procedure is terminated. It is a offset in bytes (not words!) from the start of the segment. Exit_IC is undefined for assembly language routines, and for mixed procedures that begin with assembly. For this reason, compilers that produce mixed code procedures generally start them with at least one p-code (e.g. NAT, or NOP and NAT), so that Exit_IC will be defined.
Multi-word contants embeded in p-code are pulled out by the compiler and placed in a dedicated area, just after the last procedure in the segment. The area is pointed at by Cte_Pool. All p-code instuctions that use constants refer to them as offsets from the bottom of the code pool. This strategy allows p-code to be completely byte-sex independent. Of course, the code pool is not and its bytes will need to be flipped if the machine on which the program is loaded does not have the same sex that the machine on which the program was written.
Real numbers in floating point format represent a special case. These are coded in one of the two canonical format described above. The word Real_Size in the segment header indicates the real number format used by the machine that created the program (this word is valid even if no real number was actually defined). It will be 32 for the 3-words format (32 bits) and 64 for the 3-to-6 words format. Real numbers are clustered together in a separate region of the constant pool, which is pointed at by the very first word in the pool. The first word in the real sub-pool is the number of real constants that follow.
Example:
| Proc dict |
+------------+
| Constant |
: ... :
| Constant |
+------------+
| Real |
: ... :
| Real |
+------------+
| # of reals |
+------------+<-,
| Constant | |
: ... : |
| Constant | |
+------------+ |
| Sub-pool -|--'
+------------+ <--- Cte_Pool
| Last proc |
The relocation list is generally placed after the procedure dictionary. It is pointed at by the Rel_List word in the segment header. It contains a list of all addresses that need to be patched when the segment is loaded in memory, or moved around. Note that only assembly needs a relocation list, since p-code is address-independent.
The relocation list is made of any number of sublists. Each sublist consists in a header that specifies the size of the sublist and the type of relocation required, followed by a list of pointers to the addresses that need to be patched. Here is the structure of this header:
List_Header = PACKED RECORD
List_Size: integer;
Data_Seg_Num: 0..255;
Reloc_Type: Loc_Types;
END
Loc_Types = (Reloc_End, Seg_Rel, Base_Rel, Interp_Rel, Proc_Rel);
List_Size is the number of pointers that follow the header in
the sublist, i.e. the number of addresses to patch.
Data_Seg_Num is only used by sublists of type Base_Rel and is
zero
for other types. It contains the local number of the data segment to
which
the pointers in the sublist are relative. Its value is normally
assigned
by the Linker.
Reloc_Type specifies the type of relocation requires. Reloc_End
(value = 0) serves as a terminator to indicated that there are no more
sublists in the relocation list. The Proc_Rel type (procedure-relative)
is produced by the Assembler but will be changed to Seg_Rel
(segment-relative)
by the Linker, so you should never encounter it in a linked code file.
Example:
+------------+--------------+
| Reloc_Type | Data_Seg_Num |
+------------+--------------+ <--- Reloc_List
| List_Size |
+---------------------------+
| Pointer |
: ... :
| Pointer |
+------------+--------------+
| Reloc_Type | Data_Seg_Num |
+------------+--------------+
| List_Size |
+---------------------------+
| Pointer |
: ... :
| Pointer |
+------------+--------------+
| 0 | 0 |
+------------+--------------+
Note that the relocation list grows from top to bottom: Reloc_List points at the sublist with the highest offset in the segment, whereas the sublist with a Reloc_Type of zero (Reloc_End) is at the bottom.
The segment reference list is only needed by Unit segments. It is necessary because segment numbers are assigned arbitrarily by the compiler within a compilation unit. The segment reference list associates these unit-specific numbers the with the corresponding segment names, so that the OS can set up the proper E_Vect and E_Rec structures when building up the run-time environment. Once this is done, the segment reference list is discarded, so it won't use up any memory at execution time.
The list is placed at the end of the segment, after the relocation list (if any). Its location can be deduced by using the field Code_Length in the segment dictionary as an offset from the beginning of the segment. The size of the list (in words ) is also found in the segment dictionary, in the word Ref_Size.
The list consists simply in a series of records with the following structure:
Seg_Reg = PACKED RECORD
Seg_Name: PACKED ARRAY[0..7] OF CHAR;
Seg_Num: 0..255;
Filler: 0..255;
END
Seg_Name is the 8-character segment name.
Seg_Num is the number this segment was assigned in this
compilation
unit.
Filler is reserved for future extensions.
The end of the list is marked by a record with a Seg_Name consisting in 8 spaces, and a Seg_Num of zero. It could also be calculated from the Ref_Size value (5 words per record).
Dummy records with Seg_Name "***" are generated so that the OS can execute the initiation and termination code of a unit. When loading a program, the OS creates a list of all units that contain such a record, finishing with the main program. It then calls procedure number 1 in the first unit of the list. Procedure number 1 is reserved for the initialization/termination section of a unit. When done with initialization, the first unit calls the next one, which calls the next, etc, until the last unit calls the main program. When the program terminates, the *** list is popped and the same procedures are returned to, thereby travelling the chain back in reverse order.
Code files produced by the Assembler and the Compiler may contain some information for the Linker to use in linking assembly language segments to p-code segments. Segments produced by the Assembler always have linker information, segments produced by the Compiler only have some if they contain EXTERNAL routines. The Has_Linker_Info flag in the segment dictionary indicates whether a segment has such information or not.
Linker information is always at the end of the segment, starting on
a block boundary after the segment reference list (if any). There is no
pointer to it in the segment header, but its address can be calculated
from the segment entries in the segment dictionary:
Code_Addr + ((Code_Length + Seg_Refs + 255) / 256).
Linker information is a series of 8-word records, whose contents depends on the type of linking required. Records are padded with zeros if less than 8 words are actually used. Some records are followed by a list of pointers whithin the segment. All records can be defined with a single Pascal structure, even though they only share the common Name and LI_Type fields.
LI_Types = (EOF_Mark, Glob_Ref, Publ_Ref, Priv_Ref, Const_Ref,
Glob_Def, Publ_Def, Const_Def, Ext_Proc, Ext_Func, Sep_Proc, Sep_Func);
LI_Entry = RECORD
Name: PACKED ARRAY[0..7] OF CHARS;
CASE LI_Type: LI_Types OF
Glob_Ref, Publ_Ref, Const_Ref:
(Format: (Word, Byte, Big);
N_Refs: integer);
Priv_Ref:
(Format: (Word, Byte, Big);
NRefs: integer;
N_Words: integer);
Ext_Proc, Ext_Func:
(Src_Proc: integer;
N_Params: integer);
Sep_Proc, Sep_Func:
(Sec_Proc: integer;
N_Params: integer;
Kool_bit: boolean);
Glob_Def:
(Home_Proc: integer;
IC_Offset: integer);
Publ_Def:
(Base_Offset: integer;
Pub_Data_Seg: integer);
Const_Def:
(Const_Val: integer);
END;
Ptr_List: ARRAY[0..ceiling(N_Refs/8)] OF
ARRAY[0..7} OF integer;
Ok, let's take these one at a time. Remember that they all have a Name field that hold the name of the identifier, and a LI_Type field that indicated the type of linking required.
Glob_Ref is used to link identifiers between assembly routines.
Glob_Ref:
(Format: (Word, Byte, Big);
N_Refs: integer);
Name is the name of the identifier (a.k.a. label).
Format is always 0 (i.e. "word").
N_Refs is the number of references in the list that follows the
record.
Ptr_List: The reference list comes by chunks of 8 words. For Glob_Ref, the Linker will add the final segment offset to each of the words pointed at by the reference list. The offset could be in words or in bytes depending on the assembly language used.
Publ_Ref is used to link a label in an assembly routine to a
variable
in a compilation unit (compiled into p-code).
Publ_Ref:
(Format: (Word, Byte, Big);
N_Refs: integer);
Name is both the name of the label and the name of the
variable
in the compilation unit.
Format is always "word".
N_Refs is the number of references in the list that follows the
record.
The linker will add the offset of the variable to all words pointed at by the reference list.
Const_Ref is used to link a label in an assembly routine to a
constant
defined in a compilation unit.
Const_Ref:
(Format: (Word, Byte, Big);
N_Refs: integer);
Name is the name of the label, and the name of the constant.
Format can be "word" (0) or "byte" (1).
N_Refs is the number of references in the list that follows the
record.
The linker will place the constant byte or word at all locations specified in the reference list.
Priv_Ref is used to allocate space is the global data segment.
Priv_Ref:
(Format: (Word, Byte, Big);
NRefs: integer;
N_Words: integer)
Format is always "word".
N_Words is the number of words to allocate.
N_Refs is the number of references in the list that follows the
record.
The Linker will add the offset of the beginning of the allocated area to each word pointed at by the reference list.
Ext_Proc and Ext_Func are generated by the Compiler to
reference
EXTERNAL routines. They are not followed with a pointer list.
Ext_Proc, Ext_Func:
(Src_Proc: integer;
N_Params: integer);
Name is the name of the routine.
Src_Proc is the number assigned to the routine.
N_Params is the number of words used for parameter passing.
Sep_Proc and Sep_Func are generated by the Assembler to
declare
routines. They are not followed with a pointer list.
Sep_Proc, Sep_Func:
(Sec_Proc: integer;
N_Params: integer;
Kool_bit: boolean);
Name is the name of the routine.
Src_Proc is the number assigned to the routine.
N_Params is the number of words used for parameter passing.
Kool_Bit is true if the routine is relocatable ( created with .RELPROC
and .RELFUNC) and false otherwise.
Glob_Def is generated by the Assembler to declare a global
identifier
(with .DEF, .PROC, .FUNC, .RELPROC
or .RELFUNC). There is no pointer list.
Glob_Def:
(Home_Proc: integer;
IC_Offset: integer);
Name is an identifier valid within the segment, that can be
referenced
by other routines from the same segment.
Home_Proc is the number of the routine in which the identifier
is
defined.
IC_Offset is the offset in bytes of the label whithin the
routine
that contains it.
Publ_Def is generated by the Compiler to declare a global
variable
that should be visible to EXTERNAL routines. It is not
followed
with a pointer list.
Publ_Def:
(Base_Offset: integer;
Pub_Data_Seg: integer);
Name is the name of the variable.
Base_Offset is the offset in words of the variable, from the
start
of the segment that contains it.
Pub_Data_Seg is the local number of the data segment that
contains
the variable.
Const_Def is generated by the Compiler to declare a global
constant
that should be visible to EXTERNAL routines. It does not have a pointer
list.
Const_Def:
(Const_Val: integer);
Name is the name of the constant.
Const_Val is the value of the constant.
EOF_Mark is a special type that marks the end of the linker
information
list. Name should be filled with spaces, the remainder of the
record,
as always, filled with zeros.
Note that not all segments can have every type of linker information. Compiler generated Prog_Seg and Unit_Seg can :
Assembly generated Seprt_Seg can:
The structure of a pure assembly file, generated by the Assembler, resembles a lot the Compiler-generated code files described above. There are a few differences however.
Most importantly, each procedure has its own relocation list, located just after the body of the procedure. Since Exit_IC is not used by assembly routines, this word is used to point at the top of the relocation list (remembers, it grows downwards). These relocation lists are the only ones in which the Proc_Rel relocation type can be used. The Linker will convert all the procedure's lists into a big segment's list, in which the Proc_Rel will be replaced with Seg_Rel.
For each procedure, the Data_Size word is >FFFF, which is zero in one's complement notation. It indicates 1) that no data space is required, 2) that the first instruction in the routine is an assembly opcode.
Since the Assembler does not know the segment name, all routines that communicate with REFs and DEFs are assumed to be in the same segment.
And since assembly segments don't know what program or unit they are linked to, the Prog_Name entry in the segment dictionary is left blank. For the same reason, the Data_Seg_Num field in the List_Header record of Base_Rel sublists will contain zeros. The proper value will be filled in by the Linker. The Linker also updates the Relocatable bit in Seg_Misc arrays according to what was supplied in the linker information section.
At linking time, it is also possible that the code bodies of assembly routines will be copied into one of the segments of the compilation unit they are linked to.