Glossary

Alphabetical list

This glossary defines - in alphabetical order - the most important concepts.

Adjacent value:

is the value closest to, but still inside, the inner fences. (This is where the ends of the whiskers are set on a boxplot.) Some commands (e.g. the BOXPLOT command) will identify the adjacent values (low and high adjacent value).

Adjacent values

by extension we call adjacent values any value lying between the hinges and the inner fences. On coded displays these values often appear as '+' (upper adjacent values) and '-' (low adjacent values). In other contexts low/high are not distinguished; then '+' is used for both. Note that these codes can be different in your EDA installation. (STAT GRAPH DISTCODES displays them; SET GRAPH DISTCODE lets you change them). See also outliers (out and far out).

Allvars mode:

(Usually the default mode) Whenever EDA is in ALLVARS mode and a multivariate command like FACTOR is used with no variable list, EDA includes all variables in the WA in the analysis. This of course requires that the WA be rectangular. Commands using this feature are called "allvars sensitive" commands. The SET ALLVARS command is used to change the current state. If ALLVARS is OFF these commands work like normal EDA commands, i.e. the current vlist is used. If ALLVARS is ON and a vlist is present EDA behaves as in ALLVARS OFF mode, i.e. the vlist overrides the mode.

Block

(A synonym for WA) A block is a WA seen as an entity for i/o operations. A complete WA may be saved into a file or loaded from a file. A block has a block label, block descriptor and block status information.

Case selection commands:

See <Selection commands>

Casid

An alphanumeric case identifier attached to each case in the WA. The default casid is the (original) sequence number of the cases. Other casids may be built in (currently Swiss cantons and European regions) and added using the CASID command. Any other casid can be read in and stored. These 4 character casids may be used in named values, matrix references or simple expressions. See the CASID command for more details. Casids should not contain blanks or special symbols, to ensure correct substitution in all instances.

Casids defined by the user are automatically stored in a system file when created; an additional possibility is to store frequently used casids (and maps) in a casid/map library (see there).

CENTER

A CENTER or reference value is stored with each variable (variable attribute). When you create a new variable, by default the median will be computed and stored as the reference value. A number of commands use a CENTER option (e.g. some coding options) to refer to this value. This allows easy comparisons with global values (e.g. global percentages etc). Several commands (DISP, PERCENT) modify this center value. Reference values are stored into system files (PUT) and restored by a GET command. There is also a label and descriptor attached to the CENTER; this label/descriptor is global to a whole WA, i.e. it is not always meaningful for all variables. The CENTER command is used to manage this attribute. Note that reference value and center are synonyms.

Coding

(coded displays) any display where some symbols are shown instead of the exact numeric value. Various forms are frequently used in EDA, e.g. codes showing whether a value is a far-out, out, adjacent or in value, codes corresponding to specific intervals of a variable etc. or symbols ('+' or '-') representing a value in terms of deviation from some central value, e.g. ++ might indicate that the value is two mid-spreads away from the median of its distribution. See the section on the "Art of coding" in this chapter.
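
The "++" style of coding described above can be sketched in a few lines of Python. This is only an illustration, not EDA's algorithm: the function name, the rule of counting whole mid-spreads (rounding down), and the "." symbol for values less than one mid-spread from the center are all assumptions.

```python
def deviation_code(value, center, midspread, blank="."):
    """Code a value as '+' or '-' symbols, one per whole mid-spread from center.

    Assumptions (not taken from the manual): whole mid-spreads are counted by
    rounding down, and values closer than one mid-spread get the `blank` symbol.
    """
    steps = int(abs(value - center) // midspread)
    if steps == 0:
        return blank
    symbol = "+" if value > center else "-"
    return symbol * steps


# A value two mid-spreads above the median is coded "++".
print(deviation_code(30, center=10, midspread=10))
```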

Column name:

The columns (cases) of the data matrix may have a name. Default name is 'case'. Other names, e.g. 'state' may be specified with the SET COLNAM command.

Command files:

Instead of reading commands from the keyboard, commands can be entered from a file containing these commands. A file named EDAINI is automatically executed when EDA is called (startup file).

Configurations:

two distinct EDA matrices C1 and C2 mainly used to store and manipulate configurations produced by multidimensional techniques. See C1 and C2.

Control characters(*):

Sometimes you need to specify a character not available on your keyboard (e.g. graphical characters) for symbols used in coded lists or with the MOD tool. Whenever the manual states that control characters can be used, the following rules apply. EDA has a special symbol used to announce a control character, i.e. ~ (tilde) by default. This character may be followed either by a single letter, e.g. ~A (specifies control-A) or by a three digit decimal number, e.g. ~027 (note that three digits are always required). If you need to insert the tilde (or its replacement) type ~~ to produce a single ~. Please note that this facility is only available where mentioned in the manual. In all other cases no translation is performed.
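
The translation rules above can be illustrated with a short Python sketch. It is not EDA code: the function name is hypothetical, control-A is assumed to map to character code 1 (the usual ASCII convention), and raising an error on a malformed sequence is an assumption, since the manual does not say what EDA does in that case.

```python
def expand_control_chars(text, marker="~"):
    """Expand ~X, ~ddd and ~~ sequences following the rules in the entry above."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != marker:
            out.append(text[i])
            i += 1
            continue
        rest = text[i + 1:]
        if rest.startswith(marker):
            # Doubled marker produces one literal marker character.
            out.append(marker)
            i += 2
        elif len(rest) >= 3 and rest[:3].isascii() and rest[:3].isdigit():
            # Exactly three decimal digits give the character code.
            out.append(chr(int(rest[:3])))
            i += 4
        elif rest[:1].isascii() and rest[:1].isalpha():
            # Assumed ASCII convention: control-A is code 1, control-Z is 26.
            out.append(chr(ord(rest[0].upper()) - ord("A") + 1))
            i += 2
        else:
            # Assumption: malformed sequences are rejected.
            raise ValueError(f"malformed control sequence at position {i}")
    return "".join(out)


print(repr(expand_control_chars("~027[ and 50~~")))
```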

Control options

are used with expressions and macros; they provide control of the macro or expression through an index. LET #A=#A%#TOTAL \ FOR A START=10 END=20 executes the expression 11 times; at each step A is incremented by one. See the LET/CALC/OUT commands, as well as the LOOP and EXECUTE commands in the macro section.

C1

The configuration area contains coordinates produced by FACTOR and other multidimensional techniques. This area can be manipulated by the C1 and ROTATE commands. There are also labels attached to each coordinate. These labels may be case or variable oriented, depending upon the data stored in it. If they are produced by any multidimensional technique they are variable oriented (C2 then is case oriented), but configurations may be stored from other sources and they also may be exchanged. When loading a new WA the C1 matrix is not cleared.

C2

a second configuration matrix, which usually holds individual scores or another configuration from techniques producing two configuration matrices, or from techniques like configuration fitting and comparison which need two. When loading a new WA the C2 matrix is not cleared. See also C1.

Data matrix

The (numeric) data to be analyzed, i.e. the WA in a narrower sense (without the matrices and documents etc) as a data matrix with a certain number of rows and columns.

Data range checking

Data entered from the keyboard may be submitted to an automatic check of the range of the data, e.g. for percent data, values outside 0-100 should be rejected. This feature may be activated with the SET command.

Defined variables:

See letter variables

Descriptor:

See entry variable descriptor. In addition to variable descriptors there are also WA descriptors, descriptors for the C1, C2 and MATRIX matrices, as well as descriptors for the GVAR and the currently defined variable ties. All descriptors take the form of a short sentence of up to 48 characters describing the contents of the matrix.

Document:

A document is a text of any length attached to a variable, which can be retrieved using the DOC command. This is an optional feature and may not be present. There is also a HEADER and a NOTE explaining the nature etc of the data in the WA. Other named documents may also be included (#docs). Case documents are documents referring to specific cases. They may have a meaning either for a variable only or for a whole WA. See the special section on Documents [AM] and the DOC command.

EDA-file

EDA specific files, where the program retrieves or stores complete WAs, i.e. blocks. These files are sequential access (SAM files). Other files, which do not have an EDA specific format are called external files.

EDAINI

See command files.

EDITor:

A special module within EDA with its own syntax analysis for editing and recoding purposes. The editor works in two modes: either the editor is entered using the EDIT command, or an edit command is preceded by / meaning immediate mode, i.e. after execution of the command control returns to normal EDA mode. There is also a text editor within EDA, called TED.

Expressions, arithmetic:

an algebraic expression following the usual rules for operators and precedence, with a special notation for variables. (See also simple-expression).

Extreme values

--> outliers

Far out values

--> outliers

File names:

file names within EDA are usually specified within "" (i.e. as names). The file name is specified as an external file name, i.e. should correspond to the conventions on your system.

Freefield input:

in some instances the program requires numeric input from the keyboard, which can be entered in freefield, i.e. without a specific format. If the number of entries is not determined by the nature of the command, a // sign is used to signal the end of input. Acceptable items must conform to the definition of simple expressions. Items are separated by blanks or commas. If an error occurs during data entry the user is asked to enter a replacement value for the value in error (or to cancel, i.e. to abort the current command). Normally data entry is submitted to range checking. In some instances items may be skipped by specifying several commas; then a default value is supplied (more details will be found with the specific commands).
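
To make the freefield rules concrete, here is a minimal Python sketch of a line parser. It is a simplification under stated assumptions, not EDA's parser: the function name and the `default` placeholder are hypothetical, items are restricted to plain numbers (EDA accepts simple expressions), only a run of consecutive commas is taken to mean "skipped item", and a bad item simply raises an error instead of prompting for a replacement.

```python
import re


def parse_freefield(line, default=0.0):
    """Parse one line of freefield input.

    Returns (values, done), where `done` is True if the // end-of-input
    sign was seen. `default` stands in for the command-specific default
    value supplied for skipped items (an assumption for illustration).
    """
    head, end_sign, _ = line.partition("//")
    values = []
    prev_was_comma = False
    # Tokens are either a comma or a run of non-blank, non-comma characters.
    for token in re.findall(r",|[^\s,]+", head):
        if token == ",":
            if prev_was_comma:
                # Two commas in a row: the item between them was skipped.
                values.append(default)
            prev_was_comma = True
        else:
            values.append(float(token))  # raises ValueError on a bad item
            prev_was_comma = False
    return values, bool(end_sign)


print(parse_freefield("1 2,3 //"))
print(parse_freefield("4,,6", default=9.0))
```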

Format:

Some I/O routines dealing with formatted raw data can use Fortran-type formats. A format string is enclosed in parentheses and may contain an A4 element for casids (first element) followed by any format element pertaining to real data (all data in EDA are real and single precision). See also *READ RAW (documented files) for alternatives. (People with no experience with Fortran programming usually have difficulties with formats.)

Fuzz

is a system defined value used in comparisons as the criterion to determine whether two values are identical. A value x is considered identical to a reference value when it lies in the interval

           (value-fuzz) < x < (value+fuzz)

Many commands dealing with this kind of problem have an option FUZZ=val used to set the fuzz value for that particular command. There is also a global fuzz value, which is used unless the specific FUZZ= option is given. See SET FUZZ for details.
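
The fuzz criterion amounts to an open-interval test around the reference value. A minimal Python sketch (the function name is hypothetical; the strict inequalities follow the interval given in this entry):

```python
def fuzzy_equal(x, value, fuzz):
    """True if x lies strictly inside (value - fuzz, value + fuzz)."""
    return (value - fuzz) < x < (value + fuzz)


print(fuzzy_equal(1.25, 1.0, 0.5))  # inside the interval
print(fuzzy_equal(1.5, 1.0, 0.5))   # on the boundary, so not identical
```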

Global options:

several options apply to all commands without any distinction. Currently these are /d, which inhibits the display of the results of the current command, and /p, which prevents output from being written to the print file. The /S (stack) switch is useful in macros. See the section on global options for more details.

Groups

Cases can be grouped together using a grouping variable called GVAR, which can be defined and manipulated by the user. Many commands are offered to produce groups (like cluster analysis, coding etc) and even more commands display group memberships (lists, histograms, plots etc.). Groups are shown either as numbers (group numbers) or, when defined, names whenever this is possible. See the GVAR command for additional details. Compare also to TIES.

Group names

Groups may have names (See the GVAR command on how to define names). Currently the number of groups which may have names is limited to MAXC (an implementation constant, often 8), but this seems quite enough for most situations where you are prepared to give each individual group a specific name. If no name is defined automatic group names are generated: Group nn, where nn is the group number.

GVAR:

stands for Grouping variable (see Groups). The GVAR has a descriptor attached to it, describing its origin. GVAR is a global attribute of a WA, i.e. it is the same for all variables. GVARs are produced by many commands and can be manipulated explicitly with the GVAR command.

High

(values etc) refers to values usually higher (larger) than the median in the distribution. Often we will also say upper.... or use the abbreviation HI, e.g. HI adjacent value, upper hinges etc. See also low(er).

Hinges:

Letter values at depth 1/4, roughly the quartiles, sometimes called fourths. The hinges are the endpoints of the box of a box and whisker plot.

Implementation

the process of adapting the EDA program to a specific computer AND user environment. Implementation dependent features means that these features might be different from the description in this manual. For these the user should refer to the chapter on implementations in this manual and to the local document describing the differences. (See also HELP SYSTEM)

Implementation parameters:

(NVAR, MCAS) Parameters specified on creating a specific EDA implementation. NVAR is the maximum number of variables in the WA; MCAS the maximum number of cases a variable can have. These parameters determine the program limitations. Further implementation parameters are MAXC, the maximum number of clusters, MXDIM, the maximum number of dimensions (factor etc) and MBL, the maximum number of blocks stored in a direct access EDA-file.

Installation, EDA installation

EDA has to be installed onto your computer. As EDA offers many features for workgroups, teaching oriented options and so forth, installation may vary quite a lot, i.e. the environment you are actually using depends on how things have been installed. This is done through profile files.

In-values:

values within the hinges, as opposed to out, far out and adjacent values. Frequently shown as '*' or blank on coded displays. Note that this symbol might be different in your EDA version (See SET GRAPH DISTCODE).

Intrinsic functions

(obsolete) In EDA versions earlier than version 2.0 the ) symbol was used to refer to system constants and the like. This feature has been replaced by system constants (starting with a $). See System constants for more information.

Label:

See variable label. Note that in addition to variables in the WA, the variable oriented matrix stored into C1 and MATRIX has a separate set of labels.

labels and descriptors

In many situations you will enter the label and the descriptor of a variable at the same time, therefore the term "labels and descriptors" is used to tell the user to enter descriptive documentation for a variable, often on the same line, starting with the label followed by the descriptor, the label being the first "word" of the sentence.

Ladder of powers

See power transformations.

Load, loading

This operation always refers to copying some data into the work area from matrices like C1, C2, MATRIX or the GVAR. A message telling you that the GVAR has been loaded means that the current GVAR has been loaded as a normal variable into the WA. Compare with STORE.

Letter variable

(defined scalar variables) single letter variables A..Z can be defined within EDA and then used in variable references or option values, e.g. A=10. There are three types: constants, auto-increment variables (i.e. after each reference the value is automatically incremented) and indexed variables (i.e. indexed on a numeric variable in the WA). See also ResVars (Result variables).

Logging, command_log

You may ask EDA to keep all keyboard input in a file called a log file (this action is called logging). Default is NOT to log commands, i.e. you will have to turn this feature explicitly on. See SET LOG and the section on logging (chapter file connection).

Low, lower, LO

refers to the position of an observation in the distribution with respect to the median (or some other reference), e.g. lower hinge, LO adjacent value(s) and so on. Opposite: upper or higher.

Macro, line macro:

a repetitive execution of an EDA command specified on one line using the EXECUTE facility, or defined with the DEFMAC command, which allows it to be invoked by name. A macro command with more than one command line can be defined using the MACRO command. See the specific commands as well as the chapter on macros for more details. A single line macro is also called an abbreviation.

Map:

EDA has the possibility to display simple maps on a character screen. There are two types of maps: built-in maps, i.e. maps which are part of the EDA system. EDA is supplied with two of these maps: Swiss cantons and CEE regions, but this might be different at your installation. The second type of maps are called user defined maps; these are maps stored under a very simple form either in a normal external file, with a WA on an EDA file (in this case the map is automatically made into the current active map whenever the corresponding WA is read from that file) or stored in a casid/map library. For more details see the MAP command, as well as the appendix, where you will find information on how to prepare such a map. The link between the data and any such map is established through the casids; in fact the casid id and the map id must be the same in order to get a map on the screen.

Marcom

a special type of analysis which deals with MARginal COMParisons between two groups (e.g. elite vs. population).

Matrices:

The different EDA matrices: WA, MATRIX, C1, C2 and/or a matrix in the mathematical sense.

MATRIX

Distance or similarity matrix stored by CORREL, FACTOR and other commands. It may be manipulated by the MATRIX command. When loading a new WA the MATRIX is not cleared.

Mode

assumed modes determining working conditions of some commands (Analysis on whole WA, error termination of macros etc). These conditions are controlled by the ASSUME command.

Modification stamp

If variables are altered using arithmetic or other transformations, a *c* mark is added at the end of the variable descriptor and in most instances the variable descriptor is modified to show the modification done to the variable. The table below shows the originators of the stamps.

   stamp              possible originators
   ---------------------------------------
   *r*       recodification (RECODE, COPY, PUT)
   *t*       transformation (arithmetic)
   *s*       1. standardize/normalize
             2. smooth (more info in descriptor)
   *c*       LET using a bracket target
   *e*       editor: case editing
   *i*       ICAS or AGROUP
   *d*       DCAS or DGROUP
   *+*       CLUSTER: group centroids added
   *a*       AGGREGATE
   *%*       PERCENT

These stamps are a visible mark that a variable has been modified and the descriptor has not been modified accordingly. The descriptor may not need modification, e.g. in the case of a correction of an error or the like; then the edit command CLDE (for clear_descriptor) may be used to clear the stamp. The edit command SCAN lets you search for variables with modification stamps and either complete the descriptor or clear the stamp.

Note that when a default label/descriptor is created by some commands, e.g. the NEWVAR command, the last character of the descriptor is the '*' character to signal incomplete documentation of the variable.

Module

Several specific tasks do not fit into the "normal" EDA syntax frame. Therefore they are grouped in a module with its own specific syntax. This is the case for TED (text editor), the EDITor (data editing) and the TOOLBOX. In order to use these modules you have to call the module; you then enter it and EDA obeys the specific syntactical rules of the module. In order to return to normal mode, you have to leave (quit) the module. The EDIT and TOOL modules may also be called in immediate mode, i.e. you execute a single command within a module and return immediately to EDA mode.

Outliers:

With EDA techniques attention is very often focused on outlier detection and treatment. An outlier is a case having a value outside the "normal" range, where normal is defined according to some specified criterion. An often used criterion is a value outside the inner fences, where the inner fences are defined as one step outside the hinges and a step is simply the midspread of the distribution. If we step out further, say by 1.5 steps from the hinges, we define the outer fences. Values between the inner and outer fences are called "out" values; values outside the outer fences are called "far out" values. We shall also use "extreme values" meaning "out" and "far out", i.e. all values outside the inner fences. In this program the definition of outliers may be changed with the SET DEFOUT command, where it is possible to act on the inner and/or outer fences definition.

Often we shall represent outliers with symbols. On Boxplots far out values are marked with a "@", out values with a "0".

On coded displays the standard symbols used are '@' (High far out values), '&' (low far out values), '#' (high out values) and '=' (low out values). Sometimes low and high are not distinguished; then the high symbol is used for low and high. Note that the symbols might be different in your EDA version (use of graphics characters selected by your EDA system administrator) or defined in your profile. Furthermore you can change these symbols using the SET GRAPH DISTCODE command.
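
The classification into in, adjacent, out and far out values can be sketched in Python as follows. The fence multipliers (1 step and 1.5 steps, with a step equal to the midspread) are the defaults stated in this entry and are exposed as parameters because SET DEFOUT can change them. Everything else is an assumption for illustration: the function names are hypothetical, the hinges are computed with Tukey's depth rule (the glossary does not spell out EDA's exact rule), and a value lying exactly on a fence is treated as inside it.

```python
import math


def value_at_depth(sorted_vals, depth, from_top=False):
    """Value at a (possibly half-integer) depth, counted from either end."""
    vals = sorted_vals[::-1] if from_top else sorted_vals
    lo, hi = math.floor(depth), math.ceil(depth)
    return (vals[lo - 1] + vals[hi - 1]) / 2


def hinges(values):
    """Lower and upper hinge using Tukey's depth rule (an assumption here)."""
    s = sorted(values)
    median_depth = (len(s) + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    return (value_at_depth(s, hinge_depth),
            value_at_depth(s, hinge_depth, from_top=True))


def classify(values, inner_steps=1.0, outer_steps=1.5):
    """Label each value as 'in', 'adjacent', 'out' or 'far out'."""
    lo_hinge, hi_hinge = hinges(values)
    step = hi_hinge - lo_hinge  # the midspread
    inner = (lo_hinge - inner_steps * step, hi_hinge + inner_steps * step)
    outer = (lo_hinge - outer_steps * step, hi_hinge + outer_steps * step)
    labels = []
    for v in values:
        if v < outer[0] or v > outer[1]:
            labels.append("far out")
        elif v < inner[0] or v > inner[1]:
            labels.append("out")
        elif v < lo_hinge or v > hi_hinge:
            labels.append("adjacent")
        else:
            labels.append("in")
    return labels


data = [1, 2, 3, 4, 5, 6, 7, 8, 14, 30]
for value, label in zip(data, classify(data)):
    print(value, label)
```

For this data the hinges are 3 and 8, so the midspread is 5, the inner fences lie at -2 and 13, and the outer fences at -4.5 and 15.5.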

Namestrings

(option form 3): character string enclosed in " with a maximum length of 60 characters, used to specify file names, strings to search for and the like. Strings are case-sensitive. If you omit the closing " from a string, the remainder of the command line will be included in the string (i.e. no other option would be analyzed).

Power transformations

Power transformations are essential to re-expressions. In various commands they may be specified either with reference to the ladder of powers (Tukey) using a vocabulary of the "move one step UP the ladder" style or by direct reference to the power, e.g. POWER=0.5 to take the square root transformation. See the special section on power-transformations later in this chapter and the description of commands like BOXPLOT LADDER, REEXPRESSION, PLOT, TRACES etc.
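
The two ways of naming a transformation (a rung on the ladder versus an explicit power) can be sketched in Python. This is an illustration under assumptions, not EDA's implementation: the rungs listed in `LADDER` are the ones commonly shown in textbooks, the logarithm at power 0 follows Tukey's usual convention, and both function names are hypothetical. How EDA itself treats POWER=0 or negative powers is not stated in this glossary.

```python
import math

# Commonly quoted rungs of the ladder of powers (an assumption for this sketch).
LADDER = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def power_transform(values, power):
    """Re-express values by a power; power 0 is taken to mean the logarithm."""
    if power == 0:
        return [math.log(v) for v in values]  # requires positive values
    return [v ** power for v in values]


def move_on_ladder(power, steps):
    """Move `steps` rungs up (positive) or down (negative) from `power`."""
    idx = LADDER.index(power) + steps
    if not 0 <= idx < len(LADDER):
        raise ValueError("no such rung on the ladder")
    return LADDER[idx]


print(power_transform([4.0, 9.0], 0.5))  # square roots
print(move_on_ladder(0.5, 1))            # one step up from the square root
```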

Print file (PF)

The PF is a special file, where results may be kept for printing (or some other processing). When entering EDA no results are kept, i.e. all results appear on the screen and are then lost. In order to keep track of results, the user has to open a print file (either in ALL or REQUEST mode: -> PRINT command). Note that a print file may also be open but currently inactive, i.e. nothing is saved until the print file is reactivated.

Profile (file)

Every time you execute EDA you create a profile, i.e. a file where EDA keeps information on your environment. The information in your current profile may be copied from a number of other profiles; normally there is an EDA system profile, i.e. a profile for all users. Often there is a group profile, i.e. a profile for a group of persons sharing some information (e.g. WA archives and the like). Furthermore you may have your own permanent profile. [Note that profiles need not exist at any level; then of course the profile of your session will be rather simple]. The profile contains information like the location of your WA directories, the default settings of the main options (you may tailor them to your needs), the location of your map directory etc. For standard use of EDA, profiles are not important to a user, but they are quite useful to group or system managers. Refer to the documentation for details.

Prompting

(1) On most systems (see implementations) EDA prompts with a specific symbol for a new command. These symbols are different in each EDA mode (normal mode, EDIT, TED etc). Typically in normal EDA mode it prompts with the > symbol, but this might be different on your machine. (2) In many instances EDA asks for additional information with an explanatory text or symbol. For example EDA might ask for the number of variables on a file, for labels or descriptors for a new variable or for the lines of a macro. In these cases answer with the information, or cancel or stop (this will be clear from the context). If you do not know what to reply, type a question mark as the first character on a line; then EDA should give you some more explanation [Sorry, this will not yet work in all instances]. In many instances default values are admitted; then EDA will tell you to which value(s) it defaults. This is done either with an explanatory text or the value is shown followed by the * symbol. The text "Does the file contain rawdata [Y*,N]" means that if you respond with a simple carriage return or Y, the file contains raw data, otherwise it doesn't. [Y,N] would mean that there is no default value.

Protected Variables

Variables are protected, i.e. alteration is refused, only when data is input from the keyboard. The REVERT command reverts the protection, i.e. an unprotected variable loaded can be protected using REVERT; a protected variable can be unprotected the same way. (See REVERT) If protected variables are present in a WA the WA protection is automatically turned on. You may unprotect the WA and all variables in that WA by using SET WAPROTECT OFF. You should clearly distinguish between protected variables (an attribute of a specific variable) and a protected WA (a global attribute). You may delete an unprotected variable from a protected WA.

protected WA

A WA is protected against accidental overwriting either if it contains protected variables or if the WA protection switch is set. See the SET WAPROTECTION and SET SECURITY switches.

RAWIN (file)

Designates the file used for raw data input. RAWIN is also the default generic file name if none is specified on raw data input operations. In ordinary use it is mostly used with the *READ RAW command; however, macro commands exist to read and parse input lines. See the SET RAWIN command for additional information.

RAWOUT (file)

Designates the file used for raw data, text and other output operations. RAWOUT is also the default generic name. Besides the *WRITE command a number of commands have options to add output to the RAWOUT file. Note that this is not the print file, but a file intended to be read by other applications, including EDA macros. See the SET RAWOUT command for additional information.

Reference (value)

Same as CENTER, a variable attribute. See the CENTER entry in this glossary.

Replacement value

(1) A numeric value standing for an alphanumeric string (4 characters) used on the EXTRACT command. EDA maintains a memory stack which keeps these replacement values or (2) a numeric value (default -1) used where a variable has to take a value and none is given or is an arithmetically undefined variable. This is the case of a division by zero, the EXTRACT command, as well as the LABEL command (options used to extract numerical information from labels or descriptors).

Result Variables

ResVars are scalar results from commands which may be inspected and used (mainly in macros). [Earlier they were called ZVARS] There are 10 ResVars you may use by writing $0 .. $9. $0 to $4 are integer values, $5 to $9 are real values. These ResVars are specific to each command and are documented on-line. Use HELP RESVARS to show the ResVars of the current command; HELP RESVARS <cmd> to see the ResVars of command <cmd>. ResVars are initially set to 0, but are not reset between commands. A specific command may set some ResVars or none at all; the others are left untouched. Therefore it is essential to use ResVars (e.g. store them etc) immediately after the command defining them, unless you are really sure that the intervening commands do not set ResVars. Z$ is a string result variable. See Z$.

Row names:

The rows of the data matrix may have a name. The default name is 'variable'; other names may be set by the SET ROWNAME command.

Selection (commands)

let you analyze subsets of observations satisfying some specific conditions. Selection commands do not alter the current WA; only selected cases are included in the analysis until the selection is turned off, either explicitly because a new selection has been specified or implicitly because the selection no longer makes any sense. For more details refer to the section on selection commands and groups.

Simple expressions

may be specified where named values are specified (option type II) or on data input (keyboard input only). A simple expression (do not confuse it with "expression") may contain (1) numbers (2) letter variables (3) intrinsic functions (4) case references (5) case substitution and (6) at most one of the following operators: +, -, *, / or %. In some instances items (4) and (5) may not be allowed. Examples: A+100 or 200/$NVAR. Simple number, letter values etc. without operator may be seen as a special case (only one argument) of the simple expression.

Simple logical expression

Some commands use a very simple form of logical expressions, i.e. a special form of a command line option. It takes the form IF<o><value>, where IF identifies these expressions, <o> is a logical operator (<, >, = or ~, for less than, greater than, equal or not equal) and <value> is the value for the comparison. In fact, IF>20 is just a special case of the named value option form, where other symbols (<, >, ~) are permitted in place of =. This is clear when you use the equality operator: IF=24.4. Important: no spaces are allowed in a simple logical expression. Note that this form can be found only with a few commands.
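
A simple logical expression is small enough to parse with one pattern. The Python sketch below only illustrates the form described in this entry; the function name is hypothetical, the value is restricted to a plain number, and accepting only an upper-case IF is an assumption.

```python
import operator
import re

# The four operators named in the entry: <, >, = and ~ (not equal).
_OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "~": operator.ne,
}
_PATTERN = re.compile(r"IF([<>=~])(\S+)")


def parse_simple_logical(option):
    """Turn an option such as 'IF>20' into a one-argument predicate."""
    match = _PATTERN.fullmatch(option)
    if match is None:
        # Also rejects embedded spaces, which the entry says are not allowed.
        raise ValueError(f"not a simple logical expression: {option!r}")
    compare = _OPERATORS[match.group(1)]
    threshold = float(match.group(2))
    return lambda x: compare(x, threshold)


is_large = parse_simple_logical("IF>20")
print([x for x in (5, 20, 25, 40) if is_large(x)])
```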

startup file

See command files.

store, storing

Refers to the operation of storing data or other information from the WA into one of the matrices C1, C2, MATRIX or the GVAR, casids and the like. The opposite is LOAD.

String variables (scalar)

You may use a small number of string variables (the number is implementation dependent): their names are A$, B$, C$ etc., i.e. single letters, followed by the $ sign. String variables may be used in connection with all command input by invoking the substitution of the string variable: $A$ is replaced by the current value of string variable A$ (the first $ means "substitute"). Initially the string variables are defined as null strings, i.e. of no length (note that on some systems string variables contain information on the environment); therefore if no string is assigned to a string variable, the $A$ reference will simply be removed from the command line. String variables are handled with the SET command (see there for more details). There is also a string result variable called Z$. When using the string commands in macros special care has to be taken to cause substitution at the right time (see macros for more details). It is also preferable not to use string variable A$ in macros (some commands use it in a special way).

Instead of a letter you may also use $<$ causing the string to be read from the RAWIN file. RAWIN has to be connected with the SET RAWIN command; otherwise an error occurs. (This feature is explained with the RAWIN command --> section on macros, as it is most useful for macro programmers.)
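
The substitution rule for string variables ($A$ is replaced by the value of A$, and an undefined variable leaves nothing behind) can be sketched in Python. The function name and the sample command text are placeholders, not EDA syntax, and the $<$ form that reads from RAWIN is not covered.

```python
import re


def substitute_strings(command, string_vars):
    """Replace each $X$ reference by the value of string variable X$.

    `string_vars` maps names such as "F$" to their current values;
    undefined variables behave as null strings, so the reference vanishes.
    """
    def replace(match):
        return string_vars.get(match.group(1) + "$", "")

    return re.sub(r"\$([A-Z])\$", replace, command)


# "SOMECMD" is a placeholder command name used only for this example.
print(substitute_strings('SOMECMD "$F$"', {"F$": "data.raw"}))
print(repr(substitute_strings("SOMECMD $B$ X", {})))
```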

Tables

are a specific variable type. Table variables can only be used with commands meant for table analysis and management. Tables are created from variables using the MAKE TABLE command or from commands like the XTAB or BREAK command.

Ties

You may define groups of variables (bundles, predefined variable lists, variable groups). The tie defining group membership is an attribute of a variable like the label, descriptor or reference value, except that a variable tie need not be defined (no default). You refer to a variable group using the #list-number convention on the variable list. The ties are stored with the variable and saved like any other information pertaining to the WA. The DESCRIBE command shows the group membership (=tie) of each variable (blank if no tie is defined). Whenever a WA is transposed the ties become GVAR memberships, and GVAR memberships are used as ties. Ties may also be defined by a cluster analysis on groups.

System (operating)

EDA runs on various operating systems; some features of EDA might depend on capabilities of the operating system, i.e. features marked as "system dependent" might or might not be available in the actual EDA software you are running; for instance only the PC version of EDA is able to read and write spreadsheet files.

System constants

or constants may be used instead of specific values in various places (options, expressions, input values). They are used by invoking their substitution with a $ sign followed by the name and in some cases additional information. See also ResVars, where a $ precedes a number.

There are three types of constants. The names are meaningful in their first three characters. Additional characters are not analyzed up to the next separator.

The first type contains implementation constants or information on the sizes of various matrices. They are used as is with no additional specification.

    Name    Explanation
    -------------------------------------------------
    $NVAR   max. number of variables a WA may contain
    $MCAS   max. number of cases a WA may contain
    $MXF    max. number of dimensions C1 or C2 may contain
    $NVR    number of variables in the WA
    $MDIM   size of MATRIX
    $C1V    number of variables (coordinates) in C1
    $C1D    number of dimensions in C1
    $C2V    number of variables (cases, coordinates) in C2
    $C2D    number of dimensions in C2
    $NGR    number of groups in GVAR
    $RVAL   replacement value
    $MISS   replacement value
    $VLN    number of variables in the current vlist
    $UDL    upper range limit (data range)
    $LDL    lower range limit
    $NSL    number of cases selected (selection)
    $NTT    total number of cases (before selection)
    $GET    get a value from the user

A second type is used to inquire about specific variables, or a specific location in the current variable list, therefore a variable reference (name or number) is required. These constants take the form:

     $name.var

Where <name> is a constant name and <var> is either a variable name or the position (number) in the WA. With $VLS a variable name does not make sense, as it refers to the current variable list, i.e. <var> points to the var-th element of that list.

    Name       Explanation
    -------------------------------------------------
    $NOC.var   number of cases
    $MIN.var   minimum
    $MAX.var   maximum
    $CEN.var   center
    $TIE.var   variable tie
    $PRT.var   1=protected, 0=unprotected
    $TYP.var   variable type

    $VLS.pos   variable at <pos> in the current vlist

Note that if you do not use variable names, any number within 1..$NVAR will be accepted, even for empty variables.

The third type needs two arguments. Currently only $DAT exists:

     $DAT.var.case

$DAT refers to a specific data value in the matrix. <var> refers to the variable and has the same syntax as above. <case> refers to a case (position, casid or letter variable). Note that no checks are performed if positional numbers are used, provided they are in the range 1..$MCAS for the case reference and 1 .. $NVAR for the variable reference.
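The three forms of constant reference ($name, $name.var, $name.var.case) and the rule that only the first three characters of the name count can be sketched in Python as follows. The function `parse_constant` is hypothetical and only splits the reference; it does not look anything up in a WA, and folding the name to upper case is an assumption.

```python
def parse_constant(ref):
    """Split a reference such as '$DAT.ICult.ZH' into (key, arguments).

    Hypothetical helper: only the first three characters of the name are
    kept, since further characters are not analyzed up to the separator.
    """
    if not ref.startswith("$"):
        raise ValueError("a constant reference starts with '$'")
    name, *args = ref[1:].split(".")
    if len(args) > 2:
        raise ValueError("at most two arguments (var, case) are expected")
    return name[:3].upper(), args

print(parse_constant("$NVAR"))          # ('NVA', [])
print(parse_constant("$MINIMUM.4"))     # ('MIN', ['4'])
print(parse_constant("$DAT.ICult.ZH"))  # ('DAT', ['ICult', 'ZH'])
```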

Terminal input

--> Freefield input

Toolbox:

A separate module within EDA containing several tools used in connection with EDA (defining special files, computing matrices directly into MATRIX) and a series of general purpose file handling commands (sort, merge-checking, etc.).

User

You, the person using the EDA Software. Here users are explorers, eager to explore a data set and not afraid of using multiple commands. If you are looking for automatic handling of some data analysis sequences, you are attempting to use the wrong software package.

User, advanced

A user who is not afraid of looking behind the scenes in order to get things done the way he or she wants things to be done, a person who is not afraid to read some technical stuff to have background information or is prepared to learn how to write macros etc.

User profile file:

The current user profile defines the user's working environment. When you call the EDA program, a user profile is created before you are able to enter the first command. This profile may be minimal or may include much contextual information built from a system wide profile, a group profile and a permanent user specific profile file. Ask your group administrator or the person taking care of the EDA installation on your computer for further information on profiles, or for help whenever you suspect trouble linked to profiles (for instance files you can no longer GET, commands working differently from what you were used to, etc.). Note to advanced users and administrators: there is a special document called the "Administrators Guide" which explains profiles in full detail.

Variable

a data array together with all the information attached to it: label (8 characters), descriptor (48 characters), status information (type: 1 = numeric variable, 2 = GVAR, 3 = alpha; the number of cases; and a tie) and three associated values: minimum, maximum and a reference value (default = median). The user refers to a variable using an integer number giving the relative position of the variable in the WA. Instead of numbers, variable labels or ties can be specified using the # substitution. When a WA is stored in an EDA file the variables are packed, i.e. variables are stored consecutively with no empty variables in between. The same can be done with the PACK command. Variables can be protected or unprotected and have different usages (type of variable). You should very clearly distinguish letter variables, i.e. scalar variables, from "normal" variables, i.e. vectors.

Variable bundles

Variables tied together using a tie. (See Ties for details.)

Variable descriptor:

a text of up to 48 characters describing the variable. If the user does not supply a new descriptor, the descriptor may be modified automatically by transformation commands (encoding of the transforming command), or a mark called a modification stamp is added at the end of the descriptor to signal that the variable has been modified. In some instances, when the user does not supply a descriptor, a default label and descriptor are created, e.g. by the NEWVAR command. These default labels/descriptors are considered "incomplete documentation", i.e. their meaning may easily be forgotten if the user comes back to that variable only some time later. The SCAN (edit) command is used to scan the WA for this type of variable and to correct the descriptor.

Variable groups:

Variables may form groups using ties. See Ties for more details.

Variable labels:

a label of up to 8 characters. If you want to use the labels as variable references everywhere, including expressions, you should not use blanks or any special symbol within a label. If you use single-letter variable names, you need in some instances to distinguish them from letter variables: #A refers to a letter variable, #A' to a variable stored in the WA.

Variable list

The second field on a command line, where the user specifies the variables to be analyzed. Variable lists may contain integer reference numbers, variable names, references to a predefined list, names with wildcards and letter variables. All elements except the integer numbers are preceded by a # sign (the # symbol is considered a numerical symbol in EDA). In some instances commands need several variable lists. In such a case the "/" separates the different lists.

There are a number of commands you can use to build a variable list using more or less complex criteria: information in the labels or descriptors, statistical criteria in the variables and the like. These commands create a variable list; the subsequent command then takes it up (you specify no variable list) and analyses those variables.

   >VARS 1-10 SORT MEDIAN
   >LIST

In this example the variables are sorted on their median: the VARS command takes variables 1 through 10 and builds a new variable list by computing the medians and sorting the variables accordingly. The LIST command then shows them in that order: the first variable is the one with the smallest median, the last the one with the largest.
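What the VARS ... SORT MEDIAN step computes can be illustrated outside EDA with a few lines of Python. The function `sort_by_median` and the toy data are assumptions made for this sketch; it only mirrors the ordering rule described above.

```python
from statistics import median

def sort_by_median(variables):
    """Return the variable names ordered from smallest to largest median.

    `variables` is a hypothetical dict: name -> list of values.
    """
    return sorted(variables, key=lambda name: median(variables[name]))

data = {"A": [5, 9, 7], "B": [1, 2, 30], "C": [4, 4, 4]}
print(sort_by_median(data))  # ['B', 'C', 'A']
```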

Variable type:

(variable attribute) EDA uses different types of variables. Most variables are normal numerical variables (type 1, nominal, ordinal or binary). There are also table variables. More types will be used in the future.

Wildcard

in several instances (most frequently on variable lists) you may specify a string (e.g. a label reference) containing wildcard characters. A wildcard character tells EDA that in the position(s) marked by such a character any character may be present for a match. The EDA wildcard character is the '*' symbol, which has two different meanings depending upon the position in the string: (1) at the end of the string it means "match all" (2) within a string "match any character".

          VAR*

matches all variables starting with VAR, the characters beyond do not matter.

          ***X

looks for any four-character long label having an 'X' in position 4 and any 3 characters preceding it. Of course the two meanings may be combined:

          ***X*
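The two meanings of '*' can be made concrete with a small Python sketch that translates a pattern into a regular expression. The function names are hypothetical, matching is case sensitive here (an assumption), and a trailing '*' is taken to match an empty remainder as well.

```python
import re

def wildcard_to_regex(pattern):
    """Translate an EDA-style wildcard pattern into a compiled regex.

    Assumed rule: a '*' in the last position means "match all"; any other
    '*' stands for exactly one character.
    """
    body, match_all = pattern, False
    if pattern.endswith("*"):
        body, match_all = pattern[:-1], True
    parts = ["." if ch == "*" else re.escape(ch) for ch in body]
    if match_all:
        parts.append(".*")
    return re.compile("".join(parts))

def matches(pattern, label):
    """True if the whole label matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(label) is not None

print(matches("VAR*", "VAR12"))   # True
print(matches("***X", "ABCX"))    # True
print(matches("***X", "ABCXY"))   # False
print(matches("***X*", "ABCXY"))  # True
```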

Toolbox:

an EDA module where a collection of tools can be found. These tools are programs needed in connection with data analysis in a large sense (data manipulation, dictionary creation, report writing ...). These tools are not necessarily EDA specific. They are self-contained. This collection contains tools I need and use; some users might find them useful as well, others won't.

Work area:

(WA) a data matrix of NVAR variables by MCAS cases (implementation constants: see chapter on implementations), together with all the information (labels, documents, status) attached to it. Analyses within EDA are always performed on variables residing in this WA. In a larger sense the WA also includes the MATRIX and the CONFIGURATION stored. For data saving/retrieving purposes the WA is treated as a whole ("block"). I/O operations are performed on whole blocks, with the exception of the *COPY command. The WA need not be a rectangular data matrix. The type of command used determines whether the whole WA needs to be rectangular or not. The WA has a label and a descriptor. Three values of importance to the user are attached to the WA: the number of variables, the type of the WA (1 if the WA is rectangular) and the number of cases, if the WA is rectangular. Multivariate commands like FACTOR or CLUSTER operate on the whole WA or on the variables in the vlist, depending on the mode set by ASSUME ALLVARS.

Work area name

(waname) The name of a work area is a string of up to 8 characters. Upper and lower case letters are not distinguished. The WA name is mainly used to refer to a WA stored in a file (GET command) or to store it into a WA archive (PUT). For other types of files the WA name is just provided for documentation purposes and has no special meaning. The work area name may often appear together with the WA archive (directory) name; then the reference might take the form DIREC:WANAME (=> files).

Work area archive

A WA archive contains a collection of WAs; in order to access a particular WA you only need to know the name of the WA, i.e. the 8 character name known to EDA; system specific file names are not required. Furthermore the DIR command allows you to do sophisticated searching on the information stored in order to find out which WA contains the information you need. Note that WA archives are optional, i.e. they may not be installed or active on your system. Use STAT WAARCHIVE to find out.

Work area directory

-> Work area archive

Work area library

-> Work area archive

XNAME

is used to set the designation for an X-variable in regression type situations. By default the x-variables are called "independent", but you might prefer to call them "predictor", "explanatory" or whatever term is suitable. The XNAME can be changed with the SET XNAME command or alternatively it can be changed in the profile.

YNAME

See XNAME for details. The default name is "dependent".

ZVAR

ZVARS --> see ResVars (Result variables). They are called ZVARS because in versions earlier than 2.0 they were referred to by putting a Z in front of them.

Z$

is the string result variable and contains information like file names. Like the other ResVars, any command may change its value (in fact only a few do); therefore, before using it you have to make sure that it contains the information you need. In case of doubt, make a copy of it into a permanent string variable using the SET command.

The Art of Coding

Introduction

In many situations it is preferable to replace numbers by some well chosen symbol, reflecting the specific information we are interested in. Coded displays are used very often in EDA and quite a number of commands have options to produce different forms of coded displays.

The purpose of this chapter is to introduce the principles of coding. Note that depending upon particular needs of a command, these principles may vary; for instance it is not interesting to use a blank space as a code on a coded histogram, whereas blank space is a highly efficient code in a coded list.

Forms of coding

Coding is used to stress important aspects of information we are looking for, by replacing the numerical form (which is often not easily readable) by some well chosen symbol.

The following forms of coding are used in EDA:

Distributional coding

The symbols shown reflect the position of each observation within the distribution of the variable, i.e. symbols are used to show whether a case is a far out, out, adjacent or in value. Normally different symbols are used for values below and above the median, e.g. we distinguish low and high out values.

Bin coding

The cases are grouped into a number of bins according to some criterion, and for each bin a different symbol is used. Several criteria are possible: Each bin contains an equal number of observations (Fractile coding); each bin corresponds to an interval of equal width (interval coding) or the bins are defined by the user (indicates the bin boundaries).

Reference coding

Each symbol reflects the position of a case with respect to some criterion. A criterion often used is the median; then values below the median are marked differently from cases above the median.

Instead of only marking low/high positions it is also possible to indicate the distance of a case from a reference value, by either using different symbols or by using more than one symbol, e.g. a single plus sign for a case close to the median, and 2, 3 etc plus signs for cases farther away.

Marking

All cases corresponding to some criterion are marked with a special symbol; all other cases are not marked.

"as-is coding"

The (integer) numerical value of the variable is used directly "as is", i.e. no intervals are computed. This is useful for categorical variables, if you want to show a different code for each value of the variable.

In the following section we shall examine the various forms of coding in some detail. Note that the examples will mainly use the LIST command as an illustration; several other commands work essentially the same way.

Distributional coding

    >LIST 1-4 DISTCODE
   26 cases
Distributional coding (full); Symbols:(lo)"&=-*+#@"(hi)
         ZBLUSONGZFSBBSAASGATTVVNGJ
         HEURZWWLGROSLHRIGRGGIDSEEU
-----------------------------------
ICult   |+***----***+**--****++*++*
XCult   |****-+***+*****+*+-*+-&--+
Form    |+****---**++**---***+***++
Sucre   |-****+*-*#-=-******+-++**+

The following symbols are used: '@' for a high far out value, '#' for a high out value, '+' for high adjacent, a star for an in value, and '-','=','&' for low adjacent, out and far out values.

Note that these symbols might be different on your screen, as they can be changed by the EDA administrator (e.g. because your system has nicer looking symbols). You may also change the symbols for yourself, either by putting them into your profile or by setting them during the session.
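The classification behind distributional coding can be sketched in Python. Several points are assumptions, not statements about the EDA implementation: the hinges are computed as medians of the lower and upper halves, the inner and outer fences are placed at 1.5 and 3 midspreads beyond the hinges (the usual convention), values equal to a hinge count as "in" values, and the symbol string is read from low far out to high far out as in the legend of the listing above.

```python
from statistics import median

def hinges(values):
    """Lower and upper hinge: medians of the lower and upper halves."""
    xs = sorted(values)
    half = (len(xs) + 1) // 2       # both halves include the median if n is odd
    return median(xs[:half]), median(xs[-half:])

def distcode(values, symbols="&=-*+#@"):
    """Code each value by its position relative to hinges and fences."""
    lo_h, hi_h = hinges(values)
    spread = hi_h - lo_h                                        # midspread
    lo_in, hi_in = lo_h - 1.5 * spread, hi_h + 1.5 * spread     # inner fences (assumed)
    lo_out, hi_out = lo_h - 3.0 * spread, hi_h + 3.0 * spread   # outer fences (assumed)
    codes = []
    for x in values:
        if x < lo_out:
            k = 0        # low far out
        elif x < lo_in:
            k = 1        # low out
        elif x < lo_h:
            k = 2        # low adjacent
        elif x <= hi_h:
            k = 3        # in value
        elif x <= hi_in:
            k = 4        # high adjacent
        elif x <= hi_out:
            k = 5        # high out
        else:
            k = 6        # high far out
        codes.append(symbols[k])
    return "".join(codes)

print(distcode([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]))  # ---*****+#@
```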

Symbols may be changed for the whole interactive session using the SET GRAPH DISTCODE command (STAT GRAPH DISTCODE shows them), or locally using a "codes" string on the command line producing that particular output. For example, the command

LIST 1-4 DISTCODE "AB   FG"

will use A and B for high far-out and out values, F and G for low far-out and out-values and blank for others.

The DISTCODE option has an additional option, SIMPLE, which does not distinguish between 'high' and 'low': the low and high far-out, out and adjacent values get the same symbol. Note that it is also possible to use SET DISTCODE SIMPLE to produce the same effect for all commands.

The following example uses the same coding scheme:

>SHOW 1-4 CODED

26 cases
Showing :ICult    ( 1) Culture initiative
legend for coded values: (HI far)@ # + - = & (LO far)
 canton    ICult  XCult  Form  Sucre
 16 AI       7.2    +     -
  6 OW       7.6    +     -      +
  8 GL       9.3          -      -
 15 AR      10.1          -
  7 NW      10.2          -
  5 SZ      10.2    -
 20 TG      11.0                 +
 17 SG      11.1          -
  3 LU      13.6
 18 GR      13.8    +
 14 SH      14.1
  2 BE      14.3
 10 FR      15.6    +            #
 23 VS      15.7    &            +

Bin coding

The cases are grouped into a number of bins according to some criterion, and for each bin a different symbol is used. Several criteria are possible. In the following example the values are divided into four groups containing approximately the same number of cases. (Four groups is the default value.) This means that the distribution is broken up into four pieces (fourths, quartiles), i.e. the bin boundaries are the hinges and the median.

  >LIST 1-4 FRACTILES
   26 cases
Bins of equal size (APPROX.); Symbols:.:*#
         ZBLUSONGZFSBBSAASGATTVVNGJ
         HEURZWWLGROSLHRIGRGGIDSEEU
-----------------------------------
ICult   |#*::....:**#**..::::##*###
XCult   |:*:#.#*::#***::#:#.##....#
Form    |#::*:...::##**...*::##**##
Sucre   |.**:*#*.:#...::#:*:#.##:*#

The symbols used here are the default symbols, '.' for the lowest fourth and '#' for the highest fourth. On many systems these symbols are replaced by nicer looking graphical symbols. If you want more (or fewer) groups, use the "symbols" string to indicate the number of bins by specifying a series of codes, each character standing for one bin. If you specify three symbols, the variable will be divided into thirds; if you specify 10, into tenths, and so on.
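As a rough illustration of fractile coding, the following Python sketch derives the bin boundaries from the number of symbols and codes each value. It relies on `statistics.quantiles` from the Python standard library, whose cut points need not coincide exactly with EDA's hinges; the function name and the rule that a value equal to a boundary goes to the upper bin are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles

def fractile_code(values, symbols=".:*#"):
    """Code values into len(symbols) bins of roughly equal counts."""
    cuts = quantiles(values, n=len(symbols))   # len(symbols) - 1 boundaries
    # bisect_right counts the boundaries that are <= x, i.e. the bin index
    return "".join(symbols[bisect_right(cuts, x)] for x in values)

print(fractile_code([1, 2, 3, 4, 5, 6, 7, 8]))            # ..::**##
print(fractile_code([1, 2, 3, 4, 5, 6, 7, 8, 9], "abc"))  # aaabbbccc
```

Passing three symbols gives thirds, ten symbols tenths, which is the role of the "symbols" string described above.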

We call this first form 'fractile coding'; it is usually the default coding. With some commands, e.g. the LIST command, interval coding is used by default instead (for "historical reasons"). Other forms are also available.

You may request that the bins be defined by cutting the variable into intervals of equal width (interval length). Again by default it will be divided into four intervals; if you need more or fewer, use the "symbols" string to indicate the number. This is called 'interval coding'.
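Interval coding can be sketched in the same spirit. The function below is a hypothetical illustration that cuts the range between the minimum and the maximum into as many equal-width intervals as there are symbols; putting the maximum into the last bin is an assumption.

```python
def interval_code(values, symbols=".:*#"):
    """Code values into len(symbols) intervals of equal width."""
    lo, hi = min(values), max(values)
    k = len(symbols)
    width = (hi - lo) / k
    if width == 0:                    # constant variable: everything in bin 1
        return symbols[0] * len(values)
    codes = []
    for x in values:
        idx = min(int((x - lo) / width), k - 1)   # the maximum joins the last bin
        codes.append(symbols[idx])
    return "".join(codes)

print(interval_code([0, 1, 2, 3, 4, 5, 6, 7, 8]))  # ..::**###
```

Unlike fractile coding, the bins here may hold very different numbers of cases when the distribution is skewed.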

The READ option can be used to enter your own bin boundaries. The number of bins depends upon the number of codes in the "symbols" string (4 by default). You will then be asked to enter the bin boundaries (one fewer than the number of bins requested).

Finally there is a special option for fractile coding, EXACT. The normal form proceeds as follows (example: four bins, i.e. fourths): EDA determines the hinges and the median; then all cases below the lower hinge are assigned to the first bin, the cases above the lower hinge but below the median to the second, and so on. This is fine as long as (e.g.) the lower hinge has a value occurring only once in the variable. If there are several or many cases with the same value, this procedure will not define bins with equal numbers of cases, but might in some cases produce quite different counts for the bins. In many cases this is what you want, because the hinges and the median have some meaning to you. In other situations however (especially when experimenting with theoretical distributions and the like) you really want to have identical counts in the bins. Here the EXACT option will do exactly this.
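One way to obtain identical counts, sketched here in Python, is to assign bins by rank rather than by comparing values with the hinges. This is only an illustration of the idea; how EDA breaks ties with EXACT is not documented here, and the function below simply keeps tied cases in their input order.

```python
def exact_fractile_code(values, symbols=".:*#"):
    """Assign bins by rank so that every bin gets (nearly) the same count."""
    n, k = len(values), len(symbols)
    # sorted() is stable, so tied values keep their input order (assumption)
    order = sorted(range(n), key=lambda i: values[i])
    codes = [""] * n
    for rank, i in enumerate(order):
        codes[i] = symbols[rank * k // n]
    return "".join(codes)

# Seven tied values and one large value: each bin still receives two cases.
print(exact_fractile_code([5, 5, 5, 5, 5, 5, 5, 9]))  # ..::**##
```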

Reference coding

Each symbol reflects the position of a case with respect to some criterion. A criterion often used is the median; then values below the median are marked differently from cases above the median.

>LIST 1-4 REFERENCE FUZZ=2.5

   26 cases
Reference coding below/above center; Symbols:- +
         ZBLUSONGZFSBBSAASGATTVVNGJ
         HEURZWWLGROSLHRIGRGGIDSEEU
-----------------------------------
ICult   |+ ---- ++ --- -++ +++
XCult   | -+-++- + - + +-++----+
Form    |+ - ----- +++ --- --++++++
Sucre   |-+ ++ --+--- +- -+-++ +

In this example the symbols express whether a case is below or above the reference (center) value, i.e. a reference value stored with each variable. By default this value is the median, but it can be changed to contain other meaningful information (e.g. global percentages and the like).

Values equal or close to the reference value appear in this example as blanks. The example uses the FUZZ=2.5 option to tell EDA that equality is not strict equality, but means lying within a range of the reference value plus or minus 2.5. Note that if the FUZZ option is not used, the EDA system wide fuzz value is used (it can be set using the SET FUZZ command). Some commands (e.g. the LIST command) have additional options for the reference value.
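The role of the fuzz value is easy to see in a small Python sketch. The function `reference_code` is hypothetical; it takes the three symbols for "below", "close to" and "above" the reference, and treats the interval boundaries as still "close" (an assumption).

```python
def reference_code(values, reference, fuzz=0.0, symbols="- +"):
    """Code values as below / close to / above a reference value."""
    low, near, high = symbols
    codes = []
    for x in values:
        if abs(x - reference) <= fuzz:   # within reference +/- fuzz
            codes.append(near)
        elif x < reference:
            codes.append(low)
        else:
            codes.append(high)
    return "".join(codes)

# Reference 10.0 with fuzz 2.5: 10.0 and 12.4 count as "equal".
print(repr(reference_code([7.2, 10.0, 12.4, 12.6, 20.0], 10.0, fuzz=2.5)))  # '-  ++'
```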

In the following example the reference is the median, and the distance to the median is expressed using units of 1/2 midspreads, i.e. each + symbol shows a distance of 1/2 midspread.

>LIST 1-4 CODED

   26 cases
variable listing    units of 1/2.0 midspread
 case   ICult XCult Form Sucre
  1 ZH  + + -
  2 BE  +
  5 SZ  -
  6 OW  - ++ - +
  7 NW  -
  8 GL  - - -
  9 ZG
 10 FR  + +++
 11 SO  + -
 12 BS  ++ ++ ---
 13 BL  --
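How a distance from the reference turns into a run of symbols can be sketched as follows. This Python fragment is an illustration under assumptions: distances are truncated to whole units, at most three symbols are shown (the listing above never shows more), and the function name and parameters are hypothetical.

```python
def distance_code(values, reference, midspread, unit=0.5, max_symbols=3):
    """Return one code string per value: one symbol per `unit` midspreads."""
    step = unit * midspread
    out = []
    for x in values:
        n = min(int(abs(x - reference) / step), max_symbols)  # truncate, then cap
        out.append(("+" if x > reference else "-") * n)
    return out

# Reference 10, midspread 4: one symbol per distance of 2.
print(distance_code([10, 11, 12.5, 15, 3, 30], reference=10, midspread=4))
# ['', '', '+', '++', '---', '+++']
```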

Marking

All cases corresponding to some criterion are marked with a special symbol; all other cases are not marked.

   >list 1-4 mark if>45
   26 cases
Mark values greater than <val>; Symbols:@
         ZBLUSONGZFSBBSAASGATTVVNGJ
         HEURZWWLGROSLHRIGRGGIDSEEU
-----------------------------------
ICult   |
XCult   |   @ @   @     @ @ @@    @
Form    |
Sucre   |     @   @     @   @ @@  @

In the previous example the cases above 45 are marked with the EDA marking symbol (it might be different in your EDA version, and you can change it with the GRAPHSYMBOL command).
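Marking amounts to a simple test per case. The Python sketch below covers the four conditions listed in the command syntax section (greater than, less than, equality and inequality, the last two with an optional fuzz value); the function `mark` and its argument names are hypothetical.

```python
import operator

COMPARE = {">": operator.gt, "<": operator.lt}

def mark(values, cond, val, fuzz=0.0, symbol="@"):
    """Mark the cases that satisfy `cond` with respect to `val`."""
    if cond == "=":
        test = lambda x: abs(x - val) <= fuzz     # equal within the fuzz range
    elif cond == "~":
        test = lambda x: abs(x - val) > fuzz      # not equal
    else:
        test = lambda x: COMPARE[cond](x, val)    # '>' or '<'
    return "".join(symbol if test(x) else " " for x in values)

print(repr(mark([10, 50, 45, 46], ">", 45)))  # ' @ @'
```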

"As-is coding"

This special form of coding is mainly useful for categorical variables as it takes up the numerical (integer) values and codes them directly, i.e. no computation (intervals, reference values) is performed.

The default codes used are "0123456789", i.e. a "1" represents a numerical value of 1, i.e. the default form does not really code values, except for values below 0 (code "-") and values above 9 (code "+").

If you specify DICHOTOMY, the only codes used will be 0 (or space) and 1. Positive values are shown as 1, and values of 0 or less as 0 (you may specify different codes).

Command syntax

These various forms of coding are available with a number of commands, namely the LIST, MAP, PLOT, CASID and HISTOGRAM commands.

Some commands might present slight differences with respect to the forms shown above and the syntax explained below; in particular, default values might be different and some default codes might be changed. For example, it is not always desirable to show blank "symbols" on a plot, as you would see nothing at that particular location.

The syntax chart below is taken from the HISTOGRAM command:

 <code.opt>
     BINS | [FRAC] | EXACT | READ ["symbols"]
     DISTRIBUTIONAL [SIMPLE] ["symbols"]
     REFERENCE=value ["Symbols"] [FUZZ=val]
     MARK|=val | IF>val |IF=val| IF<val | IF~val
        ["symbols"] [FUZZ=val]
     ASIS ["symbols"]
     DICHOTOMY ["symbols"]

It shows the various forms of coding. The first line shows the forms of 'bin coding': FRACTILE (the default option) requests fractile coding, BINS requests interval coding, EXACT defines exact fractiles and READ allows you to enter bin boundaries. Optionally "symbols" is used to define alternative symbols; the number of symbols you specify determines the number of bins to create.

DISTRIBUTIONAL requests distributional coding. SIMPLE does not distinguish between lower and upper far-out, out and adjacent values. Finally, "symbols" is used to enter alternative symbols.

REFERENCE requests reference coding. You may specify in addition different symbols and a fuzz value.

MARK requests marking. There are four conditions: equality (IF=val, or just MARK=val), greater than or less than a value (IF>val, IF<val), and inequality (IF~val). For the equality/inequality options you may specify a FUZZ value. The "symbols" string is used to specify symbols other than the default symbols.

ASIS requests as-is mode. By default the symbols are "0123456789" (zero replaced by a space with LIST ASIS); values above 9 appear as "+" and values below 0 as "-". If you specify your own symbols, e.g. "abcd", "a" will stand for 0, "b" for 1, "c" for 2 and "d" for 3; values larger than 3 will appear as "+", values smaller than 0 as "-". Please note that you need not specify the codes for values outside the range ("-" and "+").

DICHOTOMY treats the variable as a binary variable. Values of zero or less appear as "0" (blank with LIST DICHOTOMY), positive values as "1". You may specify alternative symbols (note that only the first two symbols will be taken).
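The ASIS and DICHOTOMY rules described above translate directly into a few lines of Python. The function names are hypothetical, and the LIST-specific replacement of zero by a blank is left out of this sketch.

```python
def asis_code(values, symbols="0123456789"):
    """Code integer values directly: symbols[0] stands for 0, symbols[1] for 1, ..."""
    codes = []
    for x in values:
        v = int(x)
        if v < 0:
            codes.append("-")            # below the coded range
        elif v >= len(symbols):
            codes.append("+")            # above the coded range
        else:
            codes.append(symbols[v])
    return "".join(codes)

def dichotomy_code(values, symbols="01"):
    """Zero or less -> first symbol, positive -> second symbol."""
    zero, one = symbols[0], symbols[1]   # only the first two symbols are used
    return "".join(one if x > 0 else zero for x in values)

print(asis_code([0, 1, 2, 3, 4, -1], "abcd"))  # abcd+-
print(dichotomy_code([-1, 0, 2, 5]))           # 0011
```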

Power transformations/reexpressions

Power transformations are important to reexpressions. Many commands within EDA let you perform reexpressions or assist you in finding an appropriate re-expression of your data.

Power transformations are frequently used to transform data. John Tukey has developed a very useful framework for re-expressions, called the ladder of powers, sometimes called Tukey's simple family of power transformations: all positions on the ladder of powers can be written as powers of the original variable.

The EDA software offers in many situations two ways of dealing with power transformations (1) reference to the ladder of powers (moving up or down) or (2) giving directly the power of the re-expression you want to obtain.

In the following example we will refer to the REEXPRESS module; please note that you will encounter similar options and commands in other contexts (like PLOT INSPECT).

Moving up or down the ladder

When you are looking for an appropriate transformation (e.g. to symmetrize a distribution) you are usually less interested in the mathematical formulation of potential transformations than in seeing what is done to your variable. Here the image of the ladder of powers comes in handy. Starting with the raw data you may move up the ladder of powers (i.e. take squares). When you see, from the boxplot that will be displayed, that the symmetry problem gets even worse, you might move down the ladder, let's say two steps. If the boxplot tells you that the current step on the ladder is still not enough, you may move down a further step (without worrying about what kind of transformation it takes to do that), or maybe correct by going back one step.

Direct specifications of powers

The ladder of powers is, for practical reasons, limited to the most used transformations (in the EDA implementation, for instance, the highest step up is 3, i.e. the cube). If you need to take the fourth power you might then use the POWER=4 command (in the case of the REEXPRESS module) or the POWER=4 option (in the case of the TRACES command). These options or commands also let you select a fractional power such as 2.5.
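The two ways of specifying a re-expression can be illustrated with a short Python sketch. The steps listed in `LADDER` are an assumption (only the top step, 3, is stated above), as are the convention that power 0 stands for the logarithm and the sign change for negative powers, which keeps the order of the cases. The data are assumed to be strictly positive.

```python
import math

# Assumed ladder steps; only the highest step (3) is taken from the text.
LADDER = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]

def reexpress(values, power):
    """Re-express positive values by a power; power 0 means log."""
    if power == 0:
        return [math.log(x) for x in values]
    sign = 1.0 if power > 0 else -1.0      # keep the order of cases for p < 0
    return [sign * x ** power for x in values]

def move(power, steps):
    """Move `steps` positions up (+) or down (-) the ladder, staying on it."""
    i = LADDER.index(power) + steps
    return LADDER[max(0, min(i, len(LADDER) - 1))]

print(move(1.0, 1))                      # 2.0  (one step up from the raw data)
print(move(1.0, -2))                     # 0.0  (two steps down: log)
print(reexpress([1.0, 4.0, 9.0], 0.5))   # [1.0, 2.0, 3.0]
```

A direct specification such as a fourth power or a power of 2.5 corresponds to calling `reexpress` with a value that is not on the ladder.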