Andys Binary Folding Editor

Introduction

Andys Binary Folding Editor is primarily designed for structured browsing, although it also provides minimal editing facilities.

This program is designed to take in a set of binary files, and with the aid of an initialisation file, decode and display the definitions (structures or unions) within them. BE is particularly suited to displaying non-variable length definitions within the files.

This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps.

For a summary of how to use the editor, see the section Using the editor.

Features

BE has the following features :-

Command line arguments

  usage: be [-w width] [-h height] [-c colscheme] [-p] [-r]
            [-i inifile] {-I incpath} {-D symbol} {-S name=val}
            [-d defn] [-a addr] [-f field] [-v viewflags]
            {-y symfile} [-C dx] { binfile[@addr] | mx![args[@addr]] }
  flags: -w width      screen width
         -h height     screen height
         -c colscheme  set colour scheme (0 to 3, default: 0)
         -p            print data to stdout, non-interactive
         -r            restricted mode, no shelling out allowed
         -i inifile    override default initialisation file
         -I incpath    append include path(s) for use by inifile
         -D symbol     pre-$define symbol(s) for use by inifile
         -S name=val   set constant name to be value
         -d defn       initial definition to use (default: main)
         -a addr       initial address to use (default: 0)
         -f field      field name within defn (list link, or array to expand)
         -v viewflags  combinations of A,O,L,I,Y,a,e,b,o,d,h,J,+,-
         -y symfile    input symbol table file(s)
         -C dx         code disassembler extension
         binfile@addr  binary file(s) (with optional address, default: 0)
         mx!args@addr  memory extension with arguments (and optional address)

The -w and -h arguments can be used to try to override the current screen size. This doesn't work on UNIX or Win32, but does on OS/2. The -c argument allows you to choose from a small selection of colour schemes.

The -p flag causes BE to be invoked in a non-interactive manner. It decodes the data at the given address as a definition of the specified type, and writes the result to stdout.

The -r flag prevents a user of BE from shelling out a nested operating system command.

The -i flag overrides the default initialisation file.

The -I flag affects the operation of the include command in the initialisation file.

The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.

The -S flag allows the definition of a named constant for use in numeric expressions in the configuration file.

The initial structure definition and address to decode may be overridden with the -d and -a flags. Normally BE starts by looking up the definition of a 'main' definition, and decoding the data at address 0 as such. The address expression is allowed to refer to symbols in symbol tables, as it is evaluated after the symbol tables have been loaded. All the other numeric command line arguments are evaluated before any symbol table loading takes place, and so can't refer to symbols.

If the -f flag is used, it must identify a field within the specified structure. If the field is a pointer to a structure of the same type, BE will initially display a linked list of structures, rather than just one structure. Otherwise, the field is assumed to be an array of fields, and an element list is displayed instead.

The -v flag allows you to state that addresses, offsets, lengths and array indices are to be displayed next to the data on display initially (note that -vI turns off indices). You can also turn on the symbolic display of addresses. In addition, you can specify the display mode of indices, one of binary, octal, decimal or hex. The + and - view flags affect the initial level of detail of display, and only have an effect when used with the -f flag. This is particularly useful when combined with the -p flag.

One or more symbol tables may be specified using the -y flag. Each line of a symbol table is of the form :-

  symbolname    472484aa

Note that the address is in hex, and is not preceded by 0x. This conveniently matches the symbol table layout generated by the ARM linker.
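As a rough illustration of this format, the following Python sketch parses such lines into a name-to-address dictionary. The function name parse_symbol_table is hypothetical, and skipping blank or malformed lines is an assumption; BE's own parser may behave differently.

```python
def parse_symbol_table(text):
    """Parse 'name hexaddr' lines into a dict (illustrative only)."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: ignore blank or malformed lines
        name, hex_addr = parts
        symbols[name] = int(hex_addr, 16)  # address is hex, with no 0x prefix
    return symbols


table = parse_symbol_table("symbolname    472484aa\n")
print(hex(table["symbolname"]))  # 0x472484aa
```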

Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.

Each binary file provides data for a part of the memory space which BE can edit. Therefore each binary file may be described as a memory section.

Alternatively a memory section may be specified as mx!args. This instructs BE to load a memory extension, and to access the data identified by the arguments via the memory extension. This feature allows BE to be extended to be able to edit non-file data directly, such as sectors on a disk.

The -C dx option may be used to extend BE by the use of a disassembler extension. This is a piece of code with a well defined interface, which BE uses to disassemble data annotated as code.

Typical invocations of BE might be :-

  be picture.bmp
    to edit a file, which is loaded into the BE memory space at 0 onwards.
  be -y gizmo.sym gizmo.rom gizmo.ram@0x8000
    to edit dumps from the RAM and ROM of a coprocessor,
    where the ROM starts at 0, and the RAM at 0x8000.
    gizmo.sym is the symbol table for the microcode the coprocessor was running.
  be -y ucode.sym -i ucode.ini -C dis86 coproc!io=0x400,mem=0xc0000
    to live edit a running coprocessor.
    ucode.sym is the symbol table for the microcode the coprocessor is running.
    ucode.ini is a custom initialisation file.
    BEcoproc.DLL provides BE with access to coprocessor memory.
    io=0x400,mem=0xc0000 tells BEcoproc.DLL how to find the coprocessor.
    BEdis86.DLL allows BE to disassemble any code in the data.
  be -d HEADER -a 512 -p -vA file.dat
    display the header at 512 bytes into file.dat.
    decoded data is to be written to stdout, BE is not interactive.
    addresses are to be displayed next to the data.

The initialisation file

One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.

Under OS/2 or Windows, BE finds the initialisation file by searching along the path for the .EXE file, and then looking for a .INI file with the same name.

Under UNIX, BE looks for ~/.berc, and failing that, it looks along the path for be and then appends .ini. If be is renamed to xx, then the files will be ~/.xxrc and xx.ini.

BE can be made to look elsewhere using the -i command line option.

This initialisation file may contain C or C++ style comments.

Also, $define, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of a pre-processing/conditional processing step. The -D command line option may be used to pre-$define such conditional processing symbols.

If BE is running on OS/2, then OS2 is pre-$defined. If running on Windows NT, then WIN32 is pre-$defined. If running on a type of UNIX, then UNIX is pre-$defined. If running specifically on AIX, then AIX is pre-$defined. Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine. These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.

An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, along the BEINCLUDE environment variable, and finally along the PATH environment variable. The internal include path is usually empty, but may be appended to by the use of the -I command line option.

By the time the initialisation file is processed, any symbol files specified on the command line will have been loaded, along with any data files. This means that initialisation files may make reference to symbols and also to the data itself.

The initialisation file contains commands to set the default data display attributes, set constants, map definitions, structure definitions, alignment declarations and include statements.

As BE processes the initialisation file, it generates warnings (such as undefined symbol table symbol), and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts.

Numbers

Wherever the initialisation file calls for a number, the following variants may be used :-

number
The number can be signed, specified in binary (eg: 0b1101), octal (eg: 0o15), decimal (eg: 13) or hex (eg: 0x0d).
addr "symbolinthesymboltable"
if a symbol table is loaded, and the symbol can be found, then the result is the numeric value of the symbol. Otherwise a warning is generated, and the result is the value of the constant nosym, or if that isn't defined, 0xffffffff.
sizeof DEFN
this gives the size in bytes of the earlier defined definition called DEFN. If DEFN isn't already defined, then an error results.
offsetof DEFN "fieldname"
this gives the offset in bytes of the given field in the earlier defined DEFN. If DEFN isn't already defined, then an error results. If the field can't be found in the DEFN, then an error results.
map MAPNAME "mapletstring"
this gives the numeric value that corresponds to the given string defined in the map definition, as explained below.
identifier
BE keeps a smallish pool of numeric global constants, which can be defined with the set command. You can refer to these in expressions. If you refer to a non-existent identifier, BE will then try to look up the name in the symbol table (ie: it will try addr "identifier"). Failing that, BE will scan all the map definitions to see if the identifier matches exactly one maplet name - if so, the result is the numeric value of that maplet. Obviously you will never be able to refer to a maplet with white space in its name by this shorthand mechanism.
` identifier expression `
The value of the identifier if defined, or the value of the expression if not.
. (dot)
When defining a DEFN, dot evaluates to the current offset. When prompted for an address in the @ command, it is the current address. When prompted for a delta value, dot is the current delta. When using the = to change a numeric value, dot is the current value. When specifying a mask in a maplet, dot means the maplet value.
[ type attributes address ]
This tries to fetch a numeric datum of the given type (eg: n32), taking into account the given attributes (eg: signed be), from the given address. If nothing can be fetched from that address, then it is a bad expression.
[[ buff e0 e1 e2 ]]
This loops through some addresses trying to match the pattern specified. The loop is basically of the form for ( a = e0; a != e1; a += e2 ) match(buff,a). Be careful using this, there is no way to abort the scan.

It should be noted that when using the offsetof or map keywords, leading and trailing space is not significant in the "mapletstring" or "fieldname".
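The plain number forms listed above (binary, octal, decimal and hex) can be sketched in Python as follows. This is only an illustration: the function name parse_number is hypothetical, and wrapping the result to 32 bits is an assumption based on the statement elsewhere in this section that BE expression values are unsigned.

```python
def parse_number(text):
    """Parse a literal such as 0b1101, 0o15, 13, 0x0d or -2 (illustrative only)."""
    text = text.strip()
    negative = text.startswith("-")
    if text and text[0] in "+-":
        text = text[1:]
    prefixes = {"0b": 2, "0o": 8, "0x": 16}
    base = prefixes.get(text[:2].lower(), 10)
    digits = text[2:] if base != 10 else text
    value = int(digits, base)
    # assumption: values are held as unsigned 32 bit quantities
    return (-value if negative else value) & 0xFFFFFFFF


for literal in ("0b1101", "0o15", "13", "0x0d"):
    print(literal, parse_number(literal))  # each prints 13
print(hex(parse_number("-2")))  # 0xfffffffe
```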

Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings. Operators grouped together have equal precedence. Higher precedence operators listed first :-

  +, -, ~, !      unary plus, unary minus, complement, not
  *, /, %         multiply, divide, modulo
  +, -            add (plus), subtract (minus)
  <<, >>, >>>     shift left, shift right (signed), shift right (unsigned) [Note 1]
  >, <, >=, <=    greater than, less than, greater than or equal, less than or equal
  ==, !=          equal, not equal
  &               bitwise AND
  ^               bitwise exclusive OR
  |               bitwise inclusive OR
  &&              logical AND
  ^^              logical exclusive OR [Note 2]
  ||              logical inclusive OR
  ? :             conditional expression

Note 1: The >> is a signed shift right, and >>> is the unsigned shift right (much like Java). This distinction is necessary as all numbers in BE expressions are unsigned. (This affects the outcome of expressions like -2/2, which is 0xfffffffe/2, which is 0x7fffffff, rather than the -1 you might expect).
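The effect of unsigned arithmetic and the two right shifts can be reproduced with a short Python sketch. The 32 bit width is an assumption inferred from the 0xfffffffe example in Note 1, and the helper names are hypothetical.

```python
MASK = 0xFFFFFFFF  # assumption: 32 bit values, as the 0xfffffffe example suggests


def to_unsigned(n):
    """Wrap a Python int into the unsigned 32 bit range."""
    return n & MASK


def shift_right_signed(value, count):
    """>> : copy the top bit into the vacated positions."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python int
    return (value >> count) & MASK


def shift_right_unsigned(value, count):
    """>>> : fill the vacated positions with zeros."""
    return (value & MASK) >> count


print(hex(to_unsigned(-2) // 2))                 # 0x7fffffff
print(hex(shift_right_signed(0x80000000, 4)))    # 0xf8000000
print(hex(shift_right_unsigned(0x80000000, 4)))  # 0x8000000
```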

Note 2: C/C++ does not have a logical exclusive OR, but BE does for symmetry.

Note also that the operator precedence now matches that of C++.

Earlier versions of BE had fewer operators and had different precedences for &, | and ^. This change shouldn't break anything.

Such numeric expressions can also be used when BE prompts for a number, not just in the initialisation file.

Some example expressions :-

  addr "tablebase" + 4 * sizeof RGB
    -- symbol tablebase plus four times the size of the RGB definition

  [ n32 be 0x70200+0x44 ] + 27
    -- fetch big-endian 32 bit word from 0x70244, then add 27

  [[ "SIGNATURE" 0x1000 0x2000 4 ]]
    -- locate "SIGNATURE" between 0x1000 and 0x2000, 4 byte aligned
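The [[ buff e0 e1 e2 ]] scan in the last example can be pictured with the Python sketch below. It assumes memory is a flat bytes object whose index is the address, and it returns None when nothing matches; how BE itself reports a failed scan is not stated here, and the function name scan is hypothetical.

```python
def scan(memory, pattern, e0, e1, e2):
    """Return the first address a = e0, e0+e2, ... (stopping at e1) that matches."""
    a = e0
    while a != e1:  # mirrors: for ( a = e0; a != e1; a += e2 )
        if memory[a:a + len(pattern)] == pattern:
            return a
        a += e2
    return None  # assumption: no match found


memory = bytearray(0x2000)
memory[0x1404:0x1404 + 9] = b"SIGNATURE"
print(hex(scan(bytes(memory), b"SIGNATURE", 0x1000, 0x2000, 4)))  # 0x1404
```

Because the loop test is a != e1 rather than a < e1, a step that never lands exactly on e1 would keep the loop running, which is one reason to choose e0, e1 and e2 with care.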

Set constant

BE maintains a smallish list of global numeric constants. eg:

  set num_elements 14+5

Avoid using constant names which clash with other identifiers, such as map or structure definition names. Also, avoid clashing with reserved words in the initialisation file language.

The constant can be assigned any numeric expression, including referencing other constants.

This feature allows initialisation files with the following technique for managing multiple configurations of data :-

  $ifdef BIG_DATA_FILE
  set n_entries 100
  $else
  set n_entries 10
  $endif

  def DATA_RECORD
    {
    n_entries n32 buf 100 asc "names   "
    n_entries n32 dec         "salaries"
    }

Attempting to set a constant which is already defined produces an error.

The unset command can be used to undefine a previous value. It is not an error to unset a constant which is not previously set to anything :-

  set elems 100
  unset elems
  set elems 200

The -S command line flag can be used to set a constant before the initialisation file is processed. Because the constant is set before the initialisation file is processed, the expression the constant is set to can't refer to things within the initialisation file. Assuming the initialisation file debinfo.ini uses a constant called tabsize :-

  be -i debinfo.ini -S tabsize=10   debug.dat              is fine
  be -i debinfo.ini -S tabsize=10+4 debug.dat              is fine
  be -i debinfo.ini -S "tabsize=sizeof STRUCT" debug.dat   is illegal

The special constant nosym, if set, is the value returned when the addr "symbol" syntax is used in an expression to determine the numeric value of a symbol which isn't defined. The usual use of this is in defining a value which is miles away from any sensible value.

Commands to set the default data display attributes

When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noseg nozterm.

To change this default setting, just include one or more of the following keywords in the file :-

Note that when multibyte numeric values are displayed in ASCII or EBCDIC, the ordering of the characters produced works like this :-

  Type  Sample value  Displays in ASCII
  n8    0x41          'A'
  n16   0x4142        'AB'
  n24   0x414243      'ABC'
  n32   0x41424344    'ABCD'

This can have the side effect that when people design eye-catcher values as numbers to store into memory, they may appear reversed when displayed. In such cases, it might make more sense to decode the field as an N byte ASCII buffer, rather than a number.
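The ordering rule, and one way a reversal can arise, can be seen in this small Python sketch. The helper display_ascii is hypothetical; it simply renders the most significant byte of the value first, as the sample values above do.

```python
def display_ascii(value, bits):
    """Render a numeric value as characters, most significant byte first."""
    return value.to_bytes(bits // 8, "big").decode("ascii")


print(display_ascii(0x4142, 16))      # AB
print(display_ascii(0x41424344, 32))  # ABCD

# The bytes "ABCD" laid out in memory, then fetched as a little-endian n32,
# give the number 0x44434241, which displays reversed.
stored = int.from_bytes(b"ABCD", "little")
print(hex(stored), display_ascii(stored, 32))  # 0x44434241 DCBA
```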

Map definitions

Mappings are BE's equivalent to C enumerated types and bitfield support.

These define a mapping between symbolic names and numeric values. A typical mapping definition in the initialisation file might be :-

  map compression_type
    {
    "uncompressed" 1
    "huffman"      2
    "lzw"          3
    }

If the numeric value on display matches the value given, then it can be converted to the textual description.

Bitfields may be achieved in the following fashion :-

  map pending_events
    {
    "reconfiguration" 0x0001 : 0x0001
    "flush_cache"     0x0002 : 0x0002
    "restart_io"      0x0004 : 0x0004
    }

The : symbol introduces an additional mask. The number to string conversion algorithm inside BE works like this :-

  for each maplet in the map
    if ( value & maplet.mask ) == maplet.value then
      display the maplet.name
  if some unexplained bits left over then
      display the remaining value in hex

The case where the value and following mask are the same is much more common than the case where they are not. So BE provides a typing shortcut where . in the mask means 'the same as the value'. So the above example can be written :-

  map pending_events
    {
    "reconfiguration" 0x0001 : .
    "flush_cache"     0x0002 : .
    "restart_io"      0x0004 : .
    }

It is possible to have multiple field decodes from a single value :-

  map twobitfields
    {
    "green" 0x0001 : 0x000f
    "blue"  0x0002 : 0x000f
    "red"   0x0003 : 0x000f
    "small" 0x0100 : 0x0f00
    "large" 0x0200 : 0x0f00
    }

The value 0x0243 would be converted to red|large|0x40.

It has been alluded to above, that when supplying numeric expressions, the map keyword may also be used. In the following example, the expression evaluates to 0x0105 :-

  map twobitfields "small" + 5

In fact, if there is no constant or symbol with the same name, you can use the following shorthand for the above example :-

  small + 5

Even sophisticated mappings like the following will work as expected :-

  map attribute_byte
    {
    "colour" 0x10 : 0xf0
    "red"    0x13 : 0xff
    "green"  0x14 : 0xff
    "shape"  0x20 : 0xf0
    "round"  0x23 : 0xff
    "square" 0x24 : 0xff
    }

In this example the meaning of the bottom 4 bits is dependent on the value of the top 4 bits. The top 4 bits encode whether the attribute is encoding information about the colour or shape of something, and the bottom 4 bits encode which colour or shape. The value 0x23 is displayed as "shape|round".
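The number to string conversion described earlier can be checked against both the twobitfields and attribute_byte examples with a short Python sketch. The function name decode and the tuple layout are hypothetical, and treating "unexplained bits" as the bits not covered by the mask of any matching maplet is an assumption.

```python
def decode(maplets, value):
    """maplets: list of (name, value, mask) tuples, in definition order."""
    names = []
    explained = 0
    for name, mval, mask in maplets:
        if (value & mask) == mval:
            names.append(name)
            explained |= mask
    leftover = value & ~explained  # assumption: bits no matching mask covers
    if leftover:
        names.append(hex(leftover))
    return "|".join(names)


twobitfields = [
    ("green", 0x0001, 0x000F), ("blue", 0x0002, 0x000F), ("red", 0x0003, 0x000F),
    ("small", 0x0100, 0x0F00), ("large", 0x0200, 0x0F00),
]
attribute_byte = [
    ("colour", 0x10, 0xF0), ("red", 0x13, 0xFF), ("green", 0x14, 0xFF),
    ("shape", 0x20, 0xF0), ("round", 0x23, 0xFF), ("square", 0x24, 0xFF),
]
print(decode(twobitfields, 0x0243))  # red|large|0x40
print(decode(attribute_byte, 0x23))  # shape|round
```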

When displaying a maplet decoded value, the M key can be used to bring up a list of the maplets and whether they decode or not. Through this, the value can be edited.

Structure definitions

Definitions are BE's equivalent to C structures and unions.

Definitions are a list of at OFFSET clauses, align ALIGNMENT clauses and field definitions. When the structure definition is processed, then the current-offset is initialised to 0.

An at OFFSET clause moves the current-offset to the specified numeric value.

An align ALIGNMENT clause moves the current-offset to be the next integer multiple of the specified numeric value.

A field definition defines a field which lives at the current-offset into the structure. After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used, or a union is being defined).

The size of the structure is the largest value that the current-offset ever attains. This is the value returned whenever sizeof DEFN is used as a number.
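The bookkeeping of the current-offset can be sketched in Python as below. The clause tuples and the function name layout are hypothetical, and leaving an already aligned offset unchanged in an align clause is an assumption.

```python
def layout(clauses):
    """clauses: ('at', n), ('align', n) or ('field', name, size_in_bytes)."""
    offset = 0   # the current-offset starts at 0
    size = 0     # largest value the current-offset ever attains
    fields = {}
    for clause in clauses:
        if clause[0] == "at":
            offset = clause[1]
        elif clause[0] == "align":
            # round up to a multiple; an aligned offset stays put (assumption)
            offset = -(-offset // clause[1]) * clause[1]
        else:
            _, name, nbytes = clause
            fields[name] = offset
            offset += nbytes  # the next field follows immediately
        size = max(size, offset)
    return fields, size


print(layout([("field", "a", 1), ("align", 4), ("field", "b", 4),
              ("at", 0), ("field", "c", 2)]))
# ({'a': 0, 'b': 4, 'c': 0}, 8)
```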

Duplicate definitions of the same named definition are not allowed.

A structure definition may have zero or more fields, align ALIGNMENT clauses and/or at OFFSET clauses.

A structure definition may behave like a C struct definition, in that each field follows on from the previous one in memory. Or it may behave like a C union definition, in that all fields overlay each other in memory, and the total size is the size of the largest field.

  def A_STRUCTURE struct
    {
    n32 "first field, bytes 0 to 3"
    n32 "next field, bytes 4 to 7"
      // sizeof A_STRUCTURE is 8
    }
  def A_UNION union
    {
    n32 "first field, bytes 0 to 3"
    n16 "second field, bytes 0 to 1"
      // sizeof A_UNION is 4
    }

The keyword struct is unnecessary, and may be omitted.

These may be combined, like in the following :-

  def MY_COMPLICATED_STRUCTURE
    {
    n32 "first field, occupying bytes 0 to 3"
    union
      {
      n32 "second field, occupying bytes 4 to 7"
      struct
        {
        n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
        n8  "the upper middle byte, occupying byte 6"
        n8  "the top byte, occupying byte 7"
        }
      }
    }

The at OFFSET clause also allows the same areas of a structure to be displayed in more than one way, thus also allowing the implementation of unions :-

  def UNION_THE_HARD_WAY
    {
    n32 le  "first value, bytes 0 to 3"
    at 0 n8 "the lower byte, byte 0   "
      // sizeof UNION_THE_HARD_WAY is 4
    }

Note: in the above style of example, you can't use the offsetof keyword to position a new field on top of an earlier field, because whilst you are defining a structure definition, it isn't actually fully defined yet, and so the offsetof keyword will not be able to find it.

Field definitions

Here are some examples of field definitions :-

  n8 asc "initial"
  n8 buf 20 "surname"
  n16 be unsigned dec "age"
  3 pet "pet names"
  3 n16 be unsigned dec "pet costs"
  2 n32 le unsigned hex ptr person "2 pointers to parents"
  2 n32 ptr person null "2 pointers, null legal"
  person "a person"
  n32 sym code "__main"
  1024 n32 unsigned dec "memory as 32 bit words"
  9 n16 map errorcodes "results"
  buf 100 asc zterm "a C style string"
  GENERIC_POINTER suppress "pointer"
  n32 ptr FRED add -. "link"

Each example is of the form :-

  optional-count type optional-attrs name

The field describes count data items of the specified type. count must be >= 1, and if it is > 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers"). When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in.

The type of the data is one of n8, n16, n24, n32, buf N or DEFN, where DEFN is the name of a previously defined definition. This type may be considered to be the way in which BE is told the size of the data item concerned. n8, n16, n24 and n32 mean 8, 16, 24 or 32 bit numeric data item. buf N means a buffer of N bytes.

The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.

In addition to the data display attribute keywords given above, there is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping MAP, which must have previously been defined.

The ptr DEFN attribute says that the numeric value is in fact a pointer to a definition of type DEFN. DEFN need not be defined yet in the initialisation file. The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to. You can use mult MULT to multiply the pointer value by MULT (therefore mul is effectively the same as mult sizeof DEFN). The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0. The keyword add BASE may be used, and there is also an align ALIGNMENT keyword. ALIGNMENT can only be 1, 2, 4, 8 or 16 in the current implementation. Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value. By using combinations of the pointer keywords, various effects may be achieved :-

n32 ptr DEFN abs
fetch pointer value, and decode DEFN at that address. This case is very common for file format decoding and memory dumps.
n32 ptr DEFN add 0x40000 abs
fetch pointer value, add 0x40000, and decode DEFN at that address. This case can be used to handle multiple memory space problems.
n32 ptr DEFN mul add addr "table" abs
fetch pointer value, multiply by the size of a DEFN, add the address of the table (as determined from the symbol table), and decode the DEFN at that address. This case is typical for when the pointer is in fact a table index.
n32 ptr DEFN rel
fetch pointer value, add address of the pointer itself, and decode the DEFN at that address. When a file consists of a list of variable length structures, where the first field is the size of the structure, this provides a handy way to skip past it to the next.
n32 ptr DEFN add 8 rel
fetch pointer value, add address of the pointer itself, add the numeric value 8 (this can be negative), and decode the DEFN at that address. This case is common for when one structure includes a field which identifies an amount of data to skip before the next structure is seen.
n32 le ptr DEFN abs seg
fetch pointer value (explicitly in little endian order), mangle pointer to account for 16:16 segmented mode, and decode the DEFN at that address.
n8 ptr DEFN add 1 align 4 abs
fetch pointer value, add 1, and round up to the next 4 aligned address, before decoding DEFN at that address. Sometimes data items in files have length fields which need to be rounded up to a multiple of N (typically 2 or 4), before the next data field appears.

The procedure for following pointers is :-

  1. Fetch the pointer's numeric value.
  2. If nonull and pointer is 0, then don't follow the pointer.
  3. If mul, then multiply the pointer value by the size of the item being pointed to.
  4. If mult MULT, then multiply the pointer value by MULT.
  5. If add BASE, then add BASE to the pointer value.
  6. If rel, then add the address of the pointer itself.
  7. If seg, then mangle pointer address to account for the 16:16 segmented mode of x86 processors.
  8. If align ALIGNMENT, then round up pointer to the next multiple of ALIGNMENT.
  9. Decode and display data item at resultant address.

The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom as the offset, and producing a new pointer value which is segment*16+offset. This feature may be of use for decoding large memory model program dumps which have been running on x86 processors running in real mode, or a 16:16 protected mode with a linear selector mapping. Anyone with a sensible file format to decode, or a dump taken from the memory space of a processor of a sensible architecture, can ignore this feature.
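The procedure for following pointers, including the seg arithmetic, can be sketched in Python as follows. The function name resolve and its keyword arguments are hypothetical stand-ins for the pointer attributes; step 1 (fetching the value) is left to the caller, and no wrapping to a fixed address width is performed, which is an assumption.

```python
def resolve(value, ptr_addr, target_size=1, *, null=False, mul=False,
            mult=None, add=0, rel=False, seg=False, align=None):
    """Return the address to decode at, or None if the pointer is not followed."""
    if not null and value == 0:
        return None                          # step 2: nonull and pointer is 0
    if mul:
        value *= target_size                 # step 3: multiply by size of target
    if mult is not None:
        value *= mult                        # step 4: mult MULT
    value += add                             # step 5: add BASE (0 if not given)
    if rel:
        value += ptr_addr                    # step 6: add address of the pointer
    if seg:
        value = (value >> 16) * 16 + (value & 0xFFFF)  # step 7: segment*16+offset
    if align:
        value = -(-value // align) * align   # step 8: round up to ALIGNMENT
    return value                             # step 9: decode here


print(resolve(5, 0, add=1, align=4))           # 8
print(hex(resolve(0x10, 0x100, rel=True)))     # 0x110
print(hex(resolve(0x12340010, 0, seg=True)))   # 0x12350
print(resolve(0, 0))                           # None
```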

The keyword open may be given and this has the effect of increasing the level of detail that is initially displayed. See the description of the level of detail of display feature later in this document. This feature has its problems (bugs), but can be used to ensure that small arrays and short definitions are displayed in full without the user having to manually increase the level of detail.

Also, the suppress keyword may be used. Normally all fields are shown when a definition is being viewed, but some can be marked as suppressed. Fields which are suppressed are shown with their values in round brackets, when you are viewing a definition with a field to a line. When a whole definition is shown on one line (by expanding the level of detail of display), those fields marked with suppress, are not shown.

The tag attribute may be given. When this field is initially displayed, the line will initially be tagged. Typically you might pre-tag one or two specific fields in a structure, if the structure were large, and certain fields were more important than others.

Finally the name of the field must be given. It is usual to pad all field names of the same definition to be the same width with spaces, so that when displayed, everything lines up nicely. BE doesn't automatically work out the widest field name, and pad them all to that, because this would remove the flexibility from the user to have some short and some long field names.

Alignment declarations

Normally, when parsing a structure definition, each field is positioned immediately after the one before (unless the union, align, or at keywords are used).

When BE begins processing the initialisation file, it believes that all n8, n16, n24 and n32 variables should be aligned on a 1 byte boundary. In other words, no special alignment is to be automatically performed.

This is radically different from the way the high level languages such as C lay out the fields within their structures and unions. These languages enforce constraints such as '32 bit integers are aligned on 4 byte boundaries'. This is usually done because certain processor architectures either can't access certain sizes of data from odd alignments, or are slower doing so. This can be accounted for by manually adding padding to structure definitions :-

  def ALIGNED_USING_MANUAL_PADDING
    {
    n8 "fred"
    buf 3 "padding to align bill on a 4 byte boundary"
    n32 "bill"
    }

Or alternatively, the align keyword could be used :-

  def ALIGN_USING_align_KEYWORD
    {
    n8 "fred"
    align 4
    n32 "bill"
    }

It is possible to tell BE to automatically align n8, n16, n24 or n32 fields on specific byte (offset) boundaries by constructs such as the following (which corresponds to many 32 bit C compilers) :-

  align n16 2
  align n32 4

  def ALIGNED_AUTOMATICALLY
    {
    n8 "fred"
    n32 "bill"
    }

Clearly, this feature is more useful when BE is being used to probe memory spaces of running programs via a memory extension, or doing post-mortem examination of program dumps.

Most data file formats don't-need-to and/or don't-bother-to align their fields.
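The difference between the default 1 byte alignment and the automatic alignment declared with align n16 2 and align n32 4 can be illustrated with this Python sketch. The function name auto_layout is hypothetical, and the sketch assumes the definition's size is not padded at the end.

```python
SIZES = {"n8": 1, "n16": 2, "n24": 3, "n32": 4}


def auto_layout(fields, alignment):
    """fields: (type, name) pairs; alignment: type -> boundary (default 1)."""
    offset = 0
    offsets = {}
    for ftype, name in fields:
        boundary = alignment.get(ftype, 1)
        offset = -(-offset // boundary) * boundary  # round up to the boundary
        offsets[name] = offset
        offset += SIZES[ftype]
    return offsets, offset


fields = [("n8", "fred"), ("n32", "bill")]
print(auto_layout(fields, {"n16": 2, "n32": 4}))  # ({'fred': 0, 'bill': 4}, 8)
print(auto_layout(fields, {}))                    # ({'fred': 0, 'bill': 1}, 5)
```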

Include directives

The initialisation file can contain the following, as long as it is outside of any other definition :-

  include "anotherfile.ini"

Be sure to notice that this is an initialisation language command, not a pre-processor directive like $ifdef. This is why it is not $include.

There is also a tryinclude variant, which tries to open the file specified, but does not get upset if it can't :-

  tryinclude "extrastuff.ini"

Reserved words

The following are reserved words, and so should be avoided as names of constants in the initialisation file :-

  abs add addr align asc at be bin buf code dec def ebc hex include le
  lj map mul mult n16 n24 n32 n8 nocode nolj nomul nonull noseg nozterm
  null oct offsetof open ptr rel seg set signed sizeof struct suppress
  sym tag tryinclude union unset unsigned zterm

A sample initialisation file

Here is a snippet from a real initialisation file :-

le unsigned hex abs // set defaults, just to be sure
lj // allow ARM specific symbolic lookup of code addresses

map DE_
  {
  "DP_Pending" -1
  "DS_Success"  0
  "DE_Failure"  1
  }

def DPB
  {
  n32 ptr DPB       "DPB_Next   " // Link to the next one
  n32 sym code      "DPB_Address"
  n8 map DC_        "DPB_Number "
  n8                "DPB_Flag2  "
  n8 map SY_        "DPB_Flag   "
  n8 signed map DE_ "DPB_Dsb    "
  }

def NOP
  {
  DPB     "NOP_Header"
  n8      "NOP_Spare1"
  n8      "NOP_Spare2"
  n8      "NOP_Spare3"
  n8 dec  "NOP_Period"
  n32 dec "NOP_Value "
  }

def main // the entire memory map
  {
  at addr "noptable"   100 NOP     "noptable  "
  at addr "currentdpb" n32 ptr DPB "currentdpb"
  }

In the above example, note how the DPB_Next field points to another DPB. As this is the first field, it will be the selected one when the DPB is first shown. Thus, if they are strung together in a linked list, it can be a simple matter of pressing Enter to step to the next element of the linked list.

Sometimes, if the 'pointer to the next' field is not the first, people code the following type of definition :-

def BLOB
  {
  at 4 // Goto where link is
  n16 ptr BLOB rel add -. "BLOB_Next      " // Note . == 4
  at 0 // Go back to top

  n16 hex                 "BLOB_FirstWord "
  n16 hex                 "BLOB_SecondWord"
  at 6 // skip link, we've already shown at the top
  n16 hex                 "BLOB_FourthWord"
  buf 512-. hex           "BLOB_PadToBlock"
  }

Although messy, this re-ordering can make traversal of long linked lists significantly faster.

This technique falls over when very long linked lists are traversed, because you must manually select the link field and press Enter to go to the next linked list element. This can be time consuming. Also, each level of nesting consumes a non-trivial amount of memory.

The solution which more effectively handles linked lists of small or large lengths is the use of the show a list mechanism, which is described later.

The supplied initialisation file

The supplied initialisation file contains enough definitions to enable you to examine the contents of many file formats.

Bitmap files supported include :-

Animation formats :-

Also, the following miscellaneous file formats :-

The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files' contents, but are merely intended to aid in the browsing of the contents of such files.

Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.

If you are simply interested in looking at some of the file raw, you can use the DB, DW and DD definitions that come supplied in the default initialisation file. If you wanted to look at memory at 0x8000 as dwords, you could type :-

  @ DD Enter 0x8000 Enter Enter

Using the editor

BE displays most of the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.

BE works by presenting lists to the user. These can be lists of data fields, lists of array elements etc. A user action can result in a new list being displayed on top of the previous one. Effectively, there is a 'stack' of lists, where you always get to see the topmost one. The level of nesting is always on display at the top right hand corner of the screen.

General keys

Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home, End, Left and Right all work in the obvious ways, traversing the list on display. The Wordstar keys ^E, ^X, ^R, ^C, ^W, ^Z, ^S and ^D also work.

As you move around the current list, your line number and total number of lines in the list are shown on the top right of the screen in the form line/totallines.

The user can discard the current list, and go back to the previous one by pressing Esc.

q or @X (ie: Alt+X) exits the program. If you have made any changes, you will be prompted as to whether BE should write them out to disk. @W can write out any unsaved changes.

p allows you to 'print' the list on display to a file. You can specify the filename, and whether to append to or overwrite any existing file of that name. Non-printable (but displayable) characters get converted to '.' dots.

f or / or F9 allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n or F10 can be used to repeat the last find. If a find is taking a long time, it may be interrupted using Ctrl+Break on OS/2 and Windows. On AIX, the Esc key may be used. The \ key will reverse the direction of the find, ready for when you next use the 'repeat the last find' function.

i allows you to generate a new list, which only has lines which include a pattern you specify. This new list pops-up on top of the current one. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you to generate a display which excludes lines which match the pattern.

S can be used to generate a new list which is the same as the current list, except the lines are sorted. You are prompted for a 'sort after' pattern, and as to whether the result is to be sorted in ascending or descending order. Anything on each line, up to and including the 'sort after' pattern is ignored for the purpose of the sort.

The find, include and exclude commands normally do a straight case sensitive textual comparison. The editor can be toggled in and out of Extended Regular Expression mode (as in UNIX egrep), using the @R key. When set into this mode, future finds, includes and excludes all work with extended regular expressions. eg: include (fred|bill)[0-9]+ will include all lines with 'fred' or 'bill', followed by one or more digits.
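The effect of include and exclude with this pattern can be sketched in Python. This only models the behaviour described above; the sample lines are made up, and Python's re syntax is not identical to POSIX Extended Regular Expressions, although this particular pattern is valid in both.

```python
import re

# Made-up lines standing in for the list on display.
lines = [
    "event 1 module fred12 started",
    "event 2 module jim7 started",
    "event 3 module bill3 stopped",
]

pattern = re.compile(r"(fred|bill)[0-9]+")

# 'i' keeps the lines which match, 'x' keeps the lines which do not.
included = [ln for ln in lines if pattern.search(ln)]
excluded = [ln for ln in lines if not pattern.search(ln)]

print(included)
print(excluded)
```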

Similarly, @I can be used to toggle in and out of case sensitive search mode.

The Extended Regular Expression mode and case sensitivity mode also affect the sort command. The sort command and the use of Extended Regular Expression mode go naturally hand in hand, because you often want to be able to sort upon the Nth field of each line. It is trivial to write an ERE like ,[^,]*, which matches the first pair of commas (so the sort can be done on the third field), or 0x[0-9a-f]+ which matches the first hex number.
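A minimal Python sketch of a 'sort after' using the ,[^,]*, pattern follows. The function name sort_after and the sample rows are invented, and how BE treats lines in which the pattern is not found is not stated here, so the sketch simply compares such lines in full.

```python
import re

def sort_after(lines, pattern, descending=False):
    """Sort lines, ignoring everything up to and including the first match."""
    rx = re.compile(pattern)

    def key(line):
        m = rx.search(line)
        # Assumption: a line without a match is compared as a whole.
        return line[m.end():] if m else line

    return sorted(lines, key=key, reverse=descending)

rows = ["a,zz,3,x", "b,yy,1,y", "c,xx,2,z"]
result = sort_after(rows, r",[^,]*,")
print(result)
```

Each row is ordered by what follows its first pair of commas, ie: by its third field.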

The Extended Regular Expression mode and case sensitivity mode also affect the 'power address slide' patterns, and the tag/untag all matching commands, as explained later.

The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If a memory extension providing data to BE was caching data, this type of refresh causes it to drop its cache. Sometimes BE is used with an extension to watch live real-time data, and continual refresh is desired. By pressing the periodic update key, @U, you can put BE into a mode whereby it refreshes at regular intervals. The interval is user-selectable. You exit this mode using Ctrl+Break on OS/2 or Windows, or by using Esc on AIX.

Tags may be placed or removed within the list on display by pressing the @T key. You may quickly move backwards or forwards between tags by pressing ^Home or ^End. Tags appear as little 'T's on the right hand side of the line. Placing or removing tags in one session or list has no effect on any others.

T and U may be used to tag or untag all lines matching a given pattern or extended regular expression.

The ! key may be used to execute an operating system command. This capability can be disabled by the -r command line flag.

@V can be used to bring up a view of a regular text file. There is no text editing capability. As special cases, F1 tries to bring up the help file, and F2 tries to bring up the configuration file.

BE doesn't just maintain a single stack of lists. In fact it maintains 10 parallel stacks, or 'sessions'. You can jump between them using the @0, @1, ... @9 keys. This allows you to be looking at several places within your data at once, and to be able to easily hop between them. The current session number is the second from last number on display on the top right corner of the screen. It is initially 1.

@C copies the stack of lists from the previous session onto the current session. Typically you use this when you've found something interesting, and you'd like to leave the current session showing the interesting data, and yet you'd also like to continue investigations around that area.

Given there are 10 sessions, each with any amount of nesting, it can be easy to get lost, so the @K key allows you to generate a summary of where you are in each session.

@Z may be used to pop off all the lists in the current session, and effectively reset the nesting level to 1.

@F1 to @F4 inclusive may be used to change the colour scheme to scheme 0 to 3, as initially specified by the -c command line argument, or as initially defaulting to 0.

The keys A,O,L,I toggle the display of addresses, offsets, lengths and array indices. @A, @E, @B, @O, @D and @H may be used to set the display mode of the array indices to ASCII, EBCDIC, binary, octal, decimal or hex. Also, @Y toggles the display of addresses between raw hex, and symbol table entry and offset. The @J command toggles the display of symbolic code addresses which have the lj attribute between the short and long forms. By default, at startup, BE chooses only to show array indices, the array index mode is hex, addresses are not shown symbolically, and long jumps are shown in their short form. The -v command line flag can also be used to change the startup display flags.

The | (pipe-bar) key toggles the display of pipe bars between flags in a mapping. This is typically only used when a mapping has been cleverly defined to do something like RISC instruction set disassembly, to tidy up the display.

Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then pop-up a new list, decoding the memory at the given address as if it were of the specified structure type.

The C key allows you to disassemble from a given address, assuming a disassembler extension has been supplied to BE via the -C command line argument.

D can be used to pass user-options through to the disassembler.

Initially, if a symbol table is supplied to BE, disassembly stops when the symbol for the address (as in symbol+offset) changes, ie: BE avoids disassembling more than one function. Although one compiled C function typically has one label, hand written assembler tends to have many labels within one function, so the Y key can toggle between stopping on label changes and ignoring them.

The @F key pops up a list of the memory sections BE is editing. There is one for each file (or memory extension invocation) currently being edited. Against each, BE says whether it has any unsaved changes.

The editor holds a list of 12 'address slide' patterns, and these may be displayed by pressing @M. These are used when the 'power address slide' feature is used. You can set one of the 12 patterns by using the ~F1 to ~F12 keys. To disable one, you specify a new pattern as an empty string.

The editor holds an 'address slide' delta value. Initially this delta value is 4, but it may be changed using the # key. When using #, dot '.' may be used in the numeric expression, and its current value is the current delta value. This delta value is used by the manual 'address slide' feature using the < and > keys, and also the 'power address slide' feature.

If you press ?, BE will prompt for a numeric expression, which it will then evaluate. It will show the result in unsigned decimal, signed decimal and unsigned hex.

When you use the ^L key, you are prompted for a count and a keystroke. BE presses the keystroke on the current line, and then steps down a line. It does this once for each of the count of lines you specified. The count value can be 0 or blank, meaning up to the end of the list on display. This keypress, step down and repeat loop, will stop if the keypress is not 'understood' by the line it is pressed on. This means that only keypresses which operate on a given line are sensible for using with ^L. It will also stop if the end of the list is reached.

@G can be used to go to the Nth line on display. 0 means the first line; a blank line number or a very large number means the very last line.

When viewing data

At any given time you may be displaying some data from some start address, as indicated on the title at the top of the screen.

The . key can be used to change the current address, and the , key can be used to add to the current address.

The editor provides a feature known as 'address sliding'.

You can use the ( and ) keys to step (slide) the address backwards or forwards by 1.

You can also use the < and > keys to step (slide) the address backwards or forwards by a particular delta (as setup by the # key, described above).

The 'power address slide' feature is the combination of regular 'address sliding' with a pattern match capability. You set up the power address slide patterns and then press [ or ] (for a backwards or forwards search). You then state whether one, all, or all-in-order of the patterns must match, and how to refresh the screen as the search proceeds. You're also prompted for an address to stop at. BE then slides through memory, checking to see whether the patterns can be matched with the screen, and if so it stops.

A 'power address slide' may be interrupted via Ctrl+Break (OS/2 and Windows), or Esc (AIX).
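The idea behind a power address slide can be sketched in Python as follows. This is only a model of the description above, not BE's implementation: render is a hypothetical stand-in for BE redrawing the screen at an address, only the 'one' and 'all' match modes are shown (the 'all-in-order' mode is left out), and the memory contents are made up.

```python
import re
import struct

def render(data, addr):
    """Stand-in for BE's display: show four bytes at addr as one hex dword line."""
    (value,) = struct.unpack_from("<I", data, addr)
    return ["value 0x%08x" % value]

def power_slide(data, start, stop, delta, patterns, need_all=False):
    """Step addr from start towards stop by delta until the patterns match."""
    compiled = [re.compile(p) for p in patterns]
    addr = start
    while 0 <= addr <= len(data) - 4 and addr != stop:
        screen = render(data, addr)
        # For each pattern, does it match any line currently on the 'screen'?
        hits = [any(rx.search(line) for line in screen) for rx in compiled]
        if (all(hits) if need_all else any(hits)):
            return addr
        addr += delta
    return None  # reached the stop address or ran off the data

memory = bytes(8) + struct.pack("<I", 0xCAFE0001) + bytes(4)
found = power_slide(memory, start=0, stop=len(memory), delta=4, patterns=[r"0xcafe"])
print(found)
```

A negative delta gives the backwards search. The default delta of 4 is used here, so only 4 byte aligned addresses are tried.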

There are a few main uses of address sliding :-

  1. You know the rough address at which a particular structure is, so you use the keys to step through memory until the display changes from a structure that looks obviously wrong, to one which looks possibly right.
  2. You wish to browse memory hex style, perhaps by using the DD definition in the default initialisation file. You set the delta value to be a page worth of data, and then use the < and > keys to page up and down.
  3. You have an array of a large number of elements, each of which is a structure definition. You display the first one, and then set the delta to be the size of the element. Then < and > can be used to rapidly step from element to element.
  4. You use the power address sliding feature to locate a structure in the file or memory space.

The justification for the default delta of 4 is that many structures within processor memory spaces or within files are 4 byte aligned.

The @ command described earlier works a little better when you are viewing data, because a dot used in the numeric address expression is taken to mean the current address (as shown on the title).

Similarly, the C command described earlier works a little better when you are viewing data, because a dot used in the numeric address expression is taken to mean the current address (as shown on the title).

Often you may find yourself looking at a definition that is actually a member of a larger definition. If you know the offset of the smaller definition in the larger definition, you can subtract this from the current address and display the larger parent definition. This can be awkward, so the @P key will pop-up a list of all possible parent definitions, with an entry for every time the smaller definition appears in another definition.

Manipulating the current datum

g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.

s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.
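The effect of the endianness and signedness choices on the same four bytes can be seen with a few lines of Python. The byte values are made up for the illustration; int.from_bytes is the standard Python call.

```python
raw = bytes([0x12, 0x34, 0xFF, 0xFE])

# Little endian: the first byte is the least significant.
little_unsigned = int.from_bytes(raw, "little")
little_signed = int.from_bytes(raw, "little", signed=True)

# Big endian: the first byte is the most significant.
big_unsigned = int.from_bytes(raw, "big")
big_signed = int.from_bytes(raw, "big", signed=True)

print(hex(little_unsigned), little_signed)
print(hex(big_unsigned), big_signed)
```

Read little endian, the top bit of the value is set, so the signed interpretation is negative. Read big endian, it is not, and both interpretations agree.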

A subset of the keys a/e/b/o/d/h/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, symbolic or via a mapping table.

z is displayed if you are allowed to toggle the 'stop displaying when a nul terminator is found' attribute.

The t key will decode the current field as if it were raw ASCII text, and will break it up into lines upon CR, LF or CR-LF pair boundaries. The new line-by-line list pops-up on top of the current list.
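The line splitting rule can be sketched in Python. The function name split_lines is invented, and the handling of bytes outside the ASCII range is an assumption (they are replaced rather than rejected).

```python
import re

def split_lines(raw):
    """Break raw bytes into lines on CR-LF pairs, lone CR or lone LF."""
    text = raw.decode("ascii", errors="replace")
    # CR-LF is listed first so that a pair counts as one boundary, not two.
    return re.split(r"\r\n|\r|\n", text)

parts = split_lines(b"one\r\ntwo\nthree\rfour")
print(parts)
```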

If the datum is a code address (marked with the code attribute in the initialisation file), then c can disassemble the code at that address.

+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.

Increasing the level of display can make BE open up an array, and enumerate the elements. eg: 3 n32 to [123,123,456].

Increasing the level of display can also make BE open up a definition, and display the fields. eg: VAR to {"name",123}.

This is capable of opening up the datastructure pointed to by a pointer, providing the pointer may be fetched and followed.

Some examples :-

  level 0 (=type)  level 1       level 2            level 3
  n32              7             7                  7
  3 n32            3 n32         [8,9,10]           [8,9,10]
  VAR              VAR           {"a",1}            {"a",1}
  2 VAR            2 VAR         [VAR,VAR]          [{"b",2},{"c",3}]
  n16 ptr VAR      22->VAR       22->{"d",4}        22->{"d",4}
  2 n8 ptr VAR     2 n8 ptr VAR  [33->VAR,44->VAR]  [33->{"e",5},44->{"f",6}]
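One way to read these examples is as a recursive rule: values show themselves from level 1, pointers show their address and then the target at the same level, and arrays and definitions open up from level 2, showing their contents one level lower. The Python sketch below implements that reading. It is a model inferred from the examples, not BE's actual algorithm, and the tuple representation and the names show and var are invented.

```python
# Hypothetical model of a datum:
#   ("val", type_name, text)             a plain value, already formatted
#   ("array", type_name, [items])        an array of data
#   ("struct", type_name, [fields])      a definition
#   ("ptr", type_name, address, target)  a pointer which can be followed

def show(item, level):
    kind, type_name = item[0], item[1]
    if level == 0:
        return type_name                    # level 0 shows the type only
    if kind == "val":
        return item[2]
    if kind == "ptr":
        # the address, then the pointed-to datum at the same level
        return "%d->%s" % (item[2], show(item[3], level))
    if level == 1:
        return type_name                    # arrays and definitions stay closed
    inner = ",".join(show(sub, level - 1) for sub in item[2])
    return ("[%s]" if kind == "array" else "{%s}") % inner

def var(name, number):
    """Build a VAR definition with a name and a number, as in the examples."""
    return ("struct", "VAR", [("val", "buf", '"%s"' % name),
                              ("val", "n32", str(number))])

three = ("array", "3 n32", [("val", "n32", str(n)) for n in (8, 9, 10)])
pointers = ("array", "2 n8 ptr VAR",
            [("ptr", "n8 ptr VAR", 33, var("e", 5)),
             ("ptr", "n8 ptr VAR", 44, var("f", 6))])

for level in range(4):
    print(level, show(three, level), show(pointers, level))
```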

Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. This results in a new list of fields or array elements being popped-up. The Esc key brings you back to where you are now.

There is a shorthand for the above @ command. If you are on a numeric field, and you know this is an absolute pointer to a structure definition, you can use the follow pointer key *. BE will then prompt for the definition name. This shortcut ignores any pointer information that may be deducible from the value on display, so even if you are looking at a relative pointer which is aligned, BE will decode a definition at an absolute address.

The editor provides the @L key, which makes the job of following long linked lists especially easy. If you are looking at the members of a definition, and are on a member which is in fact a pointer to the same type of definition, then you can use the @L (show list) key. You will be presented with the elements in the linked list (at least the first 4000), and at the end the reason the link following ended. This reason can be that there are too many to show at once, 'can't fetch value', 'can't follow null pointer', or the list has 'looped back' to an element shown earlier. If your list is really long, you can always go to the last linked list element on display, select it, and then use the @L key again to get the next 4000 elements!
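The link following and its stopping reasons can be sketched in Python. This is a model of the behaviour described above, not BE's code: the function follow_list is invented, the link is assumed to be a little endian 32 bit absolute pointer, and the order in which the stopping conditions are checked is a guess.

```python
import struct

LIMIT = 4000  # the number of elements the text says are shown at once

def follow_list(data, addr, link_offset=0, limit=LIMIT):
    """Return (addresses visited, reason the walk stopped)."""
    seen = []
    visited = set()
    while True:
        if addr == 0:
            return seen, "can't follow null pointer"
        if addr in visited:
            return seen, "looped back"
        if len(seen) >= limit:
            return seen, "too many to show at once"
        if addr < 0 or addr + link_offset + 4 > len(data):
            return seen, "can't fetch value"
        seen.append(addr)
        visited.add(addr)
        # Fetch the link field of this element to find the next one.
        (addr,) = struct.unpack_from("<I", data, addr + link_offset)

# Made-up memory: elements at 4, 12 and 20, with the last linking back to 4.
memory = bytearray(32)
struct.pack_into("<I", memory, 4, 12)
struct.pack_into("<I", memory, 12, 20)
struct.pack_into("<I", memory, 20, 4)
chain, reason = follow_list(memory, 4)
print(chain, reason)
```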

The = key may be used to edit the current field on display.

If the current field is a numeric value, then you can type a new expression, according to the rules for numbers and expressions used when parsing the initialisation file. Dot '.' evaluates to the field's current numeric value. Examples include :-

  1
  1+2
  addr "symbol"
  sizeof RGBTRIPLE
  map FF_ "FF_Split" | 0x20

If the current field is displayed via a mapping table, then the M key can be used to bring up a list of the maplets, and whether each of them can be decoded from the numeric value. The current field's value can be edited from this new list. Esc quits the maplet list.

If the current field is a buffer, then either ASCII data or raw hex bytes may be supplied :-

  "a string within quotes"
  @1234FF00

If the zterm attribute is applicable to the current field, then after the data is stored, a NUL terminator is appended.

The @S key toggles the suppress attribute of the current datum. This affects how the current structure shall be displayed, when displayed in short. The @N key unconditionally sets the suppress attribute of the current datum. Only non-suppressed fields are shown in the one line summary.

Del and Ins can be used to copy and paste between the current datum and a memory clipboard or file. To use the memory clipboard, simply specify a blank filename when prompted. Only smallish blocks of data (<=1MB) can be copied or pasted. The amount of data transferred is always the minimum of the datum size, the clipboard size and 1MB.

The external edit key, E, works by prompting you for an editor command. It then saves away the current datum into a temporary file and invokes the editor on it. Afterwards, the file contents are re-read. At most 1MB can be processed in this way. This might be useful if a file contained a chunk of free-flow text, and you wished to perform some complicated editing on it, involving inserting and deleting - you could externally edit that chunk using a text editor. Or, sometimes when editing binary data, you might like to see it in a typical hex dump and edit raw hex - you can externally edit with a normal hex editor. This command doesn't work if BE is running in restricted mode, ie: has been invoked with the -r command line argument.

Z will zero the current datum.

When on a maplet list

Each possible maplet in the mapping is displayed in the list. Each maplet has a mask and value, and the maplet is deemed to match if :-

  value & maplet.mask = maplet.value

In this case a 1 is displayed next to it, otherwise a 0 is shown.

If you press 0 then the value is anded with the complement of the mask.

If you press 1 then the value is anded with the complement of the mask, and then the maplet value is or-ed in.

Although this may seem strange, the net effect is that when maps are being used for enumerations, 1 will change the value from whatever it was before to the new desired value.

When the mapping is used for decoding bitfields, 1 will turn on a bit and 0 will turn it off.

Examples of enumeration and bitfield style mappings :-

  map ENUMERATION              map BITFIELD
      {                            {
      "first value"  1             "lowest bit" 0x01 : 0x01
      "second value" 2             "next bit"   0x02 : 0x02
      "third value"  3             "high bit"   0x80 : 0x80
      }                            }
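The match test and the effect of the 0 and 1 keys can be written out in Python. The function names are invented. For the enumeration case, the sketch assumes the maplet's mask covers the whole field (taken here to be 8 bits), which the text above does not state.

```python
def matches(value, mask, mval):
    """A maplet matches when the masked field value equals the maplet value."""
    return (value & mask) == mval

def press_0(value, mask):
    """Clear the bits covered by the maplet's mask."""
    return value & ~mask

def press_1(value, mask, mval):
    """Clear the masked bits, then or in the maplet's value."""
    return (value & ~mask) | mval

# Bitfield style, using "next bit" and "high bit" from the example above.
flags = 0x81
flags = press_1(flags, 0x02, 0x02)   # turns on the next bit
flags = press_0(flags, 0x80)         # turns off the high bit

# Enumeration style, assuming a mask covering the whole 8 bit field.
ENUM_MASK = 0xFF
state = 1                            # "first value"
state = press_1(state, ENUM_MASK, 3) # replaced by "third value"

print(hex(flags), state)
```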

When on a line of code

If the current line of code references another routine or code address, c can be used to pop up another list of the referenced routine.

Similarly, if data is referenced, and the address is easily determinable by the disassembler, the * can be used to follow a pointer and display a structure at that address.

When on a memory section

W can be used to write back any unsaved changes on the current memory section. This isn't normally necessary, as when you leave BE using q or @X, you are prompted as to whether you wish to save any unsaved changes on a memory section by memory section basis.

o can be used to pass a user supplied option string to the memory extension piece of code providing the memory section. The memory extension is given the memory section instance and the option string. It can parse the option string in any way it sees fit. If there is a syntax error, or other problem, it can fail the options command with an error message to say why. If a memory section is provided from a file, this command will fail (files have no options). This user-exit mechanism might be used to allow you to tell a memory extension to change how much caching it can do.

When on a power address slide pattern

These are shown in the list brought up by the @M key, as described earlier.

It is a list of 12 entries, each of which may be disabled, a pattern or an Extended Regular Expression.

You can set one of the entries using the = key. This is the same as using ~F1 to ~F12.

Many of the keystrokes listed above were chosen so as to match the default key bindings of Andys Source Code Folding Editor (AE).

Although OS/2, Windows NT and AIX machines are able to support Alt keys, not all UNIXes are. If and when I compile for other UNIXes, the Alt keys may not be accessible, or their functions may be obtained by other means.

Memory extensions

The binary file arguments to BE are normally of the form :-

  filename[@address]

This tells BE to load the file and whenever data at a memory address from address to address+filelength is accessed, to supply the data from the file.
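The mapping from memory addresses onto file contents can be pictured with a short Python sketch. The function fetch and the section contents are invented, and the sketch assumes the range covered by a file runs from address up to, but not including, address+filelength.

```python
def fetch(sections, addr):
    """Return the byte at addr from whichever section covers it, else None."""
    for base, data in sections:
        if base <= addr < base + len(data):
            return data[addr - base]
    return None

# Two pretend files, as if given as first.bin@0x1000 and second.bin@0x8000
sections = [(0x1000, b"\x11\x22\x33"), (0x8000, b"\xaa\xbb")]

print(fetch(sections, 0x1001), fetch(sections, 0x8001), fetch(sections, 0x2000))
```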

However, it is possible to supply binary file arguments of the form :-

  extension!args[@address]

Memory extensions may be written to provide either read-only, or read-write access to their data.

Under OS/2, BE will ensure that BEextension.DLL is loaded. This DLL should be on the LIBPATH and should contain certain entrypoints which will be used by BE. BE then passes the args and address to the memory extension DLL, which does something of its own choosing with them. The memory extension DLL can then supply data to BE on request.

Under Windows, provision for memory extension DLLs also exists. The DLL is located according to the algorithm used by the Win32 LoadLibrary API.

Under AIX, memory extensions may be provided as shared libraries. They are located by following the PATH environment variable, and are named beextension.

One use of this is the provision of a memory extension for handling files too massive to load into memory all at once. The memory extension opens a file handle and reads bytes demanded by BE upon request. This memory extension is provided in BEBIG.DLL or bebig, and the user can type :-

  be big!verybigfile.dat

It ought to be noted that the author regularly uses BE on files of several megabytes in size, without a problem.
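The idea of supplying bytes on demand, rather than loading the whole file, can be sketched in Python. This is not the interface of a real memory extension (that is documented in bememext.h): the class OnDemandFile and its methods are invented, and a small temporary file stands in for the very big one.

```python
import os
import tempfile

class OnDemandFile:
    """Serve bytes from a file without loading it all into memory."""

    def __init__(self, path, base=0):
        self.handle = open(path, "rb")
        self.base = base
        self.size = os.path.getsize(path)

    def read(self, addr, length):
        """Return up to length bytes starting at memory address addr."""
        offset = addr - self.base
        if offset < 0 or offset >= self.size:
            return b""
        self.handle.seek(offset)
        return self.handle.read(length)

    def close(self):
        self.handle.close()

# Demonstration with a small temporary file standing in for verybigfile.dat
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
section = OnDemandFile(tmp.name, base=0x1000)
chunk = section.read(0x1010, 4)
section.close()
os.remove(tmp.name)
print(chunk)
```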

Another use is in the live debugging of running adapter cards. The memory extension can provide data bytes directly from the memory space of the adapter. args could be used to identify the slot the adapter is in. Alternatively, args could identify IO base addresses, memory window addresses, or a device driver to use to access the data. Memory extensions which do this do exist, and they almost turn BE into a debugger (almost, because there is no run, stop, or single step). Run, stop and single step of an adapter could be driven by the options mechanism, if that were possible and/or desired. When using these, a customised initialisation file is typically also used, which understands all the structure definitions and variables used in the microcode on the adapter.

Yet another use might be providing BE with access to physical or virtual or process specific linear address spaces, perhaps via the use of a device driver. Shared memory windows might give addressability of datastructures in other programs.

Also, the surface of a disk or block device can be made accessible via a memory extension. Again, a memory extension which does this does exist (but it uses a non-standard mechanism for accessing the disk blocks).

Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.

BEGBM is an extension which uses the Generalised Bitmap Module library to load a bitmap file, and presents the bitmap header, palette and bits for BE to edit.

The file bememext.h documents the extension interface. Currently extensions may be built for the OS/2 version of BE using the IBM VisualAge C++ compiler (with fixpaks CTC306, CTD302 and CTU304, or later recommended), the Win32 version of BE using MS Visual C++ (version 4.2 recommended), or the AIX version of BE using the IBM xlC++ compiler.

I anticipate learning about shared library support on the various different types of UNIX, enabling similar tricks to be performed there. Apparently this area is becoming more standardised, with the new dlopen, dlsym and dlclose entrypoints.

Disassembler extensions

The -C dx command line argument is a way of telling BE to load and use a disassembler extension for displaying any code in the data.

The same rules for naming and locating disassembly extensions apply, as for memory extensions.

eg: If you have an Intel 8086 disassembler, you could type :-

  be -C 8086 dump.ram

This assumes that under OS/2 or Windows, the disassembler is provided by the file BE8086.DLL, or via be8086 under AIX.

The file bedisext.h documents the extension interface.

Flushing

When editing files, changes to the data are recorded in memory. When BE is closed down, it attempts to write any changes back into the disk files where the data originally came from. BE will prompt you as to whether to save the changes back to disk.

If a memory extension is providing the data to BE for display, and the memory extension supports modification of the data, it has a choice :-

As most memory extensions provide a live view of some real-time data, they tend to opt for the first choice.

Installation

BE is most easily obtainable over the Internet via the links on my home page(s) :-

The usual supplied be.zip file should be expanded using unzip be or pkunzip -d be on an OS/2 or Windows machine.

Installing BE for OS/2

  1. Copy be_os2.exe to be.exe, somewhere on the path.
  2. Copy be.ini to the same directory as be.exe so it can be found.
  3. Optionally copy be.hlp to the same directory as be.exe so it can be found.
  4. Optionally copy be.htm to wherever you keep documentation.
  5. Optionally copy be.ico to the same directory as be.exe. This allows BE to have a cute icon when running in the Workplace shell.
  6. Optionally create a Workplace Shell Program Object(s) that references the BE executable. The working directory should be the directory where be.ini can be found.

Installing BE for Windows NT

  1. Copy be_win.exe to be.exe, somewhere on the path.
  2. Copy be.ini to the same directory as be.exe so it can be found.
  3. Optionally copy be.hlp to the same directory as be.exe so it can be found.
  4. Optionally copy be.htm to wherever you keep documentation.

Installing BE for UNIX, ie: AIX

  1. Copy the be executable to somewhere like /usr/bin or ~/bin, or wherever on the path you consider appropriate.
  2. Either copy the be.ini to the same directory as be so it can be found, or copy it to .berc in your home directory. BE uses your local initialisation file in preference to the common one.
  3. Optionally copy be.hlp to the same directory as be so it can be found.
  4. Optionally copy be.htm to wherever you keep documentation.

On AIX, because BE is a curses program, best keyboard and colour support is obtained by using an aixterm, or by logging in from OS/2 using HFTTERM.EXE. It should be noted that HFTTERM.EXE appears to have a bug whereby it doesn't generate the correct datastream for the @9 and @0 keystrokes.

Unfortunately I don't have continual access to all the platforms, so improvements in one version may not yet be reflected into the others.

Glossary

Address slide.
When displaying data, BE is showing a particular definition at a given address. This address is shown on the title. Address sliding is a mechanism whereby this address may be advanced forwards or backwards.
Alignment.
Certain data items are required to exist at addresses which are multiples of 2, 4 or other numbers. This is often because certain processor architectures run slower accessing mis-aligned data, or are unable to do so.
Caching.
Caching is the practice of keeping a local copy of (less easily accessible) data, for speedier access. For example, when BE uses a memory extension as a means of editing some data not in a file, the memory extension may cache some of the data in memory. If the user does a full refresh (using the R key), this cached data is discarded, so any data which is subsequently displayed definitely comes from the actual data, rather than the cached copy. Also, when the user uses BE to modify data, the data in the cache may be updated, and the real data may not immediately be updated. If the user flushes the data, any pending changes (in the cache) are written back into the real data.
Current offset.
As a definition is being defined, the current offset indicates the byte offset within it that the next field will be placed. Typically in a C structure, each field immediately follows the previous field (subject to alignment restrictions). In a C union, all the fields can overlay each other, sharing the same offset. BE's definitions are flexible enough to handle all these cases.
Data display attributes.
Each data field on display has some data display attributes which govern the way in which the field's data is fetched from memory (ie: the endianness), and the way it is displayed.
Definition.
A definition is like a C structure or union definition. It is made up of a number of fields. A definition is defined via the def keyword in the initialisation file.
Disassembler extension.
A BE disassembler extension is a piece of (possibly user written) code which BE can call upon to disassemble raw bytes of data into some instruction set. Typically disassembler extensions exist as DLLs or shared libraries.
Endianness.
Multibyte numeric values can be stored within the data with either the most or the least significant byte first. If the least significant byte is first, then the data is typically referred to as little endian, or in the Intel byte order. If the most significant byte is first, then the data is typically referred to as big endian, or in the Motorola byte order.
Expression.
Typically refers to a numeric expression, such as 1+2*3. Wherever BE prompts for a number, any numeric expression may be used. Basic arithmetic is supported, along with symbol table lookup and support for mapping. See the section on numbers for more details.
Extended Regular Expression.
This is a powerful form of a search pattern, which allows for searching for several alternatives at once, zero or one occurrence of a pattern, or one or more, or zero or more, and character classes.
Field.
A number of fields together form a definition. Fields in a definition can be made to overlay each other or not, thus achieving the effect of C structures or unions. It is possible to tell BE to display the fields in a variety of ways, via the use of data display attributes.
Flushing.
BE may hold data in a memory cache for speed of access, and may choose to 'make the changes good' in response to a flush command. The @W key will try to flush any cached data. The W key can be used to flush cached data from a single memory section. BE prompts you as to whether you wish to flush any unsaved changes before exiting.
Initialisation file.
When BE runs it locates and processes an initialisation file, which includes within it the definitions, maps and named constants used to decode and display the data.
Level of detail.
When displaying a field, BE displays it to a specific level of detail. This level of detail may be adjusted using the + and - keys. Increasing the level of detail can result in the fields of definitions being displayed, or pointers being followed and the fields in the 'pointed-to' definitions being displayed, or elements of an array being shown.
Long jump.
The ARM instruction set only includes a branch instruction which can only jump a certain distance forwards or backwards in memory. The ARM C compiler typically generates code which uses this branch instruction. To branch long distances, a trick can be done whereby the normal branch is made to branch to an instruction which loads the instruction pointer from the word of memory immediately following. This trick means that the mapping of addresses to function names using the symbol table doesn't work properly. By using the long jump data display attribute, BE is told to take this mechanism into account, when displaying code addresses symbolically. The lj and nolj keywords are used for this purpose.
Map.
The map keyword in the initialisation file defines a mapping between numbers and strings. Essentially it is a way of mapping numbers back to a more readable enumerated type form. The map MAPNAME "MAPLETSTRING" syntax may be used in any expression in the initialisation file or at any time BE prompts you for a number, and it evaluates to the numeric equivalent of the named value of the enumerated type. Data displayed via mapping tables can be edited via the M key.
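The idea of a two-way mapping between numbers and names can be sketched in Python. This does not use BE's map syntax; the FILE_TYPE table, its names, and the choice to fall back to hexadecimal for unmapped numbers are all assumptions made for the example.

```python
# Hypothetical mapping table from numbers to readable names.
FILE_TYPE = {0: "UNKNOWN", 1: "TEXT", 2: "BINARY"}

# Reverse table, used to turn a name back into its number.
FILE_TYPE_BY_NAME = {name: value for value, name in FILE_TYPE.items()}

def to_name(value):
    """Display a number by name, or in hex if it has no name (an assumption)."""
    return FILE_TYPE.get(value, hex(value))

def to_value(name):
    """Evaluate a name to its numeric equivalent."""
    return FILE_TYPE_BY_NAME[name]

print(to_name(1), to_name(9), to_value("BINARY"))
```

The first direction is what a display needs, and the second is what an expression needs when a name is typed in place of a number.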
Memory section.
In any given invocation of BE a number of filename arguments are specified, and each of these constitutes a memory section, because the data from the file covers a section of the memory space. BE can also edit data which is provided to it via a memory extension, invoked with some parameters.
Memory space.
Every byte of data BE can edit is presented to BE at an address in the BE memory space.
Memory extension.
A BE memory extension is a piece of (possibly user written) code which provides access to the data on demand. Typically memory extensions exist as DLLs or shared libraries.
Named constant.
BE keeps a small collection of named constants. These can be created by use of the -S name=val command line argument, or through the set and unset keywords in the initialisation file.
Null pointer.
A null pointer is a pointer whose numeric value indicates that the pointer doesn't actually point to another data item at this time. Typically the numeric value 0 is used to represent this. BE has a data display attribute which indicates whether the numeric value 0 represents a null pointer. The keywords nullptr and nonullptr are used. When the user presses Enter on a pointer value, BE pops up the data in the 'pointed to' definition, unless the value is 0 and the null-pointer attribute is present.
Parent definition(s).
Often definitions include other definitions. Thus any given definition will have 0 or more parent definitions which include it. When displaying a definition, @P will pop up a list of all those definitions which use the definition currently on display.
Pointer.
A pointer is typically a numeric value which somehow gives the address of another definition in the data. The keyword ptr DEFN is used in a field definition to indicate that a numeric field identifies the address of another definition.
Power address slide.
This is a form of address slide, whereby BE can be made to automatically address slide until certain patterns (which can be Extended Regular Expressions) appear in the decoded data.
Session.
Navigation of the data being edited starts by displaying a list of some of the data, and bringing up other lists. You effectively build up a stack of lists, and can step back to an earlier list. This stack of lists, or thread of investigation, is referred to as a session, and BE maintains 10 independent sessions, which may be switched between via @0, @1, ... @9.
Suppressing.
When displaying a definition, BE normally displays all the fields. However, it is possible to display all the fields of a definition in a single one-line summary, by increasing the level of detail of display. In this case, only non-suppressed fields are displayed. When viewing a structure definition with one field to a line, suppressed fields are shown in round brackets. The suppress keyword may be used in the initialisation file on a field, or the @S key may be used interactively.
Symbol table.
A symbol table is typically provided from a file via the -y symfile command line argument. It is a list of names (the symbols) and their values. Typically these are code or data addresses for functions or variables within an executable program. BE can use this information so it can display addresses in symbol+offset form, or so it can allow you to type addr "symbol" in an expression and have BE substitute the numeric value of the symbol.
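Displaying an address in symbol+offset form amounts to finding the nearest symbol at or below the address. The following Python sketch shows one way this could be done; it is not BE's implementation, and the symbol names, addresses and exact output format are invented for the example.

```python
import bisect

# Hypothetical symbol table, sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1040, "helper"), (0x2000, "table")]
ADDRESSES = [addr for addr, _ in SYMBOLS]

def symbolic(addr):
    """Return addr as symbol+offset, using the nearest symbol at or below it."""
    i = bisect.bisect_right(ADDRESSES, addr) - 1
    if i < 0:
        return hex(addr)  # below the first symbol: fall back to plain hex
    base, name = SYMBOLS[i]
    offset = addr - base
    return name if offset == 0 else f"{name}+{offset:#x}"

def addr_of(name):
    """Substitute the numeric value of a symbol, as a lookup by name would."""
    return {n: a for a, n in SYMBOLS}[name]

print(symbolic(0x1044), symbolic(0x1000), hex(addr_of("table")))
```

With this table, 0x1044 is shown as helper+0x4 and 0x1000 as main.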
Zero terminator.
When strings are stored in memory or in files, often a 0 byte is appended to indicate the end of the string. BE can be told to stop displaying string data (or not) when it hits a 0 byte via the 'stop at zero terminator' data display attribute, specified using the zterm or nozterm keywords in the initialisation file.
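The effect of stopping (or not) at a zero terminator can be sketched in Python. The decode_string function and its zterm parameter are hypothetical names that only mimic the behaviour described above; they are not part of BE.

```python
def decode_string(data: bytes, zterm: bool = True) -> str:
    """Decode a fixed-size string field, optionally stopping at the first 0 byte."""
    if zterm:
        end = data.find(b"\x00")
        if end != -1:
            data = data[:end]  # drop the terminator and everything after it
    # latin-1 maps every byte to a character, so decoding cannot fail.
    return data.decode("latin-1")

# An 8 byte field holding "abc", a terminator, then leftover bytes.
field = b"abc\x00xyz\x00"

stopped = decode_string(field, zterm=True)
full = decode_string(field, zterm=False)
print(repr(stopped), repr(full))
```

With the terminator honoured only "abc" is shown; without it all 8 bytes of the field are shown, including the leftover data.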

Copying

Copying of this program is encouraged, as it is fully public domain. The source code is not publicly available. Caveat emptor.


This documentation is written and maintained by the Binary Editor author, Andy Key
nyangau@interalpha.co.uk