adds experimental Ghidra disassembler and lifting backend (#1326)
* initial scaffolding
* adds support for nameless registers in the disasm backend
We don't want to model unique (temporary) registers as normal physical
registers, so we add an option to set the name to -1 to indicate that
the register is a unique anonymous variable.
* the minimal implementation
* introduces sub-instructions to the disassembler
There was some provision for that in the backend, but it was never
fully implemented. This feature enables seamless integration of the
ghidra backend but may also be used for VLIW architectures, e.g., the
hexagon llvm backend is using it.
* fixes the handling of the unique namespace
* implements semantics of sequential instructions
* specifies semantics for most of the pcode instructions
* passes the basic instruction info to the sequential semantics value
* fixes the semantics of LOAD
* improves handling of pcode namespaces
Fixes varnode classification (no longer treating addresses as
registers).
Also to prevent name clashes between virtual variables from different
scopes we use unique prefixes (aka shortcuts in ghidra's parlance) to
distinguish between them. This also makes the generated code more
readable and closer to the originally generated pcode.
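As a rough illustration of the prefixing scheme, the sketch below builds a
variable name from the address space and offset of a varnode. The prefix
characters, the `Varnode` type, and the name format are assumptions made for
the example, not the names the backend actually emits.

```python
from dataclasses import dataclass

# Hypothetical one-letter prefixes per address space (assumed for illustration).
PREFIX = {"unique": "u", "register": "r", "const": "c", "ram": "m"}


@dataclass(frozen=True)
class Varnode:
    space: str   # name of the address space the varnode lives in
    offset: int  # offset inside that space
    size: int    # size in bytes


def var_name(v: Varnode) -> str:
    """Builds <prefix><hex offset>:<size>, so equal offsets in
    different spaces never produce the same name."""
    return f"{PREFIX[v.space]}{v.offset:x}:{v.size}"
```

With this encoding a unique at offset 0x10 and a register at offset 0x10
get different names, which is the clash the prefixes are meant to prevent.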
* fixes the subpiece implementation
* puts all pcode opcodes into the pcode namespace
Since they are the same for each architecture and now we have this feature.
* implements support for user-defined opcodes
we translate
`CALLOTHER(<name>,<arg1>,...,<argN>)`
to
`<name>(<arg1>,...,<argN>)`
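A minimal sketch of this rewriting, assuming an instruction is given as an
opcode string plus a list of already-rendered operands (both the
representation and the function names are hypothetical):

```python
def rewrite_callother(opcode, operands):
    """Promotes the first operand of CALLOTHER to the opcode.

    Any other opcode (or a CALLOTHER without a name) is returned unchanged.
    """
    if opcode == "CALLOTHER" and operands:
        return operands[0], list(operands[1:])
    return opcode, list(operands)


def render(opcode, operands):
    """Pretty-prints an instruction as <opcode>(<arg1>,...,<argN>)."""
    return f"{opcode}({','.join(operands)})"
```

For example, `render(*rewrite_callother("CALLOTHER", ["cpuid", "EAX"]))`
gives `cpuid(EAX)`.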
* translates local (intra-instruction) branches into GOTOs
Branches in p-code are overloaded by the type of destination. If the
destination is a virtual address then it is a normal branch and if
it is a constant (a varnode from the constant namespace) then it is
an intra-instruction branch that represents the inner instruction
logic.
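The distinction can be sketched as follows. The `Dest` type and the result
tags are hypothetical, and the sketch assumes that the constant has already
been decoded as a signed offset relative to the index of the current p-code
operation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dest:
    space: str   # "const" for intra-instruction targets, e.g. "ram" otherwise
    offset: int  # signed relative index (const) or virtual address (ram)


def resolve(dst: Dest, op_index: int):
    """Classifies a branch destination by the space of its varnode."""
    if dst.space == "const":
        # Local jump: the target is another operation of the same instruction.
        return ("goto", op_index + dst.offset)
    # Normal branch: the target is a virtual address.
    return ("branch", dst.offset)
```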
* fixes the satisfies function to account for subinstructions
before that it was only looking into the top-level instruction
* an attempt to pack subinstructions inside an instruction
(breaks lots of stuff)
* publishes subinstructions
I will probably forfeit this approach, still investigating.
* introduces the null object to the knowledge base
It was actually already there, hidden in the [obj] domain. Now it is
properly documented with well-defined semantics.
* uses the null object to represent unlabeled blocks in the lifters
Also, more lifters now respect the label passed to the blk operator.
* removes the subinstruction slot from instruction
* adds labels reification to BIL semantics, also reifies gotos
all using special encoding
* implements intra-instruction gotos
* adds the sequence number documentation.
* fixes error handling in the goto-subinstruction primitive
* implements a proper disassembler factory that scans for ldefs files
So far not working quite correctly, as the default variables (like
word size, etc) are not properly set. Investigating...
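For illustration only, a scan of this kind could look like the sketch below.
It is not the plugin's implementation; it assumes that each ldefs file is an
XML document whose `language` elements carry an `id` attribute, and the
function name is hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_languages(ghidra_root):
    """Yields (language id, ldefs path) pairs found under ghidra_root."""
    for ldefs in sorted(Path(ghidra_root).rglob("*.ldefs")):
        try:
            root = ET.parse(ldefs).getroot()
        except ET.ParseError:
            continue  # skip files that are not well-formed XML
        for lang in root.iter("language"):
            lang_id = lang.get("id")
            if lang_id:
                yield lang_id, ldefs
```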
* adds a proper processor initialization
* implements proper command-line interface
Now the plugin is able to list the targets and pass the path to ghidra root.
* adds a tentative --x86-backend option to enable ghidra for x86
* fixes offset and address calculation
* adds `is-symbol` semantic primitive
* fixes overloading of the Primus Lisp semantic definitions
The overloading was prevented by the attributes computation, which
expected no overloads. Also, makes the error message more readable.
* tries to overload p-code operations based on their operand types
* passes operands types per each operand, removes extra opcodes
It looks like not only branches are overloaded in p-code but all
operations, e.g., we can have `INT_ADD (mem:x) (mem:y) (mem:z)` that
represents `mem[x] <- mem[y] + mem[z]`.
We could also resolve this overloading by adding suffixes to
operations, e.g., `INT_ADDmr`, `INT_ADDmm`, and so on, but that would
explode the number of opcodes, especially in the presence of
user-defined opcodes.
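A small sketch of why passing the operand kinds avoids the suffixes: a single
definition of `INT_ADD` can dispatch on the kind of each operand when reading
and writing. The tagged-tuple operands and the dictionary-based state are
assumptions of the example, not the backend's data model.

```python
def read(state, operand):
    """Reads an operand given as a (kind, value) pair."""
    kind, value = operand
    if kind == "imm":
        return value                      # the value itself
    return state[kind].get(value, 0)      # "mem" address or "reg" name


def write(state, operand, result):
    """Stores the result into a "mem" or "reg" operand."""
    kind, value = operand
    if kind == "imm":
        raise ValueError("cannot write to an immediate")
    state[kind][value] = result


def int_add(state, dst, lhs, rhs, bits=32):
    """One definition covers every combination of operand kinds."""
    total = (read(state, lhs) + read(state, rhs)) % (1 << bits)
    write(state, dst, total)
```

With `state = {"mem": {8: 2, 16: 3}, "reg": {}}`, the call
`int_add(state, ("mem", 0), ("mem", 8), ("mem", 16))` performs
`mem[0] <- mem[8] + mem[16]`.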
* catches the bad or unimplemented instructions during decoding
ghidra raises an exception if an instruction is not valid or there is
no semantics for it.
* adds signed ordering Primus Lisp primitives
and implements the corresponding pcode operations
* passes full information about the operand type from the backend
In p-code the semantics of an operation is defined by the types of its
operands. In case of the unique variables the type is not known to us
so we have to pass it. This commit extends the previous approach,
which passed only the kind of the operand (mem vs imm), to full type
qualification, where the type of memory is represented with Nil and
the type of an immediate is represented by its size in bits.
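The encoding can be pictured with the sketch below, where `None` stands in
for Nil. The space name `"ram"` and the function are hypothetical; the
sketch only assumes that the backend knows the space and the byte size of
each operand.

```python
from typing import Optional


def operand_type(space: str, size_in_bytes: int) -> Optional[int]:
    """Returns None (Nil) for a memory operand, else the width in bits."""
    if space == "ram":
        return None
    return size_in_bytes * 8
```

A four-byte unique is thus described as `32`, which is enough for the
semantics to pick the right width without knowing the variable in advance.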
* adds the missing BOOL_NEGATE operation
* fixes the negation operator (pcode represents bool as a byte)
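To see why the byte representation matters, compare a logical negation with
a bitwise complement on a one-byte value. This is a sketch of the general
pitfall, under the assumption that it is the kind of issue the fix
addresses; the function names are illustrative.

```python
def bool_negate(byte: int) -> int:
    """Logical negation of a byte-sized boolean: 0 -> 1, nonzero -> 0."""
    return 0 if byte & 0xFF else 1


def bitwise_not(byte: int) -> int:
    """Bitwise complement of a byte; not a valid boolean negation."""
    return ~byte & 0xFF
```

Here `bool_negate(1)` is `0`, while `bitwise_not(1)` is `0xFE`, which is
still nonzero and would be treated as true.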
* removes aliased registers from the register table
* fixes the selection of the default backend for x86
it should be llvm if not specified otherwise
* switches to caseless ordering of variables
Changes and documents the ordering of variables. Variables are now
compared caseless and the ordering is made loosely compatible with the
caseless lexicographical ordering of the textual representation of
variables' identifiers, e.g.,
```
```
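As a generic illustration of caseless ordering (not the exact ordering
implemented here, which is only loosely compatible with it), the sketch
below sorts identifiers ignoring case and falls back to the exact spelling
to break ties. The key function and the sample names are assumptions.

```python
def caseless_key(name: str):
    """Compares case-insensitively first; the exact spelling breaks
    ties, so "RAX" and "rax" stay distinct and the order is total."""
    return (name.casefold(), name)


ordered = sorted(["r10", "RAX", "R2", "rax"], key=caseless_key)
# ['r10', 'R2', 'RAX', 'rax']
```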
* enables ghidra backend for the arm targets
Right now it is disabled by default, use `--arm-backend=ghidra` to
enable.
* uses pcode-x86 as the CT language for pcode in x86 targets
* adds ghidra backend to mips
* minor pretty-printing tweaks
to make things more readable
* improves primitives performance in Primus Lisp
With this optimization Primus Lisp-based lifters run five times
faster. This is especially important for ghidra backend lifters, which
are fully dependent on Primus Lisp.
The idea is to let the primitive implementors provide the body of
their primitive so that every primitive is not computed via the
semantics promise but is invoked directly. Another big idea is to
provide an interface that allows factoring out computations that are
common to the target. The same idea could be extrapolated to all
promises.
* optimization: improves name resolution in Primus Lisp
Uses maps for names, not lists. Surprisingly, not much of an
improvement, about 5%.
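The change is of the following shape, sketched here with a Python dictionary
in place of the map and with hypothetical function names. The list lookup is
linear in the number of bindings, while the indexed lookup is not.

```python
def resolve_in_list(bindings, name):
    """Linear scan over (name, value) pairs; the first match wins."""
    for key, value in bindings:
        if key == name:
            return value
    return None


def index_bindings(bindings):
    """Builds the table once, keeping the first binding of each name."""
    table = {}
    for key, value in bindings:
        table.setdefault(key, value)
    return table
```

After `table = index_bindings(bindings)`, each `table.get(name)` returns
the same answer as `resolve_in_list(bindings, name)`.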
* optimizes unit computation
by hoisting it out of the loop
* optimization: do not request lisp arguments if not necessary
To compute list arguments we need to invoke the theory and reflect
them into it. The resulting work is discarded if the name is not
bound. The optimization checks if the name is bound by the program and
only after that asks for the list of arguments.
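The check-before-compute pattern can be sketched as follows, assuming the
arguments are supplied as a thunk so that they are computed only on demand.
The `bound` table and `call_primitive` are hypothetical names.

```python
def call_primitive(bound, name, compute_args):
    """Calls bound[name] with lazily computed arguments.

    compute_args is invoked only when the name is actually bound, so
    the expensive argument computation is skipped otherwise.
    """
    if name not in bound:
        return None
    return bound[name](*compute_args())
```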
* adds ghidra to powerpc (works out of the box)
* adds the `--x86-64-backend` option
it is just an alias for `--x86-backend`, for consistency
* enables ghidra for riscv (doesn't run out of the box)
* enables the ghidra backend in CI/CD
We will build Ghidra only on supported targets; right now it is Ubuntu
Bionic. We will soon add more targets and more packages. The bap
packages will now be split into `bap-core`, `bap`, and `bap-extra`.
The `bap-core` package will contain the minimal part of the platform
without analyses. The `bap` package will include most of the analyses.
Finally, `bap-extra` will include heavy-weight analyses (in terms of
extra dependencies and build times), e.g., the symbolic executor and
the ghidra backend.
* moves the ghidra install section lower
* downgrades the ubuntu version on CI/CD
* tries to fix the macOS build
* disables ghidra in the opam/opam used in CI