Disassembler/Assembler pair
Posted: Sun Jan 15, 2012 8:47 am
by CWood
Following on from my previous topic regarding open source vs. proprietary, I've made up my mind: the functional part will be open source (OS), and the GUI and any related features will be proprietary.
My first project under this scheme, will be the Universal Disassembler, and Universal Assembler (UDASM and UASM respectively). What differentiates these from NASM, AS, etc. is: it will be fully user configurable. You will be able to define files, filled with opcode definitions, and it will read accordingly. That means that you can program it to support any processor, any architecture, etc. AND, and here's the clever bit, you can even program it to program 2 architectures in EXACTLY the same way! I could use x86 ASM on the ARM architecture, or vice versa, or even invent my own to use on both. The reason this works? It takes the opcode string, and maps it to a bitmap, of arbitrary length. So, I could add a condition for enabling paging in x86, and map it to the bitmap for page enabling code in ARM, and it would do exactly what you'd think. Or, define a new opcode, say STP, which enables paging, and map that to the bitmap.
At the minute, I'm working on the disassembler, but will soon start on the assembler. Any ideas, feature requests, criticisms (constructive only, please) are more than welcome, and anyone who is well versed in law, I would appreciate it if you'd help me to write licensing terms (both proprietary and OS), as I'm not too good with that stuff, and I'd end up in a big pot of boiling water.
The link, by the way, is https://launchpad.net/universal-disassembler. (Note that the file format, DOS, is not fully implemented right now. Far from it. Still much to be done before 1.0.)
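The "opcode string maps to a bit pattern of arbitrary length" idea from the post above can be sketched roughly as follows. This is only an illustration, not UDASM/UASM code: the table `OPCODES` and the functions `assemble` and `disassemble` are hypothetical names, and the entries are a handful of ordinary one-byte x86 instructions chosen to keep the example small.

```python
# Hypothetical opcode table: mnemonic string -> raw bytes of any length.
# The entries are plain one-byte x86 instructions (32-bit mode).
OPCODES = {
    "nop": bytes.fromhex("90"),
    "ret": bytes.fromhex("c3"),
    "push eax": bytes.fromhex("50"),
    "pop eax": bytes.fromhex("58"),
    "hlt": bytes.fromhex("f4"),
}


def assemble(lines, table):
    """Look each source line up in the table and concatenate the bytes."""
    out = bytearray()
    for line in lines:
        key = " ".join(line.lower().split())  # normalise case and spacing
        if key not in table:
            raise KeyError(f"no definition for {line!r}")
        out += table[key]
    return bytes(out)


def disassemble(blob, table):
    """Reverse lookup, trying the longest byte pattern first."""
    reverse = {pattern: name for name, pattern in table.items()}
    longest = max(len(pattern) for pattern in reverse)
    result = []
    pos = 0
    while pos < len(blob):
        for size in range(min(longest, len(blob) - pos), 0, -1):
            chunk = blob[pos:pos + size]
            if chunk in reverse:
                result.append(reverse[chunk])
                pos += size
                break
        else:
            # No pattern matched: emit the byte as raw data and move on.
            result.append(f"db 0x{blob[pos]:02x}")
            pos += 1
    return result


code = assemble(["push eax", "NOP", "pop eax", "ret"], OPCODES)
print(code.hex())
print(disassemble(code, OPCODES))
```

A pure lookup table like this has no notion of operands, so every register or memory form needs its own entry; a real tool would need some way to describe operand encodings.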
Re: Disassembler/Assembler pair
Posted: Mon Jan 16, 2012 2:35 am
by Solar
CWood wrote:My first project under this scheme, will be the Universal Disassembler, and Universal Assembler (UDASM and UASM respectively). What differentiates these from NASM, AS, etc. is: it will be fully user configurable. You will be able to define files, filled with opcode definitions, and it will read accordingly.
Please state how this is different from GNU binutils' BFD / opcode library. (You are aware of those, aren't you?)
...anyone who is well versed in law, I would appreciate it if you'd help me to write licensing terms (both proprietary and OS)...
Anyone who is well versed in law wouldn't do that without a contract, because if you're well versed in law - or merely say that you are - and give legal advice (like helping to write licensing terms), you become liable. That's where the acronym "IANAL" (I am not a lawyer) came from.
If you really want to write up a proprietary license, find a lawyer who specializes in this stuff, and pay him to help you. On the other hand, don't do this until you have something worth releasing commercially. Because chances are good you won't, ever. (Nothing personal, it's just that the vast majority of OS projects never get to that level of completion.)
Re: Disassembler/Assembler pair
Posted: Mon Jan 16, 2012 3:45 am
by bluemoon
CWood wrote:and here's the clever bit, you can even program it to program 2 architectures in EXACTLY the same way! I could use x86 ASM on the ARM architecture, or vice versa, or even invent my own to use on both. The reason this works? It takes the opcode string, and maps it to a bitmap, of arbitrary length.
The tricky part is to optimize the assembly code on both platforms. An instruction sequence optimized for CPU A may run slowly on CPU B... and to make it worse, each CPU version may need a different optimization direction.
Then, either
1) you have some source-level translation that runs slowly on some platforms
2) your optimizer takes control and removes the fun for the assembly programmer, so they switch to an HLL, say, C.
3) the programmer ends up needing to write separate code
Sometimes speed may not be an issue, but then you have a C compiler for that.
Re: Disassembler/Assembler pair
Posted: Mon Jan 16, 2012 12:33 pm
by CWood
Thank you both for your comments!
@Solar:
No, I hadn't heard of those, if I'm to be honest, but doing some research, by the looks of things, they generate the opcodes/data formats at compile time, whereas the UDASM/UASM generate at runtime, meaning it can be configured on-the-fly. I could be wrong, my research was only a quick Google search, so please correct me if I'm mistaken (I'm looking more into it now).
And as for the legal side of things, for one thing, I want an open source license, which restricts what people can develop (not strictly OS, but if I end up with clever people implementing the paid features, I've lost a product...). Being honest, I'd be just as happy if someone gave me the resources/knowledge, or pointed me to them, and I write it myself, just as long as it gets written.
@bluemoon:
Not exactly, the way it works is, it has a bunch of files, which define all of the rules for a particular CPU/syntax. For instance, x86 Intel syntax would be one, ARM another. If you define the same opcode in both, but change the binary output, it *should* be optimized on both CPUs. For instance, I happen to know that (in hex, ofc), enabling paging in x86 is 0F20C06683C8010F22C0. I don't know what it is on ARM, but in the ARM definition file, you could map the instruction STP (using my earlier example) to whatever the paging code is for that CPU. It's all CPU independent, and the user sets the target via a command line flag (or 2, which will grow as it develops; I've got a number of ideas on the backburner now).
Hope this clarifies things,
Connor
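A rough sketch of the per-target definition files described in the post above, read at run time and selected by a target name. Everything here is an assumption made for illustration: the `NAME = HEXBYTES` file format, the names `load_definitions`, `TARGETS` and `assemble`, and the use of in-memory strings in place of real files. The `target` argument stands in for the command line flag. The x86 bytes for STP are the ones quoted in the post; note that they appear to decode to `mov eax, cr0`, an `or` with 1, and `mov cr0, eax`, which sets bit 0 of CR0 (PE, protected mode) rather than bit 31 (PG, paging), so they are used here purely as the quoted example. The ARM file leaves STP undefined because the post does not give those bytes.

```python
import io


def load_definitions(stream):
    """Parse lines of the form 'NAME = HEXBYTES'; '#' starts a comment."""
    table = {}
    for raw in stream:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, hexbytes = line.partition("=")
        if not sep:
            raise ValueError(f"malformed definition: {raw!r}")
        table[name.strip().upper()] = bytes.fromhex(hexbytes.strip())
    return table


# Hypothetical definition files, kept as strings so the sketch is self-contained.
X86_DEFS = """
# Byte sequence quoted in the post for x86
STP = 0F20C06683C8010F22C0
NOP = 90
"""

ARM_DEFS = """
# STP is deliberately left undefined: the ARM bytes are not given in the post
"""

TARGETS = {
    "x86": load_definitions(io.StringIO(X86_DEFS)),
    "arm": load_definitions(io.StringIO(ARM_DEFS)),
}


def assemble(source, target):
    """Assemble one mnemonic per line using the table for the chosen target."""
    table = TARGETS[target]
    out = bytearray()
    for line in source.splitlines():
        op = line.strip().upper()
        if not op:
            continue
        if op not in table:
            raise ValueError(f"{op} is not defined for target {target}")
        out += table[op]
    return bytes(out)


print(assemble("STP\nNOP", "x86").hex())
```

The same source text assembles for any target whose definition file provides the mnemonics it uses; a missing definition is reported instead of being guessed.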
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 5:07 am
by Solar
CWood wrote:@Solar:
No, I hadn't heard of those, if I'm to be honest, but doing some research, by the looks of things, they generate the opcodes/data formats at compile time, whereas the UDASM/UASM generate at runtime, meaning it can be configured on-the-fly.
Well... what for?
Architectures aren't exactly volatile targets, you know. And of the already-few assembler programmers out there, even fewer will have the know-how (or, even, the desire) to change an architecture definition...
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 8:02 am
by CWood
@berkus:
Right now, it needs all of the definitions by hand, but that's only because it's early days. As I further develop the disassembler, the need for this will be reduced.
And the ADD r/m8, r8 thing, right now anyway, would need separate definitions for r8 and m8. This may change as the project gets more mature, however. Still in its infancy.
As for the licensing issue, I am intending (later on) to have 2 versions; commercial, and open source. Obviously, the commercial one will have fancier features, such as a GUI (and a bunch of features in that), etc.
And you're right, one of the main purposes of this is, similar to Sol_Asm, to separate myself from existing toolchains. Greedy, I know.
@Solar:
I'm aware that architectures don't change much, but with my design, as I've already said, you can define your own opcodes, that are, in fact, combinations of existing ones, pushing asm the way of low-level LISP. Furthermore, (and this is the reason for starting this project, by the way) if you want to (dis)assemble for an unsupported architecture (and later, file format/platform, etc.) all you need is the specs, and write your own config files, for said architecture. Easier to work on less-supported architectures, and a billion times easier having one tool for all architectures/platforms.
Thanks,
Connor
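The "define your own opcodes as combinations of existing ones" idea could look something like the sketch below. It is an assumption-laden illustration rather than the project's design: the tables `BASE` and `MACROS` and the functions `expand` and `assemble` are hypothetical, and the base entries are simple one-byte x86 instructions (32-bit mode).

```python
# Real opcodes, as documented by the CPU manufacturer.
BASE = {
    "push eax": bytes.fromhex("50"),
    "pop eax": bytes.fromhex("58"),
    "nop": bytes.fromhex("90"),
}

# User-defined opcodes, each a sequence of base opcodes or other macros.
MACROS = {
    "pad2": ["nop", "nop"],
    "save_pad": ["push eax", "pad2", "pop eax"],
}


def expand(op, macros, seen=()):
    """Recursively replace user-defined opcodes with the real ones."""
    if op in seen:
        raise ValueError(f"macro {op!r} refers to itself")
    if op not in macros:
        return [op]
    result = []
    for part in macros[op]:
        result.extend(expand(part, macros, seen + (op,)))
    return result


def assemble(ops, base, macros):
    """Expand every opcode, then look the real opcodes up in the base table."""
    out = bytearray()
    for op in ops:
        for real in expand(op, macros):
            out += base[real]
    return bytes(out)


print(expand("save_pad", MACROS))
print(assemble(["save_pad"], BASE, MACROS).hex())
```

Keeping the user-defined opcodes in a separate table from the real ones means a listing of the fully expanded instructions can always be produced.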
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 8:44 am
by Solar
CWood wrote:I'm aware that architectures don't change much, but with my design, as I've already said, you can define your own opcodes, that are, in fact, combinations of existing ones, pushing asm the way of low-level LISP.
Personally, I think that mixing opcodes and macros is a design error. The result is source code where one half of the opcodes is documented in the relevant CPU manuals, and the other half in the config file - *IF* you are lucky...
I understand where you'd want to make supporting a new architecture easy. I don't understand why this should be configurable at runtime.
I also still think that you're trying to reinvent the BFD / libopcode wheel.
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 10:50 am
by CWood
@berkus:
Good point about the null bytes. Didn't see that. But in all fairness, the Python version is only a prototype; for v1.0, it will be rewritten in C, and that part should have been rewritten by then (I hope).
As for the commercial side of things, granted, it is a very niche market, however IDA Pro seems to have done alright (granted, I'll probably never be that big), and I'm going to release a few more products, on top of these two, under the Universal Development Suite, or UDS; debugger, compiler, and project management, for instance. And, if I'm to be honest, I'm only really doing it for fun anyway, so if I can make a $ or 2 while I'm at it, why not?
And as for the whole LISP vs Forth thing, that wasn't really the point. I only said that to illustrate the concept I'm getting at here, pertaining to the ability to customize the language to what you need.
@Solar:
I see where you're coming from, but this is only an issue on group projects, not personal ones, and in the event of group projects, these things should be documented. Again, my argument about LISP stands - for group LISP projects, the defined macros should be documented. And, having said that, it stands to reason that this argument applies to any programming language - all functions, definitions, macros, etc. should be documented somewhere, whether defined in source or config or what (just my opinion). And besides, I'm only doing this for fun, and to teach myself about assemblers/disassemblers, and (hopefully) get a useful package out of it. If I don't sell anything, or if I've reinvented an existing software package, ah well, add it to my portfolio, shelve it, and move on.
Cheers,
Connor
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 10:56 am
by bluemoon
CWood wrote:I'm only doing this for fun, and to teach myself about assemblers/disassemblers
In that case, I suggest skipping the productization part, which is relatively boring. You could spend that effort (on productization) elsewhere to make a profit.
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 11:03 am
by Solar
A good language will help with the documentation. Mixing opcodes (as defined by the CPU manufacturer) and opcodes (as defined by the writer of a config file for your tool) in one and the same language construct (opcodes) is the opposite: Obfuscation.
And besides, I'm only doing this for fun, and to teach myself about assemblers/disassemblers, and (hopefully) get a useful package out of it.
And for both the learning and the "getting a useful package out of it", questions like "how will it handle later on?" are part of the deal.
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 2:32 pm
by AJ
Hi,
berkus wrote:Planning to sell something you have no idea about?
Something most PC superstores do every day. Admittedly, the shop assistants there tend not to be involved in the design and production processes.
Cheers,
Adam
Re: Disassembler/Assembler pair
Posted: Tue Jan 17, 2012 3:33 pm
by CWood
In all fairness, one never stops learning, no matter how much one knows about a given area. If the senior developers at M$ were to go back now, and clean out all of the cruft in Windoze, and rewrite functions, one by one, it would improve tenfold. Guaranteed. Life's a learning curve, and there are no 'experts;' merely people with more knowledge/experience, and knowledge/experience is something anyone can acquire, with the right mindset.
Cheers,
Connor
Re: Disassembler/Assembler pair
Posted: Wed Jan 18, 2012 1:33 am
by Solar
I'm not sure they'd even want to. Microsoft reaped heaps of money with their products, and that's what a company is all about. Why should they change anything?
Re: Disassembler/Assembler pair
Posted: Wed Jan 18, 2012 2:11 am
by Solar
I remember the story of how they burned significant time and money to find out why SimCity ran fine on DOS but crashed on Windows. (Microsoft did this, not the manufacturer of SimCity, mind you.) It turned out that SimCity accessed memory right after free()ing it. The solution? Microsoft introduced a special mode to the memory manager so that this did not lead to crashes. The reasoning was thus: SimCity was a huge hit, and for most people the 30-day-return-time had long since expired. Windows was new, so it was up to the newcomer to make sure things didn't break for the customer.
Compare that with the update policy of e.g. the Linux kernel and smoke it...
Re: Disassembler/Assembler pair
Posted: Wed Jan 18, 2012 12:41 pm
by CWood
Ok, my bad, that was a bad example. Due to backwards compatibility, and their inherent nature, OSes play by a different set of rules altogether than other software packages. But it still stands that, as I gain more knowledge, it would be a worthwhile exercise to delete and rewrite old parts of the software, applying my new knowledge, and bettering the project. And given that people never stop learning, that is something that will be done until I can no longer program, for whatever reason, or the project dies.