MS-COFF Format (*.lib)

Posted: Wed Jul 28, 2010 4:52 pm
by revel8n
Hello,
I am working on a project (Cxbx/Dxbx Xbox emulation) that needs, among other things, to detect function addresses and symbol locations. To build a better method of detecting likely locations of these symbols, I plan to parse information from the library files used to build the applications that will run in this environment. I am having some trouble handling all the cases that MS-COFF files present. Most notably, I have so far run into three different structures following the IMAGE_ARCHIVE_MEMBER_HEADER structure: IMAGE_FILE_HEADER, IMPORT_OBJECT_HEADER, and ANON_OBJECT_HEADER. I have not yet found any direct reference on how to determine which of these structures is present or, in the case of ANON_OBJECT_HEADER, what follows it. I would like to parse these files directly to access the raw symbol data, relocation information, cross-referenced symbols, and so on, rather than parsing the text output of tools like 'link' and 'dumpbin'.
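For context, a minimal Python sketch of reaching the bytes that follow each IMAGE_ARCHIVE_MEMBER_HEADER. It assumes the standard archive layout: an 8-byte "!<arch>\n" signature, then 60-byte member headers whose Size field is space-padded ASCII decimal, with members starting on even offsets. The name iter_members is made up for illustration, and long-name lookups ("/123" style names) are not resolved here.

```python
import struct

ARCHIVE_MAGIC = b"!<arch>\n"  # signature at the start of the archive
# IMAGE_ARCHIVE_MEMBER_HEADER: Name, Date, UserID, GroupID, Mode, Size, EndHeader
MEMBER_HEADER = struct.Struct("<16s12s6s6s8s10s2s")  # 60 bytes
MEMBER_END = b"`\n"


def iter_members(data):
    """Yield (name, payload) for each member of an archive held in `data`.

    `iter_members` is a hypothetical helper, not part of any library.
    """
    if data[:len(ARCHIVE_MAGIC)] != ARCHIVE_MAGIC:
        raise ValueError("missing !<arch> signature")
    pos = len(ARCHIVE_MAGIC)
    while pos + MEMBER_HEADER.size <= len(data):
        name, _date, _uid, _gid, _mode, size, end = MEMBER_HEADER.unpack_from(data, pos)
        if end != MEMBER_END:
            raise ValueError("bad member header at offset %d" % pos)
        size = int(size.decode("ascii").strip())  # ASCII decimal, space padded
        start = pos + MEMBER_HEADER.size
        yield name.decode("ascii").rstrip(), data[start:start + size]
        pos = start + size
        pos += pos & 1  # skip the pad byte so the next header is at an even offset
```

Each payload yielded this way begins with whichever header structure the member uses, which is where the question of telling them apart comes in.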

- Are there any more in-depth sources of information on the intricacies of this format?

I'll probably have a lot more questions in the near future about how VGA, PCI, USB, and other hardware access is handled, as a better understanding of these processes would be very helpful in moving forward with a decent HAL emulation layer, but first things first, I suppose.

Any information and/or other insights would be greatly appreciated.

Thanks in advance.

Re: MS-COFF Format (*.lib)

Posted: Thu Jul 29, 2010 1:01 am
by Combuster
Did you have a look at the official specification yet? If so, what things are unclear about it?
http://www.microsoft.com/whdc/system/pl ... ecoff.mspx

Re: MS-COFF Format (*.lib)

Posted: Thu Jul 29, 2010 5:45 am
by revel8n
What was not clear to me from the specification when I first read it was that IMAGE_FILE_HEADER and IMPORT_OBJECT_HEADER "overlap": Machine occupies the same bytes as Sig1, and NumberOfSections the same bytes as Sig2. I am guessing, and have somewhat confirmed from a few sources, that the two structures are distinguished by the values stored in these two fields. I wasn't sure if that was the only way to know which was which.
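A minimal Python sketch of that check, assuming the signature values the specification gives for the import header (Sig1 must be IMAGE_FILE_MACHINE_UNKNOWN, i.e. 0, and Sig2 must be 0xFFFF). The function name is hypothetical; it only answers "short-form header or IMAGE_FILE_HEADER", nothing more.

```python
import struct

IMAGE_FILE_MACHINE_UNKNOWN = 0x0000  # required value of Sig1
SHORT_HEADER_SIG2 = 0xFFFF           # required value of Sig2


def is_short_form_header(member):
    """Return True if `member` starts with Sig1 == 0 and Sig2 == 0xFFFF.

    Otherwise the first two WORDs are read as IMAGE_FILE_HEADER.Machine and
    IMAGE_FILE_HEADER.NumberOfSections. Hypothetical helper for illustration.
    """
    if len(member) < 4:
        raise ValueError("member too short to hold a header")
    sig1, sig2 = struct.unpack_from("<HH", member, 0)  # two little-endian WORDs
    return sig1 == IMAGE_FILE_MACHINE_UNKNOWN and sig2 == SHORT_HEADER_SIG2
```

An ordinary object file cannot plausibly declare 0xFFFF sections for an unknown machine, which is presumably why this pair of values is usable as a signature.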

Moreover, it seems those two structures are not the only ones that can appear there. I later found that Link-Time Code Generation based library files also contain what appears to be an ANON_OBJECT_HEADER, which has the same Sig1 and Sig2 behavior as IMPORT_OBJECT_HEADER but a different layout after that. The header file mentions that a Version greater than or equal to 1 in the ANON_ structure implies that the CLSID value is present, but it says nothing about the other fields of this structure, whether it is valid to use Version to differentiate between the IMPORT_ and ANON_ header structures, whether the IMPORT_ structure can have a non-zero version number, and so on.
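A rough Python sketch of the two layouts as declared in winnt.h, with the split made on Version. Treating Version >= 1 as "this is an ANON_OBJECT_HEADER" is only the working hypothesis described above, not something the specification confirms; the function and dictionary keys are made up for illustration.

```python
import struct
import uuid

COMMON = struct.Struct("<HHHHI")     # Sig1, Sig2, Version, Machine, TimeDateStamp
IMPORT_TAIL = struct.Struct("<IHH")  # SizeOfData, Ordinal/Hint, Type:2 NameType:3 Reserved:11
ANON_TAIL = struct.Struct("<16sI")   # ClassID (CLSID), SizeOfData


def parse_short_header(member):
    """Decode an IMPORT_OBJECT_HEADER (20 bytes) or ANON_OBJECT_HEADER (32 bytes).

    ASSUMPTION: Version >= 1 means ANON_OBJECT_HEADER. This is a guess taken
    from the header-file comment, not documented behavior.
    """
    sig1, sig2, version, machine, timestamp = COMMON.unpack_from(member, 0)
    if (sig1, sig2) != (0, 0xFFFF):
        raise ValueError("not a short-form header")
    info = {"version": version, "machine": machine, "timestamp": timestamp}
    if version >= 1:
        class_id, size = ANON_TAIL.unpack_from(member, COMMON.size)
        info.update(kind="anon", class_id=uuid.UUID(bytes_le=class_id),
                    size_of_data=size,
                    payload_offset=COMMON.size + ANON_TAIL.size)
    else:
        size, ordinal_or_hint, flags = IMPORT_TAIL.unpack_from(member, COMMON.size)
        info.update(kind="import", size_of_data=size,
                    ordinal_or_hint=ordinal_or_hint,
                    import_type=flags & 0x3,       # low 2 bits
                    name_type=(flags >> 2) & 0x7,  # next 3 bits
                    payload_offset=COMMON.size + IMPORT_TAIL.size)
    return info
```

Comparing ClassID against known values might be a more robust discriminator than Version alone, but which values to expect is outside what this thread establishes.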

The documentation does not even mention ANON_OBJECT_HEADER; I only guessed that this was the structure by manually comparing structure fields and structure sizes in a hex editor. It could very well be that I have chosen the wrong structure here, or that there are other possibilities, since the documentation does not even list this one. After further manual investigation it appears that if the ANON_OBJECT_HEADER is present (assuming I have found the correct way to determine this), a "normal" IMAGE_FILE_HEADER follows it, but the data after that appears to be LTCG-specific IL data rather than true raw data as found in normal COFF objects.
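A small Python sketch of that observation: skip the 32 bytes of ANON_OBJECT_HEADER and read the next 20 bytes as an IMAGE_FILE_HEADER. That a file header sits there at all is only the hex-editor finding described above, so this is a probe to sanity-check the guess rather than a documented layout; the function names are hypothetical.

```python
import struct

ANON_OBJECT_HEADER_SIZE = 32  # 5 fixed fields (12 bytes) + CLSID (16) + SizeOfData (4)
# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics
IMAGE_FILE_HEADER = struct.Struct("<HHIIIHH")  # 20 bytes


def read_file_header(buf, offset=0):
    """Decode an IMAGE_FILE_HEADER found at `offset` in `buf`."""
    (machine, number_of_sections, timestamp, symtab_ptr,
     number_of_symbols, opt_header_size, characteristics) = \
        IMAGE_FILE_HEADER.unpack_from(buf, offset)
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        "timestamp": timestamp,
        "pointer_to_symbol_table": symtab_ptr,
        "number_of_symbols": number_of_symbols,
        "size_of_optional_header": opt_header_size,
        "characteristics": characteristics,
    }


def read_header_after_anon(member):
    """ASSUMPTION: an IMAGE_FILE_HEADER directly follows the ANON_OBJECT_HEADER."""
    return read_file_header(member, ANON_OBJECT_HEADER_SIZE)
```

If the decoded Machine and NumberOfSections values look implausible for a given member, that would be a hint that the guess about the layout does not hold for that file.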

So ultimately, if all my assumptions are correct, I suppose I can now tell when each of these structures occurs. It appears I will not be able to use much, if any, information directly from the LTCG versions of the files, as I have no references on how to interpret that data. I was mainly wondering whether this was already known information with more complete references available, or whether there was another source of relevant findings that would confirm I have come across the correct way to parse the data. It seems to "work", but is it the correct way, or are there other possible interpretations I have not come across yet?

Now I just need to work through the information I have already found, and hope there are no deviations from it, until I can find or confirm more.