C file functions like Lisp's

Alboin
Member
Posts: 1466
Joined: Thu Jan 04, 2007 3:29 pm
Location: Noricum and Pannonia

Post by Alboin »

Hello,
Is there a method (in the stdlib) of writing C structures directly to a file, so that all I have to do is read the file back into a variable/pointer and resume what I was doing before the write? (Like in Lisp, where you can just print a list to a file and read it back without any processing, because everything is taken care of by the Lisp machine.) Sorry if this makes no sense, I really just don't know how to put it... :) Thanks!
C8H10N4O2 | #446691 | Trust the nodes.
Ready4Dis
Member
Posts: 571
Joined: Sat Nov 18, 2006 9:11 am

Post by Ready4Dis »

Yes, fread and fwrite are good for that :)

size_t fread(void *ptr, size_t var_size, size_t var_count, FILE *in_file);
size_t fwrite(const void *ptr, size_t var_size, size_t var_count, FILE *out_file);

So, if you have a struct:

struct Test_S
{
 int x, y;
 float r, z;
};

And you have a variable of that type:

struct Test_S Testing;

You can now write it to a file:

FILE *file = fopen("test.bin", "wb"); //wb = write binary
if (file != NULL) {
 fwrite(&Testing, sizeof(struct Test_S), 1, file);
 fclose(file);
}
And read it back like so:

FILE *file = fopen("test.bin", "rb"); //rb = read binary
if (file != NULL) {
 fread(&Testing, sizeof(struct Test_S), 1, file);
 fclose(file);
}
Just be careful when you are using pointers in a struct (like for strings!) as it will write out the pointer rather than the data it points to.

Like this:

struct Test_S
{
 int x, y;
 char *name;
};
It will write out x and y, but for name it writes only the pointer value (the address the string happened to live at), not the characters. You have to write the data pointed to by name explicitly: either loop over it until you hit the terminating null character ('\0'), or, more simply, use strlen to find the length and pass the pointer to fwrite. When reading it back, you still need to read one character at a time until you hit the '\0', because the reader doesn't know the length in advance.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Test_S //Same struct as above.
{
 int x, y;
 char *name;
};

struct Test_S Testing;

void WriteTest_S(const struct Test_S *ptr, FILE *out)
{
 fwrite(&ptr->x, sizeof(int), 1, out);
 fwrite(&ptr->y, sizeof(int), 1, out);
 fwrite(ptr->name, strlen(ptr->name) + 1, 1, out); //Write it out plus 1 (for the '\0')
}

/* Returns 0 on success, -1 on a read or allocation error. */
int ReadTest_S(struct Test_S *ptr, FILE *in)
{
 char buffer[512]; //Arbitrary maximum name length for now.
 size_t len = 0;
 int c;
 if (fread(&ptr->x, sizeof(int), 1, in) != 1) return -1;
 if (fread(&ptr->y, sizeof(int), 1, in) != 1) return -1;
 //Read one char at a time until '\0', end of file, or the buffer is full.
 while (len < sizeof(buffer) - 1 && (c = fgetc(in)) != EOF && c != '\0')
  buffer[len++] = (char)c;
 buffer[len] = '\0'; //Always terminate, even if we stopped early.
 ptr->name = malloc(len + 1);
 if (ptr->name == NULL) return -1;
 memcpy(ptr->name, buffer, len + 1); //Copy it plus the '\0'!
 return 0;
}
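A common alternative to scanning for the terminator is to store the string's length in front of it, so the reader can allocate exactly the right amount and fetch the whole string with a single fread. This is only a sketch: write_string and read_string are hypothetical helper names, and the length is written as a raw uint32_t, so it carries the same byte-order caveats as any raw dump.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write a uint32_t length, then the characters (no '\0'). */
int write_string(const char *s, FILE *out)
{
    uint32_t len = (uint32_t)strlen(s);
    if (fwrite(&len, sizeof len, 1, out) != 1)
        return -1;
    if (len > 0 && fwrite(s, 1, len, out) != len)
        return -1;
    return 0;
}

/* Hypothetical helper: read the length, allocate len + 1 bytes, read the
 * characters in one call and add the terminator. Returns NULL on error;
 * the caller frees the result. */
char *read_string(FILE *in)
{
    uint32_t len;
    char *s;
    if (fread(&len, sizeof len, 1, in) != 1)
        return NULL;
    s = malloc((size_t)len + 1);
    if (s == NULL)
        return NULL;
    if (len > 0 && fread(s, 1, len, in) != len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

A corrupted length field would make read_string attempt a huge allocation, so a real loader would probably also cap len at some sane maximum.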

Post by Alboin »

:) Thanks! ... That's what I thought, but I wasn't sure whether it would be compatible across different systems and compilers (endianness, structure layout, etc.). Uhh.. So how cross-platform is this technique?
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Post by Solar »

Alboin wrote:So how cross platform is this technique?
Not at all, of course. ;-)

It's not "smart" in any way. Bytes get dumped from memory to file, period. Anything beyond that (e.g., endianness, struct padding, width of data types) is your responsibility.
Every good solution is obvious once you've found it.

Post by Ready4Dis »

Yeah, raw dumps in C aren't very cross-platform compatible. There are ways around it, but it'd be a real pain to make the endianness handling platform-independent. You could write two versions, one for big-endian and one for little-endian, then check which byte order the machine uses and call the matching load/save function. Or you could convert all variables to a known byte order and then write them out as a char array.
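The second option (convert to a known byte order, write a char array) does not actually require knowing the host's own endianness if the conversion is done with shifts, which work on values rather than on memory layout. A minimal sketch, assuming little-endian was chosen as the file's byte order; put_u32le, get_u32le, write_u32le and read_u32le are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Store v in buf as 4 bytes, least significant first (little-endian),
 * regardless of the byte order of the machine running this code. */
void put_u32le(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFF);
    buf[1] = (unsigned char)((v >> 8) & 0xFF);
    buf[2] = (unsigned char)((v >> 16) & 0xFF);
    buf[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Rebuild the value from 4 little-endian bytes. */
uint32_t get_u32le(const unsigned char buf[4])
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Write one value to a FILE as a 4-byte char array. */
int write_u32le(FILE *out, uint32_t v)
{
    unsigned char buf[4];
    put_u32le(buf, v);
    return fwrite(buf, 1, 4, out) == 4 ? 0 : -1;
}

/* Read one value back from a 4-byte char array. */
int read_u32le(FILE *in, uint32_t *v)
{
    unsigned char buf[4];
    if (fread(buf, 1, 4, in) != 4)
        return -1;
    *v = get_u32le(buf);
    return 0;
}
```

The same code runs unchanged on big- and little-endian machines, so only one load/save version is needed. Signed integers can be passed through uint32_t; floats would need an agreed representation as well.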

Post by Solar »

Nonono, it's not hard at all. Look at it this way: most cross-platform formats, like JPEG or MP3, already define such matters as endianness, so you have to stick to that anyway. And if you want some application of yours to be cross-platform, you better define your format, instead of just dumping your in-memory objects to file and hoping another platform will read them in OK.

You know, that's how Microsoft ended up with that bogus MS Word format in the first place - they just binary-dumped stuff, and then couldn't open up the format, either because of "intellectual property" (their claim) or because they plain and simply didn't know anymore (my claim).

Bottom line, even if C were "cross-platform" and defined struct padding and endianness and everything, that would just eat resources and give nothing in return, because cross-platform is achieved by well-defined formats, not "portable" binary dumping / loading.
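To make "define your format" concrete: the file layout is written down byte by byte, and the save/load code produces exactly those bytes no matter how a compiler lays out the struct. The 14-byte layout below (magic "TSTF", a version number, two 32-bit integers, all little-endian) is made up for illustration, as are test_save and test_load.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical on-disk layout, fixed regardless of compiler or CPU:
 *   offset 0:  4 bytes  magic "TSTF"
 *   offset 4:  2 bytes  format version, little-endian
 *   offset 6:  4 bytes  x, two's complement, little-endian
 *   offset 10: 4 bytes  y, two's complement, little-endian (14 bytes total) */
#define TEST_MAGIC "TSTF"
#define TEST_VERSION 1u
#define TEST_RECORD_SIZE 14

struct Test { int32_t x, y; };

static void put_le(unsigned char *p, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

static uint32_t get_le(const unsigned char *p, int nbytes)
{
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

int test_save(FILE *out, const struct Test *t)
{
    unsigned char buf[TEST_RECORD_SIZE];
    memcpy(buf, TEST_MAGIC, 4);
    put_le(buf + 4, TEST_VERSION, 2);
    put_le(buf + 6, (uint32_t)t->x, 4);
    put_le(buf + 10, (uint32_t)t->y, 4);
    return fwrite(buf, 1, sizeof buf, out) == sizeof buf ? 0 : -1;
}

/* Returns 0 on success, -1 on a short read, wrong magic or unknown version. */
int test_load(FILE *in, struct Test *t)
{
    unsigned char buf[TEST_RECORD_SIZE];
    if (fread(buf, 1, sizeof buf, in) != sizeof buf)
        return -1;
    if (memcmp(buf, TEST_MAGIC, 4) != 0 || get_le(buf + 4, 2) != TEST_VERSION)
        return -1;
    /* Converting back to int32_t is implementation-defined for negative
     * values before C23, but gives the expected result on mainstream
     * two's-complement compilers. */
    t->x = (int32_t)get_le(buf + 6, 4);
    t->y = (int32_t)get_le(buf + 10, 4);
    return 0;
}
```

The magic number lets the loader reject files that are not its own, and the version field is where a later, extended layout would be detected.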

Post by Ready4Dis »

Solar wrote: [...] cross-platform is achieved by well-defined formats, not "portable" binary dumping / loading.
That was kinda my point: you need to have a set file format, and then write the loader for it. Whether you choose big or little endian is up to you. Also, as mentioned, some compilers pad their structures to align data on 4-byte boundaries (or other boundaries if they want). So if you are using something similar, make sure you tell it not to pad.
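Telling the compiler not to pad needs non-standard extensions (for example #pragma pack, or GCC's __attribute__((packed))). If you do keep raw struct dumps, a standard-C alternative is to make the build fail whenever the compiler's layout differs from the one your files assume. A sketch using C11 _Static_assert and offsetof, on a variant of Ready4Dis's Test_S with int32_t in place of int; struct_p_padding is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* The layout the (hypothetical) file format assumes: four 4-byte fields,
 * no padding, 16 bytes in total. */
struct Test_S
{
    int32_t x, y;
    float r, z;
};

/* Each check fails at compile time on a platform whose layout differs,
 * instead of silently producing files other builds cannot read. */
_Static_assert(sizeof(float) == 4, "file format assumes 4-byte floats");
_Static_assert(offsetof(struct Test_S, y) == 4, "unexpected padding before y");
_Static_assert(offsetof(struct Test_S, r) == 8, "unexpected padding before r");
_Static_assert(offsetof(struct Test_S, z) == 12, "unexpected padding before z");
_Static_assert(sizeof(struct Test_S) == 16, "unexpected trailing padding");

/* A struct like Alboin's later in the thread typically does carry padding:
 * sizeof is often larger than sizeof(int) + sizeof(char). */
struct p
{
    int i;
    char c;
};

/* Number of padding bytes the compiler added to struct p. */
size_t struct_p_padding(void)
{
    return sizeof(struct p) - (sizeof(int) + sizeof(char));
}
```

These checks only pin down padding and field widths; byte order still has to be handled separately.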

Post by Alboin »

Solar wrote: [...] if you want some application of yours to be cross-platform, you better define your format, instead of just dumping your in-memory objects to file and hoping another platform will read them in OK.
Yet, what would the disadvantages be by simply creating a cross-platform method of dumping straight memory instead of complex file formats? By reading memory directly you would save a lot of time parsing files and transferring that data into the same structure you would already have by simply reading it from a file. Moreover, it would work for nearly every application in need of storing data. (That is, as far as I can see, which is somewhat near.)
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Post by Candy »

Alboin wrote: Yet, what would the disadvantages be by simply creating a cross-platform method of dumping straight memory instead of complex file formats? [...]
Read up on how file formats were developed and learn.

The first formats were plain dumps of memory: you load them at location X and reboot the machine. Embedded devices still work like this.

Then you got a bit where you needed to load a file, and then another, without unloading the first. You needed to be able to load it somewhere it wasn't originally intended.

You needed to be able to load it at any given location (next step), which inspired formats like a.out.

You then wanted it to be possible to load it at two places at once, leading to things like PIC.

After a while you got more complex loading and unloading mechanisms that for instance load the initial sections, construct the actual output, unload them, load the main section etc.


This last format exists in a few different styles such as PE and ELF. You can go back to the first in the list, but you'll end up with the last or something very similar unless you see a point in this development chain where you're going to diverge.

Post by Alboin »

I'm not really talking about executable file formats (in which case you're spot on). I'm talking about simple configuration files, database-type things, and the like, where one usually has the data already stored in a structure in the main program. I was thinking that simply dumping the structures might have more advantages than using a custom-made file format.
Candy wrote: Read up on how file formats were developed and learn.
I have actually read a little on the subject, I know something of how they developed, and I have learned a tad about them. Not a whole lot, mind you, but enough to find my way.

Please don't assume that I know nothing about file formats, and am posting this message in ignorance of existing standards. Thanks.

Post by Candy »

Alboin wrote: I'm talking about simple configuration files, database-type-things, and the like. [...] Please don't assume that I know nothing about file formats, and am posting this message in ignorance of existing standards. Thanks.
I apologise; I wrote the above note in too paternal a tone. I shouldn't have.

For configuration files, have you considered things like INI-style files, full-blown databases and XML files?

In that case, the disadvantages I can see from a straight memory dump are:
- You can't change the structure without invalidating all current outputs
- In space-terms it's inefficient
- It's not portable, not even the least bit
- It doesn't hide anything about internal state
- It's not modifiable by users without using your program
- Having a corrupt dump will require removing it and/or telling the program to regenerate it.

Advantages would be:
- Using mmap for reading settings, lots quicker
- No load/store logic that can fail
- In speed terms it's fast
- You can ignore validation since your program is the only one that can adjust it
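The first advantage, reading settings through mmap, could look roughly like this on a POSIX system (mmap is POSIX, not standard C, so this does not apply as-is on Windows). The settings struct and settings_map are hypothetical; the file-size check is only a minimal guard against a truncated or foreign dump, and every portability caveat from earlier in the thread still applies.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical settings record, dumped to disk with a single fwrite. */
struct settings
{
    int32_t width, height;
    int32_t fullscreen;
};

/* Map a raw settings dump read-only. Returns NULL if the file cannot be
 * opened, has the wrong size, or cannot be mapped. Release the mapping
 * with munmap((void *)s, sizeof *s). */
const struct settings *settings_map(const char *path)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size != (off_t)sizeof(struct settings)) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, sizeof(struct settings), PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (const struct settings *)p;
}
```

There is no load step at all: the fields are read straight out of the mapped page, which is where the speed comes from, and also why a corrupt file goes straight into the program.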

Post by Alboin »

Thanks!

For the disadvantages, here's how I thought to possibly overcome some of them:
Candy wrote: - In space-terms it's inefficient
zlib :D
Candy wrote: - It's not portable, not even the least bit
Candy wrote: - It doesn't hide anything about internal state
Candy wrote: - It's not modifiable by users without using your program
I was thinking about that... What about a portability layer (library, actually.) that would handle all transactions in a platform independent manner? This way, external programs could use the same library, and the internal state would be hidden.
Candy wrote: - Having a corrupt dump will require removing it and/or telling the program to regenerate it.
What about using an md5 type system, that embeds (or goes along with it some other way.) itself into the file?

After writing this, it seems that this would be more of a file format in itself, except that it would be very fast (as long as it's implemented correctly) and efficient. Moreover, it would be usable in a broad spectrum of applications.
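The md5-style integrity check suggested above can be sketched with CRC-32 instead, swapped in here only because it fits in a few lines; like MD5 used this way, it detects accidental corruption, not deliberate tampering. crc32_bytes is the standard reflected CRC-32 (polynomial 0xEDB88320, the variant zlib and PNG use). dump_is_intact and its layout (CRC stored little-endian in the last 4 bytes) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320), the variant used by
 * zlib and PNG. Table-free and slow, but fine for small config dumps. */
uint32_t crc32_bytes(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical check on load: the last 4 bytes of the buffer hold the
 * CRC of everything before them, stored little-endian. */
int dump_is_intact(const unsigned char *buf, size_t len)
{
    uint32_t stored;
    if (len < 4)
        return 0;
    stored = (uint32_t)buf[len - 4]
           | ((uint32_t)buf[len - 3] << 8)
           | ((uint32_t)buf[len - 2] << 16)
           | ((uint32_t)buf[len - 1] << 24);
    return crc32_bytes(buf, len - 4) == stored;
}
```

On save you would compute crc32_bytes over the payload and append it; on load, a failed check is the cue to regenerate the file instead of trusting it.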

Post by Candy »

Alboin wrote:Thanks!

For the disadvantages, here's how I thought to possibly overcome some of them:
Candy wrote: - In space-terms it's inefficient
zlib :D
That's not entirely the same system and depending on the implementation negates one or more of the advantages or doesn't compress much.

If you compress in chunks, you can read a page at a time (or so) but lose compression. If you compress the whole thing, you need to read it all in and lose the ability to mmap it. If you compress it a bit at a time, you're practically re-inventing the thing that XML was intended to replace.
Candy wrote: - It's not portable, not even the least bit
Candy wrote: - It doesn't hide anything about internal state
Candy wrote: - It's not modifiable by users without using your program
I was thinking about that... What about a portability layer (library, actually.) that would handle all transactions in a platform independent manner? This way, external programs could use the same library, and the internal state would be hidden.
That too, to a degree, amounts to taking the dump, translating it to an existing format, and back again. That kind of system has been built once or twice, but from experience I can tell you that it isn't the kind of system you'd like to make. Keeping it in sync with the dump types is especially painful.
Candy wrote: - Having a corrupt dump will require removing it and/or telling the program to regenerate it.
What about using an md5 type system, that embeds (or goes along with it some other way.) itself into the file?

After writing this, it seems that this would be more of a file format in itself, except that it would be very fast (as long as it's implemented correctly) and efficient. Moreover, it would be usable in a broad spectrum of applications.
If you make this work in a generic and quick way, it's a good thing. I would check out database concepts before you decide on the format. I can't help you there, but I would be very interested in particular in the indexing methods you would use for speedy access.

Post by Solar »

You want error checking, perhaps some metadata, and you want to be able to significantly change / update the format without invalidating old output (see point previously made). You might also want to have your data interchangeable with other applications...

I'd suggest looking at some of the "better thought-out" formats, for example PNG. That might give you an idea of the "added value" of a format.
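One piece of PNG's "added value" that directly addresses changing the format without invalidating old output is its chunk structure: every chunk carries its own length and a 4-character type, so a reader can skip types it does not know. A simplified, PNG-inspired sketch (big-endian length, type, data; real PNG chunks also end with a CRC-32, omitted here); write_chunk, find_chunk and the chunk type names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PNG-inspired chunk: 4-byte big-endian length, 4-byte type, then data.
 * (Real PNG chunks also end with a CRC-32; omitted for brevity.) */
int write_chunk(FILE *out, const char type[4], const void *data, uint32_t len)
{
    unsigned char hdr[8] = {
        (unsigned char)(len >> 24), (unsigned char)(len >> 16),
        (unsigned char)(len >> 8), (unsigned char)len
    };
    memcpy(hdr + 4, type, 4);
    if (fwrite(hdr, 1, 8, out) != 8)
        return -1;
    return (len == 0 || fwrite(data, 1, len, out) == len) ? 0 : -1;
}

/* Scan from the current position for the first chunk of type `want`,
 * skipping every other chunk. Returns the chunk's length (data copied to
 * buf), or -1 if not found, truncated, or larger than cap. Unknown chunk
 * types are simply skipped, which lets a newer writer add chunks without
 * breaking older readers. */
long find_chunk(FILE *in, const char want[4], void *buf, uint32_t cap)
{
    unsigned char hdr[8];
    while (fread(hdr, 1, 8, in) == 8) {
        uint32_t len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
                     | ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
        if (memcmp(hdr + 4, want, 4) == 0) {
            if (len > cap || fread(buf, 1, len, in) != len)
                return -1;
            return (long)len;
        }
        if (fseek(in, (long)len, SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

An older reader that only knows NAME and SIZE keeps working when a newer version starts writing an extra chunk between them.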

Post by Alboin »

Solar wrote:You want error checking, perhaps some metadata, and you want to be able to significantly change / update the format without invalidating old output (see point previously made). You might also want to have your data interchangeable with other applications...

I'd suggest looking at some of the "better thought-out" formats, for example PNG. That might give you an idea of the "added value" of a format.
Actually, despite my previous messages, this library of sorts would not dump the memory directly. It would actually create a format (however small that may be) for the data. The entire purpose of the library would be something like the following.

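#include <stdio.h>

struct p {
    int i;
    char c;
};

int main() {
    struct p a;
    a.i = 90;
    a.c = 'M';
    FILE *fp = fopen("whatever", "wb");
    struct_write(fp, &a, INT, CHAR); /* proposed library call; INT and CHAR are proposed type tags */
    fclose(fp);

    struct p b;
    fp = fopen("whatever", "rb");
    struct_read(fp, &b); /* proposed library call */
    fclose(fp);
    return 0;
}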

The main point is speed. Yes, simply speed. 8) (That is, eliminating the abstraction layer between something like a configuration file, and how the configuration file is stored in the program.)
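One way such a library could work is sketched below, under assumptions not in the thread: instead of passing type tags on every call, each struct is described once by a table of (type, offsetof) entries, so the reading side knows the layout too. Each field is written in a fixed encoding (int32 as 4 little-endian bytes, char as 1 byte), so padding never reaches the file. All names (field_desc, FT_INT32, struct_write_desc, struct_read_desc, p_fields) are hypothetical, and Alboin's int member is narrowed to int32_t to fix its width.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical field types the library would understand. */
enum field_type { FT_INT32, FT_CHAR };

/* One entry per struct member: its type and where it lives in the struct. */
struct field_desc
{
    enum field_type type;
    size_t offset;
};

/* Write each described field in a fixed encoding (int32: 4 bytes
 * little-endian, char: 1 byte), independent of the struct's padding. */
int struct_write_desc(FILE *fp, const void *obj, const struct field_desc *f, size_t n)
{
    const unsigned char *base = obj;
    for (size_t i = 0; i < n; i++) {
        unsigned char buf[4];
        size_t len;
        if (f[i].type == FT_INT32) {
            int32_t v;
            memcpy(&v, base + f[i].offset, sizeof v);
            for (int b = 0; b < 4; b++)
                buf[b] = (unsigned char)((uint32_t)v >> (8 * b));
            len = 4;
        } else {
            buf[0] = base[f[i].offset];
            len = 1;
        }
        if (fwrite(buf, 1, len, fp) != len)
            return -1;
    }
    return 0;
}

/* Read the same fields back using the same descriptor table. */
int struct_read_desc(FILE *fp, void *obj, const struct field_desc *f, size_t n)
{
    unsigned char *base = obj;
    for (size_t i = 0; i < n; i++) {
        unsigned char buf[4];
        size_t len = (f[i].type == FT_INT32) ? 4 : 1;
        if (fread(buf, 1, len, fp) != len)
            return -1;
        if (f[i].type == FT_INT32) {
            uint32_t u = 0;
            for (int b = 0; b < 4; b++)
                u |= (uint32_t)buf[b] << (8 * b);
            /* implementation-defined above INT32_MAX before C23; fine on
             * two's-complement compilers */
            int32_t v = (int32_t)u;
            memcpy(base + f[i].offset, &v, sizeof v);
        } else {
            base[f[i].offset] = buf[0];
        }
    }
    return 0;
}

/* Alboin's struct, with int narrowed to int32_t so the width is fixed. */
struct p
{
    int32_t i;
    char c;
};

static const struct field_desc p_fields[] = {
    { FT_INT32, offsetof(struct p, i) },
    { FT_CHAR,  offsetof(struct p, c) },
};
```

Usage mirrors the main above: struct_write_desc(fp, &a, p_fields, 2) and later struct_read_desc(fp, &b, p_fields, 2). The file holds 5 bytes however much padding the compiler puts in struct p, at the cost of a small per-field conversion loop.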