deleting file contents

rich_m

deleting file contents

Post by rich_m »

I need to know how data can be deleted from a file (the concept).

I'm trying to do it in C, but the way I do it seems crazy.

Say I want to delete the first two lines of a file "mail.txt":

I make my program open the file in "r" mode and advance the file pointer through the characters until it has passed two '\n' characters.
Then I start copying the remaining content to a temporary file, so this file now has the contents I want.
Finally, I copy it back by opening the original file (mail.txt) in "w" mode, so I'm overwriting the file with contents it already had.

This doesn't seem to be the right way, although it works. How is it usually done?
rich_m

Re:deleting file contents

Post by rich_m »

Well, I guess my question wasn't clear enough. :-\ I just want to know how I could delete specific data from a file using a program. I'm trying it in C.
Any help would be appreciated.
Curufir

Re:deleting file contents

Post by Curufir »

Well one scheme would be this:

Create a memory buffer of size BUFF_SIZE (x disk sectors long).
One int to maintain the current read position in the file (FILE_OLD_POS) = 0.
One int to maintain the next position at which the buffer will be written out (FILE_NEW_POS) = 0.
One int to record how much data the buffer currently contains (BUFF_POS) = BUFF_SIZE.

Open the file for read/write operations.

While BUFF_POS == BUFF_SIZE and EOF is not reached [1]
{
    Zero BUFF_POS.

    Seek to FILE_OLD_POS.

    While BUFF_POS != BUFF_SIZE and EOF is not reached
    {
        Read the file into the buffer, discarding what you don't want, while updating BUFF_POS to reflect the true size of the data currently in the buffer.
    }

    Store the file position as FILE_OLD_POS, seek to FILE_NEW_POS, and write BUFF_POS bytes to the file.

    Store the current file position as FILE_NEW_POS.
}

Close the file.

Truncate the file to the new length FILE_NEW_POS.

There are probably more elegant ways to do it, but I hope that helps.

[1] The EOF check just handles the case where the final part of the file to be written out is exactly the same size as the buffer (it won't happen often, but it WILL happen).
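
The scheme above might look roughly like this in C. Treat it as a minimal sketch rather than a definitive implementation: the function name drop_leading_lines and the 4096-byte buffer are made up for illustration, the "what to discard" rule is hard-wired to the first nlines lines from the original question, and truncate() is POSIX rather than ISO C, so a POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L  /* for truncate(); assumes a POSIX system */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define BUFF_SIZE 4096  /* arbitrary; any size works */

/* Hypothetical helper: removes the first `nlines` lines of `path` in place.
   Returns 0 on success, -1 on error. */
int drop_leading_lines(const char *path, int nlines)
{
    char buf[BUFF_SIZE];
    long old_pos = 0;  /* next offset to read from (FILE_OLD_POS) */
    long new_pos = 0;  /* next offset to write to (FILE_NEW_POS) */
    size_t got, keep, i;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;

    for (;;) {
        /* Seeking before every read and write also satisfies the rule that
           an update stream must be repositioned when switching direction. */
        if (fseek(f, old_pos, SEEK_SET) != 0)
            goto fail;
        got = fread(buf, 1, sizeof buf, f);
        if (got == 0)
            break;  /* EOF or read error; told apart below */
        old_pos += (long)got;

        /* Compact the buffer, discarding bytes of the unwanted lines. */
        keep = 0;
        for (i = 0; i < got; i++) {
            if (nlines > 0) {
                if (buf[i] == '\n')
                    nlines--;
            } else {
                buf[keep++] = buf[i];
            }
        }

        /* new_pos never exceeds old_pos, so unread data is never clobbered. */
        if (keep > 0) {
            if (fseek(f, new_pos, SEEK_SET) != 0)
                goto fail;
            if (fwrite(buf, 1, keep, f) != keep)
                goto fail;
            new_pos += (long)keep;
        }
    }
    if (ferror(f))
        goto fail;
    if (fclose(f) != 0)
        return -1;
    return truncate(path, (off_t)new_pos);  /* cut off the stale tail */

fail:
    fclose(f);
    return -1;
}
```

Calling drop_leading_lines("mail.txt", 2) would then remove the first two lines without a temporary file. If the program dies halfway through, the file is left partly rewritten.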
B.E

Re:deleting file contents

Post by B.E »

Curufir

That would work for small files, but what about big files (e.g. bigger than the amount of memory you have)?

The bigger the file the slower the program will get.
Candy
Member
Posts: 3882
Joined: Tue Oct 17, 2006 11:33 pm
Location: Eindhoven

Re:deleting file contents

Post by Candy »

Open the source file twice: once for reading and once for writing without truncation. Read through the source until you are past everything you don't want written. Create a buffer of whatever size you prefer; larger is faster, but one too large to fit in memory becomes slow. Then repeatedly read a full buffer from the first handle and write it to the second.

Convince yourself that you will never overwrite something you still have to read.
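
A minimal sketch of this two-handle approach, under some assumptions: the helper name cut_range is hypothetical, the part to remove is given as a byte range that lies inside the file, and the final shortening uses POSIX truncate(), which ISO C does not provide.

```c
#define _POSIX_C_SOURCE 200809L  /* for truncate(); assumes a POSIX system */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: removes `len` bytes starting at offset `start`.
   Assumes the range lies inside the file. Returns 0 on success, -1 on error. */
int cut_range(const char *path, long start, long len)
{
    char buf[4096];
    size_t got;
    long new_len = start;
    FILE *in, *out;

    assert(path != NULL && start >= 0 && len >= 0);
    in = fopen(path, "rb");    /* handle 1: reading */
    out = fopen(path, "r+b");  /* handle 2: writing, without truncation */
    if (in == NULL || out == NULL)
        goto fail;

    /* The reader starts just past the unwanted part, the writer at its
       beginning, so the writer always stays behind the reader. */
    if (fseek(in, start + len, SEEK_SET) != 0 ||
        fseek(out, start, SEEK_SET) != 0)
        goto fail;

    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got)
            goto fail;
        new_len += (long)got;
    }
    if (ferror(in))
        goto fail;
    fclose(in);
    if (fclose(out) != 0)
        return -1;
    return truncate(path, (off_t)new_len);  /* drop the leftover tail */

fail:
    if (in != NULL)
        fclose(in);
    if (out != NULL)
        fclose(out);
    return -1;
}
```

The "convince yourself" step is the comment before the two fseek() calls: the write offset is always smaller than the read offset, so nothing is overwritten before it has been read.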
rich_m

Re:deleting file contents

Post by rich_m »

Wow, so the idea I had in mind wasn't that bad (the one I mentioned above). Thanks for the explanation. I'll start writing the program and post technical doubts in here ;D
rich_m

Re:deleting file contents

Post by rich_m »

Hi, suppose this is my file "members.txt":

Code:

Antony
Samuel
David
Joe
Harry

and I want to delete only the string "David", so the new file will look like this:

Code:

Antony
Samuel
Joe
Harry

Now how do I do that?
Curufir

Re:deleting file contents

Post by Curufir »

B.E wrote: That would work for small files, but what about big files (e.g. bigger than the amount of memory you have)?
That sample scheme is independent of memory size. It's true that the bigger the buffer, the faster it will go, but you could use a 512-byte buffer and it would still do the same thing.
B.E wrote: The bigger the file the slower the program will get.
That is an unavoidable consequence of the file being bigger: you can't selectively remove bytes without rewriting the file (at least everything after the first removed byte).
Solar
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re:deleting file contents

Post by Solar »

rich_m wrote: now how do i do that? ???
Allow me to suggest an optimized method:

1) open members.txt (reading);
2) open members.tmp (writing);
3) read line from members.txt;
4) check whether you want it; if yes, write to members.tmp;
5) repeat from 3) until EOF;
6) close both files;
7) call rename( "members.tmp", "members.txt" ) from <stdio.h>.

That spares you the effort of writing the new file back manually.

As for buffering, the buffer should be large enough for one line, for practical reasons. If you want additional buffering to improve performance, the setbuf() / setvbuf() functions of <stdio.h> are probably easier to handle than doing it manually, and leave the handling of the buffers in the hands of the library, where it belongs (IMHO).
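
The seven steps above could be sketched in C as follows. The function name remove_line and the 256-byte line limit are assumptions for illustration only; lines longer than that would be read in pieces and compared piecewise, which a real program would need to handle. Also note that ISO C leaves the behaviour of rename() implementation-defined when the target already exists: POSIX systems replace it, while the Windows C runtime reports an error, so there the old file would have to be removed first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* assumed upper bound on line length */

/* Hypothetical helper: copies `path` to `tmp_path`, skipping every line
   equal to `unwanted`, then renames the temp file over the original.
   Returns 0 on success, nonzero on error. */
int remove_line(const char *path, const char *tmp_path, const char *unwanted)
{
    char line[LINE_MAX_LEN];
    int ok = 1;
    FILE *in, *out;

    assert(path != NULL && tmp_path != NULL && unwanted != NULL);
    in = fopen(path, "r");                       /* step 1 */
    if (in == NULL)
        return -1;
    out = fopen(tmp_path, "w");                  /* step 2 */
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while (fgets(line, sizeof line, in) != NULL) {   /* steps 3 and 5 */
        size_t n = strcspn(line, "\n");  /* length without the newline */
        if (n == strlen(unwanted) && strncmp(line, unwanted, n) == 0)
            continue;                    /* step 4: not wanted, skip it */
        if (fputs(line, out) == EOF) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;

    fclose(in);                                  /* step 6 */
    if (fclose(out) != 0)
        ok = 0;
    if (!ok) {
        remove(tmp_path);  /* leave the original untouched on failure */
        return -1;
    }
    return rename(tmp_path, path);               /* step 7 */
}
```

For the example file, remove_line("members.txt", "members.tmp", "David") leaves Antony, Samuel, Joe and Harry. The original is only replaced once the new file has been written completely.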
Every good solution is obvious once you've found it.
Candy

Re:deleting file contents

Post by Candy »

You can overwrite the old file and truncate it, saves you a temp file (which can save considerable space and is easier on the FS management code).
Solar

Re:deleting file contents

Post by Solar »

Hm?

While he still has to read from it, he can't truncate it. The only way I see for getting around a temporary file is opening it read / write and jumping to and fro... but that smells of fish.

I think there's nothing easier on the file system than the rename() I suggested, which is effectively an overwrite.
Candy

Re:deleting file contents

Post by Candy »

Solar wrote: Hm?

While he still has to read from it, he can't truncate it. The only way I see for getting around a temporary file is opening it read / write and jumping to and fro... but that smells of fish.

I think there's nothing easier on the file system than the rename() I suggested, which is effectively an overwrite.
You allocate twice the space, then free the old half (which was allocated and defragmented!) and rename the new part to the old name, messing up things like hard links.

If you open the file without truncation, overwrite it while reading, then close it, reopen it at that location (you must be able to do that, or else have some trunc() function call) and truncate it, you do not copy the file and you retain the old inode. For Windows computers and student projects, however, this is all probably moot, since you just want a way that works, not necessarily one that is reliable or efficient.

Also, this works on stuffed filesystems. It's also slightly easier on the cache.
Solar

Re:deleting file contents

Post by Solar »

Ah, got you now.

Well, I fear there's no way to do a truncate-at, at least none I know of (but I haven't implemented <stdio.h> yet, so I might be wrong). I consider hard links a broken concept, and anyway they aren't supported by the standard library.

So at that point, it's an OS API issue. ;)
Curufir

Re:deleting file contents

Post by Curufir »

Solar wrote: Well, I fear there's no way to do a truncate-at, at least none I know of (but I haven't implemented <stdio.h> yet so I might be wrong).
There are a couple of truncate functions (truncate/ftruncate) in glibc. (Some are defined by POSIX, but I don't know how portable they are. YMMV.) That's why the scheme I presented reused the old file.

If I was thinking of max efficiency I'd probably look at using mmap, treating the whole thing like it was a big array of chars, using ftruncate at the end to set the new length. Effectively letting the OS/Libc handle all the details for me.
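
As a rough illustration of the mmap idea, here is a minimal sketch that assumes a POSIX system (mmap, ftruncate and friends are not ISO C). The name remove_line_mmap is hypothetical, and the "unwanted data" is taken to be every line equal to a given string, as in the members.txt example.

```c
#define _POSIX_C_SOURCE 200809L  /* assumes a POSIX system */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: deletes every line equal to `unwanted` in place.
   Returns 0 on success, -1 on error. */
int remove_line_mmap(const char *path, const char *unwanted)
{
    struct stat st;
    size_t size, want, rd = 0, wr = 0;
    char *data;
    int fd, rc;

    assert(path != NULL && unwanted != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* nothing to do; a zero-length mmap would fail */
        close(fd);
        return 0;
    }
    size = (size_t)st.st_size;

    /* Map the whole file as one big, writable array of chars. */
    data = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return -1;
    }

    want = strlen(unwanted);
    while (rd < size) {
        char *nl = memchr(data + rd, '\n', size - rd);
        size_t len = nl ? (size_t)(nl - (data + rd)) + 1 : size - rd;
        size_t body = nl ? len - 1 : len;  /* line length without '\n' */

        if (!(body == want && memcmp(data + rd, unwanted, want) == 0)) {
            memmove(data + wr, data + rd, len);  /* slide kept line down */
            wr += len;
        }
        rd += len;
    }

    rc = munmap(data, size);
    if (ftruncate(fd, (off_t)wr) != 0)  /* set the new, shorter length */
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Like any in-place rewrite, this leaves the file half-modified if it is interrupted.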

That being said I'd recommend Solar's scheme because:
a) It's far more portable
b) If something goes horribly wrong (Power failure etc) you haven't screwed up the original file.
ark

Re:deleting file contents

Post by ark »

Strictly speaking you shouldn't assume that members.tmp doesn't exist. It's probably safer to use the standard library's support for temporary files to determine what to name the temp file.
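
One possible way to get a safely named temp file is sketched below. ISO C itself offers tmpfile() and tmpnam(), but a tmpfile() stream has no name to pass to rename(), so this sketch swaps in POSIX mkstemp() and fdopen() instead; that choice, and the helper name open_unique_temp, are assumptions rather than anything the standard library mandates.

```c
#define _POSIX_C_SOURCE 200809L  /* for mkstemp() and fdopen(); assumes POSIX */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: creates a new, uniquely named file for writing.
   `name_template` must be a writable string ending in "XXXXXX"; on success
   it holds the actual file name. Returns NULL on failure. */
FILE *open_unique_temp(char *name_template)
{
    int fd;
    FILE *f;

    assert(name_template != NULL);
    fd = mkstemp(name_template);  /* creates and opens a fresh file */
    if (fd < 0)
        return NULL;

    f = fdopen(fd, "w");          /* wrap the descriptor in a stdio stream */
    if (f == NULL) {
        close(fd);
        unlink(name_template);
    }
    return f;
}
```

With a template such as "members.XXXXXX" in the same directory as members.txt, the filled-in name can later be handed to rename() just like "members.tmp".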