deleting file contents
Posted: Tue Feb 08, 2005 1:07 pm
by rich_m
I need to know how data can be deleted from a file (the concept).
I'm trying to do it in C, and the way I do it seems crazy. Say I want to delete the first two lines of a file, "mail.txt":
I make my program open it in "r" mode and advance the file pointer through the characters until it has passed two '\n' characters.
Then I start copying the rest of the content to a temporary file, so that file now holds exactly the contents I want.
Then I copy it back onto the original file by opening the original (mail.txt) in "w" mode. So I'm overwriting the file with contents it already had.
This doesn't seem to be the right way, although it works... how is it usually done?
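For reference, the approach described above might look like this in C. This is a minimal sketch, not rich_m's actual code: the function name drop_first_lines is hypothetical, and it uses the standard tmpfile() for the temporary file instead of a named one.

```c
#include <stdio.h>

/* Hypothetical helper: delete the first `nlines` lines of `path` by
   skipping them, copying the rest to a temp file, then copying it back. */
int drop_first_lines(const char *path, int nlines)
{
    FILE *src = fopen(path, "r");
    if (!src)
        return -1;

    int c;
    /* Advance past `nlines` newline characters. */
    while (nlines > 0 && (c = fgetc(src)) != EOF)
        if (c == '\n')
            nlines--;

    /* tmpfile() gives an anonymous file that is removed when closed. */
    FILE *tmp = tmpfile();
    if (!tmp) {
        fclose(src);
        return -1;
    }
    while ((c = fgetc(src)) != EOF)   /* copy the part we keep */
        fputc(c, tmp);
    fclose(src);

    /* Reopening in "w" mode truncates the original; then copy back. */
    FILE *dst = fopen(path, "w");
    if (!dst) {
        fclose(tmp);
        return -1;
    }
    rewind(tmp);
    while ((c = fgetc(tmp)) != EOF)
        fputc(c, dst);
    fclose(tmp);
    return fclose(dst) == 0 ? 0 : -1;
}
```

It works, but every byte that survives is copied twice, which is what the replies below try to avoid.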
Re:deleting file contents
Posted: Thu Feb 10, 2005 11:16 am
by rich_m
Well, I guess my question wasn't clear enough. :-\ I just want to know how I could delete specific data from a file using a program. I'm trying it in C.
Any help would be appreciated.
Re:deleting file contents
Posted: Thu Feb 10, 2005 2:26 pm
by Curufir
Well one scheme would be this:
Create memory buffer that's size BUFF_SIZE (x disk sectors long).
One int to maintain current read position in the file (FILE_OLD_POS) = 0.
One int to maintain next point at which the buffer will be written out (FILE_NEW_POS) = 0.
One int to record how much data the buffer currently contains (BUFF_POS) = BUFF_SIZE.
Open the file for read/write operations.
While BUFF_POS == BUFF_SIZE and EOF is not reached [1].
{
Zero BUFF_POS
Seek to FILE_OLD_POS
While BUFF_POS != BUFF_SIZE and EOF is not reached.
{
Read the file into the buffer, discarding the data you want removed, and update BUFF_POS to reflect the true amount of data currently in the buffer.
}
Store file position as FILE_OLD_POS, seek to FILE_NEW_POS, write BUFF_POS bytes to the file.
Store current file position as FILE_NEW_POS
}
Close the file.
Truncate the file to the new length FILE_NEW_POS.
There are probably more elegant ways to do it, but I hope that helps.
[1] The EOF check just handles the case where the final part of the file to be written out is exactly the same size as the buffer (won't happen often, but it WILL happen).
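A rough C translation of the scheme above, assuming a POSIX system (open, pread, pwrite and ftruncate are POSIX, not ISO C). As a hypothetical stand-in for "what you want to discard", it drops every occurrence of one byte value; remove_byte_in_place and BUFF_SIZE are illustrative names. It simplifies the inner loop: each pass processes one read's worth of data rather than refilling the buffer up to BUFF_SIZE, and a zero-length read stands in for the EOF check.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BUFF_SIZE 4096   /* e.g. 8 sectors of 512 bytes */

/* Hypothetical example: remove every `drop` byte from `path` in place.
   Returns 0 on success, -1 on error. */
int remove_byte_in_place(const char *path, unsigned char drop)
{
    unsigned char buf[BUFF_SIZE];
    off_t old_pos = 0;   /* FILE_OLD_POS: where the next read starts */
    off_t new_pos = 0;   /* FILE_NEW_POS: where the next write starts */

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    for (;;) {
        ssize_t got = pread(fd, buf, sizeof buf, old_pos);
        if (got < 0)
            goto fail;
        if (got == 0)            /* EOF */
            break;
        old_pos += got;

        /* Compact the chunk, keeping only the bytes we want. */
        size_t kept = 0;
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] != drop)
                buf[kept++] = buf[i];

        /* new_pos never overtakes old_pos, so unread data is never clobbered. */
        if (kept > 0 && pwrite(fd, buf, kept, new_pos) != (ssize_t)kept)
            goto fail;
        new_pos += (off_t)kept;
    }

    /* Drop the stale tail left behind by shifting the data down. */
    if (ftruncate(fd, new_pos) != 0)
        goto fail;
    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Memory use is fixed at BUFF_SIZE regardless of file size; a larger buffer only means fewer system calls.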
Re:deleting file contents
Posted: Thu Feb 10, 2005 4:15 pm
by B.E
Curufir
That would work for small files, but what about big files (e.g. bigger than the amount of memory you have)?
The bigger the file the slower the program will get.
Re:deleting file contents
Posted: Thu Feb 10, 2005 5:19 pm
by Candy
Open the source file once for reading and once for writing without truncation. Read the source until you have skipped everything you don't want written. Create a buffer of any size you prefer; larger is faster, but too large is slower (as in, make it fit in your memory). Read a full buffer from the first handle and write it to the second, repeating until the end.
Convince yourself that you will never overwrite something you still have to read.
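A minimal sketch of this two-handle approach, assuming the part you don't want is a single byte range [off, off+len) that lies inside the file. Standard C can open the same file twice ("rb", and "r+b", which does not truncate) but has no way to shorten a file, so the last step uses POSIX fileno() and ftruncate(). delete_range is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* ftruncate() is POSIX, not ISO C */

/* Hypothetical helper: delete `len` bytes starting at offset `off`
   (assumed to lie inside the file) by copying the tail down in place. */
int delete_range(const char *path, long off, long len)
{
    char buf[8192];
    size_t n;
    long end;
    int rc = -1;
    FILE *in = fopen(path, "rb");    /* reading handle */
    FILE *out = fopen(path, "r+b");  /* writing handle; "r+" does not truncate */
    if (!in || !out)
        goto done;

    /* Reader skips the unwanted bytes; writer starts where they began. */
    if (fseek(in, off + len, SEEK_SET) != 0 || fseek(out, off, SEEK_SET) != 0)
        goto done;

    /* The writer always trails the reader by `len` bytes,
       so it never overwrites anything still to be read. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        if (fwrite(buf, 1, n, out) != n)
            goto done;
    if (ferror(in) || fflush(out) != 0)
        goto done;

    /* Cut off the leftover copy of the old tail. */
    end = ftell(out);
    if (end < 0 || ftruncate(fileno(out), (off_t)end) != 0)
        goto done;
    rc = 0;
done:
    if (in)
        fclose(in);
    if (out && fclose(out) != 0)
        rc = -1;
    return rc;
}
```

The "convince yourself" step is the comment in the copy loop: because the write position starts `len` bytes behind the read position and both advance together, the writer can never catch up with the reader.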
Re:deleting file contents
Posted: Thu Feb 10, 2005 5:27 pm
by rich_m
Wow, so the idea I had in mind wasn't that bad (the one I mentioned above). Thanks for the explanation; I'll start writing the program and post technical questions in here ;D
Re:deleting file contents
Posted: Thu Feb 10, 2005 5:39 pm
by rich_m
Hi, suppose this is my file "members.txt":
and I want to delete only the string "David", so the new file will look like this:
Now how do I do that?
Re:deleting file contents
Posted: Thu Feb 10, 2005 5:51 pm
by Curufir
B.E wrote:
That would work for small files, but what about big files (e.g. bigger than the amount of memory you have)?
That sample scheme is independent of memory size. It's true that the bigger the buffer the faster it will go, but you could have a 512 byte buffer and it would still do the same thing.
B.E wrote:
The bigger the file the slower the program will get.
That is an unavoidable consequence of the file being bigger: you can't remove bytes from the middle of a file without rewriting everything that follows them.
Re:deleting file contents
Posted: Fri Feb 11, 2005 2:47 am
by Solar
rich_m wrote:
Now how do I do that?
Allow me to suggest an optimized method:
1) open members.txt (reading);
2) open members.tmp (writing);
3) read line from members.txt;
4) check whether you want it; if yes, write to members.tmp;
5) repeat from 3) until EOF;
6) close both files;
7) call rename( "members.tmp", "members.txt" ) from <stdio.h>.
That spares you the effort of writing the new file back manually.
As for buffering, the buffer should be large enough for one line, for practical reasons. If you want additional buffering to improve performance, the setbuf() / setvbuf() functions of <stdio.h> are probably easier to handle than doing it manually, and leave the handling of the buffers in the hands of the library, where it belongs (IMHO).
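Solar's steps, sketched in C with a hypothetical delete_lines() that drops every line exactly equal to a given string (e.g. "David"). Assumptions: no line is longer than the 1024-byte line buffer, and the optional setvbuf() call merely asks stdio for a bigger output buffer. Note that ISO C leaves rename() onto an existing file implementation-defined: POSIX systems replace the target atomically, but the Microsoft C runtime returns an error instead, in which case you would remove() the original first.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy every line of `src` except those equal to
   `unwanted` into `tmp`, then rename `tmp` over `src`.
   Assumes no line is longer than the line buffer. */
int delete_lines(const char *src, const char *tmp, const char *unwanted)
{
    char line[1024];
    size_t ulen = strlen(unwanted);

    FILE *in = fopen(src, "r");               /* step 1 */
    if (!in)
        return -1;
    FILE *out = fopen(tmp, "w");              /* step 2 */
    if (!out) {
        fclose(in);
        return -1;
    }
    /* Optional: ask stdio for a larger output buffer (must precede any I/O). */
    (void)setvbuf(out, NULL, _IOFBF, 1 << 16);

    while (fgets(line, sizeof line, in)) {    /* steps 3 and 5 */
        size_t len = strcspn(line, "\n");     /* line length without '\n' */
        if (len != ulen || strncmp(line, unwanted, len) != 0)
            fputs(line, out);                 /* step 4: keep it */
    }

    int read_err = ferror(in);
    fclose(in);                               /* step 6 */
    if (fclose(out) != 0 || read_err) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, src) == 0 ? 0 : -1;    /* step 7 */
}
```

Usage for the example above: delete_lines("members.txt", "members.tmp", "David").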
Re:deleting file contents
Posted: Fri Feb 11, 2005 7:35 am
by Candy
You can overwrite the old file and truncate it, saves you a temp file (which can save considerable space and is easier on the FS management code).
Re:deleting file contents
Posted: Fri Feb 11, 2005 8:02 am
by Solar
Hm?
While he still has to read from it, he can't truncate it. The only way I see for getting around a temporary file is opening it read / write and jumping to and fro... but that smells of fish.
I think there's nothing easier on the file system than the rename() I suggested, which is effectively an overwrite.
Re:deleting file contents
Posted: Fri Feb 11, 2005 8:08 am
by Candy
Solar wrote:
Hm?
While he still has to read from it, he can't truncate it. The only way I see for getting around a temporary file is opening it read / write and jumping to and fro... but that smells of fish.
I think there's nothing easier on the file system than the rename() I suggested, which is effectively an overwrite.
You allocate twice the space, then free the old half (which was already allocated and defragmented!) and rename the new copy to the old name, messing up things like hard links.
If you open the file without truncation, overwrite it while reading, then close it, reopen it at that position and truncate it (you must be able to do that, though, or have some truncate() call), you never copy the file and you retain the old inode. For Windows computers and student projects, however, this is probably all moot, since you just want a way that works, not one that is reliable or efficient.
Also, this works on nearly full filesystems, where there may be no room for a second copy. It's also slightly easier on the cache.
Re:deleting file contents
Posted: Fri Feb 11, 2005 8:19 am
by Solar
Ah, got you now.
Well, I fear there's no way to do a truncate-at, at least none I know of (but I haven't implemented <stdio.h> yet, so I might be wrong). I consider hard links a broken concept, and anyway they aren't supported by the standard library.
So at that point, it's an OS API issue.
Re:deleting file contents
Posted: Fri Feb 11, 2005 9:44 am
by Curufir
Solar wrote:
Well, I fear there's no way to do a truncate-at, at least none I know of (but I haven't implemented <stdio.h> yet so I might be wrong).
There are a couple of truncate functions (truncate/ftruncate) in glibc. Both are specified by POSIX, so they are available on Unix-like systems but are not part of ISO C; YMMV elsewhere. That's why the scheme I presented reused the old file.
If I were aiming for maximum efficiency, I'd probably look at using mmap, treating the whole file like one big array of chars and using ftruncate at the end to set the new length, effectively letting the OS/libc handle all the details for me.
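The mmap idea, sketched for the simple case of deleting one byte range and assuming a POSIX system: map the file, slide the tail down with memmove() (the regions overlap, so memcpy() would be wrong), unmap, then ftruncate() to the new length. mmap_delete_range is a hypothetical name, and the whole file must fit in the process's address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: delete `len` bytes at offset `off` by mapping the
   file as one big char array. Returns 0 on success, -1 on error. */
int mmap_delete_range(const char *path, size_t off, size_t len)
{
    struct stat st;
    size_t size;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0)
        goto fail;

    size = (size_t)st.st_size;
    if (off > size || len > size - off)   /* range must lie inside the file */
        goto fail;

    if (len > 0) {   /* len > 0 implies size > 0, so the mapping is non-empty */
        char *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            goto fail;
        /* Slide the tail down over the deleted bytes; the regions overlap. */
        memmove(p + off, p + off + len, size - off - len);
        if (munmap(p, size) != 0)
            goto fail;
    }

    /* Shrink the file to its new length. */
    if (ftruncate(fd, (off_t)(size - len)) != 0)
        goto fail;
    return close(fd);

fail:
    close(fd);
    return -1;
}
```

The kernel decides when the modified pages reach the disk, which is the "letting the OS handle the details" part; it also means a crash mid-way can leave the file half-shifted, which ties in with point b) below.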
That being said I'd recommend Solar's scheme because:
a) It's far more portable
b) If something goes horribly wrong (power failure, etc.), you haven't screwed up the original file.
Re:deleting file contents
Posted: Sun Feb 13, 2005 10:04 am
by ark
Strictly speaking, you shouldn't assume that members.tmp doesn't already exist. It's probably safer to let the standard library's support for temporary files choose the temp file's name.
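One way to follow this advice, with a deliberate swap: instead of ISO C's tmpnam()/tmpfile(), this sketch uses POSIX mkstemp(), which creates the file atomically and lets you place it in the same directory as the target, so the later rename() stays on one filesystem. (tmpfile() gives you no name to rename, and tmpnam() typically points into a system temp directory and leaves a race between choosing the name and opening it.) open_temp_beside is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>   /* mkstemp() (POSIX) */
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a uniquely named temp file next to `target`
   (e.g. "members.txt.XXXXXX"). On success, stores its name in `name`
   (capacity `cap`) and returns a stream open for writing; else NULL. */
FILE *open_temp_beside(const char *target, char *name, size_t cap)
{
    static const char suffix[] = ".XXXXXX";
    if (strlen(target) + sizeof suffix > cap)   /* room for suffix and '\0' */
        return NULL;
    strcpy(name, target);
    strcat(name, suffix);

    int fd = mkstemp(name);   /* replaces the X's and creates the file */
    if (fd < 0)
        return NULL;
    FILE *f = fdopen(fd, "w");
    if (!f) {
        close(fd);
        remove(name);
    }
    return f;
}
```

The returned name is what you would later pass to rename() in place of "members.tmp".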