I couldn't visit this website for a while (maybe it was blocked by the government), and when I came back I found the thread had drifted a little off-topic, although that is because my explanation was unclear.
So please note: I am not writing a text editor.
My 'vi-library' works like this:
Code: Select all
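/* @file hellovi.c */
#include "vi.h"
int main(void){
    struct vi *vi = vi_new();
    vi_loadfile(vi, "test.txt");
    do{
        vi_xor(vi); // key ^
        if( vi_h(vi) ) vi_d0(vi); // key h, then keys d0 if the move succeeded
    } while(vi_j(vi)); // key j: repeat on the next line while there is one
    vi_writefile(vi, "test.txt");
    return 0;
}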
This library was originally designed for lua-shell (a *nix shell I wrote for myself). With the library registered as a module in the Lua virtual machine, a lua-shell script looks like this (another example):
Code: Select all
local rely = `gcc -MM hellovi.c`
vim = Vi:new();
vim:loadstr(rely)
@yEf:r phrd
vim:print()
hellovi.o hellovi.d: hellovi.c vi.h
` and @ are lua-shell syntactic sugar: ` runs a command, and @ passes an operation sequence to the 'vim' instance. The last line above is the printed result.
The script above is meant for use with a makefile.
And with a binary program (named vp) built on 'vi-library', the script is equivalent to:
gcc -MM hellovi.c | vp "yEf:r phrd"
These are the situations where 'vi-library' is used, and this is how it works. It will never be a visual text editor. It works like a programmable vi editor; it is virtual.
Anyway, thanks for everyone's encouraging words. I learned a lot.
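To make the operation sequence "yEf:r phrd" concrete, here is a toy sketch in plain C that replays it on one line. It is not the real vi-library: `struct toyvi`, `toyvi_init` and `toyvi_run` are hypothetical names invented for this illustration, only the five operations used above are handled, and the input is assumed to be the single line "hellovi.o: hellovi.c vi.h" as gcc -MM would typically print it.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a vi-like interpreter; NOT the real vi-library.
 * Handles only yE, f<c>, r<c>, p and h on a single line. */
struct toyvi {
    char   buf[256];  /* the line being edited, NUL-terminated */
    size_t cur;       /* cursor offset into buf */
    char   reg[256];  /* yank register */
};

static void toyvi_init(struct toyvi *v, const char *s)
{
    memset(v, 0, sizeof *v);
    strncpy(v->buf, s, sizeof v->buf - 1);
}

/* Returns 0 on success, -1 on an unsupported or failed operation. */
static int toyvi_run(struct toyvi *v, const char *ops)
{
    for (; *ops; ops++) {
        size_t len = strlen(v->buf);
        switch (*ops) {
        case 'y': {                 /* only "yE": yank to the end of the WORD */
            if (*++ops != 'E') return -1;
            size_t end = v->cur;
            while (v->buf[end] && v->buf[end] != ' ') end++;
            memcpy(v->reg, v->buf + v->cur, end - v->cur);
            v->reg[end - v->cur] = '\0';
            break;
        }
        case 'f': {                 /* f<c>: jump to the next <c> on the line */
            char c = *++ops;
            if (!c || v->cur >= len) return -1;
            char *p = strchr(v->buf + v->cur + 1, c);
            if (!p) return -1;
            v->cur = (size_t)(p - v->buf);
            break;
        }
        case 'r': {                 /* r<c>: replace the char under the cursor */
            char c = *++ops;
            if (!c || v->cur >= len) return -1;
            v->buf[v->cur] = c;
            break;
        }
        case 'p': {                 /* paste the register after the cursor */
            size_t n = strlen(v->reg);
            if (n == 0) break;
            if (len + n >= sizeof v->buf) return -1;
            size_t at = len ? v->cur + 1 : 0;
            memmove(v->buf + at + n, v->buf + at, len - at + 1);
            memcpy(v->buf + at, v->reg, n);
            v->cur = at + n - 1;    /* cursor lands on the last pasted char */
            break;
        }
        case 'h':                   /* move one char left */
            if (v->cur == 0) return -1;
            v->cur--;
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Step by step: yE yanks "hellovi.o:", f: jumps to the colon, r followed by a space blanks it, p pastes the yanked word after it, h steps back onto the 'o', and rd turns it into 'd'. Starting from "hellovi.o: hellovi.c vi.h", the buffer ends up as "hellovi.o hellovi.d: hellovi.c vi.h".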
BTW @Brendan
Thanks for your suggestion. Your data structure is helpful. In fact, I was thinking along the same lines.
I have already changed my len_of_xx into struct viline{..} and rewritten all the related code. It looks much cleaner now!
As for int32_t, int64_t ..., I prefer the traditional int, long long ...; I think I can handle portability issues even without those data types.
Organizing blocks with a linked list is nice. I will use it if I someday want to support large files.
===================================Original post=======================
Hi, friends.
I am writing a C library that operates on a string (or a file) the way vim does; it works like a virtual vim.
I have been developing this library for quite a while, and recently I started writing code to support UTF-8 characters.
A problem I ran into is that once UTF-8 is considered, some APIs become complicated. I want to keep the library fast, so I am posting here to let you judge whether I am writing the code the right way.
This library's core structure is:
Code: Select all
struct vi{
    char *curr; /* points to the character the virtual cursor is currently on */
    int currl; /* current line number */
    char **lines; /* array of line pointers, one per line of the file, with '\n' as EOL */
    int *len_of_lines; /* length of each line */
    ....
};
Code: Select all
#include <stdbool.h>
#define OFFSET_OF_CURR(vi) ((vi)->curr - (vi)->lines[(vi)->currl])
#define OFFSET_OF_EOL(vi) ((vi)->len_of_lines[(vi)->currl])
/* key l (single-byte characters only): move right unless already on the last character */
static inline bool vi_l(struct vi *vi){
    if(OFFSET_OF_CURR(vi) + 1 < OFFSET_OF_EOL(vi)){
        vi->curr++;
        return true;
    }
    return false;
}
This is version 1:
Code: Select all
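/* Sequence length indexed by the high nibble of the lead byte.
   Nibbles 0x8..0xB are continuation bytes and never start a character;
   they map to 1 so that a stray one still advances the cursor. */
char __utf8_lenmap[16]={
    [0x8] = 1, [0x9] = 1, [0xA] = 1, [0xB] = 1,
    [0xC] = 2, [0xD] = 2, /* 110xxxxx lead bytes cover both nibbles */
    [0xE] = 3,
    [0xF] = 4,
};
#define utf8len(c) __utf8_lenmap[ (((unsigned char)(c)) >> 4) ]
static inline bool vi_l(struct vi *vi){
    if((unsigned char)vi->curr[0] < 128){ /* cast: plain char may be signed */
        if(OFFSET_OF_CURR(vi) + 1 < OFFSET_OF_EOL(vi)){
            vi->curr++;
            return true;
        }
        return false;
    }
    int wlen = utf8len(vi->curr[0]);
    if(OFFSET_OF_CURR(vi) + wlen < OFFSET_OF_EOL(vi)){
        vi->curr += wlen;
        return true;
    }
    return false;
}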
For the second version, I extended __utf8_lenmap[] so that plain ASCII is also treated as UTF-8, to reduce the 'if' branches and the code size:
Code: Select all
char __utf8_lenmap[16]={
    [0x0] = 1, [0x1] = 1, [0x2] = 1, [0x3] = 1, /* 0x0..0x7: ASCII */
    [0x4] = 1, [0x5] = 1, [0x6] = 1, [0x7] = 1,
    [0x8] = 1, [0x9] = 1, [0xA] = 1, [0xB] = 1, /* continuation bytes, never a lead byte */
    [0xC] = 2, [0xD] = 2, /* 110xxxxx lead bytes cover both nibbles */
    [0xE] = 3,
    [0xF] = 4,
};
static inline bool vi_l(struct vi *vi){
    int char_len = utf8len(vi->curr[0]);
    if(OFFSET_OF_CURR(vi) + char_len < OFFSET_OF_EOL(vi)){
        vi->curr += char_len;
        return true;
    }
    return false;
}
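The table idea can be tried outside the library with a small standalone sketch. The names `lenmap`, `u8len`, `step_right` and `count_chars` are made up for this illustration and are not part of vi-library. The sketch assumes well-formed UTF-8 input and a line length in bytes that excludes the EOL.

```c
#include <stddef.h>

/* Sequence length by the high nibble of the lead byte. Continuation
 * nibbles (0x8..0xB) map to 1 so a stray byte still advances the cursor. */
static const unsigned char lenmap[16] = {
    1, 1, 1, 1, 1, 1, 1, 1,   /* 0x0..0x7: ASCII */
    1, 1, 1, 1,               /* 0x8..0xB: continuation bytes */
    2, 2,                     /* 0xC, 0xD: two-byte lead */
    3,                        /* 0xE: three-byte lead */
    4                         /* 0xF: four-byte lead */
};

/* The cast matters: plain char may be signed. */
static size_t u8len(char c) { return lenmap[(unsigned char)c >> 4]; }

/* Hypothetical counterpart of vi_l: returns the offset of the next
 * character, or `off` unchanged when already on the last character. */
static size_t step_right(const char *line, size_t len, size_t off)
{
    size_t next = off + u8len(line[off]);
    return next < len ? next : off;
}

/* Counts characters (not bytes) in a line of `len` bytes. */
static size_t count_chars(const char *line, size_t len)
{
    size_t n = 0, off = 0;
    while (off < len) {
        off += u8len(line[off]);
        n++;
    }
    return n;
}
```

For the 6-byte line "h\xC3\xA9llo" ("héllo"), count_chars returns 5, and step_right from offset 1 returns 3 because it skips both bytes of 'é'. A two-byte character whose lead byte is in the 0xD0..0xDF range, such as "\xD0\x96", counts as one character only because the 0xD entry is 2.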
I will write test code to compare their performance once I have finished the main programming work. For now, would you tell me your opinions or suggestions?
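One possible shape for such a test, as a rough sketch rather than a real benchmark: walk the same buffer with a branch-first stepper (version 1 style) and a table-only stepper (version 2 style), check that they agree, and time repeated passes with clock(). All names here (`walk_branch`, `walk_table`, `time_walk`) are hypothetical, and the timings depend heavily on the compiler, the optimization flags, and the mix of ASCII and multi-byte text in the input.

```c
#include <stddef.h>
#include <time.h>

/* Sequence length by the high nibble of the lead byte. */
static const unsigned char lenmap[16] = {
    1, 1, 1, 1, 1, 1, 1, 1,  1, 1, 1, 1,  2, 2,  3,  4
};

/* Version 1 style: test for ASCII first, use the table otherwise. */
static size_t walk_branch(const char *s, size_t len)
{
    size_t n = 0, off = 0;
    while (off < len) {
        unsigned char c = (unsigned char)s[off];
        off += (c < 128) ? 1 : lenmap[c >> 4];
        n++;
    }
    return n;
}

/* Version 2 style: always go through the table. */
static size_t walk_table(const char *s, size_t len)
{
    size_t n = 0, off = 0;
    while (off < len) {
        off += lenmap[(unsigned char)s[off] >> 4];
        n++;
    }
    return n;
}

/* Runs `reps` passes and returns the elapsed CPU time in seconds.
 * The summed count is written to *sink so the loop is not optimized away. */
static double time_walk(size_t (*walk)(const char *, size_t),
                        const char *s, size_t len, int reps, size_t *sink)
{
    clock_t t0 = clock();
    size_t total = 0;
    for (int i = 0; i < reps; i++)
        total += walk(s, len);
    *sink = total;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_walk once with walk_branch and once with walk_table on the same large buffer gives two numbers to compare. Checking that both walkers return the same character count first guards against measuring a broken variant.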