Inserting to end of a CONS list

Programming, for all ages and all languages.
Post Reply
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Inserting to end of a CONS list

Post by Schol-R-LEA »

I am presently trying to go through the Make A Lisp process as a confidence-building exercise, and after a bit of a wrong turn (involving an ugly abuse of C++ std::vector), I have taken the classic Lisp approach of representing the code as a CONS list rather than trying to use just strings as the 'official' process seems to be recommending.

For those unfamiliar, a CONS cell or CONS pair is a structure consisting of a pair of typeless pointers - typeless in the sense that they can point to any Lisp object, including other CONS pairs. While it can be used just to hold a pair of objects, the more common use is to form a singly-linked list; the first pointer (called CAR for historical reasons, or head) points to the first element of the list, while the second (historically called CDR, or tail), points to another CONS pair. These pairs are then chained together until the end of the list, which is indicated by a NULL in the CDR element. For example, the list (42 69 613) would be represented as:

Image

This is a surprisingly versatile structure, as you could use it to construct trees and other complex data structures implicitly, but that's not really relevant right now. My problem is with my implementation, and specifically, with how I am implementing the add (insert to the end of the list - I don't want to call it 'append' as that has a particular meaning in Lisp) operation.

In short, I don't seem to be overwriting the pointer I am trying to insert to.

The relevant parts of the class and type declarations (with irrelevant part elided with the '[...]') are as follows:

Code: Select all

typedef std::shared_ptr<Env_Symbol> EnvSymbolPtr;
typedef std::shared_ptr<Environment> EnvPtr;

// provide a type list for the different subclasses of MalObject.
// I felt that this was more useful for my purposes than trying
// to use typeid().
enum MalType
{
    MAL_OBJECT,
[...]
    MAL_COLLECTION, MAL_PAIR, 
[...]
};

// an array of strings representing the type names.
extern std::string mal_type_name[];

// shortcuts for the two most frequently used pointer types.
typedef std::shared_ptr<MalObject> MalPtr;
typedef std::shared_ptr<MalPair> PairPtr;


class MalObject
{
public:
    MalObject(MalType type): m_type(type) {};
    virtual std::string to_str() {return "";};
    MalType type() {return m_type;};
    std::string type_name() {return mal_type_name[m_type];};
    virtual size_t size() {return 1;};

    // type predicates
[...]
    virtual bool is_null() {return false;};
    virtual bool is_pair() {return false;};
    virtual bool is_list() {return false;};
[,,,]
    // return values of the subclasses
[...]
    virtual PairPtr as_pair() {throw InvalidConversionException(mal_type_name[m_type], mal_type_name[MAL_PAIR]); return nullptr;};


protected:
    MalType m_type;
};

[...]


class MalCollection: public MalObject
{
public:
    MalCollection(MalType type): MalObject(type) {};
    virtual bool is_collection() {return true;};
};


class MalPair: public MalCollection
{
public:
    MalPair(MalPtr car=nullptr, MalPtr cdr=nullptr): MalCollection(MAL_PAIR), m_car(car), m_cdr(cdr) {};
    virtual std::string to_str();
    virtual std::string to_str_continued();
    virtual bool is_null() {return (m_car == nullptr && m_cdr == nullptr);};
    virtual bool is_pair() {return true;};
    virtual bool is_list() {return (m_cdr == nullptr || m_cdr->is_pair());};
    virtual PairPtr as_pair() {return std::make_shared<MalPair>(m_car, m_cdr);};
    virtual size_t size();
    virtual MalPtr operator[](size_t index);
    virtual MalPtr car() {return m_car;};
    virtual MalPtr cdr() {return m_cdr;};
    virtual void add(MalPtr addition);       // inserts an element at the end of the list
protected:
    MalPtr m_car, m_cdr;
};
The add() method is as follows:

Code: Select all

void MalPair::add(MalPtr addition)
{
    if (m_car == nullptr)
    {
        if (m_cdr == nullptr)
        {
            // insert to the first element of the list
            m_car = addition;
        }
        else if (m_cdr->type() == MAL_PAIR)
        {
            // recurse over the list
            m_cdr->as_pair()->add(addition);
        }
        else
        {
            throw ImproperListException(this->to_str());
        }
    }
    else if (m_cdr == nullptr)
    {
        // insert to the end of the list
        m_cdr = std::make_shared<MalPair>(addition);
    }
    else
    {
        if (m_cdr->type() == MAL_PAIR)
        {
            // recurse over the list
            m_cdr->as_pair()->add(addition);
        }
        else
        {
            throw ImproperListException(this->to_str());
        }
    }
}


Now, this works correctly for adding the first and second elements to a list, but fails to insert elements past the second. My test code (using the doctest library header) is:

Code: Select all

    TEST_CASE("appending single element to list")
    {
        PairPtr test = std::make_shared<MalPair>();
        CHECK(test->to_str() == "()");
        test->add(std::make_shared<MalSymbol>("foo"));
        CHECK(test->to_str() == "(foo)");
        test->add(std::make_shared<MalKeyword>("bar"));
        CHECK(test->to_str() == "(foo :bar)");
// fails with both of the following tests
        test->add(std::make_shared<MalInteger>(69));
        CHECK(test->to_str() == "(foo :bar 69)");
        test->add(std::make_shared<MalRational>(mpq_class("17/23")));
        CHECK(test->to_str() == "(foo :bar 69 17/23)");
    }
I also tried using a while() loop rather than recursion, and got more or less the same result. My best guess is that I am somehow writing to a local copy of the CDR rather than the one I am trying to modify, but given that I am operating on pointers, this should not be the case I would think. I am at a loss as to why this isn't working.

Any suggestions as to what I am overlooking?
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
nullplan
Member
Member
Posts: 1779
Joined: Wed Aug 30, 2017 8:24 am

Re: Inserting to end of a CONS list

Post by nullplan »

And here I thought the idea of Lisp was to have unchanging data. I thought that was the goal. In particular, it is well known (to me, anyway), that adding an element to the end of a list always creates a copy of the entire list. That is so since the last element of the list must be changed, but in Lisp it cannot be changed, so instead a new last node is constructed, containing the data of the previous last node and the next pointer to the new last node. In order to use that node, the node in front of it has to be changed, but it cannot be, so it is changed. &c. pp. all the way to the front of the list.

I am unfamiliar with C++'s memory management, but doesn't make_shared() create a new object under control of the shared_ptr class? That would mean your as_pair() function creates a temporary object that is worked on, then discarded. And that would fit with your findings. How to get out of this mess?

I think there's an abstraction missing here. In particular, the idea of constantly checking that the car is null is wrong to me. I mean, in theory users might be adding null pointers to the list, right? You should have a list class separate from the node class. In C, without automatic memory management, I would do something like this:

Code: Select all

struct list_node {
  void *data;
  struct list_node *next;
};

struct list {
  struct list_node *first;
  struct list_node **plast;
};

void init_list(struct list *l) {
  l->first = 0;
  l->plast = &l->first;
};

int add_to_list(struct list *l, void *d)
{
  struct list_node *n = malloc(sizeof *n);
  if (!n) return ENOMEM;
  n->data = d;
  n->next = 0;
  *l->plast = n;
  l->plast = &n->next;
  return 0;
}
The idea is to have a pointer to the last node's next pointer around. You initialize it to point at the first pointer. That way, you are always at the end of the list and can add a new element in constant time.

Good luck making this work with C++'s shared_ptr. But I think you can have a pointer to shared_ptr with the same semantics as a pointer to pointer. Not sure, though. Maybe Octo or Solar will answer, they seem to know more about C++ than I.
Carpe diem!
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Inserting to end of a CONS list

Post by Schol-R-LEA »

nullplan wrote:And here I thought the idea of Lisp was to have unchanging data. I thought that was the goal.
It is a bit more complicated than that, as most Lisps aren't strict FP languages; while FP is the preferred mode of development, it isn't enforced in many Lisps (though some such as Clojure do). They generally have a few specific mutators which can alter a CONS cell (e.g. setq, replaca, and replacd in Common Lisp, set!, set-car!, and set-cdr! in Scheme, etc.) or even a whole list, but most Lispers will avoid them unless the pure-FP alternative is worse.
nullplan wrote:In particular, it is well known (to me, anyway), that adding an element to the end of a list always creates a copy of the entire list. That is so since the last element of the list must be changed, but in Lisp it cannot be changed, so instead a new last node is constructed, containing the data of the previous last node and the next pointer to the new last node. In order to use that node, the node in front of it has to be changed, but it cannot be, so it is changed. &c. pp. all the way to the front of the list.
Yeah, this is part of why I chose to call the specific C++ method add() rather than append() (though honestly, the naming in the implementation doesn't necessarily reflect the naming in the language being implemented). The method in question is mainly intended as being internal to the implementation.
nullplan wrote:I am unfamiliar with C++'s memory management, but doesn't make_shared() create a new object under control of the shared_ptr class? That would mean your as_pair() function creates a temporary object that is worked on, then discarded. And that would fit with your findings.
Ah, you are right. Usually I would return a this pointer, but since I am working with shared_ptrs for everything else, that isn't really an option.

nullplan wrote:I think there's an abstraction missing here. In particular, the idea of constantly checking that the car is null is wrong to me. I mean, in theory users might be adding null pointers to the list, right? You should have a list class separate from the node class. In C, without automatic memory management, I would do something like this:

Code: Select all

...
The idea is to have a pointer to the last node's next pointer around. You initialize it to point at the first pointer. That way, you are always at the end of the list and can add a new element in constant time.
This would be a lot more efficient, certainly. I'd have to be careful about how I keep the end of the list up to date, but it might be a good solution.
nullplan wrote:Good luck making this work with C++'s shared_ptr. But I think you can have a pointer to shared_ptr with the same semantics as a pointer to pointer. Not sure, though. Maybe Octo or Solar will answer, they seem to know more about C++ than I.
That last part may be a way to handle this, but I'll have to look into it a bit.
Last edited by Schol-R-LEA on Sun May 28, 2023 2:47 pm, edited 1 time in total.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Inserting to end of a CONS list

Post by Schol-R-LEA »

Digging a bit deeper, I found a standard abstract class enable_shared_from_this, which provides a method shared_from_this() for this precise purpose.

Or so I thought on reading about it. When I changed the class to inherit from it (as well as the existing parent MalCollection) and applied the method to return a std::shared_ptr in as_pair(), several of the tests now throw a "bad_weak_ptr" exception.

Looking into it further, I see that this is because the purpose of a shared_ptr is to indicate ownership of the pointer; there needs to be at least one handle holding a shared pointer to the object in question already before it can share the pointer further. I am not certain I fully understand the semantics of this, but I have a feeling that this is going to require a significant re-write to proceed with my current design.

EDIT: OK, giving this more thought I am puzzled about this, as the m_cdr instance variable already is a shared pointer, meaning that it shouldn't throw that exception IIUC. What am I missing?

For reference, here is one of the now-failing tests, with the line that throws the exception marked.

Code: Select all

    TEST_CASE("proper list of two integer elements")
    {
        PairPtr test = std::make_shared<MalPair>(std::make_shared<MalInteger>(42), std::make_shared<MalPair>(std::make_shared<MalInteger>(69)));
        MalPtr test_car = test->car();
        CHECK(test_car->type() == MAL_INTEGER);
        mpz_class test_num = test_car->as_integer();
        CHECK(test_num == 42);
        PairPtr test_cdr = test->cdr()->as_pair();                    //  <-- throws exception
        CHECK(test_cdr->type() == MAL_PAIR);
        CHECK(test_cdr->to_str() == "(69)");
        MalPtr test_cadr = test_cdr->car();
        CHECK(test_cadr->type() == MAL_INTEGER);
        test_num = test_cadr->as_integer();
        CHECK(test_num == 69);
        CHECK(test->to_str() == "(42 69)");
    }
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
User avatar
Schol-R-LEA
Member
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA

Re: Inserting to end of a CONS list

Post by Schol-R-LEA »

Solved: apparently, std::enable_shared_from_this<> must be inherited as public. That was careless of me.
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.
Post Reply