Mostly it was seen in mainframe and minicomputer architectures from the 1960s and 1970s, though obviously Intel used it in the x86 line as well as in the iAPX 432 microprocessors. As you mention, some (but not all, according to Wikipedia) Z8000 models also used 7-bit segment numbers to extend memory addressing to a maximum of 8 MiB (128 segments of 64 KiB each). Again, going by Wikipedia, it was used in:
- Some Burroughs mainframes, specifically the B5000 and B6500, the former apparently being the first commercial system to use segmentation. Notably, both of these systems were designed with high-level languages in mind, specifically Algol: programmers were never supposed to access the underlying hardware directly or write assembly at all, and all programming, including systems programming, was meant to be done in Algol 60, though compilers for other languages did exist, IIUC.
- The General Electric GE-645 mainframe, which was the primary target and reference platform for Multics (and wow, apparently Multics was open-sourced in 2007, and emulated versions for various architectures are still being maintained... weird).
- The IBM System/38 and its successors, the AS/400, the iSeries, and System i. As with Multics, a lot of the software written for these is still supported under emulation.
- Prime Computer's Prime 400, though I gather that their other models didn't use segmentation.
That page claims that Stratus Technologies and Apollo Computer had segmented memory systems as well, but that seems incorrect.
The page for Stratus seems to indicate that they only built systems around existing CPUs and never designed their own, AFAICT; of the CPUs they did use, only the Intel Xeon was a segmented architecture.
According to the page on Apollo's systems, the only original ISA they developed was the PRISM architecture, a classic RISC design with paging but not segmentation; their earlier workstations all used either Motorola 680x0 CPUs or a proprietary bit-sliced implementation of the 68000 instruction set called the '2900'.
However, Hewlett-Packard did have a 32-bit segmented design, the FOCUS, though aside from being the first microprocessor on the market with a full 32-bit address space (which I suppose was made up of fixed 16-bit segments plus 16-bit offsets with no segment overlap, but I can't seem to find out), it doesn't seem to have really influenced anything and vanished pretty quickly. Interestingly, like the Burroughs systems, it was a pure stack machine with no programmer-accessible registers, though it didn't have the restrictions on assembly programming that the B5000 did. It also had a massive instruction set for its time - 220 instructions, comparable to the DEC VAX-11 (243 instructions for the smallest model, IIUC) and the 432 (this document says ~225 instructions, while this paper says 230). It seems to have had a less troubled history than the 432, but it was nonetheless a late CISC design that got swept away by the RISC revolution.
Interestingly, I am not seeing any segmented designs originating outside of the US at all (not counting derivatives such as the NEC V20 or the Soviet-era K1810VM86 - in fact, all the segmented systems designed outside of the US appear to be x86 clones), and the last new segmented architectures I can find any record of are the Intel iAPX 432, the HP FOCUS, and those Zilog Z8000 variants. This seems to fit with the fact that segmentation, while used for memory protection in some cases, was primarily a means of saving address bits and reducing code size - you could have a total addressable space of, say, 24, 32, or (for some older mainframes) 36 bits while only needing 8, 16, or 18 bits for the majority of address arguments - and it also allowed several clever tricks to reduce the total number of hardware address lines.
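To make the address-bit arithmetic concrete, here's a rough C sketch of the two address-formation schemes mentioned above (8086 real mode and the segmented Z8001); the particular segment and offset values are just made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* 8086 real mode: a 16-bit segment and a 16-bit offset combine into a
 * 20-bit physical address, with segments overlapping every 16 bytes. */
static uint32_t x86_real_mode_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;            /* 20-bit result */
}

/* Z8001 style: a 7-bit segment number is simply concatenated with a
 * 16-bit offset, giving 128 non-overlapping 64 KiB segments (8 MiB). */
static uint32_t z8001_addr(uint8_t segment, uint16_t offset)
{
    return ((uint32_t)(segment & 0x7F) << 16) | offset;  /* 23-bit result */
}

int main(void)
{
    /* Most instructions only carry the 16-bit offset; the wider segment
     * part lives elsewhere, which is where the code-size saving comes from. */
    printf("8086  0x1234:0x0010          -> 0x%05X\n", x86_real_mode_addr(0x1234, 0x0010));
    printf("Z8001 seg 0x05, offset 0x0010 -> 0x%06X\n", z8001_addr(0x05, 0x0010));
    return 0;
}
```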
Which would also explain why they fell out of favor - not so much because of the problems of writing software with segmentation in mind, but because the cost of adding more address lines eventually dropped below that of the segmentation support needed to avoid them, and the price of memory fell enough that saving two bytes per address on most memory accesses just didn't seem worth bothering with, especially once more significant factors were dominating memory use and performance anyway (in particular, memory access speeds were rising much more slowly than CPU clock speeds, creating demand for chip real estate for caches and pipelining).
So, like RISC, segmented memory was primarily a pragmatic solution to the limits of the then-current technology.
(So was CISC, for that matter, which was mainly about providing a rich assembly programming environment at a time when compiler technology was still quite primitive - though of course it wasn't called CISC at the time; it was just called 'making a bigger and better instruction set', because that was what was assumed to lead to better performance and/or less expensive software.)
But unlike RISC, which was originally advocated for its performance (and for being easier to write compilers for - which IMAO is still a good enough reason to prefer it all by itself), segmentation's advantages as a design principle aren't really enough to carry it past the disappearance of the specific set of problems it was created to solve.
Designs like ARM continue mainly because they are cheaper to produce in volume, make better use of chip real estate (meaning they are better suited to SoC designs), and lend themselves to energy-efficient implementations at low-to-medium clock rates. Segmentation, however, doesn't seem to have enough of an edge over paging in terms of memory protection to justify the programming complexity it adds (though frankly, that complexity is heavily overstated; there's no real reason for application programmers to even be concerned with it, and, Linux aside, it is entirely possible to implement modern OSes on top of segmentation). It's still a lot of pain for no real gain, so most hardware manufacturers and OS devs don't bother.
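For anyone curious what 'segmentation as memory protection' actually looks like on the one segmented architecture still in wide use, here's a minimal C sketch that packs x86 protected-mode GDT descriptors - per-segment base, limit, and privilege level are the knobs the protection hangs off of. The specific bases and limits below are purely illustrative:

```c
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Pack an 8-byte x86 protected-mode segment descriptor (GDT entry):
 * a 32-bit base, a 20-bit limit, an access byte (present bit, privilege
 * level, type), and 4 flag bits (granularity, operand size). */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d  =  (uint64_t)(limit & 0x0000FFFF);         /* limit bits 0-15   */
    d |= ((uint64_t)(base  & 0x00FFFFFF)) << 16;  /* base bits 0-23    */
    d |= ((uint64_t)access)               << 40;  /* access byte       */
    d |= ((uint64_t)((limit >> 16) & 0xF)) << 48; /* limit bits 16-19  */
    d |= ((uint64_t)(flags & 0xF))         << 52; /* flags (G, D/B...) */
    d |= ((uint64_t)((base >> 24) & 0xFF)) << 56; /* base bits 24-31   */
    return d;
}

int main(void)
{
    /* A "flat" ring-0 code segment: base 0, 4 GiB limit, page granularity -
     * i.e. the degenerate setup most modern OSes actually load. */
    printf("flat code segment:  0x%016" PRIX64 "\n",
           make_descriptor(0x00000000, 0xFFFFF, 0x9A, 0xC));

    /* A hypothetical ring-3 data segment: base 0x00400000, 64 KiB limit,
     * byte granularity - closer to what genuinely segmented protection
     * per task or per object would look like. */
    printf("small data segment: 0x%016" PRIX64 "\n",
           make_descriptor(0x00400000, 0x0FFFF, 0xF2, 0x4));
    return 0;
}
```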