It was thus said that the Great Liam Proven once stated:
> So. What is a segment, what is a page, and what is the difference? I
> know that pre-386 Intel x86 chips used a 64k segment size and that
> this caused problems, but little more than that.
A segment has a starting address and a length associated with it, and
paging is ... well ... a bit more complicated.
I'll start with the 8086.
In order to address more than 64K you can do one of a few things: you can
increase the size of the addressing registers, you can bank switch memory,
or you can do something that kind of mixes the two. The 8086 does this
mixing. Each register on the 8086 is 16 bits in size, so the most that can
be addressed is 64K. To get around this, the engineers at Intel added four
segment registers that marked the start of a 64K block of memory. Since the
address space of the 8086 was 20 bits, these segment registers were
multiplied by 16 (shifted left by 4) to get the physical address of the
segment, then the 16-bit offset was added to this.
So, the next instruction was pointed to by the CS:IP registers (CS is the
Code Segment register, and IP is the instruction pointer), each being 16
bits. The math looked like:
      +---------------------+
      |         CS          | 0000
      +---------------------+------+
    +  0000 |          IP          |
            +----------------------+

      +----------------------------+
      |    20-bit physical addr    |
      +----------------------------+
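To put numbers on the diagram, here is a minimal sketch of the calculation in Python. The function name `physical_address` is made up for illustration; the mask to 20 bits reflects the 8086 having only 20 address lines, so a sum past 1M wraps around to zero.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 (multiply by 16), add the offset, and
    # keep only the low 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


# Different segment:offset pairs can name the same physical byte.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
```

One consequence worth noticing: because segments overlap every 16 bytes, a given physical address has many segment:offset spellings.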
The other three segment registers are DS (data segment), SS (stack
segment) and ES (extra segment). Each instruction has an implied segment:
anything instruction related goes through CS; data references use DS,
*except* that references via BP use SS; stack references use SS; and a few
instructions use ES by default. You can, however, override the segment, but
it costs an additional byte in the instruction stream.
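The default-segment rules can be written down as a small lookup. This is only a sketch: the function name and the access labels ("fetch", "stack", "string_dest", "data") are invented here, and it ignores override prefixes entirely. The ES case stands for the destination of the string instructions, which write through ES:DI.

```python
def default_segment(access: str, base_register: str = "") -> str:
    """Return the segment register the 8086 uses by default.

    `access` is one of "fetch", "stack", "string_dest" or "data";
    these labels are hypothetical, chosen for this sketch.
    """
    if access == "fetch":
        return "CS"  # instruction fetches always go through CS:IP
    if access == "stack":
        return "SS"  # pushes, pops, calls and returns go through SS:SP
    if access == "string_dest":
        return "ES"  # string instructions write to ES:DI
    if access == "data":
        # Data references default to DS unless BP is the base register.
        return "SS" if base_register.upper() == "BP" else "DS"
    raise ValueError(f"unknown access kind: {access}")


print(default_segment("data", "BX"))  # DS
print(default_segment("data", "BP"))  # SS
```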
On the 80286, in protected mode, the segment registers no longer point to
physical memory, but are instead indices into a segment table that
describes the segment. The segment register is still 16 bits however. The
segment register (aka selector register) now has the format:
    +---------------+----+-----+
    |     index     | TI | RPL |
    +---------------+----+-----+
The top 13 bits (I'm going from memory here so details may be a bit
sketchy) are the index into the selector table. Of the low three bits, one
(TI) picks the global or the local table and the other two (RPL, the
requested privilege level) are used for protection. Each entry in the
selector table has the following fields:
    Physical address      24 bits
    Length of segment     16 bits
    Direction flag        1 bit
    Permissions           a few bits
    Present flag          1 bit
The direction flag instructs the CPU that the segment either starts at the
physical address with addresses increasing, or *ends* at the physical
address with addresses decreasing (for stacks typically). The present flag
is set if the segment is in physical memory; otherwise the rest of the
selector entry is not interpreted (this means that the OS can use the rest
of the entry to mark where on the swap device this segment is in this
case). If you try to reference memory outside the segment, you get a fault.
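For the 16-bit selector itself, here is a small sketch of pulling the fields apart. It assumes the documented 80286 layout: a 13-bit index in the top bits, then one table-indicator bit (global or local table), then a two-bit requested privilege level. The names `Selector` and `decode_selector` are made up for this example.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # 13 bits: which entry in the table
    table: int  # 1 bit: 0 = global table, 1 = local table
    rpl: int    # 2 bits: requested privilege level, 0 to 3


def decode_selector(value: int) -> Selector:
    """Split a 16-bit protected-mode selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a selector is 16 bits")
    return Selector(index=value >> 3, table=(value >> 2) & 1, rpl=value & 3)


print(decode_selector(0x001B))  # Selector(index=3, table=0, rpl=3)
```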
The reference looks like this (sorry for the lack of diagrams; I'm using
pseudocode here, with the CS:IP pair):
if (!segtable[cs.index].present) segfault(cs:ip);
if (ip > segtable[cs.index].length) segfault(cs:ip);
physaddr = segtable[cs.index].physaddr + ip;
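The pseudocode above can be turned into something runnable. This is a simplified sketch, not a model of the real chip: it ignores the direction flag, the permission bits and the global/local table choice, and it treats the length field as the highest valid offset (called `limit` here). `Descriptor`, `SegmentFault` and `segment_translate` are names invented for the example.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Raised where the pseudocode calls segfault()."""


@dataclass
class Descriptor:
    physaddr: int         # 24-bit starting address of the segment
    limit: int            # highest valid offset within the segment
    present: bool = True  # False if the segment is swapped out


def segment_translate(segtable, selector: int, offset: int) -> int:
    """Map selector:offset to a physical address, or raise SegmentFault."""
    entry = segtable[selector >> 3]  # top 13 bits index the table
    if not entry.present:
        raise SegmentFault("segment not present")
    if offset > entry.limit:
        raise SegmentFault("offset outside segment")
    return entry.physaddr + offset


table = [Descriptor(0, 0, present=False), Descriptor(0x120000, 0x0FFF)]
print(hex(segment_translate(table, 0x0008, 0x0010)))  # 0x120010
```

An OS would catch the not-present fault, bring the segment back in from swap, mark it present, and restart the instruction.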
The 80386 introduced paging (and 32-bit registers, but that's not important
right now). Paging is slightly different. The program works with logical
addresses:
+--------------------------------+
| 32-bit address |
+--------------------------------+
When a reference is made to memory, the CPU treats the address (on the
80386) as a collection of three offsets:
+-------------------------------+
| offset 1 | offset 2 | offset 3|
+-------------------------------+
(The first two offsets are 10 bits each and the last offset is 12 bits in
size.) Offset 1 selects an entry in the top level page table, with each
entry having a structure similar to the segment table for the 80286, minus
the length, since every page is a fixed 4K:
    Physical address      20 bits (4K aligned, so the low 12 bits are zero)
    Present flag          1 bit
    Permissions           a few bits
Offset 2 selects an entry in a second level table with the same structure,
and offset 3 is the offset into the actual page of memory. So the address
calculation looks like:
if (!page1[ip.off1].present) pagefault(page1,ip);
page2 = page1[ip.off1].physaddr;
if (!page2[ip.off2].present) pagefault(page2,ip);
physaddr = page2[ip.off2].physaddr + ip.off3;
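Here is a runnable sketch of the same walk, assuming the 80386 split of 10 bits, 10 bits and 12 bits. The tables are modelled as Python dicts where a missing key stands for a cleared present flag; permissions are left out. The names `split_address`, `page_translate` and `PageFault` are hypothetical.

```python
class PageFault(Exception):
    """Raised where the pseudocode calls pagefault()."""


def split_address(linear: int):
    """Split a 32-bit address into (offset 1, offset 2, offset 3)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def page_translate(page1: dict, linear: int) -> int:
    """Walk the two-level table; a missing key means "not present"."""
    off1, off2, off3 = split_address(linear)
    page2 = page1.get(off1)
    if page2 is None:
        raise PageFault(f"no second level table for {linear:#010x}")
    frame = page2.get(off2)
    if frame is None:
        raise PageFault(f"page not present for {linear:#010x}")
    return frame + off3  # frame is the 4K-aligned physical page address


page1 = {1: {2: 0x5000}}  # one second level table, one page mapped
print(hex(page_translate(page1, 0x00402034)))  # 0x5034
```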
The reason for the two levels is so you can have a sparse address space and
waste less memory in keeping the page tables around (each process has its
own page table).
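Some back-of-the-envelope arithmetic shows the saving. This sketch assumes the 80386 numbers: 4K pages, 4-byte table entries and 1024 entries per table. The function names are made up; `page_tables_in_use` is the number of second level tables a process actually needs.

```python
PAGE_SIZE = 4096          # bytes per page (12 offset bits)
ENTRY_BYTES = 4           # bytes per page table entry
ENTRIES_PER_TABLE = 1024  # 10 index bits per level
TABLE_BYTES = ENTRY_BYTES * ENTRIES_PER_TABLE


def flat_table_bytes() -> int:
    """One single-level table covering the whole 32-bit address space."""
    return (2**32 // PAGE_SIZE) * ENTRY_BYTES


def two_level_bytes(page_tables_in_use: int) -> int:
    """The top level table plus only the second level tables in use."""
    return TABLE_BYTES * (1 + page_tables_in_use)


print(flat_table_bytes())  # 4194304
print(two_level_bytes(3))  # 16384
```

So a small process that touches only a few 4M regions pays for a handful of 4K tables instead of a 4M table per process.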
Now, the 80386 can do both segments *and* paging at the same time (if
that's the case, I think it does the segment calculations first, then the
paging calculations) but not many operating systems actually use both (well,
they will set up a small segment table where each segment covers the full
address space, but they don't bother to swap segments).
-spc (Hope that clears some of the issues up)