On Feb 1, 2022, at 6:00 PM, Warner Losh via cctalk <cctalk at classiccmp.org> wrote:
On Tue, Feb 1, 2022 at 12:42 PM Grant Taylor via cctalk <cctalk at classiccmp.org> wrote:
On 2/1/22 2:14 AM, Joshua Rice via cctalk wrote:
There are several advantages to doing it that way, including balancing
wear on a disk (especially today, with SSDs), as a dedicated swap
partition could put undue wear on certain areas of the disk.
I thought avoiding this very problem was the purpose of the wear
leveling functions in SSD controllers.
All modern SSD firmware that I'm aware of decouples the physical location
from the LBA. It implements some variation of an 'append store log' that
abstracts the LBAs away from the chips the data is stored in. One big reason
for this is so that one worn out 'erase block' doesn't cause a hole in the
LBA range the drive can store data on. You expect to retire hundreds or
thousands of erase blocks in today's NAND over the life of the drive, and
coupling LBAs to a physical location makes that impossible.
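To make the decoupling concrete, here is a toy sketch in Python. It is not how any real firmware is written; the class name ToyFTL and its methods are made up for illustration. Every write is appended at the head of a log and the LBA is simply repointed, so retiring a worn-out erase block never removes an LBA from the range.

```python
class ToyFTL:
    """Hypothetical toy mapping layer: LBA -> (erase block, page)."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        self.free_blocks = list(range(num_blocks))  # erased blocks in service
        self.map = {}                               # LBA -> (block, page)
        self.block = self.free_blocks.pop(0)        # block at the log head
        self.page = 0                               # next free page in it

    def retire(self, block):
        """Take a worn-out block out of the free pool; no LBA is lost."""
        if block in self.free_blocks:
            self.free_blocks.remove(block)

    def write(self, lba):
        """Append the new copy at the log head and repoint the LBA."""
        if self.page == self.pages_per_block:
            if not self.free_blocks:
                raise RuntimeError("no erased block left; reclaim space first")
            self.block = self.free_blocks.pop(0)
            self.page = 0
        self.map[lba] = (self.block, self.page)
        self.page += 1
        return self.map[lba]
```

Writing the same LBA three times on a drive with two pages per block, and retiring block 1 in between, lands the copies at (0, 0), (0, 1) and (2, 0). The host still sees one LBA throughout.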
Another reason is that the flash memory write block size is larger than the sector size
exposed to the host, and the erase block size is much larger than the write block size.
So the firmware has to keep track of retired (superseded) data, move still-valid data out of the way
until a whole erase block holds nothing but retired data, then erase it to make it available again to receive incoming writes.
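The reclaim step can be sketched roughly as below, again in Python. This is a simplified greedy policy I am assuming for illustration (pick the block with the fewest still-valid pages); real firmware weighs wear, age and much else. The function name collect and both data structures are hypothetical.

```python
def collect(blocks, live):
    """Reclaim one erase block.

    blocks: dict block_id -> list of LBAs written to its pages, in order
    live:   dict LBA -> block_id that holds the current copy of that LBA
    Returns (victim_id, lbas_to_rewrite). The caller must append the
    returned LBAs elsewhere and update `live` for them.
    """
    def live_lbas(bid):
        # An LBA counts as valid in this block only if its current copy is here.
        return sorted(lba for lba in set(blocks[bid]) if live.get(lba) == bid)

    # Greedy choice (an assumption): the block with the least valid data
    # costs the fewest copy operations to empty.
    victim = min(blocks, key=lambda bid: len(live_lbas(bid)))
    survivors = live_lbas(victim)
    blocks[victim] = []  # erase: every page in the block is writable again
    return victim, survivors
```

With blocks = {0: [1, 2, 1, 3], 1: [2, 3, 4, 5]} and live = {1: 0, 2: 1, 3: 1, 4: 1, 5: 1}, block 0 holds only one valid LBA, so it is the victim and only LBA 1 has to be copied before the erase.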
The spare capacity of an SSD can be pretty substantial. I remember one some years ago
that had a bug which, in a subtle way, exposed the internal structure of the device. It
turned out the exposed capacity was 49/64ths of the physical flash space. A strange
fraction; I don't think we were ever told why, but the supplier did confirm we
analyzed it correctly.
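For scale, the arithmetic on that fraction, as a short Python sketch. Only the 49/64 figure comes from the device described above; the 64 GiB size in the usage note and the helper name exposed_capacity are made up for the example.

```python
from fractions import Fraction

exposed = Fraction(49, 64)   # share of physical flash visible to the host
spare = 1 - exposed          # 15/64, held back by the firmware
op_ratio = spare / exposed   # 15/49, spare space relative to exposed space

def exposed_capacity(physical_bytes, fraction=exposed):
    """Bytes visible to the host for a given amount of physical flash."""
    return int(physical_bytes * fraction)
```

So 15/64 of the flash (about 23.4%) was spare, which is about 30.6% extra on top of the exposed capacity. A hypothetical device with 64 GiB of physical flash would expose exposed_capacity(64 * 2**30), i.e. 49 GiB.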
paul