NVMe Namespaces
What’s in a Name (Space)
One of the most interesting features in NVMe is namespaces. There isn’t all that much written about them, but they solve many problems that previously existed with traditional disks. A namespace is, honestly, a really simple concept. Think of it as a partition on the solid state drive, except you let the NVMe controller determine where to store the data. That’s it.
Why is this such an important concept? It’s history.
Traditional Hard Disks Spin
There are two main portions to a disk - the physical media and the interface to the system. In a traditional hard disk, the physical media spins around. It has fairly bad seek times, and as such file systems try to sequentially lay out the data. When that data is laid out in a sequence, it is faster to access.
Twenty years ago, all disks were spinning. Floppy drives, hard disks, and CD-RWs all spun around, and the data really needed to be sequential to see speed. As such, the interface to the computer system was aligned around these base concepts.
SATA and SAS present a lot of the physical characteristics of the media up to the user. Just the act of creating a partition requires you to specify the starting and ending blocks. When you create multiple partitions, they have to be sequentially laid out. You cannot have one partition that spans two distinct portions of the disk.
If you end up creating lots of partitions over the life of the disk, you can end up with partition fragmentation. If you’re like me, you may create a partition for VMs, scratch spaces, etc… I spin them up often, and delete them often. That sequential layout of the partitions is really a legacy effect of spinning disks.
Technologies like LVM have had to spring up to handle easier segmentation of disks. But really LVM is a patch on top of a flawed system. I’ve seen users utilizing LVM on top of their SATA hard disks. It’s not a bad system, but it’s not really the most elegant (though they swear it is).
There is a better way - Namespaces
LVM solved creating arbitrary volumes on top of a set of block devices. It doesn’t force the user to care about how to sequentially lay out the data. If you’re creating multiple volumes on top of a SATA or SAS disk, this really is the best solution. But if you’re creating volumes on top of a NVMe disk, there’s a better way.
The NVMe spec introduces the concept of namespaces. A namespace is simply a block device that the controller presents to the system. Basically, the controller determines how to lay out the data on the drive, not you. So you never have to worry about the physical layout of the data - just like LVM. With solid state drives, you don’t really need to lay the data out sequentially - the drive can handle that complexity.
All NVMe disks support at least a single namespace. On a Linux box, it looks like the following:
[root@smc-server thorst]# ls /dev/nvme*
/dev/nvme0 /dev/nvme0n1 /dev/nvme1 /dev/nvme1n1 /dev/nvme1n1p1 /dev/nvme1n1p2 /dev/nvme1n1p3
In the above example, there are two NVMe drives - nvme0 and nvme1. They both have a single namespace on them. The server above uses nvme1 as its boot drive and has placed three partitions on it. The partitions are nvme1n1p1, nvme1n1p2, and nvme1n1p3. Long, but purposeful names.
[root@smc-server thorst]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 1.8T 0 disk
nvme1n1 259:1 0 465.8G 0 disk
├─nvme1n1p1 259:2 0 600M 0 part /boot/efi
├─nvme1n1p2 259:3 0 1G 0 part /boot
└─nvme1n1p3 259:4 0 464.2G 0 part
├─cl-root 253:0 0 50G 0 lvm /
├─cl-swap 253:1 0 12G 0 lvm [SWAP]
└─cl-home 253:2 0 402.2G 0 lvm /home
While many drives only support a single namespace, the NVMe specification inherently supports multiple namespaces. This allows you to arbitrarily configure the block devices natively on the NVMe disk. If you’re on Linux, you can tell if your disk supports multiple namespaces with the following command:
[root@smc-server thorst]# nvme id-ctrl /dev/nvme0 | grep nn
nn : 32
[root@smc-server thorst]# nvme id-ctrl /dev/nvme1 | grep nn
nn : 1
The nn attribute indicates the maximum number of namespaces your disk supports. The device nvme0 is a U.2 drive that supports 32 namespaces, and nvme1 is my M.2 boot device that only supports a single namespace.
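If you have several drives, it’s handy to script that check rather than eyeball the output. Here’s a minimal sketch, assuming `nvme id-ctrl` output on stdin; the `max_ns` function name is my own, not part of nvme-cli:

```shell
#!/bin/sh
# Hypothetical helper (not part of nvme-cli): parse the maximum
# namespace count (nn) out of `nvme id-ctrl` output fed on stdin.
max_ns() {
    awk -F: '/^nn / { gsub(/ /, "", $2); print $2 }'
}

# Example usage against a live controller (requires root and a drive):
#   nvme id-ctrl /dev/nvme0 | max_ns
```

Anything greater than 1 means the drive can be carved into multiple namespaces.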
If you are going to change namespaces, I recommend doing so only on data disks. You can’t shrink or grow a namespace, and if your boot drive is installed on the NVMe disk, you could very well wipe your boot device with an improper namespace command.
Which Drives Support Multiple Namespaces?
At this point in time, very few NVMe drives support multiple namespaces. However, the list of drives that support it is in fact growing. I’ll include a list of some of the drives I’ve seen to date, but you’ll see from the list that it’s focused mostly on the enterprise space. This is because multiple namespaces are generally beneficial in multi-tenant environments, and your laptop really isn’t multi-tenant.
Drive | Form Factor | Number of Namespaces |
---|---|---|
Ultrastar SN840 | U.2 | 128 |
Ultrastar SN640 | U.2 | 128 |
Intel P4510 / P4610 | U.2 | 128 |
Samsung PM1733/PM1735 | U.2 / AIC | 64 |
Micron 9300 | U.2 | 32 |
Kioxia CM6 | U.2 | 64 |
Kioxia CD6 | U.2 | 16 |
Most enterprise drives now support multiple namespaces. The last generation of enterprise drives was generally single namespace, so seeing this support across multiple drives in this generation is encouraging. One can hold out hope that eventually this will trickle into consumer drives, but that may be a fool’s hope.
Managing Namespaces
The nvme command is your best friend when it comes to these disks. It’s a robust tool, though it is somewhat lacking in explanation of its capabilities.

Let’s first see what namespaces are on your system:
[root@smc-server thorst]# nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 S4YNNE0N801309 SAMSUNG MZWLJ1T9HBJR-00007 1 1.92 TB / 1.92 TB 512 B + 0 B EPK98B5Q
/dev/nvme1n1 S5H7NS1NA02815E Samsung SSD 970 EVO 500GB 1 2.68 GB / 500.11 GB 512 B + 0 B 2B2QEXE7
Here, each NVMe device contains a single namespace. We can tell how much free space there is by looking at the tnvmcap (total NVM capacity) and unvmcap (unallocated NVM capacity) attributes.
[root@smc-server thorst]# nvme id-ctrl /dev/nvme0 | grep mcap
tnvmcap : 1920383410176
unvmcap : 0
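Those raw byte counts are hard to read at a glance. A quick awk sketch converts them to terabytes; `to_tb` is just a name I made up:

```shell
#!/bin/sh
# Quick sketch: convert the raw byte counts that id-ctrl reports
# into terabytes for readability.
to_tb() {
    awk '{ printf "%.2f TB\n", $1 / 1e12 }'
}

echo 1920383410176 | to_tb   # -> 1.92 TB
```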
And we can see here that the single namespace takes up all the space of the drive, not a single byte left. If you want to create new namespaces, you first have to clear up the old ones. The following commands should do the trick.
[root@smc-server thorst]# nvme id-ctrl /dev/nvme0 | grep cntlid
cntlid : 0x41
[root@smc-server thorst]# nvme detach-ns /dev/nvme0 -n 1 -c 0x41
detach-ns: Success, nsid:1
[root@smc-server thorst]# nvme ns-rescan /dev/nvme0
[root@smc-server thorst]# nvme delete-ns /dev/nvme0 -n 1
delete-ns: Success, deleted nsid:1
[root@smc-server thorst]# nvme ns-rescan /dev/nvme0
[root@smc-server thorst]#
Let’s break down what these commands do one by one. The first command obtains the controller ID. This is needed by some subsequent commands and is useful to retain.

The detach-ns command detaches the namespace from the system. Depending on the controller, I’ve seen really odd behavior if you don’t do a ns-rescan afterwards. That simply rescans a given controller. I do it often, because as you play you may find that NVMe can often hang. I make sure that whenever I do an attach-ns, detach-ns, create-ns, or delete-ns I always run a ns-rescan. I shouldn’t have to, but depending on the controller or version of nvme-cli you have, you might run into a whole-system hang if you don’t.
Following up, the delete-ns command actually deletes the namespace. After this point, the data is gone - lost forever. You won’t be recovering it. Be very careful and make sure you want to do this.
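Because the detach/rescan/delete/rescan dance is easy to get wrong, I find it useful to script it as a dry run first. Here’s a minimal sketch (`delete_ns_cmds` is a hypothetical name, not an nvme-cli command); it only prints the commands so you can review them before piping to sh:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the full detach/delete sequence,
# including the ns-rescan calls between each step. Review the output,
# then pipe it to `sh` if it looks right. Remember: deleting a
# namespace destroys its data permanently.
delete_ns_cmds() {
    dev=$1
    nsid=$2
    cntlid=$3
    echo "nvme detach-ns $dev -n $nsid -c $cntlid"
    echo "nvme ns-rescan $dev"
    echo "nvme delete-ns $dev -n $nsid"
    echo "nvme ns-rescan $dev"
}

delete_ns_cmds /dev/nvme0 1 0x41
```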
I only do this on data drives, for instance. With boot devices, I tend to steer clear of messing with the namespaces. But when it comes to isolation within a data disk, I enjoy having a separate namespace.
Once this is completed, you will see that you have extra free space.
[root@smc-server thorst]# nvme id-ctrl /dev/nvme0 | grep mcap
tnvmcap : 1920383410176
unvmcap : 1920383410176
At this point, what’re you going to do with it? Let’s say you want to create a 100 Gig namespace. You would use the following commands to do so.
[root@smc-server thorst]# nvme create-ns /dev/nvme0 -s 26214387 -c 26214387 -b 4096
create-ns: Success, created nsid:1
[root@smc-server thorst]# nvme ns-rescan /dev/nvme0
[root@smc-server thorst]# nvme attach-ns /dev/nvme0 -n 1 -c 0x41
attach-ns: Success, nsid:1
[root@smc-server thorst]# nvme ns-rescan /dev/nvme0
[root@smc-server thorst]# lsblk -b
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 107374129152 0 disk
nvme1n1 259:1 0 500107862016 0 disk
├─nvme1n1p1 259:2 0 629145600 0 part /boot/efi
├─nvme1n1p2 259:3 0 1073741824 0 part /boot
└─nvme1n1p3 259:4 0 498403901440 0 part
├─cl-root 253:0 0 53687091200 0 lvm /
├─cl-swap 253:1 0 12826181632 0 lvm [SWAP]
└─cl-home 253:2 0 431887482880 0 lvm /home
The most interesting command here is create-ns, as that creates the namespace. The -b 4096 indicates that the block size for the namespace should be 4096 bytes. The two most common values are 512 and 4096, but some newer drives support additional block sizes that are common in storage controllers. The -s 26214387 -c 26214387 specifies the size and capacity of the namespace, in blocks. Multiply the block size by the size of the namespace, and you get just over 100 Gigabytes.
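That size math is worth sketching out: the block count you pass to -s and -c is just the target capacity divided by the block size. Here’s a tiny helper, assuming a binary gigabyte; `ns_blocks` is my own name, and note it yields a slightly larger count than the 26214387 used above, which falls just shy of a full 100 GiB:

```shell
#!/bin/sh
# Hypothetical helper: compute the -s/-c block count for
# `nvme create-ns`, given a target size in GiB and a block size
# in bytes. Assumes the size divides evenly by the block size.
ns_blocks() {
    gib=$1
    bs=$2
    echo $(( gib * 1024 * 1024 * 1024 / bs ))
}

ns_blocks 100 4096   # 100 GiB at 4K blocks -> 26214400
```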
Namespaces for Overprovisioning
You might notice that some drives come in multiple variants. Let’s take the following examples:
Drive | Size | DWPD |
---|---|---|
Intel P4610 | 1.6 TB | ~3 |
Intel P4510 | 2.0 TB | ~1 |
Samsung PM1735 | 1.6 TB | 3 |
Samsung PM1733 | 1.92 TB | 1 |
Ultrastar DC SN840 Write Optimized | 1.6 TB | 3 |
Ultrastar DC SN840 Read Optimized | 1.92 TB | 1 |
Oftentimes these SSDs have the exact same controller, just with different firmware. That allows them to specify a different amount of overprovisioning space. The more overprovisioning space, the higher the endurance - and often a boost in performance as well.
Newer drives are managing the overprovisioning space via namespaces. This gives you more control over your own drive and its performance. Essentially, you could take a 1.92 TB drive, delete the namespace, create a new 1.6 TB namespace, and likely get a drive that has 3 DWPD. Vendors don’t often tell the customer directly that this is what they’re doing. However, you can generally tell if this is the case by looking at attributes in the id-ctrl output and by running fio performance tests against the drive. The data generally tells the story.
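You can sketch the resulting overprovisioning yourself: shrinking a 1.92 TB drive’s namespace to 1.6 TB leaves roughly 17% of the flash unallocated, which is in the ballpark of what the write-optimized variants in the table reserve. A quick helper (`op_pct` is a name of my own invention):

```shell
#!/bin/sh
# Hypothetical helper: percentage of a drive left unallocated
# (i.e. available as extra overprovisioning) when the namespace
# is smaller than the drive's total capacity. Units just need to
# match between the two arguments (GB here).
op_pct() {
    awk -v total="$1" -v ns="$2" \
        'BEGIN { printf "%.0f", (total - ns) / total * 100 }'
}

op_pct 1920 1600   # 1.92 TB drive, 1.6 TB namespace -> 17 (percent)
```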
Why Bother with Namespaces?
At the end of the day, why bother with namespaces? For client devices like laptops or desktops, this is all likely overkill. However, for a multi-tenant environment where you’re managing different domains of data, this control is valuable. The controller manages how to lay out the data, and it greatly simplifies management for the administrator.