MSI-HOWTO.txt
[Posted August 12, 2003 by corbet]
The MSI Driver Guide HOWTO
Tom L Nguyen
07/15/2003
1. About this guide
This guide describes the basics of Message Signaled Interrupts (MSI), the
advantages of using MSI over traditional interrupt mechanisms, and how
to enable your driver to use MSI or MSI-X. Also included is a
description of debugging features available and Frequently Asked
Questions.
2. Copyright 2003 Intel Corporation
3. What is MSI/MSI-X?
Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI devices
and a required feature for PCI Express devices. MSI enables a device function
to request service by sending an Inbound Memory Write on its PCI bus to the
FSB as a Message Signaled Interrupt transaction. Because MSI is generated in the
form of a Memory Write, all transaction conditions, such as a Retry,
Master-Abort, Target-Abort or normal completion, are supported.
A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that do
not support MSI. In systems that support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during device
initial configuration.
An MSI capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and the
MSI-X capability structure; however, the bus driver should not enable
both, but instead enable only the MSI-X capability structure.
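The capability lookup described above can be sketched in user-space C
against a 256-byte snapshot of configuration space. This is only an
illustration, not the bus driver's code: the function names are
hypothetical, and the offsets and capability IDs (0x05 for MSI, 0x11 for
MSI-X) are taken from the PCI specification rather than from this guide.

```c
#include <stdint.h>
#include <stddef.h>

/* Standard PCI configuration-space offsets and capability IDs. */
#define PCI_STATUS            0x06
#define PCI_STATUS_CAP_LIST   0x10   /* Status bit 4: capability list present */
#define PCI_CAPABILITY_LIST   0x34   /* pointer to the first capability */
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11

/*
 * Walk the capability list in a 256-byte snapshot of configuration space
 * and return the offset of the capability with the given ID, or 0 if the
 * function does not implement it. (Hypothetical helper.)
 */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    uint16_t status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
    uint8_t pos;
    int guard = 48;   /* bound the walk in case the list is malformed */

    if (!(status & PCI_STATUS_CAP_LIST))
        return 0;

    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;   /* entries are dword aligned */
    while (pos >= 0x40 && guard-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;
        pos = cfg[pos + 1] & 0xFC;           /* byte 1 is the next pointer */
    }
    return 0;
}

/*
 * Mirror the rule in the text: when both structures are present,
 * only MSI-X should be enabled. Returns the chosen capability ID,
 * or 0 when the function must fall back to pin assertion.
 */
static uint8_t preferred_msi_capability(const uint8_t cfg[256])
{
    if (find_capability(cfg, PCI_CAP_ID_MSIX))
        return PCI_CAP_ID_MSIX;
    if (find_capability(cfg, PCI_CAP_ID_MSI))
        return PCI_CAP_ID_MSI;
    return 0;
}
```

A caller would fill cfg from the device's configuration space and act on
the returned capability ID.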
The MSI capability structure contains the Message Control register, the
Message Address register and the Message Data register. These registers
provide the bus driver control over MSI. The Message Control register
indicates the MSI capability supported by the device. The Message
Address register specifies the target address and the Message Data
register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver
are prohibited from writing to these registers.
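As an illustration of what the Message Control register reports, the
sketch below decodes its fields from a 16-bit value. The bit positions
follow the PCI 2.3 layout as we understand it; the structure and function
names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Bit fields of the 16-bit MSI Message Control register. */
#define MSI_CTRL_ENABLE     0x0001  /* bit 0: MSI enable */
#define MSI_CTRL_MMC_MASK   0x000E  /* bits 3:1: Multiple Message Capable */
#define MSI_CTRL_MME_MASK   0x0070  /* bits 6:4: Multiple Message Enable */
#define MSI_CTRL_64BIT      0x0080  /* bit 7: 64-bit address capable */
#define MSI_CTRL_MASKBIT    0x0100  /* bit 8: per-vector masking capable */

/* Decoded view of the register (hypothetical helper type). */
struct msi_control_info {
    int enabled;                 /* MSI currently enabled */
    unsigned vectors_requested;  /* what the device is capable of */
    unsigned vectors_granted;    /* what system software enabled */
    int addr64;                  /* 64-bit message address supported */
    int per_vector_masking;      /* optional MSI masking supported */
};

static struct msi_control_info decode_msi_control(uint16_t ctrl)
{
    struct msi_control_info info;

    info.enabled = (ctrl & MSI_CTRL_ENABLE) != 0;
    /* Both vector counts are encoded as a power of two. */
    info.vectors_requested = 1u << ((ctrl & MSI_CTRL_MMC_MASK) >> 1);
    info.vectors_granted = 1u << ((ctrl & MSI_CTRL_MME_MASK) >> 4);
    info.addr64 = (ctrl & MSI_CTRL_64BIT) != 0;
    info.per_vector_masking = (ctrl & MSI_CTRL_MASKBIT) != 0;
    return info;
}
```

For example, a value of 0x0185 describes a function that is capable of
four vectors but has been granted one, which matches the single-vector
policy described in section 5.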
The MSI-X capability structure is an optional extension to MSI. It uses
an independent and separate capability structure. There are some key
advantages to implementing the MSI-X capability structure over the MSI
capability structure as described below.
- Support a larger maximum number of vectors per function.
- Provide the ability for system software to configure each
  vector with an independent message address and message data,
  specified by a table that resides in Memory Space.
- MSI and MSI-X both support per-vector masking. Per-vector
  masking is an optional extension of MSI but a required
  feature for MSI-X. Per-vector masking provides the kernel
  the ability to mask/unmask MSI when servicing its software
  interrupt service routine. If per-vector masking is
  not supported, then the device driver should provide the
  hardware/software synchronization to ensure that the device
  generates MSI only when the driver wants it to do so.
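To make the per-vector table and mask bit concrete, here is a small
in-memory model of an MSI-X table. It assumes the entry layout from the
PCI specification (address low, address high, data, vector control with
the mask in bit 0). All names are hypothetical, and a real table lives in
device memory that only the PCI subsystem should touch, so treat this as
a sketch rather than driver code.

```c
#include <stdint.h>

#define MSIX_VECTOR_CTRL_MASKBIT 0x1u   /* bit 0 of Vector Control */

/* One 16-byte MSI-X table entry, modeled in ordinary memory. */
struct msix_entry_model {
    uint32_t msg_addr_lo;   /* Message Address */
    uint32_t msg_addr_hi;   /* Message Upper Address */
    uint32_t msg_data;      /* Message Data */
    uint32_t vector_ctrl;   /* Vector Control */
};

static void msix_model_mask(struct msix_entry_model *table, unsigned idx)
{
    table[idx].vector_ctrl |= MSIX_VECTOR_CTRL_MASKBIT;
}

static void msix_model_unmask(struct msix_entry_model *table, unsigned idx)
{
    table[idx].vector_ctrl &= ~MSIX_VECTOR_CTRL_MASKBIT;
}

static int msix_model_is_masked(const struct msix_entry_model *table,
                                unsigned idx)
{
    return (table[idx].vector_ctrl & MSIX_VECTOR_CTRL_MASKBIT) != 0;
}

/*
 * Give one entry its own address and data. The entry is masked while it
 * is rewritten so that a half-updated message is never signaled.
 */
static void msix_model_program(struct msix_entry_model *table, unsigned idx,
                               uint64_t addr, uint32_t data)
{
    msix_model_mask(table, idx);
    table[idx].msg_addr_lo = (uint32_t)(addr & 0xFFFFFFFFu);
    table[idx].msg_addr_hi = (uint32_t)(addr >> 32);
    table[idx].msg_data = data;
    msix_model_unmask(table, idx);
}
```

Because every entry carries its own address and data, two entries can
target different destinations without any relationship between their
vector numbers.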
4. Why use MSI?
MSI simplifies board design by allowing board designers to remove
out-of-band interrupt routing. MSI is another step towards a legacy-free
environment.
Due to increasing pressure on chipset and processor packages to reduce pin
count, the need for interrupt pins is expected to diminish over time. Devices,
due to pin constraints, may implement messages to increase performance.
PCI Express endpoints use INTx emulation (in-band messages) instead of IRQ pin
assertion. Using INTx emulation requires interrupt sharing among devices
connected to the same node (PCI bridge) while MSI is unique (non-shared) and
does not require BIOS configuration support. As a result, the PCI Express
technology requires MSI support for better interrupt performance.
Using MSI enables the device functions to support two or more vectors,
which can be configured to target different CPUs to increase scalability.
5. Configuring a driver to use MSI/MSI-X
By default, the kernel will not enable MSI/MSI-X on all
devices that support this capability, because some devices may have bugs
in their MSI implementation (see section 5.4). A kernel
configuration option must be selected to enable MSI/MSI-X support.
5.1 Including MSI support into the kernel
Including MSI support in the kernel requires users to rebuild
the kernel with both the configuration parameters CONFIG_PCI_USE_VECTOR and
CONFIG_PCI_MSI set. CONFIG_PCI_USE_VECTOR enables the kernel to
replace the IRQ-based scheme with a VECTOR-based scheme because MSI
requires a unique vector and no BIOS interrupt-routing
table. CONFIG_PCI_MSI enables MSI support in the kernel.
During PCI device enumeration, the bus driver initializes the device's
MSI/MSI-X capability structure with ONE vector, regardless of whether
the device function is capable of supporting multiple vectors.
ONE vector is initially allocated to the device function and the vector is
stored in the irq field of the device (pci_dev) structure. This default
initialization allows legacy drivers to work without specific modification to
support MSI.
5.2 Configuring for MSI support
Due to the non-contiguous fashion of vector assignment in the
existing Linux kernel, this patch does not support multiple
messages, regardless of whether the device function is capable of
supporting more than one vector. The bus driver initializes only
entry 0 of this capability. Existing software drivers of this
device function will work without changes if no
hardware/software synchronization is required. Otherwise, the
device driver should be updated to provide the hardware/software
synchronization, because multiple messages generated from the same
vector might be lost. In other words, once the device function
signals Vector A, it cannot signal Vector A again until it is
explicitly enabled to do so by its device driver. It is
recommended that IHVs validate their hardware devices
against their existing device drivers once the patch is
installed. Please refer to section 5.4 Debugging MSI.
5.3 Configuring for MSI-X support
The MSI capability structure and the MSI-X capability
structure share the semantics described above; however, due to the
ability of the system software to configure each vector of the
MSI-X capability structure with an independent message address
and message data, the non-contiguous fashion of vector assignment
in the existing Linux kernel has no impact on supporting multiple
messages on MSI-X capable device functions. By default, as
mentioned above, ONE vector should always be allocated to the
MSI-X capability structure at entry 0. The bus driver does not
initialize other entries of the table during device enumeration.
Note that the PCI subsystem should have full control of an MSI-X table that
resides in Memory Space. The software device driver should not access this
table.
To request additional vectors, the device software driver
should call the function msix_alloc_vectors(). It is recommended that
the software driver call this function once during the
initialization phase of the device driver. With these semantics,
the existing software device driver may work with one vector if
no hardware/software synchronization is required. It is
recommended that IHVs validate their hardware devices
against their existing device drivers once the patch is
installed. Please refer to section 5.4 Debugging MSI.
The function msix_alloc_vectors(), once invoked, enables either
all or nothing, depending on the current availability of vector
resources. If no vector resources are available, the device
function still works with ONE vector. If the vector resources are
available for the number of vectors requested by the driver, this
function will reconfigure the MSI-X capability structure of the
device with additional messages, starting from entry 1. Note that
the number requested need not match what the hardware supports: for
example, a device may be capable of supporting a maximum of 32
vectors while its software driver typically requests only 4.
For each vector, after this successful call, the device driver is
responsible for calling other functions such as request_irq(),
enable_irq(), etc. to enable this vector with its corresponding
interrupt service handler. It is the device driver's choice to
have all vectors share the same interrupt service handler or
to give each vector a unique interrupt service handler.
In addition to the function msix_alloc_vectors(), another
function msix_free_vectors() is provided to allow the software
driver to release a number of vectors back to the vector
resources. Once invoked, the PCI subsystem disables (masks) each
vector released. These vectors are no longer valid for the
hardware device and its software driver to use.
int msix_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)
This API enables the software driver to request the PCI
subsystem for additional messages. Depending on the number of
vectors available, the PCI subsystem enables either all or
nothing.
Argument dev points to the device (pci_dev) structure.
Argument vector points to an array of integers; the number of
elements is given by argument nvec.
Argument nvec is an integer indicating the number of messages
requested.
A return of zero indicates that the requested number of vectors was
successfully allocated. Otherwise, the return value indicates that
resources are not available.
int msix_free_vectors(struct pci_dev *dev, int *vector, int nvec)
This API enables the software driver to inform the PCI subsystem
that it is willing to release a number of vectors back to the MSI
resource pool. Once invoked, the PCI subsystem disables each
MSI-X entry associated with each vector stored in argument 2 (vector).
These vectors are no longer valid for the hardware device and its
software driver to use.
Argument dev points to the device (pci_dev) structure.
Argument vector points to an array of integers; the number of
elements is given by argument nvec.
Argument nvec is an integer indicating the number of messages
released.
A return of zero indicates that the requested vectors were
successfully released. Otherwise, the return value indicates a failure.
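The all-or-nothing allocation and the release semantics described above
can be illustrated with a small user-space model. This is not the
patch's implementation: the pool structure, its size, and the
model_* names are hypothetical stand-ins, and the real functions take a
pci_dev pointer rather than a pool.

```c
#include <stddef.h>

#define MODEL_POOL_SIZE 8   /* assumed pool size, for illustration only */

/* A toy vector resource pool (hypothetical). */
struct vector_pool_model {
    int in_use[MODEL_POOL_SIZE];
    int base;   /* vector number of slot 0 */
};

/*
 * Model of the allocation rule: either all nvec vectors are handed out
 * and 0 is returned, or nothing changes and -1 is returned.
 */
static int model_alloc_vectors(struct vector_pool_model *pool,
                               int *vector, int nvec)
{
    int i, free_count = 0, found = 0;

    if (nvec <= 0)
        return -1;
    for (i = 0; i < MODEL_POOL_SIZE; i++)
        if (!pool->in_use[i])
            free_count++;
    if (free_count < nvec)
        return -1;   /* not enough resources: allocate nothing */

    for (i = 0; i < MODEL_POOL_SIZE && found < nvec; i++) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            vector[found++] = pool->base + i;
        }
    }
    return 0;
}

/*
 * Model of the release rule: every vector listed must currently be
 * allocated. The list is validated first so that a bad entry releases
 * nothing.
 */
static int model_free_vectors(struct vector_pool_model *pool,
                              const int *vector, int nvec)
{
    int i;

    if (nvec <= 0)
        return -1;
    for (i = 0; i < nvec; i++) {
        int idx = vector[i] - pool->base;
        if (idx < 0 || idx >= MODEL_POOL_SIZE || !pool->in_use[idx])
            return -1;
    }
    for (i = 0; i < nvec; i++)
        pool->in_use[vector[i] - pool->base] = 0;
    return 0;
}
```

In a driver, a failed allocation would simply leave the device function
running on the ONE vector assigned during enumeration.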
5.4 Debugging MSI
Some devices may have bugs in their MSI implementation. These devices may break
once MSI support is enabled in the kernel. To debug these devices, the patch
provides two configuration parameters, CONFIG_PCI_MSI and
CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES. Neither is set by default. When
users set CONFIG_PCI_MSI, CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is also set by
default. After users rebuild the kernel with this combination, the
kernel enables MSI only on the specific devices listed in the boot parameter
"device_msi=". Users must explicitly use this boot parameter to
provide a list of specific devices they would like to have MSI
support. This lets users debug one MSI capable device at a time with
its existing software driver until all are fully validated, since it
may be difficult to debug them all at the same time. The format
of "device_msi=" is similar to the format of "device_nomsi=", which is
described in a later paragraph. Note that this boot parameter is
required only if the configuration parameter CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES
is set. Otherwise, it will be ignored.
Once users have completed validating these devices, they can clear the
configuration parameter CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to indicate to
the kernel that MSI should be enabled on all MSI capable devices. The
boot parameter "device_msi=" is then no longer required.
The patch also provides a second debug option, which requires users to set the
configuration parameter CONFIG_PCI_MSI and clear the configuration parameter
CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES. After users rebuild the kernel with this
combination, the kernel enables MSI on all MSI capable devices by default. The
boot parameter "device_msi=" will be ignored. To disable MSI on
specific MSI capable devices that show signs of
unpredictable behavior, users must explicitly use the boot parameter
"device_nomsi=", which contains a list of specific devices users do
not want MSI enabled on. These devices default to IRQ pin assertion.
The format of this parameter is "device_nomsi=DWORD1,DWORD2,...". Each
DWORD in the list specifies a device function in terms of device ID
(upper word) and vendor ID (lower word). Each DWORD should be in hex
format with a 0x prefix.
For example, "device_nomsi=0x80119005,0x10108086" indicates that
the bus driver should not enable MSI(X) on two device functions
(Device ID = 0x8011 and Vendor ID = 0x9005, and Device ID = 0x1010
and Vendor ID = 0x8086).
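The DWORD encoding can be checked with a short parser. The sketch below
works on the value part of the parameter (the text after the '=') and
splits each DWORD into its device ID and vendor ID. The type and
function names are hypothetical; the patch's own parsing code may differ.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry of the list (hypothetical helper type). */
struct msi_dev_id {
    uint16_t device;   /* upper word of the DWORD */
    uint16_t vendor;   /* lower word of the DWORD */
};

/*
 * Parse "0xDDDDVVVV,0xDDDDVVVV,..." into out[] and return the number of
 * entries stored. Parsing stops at the first malformed entry.
 */
static size_t parse_device_list(const char *arg, struct msi_dev_id *out,
                                size_t max)
{
    size_t n = 0;

    while (n < max) {
        char *end;
        unsigned long value;

        /* The text requires hex format with a 0x prefix. */
        if (strncmp(arg, "0x", 2) != 0)
            break;
        value = strtoul(arg, &end, 16);
        if (end - arg < 3 || value > 0xFFFFFFFFUL)
            break;   /* no hex digits after the prefix, or too large */

        out[n].device = (uint16_t)(value >> 16);
        out[n].vendor = (uint16_t)(value & 0xFFFF);
        n++;

        if (*end != ',')
            break;   /* end of list */
        arg = end + 1;
    }
    return n;
}
```

Applied to the example above, the first entry decodes to device 0x8011
and vendor 0x9005, and the second to device 0x1010 and vendor 0x8086.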
In addition to the boot parameter "device_nomsi=", another boot
parameter "pci_nomsi" can be used to prohibit the bus driver from
enabling MSI(X) on all MSI capable devices.
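The way the configuration parameters and boot parameters combine can be
summarized as a small decision function. This is a reading of the
section above, not the patch's code: the structure and function names
are hypothetical, and two points are assumptions because the text does
not spell them out, namely that "pci_nomsi" overrides everything else
and that "device_nomsi=" is not consulted when
CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is set.

```c
#include <stdint.h>
#include <stddef.h>

/* Build-time and boot-time settings, modeled as plain data. */
struct msi_policy_model {
    int config_pci_msi;             /* CONFIG_PCI_MSI */
    int config_specific_devices;    /* CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES */
    int pci_nomsi;                  /* "pci_nomsi" given at boot */
    const uint32_t *device_msi;     /* DWORDs from "device_msi=" */
    size_t n_msi;
    const uint32_t *device_nomsi;   /* DWORDs from "device_nomsi=" */
    size_t n_nomsi;
};

static int id_in_list(const uint32_t *list, size_t n, uint32_t id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (list[i] == id)
            return 1;
    return 0;
}

/*
 * Return 1 if MSI would be enabled for the device function identified
 * by id (device ID in the upper word, vendor ID in the lower word).
 */
static int msi_allowed(const struct msi_policy_model *p, uint32_t id)
{
    if (!p->config_pci_msi)
        return 0;   /* MSI support not built in */
    if (p->pci_nomsi)
        return 0;   /* assumption: global switch wins over the lists */
    if (p->config_specific_devices)
        /* opt-in mode; assumption: device_nomsi is not consulted here */
        return id_in_list(p->device_msi, p->n_msi, id);
    /* opt-out mode: device_msi is ignored */
    return !id_in_list(p->device_nomsi, p->n_nomsi, id);
}
```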
At the driver level, the software device driver can tell whether
MSI/MSI-X is enabled by reading the MSI enable bit of the
MSI/MSI-X capability structure's message control register. If
this bit is zero, the device function defaults to IRQ pin
assertion. If this bit is set, the device function is using MSI
as its interrupt generation mechanism.
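Note that the enable bit sits in a different position in the two
capability structures. A minimal sketch of the check, assuming the bit
positions from the PCI specification (bit 0 for MSI, bit 15 for MSI-X)
and using hypothetical names, operating on a message control value the
driver has already read from configuration space:

```c
#include <stdint.h>

#define CAP_ID_MSI         0x05
#define CAP_ID_MSIX        0x11
#define MSI_FLAGS_ENABLE   0x0001   /* MSI: bit 0 of Message Control */
#define MSIX_FLAGS_ENABLE  0x8000   /* MSI-X: bit 15 of Message Control */

/*
 * Return 1 if the message control value shows the capability enabled,
 * 0 if the function is still using IRQ pin assertion.
 */
static int msi_is_enabled(uint8_t cap_id, uint16_t msg_ctrl)
{
    switch (cap_id) {
    case CAP_ID_MSI:
        return (msg_ctrl & MSI_FLAGS_ENABLE) != 0;
    case CAP_ID_MSIX:
        return (msg_ctrl & MSIX_FLAGS_ENABLE) != 0;
    default:
        return 0;   /* not an MSI capability */
    }
}
```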
At the user level, users can use the command 'cat /proc/interrupts'
to display the vector allocated for the device and its interrupt
mode, as shown below.
           CPU0       CPU1       CPU2       CPU3
  0:      14175          0      17408          0    IO-APIC-edge   timer
  1:        123        310          0         37    IO-APIC-edge   keyboard
  2:          0          0          0          0    XT-PIC         cascade
  8:          1          0          0          0    IO-APIC-edge   rtc
 12:         41          0          0        813    IO-APIC-edge   PS/2 Mouse
 14:       2744       7017          0          0    IO-APIC-edge   ide0
 15:       1515          1          0        418    IO-APIC-edge   ide1
169:          0          0          0          0    IO-APIC-level  usb-uhci
185:          0          0          0          0    IO-APIC-level  usb-uhci
193:         30          0          0          0    PCI MSI        aic79xx
201:         30          0          0          0    PCI MSI        aic79xx
209:        467          0          0          0    IO-APIC-level  eth1
225:         15          0          0          0    IO-APIC-level  aic7xxx
233:         15          0          0          0    IO-APIC-level  aic7xxx
NMI:          0          0          0          0
LOC:      31446      31448      31448      31448
ERR:          0
MIS:          0
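For scripted checks, a line of this output can be tested for MSI mode
with a few lines of C. The sketch assumes the interrupt mode appears as
the literal string "PCI MSI", as in the listing above; other kernels may
print it differently, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 and store the vector number if the given /proc/interrupts
 * line describes a vector in "PCI MSI" mode; return 0 otherwise.
 */
static int parse_msi_line(const char *line, int *vector)
{
    int v;
    char colon;

    /* Numbered lines start with "<vector>:"; NMI, LOC, etc. do not. */
    if (sscanf(line, " %d %c", &v, &colon) != 2 || colon != ':')
        return 0;
    if (strstr(line, "PCI MSI") == NULL)
        return 0;
    *vector = v;
    return 1;
}
```

A caller would read /proc/interrupts line by line (for example with
fgets()) and pass each line to this function.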
6. FAQ
Q1. Are there any limitations on using MSI?
A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.
Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In the P3, IPIs are transmitted on the APIC local
bus and in the P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?
A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has its
redirection hint bit cleared, it should work.
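To show what a 0xfeexxxxx target address looks like in practice, the
sketch below composes a message address and message data value. The bit
layout (destination ID in bits 19:12, redirection hint in bit 3,
destination mode in bit 2, vector in data bits 7:0) is the x86 layout
from Intel's architecture manuals as we understand it, not something
this guide defines, and the function names are hypothetical.

```c
#include <stdint.h>

#define MSI_ADDR_BASE        0xFEE00000u   /* bits 31:20 are fixed */
#define MSI_ADDR_DEST_SHIFT  12            /* bits 19:12: destination ID */
#define MSI_ADDR_RH          (1u << 3)     /* redirection hint */
#define MSI_ADDR_DM          (1u << 2)     /* destination mode (logical) */

/* Build a message address for the given destination APIC ID. */
static uint32_t msi_compose_address(uint8_t dest_id, int redirection_hint,
                                    int logical_mode)
{
    uint32_t addr = MSI_ADDR_BASE |
                    ((uint32_t)dest_id << MSI_ADDR_DEST_SHIFT);

    if (redirection_hint)
        addr |= MSI_ADDR_RH;
    if (logical_mode)
        addr |= MSI_ADDR_DM;
    return addr;
}

/*
 * Build message data for fixed delivery, edge triggered: only the
 * vector field (bits 7:0) is nonzero.
 */
static uint16_t msi_compose_data(uint8_t vector)
{
    return vector;
}

/* Check that an address falls in the 0xfeexxxxx window. */
static int msi_address_in_window(uint32_t addr)
{
    return (addr & 0xFFF00000u) == MSI_ADDR_BASE;
}
```

With the redirection hint cleared, as the answer above describes, a
message for destination ID 1 uses address 0xFEE01000.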
Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?
A3. If these chipsets support an inbound memory write with
target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.
Q4. From the driver's point of view, if the MSI is lost because
of errors that occur during the inbound memory write, then the
driver may wait forever. Is there a mechanism for it to recover?
A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported. A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a retry is signaled it must retry, etc. We believe that
the recommendation for Abort is also a retry (refer to PCI
specification 2.3 or later).