Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start on a programmer's model for capabilities #48

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
120 changes: 112 additions & 8 deletions src/cap-description.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -7,16 +7,70 @@ space. There are two primary ISA variants, RV32I and RV64I, which provide
32-bit and 64-bit address spaces respectively. The term XLEN refers to the
width of an integer register in bits (either 32 or 64). The value of XLEN may
change dynamically at run-time depending on the values written to CSRs, so we
define XLENMAX to be widest XLEN that the implementation supports.


{cheri_base_ext_name} defines capabilities of size CLEN corresponding to 2 *
XLENMAX without including the tag bit. The value of CLEN is always calculated
based on XLENMAX regardless of the effective XLEN value.
define *XLENMAX* to be widest XLEN that the implementation supports.

{cheri_base_ext_name} defines capabilities which act on virtual addresses of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{cheri_base_ext_name} defines capabilities which act on virtual addresses of
{cheri_base_ext_name} defines capabilities which act on addresses of

I'd drop the virtual here since we also want to support systems without virtual memory.

size XLENMAX, which will also be called *CXLEN* to indicate its link to
capabilities. {cheri_base_ext_name} capabilities occupy space in memory and
registers equal to 2 * CXLEN bits, henceforth *CLEN*, and may be stored in
memory only at addresses aligned to CLEN bits. Provenance and integrity are
provided by means of *tagged memory*: for every aligned CLEN-bit region of a
register or memory that is permitted to hold a capability, there is a single
extra bit, called the *tag*, which is 1 if the location currently holds a
capability and 0 if it holds non-capability data.

It is expected that "normal" memory will be able to hold capabilites, however
memory that cannot hold capabilities may exist. The details of this are
determined by the environment.

A tag, together with a capability or CLEN bits of non-capability data, is
called a *capability value*. A capability value with a tag of 1 will be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about using "value" here since it's such an overloaded word. Maybe just use "capability" and "untagged/invalid capability" vs "valid capability"?

Not too important, but this would conflict with Arm's spec where they use "value" for the address bits of a capability

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe use "repreesntation" instead of "value"? Naming is difficult and I don't think we ever came up with an unambiguous terminology in the past...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've always used value to mean this, and it's the natural term to use. Arm's conflicting use is a bad idea that should be confined to the dustbin of history; it conflicts with the CHERI spec's terminology, is confusing, and makes it harder to refer to the whole capability. I am all for codifying "capability value" as being the whole capability.

considered interchangeable with the capability it holds; a capability value
with a tag of 0 is an *untagged value*. Capability values occupy CLEN+1 bits
of physical storage in memory or registers. {cheri_base_ext_name} redefines
all integer registers, and a subset of CSRs if they are implemented, to hold a
single capability value.

There are two basic types of capability. An *unsealed capability* represents
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There are two basic types of capability. An *unsealed capability* represents
There are two basic types of capabilities. An *unsealed capability* represents

I think I'd use the plural here but I guess both works?

the authority to perform a set of operations on a set of bytes in memory and
can be used freely if it is accessible, although it may have other meanings. A
*sealed capability* is created from an underlying source of authority but
cannot be modified and can only be used in specific ways checked by the
execution environment.

=== Components of a Capability

Capabilities contain the software accessible fields described in this section.
Two views are provided to software of a capability value.

A capability value can be viewed physically, resulting in a 1-bit tag, a
CXLEN-bit *metadata*, and a CXLEN-bit *address*. A capability value can be
deconstructed physically, using the CGETTAG instruction to extract the tag and
CGETHIGH to extract the metadata; integer instructions automatically extract
the address from a capability value used as a source. If the tag is zero, the
metadata and address are two independent CXLEN-bit values. In a little endian
environment, a capability value in memory stores its address at the
lower-addressed location and its metadata at the higher. An untagged value can
be constructed in any context using the CSETHIGH instruction.

A capability value with a tag of 1 is a *tagged capability*, or just a
*capability*. Capabilities cannot be constructed arbitrarily. Capabilities
are partially ordered by strength, and a capability can be constructed from a
stronger capabilty using the CBUILDCAP instruction. A special capability
called Infinity is provided at reset; a capability can be constructed, by any
means, if and only if it is weaker than Infinity.

Capabilities which are undefined and nonconstructable in {cheri_base_ext_name}
may be given meaning and made constructable by later extensions.

A capability value may also be viewed in terms of the access it provides. In
this view, a capability value has a *tag*, an *address*, a *base*, a *length*,
*permissions*, and a *sealed state*; the CGETBASE, CGETLEN, and CGETPERM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*permissions*, and a *sealed state*; the CGETBASE, CGETLEN, and CGETPERM
*permissions*, and a *type*; the CGETBASE, CGETLEN, and CGETPERM

We have generally use "type" for this field. While the initial spec only provides two types (unsealed and sentry) and thus sealed state would be sufficient, I very strongly believe we need more types in the future (e.g. software defines ones or indirect sentries as defined in the Morello spec).

instructions can be used to view a capability value's base, length, and
permissions, respectively. A capability value with a tag of 0 is not a
capability and does not provide access, but the accessors are still defined and
have deterministic behavior. The base and length are collectively known as the
*bounds*; not all combinations of bounds and address are representable by a
capability value.

[#section_tag]
==== Tag
Expand All @@ -37,6 +91,56 @@ All locations in registers or memory able to hold a capability are CLEN+1 bits
wide including the tag bit. Those locations are referred as being _CLEN-bit_ or
_capability_ wide in this specification.

==== Bounds

A capability represents an interval portion of the address space. This is
expressed using a *base*, which is an unsigned integer less than 2^CXLEN^, and
a *length*, which is an unsigned integer less than or equal to 2^CXLEN^. The
sum of the base and the length is also called the *top*, and will be no more
than 2^XLEN^ for a well-formed capability. A capability with base *B* and
length *L* authorizes access to all bytes of the address space whose address
*A* satisfies *B* <= *A* < *B* + *L*, as further defined by the sealing and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it doesn't really make any difference to the semantics, over the past few years I've looked more at the top and base rather than the length. Historically, earlier prototypes of CHERI were defined mostly in terms of length, but since the compressed bounds format was introduced length is simply the difference between top and base.

How about rephrasing this to use base and top as the two components and length as the derived quantity?

architectural permission fields. In other words, a capability represents a
half-open interval.

Capability operations always treat addresses as unsigned. A capability's valid
region can span the gap between 2^CXLEN-1^-1 and 2^CXLEN-1^, but cannot wrap
around from 2^CXLEN^-1 to 0.

Subset and equality operations treat capability bounds as intervals, not as
sets. Two capabilities with length 0 and different bases are not equal and not
Comment on lines +110 to +111
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Subset and equality operations treat capability bounds as intervals, not as
sets. Two capabilities with length 0 and different bases are not equal and not
Subset and equality operations treat capability bounds as intervals. Two capabilities with length 0 and different bases are not equal and not

I'm not sure if the "set" terminology could be confusing for readers with less of a mathematical background. Maybe just clarifying that it's an interval is sufficient?

subsets of each other. A capability of length 0 can be constructed only inside
an existing capability for an interval which surrounds the base.
Comment on lines +112 to +113
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can construct a zero-length capability for any address between base and top.
First increment to the desired in-bounds (or == top) address, and then CSETBOUNDS with length zero.

But maybe I am just misparsing "surrounds the base"


The bounds and address of a capability are subject to alignment and
Copy link
Collaborator

@arichardson arichardson Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is very important to note representability, but I think the more detailed part with the table should be part of the bounds format section instead?

representability constraints. Informally, a large capability must have a
highly aligned base and length, and a capability's address cannot wander too
far outside its bounds. The precise conditions depend on the length. For each
length, there is a bounds alignment and a representability block (R); these
obey a simple formula except for the smallest.

.Bounds alignment and representability block by XLEN and length
[#bounds_align,options=header,align="center",width="55%"]
|==============================================================================
^| CXLEN ^| Length ^| Bounds alignment (Q) ^| Representability block (R)

^| 32 ^| 0 to 511 ^| 1 ^| 128
^| 32 ^| 512 to 1023 ^| 8 ^| 256
^| 32 ^| 1024 to 2047 ^| 16 ^| 512
^| 32 ^| 2^N^ to 2^N+1^-1 ^| 2^N-6^ ^| 2^N-1^
^| 64 ^| 0 to 4095 ^| 1 ^| 2048
^| 64 ^| 4096 to 8191 ^| 8 ^| 2048
^| 64 ^| 8192 to 16383 ^| 16 ^| 4096
^| 64 ^| 2^N^ to 2^N+1^-1 ^| 2^N-9^ ^| 2^N-1^
|==============================================================================

A capability can have a base of B and a length of L only if B and L are
divisible by the corresponding Q. A capability with a base of B and a length
of L can have an address of A only if (A/R - B/R) mod (2^CXLEN^/R) is in the
range -1 to 6. The set of addresses a capability can have is called the
capability's *representable region*; it is not necessarily a numeric interval
and can wrap around address 0.

[#section_cap_perms]
==== Architectural Permissions (AP)

Expand Down Expand Up @@ -192,7 +296,7 @@ also seals the return address capability (if any) since it is the entry point
to the caller function.

[#section_cap_bounds]
==== Bounds
==== Bounds (details of binary encoding)

ifdef::cheri_v9_annotations[]
NOTE: *CHERI v9 Note:* The bounds mantissa width is different in XLENMAX=32.
Expand Down
Loading