Select
Overview
Select provides:
- An API for taking slices (elements selected by the Cartesian product of vectors of subscripts for each axis) of array-like objects. The most important function is select. Unless you want to define additional methods for select, this is pretty much all you need from this library. See the API reference for additional details.
- An extensible DSL for selecting a subset of valid subscripts. This is useful if, for example, you want to resolve column names in a data frame in your implementation of select.
- A set of utility functions for traversing selections in array-like objects.
It combines the functionality of dplyr’s slice, select and sample methods.
Basic Usage
The most frequently used form is (select object selection1 selection2 ...), where each selection specifies a set of subscripts along the corresponding axis. The selection specifications are found below.
To select a column, pass in t for the rows (selection1), and the column names (for a data frame) or column number (for an array) for selection2. For example, to select the second column of this array (remember Common Lisp has zero-based arrays, so the second column is at index 1):
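A minimal sketch of column selection on a plain array. It assumes Quicklisp is installed and that the library loads as the system :select with its symbols exported from the select package; the array contents and the variable names *table* and *second-column* are illustrative only.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
;; Evaluate at the REPL or LOAD this file (forms are read one at a time).
(ql:quickload :select)

(defparameter *table*
  #2A((c0  c1  c2)
      (v10 v11 v12)
      (v20 v21 v22)
      (v30 v31 v32))
  "Illustrative 4x3 array whose first row holds column labels.")

;; T keeps every row; 1 picks the column at index 1 (the second column).
(defparameter *second-column* (select:select *table* t 1))
;; *second-column* => #(C1 V11 V21 V31)
```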
To select a column from the mtcars data frame:
If you’re selecting from a data frame, you can also use the column or columns command:
To select an entire row, pass t for the column selector, and the row(s) you want for selection1. This example selects the first row (second row in purely array terms, which are 0 based):
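Row selection can be sketched the same way. The array below is illustrative: its row 0 holds column labels, so index 1 is the first data row. As before, the :select system name and the select: package prefix are assumptions about how the library is loaded.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *table*
  #2A((c0  c1  c2)
      (v10 v11 v12)
      (v20 v21 v22)
      (v30 v31 v32))
  "Illustrative array; row 0 is a header row.")

;; 1 picks the row at index 1; T keeps every column.
(defparameter *first-data-row* (select:select *table* 1 t))
;; *first-data-row* => #(V10 V11 V12)
```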
Selection Specifiers
Selecting Single Values
A non-negative integer selects the corresponding index, while a negative integer selects an index counting backwards from the last index. For example:
These are called singleton slices. Each singleton slice drops the dimension: vectors become atoms, matrices become vectors, etc.
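A short sketch of singleton slices, assuming the library is loaded as the :select system. The variables *v* and *m* are illustrative names, not part of the library.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(0 1 2 3))
(defparameter *m* #2A((0 1 2)
                      (3 4 5)))

;; A non-negative integer selects that index; the vector becomes an atom.
(defparameter *second* (select:select *v* 1))        ; => 1
;; A negative integer counts backwards from the end.
(defparameter *from-end* (select:select *v* -2))     ; => 2
;; One singleton on a matrix drops one dimension: the result is a vector.
(defparameter *row-0* (select:select *m* 0 t))       ; => #(0 1 2)
;; Two singletons on a matrix drop both dimensions: the result is an atom.
(defparameter *cell* (select:select *m* 1 2))        ; => 5
```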
Selecting Ranges
(range start end) selects subscripts i where start <= i < end. When end is nil, the last index is included (cf. subseq). Each boundary is resolved according to the other rules, if applicable, so you can use negative integers:
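The range rules above can be sketched as follows, assuming the library is loaded as the :select system and range is exported from the select package. The variable names are illustrative.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(0 1 2 3))

;; Half-open interval: indices 1 and 2 are selected, 3 is excluded.
(defparameter *middle* (select:select *v* (select:range 1 3)))      ; => #(1 2)
;; -1 resolves to the last index (3), which is then excluded.
(defparameter *no-last* (select:select *v* (select:range 1 -1)))    ; => #(1 2)
;; A NIL end includes the last index, as with SUBSEQ.
(defparameter *to-end* (select:select *v* (select:range 1 nil)))    ; => #(1 2 3)
```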
Selecting All Subscripts
t selects all subscripts:
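As a small illustration, assuming the library is loaded as the :select system, t on the row axis combined with a singleton on the column axis extracts one column. The variable names are illustrative.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *m* #2A((0 1 2)
                      (3 4 5)))

;; T keeps all rows; 1 selects the column at index 1.
(defparameter *column-1* (select:select *m* t 1))   ; => #(1 4)
```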
Selecting w/ Sequences
Sequences can be used to make specific selections from the object. For example:
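A hedged sketch of selecting with a sequence of subscripts, assuming the library is loaded as the :select system. A vector of indices is used here; the values and variable names are illustrative. Indices may repeat and appear in any order.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(10 20 30))

;; Each element of the selection vector is an index into *V*.
(defparameter *picked* (select:select *v* #(2 2 1 0 0)))
;; *picked* => #(30 30 20 10 10)
```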
Masks
Bit Vectors
Bit vectors can be used to select elements of arrays and sequences as well:
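A minimal sketch, assuming the library is loaded as the :select system: each 1 bit keeps the element at that position. The variable names are illustrative.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(0 1 2 3 4))

;; #*00110 has 1 bits at positions 2 and 3, so those elements are kept.
(defparameter *masked* (select:select *v* #*00110))   ; => #(2 3)
```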
Which
which returns an index of the positions in SEQUENCE which satisfy PREDICATE.
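A short sketch of which, assuming the library is loaded as the :select system and that the predicate is passed with the :predicate keyword. The data and variable names are illustrative. The returned positions can be passed back to select as a sequence selection.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(1 2 3 4))

;; Positions (not values) of the even elements: 2 is at index 1, 4 at index 3.
(defparameter *even-positions* (select:which *v* :predicate #'evenp))
;; *even-positions* => #(1 3)

;; Feeding the positions back into SELECT recovers the matching values.
(defparameter *even-values* (select:select *v* *even-positions*))
;; *even-values* => #(2 4)
```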
Sampling
You may sample sequences, arrays and data frames with the sample generic function, and extend it for your own objects. The function signature is:
In Common Lisp, keyword arguments that are not provided default to nil, so you need to turn them on if you want them.
:skip-unselected t means to not return the values of the object that were not part of the sample. This is turned off by default because a common use case is splitting a data set into training and test groups, and the second value is ignored by default in Common Lisp. The let-plus package, imported by default in select, makes it easy to destructure into test and training. This example is from the tests included with select:
Note the setting of *random-state*. You should use this pattern of setting *random-state* to a saved seed any time you need reproducible results (like in a testing scenario).
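The reproducibility pattern itself needs only standard Common Lisp. The sketch below is not taken from the library's tests; *saved-seed* and draw-three are hypothetical names, and random stands in for any code that consumes *random-state*.

```lisp
;; Save a seed once. MAKE-RANDOM-STATE with T creates a freshly seeded state.
(defparameter *saved-seed* (make-random-state t))

(defun draw-three ()
  "Return three pseudo-random integers below 100, starting from the saved seed."
  ;; MAKE-RANDOM-STATE with a state argument returns a copy, so binding
  ;; *RANDOM-STATE* to it leaves *SAVED-SEED* itself unchanged.
  (let ((*random-state* (make-random-state *saved-seed*)))
    (list (random 100) (random 100) (random 100))))

;; Every call restarts from the same state, so the results are identical.
(equal (draw-three) (draw-three))   ; => T
```

Any call that draws from *random-state* inside such a binding will produce the same results on every run.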
The size of the sample is determined by the value of n, which must be between 0 and the number of rows (for an array) or the length (for a sequence). If (< n 1), then n indicates a proportion of the sample, e.g. 2/3 (values of n less than one may be rational or float). For example, let’s take a training sample of 2/3 of the rows in the mtcars dataset:
You can see that mtcars has 32 rows, and has been divided into 2/3 and 1/3 proportional samples for training / test.
You can also take samples of sequences (lists and vectors), for example using the DATA variable defined above:
list objects can also be sampled:
Note that n is rounded up when the number of elements is odd and a proportional number is requested.
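A hedged sketch of proportional sampling on a vector. It assumes the library is loaded as the :select system and that sample is exported from the select package; the data and the names *data* and *split-sizes* are illustrative. With :skip-unselected left at its default, the unselected elements come back as the second value.

```lisp
;; Assumption: Quicklisp is available, :select is the system name,
;; and SAMPLE is accessible as SELECT:SAMPLE.
(ql:quickload :select)

(defparameter *data* #(10 20 30 40 50 60))

;; N = 2/3 is below 1, so it is read as a proportion of the elements.
;; The first value is the sample, the second the elements left out.
(defparameter *split-sizes*
  (multiple-value-bind (train test) (select:sample *data* 2/3)
    (list (length train) (length test))))
```

The two parts together should account for every element of the input, which is what makes this form convenient for a training and test split.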
Extensions
The previous section describes the core functionality. The semantics can be extended. The extensions in this section are provided by the library and prove useful in practice. Their implementations provide good examples of extending the library.
including is convenient if you want the selection to include the end of the range:
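A minimal sketch, assuming the library is loaded as the :select system and including is exported from the select package. The variable names are illustrative.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(0 1 2 3))

;; Unlike RANGE, the end index 2 is part of the selection.
(defparameter *closed* (select:select *v* (select:including 1 2)))
;; *closed* => #(1 2)
```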
nodrop is useful if you do not want to drop dimensions:
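For comparison with a plain singleton slice, a short sketch assuming the library is loaded as the :select system and nodrop is exported from the select package. The variable names are illustrative.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(0 1 2 3))

;; A bare index drops the dimension and yields an atom.
(defparameter *dropped* (select:select *v* 2))                    ; => 2
;; NODROP keeps the dimension and yields a one-element vector.
(defparameter *kept* (select:select *v* (select:nodrop 2)))       ; => #(2)
```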
All of these are trivial to implement. If there is something you are missing, you can easily extend select. Pull requests are welcome.
(ref) is a version of (select) that always returns a single element, so it can only be used with singleton slices.
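A short sketch of ref, assuming the library is loaded as the :select system and ref is exported from the select package. The variable names are illustrative.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defparameter *v* #(0 1 2 3))
(defparameter *m* #2A((0 1 2)
                      (3 4 5)))

;; One singleton subscript per axis, always yielding a single element.
(defparameter *elt* (select:ref *v* 2))       ; => 2
(defparameter *cell* (select:ref *m* 1 0))    ; => 3
```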
Select Semantics
Arguments of select, except the first one, are meant to be resolved using canonical-representation, in the select-dev package. If you want to extend select, you should define methods for canonical-representation. See the source code for the best examples. Below is a simple example that extends the semantics with ordinal numbers.
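One way such an extension might look is sketched here. It assumes the library is loaded as the :select system and that canonical-representation takes the axis and the selection as its two arguments. The macro name define-ordinal-selection is illustrative. Each ordinal symbol maps directly to the index of the same number, so 'second resolves to index 2.

```lisp
;; Assumption: Quicklisp is available and :select is the system name.
(ql:quickload :select)

(defmacro define-ordinal-selection (number)
  "Define a method resolving the ordinal symbol for NUMBER (e.g. SECOND) to index NUMBER."
  (check-type number (integer 0))
  ;; (format nil "~:@(~:r~)" 2) => "SECOND"; INTERN turns it into a symbol.
  `(defmethod select-dev:canonical-representation
       ((axis integer) (selection (eql ',(intern (format nil "~:@(~:r~)" number)))))
     ;; AXIS is the dimension of the axis, so the index must lie below it.
     (assert (< ,number axis))
     (select-dev:canonical-singleton ,number)))

(define-ordinal-selection 1)
(define-ordinal-selection 2)
(define-ordinal-selection 3)

(defparameter *v* #(0 1 2 3 4 5))

;; The ordinal symbol now behaves like the singleton index 2.
(defparameter *second-elt* (select:select *v* 'second))   ; => 2

;; Range boundaries are resolved by the same rules, so ordinals work there too.
(defparameter *ordinal-range*
  (select:select *v* (select:range 'first 'third)))
```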
Note the following:
- The value returned by canonical-representation needs to be constructed using canonical-singleton, canonical-range, or canonical-sequence. You should not use the internal representation directly as it is subject to change.
- You can assume that axis is an integer; this is the default. An object may define a more complex mapping (such as, for example, named rows & columns), but unless a method specialized to that is found, canonical-representation will just query its dimension (with axis-dimension) and try to find a method that works on integers.
- You need to make sure that the subscript is valid, hence the assertion.