Basic ‘cpp11eigen’ usage

Motivations

The development of cpp11eigen stems from the desire to follow a simplified approach to R and C++ integration by building on top of cpp11 (Vaughan, Hester, and François 2023), a ground-up rewrite of the C++ bindings to R with different design trade-offs and features. cpp11eigen aims to provide an additional layer that lets the end user focus on the computation instead of the configuration.

Eigen is a linear algebra library for C++ that aims for a good balance between speed and ease of use. Its use is motivated by the fact that C++, in its current form, is very valuable for addressing the bottlenecks found in interpreted languages such as R and Python, but its standard library provides neither data structures nor functions for linear algebra (Sanderson and Curtin 2016).

RcppEigen was first published on CRAN in 2011 and allows using Eigen via Rcpp, a widely used R package for calling C++ functions from R (Eddelbuettel and Sanderson 2014).

Design choices

The design choices in cpp11eigen are:

  • Providing a simpler implementation that makes the library easier to understand, maintain, and extend, benefiting both current users and future contributors.
  • Offering a completely header-only approach, eliminating Application Binary Interface compatibility issues and simplifying library integration and distribution.
  • Facilitating vendoring, which allows for the inclusion of the library directly in projects, thus simplifying dependency management and distribution.

These ideas reflect a comprehensive effort to provide an efficient interface for integrating C++ and R that aligns with the Tidy philosophy (Wickham et al. 2019), addressing both the technical and community-driven aspects that influence software evolution.

These choices have advantages and disadvantages. A disadvantage is that cpp11eigen does not convert data types automatically: the user must be explicit about data types, especially when passing data from R to C++ and when exporting the final computation back to R. An advantage is that cpp11eigen code, including its internal templates, can be adapted to work with Python via pybind11 (Jakob, Rhinelander, and Moldovan 2016).

cpp11eigen follows the notation of Hansen (2022): matrices are column-major, and vectors are expressed as column vectors (i.e., N × 1 matrices).
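In column-major storage, the entries of a matrix are stored one column after another, so element (i, j) of a matrix with n_rows rows sits at offset i + j × n_rows (0-based). R matrices and Eigen's default MatrixXd both use this layout. The sketch below illustrates the indexing rule in plain standard C++ rather than Eigen; colmajor_index() and colmajor_get() are hypothetical names introduced for illustration and are not part of cpp11eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cpp11eigen): 0-based offset of element
// (i, j) in a column-major buffer that has n_rows rows.
std::size_t colmajor_index(std::size_t i, std::size_t j, std::size_t n_rows) {
  return i + j * n_rows;
}

// Hypothetical accessor: reads element (i, j) of an n_rows x n_cols matrix
// stored column by column in buf.
double colmajor_get(const std::vector<double>& buf, std::size_t n_rows,
                    std::size_t n_cols, std::size_t i, std::size_t j) {
  assert(buf.size() == n_rows * n_cols && i < n_rows && j < n_cols);
  return buf[colmajor_index(i, j, n_rows)];
}
```

For the 2 × 3 matrix created in R by matrix(c(1, 2, 3, 4, 5, 6), nrow = 2), the buffer is {1, 2, 3, 4, 5, 6}; colmajor_get(buf, 2, 3, 0, 1) returns 3 and colmajor_get(buf, 2, 3, 1, 2) returns 6. A column vector of length N is the special case n_cols = 1, where element (i, 0) sits at offset i.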

Examples

Convention: input R matrices are denoted by x, y, z, and output or intermediate C++ matrices by X, Y, Z. The example functions can be called from R scripts and need the headers shown in the following code:

#include <cpp11.hpp>
#include <cpp11eigen.hpp>

using namespace Eigen;
using namespace cpp11;

[[cpp11::register]] // allows using the function in R
doubles_matrix<> solve_mat(doubles_matrix<> x) {
  MatrixXd Y = as_Matrix(x); // convert from R to C++
  MatrixXd Yinv = Y.inverse(); // Y^(-1)
  return as_doubles_matrix(Yinv); // convert from C++ to R
}

This example includes the cpp11 and cpp11eigen headers (the latter also brings in Eigen); #include <cpp11.hpp> is what allows interfacing C++ with R. It also loads the corresponding namespaces to simplify the notation: using namespace Eigen allows writing MatrixXd instead of Eigen::MatrixXd, and using namespace cpp11 allows writing doubles_matrix<> instead of cpp11::doubles_matrix<>.

The as_Matrix() function, provided by cpp11eigen, converts a matrix object passed from R into an Eigen MatrixXd that can be used on the C++ side.

The as_doubles_matrix() function, also provided by cpp11eigen, converts a MatrixXd object back into an R matrix (doubles_matrix<>) that can be returned to R.
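To make the computation in solve_mat() concrete, the following sketch computes the inverse of a 2 × 2 matrix with the closed-form formula, in plain standard C++ and using the same column-major layout as R and Eigen. The function name inverse_2x2() is hypothetical; Eigen's inverse() handles square matrices of any size.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Hypothetical helper: inverse of the 2 x 2 matrix
//   [a b]
//   [c d]
// stored column-major as {a, c, b, d}. Returns the inverse in the same layout.
std::array<double, 4> inverse_2x2(const std::array<double, 4>& m) {
  const double a = m[0], c = m[1], b = m[2], d = m[3];
  const double det = a * d - b * c;
  if (det == 0.0) {
    throw std::domain_error("matrix is singular");
  }
  // Y^(-1) = (1 / det) * [d -b; -c a], written column by column.
  return std::array<double, 4>{{d / det, -c / det, -b / det, a / det}};
}
```

For Y = [4 7; 2 6] (determinant 10), the inverse is [0.6 -0.7; -0.2 0.4]. Unlike this sketch, Eigen's inverse() does not check whether the matrix is invertible; Eigen::FullPivLU provides isInvertible() for that purpose.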

Ordinary Least Squares

Given a design matrix X and an outcome vector y (Y once converted to C++), one function that obtains the OLS estimator $\hat{\beta} = (X^{\top}X)^{-1}(X^{\top}Y)$ as a matrix (i.e., a column vector) is:

MatrixXd ols_(const doubles_matrix<>& y, const doubles_matrix<>& x) {
  MatrixXd Y = as_Matrix(y);  // outcome: an N x 1 matrix
  MatrixXd X = as_Matrix(x);

  MatrixXd XtX = X.transpose() * X;        // X'X
  MatrixXd XtX_inv = XtX.inverse();        // (X'X)^(-1)
  MatrixXd beta = XtX_inv * X.transpose() * Y;  // (X'X)^(-1)(X'Y)

  return beta;
}

[[cpp11::register]] doubles_matrix<> ols_mat(const doubles_matrix<>& y,
                                             const doubles_matrix<>& x) {
  MatrixXd beta = ols_(y, x);
  return as_doubles_matrix(beta);
}

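The ols_mat() function receives the inputs from R and calls ols_() to do the computation on the C++ side. Declaring the arguments as const references (const doubles_matrix<>&) is a C++ feature: the functions can read y and x without copying the argument objects, and const guarantees that they are not modified. Note that as_Matrix() still copies the values into a new MatrixXd, which owns its own memory.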
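The effect of passing by const reference can be seen in a minimal, self-contained sketch in plain C++. CopyCounter is a hypothetical type (standing in for any argument type, not for cpp11's doubles_matrix<>) that counts how many times it is copied; the two functions differ only in how they receive their argument.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical type that records how many times it has been copied.
struct CopyCounter {
  static int copies;
  std::vector<double> data;

  explicit CopyCounter(std::vector<double> d) : data(std::move(d)) {}
  CopyCounter(const CopyCounter& other) : data(other.data) { ++copies; }
};
int CopyCounter::copies = 0;

// Pass by value: each call copies the whole argument, including its data.
double first_by_value(CopyCounter m) { return m.data.at(0); }

// Pass by const reference: no copy is made, and the function cannot modify m.
double first_by_const_ref(const CopyCounter& m) { return m.data.at(0); }
```

Passing by value would also be correct; the const reference simply avoids the extra copy on every call and documents that the function only reads its input.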

An ols_dbl() function (not shown here) performs the same computation as ols_mat() but returns a vector instead of a matrix.
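The following sketch reproduces the OLS computation in plain standard C++, without Eigen or cpp11, so that the arithmetic can be checked by hand. As an assumption of this sketch, it solves the normal equations (X'X)β = X'y by Gaussian elimination with partial pivoting instead of forming (X'X)^(-1) explicitly as ols_() does; the result is the same β̂ whenever X'X is invertible. X is stored column-major, as R passes it, and ols_normal_equations() is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Column-major n x k design matrix X and outcome y of length n.
// Returns the k OLS coefficients by solving (X'X) beta = X'y.
std::vector<double> ols_normal_equations(const std::vector<double>& X,
                                         const std::vector<double>& y,
                                         std::size_t n, std::size_t k) {
  assert(X.size() == n * k && y.size() == n);

  // Augmented k x (k + 1) system [X'X | X'y], stored row by row.
  std::vector<std::vector<double>> A(k, std::vector<double>(k + 1, 0.0));
  for (std::size_t r = 0; r < k; ++r) {
    for (std::size_t c = 0; c < k; ++c) {
      for (std::size_t i = 0; i < n; ++i) {
        A[r][c] += X[i + r * n] * X[i + c * n];  // (X'X)_{rc}
      }
    }
    for (std::size_t i = 0; i < n; ++i) {
      A[r][k] += X[i + r * n] * y[i];  // (X'y)_r
    }
  }

  // Forward elimination with partial pivoting.
  for (std::size_t p = 0; p < k; ++p) {
    std::size_t best = p;
    for (std::size_t r = p + 1; r < k; ++r) {
      if (std::fabs(A[r][p]) > std::fabs(A[best][p])) best = r;
    }
    if (A[best][p] == 0.0) throw std::runtime_error("X'X is singular");
    std::swap(A[p], A[best]);
    for (std::size_t r = p + 1; r < k; ++r) {
      const double factor = A[r][p] / A[p][p];
      for (std::size_t c = p; c <= k; ++c) A[r][c] -= factor * A[p][c];
    }
  }

  // Back substitution, from the last coefficient to the first.
  std::vector<double> beta(k);
  for (std::size_t p = k; p-- > 0;) {
    double s = A[p][k];
    for (std::size_t c = p + 1; c < k; ++c) s -= A[p][c] * beta[c];
    beta[p] = s / A[p][p];
  }
  return beta;
}
```

With X = [1 x] for x = 0, 1, 2, 3 and y = 1, 3, 5, 7 (so y = 1 + 2x exactly), X'X = [4 6; 6 14], X'y = (16, 34)', and the sketch returns β̂ = (1, 2)' up to rounding. In Eigen, the same idea of avoiding the explicit inverse can be written as (X.transpose() * X).ldlt().solve(X.transpose() * Y), or as X.colPivHouseholderQr().solve(Y) for a QR-based least-squares solution.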

Additional Examples

The package repository includes the directory cpp11eigentest, which contains an R package that uses Eigen and provides additional examples for eigenvalues, Cholesky and QR decompositions, and linear models.

References

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo: Accelerating R with High-Performance C++ Linear Algebra.” Computational Statistics & Data Analysis 71 (March): 1054–63. https://doi.org/10.1016/j.csda.2013.02.005.
Hansen, Bruce. 2022. Econometrics. Princeton University Press.
Jakob, Wenzel, Jason Rhinelander, and Dean Moldovan. 2016. pybind11: Seamless Operability Between C++11 and Python. https://github.com/pybind/pybind11.
Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.
Vaughan, Davis, Jim Hester, and Romain François. 2023. cpp11: A C++11 Interface for R’s C Interface. https://CRAN.R-project.org/package=cpp11.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.