Sample dataset

In this vignette, we will generate a point cloud using a sample of 2-dimensional points on the unit circle’s circumference; this will be stored in a variable named circle_df.

# create reproducible dataset
set.seed(42)
angles <- runif(25, 0, 2 * pi)
circle_df <- data.frame(x = cos(angles),
                        y = sin(angles))

# take a peek at first 6 rows
head(circle_df)
#>            x          y
#> 1  0.8601210 -0.5100900
#> 2  0.9228553 -0.3851468
#> 3 -0.2251251  0.9743299
#> 4  0.4842164 -0.8749483
#> 5 -0.6289353 -0.7774577
#> 6 -0.9928106 -0.1196957

Above, each of the 100 rows represents a single point, with each of the 2 columns representing a Cartesian coordinate for a single dimension. Column x contains the x-coordinates of the 100 points and column y contains the respective y-coordinates. To confirm that the points in circle_df do lie on the circumference of a circle, we can quickly create a scatterplot.

# scatterplot of circle2d
plot(circle_df, xlab = "x", ylab = "y", main = "2-d circle point cloud")

Calculating persistent homology

Given that the points in circle_df are uniformly distributed across the circumference of a circle without any error or noise, we expect a single prominent 1-cycle to be present in its persistent homology. The Ripser C++ library is wrapped by R using Rcpp, and performs calculations on a Vietoris-Rips complex created with the input point cloud. These calculations result in a data frame that contains all the necessary information to characterize the persistence of homological features within circle_df, and can be performed with a single line of R code using ripserr.

# calculate persistent homology
circ_phom <- vietoris_rips(circle_df)

# print first 6 features (ordered by dimension and birth)
head(circ_phom)
#>   dimension birth      death
#> 1         0     0 0.01509939
#> 2         0     0 0.01846671
#> 3         0     0 0.02540582
#> 4         0     0 0.02859409
#> 5         0     0 0.04180345
#> 6         0     0 0.06699952

# print last 6 features (ordered by dimension and birth)
tail(circ_phom)
#>    dimension    birth     death
#> 20         0 0.000000 0.4582335
#> 21         0 0.000000 0.5059727
#> 22         0 0.000000 0.5793416
#> 23         0 0.000000 0.5812266
#> 24         0 0.000000 0.7170409
#> 25         1 1.026735 1.7859821

Each row in the homology matrix returned by the vietoris_rips function (variable named circ_phom) represents a single feature (cycle). The homology matrix has 3 columns in the following order:

  1. dimension: if 0, represents a 0-cycle; if 1, represents a 1-cycle; and so on.
  2. birth: the radius of the Vietoris-Rips complex at which this feature was first detected.
  3. death: the radius of the Vietoris-Rips complex at which this feature was last detected.

Persistence of a feature is generally defined as the length of the interval of the radius within which the feature exists. This can be calculated as the numerical difference between the second (birth) and third (death) columns of the homology matrix. Confirmed in the output of the head and tail functions above, the homology matrix is ordered by dimension, with the birth column used to compare features of the same dimension. As expected for circle_df, the homology matrix contains a single prominent 1-cycle (last line of tail’s output). Although we suspect the feature to be a persistent 1-cycle, comparison with the other features in the homology matrix is required to confirm that it is sufficiently persistent. This task is done far more easily with an appropriate visualization (e.g. topological barcode or persistence diagram) than by eyeballing the contents of circ_phom. The ggtda and TDAstats R packages could be useful to create such visualizations.

Dataset formats

Although we used a data frame to store the example point cloud, a matrix or dist object would work equally well. Let’s test this out by creating the matrix and dist equivalents of circle_df and checking if the resulting vietoris_rips outputs are equal using base::all.equal to compare data frames.

# matrix format
circ_mat <- as.matrix(circle_df)
head(circ_mat)
#>               x          y
#> [1,]  0.8601210 -0.5100900
#> [2,]  0.9228553 -0.3851468
#> [3,] -0.2251251  0.9743299
#> [4,]  0.4842164 -0.8749483
#> [5,] -0.6289353 -0.7774577
#> [6,] -0.9928106 -0.1196957

# dist object
circ_dist <- dist(circ_mat)
head(circ_dist)
#> [1] 0.1398085 1.8388207 0.5238567 1.5128695 1.8936112 1.0621818

# calculate persistent homology of each
mat_phom <- vietoris_rips(circ_mat)
dist_phom<- vietoris_rips(circ_dist)

# compare equality
all.equal(circ_phom, mat_phom)
#> [1] TRUE
all.equal(circ_phom, dist_phom)
#> [1] TRUE

Coefficients in various prime fields

Ripser can calculate persistent homology with coefficients in various prime fields (\(\mathbb{Z}/p\mathbb{Z}\)). ripserr implements this functionality via the p parameter in vietoris_rips.

# prime field Z/2Z (default)
circ_phom2 <- vietoris_rips(circle_df)

# prime field Z/5Z
circ_phom5 <- vietoris_rips(circle_df, p = 5L)

# confirm equal outputs
all.equal(circ_phom2, circ_phom5)
#> [1] TRUE