Selected publication:
Entropy-Bounded Representation of Point Grids
with Travis Gagie and Gonzalo Navarro. ISAAC 2010. [PDF]
Click to show/hide abstract.We give the first fully compressed representation of a set of $m$ points on an $n \times n$ grid, taking $H+o(H)$ bits of space, where $H=\lg {n^2 \choose m}$ is the entropy of the set. This representation supports range counting, range reporting, and point selection queries, with a performance that is comparable to that of uncompressed structures and that improves upon the only previous compressed structure. Operating within entropy-bounded space opens a new line of research on an otherwise well-studied area, and is becoming extremely important for handling large datasets.
Succinct Representations of Separable Graphs
with Guy E. Blelloch. CPM 2010. [PDF]
Click to show/hide abstract.We consider the problem of highly space-efficient representation of separable graphs while supporting queries in constant time in the RAM with logarithmic word size. In particular, we show constant-time support for adjacency, degree and neighborhood queries. For any monotone class of separable graphs, the storage requirement of the representation is optimal to within lower order terms. Separable graphs are those that admit a $O(n^c)$-separator theorem where $c < 1$. Many graphs that arise in practice are indeed separable. For instance, graphs with a bounded genus are separable. In particular, planar graphs (genus $0$) are separable and our scheme gives the first succinct representation of planar graphs with a storage requirement that matches the information-theory minimum to within lower order terms with constant time support for the queries. We, furthers, show that we can also modify the scheme to succinctly represent the combinatorial planar embedding of planar graphs (and hence encode planar maps).
Dynamic Succinct Ordered Trees
with J.I. Munro. ICALP 2009. [PDF]
Click to show/hide abstract.We study the problem of maintaining a dynamic tree succinctly, in $2n+o(n)$ bits, under updates of the following form: insertion or deletion of a leaf, insertion of a node on an edge (edge subdivision) or deletion of a node with only one child (the child becomes a child of the grandparent). We allow satellite data of a fixed (but not necessarily constant) size to be associated to the nodes of the tree. We support update operations in constant amortized time and support access to satellite data and basic navigation operations in worst-case constant time; the basic navigation operations includes parent, first/last-child, previous/next-child. We demonstrate that to allow fast support for more extended operations such as determining the $i$-th child of a node, rank of a child among its siblings, or subtree size, we require a restrictive update strategy for which we propose the finger-update model where updates are performed by a finger which is only allowed to crawl on the tree (between a child and a parent or between consecutive siblings). Under this model, we describe how the named operations can be performed in worst-case constant time. Previous work on dynamic succinct ordered trees is mainly restricted to binary trees and achieves poly-logarithmic or ``poly-log-log" update time under a more restricted model. Best previous result on ordinal trees achieves only sublinear amortized update time and ``poly-log-log" query time.
Universal Succinct Representations of Trees?
with S. Srinivasa Rao, Rajeev Raman . ICALP 2009. [PDF]
Click to show/hide abstract.We consider the succinct representation of ordinal and cardinal trees on the RAM with logarithmic word size. Given a tree $T$, our representations support the following operations in $O(1)$ time: (i) BP-substring(i,b), which reports the substring of length $b$ bits ($b$ is at most the wordsize) beginning at position $i$ of the balanced parenthesis representation of $T$, (ii) DFUDS-substring(i,b), which does the same for the depth first unary degree sequence representation, and (iii) a similar operation for tree-partition based representations of $T$. We give:
- an asymptotically space-optimal $2n + o(n)$ bit representation of $n$-node ordinal trees that supports all the above operations with $b = \Theta(\log n)$, answering an open question from [He et al., ICALP'07].
- an asymptotically space-optimal $C(n,k) + o(n)$-bit representation of $k$-ary cardinal trees, that supports (with $b = \Theta(\sqrt{\log n})$) the operations (ii) and (iii) above, on the ordinal tree obtained by removing labels from the cardinal tree, as well as the usual label-based operations. As a result, we obtain a fully-functional cardinal tree representation with the above space complexity. This answers an open question from [Raman et al, SODA'02].
Our new representations are able to simultaneously emulate the BP, DFUDS and partitioned representations using a single instance of the data structure, and thus aim towards universality. They not only support the union of all the ordinal tree operations supported by these representations, but will also automatically inherit any new operations supported by these representations in the future.
Finding a Hausdorff Core of a Polygon
with Reza Dorrigiv, Stephane Durocher, Robert Fraser, Alejandro Lopez-Ortiz, J. Ian Munro, Alejandro Salinger, Matthew Skala. WADS 2009. [PDF]
Click to show/hide abstract.Given a simple polygon P, we consider the problem of finding a convex polygon Q contained in P that minimizes H(P;Q), where H denotes the Hausdorff distance. We call such a polygon Q a Hausdorff core of P. We describe polynomial-time approximations for both the minimization and decision versions of the Hausdorff core problem, and we provide an argument supporting the hardness of the problem.
Evaluation of General Set Expressions
with Ehsan Chiniforooshan, Mahdi Mirzazadeh. ISAAC 2008. [PDF]
Click to show/hide abstract.We consider the problem of evaluating an expression over sets. The sets are preprocessed and are therefore sorted, and the operators can be any of union, intersection, difference, complement, and symmetric difference (exclusive union). Given the expression as a formula and the sizes of the input sets, we are interested in the worst-case complexity of evaluation (in terms of the size of the sets). The problem is motivated by document retrieval in search engines where a user query translates directly to an expression over the sets containing the user- entered words. Special cases of of this problem have been studied where the expression has a restricted form. In this paper, we present an efficient algorithm to evaluate the most general form of a set expression. We show a lower bound on this problem for expressions of the form $E_1$, or $E_1 - E_2$ where $E_1$ and $E_2$ are expressions with union, intersection, and symmetric difference operators. We demonstrate that the algorithm's complexity matches the lower bound in these instances. We, moreover, conjecture that the algorithm works optimally, even when we allow difference and complement operations in E1 and E2.
Succinct Representations of Arbitrary Graphs
with J. Ian Munro. ESA 2008. [PDF] (© Springer-Verlag)
Click to show/hide abstract.We consider the problem of encoding a graph with $n$ vertices and $m$ edges compactly supporting adjacency, neighborhood and degree queries in constant time in the log(n)-bit word RAM model. The adjacency query asks whether there is an edge between two vertices, the neighborhood query reports the neighbors of a given vertex in constant time per neighbor, and the degree query reports the number of incident edges to a given vertex.
We study the problem in the context of succinctness, where the goal is to achieve the optimal space requirement as a function of $n$ and $m$, to within lower order terms. We prove a lower bound in the cell probe model that it is impossible to achieve the information-theory lower bound within lower order terms unless the graph is too sparse (namely $m=o(n^\delta)$ for any constant $\delta > 0$) or too dense (namely $m = o(n^{2-\delta})$ for any constant $\delta > 0$).
Furthermore, we present a succinct encoding for graphs for all values of $n,m$ supporting queries in constant time. The space requirement of the representation is always within a multiplicative $1+\epsilon$ factor of the information-theory lower bound for any arbitrarily small constant $\epsilon > 0$. This is the best achievable space bound according to our lower bound where it applies. The space requirement of the representation achieves the information-theory lower bound tightly to within lower order terms when the graph is sparse ($m=o(n^\delta)$ for any constant $\delta > 0$).
A Uniform Approach Towards Succinct Representation of Trees
with J. Ian Munro. SWAT 2008. [PDF] (© Springer-Verlag)
Click to show/hide abstract.We propose a uniform approach for succinct representation of various families of trees. The method is based on two recursive decomposition of trees into subtrees. Recursive decomposition of a structure into substructures is a common technique in succinct data structures and has already been shown fruitful in succinct representation of ordinal trees and dynamic binary trees. We take an approach that simplifies the existing representation of ordinal trees allowing the same full set of navigational operations. The approach applied to cardinal (ie k-ary) trees yields a space-optimal succinct representation which allows full set of cardinal-type operations (eg determining child labeled i) as well as the full set of ordinal-type operations (eg reporting subtree size). Existing space-optimal succinct representations had not been able to support both types of operations.
We demonstrate how the approach can be applied to obtain a space-optimal succinct representation for the family of free trees where the order of children is insignificant. Furthermore, we show that our approach can be used to obtain entropy-based succinct representations. We show that our approach matches the degree-distribution entropy suggested by Jansson. We discuss that our approach can be made adaptive to various other entropy measures.
Cache-Oblivious Output-Sensitive Two-Dimensional Convex Hull
with Peyman Afshani. CCCG 2007
Click to show/hide abstract.We consider the problem of two-dimensional output-sensitive convex hull in the cache-oblivious model. That is, we are interested in minimizing the number of cache faults caused to determine the convex hull of a set of N points on a plane. We are interested in the output-sensitive case where number of cache misses are analyzed in the worst case based on both the input size N and output size H (number of extreme points that lie on the final convex hull).
We present a simple algorithm which almost matches this lower bound, as it can only be an additional term ${N\over B}\log\log {M\over B}$ away from the optimal. We will argue that $\log\log {M\over B}$ is very small in practice and thus it is negligible.
On the Complexity of Finding an Unknown Cut Via Vertex Queries
with Peyman Afshani, Ehsan Chiniforooshan, Reza Dorrigiv, Mehdi Mirzazadeh, Narges Simjour, Hamid Zarrabi-Zadeh. COCOON 2007
Click to show/hide abstract.We investigate the problem of finding an unknown cut through querying vertices of a graph G. Our complexity measure is the number of submitted queries. To avoid some worst cases, we make a few assumptions which allow us to obtain an algorithm with the worst case query complexity of $O(k)+2k\log{n\over k}$ in which k is the number of vertices adjacent to cut-edges. We also provide a matching lowerbound and then prove if G is a tree our algorithm can asymptotically achieve the information theoretic lowerbound on the query complexity. Finally, we show it is possible to remove our extra assumptions but achieve an approximate solution.
Succinct representation of finite abelian groups
with J. Ian Munro. ISSAC 2006
Click to show/hide abstract.We consider the problem of representing and performing computations on finite abelian groups. Assuming a \lg n-bit word model and considering any abelian group of order $n$, we show how to represent the group in constant number of words and perform three fundamental group operations of equality testing, multiplication, and inversion in constant number of word operations, provided we have the platform instruction to reverse the bits of a word.
Cache-oblivious comparison-based algorithms on multisets
with Paolo Ferragina, Gianni Franceschini, J. Ian Munro. ESA 2005. [PDF] (© Springer-Verlag)
Click to show/hide abstract.We study three comparison-based problems related to multisets in the cache-oblivious model: Duplicate elimination, multisorting and finding the most frequent element (the mode). We are interested in minimizing the cache complexity (or number of cache misses) of algorithms for these problems in the context under which cache size and block size are unknown. We give algorithms with cache complexities within a constant factor of the optimal for all the problems. In the case of determining the mode, the optimal algorithm is randomized as the deterministic algorithm differs from the lower bound by a sublogarithmic factor. We can achieve optimality either with a randomized method or if given, along with the input, lg lg of relative frequency of the mode with a constant additive error.
Worst Case Optimal Union-Intersection Expression Evaluation
with Ehsan Chiniforooshan, Mehdi Mirzazadeh. ICALP 2005
Click to show/hide abstract.We consider the problem of evaluating an expression consisting of unions and intersections of some sorted sets. Given the expression and the sizes of the sets, we are interested in the worst-case complexity of evaluating the expression in terms of the sizes of the sets. We assume no set is repeated in the expression. We show a lower bound on this problem and present an algorithm that matches the lower bound asymptotically.