Data Structures And Algorithms
Learning Some Popular Gaussian Graphical Models without Condition Number Bounds (1905.01282v1)
Jonathan Kelner, Frederic Koehler, Raghu Meka, Ankur Moitra
2019-05-03
Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and they are assumed to be sparse. While there are a variety of algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph structure with a logarithmic number of samples, they assume various conditions that require the precision matrix to be in some sense well-conditioned. Here we give the first polynomial-time algorithms for learning attractive GGMs and walk-summable GGMs with a logarithmic number of samples without any such assumptions. In particular, our algorithms can tolerate strong dependencies among the variables. We complement our results with experiments showing that many existing algorithms fail even in some simple settings where there are long dependency chains, whereas ours do not.
RLE edit distance in near optimal time (1905.01254v1)
Raphaël Clifford, Paweł Gawrychowski, Tomasz Kociumaka, Daniel P. Martin, Przemysław Uznański
2019-05-03
We show that the edit distance between two run-length encoded strings of compressed lengths and respectively, can be computed in time. This improves the previous record by a factor of . The running time of our algorithm is within subpolynomial factors of being optimal, subject to the standard SETH-hardness assumption. This effectively closes a line of algorithmic research first started in 1993.
Fully Dynamic Single-Source Reachability in Practice: An Experimental Study (1905.01216v1)
Kathrin Hanauer, Monika Henzinger, Christian Schulz
2019-05-03
Given a directed graph and a source vertex, the fully dynamic single-source reachability problem is to maintain the set of vertices that are reachable from the given vertex, subject to edge deletions and insertions. While there has been theoretical work on this problem, showing both linear conditional lower bounds for the fully dynamic problem and insertions-only and deletions-only upper bounds beating these conditional lower bounds, there has been no experimental study that compares the performance of fully dynamic reachability algorithms in practice. Previous experimental studies in this area concentrated only on the more general all-pairs reachability or transitive closure problem and did not use real-world dynamic graphs. In this paper, we bridge this gap by empirically studying an extensive set of algorithms for the single-source reachability problem in the fully dynamic setting. In particular, we design several fully dynamic variants of well-known approaches to obtain and maintain reachability information with respect to a distinguished source. Moreover, we extend the existing insertions-only or deletions-only upper bounds into fully dynamic algorithms. Even though the worst-case time per operation of all the fully dynamic algorithms we evaluate is at least linear in the number of edges in the graph (as is to be expected given the conditional lower bounds) we show in our extensive experimental evaluation that their performance differs greatly, both on random as well as on real-world instances.
Positive-Instance Driven Dynamic Programming for Graph Searching (1905.01134v1)
Max Bannach, Sebastian Berndt
2019-05-03
Research on the similarity of a graph to being a tree - called the treewidth of the graph - has seen an enormous rise within the last decade, but a practically fast algorithm for this task has been discovered only recently by Tamaki (ESA 2017). It is based on dynamic programming and makes use of the fact that the number of positive subinstances is typically substantially smaller than the number of all subinstances. Algorithms producing only such subinstances are called positive-instance driven (PID). We give an alternative and intuitive view on this algorithm from the perspective of the corresponding configuration graphs in certain two-player games. This allows us to develop PID-algorithms for a wide range of important graph parameters such as treewidth, pathwidth, and treedepth. We analyse the worst case behaviour of the approach on some well-known graph classes and perform an experimental evaluation on real world and random graphs.
Benchmark Instances and Branch-and-Cut Algorithm for the Hashiwokakero Puzzle (1905.00973v1)
Leandro C. Coelho, Gilbert Laporte, Arinei Lindbeck, Thibaut Vidal
2019-05-02
Hashiwokakero, or simply Hashi, is a Japanese single-player puzzle played on a rectangular grid with no standard size. Some cells of the grid contain a circle, called island, with a number inside it ranging from one to eight. The remaining positions of the grid are empty. The player must connect all of the islands by drawing a series of horizontal or vertical bridges between them, respecting a series of rules: the number of bridges incident to an island equals the number indicated in the circle, at most two bridges are incident to any side of an island, bridges cannot cross each other or pass through islands, and each island must eventually be reachable from any other island. In this paper, we present some complexity results and relationships between Hashi and well-known graph theory problems. We give a formulation of the problem by means of an integer linear mathematical programming model, and apply a branch-and-cut algorithm to solve the model in which connectivity constraints are dynamically generated. We also develop a puzzle generator. Our experiments on 1440 Hashi puzzles show that the algorithm can consistently solve hard puzzles with up to 400 islands.
A Detailed Analysis of Quicksort Algorithms with Experimental Mathematics (1905.00118v2)
Yukun Yao
2019-04-30
We study several variants of single-pivot and multi-pivot Quicksort algorithms and consider them as discrete probability problems. With experimental mathematics, explicit expressions for expectations, variances and even higher moments of their numbers of comparisons and swaps can be obtained. For some variants, Monte Carlo experiments are performed, the numerical results are demonstrated and the scaled limiting distribution is also discussed.
Submodular Streaming in All its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity (1905.00948v1)
Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, Amin Karbasi
2019-05-02
Streaming algorithms are generally judged by the quality of their solution, memory footprint, and computational complexity. In this paper, we study the problem of maximizing a monotone submodular function in the streaming setting with a cardinality constraint . We first propose Sieve-Streaming++, which requires just one pass over the data, keeps only elements and achieves the tight -approximation guarantee. The best previously known streaming algorithms either achieve a suboptimal -approximation with memory or the optimal -approximation with memory. Next, we show that by buffering a small fraction of the stream and applying a careful filtering procedure, one can heavily reduce the number of adaptive computational rounds, thus substantially lowering the computational complexity of Sieve-Streaming++. We then generalize our results to the more challenging multi-source streaming setting. We show how one can achieve the tight -approximation guarantee with shared memory while minimizing not only the required rounds of computations but also the total number of communicated bits. Finally, we demonstrate the efficiency of our algorithms on real-world data summarization tasks for multi-source streams of tweets and of YouTube videos.
Log Diameter Rounds Algorithms for -Vertex and -Edge Connectivity (1905.00850v1)
Alexandr Andoni, Clifford Stein, Peilin Zhong
2019-05-02
Many modern parallel systems, such as MapReduce, Hadoop and Spark, can be modeled well by the MPC model. The MPC model captures well coarse-grained computation on large data --- data is distributed to processors, each of which has a sublinear (in the input data) amount of memory and we alternate between rounds of computation and rounds of communication, where each machine can communicate an amount of data as large as the size of its memory. This model is stronger than the classical PRAM model, and it is an intriguing question to design algorithms whose running time is smaller than in the PRAM model. In this paper, we study two fundamental problems, -edge connectivity and -vertex connectivity (biconnectivity). PRAM algorithms which run in time have been known for many years. We give algorithms using roughly log diameter rounds in the MPC model. Our main results are, for an -vertex, -edge graph of diameter and bi-diameter , 1) a parallel time -edge connectivity algorithm, 2) a parallel time biconnectivity algorithm, where the bi-diameter is the largest cycle length over all the vertex pairs in the same biconnected component. Our results are fully scalable, meaning that the memory per processor can be for arbitrary constant , and the total memory used is linear in the problem size. Our -edge connectivity algorithm achieves the same parallel time as the connectivity algorithm of Andoni et al. (FOCS 2018). We also show an conditional lower bound for the biconnectivity problem.
Budget-Feasible Mechanism Design for Non-Monotone Submodular Objectives: Offline and Online (1905.00848v1)
Georgios Amanatidis, Pieter Kleer, Guido Schäfer
2019-05-02
The framework of budget-feasible mechanism design studies procurement auctions where the auctioneer (buyer) aims to maximize his valuation function subject to a hard budget constraint. We study the problem of designing truthful mechanisms that have good approximation guarantees and never pay the participating agents (sellers) more than the budget. We focus on the case of general (non-monotone) submodular valuation functions and derive the first truthful, budget-feasible and -approximate mechanisms that run in polynomial time in the value query model, for both offline and online auctions. Prior to our work, the only -approximation mechanism known for non-monotone submodular objectives required an exponential number of value queries. At the heart of our approach lies a novel greedy algorithm for non-monotone submodular maximization under a knapsack constraint. Our algorithm builds two candidate solutions simultaneously (to achieve a good approximation), yet ensures that agents cannot jump from one solution to the other (to implicitly enforce truthfulness). Ours is the first mechanism for the problem where---crucially---the agents are not ordered with respect to their marginal value per cost. This allows us to appropriately adapt these ideas to the online setting as well. To further illustrate the applicability of our approach, we also consider the case where additional feasibility constraints are present. We obtain -approximation mechanisms for both monotone and non-monotone submodular objectives, when the feasible solutions are independent sets of a -system. With the exception of additive valuation functions, no mechanisms were known for this setting prior to our work. Finally, we provide lower bounds suggesting that, when one cares about non-trivial approximation guarantees in polynomial time, our results are asymptotically best possible.
Near Optimal Jointly Private Packing Algorithms via Dual Multiplicative Weight Update (1905.00812v1)
Zhiyi Huang, Xue Zhu
2019-05-02
We present an improved -jointly differentially private algorithm for packing problems. Our algorithm gives a feasible output that is approximately optimal up to an additive factor as long as the supply of each resource is at least , where is the number of resources. This improves the previous result by Hsu et al.~(SODA '16), which requires the total supply to be at least , and only guarantees approximate feasibility in terms of total violation. Further, we complement our algorithm with an almost matching hardness result, showing that supply is necessary for any -jointly differentially private algorithm to compute an approximately optimal packing solution. Finally, we introduce an alternative approach that runs in linear time, is exactly truthful, can be implemented online, and can be -jointly differentially private, but requires a larger supply of each resource.