Data Structures And Algorithms
Limits of Ordered Graphs and their Applications (1811.02023v2)
Omri Ben-Eliezer, Eldar Fischer, Amit Levi, Yuichi Yoshida
2018-11-05
The emerging theory of graph limits exhibits an analytic perspective on graphs, showing that many important concepts and tools in graph theory and its applications can be described more naturally (and sometimes proved more easily) in analytic language. We extend the theory of graph limits to the ordered setting, presenting a limit object for dense vertex-ordered graphs, which we call an \emph{orderon}. As a special case, this yields limit objects for matrices whose rows and columns are ordered, and for dynamic graphs that expand (via vertex insertions) over time. Along the way, we devise an ordered locality-preserving variant of the cut distance between ordered graphs, showing that two graphs are close with respect to this distance if and only if they are similar in terms of their ordered subgraph frequencies. We show that the space of orderons is compact with respect to this distance notion, which is key to a successful analysis of combinatorial objects through their limits. We derive several applications of the ordered limit theory in extremal combinatorics, sampling, and property testing in ordered graphs. In particular, we prove a new ordered analogue of the well-known result by Alon and Stav [RS&A'08] on the furthest graph from a hereditary property; this is the first known result of this type in the ordered setting. Unlike the unordered regime, here the random graph model with an ordering over the vertices is \emph{not} always asymptotically the furthest from the property for some . However, using our ordered limit theory, we show that random graphs generated by a stochastic block model, where the blocks are consecutive in the vertex ordering, are (approximately) the furthest. Additionally, we describe an alternative analytic proof of the ordered graph removal lemma [Alon et al., FOCS'17].
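One concrete notion from this abstract is the ordered subgraph frequency: the fraction of k-vertex subsets (taken in increasing vertex order) whose induced ordered subgraph equals a given pattern. Below is a minimal Monte Carlo sketch of estimating such a frequency by sampling; the graph model, pattern, and sampling parameters are illustrative choices and are not taken from the paper.

```python
import random
from itertools import combinations

def ordered_induced(adj, vertices):
    """Adjacency pattern induced on 'vertices', listed in increasing vertex order."""
    vs = sorted(vertices)
    return tuple(tuple(adj[u][v] for v in vs) for u in vs)

def estimate_ordered_frequency(adj, pattern, trials=20000, rng=random):
    """Monte Carlo estimate of the frequency of an ordered k-vertex pattern
    among k-subsets of a vertex-ordered graph given by a 0/1 adjacency matrix."""
    n, k = len(adj), len(pattern)
    hits = 0
    for _ in range(trials):
        vs = rng.sample(range(n), k)
        if ordered_induced(adj, vs) == pattern:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n = 200
    # Illustrative vertex-ordered random graph: edge probability grows with the
    # position of the later endpoint in the vertex ordering.
    adj = [[0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if random.random() < v / n:
            adj[u][v] = adj[v][u] = 1
    # Ordered pattern on 3 vertices: an edge between the first two, none to the third.
    pattern = ((0, 1, 0), (1, 0, 0), (0, 0, 0))
    print(estimate_ordered_frequency(adj, pattern))
```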
The Online Best Reply Algorithm for Resource Allocation Problems (1805.02526v2)
Max Klimm, Daniel Schmand, Andreas Tönnis
2018-05-07
We study the performance of a best reply algorithm for online resource allocation problems with a diseconomy of scale. In an online resource allocation problem, we are given a set of resources and a set of requests that arrive in an online manner. Each request consists of a set of feasible allocations, and an allocation is a set of resources. The total cost of an allocation vector is given by the sum of the resources' costs, where each resource's cost depends on the total load on the resource under the allocation vector. We analyze the natural online procedure where each request is allocated greedily to a feasible set of resources that minimizes the individual cost of that particular request. In the literature, this algorithm is also known as a one-round walk in congestion games starting from the empty state. For unweighted resource allocation problems with polynomial cost functions with maximum degree , upper bounds on the competitive ratio of this greedy algorithm were known only for the special cases . In this paper, we show a general upper bound on the competitive ratio of for the unweighted case where denotes the Lambert-W function on . For the weighted case, we show that the competitive ratio of the greedy algorithm is bounded from above by .
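A minimal sketch of the greedy one-round best-reply procedure described above, for the unweighted case: each arriving request picks the feasible allocation that minimizes its own cost under the loads left by earlier requests. The per-unit cost function, the convention for tallying the total cost (load times per-unit cost), and the toy instance are illustrative assumptions rather than details from the paper.

```python
def greedy_best_reply(requests, per_unit_cost):
    """One-round best-reply walk from the empty state (unweighted case):
    each request, in arrival order, picks the feasible allocation that minimizes
    its own cost given the loads created by earlier requests.

    requests      -- list of requests; each request is a list of feasible
                     allocations, and each allocation is a set of resource ids.
    per_unit_cost -- c(x), the cost a user experiences on a resource with load x.
    """
    load = {}
    choices = []
    for feasible in requests:
        # Individual cost of picking allocation A: sum of c(load_r + 1) over r in A.
        best = min(feasible,
                   key=lambda A: sum(per_unit_cost(load.get(r, 0) + 1) for r in A))
        for r in best:
            load[r] = load.get(r, 0) + 1
        choices.append(best)
    # Total cost (one common convention): each of the load_r users of r pays c(load_r).
    total = sum(x * per_unit_cost(x) for x in load.values())
    return choices, total

if __name__ == "__main__":
    d = 2                    # illustrative polynomial degree (diseconomy of scale)
    c = lambda x: x ** d     # per-unit cost grows with the load
    reqs = [[{0, 1}, {2}], [{0}, {2, 3}], [{1, 3}, {2}]]
    print(greedy_best_reply(reqs, c))
```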
Improved Algorithms for Time Decay Streams (1907.07574v1)
Vladimir Braverman, Harry Lang, Enayat Ullah, Samson Zhou
2019-07-17
In the time-decay model for data streams, elements of an underlying data set arrive sequentially, with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a \emph{coreset}, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores points where is the half-life of the decay function and is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well.
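The constant-factor algorithm above is said to use the online facility location algorithm as a subroutine. The following is a textbook-style sketch of that building block (a Meyerson-style probabilistic opening rule with a uniform facility cost); the facility cost and the point set are illustrative, and none of the time-decay machinery from the paper is reproduced here.

```python
import math
import random

def online_facility_location(points, facility_cost, rng=random):
    """Meyerson-style online facility location: each arriving point opens a new
    facility at its own location with probability min(1, d(p, F) / facility_cost),
    and is otherwise assigned to its nearest open facility."""
    facilities = []
    assignment_cost = 0.0
    for p in points:
        if not facilities:
            facilities.append(p)
            continue
        d = min(math.dist(p, f) for f in facilities)
        if rng.random() < min(1.0, d / facility_cost):
            facilities.append(p)          # open a new facility at p
        else:
            assignment_cost += d          # assign p to its nearest open facility
    total = assignment_cost + facility_cost * len(facilities)
    return facilities, total

if __name__ == "__main__":
    pts = [(random.random(), random.random()) for _ in range(500)]
    fac, cost = online_facility_location(pts, facility_cost=0.5)
    print(len(fac), round(cost, 3))
```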
A spectral bound on hypergraph discrepancy (1907.04117v3)
Aditya Potukuchi
2019-07-07
Let be a -regular hypergraph on vertices and edges. Let be the incidence matrix of and let us denote . We show that the discrepancy of is . As a corollary, this gives us that for every , the discrepancy of a random -regular hypergraph with vertices and edges is almost surely as grows. The proof also gives a polynomial time algorithm that takes a hypergraph as input and outputs a coloring with the above guarantee.
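For concreteness, hypergraph discrepancy is the minimum over +/-1 vertex colorings of the maximum edge imbalance, i.e. the minimum over colorings x of the largest value of |sum of x over an edge| (equivalently, min over x of the infinity norm of Mx for the incidence matrix M). The snippet below only computes this quantity by brute force on a tiny 2-regular example, to make the quantity being bounded concrete; it does not implement the paper's spectral argument or its polynomial-time coloring algorithm.

```python
from itertools import product

def discrepancy(edges, n):
    """Exact hypergraph discrepancy by brute force:
    disc(H) = min over +/-1 colorings x of max over edges e of |sum_{v in e} x_v|.
    Only feasible for small n."""
    best = float("inf")
    for x in product((-1, 1), repeat=n):
        worst = max(abs(sum(x[v] for v in e)) for e in edges)
        best = min(best, worst)
    return best

if __name__ == "__main__":
    # A tiny 2-regular hypergraph on 6 vertices: every vertex lies in exactly 2 edges.
    edges = [{0, 1, 2}, {3, 4, 5}, {0, 3}, {1, 4}, {2, 5}]
    print(discrepancy(edges, n=6))   # prints 1
```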
Seedless Graph Matching via Tail of Degree Distribution for Correlated Erdos-Renyi Graphs (1907.06334v2)
Mahdi Bozorg, Saber Salehkaleybar, Matin Hashemi
2019-07-15
The graph matching problem refers to recovering the node-to-node correspondence between two correlated graphs. A previous work theoretically showed that recovery is feasible in sparse Erdos-Renyi graphs if and only if the probability of having an edge between a pair of nodes in one of the graphs and also between the corresponding nodes in the other graph is in the order of , where n is the number of nodes. In this paper, we propose a graph matching algorithm which obtains a correct matching with high probability in Erdos-Renyi graphs for the region of without using a seed set of pre-matched node pairs as an input. The algorithm assigns structurally innovative features to high-degree nodes based on the tail of the empirical degree distribution of their neighbor nodes. Then, it matches the high-degree nodes according to these features, and finally obtains a matching for the remaining nodes. We evaluate the performance of the proposed algorithm in the regions of and . Experiments show that it outperforms previous works in both regions.
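A highly simplified sketch of the seedless idea described above: give each high-degree node a feature built from the degrees of its high-degree neighbours (a crude stand-in for the "tail of the empirical degree distribution"), then greedily match hub nodes across the two graphs by nearest features. The feature construction, the thresholds, and the dense correlated Erdos-Renyi instance are illustrative assumptions; this is not the paper's algorithm, feature definition, or sparsity regime.

```python
import random

def degree_tail_feature(adj, v, threshold, length=20):
    """Illustrative hub feature: degrees of v's neighbours that exceed 'threshold',
    sorted descending and padded/truncated to a fixed length."""
    tail = sorted((len(adj[u]) for u in adj[v] if len(adj[u]) > threshold), reverse=True)
    return (tail + [0] * length)[:length]

def match_high_degree(adj1, adj2, top=10, threshold=5):
    """Greedily match the 'top' highest-degree nodes of two graphs by nearest
    degree-tail feature (L1 distance)."""
    hubs1 = sorted(adj1, key=lambda v: len(adj1[v]), reverse=True)[:top]
    hubs2 = sorted(adj2, key=lambda v: len(adj2[v]), reverse=True)[:top]
    f1 = {v: degree_tail_feature(adj1, v, threshold) for v in hubs1}
    f2 = {u: degree_tail_feature(adj2, u, threshold) for u in hubs2}
    matching, used = {}, set()
    for v in hubs1:
        best = min((u for u in hubs2 if u not in used),
                   key=lambda u: sum(abs(a - b) for a, b in zip(f1[v], f2[u])))
        matching[v] = best
        used.add(best)
    return matching

if __name__ == "__main__":
    # Two correlated Erdos-Renyi graphs: sample a parent G(n, p), then keep each
    # parent edge independently in each copy with probability s.
    n, p, s = 300, 0.05, 0.9
    adj1 = {v: set() for v in range(n)}
    adj2 = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                if random.random() < s:
                    adj1[u].add(v); adj1[v].add(u)
                if random.random() < s:
                    adj2[u].add(v); adj2[v].add(u)
    m = match_high_degree(adj1, adj2)
    print(sum(v == u for v, u in m.items()), "of", len(m), "hub pairs matched correctly")
```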
Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View (1907.00289v2)
Jelena Diakonikolas, Lorenzo Orecchia
2019-06-29
This note provides a novel, simple analysis of the method of conjugate gradients for the minimization of convex quadratic functions. In contrast with standard arguments, our proof is entirely self-contained and does not rely on the existence of Chebyshev polynomials. Another advantage of our development is that it clarifies the relation between the method of conjugate gradients and general accelerated methods for smooth minimization by unifying their analyses within the framework of the Approximate Duality Gap Technique that was introduced by the authors.
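For reference, the method analysed in this note is the standard conjugate gradient iteration for minimizing a convex quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ (equivalently, solving $Ax = b$ with $A$ symmetric positive definite). The sketch below is the textbook algorithm itself, not the note's duality-gap analysis; the test instance is illustrative.

```python
import numpy as np

def conjugate_gradients(A, b, x0=None, tol=1e-10, max_iter=None):
    """Standard conjugate gradient method for minimizing the convex quadratic
    f(x) = 0.5 * x^T A x - b^T x, i.e. solving Ax = b for symmetric positive
    definite A."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x              # negative gradient at x
    p = r.copy()               # first search direction
    rs = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs / (p @ Ap)          # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p      # new direction, A-conjugate to the previous ones
        rs = rs_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 50))
    A = M @ M.T + 50 * np.eye(50)      # symmetric positive definite
    b = rng.standard_normal(50)
    x = conjugate_gradients(A, b)
    print(np.linalg.norm(A @ x - b))
```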
The Complexity of Partial Function Extension for Coverage Functions (1907.07230v1)
Umang Bhaskar, Gunjan Kumar
2019-07-16
Coverage functions are an important subclass of submodular functions, finding applications in machine learning, game theory, social networks, and facility location. We study the complexity of partial function extension to coverage functions. That is, given a partial function consisting of a family of subsets of and a value at each point, does there exist a coverage function defined on all subsets of that extends this partial function? Partial function extension has previously been studied for other function classes, including Boolean functions and convex functions, and is useful in many fields, such as obtaining bounds on learning these function classes. We show that determining extendibility of a partial function to a coverage function is NP-complete, establishing in the process that there is a polynomial-sized certificate of extendibility. The hardness also gives us a lower bound for learning coverage functions. We then study two natural notions of approximate extension, to account for errors in the data set. The two notions correspond roughly to multiplicative point-wise approximation and additive approximation. We show upper and lower bounds for both notions of approximation. In the second case we obtain nearly tight bounds.
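To make the extension question concrete: a coverage function attaches a subset of some universe to each ground element, and the value of a set S is the (weighted) size of the union of the attached subsets. The sketch below evaluates such a function and checks whether one candidate representation agrees with a given partial function; it only illustrates the definitions (the example instance is made up), since deciding extendibility in general is NP-complete as stated above.

```python
def coverage_value(sets_by_element, S, weight=None):
    """Value of the coverage function at S: the (weighted) size of the union of
    the universe-sets attached to the ground elements in S."""
    covered = set().union(*(sets_by_element[i] for i in S)) if S else set()
    if weight is None:
        return len(covered)
    return sum(weight[u] for u in covered)

def extends(partial, sets_by_element, weight=None):
    """Check whether a candidate coverage representation agrees with every
    (subset, value) pair of a partial function."""
    return all(coverage_value(sets_by_element, S, weight) == v for S, v in partial)

if __name__ == "__main__":
    # Ground set {0, 1, 2}; each ground element is attached to a subset of {a, b, c, d}.
    rep = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}
    partial = [
        (frozenset({0}), 2),
        (frozenset({0, 1}), 3),
        (frozenset({0, 1, 2}), 4),
    ]
    print(extends(partial, rep))   # True: this candidate witnesses extendibility
```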
Learning to Link (1907.00533v2)
Maria-Florina Balcan, Travis Dick, Manuel Lang
2019-07-01
Clustering is an important part of many modern data analysis pipelines, including network analysis and data retrieval. There are many different clustering algorithms developed by various communities, and it is often not clear which algorithm will give the best performance on a specific clustering task. Similarly, we often have multiple ways to measure distances between data points, and the best clustering performance might require a non-trivial combination of those metrics. In this work, we study data-driven algorithm selection and metric learning for clustering problems, where the goal is to simultaneously learn the best algorithm and metric for a specific application. The family of clustering algorithms we consider consists of parameterized linkage-based procedures that include single and complete linkage. The family of distance functions we learn over consists of convex combinations of base distance functions. We design efficient learning algorithms which receive samples from an application-specific distribution over clustering instances and simultaneously learn both a near-optimal distance and clustering algorithm from these classes. We also carry out a comprehensive empirical evaluation of our techniques showing that they can lead to significantly improved clustering performance.
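A small sketch of the kind of parameterized family discussed above: agglomerative linkage clustering where a parameter alpha interpolates between single and complete linkage, and a parameter beta forms a convex combination of two base distance functions. This particular interpolation and the toy metrics are illustrative assumptions; the paper's algorithm family and its learning procedure are not reproduced here.

```python
import math
from itertools import combinations

def linkage_cluster(points, k, alpha, beta, d1, d2):
    """Agglomerative linkage clustering where
      - the point metric is the convex combination beta*d1 + (1-beta)*d2, and
      - the cluster distance interpolates single (alpha=0) and complete (alpha=1)
        linkage: (1-alpha)*min pairwise + alpha*max pairwise.
    Merges greedily until k clusters remain; cubic-time, for illustration only."""
    metric = lambda p, q: beta * d1(p, q) + (1 - beta) * d2(p, q)
    clusters = [[p] for p in points]

    def cluster_dist(A, B):
        pairwise = [metric(p, q) for p in A for q in B]
        return (1 - alpha) * min(pairwise) + alpha * max(pairwise)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    euclid = lambda p, q: math.dist(p, q)
    manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
    for c in linkage_cluster(pts, k=2, alpha=0.5, beta=0.7, d1=euclid, d2=manhattan):
        print(c)
```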
Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression (1907.07167v1)
Deeksha Adil, Richard Peng, Sushant Sachdeva
2019-07-16
Linear regression in the p-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving p-norm regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy-to-implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for p > 3, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for p > 3. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any . Our algorithm is simple to implement and is guaranteed to find a -approximate solution in iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10--50x, and is the fastest among available implementations in the high-accuracy regime.
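For context, the classical IRLS iteration for p-norm regression repeatedly solves a weighted least-squares problem with weights |r_i|^(p-2) on the current residuals r = Ax - b; this is the textbook scheme the abstract refers to, which can diverge for larger p. The sketch below is that classical iteration, not the paper's p-IRLS variant; the stopping rule and test data are illustrative.

```python
import numpy as np

def irls_pnorm(A, b, p, iters=100, eps=1e-8):
    """Classical IRLS for min_x ||Ax - b||_p: repeatedly solve weighted least
    squares with weights |r_i|^(p-2) on the current residuals. May diverge for
    large p; this is NOT the paper's p-IRLS algorithm."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]        # start from least squares (p = 2)
    for _ in range(iters):
        r = A @ x - b
        w = np.maximum(np.abs(r), eps) ** (p - 2)   # per-row weights, floored for stability
        Aw = A * w[:, None]                         # rows of A scaled by the weights
        x_new = np.linalg.solve(A.T @ Aw, Aw.T @ b) # normal equations A^T W A x = A^T W b
        if np.linalg.norm(x_new - x) < eps * (1 + np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 10))
    b = rng.standard_normal(200)
    x = irls_pnorm(A, b, p=4)
    print(np.linalg.norm(A @ x - b, ord=4))
```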
A Faster Algorithm for Cuckoo Insertion and Bipartite Matching in Large Graphs (1611.07786v2)
Megha Khosla, Avishek Anand
2016-11-23
Hash tables are ubiquitous in computer science for efficient access to large datasets. However, there is always a need for approaches that offer compact memory utilisation without substantial degradation of lookup performance. Cuckoo hashing is an efficient technique for creating hash tables with high space utilisation and a guaranteed constant access time. We are given locations and items. Each item has to be placed in one of the locations chosen by random hash functions. By allowing more than one choice for a single item, cuckoo hashing resembles multiple-choice allocation schemes. In addition, it supports dynamically changing the location of an item among its possible locations. We propose and analyse an insertion algorithm for cuckoo hashing that runs in \emph{linear time} with high probability and in expectation. Previous work on total allocation time has analysed breadth-first search, which was shown to be linear only in \emph{expectation}. Our algorithm finds an assignment (with probability 1) whenever one exists. In contrast, the other known insertion method, known as \emph{random walk insertion}, may run indefinitely even for a solvable instance. We also present experimental results comparing the performance of our algorithm with the random walk method, also for the case when each location can hold more than one item. As a corollary, we obtain a linear time algorithm (with high probability and in expectation) for finding perfect matchings in a special class of sparse random bipartite graphs. We support this by performing experiments on a large real-world dataset, finding maximum matchings in general large bipartite graphs. We report an order of magnitude improvement in the running time as compared to the \emph{Hopcroft-Karp} matching algorithm.
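The abstract contrasts the proposed linear-time insertion with the breadth-first-search insertion analysed in earlier work. Below is a minimal sketch of that baseline: when a new item's candidate slots are occupied, BFS over eviction paths until an empty slot is found, then shift items one step along the path. The salted-hash construction, table size, and number of hash functions are illustrative assumptions; this is the standard BFS insertion, not the paper's new algorithm.

```python
import random
from collections import deque

class CuckooTable:
    """Cuckoo hash table with k candidate locations per item and one item per slot.
    Insertion uses the standard BFS over eviction paths (the baseline discussed in
    the abstract)."""

    def __init__(self, size, k=3, seed=0):
        self.size, self.k = size, k
        self.slots = [None] * size                    # slots[i] = stored item or None
        rng = random.Random(seed)
        self.salts = [rng.randrange(1 << 30) for _ in range(k)]

    def locations(self, item):
        """The k candidate slots of an item (salted hashes stand in for k hash functions)."""
        return [hash((item, s)) % self.size for s in self.salts]

    def insert(self, item):
        """Place 'item', relocating stored items along a shortest eviction path found
        by BFS; returns False if no empty slot is reachable."""
        parent, queue = {}, deque()
        for loc in self.locations(item):
            if loc not in parent:
                parent[loc] = None
                queue.append(loc)
        while queue:
            loc = queue.popleft()
            if self.slots[loc] is None:
                # Reconstruct the eviction path from a candidate slot of 'item' to 'loc'.
                path = []
                while loc is not None:
                    path.append(loc)
                    loc = parent[loc]
                path.reverse()
                # Shift items one step along the path, freeing the first slot for 'item'.
                for i in range(len(path) - 1, 0, -1):
                    self.slots[path[i]] = self.slots[path[i - 1]]
                self.slots[path[0]] = item
                return True
            # Expand: the evicted item could move to any of its other candidate slots.
            for nb in self.locations(self.slots[loc]):
                if nb not in parent:
                    parent[nb] = loc
                    queue.append(nb)
        return False

if __name__ == "__main__":
    t = CuckooTable(size=1000, k=3)
    inserted = sum(t.insert(("key", i)) for i in range(900))   # roughly 90% load
    print(inserted, "of 900 items inserted")
```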