PODS program

Proceedings

PODS 2017 papers will be accessed here.

Conference Program: PODS Sessions

This page describes the complete PODS Conference program.

The PODS 2017 program will also be available at http://confer.csail.mit.edu/sigmod2017. Confer is a conference program management tool with a crowd-sourcing component that automatically groups papers into sessions to maximize the number of papers each attendee likes that he or she will get to see.

 

 

 

PODS Keynote Talk

Monday 8:15-9:40
Location: ♛Continental A
Session Chair: Jan Van den Bussche (Hasselt University)

  • Data Citation: a Computational Challenge (pods313sd)
    Susan Davidson (University of Pennsylvania)
    Abstract: Most information is now published in complex, structured, evolving datasets or databases. As such, there is increasing demand that this digital information should be treated in the same way as conventional publications and cited appropriately. Unlike traditional publications, however, which have a fixed granularity to which citations can be attached (e.g. a paper in a conference proceedings, or chapter in a book), the granularity of data varies when retrieved by a query over a database. Since there are a potentially infinite number of queries, each accessing and generating different subsets of the database, it is impossible to explicitly attach a citation to every possible result set and/or query. Data citation is therefore a computational challenge, whose solution draws on two well-studied problems in database theory: query answering using views, and provenance. In this talk, I will give an overview of what is being done in data citation and highlight several open research problems, both practical and theoretical.
    Bio: Susan B. Davidson received the B.A. degree in Mathematics from Cornell University in 1978, and the M.A. and Ph.D. degrees in Electrical Engineering and Computer Science from Princeton University in 1980 and 1982. Dr. Davidson is the Weiss Professor of Computer and Information Science (CIS) at the University of Pennsylvania, where she has been since 1982, and currently serves as Chair of the board of the Computing Research Association. Dr. Davidson’s research interests include database and web-based systems, scientific data management, provenance, crowdsourcing, and data citation. Dr. Davidson was the founding co-director of the Penn Center for Bioinformatics from 1997-2003, and the founding co-director of the Greater Philadelphia Bioinformatics Alliance. She served as Deputy Dean of the School of Engineering and Applied Science from 2005-2007 and Chair of CIS from 2008-2013. Her awards include: ACM Fellow, Corresponding Fellowship of the Royal Society of Edinburgh (2015), Lenore Rowe Williams Award (2002), Fulbright Scholar and recipient of a Hitachi Chair (2004), Trustees’ Council of Penn Women/Provost Award (April 2015) for her work on advancing women in engineering, and IEEE TCDE Impact Award (2017).

 

 

 

RESEARCH SESSIONS

 

Session 1: New formal frameworks

Monday 9:40-10:30
Location: ♛Continental A
Session Chair: Paris Koutris (University of Wisconsin-Madison)

 

 

Session 2: Algorithms, data structures, benchmarking

Monday 11:00-12:40
Location: ♛Continental A
Session Chair: Floris Geerts (University of Antwerp)

  • Benchmarking the chase (pods043)
    Michael Benedikt (University of Oxford), George Konstantinidis (University of Oxford), Giansalvatore Mecca (University of Basilicata), Boris Motik (University of Oxford), Paolo Papotti (Arizona State University), Donatello Santoro (University of Basilicata) and Efthymia Tsamoura (University of Oxford)
  • Efficient and Provable Multi-Query Optimization (pods040)
    Tarun Kathuria (Microsoft Research) and S Sudarshan (Indian Institute of Technology)
  • Write-Optimized Skip Lists (pods118)
    Michael Bender (Stony Brook University), Martín Farach-Colton (Rutgers University), Rob Johnson (Stony Brook University), Simon Mauras (ENS Lyon), Tyler Mayer (Stony Brook University), Cynthia Phillips (Sandia National Laboratories) and Helen Xu (MIT)
  • Output-optimal Parallel Algorithms for Similarity Joins (pods085)
    Xiao Hu (HKUST), Yufei Tao (University of Queensland) and Ke Yi (HKUST)

 

 

Gems of PODS and Test-of-Time Award

Monday 14:00-15:30
Location: ♛Continental A
Session Chair: Floris Geerts (University of Antwerp)

  • The Semiring Framework for Database Provenance (ToT Award)
    (pods319vt)
    Val Tannen (University of Pennsylvania)
    Abstract: Imagine a computational process that uses a complex input consisting of multiple “items” (e.g., files, tables, tuples, parameters, configuration rules) The provenance analysis of such a process allows us to understand how the different input items affect the output of the computation. It can be used, for example, to derive confidence in the output (given confidences in the input items), to derive the minimum access clearance for the output (given input items with different classifications), to minimize the cost of obtaining the output (given a complex input item pricing scheme). It also applies to probabilistic reasoning about an output (given input item distributions), as well as to output maintenance, and to debugging.
    Provenance analysis for queries, views, database ETL tools, and schema mappings is strongly influenced by their declarative nature, providing mathematically nice descriptions of the output-inputs correlation. In a series of papers starting with PODS 2007 we have developed an algebraic framework for describing such provenance based on commutative semirings and semimodules over such semirings. So far, the framework has exploited usefully the observation that, for database provenance, data use has two flavors: joint and alternative.
    Here, we have selected several insights that we consider essential for the appreciation of this framework’s nature and effectiveness and we also give some idea of its applicability.

    Bio: Val Tannen is a professor in the Department of Computer and Information Science of the University of Pennsylvania. He joined Penn after receiving his PhD from the Massachusetts Institute of Technology in 1987. After working for a time in Programming Languages, his current research interests are in Databases. Moreover, he has always been interested in applications of Logic to Computer Science and since 1994 he has also worked in Bioinformatics, leading a number of interdisciplinary projects. In Databases, he and his students and collaborators have worked on query language design and on models and systems for query optimization, parallel query processing, and data integration. More recently their work has focused on models and systems for data sharing, data provenance, the management of uncertain information and algorithmic provisioning for what-if analysis. Tannen has received the 20 year Test-of-Time Award from ICDT and the 10 year Test-of-Time Award from PODS. He is an ACM Fellow.

 

  • Data Integration: After the Teenage Years (pods316ah)
    Alon Halevy (Recruit Institute of Technology)
    Abstract: Twenty five years ago, data integration was a concern mainly for large enterprises with many autonomous data sources. Since then, while data integration became even more important to enterprises, the technology has ventured out into new areas, such as the Web and recently into voice-activated consumer devices that answer a plethora of questions in your home. At the core of these systems is a mechanism for describing the semantics and capabilities of individual data sources. Database theory has made important contributions to the area of modeling data sources and query answering in data integration systems. I this talk I will cover some of the main developments in the field of data integration since the mid-90’s, and discuss the challenges that lie ahead.

    Bio: Alon Halevy is the C.E.O of the Recruit Institute of Technology. From 2005 to 2015 he headed the Structured Data Management Research group at Google. Prior to that, he was a professor of Computer Science at the University of Washington in Seattle, where he founded the Database Group. In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space, and in 2004, Dr. Halevy founded Transformic, a company that created search engines for the deep web, and was acquired by Google. Dr. Halevy is a Fellow of the Association for Computing Machinery, the author of the book “The Infinite Emotions of Coffee”, and co-author of the book “Principles of Data Integration”.

 

 

PODS Session 3: Concurrency, JSON, learning and privacy

Monday 16:00-18:05
Location: ♛Continental A
Session Chair: Stijn Vansummeren (Université Libre de Bruxelles)

 

 

PODS Session 4: Best paper award, ontologies and probabilistic databases

Tuesday 14:00-15:40
Location: ♛Continental A
Session Chair: Andreas Pieris (University of Edinburgh)

 

 

PODS Session 5: Enumeration problems

Tuesday 16:00-18:05
Location: ♛Continental A
Session Chair: Ke Yi (Hong Kong University of Science and Technology)

 

 

PODS Session 6: Best student paper, streaming and sketches

Wednesday 14:00-15:40
Location: ♛Continental A
Session Chair: Yufei Tao (University of Queensland)

 

 

PODS Session 7: Dependencies, graphs and query evaluation

Wednesday 16:00-18:05
Location: ♛Continental A
Session Chair: Angela Bonifati (Université de Lyon)

 

 

POSTER SESSIONS

 

SIGMOD/PODS Poster & Demo Session 1

Tuesday 16:00-18:00
Location: ⚓Stevens Salon D

  • All papers from PODS Research Sessions  3, 4 and 7
  • All SIGMOD papers from Tuesday: SIGMOD Research Sessions 1-10
  • SIGMOD demos

 

SIGMOD/PODS Poster & Demo Session 2

Wednesday 16:00-18:00
Location: ⚓Stevens Salon D

  • All papers from PODS Research Sessions 1, 2, 5 and 6
  • All SIGMOD papers from Wednesday: SIGMOD Research Sessions 11-20
  • SIGMOD demos

 

 

 

TUTORIAL SESSIONS

 

Invited Tutorial 1

Tuesday 11:00-12:20
Location: ♛Continental A
Session Chair: Benny Kimelfeld (Technion)

  • Random Sampling on Big Data: Techniques and Applications
    Ke Yi (Hong Kong University of Science and Technology)
    Abstract: Random sampling is a powerful tool for big data analytics. It can be used whenever complete accuracy is not required, while offering order-of-magnitude improvements in query efficiency. Random sampling has been extensively studied in both the statistics and computer science literature. This tutorial will take a “sample” of this huge literature, by focusing on those techniques that are most relevant to the database community. This includes sampling over streaming and distributed data, importance sampling, merge-reduce sampling, and sampling for approximate query processing.
    Bio: Ke Yi is an Associate Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He obtained his Bachelor’s degree from Tsinghua University (2001) and PhD from Duke University (2006), both in computer science. His research spans theoretical computer science and database systems. He has received a Google Faculty Research Award (2010), the Young Investigator Research Award from HKUST (2012), a SIGMOD Best Demonstration Award (2015), and the SIGMOD Best Paper Award (2016). He currently serves as an Associate Editor of ACM Transactions on Database Systems and IEEE Transactions on Knowledge and Data Engineering.

    Tutorial 1 slides (3MB).pptx

 

 

Invited Tutorial 2

Wednesday 11:00-12:20
Location: ♛Continental A
Session Chair: Semih Salihoglu (University of Waterloo)

  • Communication Cost in Parallel Query Processing — A Tutorial (pods322ds)
    Dan Suciu (University of Washington)
    Abstract: Populating a relational schema from textual content, a problem commonly known as Information Extraction, is pervasive in We consider the following problem: what is the amount of communication required to compute a query in parallel on p servers, over a large input database? To study this problem we define a variant of Valiant’s BSP, where servers are infinitely powerful and where the cost is given by the maximum communication per server, and the number of rounds. Query evaluation in this model has been studied for full conjunctive queries. A simple lower bound on the communication per server and number of rounds is expressed in terms of the fractional edge covering number of the query’s hypergraph, and it follows from Atserias, Grohe, and Marx’ (AGM) upper bound on the query size; however, no matching algorithm is known for this lower bound.
    Algorithms for computing full conjunctive queries are based on hash partitioning the input data, and the key difficulty is that they can only be applied when each partitioning step is applied to skew-free data. We will start by discussing the case when the computation is limited to a single round of communication. In this case, if the data is skew-fee, then a tight bound is given in terms of the fractional vertex covering number of the query’s hypergraph, and this result can be used to also derive a tight bound on arbitrary input data. Next, we consider the multi-round case. Here, the key algorithmic ingredient is a technique that uses additional rounds in order to handle skewed values in the data. Using this technique it has been shown recently that the tight bound for evaluating a query where all relations have arity at most two is given by the general bound derived from the AGM inequality. The case when the input relations have arbitrary arities remains open; it is not even known whether this bound is given in terms of fractional edge cover, or the fractional vertex cover.

    Bio: Dan Suciu is a Professor in Computer Science at the University of Washington. He received his Ph.D. from the University of Pennsylvania in 1995, was a principal member of the technical staff at AT&T Labs and joined the University of Washington in 2000. Suciu is conducting research in data management, with an emphasis on topics related to Big Data and data sharing, such as probabilistic data, data pricing, parallel data processing, data security. He is a co-author of two books Data on the Web: from Relations to Semistructured Data and XML, 1999, and Probabilistic Databases, 2011. He is a Fellow of the ACM, holds twelve US patents, received the best paper award in SIGMOD 2000 and ICDT 2013, the ACM PODS Alberto Mendelzon Test of Time Award in 2010 and in 2012, the 10 Year Most Influential Paper Award in ICDE 2013, the VLDB Ten Year Best Paper Award in 2014, and is a recipient of the NSF Career Award and of an Alfred P. Sloan Fellowship. Suciu serves on the VLDB Board of Trustees, and is an associate editor for JACM, VLDBJ, ACM TWEB, and Information Systems and is a past associate editor for ACM TODS and ACM TOIS. Suciu’s PhD students Gerome Miklau, Christopher Re and Paris Koutris received the ACM SIGMOD Best Dissertation Award in 2006, 2010, and 2016 respectively, and Nilesh Dalvi was a runner up in 2008.

    Tutorial 2 slides (2MB).pptx

 

PODS Welcome Reception

Sunday 18:00-20:00
Location: International South

 

PODS Business Meeting

Monday 19:00-20:00
Location: ♛Continental A