Index of Bibliography Topics

Database System Control and Tuning

[dira05] Karl Dias, Mark Ramacher, Uri Shaft, Venkateshwaran Venkataramani, and Graham Wood. Automatic performance diagnosis and tuning in Oracle. In Second Biennial Conference on Innovative Data Systems Research (CIDR'05), pages 84-94, 2005. [ bib | .pdf | .pdf ]
[nath05] Dushyanth Narayanan, Eno Thereska, and Anastassia Ailamaki. Continuous resource monitoring for self-predicting DBMS. In International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'05), pages 239-248, 2005. [ bib | .pdf ]
[cosu04] Glenn Colaco and Darrell Suggs. Database performance with NAS: Optimizing Oracle on NFS. Technical Report TR-3322, Network Appliance Corp., May 2004. [ bib | .pdf | .pdf ]
[brke03] R. Braumandl, A. Kemper, and D. Kossmann. Quality of service in an information economy. ACM Transactions on Internet Technology, 3(4):291-333, 2003. [ bib | .pdf ]
Discusses distributed query processing with QoS guarantees for each query (not query class). Query evaluation plans are adapted on-the-fly.
[dies03] Yixin Diao, Frank Eskesen, Steven Froehlich, Joseph L. Hellerstein, Lisa F. Spainhower, and Maheswaran Surendra. Generic online optimization of multiple configuration parameters with application to a database server. In Proc. 14th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM), number 2867 in Lecture Notes in Computer Science, pages 3-15. Springer-Verlag, 2003. [ bib | .pdf ]
A generic approach to feedback control of systems based on the Nelder-Mead simplex method. Application is to buffer pool configuration.
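For readers unfamiliar with the simplex method, a minimal sketch of the textbook Nelder-Mead minimizer (standard reflection, expansion, inside contraction, and shrink steps). The objective `f` stands in for a measured performance metric as a function of configuration parameters such as buffer pool sizes; the quadratic in the usage line is a hypothetical toy, and the sketch ignores the measurement noise and online aspects the paper deals with.

```python
from typing import Callable, List

Point = List[float]


def nelder_mead(f: Callable[[Point], float], x0: Point, step: float = 1.0,
                iters: int = 200, alpha: float = 1.0, gamma: float = 2.0,
                rho: float = 0.5, sigma: float = 0.5) -> Point:
    """Minimize f starting from x0 with the Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset by `step` along each axis.
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(v) for v in simplex]
    for _ in range(iters):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        cen = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        xr = [cen[j] + alpha * (cen[j] - worst[j]) for j in range(n)]  # reflection
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            xe = [cen[j] + gamma * (xr[j] - cen[j]) for j in range(n)]  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        else:
            xc = [cen[j] + rho * (worst[j] - cen[j]) for j in range(n)]  # contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [[best[j] + sigma * (v[j] - best[j]) for j in range(n)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    return simplex[min(range(n + 1), key=lambda i: values[i])]
```

Usage: `nelder_mead(lambda x: (x[0] - 3) ** 2 + (x[1] + 1) ** 2 + 5, [0.0, 0.0])` approaches `[3, -1]`. In an online setting each call to `f` would be a measurement interval, which is why the number of evaluations matters more than it does here.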
[paro03] Sujay Parekh, Kevin Rose, Joseph L. Hellerstein, Sam Lightstone, Matthew Huras, and Victor Chang. Managing the performance impact of administrative utilities. In Self-Managing Distributed Systems - 14th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2003), number 2867 in Lecture Notes in Computer Science. Springer-Verlag, 2003. [ bib | .pdf ]
[wabu02] Wenguang Wang and Rick Bunt. A self-tuning page cleaner for DB2. In International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), October 2002. [ bib | .pdf | .pdf ]
[mapo02b] P. Martin, W. Powley, H. Li, and K. Romanufa. Managing database server performance to meet QoS requirements in electronic commerce systems. International Journal on Digital Libraries, 3(4):316-324, 2002. [ bib | .pdf | .pdf ]
Describes the Quartermaster system, with application to database buffer configuration.
[zama02] Hamzeh Zawawy, Pat Martin, and Hossam Hassanein. Capacity planning for database management systems using analytical modeling. In Proc. CASCON, 2002. [ bib ]
[chwe00] Surajit Chaudhuri and Gerhard Weikum. Rethinking database system architecture: Towards a self-tuning RISC-Style database system. In Proc. International Conference on Very Large Data Bases, pages 1-10, 2000. [ bib | .pdf ]
[chch99] Surajit Chaudhuri, Eric Christensen, Goetz Graefe, Vivek R. Narasayya, and Michael J. Zwilling. Self-tuning technology in Microsoft SQL Server. In Bulletin of the IEEE Technical Committee on Data Engineering [loch99], pages 20-26. [ bib ]
[scva99] K. Bernhard Schiefer and Gary Valentin. DB2 Universal Database performance tuning. In Bulletin of the IEEE Technical Committee on Data Engineering [loch99], pages 12-19. [ bib ]
[loch99] David Lomet and Surajit Chaudhuri, editors. Bulletin of the IEEE Technical Committee on Data Engineering, volume 22(2), June 1999. [ bib | .pdf ]
Special Issue on Self-Tuning Databases and Application Tuning
[vibr98] Radek Vingralek, Yuri Breitbart, and Gerhard Weikum. Snowball: Scalable storage on networks of workstations with balanced load. Distributed and Parallel Databases, 6(2):117-156, April 1998. [ bib | .ps.gz ]
[scwe98] Peter Scheuermann, Gerhard Weikum, and Peter Zabback. Data partitioning and load balancing in parallel disk systems. The VLDB Journal, 7(1):48-66, 1998. [ bib | .pdf ]
[brca96] Kurt P. Brown, Michael J. Carey, and Miron Livny. Goal-oriented buffer management revisited. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 353-364, June 1996. [ bib | .pdf ]
Describes the class fencing buffer management technique.
[gaio96] Minos N. Garofalakis and Yannis E. Ioannidis. Multi-dimensional resource scheduling for parallel queries. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'96), pages 365-376, June 1996. [ bib | DOI | .pdf ]
[dagr95] Diane L. Davison and Goetz Graefe. Dynamic resource brokering for multi-user query execution. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'95), pages 281-292, 1995. [ bib | DOI | .pdf ]
[logh94] David Lomet and Shahram Ghandeharizadeh, editors. Bulletin of the IEEE Technical Committee on Data Engineering, volume 17(3), September 1994. [ bib | .pdf | .pdf ]
Special Issue on Data Placement for Parallelism
[scwe94] Peter Scheuermann, Gerhard Weikum, and Peter Zabback. “Disk cooling” in parallel disk systems. In IEEE Data Engineering Bulletin [logh94], pages 29-40. [ bib ]
[brme94] Kurt P. Brown, Manish Mehta, Michael J. Carey, and Miron Livny. Towards automated performance tuning for complex workloads. In Proc. International Conference on Very Large Data Bases, pages 72-84, 1994. [ bib | .pdf ]
[scwe93] Peter Scheuermann, Gerhard Weikum, and Peter Zabback. Adaptive load balancing in disk arrays. In Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO'93), pages 345-360, 1993. [ bib | .ps ]
[hewa91] Hans-Ulrich Heiss and Roger Wagner. Adaptive load control in transaction processing systems. In 17th International Conference on Very Large Data Bases, pages 47-54, September 1991. [ bib | .PDF | .pdf ]
Feedback control of concurrency. Assumes that the relationship between the level of concurrency and throughput is unimodal. Proposes two control mechanisms. The incremental-steps mechanism is a form of proportional control. The parabolic-approximation mechanism fits recent concurrency/throughput measurements to a parabola and uses the peak of the parabola as the target concurrency level. Considers the use of admission control and transaction aborts to achieve the target level of concurrency.
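A minimal sketch of the parabolic-approximation step, assuming an unweighted least-squares fit over whatever recent (concurrency, throughput) samples the controller keeps. The function names, the clamping to `[lo, hi]`, and the fallback used when the fitted parabola is not concave are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def parabolic_target(samples: Sequence[Tuple[float, float]], lo: float, hi: float) -> float:
    """samples: (concurrency level, measured throughput) pairs at >= 3 distinct levels."""
    s = [sum(n ** p for n, _ in samples) for p in range(5)]      # sums of n^p
    t = [sum(y * n ** p for n, y in samples) for p in range(3)]  # sums of y * n^p
    # Least-squares normal equations for throughput ~ c + b*n + a*n^2.
    _, b, a = _solve3([[s[0], s[1], s[2]],
                       [s[1], s[2], s[3]],
                       [s[2], s[3], s[4]]], t)
    if a >= 0:
        # No interior maximum: fall back to the best observed level (assumption).
        peak = max(samples, key=lambda p: p[1])[0]
    else:
        peak = -b / (2 * a)  # vertex of the fitted parabola
    return min(max(peak, lo), hi)
```

With samples taken from throughput = -2(n - 10)^2 + 100 at n = 4, 8, 12, 16, the fit recovers the peak at n = 10.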
[mowe91] Axel Mönkeberg and Gerhard Weikum. Conflict-driven load control for the avoidance of data-contention thrashing. In Proceedings of the Seventh International Conference on Data Engineering, pages 632-639, April 1991. [ bib | .pdf ]
[ngfa91] Raymond Ng, Christos Faloutsos, and Timos Sellis. Flexible buffer allocation based on marginal gains. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'91), pages 387-396, 1991. [ bib | DOI | .pdf ]
[cakr90] Michael J. Carey, Sanjay Krishnamurthi, and Miron Livny. Load control for locking: The 'half-and-half' approach. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 72-84, April 1990. [ bib | .pdf ]
[lafo88] S. Lafortune. Modeling and analysis of transaction execution in database systems. IEEE Transactions on Automatic Control, 33(5):439-447, May 1988. [ bib | .pdf ]
Formulates the concurrency control problem as a control problem for discrete-event dynamical systems.

Web and N-tier System Control and Tuning

[zhch08] Q. Zhang, L. Cherkasova, N. Mi, and E. Smirni. A regression-based analytic model for capacity planning of multi-tier applications. Cluster Computing, 11(3):197-211, September 2008. [ bib | .pdf ]
[stwh07] Malgorzata Steinder, Ian Whalley, David Carrera, Ilona Gaweda, and David M. Chess. Server virtualization in autonomic management of heterogeneous workloads. In Proc. IFIP/IEEE Int'l Symp. on Integrated Network Management (IM'07), pages 139-148, May 2007. [ bib | .pdf ]
[zhbi07] Wei Zheng, Ricardo Bianchini, and Thu Nguyen. Automatic configuration of internet services. In Proc. EuroSys 2007, pages 219-230, March 2007. [ bib | .pdf ]
[tast07] Chunqiang Tang, Malgorzata Steinder, Michael Spreitzer, and Giovanni Pacifici. A scalable application placement controller for enterprise data centers. In Proc. Int'l Conference on World Wide Web (WWW'07), pages 331-340, 2007. [ bib | http | .pdf ]
[toka06] Alexander Totok and Vijay Karamcheti. Improving performance of internet services through reward-driven request prioritization. In Proc. IEEE International Workshop on Quality of Service (IWQoS'06), June 2006. [ bib | .pdf | .pdf ]
[kaki06] A. Karve, T. Kimbrel, G. Pacifici, M. Spreitzer, M. Steinder, M. Sviridenko, and A. Tantawi. Dynamic placement for clustered web applications. In Proc. International Conference on World Wide Web (WWW'06), pages 595-604, May 2006. [ bib | .pdf | .pdf ]
[kaka05b] Christos Karamanolis, Magnus Karlsson, and Xiaoyun Zhu. Designing controllable computer systems. In USENIX Workshop on Hot Topics in Operating Systems, pages 49-54, June 2005. [ bib | .pdf | .pdf ]
[zhwa05] Xiaoyun Zhu, Zhikui Wang, and S. Singhal. Utility-driven workload management using nested control design. Technical Report HPL-2005-193(R.1), HP Laboratories Palo Alto, March 2005. Also appeared in Proc. American Control Conference, June 2006. [ bib | .pdf | .pdf ]
[pase05] G. Pacifici, W. Segmuller, M. Spreitzer, M. Steinder, A. Tantawi, and A. Youssef. Managing the response time for multi-tiered web applications. Technical Report RC 23651, IBM, 2005. [ bib | .pdf ]
[ursh05] Bhuvan Urgaonkar, Prashant J. Shenoy, Abhishek Chandra, and Pawan Goyal. Dynamic provisioning of multi-tier internet applications. In International Conference on Autonomic Computing (ICAC'05), pages 217-228, 2005. [ bib | .pdf ]
[kami04] Abhinav Kamra, Vishal Misra, and Erich Nahum. Yaksha: A self-tuning controller for managing the performance of 3-tiered web sites. In International Workshop on Quality of Service (IWQoS), June 2004. [ bib | .pdf | .pdf ]
[kewa04] Jeffrey O. Kephart and William E. Walsh. An artificial intelligence perspective on autonomic computing policies. In IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY'04), pages 3-12, June 2004. [ bib | .pdf ]
[elna04] Sameh Elnikety, Erich Nahum, John Tracey, and Willy Zwaenepoel. A method for transparent admission control and request scheduling in dynamic e-commerce web sites. In International World Wide Web Conference, pages 276-286, May 2004. [ bib | .pdf | .pdf ]
Gatekeeper system tracks a sliding average service time for each type of servlet in the application server. The system also obtains an off-line estimate of system capacity, which is expressed as an offered service-time load. To implement admission control, Gatekeeper tracks the current load, which is the sum of the estimated service times of currently running servlets. It blocks execution of a new servlet if executing it would bring the current load above the system capacity. To obtain the off-line estimate of system capacity, system throughput is measured at various levels of offered load. Throughput as a function of offered load is assumed to have a single peak; the maximum offered load at which peak throughput occurs is taken to be the system capacity.
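The admission rule described above can be sketched as follows. This shows only the bookkeeping, not the proxy that intercepts requests; the class and method names, the window length, and the default estimate for servlet types never seen before are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Sequence, Tuple


class Gatekeeper:
    """Admission control by estimated service-time load (bookkeeping only)."""

    def __init__(self, capacity: float, window: int = 100,
                 default_estimate: float = 0.0) -> None:
        self.capacity = capacity                  # off-line estimate, service-time units
        self.default_estimate = default_estimate  # used for servlet types never seen
        self.history: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
        self.running: Dict[int, float] = {}       # request id -> estimate charged at admission
        self.load = 0.0
        self._next_id = 0

    def estimate(self, servlet_type: str) -> float:
        h = self.history[servlet_type]            # sliding window of observed service times
        return sum(h) / len(h) if h else self.default_estimate

    def try_admit(self, servlet_type: str) -> Optional[int]:
        est = self.estimate(servlet_type)
        if self.load + est > self.capacity:
            return None                           # caller must block or queue the request
        rid, self._next_id = self._next_id, self._next_id + 1
        self.running[rid] = est
        self.load += est
        return rid

    def complete(self, rid: int, servlet_type: str, service_time: float) -> None:
        self.load -= self.running.pop(rid)
        self.history[servlet_type].append(service_time)


def estimate_capacity(measurements: Sequence[Tuple[float, float]]) -> float:
    """measurements: (offered load, throughput); largest load reaching peak throughput."""
    return max(measurements, key=lambda m: (m[1], m[0]))[0]
```

Under this strict rule a request whose estimate alone exceeds the capacity is never admitted; a real deployment would need a policy for that case.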
[coch04] Ira Cohen, Jeffrey S. Chase, Moisés Goldszmidt, Terence Kelly, and Julie Symons. Correlating instrumentation data to system states: A building block for automated diagnosis and control. In Symposium on Operating System Design and Implementation (OSDI'04), pages 231-244, 2004. [ bib | .pdf | .pdf ]
[xili04] Bowei Xi, Zhen Liu, Mukund Raghavachari, Cathy H. Xia, and Li Zhang. A smart hill-climbing algorithm for application server configuration. In World Wide Web Conference (WWW'04), 2004. [ bib | .pdf ]
[chsh03] Abhishek Chandra and Prashant Shenoy. Effectiveness of dynamic resource allocation for handling internet flash crowds. Technical Report TR03-37, Department of Computer Science, University of Massachusetts at Amherst, November 2003. [ bib | .pdf | .pdf ]
[lish03] Xue Liu, Lui Sha, Yixin Diao, Steve Froehlich, Joseph L. Hellerstein, and Sujay Parekh. Online response time optimization of Apache web server. In International Workshop on Quality of Service (IWQoS 2003), 2003. [ bib | .ps ]
[absh02] Tarek F. Abdelzaher, Kang G. Shin, and Nina Bhatti. Performance guarantees for web server end-systems: A control-theoretical approach. IEEE Transactions on Parallel and Distributed Systems, 13(1):80-96, 2002. [ bib | .pdf | .pdf ]
[meba01] Daniel A. Menascé, Daniel Barbará, and Ronald Dodge. Preserving QoS of e-commerce sites through self-tuning: a performance model approach. In Proceedings of the 3rd ACM conference on Electronic Commerce, pages 224-234, 2001. [ bib | .pdf ]
Describes a system that implements N-tier admission control and N-tier multiprogramming-level control. Uses hill climbing to search the configuration space and a queueing model to predict the performance of a target configuration. Single-tier queueing models are composed to build an N-tier model. A monitoring system measures QoS-constraint compliance and resource utilizations and feeds them to the queueing model.

Physical Database Design

[brch07] Nicolas Bruno and Surajit Chaudhuri. An online approach to physical design tuning. In Proc. International Conference on Data Engineering (ICDE'07), pages 826-835, April 2007. [ bib | .pdf ]
[agch06] Sanjay Agrawal, Eric Chu, and Vivek Narasayya. Automatic physical design tuning: workload as a sequence. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'06), pages 683-694, 2006. [ bib | .pdf ]
Considers a version of the database physical design problem in which the input is a sequence of queries and updates. The goal is to recommend a target physical design for each query or update in the sequence, taking into account both the effect of the physical design on the cost of executing the query or update and the cost of changing the physical design.
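One straightforward way to solve this sequence problem, when the set of candidate configurations is small and there is no space constraint, is dynamic programming over a layered graph (equivalently, a shortest path). The sketch below is not claimed to be the paper's algorithm; `exec_cost` and `trans_cost` are hypothetical callbacks standing in for optimizer what-if estimates.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def plan_designs(statements: Sequence[str], configs: Sequence[Hashable],
                 exec_cost: Callable[[str, Hashable], float],
                 trans_cost: Callable[[Hashable, Hashable], float],
                 initial: Hashable) -> Tuple[float, List[Hashable]]:
    """Pick one configuration per statement, minimizing execution + change costs."""
    if not statements:
        return 0.0, []
    # best[c] = (cost, schedule) of the cheapest schedule so far that ends in c.
    best = {c: (trans_cost(initial, c) + exec_cost(statements[0], c), [c])
            for c in configs}
    for stmt in statements[1:]:
        nxt = {}
        for c in configs:
            # Cheapest predecessor, including the cost of switching to c.
            cost, path = min(((pc + trans_cost(p, c), pp) for p, (pc, pp) in best.items()),
                             key=lambda t: t[0])
            nxt[c] = (cost + exec_cost(stmt, c), path + [c])
        best = nxt
    return min(best.values(), key=lambda t: t[0])
```

For example, with two queries followed by two updates, an index that speeds up the queries but slows the updates, and a fixed cost for each change of configuration, the cheapest schedule builds the index for the queries and drops it before the updates. Running time is O(N |C|^2) for N statements and |C| configurations.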
[brch06] Nicolas Bruno and Surajit Chaudhuri. To tune or not to tune? A lightweight physical design alerter. In Proc. International Conference on Very Large Data Bases (VLDB'06), pages 499-510, 2006. [ bib | .pdf | .pdf ]
[brch06b] Nicolas Bruno and Surajit Chaudhuri. Physical design refinement: The merge-reduce approach. In Proc. International Conference on Extending Database Technology (EDBT'06), number 3896 in Lecture Notes in Computer Science, pages 386-404. Springer-Verlag, 2006. [ bib | .pdf ]
[brch05] Nicolas Bruno and Surajit Chaudhuri. Automatic physical database tuning: A relaxation-based approach. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD'05), 2005. [ bib ]
Assumes that the optimizer requests indexes that it thinks might be useful for a particular query. These requested indexes form the initial configuration, which is then modified to meet a space constraint.
[coba05] Mariano P. Consens, Denilson Barbosa, Adrian M. Teisanu, and Laurent Mignet. Goals and benchmarks for autonomic configuration recommenders. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'05), 2005. [ bib | .pdf ]
Uses random queries that are generated from templates. Templates constrain generated queries to ensure that they are reasonable and that they can benefit from indexing.
[abha04] Ashraf Aboulnaga, Peter J. Haas, Mokhtar Kandil, Sam Lightstone, Guy M. Lohman, Volker Markl, Ivan Popivanov, and Vijayshankar Raman. Automated statistics collection in DB2 UDB. In International Conference on Very Large Data Bases (VLDB '04), pages 1146-1157, August 2004. [ bib | .pdf | .pdf ]
[agch04] Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollár, Arunprasad P. Marathe, Vivek R. Narasayya, and Manoj Syamala. Database tuning advisor for Microsoft SQL server. In International Conference on Very Large Data Bases (VLDB '04), pages 1110-1121, 2004. [ bib | .pdf ]
[zizu04] Daniel C. Zilio, Calisto Zuzarte, Sam Lightstone, Wenbin Ma, Guy M. Lohman, Roberta Cochrane, Hamid Pirahesh, Latha S. Colby, Jarek Gryz, Eric Alton, Dongming Liang, and Gary Valentin. Recommending materialized views and indexes with IBM DB2 design advisor. In IEEE International Conference on Autonomic Computing (ICAC'04), pages 180-188, 2004. [ bib | .pdf ]
General approach is to generate candidate MVs and indexes based on the workload, and then filter to meet a space constraint. Multiquery optimization is used when generating candidate MVs.
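A common way to filter candidates to a space budget is a greedy knapsack heuristic that ranks candidates by estimated benefit per unit of space. The sketch below assumes each candidate's benefit is independent of the others, which real advisors cannot assume; names and numbers are hypothetical, and the DB2 Design Advisor's actual filtering step may differ.

```python
from typing import Dict, List, Tuple


def select_under_budget(candidates: Dict[str, Tuple[float, float]],
                        budget: float) -> List[str]:
    """candidates: name -> (estimated workload benefit, size); sizes must be > 0."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: kv[1][0] / kv[1][1],  # benefit per unit of space
                    reverse=True)
    chosen: List[str] = []
    used = 0.0
    for name, (benefit, size) in ranked:
        if benefit > 0 and used + size <= budget:
            chosen.append(name)
            used += size
    return chosen
```

With candidates `{"ix_a": (100, 10), "mv_b": (300, 50), "ix_c": (40, 2)}` and a budget of 55, the sketch keeps `ix_c` and `ix_a` and skips the materialized view, even though the view has the largest absolute benefit.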
[agch03] Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das, and Vivek Narasayya. Automating layout of relational databases. In International Conference on Data Engineering (ICDE'03), pages 607-618, 2003. [ bib | .pdf ]
[razh02] Jun Rao, Chun Zhang, Guy M. Lohman, and Nimrod Megiddo. Automating physical database design in a parallel database. In Proc. ACM SIGMOD International Conference on Management of Data, pages 558-569, 2002. [ bib | .pdf ]
Automatic hash-based relation partitioning in shared nothing systems.
[agch01] Sanjay Agrawal, Surajit Chaudhuri, and Vivek R. Narasayya. Materialized view and index selection tool for Microsoft SQL Server 2000. In Proc. ACM SIGMOD International Conference on Management of Data, page 608, 2001. [ bib ]
This is a one-page description of a SIGMOD demo.
[chna01] Surajit Chaudhuri and Vivek Narasayya. Automating statistics management for query optimizers. IEEE Transactions on Knowledge and Data Engineering, 13(1):7-20, 2001. [ bib | .pdf ]
Hardcopy on file. This is the journal version of [chna00]. Presents a variety of heuristic techniques for choosing minimal sets of statistics such that the quality of plans produced by the optimizer is not reduced.
[agch00] Sanjay Agrawal, Surajit Chaudhuri, and Vivek R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In Proc. International Conference on Very Large Data Bases, pages 496-505, 2000. [ bib | .pdf ]
[chna00] Surajit Chaudhuri and Vivek Narasayya. Automating statistics management for query optimizers. In 16th International Conference on Data Engineering, pages 339-348, 2000. [ bib ]
The journal version of this paper is [chna01].
[leki00] Mong Li Lee, Masaru Kitsuregawa, Beng Chin Ooi, Kian-Lee Tan, and Anirban Mondal. Towards self-tuning data placement in parallel database systems. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 225-236, 2000. [ bib | .pdf ]
Adaptive declustering in shared-nothing systems using a two-level, tree-structured index.
[vazu00] Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. DB2 Advisor: An optimizer smart enough to recommend its own indexes. In 16th International Conference on Data Engineering, pages 101-110, 2000. [ bib | .pdf ]
General recommendation method is to add virtual indexes to the schema, optimize the query, and check whether any of the virtual indexes are used in the optimal plan. Statistics for virtual indexes are inferred from existing column statistics. To recommend indexes for a workload, recommend for each query in the workload in sequence and then greedily select a subset of the recommended indexes.
[chna98] Surajit Chaudhuri and Vivek R. Narasayya. Autoadmin 'what-if' index analysis utility. In Proc. ACM SIGMOD International Conference on Management of Data, pages 367-378, 1998. [ bib | .pdf ]
Describes how to implement hypothetical database configurations, so that workload costs can be estimated under those configurations. A configuration includes hypothetical indexes and the statistics that allow the optimizer to decide whether such an index should be used. Proposes that sampling be used to collect the statistics. Allows specification of scale factors so that configurations with larger or smaller databases can be simulated. Presents an analysis interface that supports workload analysis and configuration analysis for current and hypothetical configurations.
[chna97] Surajit Chaudhuri and Vivek R. Narasayya. An efficient cost-driven index selection tool for Microsoft SQL Server. In Proc. International Conference on Very Large Data Bases, pages 146-155, 1997. [ bib | .pdf ]
Assumes that an upper bound is given on the number of indexes. Workload is specified as a set of SQL DML statements, including insert, delete and update. Search space includes both single and multi-attribute indexes. Index configurations are evaluated by the DBMS optimizer, and several techniques are used to reduce the number of configurations for which optimizer evaluation is required. To generate a set of candidate indexes, this method determines an optimal index configuration independently for each query in the workload. The initial candidate set is then taken as the union of the indexes in the single-query optimal configurations. A hybrid exhaustive/greedy approach is used to control search. To find a k-index configuration, first find the optimal m-index configuration (m <= k) using exhaustive search, then add k-m indexes greedily. Multi-column indexes are handled by first finding a good configuration with single-column indexes, then generating and adding a set of candidate two-column indexes, and then rerunning the optimizer on the new candidate set. This is repeated to handle indexes with more than two columns.
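The hybrid exhaustive/greedy search can be sketched as below. `cost` is a hypothetical stand-in for optimizer-estimated workload cost; candidate generation and the multi-column iteration are omitted, and stopping the greedy phase early when no remaining index lowers the cost is an assumption rather than something stated above.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence


def choose_indexes(candidates: Sequence[str], k: int, m: int,
                   cost: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Pick at most k indexes: exhaustive search for m of them, then greedy additions."""
    m = min(m, k, len(candidates))
    # Exhaustive phase: cheapest configuration with exactly m indexes.
    best = min((frozenset(c) for c in combinations(candidates, m)), key=cost)
    # Greedy phase: repeatedly add the single index that lowers cost the most.
    while len(best) < k:
        options = [best | {c} for c in candidates if c not in best]
        if not options:
            break
        nxt = min(options, key=cost)
        if cost(nxt) >= cost(best):
            break  # stop early when no index helps (assumption)
        best = nxt
    return best
```

The exhaustive phase costs C(|candidates|, m) evaluations, which is why m is kept small; each greedy step then costs one evaluation per remaining candidate.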
[logh94] David Lomet and Shahram Ghandeharizadeh, editors. Bulletin of the IEEE Technical Committee on Data Engineering, volume 17(3), September 1994. [ bib | .pdf | .pdf ]
Special Issue on Data Placement for Parallelism
[fisc88] Sheldon J. Finkelstein, Mario Schkolnick, and Paolo Tiberio. Physical database design for relational databases. ACM Transactions on Database Systems, 13(1):91-128, 1988. [ bib ]
[come78] Douglas Comer. The difficulty of optimum index selection. ACM Transactions on Database Systems, 3(4):440-445, 1978. [ bib ]

Workload Characterization

[kalm12] D. Kalmuk. DB2 10.1: Leveraging new capabilities within the WLM best practices. IDUG DB2 Tech Conference presentation, November 2012. [ bib | .pdf ]
[kalm10] David Kalmuk. DB2 workload management overview. IBM slide deck, 2012. [ bib | .pdf ]
[guku09] Ajay Gulati, Chethan Kumar, and Irfan Ahmad. Storage workload characterization and consolidation in virtualized environments. In Proc. Int'l Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT'09), 2009. [ bib | .pdf ]
[choz08] L. Cherkasova, K. Ozonat, N. Mi, J. Symons, and E. Smirni. Anomaly? Application change? Or workload change? In Proc. of the International Conference on Dependable Systems and Networks (DSN'08), 2008. [ bib | .pdf ]
[bopl04] R. Bonilla-Lucas, Peter Plachta, Aamer Sachedina, Daniel Jiménez-González, Calisto Zuzarte, and Josep-Lluis Larriba-Pey. Characterization of the data access behavior for TPC-C traces. In Proc. IEEE International Symposium on Performance Analysis of Systems and Software, pages 115-122, March 2004. [ bib | .pdf ]
[wama04] Ted J. Wasserman, Patrick Martin, David B. Skillicorn, and Haider Rizvi. Developing a characterization of business intelligence workloads for sizing new database systems. In Proceedings of the 7th ACM International Workshop on Data Warehousing and OLAP, pages 7-13. ACM Press, 2004. [ bib | .pdf ]
Query features are response time, CPU utilization during query execution, sequential and random I/O throughput during query execution, and the join degree. Queries are then clustered in feature space, using singular value decomposition and semi-discrete decomposition. TPC-H was used for evaluation.
[chgu02] Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek Narasayya. Compressing SQL workloads. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'02), pages 488-499, 2002. [ bib | .pdf ]
A workload is a weighted set of SQL statements. Problem is to find a subset of these statements that can be used to replace the original workload as an input to certain applications, such that the substitution has little impact on the application's output. Applications considered are automatic index selection and approximate query answering.
[yuch92] Philip S. Yu, Ming-Syan Chen, Hans-Ulrich Heiss, and Sukho Lee. On workload characterization of relational database environments. IEEE Transactions on Software Engineering, 18(4):347-355, 1992. [ bib | .pdf ]
Presents some statistics characterizing the database workload generated by some kind of corporate accounting application. Both structural statistics (e.g., number of statements of different types, number of predicates and attributes used in each statement, transaction sizes) and statement execution statistics (e.g., number of tuples retrieved) are collected. There is no attempt to demonstrate that these statistics are useful for any specific purpose.

Buffer Management and Memory Management

[soch08] Gokul Soundararajan, Jin Chen, Mohamed Sharaf, and Cristiana Amza. Dynamic partitioning of the cache hierarchy in shared data centers. In Proc. Int'l Conference on Very Large Data Bases (VLDB'08), August 2008. [ bib | .pdf ]
[yafa08] Gala Yadgar, Michael Factor, Kai Li, and Assaf Schuster. Mc2: Multiple clients on a multilevel cache. In Proc. Int'l Conference on Distributed Computing Systems (ICDCS'08), June 2008. [ bib | .pdf | .pdf ]
Extends Karma ([yafa07]) to a two-tier scenario in which multiple clients that share data also share a second-tier cache. Space is partitioned among clients, with one additional partition used for shared data. Within each partition, space is managed using Karma.
[gama08] Charles Garrod, Amit Manjhi, Anastasia Ailamaki, Bruce Maggs, Todd Mowry, Christopher Olston, and Anthony Tomasic. Scalable query result caching for web applications. Proc. of the VLDB Endowment, 1(1):550-561, 2008. [ bib | DOI | .pdf ]
[gill08] Binny Gill. On multi-level exclusive caching: Offline optimality and why promotions are better than demotions. In Proc. USENIX Conference on File and Storage Technologies (FAST'08), pages 49-65, 2008. [ bib | .pdf | .pdf ]
Includes lower and upper bounds on optimal off-line performance for multi-level caches. Proposes a scheme called PROMOTE for managing multi-tier caches. As a requested page is passed up through the cache tiers, each cache decides whether it will be responsible for caching the page. Once a cache has decided to cache the page, it notifies the higher-level caches by attaching a flag to the page as it is passed up through the tiers. The higher-level caches then do not cache the page. This enforces exclusivity among the caches. Pages that are repeatedly requested should tend to migrate to higher-level caches. Behaviour on writes is not specified, e.g., whether a write can affect which cache is responsible for a particular page. This policy requires modification of the caching policies at every tier, as each tier must abide by caching decisions made at lower tiers and must inform upper tiers of its decisions.
[yafa07] Gala Yadgar, Michael Factor, and Assaf Schuster. Karma: Know-it-all replacement for a multilevel cache. In Proc. USENIX Conference on File and Storage Technologies (FAST'07), February 2007. [ bib | .pdf | .pdf ]
Assumes caches support read, read-save, and demote. Requires that blocks be grouped into ranges by the application. The application must also specify the frequency of access and the access pattern for each range. Each range is assigned some space in some cache in the hierarchy, and each range's space is then managed using a separate replacement policy. Experiments used PostgreSQL EXPLAIN output to generate range and access-pattern hints; however, EXPLAIN data describe the access pattern before it is filtered by the DBMS cache.
[fasc06] Michael Factor, Assaf Schuster, and Gala Yadgar. Multilevel cache management based on application hints. Technical Report CS-2006-02, Technion Computer Science Department, 2006. [ bib | .pdf | .pdf ]
[chzh05] Zhifeng Chen, Yan Zhang, Yuanyuan Zhou, Heidi Scott, and Berni Schiefer. Empirical evaluation of multi-level buffer cache collaboration for storage systems. In Proceedings of the International Conference on Measurements and Modeling of Computer Systems (SIGMETRICS'05), pages 145-156, 2005. [ bib | .pdf ]
Compares “hierarchically aware” approaches to “aggressively-collaborative” approaches. The former are transparent to the storage client (e.g., the DBMS); the latter are not. Aggressively-collaborative approaches include two types of hint passing: access patterns and application semantics. An example of a semantic hint is a hint that a block will be read only once. Even more aggressive is content-aware caching, where the caches explicitly try to avoid duplication. Also considers some additional optimizations: quick eviction of duplicated blocks (DU) removes pages from the buffer when they are read, and Semantics-Directed Caching (SE) uses “importance” values from the storage client (in an ill-specified way) to affect buffering at the storage server. General conclusion is that the aggressively-collaborative approaches do not help much compared to the hierarchically aware approaches.
[fitz04] Brad Fitzpatrick. Distributed caching with memcached. Linux Journal, 2004(124):5, August 2004. [ bib | http | .html ]
[zhch04] Yuanyuan Zhou, Zhifeng Chen, and Kai Li. Second-level buffer cache management. IEEE Transactions on Parallel and Distributed Systems, 15(7), July 2004. [ bib | .ps ]
Presents a trace-based characterization of access patterns for second-tier (L2) buffer caches, noting that L2 cache reference streams do not exhibit small reuse distances. Presents the MQ algorithm for managing an L2 cache. MQ uses multiple LRU queues. Pages are promoted to higher queues according to their frequency of reference. Replacements happen in low queues first. An aging mechanism is used to demote pages that cool down. Also describes so-called global replacement algorithms, in which the L2 cache is informed when replacements are made at L1. Evaluation is by trace-driven simulation and also by experiment with a storage system cache implementation.
[bamo04] S. Bansal and D. Modha. CAR: Clock with adaptive replacement. In Proc. of the 3rd USENIX Symposium on File and Storage Technologies (FAST'04), March 2004. [ bib | .pdf | .pdf ]
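A simplified sketch of MQ under stated assumptions: a logical clock that advances on every access, a fixed `life_time` instead of a tuned one, queue index `min(floor(log2(freq)), m - 1)`, and a bounded `history` (acting as a ghost queue) that remembers the frequencies of evicted blocks. Parameter values and names are illustrative, and the paper's version may differ in details.

```python
import math
from collections import OrderedDict
from typing import Hashable, List, Optional


class MQCache:
    """Simplified Multi-Queue (MQ) replacement; tracks keys only."""

    def __init__(self, capacity: int, num_queues: int = 4, life_time: int = 16,
                 history_size: Optional[int] = None) -> None:
        self.capacity = capacity                   # must be >= 1
        self.m = num_queues
        self.life_time = life_time
        # queues[i]: key -> (frequency, expire time); first item is the LRU end.
        self.queues: List[OrderedDict] = [OrderedDict() for _ in range(num_queues)]
        self.where = {}                            # resident key -> queue index
        self.history: OrderedDict = OrderedDict()  # evicted key -> remembered frequency
        self.history_size = history_size or 4 * capacity
        self.now = 0                               # logical clock, one tick per access

    def _queue_for(self, freq: int) -> int:
        return min(int(math.log2(freq)), self.m - 1)

    def access(self, key: Hashable) -> bool:
        self.now += 1
        hit = key in self.where
        if hit:
            freq, _ = self.queues[self.where[key]].pop(key)
            freq += 1
        else:
            freq = self.history.pop(key, 0) + 1    # restore frequency if remembered
            if len(self.where) >= self.capacity:
                self._evict()
        q = self._queue_for(freq)
        self.queues[q][key] = (freq, self.now + self.life_time)  # MRU end of queue q
        self.where[key] = q
        self._age()
        return hit

    def _evict(self) -> None:
        # Victim: LRU block of the lowest non-empty queue.
        for queue in self.queues:
            if queue:
                victim, (freq, _) = queue.popitem(last=False)
                del self.where[victim]
                self.history[victim] = freq
                if len(self.history) > self.history_size:
                    self.history.popitem(last=False)
                return

    def _age(self) -> None:
        # Demote the LRU block of each queue by one level once its lifetime expires.
        for i in range(1, self.m):
            queue = self.queues[i]
            if queue:
                key, (freq, expire) = next(iter(queue.items()))
                if expire < self.now:
                    del queue[key]
                    self.queues[i - 1][key] = (freq, self.now + self.life_time)
                    self.where[key] = i - 1
```

A block referenced twice sits in queue 1 and survives the eviction of a block referenced only once, which is the frequency bias the annotation describes; the aging step keeps formerly hot blocks from occupying high queues forever.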
[bopl04] R. Bonilla-Lucas, Peter Plachta, Aamer Sachedina, Daniel Jiménez-González, Calisto Zuzarte, and Josep-Lluis Larriba-Pey. Characterization of the data access behavior for TPC-C traces. In Proc. IEEE International Symposium on Performance Analysis of Systems and Software, pages 115-122, March 2004. [ bib | .pdf ]
[jizh04] Song Jiang and Xiaodong Zhang. ULC: A file block placement and replacement protocol to effectively exploit hierarchical locality in multi-level buffer caches. In Proc. 24th International Conference on Distributed Computing Systems (ICDCS'04), pages 168-177, 2004. [ bib | .pdf ]
[chzh03] Zhifeng Chen, Yuanyuan Zhou, and Kai Li. Eviction-based cache placement for storage caches. In Proceedings of the USENIX 2003 Annual Technical Conference, pages 269-282, June 2003. [ bib | .pdf | .pdf ]
Eviction-based placement means that a page is loaded into the storage cache when it is evicted from the storage client's cache, as opposed to when it is requested by the client. Proposes tracking evictions transparently by monitoring the target addresses of read requests. Evicted pages are prefetched from disk into the storage system cache at the time of predicted eviction from the storage client cache.
[albo03] Mehmet Altinel, Christof Bornhovd, Sailesh Krishnamurthy, C. Mohan, Hamid Pirahesh, and Berthold Reinwald. Cache tables: Paving the way for an adaptive database cache. In Proc. International Conference on Very Large Data Bases, pages 718-729, 2003. [ bib | .pdf | .pdf ]
[medh03] Nimrod Megiddo and Dharmendra S. Modha. ARC: A self-tuning, low overhead replacement cache. In Proc. USENIX Conference on File and Storage Technology (FAST'03), 2003. [ bib | .pdf | .pdf ]
ARC maintains two LRU queues, one for pages that have been referenced once and one for pages that have been referenced more than once. For each queue, there is also a ghost queue that tracks additional pages. The sizes of the two queues are adjusted dynamically. A hit in the single-reference ghost queue causes the single-reference queue to grow. A hit in the multi-reference ghost queue causes the multi-referenced queue to grow.
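A compact sketch of the ARC replacement logic as presented in the FAST'03 paper, tracking keys only (no cached values). `p` is the adaptive target size for T1 (resident pages referenced once), T2 holds resident pages referenced at least twice, and B1/B2 are the corresponding ghost lists; the class and method names are illustrative.

```python
from collections import OrderedDict
from typing import Hashable


class ARCCache:
    """Adaptive Replacement Cache over keys only; c = cache size in pages."""

    def __init__(self, c: int) -> None:
        self.c = c
        self.p = 0.0                          # adaptive target size for T1
        self.t1: OrderedDict = OrderedDict()  # resident, seen once (LRU first)
        self.t2: OrderedDict = OrderedDict()  # resident, seen at least twice
        self.b1: OrderedDict = OrderedDict()  # ghosts recently evicted from T1
        self.b2: OrderedDict = OrderedDict()  # ghosts recently evicted from T2

    def _replace(self, hit_in_b2: bool) -> None:
        # Evict from T1 if it exceeds its target, otherwise from T2; keep a ghost.
        if self.t1 and ((hit_in_b2 and len(self.t1) == self.p) or len(self.t1) > self.p):
            key, _ = self.t1.popitem(last=False)
            self.b1[key] = None
        else:
            key, _ = self.t2.popitem(last=False)
            self.b2[key] = None

    def access(self, x: Hashable) -> bool:
        if x in self.t1:                      # hit: promote to T2
            del self.t1[x]
            self.t2[x] = None
            return True
        if x in self.t2:                      # hit: refresh recency in T2
            self.t2.move_to_end(x)
            return True
        if x in self.b1:                      # ghost hit: grow T1's target
            delta = 1 if len(self.b1) >= len(self.b2) else len(self.b2) / len(self.b1)
            self.p = min(self.p + delta, self.c)
            self._replace(False)
            del self.b1[x]
            self.t2[x] = None
            return False
        if x in self.b2:                      # ghost hit: shrink T1's target
            delta = 1 if len(self.b2) >= len(self.b1) else len(self.b1) / len(self.b2)
            self.p = max(self.p - delta, 0.0)
            self._replace(True)
            del self.b2[x]
            self.t2[x] = None
            return False
        l1 = len(self.t1) + len(self.b1)      # complete miss
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(False)
            else:
                self.t1.popitem(last=False)   # drop LRU of T1 with no ghost
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(False)
        self.t1[x] = None
        return False
```

The ghost lists hold only keys, so the directory costs at most 2c entries while the cache holds at most c pages; a hit in B1 is evidence that T1 was too small, and a hit in B2 that T2 was.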
[wowi02] Theodore M. Wong and John Wilkes. My cache or yours? Making storage more exclusive. In USENIX Annual Technical Conference (USENIX 2002), pages 161-175, June 2002. [ bib | .pdf | .pdf ]
Notes that storage system caches are often LRU-based, and points out the multi-tier cache inclusion problem: that a second-tier LRU cache behind a first-tier LRU cache will contain many of the same pages as the first-tier cache. Defines a “demote” operation to deal with this problem. Demote sends to the second tier a block that has been evicted from the first tier. Argues that the cost of sending these blocks to the second tier is low because storage networks have lots of bandwidth. Also defines a “demote” buffer management policy for tier two. This puts blocks read by tier one at the LRU end of the buffer (like our +read-read policy) and puts blocks demoted by the first tier at the MRU end.
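A two-tier sketch of DEMOTE combined with the tier-two placement policy described above. `LRUList`, `DemoteHierarchy`, and the return labels are hypothetical names. Tier one is assumed to be plain LRU, and the network cost of demotion is ignored.

```python
from collections import OrderedDict


class LRUList:
    """Block ids in LRU order: LRU at the front, MRU at the end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def insert(self, block, at_mru=True):
        """Insert (or reposition) `block`; return the block evicted to make room, if any."""
        evicted = None
        if block not in self.blocks and len(self.blocks) >= self.capacity:
            evicted, _ = self.blocks.popitem(last=False)
        self.blocks[block] = None
        self.blocks.move_to_end(block, last=at_mru)
        return evicted


class DemoteHierarchy:
    """Two-tier cache: tier one is plain LRU, tier two uses the DEMOTE placement policy."""

    def __init__(self, tier1_capacity, tier2_capacity):
        self.tier1 = LRUList(tier1_capacity)
        self.tier2 = LRUList(tier2_capacity)

    def read(self, block):
        """Read `block` through the hierarchy; return 'tier1', 'tier2', or 'disk'."""
        if block in self.tier1.blocks:
            self.tier1.blocks.move_to_end(block)
            return "tier1"
        source = "tier2" if block in self.tier2.blocks else "disk"
        # Blocks read by tier one go to tier two's LRU end, since tier one now holds a copy.
        self.tier2.insert(block, at_mru=False)
        victim = self.tier1.insert(block)
        if victim is not None:
            # DEMOTE: tier one sends its evicted block down to tier two's MRU end.
            self.tier2.insert(victim, at_mru=True)
        return source
```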
[aram02] Ismail Ari, Ahmed Amer, Robert Gramacy, Ethan L. Miller, Scott Brandt, and Darrell D. E. Long. ACME: Adaptive caching using multiple experts. In Workshop on Distributed Data and Structures 4 (WDAS), pages 143-158. Carleton Scientific, March 2002. [ bib | .pdf | .pdf ]
[mapo02] Patrick Martin, Wendy Powley, and Xiaoyi Xu. Configuring buffer pools in DB2 UDB. In Proc. CASCON, 2002. [ bib ]
[mali00] Patrick Martin, Hoi-Ying Li, Min Zheng, Keri Romanufa, and Wendy Powley. Dynamic reconfiguration algorithm: Dynamically tuning multiple buffer pools. In 11th International Conference on Database and Expert Systems Applications (DEXA), pages 92-101, 2000. [ bib | .pdf ]
[saha00] Prasenjit Sarkar and John H. Hartman. Hint-based cooperative caching. ACM Transactions on Computer Systems, 18(4):387-419, 2000. [ bib | .pdf ]
Describes a cooperative two-level caching system in which first-level caches may fetch blocks from other first-level caches. A hint is potentially inaccurate information about the locations of blocks in the first-level caches.
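A toy model of hint-based lookup. For brevity it assumes a single shared hint table rather than per-client hints, and a policy in which the most recent fetcher becomes the hinted holder. All names are hypothetical. The point is that a wrong hint costs only a fallback fetch from the server.

```python
class HintCooperativeCache:
    """First-level client caches that consult possibly stale location hints (hypothetical)."""

    def __init__(self, clients):
        self.cached = {c: set() for c in clients}  # client -> blocks it actually holds
        self.hints = {}                            # block -> client believed to hold it

    def fetch(self, client, block):
        """Return where `block` came from: 'local', 'peer:<name>', or 'server'."""
        if block in self.cached[client]:
            return "local"
        holder = self.hints.get(block)
        if holder not in (None, client) and block in self.cached[holder]:
            source = f"peer:{holder}"
        else:
            source = "server"  # no hint, or the hint was wrong
        self.cached[client].add(block)
        self.hints[block] = client  # assumed policy: the latest fetcher becomes the hint
        return source

    def evict(self, client, block):
        # Evictions are not broadcast, so hints pointing at this client become stale.
        self.cached[client].discard(block)
```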
[phgo95] Vidyadhar Phalke and Bhaskarpillai Gopinath. An inter-reference gap model for temporal locality in program behavior. In Proceedings of the 1995 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pages 291-300, 1995. [ bib | .pdf ]
[josh94] Theodore Johnson and Dennis Shasha. 2Q: A low overhead high performance buffer management replacement algorithm. In Proc. International Conference on Very Large Data Bases (VLDB'94), pages 439-450, 1994. [ bib | .PDF ]
[onon93] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. The LRU-K page replacement algorithm for database disk buffering. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'93), pages 297-306, 1993. [ bib | .pdf ]
[muho92] D. Muntz and P. Honeyman. Multi-level caching in distributed file systems - or - your cache ain't nuthin' but trash. In Proceedings of the USENIX Winter Conference, pages 305-313, January 1992. [ bib | .pdf ]
Simulation study of bi-level LRU caches for a distributed file system. Found that increasing the client cache size quickly reduces the hit rate of the second-tier cache.
[pazd91] Mark Palmer and Stanley Zdonik. Fido: A cache that learns to fetch. In Proc. Int'l Conference on Very Large Data Bases, 1991. [ bib | .PDF ]
[duha82] A. H. Duke, M. H. Hartung, J. D. Huntley, and F. J. Marschner. Buffered writing in a peripheral storage hierarchy. IBM Technical Disclosure Bulletin, 25(4):2075-2076, September 1982. [ bib ]
Describes a scheme for synchronizing writes in batches, rather than individually.
[bela66] L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78-101, 1966. [ bib | .pdf ]

Query Optimization

[gaku09] Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet Wiener, Armando Fox, Michael Jordan, and David Patterson. Predicting multiple performance metrics for queries: Better decisions enabled by machine learning. In Proc. Int'l Conference on Data Engineering (ICDE'09), 2009. [ bib | .pdf | .pdf ]
[bach05] Brian Babcock and Surajit Chaudhuri. Towards a robust query optimizer: a principled and practical approach. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD'05), pages 119-130, 2005. [ bib | .pdf ]
A user-specified confidence threshold determines how likely it should be that the actual cost of the plan is less than or equal to the reported (point) cost estimate. Cardinality estimates for base tables and intermediate query results are based on join synopses, under the restriction that only foreign-key joins are permitted. The cardinality distribution is determined using the join synopsis and Bayes' rule.
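One way to read the confidence threshold is as a quantile. Given samples of a plan's possible cost (supplied directly here, whereas the paper derives the distribution from join synopses), report the empirical quantile at the chosen confidence and pick the plan with the lowest such cost. The function names and sample data are hypothetical.

```python
import math


def cost_at_confidence(cost_samples, confidence):
    """Smallest sampled cost c such that the fraction of samples <= c is at least `confidence`."""
    ordered = sorted(cost_samples)
    k = max(math.ceil(confidence * len(ordered)), 1)
    return ordered[k - 1]


def choose_plan(plans, confidence):
    """Pick the plan whose cost at the given confidence level is lowest."""
    return min(plans, key=lambda name: cost_at_confidence(plans[name], confidence))


# Hypothetical cost samples: the index plan is usually cheap but occasionally very expensive.
plans = {
    "index_nl": [1, 1, 1, 1, 1, 1, 1, 1, 50, 100],
    "hash_join": [10] * 10,
}
```

With these samples, a confidence of 0.5 favors `index_nl`, while a demanding confidence of 0.95 favors the more predictable `hash_join`.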
[babi05] Shivnath Babu, Pedro Bizarro, and David J. DeWitt. Proactive re-optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'05), 2005. [ bib | .pdf ]
Represents uncertainty in operator estimates using intervals, and tries to identify plans that are robust (close to optimal) throughout the interval, or a set of switchable plans that cover the interval. Implements a run-time switch operation, and modifies other operators so that they initially produce a randomized sample of their output.
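A sketch of interval-based plan choice. It assumes each plan's cost is a monotonic function of one uncertain cardinality, so checking the interval endpoints suffices. The names `robust_plans` and `choose`, and the `epsilon` tolerance, are hypothetical. If no single plan is robust, the best plans at the two endpoints are returned as a switchable set.

```python
def robust_plans(cost_fns, lo, hi, epsilon=0.2):
    """Plans whose cost is within (1 + epsilon) of the best plan at both interval endpoints."""
    best_lo = min(f(lo) for f in cost_fns.values())
    best_hi = min(f(hi) for f in cost_fns.values())
    return sorted(
        name for name, f in cost_fns.items()
        if f(lo) <= (1 + epsilon) * best_lo and f(hi) <= (1 + epsilon) * best_hi
    )


def choose(cost_fns, lo, hi, epsilon=0.2):
    """Return ('robust', plan) if one plan covers the interval, else ('switchable', plans)."""
    robust = robust_plans(cost_fns, lo, hi, epsilon)
    if robust:
        return ("robust", robust[0])
    best_at_lo = min(cost_fns, key=lambda n: cost_fns[n](lo))
    best_at_hi = min(cost_fns, key=lambda n: cost_fns[n](hi))
    return ("switchable", sorted({best_at_lo, best_at_hi}))


# Hypothetical cost functions of the uncertain input cardinality n.
cost_fns = {"nested_loop": lambda n: 5 * n, "hash_join": lambda n: 100 + n}
```

A narrow interval such as [10, 20] yields a single robust plan; a wide one such as [10, 1000] yields a switchable pair.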
[mara04] Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, and Miso Cilimdzic. Robust query processing through progressive optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04), pages 659-670, 2004. [ bib | .pdf ]
Hardcopy on file. Plan reoptimization is triggered by cardinality-checking operators inserted into the plan. Checks determine whether the actual cardinality differs from the predicted cardinality by enough that the optimizer would have chosen a different plan. The plan transition cardinalities are determined at optimization time. When reoptimizing, the optimizer considers reusing materialized intermediate results.
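A minimal CHECK sketch, assuming the check materializes the rows it sees so that a reoptimized plan could reuse them. `check`, `ReoptimizationNeeded`, and the `(low, high)` validity range are illustrative names. The range stands in for the transition cardinalities computed at optimization time.

```python
class ReoptimizationNeeded(Exception):
    """Raised by a CHECK when the observed cardinality leaves the plan's validity range."""

    def __init__(self, observed, materialized):
        super().__init__(f"observed cardinality {observed} outside validity range")
        self.observed = observed
        self.materialized = materialized  # rows produced so far, reusable by the new plan


def check(rows, validity_range):
    """Materialize `rows`, failing fast as soon as the count exceeds the upper bound."""
    low, high = validity_range
    buffered = []
    for row in rows:
        buffered.append(row)
        if len(buffered) > high:
            raise ReoptimizationNeeded(len(buffered), buffered)
    if len(buffered) < low:
        raise ReoptimizationNeeded(len(buffered), buffered)
    return buffered
```

Failing fast on the upper bound matters: once the estimate is known to be too low, there is no need to finish producing the input before reoptimizing.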
[ilra03] Ihab F. Ilyas, Jun Rao, Guy M. Lohman, Dengfeng Gao, and Eileen Lin. Estimating compilation time of a query optimizer. In ACM SIGMOD Conference, pages 373-384, 2003. [ bib | .pdf | .pdf ]
Claims an average estimation error of 30%, with an overhead of 3% of compilation time to produce the estimate.
[vopa02] Kristofer Vorwerk and G. N. Paulley. On implicate discovery and query optimization. In Proc. International Database Engineering and Applications Symposium (IDEAS'02), pages 2-11, July 2002. [ bib | .pdf ]
[brch02] Nicolas Bruno and Surajit Chaudhuri. Exploiting statistics on query expressions for optimization. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'02), 2002. [ bib | .pdf ]
[ghpa02] Antara Ghosh, Jignashu Parikh, Vibhuti S. Sengar, and Jayant R. Haritsa. Plan selection based on query clustering. In International Conference on Very Large Data Bases (VLDB'02), pages 179-190, 2002. [ bib | .pdf | .pdf ]
Classifies SPJ queries using features such as the number of tables in the query, the number of SARGable predicates, and the sizes of the tables. For each class of queries, the system maintains a plan template, which can be instantiated for any query in the class. To process a query, the system first tries to match it to an existing class. If it can, the class plan template is instantiated and the resulting plan is used for the query. Otherwise, the query is compiled and its plan is used to start a new class.
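A sketch of the match-or-compile loop. Exact matching on a feature tuple is a simplifying assumption (a real classifier would measure similarity between feature vectors). `feature_key`, `PlanTemplateCache`, and the query dictionary layout are hypothetical.

```python
import math


def feature_key(query, table_sizes):
    """Hypothetical features: table count, SARGable predicate count, table size magnitudes."""
    size_class = tuple(sorted(int(math.log10(table_sizes[t])) for t in query["tables"]))
    return (len(query["tables"]), len(query["sargable"]), size_class)


class PlanTemplateCache:
    def __init__(self, table_sizes, compile_fn):
        self.table_sizes = table_sizes
        self.compile_fn = compile_fn  # query -> template; a template maps constants to a plan
        self.templates = {}

    def plan(self, query):
        key = feature_key(query, self.table_sizes)
        if key not in self.templates:
            # No matching class: compile the query and start a new class with its template.
            self.templates[key] = self.compile_fn(query)
        return self.templates[key](query["constants"])
```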
[stlo01] Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. LEO - DB2's LEarning Optimizer. In Proceedings of the International Conference on Very Large Data Bases (VLDB), pages 19-28, 2001. [ bib | .pdf | .pdf ]
[koss00] Donald Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, 32(4):422-469, 2000. [ bib | http | .pdf ]
[kade98] Navin Kabra and David J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'98), pages 106-117, 1998. [ bib | .pdf | .pdf ]
[cogr94] Richard L. Cole and Goetz Graefe. Optimization of dynamic query evaluation plans. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'94), pages 150-160, 1994. [ bib | .pdf ]
How to produce dynamic query evaluation plans that include ChoosePlan operators. Generalizes the optimizer so that it understands that cost estimates may only partially order the candidate evaluation plans.
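A sketch of the idea, assuming each plan's cost is a monotonic function of a run-time parameter (here, a selectivity) with a known range. At compile time a plan is discarded only if some other plan's worst-case cost is strictly below its best-case cost. The survivors become the alternatives of a ChoosePlan decision made once the parameter is bound. All names are hypothetical.

```python
def cost_interval(cost_fn, param_range):
    """Cost bounds over the parameter range; assumes cost is monotonic in the parameter."""
    a, b = cost_fn(param_range[0]), cost_fn(param_range[1])
    return min(a, b), max(a, b)


def build_choose_plan(cost_fns, param_range):
    """Keep plans not dominated at compile time; return their names and a run-time chooser."""
    intervals = {name: cost_interval(f, param_range) for name, f in cost_fns.items()}
    # Plan n is dominated if another plan's maximum cost is below n's minimum cost.
    alternatives = {
        n: f for n, f in cost_fns.items()
        if not any(intervals[o][1] < intervals[n][0] for o in intervals if o != n)
    }

    def choose_plan(param):
        # Evaluated at start-up, once the run-time parameter value is known.
        return min(alternatives, key=lambda n: alternatives[n](param))

    return sorted(alternatives), choose_plan


# Hypothetical cost functions of the predicate selectivity s in [0, 1].
cost_fns = {
    "index_scan": lambda s: 2 + 1000 * s,
    "table_scan": lambda s: 100,
    "bad_plan": lambda s: 5000,
}
```

Here the two scans have overlapping cost intervals, so neither can be pruned at compile time, while `bad_plan` is dominated.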
[grae93] Goetz Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2):73-169, 1993. [ bib | http | .pdf ]
[iong92] Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, and Timos K. Sellis. Parametric query optimization. In 18th International Conference on Very Large Data Bases (VLDB'92), pages 103-114, August 1992. [ bib | .PDF | .pdf ]

Replication and Distribution

[keji10] Bettina Kemme, Ricardo Jiménez Peris, and Marta Patiño Martínez. Database Replication. Number 7 in Synthesis Lectures on Data Management. Morgan & Claypool, 2010. [ bib | .pdf ]
[ceca08] Emmanuel Cecchet, George Candea, and Anastasia Ailamaki. Middleware-based database replication: The gaps between theory and practice. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'08), pages 739-752, 2008. [ bib | http | .pdf ]
[eldr07] Sameh Elnikety, Steven Dropsho, and Willy Zwaenepoel. Tashkent+: Memory-aware load balancing and update filtering in replicated databases. In Proc. EuroSys 2007, pages 399-412, March 2007. [ bib | .pdf ]
[eldr06] Sameh Elnikety, Steven Dropsho, and Fernando Pedone. Tashkent: Uniting durability with transaction ordering for high-performance scalable database replication. In Proc. EuroSys 2006, April 2006. [ bib | .pdf | .pdf ]
[load06] Jacob R. Lorch, Atul Adya, William J. Bolosky, Ronnie Chaiken, John R. Douceur, and Jon Howell. The SMART way to migrate replicated stateful services. In Proc. EuroSys 2006, April 2006. [ bib | .pdf | .pdf ]
[soam06] Gokul Soundararajan, Cristiana Amza, and Ashvin Goel. Database replication policies for dynamic content applications. In Proc. EuroSys 2006, April 2006. [ bib | .pdf | .pdf ]
[befe06] Philip A. Bernstein, Alan Fekete, Hongfei Guo, Raghu Ramakrishnan, and Pradeep Tamma. Relaxed-currency serializability for middle-tier caching and replication. In Proc. ACM SIGMOD international conference on Management of data (SIGMOD'06), pages 599-610, 2006. [ bib | .pdf ]
[maai06] Amit Manjhi, Anastassia Ailamaki, Bruce M. Maggs, Todd C. Mowry, Christopher Olston, and Anthony Tomasic. Simultaneous scalability and security for data-intensive web applications. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'06), pages 241-252, 2006. [ bib | .pdf ]
[elpe05] Sameh Elnikety, Willy Zwaenepoel, and Fernando Pedone. Database replication using generalized snapshot isolation. In IEEE Symposium on Reliable Distributed Systems (SRDS'05), pages 73-84, October 2005. [ bib | .pdf ]
[amco05] Cristiana Amza, Alan L. Cox, and Willy Zwaenepoel. A comparative evaluation of transparent scaling techniques for dynamic content servers. In Proc. International Conference on Data Engineering (ICDE'05), pages 230-241, 2005. [ bib | .pdf ]
Studies the impact of several scaling issues: scheduling and concurrency control, load balancing, and query result caching.
[gula05] Hongfei Guo, Per-Åke Larson, and Raghu Ramakrishnan. Caching with 'good enough' currency, consistency, and completeness. In Proc. International Conference on Very Large Data Bases (VLDB'05), pages 457-468, 2005. [ bib | .pdf | .pdf ]
[like05] Yi Lin, Bettina Kemme, Marta Patiño Martínez, and Ricardo Jiménez-Peris. Middleware based data replication providing snapshot isolation. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'05), pages 419-430, 2005. [ bib | .pdf ]
[paji05] Marta Patiño Martinez, Ricardo Jiménez-Peris, Bettina Kemme, and Gustavo Alonso. MIDDLE-R: Consistent database replication at the middleware level. ACM Transactions on Computer Systems, 23(4):375-423, 2005. [ bib | DOI | .pdf ]
[olma05] Christopher Olston, Amit Manjhi, Charles Garrod, Anastassia Ailamaki, Bruce M. Maggs, and Todd C. Mowry. A scalability service for dynamic web applications. In Proc. Second Biennial Conference on Innovative Data Systems Research (CIDR'05), pages 56-69, January 2005. [ bib | .pdf | .pdf ]
[sash05] Yasushi Saito and Marc Shapiro. Optimistic replication. ACM Computing Surveys, 37(1):42-81, 2005. [ bib | .pdf ]
[cema04] Emmanuel Cecchet, Julie Marguerite, and Willy Zwaenepoel. C-JDBC: Flexible database clustering middleware. In USENIX 2004 Annual Technical Conference, FREENIX Track, pages 9-18, 2004. [ bib | .pdf | .pdf ]
[gula04] Hongfei Guo, Per-Åke Larson, Raghu Ramakrishnan, and Jonathan Goldstein. Relaxed currency and consistency: How to say Good Enough in SQL. In Proc. ACM SIGMOD international conference on Management of data (SIGMOD'04), pages 815-826, 2004. [ bib | .pdf ]
[lago04] Per-Åke Larson, Jonathan Goldstein, and Jingren Zhou. MTCache: Transparent mid-tier database caching in SQL Server. In Proc. International Conference on Data Engineering (ICDE'04), pages 177-189, 2004. [ bib | .pdf ]
[plal04] Christian Plattner and Gustavo Alonso. Ganymed: Scalable replication for transactional web applications. In ACM/IFIP/USENIX International Middleware Conference (Middleware 2004), number 3231 in Lecture Notes in Computer Science, pages 155-174, 2004. [ bib | .pdf ]
[amco03] Cristiana Amza, Alan L. Cox, and Willy Zwaenepoel. Distributed versioning: Consistent replication for scaling back-end databases of dynamic content web sites. In ACM/IFIP/USENIX International Middleware Conference (Middleware 2003), number 2672 in Lecture Notes in Computer Science, pages 282-304, 2003. [ bib | .pdf ]
Assumes predeclaration of access sets in the application. Goal is one-copy serializability in a cluster environment, where individual queries in a transaction may be routed to different servers in the cluster. Transactions are serialized by scheduler based on the predeclared access sets. Read-one, write-all is used to handle replicas. Version numbers are used to track which updates have been applied to each table, and it is assumed that the back-end DBMS will apply version-tagged updates in the desired order.
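A sketch of version assignment and version-ordered execution, under the stated assumption of predeclared read and write sets. `VersionScheduler`, `Replica`, and their methods are hypothetical names. The sketch omits a detail a full system needs: it does not hold write v + 1 until readers of version v have finished.

```python
class VersionScheduler:
    """Assigns per-table versions to transactions from their predeclared access sets."""

    def __init__(self):
        self.last_assigned = {}  # table -> highest write version handed out so far

    def begin(self, reads, writes):
        versions = {}
        for table in writes:
            # Each writer gets the next version of every table it updates.
            self.last_assigned[table] = self.last_assigned.get(table, 0) + 1
            versions[table] = self.last_assigned[table]
        for table in reads:
            # A reader must see exactly the updates assigned before it.
            versions.setdefault(table, self.last_assigned.get(table, 0))
        return versions


class Replica:
    """Back-end replica that applies version-tagged operations in version order."""

    def __init__(self):
        self.applied = {}  # table -> version of the last write applied

    def can_run(self, table, version, is_write):
        current = self.applied.get(table, 0)
        # A write of version v waits for v - 1; a read of version v waits for v.
        return current == version - 1 if is_write else current == version

    def apply_write(self, table, version):
        if not self.can_run(table, version, is_write=True):
            raise RuntimeError("write applied out of version order")
        self.applied[table] = version
```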
[amco03b] Cristiana Amza, Alan L. Cox, and Willy Zwaenepoel. Conflict-aware scheduling for dynamic content applications. In USENIX Symposium on Internet Technologies and Systems, 2003. [ bib | .pdf | .pdf ]
[pepa02] Ricardo Jiménez-Peris, Marta Patiño Martínez, Bettina Kemme, and Gustavo Alonso. Improving the scalability of fault-tolerant database clusters. In Proc. International Conference on Distributed Computing Systems (ICDCS'02), pages 477-484, 2002. [ bib | .pdf | .pdf ]
Transactions are classified, and the database is partitioned into conflict classes such that each transaction class uses a single conflict class. The conflict classes need not be disjoint. Each conflict class has a primary site. Replicas are updated lazily. Global serialization order is determined by an atomic broadcast mechanism which is used to deliver the transaction requests. This protocol needs to predict conflicts between transactions.
[keal00] Bettina Kemme and Gustavo Alonso. Don't be lazy, be consistent: Postgres-R, a new way to implement database replication. In Proceedings of the International Conference on Very Large Data Bases (VLDB'00), pages 134-143, 2000. [ bib | .pdf | .pdf ]
Eager replication protocol that uses atomic broadcast to help serialize transactions.
[brko99] Yuri Breitbart, Raghavan Komondoor, Rajeev Rastogi, S. Seshadri, and Abraham Silberschatz. Update propagation protocols for replicated databases. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'99), pages 97-108, 1999. [ bib | .pdf ]
[anbr98] Todd A. Anderson, Yuri Breitbart, Henry F. Korth, and Avishai Wool. Replication, consistency, and practicality: Are these mutually exclusive? In Proc. ACM SIGMOD international conference on Management of data (SIGMOD'98), pages 484-495, 1998. [ bib | .pdf ]
[brko97] Yuri Breitbart and Henry F. Korth. Replication and consistency: being lazy helps sometimes. In Proc. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'97), pages 173-184, 1997. [ bib | .pdf ]
[grhe96] Jim Gray, Pat Helland, Patrick E. O'Neil, and Dennis Shasha. The dangers of replication and a solution. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'96), pages 173-182, 1996. [ bib | .pdf ]
[daga85] Susan B. Davidson, Hector Garcia-Molina, and Dale Skeen. Consistency in a partitioned network: a survey. ACM Computing Surveys, 17(3):341-370, 1985. [ bib | .pdf ]
A classic.

General Resource Management and Control

[shba06] Piyush Shivam, Shivnath Babu, and Jeff Chase. Learning application models for utility resource planning. In Proc. IEEE Int'l Conference on Autonomic Computing (ICAC'06), June 2006. [ bib | .pdf | .pdf ]
[wozh06] Murray Woodside, Tao Zheng, and Marin Litoiu. Service system resource management based on a tracked layered performance model. In Proc. IEEE International Conference on Autonomic Computing (ICAC'06), June 2006. [ bib | .pdf ]
[beme05] Mohamed Bennani and Daniel A. Menasce. Resource allocation for autonomic data centers using analytic performance models. In IEEE International Conference on Autonomic Computing (ICAC'05), pages 229-240, 2005. [ bib | .pdf ]
[teda05] Gerald Tesauro, Rajarshi Das, William E. Walsh, and Jeffrey O. Kephart. Utility-function-driven resource allocation in autonomic systems. In IEEE International Conference on Autonomic Computing (ICAC'05), pages 342-343, 2005. [ bib | .pdf ]
[wazh05] Zhikui Wang, Xiaoyun Zhu, and Sharad Singhal. Utilization vs. SLO-based control for dynamic sizing of resource partitions. Technical Report HPL-2005-126R1, HP Laboratories, 2005. [ bib | .pdf | .pdf ]
[apca04] K. Appleby, S. B. Gao, J. R. Giles, and K.-W. Lee. Policy-based automated provisioning. IBM Systems Journal, 43(1):121-135, 2004. [ bib | .pdf ]
[chsh03] Abhishek Chandra and Prashant Shenoy. Effectiveness of dynamic resource allocation for handling internet flash crowds. Technical Report TR03-37, Department of Computer Science, University of Massachusetts at Amherst, November 2003. [ bib | .pdf | .pdf ]
[boda03] Craig Boutilier, Rajarshi Das, Jeffrey O. Kephart, Gerald Tesauro, and William E. Walsh. Cooperative negotiation in autonomic systems using incremental utility elicitation. In Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence, pages 89-97, August 2003. [ bib | .pdf ]
[utta03] Sandeep Uttamchandani, Carolyn Talcott, and David Pease. Eos: An approach of using behavior implications for policy-based self-management. In Proc. 14th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM), number 2867 in Lecture Notes in Computer Science, pages 16-27. Springer-Verlag, 2003. [ bib | .pdf ]
This is an attempt to define the behavioural implications of policies (event-condition-action rules) so that a self-managing system can reason about which policies to use to adjust itself.
[paga02] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury, T. Jayram, and J. Bigus. Using control theory to achieve service level objectives in performance management. Real Time Systems Journal, 23(1-2), 2002. [ bib | CiteSeer | .pdf | .pdf ]
[zhlu02] Ronghua Zhang, Chenyang Lu, Tarek F. Abdelzaher, and John A. Stankovic. Controlware: A middleware architecture for feedback control of software performance. In Proc. International Conference on Distributed Computing Systems (ICDCS 2002), pages 301-310, 2002. [ bib | .pdf ]
[chan01] Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin Vahdat, and Ronald P. Doyle. Managing energy and server resources in hosting centres. In Proceedings of the 18th ACM Symposium on Operating System Principles (SOSP'01), pages 103-116, 2001. [ bib | .pdf | .pdf ]
Describes a system called Muse for allocating data center resources to servers. Hosted services are assumed to be able to scale via the allocation of more servers or the allocation of more of the resources of a given server. For energy-conscious provisioning, keep active servers at a target utilization level and put inactive servers into low-power mode. Proposes resource allocation policies based on a comparison of resource costs and benefits. Costs refer to the costs of providing resources. Benefit refers to the utility of a particular level of application performance. Muse is expected to understand the relationship between resource allocation and application performance levels so that it can optimize the allocation of available resources among the competing applications.
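A hypothetical cost-benefit allocator in the spirit of this description. Each service has a utility function of the number of units it receives, and units are handed out greedily by marginal utility while the gain exceeds the unit cost. This greedy rule is a stand-in chosen for illustration, not Muse's actual algorithm. Leftover units correspond to servers that could be put into low-power mode.

```python
def allocate(utility_fns, total_units, unit_cost):
    """Greedy allocation: give each unit to the service with the largest marginal utility."""
    alloc = {service: 0 for service in utility_fns}
    for _ in range(total_units):
        gains = {s: f(alloc[s] + 1) - f(alloc[s]) for s, f in utility_fns.items()}
        best = max(gains, key=gains.get)
        if gains[best] <= unit_cost:
            break  # no service benefits enough to justify another unit; leave the rest idle
        alloc[best] += 1
    return alloc


# Hypothetical utilities: "web" saturates after 3 units, "batch" gains a little per unit.
utility_fns = {"web": lambda n: 10 * min(n, 3), "batch": lambda n: 4 * n}
```

With a unit cost of 5 only `web` receives units; with a unit cost of 1, `batch` also absorbs the remaining capacity.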
[shli00] Molly H. Shor, Kang Li, Jonathan Walpole, David Steere, and Calton Pu. Application of control theory to modeling and analysis of computer systems. In Proceedings of the Japan-USA-Vietnam Workshop on Research and Education in Systems, Computation and Control Engineering, HoChiMinh City, Vietnam, June 2000. [ bib | .pdf | .pdf ]
[kalu00] M. Katchabaw, H. Lutfiyya, and M. Bauer. Driving resource management with application-level quality of service specifications. Journal of Decision Support Systems, 28(2):71-87, 2000. [ bib | .ps ]
Describes some generic tools for embedding monitoring and control hooks into application software.
[gost99] Ashvin Goel, David Steere, Calton Pu, and Jonathan Walpole. Adaptive resource management via modular feedback control. Technical Report CSE-99-003, Department of Computer Science and Engineering, Oregon Graduate Institute, January 1999. [ bib | .pdf | .pdf ]
[gost98] Ashvin Goel, David Steere, Calton Pu, and Jonathan Walpole. SWiFT: A feedback control and dynamic reconfiguration toolkit. Technical Report CSE-98-009, Department of Computer Science and Engineering, Oregon Graduate Institute, September 1998. [ bib | .pdf ]
[auca98] Christina Aurrecoechea, Andrew T. Campbell, and Linda Hauw. A survey of QoS architectures. Multimedia Systems, 6(3):138-151, 1998. [ bib | CiteSeer | .pdf | .pdf ]
Emphasis is on QoS for multimedia streams.
[amei97] J. Aman, C. K. Eilert, D. Emmes, P. Yocom, and D. Dillenberger. Adaptive algorithms for managing a distributed data processing workload. IBM Systems Journal, 36(2):242-283, 1997. [ bib | .pdf ]

Grids, Virtualization, Utility Computing

[cosi10] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proc. ACM Symp. on Cloud Computing, June 2010. [ bib | .pdf ]
The benchmark defines two tiers: performance and scaling. The former considers latency and throughput as offered load increases with a fixed amount of resources. The latter looks at traditional scale-up (does performance stay flat as more data, offered load and resources are added) and elastic speedup (does performance improve if more resources are added under constant load). The benchmark is designed to be extensible, but the core workload consists of randomized inserts, updates, reads and sequential scans of keyed records. The benchmark is implemented as a multi-threaded Java program with an interface layer used to customize interactions with specific data managers. Not clear whether this is a closed-loop or open-loop client.
[daag10] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. G-Store: A scalable data store for transactional multi key access in the cloud. In Proc. ACM Symp. on Cloud Computing, June 2010. [ bib | .pdf ]
Argues that many web applications need atomic multi-key access. Allows definition of transient, arbitrary key groups, across which atomic operations are possible. Key groups are implemented by transferring ownership of all keys in a group to a single leader node in the underlying storage system, so that it can coordinate atomic operations without the need for a distributed coordination protocol. The leader uses write-ahead logging to support failure recovery at the leader node. However, it seems that while the leader is down, the group is unavailable.
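A single-process sketch of key-group ownership transfer. `KeyGroupManager`, the `down` set, and the dictionary store are hypothetical. The sketch shows why no distributed commit is needed (one leader logs and applies all updates) and why the group is unavailable while its leader is down.

```python
class KeyGroupManager:
    """Hypothetical sketch of G-Store-style key groups with single-leader ownership."""

    def __init__(self):
        self.group_of = {}  # key -> id of the group that currently owns it
        self.groups = {}    # group id -> (leader node, frozenset of keys)
        self.wal = []       # leader's write-ahead log of (group id, updates)
        self.down = set()   # nodes currently unavailable

    def create_group(self, group_id, keys, leader):
        taken = [k for k in keys if k in self.group_of]
        if taken:
            raise ValueError(f"keys already grouped: {taken}")
        # Transfer ownership of every key in the group to the leader.
        for k in keys:
            self.group_of[k] = group_id
        self.groups[group_id] = (leader, frozenset(keys))

    def atomic_write(self, group_id, updates, store):
        leader, keys = self.groups[group_id]
        if leader in self.down:
            raise RuntimeError("group unavailable while its leader is down")
        if not set(updates) <= keys:
            raise KeyError("update touches keys outside the group")
        # Log first, then apply all updates at the leader; no distributed commit is needed.
        self.wal.append((group_id, dict(updates)))
        store.update(updates)

    def delete_group(self, group_id):
        _, keys = self.groups.pop(group_id)
        for k in keys:
            del self.group_of[k]
```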
[kili10] Emre Kiciman, Benjamin Livshits, Madanlal Musuvathi, and Kevin C. Webb. Fluxo: A system for internet service programming by non-expert developers. In Proc. ACM Symp. on Cloud Computing, June 2010. [ bib | .pdf ]
Restricted application programming model supporting common architectural patterns for web services. Dataflow programming model with nodes representing computation and edges representing data flow.
[alco10] Peter Alvaro, Tyson Condie, Neil Conway, Khaled Elmeleegy, Joseph M. Hellerstein, and Russell Sears. BOOM analytics: Exploring data-centric, declarative programming for the cloud. In Proc. EuroSys Conf., April 2010. [ bib | .pdf | .pdf ]
[voch10] Hoang Tam Vo, Chun Chen, and Beng Chin Ooi. Towards elastic transactional cloud storage with range query support. In Proc. Int'l Conf. on Very Large Data Bases, 2010. [ bib | .pdf | .pdf ]
[wuji10] Sai Wu, Dawei Jiang, Beng Chin Ooi, and Kun-Lung Wu. Efficient B+-tree based indexing for cloud data processing. In Proc. Int'l Conf. on Very Large Data Bases, 2010. [ bib | .pdf | .pdf ]
[tiiy09] Omesh Tickoo, Ravi Iyer, Ramesh Illikkal, and Don Newell. Modeling virtual machine performance: Challenges and approaches. In Proc. Workshop on Hot Topics in Measurement and Modeling of Computer Systems, June 2009. [ bib | .pdf | .pdf ]
[kroe09] Kirk L. Kroeker. The evolution of virtualization. Communications of the ACM, 52(3):18-20, March 2009. [ bib ]
Tech-lite article talking about virtualization on hand-held devices, about virtualization for software deployment, and about performance and management.
[arfo09] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy H. Katz, Andrew Konwinski, Gunho Lee, David A. Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report UCB/EECS-2009-28, University of California at Berkeley, February 2009. [ bib | .pdf | .pdf ]
[agsi09] Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, and Raghu Ramakrishnan. Asynchronous view maintenance for VLSD databases. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'09), pages 179-192, 2009. [ bib | DOI | .pdf ]
[krhe09] Tim Kraska, Martin Hentschel, Gustavo Alonso, and Donald Kossmann. Consistency rationing in the cloud: Pay only when it matters. Proc. of the VLDB Endowment, 2(1):253-264, 2009. [ bib | .pdf | .pdf ]
Proposes that data be assigned to one of three consistency levels: A, B, or C. Data assigned to level C have only session consistency and eventual consistency of updates. Data assigned to level A have serializable consistency. Data in the B category have adaptive consistency, switching between session consistency and serializability at runtime.
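A sketch of per-category consistency selection. The category B rule here is an assumed fixed-threshold policy (for example, switching to serializable when a stock count falls to a threshold, where a conflict would be costly). The function name and parameters are hypothetical.

```python
def consistency_level(category, value=None, threshold=None):
    """Map a data item's category to the consistency level used for an access."""
    if category == "A":
        return "serializable"
    if category == "C":
        return "session"
    if category == "B":
        # Assumed fixed-threshold policy: near the threshold, pay for serializability.
        return "serializable" if value <= threshold else "session"
    raise ValueError(f"unknown category {category!r}")
```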
[lawh09] Horacio Andrés Lagar-Cavilla, Joseph Andrew Whitney, Adin Matthew Scannell, Philip Patchin, Stephen M. Rumble, Eyal de Lara, Michael Brudno, and Mahadev Satyanarayanan. SnowFlock: Rapid virtual machine cloning for cloud computing. In Proc. ACM European Conference on Computer Systems (EuroSys'09), pages 1-12, 2009. [ bib | DOI | .pdf ]
SnowFlock implements a fork (clone) operation for running VMs. There is no implicit synchronization or communication between parent and clone after the fork; anything required must be coded explicitly. Cloned children live on a virtual network with the parent, and can only communicate within this network. SnowFlock starts clones with little initial state, and additional state is shipped on demand from the parent, which uses copy-on-write to preserve a snapshot of its state as of the time of cloning. Each clone gets a virtual disk which is a snapshot of the parent's as of the time of cloning. This is implemented using copy-on-write at the parent, which serves pages to the clones (via blocktap) as necessary. This mechanism is intended for the root device, not for I/O-intensive data devices.
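A page-level sketch of the clone mechanism, with hypothetical names. The parent preserves pre-fork contents copy-on-write, and a clone starts empty and fetches pages from the parent only when it first reads them. Networking, disks, and blocktap are not modeled.

```python
class ParentVM:
    """Parent whose memory pages are snapshotted copy-on-write at each fork."""

    def __init__(self, pages):
        self.pages = dict(pages)  # page number -> contents
        self.snapshots = []       # per clone: page -> contents as of the fork

    def fork(self):
        snapshot = {}
        self.snapshots.append(snapshot)
        return CloneVM(self, snapshot)

    def write(self, page, data):
        # Copy-on-write: preserve the pre-fork contents for every clone on first overwrite.
        for snapshot in self.snapshots:
            snapshot.setdefault(page, self.pages.get(page))
        self.pages[page] = data

    def serve(self, snapshot, page):
        # Pages not modified since the fork are served from the parent's live memory.
        return snapshot[page] if page in snapshot else self.pages.get(page)


class CloneVM:
    """Clone that starts with no pages and fetches them from the parent on demand."""

    def __init__(self, parent, snapshot):
        self.parent, self.snapshot = parent, snapshot
        self.local = {}
        self.fetches = 0

    def read(self, page):
        if page not in self.local:
            self.local[page] = self.parent.serve(self.snapshot, page)
            self.fetches += 1
        return self.local[page]

    def write(self, page, data):
        # The clone's writes stay private; nothing is propagated back to the parent.
        self.local[page] = data
```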
[hude08] Wenjin Hu, Todd Deshane, and Jeanna Matthews. Solaris virtualization options. :login, 33(5):7-17, October 2008. [ bib ]
Mostly a how-to guide for system administrators, covering Containers, Solaris xVM and Solaris xVM VirtualBox.
[chje08] Ronnie Chaiken, Bob Jenkins, Paul Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. SCOPE: Easy and efficient parallel processing of massive data sets. In Proc. Int'l Conference on Very Large Data Bases (VLDB'08), 2008. [ bib | .pdf ]
[cora08] Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proc. of the VLDB Endowment, 1(2):1277-1288, 2008. [ bib | DOI | .pdf ]
[cule08] Brendan Cully, Geoffrey Lefebvre, Dutch T. Meyer, Mike Feeley, Norman C. Hutchinson, and Andrew Warfield. Remus: High availability via asynchronous virtual machine replication. In Proc. USENIX Symposium on Networked Systems Design and Implementation (NSDI), page 161, 2008. [ bib | .pdf | .pdf ]
[degh08] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008. [ bib | DOI ]
[minh08] Umar Farooq Minhas. A performance evaluation of database systems on virtual machines. Technical Report CS-2008-01, David R. Cheriton School of Computer Science, University of Waterloo, January 2008. Masters thesis. [ bib | .pdf | .pdf ]
[olre08] Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig Latin: A not-so-foreign language for data processing. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 1099-1110, 2008. [ bib | .pdf ]
[sico08] Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, and Raghu Ramakrishnan. Efficient bulk insertion into a distributed ordered table. In Proc. ACM Int'l Conference on Management of Data (SIGMOD'08), pages 765-778, 2008. [ bib | http | .pdf ]
[shde07] Piyush Shivam, Azbayar Demberel, Pradeep Gunda, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi, Shivnath Babu, and Jeffrey S. Chase. Automated and on-demand provisioning of virtual machines for database applications. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'07), pages 1079-1081, June 2007. [ bib | DOI | .pdf ]
Demo paper.
[sopo07] Stephen Soltesz, Herbert Potzl, Marc Fiuczynski, Andy Bavier, and Larry Peterson. Container-based operating system virtualization: A scalable high-performance alternative to hypervisors. In Proc. EuroSys 2007, pages 275-288, March 2007. [ bib | .pdf ]
[deha07] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, and Avinash Lakshman. Dynamo: Amazon's highly available key-value store. In Proc. ACM Symposium on Operating Systems Principles (SOSP'07), pages 205-220, 2007. [ bib | DOI | .pdf ]
[isbu07] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In Proc. EuroSys Conference, pages 59-72, 2007. [ bib | .pdf ]
[pazh07] Pradeep Padala, Xiaoyun Zhu, Zhikui Wang, Sharad Singhal, and Kang G. Shin. Performance evaluation of virtualization technologies for server consolidation. Technical Report HPL-2007-59, HP Laboratories Palo Alto, 2007. [ bib | .pdf | .pdf ]
Compares Xen, OpenVZ, and base Linux configurations. Looks at a two-tier (Apache+PHP and MySQL) system under a RUBiS workload. Considers a variety of configurations: both tiers on a single physical node, each tier on a different node, and multiple application stacks with the web tiers on one node and the database tiers on another node. Found higher CPU overhead in the Xen configuration, relative to OpenVZ and base Linux. Found that Xen DomU had a much higher L2 cache miss count than the base Linux system, but it is not clear how much of this is from the kernel in DomU and how much is from the application.
[chde06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: a distributed storage system for structured data. In Proc. USENIX Symposium on Operating System Design and Implementation (OSDI'06), 2006. [ bib | .pdf ]
The key space is partitioned into ranges called tablets. BigTable uses multiple independent tablet servers to serve tablets. Tablets are assigned to servers by a BigTable master node, and each tablet is served by only one server at a time. A tablet server uses a commit log in GFS to commit updates. Recent updates are kept in memory in a memtable. When the memtable fills, it is written to GFS as an immutable SSTable file.
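The tablet write and read path described above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions: `TabletServer`, `tablet_for`, and `memtable_limit` are hypothetical names, a Python list stands in for the GFS commit log, and the memtable is flushed after a fixed number of entries rather than by size. It is not BigTable's implementation.

```python
import bisect


def tablet_for(key, split_keys):
    """Index of the tablet whose key range contains `key` (hypothetical helper).
    split_keys are sorted row keys at which one tablet ends and the next begins."""
    return bisect.bisect_right(split_keys, key)


class TabletServer:
    """Hypothetical sketch of one tablet's write path: log, memtable, SSTable flush."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the commit log kept in GFS
        self.memtable = {}     # recent updates, held in memory
        self.sstables = []     # immutable, key-sorted tables; newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # log the update before applying it
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into an immutable, key-sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:                 # most recent data first
            return self.memtable[key]
        for table in reversed(self.sstables):    # then newest SSTable to oldest
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return table[i][1]
        return None
```

Reads check the memtable first and then the SSTables from newest to oldest, so a later write to a key shadows any older value still stored in a flushed SSTable.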
[guch06] D. Gupta, L. Cherkasova, R. Gardner, and A. Vahdat. Enforcing performance isolation across virtual machines in Xen. In Proc. of the ACM/IFIP/USENIX 7th International Middleware Conference, 2006. [ bib | .pdf | .pdf ]
[irch06] David E. Irwin, Jeffrey S. Chase, Laura E. Grit, Aydan R. Yumerefendi, David Becker, and Ken Yocum. Sharing networked resources with brokered leases. In Proc. USENIX Technical Conference, pages 199-212, 2006. [ bib | .pdf | .pdf ]
Resource providers make resources available to brokers, which in turn use them to satisfy requests from clients. Clients get lease tickets from brokers, which know which resources are available from which providers and which implement policies controlling which clients get which resources. Clients redeem tickets with resource providers to obtain the lease, which gives the client access to resources for a fixed time window. Shirako is a toolkit to facilitate the construction of clients, brokers, and resource providers.
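A toy model of the provider/broker/client ticket-and-lease exchange described above may help make the roles concrete. All class and method names (`Provider`, `Broker`, `Ticket`, `Lease`, `delegate`, `request`, `redeem`) are hypothetical and are not Shirako's API; the broker policy is a simple first-fit assumption, and time is an integer clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    provider: str   # which provider the ticket can be redeemed at
    units: int      # how many resource units it entitles the holder to


@dataclass(frozen=True)
class Lease:
    provider: str
    units: int
    start: int
    end: int        # the lease grants access only during [start, end)


class Provider:
    def __init__(self, name, units):
        self.name = name
        self.free = units   # units not currently under lease

    def redeem(self, ticket, now, duration):
        # Turn a broker-issued ticket into a lease for a fixed time window.
        if ticket.provider != self.name or ticket.units > self.free:
            raise ValueError("ticket cannot be honored")
        self.free -= ticket.units
        return Lease(self.name, ticket.units, now, now + duration)


class Broker:
    def __init__(self):
        self.inventory = {}   # provider name -> units delegated to this broker

    def delegate(self, provider, units):
        # A provider makes some of its resources available through the broker.
        self.inventory[provider.name] = self.inventory.get(provider.name, 0) + units

    def request(self, units):
        # Assumed policy: first provider with enough delegated units wins.
        for name, avail in self.inventory.items():
            if avail >= units:
                self.inventory[name] = avail - units
                return Ticket(name, units)
        return None   # request cannot be satisfied
```

A client would call `broker.request(...)` to obtain a ticket and then `provider.redeem(ticket, now, duration)` to turn it into a time-bounded lease.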
[khbe06] G. Khanna, K. Beaty, G. Kar, and A. Kochut. Application performance management in virtualized server environments. In Proc. IEEE/IFIP Network Operations and Management Symposium, pages 373-381, 2006. [ bib | .pdf ]
[rair06] Lavanya Ramakrishnan, David E. Irwin, Laura E. Grit, Aydan R. Yumerefendi, Adriana Iamnitchi, and Jeffrey S. Chase. Toward a doctrine of containment: Grid hosting with adaptive resource control. In Proc. ACM/IEEE Conference on High Performance Networking and Computing (SC2006), 2006. [ bib | DOI | .pdf ]
[clfr05] Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield. Live migration of virtual machines. In Proc. Symposium on Networked Systems Design and Implementation (NSDI 2005), May 2005. [ bib | .pdf | .pdf ]
[fotu05] Ian Foster and Steven Tuecke. Describing the elephant: The different faces of IT as service. Queue, 3(6):26-29, 2005. [ bib | .pdf ]
[pido05] Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming, 13(4):277-298, 2005. [ bib | .pdf ]
[rose05] Mendel Rosenblum. The reincarnation of virtual machines. Queue, 2(5):34-40, 2005. [ bib | .pdf ]
[roga05] Mendel Rosenblum and Tal Garfinkel. Virtual machine monitors: Current technology and future trends. IEEE Computer, 38(5):39-47, 2005. [ bib | .pdf ]
[smna05] James E. Smith and Ravi Nair. The architecture of virtual machines. IEEE Computer, 38(5):32-38, 2005. [ bib | .pdf ]
[waha05] Andrew Warfield, Steven Hand, Keir Fraser, and Tim Deegan. Facilitating the development of soft devices. In Proc. USENIX Annual Technical Conference, pages 379-382, 2005. [ bib | .pdf | .pdf ]
[wimo04] John Wilkes, Jeffrey Mogul, and Jaap Suermondt. Utilification. In Proceedings of the 11th ACM SIGOPS European Workshop, September 2004. [ bib | .pdf | .pdf ]
Discusses the process of preparing software applications and application stacks for execution in a utility computing environment.
[dahe04] Shaul Dar, Gil Hecht, and Eden Shochat. dbswitch: Towards a database utility. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'04), pages 892-896, 2004. [ bib | .pdf ]
[degh04] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In Proc. Symposium on Operating Systems Design and Implementation (OSDI'04), pages 137-150, 2004. [ bib | .pdf ]
Proposes a programming model for highly parallelizable computations, and describes a system that implements this model. The computation input is a set of input key/value pairs, and the output is a set of output key/value pairs. The computation itself is defined by two functions. A Map function takes an input key/value pair and produces a set of intermediate key/value pairs. A Reduce function takes an intermediate key and the set of all values produced for that key, and merges them into a smaller set of values, typically a single value.
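The two-function model can be illustrated with the classic word-count example. The sketch below runs sequentially in one process; `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names, and the in-memory grouping step stands in for the distributed shuffle that a real MapReduce system performs across machines.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    # Map: emit an intermediate (word, 1) pair for every word in the document.
    for word in text.split():
        yield word, 1


def reduce_fn(word, counts):
    # Reduce: merge all intermediate values for one key into a single count.
    return sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    groups = defaultdict(list)
    for key, value in inputs:                  # map phase over input pairs
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)        # shuffle: group values by key
    # Reduce phase: one reducer call per distinct intermediate key.
    return {k: reducer(k, vs) for k, vs in sorted(groups.items())}
```

For example, `run_mapreduce([("d1", "a b a"), ("d2", "b c")], map_fn, reduce_fn)` yields `{"a": 2, "b": 2, "c": 1}`.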
[hupe04] Lan Huang, Gang Peng, and Tzi-cker Chiueh. Multi-dimensional storage virtualization. In Proc. Joint International Conference on Measurement and Modeling of Computer Systems, pages 14-24, 2004. [ bib | .pdf ]
[krga04] Ivan Krsul, Arijit Ganguly, Jian Zhang, José A. B. Fortes, and Renato J. O. Figueiredo. VMPlants: Providing and managing virtual machine execution environments for grid computing. In Proc. ACM/IEEE Conference on High Performance Networking and Computing (SC2004), 2004. [ bib | DOI | .pdf ]
[chgo03] A. Chandra, P. Goyal, and P. Shenoy. Quantifying the benefits of resource multiplexing in on-demand data centers. In Proc. First Workshop on Algorithms and Architectures for Self-Managing Systems, June 2003. [ bib | .pdf ]
[badr03] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP'03), pages 164-177. ACM Press, 2003. [ bib | .pdf ]
Very nice paper describing the paravirtualization approach used by Xen and the changes it necessitates in the guest OS. Also includes some empirical performance evaluation.
[maei03] Susan Malaika, Andrew Eisenberg, and Jim Melton. Standards for databases on the grid. SIGMOD Record, 32(3), 2003. [ bib | .pdf ]
An overview of some data-related parts of the grid standardization process, including OGSA, DAIS (Data Access and Integration) for standardizing access to relational and XML data sources, OREP (OGSA Replication Services), and DFDL (Data Format and Description Language).
[anar02] Artur Andrzejak, Martin Arlitt, and Jerry Rolia. Bounding the resource savings of utility computing models. Technical Report HPL-2002-339, HP Laboratories, 2002. [ bib | .pdf | .pdf ]
[foke02] I. Foster, C. Kesselman, J. Nick, and S. Tuecke. Grid services for distributed system integration. Computer, 35(6), 2002. [ bib | .pdf | .pdf ]
Extended version can be found at http://www.globus.org/research/papers/ogsa.pdf. This is an overview of the Open Grid Services Architecture (OGSA), which defines something very much like a distributed object system.
[rozh02] Jerry Rolia, Xiaoyun Zhu, Martin Arlitt, and Artur Andrzejak. Statistical service assurances for applications in utility grid environments. In IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'02), pages 247-256, 2002. [ bib | .pdf ]
[sach02] Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim Chow, Monica S. Lam, and Mendel Rosenblum. Optimizing the migration of virtual computers. In Proc. Symposium on Operating System Design and Implementation (OSDI'02), 2002. [ bib | .pdf ]
[chfo01] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications, 23:187-200, 2001. [ bib | .pdf | .pdf ]
Defines the core services of a data grid as a file-oriented storage service plus a distributed directory for meta-data. Also has some discussion of higher level services, like replication.

Storage Systems

[anth12] Gary Anthes. Revamping storage performance. Communications of the ACM, 55(1):20-22, January 2012. [ bib ]
[waro12] Carl A. Waldspurger and Mendel Rosenblum. I/O virtualization. Communications of the ACM, 55(1):66-72, January 2012. [ bib ]
[nath09] Dushyanth Narayanan, Eno Thereska, Austin Donnelly, Sameh Elnikety, and Antony Rowstron. Migrating server storage to SSDs: Analysis of tradeoffs. In Proc. ACM European Conference on Computer Systems, pages 145-158, 2009. [ bib | DOI | .pdf ]
[spga09] Richard P. Spillane, Sachin Gaikwad, Manjunath Chinni, Erez Zadok, and Charles P. Wright. Enabling transactional file access via lightweight kernel extensions. In Proc. USENIX Conference on File and Storage Technologies (FAST'09), 2009. [ bib | .pdf | .pdf ]
[habo05] Christoffer Hall and Philippe Bonnet. Getting priorities straight: Improving Linux support for database I/O. In Proc. International Conference on Very Large Data Bases (VLDB'05), pages 1116-1127, 2005. [ bib | .pdf | .pdf ]
Explores the response-time vs. throughput tradeoff for I/O in MySQL. Argues that a DBMS should maintain steady rather than bursty I/O request patterns to avoid idle I/O capacity. Describes a Linux implementation of prioritized I/O and measurements of its impact on MySQL's InnoDB.
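As a rough illustration of prioritized I/O dispatch, the sketch below serves pending requests strictly by priority and in arrival order within a priority level. It is an assumption-laden stand-in, not the Linux implementation the paper describes; `PriorityIOQueue` and the request labels are hypothetical.

```python
import heapq
import itertools


class PriorityIOQueue:
    """Hypothetical prioritized I/O dispatcher: a lower number means a higher
    priority, and requests at the same priority are served in arrival order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker that preserves arrival order

    def submit(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def dispatch(self):
        # Hand the highest-priority pending request to the device, if any.
        if not self._heap:
            return None
        _, _, request = heapq.heappop(self._heap)
        return request
```

Strict priority like this can starve low-priority requests under sustained load; a real scheduler would need some aging or bandwidth reservation, which this sketch omits.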
[huhu05] Hai Huang, Wanda Hung, and Kang Shin. FS2: Dynamic data replication in free disk space for improving disk performance and energy-consumption. In Proc. 20th ACM Symposium on Operating Systems Principles (SOSP'05), pages 263-276, 2005. [ bib | .pdf ]
[siba05] Muthian Sivathanu, Lakshmi N. Bairavasundaram, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Database-aware semantically-smart storage. In Proc. of the USENIX Symposium on File and Storage Technologies (FAST'05), pages 239-252, 2005. [ bib | .pdf | .pdf ]
Proposes to improve storage system performance by allowing the storage system to interpret some data written by a DBMS client. Types of interpreted information include log sequence numbers, block types, association of blocks with logical DB entities (tables, indexes), and access statistics explicitly sent by the DBMS to the storage system. Considers using this information in the storage system to ensure graceful degradation in case of device failures and to securely erase deleted data. Also considers extending the X-Ray [basi04] mechanism to improve the performance of the storage server cache.
[basi04] Lakshmi N. Bairavasundaram, Muthian Sivathanu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. X-RAY: A non-invasive exclusive caching mechanism for RAIDs. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA '04), June 2004. [ bib | .pdf | .pdf ]
[vome04] K. Voruganti, J. Menon, and S. Gopisetty. Land below a DBMS. SIGMOD Record, 33(1):64-70, March 2004. [ bib ]
A good, brief overview of SAN vs. NAS issues plus some discussion of storage virtualization and policy-based storage management.
[safr04] Yasushi Saito, Svend Frolund, Alistair C. Veitch, Arif Merchant, and Susan Spence. FAB: building distributed enterprise disk arrays from commodity components. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'04), pages 48-58, 2004. [ bib | .pdf | .pdf ]
[ried03] Erik Riedel. Storage systems: Not just a bunch of disks anymore. Queue, 1(4):32-41, June 2003. [ bib | .pdf ]
[snia03] Shared storage model: A framework for describing storage architectures. SNIA Technical Council white paper, April 2003. [ bib | .pdf ]
[ibm02] IBM Storage Tank: A distributed storage system. IBM Corporation white paper, January 2002. [ bib | .pdf | .pdf ]
[aran99] Remzi H. Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E. Culler, Joseph M. Hellerstein, David Patterson, and Kathy Yelick. Cluster I/O with River: Making the fast case common. In Proceedings of the Sixth Workshop on Input/Output in Parallel and Distributed Systems (IOPADS'99), pages 10-22, 1999. [ bib | CiteSeer | .ps | .ps ]
[pagi95] R. Hugo Patterson, Garth A. Gibson, Eka Ginting, Daniel Stodolsky, and Jim Zelenka. Informed prefetching and caching. In Proc. ACM Symposium on Operating Systems Principles (SOSP'95), pages 79-95, December 1995. [ bib | .pdf | .pdf ]
Distinguishes hints that disclose from hints that advise, and advocates the former. Applications issue hints describing file access patterns: hints can be used to disclose sequential access to the whole file, or to disclose an access pattern described as a sequence of (offset, length) pairs.
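A disclosing hint of the kind described above can be modeled as a list of future `(offset, length)` accesses handed to the cache, which then prefetches a bounded number of them ahead of use. The sketch is hypothetical: `InformedPrefetcher`, `disclose`, `read_block`, and the fixed prefetch `depth` are illustrative assumptions, not the paper's interface or prefetching policy.

```python
from collections import deque


class InformedPrefetcher:
    """Hypothetical sketch: the application discloses its future accesses as
    (offset, length) pairs, and the cache prefetches up to `depth` of them."""

    def __init__(self, read_block, depth=2):
        self.read_block = read_block   # function (offset, length) -> bytes
        self.depth = depth             # how many disclosed accesses to fetch ahead
        self.hints = deque()           # disclosed accesses not yet performed
        self.cache = {}                # (offset, length) -> prefetched bytes

    def disclose(self, accesses):
        self.hints.extend(accesses)
        self._prefetch()

    def _prefetch(self):
        for access in list(self.hints)[: self.depth]:
            if access not in self.cache:
                self.cache[access] = self.read_block(*access)

    def read(self, offset, length):
        access = (offset, length)
        if self.hints and self.hints[0] == access:
            self.hints.popleft()       # the next disclosed access has now happened
        data = self.cache.pop(access, None)
        if data is None:
            data = self.read_block(offset, length)   # miss: synchronous read
        self._prefetch()               # keep the prefetch window full
        return data
```

Because the hints disclose what the application will do rather than advising the cache what to do, the cache remains free to decide how far ahead to fetch; here that decision is reduced to the single `depth` parameter.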
[hila95] Dave Hitz, James Lau, and Michael Malcolm. File system design for an NFS file server appliance. Technical Report TR-3002, Network Appliance Corp., 1995. [ bib | .pdf | .pdf ]
[weza91] Gerhard Weikum, Peter Zabback, and Peter Scheuermann. Dynamic file allocation in disk arrays. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data, pages 406-415, May 1991. [ bib | .pdf ]

Storage System Management and Control

[gupu11] Jorge Guerra, Himabindu Pucha, Joseph S. Glider, Wendy Belluomini, and Raju Rangaswami. Cost effective storage using extent based dynamic tiering. In Proc. USENIX Conf. on File and Storage Technologies, pages 273-286, February 2011. [ bib | .pdf | .pdf ]
Includes a configuration advisor and a run-time tiering mechanism. The advisor uses a storage workload trace to estimate the capacity required in each tier during each time period, assuming that the run-time mechanism is moving each extent to the lowest cost tier that can satisfy the extent's I/O requirements during each epoch. The advisor recommends provisioning according to the maximum demand at each tier over all epochs. Epochs are assumed to be minutes/hours in duration. At run-time, a dynamic tier manager adjusts the placement of extents after each epoch. It chooses a tier for each extent that will minimize power consumption, among tiers that can satisfy the extent's performance demands. Within a tier, it also assigns extents to specific devices, attempting to consolidate so that devices can be powered down. Necessary migrations are then scheduled to run gradually.
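The advisor's sizing rule can be sketched as: place each extent, in each epoch, on the cheapest tier whose performance covers its demand, then provision each tier for its peak extent count over all epochs. The tier names and per-extent IOPS limits in `TIERS` are made-up assumptions, capacity and power are ignored, and `place`/`advise` are hypothetical names.

```python
from collections import Counter

# Hypothetical tiers, cheapest first: (name, max IOPS one extent can get there).
TIERS = [("sata", 50), ("sas", 200), ("ssd", 5000)]


def place(extent_iops):
    """Cheapest tier whose per-extent IOPS capability covers the demand."""
    for name, capability in TIERS:
        if extent_iops <= capability:
            return name
    return TIERS[-1][0]   # demand exceeds every tier: use the fastest one


def advise(trace):
    """trace: list of epochs; each epoch maps extent id -> observed IOPS.
    Returns the number of extents to provision in each tier, sized for the
    peak number of extents placed on that tier in any single epoch."""
    peak = Counter()
    for epoch in trace:
        counts = Counter(place(iops) for iops in epoch.values())
        for tier, n in counts.items():
            peak[tier] = max(peak[tier], n)
    return dict(peak)
```

Sizing by the per-tier peak over epochs means the recommended total can exceed the number of extents, because different extents may peak on the same tier in different epochs.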
[babo09] Shivnath Babu, Nedyalko Borisov, Sandeep Uttamchandani, Ramani Routray, and Aameek Singh. DIADS: Addressing the "my-problem-or-yours" syndrome with integrated SAN and database diagnosis. In Proc. USENIX Conference on File and Storage Technologies (FAST'09), 2009. [ bib | .pdf | .pdf ]
Describes a system that uses database query execution plans, storage layout and storage system configuration to attempt to pinpoint the cause of performance problems (e.g., slow queries) in a DBMS plus storage system stack.
[guah09] Ajay Gulati, Irfan Ahmad, and Carl A. Waldspurger. PARDA: Proportional allocation of resources for distributed storage access. In Proc. USENIX Conference on File and Storage Technologies (FAST'09), 2009. [ bib | .pdf | .pdf ]
Mechanism for proportional allocation of storage server bandwidth among multiple storage clients. Each client observes server response times to detect overload conditions, and then throttles its request stream by an amount determined by the share it expects to receive.
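The client-side throttling loop can be sketched as a window of outstanding requests that shrinks when observed latency exceeds a target and grows otherwise. The update rule below is a FAST TCP-style form, w ← (1−γ)w + γ((L/L_obs)·w + β), with β proportional to the client's share; treat it, the constants, and the name `update_window` as assumptions of this sketch rather than a faithful copy of PARDA.

```python
def update_window(window, observed_latency, target_latency, share,
                  gamma=0.2, w_min=1.0, w_max=256.0):
    """One control step for a client's outstanding-I/O window (hypothetical).

    If observed latency exceeds the target, the latency ratio pulls the window
    down; `share` (beta) lets clients with larger shares settle at larger
    windows. gamma smooths the update. All constants are illustrative."""
    proposed = (target_latency / observed_latency) * window + share
    new_window = (1 - gamma) * window + gamma * proposed
    return max(w_min, min(w_max, new_window))   # keep the window in bounds
```

At a steady observed latency L_obs > L, the window settles at β / (1 − L/L_obs), so clients that see the same latency end up with windows proportional to their shares.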
[solu09] Gokul Soundararajan, Daniel Lupei, Saeed Ghanbari, Adrian Daniel Popescu, Jin Chen, and Cristiana Amza. Dynamic resource allocation for database servers running on virtual storage. In Proc. USENIX Conference on File and Storage Technologies (FAST'09), 2009. [ bib | .pdf ]
Considers how to apportion database and storage server buffer cache space and storage system bandwidth across multiple workloads.
[guah08] Ajay Gulati and Irfan Ahmad. Towards distributed storage resource management using flow control. In Int'l Workshop on Storage and I/O Virtualization, Performance, Energy, Evaluation and Dependability (SPEED'08), February 2008. [ bib | .pdf | .pdf ]
[somi08] Gokul Soundararajan, Madalin Mihailescu, and Cristiana Amza. Context-aware prefetching at the storage server. In Proc. USENIX Annual Technical Conference, pages 377-390, 2008. [ bib | .pdf | .pdf ]
[qiiy06] Lin Qiao, Balakrishna R. Iyer, Divyakant Agrawal, and Amr El Abbadi. Automated storage management with QoS guarantee in large-scale virtualized storage systems. Bulletin of the IEEE Technical Committee on Data Engineering, 29(3):47-54, September 2006. [ bib | .ps | .pdf ]
[qiag06] Lin Qiao, Divyakant Agrawal, Amr El Abbadi, and Balakrishna R. Iyer. Pulsatingstore: An analytic framework for automated storage management. In Proc. International Conference on Data Engineering Workshops, Workshop on Self-Managing Database Systems (SMDB'06), page 1213, 2006. [ bib | .pdf ]
[qiiy06b] Lin Qiao, Balakrishna R. Iyer, Divyakant Agrawal, and Amr El Abbadi. Automated storage management with QoS guarantees. In Proc. International Conference on Data Engineering (ICDE'06), page 150, 2006. [ bib | .pdf ]
[mang05] Radhakrishnan Manga. Database layout with Data ONTAP. Technical Report TR-3411, Network Appliance Corp., September 2005. [ bib | .pdf | .pdf ]
[ansp05] Eric Anderson, Susan Spence, Ram Swaminathan, Mahesh Kallahalla, and Qian Wang. Quickly finding near-optimal storage designs. ACM Transactions on Computer Systems, 23(4):337-374, 2005. [ bib | DOI | .pdf ]
[liiy05] Lin Qiao, Balakrishna R. Iyer, Divyakant Agrawal, and Amr El Abbadi. SVL: Storage virtualization engine leveraging DBMS technology. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05), pages 1048-1059, 2005. [ bib | .pdf ]
[qiiy05] Lin Qiao, Balakrishna R. Iyer, Divyakant Agrawal, Amr El Abbadi, and Sandeep Uttamchandani. PULSTORE: Automated storage management with QoS guarantee. In Proc. International Conference on Autonomic Computing (ICAC'05), pages 302-303, 2005. [ bib | .pdf ]
[qiiy05b] Lin Qiao, Balakrishna R. Iyer, Divyakant Agrawal, Amr El Abbadi, and Sandeep Uttamchandani. PULSTORE: Automated storage management with QoS guarantee in large-scale virtualized storage systems. Unpublished longer version of the ICAC'05 publication [qiiy05], 2005. [ bib | .pdf ]
[utyi05] Sandeep Uttamchandani, Li Yin, Guillermo A. Alvarez, John Palmer, and Gul Agha. CHAMELEON: A self-evolving, fully-adaptive resource arbitrator for storage systems. In Proc. USENIX 2005 Annual Technical Conference, pages 75-88, 2005. [ bib | .pdf | .pdf ]
[wech04] Wei Jin, Jeffrey S. Chase, and Jasleen Kaur. Interposed proportional sharing for a storage service utility. In Proc. International Conference on Measurements and Modeling of Computer Systems (SIGMETRICS'04), pages 37-48, June 2004. [ bib | .pdf ]
[dech03] Murthy Devarakonda, David Chess, Ian Whalley, Alla Segal, Pawan Goyal, Aamer Sachedina, Keri Romanufa, Ed Lassettre, William Tetzlaff, and Bill Arnold. Policy-based autonomic storage allocation. In Proc. 14th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM), number 2867 in Lecture Notes in Computer Science, pages 143-154. Springer-Verlag, 2003. [ bib | .pdf ]
[goja03] Pawan Goyal, Divyesh Jadav, Dharmendra S. Modha, and Renu Tewari. CacheCOW: QoS for storage system caches. In Eleventh International Workshop on Quality of Service (IWQoS 03), 2003. [ bib | CiteSeer | .pdf ]
Dynamic algorithms for allocating buffer space in the face of a multi-class workload with QoS (mean response time) requirements for each class.
[lume03] Christopher Lumb, Arif Merchant, and Guillermo Alvarez. Facade: virtual storage devices with performance guarantees. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies, pages 131-144, 2003. [ bib | CiteSeer | .pdf ]
Enforcement of SLOs for disk. SLOs are load/response time curves for reads and writes. Enforcement mechanism throttles requests from hosts to storage devices. Assumes offered loads are feasible for the underlying storage devices. Uses real-time scheduling (EDF) to put requests into device queues so that deadline targets are met. Device queue lengths are managed with feedback control.
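A minimal sketch of the two mechanisms named above: an earliest-deadline-first (EDF) queue in front of the device, and a feedback loop that adjusts how many requests may be outstanding at the device. The class name, the deadline rule (arrival time plus a per-request latency target), and the halve-on-miss/grow-by-one controller are assumptions for illustration, not Facade's actual controller.

```python
import heapq


class FacadeLikeScheduler:
    """Hypothetical sketch: requests wait in an EDF queue keyed by deadline;
    at most `depth` requests are outstanding at the device, and `depth` is
    adjusted by feedback on observed device latency."""

    def __init__(self, depth=4, target_latency=10.0):
        self.pending = []     # heap of (deadline, seq, request)
        self.seq = 0          # tie-breaker for equal deadlines
        self.depth = depth    # current device queue limit
        self.target = target_latency
        self.outstanding = 0

    def submit(self, request, arrival, latency_target):
        deadline = arrival + latency_target
        heapq.heappush(self.pending, (deadline, self.seq, request))
        self.seq += 1

    def dispatch(self):
        """Release requests to the device, earliest deadline first."""
        released = []
        while self.pending and self.outstanding < self.depth:
            _, _, request = heapq.heappop(self.pending)
            self.outstanding += 1
            released.append(request)
        return released

    def complete(self, device_latency):
        self.outstanding -= 1
        # Assumed feedback rule: halve the device queue when it is too slow,
        # otherwise let it grow by one.
        if device_latency > self.target:
            self.depth = max(1, self.depth // 2)
        else:
            self.depth += 1
```

Keeping the device queue short limits how long a newly arrived urgent request can wait behind requests already handed to the device; keeping it long improves device throughput. The feedback loop trades these off.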
[anho02] Eric Anderson, Michael Hobbs, Kimberly Keeton, Susan Spence, Mustafa Uysal, and Alistair Veitch. Hippodrome: running circles around storage administration. In Conference on File and Storage Technology (FAST'02), pages 175-188, January 2002. [ bib | .pdf | .pdf ]
Given a workload, configures a block-level storage system. The workload is described as stores (logically contiguous sets of blocks) and streams (details in [veke01]). A workload analysis tool can produce a stream-based workload description from a request trace. A configuration consists of the number of disks, the grouping of disks into arrays, the division of arrays into logical units, disk controller and cache settings, and a mapping of the stores in the workload onto the logical units. Includes a migration component to move the storage system between configurations. Details of the configuration finder are in [anka01]. The iterative approach does not assume that the workload remains constant as the system configuration changes.
[shvi02] Prashant J. Shenoy and Harrick M. Vin. Cello: A disk scheduling framework for next generation operating systems. Real-Time Systems, 22(1-2):9-48, 2002. [ bib ]
[waos02] Julie Ward, Michael O'Sullivan, Troy Shahoumian, and John Wilkes. Appia: automatic storage area network design. In Conference on File and Storage Technology (FAST'02), pages 203-217, January 2002. [ bib | .pdf | .pdf ]
[anka01] E. Anderson, M. Kallahalla, S. Spence, R. Swaminathan, and Q. Wang. Ergastulum: quickly finding near-optimal storage system designs. Technical Report HPL-SSP-2001-5, HP Laboratories, July 2001. [ bib | .pdf | .pdf ]
Input includes a workload description in terms of logical stores and streams of accesses to the stores, a description of the available physical devices, a set of constraints on how those devices may be configured (e.g., maximum utilizations or total system cost), and finally a cost function that can be used to compare alternative configurations. The output includes a grouping of devices into RAID logical units (LUs) and settings for configuration parameters, such as stripe sizes. Search through the design space is heuristic. Cost functions are externally defined, so the system cannot exploit their structure to improve the search.
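As an illustration of heuristic design search with an externally supplied cost function, the sketch below packs stores onto logical units with a best-fit-decreasing heuristic for each candidate LU type and then compares the resulting designs using a black-box `cost` callable. The data layout (sizes, utilizations in integer percent, the candidate LU dictionaries) and the function names are hypothetical; this is not Ergastulum's actual search algorithm, and it assumes every store fits in an empty LU.

```python
def greedy_assign(stores, lu_capacity, lu_max_util):
    """Best-fit-decreasing placement of stores onto logical units (LUs).

    stores: list of (name, size, utilization_percent). Returns a list of LUs,
    each a list of store names. A new LU is opened when no existing LU fits.
    Assumes every store fits in an empty LU."""
    lus = []   # each entry: [free_capacity, free_utilization, store_names]
    for name, size, util in sorted(stores, key=lambda s: s[1], reverse=True):
        fits = [lu for lu in lus if lu[0] >= size and lu[1] >= util]
        if fits:
            target = min(fits, key=lambda x: x[0])   # tightest capacity fit
        else:
            target = [lu_capacity, lu_max_util, []]
            lus.append(target)
        target[0] -= size
        target[1] -= util
        target[2].append(name)
    return [lu[2] for lu in lus]


def best_design(stores, candidate_lus, cost):
    """Try each candidate LU type; `cost` is supplied externally and treated
    as a black box, so the search cannot exploit its structure."""
    designs = []
    for lu_type in candidate_lus:
        layout = greedy_assign(stores, lu_type["capacity"], lu_type["max_util"])
        designs.append((cost(lu_type, layout), lu_type["name"], layout))
    return min(designs)   # lowest-cost (cost, name, layout) triple
```

Because the cost function is opaque, the search can only generate candidate designs and score them, which is the limitation noted in the annotation above.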
[wilk01] John Wilkes. Traveling to Rome: QoS specifications for automated storage system management. In Proc. Intl. Workshop on Quality of Service (IWQoS'2001), number 2092 in Lecture Notes in Computer Science, pages 75-91. Springer-Verlag, June 2001. [ bib | CiteSeer | .pdf ]
Provides an historical overview of an HP effort in automated storage system management. Provides examples of specification language used to describe workloads and system configurations.
[albo01] Guillermo A. Alvarez, Elizabeth Borowsky, Susie Go, Theodore H. Romer, Ralph Becker-Szendy, Richard Golding, Arif Merchant, Mirjana Spasojevic, Alistair Veitch, and John Wilkes. Minerva: An automated resource provisioning tool for large-scale storage systems. ACM Transactions on Computer Systems, 19(4):483-518, 2001. [ bib | .ps.Z | .ps.Z ]
[brbr99] John L. Bruno, Jose Carlos Brustoloni, Eran Gabber, Banu Ozden, and Abraham Silberschatz. Disk scheduling with quality of service guarantees. In IEEE International Conference on Multimedia Computing and Systems (ICMCS 1999), Vol. 2, pages 400-405, 1999. [ bib | CiteSeer | .pdf ]

Storage System Performance

[keco04] Terence Kelly, Ira Cohen, Moises Goldszmidt, and Kimberly Keeton. Inducing models of black-box storage arrays. Technical Report HPL-SSP-2004-108, HP Laboratories, Palo Alto, California, June 2004. [ bib | .pdf | .pdf ]
[anka04] Eric Anderson, Mahesh Kallahalla, Mustafa Uysal, and Ram Swaminathan. Buttress: A toolkit for flexible and high fidelity I/O benchmarking. In Proc. of the 3rd USENIX Symposium on File and Storage Technologies (FAST'04), 2004. [ bib | .pdf | .pdf ]
[keme04] Kimberly Keeton and Arif Merchant. A framework for evaluating storage system dependability. In Proceedings of the International Conference on Dependable Systems and Networks, 2004. [ bib | .pdf | .pdf ]
[veke03] Alistair Veitch and Kim Keeton. The Rubicon workload characterization tool. Technical Report HPL-SSP-2003-13, HP Laboratories, March 2003. [ bib | .pdf | .pdf ]
Rubicon is a configurable tool for analyzing workload traces. Analysis is accomplished by Rubicon components (reporters, filters) which can be combined to form pipelines to accomplish analysis tasks.
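The idea of composing analysis components into pipelines can be sketched with Python generators: filters transform a stream of trace records, and a reporter consumes the stream and summarizes it. The component names and the trace-record fields (`op`, `size`) are hypothetical, not Rubicon's actual component set.

```python
from collections import Counter


def read_filter(records):
    """Filter component: passes through only read requests."""
    return (r for r in records if r["op"] == "read")


def min_size_filter(min_bytes):
    """Filter factory: passes through requests of at least min_bytes."""
    def stage(records):
        return (r for r in records if r["size"] >= min_bytes)
    return stage


def size_histogram_reporter(records):
    """Reporter component: counts requests by size."""
    return Counter(r["size"] for r in records)


def pipeline(*filters, reporter):
    """Chain filters in order and feed their output to a reporter."""
    def run(records):
        for f in filters:
            records = f(records)
        return reporter(records)
    return run
```

For example, `pipeline(read_filter, min_size_filter(1024), reporter=size_histogram_reporter)` builds an analysis that reports the size distribution of large reads; generators keep memory use bounded even for long traces.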
[kuke03] Zachary Kurmas, Kimberly Keeton, and Kenneth Mackenzie. Synthesizing representative I/O workloads using iterative distillation. In 11th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'03), pages 6-15, 2003. [ bib | .pdf | .pdf ]
[mela02] O. Mesut and N. Lambert. HDD characterization for A/V streaming applications. IEEE Transactions on Consumer Electronics, 48(3):802-807, August 2002. [ bib ]
[ande01] Eric Anderson. Simple table-based modeling of storage devices. Technical Report HPL-SSP-2001-4, HP Laboratories, Palo Alto, California, July 2001. [ bib | .pdf | .pdf ]
[uyal01] Mustafa Uysal, Guillermo A. Alvarez, and Arif Merchant. A modular, analytical throughput model for modern disk arrays. In Proceedings of the Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunications Systems (MASCOTS-2001), pages 183-192, 2001. [ bib | .pdf | .pdf ]
[shme98] Elizabeth Shriver, Arif Merchant, and John Wilkes. An analytic behavior model for disk drives with readahead caches and request reordering. In Proceedings of the Sigmetrics '98, pages 182-191, June 1998. [ bib | .ps | .pdf ]

Security and Access Control

[kige06] Daniel Kifer and J. E. Gehrke. Injecting utility into anonymized datasets. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'06), June 2006. [ bib | .pdf | .pdf ]
[mage06] Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer, and Muthuramakrishnan Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In Proc. IEEE International Conference on Data Engineering (ICDE'06), April 2006. [ bib | .pdf | .pdf ]
[kara06] Govind Kabra, Ravishankar Ramamurthy, and S. Sudarshan. Redundancy and information leakage in fine-grained access control. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'06), pages 133-144, 2006. [ bib | .pdf ]
[yusr04] Ting Yu, Divesh Srivastava, Laks V. S. Lakshmanan, and H. V. Jagadish. A compressed accessibility map for XML. ACM Transactions on Database Systems (TODS), 29(2):363-402, June 2004. [ bib ]
This is the journal version of [yusr02].
[fach04] Wenfei Fan, Chee Yong Chan, and Minos Garofalakis. Secure XML querying with security views. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04), 2004. [ bib | .pdf | .pdf ]
Deals with a subset of XPath that is more general than twig queries. Defines a language for specifying access controls by annotating the document DTD, an algorithm for automatically deriving the security view (including a view DTD) from the access control specification, and an algorithm for rewriting queries defined over the view so that they can be evaluated against the base data.
[rime04] Shariq Rizvi, Alberto O. Mendelzon, S. Sudarshan, and Prasan Roy. Extending query rewriting techniques for fine-grained access control. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04), 2004. [ bib | .pdf ]
Defines parameterized, user-specific authorization views over relational tables. Unlike [fach04], argues that query processing should be authorization-transparent, that is, queries are expressed against the base tables, not the authorization views. Such a query is said to be valid if it can be rewritten using only authorization views. Invalid queries are rejected, so this is a go/no-go security model, like that of SQL. Validity testing is undecidable unless the query language is restricted, e.g., to conjunctive queries.
[rowi04] Arnon Rosenthal and Marianne Winslett. Security of shared data in large systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04), 2004. tutorial presentation. [ bib | http | .pdf ]
[cham02] SungRan Cho, Sihem Amer-Yahia, Laks V. S. Lakshmanan, and Divesh Srivastava. Optimizing the secure evaluation of twig queries. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pages 490-501, August 2002. [ bib | .pdf | .pdf ]
[yusr02] Ting Yu, Divesh Srivastava, Laks V. S. Lakshmanan, and H. V. Jagadish. Compressed accessibility map: Efficient access control for XML. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB), pages 478-489, August 2002. [ bib | .pdf | .pdf ]
[stfa02] Andrei Stoica and Csilla Farkas. Secure XML views. In Research Directions in Data and Applications Security, IFIP WG 11.3 Sixteenth International Conference on Data and Applications Security, volume 256 of IFIP Conference Proceedings, pages 133-146. Kluwer, July 2002. [ bib | .pdf ]
[swee02] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557-570, 2002. [ bib | .pdf | .pdf ]
[motr89] A. Motro. An access authorization model for relational databases based on algebraic manipulation of view definitions. In International Conference on Data Engineering (ICDE'89), pages 339-347, 1989. [ bib ]
[lamp71] Butler W. Lampson. Protection. In Proc. Fifth Princeton Symposium on Information Sciences and Systems, pages 437-443, March 1971. Reprinted in Operating Systems Review, 8, 1, January 1974, pp. 18-24. [ bib | .pdf ]

Uncategorized Papers

[loch14] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Barroso, and Christos Kozyrakis. Towards energy proportionality for large-scale latency-critical workloads. In Proc. Int'l Symp. on Computer Architecture, June 2014. [ bib | .pdf | .pdf ]
[mawe14] Nirmesh Malviya, Ariel Weisberg, Samuel Madden, and Michael Stonebraker. Rethinking main memory OLTP recovery. In Proc. IEEE Int'l Conf. on Data Engineering, pages 604-615, 2014. [ bib | .pdf ]
[volt14] VoltDB technical overview. VoltDB whitepaper, 2014. Downloaded January 2014. [ bib | .pdf | .pdf ]
[stwe13] Michael Stonebraker and Ariel Weisberg. The VoltDB main memory DBMS. Bulletin of the IEEE Technical Committee on Data Engineering, 36(2):21-27, June 2013. [ bib | .pdf ]
[bafe13] Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. HAT, not CAP: Towards highly available transactions. In Proc. Workshop on Hot Topics in Operating Systems, May 2013. [ bib | .pdf ]
[kalm13] David Kalmuk. Understanding the DB2 process model architecture in warehouse and PureScale environments. IDUG DB2 Tech Conference presentation, May 2013. [ bib | .pdf ]
[krpa13] Tim Kraska, Gene Pang, Michael Franklin, Samuel Madden, and Alan Fekete. MDCC: Multi-data center consistency. In Proc. EuroSys Conf., pages 113-126, April 2013. [ bib | .pdf | .pdf ]
[vade13] Tamas Vajk, Laszlo Deak, Krisztian Fekete, and Gergely Mezei. Automatic NoSQL schema development: A case study. In Proc. IASTED Int'l Conf. Parallel and Distributed Computing and Networks (PDCN 2013), pages 656-663, February 2013. [ bib | .pdf ]
[agda13] Divyakant Agrawal, Sudipto Das, and Amr. Data Management in the Cloud: Challenges and Opportunities. Number 32 in Synthesis Lectures on Data Management. Morgan & Claypool, 2013. [ bib | .pdf ]
[beda13] Philip A. Bernstein and Sudipto Das. Rethinking eventual consistency. In Proc. ACM SIGMOD Int'l Conf. on Management of Data, 2013. [ bib | .pdf | .pdf ]
This is a short overview of a SIGMOD tutorial.
[difr13] Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Ake Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, and Mike Zwilling. Hekaton: SQL Server's memory-optimized OLTP engine. In Proc. ACM SIGMOD Int'l Conf. on Management of Data, pages 1243-1254, 2013. [ bib | DOI | .pdf | .pdf ]
[naag13] Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. Message futures: Fast commitment of transactions in multi-datacenter environments. In Proc. Conf. on Innovative Data Systems Research, 2013. [ bib | .pdf | .pdf ]
[tepr13] Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, and Hussam Abu-Libdeh. Consistency-based service level agreements for cloud storage. In Proc. Symp. on Operating Systems Principles, 2013. [ bib | DOI | .pdf ]
Describes a transactional key-value store called Pileus which allows applications to define an SLA for each Get operation. SLA describes the application's preferred performance/consistency tradeoff. SLA is a list of latency/consistency/utility triples, with earlier items preferred to later items. System tries to achieve SLA by controlling which and how many replicas to use for each Get request.
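A minimal sketch of how a client might pick a replica for one Get under such an SLA, assuming each replica is summarized by a single latency estimate and the strongest consistency level it can currently serve. The class and function names, the three consistency levels, and the "first satisfiable subSLA wins" rule are hypothetical simplifications; the paper's own selection logic may differ.

```python
from dataclasses import dataclass

# Hypothetical consistency levels, ranked from strongest to weakest.
STRENGTH = {"strong": 3, "bounded": 2, "eventual": 1}

@dataclass
class SubSLA:
    consistency: str   # required consistency level
    latency_ms: float  # latency bound
    utility: float     # value to the application if this entry is met

@dataclass
class Replica:
    name: str
    est_latency_ms: float  # hypothetical client-side latency estimate
    best_consistency: str  # strongest level this replica can serve now

def choose_target(sla, replicas):
    """Return (replica name, subSLA index) for the earliest subSLA that some
    replica is expected to meet, using the fastest such replica; else None."""
    for i, sub in enumerate(sla):
        candidates = [
            r for r in replicas
            if STRENGTH[r.best_consistency] >= STRENGTH[sub.consistency]
            and r.est_latency_ms <= sub.latency_ms
        ]
        if candidates:
            fastest = min(candidates, key=lambda r: r.est_latency_ms)
            return fastest.name, i
    return None
```

For example, with a distant strongly consistent primary (120 ms) and a nearby eventually consistent replica (5 ms), an SLA of strong@50 ms, then eventual@50 ms, then eventual@500 ms is served by the nearby replica under its second entry.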
[tosc13] A. Tomic, D. Sciascia, and F. Pedone. MoSQL: An elastic storage engine for MySQL. In ACM Symposium on Applied Computing, DADS Track, 2013. [ bib | .pdf ]
[tuzh13] Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, and Samuel Madden. Speedy transactions in multicore in-memory databases. In Proc. Symp. on Operating Systems Principles, pages 18-32, 2013. [ bib | DOI | .pdf | .pdf ]
[zhpo13] Yang Zhang, Russell Power, Siyuan Zhou, Yair Sovran, Marcos K. Aguilera, and Jinyang Li. Transaction chains: achieving serializability with low latency in geo-distributed storage systems. In Proc. Symp. on Operating Systems Principles, 2013. [ bib | DOI | .pdf ]
Chops transactions into subtransactions, each of which executes at a single site. Uses static conflict analysis to determine whether subtransactions can be executed independently while ensuring the serializability of the whole outer transaction. User-initiated aborts can only occur in the first subtransaction, and the system acknowledges commit as soon as the first subtransaction succeeds. Implemented in a system called Lynx.
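A toy sketch of the commit-after-first-hop behavior, assuming each hop is a function applied to one site's data and that a hop which aborts does so before modifying anything. The names `run_chain`, `drain`, and `UserAbort` are hypothetical, and the static conflict analysis that makes this safe is not modeled.

```python
from collections import deque

class UserAbort(Exception):
    """Raised by the first hop to abort the whole chain."""

def run_chain(hops, sites, pending):
    """hops: list of (site_name, fn); each fn updates one site's dict.
    Runs the first hop now and acknowledges the client if it succeeds;
    the remaining hops are queued. Assumes a hop that aborts does so
    before changing anything."""
    first_site, first_fn = hops[0]
    try:
        first_fn(sites[first_site])
    except UserAbort:
        return False                 # nothing committed anywhere
    pending.extend(hops[1:])         # later hops are not allowed to abort
    return True                      # the client sees the commit here

def drain(sites, pending):
    """Background execution of the remaining hops, in chain order."""
    while pending:
        site, fn = pending.popleft()
        fn(sites[site])
```

In a funds transfer, the debit hop (which may abort on insufficient balance) runs first; the credit hop at the second site runs later and cannot abort.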
[atbu12] Paolo Atzeni, Francesca Bugiotti, and Luca Rossi. Uniform access to non-relational database systems: the SOS platform. In Proc. Int'l Conf. on Advanced Info. Systems Engineering, June 2012. [ bib | .pdf | .pdf ]
Describes a meta-system intended to abstract different types of NoSQL systems. Apps see the meta-system, which translates operations to underlying NoSQL systems. Uses an abstract schema consisting of structs, sets, and attributes.
[coli12] James Cowling and Barbara Liskov. Granola: Low-overhead distributed transaction coordination. In Proc. USENIX Annual Technical Conf., June 2012. [ bib | .pdf | .pdf ]
[bigd12] Challenges and opportunities with big data. white paper, February 2012. [ bib | .pdf | .pdf ]
[aubo12] A. Auradkar, C. Botev, S. Das, D. De Maagd, A. Feinberg, P. Ganti, L. Gao, B. Ghosh, K. Gopalakrishna, B. Harris, et al. Data infrastructure at LinkedIn. In Proc. IEEE Int'l Conf. on Data Engineering, pages 1370-1381, 2012. [ bib ]
Includes a description of Voldemort.
[bave12] Peter Bailis, Shivaram Venkataraman, Joseph M. Hellerstein, Michael Franklin, and Ion Stoica. Probabilistically bounded staleness for practical partial quorums. Technical Report UCB/EECS-2012-4, Dept. of EECS, University of California at Berkeley, January 2012. accepted to VLDB 2012. [ bib | .pdf ]
[cabl12] Ting Cao, Stephen M. Blackburn, Tiejun Gao, and Kathryn S. McKinley. The yin and yang of power and performance for asymmetric hardware and managed software. In Proc. Int'l Symp. on Computer Architecture, 2012. [ bib | .pdf | .pdf ]
[code12] James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google's globally-distributed database. In Proc. USENIX Conf. on Operating Systems Design and Implementation, 2012. [ bib | .pdf ]
Stores versioned (timestamped) key-to-value mappings, replicated using Paxos. The Paxos leader also implements a lock table and transaction manager. Transactions local to one group (Paxos instance) are handled by that group's leader; otherwise 2PC is used among group leaders. Groups span zones (datacenters). Underlying data are stored in Colossus (which is local to a data center?). Keys are grouped into common-prefix directories, and directories are assigned to groups. (There are also tablets; the difference between tablets and directories is not clear.) Responsibility for controlling replication is split between apps and admins: Spanner admins specify a set of replication options (number and placement of copies), and apps choose which of these options to use for each directory. A quasi-relational data model, with versioned values, is implemented on top of the basic key-to-value mapping. Relations are organized hierarchically, with row linkage based on common key prefixes. Each unique key in the top-level table corresponds to a directory.
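A sketch of how hierarchical keys could map to directories and how a transaction might then be routed, assuming rows are keyed by tuples whose first component is the top-level row's key, and that a hypothetical `placement` map assigns each directory to a Paxos group. This illustrates the note above; it is not Spanner's actual interface.

```python
def directory_of(key):
    """Keys are tuples; child rows repeat their parent's key as a prefix,
    so the first component identifies the top-level row, i.e. the directory."""
    return key[0]

def route(keys, placement):
    """placement: hypothetical map from directory to Paxos group id.
    One group -> handled by that group's leader; several -> 2PC among leaders."""
    groups = sorted({placement[directory_of(k)] for k in keys})
    if len(groups) == 1:
        return ("local", groups[0])
    return ("2pc", groups)
```

A user row `("u1",)` and its child album row `("u1", "album7")` share a directory, so a transaction touching both stays within one group.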
[krpa12] Tim Kraska, Gene Pang, Michael J. Franklin, and Samuel Madden. MDCC: Multi-data center consistency. Computing Research Repository (CoRR), abs/1203.6049(arXiv:1203.6049v1), 2012. [ bib | .pdf | http ]
[lipo12] Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno Preguica, and Rodrigo Rodrigues. Making geo-replicated systems fast as possible, consistent when necessary. In Proc. USENIX Conf. on Operating Systems Design and Implementation, pages 265-278, 2012. [ bib | .pdf | .pdf ]
[vaja12] Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, and Joel Emer. Scheduling heterogeneous multi-cores through performance impact estimation (PIE). In Proc. Int'l Symp. on Computer Architecture, 2012. [ bib | .pdf | .pdf ]
[vowa12] Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, and Beng Chin Ooi. LogBase: A scalable log-structured database system in the cloud. Proc. of the VLDB Endowment, 5(10):1004-1015, 2012. [ bib | .pdf | .pdf ]
Objectives include high write bandwidth and low read latency. Relational data abstraction. Data are vertically partitioned (based on a workload), and then horizontally partitioned within each vertical partition, resulting in tablets. Data are versioned. Interface is record oriented and includes get, put, insert, delete, and scan. Each server maintains a single log for all tablets it is responsible for, plus a multi-version index to locate records on reads. The index is checkpointed to disk periodically to reduce recovery time. Multi-version optimistic concurrency control is used to provide SI over multiple records.
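A toy sketch of a single shared log with an in-memory multi-version index, assuming a Python list stands in for the append-only log file and versions are identified by integer timestamps. Names are hypothetical; checkpointing, partitioning, and concurrency control are omitted.

```python
import bisect

class LogServer:
    """One append-only log shared by all tablets on a server, plus an
    in-memory multi-version index (tablet, key) -> sorted [(ts, offset)]."""

    def __init__(self):
        self.log = []    # entries: (tablet, key, ts, value)
        self.index = {}

    def put(self, tablet, key, ts, value):
        offset = len(self.log)
        self.log.append((tablet, key, ts, value))  # sequential write only
        bisect.insort(self.index.setdefault((tablet, key), []), (ts, offset))
        return offset

    def get(self, tablet, key, as_of):
        """Return the newest value with ts <= as_of, or None."""
        versions = self.index.get((tablet, key), [])
        # (as_of, inf) sorts after every (as_of, offset) pair.
        i = bisect.bisect_right(versions, (as_of, float("inf")))
        if i == 0:
            return None
        _, offset = versions[i - 1]
        return self.log[offset][3]
```

Writes never update data in place; the index alone determines which log entry a read at a given timestamp should see.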
[wepi11] Zhou Wei, Guillaume Pierre, and Chi-Hung Chi. CloudTPS: Scalable transactions for Web applications in the cloud. IEEE Transactions on Services Computing, 5(4):525-539, 2012. [ bib | .pdf | .pdf ]
[llfr11] Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In Proc. Symp. on Operating Systems Principles, October 2011. [ bib | DOI | .pdf ]
[pore11] Raluca Ada Popa, Catherine Redfield, Nickolai Zeldovich, and Hari Balakrishnan. CryptDB: Protecting confidentiality with encrypted query processing. In Proc. Symp. on Operating Systems Principles, October 2011. [ bib | .pdf | .pdf ]
[sopo11] Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li. Transactional storage for geo-replicated systems. In Proc. Symp. on Operating Systems Principles, October 2011. [ bib | .pdf | .pdf ]
[kovi11] Ioannis Koltsidas and Stratis D. Viglas. Data management over flash memory (tutorial presentation). In Proc. ACM SIGMOD Int'l Conf. on Management of Data, June 2011. [ bib | .pdf ]
[mrys11] Michael Rys. Scalable SQL. Communications of the ACM, 54(6):48-53, June 2011. [ bib ]
Discusses data and functional partitioning in fairly generic terms. Also includes a case study of scaleout for MySpace, using SQL Server.
[shmi11] Mohammad Bilal Sheikh, Umar Farooq Minhas, Omar Zia Khan, Ashraf Aboulnaga, Pascal Poupart, and David J. Taylor. A Bayesian approach to online performance modeling for database appliances using Gaussian models. In Proc. Int'l Conf. on Autonomic Computing, June 2011. [ bib | .pdf ]
[bihu11] Kenneth P. Birman, Qi Huang, and Dan Freedman. Overcoming the D in CAP: Using Isis2 to build locally responsive cloud services. Technical report, Cornell University, April 2011. unnumbered technical report. [ bib | .pdf | .pdf ]
[lajo11] Horacio Lagar, Kaustubh Joshi, Matti Hiltunen, Roy Bryant, Eyal de Lara, Alexey Tumanov, Olga Irzak, and Adin Scannell. Kaleidoscope: Cloud micro-elasticity via VM state coloring. In Proc. EuroSys Conf., April 2011. [ bib | .pdf | .pdf ]
[babo11] Jason Baker, Chris Bond, James Corbett, J.J. Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In Proc. Conf. on Innovative Data Systems Research, January 2011. [ bib | .pdf | .pdf ]
Data are partitioned into entity groups, each independently and synchronously replicated over a wide area. Multiple data centers, each with a NoSQL data store (BigTable). Transactions allowed within entity groups, but not across. Transactions seem to be reads then writes. Uses Paxos to do wide-area replication of a log that includes all transactions' updates, and to get agreement on their order of execution. Concurrency control is effectively optimistic: if two transactions are writing in the same entity group at the same time, one may be aborted and retried.
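A minimal sketch of the optimistic commit rule described above, assuming one in-memory list stands in for an entity group's replicated log (the Paxos agreement on each log position is not modeled). A transaction remembers the log position it read from and may claim only the next slot; a loser retries. All names are hypothetical.

```python
class EntityGroup:
    """Stand-in for one entity group's replicated log."""

    def __init__(self):
        self.log = []  # each entry is a dict of writes

    def state(self):
        data = {}
        for writes in self.log:   # replay the log in agreed order
            data.update(writes)
        return data

    def try_commit(self, read_pos, writes):
        """Claim log position read_pos; fail if another txn got it first."""
        if len(self.log) != read_pos:
            return False
        self.log.append(writes)
        return True

def run_txn(group, fn, max_retries=5):
    """fn(snapshot) -> dict of writes. Read, compute, then try to commit."""
    for _ in range(max_retries):
        read_pos = len(group.log)
        writes = fn(group.state())
        if group.try_commit(read_pos, writes):
            return True
    return False
```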
[becs11] Philip A. Bernstein, Istvan Cseri, Nishant Dani, Nigel Ellis, Ajay Kallan, Gopal Kakivaya, David B. Lomet, Ramesh Manne, Lev Novik, and Tomas Talius. Adapting Microsoft SQL Server for cloud computing. In Proc. IEEE Int'l Conf. on Data Engineering, 2011. [ bib | .pdf ]
Describes Cloud SQL Server, used by SQL Azure DBMS-as-a-service. Uses partitioning, transactions confined to a single partition. Each DBMS instance has private storage. Synchronous master-slave DBMS-level replication for HA.
[cujo11] Carlo Curino, Evan Jones, Raluca Ada Popa, Nirmesh Malviya, Eugene Wu, Samuel Madden, Hari Balakrishnan, and Nickolai Zeldovich. Relational Cloud: A database service for the cloud. In Proc. Conf. on Innovative Data Systems Research, January 2011. [ bib | .pdf | .pdf ]
Multiple multi-tenant DBMS, each hosting one or more workloads. Large workloads can be scaled out over multiple DBMS using workload-aware partitioning.
[goli11] Wojciech M. Golab, Xiaozhou Li, and Mehul A. Shah. Analyzing consistency properties for fun and profit. Technical Report HPL-2011-6, HP Laboratories, 2011. Technical report version of a PODC'11 paper. [ bib | .pdf ]
[hewi11] Eben Hewitt. Cassandra: The Definitive Guide. O'Reilly, 2011. [ bib | .pdf ]
[nawi11] Mahdi Tayarani Najaran, Primal Wijesekera, Andrew Warfield, and Norman C. Hutchinson. Distributed indexing and locking: In search of scalable consistency. In Proc. Workshop on Large Scale Distributed Systems and Middleware, 2011. [ bib | .pdf ]
[onru11] Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, and Mendel Rosenblum. Fast crash recovery in RAMCloud. In Proc. Symp. on Operating Systems Principles, pages 29-41, 2011. [ bib | .pdf | .pdf ]
RAMCloud appears to apps as a simple key-value storage system. Tries to provide low-latency (5-10 µs) access times, which requires InfiniBand. Tables are broken into tablets (contiguous ranges of keys), and each tablet is assigned to a server. Data are organized in memory as a log. On a write, the server inserts a new record into the log and updates a hash table that indicates where the record can be found; it also pushes the change to several backup copies of the log on other servers. Periodic garbage collection reclaims log segments. Backups buffer their copies of the log in memory and gradually flush them to disk (not on every update). Suggests the use of battery backup or capacitors to ensure that unflushed updates can make it to disk if a backup server loses power. Recovery is highly parallelized: the failed server's keys are repartitioned among multiple recovery masters, each of which recovers its part of the key space and becomes the master for that part of the key space.
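A toy sketch of the write path and a simple replay-based recovery, assuming a single flat log (no segments, no cleaner, no parallel recovery masters) and in-memory backup buffers. Class and function names are hypothetical.

```python
class Backup:
    def __init__(self):
        self.buffer = []  # held in memory; lazy flushing to disk not modeled

    def append(self, entry):
        self.buffer.append(entry)

class Master:
    """Append-only in-memory log plus a hash table from key to log position;
    every append is also sent to the backups."""

    def __init__(self, backups):
        self.log = []        # entries: (key, value)
        self.table = {}      # key -> position of the newest entry
        self.backups = backups

    def write(self, key, value):
        entry = (key, value)
        self.log.append(entry)
        self.table[key] = len(self.log) - 1
        for b in self.backups:
            b.append(entry)  # replicate before acknowledging the write

    def read(self, key):
        pos = self.table.get(key)
        return None if pos is None else self.log[pos][1]

    def live_entries(self):
        """Entries still referenced by the hash table; a cleaner would keep
        these and reclaim everything else."""
        return [self.log[p] for p in sorted(self.table.values())]

def recover(backup, new_backups):
    """Rebuild a master by replaying a backup's log in order (newest wins)."""
    m = Master(new_backups)
    for key, value in backup.buffer:
        m.write(key, value)
    return m
```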
[pato11] Ippokratis Pandis, Pinar Tozun, Ryan Johnson, and Anastasia Ailamaki. PLP: Page latch-free shared-everything OLTP. Proc. of the VLDB Endowment, 4(10):610-621, 2011. [ bib | .pdf | .pdf ]
[safr11] Ido Safruti. The great mobile slowdown. Cotendo white paper, 2011. [ bib | .pdf ]
[sopr11] João Soares and Nuno Preguiça. Combining mobile and cloud storage for providing ubiquitous data access. In Proc. Int'l Euro-Par Conf. on Parallel Processing, 2011. [ bib | .pdf | .pdf ]
[trag11] Nguyen Tran, Marcos K. Aguilera, and Mahesh Balakrishnan. Online migration for geo-distributed storage systems. In Proc. USENIX Annual Technical Conf., 2011. [ bib | .pdf | .pdf ]
Describes an abstraction called overlays for data migration in distributed key-value storage systems.
[catt10] Rick Cattell. Scalable SQL and NoSQL data stores. SIGMOD Record, 39(4):12-27, December 2010. [ bib | .pdf | .pdf ]
[adbo10] Sarita V. Adve and Hans-J. Boehm. Memory models: A case for rethinking parallel languages and hardware. Communications of the ACM, 53(8):90-101, August 2010. [ bib ]
Excellent overview of hardware and high-level language memory models.
[stuh10] Julian Stuhler. IBM DB2 pureScale: The next big thing or a solution looking for a problem? Database Journal, July 2010. [ bib | http ]
[brho10] Erik Brynjolfsson, Paul Hofmann, and John Jordan. Cloud computing and electricity: Beyond the utility model. Communications of the ACM, 53(5):32-34, May 2010. [ bib | .pdf ]
Discussion of technical and business strengths and weaknesses of the utility computing model, including security, lock-in and interoperability.
[durk10] Dave Durkee. Why cloud computing will never be free. Communications of the ACM, 53(5):62-69, May 2010. [ bib ]
Discusses cloud service pricing, the cloud computing marketplace, strategies that may be used by vendors to keep costs low, and weaknesses of current cloud SLAs. Then discusses requirements for Cloud 2.0, meaning cloud services intended to support critical enterprise applications. Issues include storage system performance: argues that access randomness and working set size are proportional to the number of applications supported by a shared storage service. Also discusses administration, SLAs and automation.
[scne10] Daniel J. Scales, Mike Nelson, and Ganesh Venkitachalam. The design and evaluation of a practical system for fault-tolerant virtual machines. Technical Report VMware-TR-2010-001, VMware, May 2010. [ bib | .pdf ]
[cami10] Mustafa Canim, George A. Mihaila, Bishwaranjan Bhattacharjee, Kenneth A. Ross, and Christian A. Lang. SSD bufferpool extensions for database systems. Proc. of the VLDB Endowment, 3(2):1435-1446, 2010. [ bib | .pdf | .pdf ]
[daag10a] Sudipto Das, Shashank Agarwal, Divyakant Agrawal, and Amr El Abbadi. ElasTraS: An elastic, scalable, and self managing transactional database for the cloud. Technical Report 2010-04, University of California, Santa Barbara, 2010. [ bib | .pdf ]
[dani10] Sudipto Das, Shoji Nishimura, Divyakant Agrawal, and Amr El Abbadi. Live database migration for elasticity in a multitenant database for cloud platforms. Technical Report 2010-09, Department of Computer Science, University of California Santa Barbara, 2010. [ bib | .pdf ]
[dese10] Biplob Debnath, Sudipta Sengupta, and Jin Li. FlashStore: High throughput persistent key-value store. Proc. of the VLDB Endowment, 3(2):1414-1425, 2010. [ bib | .pdf | .pdf ]
Writes are collected in RAM and batched to SSD in chunks large enough to fill a flash page. A hash table in memory is used to index key-value pairs in the SSD. There is also a read cache in RAM. Berkeley DB is used to index key-value records on disk. A record read checks the RAM read cache, then the RAM write buffer, then SSD, then disk. All reads are added to the RAM read cache. Records are inserted into the SSD when they are written (after staging). SSD pages are organized as a ring buffer. When the SSD fills, records on the oldest pages are recycled, either by reinserting them into the SSD or by destaging them to disk. A clock-like algorithm (with a recent-reference bit) is used to determine whether a record is reinserted into SSD or destaged to disk.
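A toy sketch of the lookup order and the clock-like recycling policy, assuming the SSD is a ring of single-record slots rather than flash pages and that a plain dict stands in for the Berkeley DB disk store. All names are hypothetical.

```python
class HybridStore:
    """RAM read cache + RAM write buffer + SSD ring + disk."""

    def __init__(self, ssd_slots):
        self.read_cache = {}    # RAM read cache (unbounded in this sketch)
        self.write_buffer = {}  # RAM staging area for writes
        self.ssd = []           # ring, oldest first: [key, value, ref_bit]
        self.ssd_slots = ssd_slots  # assumed >= 1
        self.disk = {}          # stand-in for the Berkeley DB store

    def put(self, key, value):
        self.read_cache.pop(key, None)  # keep the read cache from going stale
        self.write_buffer[key] = value

    def flush(self):
        """Move staged writes into the SSD ring (the batching step)."""
        for key, value in self.write_buffer.items():
            self._ssd_insert(key, value)
        self.write_buffer.clear()

    def _ssd_insert(self, key, value):
        self.ssd = [s for s in self.ssd if s[0] != key]  # drop older version
        while len(self.ssd) >= self.ssd_slots:
            old_key, old_value, ref = self.ssd.pop(0)     # oldest slot
            if ref:
                # Recently read: clear the bit and give it another lap.
                self.ssd.append([old_key, old_value, False])
            else:
                self.disk[old_key] = old_value           # destage to disk
        self.ssd.append([key, value, False])

    def get(self, key):
        if key in self.read_cache:
            return self.read_cache[key]
        if key in self.write_buffer:
            value = self.write_buffer[key]
        else:
            for slot in self.ssd:
                if slot[0] == key:
                    slot[2] = True          # set the recent-reference bit
                    value = slot[1]
                    break
            else:
                value = self.disk.get(key)
        if value is not None:
            self.read_cache[key] = value    # every read populates the cache
        return value
```

The loop in `_ssd_insert` always terminates: each pass either destages a record or clears one reference bit.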
[feze10] Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten. SPORC: Group collaboration using untrusted cloud resources. In Proc. USENIX Conf. on Operating Systems Design and Implementation, 2010. [ bib | .pdf | .pdf ]
[guku10] Ajay Gulati, Chethan Kumar, Irfan Ahmad, and Karan Kumar. BASIL: Automated IO load balancing across storage devices. In USENIX Conference on File and Storage Technology (FAST'10), 2010. [ bib | .pdf ]
[jopa10] Ryan Johnson, Ippokratis Pandis, Radu Stoica, Manos Athanassoulis, and Anastasia Ailamaki. Aether: A scalable approach to logging. Proc. of the VLDB Endowment, 3(1):681-692, 2010. [ bib | .pdf | .pdf ]
Includes a performance evaluation of Early Lock Release (release locks before the commit record goes to disk, but do not return results to the client, and ensure that subsequent transactions' commits are dependent on this one). Also asynchronous log flushing (called flush pipelining) so that threads don't context-switch while waiting for log I/O. Also a technique for parallelizing log buffer insertion.
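A minimal single-threaded sketch of early lock release, assuming exclusive locks only and a log in which commit records receive increasing LSNs. Because the log is sequential, a transaction that reads data released early gets a higher commit LSN and so cannot become durable before the transaction it depends on. Names are hypothetical; flush pipelining and parallel buffer insertion are not modeled.

```python
class ElrEngine:
    """Toy log manager with early lock release (ELR)."""

    def __init__(self):
        self.next_lsn = 0
        self.durable_lsn = -1
        self.waiting = []   # (commit_lsn, txn) whose results are withheld
        self.locks = {}     # item -> txn holding an exclusive lock

    def lock(self, txn, item):
        holder = self.locks.get(item)
        if holder is not None and holder != txn:
            return False            # caller would block or retry
        self.locks[item] = txn
        return True

    def commit(self, txn):
        """Insert the commit record and release locks immediately (ELR)."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.waiting.append((lsn, txn))
        self.locks = {k: t for k, t in self.locks.items() if t != txn}
        return lsn

    def flush(self):
        """Make the whole buffer durable; return txns that may now reply."""
        self.durable_lsn = self.next_lsn - 1
        done = [t for lsn, t in self.waiting if lsn <= self.durable_lsn]
        self.waiting = [(lsn, t) for lsn, t in self.waiting
                        if lsn > self.durable_lsn]
        return done
```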
[joab10] Evan P.C. Jones, Daniel J. Abadi, and Samuel Madden. Low overhead concurrency control for partitioned main memory databases. In Proc. ACM SIGMOD Int'l Conf. on Management of Data, pages 603-614, 2010. [ bib | DOI | .pdf | .pdf ]
[jobo10] William K. Josephson, Lars A. Bongo, David Flynn, and Kai Li. DFS: A file system for virtualized flash storage. In USENIX Conference on File and Storage Technology (FAST'10), 2010. [ bib | .pdf ]
[leig10] Tom Leighton. Akamai and cloud computing: A perspective from the edge of the cloud. Akamai white paper, 2010. [ bib | .pdf ]
[lizh10] Zhichun Li, Ming Zhang, Zhaosheng Zhu, Yan Chen, Albert Greenberg, and Yi-Min Wang. WebProphet: Automating performance prediction for web services. In Proc. USENIX Conf. on Networked Systems Design and Implementation, 2010. [ bib | .pdf | .pdf ]
[mase10] Prince Mahajan, Srinath Setty, Sangmin Lee, Allen Clement, Lorenzo Alvisi, Mike Dahlin, and Michael Walfish. Depot: Cloud storage with minimal trust. In Proc. USENIX Conf. on Operating Systems Design and Implementation, 2010. [ bib | .pdf | .pdf ]
[peda10] Daniel Peng and Frank Dabek. Large-scale incremental processing using distributed transactions and notifications. In Proc. USENIX Conf. on Operating Systems Design and Implementation, pages 1-15, 2010. [ bib | .pdf | .pdf ]
Describes Percolator, used to incrementally maintain Google's web search index. Provides multi-row transactions and snapshot isolation, using multi-versioning in BigTable. Some transactions may have high latency.
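A sketch of the snapshot isolation rule over multi-versioned data, assuming a single process and a counter in place of a timestamp oracle. This is generic SI (read from a snapshot, abort on a write-write conflict), not Percolator's lock and write-column protocol on BigTable; names are hypothetical.

```python
import itertools

class MVStore:
    """Snapshot isolation over a multi-versioned key-value map."""

    def __init__(self):
        self.versions = {}               # key -> list of (commit_ts, value)
        self.clock = itertools.count(1)  # stand-in for a timestamp oracle

    def begin(self):
        return {"start_ts": next(self.clock), "writes": {}}

    def write(self, txn, key, value):
        txn["writes"][key] = value       # buffered until commit

    def read(self, txn, key):
        if key in txn["writes"]:
            return txn["writes"][key]
        visible = [(ts, v) for ts, v in self.versions.get(key, [])
                   if ts <= txn["start_ts"]]
        return max(visible, key=lambda p: p[0])[1] if visible else None

    def commit(self, txn):
        # First-committer-wins: abort if any written key has a version
        # committed after this transaction's snapshot was taken.
        for key in txn["writes"]:
            if any(ts > txn["start_ts"] for ts, _ in self.versions.get(key, [])):
                return False
        commit_ts = next(self.clock)
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((commit_ts, value))
        return True
```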
[ouag09] John Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. The case for RAMClouds: Scalable high-performance storage entirely in DRAM. Operating Systems Review, 43(4):92-105, December 2009. [ bib | .pdf | .pdf ]
A whitepaper presenting motivation for the RAMCloud project.
[lama09] Avinash Lakshman and Prashant Malik. Cassandra - a decentralized structured storage system. In Proc. ACM SIGOPS Int'l Workshop on Large Scale Distributed Systems and Middleware (LADIS'09), October 2009. [ bib | .pdf | .pdf ]
[pure09] Transparent application scaling with IBM DB2 pureScale. IBM white paper, October 2009. [ bib | .pdf | .pdf ]
[wure09] Xiaojian Wu and A. L. Narasimha Reddy. Managing storage space in a flash and disk hybrid storage system. In Proc. IEEE/ACM Int'l Symp. on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), September 2009. [ bib | .pdf | .pdf ]
[coco09] Greenplum. MAD skills: New analysis practices for big data. Greenplum white paper, March 2009. [ bib | .pdf | .pdf ]
[chot09] Whei-Jen Chen, Masafumi Otsuki, Paul Descovich, Selvaprabhu Arumuggharaj, Toshihiko Kubo, and Yong Jun Bi. High Availability and Disaster Recovery Options for DB2 on Linux, UNIX, and Windows. IBM Redbook, February 2009. [ bib | .pdf | .pdf ]
[abba09] Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel J. Abadi, Alexander Rasin, and Avi Silberschatz. HadoopDB: An architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. of the VLDB Endowment, 2(1):922-933, 2009. [ bib | .pdf | .pdf ]
[auja09] Stefan Aulbach, Dean Jacobs, Alfons Kemper, and Michael Seibold. A comparison of flexible schemas for software as a service. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 881-888, 2009. [ bib | DOI | .pdf ]
[cabh09] Mustafa Canim, Bishwaranjan Bhattacharjee, George Mihaila, Christian Lang, and Ken Ross. An object placement advisor for DB2 using solid state storage. Proc. of the VLDB Endowment, 2(2):1318-1329, 2009. [ bib | .pdf | .pdf ]
[daag09] Sudipto Das, Divyakant Agrawal, and Amr El Abbadi. ElasTraS: An elastic transactional data store in the cloud. In Proc. USENIX Workshop on Hot Topics in Cloud Computing, 2009. [ bib | .pdf | .pdf ]
[frpa09] Eric Friedman, Peter M. Pawlowski, and John Cieslewicz. SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions. Proc. of the VLDB Endowment, 2(2):1402-1413, 2009. [ bib | .pdf | .pdf ]
[gare09] John Garrison and A. L. Narasimha Reddy. Umbrella file system: Storage management across heterogeneous devices. ACM Transactions on Storage, 5(1), 2009. [ bib | DOI | .pdf | .pdf ]
[gana09] Alan Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan Narayanam, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, and Utkarsh Srivastava. Building a high-level dataflow system on top of MapReduce: The Pig experience. Proc. of the VLDB Endowment, 2(2):1414-1425, 2009. [ bib | .pdf | .pdf ]
[isyu09] Michael Isard and Yuan Yu. Distributed data-parallel computing using a high-level programming language. In Proc. ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD'09), pages 987-994, 2009. [ bib | DOI | .pdf ]
[nuri09] Lucas Nussbaum and Olivier Richard. A comparative study of network link emulators. In Proceedings of the 2009 Spring Simulation Multiconference, pages 85:1-85:8, 2009. [ bib | .pdf ]
[papa09] Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. A comparison of approaches to large-scale data analysis. In Proc. ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD'09), pages 165-178, 2009. [ bib | DOI | .pdf ]
[thsa09] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. Hive - a warehousing solution over a map-reduce framework. Proc. of the VLDB Endowment, 2(2):1626-1629, 2009. [ bib | .pdf | .pdf ]
[webo09] Craig D. Weissman and Steve Bobrowski. The design of the Force.com multitenant internet application development platform. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD), pages 889-896, 2009. [ bib | DOI | .pdf ]
[coha08] Graham Cormode and Marios Hadjieleftheriou. Finding frequent items in data streams. In Proc. Int'l Conference on Very Large Data Bases (VLDB'08), August 2008. [ bib | .pdf ]
[selt08] Margo Seltzer. Beyond relational databases. Communications of the ACM, 51(7):52-58, July 2008. [ bib | DOI | .pdf ]
Argues for modular and configurable DBMS to address new applications: warehousing, directory services, web search, mobile device caching, XML, streams.
[prit08] Dan Pritchett. BASE: An ACID alternative. ACM Queue, 6(3):48-55, May 2008. [ bib | DOI | .pdf ]
[abma08] Daniel J. Abadi, Samuel Madden, and Nabil Hachem. Column-stores vs. row-stores: How different are they really? In Proc. ACM SIGMOD Int'l Conf. on Management of Data, pages 967-980, 2008. [ bib | .pdf ]
[aggo08] Marcos K. Aguilera, Wojciech M. Golab, and Mehul A. Shah. A practical scalable distributed B-tree. Proc. of the VLDB Endowment, 1(1):598-609, 2008. [ bib | .pdf | .pdf ]
[augr08] Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rittinger. Multi-tenant databases for software as a service: schema-mapping techniques. In Proc. ACM SIGMOD Int'l Conference on Management of Data, pages 1195-1206, 2008. [ bib | DOI | .pdf ]
[brfl08] Matthias Brantner, Daniela Florescu, David Graf, Donald Kossmann, and Tim Kraska. Building a database on S3. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD), pages 251-264, 2008. [ bib | DOI | .pdf ]
[caro08] Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. Serializable isolation for snapshot databases. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD), pages 729-738, 2008. [ bib | DOI | .pdf ]
[depa08] David J. DeWitt, Erik Paulson, Eric Robinson, Jeffrey F. Naughton, Joshua Royalty, Srinath Shankar, and Andrew Krioukov. Clustera: an integrated computation and data management system. Proc. of the VLDB Endowment, 1(1):28-41, 2008. [ bib | .pdf | .pdf ]
[kaki08] Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alexander Rasin, Stanley Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. H-store: A high-performance, distributed main memory transaction processing system. In Proc. Int'l Conf. on Very Large Data Bases, volume 1, pages 1496-1499, 2008. [ bib | .pdf | .pdf ]
[kovi08] Ioannis Koltsidas and Stratis D. Viglas. Flashing up the storage layer. Proc. of the VLDB Endowment, 1(1):514-525, 2008. [ bib | DOI | .pdf | .pdf ]
Considers an architecture with both flash and magnetic disk available for persistent storage. Each block lives persistently either on disk or on flash, not both. Assumes there is a demand-paged in-memory block cache that makes a placement decision on eviction of a dirty page. Proposed placement algorithms count page reads and writes and use these counts, together with the costs of read and write operations on disk and flash, to decide where to place an evicted page. Placement decisions are made independently for each page. In particular, there are no capacity constraints, and thus the algorithms may choose to place all blocks on the same device. Proposed cache replacement algorithm keeps some number of least-recently-used pages in four queues, corresponding to whether the page is clean or dirty and whether the page is located on flash or disk. It always evicts the page with the lowest eviction cost from among these least-recently-used pages.
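A sketch of the count-based placement decision and cost-based eviction described above. The device costs and the eviction-cost formula are hypothetical assumptions, chosen only so that both placement outcomes can occur; they are not taken from the paper.

```python
# Hypothetical per-operation costs (arbitrary units), not from the paper.
COST = {
    "flash": {"read": 0.1, "write": 8.0},
    "disk":  {"read": 5.0, "write": 5.0},
}

def place(reads, writes):
    """Choose the device with the lowest cost for the page's observed mix."""
    def total(dev):
        return reads * COST[dev]["read"] + writes * COST[dev]["write"]
    return min(COST, key=total)

def eviction_cost(page):
    """Assumed model: evicting a page costs a later re-read from its device,
    plus a write now if the page is dirty."""
    c = COST[page["device"]]
    return c["read"] + (c["write"] if page["dirty"] else 0.0)

def choose_victim(queues, window):
    """queues maps (device, dirty) to an LRU list of pages, oldest first.
    Only the `window` oldest pages of each queue are candidates."""
    candidates = [p for q in queues.values() for p in q[:window]]
    return min(candidates, key=eviction_cost)
```

With these costs a read-heavy page goes to flash and a write-heavy page goes to disk, and since placement is per page with no capacity limit, nothing stops every page from landing on one device.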
[drep07] Ulrich Drepper. What every programmer should know about memory. November 2007. [ bib | .pdf ]
[kesh07] S. Keshav. How to read a paper. ACM SIGCOMM Computer Communication Review, 37(3):83-84, July 2007. [ bib | http | .pdf ]
[laju07] Pepijn de Langen and Ben H. H. Juurlink. Trade-offs between voltage scaling and processor shutdown for low-energy embedded multiprocessors. In Int'l Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation, number 4599 in Lecture Notes in Computer Science. Springer-Verlag, July 2007. [ bib | .pdf ]
[stke07] Christopher Stewart, Terence Kelly, and Alex Zhang. Exploiting nonstationarity for performance prediction. In Proc. EuroSys 2007, pages 31-46, March 2007. [ bib | .pdf ]
[agme07] Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, and Christos Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. In Proc. ACM SIGOPS Symposium on Operating Systems Principles (SOSP), pages 159-174, 2007. [ bib | DOI | .pdf | .pdf ]
[grae07] Goetz Graefe. The five-minute rule twenty years later, and how flash memory changes the rules. In Proc. Int'l Workshop on Data Management on New Hardware, pages 1-9, 2007. [ bib | DOI | .pdf ]
[hest07] Joseph Hellerstein, Michael Stonebraker, and James Hamilton. Architecture of a database system. Foundations and Trends in Databases, 1(2):141-259, 2007. [ bib | .pdf | .pdf ]
[orac07] Oracle. Scalability and performance with Oracle 11g database. Oracle white paper, 2007. [ bib | .pdf ]
[stma07] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The end of an architectural era (it's time for a complete rewrite). In Proc. Int'l Conf. on Very Large Data Bases, pages 1150-1160, 2007. [ bib | .pdf ]
[beda06] Philip A. Bernstein, Nishant Dani, Badriddine Khessib, Ramesh Manne, and David Shutt. Data management issues in supporting large-scale web services. Bulletin of the IEEE Technical Committee on Data Engineering, 29(4):3-9, December 2006. [ bib | .ps | .ps ]
[rale06] Parthasarathy Ranganathan, Phil Leech, David E. Irwin, and Jeffrey S. Chase. Ensemble-level power management for dense blade servers. In Proc. International Symposium on Computer Architecture (ISCA'06), pages 66-77, June 2006. [ bib | .pdf | .pdf ]
Power management for groups (ensembles) of servers, under the assumption that the servers in a group are likely to require peak power at different times. Goal is to reduce the amount of power overprovisioning required for the group.
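The arithmetic behind this assumption can be sketched with hypothetical power traces: provisioning each server for its own peak costs the sum of the peaks, while provisioning the ensemble costs only the peak of the sum.

```python
def provisioning(traces):
    """traces: one list of power samples (watts) per server, aligned in time.
    Returns (sum of per-server peaks, peak of the ensemble's total draw)."""
    sum_of_peaks = sum(max(t) for t in traces)
    peak_of_sum = max(sum(step) for step in zip(*traces))
    return sum_of_peaks, peak_of_sum
```

For three servers that each peak at 300 W at different times and idle at 100 W otherwise, per-server provisioning needs 900 W but the ensemble never draws more than 500 W.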
[arba06] Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL continuous query language: Semantic foundations and query execution. VLDB Journal, 15:121-142, February 2006. [ bib ]
CQL is the query language implemented by the Stanford STREAM database system.
[burr06] Michael Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proc. of the Symp. on Operating System Design and Implementation (OSDI'06), pages 335-350, 2006. [ bib | .pdf | .pdf ]
Chubby cell has a primary and secondaries. Primary handles all reads and writes. Writes are replicated to secondaries and acknowledged once a majority of replicas have acknowledged them. Primary holds a master lease, which it will renew unless it fails. Primary failure causes election of a new primary via a distributed consensus protocol. Chubby exports a simple Unix-like file system interface. Files can act as reader/writer locks. Chubby also provides notifications of various events, such as modification of file contents or master failover. Clients maintain sessions, which are terminated if the client dies or the session becomes idle. The server guarantees a minimum idle lease time before it will determine that a session is idle and terminate it. The client also maintains a (conservative) session timeout and will eventually decide its session has expired. Sessions can potentially be preserved across master failovers.
[grla06] Jim Gray and Leslie Lamport. Consensus on transaction commit. ACM Transactions on Database Systems, 31(1):133-160, 2006. [ bib | DOI | .pdf ]
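A sketch of why the client's session timeout is conservative, assuming the client and master clocks advance at nearly the same rate and a hypothetical `drift` fraction bounds the difference. The master extends the lease from when it handled the KeepAlive; the client can only measure from when it sent the request, which is earlier. Names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Session:
    lease_s: float              # lease length granted by the master
    master_expiry: float = 0.0  # master's view of the session deadline
    client_expiry: float = 0.0  # client's conservative estimate

    def keepalive(self, client_sent_at, master_handled_at, drift=0.0):
        # The master extends the lease from when it handled the request.
        self.master_expiry = master_handled_at + self.lease_s
        # The client does not know how long the request was in flight, so it
        # measures from its send time and shrinks the lease by a drift bound.
        self.client_expiry = client_sent_at + self.lease_s * (1.0 - drift)

    def client_thinks_expired(self, client_now):
        return client_now >= self.client_expiry
```

Because the client's deadline falls before the master's, the client stops trusting its session (and its locks) before the master could hand them to someone else.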
Defines a fault-tolerant commit protocol called Paxos Commit, which makes progress as long as a majority of participants are available. Does not block on failure of a coordinator, as in standard 2PC.
[crwu06] Sailesh Krishnamurthy, Chung Wu, and Michael Franklin. On-the-fly sharing for streamed aggregation. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'06), pages 623-634, 2006. [ bib | DOI | .pdf ]
[lova06] David Lomet, Zografoula Vagena, and Roger Barga. Recovery from "bad" user transactions. In Proc. ACM SIGMOD Int'l Conference on Management of Data (SIGMOD'06), pages 337-346, 2006. [ bib | http | .pdf ]
[nive06] Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn. Rethink the sync. In USENIX Symposium on Operating Systems Design and Implementation (OSDI'06), 2006. [ bib | .pdf | .pdf ]
[paju06] Seon-yeong Park, Dawoon Jung, Jeong-uk Kang, Jin-soo Kim, and Joonwon Lee. CFLRU: A replacement algorithm for flash memory. In Proc. Int'l Conf. on Compilers, Architecture and Synthesis for Embedded Systems, pages 234-241, 2006. [ bib | DOI | .pdf ]
[wudi06] Eugene Wu, Yanlei Diao, and Shariq Rizvi. High-performance complex event processing over streams. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'06), pages 407-418, 2006. [ bib | DOI | .pdf ]
[xigo05] Man Xiong, Brian Goldstein, and Chris Auger. Scaling out SQL Server with data-dependent routing. Dell Power Solutions, August 2005. [ bib | .pdf | .pdf ]
[waro05] Andrew Warfield, Russ Ross, Keir Fraser, Christian Limpach, and Steven Hand. Parallax: managing storage for a million machines. In Proc. USENIX Hot Topics in Operating Systems (HOTOS'05), June 2005. [ bib | .pdf | .pdf ]
Block level storage virtualization targeted at virtual machines. Uses copy-on-write and trie-based block indexing to support versioned device images. Virtualization is implemented in dedicated virtual machines, one for each node in a cluster.
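Copy-on-write, trie-based block indexing can be illustrated with a persistent radix tree: a write copies only the nodes on the path to the block, so every older root remains a valid, immutable snapshot that shares all untouched subtrees. This is a minimal sketch of the general technique, not Parallax's on-disk format; FANOUT_BITS, LEVELS, and the function names are hypothetical.

```python
FANOUT_BITS = 2                  # hypothetical: 4 children per node
LEVELS = 3                       # hypothetical: block addresses 0..63
MASK = (1 << FANOUT_BITS) - 1


def _index(block, level):
    # Child slot for `block` at the given tree level (level 0 holds data).
    if not 0 <= block < 1 << (FANOUT_BITS * LEVELS):
        raise ValueError("block address out of range")
    return (block >> (level * FANOUT_BITS)) & MASK


def lookup(root, block):
    """Return the data stored for `block` in one version, or None if unwritten."""
    node = root
    for level in reversed(range(LEVELS)):
        if node is None:
            return None
        node = node[_index(block, level)]
    return node


def write(root, block, data, level=LEVELS - 1):
    """Return a new root with `block` set to `data`; the old root is unchanged.

    Only the nodes on the root-to-block path are copied; all other
    subtrees are shared with the previous version.
    """
    node = list(root) if root is not None else [None] * (MASK + 1)
    i = _index(block, level)
    if level == 0:
        node[i] = data
    else:
        node[i] = write(node[i], block, data, level - 1)
    return tuple(node)
```

Taking a snapshot is then just keeping a reference to the current root, which is what makes versioned device images cheap.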
[moch05] Justin D. Moore, Jeffrey S. Chase, Parthasarathy Ranganathan, and Ratnesh K. Sharma. Making scheduling "cool": Temperature-aware workload placement in data centers. In Proc. USENIX Annual Technical Conference, pages 61-75, April 2005. [ bib | .pdf | .pdf ]
[hedi05] Taliver Heath, Bruno Diniz, Enrique V. Carrera, Wagner Meira Jr., and Ricardo Bianchini. Energy conservation in heterogeneous server clusters. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'05), pages 186-195, 2005. [ bib | DOI | .pdf ]
How to distribute work in a cluster given that different nodes may have different performance and power characteristics. The objective is to minimize power consumption per unit of throughput. The test implementation is a cluster web server, and control is achieved by redistributing the workload among the cluster nodes. Two distribution mechanisms are used: a simple front-end load balancer, and a peer-to-peer mechanism for redistributing requests among servers.
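To make the efficiency objective concrete, here is a greedy heuristic that activates the most energy-efficient nodes (lowest watts per unit of throughput at peak) until the offered load is covered. This is a swapped-in simplification, not the paper's optimization: it assumes each node's power scales linearly with its load and ignores idle power, and the function name and tuple layout are hypothetical.

```python
def choose_active_nodes(nodes, demand):
    """Greedily pick nodes in order of energy efficiency until demand is met.

    nodes: list of (name, peak_throughput, peak_power_watts).
    Assumption: power is proportional to load and idle power is ignored.
    """
    # Lowest watts per unit of throughput first.
    ranked = sorted(nodes, key=lambda n: n[2] / n[1])
    chosen, capacity = [], 0.0
    for name, throughput, _power in ranked:
        if capacity >= demand:
            break
        chosen.append(name)
        capacity += throughput
    if capacity < demand:
        raise ValueError("demand exceeds total cluster capacity")
    return chosen
```

For example, with nodes ("old", 100, 200), ("new", 150, 150), and ("mid", 120, 180) and a demand of 200, the heuristic activates "new" and then "mid".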
[meag05] Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. Efficient computation of frequent and top-k elements in data streams. In Proc. International Conference on Database Theory (ICDT), January 2005. [ bib | .pdf | http ]
[stab05] Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O'Neil, Patrick E. O'Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. C-Store: A column-oriented DBMS. In Proc. Int'l Conf. on Very Large Data Bases, pages 553-564, 2005. [ bib | .pdf | .pdf ]
[zhha05] Ning Zhang, Peter J. Haas, Vanja Josifovski, Guy M. Lohman, and Chun Zhang. Statistical learning techniques for costing XML queries. In Proc. International Conference on Very Large Data Bases (VLDB'05), pages 289-300, 2005. [ bib | .pdf | .pdf ]
[zhko05] Rui Zhang, Nick Koudas, Beng Chin Ooi, and Divesh Srivastava. Multiple aggregations over data streams. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'05), pages 299-310, 2005. [ bib | DOI | .pdf ]
[bhtr04] Suparna Bhattacharya, John Tran, Mike Sullivan, and Chris Mason. Linux AIO performance and robustness for enterprise workloads. In Linux Symposium, pages 63-78, 2004. [ bib | .pdf ]
[dihe04] Yixin Diao, Joseph L. Hellerstein, Adam J. Storm, Maheswaran Surendra, Sam Lightstone, Sujay S. Parekh, and Christian Garcia-Arellano. Incorporating cost of control into the design of a load balancing controller. In IEEE Real-Time and Embedded Technology and Applications Symposium, 2004. [ bib | .pdf ]
[lech04] Byung Suk Lee, Li Chen, Jeff Buzas, and Vinod Kannoth. Regression-based self-tuning modeling of smooth user-defined function costs for an object-relational database management system query optimizer. The Computer Journal, 47(6):673-693, 2004. [ bib | .pdf ]
Builds a cost model by recording, for recent UDF invocations, their costs and the values of cost-related parameters, and then fitting a model to these data. Includes discussion of statistical issues such as collinearity and outlier removal.
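The basic idea of refitting a cost model over recent invocations can be sketched with ordinary least squares on a sliding window. This is a deliberately simplified sketch: it assumes a single cost-related parameter and a linear model, whereas the paper models smooth, possibly nonlinear cost functions; RecentCostModel and its window size are hypothetical.

```python
from collections import deque


class RecentCostModel:
    """Fits cost = a + b * x over the most recent `window` UDF invocations."""

    def __init__(self, window=100):
        self.samples = deque(maxlen=window)  # oldest samples drop out automatically

    def record(self, x, cost):
        self.samples.append((x, cost))

    def fit(self):
        n = len(self.samples)
        if n < 2:
            raise ValueError("need at least two samples")
        mean_x = sum(x for x, _ in self.samples) / n
        mean_c = sum(c for _, c in self.samples) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.samples)
        if sxx == 0:
            raise ValueError("all samples share the same x; slope is undefined")
        sxc = sum((x - mean_x) * (c - mean_c) for x, c in self.samples)
        slope = sxc / sxx
        return mean_c - slope * mean_x, slope

    def predict(self, x):
        intercept, slope = self.fit()
        return intercept + slope * x
```

Because the window is bounded, an old anomalous invocation eventually stops influencing the fit; explicit outlier removal, as discussed in the paper, would still be needed for recent ones.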
[likr04] Jinyuan Li, Maxwell Krohn, David Mazières, and Dennis Shasha. Secure untrusted data repository (SUNDR). In Proc. USENIX Conf. on Operating Systems Design and Implementation, pages 121-136, 2004. [ bib | .pdf | .pdf ]
[mamu04] John MacCormick, Nick Murphy, Marc Najork, Chandramohan A. Thekkath, and Lidong Zhou. Boxwood: abstractions as the foundation for storage infrastructure. In Proc. of the Symp. on Operating System Design and Implementation (OSDI'04), 2004. [ bib | .pdf ]
[razh04] Amira Rahal, Qiang Zhu, and Per-Ake Larson. Evolutionary techniques for updating query cost models in a dynamic multidatabase environment. VLDB Journal, 13(2):162-176, 2004. [ bib | .pdf ]
Considers cost models as linear functions of a set of explanatory variables for each query class. Initial model is constructed by regression over an initial set of labeled cost samples. Proposes two methods to incrementally maintain such models by folding in new samples and removing the effects of old samples. Assumes that queries from the application workload are labeled and used to train the model.
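One generic way to fold in new samples and remove the effect of old ones is to keep a linear model's sufficient statistics as running sums, which supports O(1) additions and removals. This is not necessarily either of the paper's two methods. The sketch assumes a single explanatory variable, and the class name is hypothetical. Running sums can lose precision over long histories, so a production version would need a numerically stabler update.

```python
class IncrementalLinearModel:
    """Simple linear regression y = a + b * x kept as running sums."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        # Fold a new labeled cost sample into the model.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def remove(self, x, y):
        # Undo the effect of a previously added (now stale) sample.
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def coefficients(self):
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("not enough distinct samples")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b
```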
[akam04] A developer's guide to on-demand distributed computing. Akamai white paper, 2004. [ bib | .pdf ]
[pobe03] Rachel Pottinger and Philip A. Bernstein. Merging models based on given correspondences. In Proceedings of the 29th International Conference on Very Large Data Bases, pages 826-873, September 2003. [ bib | .pdf | .pdf ]
[arha03] Walid G. Aref, Moustafa A. Hammad, Ann Christine Catlin, Ihab F. Ilyas, Thanaa M. Ghanem, Ahmed K. Elmagarmid, and Mirette S. Marzouk. Video query processing in the VDBMS testbed for video database research. In ACM International Workshop on Multimedia Databases (MMDB'03), pages 25-32, 2003. [ bib | DOI | .pdf ]
[crjo03] Charles D. Cranor, Theodore Johnson, Oliver Spatscheck, and Vladislav Shkapenyuk. Gigascope: A stream database for network applications. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD'03), pages 647-651, 2003. [ bib | .pdf ]
[gada03] Lei Gao, Mike Dahlin, Amol Nayate, Jiandan Zheng, and Arun Iyengar. Application specific data replication for edge services. In Proc. Int'l Conf. on World Wide Web (WWW'03), pages 449-460, 2003. [ bib | DOI | .pdf ]
[ghgo03] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proc. Symposium on Operating System Principles (SOSP'03), pages 29-43, 2003. [ bib | .pdf ]
Describes a file system optimized for a relatively small number of large files. The workload is large sequential reads and appends. Node failures are normal. Throughput is more important than latency. The architecture has a single master and many chunk servers. The master stores metadata (namespace, access controls). Chunks are replicated. Local storage on each chunk server is via a Linux file system. Clients perform metadata operations through the master, then go directly to chunk servers for data retrieval. Implements a weak consistency model: metadata operations are atomic and serialized, but concurrent writes of the same file range may get mixed rather than serialized, and concurrent appends may lead to duplication. To update a chunk, a client first determines (from the master or its cache) the locations of all replicas of the chunk. It then pushes the update data to all replicas. Next, it sends a write request to the primary replica (the replica holding a lease on the chunk, granted by the master), which serializes all such requests. The primary forwards the serialization order to the other replicas, which apply the updates (whose data they already have) in the primary-chosen order.
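The separation between data flow (clients push payloads to every replica in any order) and control flow (the primary alone decides the mutation order) can be sketched as follows. This is a simplified sketch assuming reliable replicas and append-style mutations; the class and method names are hypothetical, and leases, chunk offsets, and failures are omitted.

```python
class Replica:
    def __init__(self):
        self.staged = {}   # data_id -> payload pushed by a client, not yet applied
        self.records = []  # (serial, payload) mutations applied to this chunk, in order

    def push(self, data_id, data):
        # Data flow: payloads may arrive in any order at each replica.
        self.staged[data_id] = data

    def apply(self, serial, data_id):
        self.records.append((serial, self.staged.pop(data_id)))


class PrimaryReplica(Replica):
    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # Control flow: the primary assigns the serial order, applies the
        # mutation locally, then forwards the same order to every secondary.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        for s in self.secondaries:
            s.apply(serial, data_id)
        return serial
```

Even when secondaries receive the payloads in different orders, every replica ends up applying the mutations in the single order the primary chose.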
[guel03] Isabelle Guyon and Andre Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182, 2003. [ bib | .pdf ]
[ilar03] Ihab F. Ilyas, Walid G. Aref, and Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases. In Proceedings of 29th International Conference on Very Large Data Bases (VLDB'03), pages 754-765, 2003. [ bib | .pdf | .pdf ]
Assumes joined tuples are ranked according to a monotone function of tuple ranks of join inputs. Defines physical join operators that can produce join results in rank order. Operator needs to queue up join results until it can be certain that it will produce them in the proper order.
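The queuing behavior can be illustrated with a hash rank join in the spirit of the paper's operators: read both ranked inputs alternately, buffer join results in a priority queue, and emit a result only when its score is at least an upper bound on any result that still involves an unread tuple. This sketch assumes the monotone combining function is a simple sum and that inputs are in-memory lists sorted by descending score; the function name is hypothetical.

```python
import heapq


def rank_join(left, right, k):
    """Top-k equi-join of two ranked inputs using a threshold-based rank join.

    left, right: lists of (join_key, score) sorted by score, descending.
    Combined score is score_left + score_right (a monotone function).
    Returns up to k tuples (combined, key, score_left, score_right), best first.
    """
    inputs = (left, right)
    pos = [0, 0]
    top = [left[0][1] if left else None, right[0][1] if right else None]
    last = [None, None]      # score of the most recently read tuple per side
    seen = ({}, {})          # per side: join_key -> scores read so far
    heap, out, side = [], [], 0

    while len(out) < k:
        if pos[side] >= len(inputs[side]):
            side = 1 - side
            if pos[side] >= len(inputs[side]):
                break        # both inputs exhausted
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        last[side] = score
        seen[side].setdefault(key, []).append(score)
        for other in seen[1 - side].get(key, []):
            sl, sr = (score, other) if side == 0 else (other, score)
            heapq.heappush(heap, (-(sl + sr), key, sl, sr))
        side = 1 - side      # alternate between the inputs

        # Upper bound on any join result that still involves an unread tuple.
        if last[0] is None or last[1] is None:
            threshold = float("inf")
        else:
            threshold = max(top[0] + last[1], last[0] + top[1])
        # Buffered results at or above the bound can safely be emitted now.
        while heap and len(out) < k and -heap[0][0] >= threshold:
            neg, key, sl, sr = heapq.heappop(heap)
            out.append((-neg, key, sl, sr))

    # Once both inputs are exhausted, all remaining buffered results are final.
    while heap and len(out) < k:
        neg, key, sl, sr = heapq.heappop(heap)
        out.append((-neg, key, sl, sr))
    return out
```

For example, joining [("a", 10), ("b", 8), ("c", 1)] with [("b", 9), ("a", 2), ("c", 1)] for k = 2 holds the result for "b" (score 17) in the queue until the bound drops to 17, then emits it before the result for "a" (score 12).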
[shba03] Ratnesh K. Sharma, Cullen E. Bash, Chandrakant D. Patel, Richard J. Friedrich, and Jeffrey S. Chase. Balance of power: Dynamic thermal management for internet data centers. Technical Report HPL-2003-5, HP Laboratories, Palo Alto, California, 2003. [ bib | .pdf | .pdf ]
Describes a methodology for thermal load balancing in server rooms. Thermal imbalances can be caused by imbalanced distribution of server workload and by peculiarities of the airflow in the server room, e.g., racks at the end of a row may be hotter than racks in the middle. Input includes server exhaust temperature readings and cold air temperature. Local thermal imbalances can be corrected by adjusting the allocation of work to the various servers.
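As a rough illustration of correcting thermal imbalance through work allocation, the sketch below splits load in proportion to each server's thermal headroom (the gap between its exhaust temperature and a maximum allowed temperature). This proportional-to-headroom rule is a hypothetical heuristic swapped in for illustration, not the report's methodology; it ignores cold-air temperature and airflow effects, and the function name and parameters are assumptions.

```python
def allocate_by_headroom(exhaust_temps, max_temp, total_load):
    """Split total_load across servers in proportion to thermal headroom.

    exhaust_temps: server name -> current exhaust temperature.
    max_temp: hypothetical maximum allowed exhaust temperature.
    """
    headroom = {s: max(max_temp - t, 0.0) for s, t in exhaust_temps.items()}
    total = sum(headroom.values())
    if total == 0:
        raise ValueError("no server has thermal headroom")
    # Hotter servers (less headroom) receive proportionally less work.
    return {s: total_load * h / total for s, h in headroom.items()}
```

For instance, with exhaust temperatures of 30, 35, and 40 degrees, a limit of 40, and 90 units of load, the servers receive 60, 30, and 0 units respectively.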
[brko02] Nicolas Bruno, Nick Koudas, and Divesh Srivastava. Holistic twig joins: optimal XML pattern matching. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 310-321, 2002. [ bib | .pdf ]
A technique for finding twig query matches without first matching individual binary subrelationships in the twig, i.e., this is an N-way structural join. Description of related work is a succinct classification of previous work on twig query processing.
[gily02] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51-59, 2002. [ bib | DOI | .pdf ]
[mena02] Daniel A. Menascé. TPC-W: A benchmark for E-Commerce. IEEE Internet Computing, 6(3):83-87, 2002. [ bib | .pdf ]
[pibi01] Eduardo Pinheiro, Ricardo Bianchini, Enrique Carrera, and Taliver Heath. Load balancing and unbalancing for power and performance in cluster-based systems. In Proc. Workshop on Compilers and Operating Systems for Low Power, September 2001. [ bib | .ps.gz | .ps.gz ]
Automatic power management in a cluster of servers by concentrating load on as few machines as possible and turning the others off. Implemented in a web server and in a cluster operating system.
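Concentrating load on as few machines as possible is typically driven by a control step that turns nodes on or off based on utilization. The sketch below shows one such step with hysteresis between two thresholds, so the cluster does not oscillate. It assumes identical nodes and a known aggregate load; the threshold values, parameter names, and function name are hypothetical rather than taken from the paper.

```python
def adjust_active_nodes(active, total_load, node_capacity,
                        high=0.8, low=0.5, max_nodes=None):
    """Return the number of powered-on nodes after one control step.

    Assumes identical nodes; `high` and `low` are hypothetical thresholds.
    """
    utilization = total_load / (active * node_capacity)
    if utilization > high and (max_nodes is None or active < max_nodes):
        return active + 1   # cluster is running hot: turn a node on
    if active > 1 and total_load / ((active - 1) * node_capacity) < low:
        return active - 1   # load fits comfortably on fewer nodes: turn one off
    return active
```

A node is removed only if the remaining nodes would stay below the low threshold, which leaves a band of loads where the configuration does not change.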
[boco01] P. Bohrer, D. Cohn, E.N. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, R. Rajamony, F. Rawson, and E. V. Hensbergen. Energy conservation for servers. In Proc. IEEE Workshop on Power Management for Real-Time and Embedded Systems, May 2001. [ bib | .pdf | .pdf ]
A brief general overview of the problem of energy conservation in data centers.
[horn01] Paul Horn. Autonomic computing: IBM's perspective on the state of information technology. Technical report, International Business Machines Corporation, Armonk, NY, USA, 2001. [ bib | .pdf ]
[poha01] Rachel Pottinger and Alon Y. Halevy. MiniCon: A scalable algorithm for answering queries using views. VLDB Journal, 10(2-3):182-198, 2001. [ bib | .pdf | .pdf ]
[rodr01] Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proc. IFIP/ACM International Conference on Distributed Systems Platforms (Middleware'01), pages 329-350, 2001. [ bib | .pdf ]
[stmo01] Ion Stoica, Robert Morris, David R. Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In Proc. ACM SIGCOMM Conference, pages 149-160, 2001. [ bib | DOI | .pdf | .pdf ]
[brew00] Eric A. Brewer. Towards robust distributed systems. Keynote presentation, ACM Symposium on Principles of Distributed Computing (PODC), July 2000. [ bib | .pdf | .pdf ]
Presentation of the CAP conjecture.
[beha00] Philip A. Bernstein, Alon Y. Halevy, and Rachel Pottinger. A vision of management of complex models. SIGMOD Record, 29(4):55-63, 2000. [ bib | .pdf | .pdf ]
[grbr00] Steven D. Gribble, Eric A. Brewer, Joseph M. Hellerstein, and David E. Culler. Scalable, distributed data structures for internet service construction. In Proc. of the Symp. on Operating System Design and Implementation (OSDI'00), pages 319-332, 2000. [ bib | .pdf ]
[yuva00] Haifeng Yu and Amin Vahdat. Design and evaluation of a continuous consistency model for replicated services. In Proc. of the Symp. on Operating System Design and Implementation (OSDI'00), pages 21-21, 2000. [ bib | .pdf ]
[grgr97] Jim Gray and Goetz Graefe. The five-minute rule ten years later, and other computer storage rules of thumb. SIGMOD Record, 26(4):63-68, December 1997. [ bib | DOI ]
[pesp97] Karin Petersen, Mike J. Spreitzer, Douglas B. Terry, Marvin M. Theimer, and Alan J. Demers. Flexible update propagation for weakly consistent replication. In Proc. of the ACM Symp. on Operating Systems Principles (SOSP'97), pages 288-301, 1997. [ bib | DOI | .pdf ]
[tsso96] Odysseas G. Tsatalos, Marvin H. Solomon, and Yannis E. Ioannidis. The GMAP: a versatile tool for physical data independence. The VLDB Journal, 5:101-118, 1996. [ bib ]
[lesi92] Eliezer Levy and Avi Silberschatz. Incremental recovery in main memory database systems. IEEE Transactions on Knowledge and Data Engineering, 4(6):529-540, December 1992. [ bib | .pdf ]
Incremental, page-at-a-time database recovery on demand, rather than recovery of the entire database before transaction processing resumes. The disk portion of the log holds redo records only, grouped by the page they refer to. Parallel processes flush log records to disk and also apply logged updates to the disk version of each page. The buffer manager also flushes pages (no-steal policy). The safe-fetch rule says logged updates are only applied to (disk copies of) pages that are in the buffer pool; it is not clear how this helps. Relies on non-volatile RAM to store a map indicating which pages may be stale after a crash, so that they can be brought up to date before being read in after a failure. Targets shared-memory multiprocessors, where the logger and the propagator (which applies logged updates to pages) can run on their own processors. Seems to rely on page-level locking for correctness.
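The on-demand part of the scheme can be sketched as a fetch path that consults the stale-page map and replays only that page's redo records before the page is first read. This is a minimal sketch under stated assumptions: redo records are modeled as idempotent (key, value) assignments, the NV-RAM map is a plain set, and logging, propagation, and locking are omitted; the class name is hypothetical.

```python
class OnDemandRecovery:
    """Hypothetical sketch of page-at-a-time recovery after a crash."""

    def __init__(self, disk_pages, redo_by_page, stale_pages):
        self.disk = disk_pages          # page_id -> {key: value}, as found on disk
        self.redo = redo_by_page        # page_id -> [(key, value), ...] in log order
        self.stale = set(stale_pages)   # pages that may be out of date (NV-RAM map)
        self.buffer_pool = {}

    def fetch(self, page_id):
        if page_id in self.buffer_pool:
            return self.buffer_pool[page_id]
        page = dict(self.disk.get(page_id, {}))
        if page_id in self.stale:
            # Bring only this page up to date before anyone reads it.
            for key, value in self.redo.get(page_id, []):
                page[key] = value
            self.stale.discard(page_id)
        self.buffer_pool[page_id] = page
        return page
```

Transactions can resume immediately; each stale page pays its recovery cost the first time it is touched.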
[degr92] David J. DeWitt and Jim Gray. Parallel database systems: The future of high-performance database systems. Communications of the ACM, 35(6):85-98, 1992. [ bib | .pdf ]
Discusses scale-up and speed-up as two distinct parallelism objectives. Discusses shared-memory, shared-disk, and shared-nothing architectures and argues that shared-nothing will provide the best scalability, because it minimizes interaction between processors and therefore places the least demand on the interconnection network. Also discusses data partitioning and parallelization of relational query operators.
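The two objectives can be stated as ratios of elapsed times, following the paper's definitions. Speed-up holds the problem fixed and grows the system; scale-up grows the problem and the system by the same factor N.

```latex
\[
  \text{speed-up} = \frac{\text{elapsed time on small system}}
                         {\text{elapsed time on big system}}
  \qquad \text{(same problem)}
\]
\[
  \text{scale-up} = \frac{\text{elapsed time of small system on small problem}}
                         {\text{elapsed time of big system on big problem}}
\]
```

Linear speed-up means an N-times larger system achieves a speed-up of N; linear scale-up means the scale-up stays at 1 when the problem and the system both grow N times.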
[grae90] Goetz Graefe. Encapsulation of parallelism in the Volcano query processing system. In Proc. ACM SIGMOD Int'l Conf. on Management of Data, pages 102-111, 1990. [ bib | DOI | .pdf ]
[gr89] The Tandem Database Group. NonStop SQL, a distributed high performance, high availability implementation of SQL. In D. Gawlick, M. N. Haynie, and A. Reuter, editors, Proc. 2nd Int'l Workshop on High Performance Transaction Systems, volume 359 of Lecture Notes in Computer Science, pages 60-104. Springer-Verlag, 1989. workshop dates September 28-30, 1987. [ bib | .pdf | .pdf ]
[okli88] B. Oki and B. Liskov. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In ACM Symp. on Principles of Distributed Computing, 1988. [ bib ]
[grpu86] Jim Gray and Franco Putzolu. The 5 minute rule for trading memory for disk accesses and the 5 byte rule for trading memory for cpu time. Technical Report 86.1, Tandem Computers, May 1986. Original report was May 1985. [ bib | .pdf | .pdf ]
[lamp78] Leslie Lamport. Time, clocks and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978. [ bib | .pdf ]
[mage70] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78-117, June 1970. [ bib | DOI | .pdf ]
Includes a proof of optimality of the MIN algorithm.
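For reference, the MIN replacement policy evicts the cached page whose next reference lies farthest in the future. It needs the whole reference string in advance, so it serves as an offline lower bound on misses rather than a practical policy. The sketch below counts MIN's misses for a trace; the function names are hypothetical, and the linear scan for the next use is chosen for clarity, not speed.

```python
def _next_use(trace, start, page):
    # Index of the next reference to `page` at or after `start`, or infinity.
    for j in range(start, len(trace)):
        if trace[j] == page:
            return j
    return float("inf")


def min_misses(trace, capacity):
    """Count misses under MIN: evict the page whose next use is farthest away."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache, misses = set(), 0
    for i, page in enumerate(trace):
        if page in cache:
            continue
        misses += 1
        if len(cache) == capacity:
            victim = max(cache, key=lambda p: _next_use(trace, i + 1, p))
            cache.remove(victim)
        cache.add(page)
    return misses
```

On the trace 1, 2, 3, 1, 4, 1, 2 with room for two pages, MIN takes 5 misses; with room for three pages, it takes 4.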
[xero11] Xeround. Xeround cloud database, part 1 - technology. Xeround white paper. downloaded March 2011. [ bib | .pdf | .pdf ]
Multiple MySQL front ends, replicated partitioned data. Assignment of partitions to nodes can be adjusted to support elastic scale-out. Supports distributed query execution. Company offers database service, rather than software.