Understanding and Validating Database System Administration
Fábio Oliveira, Kiran Nagaraja, Rekha Bachwani,
Ricardo Bianchini, Richard P. Martin, and Thu D. Nguyen
Department of Computer Science
Rutgers University, Piscataway, NJ 08854

Abstract

A large number of enterprises need their commodity database systems to remain available at all times. Although administrator mistakes are a significant source of unavailability and cost in these systems, no study to date has sought to quantify the frequency of mistakes in the field, understand the context in which they occur, or develop system support to deal with them explicitly. In this paper, we first characterize the typical administrator tasks, testing environments, and mistakes using results from an extensive survey we have conducted of 51 experienced administrators. Given the results of this survey, we next propose system support to validate administrator actions before they are made visible to users. Our prototype implementation creates a validation environment that is an extension of a replicated database system, where administrator actions can be validated using real workloads. The prototype implements three forms of validation, including a novel form in which the behavior of a database replica can be validated even without an example of correct behavior for comparison. Our results show that the prototype can detect the major classes of administrator mistakes.

1 Introduction

Most enterprises rely on at least one database management system (DBMS) running on commodity computers to maintain their data. A large fraction of these enterprises, such as Internet services and world-wide corporations, need to keep their databases operational at all times. Unfortunately, doing so has been a difficult task.

A key source of unavailability in these systems is database administrator (DBA) mistakes [10, 15, 20]. Database administration is mistake-prone as it involves many complex tasks, such as storage space management, database structure management, and performance tuning. Even worse, as shall be seen, DBA mistakes are typically not maskable by redundancy (as in an underlying RAID subsystem) or standard fault-tolerance mechanisms (such as a primary-backup scheme). Thus, DBA mistakes are frequently exposed to the surrounding systems, database applications and users, causing unavailability and potentially high revenue losses.

Previous work has categorized DBA mistakes into broad classes and across different DBMSs [10]. However, no previous work has quantified the frequency of the mistakes in the field, characterized the context in which they occur, or determined the relationship between DBA experience and mistakes. Furthermore, no previous work has developed system support to deal with DBA mistakes explicitly.

In this paper, we address these issues in detail. We first characterize (in terms of class and frequency) the typical DBA tasks, testing environments, and mistakes, using results from an extensive survey we have conducted of 51 DBAs with at least 2 years of experience. Our survey responses show that tasks related to recovery, performance tuning, and database restructuring are the most common, accounting for 50% of the tasks performed by DBAs. Regarding the frequency of mistakes, the responses suggest that DBA mistakes are responsible (entirely or in part) for roughly 80% of the database administration problems reported. The most common mistakes are deployment, performance, and structure mistakes, all of which occur once per month on average. These mistakes are caused mainly by the current separation of and differences between testing and online environments.

Given the high frequency of DBA mistakes, we next propose system support to validate DBA actions before exposing their effects to the DBMS clients. As we described in [16], the key idea of validation is to check the correctness of human actions in a validation environment that is an extension of the online system. In particular, the components under validation, called masked components, are subjected to realistic (or even live) workloads. Critically, their state and configurations are not modified when transitioning from validation to live operation.

In [16], we proposed trace and replica-based validation for Web and application servers. Both techniques rely on samples of correct behavior. Trace-based validation involves periodically collecting traces of live requests and replaying the trace for validation. Replica-based validation involves designating each masked component as a mirror of a live component. All requests sent to the live component are then duplicated and also sent to the mirrored, masked component. Results from the masked component are compared against those produced by the live component. Here, we extend our work to deal with DBMSs by modifying a database clustering middleware called Clustered-JDBC (C-JDBC) [7].

Furthermore, we propose a novel form of validation, called model-based validation, in which the behavior of a masked component can be validated even when we do not have an example of correct behavior for comparison. In particular, we use model-based validation to verify actions that might change the database structure.

We evaluate our prototype implementation by running a large number of mistake-injection experiments. From these experiments, we find that the prototype is easy to use in practice, and that validation is effective in catching a majority of the mistakes the surveyed DBAs reported. In particular, our validation prototype detected 19 out of 23 injected mistakes, covering all classes of mistakes reported by the surveyed DBAs.

In summary, we make three main contributions:

• We present a wealth of data on the behavior of experienced administrators of real databases. This contribution is important in that actual data on DBA mistakes is not publicly available, due to commercial and privacy considerations.

• We propose model-based validation for the situations when the behavior of the components affected by the DBA actions is supposed to change and there are no instances of correct behavior for comparison.

• We implement a realistic validation environment for dealing with DBA mistakes. We demonstrate the benefits of the prototype through an extensive set of mistake-injection experiments.

The remainder of the paper is organized as follows. The next section describes the related work. Section 3 describes our survey and analyzes the responses we received. Section 4 describes validation and our prototype. Section 5 presents our validation results. In Section 6, we broaden the discussion of the DBA mistakes and the

2 Related Work

Database administration mistakes. Only a few papers have addressed database administrator mistakes in detail. In two early papers [11, 12], Gray estimated the frequency of DBA mistakes based on fault data from deployed Tandem systems. However, whereas today’s systems are mostly built from commodity components, the Tandem systems included substantial custom hardware and software for tolerating single faults. This custom infrastructure could actually mask several types of mistakes that today’s systems may be vulnerable to.

The work of Gil et al. [10] included a categorization of administrator tasks and mistakes into classes, and a comparison of their specific details across different DBMSs. Vieira and Madeira [20] proposed a dependability benchmark for database systems based on the injection of administrator mistakes and observation of their impact. In this paper, we extend these contributions by quantifying the frequency of the administrator tasks and mistakes in the field, characterizing the testing environment administrators use, and identifying the main weaknesses of DBMSs and support tools with respect to database administration. Furthermore, our work develops system support to deal with administrator mistakes, which these previous contributions did not address.

Internet service operation mistakes. A few more papers have addressed operator mistakes in Internet services. The work of Oppenheimer et al. [17] considered the universe of failures observed by three commercial services. With respect to operators, they broadly categorized their mistakes, described a few example mistakes, and suggested some avenues for dealing with them. Brown and Patterson [4] proposed undo as a way to roll back state changes when recovering from operator mistakes. Brown [3] performed experiments in which he exposed human operators to an implementation of undo for an email service hosted by a single node. In [16], we performed experiments with volunteer operators, describing all of the mistakes we observed in detail, and designing and implementing a prototype validation infrastructure that can detect and hide a majority of the mistakes. In this paper, we extend these previous contributions by considering mistakes in database administration and introducing a new validation technique.

Validation. We originally proposed trace and replica-based validation for Web and application servers in Internet services [16]. Trace-based validation is similar in flavor to fault diagnosis approaches [1, 8] that maintain statistical models of normal component behavior and dynamically inspect the service execution for devi