Basic concepts of the relational model 1 • relation, attribute, tuple cf file, field type, record occurrence Relational Database Models relations have a degree (= # of attributes) and cardinality (= # of tuples) intensional view of relation = time-independent aspect extensional view = current state of relation contents Basic Concepts Relational Theory • keys: primary, candidate, alternate 4/20/2005 1 CS319 Theory of Databases Basic concepts of the relational model 2 4/20/2005 Constraints 1. every 'file' contains only one record type Key constraints 3. each record occurrence has a unique identifier CS319 Theory of Databases Basic concepts of the relational model 3 Relation = special kind of file? 2. each record occurrence in a 'file' has same number of fields cf "OCCURS DEPENDING ON” in COBOL 2 i.e. constraints implied by the existence of candidate keys (as specified in DB intension) • uniqueness of tuples with given key • attributes in primary keys non-null 4. within a 'file', record occurrences have an unspecified ordering, or are ordered according to values assoc with occurrences (needn't be by primary key) 4/20/2005 3 CS319 Theory of Databases 4/20/2005 4 CS319 Theory of Databases Basic concepts of the relational model 4 Basic concepts of the relational model 5 Constraints ... Constraints … Referential constraints Integrity constraints Intension (indirectly) gives a specification of foreign keys in a relation (as in the supplier-parts relation, with tuples of the form (S#, P#, QTY)) The use of keys for supplier and parts in this way independently constrains the S# and P# attributes to values that are either null or designate uniquely identified entities 4/20/2005 5 CS319 Theory of Databases Summary: Basic concepts of relational model Relation: relation, attribute, tuple Relation as analogue of file: cf. file, field type, record Relational scheme for a database: cf. file system - Degree and cardinality of a relation Certain constraints are imposed by the semantics of the data. E.g. person’s height is positive, date-of-birth won't normally be a future date etc Real-world constraints can be too rich to express: • hard to capture type of real-world observables • have data dependent constraints, motivating triggers 4/20/2005 6 CS319 Theory of Databases Query Languages for Relational Databases 1 Issue: how do we model data extraction formally? E.F. (“Ted”) Codd is the pioneer of relational DBs Early papers: 1969, 70, 73, 75 Two classes of query language: algebra / logic - Intensional & extensional views of a relational scheme 1. Algebraic languages a query = evaluating an algebraic expression Keys: primary, candidate, foreign 2. Predicate Calculus languages a query = finding values satisfying predicate Constraints: key, referential, integrity 4/20/2005 7 CS319 Theory of Databases 4/20/2005 8 CS319 Theory of Databases Query Languages for Relational Databases 2 Query Languages for Relational Databases 3 Issue: how do we model data extraction formally? Examples of Query Languages 2. Predicate Calculus languages a query = finding values satisfying predicate algebraic: ISBL - Information System Base Language Two kinds of predicate calculus language tuple relational calculus: QUEL, SQL Terms (primitive objects) tuples xor domain values: domain relational calculus: QBE - Query by Example • tuples ⇒ tuple relational calculus • domain values ⇒ domain relational calculus 4/20/2005 9 CS319 Theory of Databases Issue: how are these languages to be compared? 4/20/2005 Query Languages for Relational Databases 4 10 CS319 Theory of Databases Relational Algebra 1 Issue: how are query languages to be compared? Relational Algebra Answer (Codd) algebra = underlying set with operations on it Can formulate a notion of completeness, and show that the core queries in these languages have equivalent expressive power elements of the underlying set are referred to as "elements of the algebra" relational algebra = set of relations + ops on relations • mathematical notion, based on relational algebra • in practice, is a basic measure of expressive power: practical query languages are ‘more than complete’ 4/20/2005 11 CS319 Theory of Databases cf set of polynomials with addition and multiplication 4/20/2005 12 CS319 Theory of Databases Relational Algebra 2 Relational Algebra 3 … relational algebra = set of relations + ops on relations Mathematical relation is an abstraction Definition: a (mathematical) relation is a subset of D1 × D2 × .... × Dr where D1, D2, ...., Dr are domains • types are restricted to mathematical types e.g. height, weight and currency all numerical data Typical element of a relation is (d1, d2, ...., dr) where di ∈ Di for 1 Ω i Ω r D1 × D2 × .... × Dr is the type of the relation .... ‘abstract’ expressive power unchanged r is the arity of the relation 4/20/2005 13 • components of a mathematical relation are indexed don't use named attributes in the mathematical treatment - in effect, named attributes just make it more convenient to specify relational expressions CS319 Theory of Databases 4/20/2005 Relational Algebra 4 14 CS319 Theory of Databases Relational Algebra 5 Basic algebraic operations on relations Basic algebraic operations on relations ... 1. Union 3. Cartesian Product R ∪ S defined when R and S have same type R ∪ S = union of the sets of tuples in R and S R × S is of type D1 × D2 × .... × Dr × E1 × E2 × .... × Es 2. Set Difference R – S defined when R and S have same type R – S is the set of tuples in R but not in S 4/20/2005 R of type D1 × D2 × .... × Dr S of type E1 × E2 × .... × Es 15 CS319 Theory of Databases R × S is the set of tuples of the form (d1, d2, ...., dr, e1, e2, ...., es) where (d1, d2, ...., dr) ∈ R, (e1, e2, ...., es) ∈ S 4/20/2005 16 CS319 Theory of Databases Relational Algebra 6 Relational Algebra 7 Basic algebraic operations on relations … Basic algebraic operations on relations … 4. Projection 5. Selection ∏i(1), i(2), ..., i(t) (R) is defined whenever R has arity r and i(j)'s are distinct indices with 1 Ω i(j) Ω r for 1 Ω j Ω t For a tuple, projection is defined by ∏i(1), i(2), ..., i(t) (d1, d2, ...., dr) = (di(1), di(2), ..., di(t)) ∏i(1), i(2), ..., i(t)(R) = set of distinct projections of tuples in R 4/20/2005 17 CS319 Theory of Databases Let F be a logical propositional expression made up of elementary algebraic conditions. σF(R) is the set of tuples t in R whose components satisfy the condition F(t). In the absence of attribute names, refer to components of tuples by index in F e.g. σ1="London" ∨ 1="Paris" (R) refers to set of tuples whose first component is either London or Paris 4/20/2005 18 Relational Algebra 8 Relational Algebra 9 Simple examples of basic operations Simple examples of basic operations ... R: R: x a a R∪S: x a a x 4/20/2005 y b y z c c S: union y b y y x a R – S: z c c t 19 y b t c difference/minus x a CS319 Theory of Databases y y z c CS319 Theory of Databases R×S: x x a a a a 4/20/2005 x y z a a b y c c y y b b y y z z c c c c x a x a x a 20 S: x y t a b c cartesian product y t b c y t b c y t b c CS319 Theory of Databases Relational Algebra 10 Relational Algebra 11 Simple examples of basic operations … ∏2, 3 (R): ∏3 (R): y b y z c c R: z c Summary of basic operations … x y z a b c a y c projection … note that duplicates are deleted σ1="a” (R): a a b y 4/20/2005 c c 21 selection CS319 Theory of Databases Codd’s definition of completeness: a query language is complete if it can simulate all 5 basic operations on relations 4/20/2005 Relational Algebra 12 Use of attribute names 22 CS319 Theory of Databases Relational Algebra 13 Definition: In practical use of query languages, commonly use attribute names to define operations, e.g. - projection onto specific attribute names - identification of components in selection - making distinctions between domains - forming natural joins Claim: none of these devices specifies operations that can't be derived from the basic ones 4/20/2005 R∪S R–S R×S ∏i(1), i(2), ..., i(t) (R) σF(R) 1. Union 2. Set Difference 3. Cartesian Product 4. Projection 5. Selection 23 CS319 Theory of Databases a derived operation in an algebraic system is an operation that is expressible in terms of standard operations of the algebra e.g. sq( ) is derived from * via sq(x)=x*x Derived operations on relations include • intersection • quotient • join • natural join 4/20/2005 24 CS319 Theory of Databases Relational Algebra 14 Relational Algebra 15 Derived relational operations … Derived relational operations … 6. Intersection of relations of same type 8. Join A join of R and S is defined as the subset of R × S for which there is an arithmetic relation (<, Ω, =,Δ , >) between the i-th component of R and the j-th component of S R ∩ S ≡ R – (R – S) defines tuples common to R and S 7. Quotient R / S ≡ "inverse of cartesian product" specifies T where T × S = R, when such T exists! In general, R / S ≡ set of tuples t such that (that is, "t concatenated with s") is in R for all s in S 4/20/2005 25 CS319 Theory of Databases Most important kind of join is the equijoin R ∗ S ≡ σi=j(R × S ) i=j A join is a selection from Cartesian product 4/20/2005 Relational Algebra 16 Derived relational operations … 26 CS319 Theory of Databases Relational Algebra 17 Derived relational operations … 9. The Natural Join In practice, Cartesian product often generates relations that are too large to be computed efficiently More practical operation to join relations is natural join. Definition of natural join refers to equality of domains ⇒ simplest to describe w.r.t. named attributes natural join = "equijoin without duplicate columns" 4/20/2005 27 CS319 Theory of Databases Derive the natural join R ∗ S by • forming product R × S • selecting those tuples (r,s) where r and s have same values for all common attributes • making a projection to remove duplicate columns that correspond to these common attributes R ∗ S = ∏i(1), i(2), ..., i(m) σΛ(r.x=s.x)(R × S) with an appropriate choice of indices i(j) & attributes x 4/20/2005 28 CS319 Theory of Databases Summary of Relational Algebra concepts Primitive operations: • • • • • ISBL - Information System Base Language R∪S R–S R×S ∏i(1), i(2), ..., i(t) (R) σF(R) 1. Union 2. Set Difference 3. Cartesian Product 4. Projection 5. Selection Derived operations: intersection, natural join, quotient Codd’s definition of completeness: a query language is complete if it can simulate all 5 basic operations on relations 4/20/2005 ISBL: A Relational Algebra Query Language 1 29 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 2 Devised by Todd in 1976 IBM Peterlee Relational Test Vehicle (PRTV) PL/1 environment with query language ISBL One of the first relational query languages … closely based on relational algebra The six basic operations in ISBL are union, difference, intersection, natural join, projection and selection 4/20/2005 30 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 3 Operators in ISBL are ‘+’, ‘-’, ‘%’, ‘:’ and ‘*’ . Comparison: Relational Algebra vs ISBL R+S union of relations R - S difference operation with extended semantics R % A, B, ... , Z projection onto named attributes R : F selection of tuples subject to boolean formula F R . S intersection R * S natural join R∪S R–S R×S ∏i(1), i(2), ..., i(t) (R) σF(R) contrived derived op R - S is defined whenever R and S have some attribute names in common: delete tuples from R that agree with S on all common attributes. To prove completeness of ISBL, enough to show that can express Cartesian product using the ISBL operators - return to this issue later 4/20/2005 31 CS319 Theory of Databases 4/20/2005 R+S R-S subsumes no direct counterpart R % A, B, ... , Z R:F R*S 32 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 4 Example ISBL query to specify the composition of two binary relations R(A,B) and S(C,D) where A,B,C,D are attributes defined over the same domain X (as when defining composition of functions X!X): ISBL as a query language Two types of statement in ISBL LIST R = print the value of exp assign value of exp to relation R In this context, R is a variable whose value is a relation Notation: use R(A,B,...,Z) to refer to a relation with attributes A, B, ..., Z 4/20/2005 33 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 6 Assignment and call-by-value After the assignment RCS = (R * S) : B=C % A, D the variable RCS retains its assigned value whatever happens to the values of R and S Hence all subsequent "LIST RCS" requests obtain same value until reassignment cf call-by-value parameter passing mechanisms 4/20/2005 35 ISBL: A Relational Algebra Query Language 5 CS319 Theory of Databases Specify composition of R and S as RCS, where RCS = (R * S) : B=C % A, D In this case: R * S = R × S because attribute names (A, B), (C, D) are disjoint [cf. completeness of ISBL] Illustrates archetypal form of query definition: projection of selection of join 4/20/2005 34 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 7 Delayed evaluation and call-by-name • have a delayed evaluation mechanism to change the semantics of assignment cf. a "definitive notation" or a spreadsheet definition • to delay the evaluation of the relation named R in an expression, use N!R in place of R RCS = (N!R * N!S) : B=C % A, D • this means that the variable RCS is evaluated on a call-by-name basis: i.e. it’s value is computed as required using the current values of R and S • whenever the user invokes "LIST RCS" in this case, the value of RCS is re-computed 4/20/2005 36 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 8 ISBL: A Relational Algebra Query Language 9 Uses for delayed evaluation Renaming • definition of views is facilitated For union & intersection, attribute names must match e.g. R(A,B) + S(A,C) is undefined etc. • allows incremental definition of complex expressions: use sub-expressions with temporary names, supply extensional part later • useful for optimisation: assignment means immediate computation at every step, delayed evaluation allows intelligent updating of values 4/20/2005 37 CS319 Theory of Databases Tensions between theory and practice in ISBL • Mathematical relations abstract away certain characteristics of data that are important to the human interpreter – e.g. types, order for table inspection • Certain activities that are an essential part of data processing, such as updating relations, forming aggregates etc are not easy to describe formally • Classical algebra uses homogeneous data types, doesn’t deal elegantly with exceptions 3/0 = ? etc 4/20/2005 39 CS319 Theory of Databases To overcome this can rename attributes of R by (R%A, B → C) This project-and-rename creates relation R(A,C). Can use this to make attributes of R & S disjoint, so that R * S = R × S, proving that ISBL is a complete query language 4/20/2005 38 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 10 Limitations of ISBL ISBL is complete, but lacks features of QUEL, SQL etc e.g. no aggregate operators no insertion, deletion and modification Primarily a declarative query language Address these issues in the PRTV environment - user can also access relations via the general-purpose programming language PL/1 4/20/2005 40 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 11 ISBL: A Relational Algebra Query Language 12 Illustrative examples of ISBL use Illustrative examples of ISBL use MEMBERS(NAME, ADDRESS, BALANCE) ORDERS(ORDER_NO, NAME, ITEM, QUANTITY) SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) Refer to the Happy Valley Food Company [Ullman 82] 1. Print the names of members in the red: Relations in this DB are: MEMBERS(NAME, ADDRESS, BALANCE) ORDERS(ORDER_NO, NAME, ITEM, QUANTITY) SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) 4/20/2005 41 CS319 Theory of Databases LIST MEMBERS : BALANCE < 0 % NAME i.e. select members with negative balance and project out their names 4/20/2005 42 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 13 ISBL: A Relational Algebra Query Language 14 Illustrative examples of ISBL use Illustrative examples of ISBL use MEMBERS(NAME, ADDRESS, BALANCE) MEMBERS(NAME, ADDRESS, BALANCE) ORDERS(ORDER_NO, NAME, ITEM, QUANTITY) SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) ORDERS(ORDER_NO, NAME, ITEM, QUANTITY) SUPPLIERS(SNAME, SADDRESS, ITEM, PRICE) OS = ORDERS * SUPPLIERS LIST OS: NAME="Brooks" % SNAME, ITEM, PRICE 2. (commentary on answer) Need two of the relations: SUPPLIERS required for supplier details ORDERS to know what Brooks has ordered The join OS holds tuples where item field contains item "ordered with associated order info” and "supplied by supplier with assoc supplier info" ... a simple example of project-select-join … tuples featuring Brooks' name correspond to an item ordered by Brooks with its associated supplier details 2. Print the supplier names, items & prices for suppliers who supply at least one item ordered by Brooks 4/20/2005 43 CS319 Theory of Databases 4/20/2005 44 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 15 ISBL: A Relational Algebra Query Language 16 3. ... suppliers supplying every item ordered by Brooks 3. Print suppliers who supply every item ordered by Brooks "Every item" is universal quantification Strategy: translate (∀x)(p(x)) to ¬(∃x)(¬p(x)) find suppliers who don't supply at least one of the items that is ordered by Brooks, and take the complement of this set of suppliers Notation: ∀ is “for all”, ∃ is “there exists”, ¬ is “not” 4/20/2005 45 CS319 Theory of Databases S = SUPPLIERS % SNAME I = SUPPLIERS % ITEM NS = (S * I) - (SUPPLIERS % SNAME, ITEM) • S records all supplier names, and I all items supplied • NS is the "does not supply" relation: all supplier-item pairs with pairs such that s supplies i eliminated Now specify items ordered by Brooks ... B = ORDERS : NAME="Brooks" % ITEM 4/20/2005 46 CS319 Theory of Databases ISBL: A Relational Algebra Query Language 17 3. … suppliers supplying every item ordered by Brooks NS = B = "doesn't supply" relation "items ordered by Brooks" ... find suppliers who don't supply at least one item in B NSB = NS.(S * B) .... set of (supplier, item) pairs such s doesn't supply i and Brooks ordered i. To follow … Relational Theory: Algebra and Calculus SQL review Answer is the complement of this set: S - NSB % SNAME 4/20/2005 47 CS319 Theory of Databases 4/20/2005 48 CS319 Theory of Databases