All notes in this journal are either from JEDI Database Systems by Ma. Rowena C. Solamo or from CS165 Lectures.


Week 1

(Nov. 23, 2010) - (Database Development Process and Components of Database Engines)


Database Development Process
Principles
  • organizations use resources that flow into them from the environment. They also provide resources by returning them to their environment (input/output)
  • two types of resource: physical resources and conceptual resources

Methodology
  • defines the how-to's to accomplish a design goal
  • modeling tools
    • Computer Aided Software Engineering tools - software products that provide automated support for some portions of the system's development process
    • Integrated CASE - set of CASE tools that support all the phases of the system's development process

Information Engineering Methodology
  • formal methodology that is used to create and maintain information systems
  • enterprise-wise, data-driven (develop application based on how the data is being managed and used)
    • Phases
      • Planning - align information technologies with business strategies
        • Steps:
          1. Identify strategic planning factors
          2. Identify planning objects
          3. Identify entity types
      • Analysis- develop specifications for information system; elaborate entity types
        • Steps:
          1. Develop conceptual data model
          2. Use Entity Relationship Diagrams (ERD)
      • Design- logical database design portion (tables)
        • Steps:
          1. Create designs
            • logical database design - maps conceptual model to database structure
            • physical database design

Components of a Database Management System
Services and functions of a DBMS
  • store, retrieve, update data
  • catalog for data dictionary (metadata)
  • transact (access or change contents)
  • update correctly (data backups)

(Nov. 25, 2010) - (How to configure JavaDB with NetBeans)


Before anything else, you have to download these softwares:
  1. NetBeans IDE
  2. JDK
  3. Java DB
After installation of software 1 and software 2 above, software 3 is now ready to be configured.

Follow these steps to configure Java DB with NetBeans:
  1. Run Java DB self-extracting file. Java DB would be installed in C:\Program Files\Sun\Java DB
  2. Create a new directory to be used as a home directory for individual instances of the database server. You can create this folder in the Java DB root directory or in any other location.
  3. To register this Database in NetBeans IDE:
  • Open your NetBeans IDE.
  • Select the 'Window' menu then select 'Services' (shortcut: Ctrl+5).
  • A window at the left side should appear and under Databases, right-click Java DB then choose Properties. The Java DB settings dialog would open.
  • Java DB installation text field should have the path to the Java DB root directory (in our experience, this field is already configured correctly).
  • Database location text field should have the path of the directory you created in step 2.
  • Now that you're done, click OK.

You are now ready to start a server and create a database!

Our group learned from this activity that Java DB is so easy to configure with NetBeans. We didn't encounter any problems and difficulties since configuration was not that complicated. Also, all three of us just needed to download Java DB since we already have NetBeans and JDK installed in our laptops. :)

This guide was made possible through http://netbeans.org/kb/docs/ide/java-db.html

(Nov. 25, 2010) - (Introduction to Relational Database System)


A relational database
  • consists of relations or tables.
  • consists of relational algebra and calculus

Relational algebra
- Primary key and foreign key relation

Primary key
  • to uniquely identify a row within a table
  • values uniquely defined within a column (not necessarily one column)

Foreign key
  • values are primary keys of another table

Relational Database Components
  1. Data structures - tables, rows, columns
  2. Language - Structured Query Language (SQL), simple and non-procedural
  3. Data Integrity

SQL statements
    • data definition- allows you to manipulate the structure of data
    • data manipulation- allows you to manipulate rows or records of a table
    • transaction control language - allows you to maintain integrity of data

DML
  • SELECT statement - retrieve records
  • INSERT statement - add records to a table
  • UPDATE statement - update records in a table
  • DELETE statement - delete recors from a table

  • SELECT clause
    • defines what columns of the tables are to be retrieved
  • FROM clause
    • identifies what table
  • *
    • return all columns
  • WHERE clause
    • conditions that rows must meet to be retrieved by the SELECT statement
    • like/ matches (more condition)
  • aggregate functions
    • apply to a column or row

(Sample SQL statements here.)




Week 2

(Nov. 30, 2010) - (Data Analysis Phase)



OUTPUT - conceptual model
business data model
Diagram: ERD/ Class Diagram
Technique: Situation Analysis
Method: Interview Transcript - summary of the interview
"What are the important things/ data of the system you need to keep track of"

Entity-relationship Modeling - involves identifying things of importance to an organization (entities), the properties of those things (attributes), and how they relate with one another (relationships).

ERD Basic Notation
  1. Entity - person, place, thing, event, concepts, etc. (Example: Dentist, Patient, Payments, Appointments)
  2. Attributes- properties or characteristics describing entity. (Example: Dentist - username, password. Patient - first name, middle name, last name)
  3. Relationships- associations between instances of an entity with another entity

Candidate Keys - an attribute or a combination of attributes that uniquely identify instances of an entity
Primary Key- the attribute that uniquely identifies instances of entities
Composite Key - primary key of 2 or more attribute
Foreign Key- attribute that is a primary key of another entity
Derivative Data - value is actually computed or derived

Degree of Relationship
  • unary - relationship between instances of one entity
  • binary - relationship between instances of 2 entities
  • ternary - 3 or more entity relationship

Cardinality of Relationship
  • mandatory one (one and only one)
  • optional one (zero or one)
  • many cardinality (one or many)
  • optional many (zero or many)

Situation Analysis
  • Entities
    • nouns in any situation
    • identify by asking what to keep track of
    • ask what uniquely identifies the entities
    • some nouns are attributes
  • Relationships
    • verbs of the situation
    • identify by asking relationships
    • ask the degree and cardinality of relationship
  • Attributes
    • modifiers of the situation
    • identify by asking characters of entity or relation they're interested
    • determine the attribute and entity (intuition -> normalization -> looking at prepositional phrase)

(Dec. 2, 2010) - (Primary key-Foreign key relationship assignment)


Relationship_diagram.jpg

Figure 1: Organic shop Relationship diagram



Week 3

(Dec. 7, 2010 and Dec. 10, 2010) - (Advanced ERD Modeling)


Enhanced ERD Modeling
  • Gerunds - are many-to-many relationships that models relationships of entities with another associated entity (entity that cannot exist without the first entity).

Modeling Multivalued Attributes
  • Multivalued Attributes - an attribute that can have more than one value.
  • Each multivalued attribute is converted as a separate entity type.
(Example: Dental Record - Teeth is converted as a separate entity type because teeth can have more than one value)

Modeling Repeating Group
  • Repeating group - two or more multivalued attributes that are logically related.
  • In case of a repeating group, we create another entity that houses these attributes.
(Example: Payment is a repeating group so it is separated from the Patient and it is created as another entity which has a relation to the Patient)

Modeling Time-Dependent Data
  • In case of a time-dependent data, another entity can be created that would allow to keep track of time.

(Sample ERD here.)




Week 4-5

(Jan. 11, 2011 - Jan. 20, 2011) - (The Relational Database Design)


The objectives of conducting Technical Reviews are to make sure the quality of the products to be delivered are of the best and expected quality of the clients. It is necessary to make sure that all errors are uncovered so that the engineers make necessary corrections to the said errors. Technical reviews can be formal or informal but there are certain guidelines to follow in conducting such a review.

Guidelines during a FORMAL TECHNICAL REVIEW


Before the Review
  • Producer of the product notifies the leader that the product is finished and needs review
  • Project leader contacts a review leader who generates copies of work product, and distributes it to other reviewers, evaluates product for readiness and establishes agenda for review meeting

During the Review
  • Producer introduces agenda and work product.
  • Producer (or designated reader) gives a walkthrough of the work product, reading the material while reviewers raise issues.
  • Recorder notes valid problems or errors discovered.
  • At end of review, all attendees vote-on what to do:
    1. Accept product without further modification (RARE !)
    2. Reject and Rework the product. Another review will be set up.
    3. Correct minor errors. Accept upon review leader approval.
  • All review attendees sign-off, indicating participation in review and concurrence with review team’s findings.

After the Review
  • Recorder prepares review summary report which includes:
    1. List of reviewers & role each played at the review
    2. Date, name of work product reviewed, producer name
    3. Reviewer prep time
    4. Time elapsed in review
    5. List of issues to be resolved with corresponding line #
    6. Outcome/Resolution
    7. Signatures of everyone involved (Sign-off)
  • Review reports placed under configuration management
  • Review Leader follows-up to ensure that open items are properly corrected according to resolution

What makes a good ERD (Data Model)
  1. Completeness
  2. Data Reusability
  3. Stability
  4. Flexibility
  5. Elegance
  6. Communication
  7. Integration
  8. Conflicting Objectives

Logical Database Design - process of trasforming conceptual data model into a logical data model.logical datbase model - design that conforms to the data model for a class of database management system.

4 types of logical databases
  1. Hierarchical Model -records arranged in top-down structure, resembles an upside-down tree, oftenly uses the terms parent and child, child can only have on parent
  2. Network Model - almost the same with hierarchical model except parent and child records cannot be distiguished, many to many relationship
  3. Relational Model - data are represented in table form with rows and columns, associations are represented logically by the values that are stored within the columns.
  4. Object-oriented Model - attributes are called the objects

RELATIONAL DATA MODEL-based on the mathematical concept of a relation which can be physically represented using a table

3 components of a relational model
  1. Data Structure - collection of objects or relations that store data.
  2. Data Manipulation - requires very powerful operations to manipulate the data stored.
  3. Data Integrity - maintains the accuracy and consistency of data.

Relationships
  1. Unary -a relation with only one attribute
  2. Binary - a relation with two attributes
  3. N-ary - relation with n attributes

Steps in Normalization
  1. First Normal Form (FNF) - a relation is in FNF if it contains no repeating groups
  2. Second Normal Form (SNF) - it is in SNF if it is in FNF and every nonkey attribute is fully functionally dependent on the primary key
  3. Third Normal Form (TNF) - a relation is in TNF it is SNF and there are no transitive dependencies existing
  4. Boyce-Codd Normal Form (BCNF) - a relation is in BCND if every determinant is a candidate key
  5. Fourth Normal Form - a relation is in the fourth normal form if it is in BCND and contains no nontrivial multi-valued dependencies
  6. Fifth Normal Form - if a relation has no join dependency

Transforming E-R Diagrams to Relations
  • Represent Entities
    • Primary key of the entity becomes the PK of the relation
      • value of PK must uniquely identify every row of a relation
      • PK should not be redundant. No attribute of the key can be deleted without destroying its unique identification
  • Represent Relationships
    • 1:N relationship: place the primary key on the one side entity as a foreign key on the many side entity
    • M:N relationship: the relationship of the two entities becomes a relation
    • Unary relationship: the entity type is modeled as a relation
    • Is-A relationship: create a separate relation for the class and for each of the subclass
    • Many-to-many: create a new relation having the primary key
  • Normalize the Relations
  • Merging Relations
    • Merge relations that refer to the same entity




Week 6-7

(Jan. 25, 2011 - Feb. 3, 2011) - (Physical Database Design)


Physical Database Design - process of mapping logical database structures into a set of physical database structure. (Example here.)
Decisions made during this stage have a major impact on data accessibility, response time, security, user-friendliness and similar factors.

Three Major Inputs to Physical Database Design
  1. Logical database structure (Relational Schema)
  2. User processing requirements
  3. Characteristics of DBMS and other components of the operating environment

Business Rules (in the context of analysis phase) or Integrity Constraints (in the context of design phase) - specifications that preserves the integrity of the logical data model.
Some advantages of placing integrity constraints:
  1. It provides faster application development with fewer errors.
  2. It reduces maintenance effort and expenditures.
  3. It promotes ease of use of the database.

Data Volume and Usage Analysis
Data Volume analysis - this involves using estimates of the database size to select physical storage devices and estimate the cost of storage.
Usage analysis - involves using estimates of usage paths or patterns to select file organizations and access methods to plan for the use of indexes and to plan a strategy for data distribution.

File Organization and Access Method
File Organization - technique for physically arranging the records of a file on a secondary storage device. It is a physical arrangement of data in a file into records and pages on secondary storage.

Two file organizations:
  • Sequential File Organization
  • Hashed File Organization

File Access Method - defines the steps involved in storing and retrieving records from a file

Three file access methods:
  • Sequential File Access Method
  • Indexed File Access Method
  • Random-access or Direct-access Method

Indexes - is a table that is used to determine the rows in a table that satisfy some condition.
We use indexes for applications that are used primarily to support data retrievals such as decision support applications.

Indexed Sequential Files
  • Indexed Sequential File Organization - records are stored sequentially by primary key value. This normally has a primary key, a separate index or indexes and an overflow area. (Example: phone director listing)
  • Indexed Non-sequential File Organization - records are stored non-sequentially, a full index is requires. (Example: books in the library)

Clustering Indexes - used when records in a file are often retrieved based on non-key attributes.

Multi-level indexing - attempts to reduce the range of searches. It treats the index like any other file, splits the index into a number of smaller indexes, and maintains an index to the indexes.

Denormalization - process of transforming normalized tables into unnormalized physical record specifications.
In general, this may:
  • combine columns from several tables together to form a physical record
  • partition a table into several physical records
  • do a combination of both

Common situations for which denormalization is considered:
  1. Two entities with a one-to-one relationship.
  2. A many-to-many relationship with nonkey attributes.
  3. Reference data.



Week 8-9

(Feb. 15, 2011 - Feb. 24, 2011) - (Database Implementation and Database Connectivity Concepts)



JDBC (Java Database Connectivity)
  • Java API that enables Java programs to execute SQL commands and interact with any SQL-compliant database

JDBC Design Pattern (Example here.)
  • two pattern views
    • Static view
      • object model of the pattern illustrated using a class diagram
      • Classes
        • PersistentCient - requests data from the database, normally control classes that asks something from an entity class
        • DBClass - communicates with the database
        • PersistentClassList - used to return a set of persistent objects as a result of a DB querry
        • PersistentClass - maps a record in the database

    • Dynamic View
      • shows how the classes fro the static view interacts with each other
      • uses sequence diagram to illustrate the behavior
      • Dynamic Behaviors
        • JDBC Initialization
          • must occur before any access to persistent class
          • loads the appropriate drivers and connects to the database
        • JDBC Create
          • creates record, executes INSERT statements
        • JDBC Read
          • retrieves records from the database, executes SELECT statements
        • JDBC Update
          • changes the values of existing database records, executes UPDATE statement
        • JDBC Delete
          • deletes records in the database, executes DELETE statement

Database Server-side Programming
Application side methods are methods invoked inside an application, while Database side procedures are methods invoked within the database.

JDBC Stored Procedures and Functions
  • Stored Procedures - a subroutine that is available to applications accessing the relational database system, represents whole business operations.
  • Stored Function - computes scalar results and enforcing domain constraints.




Week 10

(Mar 1, 2011) - (Transaction Management)


Transaction Management
  • Three important services to ensure that the database is reliable and remains in a consistent state

  • Transaction Support
    • Transaction
      • series of operations on a database that the user wants to be completed; actions or series of actions carried out by a single user or application program which access or changes the contents of the database
      • can have two Outcomes
        • Committed - the transaction completes successfully
        • Rolled back/Undone - the transaction was unsuccessful or aborted
      • Properties
        • Atomicity - 'All or Nothing' property; A transaction is a single unit that is either performed in its entirety or it is not performed at all.
        • Consistency - A transaction must transform the database from one consistent state to another consistent state.
        • Isolation - A transaction executes independently of one another. In other words,the partial effects of incomplete transactions should not be visible to other transactions.
        • Durability - The effects of a successfully completed (committed) transaction are permanently recorded in the database, and must not be lost because of a subsequent failure.

  • Concurrency Control
    • process of managing simultaneous operations on the database without having them interfere with one another
    • Why we need concurrency control?
      • Lost Update Problem - a problem where an apparently successfully completed udpate operation by one user can be overwritten by another update operation of another user
      • Uncomitted Dependency Problem - the problem that occurs when the transaction is allowed to see the intermediate results of another transaction before it has committed
      • Inconsistent Analysis Problem - occurs when a transaction reads several values from the database but a second transaction updates some of them during the execution of the first
    • Recovery & Serializability
      • Stabilizability - a means of helping executions of transactions that are guaranteed to ensure consistency
      • Schedule - sequence of operations by a a set of concurrent transactions that preserves the order of the operations in each of the individual transactions
        • non serial schedule - a schedule where the operations from a set of concurrent transactions are interleaved
        • serial schedule - a schedule where the operations of each transaction are executed consecutively without interleaved operations from other transaction
    • Locking - a procedure used to control concurrent access to data
      • parts:
        • lock - mechanism that prevents another transaction from modifying a data item
          • read lock - it can read an item but cannot write it
          • write lock - it can both read and write a problem
      • Locks are used in the following manner:
        • Any transaction that needs to access a data item whether it for reading or writing must first lock it.
        • If the item is not locked by another transaction, the lock will be granted.
        • If the item is currently lock, the DBMS determines whether the request is compatible with the existing lock.
        • A transaction continues to hold a lock until it explicitly release it.
      • Two-Phase Locking (2PL)
        • protocol that all locking operations precede the first unlock
        • 2 types: rigorous 2PL, strict 2PL
    • Deadlock
      • techniques: deadlock prevention, deadlock detection and recovery
    • Timestamping Methods
      • a concurrency control protocol in which the fundamental goal is to order transactions globally in such a way that older transactions, transactions with smaller timestamps.
      • timestamp - unique identifier by the system that indicates the relative strating time of a transaction
      • each data item has two timestamps:
        • read-timestamp - timestamp of the last transaction to read the item
        • write-timestamp - timestamp of the last transaction to write the item
    • Timestamping Protocol (Basic Timestamp Ordering)
      • T issues a read(x)
      • T issues a write(x)

  • Database Recovery
    • the process of restoring the database to the correct state in the event of a failure
    • Why do we need database recovery?
      • System crashes - hardware or software error that affects in the loss of data in main memory
      • Media failure - head crashes or unreadable media that results in the loss of parts of data in secondary storage
      • Application Software errors - logical errors in the program that is accesses and updates the databases which may cause one or more transactions to fail
      • Natural Physical Disasters - to the fires, floods, earthquakes, or power failures
      • Carelessness or Unintentional Destruction - actions, whether it be intentional or unintentional, by users or operators of the system
    • A DBMS must provide the following mechanisms to assist in database recovery
      • Backup Mechanism - provides backup copies of the database
      • Logging Mechanism - keeps track of the current state of transactions and database changes
      • Checkpoint Mechanism - enables updates to the database that are in progress to be made permanent
      • Recovery Techniques



Week 11

(Mar. 11, 2011) - (Query Processing)


Query Processing
  • involes activities that retrieves data from the DB
  • transforms a query in a high level language into a correct and efficient execution strategy expressing in low-level language

Four Main Phases of Query Processing
  • Query Decomposition - high level query is converted into a relational algebra query
    • checks query if syntactically and semantically correct
  • Query Optimization
    • chooses an efficient execution strategy for processing the query
    • uses database statistics
  • Code Generation
    • phases that transforms the execution strategy into low-level operations
  • Runtime Query Execution
    • phase that executes the low level operations to retrieve from the database

Query Decomposition
  • first phase of any query processing
    • aims to transform high level language into relational alfebra query
    • checks for syntactic and semantic errors

Stages of Query Decomposition
  • Analysis
    • query is analyzed for syntax errors
    • verifies that the relation and attributes in the query are defined in the system catalog
  • Normalization - query is converted into a normalized form that can be easily manipulated
  • Semantic Analysis - rejects normalized queries that are incorrectly formulated or are contradictory
  • Simplification
  • Query Restructuring















All notes in this journal are either from JEDI Database Systems by Ma. Rowena C. Solamo or from CS165 Lectures.