(Nov. 23, 2010) - (Database Development Process and Components of Database Engines)
Database Development Process
Principles
organizations use resources that flow into them from the environment. They also provide resources by returning them to their environment (input/output)
two types of resource: physical resources and conceptual resources
Methodology
defines the how-to's to accomplish a design goal
modeling tools
Computer Aided Software Engineering tools - software products that provide automated support for some portions of the system's development process
Integrated CASE - set of CASE tools that support all the phases of the system's development process
Information Engineering Methodology
formal methodology that is used to create and maintain information systems
enterprise-wise, data-driven (develop application based on how the data is being managed and used)
Phases
Planning - align information technologies with business strategies
Steps:
Identify strategic planning factors
Identify planning objects
Identify entity types
Analysis- develop specifications for information system; elaborate entity types
Steps:
Develop conceptual data model
Use Entity Relationship Diagrams (ERD)
Design- logical database design portion (tables)
Steps:
Create designs
logical database design - maps conceptual model to database structure
physical database design
Components of a Database Management System
Services and functions of a DBMS
store, retrieve, update data
catalog for data dictionary (metadata)
transact (access or change contents)
update correctly (data backups)
(Nov. 25, 2010) - (How to configure JavaDB with NetBeans)
Before anything else, you have to download these softwares:
NetBeans IDE
JDK
Java DB
After installation of software 1 and software 2 above, software 3 is now ready to be configured.
Follow these steps to configure Java DB with NetBeans:
Run Java DB self-extracting file. Java DB would be installed in C:\Program Files\Sun\Java DB
Create a new directory to be used as a home directory for individual instances of the database server. You can create this folder in the Java DB root directory or in any other location.
To register this Database in NetBeans IDE:
Open your NetBeans IDE.
Select the 'Window' menu then select 'Services' (shortcut: Ctrl+5).
A window at the left side should appear and under Databases, right-click Java DB then choose Properties. The Java DB settings dialog would open.
Java DB installation text field should have the path to the Java DB root directory (in our experience, this field is already configured correctly).
Database location text field should have the path of the directory you created in step 2.
Now that you're done, click OK.
You are now ready to start a server and create a database!
Our group learned from this activity that Java DB is so easy to configure with NetBeans. We didn't encounter any problems and difficulties since configuration was not that complicated. Also, all three of us just needed to download Java DB since we already have NetBeans and JDK installed in our laptops. :)
OUTPUT - conceptual model
business data model
Diagram: ERD/ Class Diagram
Technique: Situation Analysis
Method: Interview Transcript - summary of the interview
"What are the important things/ data of the system you need to keep track of"
Entity-relationship Modeling- involves identifying things of importance to an organization (entities), the properties of those things (attributes), and how they relate with one another (relationships).
Attributes- properties or characteristics describing entity. (Example: Dentist - username, password. Patient - first name, middle name, last name)
Relationships- associations between instances of an entity with another entity
Candidate Keys - an attribute or a combination of attributes that uniquely identify instances of an entity
Primary Key- the attribute that uniquely identifies instances of entities
Composite Key - primary key of 2 or more attribute
Foreign Key- attribute that is a primary key of another entity
Derivative Data - value is actually computed or derived
Degree of Relationship
unary - relationship between instances of one entity
binary - relationship between instances of 2 entities
ternary - 3 or more entity relationship
Cardinality of Relationship
mandatory one (one and only one)
optional one (zero or one)
many cardinality (one or many)
optional many (zero or many)
Situation Analysis
Entities
nouns in any situation
identify by asking what to keep track of
ask what uniquely identifies the entities
some nouns are attributes
Relationships
verbs of the situation
identify by asking relationships
ask the degree and cardinality of relationship
Attributes
modifiers of the situation
identify by asking characters of entity or relation they're interested
determine the attribute and entity (intuition -> normalization -> looking at prepositional phrase)
Gerunds - are many-to-many relationships that models relationships of entities with another associated entity (entity that cannot exist without the first entity).
Modeling Multivalued Attributes
Multivalued Attributes - an attribute that can have more than one value.
Each multivalued attribute is converted as a separate entity type.
(Example: Dental Record - Teeth is converted as a separate entity type because teeth can have more than one value)
Modeling Repeating Group
Repeating group - two or more multivalued attributes that are logically related.
In case of a repeating group, we create another entity that houses these attributes.
(Example: Payment is a repeating group so it is separated from the Patient and it is created as another entity which has a relation to the Patient)
Modeling Time-Dependent Data
In case of a time-dependent data, another entity can be created that would allow to keep track of time.
The objectives of conducting Technical Reviews are to make sure the quality of the products to be delivered are of the best and expected quality of the clients. It is necessary to make sure that all errors are uncovered so that the engineers make necessary corrections to the said errors. Technical reviews can be formal or informal but there are certain guidelines to follow in conducting such a review.
Guidelines during a FORMAL TECHNICAL REVIEW
Before the Review
Producer of the product notifies the leader that the product is finished and needs review
Project leader contacts a review leader who generates copies of work product, and distributes it to other reviewers, evaluates product for readiness and establishes agenda for review meeting
During the Review
Producer introduces agenda and work product.
Producer (or designated reader) gives a walkthrough of the work product, reading the material while reviewers raise issues.
Recorder notes valid problems or errors discovered.
At end of review, all attendees vote-on what to do:
Accept product without further modification (RARE !)
Reject and Rework the product. Another review will be set up.
Correct minor errors. Accept upon review leader approval.
All review attendees sign-off, indicating participation in review and concurrence with review team’s findings.
After the Review
Recorder prepares review summary report which includes:
List of reviewers & role each played at the review
Date, name of work product reviewed, producer name
Reviewer prep time
Time elapsed in review
List of issues to be resolved with corresponding line #
Outcome/Resolution
Signatures of everyone involved (Sign-off)
Review reports placed under configuration management
Review Leader follows-up to ensure that open items are properly corrected according to resolution
What makes a good ERD (Data Model)
Completeness
Data Reusability
Stability
Flexibility
Elegance
Communication
Integration
Conflicting Objectives
Logical Database Design - process of trasforming conceptual data model into a logical data model.logical datbase model - design that conforms to the data model for a class of database management system.
4 types of logical databases
Hierarchical Model -records arranged in top-down structure, resembles an upside-down tree, oftenly uses the terms parent and child, child can only have on parent
Network Model - almost the same with hierarchical model except parent and child records cannot be distiguished, many to many relationship
Relational Model - data are represented in table form with rows and columns, associations are represented logically by the values that are stored within the columns.
Object-oriented Model - attributes are called the objects
RELATIONAL DATA MODEL-based on the mathematical concept of a relation which can be physically represented using a table
3 components of a relational model
Data Structure - collection of objects or relations that store data.
Data Manipulation - requires very powerful operations to manipulate the data stored.
Data Integrity - maintains the accuracy and consistency of data.
Relationships
Unary -a relation with only one attribute
Binary - a relation with two attributes
N-ary - relation with n attributes
Steps in Normalization
First Normal Form (FNF) - a relation is in FNF if it contains no repeating groups
Second Normal Form (SNF) - it is in SNF if it is in FNF and every nonkey attribute is fully functionally dependent on the primary key
Third Normal Form (TNF) - a relation is in TNF it is SNF and there are no transitive dependencies existing
Boyce-Codd Normal Form (BCNF) - a relation is in BCND if every determinant is a candidate key
Fourth Normal Form - a relation is in the fourth normal form if it is in BCND and contains no nontrivial multi-valued dependencies
Fifth Normal Form - if a relation has no join dependency
Transforming E-R Diagrams to Relations
Represent Entities
Primary key of the entity becomes the PK of the relation
value of PK must uniquely identify every row of a relation
PK should not be redundant. No attribute of the key can be deleted without destroying its unique identification
Represent Relationships
1:N relationship: place the primary key on the one side entity as a foreign key on the many side entity
M:N relationship: the relationship of the two entities becomes a relation
Unary relationship: the entity type is modeled as a relation
Is-A relationship: create a separate relation for the class and for each of the subclass
Many-to-many: create a new relation having the primary key
Physical Database Design - process of mapping logical database structures into a set of physical database structure. (Example here.)
Decisions made during this stage have a major impact on data accessibility, response time, security, user-friendliness and similar factors.
Three Major Inputs to Physical Database Design
Logical database structure (Relational Schema)
User processing requirements
Characteristics of DBMS and other components of the operating environment
Business Rules (in the context of analysis phase) or Integrity Constraints (in the context of design phase) - specifications that preserves the integrity of the logical data model.
Some advantages of placing integrity constraints:
It provides faster application development with fewer errors.
It reduces maintenance effort and expenditures.
It promotes ease of use of the database.
Data Volume and Usage Analysis
Data Volume analysis - this involves using estimates of the database size to select physical storage devices and estimate the cost of storage.
Usage analysis - involves using estimates of usage paths or patterns to select file organizations and access methods to plan for the use of indexes and to plan a strategy for data distribution.
File Organization and Access Method
File Organization - technique for physically arranging the records of a file on a secondary storage device. It is a physical arrangement of data in a file into records and pages on secondary storage.
Two file organizations:
Sequential File Organization
Hashed File Organization
File Access Method - defines the steps involved in storing and retrieving records from a file
Three file access methods:
Sequential File Access Method
Indexed File Access Method
Random-access or Direct-access Method
Indexes - is a table that is used to determine the rows in a table that satisfy some condition.
We use indexes for applications that are used primarily to support data retrievals such as decision support applications.
Indexed Sequential Files
Indexed Sequential File Organization - records are stored sequentially by primary key value. This normally has a primary key, a separate index or indexes and an overflow area. (Example: phone director listing)
Indexed Non-sequential File Organization - records are stored non-sequentially, a full index is requires. (Example: books in the library)
Clustering Indexes - used when records in a file are often retrieved based on non-key attributes.
Multi-level indexing - attempts to reduce the range of searches. It treats the index like any other file, splits the index into a number of smaller indexes, and maintains an index to the indexes.
Denormalization - process of transforming normalized tables into unnormalized physical record specifications.
In general, this may:
combine columns from several tables together to form a physical record
partition a table into several physical records
do a combination of both
Common situations for which denormalization is considered:
Two entities with a one-to-one relationship.
A many-to-many relationship with nonkey attributes.
object model of the pattern illustrated using a class diagram
Classes
PersistentCient - requests data from the database, normally control classes that asks something from an entity class
DBClass - communicates with the database
PersistentClassList - used to return a set of persistent objects as a result of a DB querry
PersistentClass - maps a record in the database
Dynamic View
shows how the classes fro the static view interacts with each other
uses sequence diagram to illustrate the behavior
Dynamic Behaviors
JDBC Initialization
must occur before any access to persistent class
loads the appropriate drivers and connects to the database
JDBC Create
creates record, executes INSERT statements
JDBC Read
retrieves records from the database, executes SELECT statements
JDBC Update
changes the values of existing database records, executes UPDATE statement
JDBC Delete
deletes records in the database, executes DELETE statement
Database Server-side Programming
Application side methods are methods invoked inside an application, while Database side procedures are methods invoked within the database.
JDBC Stored Procedures and Functions
Stored Procedures - a subroutine that is available to applications accessing the relational database system, represents whole business operations.
Stored Function - computes scalar results and enforcing domain constraints.
Week 10
(Mar 1, 2011) - (Transaction Management)
Transaction Management
Three important services to ensure that the database is reliable and remains in a consistent state
Transaction Support
Transaction
series of operations on a database that the user wants to be completed; actions or series of actions carried out by a single user or application program which access or changes the contents of the database
can have two Outcomes
Committed - the transaction completes successfully
Rolled back/Undone - the transaction was unsuccessful or aborted
Properties
Atomicity - 'All or Nothing' property; A transaction is a single unit that is either performed in its entirety or it is not performed at all.
Consistency - A transaction must transform the database from one consistent state to another consistent state.
Isolation - A transaction executes independently of one another. In other words,the partial effects of incomplete transactions should not be visible to other transactions.
Durability - The effects of a successfully completed (committed) transaction are permanently recorded in the database, and must not be lost because of a subsequent failure.
Concurrency Control
process of managing simultaneous operations on the database without having them interfere with one another
Why we need concurrency control?
Lost Update Problem - a problem where an apparently successfully completed udpate operation by one user can be overwritten by another update operation of another user
Uncomitted Dependency Problem - the problem that occurs when the transaction is allowed to see the intermediate results of another transaction before it has committed
Inconsistent Analysis Problem - occurs when a transaction reads several values from the database but a second transaction updates some of them during the execution of the first
Recovery & Serializability
Stabilizability - a means of helping executions of transactions that are guaranteed to ensure consistency
Schedule - sequence of operations by a a set of concurrent transactions that preserves the order of the operations in each of the individual transactions
non serial schedule - a schedule where the operations from a set of concurrent transactions are interleaved
serial schedule - a schedule where the operations of each transaction are executed consecutively without interleaved operations from other transaction
Locking - a procedure used to control concurrent access to data
parts:
lock - mechanism that prevents another transaction from modifying a data item
read lock - it can read an item but cannot write it
write lock - it can both read and write a problem
Locks are used in the following manner:
Any transaction that needs to access a data item whether it for reading or writing must first lock it.
If the item is not locked by another transaction, the lock will be granted.
If the item is currently lock, the DBMS determines whether the request is compatible with the existing lock.
A transaction continues to hold a lock until it explicitly release it.
Two-Phase Locking (2PL)
protocol that all locking operations precede the first unlock
2 types: rigorous 2PL, strict 2PL
Deadlock
techniques: deadlock prevention, deadlock detection and recovery
Timestamping Methods
a concurrency control protocol in which the fundamental goal is to order transactions globally in such a way that older transactions, transactions with smaller timestamps.
timestamp - unique identifier by the system that indicates the relative strating time of a transaction
each data item has two timestamps:
read-timestamp - timestamp of the last transaction to read the item
write-timestamp - timestamp of the last transaction to write the item
Timestamping Protocol (Basic Timestamp Ordering)
T issues a read(x)
T issues a write(x)
Database Recovery
the process of restoring the database to the correct state in the event of a failure
Why do we need database recovery?
System crashes - hardware or software error that affects in the loss of data in main memory
Media failure - head crashes or unreadable media that results in the loss of parts of data in secondary storage
Application Software errors - logical errors in the program that is accesses and updates the databases which may cause one or more transactions to fail
Natural Physical Disasters - to the fires, floods, earthquakes, or power failures
Carelessness or Unintentional Destruction - actions, whether it be intentional or unintentional, by users or operators of the system
A DBMS must provide the following mechanisms to assist in database recovery
Backup Mechanism - provides backup copies of the database
Logging Mechanism - keeps track of the current state of transactions and database changes
Checkpoint Mechanism - enables updates to the database that are in progress to be made permanent
Recovery Techniques
Week 11
(Mar. 11, 2011) - (Query Processing)
Query Processing
involes activities that retrieves data from the DB
transforms a query in a high level language into a correct and efficient execution strategy expressing in low-level language
Four Main Phases of Query Processing
Query Decomposition - high level query is converted into a relational algebra query
checks query if syntactically and semantically correct
Query Optimization
chooses an efficient execution strategy for processing the query
uses database statistics
Code Generation
phases that transforms the execution strategy into low-level operations
Runtime Query Execution
phase that executes the low level operations to retrieve from the database
Query Decomposition
first phase of any query processing
aims to transform high level language into relational alfebra query
checks for syntactic and semantic errors
Stages of Query Decomposition
Analysis
query is analyzed for syntax errors
verifies that the relation and attributes in the query are defined in the system catalog
Normalization - query is converted into a normalized form that can be easily manipulated
Semantic Analysis - rejects normalized queries that are incorrectly formulated or are contradictory
Simplification
Query Restructuring
All notes in this journal are either from JEDI Database Systems by Ma. Rowena C. Solamo or from CS165 Lectures.
Week 1
Table of Contents
(Nov. 23, 2010) - (Database Development Process and Components of Database Engines)
Database Development Process
Principles
Methodology
Information Engineering Methodology
Components of a Database Management System
Services and functions of a DBMS
(Nov. 25, 2010) - (How to configure JavaDB with NetBeans)
Before anything else, you have to download these softwares:
- NetBeans IDE
- JDK
- Java DB
After installation of software 1 and software 2 above, software 3 is now ready to be configured.Follow these steps to configure Java DB with NetBeans:
You are now ready to start a server and create a database!
Our group learned from this activity that Java DB is so easy to configure with NetBeans. We didn't encounter any problems and difficulties since configuration was not that complicated. Also, all three of us just needed to download Java DB since we already have NetBeans and JDK installed in our laptops. :)
This guide was made possible through http://netbeans.org/kb/docs/ide/java-db.html
(Nov. 25, 2010) - (Introduction to Relational Database System)
A relational database
Relational algebra
- Primary key and foreign key relation
Primary key
Foreign key
Relational Database Components
SQL statements
DML
(Sample SQL statements here.)
Week 2
(Nov. 30, 2010) - (Data Analysis Phase)
OUTPUT - conceptual model
business data model
Diagram: ERD/ Class Diagram
Technique: Situation Analysis
Method: Interview Transcript - summary of the interview
"What are the important things/ data of the system you need to keep track of"
Entity-relationship Modeling - involves identifying things of importance to an organization (entities), the properties of those things (attributes), and how they relate with one another (relationships).
ERD Basic Notation
Candidate Keys - an attribute or a combination of attributes that uniquely identify instances of an entity
Primary Key- the attribute that uniquely identifies instances of entities
Composite Key - primary key of 2 or more attribute
Foreign Key- attribute that is a primary key of another entity
Derivative Data - value is actually computed or derived
Degree of Relationship
Cardinality of Relationship
Situation Analysis
(Dec. 2, 2010) - (Primary key-Foreign key relationship assignment)
Figure 1: Organic shop Relationship diagram
Week 3
(Dec. 7, 2010 and Dec. 10, 2010) - (Advanced ERD Modeling)
Enhanced ERD Modeling
Modeling Multivalued Attributes
- Multivalued Attributes - an attribute that can have more than one value.
- Each multivalued attribute is converted as a separate entity type.
(Example: Dental Record - Teeth is converted as a separate entity type because teeth can have more than one value)Modeling Repeating Group
- Repeating group - two or more multivalued attributes that are logically related.
- In case of a repeating group, we create another entity that houses these attributes.
(Example: Payment is a repeating group so it is separated from the Patient and it is created as another entity which has a relation to the Patient)Modeling Time-Dependent Data
(Sample ERD here.)
Week 4-5
(Jan. 11, 2011 - Jan. 20, 2011) - (The Relational Database Design)
The objectives of conducting Technical Reviews are to make sure the quality of the products to be delivered are of the best and expected quality of the clients. It is necessary to make sure that all errors are uncovered so that the engineers make necessary corrections to the said errors. Technical reviews can be formal or informal but there are certain guidelines to follow in conducting such a review.
Guidelines during a FORMAL TECHNICAL REVIEW
Before the Review
During the Review
After the Review
What makes a good ERD (Data Model)
Logical Database Design - process of trasforming conceptual data model into a logical data model.logical datbase model - design that conforms to the data model for a class of database management system.
4 types of logical databases
RELATIONAL DATA MODEL-based on the mathematical concept of a relation which can be physically represented using a table
3 components of a relational model
Relationships
Steps in Normalization
Transforming E-R Diagrams to Relations
Week 6-7
(Jan. 25, 2011 - Feb. 3, 2011) - (Physical Database Design)
Physical Database Design - process of mapping logical database structures into a set of physical database structure. (Example here.)Decisions made during this stage have a major impact on data accessibility, response time, security, user-friendliness and similar factors.
Three Major Inputs to Physical Database Design
Business Rules (in the context of analysis phase) or Integrity Constraints (in the context of design phase) - specifications that preserves the integrity of the logical data model.
Some advantages of placing integrity constraints:
Data Volume and Usage Analysis
Data Volume analysis - this involves using estimates of the database size to select physical storage devices and estimate the cost of storage.
Usage analysis - involves using estimates of usage paths or patterns to select file organizations and access methods to plan for the use of indexes and to plan a strategy for data distribution.
File Organization and Access Method
File Organization - technique for physically arranging the records of a file on a secondary storage device. It is a physical arrangement of data in a file into records and pages on secondary storage.
Two file organizations:
File Access Method - defines the steps involved in storing and retrieving records from a file
Three file access methods:
Indexes - is a table that is used to determine the rows in a table that satisfy some condition.
We use indexes for applications that are used primarily to support data retrievals such as decision support applications.
Indexed Sequential Files
Clustering Indexes - used when records in a file are often retrieved based on non-key attributes.
Multi-level indexing - attempts to reduce the range of searches. It treats the index like any other file, splits the index into a number of smaller indexes, and maintains an index to the indexes.
Denormalization - process of transforming normalized tables into unnormalized physical record specifications.
In general, this may:
Common situations for which denormalization is considered:
Week 8-9
(Feb. 15, 2011 - Feb. 24, 2011) - (Database Implementation and Database Connectivity Concepts)
JDBC (Java Database Connectivity)
JDBC Design Pattern (Example here.)
Database Server-side Programming
Application side methods are methods invoked inside an application, while Database side procedures are methods invoked within the database.
JDBC Stored Procedures and Functions
Week 10
(Mar 1, 2011) - (Transaction Management)
Transaction Management
Week 11
(Mar. 11, 2011) - (Query Processing)
Query Processing
Four Main Phases of Query Processing
Query Decomposition
Stages of Query Decomposition
All notes in this journal are either from JEDI Database Systems by Ma. Rowena C. Solamo or from CS165 Lectures.