1 Data and Database

1.1 Data

Data are pieces of information or facts related to the object being considered. For example, examples of data relating to an individual would be the person’s name, age, height, weight, ethnicity, hair color, and birthdate. Data is not limited to facts themselves, as pictures, images, and files are also considered data.

There are a few key terms that would be useful to help one understand data more, particularly the relation between data and databases.

Fields: Within a database, a field contains the most detailed information about events, people, objects, and transactions.

Record: A record is a collection of related fields.

Table: A table is a collection of related records with a unique table name

Database: A database is a collection of related tables.

1.2 Database

A database refers to a collection of logically related information organized so that it can be easily accessible, managed, and updated. Databases are generally accessed electronically from a computer system and are usually controlled by a database management system (DBMS).

Enterprises typically make use of both internal databases and external databases.

Internal databases typically include operational databases and data warehouses.
External databases refer to databases external to an organization and are generally accessed over the Internet and are owned by other organizations

Components of a Database

The five major components of a database are:

Hardware : Hardware refers to the physical, electronic devices such as computers and hard disks that offer the interface between computers and real-world systems.
Software : Software is a set of programs used to manage and control the database and includes the database software, operating system, network software used to share the data with other users, and the applications used to access the data.
Data : Data are raw facts and information that need to be organized and processed to make it more meaningful. Database dictionaries are used to centralize, document, control, and coordinate the use of data within an organization. A database is a repository of information about a database (also called metadata).
Procedures : Procedures refer to the instructions used in a database management system and encompass everything from instructions to setup and install, login and logout, manage the day-to-day operations, take backups of data, and generate reports.
Database Access Language : Database Access Language is a language used to write commands to access, update, and delete data stored in a database. Users can write commands using Database Access Language before submitting them to the database for execution. Through utilizing the language, users can create new databases, tables, insert data, and delete data.

1.3 Database Management Sustem(DBMS)

A Database Management System (DBMS) refers to a collection of programs that enable users to access databases and manipulate, maintain, report, and relate data.

A DBMS is often used to reduce data redundancy, share data in a controlled way, and reduce data integrity problems. DBMS is not an information system but is simply software.

The relational model, which saves data in table formats, is the most widely used DBMS. The relational DBMS organizes information into rows, columns, and tables, making it easier to find relevant information. Relational databases are popular because they are easy to extend, and new data categories can be added after the original database is created without large amounts of modification.

The Structured Query Language (SQL) is considered the standard user and application program interface for a relational database, and all relational DBMS software supports SQL. Examples include FileMaker Pro, Microsoft Access, Microsoft SQL Server, MySQL, and Oracle.

Types of Databases

In general, there are two common types of databases:

Non-Relational : A non-relational database, often referred to as NoSQL (Not Only SQL), is a type of database that stores data in a non-tabular form. Unlike traditional relational databases that use tables with rows and columns, non-relational databases use various flexible data models such as documents, key-value pairs, graphs, and wide-column stores
Relational : A relational database is a type of database that organizes data into rows and columns, forming tables where data points are related to each other. This structure allows for efficient data management and retrieval. Each table in a relational database has a unique identifier called a primary key, and tables can be linked using foreign keys

2 Table Basics

2.1 Table

The data in an RDBMS is stored in database objects known as tables. This table is basically a collection of related data entries and it consists of numerous columns and rows.

A table is the most common and simplest form of data storage in a relational database. Here is an example of a CUSTOMERS table which stores customer’s ID, Name, Age, Salary, City and Country −

ID	Name	Age	Salary	City	Country
1	Ramesh	32	2000.00	Hyderabad	India
2	Mukesh	40	5000.00	New York	USA
3	Sumit	45	4500.00	Muscat	Oman
4	Kaushik	25	2500.00	Kolkata	India
5	Hardik	29	3500.00	Bhopal	India
6	Komal	38	3500.00	Saharanpur	India
7	Ayush	25	3500.00	Delhi	India
8	Javed	29	3700.00	Delhi	India

2.2 Field

Every table is broken up into smaller entities called fields. A field is a column in a table that is designed to maintain specific information about every record in the table.

For example, our CUSTOMERS table consists of different fields like ID, Name, Age, Salary, City and Country.

2.3 Record or a Row

A record is also called as a row of data is each individual entry that exists in a table. For example, there are 7 records in the above CUSTOMERS table. Following is a single row of data or record in the CUSTOMERS table −

ID	Name	Age	Salary	City	Country
1	Ramesh	32	2000.00	Hyderabad	India

A record is a horizontal entity in a table.

2.4 Column

A column is a vertical entity in a table that contains all information associated with a specific field in a table.

For example, our CUSTOMERS table have different columns to represent ID, Name, Age, Salary, City and Country.

2.5 NULL Value

A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL value is a field with no value.

It is very important to understand that a NULL value is different than a zero value or a field that contains spaces. A field with a NULL value is the one that has been left blank during a record creation. Following table has three records where first record has NULL value for the salary and second record has a zero value for the salary.

ID	Name	Age	Salary	City	Country
1	Ramesh	32		Hyderabad	India
2	Mukesh	40	00.00	New York	USA
3	Sumit	45	4500.00	Muscat	Oma

3 SQL

3.1 Definition

SQL stands for Structured Query Language and is a computer language that we use to interact with a relational database. SQL is a tool for organizing, managing, and retrieving archived data from a computer database.The original name was given by IBM as Structured English Query Language (acronym SEQUEL)

When data needs to be retrieved from a database, SQL is used to make the request.The DBMS processes the SQL query retrieves the requested data and returns it to us. Rather, SQL statements describe how a collection of data should be organized or what data should be extracted or added to the database

Uses of SQL

Data definition: It is used to define the structure and organization of the stored data and the relationships among the stored data items.
Data retrieval: SQL can also be used for data retrieval.
Data manipulation: If the user wants to add new data, remove data, or modifying in existing data then SQL provides this facility also.
Access control: SQL can be used to restrict a user’s ability to retrieve, add, and modify data, protecting stored data against unauthorized access.
Data sharing: SQL is used to coordinate data sharing by concurrent users, ensuring that changes made by one user do not inadvertently wipe out changes made at nearly the same time by another user.

3.2 Why SQL?

SQL is an interactive question language. Users type SQL instructions into an interactive SQL software to retrieve facts and show them on the screen, presenting a convenient, easy-to-use device for ad hoc database queries.
SQL is a database programming language. Programmers embed SQL instructions into their utility packages to access the facts in a database. Both user-written packages and database software packages (consisting of document writers and facts access tools) use this approach for database access.
SQL is a client/server language. Personal computer programs use SQL to communicate over a network with database servers that save shared facts. This client/server architecture is utilized by many famous enterprise-class applications.
SQL is Internet facts access language. Internet net servers that interact with company facts and Internet utility servers all use SQL as a widespread language for getting access to company databases, frequently through embedding SQL databases get entry to inside famous scripting languages like PHP or Perl.
SQL is a distributed database language.Distributed database control structures use SQL to assist distribute facts throughout many linked pc structures. The DBMS software program on every gadget makes use of SQL to speak with the opposite structures, sending requests for facts to get entry to.
SQL is a database gateway language.In a pc community with a mixture of various DBMS products, SQL is frequently utilized in a gateway that lets one logo of DBMS speak with every other logo. SQL has for this reason emerged as a useful, effective device for linking people, pc packages, and pc structures to the facts saved in a relational database.

3.3 SQL Commands

Developers use structured query language (SQL) commands, which are specific keywords or SQL statements, to work with data stored in relational databases. The following are categories for SQL commands

Data Definition Language (DDL) : SQL commands used to create the database structure are known as data definition language (DDL). Based on the needs of the business, database engineers create and modify database objects using DDL. The CREATE command, for instance, is used by the database engineer to create database objects like tables, views, and indexes.
Data Manipulation Language (DML) : A relational database can be updated with new data using data manipulation language (DML) statements. The INSERT command, for instance, is used by an application to add a new record to the database
Data Query Language (DQL) : Data retrieval instructions are written in the data query language (DQL), which is used to access relational databases. The SELECT command is used by software programs to filter and return particular results from a SQL table
Data Control language (DCL) : Data control language (DCL) is a programming language used by database administrators to control or grant other users access to databases. For instance, they can allow specific applications to manipulate one or more tables by using the GRANT command
Transaction Control Language (TCL) : To automatically update databases, the relational engine uses transaction control language (TCL). For instance, the database can reverse a mistaken transaction using the ROLLBACK command.

Here’s a simplified explanation of how SQL works:

Writing and Running the Query: A user or programmer writes an SQL query, which is then parsed.
Processing by the Query Optimizer: The query is processed by a query optimizer. This happens in three phases:
- Parsing: This process checks the syntax of the query.
- Binding: This process checks the semantics of the query.
- Optimization: This process generates the query execution plan. It involves generating all possible permutations and combinations to find the most effective query execution plan in a reasonable time.
Executing the Query: The query is then executed according to the plan determined in the optimization phase.

4 Entity Relationship Diagram (ERD)

An Entity-Relationship Diagram (ERD) is a visual representation of the structure of a database. It illustrates how different entities (objects or concepts) are connected and interact with each other through relationships.

Components of ER Diagram

Entities : Entities are objects that exist in the real world and can be distinctly identified. They can be physical objects like a person, car, or house, or conceptual objects like a company, job, or university course. Entities are represented by rectangles in an ER diagram

Attributes : Attributes are properties that describe an entity. For example, a student entity might have attributes like Roll_No, Name, DOB, Age, Address, and Mobile_No. Attributes are represented by ovals in an ER diagram

Relationships : Relationships describe how entities interact with each other. For example, a student might be enrolled in a course. Relationships are represented by diamonds in an ER diagram

ER Diagram of College Management System shows the system entity relationships in each entity and their supposed functions in each relationship.

Students, Instructors, transactions, courses, subjects, and schedules make up the college management system tables in the entity-relationship diagram. The tables meet the system’s needs and provide more particular information about each entity.

5 Keys in a RDBMS

Keys are one of the basic requirements of a relational database model. It is widely used to identify the tuples (rows) uniquely in the table. We also use keys to set up relations amongst various columns and tables of a relational database.

5.1 Candidate Key

The minimal set of attributes that can uniquely identify a row in a table is known as a candidate key.

It is a super key with no repeated data is called a candidate key.
It must contain unique values and can contain NULL values.
Every table must have at least a single candidate key.
There can be more than one candidate key in a relationship.

In the Student_Table below STUD_NO is the candidate key for relation STUDENT

STUD_NO	SNAME	ADDRESS	PHONE
1	Shyam	Delhi	123456789
2	Rakesh	Kolkata	223365796
3	Suraj	Delhi	175468965

5.2 Primary Key

A column (or a set of columns) whose value exists and is unique for every record in a table is called a primary key

There can be more than one candidate key in relation out of which one can be chosen as the primary key.

Each table can have one and only one primary key
A Primary Key can identify only one tuple (a record) at a time.
A Primary Key has no duplicate values, it has only unique values.
A Primary Key cannot be NULL.

Primary keys are not necessarily to be a single column; more than one column can also be a primary key for a table

For Example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation STUDENT but STUD_NO can be chosen as the primary key (only one out of many candidate keys).

STUD_NO	SNAME	ADDRESS	PHONE
1	Shyam	Delhi	123456789
2	Rakesh	Kolkata	223365796
3	Suraj	Delhi	175468965

5.3 Super Key

The set of attributes that can uniquely identify a tuple (a record) is known as Super Key. A super key is a group of single or multiple keys that identifies rows in a table.

Adding zero or more attributes to the candidate key generates the super key.
A candidate key is a super key but vice versa is not true.
Super Key values may also be NULL.

For Example, STUD_NO, (STUD_NO, STUD_NAME), etc

5.4 Foreign Key

A column that refers to the primary key/unique key of another table is known as a Foreign Key. It demonstrates the relationship between tables and acts as the cross reference among them.

A Foreign Key acts as a primary key in one table and it acts as secondary key in another table.
A Foreign Key combines two or more relations (tables) at a time.
A Foreign Key acts as a cross-reference between the tables.

It may be worth noting that, unlike the Primary Key of any given relation, Foreign Key can be NULL as well as may contain duplicates i.e. it need not follow uniqueness constraint

5.5 Unique Key

A unique key in a database management system (DBMS) is a condition that is assigned to a certain number of dataset columns to guarantee that the information stored in those columns is distinct for every row in the table. It is used whenever you would like to specify that you don't want to see duplicate data in a given field The reliability of the data is monitored by the unique key restriction, which prohibits copy or null entries in the designated columns.

Every entry in a table in a database is guaranteed to be uniquely identifiable by its unique key
A dependable and distinct identifier for every entry in a database is provided by unique keys, particularly when they are employed as the main keys
For the designated columns, unique entries are usually created using unique keys

6 Relationships in a DBMS

Relationships tell you how much of the data from a foreign key field can be seen in the primary key column of the table the data is related to and vice versa

A relational database implements three different types of relationships:

1. One-to-one (1:1)

2. One-to-many (1:N)

3. Many-to-many (N:N)

A line connecting two tables represents a relationship, while the symbols on the line's end represent the exact relationship type.

6.1 One-to-One Relationship Example

Each record in Table A is associated with one and only one record in Table B, and vice versa

An example to demonstrate a one-to-one relationship is with capital cities. One country has only one capital city, and one capital city belongs to only one country.

Two tables with information about countries and capital cities connect in a database using a primary key. For example, when added to the country table, the unique ID of a capital city (its primary key) becomes a foreign key, creating a relationship

In this case, the one-to-one relationship is mandatory. Every country must have a unique capital city, and the foreign key should be unique to ensure the connection is 1:1.

Other ways to create a 1:1 relationship would be to use the country_id as a unique identifier for the capital table, or reference the country_id as a unique foreign key

6.2 One-to-Many Relationship in a Database

Each record in Table A can be associated with multiple records in Table B, but each record in Table B is associated with only one record in Table A

A one-to-many (1:N) relationship in a database has a single entry on one side and multiple entries on the other end. Every primary key corresponds to one or more records from another table. In this case, the foreign key is not unique.

One-to-many relationships are natural and often appear as a logical connection in database modeling.

An example of a one-to-many relationship is the connection between a mother and children. A mother can have many kids, but every child belongs to one mother only.

A database containing two tables with information about mothers and children connects using a primary key. When added to the child table, the unique ID from a mother becomes a foreign key. Different children can have the same mother.

The relationship is mandatory on both ends, and at least one entry must exist in both tables to establish a connection

6.3 Many-to-Many Relationship in a Database

Each record in Table A can be associated with multiple records in Table B, and vice versa.

Many-to-many (N:N) relationships in a database have multiple entries on both ends of the relationship. Since numerous entries may exist on both ends, a standard solution is to create an association (junction, join) table with foreign keys from both tables.

Many-to-many relationships are a common practice with relational databases implemented in web technologies, such as e-commerce websites.

A many-to-many relationship exists between books and authors. For example, a single book can have multiple authors. Likewise, a single author can have numerous books.

If there is a table containing books and another with authors, the best way to establish the relationship between the two is through a new table. The new table has foreign keys from both parent tables, creating a many-to-many relationship.

Performing different types of JOIN queries fetches data from both tables efficiently while protecting the original tables from redundancies

7 Key Takeaways

Data are pieces of information or facts related to the object being considered
A database refers to a collection of logically related information organized so that it can be easily accessible, managed, and updated.Databases are generally accessed electronically from a computer system and are usually controlled by a database management system (DBMS)
A relational database is a type of database that organizes data into rows and columns, forming tables where data points are related to each other. Each table in a relational database has a unique identifier called a primary key, and tables can be linked using foreign keys
The data in an RDBMS is stored in database objects known as tables which are basically a collection of related data entries and it consists of numerous columns and rows
Every table is broken up into smaller entities called fields. A field is a column in a table that is designed to maintain specific information about every record in the table
A record is also called as a row of data is each individual entry that exists in a table
A column is a vertical entity in a table that contains all information associated with a specific field in a table.
A NULL value in a table is a value in a field that appears to be blank, which means a field with a NULL value is a field with no value. A NULL value is different than a zero value or a field that contains spaces
When data needs to be retrieved from a database, SQL (Structured Query Language) is used to make the request
Developers use structured query language (SQL) commands, which are specific keywords or SQL statements, to work with data stored in relational databases. The following are categories for SQL commands
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Data Query Language (DQL)
- Data Control language (DCL)
An Entity-Relationship Diagram (ERD) is a visual representation of the structure of a database. It illustrates how different entities (objects or concepts) are connected and interact with each other through relationships.
Keys are widely used to identify the tuples (rows) uniquely in the table. We also use keys to set up relations amongst various columns and tables of a relational database.
The minimal set of attributes that can uniquely identify a row in a table is known as a candidate key.
A column (or a set of columns) whose value exists and is unique for every record in a table is called a primary key
The set of attributes that can uniquely identify a tuple (a record) is known as Super Key. A super key is a group of single or multiple keys that identifies rows in a table
A column that refers to the primary key/unique key of another table is known as a Foreign Key. It demonstrates the relationship between tables and acts as the cross reference among them.
A unique key in a database management system (DBMS) is a condition that is assigned to a certain number of dataset columns to guarantee that the information stored in those columns is distinct for every row in the table
A relational database implements three different types of relationships:

1. One-to-one (1:1) : Each record in Table A is associated with one and only one record in Table B, and vice versa

2. One-to-many (1:N) : Each record in Table A can be associated with multiple records in Table B, but each record in Table B is associated with only one record in Table A

3. Many-to-many (N:N) : Each record in Table A can be associated with multiple records in Table B, and vice versa.

SQL Chapter 1 - Database Basics

Ravi Mummigatti

2024-11-02