Differentiate between OLTP and OLAP systems.
OLTP (Online Transaction Processing): Designed for large numbers of transactions in real-time. These systems are optimized for write operations and are used for day-to-day operations. Examples: ATM systems, online banking.
OLAP (Online Analytical Processing): Designed for complex queries and data analysis. These systems are optimized for read operations and are used for business intelligence and decision support. Examples: Data warehouses, market analysis systems.
What are ACID properties? Why are they important in transactions?
ACID properties are a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc.
- Atomicity: Ensures that a transaction is treated as a single, indivisible unit. Either all of its operations are executed, or none of them are.
- Consistency: Ensures that a transaction brings the database from one valid state to another.
- Isolation: Ensures that concurrent transactions do not interfere with each other.
- Durability: Ensures that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
Explain normalization. When is denormalization preferred?
Normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to minimize data redundancy. The goal is to divide larger tables into smaller, well-structured tables and define relationships between them.
Denormalization is the process of intentionally adding redundant data to one or more tables. It is preferred in read-heavy applications where query performance is more important than data redundancy, such as in data warehousing and reporting.
How do indexing and hashing improve query performance?
Indexing: An index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time a database table is accessed.
Hashing: Hashing is a method of directly locating a row in a table by its hash value. It is used for fast lookups on a specific key.
What is a view in SQL? Discuss its advantages and disadvantages.
A view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.
Advantages:
- Simplicity: Views can simplify complex queries.
- Security: Views can be used to expose only a subset of data to users.
- Consistency: Views can provide a consistent view of data, even if the underlying tables change.
Disadvantages:
- Performance: Queries on complex views can be slow.
- Update Restrictions: Not all views are updatable.
Explain CAP theorem in the context of distributed databases.
The CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:
- Consistency: Every read receives the most recent write or an error.
- Availability: Every request receives a (non-error) response – without the guarantee that it contains the most recent write.
- Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
How do NoSQL databases differ from relational databases?
Relational Databases (SQL):
- Store data in tables with a fixed schema.
- Use SQL (Structured Query Language).
- Are vertically scalable.
NoSQL Databases:
- Have dynamic schemas for unstructured data.
- Can be document, key-value, wide-column, or graph-based.
- Are horizontally scalable.