Normalization and T-SQL in SQL Server



What is Normalization?

Normalization is the process of organizing data in a database according to a set of rules called normal forms. It improves data accuracy and integrity while reducing data redundancy and inconsistent dependency. The approach was developed by IBM researcher Edgar Frank Codd in the early 1970s to bring clarity to the data and the relations in a database. The process involves organizing data into tables and defining the relationships among them. Codd proposed the relational model of databases and introduced the normal forms. Most practical database designs can be achieved with the Third Normal Form, but because some dependencies can still remain, Raymond F. Boyce joined Codd in 1974 to develop a stronger version of 3NF, the Boyce-Codd Normal Form.

Types of Normalization

The rules used to normalize a database are called 'normal forms'; they measure the degree to which an entity is normalized. The different normal forms are as follows:

1. First Normal Form (1NF):

1NF organizes the database into logical units called tables, in which each field holds a single, atomic value and each row is unique, making the information easy to search, filter, and sort. While normalizing a database to 1NF, a primary key is assigned to each table so that every row can be identified unambiguously, which helps turn the raw data into a manageable set of records. The primary key may consist of a combination of columns, in which case the set is known as a composite key.
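
A minimal sketch of 1NF in T-SQL (the table and column names are hypothetical): each column holds an atomic value, every row is identified by a primary key, and repeating values such as multiple phone numbers are moved into rows of their own rather than packed into one column.

-- Hypothetical 1NF design: atomic values and a declared primary key.
CREATE TABLE Customer
(
    CustomerID   INT          NOT NULL PRIMARY KEY,
    CustomerName VARCHAR(100) NOT NULL
);

-- One phone number per row; the composite key (CustomerID, PhoneNumber)
-- keeps each row unique.
CREATE TABLE CustomerPhone
(
    CustomerID  INT         NOT NULL,
    PhoneNumber VARCHAR(20) NOT NULL,
    CONSTRAINT PK_CustomerPhone PRIMARY KEY (CustomerID, PhoneNumber)
);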

2. Second Normal Form (2NF):

2NF breaks the tables down further by removing partial dependencies on the primary key: every non-key column must have a full functional dependency on the whole primary key, not just part of it. An entity must already comply fully with the rules of 1NF to be considered for 2NF, and no partial dependency may remain. A table with a composite primary key whose non-key columns depend on only part of that key is therefore split in two, which introduces a foreign key: the foreign key is the column that references the primary key of the other table.
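
A sketch of the split described above (the table names are hypothetical): in an OrderItem table keyed on (OrderID, ProductID), the product name depends only on ProductID, which is a partial dependency, so the product attributes move to their own table and are reached through a foreign key.

-- Product attributes depend only on ProductID, so they get their own table.
CREATE TABLE Product
(
    ProductID   INT          NOT NULL PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL
);

-- OrderItem keeps only columns that depend on the whole key (OrderID, ProductID).
CREATE TABLE OrderItem
(
    OrderID   INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity  INT NOT NULL,
    CONSTRAINT PK_OrderItem PRIMARY KEY (OrderID, ProductID),
    CONSTRAINT FK_OrderItem_Product FOREIGN KEY (ProductID)
        REFERENCES Product (ProductID)
);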

3. Third Normal Form (3NF):

The objective of 3NF is to eliminate transitive dependencies, which are a common cause of update anomalies. A transitive dependency exists when a non-key column depends on another non-key column rather than directly on the primary key; if such a value is updated in one row but not in the others, the database becomes inconsistent. Removing these transitive dependencies takes a table from 2NF to 3NF, which is the practical target for almost all tables.
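
A small hypothetical illustration of removing a transitive dependency: DepartmentName depends on DepartmentID, which in turn depends on EmployeeID, so the department details are moved to their own table.

-- Department attributes no longer ride along on every employee row.
CREATE TABLE Department
(
    DepartmentID   INT          NOT NULL PRIMARY KEY,
    DepartmentName VARCHAR(100) NOT NULL
);

CREATE TABLE Employee
(
    EmployeeID   INT          NOT NULL PRIMARY KEY,
    EmployeeName VARCHAR(100) NOT NULL,
    DepartmentID INT          NOT NULL
        CONSTRAINT FK_Employee_Department REFERENCES Department (DepartmentID)
);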

4. Boyce-Codd Normal Form (BCNF):

3NF resolves redundancies arising from most functional dependencies, but anomalies arising from the remaining constraints are handled through BCNF, also known as 3.5NF. A relation is in BCNF if it is in 3NF and, for every non-trivial functional dependency, the determinant is a candidate key.

5. Fourth Normal Form (4NF):

At the 4NF level there are no non-trivial multivalued dependencies other than those on a candidate key; only a relation that is already in BCNF and free of such multivalued dependencies can be in 4NF.

6. Fifth Normal Form (5NF):

5NF is also known as project-join normal form (PJ/NF). It reduces redundancy in relational databases by isolating semantically related multiple relationships. For a table to be in 5NF, every non-trivial join dependency must be implied by its candidate keys.

7. Domain/Key Normal Form (DKNF):

DKNF is a stricter normal form than 5NF, and it removes any additional types of dependencies and constraints. For a 5NF relation to qualify for DKNF, every constraint on the table must be a logical consequence of its domain constraints and key constraints, and no constraints other than domains and keys may exist; there should also be no insertion or deletion anomalies in the database. Because specifying general integrity constraints this way is difficult, the practical use of DKNF relations is limited.

8. Sixth Normal Form (6NF):

Sixth normal form is not a fully standardized form, and only a table that already satisfies 5NF can qualify for it. To be in 6NF, a relation must not contain any non-trivial join dependencies at all, which makes it stricter and less redundant than DKNF. The relation variables of entities in this form become irreducible components.


Importance of Database Normalization

Normalization of operational data stores (ODSs) and data warehouses (DWs) helps in the following ways:

1. Consistency: Because each piece of information is stored in only one place, the chances of inconsistency are greatly reduced.

2. Object-to-data mapping: Normalized data schemas help with object-oriented goals.

3. Flexibility: New data values can be added easily without restructuring the existing tables.

4. Accessibility:  Normalized data can be easily accessed, processed, and understood.

5. Uniqueness: Data redundancy is minimized.

Advantages of Normalization

Database Normalization is used to design an organized and managed database to maintain accuracy and enhance productivity. The main advantages of normalizing a database are:

  • Organization of the database through normalization improves data accuracy and reduces redundant data.
  • Data consistency and flexibility improve the logical usage of data.
  • Enhanced database security.
  • All necessary functional dependencies are handled during the normalization process.
  • Makes Index searching easier as the indexes tend to be narrow and short.

What is TSQL?

TSQL is an abbreviation for Transact-SQL, or T-SQL. It is a set of proprietary extensions to SQL (Structured Query Language) created by Sybase and adopted by Microsoft when the two companies began collaborating on SQL Server in 1987. This procedural language extends standard SQL in Microsoft SQL Server with extra features such as declared variables, transaction control, stored procedures, error and exception handling, triggers, and string operations. T-SQL is used to operate SQL Server-based relational databases; it is relatively easy to understand and is Turing complete. All interactions between an application and a SQL Server instance are ultimately carried out through T-SQL.

The dominant features of TSQL are:

1. It is a procedural programming language that can be used to build applications.

2. It produces compact, readable code that is less vulnerable to errors.

3. It supports functions for string processing, date and time handling, and mathematical operations.

4. It allows user-defined custom functions.

5. It offers developers flexible control over the application flow through local variables and control-of-flow statements, as sketched in the example below.
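
A minimal sketch of these procedural features (the table name dbo.Orders is hypothetical): a declared variable, conditional logic, and structured error handling in a single batch.

-- Declared variable, IF/ELSE flow control, and TRY/CATCH error handling.
DECLARE @OrderCount INT;

BEGIN TRY
    SELECT @OrderCount = COUNT(*) FROM dbo.Orders;   -- hypothetical table

    IF @OrderCount > 100
        PRINT 'High order volume: ' + CAST(@OrderCount AS VARCHAR(10));
    ELSE
        PRINT 'Normal order volume.';
END TRY
BEGIN CATCH
    PRINT 'Error: ' + ERROR_MESSAGE();
END CATCH;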

TSQL Functions

Functions can be defined using TSQL beyond the built-in functions of SQL Server.

T-SQL functions fall into the following categories:

Aggregate functions: 

These deterministic functions operate on a collection of values and calculate a single summary value: the values of multiple rows are taken as input and collapsed into one more significant result.
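
A brief example against a hypothetical Sales table: SUM and AVG collapse many rows into one summary value per customer.

-- Aggregate functions: one summary row per group.
SELECT CustomerID,
       SUM(Amount) AS TotalAmount,
       AVG(Amount) AS AverageAmount
FROM dbo.Sales
GROUP BY CustomerID;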

Ranking functions:

These are nondeterministic functions that return a ranking value for every row in a partition. With RANK and DENSE_RANK, rows that have the same values receive the same rank.
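
A short illustration over a hypothetical Employee table: RANK and DENSE_RANK give tied salaries the same rank, while ROW_NUMBER always produces distinct values.

SELECT EmployeeName,
       Salary,
       RANK()       OVER (ORDER BY Salary DESC) AS SalaryRank,
       DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseSalaryRank,
       ROW_NUMBER() OVER (ORDER BY Salary DESC) AS RowNum
FROM dbo.Employee;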

Rowset functions:

These nondeterministic functions return an object that can be used like a table reference or view in SQL statements. Their results may vary even for the same set of input values.
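
OPENJSON is one example of a rowset function (available in SQL Server 2016 and later): it turns a JSON string into a row set that can be queried like a table.

-- OPENJSON exposes JSON text as rows and columns.
DECLARE @json NVARCHAR(MAX) = N'[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]';

SELECT *
FROM OPENJSON(@json)
     WITH (id INT '$.id', name NVARCHAR(50) '$.name');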

Scalar functions:

These user-defined functions operate on a single value and return a single value. They help simplify code but cannot be used to modify data.
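
A hypothetical user-defined scalar function: a set of input values in, one value out.

CREATE FUNCTION dbo.GetFullName (@FirstName VARCHAR(50), @LastName VARCHAR(50))
RETURNS VARCHAR(101)
AS
BEGIN
    -- Concatenate the two inputs into a single scalar result.
    RETURN @FirstName + ' ' + @LastName;
END;
GO

SELECT dbo.GetFullName('Ada', 'Lovelace') AS FullName;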

Analytical functions:

These functions let T-SQL perform complex analytical tasks, expressing common analyses such as rankings, percentiles, moving averages, and cumulative sums in a single SQL statement.
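
A sketch over the same hypothetical Sales table: a three-row moving average, a running total, and a cumulative distribution computed with window (analytic) functions in one statement.

SELECT OrderDate,
       Amount,
       AVG(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg3,
       SUM(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
       CUME_DIST() OVER (ORDER BY Amount) AS CumulativeDistribution
FROM dbo.Sales;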



Differences between SQL and T-SQL

The differences between SQL and T-SQL are:

  • SQL is an open, vendor-neutral query language that works with many data providers, while T-SQL is a proprietary extension of it designed specifically for Microsoft SQL Server.
  • SQL is typically used for querying and reporting on data, while T-SQL is used to build application and server-side logic on Microsoft SQL Server.
  • SQL is a data-oriented language that operates over sets of data, while T-SQL is a transactional language.
  • SQL can process basic queries, but T-SQL can be used to create applications and add services to them.
  • With SQL, only a single statement is processed at a time, whereas T-SQL can process whole batches of statements using its control-of-flow and iteration structures (see the sketch after this list).
  • SQL can be embedded into T-SQL, but not vice versa.
  • Unlike SQL, T-SQL is Turing complete and more robust.
  • Unlike SQL, T-SQL offers easy integration with Microsoft Business Intelligence tools such as Power BI.
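
A minimal sketch of the control-of-flow and iteration structures mentioned above, which plain SQL statements do not provide.

-- Variables plus a WHILE loop processing a batch in a single script.
DECLARE @Counter INT = 1;

WHILE @Counter <= 5
BEGIN
    PRINT 'Processing batch ' + CAST(@Counter AS VARCHAR(5));
    SET @Counter = @Counter + 1;
END;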

Advantages of TSQL

TSQL helps in fast-paced development through better interaction with the SQL Server. The advantages of using TSQL are:   

  • T-SQL supports modular programming, and its extensions enhance programmability.
  • It increases the reliability and security of the server.
  • It handles sensitive data efficiently, reducing security threats.
  • It minimizes traffic between application and server by running complex tasks on the server itself.
  • It allows programming logic to be incorporated directly into the database.
  • It provides better control over the database instance.


 Conclusion

Normalization helps organize a database cleanly, and T-SQL helps write compact code; used together, they make both the database and the code more readable and less error-prone. The main areas of focus when applying them are designing tables according to the database architecture, reviewing and optimizing query performance, and scaling the database, for example by running it in the cloud. Using the two in combination also helps developers integrate Microsoft Business Intelligence tools for business analytics.






What is Data Science?

Data science is the study of extracting insightful knowledge from data for business decisions, strategy development, and other purposes, using state-of-the-art analytical technologies and scientific methods. Businesses are becoming aware of its significance: among other things, data science insights help companies improve their marketing and sales efforts as well as their operational effectiveness, and they can ultimately provide a competitive edge over other businesses.

Data Science combines a number of fields, including statistics, mathematics, software programming, predictive analytics, data preparation, data engineering, data mining, machine learning, and data visualization. Skilled data scientists are generally responsible for this work, although entry-level data analysts may also be involved. Additionally, a growing number of firms now depend in part on citizen data scientists, a category that can encompass data engineers, business intelligence (BI) specialists, data-savvy business users, business analysts, and other employees without formal training in Data Science.


What is Linear Algebra

Linear algebra is a field of mathematics that is extremely useful in Data Science and ML; it is arguably the most important mathematical foundation for machine learning. The vast majority of machine learning models can be expressed in terms of matrices, and a dataset itself is commonly represented as a matrix. Preprocessing, transforming, and evaluating data and models all rely on linear algebra.

A study of linear algebra may involve the following (a small worked example follows the list):

  • Vectors
  • Matrices
  • Transpose of a matrix
  • The inverse of a matrix
  • Determinant of a matrix
  • Trace of a matrix
  • Dot product
  • Eigenvalues
  • Eigenvectors
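
As a small worked illustration of several of these concepts (the matrix below is arbitrary, chosen only for the example):

A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad A^{T} = A, \qquad \det(A) = 2 \cdot 2 - 1 \cdot 1 = 3, \qquad \operatorname{tr}(A) = 2 + 2 = 4

Its eigenvalues are \lambda_1 = 3 and \lambda_2 = 1, with eigenvectors (1, 1)^{T} and (1, -1)^{T}; note that \lambda_1 + \lambda_2 = \operatorname{tr}(A) and \lambda_1 \lambda_2 = \det(A).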

Why learn Linear Algebra in Data Science?

One of the fundamental building blocks of Data Science is linear algebra. Without a solid foundation, you cannot erect a skyscraper, can you? Consider this example:

Suppose you wish to use Principal Component Analysis (PCA) to reduce the dimensionality of your data. If you were unsure how doing so would affect the data, how would you choose how many principal components to keep? Obviously, to make this choice you must be familiar with how the algorithm works.

You will be able to gain a better sense for ML and deep learning algorithms and stop treating them as mysterious black boxes if you have a working knowledge of linear algebra. This would enable you to select suitable hyperparameters and create a more accurate model. Additionally, you would be able to develop original algorithms and algorithmic modifications.

Linear Algebra Applications for Data Scientists

We will now look at the most common applications of linear algebra for data scientists:

Machine learning: loss functions and recommender systems

Machine learning is, without question, the most well-known application of artificial intelligence (AI). With machine learning algorithms, systems learn and improve automatically from experience, without human intervention. Machine learning works by creating programs that access and analyze data (whether static or dynamic) in order to detect patterns and learn from them. Once the algorithm has identified relationships in the data, it can apply that knowledge to analyze fresh data sets.

Machine learning uses linear algebra in many different ways, including loss functions, regularization, support vector classification, and plenty more.

Loss Function

Machine learning algorithms work by gathering data, interpreting it, and then building a model using various techniques. Based on that model, they can forecast answers to upcoming data queries.

We can then assess the model's accuracy using linear algebra, specifically loss functions. In a nutshell, loss functions provide a way to measure the precision of prediction models. The output of the loss function is large when the model is badly wrong; in contrast, a good model causes the function to return a lower value.

Regression means modeling the link between a dependent variable, Y, and one or more independent variables, the Xi's. After plotting the data points, we attempt to fit a line through them, and we use this line to forecast values of Y for future values of the Xi's.

The two most commonly used loss functions are mean squared error and mean absolute error; many other forms of loss function exist, some more complex than others.
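
Written out for n examples with true values y_i and predictions \hat{y}_i, the two losses mentioned above are:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert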


Recommender System

A subset of machine learning known as recommender systems provides users with relevant suggestions based on previously gathered data. To forecast what the current user (or a new user) might like, recommender systems use data from the user's prior interactions with the system, together with their interests, demographics, and other available information. By tailoring content to each user's tastes, businesses can attract and keep customers.

The performance of recommender systems depends on two types of data being gathered: 

Characteristic data: information about items (such as their category or price) and about users (such as their location or preferences).

User-item interactions: Ratings and the volume of transactions (or purchases of related items).


Natural language processing: word embedding

Natural Language Processing (NLP) is the field of artificial intelligence that focuses on how computers communicate with people in natural language, most frequently English. Applications of NLP include text analysis, speech recognition, and chatbots.

Applications such as Grammarly, Siri, and Alexa are all based on the concept of NLP.

Word embedding

Computers cannot understand text data on their own. Because NLP algorithms need a mathematical representation of the text, we have to express it numerically, and this is where linear algebra becomes necessary. Word embedding is a type of word representation that enables ML algorithms to recognize terms with similar meanings.

Word embeddings represent words as vectors of numbers while keeping the context of the words intact. These representations are created with the language-modeling technique of training neural networks on a huge corpus of text. Word2vec is among the most widely used word embedding methods.
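
Because words become vectors, their similarity can be measured with ordinary linear algebra. One common choice (an illustration here, not something the article specifies) is the cosine similarity between two word vectors u and v:

\cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}

Values close to 1 indicate that the two words appear in similar contexts.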

Computer vision: image convolution

Computer vision is the artificial intelligence discipline that uses photos, videos, and deep learning models to teach computers to understand and interpret the visual world, enabling algorithms to correctly recognize and classify objects.

In applications like image recognition as well as certain image processing methods like image convolution and image representation like tensors, we utilize linear algebra in computer vision.

Image Convolution

Convolution is computed by element-wise multiplying two matrices and then summing the results. One way to picture image convolution is to think of the image as a large matrix and the kernel (the convolutional matrix) as a small matrix used for edge detection, blurring, and related image-processing tasks. The kernel slides over the image from top to bottom and from left to right, and at each (x, y) location of the image it performs this arithmetic to produce the filtered output image.

Different kernels perform different forms of image convolution. Kernels are always square matrices, frequently 3×3, although the size can be changed depending on the image.
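
The sliding-window computation described above can be written as a sum. For an image I and a 3×3 kernel K, the output at location (x, y) is

(I * K)(x, y) = \sum_{i=-1}^{1} \sum_{j=-1}^{1} K(i, j) \, I(x - i, \, y - j)

(up to the kernel-flipping convention; many computer-vision libraries actually compute the closely related cross-correlation, which omits the flip).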


Where do we use linear algebra in Data Science?

Data Scientists often make use of Linear Algebra for various applications including:

  • Vectorized Code: Linear algebra helps in writing vectorized code, which is considerably more efficient than its non-vectorized counterpart because results are produced in a single array-level operation rather than through numerous explicit steps and loops.
  • Dimensionality Reduction: In the preparation of data sets required for machine learning, dimensionality reduction is a crucial step. This is particularly true for big data sets or those with many attributes or dimensions. Many of these characteristics may occasionally have a strong correlation with one another.

Performing dimensionality reduction on a big data set improves the speed and effectiveness of the ML algorithm, because the algorithm only needs to consider a small number of features before producing a forecast.
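
One common rule of thumb for choosing how many dimensions to keep (an assumption here, not stated in the article) uses the explained-variance ratio: if \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p are the eigenvalues of the covariance matrix, keep the smallest k such that

\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \ge 0.95

so that the retained components explain at least 95% of the total variance.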


Concepts of linear algebra for Data Science

Linear Algebra for Data Preprocessing – Linear algebra is used for data preprocessing in the following way:

  • Import the required libraries for linear algebra such as NumPy, pandas, pylab, seaborn, etc.
  • Read datasets and display features
  • Define column matrices to perform data visualization

Covariance Matrix – One of the most important matrices in Data Science and ML is the covariance matrix. It describes how features co-move (their correlation). We can create a scatter pair plot to see how the features are correlated, and we can construct the covariance matrix to quantify the degree of multicollinearity or correlation between features; for a dataset with four features, for instance, the covariance matrix is a real, symmetric 4 x 4 matrix.
A unitary transformation, commonly performed through a Principal Component Analysis (PCA) transformation, can be used to diagonalize this matrix. Because the trace of a matrix is preserved under a unitary transformation, the sum of the eigenvalues of the diagonalized matrix equals the total variance contained in the features.
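
In symbols, for features X_1, \dots, X_p with means \mu_i, the covariance matrix and the trace property mentioned above are:

\Sigma_{ij} = \operatorname{Cov}(X_i, X_j) = \mathbb{E}\big[(X_i - \mu_i)(X_j - \mu_j)\big], \qquad \operatorname{tr}(\Sigma) = \sum_{i=1}^{p} \operatorname{Var}(X_i) = \sum_{i=1}^{p} \lambda_i

where the \lambda_i are the eigenvalues of \Sigma.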

Linear Discriminant Analysis Matrix – The Linear Discriminant Analysis (LDA) matrix is another example of a real, symmetric matrix used in Data Science. In its common form it is written as

L = S_W^{-1} S_B

where S_W is the within-class scatter matrix and S_B is the between-class scatter matrix. Because the matrices S_W and S_B are real and symmetric, L is real and symmetric as well. Diagonalizing L produces a feature subspace with improved class separability and reduced dimensionality. So, whereas PCA is not a supervised method, LDA is.


Conclusion

Linear algebra is often skipped over because of preconceived assumptions about its difficulty, but a good hold on it builds a crucial foundation for anyone aspiring to a flourishing career in Data Science.
