Sunday, July 6, 2014

COLLATION DEFINATION AND OVERVIEW

What is collation and can I change it on my MS SQL database?

Collation controls the way string values are sorted. The default collation sorts alphabetically using the standard Latin character set. Other collations will sort in different orders. For example, a traditional Spanish collation places words beginning with "ch" at the end of a list of words beginning with "c."
It is not necessary to change the default collation to use non-Latin characters. All shared MS SQL databases support Unicode characters, allowing the storage of a variety of alphabets.
Modifying collation settings is an advanced database task and should only be performed by experienced database developers. If you determine it is necessary to change your collation, we recommend modifying it at the table or column level. Making this change at the database level disables some Hosting Control Panel functionality, including the installation and removal of ASP Schema objects. Modifying at the table or column level avoids the majority of known compatibility issues with non-standard collation settings.
The following T-SQL sample illustrates how to set the collation at the column level:

CREATE TABLE [DBO].[COLLATIONTEST](
[KEYCOLUMN] [BIGINT] NULL,
[STRCOLUMN] [VARCHAR](MAX) COLLATE TRADITIONAL_SPANISH_CI_AI NULL

) ON [PRIMARY]
  What is collation?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types and character width.

Case sensitivity:
If A and aand b, etc. are treated in the same way then it is case-insensitive. A computer treats A and a differently because it uses ASCII code to differentiate the input. The ASCII value of A is 65, while is 97. The ASCII value of B is66 and is 98.
Accent sensitivity:
If a and áo and ó are treated in the same way, then it is accent-insensitive. A computer treats a and á differently because it uses ASCII code for differentiating the input. The ASCII value of a is 97 and áis 225. The ASCII value of o is111 and ó is 243.
Kana Sensitivity:
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.
Width sensitivity:
When a single-byte character (half-width) and the same character when represented as a double-byte character (full-width) are treated differently then it is width sensitive.

Database, Tables and columns with different collation

SQL Server 2000 allows the users to create databases, tables and columns in different collations.

Databases with different collation

use master
go
create database BIN collate Latin1_General_BIN
go
create database CI_AI_KS collate Latin1_General_CI_AI_KS
go
create database CS_AS_KS_WS collate Latin1_General_CS_AS_KS_WS
go

Tables and columns with different collation

Create table Mytable (
[colu] char(10) COLLATE Albanian_CI_AI_KS_WS NULL,
[Maydate] [char] (8) COLLATE Korean_Wansung_Unicode_CS_AS_KS NOT NULL ,
[Risk_Rating] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL 
)

Comparing characters on the databases with different collation

When we run the below code in CI_AI_KS and CS_AS_KS_WS the results will be completely different.
declare @Accentvar1 char(1)
declare @Accentvar2 char(1)
declare @Casevar1 char(1)
declare @Casevar2 char(1)
set @casevar1 ='A'
set @casevar2 ='a'
set @Accentvar1 ='a'
set @Accentvar2 ='á'

if @casevar1 = @casevar2
 begin
  print "A and a are treated same"
 end
 else
 begin
  print "A and a are not treated same"
 end

if @Accentvar1 = @Accentvar2 
 begin
  print "A and á are treated same"
 end
 else
 begin
  print "A and á are not  treated same"
 end
When we execute these statements on a CI_AI_KS database, the results are similar to those shown below.

 A and a are treated same
 A and á are treated same

When we execute these statements on a CS_AS_KS_WS database, the results are similar to those shown below.

 A and a are not treated same
 A and á are not treated same

Simulating case sensitivity in a case in-sensitive database

It is often necessary to simulate case sensitivity in a case insensitive database. The example below shows how you can achieve that.
Use CI_AI_KS
go
declare @var1 varchar(10)
declare @var2 varchar(10)
set @var1 ='A'
set @var2 ='a'
if ASCII(@var1) = ASCII(@var2)
print "A and a are treated same"
else
print "A and a are not same"
However, the function ASCII cannot be used for words. In order to achieve the same functionality of simulating case sensitiveness, we can use the varbinary data type.
Use CI_AI_KS
go
declare @var1 varchar(10)
declare @var2 varchar(10)
set @var1 ='Good'
set @var2 ='gooD'
if cast(@var1 as varbinary) = cast(@var2 as varbinary) 
print "Good and gooD are treated same"
else
print "Good and gooD are not treated same"


Simulating case insensitivity in a case sensitive database

It is often necessary to simulate case insensitivity in a case sensitive database. The example below displays how you can achieve that.
Use CS_AS_KS_WS
go
declare @var1 varchar(10)
declare @var2 varchar(10)
set @var1 ='A'
set @var2 ='a'
if upper(@var1) = upper(@var2)
print "A and a are treated same"
else
print "A and a are not same"
go
Use CS_AS_KS_WS
go
declare @var1 varchar(10)
declare @var2 varchar(10)
set @var1 ='Good'
set @var2 ='gooD'
if upper(@var1) = upper(@var2 )
print "Good and gooD are treated same"
else
print "Good and gooD are not treated same"

Simulating Accent sensitivity in a Accent insensitive database

It is often necessary to simulate case sensitivity in a case insensitive database:
Use CI_AI_KS
go
declare @var1 varchar(10)
declare @var2 varchar(10)
set @var1 ='A'
set @var2 = 'á'
if ASCII(@var1) = ASCII(@var2)
print "A and á are treated same"
else
print "A and á are not treated same"
Again, the function ASCII cannot be used for words. In order to achieve the same functionality of simulating case sensitiveness, we can use the varbinary data type:
Use CI_AI_KS
go
declare @var1 varchar(10)
declare @var2 varchar(10)
set @var1 ='Gold'
set @var2 ='Góld'
if cast(@var1 as varbinary) = cast(@var2 as varbinary) 
print " Gold  and Góld  are treated same"
else
print " Gold  and Góld  are not treated same"

Change the collation

In SQL Server 7.0, Collation can be setup only on the server level during installation. In order to change the collation, you have to rebuild the master. This is located in the Program Files\Microsoft SQL Server\80\Tools\Binn directory.
In SQL Sever 2000, you can change collation on the database level and column level just by using alter statements.
Create database testNorwegian collate Danish_Norwegian_CI_AI_KS
go
Alter database testNorwegian collate Danish_Norwegian_CI_AI
go

Create table Mytable (
[colu] char(10) COLLATE Albanian_CI_AI_KS_WS NULL,
[Maydate] [char] (8) COLLATE Korean_Wansung_Unicode_CS_AS_KS NOT NULL ,
[Risk_Rating] [char] (2) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL 
)
go
Alter table Mytable alter column [colu] char(10) COLLATE Korean_Wansung_Unicode_CS_AS_KS NULL
go

Find the collation of the current server

select SERVERPROPERTY ('collation')

Find the collation of a particular database

select convert(sysname,DatabasePropertyEx('CS_AS_KS_WS','Collation'))

Find the collation of the current database

select convert(sysname,DatabasePropertyEx(db_name(),'Collation'))

Find collation of all columns in a table

select name, collation from syscolumns where [id]=object_id('Mytable')

Find all collation available in SQL Server

select * from ::fn_helpcollations()

Conclusion

This article explains the basic concept of collation and its uses. You can create functions for simulating accent insensitive in an accent sensitive database by mapping a for á. Kana sensitivity and Width sensitivity are focused mainly on Asian languages.
                                                                                                                  

No comments:

Post a Comment