Scalar Functions vs Table-Valued Functions

Today I’m going to give you another tip, this time focusing on Scalar Functions vs Table-Valued Functions.

Many developers love to use Scalar Functions in SQL Server because they make it easy to reuse code, which is a common goal in object-oriented programming.
After all, why would you ever write the same piece of logic more than once?

But the problem with Scalar Functions is that SQL Server executes them row by row instead of using set-based logic.
Scalar Functions are known as a “silent performance killer”, because even if we use some of the techniques in our benchmark toolkit, like looking at SET STATISTICS IO output and execution plans, we wouldn’t be able to see the row-by-row processing.

Let’s take a look at a sample query using a Scalar Function:

CREATE FUNCTION dbo.GetMaxProductQty_Scalar (@ProductId INT)
RETURNS INT
AS
BEGIN
    DECLARE @MaxQty INT;

    -- Highest quantity ever ordered for the given product
    SELECT @MaxQty = MAX(sod.OrderQty)
    FROM   Sales.SalesOrderDetail sod
    WHERE  sod.ProductID = @ProductId;

    RETURN (@MaxQty);
END


Here’s a query that uses the new function. It does nothing more than get the max quantity sold for each product in the Production.Product table.

SELECT   productid,
         dbo.GetMaxProductQty_Scalar(productid) AS MaxQty
FROM     Production.Product
ORDER BY productid


If we look at the SET STATISTICS IO output, you might think this is a great query just because we only have 15 logical reads. But does that mean there is nothing else to improve here?
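If you want to follow along, enabling the statistics for your session is enough to reproduce the numbers in the screenshot below:

-- Show per-table reads and CPU/elapsed times in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;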

[Screenshot: SET STATISTICS IO output showing 15 logical reads on the Product table]


Keep in mind that this is just the Product table, and our function is referencing SalesOrderDetail. That is the reason why this is a “silent performance killer”.
If we check the execution plan, there is only an Index Scan on the Product table.

[Screenshot: execution plan showing only an Index Scan on the Product table]


However, using SQL Server Profiler, we are able to see the row-by-row processing:

[Screenshot: Profiler trace showing one function execution per row]
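If you don’t have Profiler at hand, SQL Server 2016 and later also expose sys.dm_exec_function_stats; after running the query above, the execution count roughly matches the number of rows in Production.Product (a quick sketch, not from the original post):

-- One scalar UDF execution is recorded per row processed.
SELECT OBJECT_NAME(object_id) AS function_name,
       execution_count
FROM   sys.dm_exec_function_stats
WHERE  object_id = OBJECT_ID('dbo.GetMaxProductQty_Scalar');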


Now, let’s create the Table-Valued Function, doing exactly the same thing on SalesOrderDetail.
You will notice that there is only a slight difference compared to the previous function, GetMaxProductQty_Scalar.

CREATE FUNCTION dbo.GetMaxProductQty_Inline (@ProductId INT)
RETURNS TABLE
AS
    RETURN
      (SELECT MAX(sod.OrderQty) AS MaxQty
       FROM   Sales.SalesOrderDetail sod
       WHERE  sod.ProductID = @ProductId)


You can check the following examples using the new function:

SELECT      productid,
            MaxQty
FROM        Production.Product
CROSS APPLY dbo.GetMaxProductQty_Inline(productid) AS MaxProductQty
ORDER BY    productid

SELECT   productid,
         (SELECT MaxQty
          FROM   dbo.GetMaxProductQty_Inline(productid)) AS MaxQty
FROM     Production.Product
ORDER BY productid


If we now look at the SET STATISTICS IO output, we can see the reads against SalesOrderDetail:

[Screenshot: SET STATISTICS IO output now including SalesOrderDetail]


The other thing to notice is that our execution plan has changed, and now includes everything. We can see the index scan on SalesOrderDetail, so we still may have some opportunities for improvement (see the sketch below).
[Screenshot: execution plan including the index scan on SalesOrderDetail]
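For example, one hypothetical improvement (my assumption, not part of the original post) would be a narrow index that lets each APPLY become a seek instead of a scan:

-- Hypothetical supporting index: seek on ProductID and read OrderQty
-- without touching the clustered index.
CREATE INDEX IX_SalesOrderDetail_ProductID_OrderQty
    ON Sales.SalesOrderDetail (ProductID)
    INCLUDE (OrderQty);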
The main point here is that our silent performance killer is gone, and we no longer see the row-by-row processing. So, if you have the opportunity to push that logic into a Table-Valued Function, it will allow you to use set-based logic, which gives you much faster execution times.

The sessions from Microsoft Ignite 2018

SQL Server 2017 has brought to market a new modern data platform including support for Linux, Docker Containers and rich features in intelligent performance, HADR, machine learning, and graph database. Come learn about the roadmap and new functionality planned for SQL Server including intelligent query processing, data virtualization, new features for mission critical security and HADR, and new scenarios for Linux and Docker Containers.

  • Release announcement for SQL Server 2019 preview
  • A detailed introduction to SQL Server big data clusters
  • General Availability of Azure Data Studio
  • SQL Server 2019 preview and Containers
  • Detailed blog post on database engine enhancements for SQL Server 2019 preview

Below are the videos from the SQL Server 2019 public preview sessions at Ignite.

It’s a deep dive into the investments and features released in SQL Server 2019. There are more than 10 hours of content. Enjoy!


Specify ORDER BY if you need ordered data

Some of you may not know this, so I’d like to share it.

If you need ordered data and your query does not have an ORDER BY clause, the data might be returned in any order. Yeah. It’s not guaranteed to come back in primary key order.
So, you can’t assume that SQL Server has some “default” order. You might believe there is one, because usually when you SELECT from a table it will use the clustered index (the PK, often on an identity column).

But that’s not always the case. Check the following example:


-- Create a test table and add 100K rows to it.
CREATE TABLE dbo.Test
(
    ID INT NOT NULL,
    RandomInt INT NOT NULL,
    CharFiller CHAR(1000) NULL,
    CONSTRAINT PK_Test PRIMARY KEY (ID)
);

SET NOCOUNT ON;

DECLARE @ID INT;
SET @ID = 1;

WHILE @ID <= 100000
BEGIN
    INSERT INTO dbo.Test (ID, RandomInt, CharFiller)
    SELECT @ID,
           RAND() * 1000000,
           'dcb';
    SET @ID = @ID + 1;
END;

The following query will run just fine, and return rows “as expected”:

[Screenshot: query results returned in ID order]
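The query is roughly of this shape (my reconstruction; the exact query in the screenshot may differ slightly):

-- No ORDER BY: the rows merely *happen* to come back in ID order,
-- because the optimizer currently reads the clustered primary key.
SELECT ID, RandomInt
FROM   dbo.Test;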
Now, let’s create the following index.

CREATE INDEX Test_RandomInt ON dbo.Test(RandomInt);

The same query is now “broken”, simply because SQL Server has decided to use a different index in the execution plan.

[Screenshot: the same query now returning rows in RandomInt order]
I’m not saying that you should start sorting every single query (sorting is expensive, performance-wise). But if you really need ordered data, you must provide the ORDER BY clause.
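For example, assuming the query above, adding the clause guarantees the order no matter which index the optimizer picks:

SELECT   ID, RandomInt
FROM     dbo.Test
ORDER BY ID;  -- ordering is now guaranteed, regardless of the chosen index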

It’s preferable to sort on the application side from a resource perspective, because it’s generally easier to add more servers, CPU, or memory to the application layer. It’s much more difficult, and more expensive, to do the same for the SQL Server tier.

Recommended MAXDOP settings for SQL Server

The Microsoft SQL Server max degree of parallelism (MAXDOP) configuration option controls the number of processors that are used for the execution of a query in a parallel plan. This option determines the computing and thread resources that are used for the query plan operators that perform the work in parallel. Depending on whether SQL Server is set up on a symmetric multiprocessing (SMP) computer, a non-uniform memory access (NUMA) computer, or hyperthreading-enabled processors, you have to configure the max degree of parallelism option appropriately.

When setting up a new SQL Server, I usually use the following code to determine a good starting point for the MAXDOP setting:


DECLARE @hyperthreadingRatio BIT;
DECLARE @logicalCPUs INT;
DECLARE @HTEnabled INT;
DECLARE @physicalCPU INT;
DECLARE @SOCKET INT;
DECLARE @logicalCPUPerNuma INT;
DECLARE @NoOfNUMA INT;

-- Gather CPU information
SELECT @logicalCPUs = cpu_count                      -- [Logical CPU Count]
     , @hyperthreadingRatio = hyperthread_ratio      -- [Hyperthread Ratio]
     , @physicalCPU = cpu_count / hyperthread_ratio  -- [Physical CPU Count]
     , @HTEnabled = CASE
                      WHEN cpu_count > hyperthread_ratio THEN 1
                      ELSE 0
                    END                              -- HTEnabled
FROM sys.dm_os_sys_info
OPTION (RECOMPILE);

-- Number of logical processors per NUMA node
SELECT @logicalCPUPerNuma = COUNT(parent_node_id)
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE'
  AND parent_node_id < 64
GROUP BY parent_node_id
OPTION (RECOMPILE);

-- Number of NUMA nodes
SELECT @NoOfNUMA = COUNT(DISTINCT parent_node_id)
FROM sys.dm_os_schedulers
WHERE [status] = 'VISIBLE ONLINE'
  AND parent_node_id < 64;

-- Report the recommendations
SELECT CASE
         -- fewer than 8 logical processors and HT disabled
         WHEN @logicalCPUs < 8 AND @HTEnabled = 0
           THEN 'MAXDOP setting should be : ' + CAST(@logicalCPUs AS VARCHAR(3))
         -- 8 or more processors and HT disabled
         WHEN @logicalCPUs >= 8 AND @HTEnabled = 0
           THEN 'MAXDOP setting should be : 8'
         -- 8 or more processors, HT enabled, single NUMA node
         WHEN @logicalCPUs >= 8 AND @HTEnabled = 1 AND @NoOfNUMA = 1
           THEN 'MAXDOP setting should be : ' + CAST(@logicalCPUPerNuma / @physicalCPU AS VARCHAR(3))
         -- 8 or more processors, HT enabled, multiple NUMA nodes
         WHEN @logicalCPUs >= 8 AND @HTEnabled = 1 AND @NoOfNUMA > 1
           THEN 'MAXDOP setting should be : ' + CAST(@logicalCPUPerNuma / @physicalCPU AS VARCHAR(3))
         ELSE ''
       END AS Recommendations;

I recommend setting MAXDOP to a value that makes sense for your server’s workload. Only you can determine the best setting for your environment; this script is just a good starting point.
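Once you have settled on a value, applying it at the instance level looks like this (8 is just a placeholder value here):

-- Apply the chosen MAXDOP value.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;  -- placeholder value
RECONFIGURE;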

Error updating temporal tables

Scenario

At this point, you have probably heard about the System-Versioned Temporal Tables feature released in SQL Server 2016. So, let’s imagine that your team has decided to implement it on a table called “Orders”.

The application modifies the data in this table through different access patterns: some statements come directly from the application, while other changes happen inside long batches running in explicit transactions that touch multiple tables. The update on the “Orders” table is not the first statement in those long batches! So, suddenly you start seeing the following error message:

Data modification failed on system-versioned table “Orders” because transaction time was earlier than period start time for affected records.

You can simulate the error message running the following script:

CREATE TABLE dbo.Orders
(
    [OrderId] INT NOT NULL PRIMARY KEY CLUSTERED
  , [OrderValue] DECIMAL(19,4)
  , [ValidFrom] DATETIME2 (2) GENERATED ALWAYS AS ROW START
  , [ValidTo] DATETIME2 (2) GENERATED ALWAYS AS ROW END
  , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));
GO
INSERT dbo.Orders ([OrderId], [OrderValue])
VALUES (1, 9.99), (2, 9.99);
GO
SELECT * FROM dbo.Orders;
GO

-- Query 1: run this first. Its transaction time is captured when the
-- transaction starts, before the 15-second delay.
BEGIN TRAN
WAITFOR DELAY '00:00:15';
UPDATE dbo.Orders
SET [OrderValue] = [OrderValue] + 1;
COMMIT TRAN

-- Query 2: run this in another session while Query 1 is still waiting.
-- It commits first, moving the rows' period start time past Query 1's
-- transaction time, so Query 1's update fails with the error above.
BEGIN TRAN
UPDATE dbo.Orders
SET [OrderValue] = [OrderValue] + 1;
COMMIT TRAN


But, what’s the problem here?

The system-generated start and end times are keyed to the server’s system time at the moment the BEGIN TRANSACTION is executed.

The times recorded in the system datetime2 columns are based on the begin time of the transaction itself. For example, all rows inserted within a single transaction will have the same UTC time recorded in the column corresponding to the start of the SYSTEM_TIME period.
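A quick way to see this behavior (a sketch against the dbo.Orders table created above):

-- Two inserts, 5 seconds apart, inside one transaction.
BEGIN TRAN;
INSERT dbo.Orders ([OrderId], [OrderValue]) VALUES (3, 1.00);
WAITFOR DELAY '00:00:05';
INSERT dbo.Orders ([OrderId], [OrderValue]) VALUES (4, 2.00);
COMMIT TRAN;

-- Both rows show the same ValidFrom (UTC), despite the delay.
SELECT [OrderId], [ValidFrom]
FROM   dbo.Orders
WHERE  [OrderId] IN (3, 4);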

Is that a bug?

No, this is the standard behavior of System-Versioned Temporal Tables.
More info: https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-2017#how-does-temporal-work

Workarounds

  • Build in exception handling that can retry the transaction (see the sketch below).
  • Use triggers instead of Temporal Tables.
  • Wait for Microsoft to build/release the application-time period feature (which is part of the SQL:2011 standard). More info: https://en.wikipedia.org/wiki/SQL:2011
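For the first workaround, here is a minimal retry sketch (to my knowledge, error 13535 is the number behind this message; verify it on your instance before relying on it):

DECLARE @attempt INT = 1, @maxAttempts INT = 3;

WHILE @attempt <= @maxAttempts
BEGIN
    BEGIN TRY
        BEGIN TRAN;
        UPDATE dbo.Orders
        SET [OrderValue] = [OrderValue] + 1;
        COMMIT TRAN;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRAN;

        -- Retry only on the temporal transaction-time error (assumed 13535).
        IF ERROR_NUMBER() = 13535 AND @attempt < @maxAttempts
            SET @attempt += 1;  -- a new transaction gets a fresh transaction time
        ELSE
            THROW;
    END CATCH;
END;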