Two of my favourite keywords in SQL Server are SCHEMA and MERGE. They are two of the most useful facilities available, yet they seem to be consistently underused at any organisation I am a part of. I really don’t understand why this is the case, particularly when I work on newer systems that make heavy use of stored procedures, as they so often help simplify and organise data in large systems. I suspect the lack of use might have something to do with the “software developer mindset”, which tends to be oriented more around procedures than sets, but that’s a topic of conversation for another time.
The SCHEMA keyword enables database objects to be placed in to logical namespaces, which I find to be most useful when engaging in domain-driven design or preparing a database for some sort of ETL process.
MERGE has consistently been a huge help in both ETL processes and when scrubbing data. The syntax is extremely clear and very flexible, and it has built in auditing capabilities via the OUTPUT clause.
Surprisingly though, I recently experienced the first time where I would actually use both of these features together. Specifically, I had tables of data in a schema named [DataSync], which represented a completely clean data set extracted and loaded from a secondary data source. These data sets needed to be merged in to tables in the [dbo] schema. I turned to my favourite SQL feature, creating a stored procedure that would exist in the same [DataSync] namespace as the tables it was to draw its data from, using MERGE to insert or update as needed.
When done, I tried a test run of the stored procedure on a copy of my data. Everything seemed to go well, running the procedure resulted in no error messages. I checked my auditing table, and saw lots of rows of data updates were captured. Yet when I compared the modified table of data with the original, there was no difference between the two. I’d used SQL Data Compare for the comparison, so there definitely wasn’t anything wrong with that. But there it was, an audit of changes where none had actually happened.
Here’s a short snippet of the statement I wrote, slightly modified of course to keep company data protected. Can you spot my mistake?
MERGE Company AS Target USING ValidCompanyData AS Source ON Target.CompanyId = Source.CompanyId
It’s easy to miss, and certainly not something I’d have expected to be a problem. Here’s the snippet again, this time written to actually work:
MERGE [dbo].Company AS Target USING ValidCompanyData AS Source ON Target.CompanyId = Source.CompanyId
Purely on a hunch I decided to prefix the target table name, Company, with the default schema [dbo]. I ran the procedure again, and this time got the result I wanted; hundreds of rows of data scrubbed and corrected.
I often see other developers skip prefixing their table names in queries whenever they’re namespaced inside [dbo]. Generally it’s okay, although it does bother me when it’s done across databases, ala dbname..tablename. My advice is to just be cautious: if you’re ever in doubt, be explicit with the schema when you query your tables. Especially if you’re in a database you’re not very familiar with. There’s probably a DBA out there somewhere who will thank you.