Archive for the ‘SQL Server 2014’ Category

SQL Server Spatial

Author:  Cole Francis, Architect

THE PROBLEM

So, wrap your head around this for a minute.  Let’s suppose that you’re a small company that sells casualty insurance to property owners in South Florida.

Your business is thriving, but you feel like you’ve completely saturated the market in that region, so now you want to expand your offerings to a small territory in central Florida.

After conducting rigorous research on the demographic data, you realize that hurricane insurance along the central-east coast is white hot right now.  You conclude that this is the result of booming home turnovers within that area.

The demographic area that your team has decided to pursue starts at West Palm Beach and extends all the way down to Miami.  What’s more, most of the activity in the area appears to be occurring between the coast and 10 miles inland.

So, you assemble your Sales and Marketing Team, and you provide them with the results of your fact finding.  Then you ask them to formulate a strategy that will allow the company to maximize their insurance sales efforts within that area.

Days later, the team reassembles and tells you what they found out about the area.  They recommend targeting a small territory of recently sold homes along A1A (Ocean Blvd) in West Palm Beach, an area experiencing sudden turnover.

The team also explains that the sudden turnover is predominantly due to a large concentration of aging homeowners in the area that are selling their large homes and are opting for smaller living arrangements.

Much to everyone’s glee, the area also boasts a strong per-capita household income, a low crime rate, and only an average occurrence of natural disasters.

The entire team is excited about pursuing the new region per the demographic data, so their next step is to carefully map out the latitude/longitude coordinates of the area using their favorite mapping website.  Here are the coordinates they’ve used to construct their target market area.  Do you notice how they form a nice little polygon?

[Figure: A1A, the latitude/longitude coordinates that form the target-area polygon]

Next, you march down to the Palm Beach County Clerk’s Office and request an Excel Spreadsheet containing the addresses and latitude/longitude coordinates for all new and existing homes sales in the area for the past ninety days.

Of course, the spreadsheet the county offers lists far more addresses and geocoordinates than the small demographic region that your business is targeting.

Therefore, it’s up to your company to pare down the county’s results to only include addresses that are inside your target demographic polygon.  Realizing the manual complexity of this effort, you hand the data and other project artifacts over to your technical team to figure out.

In most cases, you can easily tell whether a home’s latitude/longitude coordinates are well within the acceptable range just by visually inspecting them.

To this point, the picture below shows the coordinates of those homes on the fringe of acceptability, meaning we can’t easily tell whether the residences are inside or outside our target demographic polygon using a simple visual inspection.

Given this, you’ll need a quick way to process this data and pare the results down to only those households that the Sales and Marketing Team wants to pursue in the defined area.  So, how can you do this?

Well, you could manually check them in Google Maps, but that only works if you have a really small set of data or a whole lot of time on your hands.  😁

Or, you can try a more automated approach using a platform like SQL Server.  Why SQL Server, you ask?  Because SQL Server ships with two spatial data types that can help us quickly solve this problem: Geography and Geometry.

It’s important to understand the difference between these two data types, what they are and how they’re used, because they’re similar but definitely not the same.

THE GEOGRAPHY DATATYPE

Although the Geography data type sounds like the right fit for what we’re about to accomplish, it actually compounds the problem.  That’s because the Geography data type is used for terrestrial spatial data draped over the round, ellipsoidal surface of the Earth.

Because of its ellipsoidal nature, any polygon you use to define an area must specify the correct ring orientation, which determines which side of the ring counts as the interior (and, prior to SQL Server 2012, the polygon couldn’t exceed a single hemisphere at all).

It’s for these very reasons that simply drawing a polygon somewhere on its surface doesn’t give you enough information to make accurate determinations about geocoordinates that get fed into it.

For instance, if I were to draw an area around the Equator of the globe, and then I were to ask you if a specific latitude/longitude coordinate fell inside or outside the boundary I just drew, you wouldn’t have enough information to answer my question.

Why, you might ask?  Because you would have to know whether I was targeting the area north or south of the Equator, the western or eastern hemisphere, and whether the polygon I constructed was meant to include or exclude the target latitude/longitude coordinates.
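To see this in action, here’s a minimal sketch using made-up coordinates near our target area (not the article’s actual polygon).  The same ring, traversed in the opposite direction, describes a completely different geography area, and EnvelopeAngle() is a quick way to spot when the interior ended up on the wrong side of the ring (ReorientObject() is available in SQL Server 2012 and later):

-- A minimal sketch with made-up coordinates.
-- note: geography WKT expects longitude before latitude.
-- For the geography type, ring orientation decides which side of the ring is the interior.
declare @ring geography = geography::STGeomFromText(
     'POLYGON((-80.04 26.66, -80.03 26.66, -80.03 26.70, -80.04 26.70, -80.04 26.66))', 4326)

-- EnvelopeAngle() returns a small angle for the intended neighborhood-sized area,
-- but a very large one when the ring is reoriented and the "interior" becomes
-- most of the globe instead.
select @ring.EnvelopeAngle() as OriginalEnvelopeAngle,
       @ring.ReorientObject().EnvelopeAngle() as ReorientedEnvelopeAngle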

THE GEOMETRY DATATYPE

When I think of the SQL Server geometry data type and apply it to this problem, one of the first things that comes to mind is the 1991 book, “Inventing the Flat Earth”, written by retired University of California professor Jeffrey Burton Russell.  In the book, Russell discusses how the “flat Earth” myth was disseminated by early 19th-century writers like Antoine-Jean Letronne, among others.

In the case of this data type, Microsoft SQL Server takes the opposite viewpoint to Russell’s.  It provides a Geometry data type that lets us model both the problem and the solution on what you might think of as a “steroidal flat Earth object”, a flat plane that conforms to Open Geospatial Consortium (OGC) standards.

I actually coined the term “steroidal flat Earth object” because the object’s range extends far beyond the -90 to 90 latitude and -180 to 180 longitude limits that define the Earth’s geographic range.

Because it’s just a very large, flat plane, it isn’t necessary to specify whether the target coordinates lie to the west or east of the Prime Meridian, or in the northern or southern hemisphere, when using this data type.
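As a quick illustration of that point (the values below are deliberately nonsensical), the Geometry type happily accepts coordinates that no place on Earth could ever have, because a geometry point is simply an X/Y position on an unbounded plane:

-- A small sketch with deliberately impossible values: geometry imposes no range limits.
declare @anywhere geometry = geometry::Point(200, -500, 0)
select @anywhere.ToString() as WellKnownText

-- The equivalent geography point would be rejected, because latitude must fall
-- between -90 and 90 and longitude between -180 and 180.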

However, there are still some basic rules to follow in order to construct a well-formed polygon.  One is that the sequence in which the vertices get added is important, just as I mentioned earlier in this article.  If you get the sequence wrong, your polygon could have issues like the example shown below.

[Figure: PolygonVertices, a polygon malformed by adding its vertices out of sequence]

The other important rule is that your final coordinate must equal your first coordinate in order to officially close the loop on your polygon.  If you don’t close the loop, SQL Server throws an error when you try to execute the code.
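Here’s a tiny sketch (with arbitrary coordinates) showing a properly closed ring.  Note how the last vertex repeats the first one, and how STIsValid() confirms the polygon is well-formed:

-- A minimal sketch with arbitrary coordinates: the last vertex repeats the first one,
-- which closes the ring and keeps SQL Server from rejecting the polygon.
declare @closedTriangle geometry = geometry::STGeomFromText(
     'POLYGON((0 0, 10 0, 10 10, 0 0))', 0)

select @closedTriangle.STIsValid() as IsValid,   -- returns 1 for a well-formed ring
       @closedTriangle.STArea() as Area          -- 50 square units for this triangle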

But, once you get things right, the payoff is a set of built-in SQL Server spatial methods that can accurately determine whether a point (i.e., a single coordinate) or a line (i.e., two or more coordinates) intersects or lies inside the perimeter of a well-defined polygon.

Pretty nifty, huh?  So, here’s the solution…

THE SOLUTION (COPY & PASTE THIS INTO SQL SERVER, AND THEN RUN IT)

--create a variable table
declare @coordinates table
(
id int identity (1,1),
coordinate geometry
)
declare @recordCount as int

-- @area represents the demographic area
-- *note: The sequence in how you add each point to construct
-- the polygon is very important.
declare @area as geometry

-- suppress the "rows affected" messages in the output
set nocount on

-- enter the fringe lat/long County coordinates into the table (this process would normally be automated).
insert into @coordinates (coordinate)
select geometry::Point(26.688379,-80.034762,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.684662,-80.037804,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.679250,-80.035808,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.674490,-80.037095,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.675722,-80.039778,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.700533,-80.034413,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.675402,-80.036230,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.694902,-80.037331,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.692697,-80.039852,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.693809,-80.041225,4326)
insert into @coordinates (coordinate)
select geometry::Point(26.677084,-80.037010,4326)

-- set up the demographic area by constructing a polygon of coordinates.
-- *note: the final coordinate MUST match the first coordinate in order
-- to close the polygon.  If it's left open, then it won't work.
-- *note: the sequence of the coordinates used to construct the polygon is also important.
set @area = geometry::STGeomFromText('POLYGON(( 26.697583 -80.033367, 26.691103 -80.033635, 26.681781 -80.035083, 26.679250 -80.035094, 26.679116 -80.035287, 26.674653 -80.035791, 26.672870 -80.035952, 26.668939 -80.035920, 26.668498 -80.035748, 26.665876 -80.036177, 26.664601 -80.036467, 26.661456 -80.036467, 26.661101 -80.038184, 26.661417 -80.038431, 26.663517 -80.038420, 26.665228 -80.038855, 26.665391 -80.038791, 26.666254 -80.038855, 26.666340 -80.038764, 26.666891 -80.038764, 26.667706 -80.038657, 26.669571 -80.038802, 26.670300 -80.038684, 26.670923 -80.039001, 26.671575 -80.039108, 26.671652 -80.039044, 26.673430 -80.039151, 26.673507 -80.039516, 26.675731 -80.039773, 26.683496 -80.038325, 26.686564 -80.037338, 26.687676 -80.037005, 26.689516 -80.037048, 26.691011 -80.037863, 26.692219 -80.039322, 26.692161 -80.040084, 26.693081 -80.040041, 26.693110 -80.038957, 26.694452 -80.038914, 26.694682 -80.038807, 26.694720 -80.040910, 26.694375 -80.041125, 26.694394 -80.042058, 26.697583 -80.033367))', 4326)

select @recordCount = count(id) FROM @coordinates
while (@recordCount > 0)
begin 
     declare @forwardPoint geometry 
     declare @identity int 
     select Top(1) @identity=id, @forwardPoint=coordinate from @coordinates 
     set @forwardPoint = @forwardPoint.MakeValid();

     if (@forwardPoint.STIntersection(@area).ToString() <> 'GEOMETRYCOLLECTION EMPTY')
     begin 
          print 'Id ' + cast(@recordCount as varchar(10)) + ' is inside the target demographic area for lat/lng ' + cast(@forwardPoint as varchar(100)) 
     end 
     else 
     begin 
          print 'Id ' + cast(@recordCount as varchar(10)) + ' is outside the target demographic area for lat/lng ' + cast(@forwardPoint as varchar(100)) 
     end
     
     delete from @coordinates where id = @identity
     select @recordCount = count(id) from @coordinates
 end

-- turn the "rows affected" messages back on
set nocount off
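As a side note, if you’d rather avoid the WHILE loop entirely, the same classification can be done in a single set-based pass with STIntersects().  This is just a sketch that reuses the @coordinates table variable and the @area polygon declared above, and it would need to run in place of the loop (since the loop deletes the rows as it goes):

-- A set-based sketch that classifies every fringe coordinate in one pass.
select c.id,
       c.coordinate.ToString() as latLng,
       case when c.coordinate.STIntersects(@area) = 1
            then 'inside the target demographic area'
            else 'outside the target demographic area'
       end as classification
from @coordinates c
order by c.id desc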


THE END RESULT IS THIS

The Sales Team will now pursue every household (represented as a lat/lng coordinate below) that falls inside the target demographic area.  Coordinates that fall outside the target area will be ignored by the Sales Team for now:

Id 11 is inside the target demographic area for lat/lng POINT (26.688379 -80.034762)
Id 10 is inside the target demographic area for lat/lng POINT (26.684662 -80.037804)
Id 9 is inside the target demographic area for lat/lng POINT (26.67925 -80.035808)
Id 8 is inside the target demographic area for lat/lng POINT (26.67449 -80.037095)
Id 7 is outside the target demographic area for lat/lng POINT (26.675722 -80.039778)
Id 6 is outside the target demographic area for lat/lng POINT (26.700533 -80.034413)
Id 5 is inside the target demographic area for lat/lng POINT (26.675402 -80.03623)
Id 4 is inside the target demographic area for lat/lng POINT (26.694902 -80.037331)
Id 3 is inside the target demographic area for lat/lng POINT (26.692697 -80.039852)
Id 2 is outside the target demographic area for lat/lng POINT (26.693809 -80.041225)
Id 1 is inside the target demographic area for lat/lng POINT (26.677084 -80.03701)

For more information on Microsoft SQL Server spatial data types, see the official Microsoft documentation.

Hi, I’m Cole Francis, a Solution Architect for The PSC Group in Schaumburg, IL.  I’ve been successfully designing, developing, and delivering custom software solutions for an impressive and extensive list of well-branded clients for over twenty years.

Thanks for reading and keep on coding! 🙂


XFactor

By: Cole Francis, Architect, PSC, LLC


THE PROBLEM

So, what do you do when you’re building a website and you have a long-running client-side call to a Web API layer?  Naturally, you’re going to do what most developers do and call the Web API asynchronously.  That way, your code can continue to cruise along until a result finally returns from the server.

But what if matters are actually worse than that?  What if your Web API controller code contacts a repository POCO that then calls a stored procedure through the Entity Framework?  And what if the Entity Framework leverages a project-dedicated database as well as a system-of-record database, and calls to your system-of-record database sporadically fail?

Like most software developers, you would lean toward checking the log files, expecting them to offer traceability into your code.  But what if there isn’t any logging baked into the code?  Even worse, what if the problem only occurs sporadically?  And when it does occur, orders don’t make it into the system-of-record database, which means that things like order changes and financial transactions don’t happen.  Have you ever been in a situation like this one?


PART I – HERE COMES ELMAH

From a programmatic perspective, let’s hypothetically assume that the initial code had the controller code calling the repository POCO in a simple For/Next loop that iterates a hardcoded 10 times.  So, if just one of the 10 iterating attempts succeeds, then it means that the order was successfully processed.  In this case, the processing thread would break free from the critical section in the For/Next loop and continue down its normal processing path.  This, my fellow readers, is what’s commonly referred to as “Optimistic Programming”.

The term, “Optimistic Programming”, hangs itself on the notion that your code will always be bug-free and operate on a normal execution path.  It’s this type of programming that provides a developer with an artificial comfort level.  After all, at least one of the 10 iterative calls will surely succeed.  Right?  Um…right?  Well, not exactly.
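To make that concrete, here’s a minimal sketch of the kind of optimistic loop being described.  This is a hypothetical reconstruction, not the actual project code, and it simply reuses the OrderRepository and PlaceTheOrder() names that appear later in this article:

// A hypothetical reconstruction of the "optimistic" For/Next-style loop: it quietly
// tries up to 10 times and swallows every failure along the way.
private void PlaceOrderOptimistically(int orderId)
{
    for (int attempt = 0; attempt < 10; attempt++)
    {
        try
        {
            using (var or = new OrderRepository())
            {
                or.PlaceTheOrder(orderId);
            }
            break;  // one success is considered "good enough"
        }
        catch
        {
            // no logging, no delay, no insight into why it failed --
            // just hope the next iteration works
        }
    }
}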

Jack Ganssle, from the Ganssle Group, does an excellent job explaining why this development approach can often lead to catastrophic consequences.  He does this in his 2008 online rant entitled “Optimistic Programming”.  Sure, his article is practically ten years old at this point, but his message continues to be relevant to this very day.

The bottom line is that if you don’t know all of the possible failure points, their potential root causes, and all the alternative execution paths a thread can take when an exception occurs, then you’re probably setting yourself up for failure.  I mean, are 10 attempts really any better than one?  Are 10,000 calls really any better than 10?  Not only are these flimsy hypotheses with little or no real evidence to back them up, they also further convolute and mask the underlying root cause of practically any issue that arises.  The real question is, “Why are 10 attempts necessary when only one should suffice?”

So, what do you do in a situation where you have very little traceability into an ailing application in Production, but you need to know what’s going on with it…like yesterday!  Well, the first thing you do is place a phone call to The PSC Group, headquartered in Schaumburg, IL.  The second thing you do is ask for the help of Blago Stephanov, known internally to our organization as “The X-Factor”, and for a very good reason.  This guy is great at his craft and can accelerate the speed of development and problem solving by at least a factor of two…and that’s no joke.

In this situation, Blago recommends using a platform like Elmah for logging and tracing unhandled errors.  Elmah is a drop-in, pluggable logging framework that dynamically captures all unhandled exceptions.  It also offers color-coded stack traces with line numbers that can help pinpoint exactly where an exception was thrown.  Even more impressive, it’s very quick to implement and requires little hands-on effort during integration and setup.  In a nutshell, its implementation is quick, and it makes debugging a breeze.

Additionally, Elmah comes with a web page that allows you to remotely view the unhandled exceptions.  This is a fantastic function for determining the various paths, both normal and alternate, that lead up to an unhandled error. Elmah also allows developers to manually record their own information by using the following syntax.

ErrorSignal.FromCurrentContext().Raise(ex);

 

Regardless, Elmah’s capabilities go well beyond just recording exceptions.  For all practical purposes, you can record just about any information you desire.  If you want to know more about Elmah, you can read up on it on the project’s site.  Also, you’ll be happy to know that you can get it for the low, low price of…free.  It just doesn’t get much better than that.


PART II – ONE REALLY COOL (AND EXTREMELY RELIABLE) RE-TRY PATTERN

So, after implementing Elmah, let’s say that we’re able to track down the offending lines of code, and in this case the code was failing in a critical section that iterates 10 times before succeeding or failing silently.  We would have been very hard-pressed to find it without the assistance of Elmah.

Let’s also assume that the underlying cause is that the code was experiencing deadlocks in the Entity Framework’s generated classes whenever order updates to the system-of-record database occur.  So, thanks to Elmah, at this point we finally have some decent information to build upon.  Elmah provides us with the stack trace information where the error occurred, which means that we would be able to trace the exception back to the offending line(s) of code.

After we do this, Blago recommends that we craft a better approach in the critical section of the code, one that provides more granular control over any programmatic retries when a deadlock occurs.  So, how is this better, you might ask?  Well, keep in mind from your earlier reading that the code was simply looping 10 times in a For/Next loop.  By implementing his recommended approach, we can control not only the number of retry attempts, but also the wait time between attempts, and we can log any meaningful exceptions if they occur.

 

       /// <summary>
       /// Places orders in a system-of-record DB
       /// </summary>
       /// <returns>An http response object</returns>
       [HttpGet]
       public IHttpActionResult PlaceOrder(int orderId)
       {
           using (var or = new OrderRepository())
           {
               Retry.DoVoid(() => or.PlaceTheOrder(orderId));
               return Ok();
           }
       }

 

The above Retry.DoVoid() method calls into the following generic logic, which performs its job flawlessly.  What’s more, you can see in the example below where Elmah is being leveraged to log any exceptions that we might encounter.

 

using Elmah;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace PSC.utility
{
   /// <summary>
   /// Provides reliable and traceable retry logic
   /// </summary>
   public static class Retry
   {
       /// <summary>
       /// Retry logic
       /// </summary>
       /// <returns>Fire and forget</returns>
        public static void DoVoid(Action action, int retryIntervalInMS = 300, int retryCount = 5)
       {
           Do<object>(() =>
           {
               action();
               return null;
            }, retryIntervalInMS, retryCount);
       }

        public static T Do<T>(Func<T> action, int retryIntervalInMS = 300, int retryCount = 5)
       {
           var exceptions = new List<Exception>();
            TimeSpan retryInterval = TimeSpan.FromMilliseconds(retryIntervalInMS);

           for (int retry = 0; retry < retryCount; retry++)
           {
               bool success = true;

               try
               {
                   if (retry > 0)
                   {
                       Thread.Sleep(retryInterval);
                   }
                   return action();
               }
               catch (Exception ex)
               {
                   success = false;
                   exceptions.Add(ex);
                   ErrorSignal.FromCurrentContext().Raise(ex);
               }
               finally
               {
                   if (retry > 0 && success) {
                       ErrorSignal.FromCurrentContext().Raise(new Exception(string.Format("The call was attempted {0} times. It finally succeeded.", retry)));
                   }
               }
           }
           throw new AggregateException(exceptions);
     }
   }
}
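For completeness, the same helper also covers calls that return a value.  A hypothetical usage might look like the following (GetOrderStatus() is an assumed repository method, not part of the original code); here we wait 500 ms between attempts and give up after three tries:

// Hypothetical usage of the generic Do<T> overload: 500 ms between attempts, 3 attempts max.
// GetOrderStatus() is an assumed repository method used purely for illustration.
public string GetOrderStatusWithRetry(int orderId)
{
    using (var or = new OrderRepository())
    {
        return Retry.Do(() => or.GetOrderStatus(orderId), 500, 3);
    }
}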

As you can see, the aforementioned Retry() pattern offers a much more methodical and reliable approach to invoke retry actions in situations where our code might be failing a few times before actually succeeding.  But, even if the logic succeeds, we still have to ask ourselves questions like, “Why isn’t one call enough?” and “Why are we still dealing with the odds of success?”

After all, we have absolutely no verifiable proof that looping and reattempting 10 times achieves the necessary “odds of success”.  So the real question is, why should there be any speculation at all in this matter?  We’re talking about pushing orders into a system-of-record database for revenue purposes, and the ability to process orders shouldn’t boil down to “odds of success”.  It should just work…every time!

Nonetheless, what this approach will buy us is one very valuable thing, and that’s enough time to track down the issue’s root cause.  So, with this approach in place, our number one focus would now be to find and solve the core problem.


PART III – PROBLEM SOLVED

So, at this point we’ve resigned ourselves to the fact that, although the aforementioned retry logic doesn’t hurt a thing, it merely masks the core problem.

Blago recommends that the next step is to load test the failing method by creating a large pool of concurrent users (e.g., 1,000) all simulating the order update function at the exact same time.  I’ll take it one step further and recommend that we also begin analyzing and profiling the SQL Server stored procedures that the Entity Framework is calling when the failures occur.

I recommend that we first review the execution plans of the failing stored procedures to make sure their compiled execution plans aren’t lopsided.  If we notice that too much time is being spent on individual tasks inside a stored procedure’s execution plan, then our goal should be to optimize it.  Ideally, we want to see an even distribution of time optimally spread across the various execution paths inside our stored procedures.
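A lightweight way to start that review (the procedure and parameter names below are hypothetical) is to capture I/O and timing statistics while exercising the suspect call, with “Include Actual Execution Plan” turned on in SSMS:

-- A sketch of a quick profiling pass; the procedure and parameter names are hypothetical.
set statistics io on
set statistics time on

exec dbo.usp_UpdateOrder @OrderId = 12345

set statistics time off
set statistics io off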

In our hypothetical example, we’ll assume there are a couple of SQL Server tables that use composite (multi-column) keys to uniquely identify a record on the Order table.

Let’s also assume that during the ordering process, there’s a query that leverages just one column of the secondary key to retrieve additional data before sending the order along to the system-of-record database.  However, because the composite keys are what’s uniquely clustered, getting the data back out of the table through a single column proves to be too much of a strain as the table grows.  Ultimately, this leads to query timeouts and deadlocks, particularly under load.

To this end, optimizing the offending stored procedures by creating a non-clustered, non-unique index on the key attributes of the offending tables will vastly improve their efficiency.  Once the SQL optimizations are complete, the next step should be to perform more load tests and to use SQL Server Profiler to gauge the impact of our changes.  At that point, the deadlocks should disappear completely.
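Under those assumptions, the optimization described above might look something like the sketch below.  The table, column, and index names are hypothetical; the point is simply to give the single-column lookup an index of its own instead of forcing it through the clustered composite key:

-- A sketch of the non-clustered, non-unique index described above (hypothetical names).
create nonclustered index IX_Order_SecondaryKey
     on dbo.[Order] (SecondaryKeyColumn)
     include (OrderStatus, LastUpdatedDate)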


LET’S SUMMARIZE, SHALL WE?

The moral of this story is really twofold.  (1) Everyone should have an “X-Factor” on their project; (2) You can’t beat great code traceability and logging in a solution. If option (1) isn’t possible, then at a minimum make sure that you implement option (2).

Ultimately, logging and traceability help out immeasurably on a project, particularly where root cause analysis is imperative to track down unhandled exceptions and other issues.  It’s through the introduction of Elmah that we were able to quickly identify and resolve the enigmatic database deadlock problems that plagued our hypothetical solution.

While this particular scenario is completely conjectural, situations like these aren’t all that uncommon to run across in the field.  Regardless, most of this could have been prevented by following Jack Ganssle’s ten-year-old advice, which is to make sure that you check those goesintas and goesoutas!  But chances are that you probably won’t.

Thanks for reading and keep on coding! 🙂


Author: Cole Francis, Architect

THE PROBLEM

I was recently tasked with restoring a 220GB SQL Server backup file using SQL Server 2014 Management Studio (SSMS), and the database server I was restoring the backup to was very limited on space.  So, we threw the SQL backup file on a UNC share with an abundance of space, and I conveniently mapped a drive to the UNC share on the database server.

THE SOLUTION

Unfortunately, when it came time to restore the SQL backup file in SSMS, I was unable to see the newly mapped drive in SSMS, even though I could plainly see it in a File Explorer window.  That’s because SQL Server runs as a service under its own account, so drive mappings created in my interactive session aren’t visible to it.  To get around this little problem, I ran the following SQL commands, and now the mapped drive shows up properly in SSMS:


-- Turn on the advanced options
exec sp_configure 'show advanced options', 1
go
reconfigure
go

-- Reconfigure the advanced options values and enable the command shell
exec sp_configure 'xp_cmdshell', 1
go
reconfigure
go

-- Map the UNC share to a drive letter within the SQL Server process so it shows up in SSMS
exec xp_cmdshell 'net use Z: \\YourNetworkFolder\YourSubFolder\YourSubSubFolder YourPassword /user:YourDomainName\YourUserName'
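
One follow-up worth mentioning: because xp_cmdshell widens the server’s attack surface, it’s a good idea to turn it back off (and re-hide the advanced options) once the restore is finished.  The commands below simply undo the configuration changes made above:

-- Disable the command shell and hide the advanced options once the restore completes
exec sp_configure 'xp_cmdshell', 0
go
reconfigure
go

exec sp_configure 'show advanced options', 0
go
reconfigure
go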


 

Thanks for reading and keep on coding! 🙂