Archive for January, 2018

CreateImagefromPDF

By: Cole Francis, Senior Architect at The PSC Group, LLC.

Let’s say you’re working on a hypothetical project, and you run across a requirement for creating an image from the first page of a client-provided PDF document.  Let’s say the PDF document is named MyPDF.pdf, and your client wants you to produce a .PNG image output file named MyPDF.png.

Furthermore, the client states that you absolutely cannot read the contents of the PDF file, and you’ll only know if you’re successful if you can read the output that your code generates inside the image file.  So, that’s it, those are the only requirements.   What do you do?

SOLUTION

Thankfully, there are a number of ways to solve this, and I’m going to use a lesser-known NuGet package to do it.  Why?  Well, for one, I want to demonstrate what an easy problem this is to solve.  So, I’ll start off by searching the NuGet Package Manager for something describing what I want to do.  Voilà, I run across a package named “Pdf2Png”.  I install it in less than 5 seconds.

[Image: Pdf2Png.png]

So, is the Pdf2Png package thread-safe and server-side compliant?  I don’t know, but I’m not concerned about it because it wasn’t listed as a functional requirement.  So, this is something that will show up as an assumption in the Statement-of-Work document and will be quickly addressed if my assumption is incorrect.

Next, I create a very simple console application, although this could be just about any .NET project type, as long as it has rights to the file system.  The process to create the console application takes me another 10 seconds.

Next, I drop in the following three lines of code and execute the application, taking another 5 seconds.  This would actually be one line of code if I were passing in the source and target file paths.

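 // Source PDF and target PNG paths (List<string> needs "using System.Collections.Generic;")
 string pdf_filename = @"c:\cole\PdfToPng\MyPDF.pdf";
 string png_filename = @"c:\cole\PdfToPng\MyPDF.png";
 // Convert the PDF to a PNG image and collect any errors the package reports
 List<string> errors = cs_pdf_to_image.Pdf2Image.Convert(pdf_filename, png_filename);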

Although my work isn’t overwhelmingly complex, the output is extraordinary for a mere 20 seconds’ worth of work!  I now have not one, but two files in my source folder.  One’s my source PDF document, and the other one’s the image that my console application produced using the Pdf2Png package.

[Image: TwoFiles.png]

Finally, when I open the .PNG image file, it reveals the mysterious content that was originally inside the source PDF document:

[Image: SomeThingsArentHard.png]

Before I end, I have to mention that the Pdf2Png component is not only simple, but it’s also somewhat sophisticated.  The library is a subset of Mark Redman’s work on PDFConvert using Ghostscript gsdll32.dll, and it automatically makes the Ghostscript gsdll32 accessible on a client machine that may not have it physically installed.

Thanks for reading, and keep on coding!  🙂


AngularJS SPA

By:  Cole Francis, Senior Solution Architect at The PSC Group, LLC.

PROBLEM

There’s a familiar theme making the rounds on the Internet right now about the problems of generating SEO-friendly Sitemaps for SPA-based AngularJS web applications.  These applications often have two fundamental issues stemming from poor architectural design:

  1. There’s usually a nasty hashtag (#) or hashbang (#!) buried in the middle of the URL route, which the website ultimately relies upon for parsing purposes in order to construct the real URL route (e.g. https://www.myInheritedWebApp.com/stuff/#/items/2).
  2. Because of the embedded hashtag or hashbang, the URLs are dynamically constructed and don’t actually point to content without parsing the hashtag (or hashbang) operator first.  The upshot is that a Sitemap.xml document can’t be auto-generated for SEO indexing.
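
To make the problem concrete, here is a minimal JavaScript sketch of the kind of parsing such a site depends on.  It assumes the real route is simply the base path plus whatever follows the # or #!, which may not hold for every application.  The helper name hashRouteToPath is hypothetical and isn’t part of AngularJS.

```javascript
// Hypothetical helper: maps a hash (#) or hashbang (#!) route onto a plain path.
function hashRouteToPath(rawUrl) {
  const url = new URL(rawUrl);

  // url.hash looks like "#/items/2" or "#!/items/2"; strip the "#" or "#!" prefix.
  const route = url.hash.replace(/^#!?/, "");
  if (route === "") {
    return url.href; // no hash route, nothing to rewrite
  }

  // Join the base path and the route without doubling the slash.
  const basePath = url.pathname.replace(/\/$/, "");
  const routePath = route.startsWith("/") ? route : "/" + route;
  return url.origin + basePath + routePath;
}

console.log(hashRouteToPath("https://www.myInheritedWebApp.com/stuff/#/items/2"));
```

Note that the URL parser lowercases the host name, so the example prints https://www.myinheritedwebapp.com/stuff/items/2.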

I realize that some people might be offended by my comment about “poor architectural design”.  I state this loosely, because it’s really just the nature of the beast.  Why?  Because it’s really easy to get started with AngularJS, and many Software Developers simply start laying down code that’s initially decent, but at some point they start implementing hacks because of added complexity to the original functional requirements.  That’s where they begin to get themselves in trouble, or rather, get very creative. 🙂

If you think I’m kidding, then just try Googling the following keywords and you’ll see exactly what I mean:  AngularJS, hash, hashbang, SEO, Sitemap, problem.

SOLUTION

So, the first step is to remove the hashtag (#) or the hashbang (#!).  I know it sucks, and it’s going to require some work, but let me be clear.  Do it!  For one, generating the Sitemap will be much easier, because you won’t need to parse on a hashtag (or hashbang) to get the real URL.  Secondly, all the remediation work you do will be a reminder the next time you think about taking shortcuts.
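
How you remove the hash depends on your application.  One common route in AngularJS 1.x is to turn on HTML5 mode through $locationProvider, sketched below.  The module name myInheritedWebApp is a placeholder, and the typeof guard is only there so the snippet can load outside a page where AngularJS is present.

```javascript
// Config function that switches AngularJS routing from hash URLs to HTML5 (pushState) URLs.
function enableHtml5Mode($locationProvider) {
  $locationProvider.html5Mode(true);
}

// Explicit injection annotation so the function survives minification.
enableHtml5Mode.$inject = ["$locationProvider"];

// Register the config block on an existing module (placeholder name).
if (typeof angular !== "undefined") {
  angular.module("myInheritedWebApp").config(enableHtml5Mode);
}
```

HTML5 mode also expects a base tag (for example, one with href="/") in the layout, plus a server-side rewrite rule that returns the SPA shell for deep links.  Without that rule, refreshing a route such as /items/2 will return a 404.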

Regardless, after correcting the hashtag problem, you still have another issue.  Your website is still an AngularJS SPA-based website, which means that all its content is dynamically generated and injected through JavaScript AJAX calls.

Given this, how will you ever be able to generate a Sitemap containing all your content (e.g. products, catalogs, people, etc…)? Even more concerning, how will people find your people or products when searching on Google?

Luckily, the answer is very simple.  Here’s a little gem that I recently ran across while trying to generate a Sitemap.xml document on an AngularJS SPA architected website, and it works like a charm:  http://botmap.io/

I literally copied the script on the BotMap website to the bottom of my shared\_Layout.cshtml file, just above the closing tag.  This gives BotMap permission to crawl your website.  After doing this, push your website to Production, then point the BotMap website to your publicly-facing URL, and finally click the button on their website to initiate the crawl.  One and done!

BotMap begins to crawl and catalog your website as if it were a real person browsing it. It doesn’t use cURL or xHttp requests to determine what to catalog. The BotMap crawler actually executes the JavaScript, which is how it ultimately learns about all of the content on your website that it will use to construct the Sitemap.

This is why it’s so great for websites created using AngularJS or other JavaScript frameworks, where the content is injected into the page by the JavaScript code itself.  Congratulations, {{vm.youreDone}}!
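
For reference, the Sitemap.xml that comes out of a crawl like this is just a list of URLs in the sitemaps.org format.  Here is a minimal sketch of building one, assuming you already have the list of discovered URLs.  The function names and the sample URLs are hypothetical, and this is not BotMap’s code.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemaps.org 0.9 document with one <url><loc> entry per URL.
function buildSitemapXml(urls) {
  const entries = urls.map(function (url) {
    return "  <url><loc>" + escapeXml(url) + "</loc></url>";
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  ].concat(entries, ["</urlset>"]).join("\n");
}

// Sample URLs for illustration only.
console.log(buildSitemapXml([
  "https://example.com/items/1",
  "https://example.com/items/2?color=red&size=m",
]));
```

The escaping step matters because query strings often contain an ampersand, which would otherwise make the XML invalid.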

Thanks for reading, and keep on coding!  🙂