 Old Town IT Blog
Dec 9

Written by: Don Worthley
12/9/2008 3:47 PM 

The latest release of the DotNetNuke blog module contains a bug that affects sites using Unicode characters in Page names.  This post contains a couple of workarounds that can be implemented temporarily until the fix is made available in the upcoming 4.0 release of the module.

Disable the SEO Friendly URLs Feature

The first workaround will only be available when the 03.05.01 release is made public.  As of 12/9/2008, this release is not available for public download.

In this release, you can simply uncheck the SEO Friendly URLs option in Module Options.

Update the Permalink column in the Blog_Entries Table

That won't work for everyone, so another workaround is to replace the Unicode characters in the stored permalinks with their percent-encoded UTF-8 equivalents, using a script such as the following:

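-- N'ñ' is a Unicode (nvarchar) literal, so the ñ isn't mangled by the database's default code page
UPDATE {databaseOwner}{objectQualifier}Blog_Entries SET PermaLink = REPLACE(PermaLink, N'ñ', '%c3%b1')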
 
This replaces the ñ with %c3%b1, its percent-encoded UTF-8 form. Since Unicode characters aren't fully supported in URLs, most browsers convert them to UTF-8 and percent-encode each byte when sending the request to the server. This is something we didn't anticipate when we wrote the 301 redirect code for the new SEO friendly URL format. You can find a listing of common Unicode characters and their UTF-8 equivalents here:

http://www.natalihr.nl/html/encoding.html
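If a character you need isn't in that table, you can compute its encoding yourself. Here's a minimal Python sketch (Python is used only for illustration, and utf8_percent_encode is a hypothetical helper, not part of the blog module). It encodes a character as UTF-8 and writes each byte as a lowercase %xx escape, which is the form used in the REPLACE script.

```python
def utf8_percent_encode(ch: str) -> str:
    """Hypothetical helper: encode ch as UTF-8 and write each byte as %xx.

    Lowercase hex matches the '%c3%b1' style used in the REPLACE script.
    Every byte is escaped, ASCII included, so pass only the special
    characters you want to replace.
    """
    return "".join(f"%{byte:02x}" for byte in ch.encode("utf-8"))


if __name__ == "__main__":
    # Print a small lookup table for some common Spanish characters.
    for ch in "ñáéíóúü":
        print(ch, utf8_percent_encode(ch))
```

For example, utf8_percent_encode("ñ") returns "%c3%b1", the same value used in the script above.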

This shouldn't have an impact on the SEO for your site since the blog module will always use the 301 redirect to point the user to the version of the URL currently stored in the Permalink field. This means that when the next version of the blog module is released with a fix for this issue and your URLs go back to the format where there is a /Español/ in the URL, search engines still looking for /Espa%c3%b1ol/ will be 301ed (new verb) to the new location, thereby ensuring that your page rank won't be affected.

Create a Temporary Trigger to Automate This Process

Finally, since this last solution requires running the update script after each new entry is added to the blog, I've included some code to create a trigger that automates the process. You'll need to modify the trigger to handle the specific Unicode characters that appear in your site's URLs. Here's an example of a trigger designed to replace the ñ:

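CREATE TRIGGER {databaseOwner}{objectQualifier}Blog_Entries_Temp_Update_Trigger
ON {databaseOwner}{objectQualifier}Blog_Entries
AFTER UPDATE
AS
BEGIN
  -- SET NOCOUNT ON added to prevent extra result sets from
  -- interfering with SELECT statements.
  SET NOCOUNT ON;
  -- N'ñ' is a Unicode literal, so the ñ isn't mangled by the database's default code page.
  -- The WHERE clause limits the update to rows that still contain an unencoded ñ,
  -- instead of rewriting every row in the table on each update.
  UPDATE {databaseOwner}{objectQualifier}Blog_Entries
  SET PermaLink = REPLACE(PermaLink, N'ñ', '%c3%b1')
  WHERE PermaLink LIKE N'%ñ%'
END
GO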

If you're using SQL 2000, you can change AFTER UPDATE to FOR UPDATE; the two keywords are equivalent. Also, if you have multiple characters that need to be replaced, add a separate UPDATE statement for each special character.
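With several characters, writing these statements by hand gets tedious. The following Python sketch (Python is used only for illustration; build_replace_statements is a hypothetical helper) generates one UPDATE statement per special character, which you can paste into the trigger body or run on its own from the Host | SQL page. The N'...' prefix makes the search string a Unicode literal, and the WHERE clause limits each update to rows that still contain the character.

```python
from urllib.parse import quote

# The DotNetNuke tokens are left as-is for DNN to substitute when the script runs.
TABLE = "{databaseOwner}{objectQualifier}Blog_Entries"


def build_replace_statements(chars: str) -> list[str]:
    """Hypothetical helper: one UPDATE per distinct non-ASCII character in chars."""
    statements = []
    for ch in dict.fromkeys(chars):  # drop duplicates, keep first-seen order
        if ord(ch) < 128:
            continue  # ASCII characters don't need replacing
        # quote() percent-encodes the UTF-8 bytes with uppercase hex;
        # lower() matches the '%c3%b1' style used in the trigger.
        encoded = quote(ch).lower()
        statements.append(
            f"UPDATE {TABLE} "
            f"SET PermaLink = REPLACE(PermaLink, N'{ch}', '{encoded}') "
            f"WHERE PermaLink LIKE N'%{ch}%'"
        )
    return statements


if __name__ == "__main__":
    print("\n".join(build_replace_statements("ñáé")))
```

For example, build_replace_statements("ñáé") returns three statements, one each for ñ, á and é.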

To create this trigger, log in with a host-level account and execute your SQL from the Host | SQL page. If you're not familiar with SQL, back up your database before executing it.

To remove the trigger once the update comes out, simply execute the following:

DROP TRIGGER {databaseOwner}{objectQualifier}Blog_Entries_Temp_Update_Trigger

HTH,

Don

[Update]

After a great conversation last night with Néstor Sánchez, we discovered some interesting things about UTF-8 encoding of URLs.  While I think we came up with more questions than answers, it's interesting to know what we found.  And my guess is that it may help others who are facing similar issues with UTF-8 encoding anomalies.

In short, we found that UTF-8 encoding of URLs seems to be handled differently in different environments.  While more testing is required, it appears that the regional settings may have an effect on how URLs are handled by both the client and the web server.

Using Fiddler, we captured some traffic between the client and the web server, and this is what we found:

My Laptop

[Image: Fiddler capture of the request from my laptop]

The interesting thing is that the URL that I clicked on has this format in the database:

http://home.itcrossing.com/Home/Español/tabid/72/EntryId/37/Writing-in-Espanol.aspx

I was aware that the browser would UTF-8 encode the ñ in the TabName. Most browsers do this since Unicode characters aren't fully supported in URLs yet (although that's changing). What puzzles me, though, is that on Néstor's two machines, the traffic looked like this:

[Image: Fiddler capture of the same request from Néstor's machines]

The key difference is the middle line. We tried everything we could think of to duplicate this on my machine. We changed the culture in web.config to es-BO and the uiCulture to es. We also changed the computer's regional settings as well as its keyboard and language settings. Some of the differences we noted were as follows:

  • I'm running IIS 6, and Néstor is running IIS 7.
  • I didn't reboot my machine after changing the regional settings. Perhaps that was needed.

Key Takeaway

Here's what I take away from this lesson: URLs may or may not be UTF-8 encoded when they reach the web server, so any string comparison involving the URL should handle both forms. In most cases the URL will contain UTF-8 encoded strings. To perform a safe comparison in .NET, decode the URL first with the UrlDecode method in System.Web.HttpUtility.
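Here's the same "compare raw, then compare decoded" idea as a minimal Python sketch, for anyone who wants to experiment outside of .NET. urllib.parse.unquote plays roughly the role of HttpUtility.UrlDecode here (one difference: UrlDecode also turns + into a space, which corresponds to unquote_plus). The function names urls_match and needs_301 are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote


def urls_match(requested_url: str, correct_url: str) -> bool:
    """Case-insensitive check that requested_url ends with correct_url,
    whether or not the request arrived percent-encoded."""
    target = correct_url.lower()
    if requested_url.lower().endswith(target):
        return True
    # unquote() decodes %xx escapes as UTF-8 by default, so
    # "Espa%c3%b1ol" and "Espa%C3%B1ol" both become "Español".
    return unquote(requested_url).lower().endswith(target)


def needs_301(requested_url: Optional[str], correct_url: str) -> bool:
    """Hypothetical redirect decision: redirect only when a URL was requested
    and it matches the stored permalink in neither raw nor decoded form."""
    return requested_url is not None and not urls_match(requested_url, correct_url)
```

With the double-encoded form in the third URL (%c3%83%c2%b1), a single decode yields EspaÃ±ol, so that request is still treated as a mismatch.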

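By the way, I had a hard time figuring out what the %c3%83%c2%b1 in the third URL was. It's the result of double UTF-8 encoding: the two bytes of the UTF-8 encoded ñ (0xC3 and 0xB1) were treated as the characters Ã and ±, and each of those was UTF-8 encoded again. %c3%83 is the UTF-8 encoding of Ã (U+00C3), and %c2%b1 is the UTF-8 encoding of ± (U+00B1).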
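The double encoding is easy to reproduce. This Python sketch (for illustration only) assumes the UTF-8 bytes were misread as Latin-1 somewhere along the way; Windows-1252 gives the same result for these two bytes. It walks through the steps and shows how to undo them with a hypothetical helper, undo_double_encoding.

```python
from urllib.parse import quote, unquote

original = "ñ"
utf8_bytes = original.encode("utf-8")      # b'\xc3\xb1'
# The two bytes get misread as Latin-1 characters: 0xC3 -> 'Ã', 0xB1 -> '±'.
misread = utf8_bytes.decode("latin-1")     # 'Ã±'
# Each of those characters is then UTF-8 encoded and percent-escaped again.
double_encoded = quote(misread).lower()    # '%c3%83%c2%b1'


def undo_double_encoding(text: str) -> str:
    """Reverse the steps above: decode the %xx escapes, turn the resulting
    characters back into bytes with Latin-1, then decode those bytes as UTF-8."""
    return unquote(text).encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    print(double_encoded)
    print(undo_double_encoding("Espa%c3%83%c2%b1ol"))
```

This only applies to text that really was double encoded; on a singly encoded URL such as Espa%c3%b1ol, the final UTF-8 decode raises UnicodeDecodeError.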


5 comment(s) so far...

Don,
I'm not sure that your approach is feasible for all users in all countries. AFAIK, in China, Russia, Arab countries or Israel, they would need to add a replacement for every character, which is simply not practicable. There is a need for a translation option that generates somewhat readable strings from those characters.

By Sebastian Leupold on   12/15/2008 2:15 PM

Hi Sebastian,

Thanks for your post! Unfortunately, the SEO friendly URLs won't work for these languages anyway. This is something we'll have to address in a future version of the blog module. A solution being proposed, which I think will work great, is to support slugs so that users can create their own SEO friendly URL (at least the title part of the URL).

By SuperUser Account on   12/15/2008 4:28 PM

I assume the trigger will be taken into account so that future updates don't fail.

By Néstor Sánchez on   12/16/2008 6:36 PM

The trigger should be safe. Also, here's the source code change in case anyone's interested in building the module themselves. Apply it at around line 183 in ViewEntry.ascx.vb:

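' AndAlso short-circuits, so ToLower() is never called when requestedUrl is Nothing.
' Decoding before lowercasing lets /Espa%c3%b1ol/ match a stored /Español/.
If showSeoFriendly AndAlso requestedUrl IsNot Nothing AndAlso _
   Not requestedUrl.ToLower().EndsWith(correctUrl.ToLower()) AndAlso _
   Not System.Web.HttpUtility.UrlDecode(requestedUrl).ToLower().EndsWith(correctUrl.ToLower()) Then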

By Don Worthley on   12/19/2008 1:36 AM

The last option, changing the code and building, worked for me!

By PM on   3/5/2009 1:22 PM


Old Town IT LLC | 703.838.2039 | info@oldtownit.com

Copyright Old Town IT LLC | Privacy Policy: Old Town IT will never share your information with anyone without your permission.