The latest release of the DotNetNuke blog module contains a bug that affects sites using Unicode characters in Page names. This post contains a couple of workarounds that can be implemented temporarily until the fix is made available in the upcoming 4.0 release of the module.
Disable the SEO Friendly URLs Feature
The first workaround will only be available when the 03.05.01 release is made public. As of 12/9/2008, this release is not available for public download.
In this release, you can simply uncheck the SEO Friendly URLs option in Module Options.
Update the Permalink column in the Blog_Entries Table
If that option isn't available to you yet, another workaround is to replace Unicode characters in the stored Permalink with their percent-encoded UTF-8 equivalent, using a script such as the following:
UPDATE {databaseOwner}{objectQualifier}Blog_Entries SET PermaLink = REPLACE(PermaLink, 'ñ', '%c3%b1')
This replaces the ñ with %c3%b1, which is its percent-encoded UTF-8 form. Most browsers can't send raw Unicode characters in a URL, so they percent-encode them as UTF-8 when sending the request to the server. This is something we didn't anticipate when we wrote the 301 redirect code for the new SEO friendly URL format. You can find a listing of common Unicode characters and their UTF-8 equivalents here:
http://www.natalihr.nl/html/encoding.html
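If you'd rather compute the encoded form than look it up, here is a minimal sketch in Python (not part of the blog module; the function name percent_encode_non_ascii is made up for illustration). It leaves ASCII characters alone and writes each non-ASCII character as its UTF-8 bytes in lowercase %xx form, matching the style used in this post:

```python
from urllib.parse import quote, unquote


def percent_encode_non_ascii(text):
    """Replace each non-ASCII character with its UTF-8 bytes as lowercase %xx.

    Hypothetical helper for illustration only; ASCII characters pass through
    untouched, so slashes and letters in a permalink are preserved.
    """
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            pieces.append("".join(f"%{b:02x}" for b in ch.encode("utf-8")))
    return "".join(pieces)


encoded = percent_encode_non_ascii("/Home/Español/tabid/72")
print(encoded)

# The standard library does the same job but emits uppercase hex digits.
# Hex case does not matter when the URL is decoded.
print(quote("ñ"))
print(unquote("%c3%b1"))
```

Note that the REPLACE in the script above is a plain string substitution, so the case of the hex digits you store is the case that ends up in the Permalink column.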
This shouldn't have an impact on the SEO for your site since the blog module will always use the 301 redirect to point the user to the version of the URL currently stored in the Permalink field. This means that when the next version of the blog module is released with a fix for this issue and your URLs go back to the format where there is a /Español/ in the URL, search engines still looking for /Espa%c3%b1ol/ will be 301ed (new verb) to the new location, thereby ensuring that your page rank won't be affected.
Create a Temporary Trigger to Automate This Process
Finally, since this last solution will require running the update script after each new entry is added to the blog, I've included some code to create a trigger to automate this process. The following trigger would need to be modified to include the specific Unicode characters included in URLs in your site. Here's an example of a trigger designed to replace the ñ:
CREATE TRIGGER {databaseOwner}{objectQualifier}Blog_Entries_Temp_Update_Trigger ON {databaseOwner}{objectQualifier}Blog_Entries AFTER UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
UPDATE {databaseOwner}{objectQualifier}Blog_Entries SET PermaLink = REPLACE(PermaLink, 'ñ', '%c3%b1')
END
GO
If you're using SQL Server 2000, simply change the AFTER UPDATE to FOR UPDATE (FOR is an older synonym for AFTER). Also, if you have multiple characters that need to be replaced, include one UPDATE statement for each special character.
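For sites with several special characters, writing each UPDATE by hand gets tedious. The following Python sketch (an illustration only; build_update_statements and utf8_escape are hypothetical names, not part of DotNetNuke) generates one statement per character in the same form as the script above, which you can then paste into the trigger body:

```python
def utf8_escape(ch):
    """Return the lowercase percent-encoded UTF-8 form of one character."""
    return "".join(f"%{b:02x}" for b in ch.encode("utf-8"))


def build_update_statements(chars):
    """Build one UPDATE ... REPLACE statement per distinct character.

    The {databaseOwner} and {objectQualifier} tokens are left as literal text
    so that the Host | SQL page can substitute them.
    """
    statements = []
    for ch in dict.fromkeys(chars):  # drop duplicates, keep original order
        statements.append(
            "UPDATE {databaseOwner}{objectQualifier}Blog_Entries "
            f"SET PermaLink = REPLACE(PermaLink, '{ch}', '{utf8_escape(ch)}')"
        )
    return statements


for statement in build_update_statements("ñé"):
    print(statement)
```

This assumes the characters you pass in are the ones that actually appear in your page names; it does not inspect the database.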
To create this trigger, simply log in with a host level account and execute your SQL from the Host | SQL page. If you're not familiar with SQL, you should back up your database before executing this SQL.
To remove the trigger once the update comes out, simply execute the following:
DROP TRIGGER {databaseOwner}{objectQualifier}Blog_Entries_Temp_Update_Trigger
HTH,
Don
[Update]
After a great conversation last night with Néstor Sánchez, we discovered some interesting things about UTF-8 encoding of URLs. While I think we came up with more questions than answers, it's interesting to know what we found. And my guess is that it may help others who are facing similar issues with UTF-8 encoding anomalies.
In short, we found that UTF-8 encoding of URLs seems to be handled differently in different environments. While more testing is required, it appears that the regional settings may have an effect on how URLs are handled by both the client and the web server.
Using Fiddler, we captured some traffic between the client and the web server and this is what we found:
My Laptop
The interesting thing is that the URL that I clicked on has this format in the database:
http://home.itcrossing.com/Home/Español/tabid/72/EntryId/37/Writing-in-Espanol.aspx
I was aware that the ñ in the TabName would be UTF-8 encoded by the browser. Most browsers do this since Unicode characters aren't fully supported in URLs yet (although that's changing). What's puzzling to me, though, is that on Néstor's two machines, the traffic looked like this:

The key difference is the middle line. We tried everything we could think of to duplicate this on my machine. We changed the culture to es-BO in web.config as well as the uiCulture, which we set to es. We changed the regional settings of the computer as well as the keyboard and language settings. Some of the differences we noted were as follows:
- I'm running IIS 6 and Néstor was running IIS 7
- I didn't reboot my machine after making changes to the regional settings. Perhaps that was needed.
Key Takeaway
Here's what I take away from this lesson. URLs may or may not be percent-encoded when they are received by the web server, so string comparisons that involve the URL should not assume either form. In most cases, the URL will contain percent-encoded UTF-8 sequences. To perform a safe comparison in .NET, decode the URL first with the UrlDecode method found in System.Web.HttpUtility.
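To make the idea concrete, here is a small sketch of "decode before you compare". It is written in Python, with urllib.parse.unquote standing in for HttpUtility.UrlDecode, and the function name same_url_path is invented for this example. One difference to be aware of: UrlDecode also turns + into a space, while unquote does not.

```python
from urllib.parse import unquote


def same_url_path(a, b):
    """Compare two URL paths after percent-decoding both sides once.

    Hypothetical helper: either argument may arrive raw or percent-encoded,
    in upper- or lowercase hex.
    """
    return unquote(a) == unquote(b)


stored = "/Home/Español/tabid/72/EntryId/37/Writing-in-Espanol.aspx"
requested = "/Home/Espa%c3%b1ol/tabid/72/EntryId/37/Writing-in-Espanol.aspx"

print(stored == requested)               # naive comparison
print(same_url_path(stored, requested))  # comparison after decoding
```

Decoding only once is deliberate here. A double-encoded URL still won't match after a single decode, which is usually what you want, since it signals that something upstream encoded the URL twice.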
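By the way, I had a hard time figuring out what the %c3%83%c2%b1 was in the third URL. This is due to double UTF-8 encoding: the two bytes of the already-encoded ñ (C3 and B1) were each treated as a separate character and encoded again. %c3%83 is the UTF-8 encoded form of the character with code C3 (Ã), and %c2%b1 is the UTF-8 encoded form of the character with code B1 (±).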
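The double encoding is easy to reproduce. The Python sketch below assumes the intermediate step read the UTF-8 bytes as Latin-1 (we never confirmed which component did this; Windows-1252 would give the same result for these two bytes), and percent_encode_bytes is a hypothetical helper:

```python
def percent_encode_bytes(data):
    """Format raw bytes as lowercase %xx escapes."""
    return "".join(f"%{b:02x}" for b in data)


once = "ñ".encode("utf-8")         # the correct UTF-8 bytes for ñ
misread = once.decode("latin-1")   # each byte mistaken for its own character
twice = misread.encode("utf-8")    # those two characters encoded again

print(percent_encode_bytes(once))
print(misread)
print(percent_encode_bytes(twice))

# Undoing the damage means reversing the steps in order.
repaired = twice.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)
```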