Prefer UTF-8 in all layers

If an application displays text with strange, unexpected characters, the likely cause is an incorrect character encoding.

Character encodings control how tools translate raw bytes into text. UTF-8 is an excellent default character encoding: it can represent characters in almost all languages, and it does so efficiently.
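As a rough illustration of how the same bytes become different text under different encodings, here is a minimal sketch using only the JDK. The class and method names are hypothetical, chosen for this example.

```java
import java.nio.charset.StandardCharsets;

/** Shows how one byte sequence turns into different text under different encodings. */
final class EncodingMismatchDemo {

    /** Encodes text as UTF-8 bytes, then decodes those bytes with ISO-8859-1 (the wrong encoding). */
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with the same encoding, so the text survives intact. */
    static String roundTripUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file encoding does not matter
        String original = "caf\u00e9";

        // The two UTF-8 bytes of 'é' are read as two separate characters: 5 instead of 4
        System.out.println(decodeUtf8BytesAsLatin1(original).length());

        // Same encoding on both sides: prints true
        System.out.println(roundTripUtf8(original).equals(original));
    }
}
```

The mismatched case is the kind of output a user sees as "strange, unexpected characters": nothing is lost at the byte level, but the bytes are interpreted with the wrong table.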

Plain ASCII should usually be avoided, since it is so inflexible. Likewise, the default encoding used by the Servlet specification is ISO-8859-1, which is restricted to Western European languages.
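To make the difference in coverage concrete, the following sketch asks each charset whether it can encode a few sample strings. It uses only the JDK; the class and method names are hypothetical, and the sample strings are arbitrary choices for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Compares which text ASCII, ISO-8859-1, and UTF-8 are able to represent. */
final class CharsetCoverageDemo {

    /** Returns true if every character of the text can be encoded in the given charset. */
    static boolean canRepresent(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String accented = "\u00e9";       // e with acute accent
        String euro = "\u20ac";           // euro sign
        String japanese = "\u65e5\u672c"; // two kanji characters

        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset charset : charsets) {
            System.out.println(charset.name()
                + " accented=" + canRepresent(charset, accented)
                + " euro=" + canRepresent(charset, euro)
                + " japanese=" + canRepresent(charset, japanese));
        }
    }
}
```

Under these assumptions, only UTF-8 accepts all three samples; ISO-8859-1 handles the accented letter but neither the euro sign nor the Japanese text.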

In a web application, character encodings are used in three separate areas - the browser, the server, and the database. To work together correctly, the same character encoding must be used in each of these areas (see this excellent article for further discussion).

Browser
The browser uses an encoding to present text, and to send request parameters to the server. The request parameter encoding will be the same as the page encoding, unless instructed otherwise. A JSP can instruct the browser on the desired encoding by using a page directive, such as:

<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>

META tags may be used instead:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

As usual, such a policy should be defined in one place, if possible (for example, in a template page).

Server
In principle, the browser should tell the server which character encoding it used for the request, namely the one it already received from the server. In practice, however, browsers do not do this reliably. So, even though a JSP has indicated the character encoding, it is likely a good practice to "reset" the character encoding of the request, using, for example:

request.setCharacterEncoding("UTF-8");

A controller could perform this for every incoming request, perhaps using a value configured in web.xml. This method must be called early in processing, before any parameter values are retrieved; called later, it has no effect.
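To see why the request encoding matters, the following sketch imitates the round trip with plain JDK classes (Java 10 or later). It is not servlet code: URLEncoder stands in for the browser submitting a form from a UTF-8 page, and URLDecoder stands in for the container decoding the parameter. The class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Imitates a browser sending a UTF-8 form value and a server decoding it. */
final class RequestParameterDecodingDemo {

    /** Percent-encodes a form value as a browser would for a UTF-8 page. */
    static String encodeAsBrowserWould(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** Decodes a percent-encoded value with the given charset, standing in for the container. */
    static String decodeAsServerWould(String encoded, Charset requestEncoding) {
        return URLDecoder.decode(encoded, requestEncoding);
    }

    public static void main(String[] args) {
        String submitted = "\u00e9"; // e with acute accent
        String onTheWire = encodeAsBrowserWould(submitted);

        // Two percent-encoded bytes for a single character
        System.out.println(onTheWire);

        // Decoding with the servlet default yields two wrong characters: prints false
        System.out.println(
            decodeAsServerWould(onTheWire, StandardCharsets.ISO_8859_1).equals(submitted));

        // Decoding with the encoding the page actually used: prints true
        System.out.println(
            decodeAsServerWould(onTheWire, StandardCharsets.UTF_8).equals(submitted));
    }
}
```

In a real application the decoding step is done by the container, which is why the request encoding has to be set before the first parameter is read.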

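For reference, the servlet API has these methods for managing character encoding: ServletRequest.getCharacterEncoding and ServletRequest.setCharacterEncoding for the request, and ServletResponse.getCharacterEncoding, ServletResponse.setCharacterEncoding, ServletResponse.setContentType, and ServletResponse.setLocale for the response.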

Database
The database has a character encoding as well. Please consult your database documentation for further information.

© 2010 Hirondelle Systems
Individual code snippets can be used under this BSD license - Last updated on June 5, 2010.