Avoid problems with special characters in MySQL, PHP and HTML by using the UTF-8 character set in 6 steps

I used to run into problems with German umlauts in my home projects all the time.

I have now found a way to avoid problems with special characters:

1. UTF-8 encoding of all files

All files must be saved in the UTF-8 character set without a BOM (byte order mark).
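A quick way to check a file for a BOM is to look at its first three bytes. The following is only a minimal sketch; the function names hasUtf8Bom and stripUtf8Bom are made up for this example and are not part of PHP.

```php
<?php
// The UTF-8 byte order mark is the byte sequence EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the given bytes start with a UTF-8 BOM.
function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the bytes without a leading BOM.
function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

You could call it as hasUtf8Bom(file_get_contents('index.php')), where index.php stands for any file of your project. A BOM in front of the opening PHP tag is sent to the browser as output, which is one reason why header() calls can fail later on.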

2. UTF-8 encoding in the HTML Document

In the document head, right after the opening <head> tag, add the following meta element:

<meta charset="utf-8">

3. Set Databases to UTF-8

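When you create a database, select the collation utf8_unicode_ci. For existing tables, this can be changed with phpMyAdmin. If individual columns use a different collation, they must be changed as well, under Structure -> Change. Note that MySQL's utf8 stores at most three bytes per character; if you also need characters outside the Basic Multilingual Plane, such as emoji, use utf8mb4 with utf8mb4_unicode_ci instead.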

For a way to convert all existing tables and columns to Unicode automatically, see: Convert all MySQL tables and columns to UTF8 and InnoDB.
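If you would rather do the conversion by hand, the statements involved are short. The sketch below is not the method from the linked article; it only builds the SQL as strings. The function name buildUtf8ConversionSql and the example names shop, customers and orders are assumptions for illustration.

```php
<?php
// Hypothetical helper: builds the MySQL statements that switch a database
// and the given tables to utf8 / utf8_unicode_ci. It only returns strings;
// running them (for example with $mysqli->query()) is left to the caller.
function buildUtf8ConversionSql(string $database, array $tables): array
{
    // Backticks quote identifiers in MySQL; a literal backtick is doubled.
    $quote = function (string $name): string {
        return '`' . str_replace('`', '``', $name) . '`';
    };

    $sql = ['ALTER DATABASE ' . $quote($database)
        . ' CHARACTER SET utf8 COLLATE utf8_unicode_ci'];

    foreach ($tables as $table) {
        // CONVERT TO also converts the text columns of the table.
        $sql[] = 'ALTER TABLE ' . $quote($table)
            . ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci';
    }

    return $sql;
}

// Print the statements for an example database with two tables.
foreach (buildUtf8ConversionSql('shop', ['customers', 'orders']) as $statement) {
    echo $statement, ";\n";
}
```

Because CONVERT TO rewrites the stored data, make a backup before running such statements on a real database.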

4. Set Database connection properly

Once a database connection has been established, its character set must be set to UTF-8.


$mysqli = new mysqli($host, $user, $pass, $database);
// set_charset() sets the connection encoding and also tells the client library
// about it; a plain "SET NAMES" query only changes the server side.
$mysqli->set_charset('utf8');

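Calling set_charset() is very important: unlike a plain SET NAMES query, it also informs the client library of the encoding, and without that the real_escape_string() function cannot reliably protect you from SQL injection. See http://php.net/manual/en/mysqli.real-escape-string.php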

5. Send UTF-8 Header with PHP

if (!headers_sent()) {
    // Must run before any output is sent to the browser.
    header('Content-Type: text/html; charset=utf-8');
}

6. Exceptions

According to Selfhtml.org (in German), there are a few special characters that still need to be escaped. The less-than sign < must be written as the character reference &#60;, because otherwise it is read as the start of a tag. An ampersand & that directly precedes other characters must be written as &#38;, because otherwise it is interpreted as the beginning of a character reference. A standalone & does not need to be escaped.
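In PHP you do not have to replace these characters by hand, because the built-in htmlspecialchars() function does it for you. The wrapper name escapeHtml below is only an assumption for this sketch. Note that htmlspecialchars() emits the named references &lt; and &amp; rather than the numeric ones; browsers treat both forms the same way.

```php
<?php
// Hypothetical wrapper around PHP's htmlspecialchars().
// ENT_QUOTES also escapes single and double quotes,
// ENT_SUBSTITUTE replaces invalid UTF-8 sequences instead of returning "".
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Umlauts are left untouched; only the markup characters are replaced.
echo escapeHtml('Müller & Söhne <b>'), "\n";
```

The function escapes every ampersand, including standalone ones. That is harmless and saves you from having to decide case by case.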

(7. Web server configuration)

If none of these steps help, the cause can only be the configuration of the web server, PHP or MySQL. A freshly installed Apache web server with PHP and a MySQL database should not cause any problems. A good description of which settings to check and adjust if necessary can be found here: http://noqqe.de/blog/2011/02/24/charset-utf8-fur-apache-mysql-debian-und-wordpress/ (in German). This assumes, of course, that you have access to the configuration files, which is usually not the case with ordinary hosting for a small site. You may be able to change the settings in the administration interface of your hosting provider, or you can influence the character encoding via .htaccess files, see here: http://www.w3.org/International/questions/qa-htaccess-charset.de (in German).
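Even without access to the configuration files, you can at least ask PHP what it is currently set to. This is only a small diagnostic sketch; the function name encodingReport is made up, and the mbstring value appears only if that extension is installed.

```php
<?php
// Hypothetical helper: collects encoding-related PHP settings for a quick check.
function encodingReport(): array
{
    // default_charset is the charset PHP puts into its default Content-Type header.
    $report = ['default_charset' => ini_get('default_charset')];

    // mb_internal_encoding() only exists when the mbstring extension is loaded.
    if (function_exists('mb_internal_encoding')) {
        $report['mb_internal_encoding'] = mb_internal_encoding();
    }

    return $report;
}

print_r(encodingReport());
```

If default_charset is not UTF-8, the header() call from step 5 still overrides it for the pages that send it.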