HTML Special Characters
HTML special characters are characters outside of the core Latin alphabet that can be represented in HTML, such as <, >, and ©. They have special meaning within HTML and are rendered as a particular symbol, such as an arrow or accent on a letter. Each special character is known as an entity, and each entity has a name and a number.
Special characters are divided into a number of different categories. These categories are ASCII characters, ISO-8859-1 characters, ISO-8859-1 symbols, mathematical symbols, Greek characters, and miscellaneous character entities.
The way special characters work
HTML special characters are assigned an entity name and an entity number, both of which can be used to render the character in an HTML document. These codes and names have a specific format, which is generally represented as
&#xxxx; for numbers and
&xxxx; for names, where xxxx is either a name or a number. Numbers are valid in both decimal and hexadecimal format.
For example, the code for a right-facing arrow, also known as the 'greater than' sign, is
> for the entity number and
> for the entity name. Inserting the code for either of those into an HTML document will render a '>' when displayed to the user.
HTML special characters may be entered directly into an HTML document, but their correct display and interpretation are not guaranteed and will depend on the character encoding used by the HTML document. This is commonly done with keyboards that feature letters with accents, for example. It is much easier and quicker for a user to press a button on their keyboard to add a letter with an accent than to use the correct entity code. This is the same for arrow characters and other common sign characters, like ampersand and the @ sign.
It is not advisable to enter special characters directly into an HTML document, for reasons that will be listed below. If you do, it is important that the correct character encoding format is identified. Attempts will be made to interpret a document with no specified encoding format, but it is always best to be explicit and specify the correct encoding within the document itself. Just because a document displays correctly without an encoding format specified does not mean it will display correctly for all users, and this can adversely impact SEO, as will be mentioned below.
Common HTML special characters
|Character||Number code||Entity code|
These are just examples, the full list of special characters can be found at w3schools.com, for example.
Benefits and drawbacks
HTML special characters were introduced with HTML 4.0 in December 1997. They provided a standardized way of representing characters outside the commonly used ASCII encoding. In particular, they were useful for providing accent characters to a lot of European languages that use a slightly modified version of the Latin alphabet.
Another benefit of HTML special characters is their use within code. HTML relies heavily on both the less than and greater than arrow characters as part of the markup language. To avoid having these signs interpreted as HTML, special characters can be used to ensure they display correctly to the end-user.
While it is possible to use escape characters to delimit which arrows are code and which are part of the document, they can easily clutter up code. To avoid confusion and rendering errors, users can use an HTML entity code to represent a letter they wish to display as part of the document and not part of the markup language.
A problem with HTML special characters is that they are still limited and mainly stick to representing characters within the Greek and Latin alphabets, plus additional accents for languages using those alphabets. Languages with completely different writing systems, such as Chinese, cannot be represented using HTML special characters. As such, Unicode is generally preferred when encoding special characters and signs, as it not only represents all of the accents, arrows, and letters found in HTML special characters but also includes significantly more.
Another drawback is that heavy usage of entity names can bloat a document considerably. This is not a problem with sparing usage, but for languages with lots of letters with accents or signs, the extensive use of HTML special characters can significantly increase the overall size of a document.
Importance of HTML special characters for search engine optimization
HTML special characters are important for SEO because they allow for both the correct display of characters to the user and the correct interpretation of them by search engines. Using non-standard character encodings for letters and accents can result in them being incorrectly interpreted, or not at all, by search engines and result in making content harder to index and be found.
Search engines are generally optimized for Unicode, so it is recommended to use Unicode encoding where possible. This is because all the signs, accents, and characters supported by HTML special characters are also supported by Unicode, in addition to many more. Unicode has become the industry standard and, therefore, the most widely supported character encoding format.
Nevertheless, HTML special characters can still be used in an SEO-friendly way. The most important thing to do is specify the character encoding format to ensure the correct display and interpretation of characters for search results. It is recommended to do this within either the Content-type header of a document or by specifying it within an HTML meta tag in the header of the document.