Fortify PHP Webapps against Cross-Site-Scripting

I have been working since 2007 to distill a simple method that protects web applications against insecure user input, "Cross-Site-Scripting" in particular. After reading several books by well-known security specialists, one thing seems certain: "white list" filtering is the only way to get a solid and durable solution.

It was here (in German) that I first found the code that became the starting point for further development. It was this piece of Perl code:

sub saveHTML ($) {
    my $str = $_[0];
    # Replace every character outside the white list with a hex entity
    $str =~ s/([^a-zA-Z0-9_.-])/sprintf('&#x%04x;',ord($1))/ge;
    return($str);
}

Every character that is not an ASCII letter, a digit, or one of a few punctuation characters is converted into a hexadecimal Unicode entity. Obviously, there is no support for multi-byte characters, so Chinese characters, for example, will not be converted correctly. I ported this functionality to PHP and added multi-byte support. The result is a function that converts every single- or multi-byte character (except the few on the white list) into a hexadecimal Unicode entity. Fortunately, all major browsers can handle this output, even in input fields and forms, so there is no need to care about what happens to the converted input later on. Just use it as if it were plain text. Now let’s take a look at the PHP code:

<?php
/**
 * Returns the Unicode code point of the first UTF-8 character in $c,
 * or false if the lead byte is not a valid UTF-8 lead byte.
 */
function uniord__($c) {
    $h = ord($c[0]); // the old $c{0} offset syntax no longer works in PHP 8
    if ($h <= 0x7F) {
        return $h;
    } else if ($h < 0xC2) {
        return false;
    } else if ($h <= 0xDF) {
        return ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
    } else if ($h <= 0xEF) {
        return ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6
                                 | (ord($c[2]) & 0x3F);
    } else if ($h <= 0xF4) {
        return ($h & 0x0F) << 18 | (ord($c[1]) & 0x3F) << 12
                                 | (ord($c[2]) & 0x3F) << 6
                                 | (ord($c[3]) & 0x3F);
    } else {
        return false;
    }
}

/**
 * Secures input string against XSS attacks.
 * Return value can be sent to the browser securely.
 * Supports single- and multi-byte UTF-8.
 */
function SEQ_OUTPUT($string_ = '') {
    $string = mb_convert_encoding($string_, "UTF-8", "7bit, UTF-7, UTF-8, UTF-16, ISO-8859-1, ASCII");

    $output = '';

    // Walk the string byte by byte. strlen()/substr() are used instead of
    // mb_strlen()/mb_substr() because the branches below key on the UTF-8
    // lead byte; continuation bytes match no branch and are skipped.
    for ($i = 0; $i < strlen($string); $i++) {
        if (preg_match('/([a-zA-Z0-9_.-])/', $string[$i])) {
            $output .= $string[$i];
            continue;
        }
        $byte = ord($string[$i]);
        if ($byte <= 127) {
            $length = 1;
            $output .= sprintf("&#x%04s;", dechex(uniord__(substr($string, $i, $length))));
        } else if ($byte >= 194 && $byte <= 223) {
            $length = 2;
            $output .= sprintf("&#x%04s;", dechex(uniord__(substr($string, $i, $length))));
        } else if ($byte >= 224 && $byte <= 239) {
            $length = 3;
            $output .= sprintf("&#x%04s;", dechex(uniord__(substr($string, $i, $length))));
        } else if ($byte >= 240 && $byte <= 244) {
            $length = 4;
            $output .= sprintf("&#x%04s;", dechex(uniord__(substr($string, $i, $length))));
        }
    }

    return $output;
}
?>
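To make the bit arithmetic in uniord__ concrete, here is a small worked example for a single three-byte character. It is only an illustration, not part of the original function: the variable names are made up, and the bytes are those of "プ" (U+30D7), the first character of the sample string used in this post.

```php
<?php
// Worked example: decode the three UTF-8 bytes of "プ" (U+30D7) by hand.
$bytes = "\xE3\x83\x97";

// Lead byte 1110xxxx carries 4 payload bits, each continuation byte 10xxxxxx carries 6.
$lead  = ord($bytes[0]) & 0x0F; // 0xE3 -> 0x03
$cont1 = ord($bytes[1]) & 0x3F; // 0x83 -> 0x03
$cont2 = ord($bytes[2]) & 0x3F; // 0x97 -> 0x17

// Same combination as the "<= 0xEF" branch of uniord__().
$codePoint = ($lead << 12) | ($cont1 << 6) | $cont2;

printf("U+%04X\n", $codePoint);   // prints U+30D7
printf("&#x%04x;\n", $codePoint); // prints &#x30d7;
```

The second line of output is the entity that ends up in the page for this character.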

Use it as it is: include it in your code and call it like this:

echo SEQ_OUTPUT('プライバシー <script>alert(3)</script>');
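For comparison, the Perl one-liner quoted earlier maps quite directly onto a regular-expression callback in current PHP. The following is a minimal sketch under some assumptions, not the SSEQ-LIB implementation: the function name encode_non_whitelisted is hypothetical, the input is assumed to be UTF-8 already, and it needs PHP 7.2 or later with the mbstring extension for mb_ord().

```php
<?php
// Hypothetical helper, not part of the original article or of SSEQ-LIB.
// Assumes UTF-8 input and PHP 7.2+ with mbstring (for mb_ord()).
function encode_non_whitelisted(string $input): string
{
    $encoded = preg_replace_callback(
        '/[^a-zA-Z0-9_.-]/u', // the /u modifier makes "." of the match a whole UTF-8 character
        function (array $m): string {
            // Emit the code point as a zero-padded hexadecimal entity.
            return sprintf('&#x%04x;', mb_ord($m[0], 'UTF-8'));
        },
        $input
    );

    // With /u, preg_replace_callback() returns null on invalid UTF-8.
    // Fail closed and return an empty string in that case.
    return $encoded === null ? '' : $encoded;
}

echo encode_non_whitelisted('<b>プ</b>'), "\n";
// prints &#x003c;b&#x003e;&#x30d7;&#x003c;&#x002f;b&#x003e;
```

Unlike SEQ_OUTPUT, this sketch does not try to detect or convert the input encoding, so it is only a fit where the input is known to be UTF-8.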

Alternatively, you may use the complete security library SSEQ-LIB, which includes this function and covers further security aspects. After including SSEQ-LIB in your code, the usage is the same as shown above. So let’s see how it actually works:

You may not be able to see the Japanese characters if your operating system does not support them. Visit www.google.jp to make sure they display correctly for you.
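The statement that browsers turn the entities back into the original text, even inside form fields, can be checked roughly on the server side. The sketch below uses PHP's html_entity_decode() as a stand-in for the browser's decoding step, which is an assumption, since a real HTML parser is not identical to it. The sample string and the field name q are made up for illustration.

```php
<?php
// An encoded value of the kind SEQ_OUTPUT produces (written out by hand here).
$encoded = '&#x003c;script&#x003e;&#x30d7;';

// Placing it in an attribute adds no raw "<", ">" or quote characters to the markup.
$field = '<input type="text" name="q" value="' . $encoded . '">';
echo $field, "\n";

// Approximation of what the browser shows inside the field after decoding.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n"; // prints <script>プ
```

The decoded text only exists as the displayed value of the field; the markup sent to the browser contains the entities alone.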
