preg expression regular expression details and introduction

Useful regex examples
Reference:

http://www.catswhocode.com/blog/15-php-regular-expressions-for-web-developers

REGULAR EXPRESSIONS SYNTAX

Regular Expression	Will match…
foo	The string “foo”
^foo	“foo” at the start of a string
foo$	“foo” at the end of a string
^foo$	A string that consists only of “foo”
[abc]	a, b, or c
[a-z]	Any lowercase letter
[^A-Z]	Any character that is not an uppercase letter
(gif|jpg)	Either “gif” or “jpg”
[a-z]+	One or more lowercase letters
[0-9.-]	Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$	Any word of at least one letter, digit, or _
([wx])([yz])	wy, wz, xy, or xz
[^A-Za-z0-9]	Any symbol (not a digit or a letter)
([A-Z]{3}|[0-9]{4})	Three uppercase letters or four digits
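
The table entries can be tried out directly with preg_match(). The snippet below is a minimal sketch; the sample subject strings are made up for illustration and are not part of the original table.

```php
<?php
// Each pattern from the table needs delimiters (here "/") before PHP can use it.
$checks = [
    ['/^foo/',                'foobar'],     // "foo" at the start
    ['/foo$/',                'barfoo'],     // "foo" at the end
    ['/^foo$/',               'foo'],        // only "foo"
    ['/(gif|jpg)/',           'photo.jpg'],  // either alternative
    ['/^[a-zA-Z0-9_]{1,}$/',  'user_42'],    // one "word"
    ['/([A-Z]{3}|[0-9]{4})/', 'ABC'],        // three letters or four digits
];

foreach ($checks as [$pattern, $subject]) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $result = preg_match($pattern, $subject);
    echo $pattern, ' vs "', $subject, '": ', $result === 1 ? 'match' : 'no match', "\n";
}
```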

PHP REGULAR EXPRESSION FUNCTIONS

Function	Description
preg_match()	Searches a string for the pattern and stops at the first match. Returns 1 if the pattern matches, 0 if it does not, and false on error.
preg_match_all()	Matches all occurrences of the pattern in the string and returns the number of matches.
preg_replace()	Searches a string (or array of strings) for the pattern and replaces each match. It takes the place of ereg_replace(), which was removed in PHP 7.
preg_split()	Splits a string into an array, using the pattern as the delimiter. It takes the place of split(), which was removed in PHP 7.
preg_grep()	Searches all elements of the input array and returns the elements that match the pattern.
preg_quote()	Escapes characters that have a special meaning in regular expressions.
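
As a rough illustration of the functions in the table, here is a short sketch. The sample strings and variable names are invented for the example.

```php
<?php
$text = 'Order 12 ships on 2024-05-01; order 7 ships later.';

// preg_match_all(): collect every run of digits.
preg_match_all('/\d+/', $text, $found);

// preg_split(): split on commas with optional surrounding whitespace.
$parts = preg_split('/\s*,\s*/', 'red , green,blue');

// preg_grep(): keep only the array entries that match (original keys are preserved).
$images = preg_grep('/\.(gif|jpg)$/i', ['a.gif', 'b.txt', 'c.JPG']);

// preg_quote(): escape characters that are special in a pattern.
$literal = preg_quote('1+1=2?', '/');

print_r($found[0]);
print_r($parts);
print_r($images);
echo $literal, "\n";
```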

<?php
// Some regex functions:
// For validating e-mail syntax, filter_var() is a better solution than a hand-written regex.
if (filter_var('test+email@fexample.com', FILTER_VALIDATE_EMAIL)) {
    echo "Your email is ok.";
} else {
    echo "Wrong email address format.";
}

// Validate a username: alphanumeric characters (a-z, A-Z, 0-9) and underscores, 5 to 20 characters long.
// Change the minimum and maximum in {5,20} to any numbers you like.
$username = "user_name12";
if (preg_match('/^[a-z\d_]{5,20}$/i', $username)) {
    echo "Your username is ok.";
} else {
    echo "Wrong username format.";
}

// Validate a URL (scheme, host name, optional port)
$url = "http://komunitasweb.com/";
if (preg_match('/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i', $url)) {
    echo "Your url is ok.";
} else {
    echo "Wrong url.";
}

// Extract the host name from a URL
$url = "http://komunitasweb.com/index.html";
preg_match('@^(?:http://)?([^/]+)@i', $url, $matches);
$host = $matches[1];
echo $host;

// Highlight a word in the content
$text = "Sample sentence from KomunitasWeb, regex has become popular in web programming. Now we learn regex. According to wikipedia, Regular expressions (abbreviated as regex or regexp, with plural forms regexes, regexps, or regexen) are written in a formal language that can be interpreted by a regular expression processor";
$text = preg_replace("/\b(regex)\b/i", '<span style="background:#5fc9f6">\1</span>', $text);
echo $text;
?>

Get all image URLs from an HTML document

// $data holds the HTML source of the document.
$images = array();
// Capture the value of every src="..." (or src='...') attribute.
preg_match_all('/(?:img|src)=["\']([^"\'>]+)/i', $data, $media);
unset($data);
foreach ($media[1] as $url)
{
 $info = pathinfo($url);
 // Keep only URLs whose extension is a known image type.
 if (isset($info['extension'])
  && in_array(strtolower($info['extension']), array('jpg', 'jpeg', 'gif', 'png')))
 {
  array_push($images, $url);
 }
}

MATCHING AN XML/HTML TAG

function get_tag( $tag, $xml ) {
  $tag = preg_quote($tag);
  preg_match_all('{<'.$tag.'[^>]*>(.*?)</'.$tag.'>}',
                   $xml,
                   $matches,
                   PREG_PATTERN_ORDER);

  return $matches[1];
}
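
A usage sketch for this kind of tag matching follows. extract_tag_contents() is a hypothetical stand-alone variant written for this example (it adds the s modifier so that "." also spans newlines), and the sample XML is invented. Regex-based tag matching like this does not cope with nested tags of the same name.

```php
<?php
// Hypothetical helper mirroring get_tag(); the "s" modifier lets "." span newlines.
function extract_tag_contents(string $tag, string $xml): array
{
    $tag = preg_quote($tag, '#');
    preg_match_all('#<' . $tag . '\b[^>]*>(.*?)</' . $tag . '>#s', $xml, $matches);

    return $matches[1]; // contents of the first capture group, one entry per tag
}

$xml = '<item id="1">first</item><item>second' . "\n" . 'line</item>';
print_r(extract_tag_contents('item', $xml));
```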


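FIND PAGE TITLE

// Opening an HTTP URL with fopen() requires allow_url_fopen to be enabled.
$page = '';
$fp = fopen("http://www.catswhocode.com/blog", "r");
while (!feof($fp)) {
    $page .= fgets($fp, 4096);
}
fclose($fp);

// eregi() was removed in PHP 7; preg_match() with the i modifier replaces it.
if (preg_match('/<title>(.*?)<\/title>/is', $page, $regs)) {
    echo $regs[1];
}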

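CHECKING PASSWORD COMPLEXITY

This pattern accepts a string of at least six characters (letters, digits, hyphen or underscore) that contains at least one uppercase letter, one lowercase letter and one digit. \A and \z anchor the start and end of the string.

'/\A(?=[-_a-zA-Z0-9]*?[A-Z])(?=[-_a-zA-Z0-9]*?[a-z])(?=[-_a-zA-Z0-9]*?[0-9])[-_a-zA-Z0-9]{6,}\z/'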
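
A sketch of applying such a pattern, written here with the \A and \z anchors and / delimiters. The candidate passwords are made up for the example.

```php
<?php
// Assumed policy: 6+ characters from [-_a-zA-Z0-9], with at least one
// uppercase letter, one lowercase letter and one digit.
$pattern = '/\A(?=[-_a-zA-Z0-9]*?[A-Z])(?=[-_a-zA-Z0-9]*?[a-z])(?=[-_a-zA-Z0-9]*?[0-9])[-_a-zA-Z0-9]{6,}\z/';

foreach (['Passw0rd', 'password', 'Ab1', 'Pass w0rd'] as $candidate) {
    $ok = preg_match($pattern, $candidate) === 1;
    echo $candidate, ': ', $ok ? 'accepted' : 'rejected', "\n";
}
```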




http://www.regular-expressions.info/replacetutorial.html

http://www.noupe.com/php/php-regular-expressions.html (important)

Operator Description
^ The circumflex anchors the match at the start of the string (or of each line in multiline mode); as the first character inside square brackets it negates the class
$ The dollar sign anchors the match at the end of the string
. The period matches any single character except a newline
? Matches the preceding element zero or one times
+ Matches the preceding element one or more times
* Matches the preceding element zero or more times
| Boolean OR
- Inside square brackets, denotes a range of characters
() Groups pattern elements together
[] Matches any single character listed between the square brackets
{min,max} Matches the preceding element between min and max times
\d Matches any single digit
\D Matches any single non-digit character
\w Matches any alphanumeric character or the underscore (_)
\W Matches any character that is neither alphanumeric nor the underscore
\s Matches any whitespace character
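
The shorthand classes \d, \w, \W and \s from the table can be seen in action in the following sketch. The sample line is invented for the example.

```php
<?php
$line = "Invoice 2041:\t3 items";

preg_match('/\d+/', $line, $digits);          // first run of digits
preg_match('/\w+/', $line, $word);            // first run of word characters
$tokens   = preg_split('/\s+/', $line);       // split on any run of whitespace
$stripped = preg_replace('/\W/', '', $line);  // drop everything that is not a word character

echo $digits[0], "\n";
echo $word[0], "\n";
print_r($tokens);
echo $stripped, "\n";
```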




Example Description
'/hello/' Matches the text hello anywhere in the string
'/^hello/' Matches hello at the start of a string. Possible matches are hello or helloworld, but not worldhello
'/hello$/' Matches hello at the end of a string
'/he.o/' Matches he, then any one character, then o. Possible matches are helo or heyo, but not hello
'/he?llo/' Matches either hllo or hello (the e is optional)
'/hello+/' Matches hell followed by one or more o characters, e.g. hello or hellooo
'/he*llo/' Matches h, zero or more e characters, then llo, e.g. hllo, hello or heello
'/hello|world/' Matches either hello or world
'/[A-Z]/' With the hyphen inside square brackets, matches any single uppercase character from A to Z, e.g. A, B, C…
'/[abc]/' Matches any single character a, b or c
'/abc{1}/' Matches ab followed by exactly one c. Unanchored, it still finds abc inside abcc; use '/^abc{1}$/' to accept only abc
'/abc{1,}/' Matches ab followed by one or more c characters, e.g. abc or abcc
'/abc{2,4}/' Matches ab followed by two to four c characters, e.g. abcc, abccc or abcccc, but not abc
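
Quantifiers apply only to the element directly before them, which is easy to get wrong. The sketch below anchors each pattern with ^ and $ so that it has to describe the whole string; the test strings and expected values are chosen for this example.

```php
<?php
// Each entry: pattern, then subject => expected preg_match() result.
$cases = [
    ['/^he?llo$/',   ['hello' => 1, 'hllo' => 1, 'llo' => 0]],
    ['/^hello+$/',   ['hello' => 1, 'hellooo' => 1, 'hellohello' => 0]],
    ['/^he*llo$/',   ['hllo' => 1, 'heello' => 1, 'hehello' => 0]],
    ['/^abc{2,4}$/', ['abcc' => 1, 'abcccc' => 1, 'abc' => 0]],
];

foreach ($cases as [$pattern, $expectations]) {
    foreach ($expectations as $subject => $expected) {
        $actual = preg_match($pattern, $subject);
        printf("%-14s %-11s %s\n", $pattern, $subject, $actual === $expected ? 'as expected' : 'unexpected');
    }
}
```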



    preg_filter – performs a regular expression search and replace
    preg_grep – returns array entries that match a pattern
    preg_last_error – returns the error code of the last PCRE regex execution
    preg_match – perform a regular expression match
    preg_match_all – perform a global regular expression match
    preg_quote – quote regular expression characters
    preg_replace – perform a regular expression search and replace
    preg_replace_callback – perform a regular expression search and replace using a callback
    preg_split – split string by a regular expression
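
Two of the less familiar entries in this list, preg_replace_callback and preg_last_error, could be used roughly as follows. The sample string is invented for the example.

```php
<?php
// preg_replace_callback(): compute each replacement in PHP code.
$prices  = 'apple 3, pear 10';
$doubled = preg_replace_callback(
    '/\d+/',
    function (array $m) {
        // $m[0] holds the text of the current match.
        return (string) ((int) $m[0] * 2);
    },
    $prices
);
echo $doubled, "\n";

// preg_last_error(): returns PREG_NO_ERROR after a successful call.
var_dump(preg_last_error() === PREG_NO_ERROR);
```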


6. Useful Regex Functions

Here are a few PHP functions using regular expressions which you could use on a daily basis.

Validate e-mail. This function will validate a given e-mail address string to see if it has the correct form.

function validate_email($email_address)
{
    // The pattern must stay on one line: without the x modifier, whitespace inside it is matched literally.
    return (bool) preg_match(
        '/^([a-zA-Z0-9])+([a-zA-Z0-9\._-])*@([a-zA-Z0-9_-])+([a-zA-Z0-9\._-]+)+$/',
        $email_address
    );
}

Validate a URL

function validate_url($url)
{
    // The dot between host name labels is escaped so that it matches a literal ".".
    return preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}

Remove repeated words. I often find repeated words in a text, such as this this. This handy function removes such duplicate words.

function remove_duplicate_word($text)
{
    // \1 refers back to the word captured by (\w+); \b keeps the match on whole words.
    return preg_replace('/\b(\w+)(\s+\1\b)+/i', '$1', $text);
}

Validate alpha numeric, dashes, underscores and spaces

function validate_alpha($text)
{
    // The hyphen goes last in the class so that it is literal rather than a range operator.
    return preg_match("/^[A-Za-z0-9_ -]+$/", $text);
}

Validate US ZIP codes

function validate_zip($zip_code)
{
    return preg_match("/^([0-9]{5})(-[0-9]{4})?$/i",$zip_code); 
}

7. Regex Cheat Sheet

Because cheat sheets are cool nowadays, below you can find a PCRE cheat sheet that you can run through quickly anytime you forget something.
Meta Characters  Description
^ Marks the start of a string
$ Marks the end of a string
. Matches any single character (except a newline, by default)
| Boolean OR
() Group elements
[abc] Any one of the listed characters (a, b or c)
[^abc] Any character not listed (every character except a, b or c)
a? Zero or one a character. Equal to a{0,1}
a* Zero or more of a
a+ One or more of a
a{2} Exactly two of a
a{0,5} Up to five of a
a{5,10} Between five and ten of a
\w Any alphanumeric character or underscore. Equal to [A-Za-z0-9_]
\W Any character that is not alphanumeric or underscore. Equal to [^A-Za-z0-9_]
\s Any white-space character
\S Any non-white-space character
\d Any digit. Equal to [0-9]
\D Any non-digit. Equal to [^0-9]
Pattern Modifiers  Description
i Ignore case
m Multiline mode (^ and $ also match at line breaks)
S Extra analysis of pattern
u Pattern and subject are treated as UTF-8
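
The effect of the i, m and u modifiers can be checked with a few one-liners. This is a small sketch with an invented two-line subject; "\u{00E9}" is PHP's escape for the UTF-8 encoded character é.

```php
<?php
$text = "first line\nSecond line";

// i: case-insensitive matching.
$caseInsensitive = preg_match('/second/i', $text);

// m: ^ and $ also match at line breaks inside the subject.
$withoutM = preg_match('/^Second/', $text);
$withM    = preg_match('/^Second/m', $text);

// u: pattern and subject are treated as UTF-8, so "." consumes a whole character.
$eAcute   = "\u{00E9}";                       // two bytes in UTF-8
$withoutU = preg_match('/^.$/', $eAcute);
$withU    = preg_match('/^.$/u', $eAcute);

var_dump($caseInsensitive, $withoutM, $withM, $withoutU, $withU);
```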


 
 
 

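// Simple preg_match: true when $test_string contains a character that is not a letter or digit.
// preg_match requires delimiters (here the / characters) around the pattern.
if (preg_match('/[^0-9A-Za-z]/', $test_string)) {
    // handle the unwanted character here
}

To allow spaces and hyphens as well, add a space and "-" inside the brackets [ ], for example: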

<?php
$name = "A- "; 
if(preg_match("/^[_a-zA-Z0-9- ]+$/", $name))
{
echo "hello";
}
?>
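
Where the hyphen sits inside the brackets matters: it is literal when it is first, last, escaped, or placed directly after a range, but between two other characters it is read as a range. A class such as [_- ] would ask for a range from "_" down to a space, which is out of order. The sketch below shows three safe placements; the test string is invented.

```php
<?php
$safePatterns = [
    '/^[A-Za-z0-9_ -]+$/',   // hyphen last
    '/^[-A-Za-z0-9_ ]+$/',   // hyphen first
    '/^[A-Za-z0-9_\- ]+$/',  // hyphen escaped
];

foreach ($safePatterns as $pattern) {
    // Every variant accepts letters, digits, underscore, space and hyphen.
    var_dump(preg_match($pattern, 'A- b_1'));
}
```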


/^.*(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/

Reads as:
        ^.* From the start, match zero or more of any character
  (?=.{4,}) look ahead: at least 4 characters of any kind follow
(?=.*[0-9]) look ahead: a digit follows somewhere
(?=.*[a-z]) look ahead: a lowercase letter follows somewhere
(?=.*[A-Z]) look ahead: an uppercase letter follows somewhere
        .*$ then match zero or more of anything up to the end

The lookaheads (?=...) only test what follows without consuming characters, so all four conditions are checked from the same position.
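
An alternative to one combined pattern is to test each rule separately, which makes it possible to say which rule failed. The following is a sketch under that assumption; password_problems() and its messages are hypothetical names invented for this example, and the leading ^.* of the combined pattern is dropped because the lookaheads work from the start of the string on their own.

```php
<?php
// Hypothetical helper: the same four rules, tested one at a time.
function password_problems(string $password): array
{
    $rules = [
        'at least 4 characters' => '/.{4,}/',
        'a digit'               => '/[0-9]/',
        'a lowercase letter'    => '/[a-z]/',
        'an uppercase letter'   => '/[A-Z]/',
    ];

    $problems = [];
    foreach ($rules as $label => $rule) {
        if (preg_match($rule, $password) !== 1) {
            $problems[] = 'missing ' . $label;
        }
    }

    return $problems;
}

$combined = '/^(?=.{4,})(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).*$/';
var_dump(preg_match($combined, 'aB3x'));
print_r(password_problems('ab3'));
```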
  
  
  



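$subject = "abcdef";
$pattern = '/^def/';
// PREG_OFFSET_CAPTURE makes every match an array of [matched string, byte offset];
// the final argument (3) is the byte offset at which the search starts.
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
// Prints an empty array: ^ still anchors at the real start of $subject, not at the offset.
print_r($matches);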


// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}

/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}



// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
    "http://www.php.net/index.html", $matches);
$host = $matches[1];

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
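
For the host name step, PHP's built-in parse_url() is an alternative to the first regex. The sketch below swaps it in and keeps the second pattern for the last two host segments.

```php
<?php
// parse_url() is PHP's built-in URL parser; PHP_URL_HOST selects just the host component.
$host = parse_url('http://www.php.net/index.html', PHP_URL_HOST);
echo "host is: $host\n";

// The last two segments of the host name, using the same pattern as above.
if (preg_match('/[^.]+\.[^.]+$/', $host, $matches)) {
    echo "domain name is: {$matches[0]}\n";
}
```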



function is_ipv4($string)
{
    // The regular expression checks for a number between 0 and 255 followed by a dot (repeated 3 times),
    // then another number between 0 and 255 at the end: the format of an IPv4 address.
    return (bool) preg_match('/^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])'.
    '\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]?|[0-9])$/', $string);
}
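
The same check can also be done without a regular expression, using filter_var() with FILTER_VALIDATE_IP. This is a sketch of that alternative; is_ipv4_filter() is a hypothetical name chosen for the example.

```php
<?php
// FILTER_FLAG_IPV4 restricts validation to dotted-quad IPv4 addresses.
function is_ipv4_filter(string $string): bool
{
    return filter_var($string, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(is_ipv4_filter('192.168.0.1'));
var_dump(is_ipv4_filter('256.1.1.1'));
var_dump(is_ipv4_filter('::1'));
```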


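An alternative to preg_match for plain, case-insensitive substring checks (compare the result with !== false, because a match at position 0 is falsy):
stripos($string, 'you') !== false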
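
A short sketch comparing the two approaches on an invented sample string:

```php
<?php
$string = 'How are You today?';

// stripos() returns the position of the first match (possibly 0) or false.
$position = stripos($string, 'you');
if ($position !== false) {
    echo "Found at position $position\n";
}

// The regex equivalent, useful once the search needs more than a fixed substring.
$regexFound = preg_match('/you/i', $string) === 1;
var_dump($regexFound);
```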




Reading files from a directory, skipping "." and ".." (and any other name that starts with a dot)
<?php
$handle = opendir('content/pages/');
$pages = array();
while (false !== ($file = readdir($handle))) {
    // Skip entries whose name starts with a dot.
    if (!preg_match('/^[.]/', $file)) {
        echo "$file<br />";
        array_push($pages, $file);
    }
}
closedir($handle);
echo count($pages);
?>





<?php
// Remove all HTML tags except those named in $exceptions
function removeHtmlTagsWithExceptions($html, $exceptions = null){
    if(is_array($exceptions) && !empty($exceptions))
    {
        foreach($exceptions as $exception)
        {
            $openTagPattern  = '/<(' . $exception . ')(\s.*?)?>/msi';
            $closeTagPattern = '/<\/(' . $exception . ')>/msi';

            $html = preg_replace(
                array($openTagPattern, $closeTagPattern),
                array('||l|\1\2|r||', '||l|/\1|r||'),
                $html
            );
        }
    }

    $html = preg_replace('/<.*?>/msi', '', $html);

    if(is_array($exceptions))
    {
        $html = str_replace('||l|', '<', $html);
        $html = str_replace('|r||', '>', $html);
    }

    return $html;
} 

// example:
print removeHtmlTagsWithExceptions(<<<EOF
<h1>Whatsup?!</h1>
Enjoy <span style="text-color:blue;">that</span> script<br />
<br />
EOF
, array('br'));
?>



// A very loose e-mail address check: text, "@", text, ".", text
$email_address = "phil.taylor@a_domain.tv";

if (preg_match("/^[^@]*@[^@]*\.[^@]*$/", $email_address)) {
    echo "E-mail address";
}
 


// Check a VAT number. Returns 1 on a match, 0 on no match, and -1 for an unsupported country.
function checkVatNumber( $country, $vat_number ) {
    switch($country) {
        case 'Austria':
            $regex = '/^(AT){0,1}U[0-9]{8}$/i';
            break;
        case 'Belgium':
            $regex = '/^(BE){0,1}[0]{0,1}[0-9]{9}$/i';
            break;
        case 'Bulgaria':
            $regex = '/^(BG){0,1}[0-9]{9,10}$/i';
            break;
        case 'Cyprus':
            $regex = '/^(CY){0,1}[0-9]{8}[A-Z]$/i';
            break;
        case 'Czech Republic':
            $regex = '/^(CZ){0,1}[0-9]{8,10}$/i';
            break;
        case 'Denmark':
            $regex = '/^(DK){0,1}([0-9]{2}[\ ]{0,1}){3}[0-9]{2}$/i';
            break;
        case 'Estonia':
        case 'Germany':
        case 'Greece':
        case 'Portugal':
            $regex = '/^(EE|EL|DE|PT){0,1}[0-9]{9}$/i';
            break;
        case 'France':
            $regex = '/^(FR){0,1}[0-9A-Z]{2}[\ ]{0,1}[0-9]{9}$/i';
            break;
        case 'Finland':
        case 'Hungary':
        case 'Luxembourg':
        case 'Malta':
        case 'Slovenia':
            $regex = '/^(FI|HU|LU|MT|SI){0,1}[0-9]{8}$/i';
            break;
        case 'Ireland':
            $regex = '/^(IE){0,1}[0-9][0-9A-Z\+\*][0-9]{5}[A-Z]$/i';
            break;
        case 'Italy':
        case 'Latvia':
            $regex = '/^(IT|LV){0,1}[0-9]{11}$/i';
            break;
        case 'Lithuania':
            $regex = '/^(LT){0,1}([0-9]{9}|[0-9]{12})$/i';
            break;
        case 'Netherlands':
            $regex = '/^(NL){0,1}[0-9]{9}B[0-9]{2}$/i';
            break;
        case 'Poland':
        case 'Slovakia':
            $regex = '/^(PL|SK){0,1}[0-9]{10}$/i';
            break;
        case 'Romania':
            $regex = '/^(RO){0,1}[0-9]{2,10}$/i';
            break;
        case 'Sweden':
            $regex = '/^(SE){0,1}[0-9]{12}$/i';
            break;
        case 'Spain':
            // The alternatives are grouped so that ^ and $ apply to both of them.
            $regex = '/^(ES){0,1}(?:[0-9A-Z][0-9]{7}[A-Z]|[A-Z][0-9]{7}[0-9A-Z])$/i';
            break;
        case 'United Kingdom':
            // The alternatives are grouped so that ^ and $ apply to all of them.
            $regex = '/^(GB){0,1}(?:[1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2}|[1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2}[\ ]{0,1}[0-9]{3}|(?:GD|HA)[0-9]{3})$/i';
            break;
        default:
            return -1;
            break;
    }
   
    return preg_match($regex, $vat_number);
} 



// US phone number regex
I see a lot of people struggling to put together phone regexes (no worries, they're complicated). Here's one that we use. It's not perfect, but it should work in most practical cases.

*** Note: Only matches U.S. phone numbers. ***

<?php

// all on one line...
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

// or broken up
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
        .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
        .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

?>

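If you're wondering why there are so many non-capturing subpatterns (the ones that start with "(?:"), it's so that only the area code, exchange, line number and extension are captured, and we can do this:

<?php

// Note: this always appends " ext. ", even when no extension was matched.
$formatted = preg_replace($regex, '($1) $2-$3 ext. $4', $phoneNumber);

// or, provided you use the $matches argument in preg_match

$formatted = "($matches[1]) $matches[2]-$matches[3]";
// $matches[4] is not set when the number has no extension.
if (!empty($matches[4])) $formatted .= " $matches[4]";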

?>

*** Results: ***
520-555-5542 :: MATCH
520.555.5542 :: MATCH
5205555542 :: MATCH
520 555 5542 :: MATCH
520) 555-5542 :: FAIL
(520 555-5542 :: FAIL
(520)555-5542 :: MATCH
(520) 555-5542 :: MATCH
(520) 555 5542 :: MATCH
520-555.5542 :: MATCH
520 555-0555 :: MATCH
(520)5555542 :: MATCH
520.555-4523 :: MATCH
19991114444 :: FAIL
19995554444 :: MATCH
514 555 1231 :: MATCH
1 555 555 5555 :: MATCH
1.555.555.5555 :: MATCH
1-555-555-5555 :: MATCH
520-555-5542 ext.123 :: MATCH
520.555.5542 EXT 123 :: MATCH
5205555542 Ext. 7712 :: MATCH
520 555 5542 ext 5 :: MATCH
520) 555-5542 :: FAIL
(520 555-5542 :: FAIL
(520)555-5542 ext .4 :: FAIL
(512) 555-1234 ext. 123 :: MATCH
1(555)555-5555 :: MATCH
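
Putting the pattern and the formatting step together might look like the sketch below. format_us_phone() is a hypothetical helper written for this example; it returns null for numbers the pattern rejects and appends the extension only when one was captured.

```php
<?php
// Hypothetical helper built on the pattern above.
function format_us_phone(string $phoneNumber): ?string
{
    $regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
            .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
            .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

    if (preg_match($regex, $phoneNumber, $matches) !== 1) {
        return null;
    }

    $formatted = "($matches[1]) $matches[2]-$matches[3]";
    // Group 4 is missing from $matches when the number has no extension.
    if (isset($matches[4]) && $matches[4] !== '') {
        $formatted .= " ext. $matches[4]";
    }

    return $formatted;
}

var_dump(format_us_phone('520.555.5542'));
var_dump(format_us_phone('(512) 555-1234 ext. 123'));
var_dump(format_us_phone('520) 555-5542'));
```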



Because making a truly correct email validation function is harder than one may think, consider using this one which comes with PHP through the filter_var function (http://www.php.net/manual/en/function.filter-var.php):

<?php
$email = "someone@domain .local";

if(!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    echo "E-mail is not valid";
} else {
    echo "E-mail is valid";
}
?>



Sometimes an attacker disguises a PHP file or shell script as an image in order to compromise your website. If you use move_uploaded_file() to let users upload files, you should check whether the uploaded file contains malicious code. The function below does this with preg_match(). Note that, despite its name, is_clean_file() returns true when it finds suspicious content.

The function is used together with

unlink() - http://php.net/unlink

After the upload completes, check the file with the function below.

<?php

/**
 * A simple function to check a file for malicious code.
 *
 * @param (string) $file - file path.
 * @return bool true if suspicious content was found, false otherwise.
 * @author Yousef Ismaeil - Cliprz[at]gmail[dot]com.
 */
function is_clean_file ($file)
{
    if (file_exists($file))
    {
        $contents = file_get_contents($file);
    }
    else
    {
        exit($file." does not exist.");
    }

    if (preg_match('/(base64_|eval|system|shell_|exec|php_)/i',$contents))
    {
        return true;
    }
    else if (preg_match("#&\#x([0-9a-f]+);#i", $contents))
    {
        return true;
    }
    elseif (preg_match('#&\#([0-9]+);#i', $contents))
    {
        return true;
    }
    elseif (preg_match("#([a-z]*)=([\`\'\"]*)script:#iU", $contents))
    {
        return true;
    }
    elseif (preg_match("#([a-z]*)=([\`\'\"]*)javascript:#iU", $contents))
    {
        return true;
    }
    elseif (preg_match("#([a-z]*)=([\'\"]*)vbscript:#iU", $contents))
    {
        return true;
    }
    elseif (preg_match("#(<[^>]+)style=([\`\'\"]*).*expression\([^>]*>#iU", $contents))
    {
        return true;
    }
    elseif (preg_match("#(<[^>]+)style=([\`\'\"]*).*behaviour\([^>]*>#iU", $contents))
    {
        return true;
    }
    elseif (preg_match("#</*(applet|link|style|script|iframe|frame|frameset|html|body|title|div|p|form)[^>]*>#i", $contents))
    {
        return true;
    }
    else
    {
        return false;
    }
}
?>

Use

<?php
// If the image contains malicious code, delete it
$image   = "simpleimage.png";

if (is_clean_file($image))
{
    echo "Malicious code found; this is not an image.";
    unlink($image);
}
else
{
    echo "This is a real image.";
}
?>




 




<?php
// Function to replace anchors with text
function replaceAnchorsWithText($data) {
    /**
     * Had to modify $regex so it could post to the site... so I broke it into 6 parts.
     */
    $regex  = '/(<a\s*'; // Start of anchor tag
    $regex .= '(.*?)\s*'; // Any attributes or spaces that may or may not exist
    $regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; // Grab the link
    $regex .= '\s*(.*?)\s*>\s*'; // Any attributes or spaces that may or may not exist before closing tag
    $regex .= '(?P<name>\S+)'; // Grab the name
    $regex .= '\s*<\/a>)/i'; // Any number of spaces between the closing anchor tag (case insensitive)
   
    if (is_array($data)) {
        // This is what will replace the link (modify to your liking)
        $data = "{$data['name']}({$data['link']})";
    }
    return preg_replace_callback($regex, 'replaceAnchorsWithText', $data);
}

$input  = 'Test 1: <a href="http: //php.net1">PHP.NET1</a>.<br />';
$input .= 'Test 2: <A name="test" HREF=\'HTTP: //PHP.NET2\' target="_blank">PHP.NET2</A>.<BR />';
$input .= 'Test 3: <a hRef=http: //php.net3>php.net3</a><br />';
$input .= 'This last line had nothing to do with any of this';

echo replaceAnchorsWithText($input).'<hr/>';
?>
Will output:
Test 1: PHP.NET1(http: //php.net1).
Test 2: PHP.NET2(HTTP: //PHP.NET2).
Test 3: php.net3 (is still an anchor)
This last line had nothing to do with any of this


// A site for testing PHP functions such as preg_match online

http://www.functions-online.com/preg_match.html
