

I used regular expressions to filter the strings. For this purpose I created two functions:
RemoveExtraSpaces - Removes extra spaces from a string, leaving only one space between words.
CountWords - Returns the number of words in a string. It uses RemoveExtraSpaces and a regular expression to remove HTML code, new lines and dividers.
/// <summary>
/// Removes extra spaces. The regular expression looks for
/// whitespace that appears two or more times in a row.
/// </summary>
/// <param name="s">The string that we want to check</param>
/// <returns>The string with each run of whitespace collapsed to one space</returns>
private string RemoveExtraSpaces(string s)
{
    Regex findExtraSpace = new Regex(@"\s{2,}");
    return findExtraSpace.Replace(s, " ");
}
/// <summary>
/// Returns the number of words in a string that are
/// separated by spaces
/// </summary>
/// <param name="strText">The text that we want to check</param>
/// <returns>The number of words found</returns>
public int CountWords(string strText)
{
    string exp = "#;#";
    // The expression looks for HTML tags, new lines and the divider
    // that we defined above
    Regex noise = new Regex("<[^>]+>|" + Regex.Escape(exp) + "|\r\n|\n");
    // Replace each match with a space so it is not counted and the
    // words on either side of it are not glued together
    strText = noise.Replace(strText, " ");
    // Remove the extra spaces, including leading and trailing ones
    strText = RemoveExtraSpaces(strText).Trim();
    // An empty string contains no words
    if (strText.Length == 0) return 0;
    // Count the words in the string by splitting them wherever a
    // space is found
    return strText.Split(' ').Length;
}
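To see how the word counting behaves on a few inputs, here is a minimal, self-contained sketch. It is a variant written for illustration, not the exact code above: it assumes that tags, new lines and the divider separate words (so they become a space), and it splits on any whitespace while dropping empty entries. The class name WordCounterDemo and the sample strings are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCounterDemo
{
    // Divider token, the same "#;#" used in CountWords above.
    private const string Divider = "#;#";

    // Matches HTML tags, the divider, and line breaks.
    private static readonly Regex Noise =
        new Regex("<[^>]+>|" + Regex.Escape(Divider) + @"|\r\n|\n");

    public static int CountWords(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return 0;
        // Assumption: noise separates words, so it is replaced by a space.
        string cleaned = Noise.Replace(text, " ");
        // A null separator array means "split on whitespace"; dropping empty
        // entries makes leading, trailing and repeated spaces harmless.
        return cleaned.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    public static void Main()
    {
        Console.WriteLine(CountWords("Hello   world"));          // 2
        Console.WriteLine(CountWords("<b>Hello</b><br>world"));  // 2
        Console.WriteLine(CountWords("one#;#two\r\nthree"));     // 3
        Console.WriteLine(CountWords("   "));                    // 0
    }
}
```

Replacing the noise with a space rather than an empty string is the design choice to notice here: with an empty string, "Hello<br>world" would collapse into the single word "Helloworld".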
private void regExample()
{
    string str = "Hello. My name is Inigo Montoya. You killed my father prepare to die.";
    string exp = @"\binigo\b";
    Regex ex1 = new Regex(exp, RegexOptions.IgnoreCase);
    // \b at the beginning and at the end means that Inigo must be a whole word, not part of another word such as Inigojbkbj
    // Moreover, the IgnoreCase option means that letter case is ignored.
    Console.WriteLine(ex1.IsMatch(str));
    // Match gives you more data, such as where the string is located
    Match match = ex1.Match(str);
    Console.WriteLine("Found string '" + match.Captures[0].Value + "' at index: " + match.Captures[0].Index);
    Console.WriteLine(ex1.Replace(str, "David"));
    // Let's find Inigo followed, somewhere later in the string, by the word father
    Regex ex2 = new Regex(@"\bInigo\b.*\bfather\b");
    match = ex2.Match(str);
    foreach (Capture c in match.Captures)
    {
        Console.WriteLine("Found the pattern in: " + c.Value + " at index: " + c.Index);
    }
}
Special characters that can help us search for a pattern:
Character  Description                                                           Example
---------  -----------                                                           -------
\b         Matches at a word boundary (between a word and a non-word character)  \blior\b finds lior only as a whole word
|          Matches either the part on its left or the part on its right          abc|def matches abc or def
.          Matches any single character except a newline                         ab. matches abc, abd, ab1 and so on
^          Matches at the start of the string                                    ^ab: the string must start with ab
[^ ]       Matches any character except the characters in the brackets           [^d-f] matches any character except d, e or f
$          Matches at the end of the string                                      def$: the string must end with def
*          Repeats the previous item zero or more times                          a* matches an empty string, a, aa, aaa and so on
+          Repeats the previous item one or more times                           a+ matches a, aa, aaa and so on
?          Matches zero or one of the preceding item                             dogs? matches dog and dogs
{n}        Repeats the previous item exactly n times, where n >= 1               b{2} matches bb
{n,m}      Repeats the previous item n to m times, where n >= 0 and m >= n       b{2,4} matches bb, bbb and bbbb
{n,}       Repeats the previous item at least n times                            d{2,} matches dd, ddd, dddd and so on
\d         Matches any digit                                                     0-9
\w         Matches any word character: a letter, a digit or an underscore        1, 2, 3, a, b, d, g, _ and so on
\s         Matches any whitespace character                                      space, tab, new line
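Several of the special characters listed above can be tried out with Regex.IsMatch. The following is a small sketch for experimenting; the class name SpecialCharactersDemo, the Matches helper and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharactersDemo
{
    // Thin wrapper (hypothetical helper) so each call reads as "input, pattern".
    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("my name is lior", @"\blior\b"));  // True: whole word
        Console.WriteLine(Matches("liorjbkbj", @"\blior\b"));        // False: part of a longer word
        Console.WriteLine(Matches("xyz def", "abc|def"));            // True: right side matches
        Console.WriteLine(Matches("abcdef", "^ab"));                 // True: starts with ab
        Console.WriteLine(Matches("abcdef", "def$"));                // True: ends with def
        Console.WriteLine(Matches("e", "^[^d-f]$"));                 // False: e is excluded
        Console.WriteLine(Matches("dogs", "^dogs?$"));               // True: the s is optional
        Console.WriteLine(Matches("bbbbb", "^b{2,4}$"));             // False: five b's is too many
        Console.WriteLine(Regex.Match("order 12345", @"\d+").Value); // 12345
    }
}
```

The ^ and $ anchors are added around some patterns so that the whole input has to match; without them IsMatch returns true as soon as any part of the input matches.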