
Lior.Net


Count Words Using Regular Expressions

I had to create a program that counts the words in a string. The program needs to count all the words in the string while ignoring:

1. HTML code (tags)

2. Dividers (a separator string that we define ourselves)

3. Extra spaces and new lines
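
To make the three exclusions concrete, here is a small sketch that applies one regular expression per category to a sample string and prints what is left. It is an illustration under assumptions, not the author's code: the class name `ExclusionDemo`, the sample text, and passing the divider in as a parameter are all hypothetical. `#;#` is the divider value the author uses.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: one regular expression per excluded category.
public static class ExclusionDemo
{
    // Removes each category in turn. Matches are replaced with a space,
    // not an empty string, so neighbouring words are not glued together.
    public static string Clean(string text, string divider)
    {
        text = Regex.Replace(text, "<[^>]+>", " ");             // 1. HTML tags
        text = Regex.Replace(text, Regex.Escape(divider), " "); // 2. dividers (taken literally)
        text = Regex.Replace(text, @"\s+", " ");                // 3. new lines and runs of spaces
        return text.Trim();
    }

    public static void Main()
    {
        string sample = "<p>Hello   world</p>#;#second\r\nline";
        Console.WriteLine(Clean(sample, "#;#")); // Hello world second line
    }
}
```

What remains is four words separated by single spaces, which is exactly the form a simple split on the space character can count.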

I used regular expressions to filter the string. For this purpose I created two functions:

RemoveExtraSpaces – removes extra spaces from the string, leaving only one space between words.

CountWords – returns the number of words in a string. It uses a regular expression to remove HTML code, new lines, and dividers, then calls RemoveExtraSpaces before counting.
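
The pattern `\s{2,}` behind RemoveExtraSpaces matches any run of two or more whitespace characters (spaces, tabs, `\r`, `\n`). The sketch below, with a hypothetical `SpaceDemo` class, shows what it does and does not change: runs collapse to a single space, but a lone tab between two words is only one character and is left alone. That matters if the words are later split on the space character only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class reproducing the RemoveExtraSpaces pattern.
public static class SpaceDemo
{
    // Two or more consecutive whitespace characters become one space.
    public static string Collapse(string s) => Regex.Replace(s, @"\s{2,}", " ");

    public static void Main()
    {
        // A run of spaces and a tab+newline run both collapse.
        Console.WriteLine("[" + Collapse("one   two\t\nthree") + "]"); // [one two three]
        // A single tab is not matched, so it survives unchanged.
        Console.WriteLine(Collapse("one\ttwo") == "one\ttwo");        // True
    }
}
```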

using System;
using System.Text.RegularExpressions;

// Both methods below are members of a class (the class itself is omitted here).

/// <summary>
/// This function removes extra spaces. The regular expression looks for
/// whitespace characters that appear two or more times in a row and
/// replaces each such run with a single space.
/// </summary>
/// <param name="s">The string that we want to check</param>
/// <returns>The fixed string</returns>
private string RemoveExtraSpaces(string s)
{
    Regex findExtraSpace = new Regex(@"\s{2,}");
    return findExtraSpace.Replace(s, " ");
}

/// <summary>
/// This function returns the number of words in a string that are
/// separated by spaces, ignoring HTML tags, new lines and the divider.
/// </summary>
/// <param name="strText">The text that we want to check</param>
/// <returns>The number of words</returns>
public int CountWords(string strText)
{
    if (string.IsNullOrEmpty(strText))
        return 0;

    string exp = "#;#";

    // The expression looks for HTML tags, new lines and the divider
    // defined above; Regex.Escape keeps the divider literal even if it
    // contains regex metacharacters
    Regex filter = new Regex("<[^>]+>|" + Regex.Escape(exp) + @"|\r\n|\n");

    // Replace the matches with a space (not an empty string) so they are
    // not counted and the words on either side are not glued together
    strText = filter.Replace(strText, " ");

    // Remove the extra spaces
    strText = RemoveExtraSpaces(strText);

    // Count the words by splitting wherever a space is found; empty entries
    // (from leading or trailing spaces) are skipped
    return strText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
}
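
A more compact variant, offered as a sketch rather than as the approach above: after removing the tags and the divider, count runs of non-whitespace characters with `Regex.Matches` instead of collapsing spaces and splitting. Because `\S+` never matches whitespace, new lines, tabs, and leading or trailing spaces need no special handling. The class name `WordCount` and the default `#;#` divider parameter are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class: counts words by matching \S+ runs.
public static class WordCount
{
    // Counts words after removing HTML tags and the divider.
    public static int Count(string text, string divider = "#;#")
    {
        if (string.IsNullOrEmpty(text))
            return 0;

        // Replace tags and dividers with a space so adjacent words stay separate.
        string pattern = "<[^>]+>|" + Regex.Escape(divider);
        string cleaned = Regex.Replace(text, pattern, " ");

        // Each run of non-whitespace characters is one word.
        return Regex.Matches(cleaned, @"\S+").Count;
    }

    public static void Main()
    {
        Console.WriteLine(Count("<p>Hello   world</p>#;#second\r\nline")); // 4
        Console.WriteLine(Count("   "));                                    // 0
    }
}
```

Like the original, `<[^>]+>` is a simple heuristic: it does not understand HTML comments, scripts, or a `>` inside an attribute value.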
