Lior.Net



Count Words Using Regular Expressions

Saturday, June 28th, 2008

I had to create a program that counts the words in a string. The program needs to count all words in the string except:

1. HTML code

2. Dividers

3. Extra spaces and new lines

I used regular expressions to filter the string. For this purpose I created two functions:

RemoveExtraSpaces - Removes extra spaces from a string, so that only one space is left between words.

CountWords - Returns the number of words in a string. It uses a regular expression to remove HTML code, new lines and dividers, and then calls RemoveExtraSpaces to collapse the remaining spaces.
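When HTML code is removed before counting, what each tag is replaced with matters. The minimal C# sketch below applies the tag pattern `<[^>]+>` two ways; the class `TagStripDemo` and its methods are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TagStripDemo
{
    // A tag is '<', one or more characters that are not '>', then '>'.
    private static readonly Regex Tag = new Regex("<[^>]+>");

    // Deleting the tag outright glues the words on either side together.
    public static string StripToEmpty(string html) => Tag.Replace(html, "");

    // Replacing the tag with a space keeps the neighbouring words apart.
    public static string StripToSpace(string html) => Tag.Replace(html, " ");
}
```

`StripToEmpty("one<br>two")` returns `"onetwo"` (one word), while `StripToSpace("one<br>two")` returns `"one two"` (two words), so replacing with a space is the safer choice for a word count.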

// Requires "using System.Text.RegularExpressions;" at the top of the file.

/// <summary>
/// This function removes extra spaces. The regular expression looks for
/// whitespace that appears two or more times in a row.
/// </summary>
/// <param name="s">This is the string that we want to check</param>
/// <returns>Fixed string</returns>
private string RemoveExtraSpaces(string s)
{
    Regex findExtraSpace = new Regex(@"\s{2,}");

    // Replace each run of two or more whitespace characters with one space
    return findExtraSpace.Replace(s, " ");
}
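Note that `\s{2,}` only touches runs of two or more whitespace characters, so a single tab or line break between two words is left in place. A possible variant, sketched below under that assumption, uses `\s+` and trims the ends; `WhitespaceDemo` and `NormalizeWhitespace` are hypothetical names, not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // \s+ matches any run of whitespace, including a single tab or newline.
    private static readonly Regex AnyWhitespaceRun = new Regex(@"\s+");

    public static string NormalizeWhitespace(string s)
    {
        // Collapse every whitespace run to one space, then drop the leading
        // and trailing space that would otherwise become an empty "word".
        return AnyWhitespaceRun.Replace(s, " ").Trim();
    }
}
```

With this variant, `NormalizeWhitespace("  a\tb  \n c ")` returns `"a b c"`, whereas the `\s{2,}` pattern leaves the single tab in `"a\tb"` untouched.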

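/// <summary>
/// This function returns the number of words in a string that are
/// separated by whitespace.
/// </summary>
/// <param name="strText">The text that we want to check</param>
/// <returns>Number of words</returns>
public int CountWords(string strText)
{
    if (string.IsNullOrEmpty(strText))
        return 0;

    // The divider used in the text
    string exp = "#;#";

    // The expression looks for HTML tags, the divider defined above and new
    // lines (Regex.Escape protects the divider if it contains regex symbols)
    Regex filter = new Regex("<[^>]+>|" + Regex.Escape(exp) + @"|\r\n|\n");

    // Replace the matches with a space rather than an empty string, so that
    // words on either side (e.g. "one<br>two") are not glued together
    strText = filter.Replace(strText, " ");

    // Remove the extra spaces
    strText = RemoveExtraSpaces(strText);

    // Split on any whitespace character (a null separator array means
    // whitespace); RemoveEmptyEntries drops the empty strings produced by
    // leading or trailing spaces
    return strText.Split((char[])null, System.StringSplitOptions.RemoveEmptyEntries).Length;
}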
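An alternative worth considering is to count the words directly with a regular expression instead of splitting. The sketch below assumes the same three filters as CountWords (HTML tags, the `#;#` divider, line breaks), replaces each match with a space, and then counts runs of non-whitespace characters with `Regex.Matches`. `MatchCountDemo` and `CountWordsByMatching` are hypothetical names for illustration.

```csharp
using System.Text.RegularExpressions;

public static class MatchCountDemo
{
    // Assumed filters: HTML tags, the "#;#" divider, and \n or \r\n line breaks.
    private static readonly Regex Noise =
        new Regex("<[^>]+>|" + Regex.Escape("#;#") + @"|\r?\n");

    // Any run of non-whitespace characters counts as one word.
    private static readonly Regex Word = new Regex(@"\S+");

    public static int CountWordsByMatching(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;

        // Replace the filtered pieces with a space so neighbouring words stay apart.
        string cleaned = Noise.Replace(text, " ");
        return Word.Matches(cleaned).Count;
    }
}
```

For example, `CountWordsByMatching("<p>Hello</p>\r\nworld#;#again")` returns 3, and an empty or whitespace-only string returns 0 rather than the 1 that `"".Split(' ').Length` gives. Because extra spaces never produce a match, no separate space-collapsing step is needed in this variant.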

Jajah is the VoIP player that brought you web-activated telephony.