Shaun McCran

My digital playground

23 OCT 2009

Pop quiz! Create an object of items and counts from a paragraph of text

I was digging around in some old code the other day, having a 'server' tidy up, and I came across a pair of code challenges that a company set for me a few years back. They were meant to gauge how you approach a problem, and to see whether you are at least familiar with the CFML code base. I quite like Ray Camden's Friday challenge idea, so in a blatant homage, I'm posing these two code challenges in the same way: this one this week, and a numeric one next week.

The challenge:

Take a paragraph of text and return a data object (in whatever format you want) of the words in it and their frequencies. This is the example paragraph given.

The old lady pulled her spectacles down and looked over them about the room;
then she put them up and looked out under them. She seldom or never looked THROUGH
them for so small a thing as a boy; they were her state pair, the pride of her heart,
and were built for "style," not service -- she could have seen through a pair of
stove-lids just as well.

The type of object and the method of producing it are entirely open. You can choose how to handle punctuation and text casing.

I've written a test form, and a solution myself, but it is always interesting to see how different minds approach the same problem.

I'll give it a while, and then post my solution here.

Update: Here is a CFC that I put together to solve this. It strips out the punctuation and builds a structure of the words and their counts.

<cfcomponent displayname="Parser" hint="Parses a set of text" output="false">

    <cffunction name="parseText" hint="Parses a passed in section of text, returns a struct of values" access="public" output="true" returntype="Struct">
        <cfargument name="rawString" required="true" hint="Text to parse">

        <!--- list items to remove --->
        <cfset var itemsToRemove = '-,;,",.'>
        <cfset var parsedString = structNew()>
        <cfset var part = "">

        <!--- Clean the punctuation out of the arg --->
        <cfset var cleanedString = ReplaceList(arguments.rawString, itemsToRemove, " ")>
        <cfset cleanedString = lCase(Replace(cleanedString, ",", " ", "ALL"))>

        <!--- Split on spaces, tabs and line breaks so words either side of a newline are not fused together --->
        <cfloop list="#cleanedString#" index="part" delimiters=" #chr(9)##chr(10)##chr(13)#">
            <cfif NOT StructKeyExists(parsedString, part)>
                <!--- Add to struct --->
                <cfset parsedString[part] = 1>
            <cfelse>
                <!--- increment count, get it and add one --->
                <cfset parsedString[part] = parsedString[part] + 1>
            </cfif>
        </cfloop>

        <cfreturn parsedString />
    </cffunction>

</cfcomponent>
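
A minimal sketch of calling it, assuming the component above is saved as Parser.cfc in the same directory as the calling template. The file name, the sample sentence, and the variable names parser and wordCounts are my own assumptions for illustration, not part of the original challenge.

```cfml
<!--- Assumes the component above is saved as Parser.cfc alongside this template --->
<cfset parser = createObject("component", "Parser")>

<!--- Hypothetical sample input; any string can be passed in --->
<cfset wordCounts = parser.parseText("The old lady looked over them; then she looked under them.")>

<!--- Dump the returned struct, keyed by word, with a count against each key --->
<cfdump var="#wordCounts#">
```

Because the function returns a plain struct, the result can be dumped as shown or looped over with cfloop collection= if you want to format the counts yourself.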

Comments
I had a reply back from a colleague; he had a slightly different view on it than I did. If anything, it is a little more succinct.

<cfsavecontent variable="variables.para">
   The old lady pulled her spectacles down and looked over them about the room;
   then she put them up and looked out under them. She seldom or never looked THROUGH
   them for so small a thing as a boy; they were her state pair, the pride of her heart,
   and were built for "style," not service -- she could have seen through a pair of
   stove-lids just as well.
</cfsavecontent>

<cfset variables.tokens = variables.para.split("\s")>
<cfset variables.freqs = {}>

<cfloop array="#variables.tokens#" index="variables.token">
   <cfset variables.token = rereplace(lcase(trim(variables.token)), "[^[:alnum:]]", "", "ALL")>
   <cfif len(variables.token)>
      <cfif structkeyexists(variables.freqs, variables.token)>
         <cfset variables.freqs[variables.token] = variables.freqs[variables.token] + 1>
      <cfelse>
         <cfset variables.freqs[variables.token] = 1>
      </cfif>
   </cfif>
</cfloop>

<cfdump var="#variables.freqs#"><cfabort>
# Posted By Shaun McCran | 27/10/2009 16:56