Pop quiz! Create an object of items and counts from a paragraph of text
I was digging around in some old code the other day, having a 'server' tidy up, and I came across a pair of code challenges that a company set for me a few years back. They were meant to gauge how you approach a problem, and to check that you are at least familiar with the CFML language. I quite like Ray Camden's Friday challenge idea, so in a blatant homage, I'm posing these two code challenges in the same way: this one this week, and a numeric one next week.
The challenge:
Take a paragraph of text and return a data object (in whatever format you want) of the words in it and their frequencies. This is the example paragraph given:
The old lady pulled her spectacles down and looked over them about the room;
then she put them up and looked out under them. She seldom or never looked THROUGH
them for so small a thing as a boy; they were her state pair, the pride of her heart,
and were built for "style," not service -- she could have seen through a pair of
stove-lids just as well.
The type of object and the method of producing it are entirely open. You can choose how to handle punctuation and text casing.
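To make the punctuation choice concrete, here is a tiny sketch of two ways a hyphenated word could be treated. The variable names are made up for illustration, and neither option is the "right" one; it just shows that the decision changes the word list you end up counting.

```cfml
<cfset variables.word = "Stove-Lids">

<!--- Option 1: treat punctuation as a separator, which yields two words --->
<cfset variables.asSeparator = lCase(reReplace(variables.word, "[^[:alnum:]]", " ", "ALL"))>

<!--- Option 2: drop punctuation entirely, which yields one joined word --->
<cfset variables.dropped = lCase(reReplace(variables.word, "[^[:alnum:]]", "", "ALL"))>

<cfoutput>#variables.asSeparator# / #variables.dropped#</cfoutput>
```

The first option gives "stove lids" and the second gives "stovelids", so the same paragraph can produce slightly different counts depending on which way you go.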
I've written a test form, and a solution myself, but it is always interesting to see how different minds approach the same problem.
I'll give it a while, and then post my solution here.
Update: Here is a CFC that I put together to solve this. It strips out the punctuation and creates a structure of the words and their counts.
<cfcomponent>

    <cffunction name="parseText" hint="Parses a passed in section of text, returns a struct of values" access="public" output="true" returntype="Struct">
        <cfargument name="rawString" required="true" hint="Text to parse">

        <!--- list items to remove --->
        <cfset var itemsToRemove = '-,;,",.'>
        <cfset var parsedString = structNew()>
        <cfset var part = "">

        <!--- Clean the punctuation out of the arg --->
        <cfset var cleanedString = ReplaceList(arguments.rawString, itemsToRemove, " ")>
        <!--- Commas are the list delimiter above, so replace them separately, then lower-case everything --->
        <cfset cleanedString = lCase(Replace(cleanedString, ",", " ", "ALL"))>

        <!--- Walk the space-delimited words and count each one --->
        <cfloop list="#cleanedString#" index="part" delimiters=" ">
            <cfif NOT StructKeyExists(parsedString, part)>
                <!--- Add to struct --->
                <cfset parsedString[part] = 1>
            <cfelse>
                <!--- increment count, get it and add one --->
                <cfset parsedString[part] = parsedString[part] + 1>
            </cfif>
        </cfloop>

        <cfreturn parsedString />
    </cffunction>

</cfcomponent>
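If you want to try the component, something like the following could sit in a test page. This is only a sketch: it assumes the component above is saved as WordCounter.cfc in the same directory as the page. The post does not give a file name, so WordCounter is a made-up name, as are the variable names.

```cfml
<!--- Assumption: the component above is saved as WordCounter.cfc next to this page --->
<cfset variables.counter = createObject("component", "WordCounter")>

<!--- A shortened sample taken from the example paragraph --->
<cfset variables.sample = 'She seldom or never looked THROUGH them; she could have seen through a pair of stove-lids.'>

<!--- parseText returns a struct keyed by word, with the count as the value --->
<cfset variables.counts = variables.counter.parseText(variables.sample)>

<cfdump var="#variables.counts#">
```

Because the text is lower-cased before counting, "She" and "she" land on the same key, as do "THROUGH" and "through", so each of those should come back with a count of 2.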
<cfsavecontent variable="variables.para">
The old lady pulled her spectacles down and looked over them about the room;
then she put them up and looked out under them. She seldom or never looked THROUGH
them for so small a thing as a boy; they were her state pair, the pride of her heart,
and were built for "style," not service -- she could have seen through a pair of
stove-lids just as well.
</cfsavecontent>
<cfset variables.tokens = variables.para.split("\s")>
<cfset variables.freqs = {}>
<cfloop array="#variables.tokens#" index="variables.token">
    <cfset variables.token = rereplace(lcase(trim(variables.token)), "[^[:alnum:]]", "", "ALL")>
    <cfif len(variables.token)>
        <cfif structkeyexists(variables.freqs, variables.token)>
            <cfset variables.freqs[variables.token] = variables.freqs[variables.token] + 1>
        <cfelse>
            <cfset variables.freqs[variables.token] = 1>
        </cfif>
    </cfif>
</cfloop>
<cfdump var="#variables.freqs#"><cfabort>