They say you learn something new every day.

Counting Words (27/02/2012)

So, as I said, proofreading is hard. Really, to misquote Douglas Adams, hard. You just won’t believe how vastly, hugely, mind-bogglingly hard it is. I mean, you may think doing maths is hard, but that’s nothing compared to proofreading.

One of the things I find difficult to spot is duplicate words. Being a techie sort, I decided to code myself out of it, so I wrote a little internet app: the repeated word finder.

Basically it searches for cases of the same word being used in close proximity and highlights them. Obviously, there are lots of legitimate uses for repeated words (like both the ones in the illustration), and I know that you can never code better writing, but it helps you see the errors. My hope is that highlighting these things will help me spot them.
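
For the curious, here is a minimal sketch of that idea as a standalone function, separate from any web page. It is not the app's actual code: the names findRepeatedWords, normalise and windowSize are made up for illustration, and it assumes that "close proximity" means "within the previous 20 words" and that punctuation and capitalisation should be ignored when comparing.

```javascript
// Sketch: flag words that repeat within a sliding window of recent words.
// The function names, the `windowSize` parameter and the returned shape are
// assumptions for illustration, not the app's actual code.
function findRepeatedWords(text, windowSize = 20) {
  // Split on any run of whitespace and drop empty strings.
  const words = text.split(/\s+/).filter(Boolean);
  const repeats = [];

  for (let i = 0; i < words.length; i++) {
    // Compare case-insensitively and ignore surrounding punctuation.
    const current = normalise(words[i]);
    if (current === "") continue;

    // Look back over at most `windowSize` earlier words.
    const start = Math.max(0, i - windowSize);
    for (let j = start; j < i; j++) {
      if (normalise(words[j]) === current) {
        repeats.push({ index: i, word: words[i] });
        break;
      }
    }
  }
  return repeats;
}

// Lower-case a word and strip leading/trailing characters other than
// ASCII letters, digits and apostrophes.
function normalise(word) {
  return word.toLowerCase().replace(/^[^a-z0-9']+|[^a-z0-9']+$/g, "");
}

// Usage: the second "was" is reported, at word index 2.
console.log(findRepeatedWords("It was was a dark night"));
// → [ { index: 2, word: 'was' } ]
```

Comparing whole words rather than searching a joined string means "the" won't be flagged just because "them" came before it, and returning indexes keeps the highlighting decision separate from the detection.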

It’s interesting – there are some things humans are good at, and some things computers are good at. Humans are very good at reading what should be there, and improving phrasing etc. Computers are very good at reading what is actually there and highlighting things that humans would just gloss over.

Hopefully, this is just the beginning of a larger proofreading tool. It’s something of a sister to the uber-wordcount tool, which needs a bit of a rewrite really. My plan is to handle all of this sort of thing (stats, word counts, etc.) in one JavaScript-based application. There’s no need to do anything server side with this at all.
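
To give a flavour of why none of this needs a server, here is a rough sketch of the kind of stats that can be computed entirely in the browser. The wordStats name and its return shape are hypothetical, and the definition of a "word" (a run of ASCII letters, digits or apostrophes) is an assumption rather than what the uber-wordcount tool actually does.

```javascript
// Sketch: basic word statistics computed entirely client side.
// `wordStats` and its return shape are hypothetical, for illustration only.
function wordStats(text) {
  // Treat each run of ASCII letters, digits or apostrophes as one word.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];

  // Tally how often each distinct word occurs.
  const counts = new Map();
  for (const w of words) {
    counts.set(w, (counts.get(w) || 0) + 1);
  }

  return { total: words.length, unique: counts.size, counts };
}

// Usage: total and unique word counts, plus per-word frequencies.
const stats = wordStats("The cat and the hat");
console.log(stats.total, stats.unique, stats.counts.get("the"));
// → 5 4 2
```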

I’ve written the app in JavaScript, and I have to admit, my JavaScript is rusty. I was quite excited to find a JavaScript minifier. This is the original:

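function countit() {
    // Read the text from the form and turn newlines into HTML line breaks.
    // (The HTML tags inside the string literals did not survive in this post;
    // <br> and <b> are assumed stand-ins.)
    var formcontent = document.wordcount.words.value;
    formcontent = formcontent.replace(/\n/g, " <br> ");

    formcontent = formcontent.split(" ");
    var recentbits = "";

    for (var i = 0; i < formcontent.length; i++) {
        // Highlight the word if it appears among the recent words.
        // Note: "> 0" ignores a match at position 0, which is where the
        // immediately preceding word sits, and indexOf also matches prefixes.
        if (recentbits.toLowerCase().indexOf(" " + formcontent[i].toLowerCase()) > 0) {
            formcontent[i] = "<b>" + formcontent[i] + "</b>";
        }

        // Rebuild the window from the current word and the 19 before it.
        recentbits = "";
        for (var count = 0; count < 20; count++) {
            recentbits = recentbits + " " + formcontent[i - count];
        }
    }

    var totalwords = formcontent.length;

    document.getElementById('totalwords').innerHTML = "Output: (" + totalwords + " words)<br><br>" + formcontent.join(" ");
}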


And this is after minifying:

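// (<br> and <b> are assumed stand-ins; the markup inside the strings did not survive in this post.)
function countit(){var a=document.wordcount.words.value;a=a.replace(/\n/g," <br> ");a=a.split(" ");var b="";for(var c=0;c<a.length;c++){if(b.toLowerCase().indexOf(" "+a[c].toLowerCase())>0){a[c]="<b>"+a[c]+"</b>"}b="";for(var d=0;d<20;d++){b=b+" "+a[c-d]}}var e=a.length;document.getElementById("totalwords").innerHTML="Output: ("+e+" words)<br><br>"+a.join(" ")}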

Obviously, you can’t read it, but it’s so much more compact.

Going through it, I don’t think it does anything cleverer than renaming all the variables to consecutive letters and getting rid of all the whitespace. But it’s pretty nifty for loading into the live system.

