Thursday, May 7, 2015

Day 6 - String minifier, remove whitespaces

While writing a web crawler I stumbled over an annoying problem: the strings I extracted contained CRLFs, tabs and runs of spaces that I did not want. So first I looked at the strings package. It has Replace and Trim functions, but none of them did exactly what I wanted: I wanted to remove all unnecessary whitespace, but not all of it, so that there is always exactly one space between words.
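To illustrate the gap, here is a minimal sketch of what those two functions do with messy input. The helper names trimOnly and replaceNewlines are made up for this illustration; only strings.TrimSpace and strings.Replace come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// trimOnly (hypothetical helper) strips whitespace at both ends only;
// whitespace in the middle of the string is left untouched.
func trimOnly(s string) string {
	return strings.TrimSpace(s)
}

// replaceNewlines (hypothetical helper) swaps every "\n" for a space.
// Runs of spaces are not collapsed, so the result can hold even longer runs.
func replaceNewlines(s string) string {
	return strings.Replace(s, "\n", " ", -1)
}

func main() {
	s := "  a \n  b  "
	fmt.Printf("%q\n", trimOnly(s))        // "a \n  b"
	fmt.Printf("%q\n", replaceNewlines(s)) // "  a    b  "
}
```

Neither call on its own leaves exactly one space between the words.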

So after googling for more than an hour I decided to write it myself, and I was astonished that it took me only 5 minutes and that it compiled on the first try (Go is such a great language!):




package main 
 
import (
    "strings"
    "unicode"
)
 
func main() {
    s := "I am a string\n           Containing    tooo    many     spaces     and    \n new lines"
    println("Before: " + s)
    s = stringMinifier(s)
    println("After: " + s)
    println("----------\n" + strings.Replace(s, "am", "am not anymore", 1))
}
  
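// stringMinifier collapses every run of whitespace (spaces, tabs, CR, LF, ...)
// into a single space. A leading or trailing run becomes one space; it is not removed.
func stringMinifier(in string) string {
    var b strings.Builder // avoids re-copying the result on every append (needs Go 1.10+)
    white := false        // true while we are inside a run of whitespace
    for _, c := range in {
        if unicode.IsSpace(c) {
            // Emit a space only for the first character of a whitespace run.
            if !white {
                b.WriteByte(' ')
            }
            white = true
        } else {
            b.WriteRune(c)
            white = false
        }
    }

    return b.String()
}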
 
 
Output:
 
Before: I am a string
           Containing    tooo    many     spaces     and    
 new lines
After: I am a string Containing tooo many spaces and new lines
----------
I am not anymore a string Containing tooo many spaces and new lines 
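For comparison, the standard library can get very close to this with two calls. The sketch below is not the function from this post; collapseWhitespace is a made-up name. strings.Fields splits on runs of whitespace as defined by unicode.IsSpace, and strings.Join glues the pieces back together with single spaces. One difference worth knowing: this variant also drops leading and trailing whitespace entirely, while stringMinifier keeps one space there.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseWhitespace (hypothetical name) splits s on runs of whitespace and
// rejoins the words with single spaces. Leading and trailing whitespace is
// removed as a side effect of strings.Fields.
func collapseWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	s := "I am a string\n           Containing    tooo    many     spaces     and    \n new lines"
	fmt.Println(collapseWhitespace(s))
	// prints: I am a string Containing tooo many spaces and new lines
}
```

Which one fits better depends on whether a leading or trailing space should survive.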
 
 

3 comments:

  1. Nice! I have found some html minifiers but none for plain UTF-8 text
