The Go Programming Language

I recently read "The Go Programming Language" by Alan A. A. Donovan and Brian W. Kernighan. (I like to imagine Mr. Donovan's full name is Alan Alan Alan Donovan--please don't correct me.) So far I have read the book cover to cover, but not programmed any significant Go.

While reading, I wrote myself a list of questions to look up after I finished. Here are the questions (together with answers).

Q20: Go came out in 2012 with version 1.0. The book was published in 2016 and uses Go 1.5. As of writing it is 2025, and the latest version is 1.24. What has changed in Go between the book's publication and now? (Note: language changes only, no library or tooling changes mentioned)

  • 1.6 (2016) - No changes
  • 1.7 - No changes*
  • 1.8 (2017) - No changes*
  • 1.9
    • Introduced type aliases
  • 1.10 (2018) - No changes*
  • 1.11 - No changes
  • 1.12 (2019) - No changes
  • 1.13
    • New number literal syntax.
    • Shift count can be signed now.
  • 1.14 (2020)
    • Allow overlapping methods for embedded interfaces (solves the diamond problem for interfaces)
  • 1.15 - None
  • 1.16 (2021) - No changes
  • 1.17
    • Allows conversion from slice to fixed-size array pointer (can panic)
  • 1.18 (2022)
    • Generics--type parameters can be used in type definitions as well as function definitions.
    • Added type any as a shorter name for interface{}
    • Added type comparable: == works
    • Added unions in constraints: A | B | C
    • Added ~T constraints: ~int matches any type whose underlying type is int
  • 1.19 - None*
  • 1.20 (2023)
    • Allow conversion from slice to fixed-size array.
    • Broadening of 'comparable' to include interfaces that might panic at runtime.
  • 1.21
    • New built-ins (min, max)
    • New built-in (clear) -- applies to slice or map
    • Type inference improvements which went a bit over my head.
    • Fixed an edge case around panic(nil).
  • 1.22 (2024)
    • Fixes the loop-variable capture gotcha. (Previously, there was one loop variable which was updated each iteration -- now a fresh variable is created and assigned each iteration.)
    • For loops can range over integers.
  • 1.23
    • Added iterator ranges (iterators are functions); see the sketch after this list.
  • 1.24 (2025)
    • Type aliases can be parameterized.
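
To make a few of these concrete, here is a minimal sketch (mine, not from the book) exercising generics with a union/approximation constraint (1.18), range over an int (1.22), and an iterator function (1.23):

package main

import "fmt"

// Number is a union constraint: any type whose underlying type is int or float64.
type Number interface {
    ~int | ~float64
}

// Sum works on a slice of any Number type.
func Sum[T Number](xs []T) T {
    var total T
    for _, x := range xs {
        total += x
    }
    return total
}

// Count is a 1.23-style iterator: range calls it with the loop body as
// yield, and stops early if yield returns false.
func Count(n int) func(yield func(int) bool) {
    return func(yield func(int) bool) {
        for i := range n { // range over int, since 1.22
            if !yield(i) {
                return
            }
        }
    }
}

func main() {
    fmt.Println(Sum([]int{1, 2, 3}))      // 6
    fmt.Println(Sum([]float64{1.5, 2.5})) // 4
    for i := range Count(3) {
        fmt.Println(i) // 0, 1, 2
    }
}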

Q1: If you try to take the address &map, the compiler prevents you, because the address of a map is its backing store, which can silently change. How is this done? Can I do it for my own types?

Note: You can take &map itself, just not &map[2].

"It just does that". Map is a built-in type, not an implementation, so it just does stuff you can't. No you can't do it for your own types. There are garbage collection reasons they made it work this way but they're not interesting.

Q2: Can you take the address of a slice? Can the same problem happen?

You can take the address of both &slice and &slice[2].

If append(slice, 599) re-allocates the backing store, the second pointer still points into the original backing store, which prevents it from being garbage collected. Also, any changes through it are not reflected in the slice returned by append, so you probably shouldn't.
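
Here's a minimal sketch of that hazard (mine, not from the book):

package main

import "fmt"

func main() {
    s := make([]int, 3, 3) // len == cap, so the next append must reallocate
    p := &s[2]             // pointer into the original backing array

    s = append(s, 599) // reallocates; s now uses a new backing array

    *p = 42           // writes to the OLD backing array
    fmt.Println(s[2]) // 0 -- the write is not reflected in s
    fmt.Println(*p)   // 42
}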

Q3: What are all the forms of for loops?

  • for INITIALIZER; CONDITION; POST {} - C for loop
  • for {} - Loop forever
  • for CONDITION {} - C while loop
  • for index, value := range THING {} or for index := range THING {} or for range THING {}. Range can iterate over:
    • array/slice (index, value)
    • string (index, value) - this is unicode code points ("runes") and not bytes
    • map (key, value) - this is in random order
    • channel (e, N/A) - received elements of a channel
    • Since 1.22: int (index, N/A) - from 0 to N-1
    • Since 1.23: function (T1, T2) - the function is called with a "loop body" function (yield); the iterator calls yield once per value, and yield returns whether to keep iterating
  • Note that break and continue work in all of these forms

Q4: What are the signatures of range, if it's a function?

It's not a function; it's a keyword (p27, and for Go 1.4 see also the gotchas on p141). See Q3 for all the range variants, and Q18 for function overloading in general.

Q5: Why does Go say -0 is not equal to 0 in the following code?

var z float64
fmt.Println(-z) // Prints -0

IEEE 754 defines a negative zero. Positive and negative zero compare equal, so code will generally work as you expect. Go chooses to print "-0" rather than "0" for this value in format strings, while some other languages print "0" for both.

Additional discoveries:

  • int(-z) is 0
  • the constant -0.0 is positive zero (!)

Q6: (p98) Why does ReadRune() on invalid Unicode return the replacement char with length 1? The replacement char has UTF-8 byte length 3. Is this a deliberate signal value?

Yes (no citation)

Q7: What happens if you convert Inf, -Inf, NaN, or a float too large to fit into an int, to an int? Book claims conversions don't panic.

All of them are converted to:

  • uint/uint64: 2^63 = 9223372036854775808
  • int/int64: -2^63 = -9223372036854775808 (even +Inf and 1e200)

I don't know why these particular values; I have asked on Stack Overflow. (The spec says the result of an out-of-range float-to-int conversion is implementation-dependent, so presumably these are whatever the amd64 conversion instructions produce.)

Q8: In Go, can you marshal functions or closures?

No.

Reflect does not support it (and so neither does json.Marshal, etc.). I couldn't immediately come up with a way even to distinguish closures from non-closures, or to get the name of a function. You can get a function pointer and then do some heuristics to get the name, maybe.
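
Here is that heuristic sketched out, using reflect and runtime.FuncForPC; closures just report autogenerated names like main.main.func1, which is as close to "detecting a closure" as I got:

package main

import (
    "fmt"
    "reflect"
    "runtime"
)

func hello() {}

// funcName resolves a function value to its linker name.
func funcName(f any) string {
    pc := reflect.ValueOf(f).Pointer()
    return runtime.FuncForPC(pc).Name()
}

func main() {
    fmt.Println(funcName(hello)) // main.hello
    f := func() {}
    fmt.Println(funcName(f)) // main.main.func1 (autogenerated)
}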

Q9.1: How do map literals work for non-strings?

map[Point]string{Point{0, 0}: "orig"}
    or
map[Point]string{{0, 0}: "orig"} // Names can be left out of keys or values in map literals

Q9.2: Can I make user types with this mechanism? (ex. my own literal initialization)

No. Literals are only for built-in types, and the mechanism is not extensible. (But you can have the underlying type be a map and initialize your type with one.)
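
For example (my sketch, reusing Point from above), a named type whose underlying type is a map accepts the map literal directly:

type Grid map[Point]string

// usage: g := Grid{{0, 0}: "orig"} -- initialized exactly like the underlying map type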

Q10: Struct fields can have metadata ("struct tags"). Can whole types?

No.

Q11: How does ... variadic notation fail if the slice can be too short to fill all arguments? Is it only allowed for the variadic argument, or can it span multiple?

Yeah, you have to match it with the variadic argument, so it can't be too short or span several parameters.

Q12: Thompson, Pike, Kernighan, Ritchie -- fill in a Venn diagram of what they made/wrote.

  • Ken Thompson: B, Unix, Plan 9, Go, regexes, UTF-8, QED, ed, chess endgames, Inferno, "Reflections on Trusting Trust"
  • Dennis Ritchie: B, C, Unix (inc. man pages?), Plan 9, Inferno, Limbo, "The C Programming Language"
  • Brian Kernighan: awk, "The C Programming Language" (including "Hello, world!"), "The Go Programming Language", "The Elements of Programming Style", "The Practice of Programming", "The Unix Programming Environment"
  • Rob Pike: Plan 9, Go, Inferno, Limbo, Newsqueak, sam, acme, Sawzall, "The Unix Programming Environment", "The Practice of Programming"

Q13: What order are deferreds called in?

Last in, first out. Then exit the function, and so on up the stack.
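
For example:

func main() {
    defer fmt.Println("deferred first, printed last")
    defer fmt.Println("deferred second, printed first")
}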

Q14: What happens if a panic happens, a deferred is called, and the deferred panics?

It prints the nested panics informationally, but continues to pop the deferreds.

Q15: map[x] = y panics if map is a nil map, but slice = append(slice, 1) works fine if slice is a nil slice. Why? I feel like I'm being nickel-and-dimed by Go that the zero value panics.

Both slices and maps suck if they're nil. It's just that slices are so bad (the normal use case of appending panics even for non-nil slices) that they added a library function, append, which happens to deal with the nil case too.

You can write a map_set which returns a new map much like append. You can't write a better map, because there's no operator overloading (see also Q17)
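
Here's a sketch of that hypothetical map_set (the name and shape are mine), which is easy to write now that there are generics:

// mapSet is append for maps: it allocates the map if nil, sets the key, and returns the map.
func mapSet[K comparable, V any](m map[K]V, k K, v V) map[K]V {
    if m == nil {
        m = make(map[K]V)
    }
    m[k] = v
    return m
}

// usage: m = mapSet(m, "a", 1) -- works even when m is nil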

Q16: Why is the *p vs p method consistency principle a thing?

Because a.Method() notation sugars between the two, but interface satisfaction doesn't. You want at least one of *p and p to support the interface.
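
A quick illustration of the asymmetry (my sketch):

type Stringer interface{ String() string }

type T struct{}

func (t *T) String() string { return "T" }

func demo() {
    var t T
    _ = t.String()      // fine: the sugar takes &t automatically
    var s Stringer = &t // fine: *T implements Stringer
    _ = s
    // var s2 Stringer = t // compile error: T's method set lacks String
}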

Q17: Is there operator overloading?

No.

And Go has a broader principle that none of the core language calls any specific method name (String(), Error(), etc), which came up in the 1.23 iterator design.

Q18: Is there function overloading? (range, map.get, json.Marshal, type assertion)

Map lookup, type assertion, and channel receive are keyword-level overloading, not functions. They are special cases.

In general, a function has to take the same number of inputs and return the same number of outputs, of the same types. There is one exception, which is that one of the inputs can be variadic--for example, the built-in function make.

1.6 (2016) answer: BUT, you can "return" a generic type like interface{} (which the user has to cast unsafely to the right type) or modify one of the inputs (which can be something like interface{}). The latter is how json.Unmarshal works and knows what type to deserialize into. To complement this, you can do runtime inspection of types through a type switch or the reflect package.

1.18 (2022) answer: Same for number of arguments, but functions can now be generic (ex. func(x A) A for a type parameter A). If only the return type varies, you can use named returns to do stuff with the return type. See also Q24.

Q19: Does Go have parametric polymorphism?

1.6 (2016): No.

1.18 (2022): Yes.

Q21: Can I extend someone else's package after the fact? (ex. add new methods to json, perhaps to make it support some interface)

No. (But you can do type and interface embedding.)

Q22: What happens if I call defer inside a defer function or during a panic?

It works normally, either way.

If you create an infinite loop of deferred functions (with or without infinite panics) it overflows the stack, and it's not immediately obvious that it happened mid-panic.

Q23: (p208) Why does the .() type assertion return one OR two things depending on context? This did not seem to be covered under multiple return assignments.

See Q18.

Q24: Can type switching do slices, maps, arrays, etc? (p212)

1.6 (2016): No. You need to use reflection.

1.18 (2022): Unsure. Generics were introduced, and I don't know how they interface with type switching. I think type switches only take (fully-specified) concrete types in the case statements?

Q25: Does Go have a preprocessor or macros?

No to both.

Q26: TODO: Read proposal that caused unix pipes

There wasn't a written one, I was misremembering Douglas McIlroy's suggestions as being a formal memo. The v3 vs v4 pipeline description seems interesting to compare, however. See v3, 1973 notation (p121-123, 3 pages) vs v4, 1973 (p98, one paragraph).

Q27: Is 'make' a keyword? What args does it take for each type? (Can I change what it takes for my types)

Both make and new are built-in functions, not keywords. make takes a type, and optionally size parameters, and returns a value of that type. new takes a type and returns a pointer to a new zeroed variable of that type.

  • make(CHANNEL_TYPE, size) - size defaults to 0
  • make(SLICE_TYPE, size, capacity) - capacity defaults to size; size is required
  • make(MAP_TYPE, starting size) - starting size defaults to something reasonable
  • new(TYPE) - only one form
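
Spelled out:

ch := make(chan int)      // unbuffered channel (size 0)
ch2 := make(chan int, 8)  // buffered channel
s := make([]int, 3)       // slice: len 3, cap 3
s2 := make([]int, 3, 10)  // slice: len 3, cap 10
m := make(map[string]int) // empty map, ready for writes (not nil)
p := new(int)             // *int pointing at a fresh zeroed int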

Q28: Can you write 'map' in Go? (or something to join two channels)

1.6 (2016): Only awkwardly, using reflection (see Q19). Map could have the signature: map(in_list interface{}, f interface{}, out_list interface{})

1.18 (2022): Yes, both. Generics got added.
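
For instance, map is now a few lines (my sketch; as far as I know the standard library still doesn't ship one as of 1.24):

// Map applies f to each element of xs and collects the results.
func Map[A, B any](xs []A, f func(A) B) []B {
    ys := make([]B, 0, len(xs))
    for _, x := range xs {
        ys = append(ys, f(x))
    }
    return ys
}

// usage: Map([]int{1, 2, 3}, func(x int) int { return x * x })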

Q29: Are CSP in Go + Erlang basically the same model?

Not sure, didn't look this one up. But basically no, even if the deeper model is the same.

  • Erlang has out-of-order reading, indefinitely growing channel size, one unidirectional 'channel' per process, and the notion of 'links' between processes to cause cascading failure.
  • Go has channel closing, and the notion of a specific channel size (which defaults to 0), so it's more synchronous by default.

Q30: Why is there a & in memo := &Memo{request: make(chan request)} on p278, when I thought you couldn't address constants (p159)?

It's a special case for & with composite literals (and for new). From Stack Overflow, quoting the spec:

Calling the built-in function new or taking the address of a composite literal allocates storage for a variable at run time. Such an anonymous variable is referred to via a (possibly implicit) pointer indirection.
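
So these two forms allocate the same way (sketch, using the book's Memo type):

memo := &Memo{request: make(chan request)} // address of a composite literal

memo2 := new(Memo) // same allocation, in two steps
memo2.request = make(chan request)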

Suggested exercise 31: (p280) Test # of goroutines and stack sizes before crash

Knock yourself out.

Suggested exercise 32: Test # of bits in an int/uint

^uint(0) >> 63 == 1 // true iff uint is 64 bits wide: ^uint(0) is all ones, so the shift leaves 1 only if there are 64 bits

Q33: How do you detect int overflow (signed or unsigned) in Go?

You can't, at least not directly. There are libraries for it.
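
Since Go defines signed integers to wrap on overflow, you can also check by hand. A common idiom (mine, not from the book):

// addOverflows reports whether a+b overflows int64.
// Go guarantees two's-complement wrapping, so the comparison is well-defined.
func addOverflows(a, b int64) bool {
    sum := a + b
    return (b > 0 && sum < a) || (b < 0 && sum > a)
}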


While reading the book, three big problems in Go popped out at me.

  • The gotchas around for-loop scoping (fixed in 2024)
  • The lack of generics looked really painful (fixed in 2022). Functional programming looked pretty much impossible (annoying, since Go lets you pass around functions and even closures), and it looked hard to glue together channels at a high level. The book's memoization example was pretty bad. This all seems mostly fixed (although I'm not sure how to test "A is a B" for non-concrete B at runtime).
  • The number of built-in panics looked bad. In particular, I thought the default value for map being nil, which panics when you try to insert something, was a dumb default. Now that I've learned more, I still think it's a dumb default, and the default slice is dumb too.

Adding generics to the language made me much more likely to give it a whirl.

References:

[1]: https://go.dev/play/ "The Go Playground"

[2]: https://go.dev/doc/#references "The Go Documentation"

[3]: The Go Programming Language, by Alan A. A. Donovan and Brian W. Kernighan


Hack-A-Day, Day 26: No Room for Error

I made a coding challenge, vaguely dressed up as a game. Your goal is to complete simple coding challenges, with a major twist--you only get one try. You can only hit RUN once. (Well, actually you can hit RUN more than once. But it gets marked as failed.)

The game is called "No Room For Error".

I put together about 10 challenges, together with some story narration.

The game problems are now tested.

You can play online. The code is on GitHub.


Repulsive Dots

Lately I’ve been messing about in Godot, a framework for making video games (similar to Unity).

I wanted to make a 3D game. In my game, you live in a geodesic dome, and can’t go outside, because mumble mumble mumble poisonous atmosphere?

A geodesic dome, I learned, is related to the icosahedron, or d20 from RPGs.

A simple dome is the top half of the icosahedron. As they get more complex, you divide each triangle into more and more smaller triangles.

Icosahedron getting more and more detailed. Geodesic domes are the top half of each sphere.

So to make a nice geodesic dome, we could find one (I failed), make one in Blender (too hard), or use some math to generate one in Godot. And to do that math, we need to know the list of 20 icosahedron faces. Which basically just needs the list of the 12 vertices!

Now, obviously you could look up the vertices, but I thought of a more fun way. Let’s put 12 points on a sphere, make them all repel each other (think magnetically, I guess), and see where on the sphere they slide to. Maybe they will all be spaced out evenly in the right places. Well, here’s what it looks like:

So… kinda? It was certainly entertaining.

By the way, the correct coordinates for the vertices of an icosahedron inside a unit sphere are:

  • the top at (0, 1, 0)
  • the bottom at (0, -1, 0)
  • 10 equally spaced points around a circle. They alternate going up and down below the center line:
    (cos(angle), ±1/2, sin(angle)), projected onto the sphere -- i.e. (2/√5 · cos(angle), ±1/√5, 2/√5 · sin(angle))
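
Here's a quick sketch generating all 12 unit-sphere vertices from that description (mine, in Go rather than GDScript):

package main

import (
    "fmt"
    "math"
)

func main() {
    verts := [][3]float64{{0, 1, 0}, {0, -1, 0}} // top and bottom
    for i := 0; i < 10; i++ {
        angle := 2 * math.Pi * float64(i) / 10
        y := 1 / math.Sqrt(5)
        if i%2 == 1 {
            y = -y // alternate above and below the center line
        }
        r := 2 / math.Sqrt(5) // ring radius, so r*r + y*y == 1
        verts = append(verts, [3]float64{r * math.Cos(angle), y, r * math.Sin(angle)})
    }
    fmt.Println(verts)
}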

Testing scrapers faster

Recently I wrote a scraper. First, I downloaded all the HTML files. Next, I wanted to parse the content. However, real world data is pretty messy. I would run the scraper, and it would get partway through the file and fail. Then I would improve it, and it would get further and fail. I’d improve it more, and it would finish the whole file, but fail on the fifth one. Then I’d re-run things, and it would fail on file #52, #1035, and #553,956.

To make testing faster, I added a scaffold. Whenever my parser hit an error, it would print the filename (for me, the tester) and record the filename to an error log. Then, it would immediately exit. When I re-ran the parser, it would test all the files where it had hit a problem first. That way, I didn’t have to wait 20 minutes until it got to the failure case.

import json
import os

import tqdm  # progress bar library

# parse_file(file_object) is the scraper's parsing function, defined elsewhere.

if __name__ == "__main__":
    if os.path.exists("failures.log"):
        # Quicker failures 
        with open("failures.log", "r") as f:
            failures = set([x.strip() for x in f])
        for path in tqdm.tqdm(failures, desc="re-checking known tricky files"):
            try:
                with open(path) as input:
                    parse_file(input)
            except Exception:
                print(path, "failed again (already failed once")
                raise

    paths = []
    for root, dirs, files in os.walk("html"):
        for file in sorted(files):
            path = os.path.join(root, file)
            paths.append(path)
    paths.sort()

    with open("output.json", "w") as out:
        for path in tqdm.tqdm(paths, desc="parse files"):  # tqdm is just a progress bar; plain 'for path in paths:' works too
            with open(path, "r") as input:
                try:
                    result = parse_file(input)
                except Exception:
                    print(path, "failed, adding to quick-fail test list")
                    with open("failures.log", "a") as fatal:
                        print(path, file=fatal)
                    raise
                json.dump(result, out, sort_keys=True) # my desired output is one JSON dict per line
                out.write("\n")

Crawling Etiquette

I participate in a mentoring program, and recently one of the people I mentor asked me about whether it was okay to crawl something. I thought I would share my response, which is posted below nearly verbatim.

For this article, I’m skipping the subject of how to scrape websites (as off-topic), or how to avoid bans.

People keep telling me that if I scrape pages like Amazon that I’ll get banned. I definitely don’t want this to happen! So, what is your opinion on this?

Generally bans are temporary (a day to two weeks). I’d advise getting used to it, if you want to do serious scraping! If it would be really inconvenient, either don’t scrape the site or learn to use a secondary IP, so when your scraper gets banned, you can still use the site as a user.

More importantly than getting banned, you should learn about why things like bans are in place, because they’re not easy to set up–someone decided it was a good idea. Try to be a good person. As a programmer, you can cause a computer to blindly access a website millions of times–you get a big multiplier on anything a normal person can do. As such, you can cause the owners and users of a site problems, even by accident. Learn scraping etiquette, and always remember there’s an actual computer sitting somewhere, and actual people running the site.

That said, there’s a big difference between sending a lot of traffic to a site that hosts local chili cookoff results, and amazon.com. You could make the chili cookoff site hard to access, or run up a small bill for the owners, if you screw up enough, while realistically there’s nothing you can do to slow down Amazon.com even if you tried.

Here are a couple reasons people want to ban automated scraping:

  1. It costs them money (bandwidth). Or, it makes the site unusable because too many “people” (all you) are trying to access it at once (congestion). Usually, it costs them money because the scraper is stupid–it’s something like a badly written search engine, which opens up every comment in a blog as a separate page, or opens up an infinite series of pages. For example, I host a bunch of large binaries (linux installers–big!), and I’ve had a search engine try to download every single one, once an hour. As a scraper, you can avoid causing these problems by
    • rate-limiting your bot (ex. only scraping one page every 5-10 seconds, so you don’t overload their server). This is a good safety net–no matter what you do, you can’t break things too badly. If you’re downloading big files, you can also rate-limit your bandwidth or limit your total bandwidth quota. (There’s a sketch of this after this list.)
    • examining what your scraper is doing as it runs (so you don’t download a bunch of unnecessary garbage, like computer-generated pages or a nearly-identical page for every blog comment)
    • obeying robots.txt, which you can probably get a scraping framework to do for you. you can choose to ignore robots.txt if you think you have a good reason to, but make sure you understand why robots.txt exists before you decide.
    • testing the site while you’re scraping by hand or with a computerized timer. If you see the site do something like load slower (even a little) because of what you’re doing, stop your scraper, and adjust your rate limit to be 10X smaller.
    • make your scraper smart. download only the pages you need. if you frequently stop and restart the scraper, have it remember the pages you downloaded–use some form of local cache to avoid re-downloading things. if you need to re-crawl (for example to maintain a mirror) pass if-modified-since HTTP headers.
    • declare an HTTP user-agent, which explains what you’re doing and how to contact you (email or phone) in case there is a problem. I’ve never had anyone actually contact me, but as a site admin I have looked at user agents.
    • probably some more stuff I can’t think of off the top of my head
  2. They want to keep their information secret and proprietary, because having their information publicly available would lose them money. This is the main reason Amazon will ban you–they don’t want their product databases published. My personal ethics says I generally ignore this consideration, but you may decide differently.
  3. They have a problem with automated bots posting spam or making accounts. Since you’re not doing either, this doesn’t really apply to you, but your program may be caught by the same filters trying to keep non-humans out.
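
As promised above, here's a minimal sketch of the rate-limiting and user-agent advice (in Go; the URLs and contact info are placeholders):

package main

import (
    "fmt"
    "net/http"
    "time"
)

func main() {
    urls := []string{"https://example.com/a", "https://example.com/b"} // placeholders
    tick := time.Tick(5 * time.Second) // at most one request per 5 seconds

    for _, u := range urls {
        <-tick // block until we're allowed to fetch again

        req, err := http.NewRequest("GET", u, nil)
        if err != nil {
            fmt.Println(u, err)
            continue
        }
        // Identify yourself and give site admins a way to reach you.
        req.Header.Set("User-Agent", "my-crawler/0.1 (contact: me@example.com)")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            fmt.Println(u, err)
            continue
        }
        resp.Body.Close()
        fmt.Println(u, resp.Status)
    }
}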

For now I would advise not yet doing any of the above, because you’re basically not doing serious scraping yet. Grabbing all the pages on xkcd.com is fine, and won’t hurt anyone. If you’re going to download more than (say) 10,000 URLs per run, start looking at the list above. One exception–DO look at what your bot does by hand (the list of URLs, and maybe the HTML results), because it will be educational.

Also, in my web crawler project I eventually want to grab the text on each page crawled and analyze it using the requests library. Is something like this prohibited?

Prohibited by whom? Is it against an agreement you signed without reading with Amazon? Is it against US law? Would Amazon rather you didn’t, while having no actual means to stop you? These are questions you’ll have to figure out for yourself, and how much you care about each answer. You’ll also find the more you look into it that none of the three have very satisfactory answers.

The answer of “what bad thing might happen if I do this” is perhaps less satisfying if you’re trying to uphold what you perceive as your responsibilities, but easier to answer.

These are the things that may happen if you annoy a person or company on the internet by scraping their site. What happens will depend both on what you do, and what entity you are annoying (more on the second). Editor’s note: Some of the below is USA-specific, especially the presence/absence of legal or government action.

  • You may be shown CAPTCHAs to see if you are a human
  • Your scraper’s IP or IP block may be banned
  • You or your scraper may be blocked in some way you don’t understand
  • Your account may be deleted or banned (if your scraper uses an account, and rarely even if not)
  • They may yell at you, send you an angry email, or send you a polite email asking you to stop and/or informing you that you’re banned and who to contact if you’d like to change that
  • You may be sent a letter telling you to stop by a lawyer (a cease-and-desist letter), often with a threat of legal action if you do not
  • You may be sued. This could be either a legitimate attempt to sue you, or a sort of extra-intimidating cease-and-desist letter. The attempt could be successful, unsuccessful but require you to show up in court, or something you can ignore altogether.
  • You may be charged with some criminal charge such as computer, wire, or mail fraud. The only case I’m aware of offhand is Aaron Swartz
  • You may be brought up on some charge by the FBI, which will result in your computers being taken away and not returned, and possibly jailtime. This one will only happen if you are crawling a government site (and is not supposed to happen ever, but that’s the world we live in).

For what it’s worth, so far I have gotten up to the “polite email” section in my personal life. I do a reasonable amount of scraping, mostly of smaller sites.

[… section specific to Amazon cut …]

Craigslist, government sites, and traditional publishers (print, audio, and academic databases) are the only companies I know of that aggressively go after scrapers through legal means, instead of technical means. Craigslist will send you a letter telling you to stop first.

What a company will do once you publicly post all the information on their site is another matter, and I have less advice there. There are several sites that offer information about historical Amazon prices, for what that’s worth.

You may find this article interesting (but unhelpful) if you are concerned about being sued. Jason Scott is one of the main technical people at the Internet Archive, and people sometimes object to things he posts online.

In my personal opinion, suing people or bringing criminal charges does not work in general, because most people scraping do not live in the USA, and may use technical means to disguise who they are. Scrapers may be impossible to sue or charge with anything. In short, a policy of trying to sue people who scrape your site will result in your site still being scraped. Also, most people running a site don’t have the resources to sue anyone in any case. So you shouldn’t expect this to be a common outcome, but basically a small percentage of people (mostly crackpots) and companies (RIAA and publishers) may.


KISS vs DRY

The best practice or goal emphasized above with respect to templates and views is KISS and DRY. As long as the implementation does not become overly complex and difficult to grok, keep the template code DRY, otherwise KISS principle overrides the need to have template code that does not repeat itself.

Vertebrae Framework

A nice illustration of conflicting positive principles and resolution.
