done

2025-12-06 02:14:39 +01:00 · 2015-03-01 12:19:53 +01:00 · 2015-03-01 12:19:53 +01:00 · 94b2c8300b
commit 94b2c8300b
2 changed files with 464 additions and 0 deletions
--- a/go_shell.go
+++ b/go_shell.go
@ -0,0 +1,455 @@
+// Tutorial - Write a Shell in Go - 28 Feb 2015 - Loic Nageleisen
+//
+// Let's be honest: this tutorial draws heavily from Stephen Brennen's
+// "Write a Shell in C" tutorial, available here:
+//
+// http://stephen-brennan.com/2015/01/16/write-a-shell-in-c/
+//
+// Therefore, all the credit should go to him, as this is mostly the same
+// tutorial, only Go-flavored. This is also an example of literate programming.
+
+// In Go, a package is a collection of source files, and the main package is
+// the one that produces an executable.
+
+package main
+
+// Imports allow to reference external packages. Here we only use packages of
+// Go's standard library, and since we have only one file, a Go workspace is
+// not even necessary.
+//
+// While we're at it, do yourself a favor and install gocode, gofmt and
+// goimports, and take the time to integrate them properly in your favorite
+// editor (which shouldn't take too long). As an example, I did not write a
+// single of those import statements, goimports did it for me.
+
+import (
+	"bufio"
+	"fmt"
+	"io"
+	"os"
+	"os/exec"
+	"sort"
+	"strings"
+	"syscall"
+)
+
+// Basic lifetime of a Shell
+//
+// Let's look a shell from the top down. A shell does three main things in its
+// lifetime.
+//
+// - Initialise: In this step, a typical shell would read and execute its
+//   configuration files. These change aspects of the shell’s behavior.
+// - Interpret: Next, the shell reads commands from stdin (which could be
+//   interactive, or a file) and executes them.
+// - Terminate: After its commands are executed, the shell executes any
+//   shutdown commands, frees up any memory, and terminates.
+
+// These steps are so general that they could apply to many programs, but we’re
+// going to use them for the basis for our shell. Our shell will be so simple
+// that there won’t be any configuration files, and there won’t be any shutdown
+// command. So, we’ll just call the looping function and then terminate. But in
+// terms of architecture, it’s important to keep in mind that the lifetime of
+// the program is more than just looping.
+
+func main() {
+	// Load config files, if any.
+
+	// Run command loop.
+	shellLoop()
+
+	// Perfom any shutdown/cleanup.
+
+	os.Exit(0)
+}
+
+// Here you can see that I just came up with a function, shellLoop(), that will
+// loop, interpreting commands. We’ll see the implementation of that next.
+
+// Basic loop of a shell
+//
+// So we’ve taken care of how the program should start up. Now, for the basic
+// program logic: what does the shell do during its loop? Well, a simple way to
+// handle commands is with three steps:
+//
+// - Read: Read the command from standard input.
+// - Parse: Separate the command string into a program and arguments.
+// - Execute: Run the parsed command.
+//
+// Here, I’ll translate those ideas into code for shellLoop():
+
+func shellLoop() {
+	var line string
+	var args []string
+	var status int
+
+	for {
+		fmt.Printf("> ")
+		line = shellReadLine()
+		args = shellSplitLine(line)
+		status = shellExecute(args)
+
+		if status == 0 {
+			break
+		}
+	}
+}
+
+// Let’s walk through the code. The first few lines are just declarations. The
+// for loop can't be checking the status variable, as it can't execute once
+// before checking its value, therefore we just break on condition a
+// posteriori. Within the loop, we print a prompt, call a function to read a
+// line, call a function to split the line into args, and execute the args.
+// Note that we’re using a status variable returned by shellExecute() to
+// determine when to exit.
+
+// Reading a line
+//
+// Reading a line from stdin sounds so simple, but Go has no ready-made
+// function for that. The result is something that feels slightly lower-level
+// than what one might expect from a batteries-included language, yet is
+// powerful enough to cover any case. In hindsight, this is typical of Go to
+// empower its users with simple yet composable primitives instead of
+// ready-made black boxes with many parameters.
+//
+// Indeed, the sad thing is that you don’t know ahead of time how much text a
+// user will enter into their shell. You can’t simply allocate a slice and hope
+// they don’t exceed it. Instead, you need to start with a slice, and if they
+// do exceed it, reallocate with more space. Fortunately, Go's append does
+// exactly that, and we can even start from a nil slice. An interesting read
+// about slices and append that I encourage you to read lies here:
+// http://blog.golang.org/slices
+// This is a common pattern in Go, and we'll use it to implement
+// shellReadLine()
+
+func shellReadLine() string {
+	var err error
+	var line, data []byte
+	isPrefix := true
+	r := bufio.NewReader(os.Stdin)
+
+	for isPrefix && err == nil {
+		data, isPrefix, err = r.ReadLine()
+		line = append(line, data...)
+	}
+
+	if err == io.EOF {
+		fmt.Println()
+	}
+
+	return string(line)
+}
+
+// The first part is a lot of declarations and initialisations. If you hadn’t
+// noticed, I prefer to keep the style of declaring variables before the rest
+// of the code. I also favor declaring nil-valued vars first, then those whose
+// type will be inferred from the assignment.
+//
+// The meat of the function is the for loop. In the loop, we read until a
+// newline is reached, the bufio reader is filled (which sets isPrefix to
+// true), or an error occurs (which sets err). The EOF error is special-cased
+// as you want people typing ^D to have the shell exit. This is, in fact, not
+// done here, and we simply print a newline to have the prompt displayed again
+// properly.
+//
+// os.Stdin implements the source, bufio.Reader handles all the buffered
+// reading magic, and append handles all the reallocations necessary to
+// accomodate for the whole user input. Since the reader is unconcerned about
+// encodings and returns raw bytes, we convert the line into a Unicode string
+// right before returning. It does not look like much, but this is a prime
+// example of composability at its best.
+
+// Parsing the line
+//
+// OK, so if we look back at the loop, we see that we now have implemented
+// shellReadline(), and we have the line of input. Now, we need to parse that
+// line into a list of arguments. I’m going to make a glaring simplification
+// here, and say that we won’t allow quoting or backslash escaping in our
+// command line arguments. Instead, we will simply use whitespace to separate
+// arguments from each other. So the command echo "this message" would not call
+// echo with a single argument this message, but rather it would call echo with
+// two arguments: '"this' and 'message"'.
+//
+// With those simplifications, all we need to do is split the string on
+// whitespace. Fortunately, Go provides us with numerous string operations in
+// the strings package.
+
+func shellSplitLine(line string) []string {
+	return strings.Fields(line)
+}
+
+// So, once all is said and done, we have an array of tokens, ready to execute.
+// Which begs the question, how do we do that?
+
+// How shells start processes
+//
+// Now, we’re really at the heart of what a shell does. Starting processes is
+// the main function of shells. So writing a shell means that you need to know
+// exactly what’s going on with processes and how they start. That’s why I’m
+// going to take us on a short diversion to discuss processes in Unix.
+//
+// There are only two ways of starting processes on Unix. The first one (which
+// almost doesn’t count) is by being Init. You see, when a Unix computer boots,
+// its kernel is loaded. Once it is loaded and initialized, the kernel starts
+// only one process, which is called Init. This process runs for the entire
+// length of time that the computer is on, and it manages loading up the rest
+// of the processes that you need for your computer to be useful.
+//
+// Since most programs aren’t Init, that leaves only one practical way for
+// processes to get started: the fork() system call. When this function is
+// called, the operating system makes a duplicate of the process and starts
+// them both running. The original process is called the “parent”, and the new
+// one is called the “child”. fork() returns 0 to the child process, and it
+// returns to the parent the process ID number (PID) of its child. In essence,
+// this means that the only way for new processes is to start is by an existing
+// one duplicating itself.
+//
+// This might sound like a problem. Typically, when you want to run a new
+// process, you don’t just want another copy of the same program – you want to
+// run a different program. That’s what the exec() system call is all about. It
+// replaces the current running program with an entirely new one. This means
+// that when you call exec, the operating system stops your process, loads up
+// the new program, and starts that one in its place. A process never returns
+// from an exec() call (unless there’s an error).
+//
+// With these two system calls, we have the building blocks for how most
+// programs are run on Unix. First, an existing process forks itself into two
+// separate ones. Then, the child uses exec() to replace itself with a new
+// program. The parent process can continue doing other things, and it can even
+// keep tabs on its children, using the system call wait().
+//
+// Phew! That’s a lot of information, but with all that background we will be
+// able to make sense of what happens behind the scenes of the following code
+// for launching a program. I say behind the scenes because Go wants to provide
+// an interface as robust and system agnostic as is pragmatically possible.
+
+func shellLaunch(args []string) int {
+	path, err := exec.LookPath(args[0])
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+		return 1
+	}
+
+	procAttr := os.ProcAttr{}
+	procAttr.Files = []*os.File{os.Stdin, os.Stdout, os.Stderr}
+
+	process, err := os.StartProcess(path, args, &procAttr)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+	} else {
+		for {
+			state, err := process.Wait()
+			if err != nil {
+				fmt.Fprintln(os.Stderr, err)
+			}
+
+			waitStatus, ok := state.Sys().(syscall.WaitStatus)
+			if ok && (waitStatus.Exited() || waitStatus.Signaled()) {
+				break
+			}
+		}
+	}
+
+	return 1
+}
+
+// Alright. This function takes the list of arguments that we created earlier.
+// Then, it looks to resolve the absolute path for the program we want to run
+// via the os/exec package. This is because Go does not expose system-specific
+// calls such as execvp that delegate this resolution to the OS, instead
+// exposing a single os.StartProcess function that looks to work uniformly
+// across OSes (including non-Unix ones) and provide compatibility between
+// goroutines (which may map onto threads), forks and exec. The os/exec package
+// also provides a more object-oriented (and more potent) exec.Cmd type, but
+// it's not as useful to us right now as the low-level os.StartProcess, because
+// we handle Wait() differently.
+//
+// In addition to the path to the program and the list of arguments,
+// StartProcess also takes an argument containing additional attributes in
+// order to offer as much functionality as the Unix exec family calls. An
+// exec'd program may inherit the environment variables its spawner runs in
+// (which is the default), or it can be passed an arbitrary environment.
+// Additionally, Unix's fork will make every opened file descriptor readily
+// available to the child as well as the parent, and a typical mistake in C is
+// to forget to close the descriptors that won't be useful to the child. Go
+// takes a friendlier approach and all descriptors will be closed unless they
+// are passed on to the child in the attributes. Since we want our spawned
+// program to interact with the user's terminal, we'd rather pass along Stdin,
+// Stdout and Stderr. It is also possible to pass additional, system specific
+// attributes, but beware of your code's portability.
+//
+// Once the process is started, we wait for the command to finish running. We
+// use Wait() to wait for the process’s state to change. Unfortunately,
+// Wait() returns a ProcessState that is system agnostic but quite light on
+// information. Fortunately ProcessState is symmetric with ProcAttrs and can
+// return a system-specific type, and since it comes from a generic function
+// returning an interface{}, we need to do a type assertion. Processes can
+// change state in lots of ways, and not all of them mean that the process has
+// ended. A process can either exit (normally, or with an error code), or it
+// can be killed by a signal. So, we use the methods of syscall.WaitStatus to
+// wait until either the processes are exited or killed. Then, the function
+// finally returns a 1, as a signal to the calling function that we should
+// prompt for input again.
+
+// Shell Builtins
+//
+// You may have noticed that the shellLoop() function calls shellExecute(), but
+// above, we titled our function shellLaunch(). This was intentional! You see,
+// most commands a shell executes are programs, but not all of them. Some of
+// them are built right into the shell.
+//
+// The reason is actually pretty simple. If you want to change directory, you
+// need to use the function Chdir(). The thing is, the current directory is a
+// property of a process. So, if you wrote a program called cd that changed
+// directory, it would just change its own current directory, and then
+// terminate. Its parent process’s current directory would be unchanged.
+// Instead, the shell process itself needs to execute Chdir(), so that its own
+// current directory is updated. Then, when it launches child processes, they
+// will inherit that directory too.
+//
+// Similarly, if there was a program named exit, it wouldn’t be able to exit
+// the shell that called it. That command also needs to be built into the
+// shell. Also, most shells are configured by running configuration scripts,
+// like ~/.bashrc. Those scripts use commands that change the operation of the
+// shell. These commands could only change the shell’s operation if they were
+// implemented within the shell itself.
+//
+// So, it makes sense that we need to add some commands to the shell itself.
+// The ones I added to my shell are cd, pwd, exit, and help. Here are their
+// function implementations below:
+
+var shellBuiltIn map[string](func([]string) int)
+
+func init() {
+	shellBuiltIn = map[string](func([]string) int){
+		"cd":   shellCd,
+		"pwd":  shellPwd,
+		"exit": shellExit,
+		"help": shellHelp,
+	}
+}
+
+func shellCd(args []string) int {
+	if len(args) < 2 {
+		fmt.Fprintln(os.Stderr, "cd: missing argument")
+	} else if err := os.Chdir(args[1]); err != nil {
+		fmt.Fprintln(os.Stderr, err)
+	}
+	return 1
+}
+
+func shellPwd(args []string) int {
+	dir, err := os.Getwd()
+	if err != nil {
+		fmt.Fprintln(os.Stderr, err)
+	}
+
+	fmt.Println(dir)
+	return 1
+}
+
+func shellHelp(args []string) int {
+	var keys []string
+	for k := range shellBuiltIn {
+		keys = append(keys, k)
+	}
+	sort.Strings(keys)
+
+	for _, name := range keys {
+		fmt.Println(name)
+	}
+	return 1
+}
+
+func shellExit(args []string) int {
+	return 0
+}
+
+// There are three parts to this code. The first part is declaring the map that
+// will resolve our builtin names to their functions. This is so that, in the
+// future, builtin commands can be added simply by modifying this map, rather
+// than editing a large switch statement somewhere in the code.
+//
+// While tempting, initialising shellBuiltIn with a literal would cause a
+// compile-time error because of an initialisation loop with shellHelp. This is
+// where the second part comes into play: we delegate our map initialisation to
+// the package's init function, which is called even before the main function
+// is called. Be careful though, as if your package spans multiple Go source
+// files, their init() function has no execution ordering guarantee.
+//
+// Finally, I implement each function. The shellCd() function first checks that
+// its second argument exists, and prints an error message if it doesn’t. Then,
+// it calls os.Chdir(), checks for errors, and returns. The help function
+// prints a nice message and the names of all the builtins. The pwd function
+// prints the current working directory, and the exit function returns 0, as a
+// signal for the command loop to terminate.
+//
+// Using a map has an interesting effect, obvious to anyone used to similar
+// hash table data structures: the ordering is not guaranteed. Go goes as far
+// as voluntarily ensuring iterating over a map's range yields a random
+// ordering so that nobody relies on a providential ordering due to a
+// particular runtime implementation. Once copied, sorting the keys
+// alphabetically is done in place by the sort package, with a ready-made
+// string sorter.
+
+// Putting together builtins and processes
+//
+// The last missing piece of the puzzle is to implement shellExecute(), the
+// function that will either launch either a builtin, or a process. If you’re
+// reading this far, you’ll know that we’ve set ourselves up for a really
+// simple function:
+
+func shellExecute(args []string) int {
+	if len(args) == 0 {
+		return 1
+	}
+
+	if builtIn := shellBuiltIn[args[0]]; builtIn != nil {
+		return builtIn(args)
+	}
+
+	return shellLaunch(args)
+}
+
+// All this does is check if the command exists as a builtin, and if so, run
+// it. If it doesn’t match a builtin, it calls shellLaunch() to launch the
+// process. The one caveat is that args might just contain an empty array, if
+// the user entered an empty string, or just whitespace. So, we need to check
+// for that case at the beginning.
+
+// Putting it all together
+//
+// That’s all the code that goes into the shell. If you’ve read along, you
+// should understand completely how the shell works. To try it out (on a Unix
+// machine), just `go run` this file!
+
+// Wrap up
+//
+// If you read this and wondered how in the world I knew how to use those
+// packages, the answer is simple: godoc. And while golang.org is readily
+// available and histong godoc, running godoc locally from your workspace will
+// include documentation from any package you have installed. If you know what
+// you’re looking for, and you just want to know how to use it, godoc is your
+// best friend. If you want to dig a bit deeper, you will want to dive into the
+// source code, which is both incredibly clean, throughtly documented and
+// readily available straight from godoc too! And remember, godoc works both
+// like `man` in a terminal as well as serving a browsable documentation from
+// the comfort of your browser.
+//
+// Obviously, this shell isn’t feature-rich. Some of its more glaring omissions
+// are:
+//
+// - Only whitespace separating arguments, no quoting or backslash escaping.
+// - No piping or redirection.
+// - Few standard builtins.
+// - No globbing.
+//
+// The implementation of all of this stuff is really interesting, but way more
+// than I could ever fit into an article like this. If I ever get around to
+// implementing any of them, I’ll be sure to write a follow-up about it. But
+// I’d encourage any reader to try implementing this stuff yourself. If you’re
+// met with success, drop me a line in the comments below, I’d love to see the
+// code.
+//
+// And finally, thanks for reading this tutorial (if anyone did). I enjoyed
+// writing it, and I hope you enjoyed reading it. Let me know what you think!