go_shell/go_shell.go
2015-03-01 12:19:53 +01:00

455 lines
19 KiB
Go
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

// Tutorial - Write a Shell in Go - 28 Feb 2015 - Loic Nageleisen
//
// Let's be honest: this tutorial draws heavily from Stephen Brennen's
// "Write a Shell in C" tutorial, available here:
//
// http://stephen-brennan.com/2015/01/16/write-a-shell-in-c/
//
// Therefore, all the credit should go to him, as this is mostly the same
// tutorial, only Go-flavored. This is also an example of literate programming.
// In Go, a package is a collection of source files, and the main package is
// the one that produces an executable.
package main
// Imports allow to reference external packages. Here we only use packages of
// Go's standard library, and since we have only one file, a Go workspace is
// not even necessary.
//
// While we're at it, do yourself a favor and install gocode, gofmt and
// goimports, and take the time to integrate them properly in your favorite
// editor (which shouldn't take too long). As an example, I did not write a
// single of those import statements, goimports did it for me.
import (
"bufio"
"fmt"
"io"
"os"
"os/exec"
"sort"
"strings"
"syscall"
)
// Basic lifetime of a Shell
//
// Let's look a shell from the top down. A shell does three main things in its
// lifetime.
//
// - Initialise: In this step, a typical shell would read and execute its
// configuration files. These change aspects of the shells behavior.
// - Interpret: Next, the shell reads commands from stdin (which could be
// interactive, or a file) and executes them.
// - Terminate: After its commands are executed, the shell executes any
// shutdown commands, frees up any memory, and terminates.
// These steps are so general that they could apply to many programs, but were
// going to use them for the basis for our shell. Our shell will be so simple
// that there wont be any configuration files, and there wont be any shutdown
// command. So, well just call the looping function and then terminate. But in
// terms of architecture, its important to keep in mind that the lifetime of
// the program is more than just looping.
func main() {
// Load config files, if any.
// Run command loop.
shellLoop()
// Perfom any shutdown/cleanup.
os.Exit(0)
}
// Here you can see that I just came up with a function, shellLoop(), that will
// loop, interpreting commands. Well see the implementation of that next.
// Basic loop of a shell
//
// So weve taken care of how the program should start up. Now, for the basic
// program logic: what does the shell do during its loop? Well, a simple way to
// handle commands is with three steps:
//
// - Read: Read the command from standard input.
// - Parse: Separate the command string into a program and arguments.
// - Execute: Run the parsed command.
//
// Here, Ill translate those ideas into code for shellLoop():
func shellLoop() {
var line string
var args []string
var status int
for {
fmt.Printf("> ")
line = shellReadLine()
args = shellSplitLine(line)
status = shellExecute(args)
if status == 0 {
break
}
}
}
// Lets walk through the code. The first few lines are just declarations. The
// for loop can't be checking the status variable, as it can't execute once
// before checking its value, therefore we just break on condition a
// posteriori. Within the loop, we print a prompt, call a function to read a
// line, call a function to split the line into args, and execute the args.
// Note that were using a status variable returned by shellExecute() to
// determine when to exit.
// Reading a line
//
// Reading a line from stdin sounds so simple, but Go has no ready-made
// function for that. The result is something that feels slightly lower-level
// than what one might expect from a batteries-included language, yet is
// powerful enough to cover any case. In hindsight, this is typical of Go to
// empower its users with simple yet composable primitives instead of
// ready-made black boxes with many parameters.
//
// Indeed, the sad thing is that you dont know ahead of time how much text a
// user will enter into their shell. You cant simply allocate a slice and hope
// they dont exceed it. Instead, you need to start with a slice, and if they
// do exceed it, reallocate with more space. Fortunately, Go's append does
// exactly that, and we can even start from a nil slice. An interesting read
// about slices and append that I encourage you to read lies here:
// http://blog.golang.org/slices
// This is a common pattern in Go, and we'll use it to implement
// shellReadLine()
func shellReadLine() string {
var err error
var line, data []byte
isPrefix := true
r := bufio.NewReader(os.Stdin)
for isPrefix && err == nil {
data, isPrefix, err = r.ReadLine()
line = append(line, data...)
}
if err == io.EOF {
fmt.Println()
}
return string(line)
}
// The first part is a lot of declarations and initialisations. If you hadnt
// noticed, I prefer to keep the style of declaring variables before the rest
// of the code. I also favor declaring nil-valued vars first, then those whose
// type will be inferred from the assignment.
//
// The meat of the function is the for loop. In the loop, we read until a
// newline is reached, the bufio reader is filled (which sets isPrefix to
// true), or an error occurs (which sets err). The EOF error is special-cased
// as you want people typing ^D to have the shell exit. This is, in fact, not
// done here, and we simply print a newline to have the prompt displayed again
// properly.
//
// os.Stdin implements the source, bufio.Reader handles all the buffered
// reading magic, and append handles all the reallocations necessary to
// accomodate for the whole user input. Since the reader is unconcerned about
// encodings and returns raw bytes, we convert the line into a Unicode string
// right before returning. It does not look like much, but this is a prime
// example of composability at its best.
// Parsing the line
//
// OK, so if we look back at the loop, we see that we now have implemented
// shellReadline(), and we have the line of input. Now, we need to parse that
// line into a list of arguments. Im going to make a glaring simplification
// here, and say that we wont allow quoting or backslash escaping in our
// command line arguments. Instead, we will simply use whitespace to separate
// arguments from each other. So the command echo "this message" would not call
// echo with a single argument this message, but rather it would call echo with
// two arguments: '"this' and 'message"'.
//
// With those simplifications, all we need to do is split the string on
// whitespace. Fortunately, Go provides us with numerous string operations in
// the strings package.
func shellSplitLine(line string) []string {
return strings.Fields(line)
}
// So, once all is said and done, we have an array of tokens, ready to execute.
// Which begs the question, how do we do that?
// How shells start processes
//
// Now, were really at the heart of what a shell does. Starting processes is
// the main function of shells. So writing a shell means that you need to know
// exactly whats going on with processes and how they start. Thats why Im
// going to take us on a short diversion to discuss processes in Unix.
//
// There are only two ways of starting processes on Unix. The first one (which
// almost doesnt count) is by being Init. You see, when a Unix computer boots,
// its kernel is loaded. Once it is loaded and initialized, the kernel starts
// only one process, which is called Init. This process runs for the entire
// length of time that the computer is on, and it manages loading up the rest
// of the processes that you need for your computer to be useful.
//
// Since most programs arent Init, that leaves only one practical way for
// processes to get started: the fork() system call. When this function is
// called, the operating system makes a duplicate of the process and starts
// them both running. The original process is called the “parent”, and the new
// one is called the “child”. fork() returns 0 to the child process, and it
// returns to the parent the process ID number (PID) of its child. In essence,
// this means that the only way for new processes is to start is by an existing
// one duplicating itself.
//
// This might sound like a problem. Typically, when you want to run a new
// process, you dont just want another copy of the same program you want to
// run a different program. Thats what the exec() system call is all about. It
// replaces the current running program with an entirely new one. This means
// that when you call exec, the operating system stops your process, loads up
// the new program, and starts that one in its place. A process never returns
// from an exec() call (unless theres an error).
//
// With these two system calls, we have the building blocks for how most
// programs are run on Unix. First, an existing process forks itself into two
// separate ones. Then, the child uses exec() to replace itself with a new
// program. The parent process can continue doing other things, and it can even
// keep tabs on its children, using the system call wait().
//
// Phew! Thats a lot of information, but with all that background we will be
// able to make sense of what happens behind the scenes of the following code
// for launching a program. I say behind the scenes because Go wants to provide
// an interface as robust and system agnostic as is pragmatically possible.
func shellLaunch(args []string) int {
path, err := exec.LookPath(args[0])
if err != nil {
fmt.Fprintln(os.Stderr, err)
return 1
}
procAttr := os.ProcAttr{}
procAttr.Files = []*os.File{os.Stdin, os.Stdout, os.Stderr}
process, err := os.StartProcess(path, args, &procAttr)
if err != nil {
fmt.Fprintln(os.Stderr, err)
} else {
for {
state, err := process.Wait()
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
waitStatus, ok := state.Sys().(syscall.WaitStatus)
if ok && (waitStatus.Exited() || waitStatus.Signaled()) {
break
}
}
}
return 1
}
// Alright. This function takes the list of arguments that we created earlier.
// Then, it looks to resolve the absolute path for the program we want to run
// via the os/exec package. This is because Go does not expose system-specific
// calls such as execvp that delegate this resolution to the OS, instead
// exposing a single os.StartProcess function that looks to work uniformly
// across OSes (including non-Unix ones) and provide compatibility between
// goroutines (which may map onto threads), forks and exec. The os/exec package
// also provides a more object-oriented (and more potent) exec.Cmd type, but
// it's not as useful to us right now as the low-level os.StartProcess, because
// we handle Wait() differently.
//
// In addition to the path to the program and the list of arguments,
// StartProcess also takes an argument containing additional attributes in
// order to offer as much functionality as the Unix exec family calls. An
// exec'd program may inherit the environment variables its spawner runs in
// (which is the default), or it can be passed an arbitrary environment.
// Additionally, Unix's fork will make every opened file descriptor readily
// available to the child as well as the parent, and a typical mistake in C is
// to forget to close the descriptors that won't be useful to the child. Go
// takes a friendlier approach and all descriptors will be closed unless they
// are passed on to the child in the attributes. Since we want our spawned
// program to interact with the user's terminal, we'd rather pass along Stdin,
// Stdout and Stderr. It is also possible to pass additional, system specific
// attributes, but beware of your code's portability.
//
// Once the process is started, we wait for the command to finish running. We
// use Wait() to wait for the processs state to change. Unfortunately,
// Wait() returns a ProcessState that is system agnostic but quite light on
// information. Fortunately ProcessState is symmetric with ProcAttrs and can
// return a system-specific type, and since it comes from a generic function
// returning an interface{}, we need to do a type assertion. Processes can
// change state in lots of ways, and not all of them mean that the process has
// ended. A process can either exit (normally, or with an error code), or it
// can be killed by a signal. So, we use the methods of syscall.WaitStatus to
// wait until either the processes are exited or killed. Then, the function
// finally returns a 1, as a signal to the calling function that we should
// prompt for input again.
// Shell Builtins
//
// You may have noticed that the shellLoop() function calls shellExecute(), but
// above, we titled our function shellLaunch(). This was intentional! You see,
// most commands a shell executes are programs, but not all of them. Some of
// them are built right into the shell.
//
// The reason is actually pretty simple. If you want to change directory, you
// need to use the function Chdir(). The thing is, the current directory is a
// property of a process. So, if you wrote a program called cd that changed
// directory, it would just change its own current directory, and then
// terminate. Its parent processs current directory would be unchanged.
// Instead, the shell process itself needs to execute Chdir(), so that its own
// current directory is updated. Then, when it launches child processes, they
// will inherit that directory too.
//
// Similarly, if there was a program named exit, it wouldnt be able to exit
// the shell that called it. That command also needs to be built into the
// shell. Also, most shells are configured by running configuration scripts,
// like ~/.bashrc. Those scripts use commands that change the operation of the
// shell. These commands could only change the shells operation if they were
// implemented within the shell itself.
//
// So, it makes sense that we need to add some commands to the shell itself.
// The ones I added to my shell are cd, pwd, exit, and help. Here are their
// function implementations below:
var shellBuiltIn map[string](func([]string) int)
func init() {
shellBuiltIn = map[string](func([]string) int){
"cd": shellCd,
"pwd": shellPwd,
"exit": shellExit,
"help": shellHelp,
}
}
func shellCd(args []string) int {
if len(args) < 2 {
fmt.Fprintln(os.Stderr, "cd: missing argument")
} else if err := os.Chdir(args[1]); err != nil {
fmt.Fprintln(os.Stderr, err)
}
return 1
}
func shellPwd(args []string) int {
dir, err := os.Getwd()
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
fmt.Println(dir)
return 1
}
func shellHelp(args []string) int {
var keys []string
for k := range shellBuiltIn {
keys = append(keys, k)
}
sort.Strings(keys)
for _, name := range keys {
fmt.Println(name)
}
return 1
}
func shellExit(args []string) int {
return 0
}
// There are three parts to this code. The first part is declaring the map that
// will resolve our builtin names to their functions. This is so that, in the
// future, builtin commands can be added simply by modifying this map, rather
// than editing a large switch statement somewhere in the code.
//
// While tempting, initialising shellBuiltIn with a literal would cause a
// compile-time error because of an initialisation loop with shellHelp. This is
// where the second part comes into play: we delegate our map initialisation to
// the package's init function, which is called even before the main function
// is called. Be careful though, as if your package spans multiple Go source
// files, their init() function has no execution ordering guarantee.
//
// Finally, I implement each function. The shellCd() function first checks that
// its second argument exists, and prints an error message if it doesnt. Then,
// it calls os.Chdir(), checks for errors, and returns. The help function
// prints a nice message and the names of all the builtins. The pwd function
// prints the current working directory, and the exit function returns 0, as a
// signal for the command loop to terminate.
//
// Using a map has an interesting effect, obvious to anyone used to similar
// hash table data structures: the ordering is not guaranteed. Go goes as far
// as voluntarily ensuring iterating over a map's range yields a random
// ordering so that nobody relies on a providential ordering due to a
// particular runtime implementation. Once copied, sorting the keys
// alphabetically is done in place by the sort package, with a ready-made
// string sorter.
// Putting together builtins and processes
//
// The last missing piece of the puzzle is to implement shellExecute(), the
// function that will either launch either a builtin, or a process. If youre
// reading this far, youll know that weve set ourselves up for a really
// simple function:
func shellExecute(args []string) int {
if len(args) == 0 {
return 1
}
if builtIn := shellBuiltIn[args[0]]; builtIn != nil {
return builtIn(args)
}
return shellLaunch(args)
}
// All this does is check if the command exists as a builtin, and if so, run
// it. If it doesnt match a builtin, it calls shellLaunch() to launch the
// process. The one caveat is that args might just contain an empty array, if
// the user entered an empty string, or just whitespace. So, we need to check
// for that case at the beginning.
// Putting it all together
//
// Thats all the code that goes into the shell. If youve read along, you
// should understand completely how the shell works. To try it out (on a Unix
// machine), just `go run` this file!
// Wrap up
//
// If you read this and wondered how in the world I knew how to use those
// packages, the answer is simple: godoc. And while golang.org is readily
// available and histong godoc, running godoc locally from your workspace will
// include documentation from any package you have installed. If you know what
// youre looking for, and you just want to know how to use it, godoc is your
// best friend. If you want to dig a bit deeper, you will want to dive into the
// source code, which is both incredibly clean, throughtly documented and
// readily available straight from godoc too! And remember, godoc works both
// like `man` in a terminal as well as serving a browsable documentation from
// the comfort of your browser.
//
// Obviously, this shell isnt feature-rich. Some of its more glaring omissions
// are:
//
// - Only whitespace separating arguments, no quoting or backslash escaping.
// - No piping or redirection.
// - Few standard builtins.
// - No globbing.
//
// The implementation of all of this stuff is really interesting, but way more
// than I could ever fit into an article like this. If I ever get around to
// implementing any of them, Ill be sure to write a follow-up about it. But
// Id encourage any reader to try implementing this stuff yourself. If youre
// met with success, drop me a line in the comments below, Id love to see the
// code.
//
// And finally, thanks for reading this tutorial (if anyone did). I enjoyed
// writing it, and I hope you enjoyed reading it. Let me know what you think!