This commit is contained in:
Loic Nageleisen 2015-03-01 12:19:53 +01:00
commit 94b2c8300b
2 changed files with 464 additions and 0 deletions

455
go_shell.go Normal file
View file

@ -0,0 +1,455 @@
// Tutorial - Write a Shell in Go - 28 Feb 2015 - Loic Nageleisen
//
// Let's be honest: this tutorial draws heavily from Stephen Brennen's
// "Write a Shell in C" tutorial, available here:
//
// http://stephen-brennan.com/2015/01/16/write-a-shell-in-c/
//
// Therefore, all the credit should go to him, as this is mostly the same
// tutorial, only Go-flavored. This is also an example of literate programming.
// In Go, a package is a collection of source files, and the main package is
// the one that produces an executable.
package main
// Imports allow to reference external packages. Here we only use packages of
// Go's standard library, and since we have only one file, a Go workspace is
// not even necessary.
//
// While we're at it, do yourself a favor and install gocode, gofmt and
// goimports, and take the time to integrate them properly in your favorite
// editor (which shouldn't take too long). As an example, I did not write a
// single of those import statements, goimports did it for me.
import (
"bufio"
"fmt"
"io"
"os"
"os/exec"
"sort"
"strings"
"syscall"
)
// Basic lifetime of a Shell
//
// Let's look a shell from the top down. A shell does three main things in its
// lifetime.
//
// - Initialise: In this step, a typical shell would read and execute its
// configuration files. These change aspects of the shells behavior.
// - Interpret: Next, the shell reads commands from stdin (which could be
// interactive, or a file) and executes them.
// - Terminate: After its commands are executed, the shell executes any
// shutdown commands, frees up any memory, and terminates.
// These steps are so general that they could apply to many programs, but were
// going to use them for the basis for our shell. Our shell will be so simple
// that there wont be any configuration files, and there wont be any shutdown
// command. So, well just call the looping function and then terminate. But in
// terms of architecture, its important to keep in mind that the lifetime of
// the program is more than just looping.
func main() {
// Load config files, if any.
// Run command loop.
shellLoop()
// Perfom any shutdown/cleanup.
os.Exit(0)
}
// Here you can see that I just came up with a function, shellLoop(), that will
// loop, interpreting commands. Well see the implementation of that next.
// Basic loop of a shell
//
// So weve taken care of how the program should start up. Now, for the basic
// program logic: what does the shell do during its loop? Well, a simple way to
// handle commands is with three steps:
//
// - Read: Read the command from standard input.
// - Parse: Separate the command string into a program and arguments.
// - Execute: Run the parsed command.
//
// Here, Ill translate those ideas into code for shellLoop():
func shellLoop() {
var line string
var args []string
var status int
for {
fmt.Printf("> ")
line = shellReadLine()
args = shellSplitLine(line)
status = shellExecute(args)
if status == 0 {
break
}
}
}
// Lets walk through the code. The first few lines are just declarations. The
// for loop can't be checking the status variable, as it can't execute once
// before checking its value, therefore we just break on condition a
// posteriori. Within the loop, we print a prompt, call a function to read a
// line, call a function to split the line into args, and execute the args.
// Note that were using a status variable returned by shellExecute() to
// determine when to exit.
// Reading a line
//
// Reading a line from stdin sounds so simple, but Go has no ready-made
// function for that. The result is something that feels slightly lower-level
// than what one might expect from a batteries-included language, yet is
// powerful enough to cover any case. In hindsight, this is typical of Go to
// empower its users with simple yet composable primitives instead of
// ready-made black boxes with many parameters.
//
// Indeed, the sad thing is that you dont know ahead of time how much text a
// user will enter into their shell. You cant simply allocate a slice and hope
// they dont exceed it. Instead, you need to start with a slice, and if they
// do exceed it, reallocate with more space. Fortunately, Go's append does
// exactly that, and we can even start from a nil slice. An interesting read
// about slices and append that I encourage you to read lies here:
// http://blog.golang.org/slices
// This is a common pattern in Go, and we'll use it to implement
// shellReadLine()
func shellReadLine() string {
var err error
var line, data []byte
isPrefix := true
r := bufio.NewReader(os.Stdin)
for isPrefix && err == nil {
data, isPrefix, err = r.ReadLine()
line = append(line, data...)
}
if err == io.EOF {
fmt.Println()
}
return string(line)
}
// The first part is a lot of declarations and initialisations. If you hadnt
// noticed, I prefer to keep the style of declaring variables before the rest
// of the code. I also favor declaring nil-valued vars first, then those whose
// type will be inferred from the assignment.
//
// The meat of the function is the for loop. In the loop, we read until a
// newline is reached, the bufio reader is filled (which sets isPrefix to
// true), or an error occurs (which sets err). The EOF error is special-cased
// as you want people typing ^D to have the shell exit. This is, in fact, not
// done here, and we simply print a newline to have the prompt displayed again
// properly.
//
// os.Stdin implements the source, bufio.Reader handles all the buffered
// reading magic, and append handles all the reallocations necessary to
// accomodate for the whole user input. Since the reader is unconcerned about
// encodings and returns raw bytes, we convert the line into a Unicode string
// right before returning. It does not look like much, but this is a prime
// example of composability at its best.
// Parsing the line
//
// OK, so if we look back at the loop, we see that we now have implemented
// shellReadline(), and we have the line of input. Now, we need to parse that
// line into a list of arguments. Im going to make a glaring simplification
// here, and say that we wont allow quoting or backslash escaping in our
// command line arguments. Instead, we will simply use whitespace to separate
// arguments from each other. So the command echo "this message" would not call
// echo with a single argument this message, but rather it would call echo with
// two arguments: '"this' and 'message"'.
//
// With those simplifications, all we need to do is split the string on
// whitespace. Fortunately, Go provides us with numerous string operations in
// the strings package.
func shellSplitLine(line string) []string {
return strings.Fields(line)
}
// So, once all is said and done, we have an array of tokens, ready to execute.
// Which begs the question, how do we do that?
// How shells start processes
//
// Now, were really at the heart of what a shell does. Starting processes is
// the main function of shells. So writing a shell means that you need to know
// exactly whats going on with processes and how they start. Thats why Im
// going to take us on a short diversion to discuss processes in Unix.
//
// There are only two ways of starting processes on Unix. The first one (which
// almost doesnt count) is by being Init. You see, when a Unix computer boots,
// its kernel is loaded. Once it is loaded and initialized, the kernel starts
// only one process, which is called Init. This process runs for the entire
// length of time that the computer is on, and it manages loading up the rest
// of the processes that you need for your computer to be useful.
//
// Since most programs arent Init, that leaves only one practical way for
// processes to get started: the fork() system call. When this function is
// called, the operating system makes a duplicate of the process and starts
// them both running. The original process is called the “parent”, and the new
// one is called the “child”. fork() returns 0 to the child process, and it
// returns to the parent the process ID number (PID) of its child. In essence,
// this means that the only way for new processes is to start is by an existing
// one duplicating itself.
//
// This might sound like a problem. Typically, when you want to run a new
// process, you dont just want another copy of the same program you want to
// run a different program. Thats what the exec() system call is all about. It
// replaces the current running program with an entirely new one. This means
// that when you call exec, the operating system stops your process, loads up
// the new program, and starts that one in its place. A process never returns
// from an exec() call (unless theres an error).
//
// With these two system calls, we have the building blocks for how most
// programs are run on Unix. First, an existing process forks itself into two
// separate ones. Then, the child uses exec() to replace itself with a new
// program. The parent process can continue doing other things, and it can even
// keep tabs on its children, using the system call wait().
//
// Phew! Thats a lot of information, but with all that background we will be
// able to make sense of what happens behind the scenes of the following code
// for launching a program. I say behind the scenes because Go wants to provide
// an interface as robust and system agnostic as is pragmatically possible.
func shellLaunch(args []string) int {
path, err := exec.LookPath(args[0])
if err != nil {
fmt.Fprintln(os.Stderr, err)
return 1
}
procAttr := os.ProcAttr{}
procAttr.Files = []*os.File{os.Stdin, os.Stdout, os.Stderr}
process, err := os.StartProcess(path, args, &procAttr)
if err != nil {
fmt.Fprintln(os.Stderr, err)
} else {
for {
state, err := process.Wait()
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
waitStatus, ok := state.Sys().(syscall.WaitStatus)
if ok && (waitStatus.Exited() || waitStatus.Signaled()) {
break
}
}
}
return 1
}
// Alright. This function takes the list of arguments that we created earlier.
// Then, it looks to resolve the absolute path for the program we want to run
// via the os/exec package. This is because Go does not expose system-specific
// calls such as execvp that delegate this resolution to the OS, instead
// exposing a single os.StartProcess function that looks to work uniformly
// across OSes (including non-Unix ones) and provide compatibility between
// goroutines (which may map onto threads), forks and exec. The os/exec package
// also provides a more object-oriented (and more potent) exec.Cmd type, but
// it's not as useful to us right now as the low-level os.StartProcess, because
// we handle Wait() differently.
//
// In addition to the path to the program and the list of arguments,
// StartProcess also takes an argument containing additional attributes in
// order to offer as much functionality as the Unix exec family calls. An
// exec'd program may inherit the environment variables its spawner runs in
// (which is the default), or it can be passed an arbitrary environment.
// Additionally, Unix's fork will make every opened file descriptor readily
// available to the child as well as the parent, and a typical mistake in C is
// to forget to close the descriptors that won't be useful to the child. Go
// takes a friendlier approach and all descriptors will be closed unless they
// are passed on to the child in the attributes. Since we want our spawned
// program to interact with the user's terminal, we'd rather pass along Stdin,
// Stdout and Stderr. It is also possible to pass additional, system specific
// attributes, but beware of your code's portability.
//
// Once the process is started, we wait for the command to finish running. We
// use Wait() to wait for the processs state to change. Unfortunately,
// Wait() returns a ProcessState that is system agnostic but quite light on
// information. Fortunately ProcessState is symmetric with ProcAttrs and can
// return a system-specific type, and since it comes from a generic function
// returning an interface{}, we need to do a type assertion. Processes can
// change state in lots of ways, and not all of them mean that the process has
// ended. A process can either exit (normally, or with an error code), or it
// can be killed by a signal. So, we use the methods of syscall.WaitStatus to
// wait until either the processes are exited or killed. Then, the function
// finally returns a 1, as a signal to the calling function that we should
// prompt for input again.
// Shell Builtins
//
// You may have noticed that the shellLoop() function calls shellExecute(), but
// above, we titled our function shellLaunch(). This was intentional! You see,
// most commands a shell executes are programs, but not all of them. Some of
// them are built right into the shell.
//
// The reason is actually pretty simple. If you want to change directory, you
// need to use the function Chdir(). The thing is, the current directory is a
// property of a process. So, if you wrote a program called cd that changed
// directory, it would just change its own current directory, and then
// terminate. Its parent processs current directory would be unchanged.
// Instead, the shell process itself needs to execute Chdir(), so that its own
// current directory is updated. Then, when it launches child processes, they
// will inherit that directory too.
//
// Similarly, if there was a program named exit, it wouldnt be able to exit
// the shell that called it. That command also needs to be built into the
// shell. Also, most shells are configured by running configuration scripts,
// like ~/.bashrc. Those scripts use commands that change the operation of the
// shell. These commands could only change the shells operation if they were
// implemented within the shell itself.
//
// So, it makes sense that we need to add some commands to the shell itself.
// The ones I added to my shell are cd, pwd, exit, and help. Here are their
// function implementations below:
var shellBuiltIn map[string](func([]string) int)
func init() {
shellBuiltIn = map[string](func([]string) int){
"cd": shellCd,
"pwd": shellPwd,
"exit": shellExit,
"help": shellHelp,
}
}
func shellCd(args []string) int {
if len(args) < 2 {
fmt.Fprintln(os.Stderr, "cd: missing argument")
} else if err := os.Chdir(args[1]); err != nil {
fmt.Fprintln(os.Stderr, err)
}
return 1
}
func shellPwd(args []string) int {
dir, err := os.Getwd()
if err != nil {
fmt.Fprintln(os.Stderr, err)
}
fmt.Println(dir)
return 1
}
func shellHelp(args []string) int {
var keys []string
for k := range shellBuiltIn {
keys = append(keys, k)
}
sort.Strings(keys)
for _, name := range keys {
fmt.Println(name)
}
return 1
}
func shellExit(args []string) int {
return 0
}
// There are three parts to this code. The first part is declaring the map that
// will resolve our builtin names to their functions. This is so that, in the
// future, builtin commands can be added simply by modifying this map, rather
// than editing a large switch statement somewhere in the code.
//
// While tempting, initialising shellBuiltIn with a literal would cause a
// compile-time error because of an initialisation loop with shellHelp. This is
// where the second part comes into play: we delegate our map initialisation to
// the package's init function, which is called even before the main function
// is called. Be careful though, as if your package spans multiple Go source
// files, their init() function has no execution ordering guarantee.
//
// Finally, I implement each function. The shellCd() function first checks that
// its second argument exists, and prints an error message if it doesnt. Then,
// it calls os.Chdir(), checks for errors, and returns. The help function
// prints a nice message and the names of all the builtins. The pwd function
// prints the current working directory, and the exit function returns 0, as a
// signal for the command loop to terminate.
//
// Using a map has an interesting effect, obvious to anyone used to similar
// hash table data structures: the ordering is not guaranteed. Go goes as far
// as voluntarily ensuring iterating over a map's range yields a random
// ordering so that nobody relies on a providential ordering due to a
// particular runtime implementation. Once copied, sorting the keys
// alphabetically is done in place by the sort package, with a ready-made
// string sorter.
// Putting together builtins and processes
//
// The last missing piece of the puzzle is to implement shellExecute(), the
// function that will either launch either a builtin, or a process. If youre
// reading this far, youll know that weve set ourselves up for a really
// simple function:
func shellExecute(args []string) int {
if len(args) == 0 {
return 1
}
if builtIn := shellBuiltIn[args[0]]; builtIn != nil {
return builtIn(args)
}
return shellLaunch(args)
}
// All this does is check if the command exists as a builtin, and if so, run
// it. If it doesnt match a builtin, it calls shellLaunch() to launch the
// process. The one caveat is that args might just contain an empty array, if
// the user entered an empty string, or just whitespace. So, we need to check
// for that case at the beginning.
// Putting it all together
//
// Thats all the code that goes into the shell. If youve read along, you
// should understand completely how the shell works. To try it out (on a Unix
// machine), just `go run` this file!
// Wrap up
//
// If you read this and wondered how in the world I knew how to use those
// packages, the answer is simple: godoc. And while golang.org is readily
// available and histong godoc, running godoc locally from your workspace will
// include documentation from any package you have installed. If you know what
// youre looking for, and you just want to know how to use it, godoc is your
// best friend. If you want to dig a bit deeper, you will want to dive into the
// source code, which is both incredibly clean, throughtly documented and
// readily available straight from godoc too! And remember, godoc works both
// like `man` in a terminal as well as serving a browsable documentation from
// the comfort of your browser.
//
// Obviously, this shell isnt feature-rich. Some of its more glaring omissions
// are:
//
// - Only whitespace separating arguments, no quoting or backslash escaping.
// - No piping or redirection.
// - Few standard builtins.
// - No globbing.
//
// The implementation of all of this stuff is really interesting, but way more
// than I could ever fit into an article like this. If I ever get around to
// implementing any of them, Ill be sure to write a follow-up about it. But
// Id encourage any reader to try implementing this stuff yourself. If youre
// met with success, drop me a line in the comments below, Id love to see the
// code.
//
// And finally, thanks for reading this tutorial (if anyone did). I enjoyed
// writing it, and I hope you enjoyed reading it. Let me know what you think!