srcco (pronounced "source-co") is a literate-programming-style documentation generator that links up source code so you can jump to the definition of any function, type, or variable.
Built on top of srclib (https://srclib.org).
Inspired by Docco (http://jashkenas.github.io/docco/), Groc (http://nevir.github.io/groc/), and Gocco (http://nikhilm.github.io/gocco/).
Installation:
$ go get sourcegraph.com/sourcegraph/srcco
And install srclib:
$ go get sourcegraph.com/sourcegraph/srclib/cmd/src
# This will only pull down the Go toolchain.
$ src toolchain install-std --skip-ruby --skip-javascript --skip-python
Then call srcco like this in the directory you want to build:
$ srcco .
If you want to host your docs on GitHub Pages, run:
$ srcco -github-pages=true .
Usage: srcco [FLAGS] DIR
Generate documentation for the project at DIR.
  -enable-sourcegraph=false: generate links to Sourcegraph.com for references to external (out of repo) definitions
  -github-pages=false: create docs in gh-pages branch
  -out="docs": the directory name for the output files
  -v=false: show verbose output
I extended the Go srclib toolchain (https://sourcegraph.com/sourcegraph/srclib-go) to add start and end ranges to comments. None of the other toolchains output this information currently, but it shouldn't be that hard to add.
Languages that will be supported soon (if you're interested in hacking on a srclib toolchain, get in touch with the author of srcco for help getting started):
- Python
- Ruby
- JavaScript
- Java
Patches welcome!
We define our option flags here.
verboseOpt tells srcco to print out debugging logs.
outDirOpt is the output directory for the generated documentation.
gitHubPagesOpt tells srcco to generate the docs in the repository's "gh-pages" branch and push it to GitHub. If gitHubPagesOpt is true, outDirOpt is ignored.
enableSourcegraphLinksOpt tells srcco to generate links to Sourcegraph.com for references to external (out of repo) definitions.
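The four flags above could be declared with the standard flag package roughly as follows. This is only a sketch: the flag names, defaults, and help strings come from the usage text, but the options struct and the parseOptions helper are hypothetical names made up for the illustration.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options groups the four flags. The struct is hypothetical; srcco may
// well keep them as separate package-level variables instead.
type options struct {
	verbose                bool
	outDir                 string
	gitHubPages            bool
	enableSourcegraphLinks bool
}

// parseOptions (hypothetical helper) parses args and returns the options
// plus the DIR argument, which is "" when no directory was given.
func parseOptions(args []string) (options, string, error) {
	var o options
	fs := flag.NewFlagSet("srcco", flag.ContinueOnError)
	fs.BoolVar(&o.verbose, "v", false, "show verbose output")
	fs.StringVar(&o.outDir, "out", "docs", "the directory name for the output files")
	fs.BoolVar(&o.gitHubPages, "github-pages", false, "create docs in gh-pages branch")
	fs.BoolVar(&o.enableSourcegraphLinks, "enable-sourcegraph", false,
		"generate links to Sourcegraph.com for references to external (out of repo) definitions")
	if err := fs.Parse(args); err != nil {
		return o, "", err
	}
	return o, fs.Arg(0), nil
}

func main() {
	o, dir, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("options=%+v dir=%q\n", o, dir)
}
```

Using a FlagSet instead of the global flag set keeps the parsing testable without touching os.Args.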
The vLogger is used for verbose logging.
Source units have lots of information associated with them, but we only care about the files.
failedCmd is an error type for failed shell commands.
ensureSrclibExists is a hack to make sure that "src" is accessible from the PATH.
command takes a set of command line arguments and returns the cmd object, stdout, and stderr for that command.
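A helper with this shape can be sketched with os/exec and two buffers. The exact signature below is an assumption (in particular, it assumes argv is non-empty), and "go version" is just a stand-in command for the demo.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// command builds (but does not run) a command from argv and wires its
// stdout and stderr to fresh buffers. Assumes len(argv) > 0.
func command(argv []string) (*exec.Cmd, *bytes.Buffer, *bytes.Buffer) {
	cmd := exec.Command(argv[0], argv[1:]...)
	stdout, stderr := new(bytes.Buffer), new(bytes.Buffer)
	cmd.Stdout, cmd.Stderr = stdout, stderr
	return cmd, stdout, stderr
}

func main() {
	cmd, stdout, stderr := command([]string{"go", "version"})
	if err := cmd.Run(); err != nil {
		fmt.Println("command failed:", err, stderr.String())
		return
	}
	fmt.Print(stdout.String())
}
```

Returning the buffers alongside the cmd lets the caller decide when to run the command and what to do with its output.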
execute is the function that does all of the work. It takes the project directory as dir. If dir is the empty string, then the current working directory is used.
First, we check to make sure that srclib exists.
We need to get a list of all of the files that we want to generate. First, we need to turn dir into an absolute path.
We could import sourcegraph.com/sourcegraph/srclib/src and call src.APIUnitsCmd.Execute, but I want to demonstrate how to use src's command line interface. Plus, the user needs to set up srclib with their toolchains after installing it, so it might confuse them if go get'ing srcco also downloaded srclib's repo.
Get all of the file names associated with this project.
If we haven't found any files, that means the user probably hasn't installed any srclib language toolchains. Short-circuit here if that's the case.
We need to remove all srclib build data from the project directory in order to add the correct files in our gh-pages script (in "data/publish-gh-pages.sh").
We pipe the gh-pages script into bash. Bash's "-s" option tells it to read from stdin.
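Piping a script into bash over stdin looks roughly like this with os/exec. The bashStdin helper is hypothetical, and the one-line script stands in for the real contents of "data/publish-gh-pages.sh".

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// bashStdin (hypothetical) prepares "bash -s" with the script on stdin.
// With -s, bash reads its commands from standard input.
func bashStdin(script string) *exec.Cmd {
	cmd := exec.Command("bash", "-s")
	cmd.Stdin = strings.NewReader(script)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Placeholder script; the real one publishes the gh-pages branch.
	if err := bashStdin("echo published\n").Run(); err != nil {
		fmt.Println("script failed:", err)
	}
}
```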
If we aren't generating a gh-pages site, generate the docs normally.
doc represents a comment. srclib also gives us the definition a comment is attached to, but we don't care about that.
A ref represents a reference to a definition. A definition is also a reference: it is a reference to itself. We can identify a ref's definition by joining DefUnit and DefPath.
A def is a definition, and it includes things like functions, variables, and types.
We only care about TreePath to create a structured table of contents.
genDocs generates a set of docs for the project at root for the code in files, and it outputs the docs in the directory siteName (which must be relative to root).
structuredTOCs is a map from file name to HTML-formatted structured table of contents. This is created but ignored for now (the UI is in the works! Contact the author of srcco if you're interested in helping!)
defsMap is a map from defKeys to defs. We use it to store all of the defs that exist in this project so we can quickly look them up. Ideally, we would use "src api describe", but that call is too slow right now because it doesn't hit the new, faster srclib backend... yet :)
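Here is a small sketch of how refs, defs, and the lookup map could fit together. The field names DefUnit, DefPath, and TreePath appear in the comments above; everything else (the Unit and Path fields, the struct-shaped defKey, and buildDefsMap) is an assumption. A struct key is just one unambiguous way of "joining" the unit and the path.

```go
package main

import "fmt"

// ref is a reference to a definition, located at [Start, End) in a file.
type ref struct {
	DefUnit, DefPath string
	Start, End       int
}

// def is a definition. Unit and Path are hypothetical field names.
type def struct {
	Unit, Path string
	TreePath   string
}

// defKey identifies a def by unit and path (assumed representation).
type defKey struct{ Unit, Path string }

// buildDefsMap (hypothetical) indexes defs for quick lookup.
func buildDefsMap(defs []def) map[defKey]def {
	m := make(map[defKey]def, len(defs))
	for _, d := range defs {
		m[defKey{d.Unit, d.Path}] = d
	}
	return m
}

func main() {
	defsMap := buildDefsMap([]def{{Unit: "srcco", Path: "main/execute", TreePath: "./main/execute"}})
	r := ref{DefUnit: "srcco", DefPath: "main/execute", Start: 10, End: 17}
	if d, ok := defsMap[defKey{r.DefUnit, r.DefPath}]; ok {
		fmt.Println("ref resolves to", d.TreePath)
	}
}
```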
Grab all the defs.
We create the table of contents for the defs here. We wrap the defs in an interface that exposes their TreePaths as Path(), so we can use createTableOfContents on files too. See the documentation on createTableOfContents for more info.
The files are wrapped as Pathers (which have the method Path()) so that createTableOfContents can be used with defs too. See the documentation on createTableOfContents for more info.
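The wrapping described above boils down to a one-method interface. In this sketch only the method name Path() comes from the comments; the Pather interface name is inferred from them, and defPather, filePather, and paths are hypothetical.

```go
package main

import "fmt"

// Pather is anything that can report a slash-separated path.
type Pather interface {
	Path() string
}

// defPather (hypothetical) exposes a def's TreePath as Path().
type defPather struct{ TreePath string }

func (d defPather) Path() string { return d.TreePath }

// filePather (hypothetical) exposes a file name as Path().
type filePather string

func (f filePather) Path() string { return string(f) }

// paths shows that one function can now handle both defs and files.
func paths(ps []Pather) []string {
	out := make([]string, 0, len(ps))
	for _, p := range ps {
		out = append(out, p.Path())
	}
	return out
}

func main() {
	fmt.Println(paths([]Pather{defPather{"main/execute"}, filePather("cmd/srcco.go")}))
}
```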
Okay, this is where the real work gets done! We process the refs for each file and generate the HTML for the code views in this loop.
We filter out nonunique comments here, and comments that don't have the format "text/html". I fixed a bug in the Go toolchain that was generating overlapping comments, so that may not be needed. We're adding more powerful API commands to take advantage of the new srclib backend, and I want to replace this logic with a srclib call when that's done.
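The filtering step might be sketched as below. I'm assuming here that "nonunique" means two comments covering the same start and end range, and the doc fields are hypothetical; the "text/html" format string is the one named above.

```go
package main

import (
	"fmt"
	"sort"
)

// doc is a comment with its byte range in the file (assumed fields).
type doc struct {
	Format, Data string
	Start, End   int
}

// filterDocs keeps only "text/html" docs, drops docs whose range has
// already been seen, and returns the result sorted by Start.
func filterDocs(docs []doc) []doc {
	type span struct{ start, end int }
	seen := make(map[span]bool)
	var out []doc
	for _, d := range docs {
		if d.Format != "text/html" {
			continue
		}
		k := span{d.Start, d.End}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, d)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Start < out[j].Start })
	return out
}

func main() {
	docs := []doc{
		{"text/html", "<p>b</p>", 10, 20},
		{"text/plain", "a", 0, 5},
		{"text/html", "<p>a</p>", 0, 5},
		{"text/html", "<p>a</p>", 0, 5},
	}
	for _, d := range filterDocs(docs) {
		fmt.Println(d.Start, d.End, d.Data)
	}
}
```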
We turn the refs into HTML annotations that can be applied to the source code.
Sort everything *again* just to be sure!
Now we create the segments, which have the type "segment". They are fed into the template.
After gathering all that data, we feed it into our template!
We copy our resource files at the end.
copyBytes is a helper function that copies b to file.
HTMLOutput is fed into our code view template.
These files are read from a really clever Go library, go-bindata, that takes resource files and packs them into the binary to make them easier to distribute.
class is copied from sourcegraph.com/sourcegraph/annotate, because it isn't exported in that package.
Annotate is also copied from sourcegraph.com/sourcegraph/annotate.
resourcePrefix takes a file path and gives you the number of "../"'s needed to get to the "root" of that file. It's used so that we don't need to know the explicit root of our generated docs.
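A minimal version of this idea counts the directory separators in the (relative) file path and repeats "../" that many times. The body below is my own sketch under that assumption, not necessarily how srcco computes it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resourcePrefix returns one "../" for every directory level between
// file and the root. It assumes file is a relative path.
func resourcePrefix(file string) string {
	clean := filepath.ToSlash(filepath.Clean(file))
	return strings.Repeat("../", strings.Count(clean, "/"))
}

func main() {
	for _, f := range []string{"main.go", "cmd/srcco/main.go"} {
		fmt.Printf("%q -> %q\n", f, resourcePrefix(f))
	}
}
```

A file at the root gets an empty prefix, so its links to shared resources stay relative to the output directory.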
htmlFilename takes a file, prepends a resource prefix to it and appends ".html".
ann is a function that takes a source file, a set of refs for that source file, the file name, and a map of all the defs in the repository, and creates a set of annotations that can be applied to the source file.
Run the source code through a generic code syntax highlighter to identify language units (vars, functions, etc) and give them classes.
refAt is a helper function that tells us the ref at a certain point.
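If the refs are sorted by Start and don't overlap, a lookup like this can be a binary search. The sketch below makes both of those assumptions and treats End as exclusive; the ref fields shown are a simplified, hypothetical subset.

```go
package main

import (
	"fmt"
	"sort"
)

// ref covers the byte range [Start, End) of the source file.
type ref struct {
	DefPath    string
	Start, End int
}

// refAt returns the ref containing pos, if any. It assumes refs is
// sorted by Start and that no two refs overlap.
func refAt(refs []ref, pos int) (ref, bool) {
	// Find the first ref that ends after pos, then check it starts at or before pos.
	i := sort.Search(len(refs), func(i int) bool { return refs[i].End > pos })
	if i < len(refs) && refs[i].Start <= pos {
		return refs[i], true
	}
	return ref{}, false
}

func main() {
	refs := []ref{{"fmt/Println", 0, 3}, {"main/execute", 5, 9}}
	for _, pos := range []int{0, 4, 6} {
		r, ok := refAt(refs, pos)
		fmt.Println(pos, ok, r.DefPath)
	}
}
```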
Now we go through all of the annotations, and we add links to annotations that are sitting on top of refs, and that we have definitions for in our def map.
TODO: move api to new backend (which obsoletes 'r.DefRepo != ""')
Now we go through all of the defs and mark them up with "invisible" anchor tags which only have an id associated with them so that we can jump to them.
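An "invisible" anchor can be expressed as an annotation whose left side opens an id-only tag and whose right side closes it. The annotation struct below is a local stand-in for illustration (not the annotate package's type), and defAnchor is a hypothetical helper.

```go
package main

import (
	"fmt"
	"html"
)

// annotation wraps src[Start:End] with Left and Right (local stand-in type).
type annotation struct {
	Start, End  int
	Left, Right []byte
}

// defAnchor (hypothetical) builds an anchor that carries only an id, so
// it has no href and no styling and just serves as a jump target.
func defAnchor(id string, start, end int) annotation {
	return annotation{
		Start: start,
		End:   end,
		Left:  []byte(`<a id="` + html.EscapeString(id) + `">`),
		Right: []byte(`</a>`),
	}
}

func main() {
	src := "func execute() {}"
	a := defAnchor("main/execute", 5, 12)
	fmt.Printf("%s%s%s%s%s\n", src[:a.Start], a.Left, src[a.Start:a.End], a.Right, src[a.End:])
}
```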
A segment represents a row in the final output.
createSegments takes the source code, all of the annotations, and the docs, and it interleaves them into segments, where docs only appear in the DocHTML bits, and code in the CodeHTML parts. anns and docs must be sorted.
addSegment is a wrapper function for appending a new segment and creating a new one at 's'. It may be an abuse of closures :)
If we're on a doc, add it to DocHTML and advance i to the end of the doc.
This is a bit tricky, but if lineComment is true, that means we've already added that doc to DocHTML, so we shouldn't add it again.
After ignoring the first line comment, we are no longer in a line comment (because line comments extend to the end of the line).
Now, we need to determine how long the CodeHTML block should be. It will extend to either the beginning of the next comment or the end of the document, whichever comes sooner.
Throw away annotations that overlapped with the docs we just added.
Skip any newlines (in reality, we should be skipping any lines that are *all* whitespace, but that requires backtracking or looking forward, which I didn't want to put in v0.1.)
Special case: check to see if there's a newline between i and runTo. If there isn't, that means there's a line comment on the next line, and it should begin a new section.
If there are no annotations left, we can short-circuit this process by stuffing the rest of the source code into the CodeHTML block.
Add all the space between i and a.Start to the CodeHTML block.
We continue so that the 'i < runTo' check happens again, because we may have reached runTo.
If the annotation extends past the end of our run, the state of our program is messed up (this usually means the srclib-cache hasn't been refreshed).
Now we add the annotation in full to the CodeHTML block.
At the end of our loop, we add a segment.
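To make the interleaving easier to follow, here is a heavily simplified sketch of the loop walked through above. It ignores annotations and the line-comment special cases entirely, and the doc fields are hypothetical; it only shows how docs end up in DocHTML and the code that follows them in CodeHTML.

```go
package main

import (
	"fmt"
	"html"
)

// doc is a comment occupying src[Start:End], already rendered as HTML.
type doc struct {
	Start, End int
	HTML       string
}

// segment is one row of output: a doc next to the code that follows it.
type segment struct {
	DocHTML, CodeHTML string
}

// createSegments is a simplified sketch; docs must be sorted by Start.
func createSegments(src string, docs []doc) []segment {
	var segs []segment
	i, d := 0, 0
	for i < len(src) {
		var s segment
		// If we're on a doc, add it to DocHTML and advance i past it.
		if d < len(docs) && docs[d].Start <= i {
			s.DocHTML = docs[d].HTML
			if docs[d].End > i {
				i = docs[d].End
			}
			if i > len(src) {
				i = len(src)
			}
			d++
			// Skip the newlines that follow the comment.
			for i < len(src) && src[i] == '\n' {
				i++
			}
		}
		// The code block runs to the next doc or to the end of the source.
		runTo := len(src)
		if d < len(docs) && docs[d].Start < runTo {
			runTo = docs[d].Start
		}
		if runTo < i {
			runTo = i
		}
		s.CodeHTML = html.EscapeString(src[i:runTo])
		i = runTo
		segs = append(segs, s)
	}
	return segs
}

func main() {
	src := "// hi\nx := 1\n// bye\ny := 2\n"
	docs := []doc{{0, 5, "<p>hi</p>"}, {13, 19, "<p>bye</p>"}}
	for _, s := range createSegments(src, docs) {
		fmt.Printf("doc=%q code=%q\n", s.DocHTML, s.CodeHTML)
	}
}
```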
And that's it! We set up the flags and start the program in our main function.
Everything below is my work-in-progress table of contents code, some of which you saw above. I think it's pretty interesting, but it's loosely commented for now. If you want to help out, email the author of srcco :)
We've created the head node and added it to our map. Now, we need to go through all of the pathers and add them to the correct node. To do this, we get the "name" for each section of the tree path. For instance, for the tree path "a/b/c/d", the nodes associated with it are "a", "a/b", and "a/b/c", and the pather, "d", should be added to "a/b/c".
If "i" is the last index of "parts", that means it represents the pather. We're ignoring pathers for now.
"i" is not the last index of "parts", which means that it is a node. First, we check to see if it exists.
If "n" is non-nil, then the node has already been created.
The node does not exist. First we add it to the nodes map, and then we add it to its parent.
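The node-building steps above can be sketched as follows. The node struct, the buildTree name, and the choice of "" as the head node's key are all assumptions for this illustration; as in the walkthrough, the last part of each path (the pather) is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one directory-like level of the table of contents (hypothetical).
type node struct {
	name     string
	children []*node
}

// buildTree creates a node for every proper prefix of every path and
// returns the head node plus the map of all nodes keyed by name.
func buildTree(paths []string) (*node, map[string]*node) {
	head := &node{name: ""}
	nodes := map[string]*node{"": head}
	for _, p := range paths {
		parts := strings.Split(p, "/")
		parent := head
		for i := range parts {
			if i == len(parts)-1 {
				break // the last part is the pather itself; ignored for now
			}
			name := strings.Join(parts[:i+1], "/")
			n := nodes[name]
			if n == nil {
				// The node does not exist yet: record it, then attach it to its parent.
				n = &node{name: name}
				nodes[name] = n
				parent.children = append(parent.children, n)
			}
			parent = n
		}
	}
	return head, nodes
}

func main() {
	head, nodes := buildTree([]string{"a/b/c/d", "a/b/e"})
	fmt.Println("top-level nodes:", len(head.children))
	fmt.Println("children of a/b:", len(nodes["a/b"].children))
}
```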