Gant Software Systems

T4 Templates: Multi-File Outputs Considered Harmful

I’ve been working on a side project with a good friend and it uses T4 templates fairly heavily for wrapping database code (no, we couldn’t use EF for this when we started – we might look at it again later though) and for building up a nice wrapper for templated email. It worked like a champ, until I decided to do a bit of code cleanup to make it more aesthetically pleasing to myself. I was getting concerned because the code generated by the templates was somewhere north of 20,000 lines, declared well north of 100 separate classes with dozens of interfaces and lots of logic for interacting with the database (and with caching). It was all beautiful, clean, eloquent code (ok, I’m lying, but it wasn’t ugly enough that I hated it) and did its job remarkably well. I heartily recommend the Tangible T4 Text editor plugin for Visual Studio, as it makes a lot of this code work quite smoothly and pretty much never gave me a problem (I had to look up which tool I was using – I promise you I could remember off the top of my head if it broke regularly).

However, the way I coded the templates was a bit of a pain to work with. It had started as a very simple thing and kept getting features added until it was bloated beyond reason or recognition. Every time I had to alter something, I had to dig through a massive T4 template several thousand lines in size, make the appropriate tweaks, regenerate and then test. I suppose it wasn’t the worst thing in the world, but I began to get concerned about file size. Exactly how big a file does it take before Visual Studio chokes and falls over? No one knows (except that one guy who suddenly and painfully acquired that knowledge – the guy I didn’t want to be). So I did a little cleanup. First, I broke the logical sections of the system down into .ttinclude files, so that it was a little easier to find relevant bits of code. After I did this, the only things the main file contained were a little bit of configuration and the loop that actually controls the code generation process. The rest of the work would be done in the includes. I was quite happy with this (and still am – it was worth the rather large amount of work required, because the starting code was dirtier than I cared for).
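
To make that split concrete, here is a rough sketch of what a slimmed-down main template might look like. The file name Entities.ttinclude, the WriteEntity helper, and the configuration values are all made up for illustration; they are not the names from my actual project.

```t4
<#@ template language="C#" debug="false" hostspecific="false" #>
<#@ output extension=".cs" #>
<#@ include file="Entities.ttinclude" #>
<#
    // Hypothetical configuration; real values would come from your project.
    var rootNamespace = "MyProject.Data";
    var tableNames = new[] { "Customer", "Order" };
#>
namespace <#= rootNamespace #>
{
<#
    // The main template only loops; the include does the real work.
    foreach (var table in tableNames)
    {
        WriteEntity(table);
    }
#>
}
```

The include then holds the helper in a class feature block:

```t4
<#+
    // Hypothetical helper living in Entities.ttinclude.
    void WriteEntity(string table)
    {
#>
    public partial class <#= table #>
    {
        // generated members go here
    }
<#+
    }
#>
```

Everything still lands in a single output file; only the template source is split up, which is the part that was hard to navigate.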

However, after that is where I screwed up. I started by pulling down TemplateFileManagerV2.1.ttinclude, which is available through the Tangible editor plugin. That particular include file works fairly well, and I really don’t have a ton of objections to it. Even so, this is where I started running into problems. The following issues plagued me once I implemented the code changes required to split the T4 output into multiple files.

  1. Every other time I ran the template, it would break, saying that it couldn’t find one of the files it was supposed to be generating. I looked into the include file for some clues on this, and it looks like it is actually talking to Visual Studio to add the file to the solution. Frequently, the timing seemed to get mixed up and this would either result in a corrupted .csproj file (and a manual edit of the underlying xml), a mysterious crash that could be resolved by re-running the template, or the file I was trying to build being put alongside the T4 template instead of in the directory I had directed it to be placed.

  2. Debugging the templates became impossible. Having a debugger running in Visual Studio while adding files is not something I’ve managed to do successfully. I submit that it may be possible, but I didn’t want to try and chase that down, as that gets into an area where my expertise is limited.

  3. When the files did output correctly, they frequently were placed in the file system, but not included in the project. This broke compilation and resulted in rather bizarre behavior when I was generating partial classes, as sometimes the partial was present and sometimes it was not.

I did a little more digging and thinking about why this experience was so painful. After some intense thought on the matter, I realized that it isn’t the fault of the authors of that include. It was the guy implementing it (that would be me). Most developers (myself included) have learned that when things seem arbitrarily painful, that usually means that you are missing something important. In this case I was. I believe that there are strong technical reasons for (almost) never using a T4 template to generate multiple files. These are as follows:

  1. The T4 templating subsystem in Visual Studio was intended from the outset to generate a single file. The ability to generate multiple files was added on after the fact. Generating a single output file from a single input within the VS environment is handled in loads of places and is a well-understood and thoroughly debugged problem. Generation of multiple files is not so clearly and cleanly handled. The fact that one has to include a special template that does low-level talking to Visual Studio itself during template execution indicates that this is still a hack, even if implemented well.

  2. Users typically edit T4 templating code, unlike the code that is used by other custom tools to convert various things into artifacts before building. Thus, given that users are writing code, they will need to debug it. When that same user code is interacting with Visual Studio in the course of its execution, debugging becomes several orders of magnitude more complex, as the running of the debugger is now changing the state of the development environment upon which the T4 template system is dependent.

  3. The typical use for a template is to take a datasource, or set of datasources, and convert them into one or more outputs. However, the resultant code is typically not considered user-editable, as the outputs are overwritten the next time the template is run (which happens when you save said template, by the way). If you shouldn’t edit the file, what’s the point in breaking it up into a whole bunch of files that shouldn’t be touched in the first place? I can’t think of one, other than the aforementioned notion that perhaps Visual Studio can’t handle a file over a certain size. However, I also know a developer or two who love to cram so much stuff into a single file that I feel relatively safe that they will hit a problem before I do.

However, like any discussion of an anti-pattern (I suppose that’s what this is, right?), it wouldn’t be complete without some discussion of how to handle various scenarios where someone might want to generate multiple files. I know I don’t particularly care for it when someone says “this is a bad idea” and doesn’t at least address my reasons for wanting to do something, so here are some situations where you might want to use a T4 template to generate multiple files and how to get by without doing it.

  1. You are generating multiple outputs because you are using multiple inputs and generating a single output for each input. Probably the easiest way to fix this is to do most of your logic in individual .ttinclude files, and then have the bare minimum in a .tt (that’s the T4 template extension, by the way) file for each output you want. This way, you still get strong code reuse from your include files, but you aren’t asking Visual Studio to do something that it is ill-suited to do. Debugging of the templates will still work and you won’t be dealing with any hacky weirdness trying to add files to the environment while a debugger is attached.

  2. You are generating multiple outputs from multiple inputs, but there is not a 1:1 correlation between your inputs and your outputs. This one is a little uglier, as you don’t want to have to execute your code to read the datasource once for each intended output, as that is slow (although you may be able to get away with slow, depending on your circumstances). In this case, you may want to look into combining the data into some sort of local cache, like an XML file optimized for the structure of your data, that you then parse with multiple templates. You should also strongly consider whether you really need multiple output files, as for purposes other than debugging, you aren’t going to be digging into the files anyway.

  3. You are generating multiple one-time-generated files to go alongside something that is generated repeatedly. This was actually the use case I was confronting. I had a huge swath of generated code, but the generated classes were partials, whose behavior could be overridden by simply adding a separate file implementing the partial. Initially, I intended that my template would generate those files with reasonable defaults, so that it was easy to find the place an override was required. In such cases, you should look into whether you really need to generate the partials with the overrides in the first place. More than likely, you can get by simply by specifying reasonable defaults in the generated code, and using something like ReSharper templates for the edge cases where you need to override something (you do have sensible defaults, yes?). If you find yourself having to override the generated values in most cases, you should look into whether generating that part of the code is appropriate in the first place, or whether you can bring in some other data to allow the generator to build a more reasonable default for generated code. In my case, I simply set sensible defaults and used a combination of generics, partial classes, and the occasional odd delegate to make my code configurable if needed, but reasonable by default.
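
For that last scenario, here is a minimal sketch of "sensible defaults in the generated code, override by hand only when needed". The class name, the CacheDuration property, and the use of a partial method as the hook are all assumptions for illustration; my real code leaned on generics and delegates as well, and yours may want a different hook entirely.

```csharp
using System;

// Customer.generated.cs (hypothetical): produced by the template and
// overwritten on every run, so nobody should edit it by hand.
public partial class CustomerRepository
{
    // Sensible default baked into the generated code.
    public TimeSpan CacheDuration { get; private set; } = TimeSpan.FromMinutes(5);

    public CustomerRepository()
    {
        // Hook for hand-written overrides. If no other file implements
        // Configure, the compiler removes this call entirely.
        Configure();
    }

    partial void Configure();
}

// CustomerRepository.cs (hypothetical): written by hand, and only created
// for the rare class that actually needs to change a default.
public partial class CustomerRepository
{
    partial void Configure()
    {
        CacheDuration = TimeSpan.FromSeconds(30);
    }
}

public static class Program
{
    public static void Main()
    {
        var repository = new CustomerRepository();
        Console.WriteLine(repository.CacheDuration); // prints 00:00:30
    }
}
```

The template never has to create the hand-written file, so there is nothing for it to add to the project at generation time. Delete the second partial and the class still compiles, falling back to the five-minute default.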

Essentially, multi-file generation with a single T4 template is a lot of pain for very little gain. Like most things in coding that are painful, it is usually a sign of a bad approach when something regularly derails you with issues not related to the problem you are actually trying to solve. Largely, it’s important to consider what your intent is. If you are generating a bunch of code so that someone else doesn’t need to write a bunch of repetitive boilerplate, you don’t necessarily have to break the result into a bunch of smaller files, as you don’t want to invite someone to edit generated code, only to have potential hours of their work blown away later. Further, you are likely building such a system in order to save effort down the line. The upfront and ongoing costs of breaking T4 outputs into multiple files are not typically justified (in my humble and hopefully correct opinion) by the value that such a split produces.

Finally, it’s likely that you aren’t writing a template-based code generator as your main goal. You’re probably trying to get something else done. It’s better not to over-complicate and gold-plate something that isn’t critical to your project. It’s important to do good work, mind you, but it’s also important to realize at the end of the day that you are trying to produce a product that serves a need and hopefully either saves or generates revenue for someone (who will pass some of it on to you). It’s very unlikely that code generation is the intended end-goal of your project in the first place (in fact, it may be something you eventually tear out of the project as it matures). You don’t want to spend more time on it than it deserves. That’s really why I think that multiple file generation with T4 is an anti-pattern – it’s not that it is necessarily bad, but rather that it is like gold-plating a urinal in that it isn’t the best use of resources.