Gant Software Systems

A C# Bulk File Downloader

I think about code a lot, and I expect that many other developers do as well. But one thing most of us miss while building huge systems to automate tasks for other people is how easy it is to code up simple things that make our own lives easier. Sometimes all you need is a stupid little script, so I’ve decided to run a feature on this blog highlighting these sorts of things. I hope it will generate some interest in programming among people just getting started, and in automating some of the drudgery out of life among those who are a little more seasoned (a polite way of saying “jaded”, “cynical”, or “old” in some quarters). Note that this code is for a very narrow task, probably won’t be reused much, and so does not meet particularly high standards for cleanliness, commenting, and so on. I believe this sort of coding would be tremendously beneficial to computer users in general, and I find it sad that such basic skills aren’t more widely taught. When I imagine people clicking through and manually downloading over a hundred (currently 192) episodes of a podcast to get the whole set (RSS will get you some, but it cuts off), my mind reels at how much time and effort they waste. Worse, some of them will simply miss out on an awesome podcast that could teach them a lot.

Let’s start with the problem domain. There is a podcast that I really like (and recommend for anybody building a startup without VC funding). It’s called “Startups for the Rest of Us” and has a lot of episodes. I wanted to download them all in advance so I could listen to them while working, without saturating the client’s network. I also wanted to make sure that the download wouldn’t saturate my home network (or whatever system they use to serve content), both because I don’t want to crush my home internet connection while the wife is trying to use it, and because I don’t want to do anything that makes it look like I’m attacking their network. So, let’s get started.

One thing that makes this particular job very easy is the way the hosts name their podcast files. So far, they are all named like http://media.blubrry.com/startupsrestofus/www.project98.com/podcast/startups-for-the-rest-of-us-xxx.mp3, where xxx is the episode number written as three digits, padded on the left with zeroes. Below is what I did (I’ll explain after the code block). I built this as a command-line tool because that is quick to develop, but you could do the same thing with Windows Forms or WPF and trigger the action with a button. I also made it write a bit more to the console than was strictly needed, so I could check in periodically and see what it was doing. Visual Studio started me off with a file that looked like the following:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

namespace QuickDownloader
{
    class Program
    {
        static void Main(string[] args)
        {
        }
    }
}

The code above is what Visual Studio provides when you create a new command-line application. If you are following along, be sure to check that all the “using” statements in the block above are actually present, as the following code needs them. With that in place, I altered the code as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

namespace QuickDownloader
{
    class Program
    {
        // {0} is replaced with the zero-padded, three-digit episode number.
        private static string mask = "http://media.blubrry.com/startupsrestofus/www.project98.com/podcast/startups-for-the-rest-of-us-{0}.mp3";
        private static int FirstEpisode = 11;
        private static int LastEpisode = 181; // 181 is most recent
        private static string OutDir = @"C:\_Docs\AmazonCloud\MP3\podcast\StartupsForTheRestOfUs";
        // Episode numbers whose download threw an exception, reported at the end.
        private static List<int> fails = new List<int>();

        static void Main(string[] args)
        {
            var random = new Random(DateTime.Now.Millisecond);

            // DownloadFile fails if the target folder is missing, so create it up front
            // (this is a no-op when the folder already exists).
            Directory.CreateDirectory(OutDir);

            using (WebClient client = new WebClient())
            {
                for (var i = FirstEpisode; i <= LastEpisode; i++)
                {
                    Console.WriteLine("Downloading episode {0} of {1}", i, LastEpisode);

                    // Local file name uses four digits; the remote name uses three.
                    var outPath = Path.Combine(OutDir, string.Format("SFTROU-{0}.mp3", i.ToString("0000")));
                    var downloadPath = string.Format(mask, i.ToString("000"));

                    // Random pause between downloads so the server isn't hammered.
                    var waitTimer = random.Next(20, 50);

                    try
                    {
                        client.DownloadFile(downloadPath, outPath);
                        Console.WriteLine("File written to {0}. Sleeping for {1} seconds", outPath, waitTimer);
                    }
                    catch (Exception ex)
                    {
                        // Log the error, remember the episode, and keep going.
                        Console.WriteLine(ex);
                        fails.Add(i);
                    }

                    Thread.Sleep(waitTimer * 1000);
                }

                if (fails.Count > 0)
                {
                    Console.ForegroundColor = ConsoleColor.DarkRed;
                    Console.WriteLine("Download failed for the following episode numbers:");
                    fails.ForEach(Console.WriteLine);
                }

                // Keep the console window open until Enter is pressed.
                Console.Read();
            }
        }
    }
}

First of all, I declared a few static fields above static void Main(). This is mainly to make the code a little more readable. You could easily hardcode everything, but I generally do a better job of getting code out the door when I keep at least some basic coding hygiene, and it makes the program easier to customize if I want to reuse it later. These fields store how the podcast URLs are formatted and the first and last episode numbers to fetch. I had just finished listening to number 11 when I wrote this, so I excluded the ones I had already heard; that is why the sample above starts at episode 11 rather than 1 (set FirstEpisode to 1 to fetch the whole set). I also specified the directory where the downloaded podcasts are placed, since they have to go somewhere. Finally, I declared a list of integers to hold the numbers of any episodes that failed to download, so that I could display a friendly message on the screen at the end.
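
The URL mask works because string.Format drops the padded episode number into the {0} placeholder. Here is a minimal sketch of the two name-building steps pulled out into helper functions; BuildDownloadUrl and BuildOutPath are hypothetical names of my own, not part of the program above, and the sketch uses C# top-level statements so it runs on its own.

```csharp
using System;
using System.IO;

// URL template from the post; {0} receives the zero-padded episode number.
const string Mask = "http://media.blubrry.com/startupsrestofus/www.project98.com/podcast/startups-for-the-rest-of-us-{0}.mp3";

// Hypothetical helper: "000" pads to at least three digits (7 -> "007", 181 -> "181").
string BuildDownloadUrl(int episode) => string.Format(Mask, episode.ToString("000"));

// Hypothetical helper: the program above uses four digits for local names;
// equal-width names keep a plain alphabetical listing in episode order.
// outDir is whatever folder you choose.
string BuildOutPath(string outDir, int episode) =>
    Path.Combine(outDir, string.Format("SFTROU-{0}.mp3", episode.ToString("0000")));

Console.WriteLine(BuildDownloadUrl(7));   // ends with "-007.mp3"
Console.WriteLine(Path.GetFileName(BuildOutPath("podcasts", 7)));   // prints "SFTROU-0007.mp3"
```

Note that "000" is a minimum width, not a fixed one: episode 1000 would format as "1000". Whether the hosts will keep the same naming past episode 999 is anyone’s guess.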

In the Main() method, I first declare a variable of type Random and seed it with the current millisecond. I do this so that I can wait a random length of time (20 to 50 seconds) after each download before starting the next. This should keep what I’m doing from overwhelming the server, although it does slow down the total download time a bit (by roughly 1 to 2-1/2 hours, which didn’t matter, since for the actual download I kicked it off on my way out the door in the morning). With that in place, I can write the code that pulls down the files. First, we need to loop through the set of episodes that are available.

for (var i = FirstEpisode; i <= LastEpisode; i++)
{

Next, I displayed a message indicating what episode is being downloaded, followed by actually figuring out what the source path should be, where the file should go, and how long to wait before starting on the next one. I did this before even starting on the actual download process.

Console.WriteLine("Downloading episode {0} of {1}", i, LastEpisode);

var outPath = Path.Combine(OutDir, string.Format("SFTROU-{0}.mp3", i.ToString("0000")));
var downloadPath = string.Format(mask, i.ToString("000"));

var waitTimer = random.Next(20, 50);
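
One detail about that wait: Random.Next(minValue, maxValue) treats maxValue as exclusive, so random.Next(20, 50) returns 20 through 49, never 50. The sketch below uses an inclusive maximum instead and computes rough bounds on the total time spent sleeping for episodes 11 through 181. NextWaitSeconds and EstimateTotalWait are hypothetical helpers for illustration, not part of the original program.

```csharp
using System;

// Hypothetical helper: a whole number of seconds in [minSeconds, maxSecondsInclusive].
// Random.Next excludes its upper bound, hence the + 1.
int NextWaitSeconds(Random rng, int minSeconds, int maxSecondsInclusive) =>
    rng.Next(minSeconds, maxSecondsInclusive + 1);

// Hypothetical helper: lower and upper bounds on total sleep time,
// assuming the program sleeps once per episode.
(TimeSpan Min, TimeSpan Max) EstimateTotalWait(int episodeCount, int minSeconds, int maxSecondsInclusive) =>
    (TimeSpan.FromSeconds((double)episodeCount * minSeconds),
     TimeSpan.FromSeconds((double)episodeCount * maxSecondsInclusive));

var random = new Random();
int episodeCount = 181 - 11 + 1; // episodes 11 through 181 inclusive

var (min, max) = EstimateTotalWait(episodeCount, 20, 50);
Console.WriteLine($"{episodeCount} pauses: {min.TotalHours:F2} to {max.TotalHours:F2} hours of sleeping");
Console.WriteLine($"Sample pause: {NextWaitSeconds(random, 20, 50)} seconds");
```

For 171 episodes that works out to 3,420 to 8,550 seconds, or roughly 0.95 to 2.4 hours of pausing, in line with the estimate given earlier.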

Now to the part of the program that actually matters: downloading the episodes. I wrapped this in a try…catch block so that an error on one episode doesn’t terminate the program. Using the WebClient object created earlier (client), I download the file to the output path computed above and then display a message on the console. If an error occurs, the catch block dumps the exception information to the console and adds the episode number to the list of failures for later use. Either way, I then wait for the random number of seconds stored in waitTimer. The process repeats until there are no more audio files to download.

try
{
    client.DownloadFile(downloadPath, outPath);
    Console.WriteLine("File written to {0}. Sleeping for {1} seconds", outPath, waitTimer);
}
catch (Exception ex)
{
    Console.WriteLine(ex);
    fails.Add(i);
}

Thread.Sleep(waitTimer * 1000);
} // end of the for loop
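
Once every episode has been attempted, the program lists the numbers of any episodes that failed, in dark red so they stand out, and then calls Console.Read() so the console window stays open until you press Enter:
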
if (fails.Count > 0)
{
    Console.ForegroundColor = ConsoleColor.DarkRed;
    Console.WriteLine("Download failed for the following episode numbers:");
    fails.ForEach(Console.WriteLine);
}

Console.Read();
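
The program only reports the failed episodes; you then have to rerun it for them by hand. One possible variation, sketched below under my own assumptions, is a retry pass: try everything once, then retry only the failures a limited number of times. DownloadWithRetries is a hypothetical helper, and the download and pause delegates stand in for client.DownloadFile(...) and Thread.Sleep(...) so the logic can be tried without touching the network.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: attempts every episode, then retries only the failures,
// for at most maxPasses passes in total. Returns the episodes that never succeeded.
List<int> DownloadWithRetries(IEnumerable<int> episodes, Action<int> download,
                              Action pause, int maxPasses)
{
    var pending = new List<int>(episodes);
    for (int pass = 1; pass <= maxPasses && pending.Count > 0; pass++)
    {
        var failed = new List<int>();
        foreach (var episode in pending)
        {
            try
            {
                download(episode);
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Pass {pass}: episode {episode} failed: {ex.Message}");
                failed.Add(episode);
            }
            pause(); // stay polite to the server whether or not the download worked
        }
        pending = failed;
    }
    return pending; // episodes that failed on every pass
}

// Fake downloader for demonstration: episode 13 fails once, episode 15 always fails.
var attempts = new Dictionary<int, int>();
void FakeDownload(int episode)
{
    attempts[episode] = attempts.TryGetValue(episode, out var n) ? n + 1 : 1;
    if (episode == 15 || (episode == 13 && attempts[episode] == 1))
        throw new InvalidOperationException("simulated failure");
}

var stillFailing = DownloadWithRetries(new[] { 11, 12, 13, 14, 15 }, FakeDownload, () => { }, 3);
Console.WriteLine("Still failing: " + string.Join(", ", stillFailing)); // prints "Still failing: 15"
```

To wire this into the program above, download could be a lambda such as ep => client.DownloadFile(string.Format(mask, ep.ToString("000")), Path.Combine(OutDir, string.Format("SFTROU-{0}.mp3", ep.ToString("0000")))), pause could be () => Thread.Sleep(random.Next(20, 50) * 1000), and the episode list could come from Enumerable.Range(FirstEpisode, LastEpisode - FirstEpisode + 1) (which needs using System.Linq).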

I hope this was a useful sample. The code is actually fairly simple. It took me about five minutes to write it, probably twice that to test it, and several hours for it to complete the set of downloads. This is the sort of simple, easy little program that is a perfect place for beginners to start, as well as being actually useful, unlike most samples you are given in school.