Moving Pieces: January 2011

This is a pain. This needs to be fixed. Workarounds should not be needed, but for now we do what we must...

There is some kind of limitation on the iPhone 4 (and apparently the iPhone 3GS) that will not allow you to download or save video that you have recorded if it is larger than a couple hundred MB. If you have recorded video on your iPhone of any significant length, and the file ends up being, say, 1 GB or so, you simply won't be able to copy it to your system. Even Picasa can't do it.

The simplest workaround I found while Googling:

Use iTunes to Sync your phone
Open up My Computer and navigate your way to C:\Documents and Settings\UserName\Application Data\Apple Computer\MobileSync\Backup\
Open up the folder contained at this location (it'll have a lengthy name ce46da1123whatever)
Select View → Details
Select View → Size or click on the Size column label to sort by file size, ascending or descending
Look for the largest file(s)
Copy them to the location of your choice, rename them and add the .mov extension
Done!
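
For anyone who would rather script the "find the largest files and copy them" steps, here is a minimal Python sketch. It assumes the backup folder has already been located by hand as described above; the function name, the `count` default, and the `iphone_video_NN.mov` naming are my own placeholders, and it simply assumes the biggest files are the videos, just as the manual steps do.

```python
import shutil
from pathlib import Path


def copy_largest_files(backup_dir, dest_dir, count=3):
    """Copy the `count` largest files in backup_dir into dest_dir as .mov files."""
    backup = Path(backup_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Same idea as sorting the Size column: largest files first.
    files = [p for p in backup.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_size, reverse=True)

    copied = []
    for index, source in enumerate(files[:count], start=1):
        # Backup file names are opaque, so give each copy a readable
        # (hypothetical) name with the .mov extension added.
        target = dest / f"iphone_video_{index:02d}.mov"
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

Call it with the full path of the backup folder and a destination folder of your choice, then check that the copies actually play before deleting anything from the phone.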

Very useful (to me). It took a good deal of searching to find the right little application to do what I needed—convert locally saved HTML pages to PDF and collate them into a single file.

It sounds so simple, but actually finding a free / Open Source app to do this was problematic. I hoped HTMLDOC would do the trick, and perhaps with a little more experience I could get the desired results from it. In the meantime I discovered an app for Linux (also available for other operating systems) called wkhtmltopdf that did just what I needed it to do.

It's a command line utility, very easy to use and quite effective. It doesn't make a mess of the pages when converting/collating them. It also allows you to customize page headers & footers among other things. I have been using it under Windows XP¹ (which is kind of a pain due to shell limitations compared to bash under Ubuntu, but whatever as long as it works). Entering the command: wkhtmltopdf.exe --extended-help gives you a nice list of options to explore.

My goal was (or seemed) simple enough: Create a single PDF file from the multiple web pages that I had saved locally via the Firefox add-on DownThemAll!

Why do this? The print version pages of the (lengthy) material I wanted to read seem to be intentionally organized on the website to make them difficult to work with for off-line viewing. Lame, but by no means uncommon, of course. Here's the method I used to solve my dilemma:

Made a list of the printable version URLs and saved the html files locally. (Not as tedious as it sounds, and mostly automated once the structure of the URLs was determined.)

Saved the pages locally with DownThemAll!
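
Once the URL structure is known, building the list of printable-version URLs can be automated. The sketch below is only an illustration: the `{n}` template and the example.org address in the usage note are hypothetical, since the real URL pattern depends on the site.

```python
def build_url_list(template, first, last):
    """Return one URL per page number, substituting each number for {n}."""
    return [template.format(n=n) for n in range(first, last + 1)]


def write_url_list(urls, path):
    """Write the URLs one per line so they can be handed to a download tool."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")
```

For example, `build_url_list("http://example.org/print?page={n}", 1, 6)` gives six URLs, and `write_url_list` saves them as a plain text file.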

Used wkhtmltopdf to put the files together using a command similar to the following:

"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe" --footer-center [page] -s Letter articles.htm articles_001.htm articles_002.htm articles_003.htm articles_004.htm articles_005.htm articles_006.htm CoolRead.pdf

--footer-center [page] gives me page numbering at the bottom and -s Letter sets it to 8.5" x 11" page size (other page sizes are available). Several other options are also available. Can you simply create a list of the pages you want to collate instead of listing them on the CLI? I haven't noticed that as an option in --help or --extended-help but it's probably there and easy enough to do. (Can you tell I have only recently started using wkhtmltopdf?) I wanted some kind of cover and toc-type page for the articles, and it just so happens that the main webpage with the links for the on-line articles fit the bill well enough. wkhtmltopdf can work directly with on-line pages so away we went:

"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe" -s Letter -O Portrait "http://supercoolwebsite.org/docs/?docID=6" Cover.pdf

Update 1/3/2011: I decided I didn't like the "cover sheet" generated with the command above. I ended up creating my own cover page and "table of contents" in OpenOffice Draw and exporting it to PDF. To make life simple I put the wkhtmltopdf command text into a batch file and ran it in the same folder containing the html files. That made it easy to modify the options & re-run until I got the (more) precise results I wanted.

Example:

"C:\Program Files\wkhtmltopdf\wkhtmltopdf.exe" -g --disable-smart-shrinking --footer-center [page] -s Letter -T 14mm -B 14mm -L 14mm -R 14mm --disable-external-links --disable-internal-links articles.htm articles_001.htm articles_002.htm articles_003.htm articles_004.htm articles_005.htm articles_006.htm CoolRead.pdf

The additional options shown apply the following formatting to the generated PDF file:

-g = Generate in Grayscale
--disable-smart-shrinking = Without this option, the font size of the generated PDF files was too small. This option provided a larger font-size and made the documents easier to read.
--footer-center [page] = Add page numbers to the bottom-center of each page in the file.
-s Letter = Letter sized pages.
-T 14mm -B 14mm -L 14mm -R 14mm = Increase the page margins. The default put the text too close to the edges of the page.
--disable-external-links & --disable-internal-links = These options do just what you would think: they remove the hyperlinks that are in the html files from the generated PDF file.
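
On the earlier question of keeping the page list in a file rather than typing it on the command line: one way to get there without relying on any wkhtmltopdf feature is a small wrapper script. This is a sketch only. The list file (one page per line, `#` for comments) and the function names are my own assumptions, and the options are simply the ones from the batch-file command above.

```python
import subprocess
from pathlib import Path

# Options copied from the batch-file command above.
OPTIONS = [
    "-g", "--disable-smart-shrinking",
    "--footer-center", "[page]",
    "-s", "Letter",
    "-T", "14mm", "-B", "14mm", "-L", "14mm", "-R", "14mm",
    "--disable-external-links", "--disable-internal-links",
]


def read_page_list(list_path):
    """Return the pages named in list_path, skipping blank and '#' lines."""
    lines = Path(list_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines
            if line.strip() and not line.lstrip().startswith("#")]


def build_command(exe, pages, output):
    """Assemble the argument list: executable, options, input pages, output PDF."""
    return [exe, *OPTIONS, *pages, output]


def run(exe, list_path, output):
    """Run wkhtmltopdf on the listed pages; raises CalledProcessError on failure."""
    subprocess.run(build_command(exe, read_page_list(list_path), output), check=True)
```

Passing the arguments as a list means the path to wkhtmltopdf.exe needs no extra quoting, even with the space in "Program Files". Reordering or dropping pages then becomes a matter of editing the text file.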

Time to put the Cover Page/TOC & the collated HTML-turned-PDF files together for the finished product.

Enter PDFTK Builder from the PortableApps collection.

Add the Cover page/TOC & the collated articles PDF files to PDFTK Builder's Collate option
Click Save As... to combine them.
Done!
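
PDFTK Builder is a graphical front end, so the same merge can presumably be scripted with the command-line pdftk tool, whose `cat` operation concatenates PDFs in the order given. A minimal sketch, assuming pdftk is installed and on the PATH; the function names and the output file name in the usage note are hypothetical.

```python
import subprocess


def pdftk_cat_command(inputs, output, exe="pdftk"):
    """Build a pdftk command that concatenates inputs, in order, into output."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return [exe, *inputs, "cat", "output", output]


def merge_pdfs(inputs, output, exe="pdftk"):
    """Run pdftk to merge the PDFs; raises CalledProcessError on failure."""
    subprocess.run(pdftk_cat_command(inputs, output, exe), check=True)
```

For example, `merge_pdfs(["Cover.pdf", "CoolRead.pdf"], "Final.pdf")` would put the cover page in front of the collated articles.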

Does it sound like a lot of work? It's really not, and I get (almost) exactly what I want. Should there be, or is there, an all-in-one Open Source application that does ALL of this for you in one shot? Probably, but I haven't found it yet. The most irritating aspect of this approach? HTML files with bad character entities in them. Tracking those weird characters down in the files and figuring out what to use to replace them is a real pain (depending on how large your final file turns out to be).
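
Hunting for the bad entities could be partly automated. The sketch below is one possible approach, not the method used here: it flags named entities that are not in Python's table of HTML 4 entity names, plus any U+FFFD replacement characters, and reports the line numbers. Deciding what to replace them with is still manual work.

```python
import re
from html.entities import name2codepoint

# Matches named entities such as &amp; or &rsquo; (not numeric ones like &#169;).
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def find_suspect_characters(text):
    """Return (line_number, description) pairs for likely trouble spots."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ENTITY.finditer(line):
            # Anything outside the HTML 4 entity table is reported as suspect.
            if match.group(1) not in name2codepoint:
                findings.append((number, "unknown entity " + match.group(0)))
        if "\ufffd" in line:
            findings.append((number, "replacement character U+FFFD"))
    return findings
```

Reading each file with `open(path, encoding="utf-8", errors="replace")` turns undecodable bytes into U+FFFD, so they show up in the report as well.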

At the end of the day, using Firefox + DownThemAll!, wkhtmltopdf, and PDFTK Builder (with a generous nod to Notepad++, a great text / code editor for working with the batch files, link lists, etc.) the job gets done.

There are several HTML articles, etc. that I would like to collate into single PDFs. This combination of free tools is a valuable solution for me.

¹ I was not able to use the version of wkhtmltopdf in the Ubuntu repositories, as it throws an error message about not being built with the correct version of Qt 4 or something like that. There are articles on how to fix this but I haven't bothered with it yet.

Monday, January 03, 2011

Saving Large Video Recordings From An iPhone

Saturday, January 01, 2011

Multiple HTML files to Single PDF with wkhtmltopdf