Apostate Café


By Joshua Ellis

Published:

Posted in: meaning-beauty-whimsy
Tagged as: Calepin / Posterous

Moving from Posterous to Calepin

My blog has gone from a static HTML site (circa 1999) to a PHP-based site, to Drupal 5, Drupal 6, then to Tumblr, then to Posterous… and finally I’m back to a static HTML site, though this time using Calepin.

Calepin is a publishing platform based on text files stored in Dropbox. It is powered by Pelican, a static weblog generator written in Python. Pelican is similar to Jekyll, except that it’s written in Python rather than Ruby and isn’t quite as feature-heavy. I don’t know Python or Ruby well, but I was able to hack my way around between the two of them to cobble together a script for extracting my Posterous site into a set of Markdown files and images suitable for use with Calepin.

First, you have to install Jekyll, because the script builds on its Posterous migrator. I leave that as an exercise to the reader, mostly because I don’t remember everything I did to install it, but I know it involved using Homebrew to install Ruby 1.9 on my Mac, then using that version of Ruby to install the necessary gems.

Next, I took the default Posterous migrator module, along with some code from this alternate Posterous migrator, to create a new Posterous migrator that downloaded images, put a Calepin-friendly document preamble on the pages, and called out to html2text (AKA the ASCIInator) to massage the Posterous HTML content into nicely-formatted Markdown.
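
If you haven’t seen the target format, here is a minimal sketch of what each migrated post ends up as: a block of Key: value metadata lines, a blank line, then the Markdown body. The field names match the ones the migrator further down writes. The helper name calepin_document and the sample values are made up for illustration, and the real script builds its preamble with to_yaml instead, so quoting may differ.

```ruby
require 'date'

# Hypothetical helper: build the text of one Calepin post from its parts.
# Empty or nil metadata values are skipped, as the migrator does.
def calepin_document(meta, body)
  preamble = meta.reject { |_key, value| value.nil? || value.to_s.empty? }
                 .map { |key, value| "#{key}: #{value}" }
                 .join("\n")
  "#{preamble}\n\n#{body}\n"
end

# Sample values only; a real post would get these from the Posterous API.
doc = calepin_document(
  {
    'Title'  => 'Hello from Posterous',
    'Status' => 'published',
    'Date'   => Date.new(2012, 3, 4),
    'Tags'   => %w[calepin posterous].join(', ')
  },
  'This is the *Markdown* body.'
)
puts doc
```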

Again, I am not a Ruby programmer. The code looks like this:

#!/usr/bin/env ruby
# encoding: utf-8


require 'rubygems'
require 'jekyll'
require 'fileutils'
require 'net/http'
require 'uri'
require 'json'
require 'date'           # Date.parse
require 'yaml'           # Hash#to_yaml
require 'awesome_print'  # only used for debug output


module Jekyll
  module JPosterous

    def self.download_image(u)
        path = 'images/%s' % u.split('/')[-1]
        url = URI.parse(u)
        found = false
        until found
            host, port = url.host, url.port if url.host && url.port
            query = url.query ? url.query : ""
            req = Net::HTTP::Get.new(url.path + '?' + query)
            res = Net::HTTP.start(host, port) {|http|  http.request(req) }
            res.header['location'] ? url = URI.parse(res.header['location']) : found = true
        end
        open(path, "wb") do |file|
            file.write(res.body)
        end
        path
    end


    def self.fetch(uri_str, limit = 10)
      # You should choose better exception.
      raise ArgumentError, 'Stuck in a redirect loop. Please double check your email and password' if limit == 0

      response = nil
      Net::HTTP.start('posterous.com') do |http|
        req = Net::HTTP::Get.new(uri_str)
        req.basic_auth @email, @pass
        response = http.request(req)
      end

      case response
        when Net::HTTPSuccess     then response
        when Net::HTTPRedirection then fetch(response['location'], limit - 1)
        when Net::HTTPForbidden   then
          retry_after = response.to_hash['retry-after'][0]
          puts "We have been told to try again after #{retry_after} seconds"
          sleep(retry_after.to_i + 1)
          fetch(uri_str, limit - 1)
        else response.error!
      end
    end

    def self.process(email = 'email@dom.com', pass = 'password', api_token = 'a0a1a2a3a4a5a6a7a8a9a0b1b2', blog = 'primary', tags_key = 'Tags')
      @email, @pass, @api_token = email, pass, api_token
      FileUtils.mkdir_p "_posts"
      FileUtils.mkdir_p "images"


      posts = JSON.parse(self.fetch("/api/2/sites/#{blog}/posts?api_token=#{@api_token}").body)
      page = 1

      while posts.any?
        posts.each do |post|
          # puts post.inspect
          puts post["title"]
          awesome_print post
          puts "\n\n\n\n"
          title = post["title"]
          posterous_slug = post["slug"]
          slug = posterous_slug[0..44]
          date = Date.parse(post["display_date"])
          content = post["body_html"]
          published = post["is_private"] ? 'draft' : 'published'
          name = "%02d-%02d-%02d-%s.md" % [date.year, date.month, date.day, slug]
          #name = "%s.html" % [slug]

          tags = []
          post["tags"].each do |tag|
            tags.push(tag["name"])
          end

          # Get the relevant fields as a hash, delete empty fields and convert
          # to YAML for the header
          data = {
            'Title' => title.to_s,
            'Status' => published,
            'Date' => date,
            'Slug' => posterous_slug,
            tags_key => tags * ", ",
            'original_posterous_url' => post["full_url"],
          }.delete_if { |k,v| v.nil? || v == ''}.to_yaml
          # Strip the leading YAML document marker
          data[/^---$\n/] = ''

          # awful hack, do not use on vlog or podcast
          post['media']['images'].each do |img|
            path = download_image(img['full']['url'])
            path2 = 'http://dl.dropbox.com/u/12345678/%s' % img['full']['url'].split('/')[-1]
            tag = "!\[%s\](%s)" % [img['full']['caption'], path2]
            puts tag
            begin
              # Replace the next Posterous image placeholder with the Markdown tag
              content[/\[\[posterous-content:[^\]]*\]\]/] = tag
            rescue IndexError
              # No placeholder left: append the image at the end instead
              append_img = "<p>%s</p>" % tag
              content += append_img
            end
          end

          # Write HTML content to temp file, convert to markdown
          File.open("_posts/#{name}.html", "w") do |f|
            f.puts content
          end
          content = %x[ html2text.py -b 0 "_posts/#{name}.html" ]
          # Remove every non-breaking-space placeholder html2text leaves behind
          content.gsub!(/\&nbsp_place_holder;/,'')
          FileUtils.rm("_posts/#{name}.html")


          # Write out the data and content to file
          File.open("_posts/#{name}", "w") do |f|
            puts name
            f.puts data
            f.puts "\n\n"
            f.puts content
          end
        end

        page += 1
        posts = JSON.parse(self.fetch("/api/2/sites/#{blog}/posts?api_token=#{@api_token}&page=#{page}").body)
      end
    end
  end
end

Note that you’ll need to update the Posterous email, password, and API token in the default arguments of self.process, and your Dropbox user ID in the line path2 = 'http://dl.dropbox.com/u/12345678/%s'. If you don’t have your Posterous API token, you can get it on the Posterous API reference page.
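
If you would rather not edit credentials into the script, one option is to read them from environment variables and pass them to process explicitly. This is only a sketch: the variable names POSTEROUS_EMAIL, POSTEROUS_PASSWORD, and POSTEROUS_API_TOKEN and the helper posterous_credentials are my own invention, not anything Posterous or Jekyll defines.

```ruby
# Hypothetical environment variable names, in the order process() expects them.
CREDENTIAL_KEYS = %w[POSTEROUS_EMAIL POSTEROUS_PASSWORD POSTEROUS_API_TOKEN].freeze

# Hypothetical helper: collect the three values process() needs from the
# environment, failing early if any of them is missing or empty.
def posterous_credentials(env = ENV)
  missing = CREDENTIAL_KEYS.select { |key| env[key].to_s.empty? }
  raise ArgumentError, "missing: #{missing.join(', ')}" unless missing.empty?
  CREDENTIAL_KEYS.map { |key| env[key] }
end

# With the migrator loaded, the call would then look like:
#   Jekyll::JPosterous.process(*posterous_credentials)
```

This keeps the password out of the file, which matters if you ever paste the script somewhere public.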

I called the script jposterous.rb and dropped it in the Jekyll migrators folder, and ran it like this:

# ruby -r /Library/Ruby/Gems/1.8/gems/jekyll-0.11.2/lib/jekyll/migrators/jposterous.rb -e 'Jekyll::JPosterous.process()'

My other tweak was to html2text, which strips out <iframe> tags. I wanted to keep those for the YouTube videos I had posted, so the following code went in around line 436, right after the code for handling del and strike. Please note I am not a Python programmer, so there are probably better ways to do it. The code looks like this:

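    # Pass <iframe> tags through unchanged so embedded videos survive.
    if tag == "iframe":
        if start:
            self.o("<iframe")
            for k, v in attrs.items():
                if v is None:
                    # Valueless attributes (e.g. allowfullscreen) arrive as None.
                    self.o(" "+k)
                else:
                    self.o(" "+k+"=\""+v+"\"")
            self.o(">")
        else:
            self.o("</"+tag+">")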

A couple of things to note: because of the way the Posterous migrator is written, all the .md files go into a folder called _posts, and all the images go into an images folder. All I had to do was copy the Markdown files to ~/Dropbox/Apps/Calepin and the image files to ~/Dropbox/Public.
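
That last copy step is easy to script. Here is a minimal sketch, assuming the folder layout described above. The method name publish_to_dropbox and its arguments are hypothetical, and the paths in the usage comment are the ones from my machine.

```ruby
require 'fileutils'

# Hypothetical helper: copy migrated posts and images into the Dropbox
# folders, creating them if needed. Returns the number of files copied.
def publish_to_dropbox(posts_dir, images_dir, calepin_dir, public_dir)
  FileUtils.mkdir_p([calepin_dir, public_dir])
  # Only the .md files are posts; anything else in _posts is left behind.
  posts  = Dir.glob(File.join(posts_dir, '*.md'))
  images = Dir.glob(File.join(images_dir, '*')).select { |path| File.file?(path) }
  FileUtils.cp(posts, calepin_dir)
  FileUtils.cp(images, public_dir)
  posts.size + images.size
end

# Example call, run from the folder containing _posts and images:
#   home = Dir.home
#   publish_to_dropbox('_posts', 'images',
#                      File.join(home, 'Dropbox/Apps/Calepin'),
#                      File.join(home, 'Dropbox/Public'))
```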