Todd Harris, PhD


Rsync’ing through an SSH tunnel

October 23, 2005 By Todd Harris

I’ve been experimenting with various approaches to keeping my remote and local sites in sync. CVS is too much of a pain for blog-driven sites that generate static pages, so I’m now using rsync. Since several sites reside behind firewalls, I needed a way to tunnel rsync traffic through an ssh hop. Sure enough, the rsync FAQ has some good information on how to do this. I’m using their “Method 2” since I only need to sync against a filesystem target.

rsync through a firewall

If you have a setup where there is no way to directly connect two systems for an rsync transfer, there are several ways to use the firewall system to act as an intermediary in the transfer.

Method 1

Use your remote shell (e.g. ssh) to access the middle system and have it use a remote shell to hop over to the actual target system.

To effect this extra hop, you’ll need to make sure that the remote-shell connection from the middle system to the target system does not involve any tty-based user interaction (such as prompting for a password) because there is no way for the middle system to access the local user’s tty.

One way that works for both rsh and ssh is to enable host-based authentication, which would allow all connections from the middle system to the target system to succeed (when the username remains the same). However, this may not be a desirable setup.

Another method that works with ssh (and is also very safe) is to set up an ssh key (see the ssh-keygen manpage) and ensure that ssh-agent forwarding is turned on (e.g. “ForwardAgent yes”). You would put the public version of your key onto the middle and target systems, and keep the private key on your local system (I recommend encrypting it with a passphrase). With this setup, a series of ssh connections that starts from the system where your private key is available will auto-authorize (after the passphrase prompt on the first system).
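
As a rough illustration of the client side of that setup, here is a minimal sketch assuming OpenSSH. The hostname `middle` and the function name `agent_forwarding_stanza` are placeholders invented for this example, and the key type in the comments is only one reasonable choice.

```shell
#!/bin/sh
# Sketch only: "middle" is a placeholder hostname, and the function name
# is made up for this example.

# Print an ssh_config stanza that turns on agent forwarding for one host,
# so the hop from that host onward can use the key held by the local agent.
agent_forwarding_stanza() {
    printf 'Host %s\n    ForwardAgent yes\n' "$1"
}

# Typical use (commented out so nothing is changed by accident):
#   ssh-keygen -t ed25519                            # create a key pair; set a passphrase
#   agent_forwarding_stanza middle >> ~/.ssh/config  # enable forwarding for the first hop
#   ssh-add                                          # load the private key into ssh-agent

agent_forwarding_stanza middle
```

The public half of the key still has to be added to the authorized_keys file of the account on both the middle and the target system.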

You should then test that a series of ssh connections works without multiple prompts by running a command like this (put in the real “middle” and “target” hostnames, of course):

ssh middle ssh target uptime

If you get a password/passphrase prompt to get into the middle system that’s fine, but the extra hop needs to occur without any extra user interaction.
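
One way to make that check less ambiguous is to forbid prompting on the second hop, so a misconfigured hop fails at once instead of hanging. The sketch below assumes OpenSSH on the middle system (for the `BatchMode` option); the function name `check_hops` and the `SSH` override variable are inventions of this example.

```shell
#!/bin/sh
# Sketch only: hostnames are placeholders. SSH can be overridden, which is
# handy for trying the function without touching the network.
SSH=${SSH:-ssh}

# Run "uptime" on the target by way of the middle system.
# BatchMode=yes on the inner ssh makes the second hop fail immediately
# instead of waiting on a prompt that nobody can answer.
check_hops() {
    middle=$1
    target=$2
    if "$SSH" "$middle" ssh -o BatchMode=yes "$target" uptime; then
        echo "second hop to $target needs no extra interaction"
    else
        echo "second hop to $target failed or wanted a prompt" >&2
        return 1
    fi
}

# Example: check_hops middle target
```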

Once that’s done, you can do an rsync copy like this:

rsync -av -e "ssh middle ssh" target:/source/ /dest/

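
If this transfer is run often, it can be wrapped in a small function. The following is only a sketch: `pull_via_middle` and the `RSYNC` override variable are hypothetical names, and the hosts and paths are the same placeholders used in the command above. The point it illustrates is that the two-hop remote shell has to reach rsync as a single argument.

```shell
#!/bin/sh
# Sketch only: host names and paths are placeholders. RSYNC can be
# overridden so the function can be exercised without a network.
RSYNC=${RSYNC:-rsync}

# Pull a directory from the target by way of the middle system.
# The remote-shell command must reach rsync as ONE argument, which is why
# it sits inside plain double quotes.
pull_via_middle() {
    middle=$1
    target=$2
    src=$3
    dest=$4
    "$RSYNC" -av -e "ssh $middle ssh" "$target:$src" "$dest"
}

# Example: pull_via_middle middle target /source/ /dest/
```
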
Method 2

Assuming you’re using ssh as your remote shell, you can configure ssh to forward a local port through your middle system to the ssh port (22) on the target system.

The first thing we need is an ssh configuration that will allow us to connect to the forwarded port as if we were connecting to the target system, and we need ssh to know what we’re doing so that it doesn’t complain about the host keys being wrong. We can do this by adding this section to your ~/.ssh/config file (substitute “target” and “target_user” as appropriate):

Host target
    HostName localhost
    Port 2222
    HostKeyAlias target
    User target_user

Next, we need to enable the port forwarding:

ssh -fN -l middle_user -L 2222:target:22 middle

What this does is cause a connection to port 2222 on the local system to get tunneled to the middle system and then turn into a connection to the target system’s port 22. The -N option tells ssh not to run a command on the remote system, which works with modern ssh versions (you can run a sleep command if -N doesn’t work). The -f option tells ssh to put the command in the background after any password/passphrase prompts.
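
Because -f leaves the tunnel running in the background, it helps to have a way to check on it and shut it down. One possible approach, assuming a reasonably recent OpenSSH, is a control socket. In this sketch the function names, the `SSH` and `CTL` variables, and the socket path are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: user, host, and socket path are placeholders.
SSH=${SSH:-ssh}
CTL=${CTL:-$HOME/.ssh/tunnel-to-target.sock}

# Start the forward in the background. -M with -S creates a control socket
# so the background process can be queried and stopped later.
# ExitOnForwardFailure makes ssh give up if local port 2222 is taken.
tunnel_open() {
    "$SSH" -fN -M -S "$CTL" -o ExitOnForwardFailure=yes \
        -l middle_user -L 2222:target:22 middle
}

# Ask the background ssh whether it is still running.
tunnel_check() {
    "$SSH" -S "$CTL" -O check middle
}

# Tell the background ssh to exit, which closes the forwarded port.
tunnel_close() {
    "$SSH" -S "$CTL" -O exit middle
}
```

A typical session would call `tunnel_open`, run the transfers, and finish with `tunnel_close`.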

With this done, you could run a normal-looking rsync command to “target” that would use a connection to port 2222 on localhost automatically:

rsync -av target:/src/ /dest/

Note: starting an ssh tunnel allows anyone on the local system to connect to the localhost port 2222, not just you, but they’d still need to be able to log in to the target system using their own credentials.
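
For a one-off transfer, the same settings can be passed on the command line instead of being stored in ~/.ssh/config. This is a sketch under the assumption that the tunnel on port 2222 is already running; `pull_through_tunnel` and the `RSYNC` override variable are hypothetical names, and `target_user` is the same placeholder as in the config section.

```shell
#!/bin/sh
# Sketch only: target_user, the port, and the paths are placeholders.
RSYNC=${RSYNC:-rsync}

# Same transfer as above, but with the forwarded port and host-key alias
# given on the command line rather than in ~/.ssh/config.
pull_through_tunnel() {
    port=$1
    src=$2
    dest=$3
    "$RSYNC" -av -e "ssh -p $port -o HostKeyAlias=target" \
        "target_user@localhost:$src" "$dest"
}

# Example: pull_through_tunnel 2222 /src/ /dest/
```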

Method 3

Install and configure an rsync daemon on the target and use an ssh tunnel to reach the rsync server. This is similar to Method 2, but tunnels the daemon port for those who prefer to use an rsync daemon.

Installing the rsync daemon is beyond the scope of this document, but see the rsyncd.conf manpage for more information. Keep in mind that you don’t need to be root to run an rsync daemon as long as you don’t use a protected port.

Once your rsync daemon is up and running, you build an ssh tunnel through your middle system like this:

ssh -fN -l middle_user -L 8873:target:873 middle

What this does is cause a connection to port 8873 on the local system to turn into a connection from the middle system to the target system on port 873. (Port 873 is the normal port for an rsync daemon.) The -N option tells ssh not to run a command on the remote system, which works with modern ssh versions (you can run a sleep command if -N doesn’t work). The -f option tells ssh to put the command in the background after any password/passphrase prompts.
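
A quick way to confirm that the tunnel really reaches the daemon is to ask it for its module list. The sketch below assumes the tunnel on local port 8873 is up; `list_modules` and the `RSYNC` override variable are names invented for this example.

```shell
#!/bin/sh
# Sketch only: 8873 is the local end of the tunnel from the command above.
RSYNC=${RSYNC:-rsync}

# With no module name after "::", an rsync daemon replies with its list of
# modules, which makes a cheap test that the tunnel reaches the daemon.
list_modules() {
    "$RSYNC" --port="$1" localhost::
}

# Example: list_modules 8873
```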

Now when an rsync command is executed with a daemon-mode command-line syntax to the local system, the conversation is directed to the target system. For example:

rsync -av --port 8873 localhost::module/source dest/
rsync -av rsync://localhost:8873/module/source dest/

Note: starting an ssh tunnel allows anyone on the local system to connect to the localhost port 8873, not just you, so you may want to enable username/password restrictions on your rsync daemon.
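
As a sketch of what such a restriction might look like, the function below prints an rsyncd.conf module that uses the `auth users` and `secrets file` parameters. The module name, path, user name `alice`, and secrets-file location are placeholders chosen for this example, not values from the FAQ.

```shell
#!/bin/sh
# Sketch only: the module name, path, user name, and secrets-file location
# are all placeholders for illustration.

# Print an rsyncd.conf module that only lets listed users in.
auth_module_stanza() {
    module=$1
    path=$2
    user=$3
    secrets=$4
    printf '[%s]\n' "$module"
    printf '    path = %s\n' "$path"
    printf '    read only = yes\n'
    printf '    auth users = %s\n' "$user"
    printf '    secrets file = %s\n' "$secrets"
}

auth_module_stanza module /srv/data alice /home/alice/rsyncd.secrets
```

The secrets file holds one `user:password` pair per line and should be readable only by the account running the daemon. The client side then names the user, for example `rsync -av --port 8873 alice@localhost::module/source dest/`.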


Filed Under: howto Tagged With: sysadmin
