The SSH protocol and SSH Applications - Part 2

Martin Stecher June 25, 2025

SSH and SSH Applications from a Security Gateway Perspective

Part 2: SSH Remote Shell from a Gateway Perspective

In Part 1, we explored the SSH core protocol and how a security gateway can filter remote command execution. In this article, we’ll examine how things change when a remote interactive shell session is started instead of executing a single command.

SSH protocol messages to open a Remote Command vs. a Remote Shell channel.

Starting a Remote Shell

As discussed in Part 1, SSH applications are distinguished by how the channel is opened and configured. Both remote commands and remote shells begin with an SSH_MSG_CHANNEL_OPEN message. This opens a channel of type "session" and includes parameters such as the initial window size and maximum packet size.

After the channel is confirmed by the server (SSH_MSG_CHANNEL_OPEN_CONFIRMATION), the client sends a series of SSH_MSG_CHANNEL_REQUEST messages to configure the session. These may include:

Environment variable settings, e.g., LC_CTYPE=UTF-8 (typically sent without expecting a reply)
For remote shell sessions: a pty-req (pseudo-terminal request) message indicating terminal type (e.g., xterm-256color) and dimensions
A final channel request to start the SSH application:

For remote commands: exec, followed by the command string
For remote shell sessions: shell, with no additional parameters (the environment and terminal are already set)
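To make the difference concrete, here is a minimal sketch of how a gateway might classify an already decrypted SSH_MSG_CHANNEL_REQUEST payload. The field layout follows RFC 4254. The function names (classify_channel_request, build_request and the helpers) are hypothetical and exist only for this illustration, and build_request is merely there to produce sample payloads.

```python
import struct

SSH_MSG_CHANNEL_REQUEST = 98  # message number defined in RFC 4254


def read_string(buf, pos):
    """Read an SSH 'string' (uint32 length + bytes); return (value, new_pos)."""
    (length,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    return buf[start:start + length], start + length


def classify_channel_request(payload):
    """Return (request_type, detail) for a decrypted channel request payload."""
    if payload[0] != SSH_MSG_CHANNEL_REQUEST:
        raise ValueError("not an SSH_MSG_CHANNEL_REQUEST")
    pos = 1 + 4  # skip message type and recipient channel
    req_type, pos = read_string(payload, pos)
    pos += 1  # skip the want-reply boolean
    req_type = req_type.decode("ascii")
    if req_type == "exec":
        command, _ = read_string(payload, pos)
        return req_type, command.decode("utf-8", "replace")
    if req_type == "pty-req":
        term, pos = read_string(payload, pos)
        cols, rows = struct.unpack_from(">II", payload, pos)
        return req_type, (term.decode("ascii"), cols, rows)
    if req_type == "env":
        name, pos = read_string(payload, pos)
        value, _ = read_string(payload, pos)
        return req_type, (name.decode("ascii"), value.decode("utf-8", "replace"))
    return req_type, None  # "shell" carries no further parameters


def ssh_string(data):
    """Encode bytes as an SSH 'string'."""
    return struct.pack(">I", len(data)) + data


def build_request(channel, req_type, want_reply, extra=b""):
    """Build a sample payload (illustration only, not a full SSH packet)."""
    return (bytes([SSH_MSG_CHANNEL_REQUEST]) + struct.pack(">I", channel)
            + ssh_string(req_type) + bytes([1 if want_reply else 0]) + extra)


exec_req = build_request(0, b"exec", True, ssh_string(b"ls -l"))
shell_req = build_request(0, b"shell", True)
print(classify_channel_request(exec_req))   # ('exec', 'ls -l')
print(classify_channel_request(shell_req))  # ('shell', None)
```

The exec request hands the gateway the complete command string up front. The shell request carries nothing that could be filtered, which is why the rest of this article is about the data that follows it.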

Executing Commands in a Remote Shell

You might expect the client to buffer input and send it to the server as a structured command message, as in the remote command case. That’s not how it works.

The client behaves like a dumb terminal emulator: as characters are typed, they are immediately sent to the server inside SSH_MSG_CHANNEL_DATA messages. The server, in turn, replies with character data (including control sequences) to be rendered in the client’s terminal window.

There’s no SSH-level indication of command boundaries, which makes interpretation by intermediaries extremely difficult. Prompt, command, command output, or further input to an interactive command: everything is just a flow of unstructured data, and only the server has the context to know which data belongs where.
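As a rough illustration of what this looks like on the wire, the following sketch builds one SSH_MSG_CHANNEL_DATA payload per keystroke. The layout (message type, recipient channel, string data) follows RFC 4254. The helper names are hypothetical, and a real client may well batch several characters into one message.

```python
import struct

SSH_MSG_CHANNEL_DATA = 94  # message number defined in RFC 4254


def channel_data(recipient_channel, data):
    """Build an SSH_MSG_CHANNEL_DATA payload: type, channel, string data."""
    return (bytes([SSH_MSG_CHANNEL_DATA])
            + struct.pack(">I", recipient_channel)
            + struct.pack(">I", len(data)) + data)


def messages_for_typing(recipient_channel, keystrokes):
    """One message per keystroke, as a terminal-style client may send them."""
    return [channel_data(recipient_channel, bytes([k])) for k in keystrokes]


msgs = messages_for_typing(0, b"ls -l\r")
print(len(msgs))         # 6
print(msgs[0].hex(" "))  # 5e 00 00 00 00 00 00 00 01 6c
```

None of the six payloads says "this is a command". The carriage return at the end is just another data byte.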

This behavior goes back to early computing history, when terminals such as teletypewriters (the origin of "tty", which lives on in the "pty" of pty-req) sent characters directly to mainframes, and formatting was entirely controlled by the host.

Terminal Control Sequences

The remote shell’s data stream is filled with control characters. Multiple standards exist, and several of them are in use side by side. Two types show up most often when examining data streams:

ASCII control characters (0x00–0x1F)
CSI (Control Sequence Introducer) sequences, which begin with ESC [ (0x1B 0x5B) and encode:

Formatting instructions (bold, inverse, etc.)
Cursor movement
Erasing content
Tabs
Mode changes and more

For example, to draw a very simple command prompt, the server may send:

0000: 0D 1B 5B 30 6D 1B 5B 32 37 6D 1B 5B 32 34 6D 1B     ..[0m.[27m.[24m.
0010: 5B 4A 75 73 65 72 40 74 65 73 74 2D 73 65 72 76    [Juser@test-serv
0020: 65 72 20 7E 20 25 20 1B 5B 4B                      er ~ % .[K

The characters actually printed are "user@test-server ~ % ". The control characters in front tell the client to jump to the start of the line (0D), reset the character style to normal (ESC [0m), not inverse (ESC [27m) and not underlined (ESC [24m), and erase the display from the cursor onward (ESC [J). After the 21 plain-text characters of the prompt, the final CSI sequence (ESC [K) erases any characters remaining up to the end of the line. In total, 42 bytes are sent to display the 21-character prompt. The same mechanism allows for elaborately styled prompts with all kinds of coloring or other formatting (not used in this example).
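The byte counts can be checked with a small tokenizer. This is a minimal sketch, not a complete terminal parser: it only separates CSI sequences, single ASCII control characters and plain text, and the names TOKEN and tokenize are made up for this example.

```python
import re

# The 42 bytes from the hex dump above.
PROMPT_BYTES = bytes.fromhex(
    "0D 1B 5B 30 6D 1B 5B 32 37 6D 1B 5B 32 34 6D 1B"
    "5B 4A 75 73 65 72 40 74 65 73 74 2D 73 65 72 76"
    "65 72 20 7E 20 25 20 1B 5B 4B"
)

# CSI: ESC [, parameter bytes, intermediate bytes, one final byte.
TOKEN = re.compile(
    rb"(?P<csi>\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e])"
    rb"|(?P<ctrl>[\x00-\x1f])"
    rb"|(?P<text>[^\x00-\x1f]+)"
)


def tokenize(stream):
    """Split a byte stream into (kind, bytes) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(stream)]


tokens = tokenize(PROMPT_BYTES)
for kind, value in tokens:
    print(kind, value)

visible = b"".join(value for kind, value in tokens if kind == "text")
print(len(PROMPT_BYTES), len(visible))  # 42 21
```

The tokenizer yields one control character, five CSI sequences and a single run of text, which is the prompt itself.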

Although the server can clear lines efficiently with simple CSI control sequences, it typically also sends many blank characters to overwrite any existing content character by character. It may even do this twice: in the first pass, an inverse prompt character starts the line, which is then overwritten again. The net effect is simply a cleared line, but depending on the window size the server sends hundreds of bytes to achieve it. The benefit of this complicated method is that if client and server disagree about the window dimensions, the user will notice the broken prompt characters on the screen.

Typing and Visual Feedback

Characters typed by the user are not echoed locally. Instead:

Each character is sent to the server (at normal typing speed, each character may take a separate SSH_MSG_CHANNEL_DATA message).
The server sends back feedback, which the client renders.
On slow connections, this causes noticeable latency between typing and screen updates.

Worse, the server may not echo characters 1:1. It might include formatting, backspaces, or re-rendering logic. For example, typing pwd<enter> at the client could result in the following sequence of data messages:

Seq#	Client sends	Server replies
1	p	p
2	w	\x08pw
3	d	d
4	\x0D (<enter>)	\x0D\x0D\x0A/Users/user\x0D\x0A

In the second step, the server does not simply reply with the ‘w’ character. It sends a backspace to move the cursor back to the first character, overwrites the ‘p’ with the same character, and then adds the ‘w’. The visual result is the same. In step 4, the \x0D sent by the client stands for the <enter> key. The \x0D in the server reply instructs the client to move the cursor back to the start of the line (sending it twice makes no difference). The \x0A starts a new line, and the response is delivered before the cursor again moves to the next line.

Next, the server will again send its complex clear-the-line logic followed by a new input prompt as shown above.
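The four server replies from the table can be replayed in a toy screen model to confirm that the odd-looking echo really produces the expected picture. This is only a sketch under strong assumptions: the function render is hypothetical, it starts on an empty line (no prompt), and it understands nothing but printable characters, backspace, CR and LF.

```python
def render(chunks):
    """Apply server output to a toy screen and return the screen lines."""
    lines = [[]]
    row = col = 0
    for chunk in chunks:
        for ch in chunk:
            if ch == "\x08":      # backspace: move the cursor one cell left
                col = max(col - 1, 0)
            elif ch == "\r":      # carriage return: back to column 0
                col = 0
            elif ch == "\n":      # line feed: move down one line
                row += 1
                if row == len(lines):
                    lines.append([])
            else:                 # printable: overwrite at the cursor
                line = lines[row]
                if col < len(line):
                    line[col] = ch
                else:
                    line.extend(" " * (col - len(line)))
                    line.append(ch)
                col += 1
    return ["".join(line) for line in lines]


server_replies = ["p", "\x08pw", "d", "\r\r\n/Users/user\r\n"]
print(render(server_replies))  # ['pwd', '/Users/user', '']
```

A gateway that wants to know what the user sees has to run this kind of emulation on the server's replies, because the raw echo bytes ("p", backspace, "p", "w", "d") do not spell the command directly.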

Filtering Challenges for Security Gateways

For a security intermediary like our SSH Gateway, remote shell filtering is significantly harder than filtering single commands. It requires an emulator that mimics the functionality of both the client and the server. Even then, data flows asynchronously: the user may send data without waiting for visual feedback, anticipating the next server response. The gateway then sees the <enter> key from the client before it has had any chance to understand the context that has been built up on the server.

Let’s look at an example. Assume there is a program named pwdump that prints all local passwords, and that the gateway’s security policy must deny it. We continue the session started above with the four steps that sent the pwd command.

This time, the user’s keystrokes are buffered and sent in a single data message. The content of that single SSH_MSG_CHANNEL_DATA message is <up-arrow> u <tab> <enter>. This is all the context the gateway has at the moment the command is about to execute. Once the data is forwarded, the context evolves on the server side as follows:

Seq#	Processed client data	Server context
5	<up-arrow>	pwd (read from history)
6	u	pwdu
7	<tab>	pwdump (tab expansion)
8	<enter>	Execute the forbidden command

Especially if the first pwd command was transmitted on an earlier connection, the gateway has no chance to emulate the server’s history. It knows neither what the server will fetch from history nor how the tab expansion will resolve. Until the feedback arrives from the server, the gateway has no knowledge of the current server context. Hence, an <enter> key sent by the client before the server’s feedback arrives is rather dangerous from a security perspective.
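The scenario can be simulated with a toy model. Everything here is hypothetical: ToyShell is a drastically simplified stand-in for a server-side line editor, naive_gateway_allows is a deliberately weak byte-matching filter, and the up-arrow encoding ESC [ A is an assumption (terminals in application cursor mode send a different sequence).

```python
UP_ARROW = b"\x1b[A"  # assumed encoding of the up-arrow key
TAB = b"\t"
ENTER = b"\r"


class ToyShell:
    """Hypothetical server-side line editor with history and tab completion."""

    def __init__(self, history, programs):
        self.history = list(history)
        self.programs = list(programs)
        self.line = ""
        self.executed = []

    def feed(self, data):
        i = 0
        while i < len(data):
            if data.startswith(UP_ARROW, i):
                if self.history:           # recall the most recent command
                    self.line = self.history[-1]
                i += len(UP_ARROW)
                continue
            ch = data[i:i + 1]
            if ch == TAB:                  # complete only if unambiguous
                matches = [p for p in self.programs
                           if p.startswith(self.line)]
                if len(matches) == 1:
                    self.line = matches[0]
            elif ch == ENTER:
                self.executed.append(self.line)
                self.history.append(self.line)
                self.line = ""
            else:
                self.line += ch.decode("ascii")
            i += 1


def naive_gateway_allows(data, forbidden=b"pwdump"):
    """A byte-matching filter that sees only what the client sent."""
    return forbidden not in data


client_data = UP_ARROW + b"u" + TAB + ENTER
shell = ToyShell(history=["pwd"], programs=["ls", "pwd", "pwdump"])

print(naive_gateway_allows(client_data))  # True
shell.feed(client_data)
print(shell.executed)                     # ['pwdump']
```

The six bytes the gateway sees never contain the string pwdump. The forbidden command only comes into existence on the server, from state (history, installed programs) that the gateway does not have.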

A simple wait-and-see strategy after receiving an <enter> key may not be wise either. In the best case, it only introduces latency and makes the remote shell feel sluggish. But the gateway may also cause a deadlock if it holds back data that the server needs in order to continue.

In summary, for a security gateway, remote shell filtering is significantly harder than filtering single commands:

Protocol-level ambiguity:
There's no structured indication of what constitutes a command.
Context emulation challenges:
Data flows asynchronously. Input for future commands may already arrive at the gateway before it receives proper server feedback.
History, tab-expansion, aliasing:
The gateway can’t know what's stored in the server's shell history, tab-expansion system, or command aliases unless it simulates the entire terminal session.
Prompt vs. command output confusion:
Prompts are custom, dynamic, and unstructured. It is even possible to place parts of the prompt at the end of the line making it very hard to understand where a command actually starts.

There are several corner cases in which the emulator context might get out of sync with the server’s context.

Bracketed Paste Mode: A Useful Signal to Synchronize

A feature of modern shells - bracketed paste mode - can help with the synchronization. It was introduced to let clients paste larger amounts of data (even multiple lines), with the whole block enclosed in a dedicated pair of CSI control sequences.

While this also introduces a new challenge (security gateways have to process a vector of commands when a single <enter> key is sent), this is also an opportunity to distinguish between a prompt, next command input and command output.

A server that supports bracketed paste mode sends its own CSI control sequence to inform the client that the feature is available, and it does so right at the end of the prompt. As soon as it starts to accept and execute a new command, it disables bracketed paste mode again, and re-enables it only after showing the next prompt line.

A gateway parsing this data can use those control sequences as anchor points that bring some structure into the byte flow and help keep its emulated context aligned with the server context - at least for modern shells on the server side.
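A minimal sketch of this anchoring idea follows. It relies on the xterm-style sequences ESC [ ? 2004 h (enable) and ESC [ ? 2004 l (disable) for bracketed paste mode. The function split_by_paste_mode and the labels "output" and "input" are hypothetical, the sample stream is simplified, and the sketch assumes the whole stream is available at once, whereas a real gateway would have to cope with markers split across SSH messages.

```python
import re

ENABLE_PASTE = b"\x1b[?2004h"   # server enables bracketed paste mode
DISABLE_PASTE = b"\x1b[?2004l"  # server disables bracketed paste mode
MARKER = re.compile(rb"\x1b\[\?2004([hl])")


def split_by_paste_mode(server_stream):
    """Label server bytes by the bracketed-paste state they were sent in.

    'output' covers command output and the following prompt (mode off),
    'input' covers the echo of the command line being edited (mode on).
    """
    segments = []
    state = "output"  # assumption: the mode is off until the first prompt
    pos = 0
    for m in MARKER.finditer(server_stream):
        if m.start() > pos:
            segments.append((state, server_stream[pos:m.start()]))
        state = "input" if m.group(1) == b"h" else "output"
        pos = m.end()
    if pos < len(server_stream):
        segments.append((state, server_stream[pos:]))
    return segments


PROMPT = b"user@test-server ~ % "
stream = (PROMPT + ENABLE_PASTE + b"pwd" + DISABLE_PASTE
          + b"\r\n/Users/user\r\n" + PROMPT + ENABLE_PASTE)

for state, data in split_by_paste_mode(stream):
    print(state, data)
```

In this simplified stream, the only bytes labeled "input" are the echoed command line, which is exactly the part a filter wants to inspect. The prompt ends up at the tail of each "output" segment, directly before the enable sequence.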

Summary: It’s Not Impossible, but it’s Tricky

A gateway filtering SSH shell commands must walk a tightrope:

Too lax: it risks letting forbidden commands through.
Too strict: it introduces latency or breaks legitimate workflows.

With smart terminal emulation, awareness of shell features like bracketed paste, clever heuristics and a few more innovative ideas, it's possible to filter shell traffic with high fidelity – though never with complete certainty.

The articles in this series

The SSH Core Protocol and the SSH Remote Command
SSH Remote Shell from a Gateway Perspective [this article]
SCP from a Gateway Perspective
SFTP from a Gateway Perspective
Git via SSH from a Gateway Perspective
TCP Tunnels via SSH from a Gateway Perspective

For questions and discussion please comment on our LinkedIn post
