PPM format

Meemo4556 edited this page Jul 26, 2024 · 24 revisions

Animations created within Flipnote Studio are stored in the PPM format. The file extension comes from Flipnote Studio's original working title, Para Para Manga Koubou (translated as "Flipbook Workshop").

Header

Offset Type Details
0x0 char[4] File magic (PARA)
0x4 uint32 Animation data size
0x8 uint32 Sound data size
0xC uint16 Frame count
0xE uint16 Format version - always 0x24

The app checks that (version & 0xf0) >> 4 != 0 to decide whether it should read the metadata section. Since the version is always 0x24, this check always passes, so it is likely a leftover from development.

Note: The frame count in this header is 0 indexed, so you must add 1 in order to get the actual frame count.
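As a rough illustration of the table above, the header could be read with Python's struct module. This is only a sketch: the PPMHeader type and the parse_ppm_header name are made up for this example, and little-endian byte order for the integer fields is an assumption.

```python
import struct
from typing import NamedTuple


class PPMHeader(NamedTuple):
    """Hypothetical container for the 16-byte file header."""
    magic: bytes
    animation_data_size: int
    sound_data_size: int
    frame_count: int      # actual frame count (stored value + 1)
    format_version: int


def parse_ppm_header(data: bytes) -> PPMHeader:
    # Assumption: all integer fields are little-endian.
    magic, anim_size, sound_size, raw_count, version = struct.unpack_from(
        "<4sIIHH", data, 0
    )
    if magic != b"PARA":
        raise ValueError("bad file magic, expected PARA")
    # The stored frame count is 0-indexed, so add 1.
    return PPMHeader(magic, anim_size, sound_size, raw_count + 1, version)
```

A stored frame count of 9 therefore yields a 10-frame Flipnote.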

Metadata

The metadata section follows directly after the header, starting at 0x10:

Offset Type Details
0x0 uint16 Lock, 0 if unlocked, 1 if locked
0x2 uint16 Thumbnail frame index
0x4 wchar[11] Root author name
0x1A wchar[11] Parent author name
0x30 wchar[11] Current author name
0x46 byte[8] Parent author ID
0x4E byte[8] Current author ID
0x56 byte[18] Parent filename
0x68 byte[18] Current filename
0x7A byte[8] Root author ID
0x82 byte[8] Root filename fragment
0x8A uint32 Last modified timestamp
0x8E uint16 Padding, always null

Frame count starts at 0, and should be incremented by 1 when displayed.

Author names are null-padded UTF-16 LE strings. Author IDs are also stored in little-endian byte order, so you may need to reverse them.
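A minimal sketch of how these two field types might be decoded. The function names are hypothetical, and printing the ID as uppercase hex is just one possible presentation.

```python
def decode_author_name(raw: bytes) -> str:
    """Decode a wchar[11] field (22 bytes of null-padded UTF-16 LE)."""
    return raw.decode("utf-16-le").rstrip("\x00")


def format_author_id(raw: bytes) -> str:
    """Reverse the 8 little-endian ID bytes and show them as hex."""
    return raw[::-1].hex().upper()
```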

Parent and current filenames are stored as:

  • 3 bytes representing the last 6 digits of the console's MAC address
  • 13-character string
  • uint16 edit counter

The root filename fragment is stored as:

  • 3 bytes representing the last 6 digits of the console's MAC address
  • 5 bytes representing the first 10 characters of the 13-character string

Filenames are formatted as <3-byte MAC as hex>_<13-character string>_<edit counter as a zero-padded 3-digit number>, e.g. F78DA8_14768882B56B8_030. See this page for more information on the ID and filename formats.
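The 18-byte parent and current filename fields could be turned into that string form roughly as follows. This sketch assumes the 13-character string is stored as ASCII and the edit counter is little-endian; format_filename is a name invented for the example.

```python
import struct


def format_filename(raw: bytes) -> str:
    """Format an 18-byte filename field as MAC_STRING_EDITS."""
    if len(raw) != 18:
        raise ValueError("filename field must be 18 bytes")
    # 3-byte MAC fragment, 13-character string, uint16 edit counter
    mac, name, edits = struct.unpack("<3s13sH", raw)
    return "{}_{}_{:03d}".format(mac.hex().upper(), name.decode("ascii"), edits)
```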

Timestamps are stored as the number of seconds since midnight on the 1st of January, 2000.
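For example, a timestamp could be converted like this. The result is a naive datetime because the time zone is not specified here; the names are illustrative only.

```python
from datetime import datetime, timedelta

# Epoch used by PPM timestamps: midnight on 1 January 2000
PPM_EPOCH = datetime(2000, 1, 1)


def ppm_timestamp_to_datetime(seconds: int) -> datetime:
    """Convert a PPM timestamp (seconds since 2000-01-01) to a datetime."""
    return PPM_EPOCH + timedelta(seconds=seconds)
```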

Thumbnail Images

The Flipnote thumbnail starts at 0xA0 and is 1536 bytes long.

Thumbnail images are 64 x 48 and arranged in a series of 8 x 8 tiles. Pixels are stored as 4-bit palette indices, referencing a hardcoded color palette.

Pseudocode:

# create an image that is 64 pixels wide and 48 pixels high
image = Image(64, 48)
# read thumbnail data
data = file.read_bytes(1536)
data_offset = 0

for tile_y = 0; tile_y < 48; tile_y += 8:
  for tile_x = 0; tile_x < 64; tile_x += 8:
    for line = 0; line < 8; line += 1:
      for pixel = 0; pixel < 8; pixel += 2:
        x = tile_x + pixel
        y = tile_y + line
        image.SetPixel(x, y, data[data_offset] & 0x0F)
        image.SetPixel(x + 1, y, (data[data_offset] >> 4) & 0x0F)
        data_offset += 1

Thumbnail Palette

Index Hex color
0 #FFFFFF
1 #525252
2 #FFFFFF
3 #9C9C9C
4 #FF4844
5 #C8514F
6 #FFADAC
7 #00FF00
8 #4840FF
9 #514FB8
10 #ADABFF
11 #00FF00
12 #B657B7
13 #00FF00
14 #00FF00
15 #00FF00
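Putting the tile layout and the palette together, a runnable Python sketch of the thumbnail decoder might look like the following. It returns a plain 2D list of hex color strings rather than using any particular image library, and decode_thumbnail is a hypothetical name.

```python
THUMBNAIL_PALETTE = [
    "#FFFFFF", "#525252", "#FFFFFF", "#9C9C9C",
    "#FF4844", "#C8514F", "#FFADAC", "#00FF00",
    "#4840FF", "#514FB8", "#ADABFF", "#00FF00",
    "#B657B7", "#00FF00", "#00FF00", "#00FF00",
]


def decode_thumbnail(data: bytes) -> list:
    """Decode 1536 bytes of thumbnail data into 48 rows of 64 hex colors."""
    if len(data) != 1536:
        raise ValueError("thumbnail data must be 1536 bytes")
    image = [[THUMBNAIL_PALETTE[0]] * 64 for _ in range(48)]
    offset = 0
    for tile_y in range(0, 48, 8):
        for tile_x in range(0, 64, 8):
            for line in range(8):
                for pixel in range(0, 8, 2):
                    byte = data[offset]
                    x, y = tile_x + pixel, tile_y + line
                    # low nibble is the left pixel, high nibble the right
                    image[y][x] = THUMBNAIL_PALETTE[byte & 0x0F]
                    image[y][x + 1] = THUMBNAIL_PALETTE[(byte >> 4) & 0x0F]
                    offset += 1
    return image
```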

Animation Header

The animation header starts at 0x06A0.

Type Details
uint16 Size of the frame offset table
uint32 Unknown, always seen as 0
uint16 Flags

Following the animation header is a table of uint32 offsets for each frame. These offsets are relative to the start of the animation data section.
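A sketch of reading the animation header and the offset table. It assumes little-endian fields and takes "the start of the animation data section" to mean 0x06A8 plus the offset table size, i.e. the first byte after the table; read_frame_offsets is a hypothetical name.

```python
import struct

ANIMATION_HEADER_OFFSET = 0x06A0


def read_frame_offsets(data: bytes):
    """Return (flags, absolute frame offsets) from a whole PPM file."""
    table_size, _unknown, flags = struct.unpack_from(
        "<HIH", data, ANIMATION_HEADER_OFFSET
    )
    table_start = ANIMATION_HEADER_OFFSET + 8   # header is 8 bytes long
    count = table_size // 4                     # one uint32 per frame
    relative = struct.unpack_from("<%dI" % count, data, table_start)
    # Assumption: offsets are relative to the end of the offset table.
    data_start = table_start + table_size
    return flags, [data_start + rel for rel in relative]
```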

Animation Header Flags

Bitmask Details
flags & 0x1 Unknown
flags & 0x2 Loop Flipnote playback if set
flags & 0x4 Unknown
flags & 0x8 Unknown
flags & 0x10 Hide layer 1 if set
flags & 0x20 Hide layer 2 if set
flags & 0x40 Always set

Animation Data

The animation data begins at 0x06A8 + the size of the frame offset table. Frames are not necessarily stored in playback sequence, and can sometimes share the same offset.

Frames are 256 x 192 pixels and consist of two image layers plus a "paper" background. The paper is either black or white, and layers can be red, blue, or the inverse of the paper color.

Each layer is a 1-bit monochrome bitmap with some basic compression done on a (horizontal) line-by-line basis to make the file more space-efficient.

Frame Header

Every frame begins with at least a one-byte header:

Data Details
(header >> 7) & 0x1 Frame type
(header >> 5) & 0x3 Frame translate flag
(header >> 3) & 0x3 Pen color for layer 2
(header >> 1) & 0x3 Pen color for layer 1
header & 0x1 Paper color

If the frame type is 0, frame diffing is used on this frame. If the frame translate flag is also nonzero, the header additionally contains two int8 values giving the x and y position of the previous frame relative to the current one. This is covered in more detail in the frame diffing section.
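The header fields could be unpacked as in the sketch below. The FrameHeader type and its field names are invented for illustration, and the translation bytes are only read under the condition described above (frame type 0 and a nonzero translate flag), with x assumed to come before y.

```python
import struct
from typing import NamedTuple


class FrameHeader(NamedTuple):
    """Hypothetical container for a decoded frame header."""
    is_diff: bool        # True when the frame type bit is 0
    translate_x: int
    translate_y: int
    layer_2_pen: int
    layer_1_pen: int
    paper: int
    size: int            # bytes consumed: 1, or 3 with translation


def parse_frame_header(data: bytes, offset: int = 0) -> FrameHeader:
    header = data[offset]
    frame_type = (header >> 7) & 0x1
    translate_flag = (header >> 5) & 0x3
    translate_x = translate_y = 0
    size = 1
    if frame_type == 0 and translate_flag != 0:
        # two signed bytes follow the header byte
        translate_x, translate_y = struct.unpack_from("<bb", data, offset + 1)
        size = 3
    return FrameHeader(
        frame_type == 0, translate_x, translate_y,
        (header >> 3) & 0x3, (header >> 1) & 0x3, header & 0x1, size,
    )
```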

Paper Colors

Color index Name Hex code
0 black #0e0e0e
1 white #ffffff

Layer Colors

Color index Name Hex code
0 inverse of paper (not used under normal circumstances) -
1 inverse of paper -
2 red #ff2a2a
3 blue #0a39ff

Line Compression Types Data

After the frame header comes a series of 2-bit values that represent the encoding method used for every line in each layer. The following pseudocode will unpack these line encoding values -- this should be done for both layers:

# array type should be uint8
line_encoding = Array(192)
line_index = 0

# unpack 48 bytes into 192 2-bit line encoding types
for byte_offset = 0; byte_offset < 48; byte_offset += 1:
  byte = file.read_uint8()
  # each line's encoding type is stored as a 2-bit value
  for bit_offset = 0; bit_offset < 8; bit_offset += 2:
    line_encoding[line_index] = (byte >> bit_offset) & 0x03
    line_index += 1
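The same unpacking step as a runnable Python sketch, operating on a bytes object instead of a file handle. unpack_line_encodings is a hypothetical name; call it once per layer.

```python
def unpack_line_encodings(data: bytes, offset: int = 0) -> list:
    """Unpack 48 bytes at `offset` into 192 2-bit line encoding types."""
    encodings = []
    for byte in data[offset:offset + 48]:
        # four 2-bit values per byte, lowest bits first
        for bit_offset in range(0, 8, 2):
            encodings.append((byte >> bit_offset) & 0x03)
    return encodings
```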

Line Compression

Following the layer encoding values (48 bytes per layer) is the compressed frame data itself. It begins with the top layer followed by the bottom layer.

Layers are compressed horizontally, line-by-line. The layer encoding values indicate the type of compression used for each line.

Once decompressed, a line is represented as a list of 256 pixels. A pixel's value will be 0 if it's transparent, or 1 if it uses the layer's pen color. The layer's pen color can be found from the frame's header.

Type 0

No data is stored for this line; it is empty and can be skipped.

Type 1

This line is compressed. Compression works by splitting each line into 8-pixel 'chunks' (32 in total) with bitflags to indicate whether a particular chunk is used or not. The line data begins with 32 bits for the chunk flags, followed by the chunk data. If a chunk flag is 1 then you read one byte from the chunk data, otherwise you can skip ahead 8 pixels and try the next chunk flag.

Pseudocode:

line = Array(256)
pixel = 0

# read chunk flags
# they're easier to work with if read as a single big-endian uint32
chunk_flags = file.read_uint32(bigendian=true)

while chunk_flags & 0xFFFFFFFF:

  # check the highest chunk flag is set 
  if chunk_flags & 0x80000000:
    chunk = file.read_uint8()
    # unpack each bit of the chunk
    for bit = 0; bit < 8; bit += 1:
      line[pixel] = chunk >> bit & 0x1
      pixel += 1

  else:
    # skip -- no data is stored for this chunk
    pixel += 8

  chunk_flags <<= 1

Type 2

The same as type 1, except the pixels in this line are first set to 1 before decoding.

Pseudocode:

line = Array(256)
pixel = 0

for i = 0; i < 256; i += 1:
  line[i] = 1

# ... continue reading the line the same way as line type 1

Type 3

Like line type 1 except every chunk is used, so there's no need for the chunk flags.

Pseudocode:

line = Array(256)
pixel = 0

while pixel < 256:
  chunk = file.read_uint8()
  # unpack each bit of the chunk
  for bit = 0; bit < 8; bit += 1:
    line[pixel] = chunk >> bit & 0x1
    pixel += 1
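The four line types can be handled by one function, as in this Python sketch. It treats type 3 as type 1 with every chunk flag set, which is equivalent to the description above; decode_line and its return convention (pixels plus the new read offset) are choices made for this example.

```python
import struct


def decode_line(data: bytes, offset: int, line_type: int):
    """Decode one line; returns (list of 256 pixels, offset after the line)."""
    if line_type == 0:
        return [0] * 256, offset            # empty line, nothing stored
    # type 2 starts from an all-ones line, types 1 and 3 from all zeros
    line = [1 if line_type == 2 else 0] * 256
    if line_type == 3:
        chunk_flags = 0xFFFFFFFF            # every chunk is present
    else:
        (chunk_flags,) = struct.unpack_from(">I", data, offset)
        offset += 4
    pixel = 0
    while chunk_flags:
        if chunk_flags & 0x80000000:
            chunk = data[offset]
            offset += 1
            for bit in range(8):
                line[pixel] = (chunk >> bit) & 0x1
                pixel += 1
        else:
            pixel += 8                      # chunk not stored, skip 8 pixels
        chunk_flags = (chunk_flags << 1) & 0xFFFFFFFF
    return line, offset
```

The returned offset can be fed into the next call so that lines are decoded back to back.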

Frame Diffing

If the frame type flag in the frame header is set to 0, then this frame only stores the difference since the previous frame. To produce a complete image, the current frame has to be merged with the previous frame by XORing each layer with the corresponding layer from the previous frame.

Pseudocode to do this:

# loop through lines
for y = 0; y < 192; y += 1:
  # skip to next line if this one falls off the top edge of the screen
  if y - translation_y < 0:
    continue
  # stop once the bottom screen edge has been reached
  if y - translation_y >= 192:
    break
  # loop through pixels
  for x = 0; x < 256; x += 1:
    # skip to the next pixel if this one falls off the left edge of the screen
    if x - translation_x < 0:
      continue
    #  stop diffing this line once the right screen edge has been reached
    if x - translation_x >= 256:
      break
    # merge pixels with a binary XOR
    # assumes each layer is a 2d array
    # translation_x and translation_y should be read from the frame header if the translation flag is set,
    # else they default to 0
    layer_1[y][x] ^= prev_layer_1[y - translation_y][x - translation_x]
    layer_2[y][x] ^= prev_layer_2[y - translation_y][x - translation_x]
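A runnable version of the merge for a single layer, assuming each layer is a list of 192 rows of 256 integers (0 or 1). merge_with_previous is a hypothetical name; it uses continue for all out-of-range cases, which gives the same result as the early break in the pseudocode.

```python
def merge_with_previous(layer, prev_layer, translation_x=0, translation_y=0):
    """XOR `prev_layer`, shifted by the translation, into `layer` in place."""
    for y in range(192):
        src_y = y - translation_y
        if not 0 <= src_y < 192:
            continue                # source row is off-screen
        for x in range(256):
            src_x = x - translation_x
            if not 0 <= src_x < 256:
                continue            # source pixel is off-screen
            layer[y][x] ^= prev_layer[src_y][src_x]
```

Call it once for layer 1 and once for layer 2, passing the same translation values.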

Sound Effect Flags

The sound effect flags begin at 0x6A0 + the animation data size, and are one byte per frame indicating if each sound effect is played on that frame.

Data Details
flags & 0x1 SE1 played
flags & 0x2 SE2 played
flags & 0x4 SE3 played

Sound Header

The sound header offset can be calculated as 0x6A0 + the animation data size + the number of frames, rounded up to the nearest multiple of 4.
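The offset calculation, including the rounding step, could be written as below. Here frame_count is the actual number of frames (the header value plus 1), and sound_header_offset is a name chosen for this sketch.

```python
def sound_header_offset(animation_data_size: int, frame_count: int) -> int:
    """Offset of the sound header, rounded up to a multiple of 4."""
    offset = 0x06A0 + animation_data_size + frame_count
    # adding 3 and clearing the low two bits rounds up to a multiple of 4
    return (offset + 3) & ~3
```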

Type Details
uint32 BGM track size
uint32 SE1 track size
uint32 SE2 track size
uint32 SE3 track size
uint8 Frame playback speed
uint8 Frame playback speed when recording bgm
char[14] Null padding

Frame speed values are stored reversed, so you must subtract them from 8 to get the real frame speed.

Playback Speeds

Value Frames per second
1 1 / 2
2 1 / 1
3 2 / 1
4 4 / 1
5 6 / 1
6 12 / 1
7 20 / 1
8 30 / 1
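Combining the subtraction with the table above, a lookup might be sketched as follows. The dictionary simply restates the table, and playback_fps is a hypothetical helper.

```python
# Real frame speed value -> frames per second, as listed in the table
FRAMES_PER_SECOND = {1: 0.5, 2: 1, 3: 2, 4: 4, 5: 6, 6: 12, 7: 20, 8: 30}


def playback_fps(stored_speed: int) -> float:
    """Convert the speed byte from the sound header to frames per second."""
    return FRAMES_PER_SECOND[8 - stored_speed]
```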

Sound Data

Sound tracks are stored in the order BGM, SE1, SE2, then SE3. Each track is mono IMA ADPCM audio sampled at 8192 Hz with the nibbles reversed. At 4 bits per sample, 1 second of audio is 4096 bytes long.

You can decode raw Flipnote audio using sox:

sox -t ima -N -r 8192 [input.adpcm] [output.wav]

Or encode it:

sox [input.wav] -t ima -N -r 8192 [output.adpcm]
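Since IMA ADPCM stores one 4-bit sample per nibble, a track's length in seconds can be estimated from its size in the sound header. A small sketch, with an illustrative function name:

```python
SAMPLE_RATE_HZ = 8192


def track_duration_seconds(track_size_bytes: int) -> float:
    """Estimate track duration: two 4-bit samples per byte at 8192 Hz."""
    return track_size_bytes * 2 / SAMPLE_RATE_HZ
```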

Signature

The last 144 bytes of a PPM consist of a 128-byte RSA-1024 SHA-1 signature over the rest of the file, followed by 16 bytes of null padding.
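The trailer can be separated from the signed portion of the file as in this sketch. It only slices the bytes and does not verify anything; split_signature is a hypothetical name.

```python
def split_signature(data: bytes):
    """Split a PPM file into (signed body, 128-byte signature, 16-byte padding)."""
    if len(data) < 144:
        raise ValueError("file too short to contain a signature")
    body = data[:-144]           # everything the signature covers
    signature = data[-144:-16]   # RSA-1024 signature, 128 bytes
    padding = data[-16:]         # expected to be all null bytes
    return body, signature, padding
```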