DiskBlockObjectWriter

Whenever DiskBlockObjectWriter is requested to write a key-value pair, it makes sure that the underlying output streams are open.

DiskBlockObjectWriter can be in the following states (that match the state of the underlying output streams):

  1. Initialized

  2. Open

  3. Closed

Table 1. DiskBlockObjectWriter’s Internal Registries and Counters
Name Description

initialized

Internal flag…​FIXME

Used when…​FIXME

hasBeenClosed

Internal flag…​FIXME

Used when…​FIXME

streamOpen

Internal flag…​FIXME

Used when…​FIXME

objOut

FIXME

Used when…​FIXME

mcs

FIXME

Used when…​FIXME

bs

FIXME

Used when…​FIXME

blockId

FIXME

Used when…​FIXME

Note
DiskBlockObjectWriter is a private[spark] class.

updateBytesWritten Method

Caution
FIXME

initialize Method

Caution
FIXME

Writing Bytes (From Byte Array Starting From Offset) — write Method

write(kvBytes: Array[Byte], offs: Int, len: Int): Unit

write…​FIXME

Caution
FIXME

recordWritten Method

Caution
FIXME

commitAndGet Method

commitAndGet(): FileSegment

Note
commitAndGet is used when…​FIXME

close Method

Caution
FIXME

Creating DiskBlockObjectWriter Instance

DiskBlockObjectWriter takes the following when created:

  1. file

  2. serializerManager — SerializerManager

  3. serializerInstance — SerializerInstance

  4. bufferSize

  5. syncWrites flag

  6. writeMetrics — ShuffleWriteMetrics

  7. blockId — BlockId

DiskBlockObjectWriter initializes the internal registries and counters.

Writing Key-Value Pair — write Method

write(key: Any, value: Any): Unit

Before writing, write opens the stream unless already open.

write then writes the key, followed by the value.

In the end, write calls recordWritten.

Note
write is used when BypassMergeSortShuffleWriter writes records and in ExternalAppendOnlyMap, ExternalSorter and WritablePartitionedPairCollection.
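
The write flow described above can be sketched in plain Scala, without Spark on the classpath. Everything in this sketch is a hypothetical stand-in chosen for illustration: the class name LazyKvWriterSketch, the in-memory sink, the DataOutputStream that plays the role of objOut, and the record counter behind recordWritten. Only the order of the steps (open if needed, key, value, recordWritten) follows the description.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

// Hypothetical stand-in for DiskBlockObjectWriter: writes to memory, not to a file.
class LazyKvWriterSketch {
  private val sink = new ByteArrayOutputStream()
  private var objOut: DataOutputStream = null // plays the role of objOut
  private var streamOpen = false
  private var records = 0L

  def isOpen: Boolean = streamOpen
  def recordsWritten: Long = records
  def bytes: Array[Byte] = sink.toByteArray

  // Opens the stream; write calls this only when the stream is not open yet.
  def open(): LazyKvWriterSketch = {
    objOut = new DataOutputStream(sink)
    streamOpen = true
    this
  }

  // Same order as described: open if needed, key, value, then recordWritten.
  def write(key: String, value: Int): Unit = {
    if (!streamOpen) open()
    objOut.writeUTF(key)
    objOut.writeInt(value)
    recordWritten()
  }

  // Assumption: this sketch only counts records; the real method is not covered here.
  private def recordWritten(): Unit = records += 1
}
```

A caller never has to call open explicitly in this sketch: the first write opens the stream, and later writes reuse it.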

Opening DiskBlockObjectWriter — open Method

open(): DiskBlockObjectWriter

open opens DiskBlockObjectWriter, i.e. initializes and resets the bs and objOut internal output streams.

Internally, open makes sure that DiskBlockObjectWriter is not closed (i.e. the hasBeenClosed flag is disabled). If it is closed, open throws an IllegalStateException:

Writer already closed. Cannot be reopened.

Unless DiskBlockObjectWriter has already been initialized (i.e. initialized flag is enabled), open initializes it (and turns initialized flag on).

Whether or not DiskBlockObjectWriter was already initialized, open requests SerializerManager to wrap the mcs output stream for encryption and compression (for blockId) and sets the wrapped stream as bs.

Note
open uses the SerializerManager that was specified when DiskBlockObjectWriter was created.

Note
open uses the SerializerInstance that was specified when DiskBlockObjectWriter was created.

In the end, open turns streamOpen flag on.

Note
open is used exclusively when DiskBlockObjectWriter writes a key-value pair or bytes from a specified byte array but the stream is not open yet.
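
The steps of open can be sketched in plain Scala as follows. This is a minimal illustration under stated assumptions, not Spark's code: OpenSketch, the wrap function (standing in for the SerializerManager wrapping step), the DataOutputStream (standing in for the stream that SerializerInstance would provide), the placeholder initialize and the simplified close are all hypothetical. The flag names and the exception message are taken from the description above.

```scala
import java.io.{BufferedOutputStream, ByteArrayOutputStream, DataOutputStream, OutputStream}

// Hypothetical stand-in for the open logic; no Spark classes are used.
class OpenSketch(wrap: OutputStream => OutputStream) {
  private val mcs: OutputStream = new ByteArrayOutputStream() // plays the role of mcs
  private var bs: OutputStream = null
  private var objOut: DataOutputStream = null
  private var initialized = false
  private var streamOpen = false
  private var hasBeenClosed = false
  private var initCount = 0

  def isOpen: Boolean = streamOpen
  def timesInitialized: Int = initCount

  // Placeholder: the real initialization is not described in this document.
  private def initialize(): Unit = initCount += 1

  def open(): OpenSketch = {
    if (hasBeenClosed) {
      throw new IllegalStateException("Writer already closed. Cannot be reopened.")
    }
    if (!initialized) {
      initialize()
      initialized = true
    }
    bs = wrap(mcs)                    // assumption: stands in for the SerializerManager step
    objOut = new DataOutputStream(bs) // assumption: stands in for the SerializerInstance step
    streamOpen = true
    this
  }

  // Simplified close, included only so the closed-writer guard can be exercised.
  def close(): Unit = {
    if (objOut != null) objOut.close()
    streamOpen = false
    hasBeenClosed = true
  }
}
```

In this sketch, calling open twice initializes only once but rebuilds bs and objOut each time, and calling open after close fails with the IllegalStateException quoted above.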