Commit

Revert naming change.
GeorgiosSmyrnis committed May 22, 2024
1 parent 490c842 commit a4758cb
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion open_lm/datapreprocess/ray/tokenize_shuffle.py
@@ -234,7 +234,7 @@ def _flush_buffer(self, folder, counter):
tokens = [int(x) for x in self.buffer[i]["tokens"]]
token_count += len(tokens)
json_string = json.dumps(tokens)
- uid = f"{tar_index_str}_{i:0{digits}}"
+ uid = hashlib.md5(json_string.encode()).hexdigest()
sample = {"__key__": uid, "json.gz": json_string}
sink.write(sample)
bio.seek(0)
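
For readers comparing the two naming schemes in this hunk, here is a minimal sketch of how each key is built. It is not the repository's code: the helper names `content_key` and `positional_key`, the sample buffer, and the values `"00000007"` and `4` are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def content_key(tokens):
    # Hypothetical helper: key derived from the serialized tokens,
    # mirroring the hashlib.md5 line in the hunk above.
    json_string = json.dumps([int(x) for x in tokens])
    return hashlib.md5(json_string.encode()).hexdigest()


def positional_key(tar_index_str, i, digits):
    # Hypothetical helper: key derived from the shard name and the
    # sample's position, mirroring the f-string line in the hunk above.
    return f"{tar_index_str}_{i:0{digits}}"


# Made-up buffer; the first and last samples hold identical tokens.
buffer = [{"tokens": [1, 2, 3]}, {"tokens": [4, 5, 6]}, {"tokens": [1, 2, 3]}]

content_keys = [content_key(sample["tokens"]) for sample in buffer]
positional_keys = [positional_key("00000007", i, 4) for i in range(len(buffer))]

print(content_keys)
print(positional_keys)
```

In this sketch the positional keys are distinct for every sample in a shard, while the hash-based keys depend only on the token content, so two samples with identical tokens receive the same key.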
