Mschaara/dfs dcache merge #15114

Merged
merged 26 commits into from
Sep 11, 2024
Commits
1b5943c
DAOS-16484 test: Exclude local host in default interface selection (#…
phender Aug 30, 2024
9662e98
DAOS-15800 client: create cart context on specific interface (#14804)
mchaarawi Aug 30, 2024
d2f062a
DAOS-16445 client: Add function to cycle OIDs non-sequentially (#14999)
jolivier23 Aug 30, 2024
407199f
DAOS-16251 dtx: Fix dtx_req_send user-after-free (#15035)
liw Sep 2, 2024
e6be2a6
DAOS-16304 tools: Add daos health net-test command (#14980)
mjmac Sep 2, 2024
46e6383
DAOS-16272 dfs: fix get_info returning incorrect oclass (#15048)
mchaarawi Sep 2, 2024
435e332
DAOS-15863 container: fix a race for container cache (#15038)
gnailzenh Sep 4, 2024
48487d1
DAOS-16471 test: Reduce targets for ioctl_pool_handles.py (#15063)
phender Sep 4, 2024
5cf4654
DAOS-16483 vos: handle empty DTX when vos_tx_end (#15053)
Nasf-Fan Sep 5, 2024
d778a95
DAOS-16271 mercury: Add patch to avoid seg fault in key resolve. (#15…
jgmoore-or Sep 5, 2024
e77265f
DAOS-16484 test: Support mixed speeds when selecting a default interf…
phender Sep 5, 2024
91de313
DAOS-16446 test: HDF5-VOL test - Set object class and container prope…
shimizukko Sep 5, 2024
c57eced
DAOS-16447 test: set D_IL_REPORT per test (#15012)
daltonbohning Sep 5, 2024
369e4f1
DAOS-16450 test: auto run dfs tests when dfs is modified (#15017)
daltonbohning Sep 5, 2024
a05d25d
DAOS-16510 cq: update pylint to 3.2.7 (#15072)
daltonbohning Sep 5, 2024
733fda6
DAOS-16509 test: replace IorTestBase.execute_cmd with run_remote (#15…
daltonbohning Sep 5, 2024
0e52fa5
DAOS-16458 object: fix invalid DRAM access in obj_bulk_transfer (#15026)
Nasf-Fan Sep 6, 2024
1353284
DAOS-16486 object: return proper error on stale pool map (#15064)
NiuYawei Sep 6, 2024
bb1b7c8
DAOS-16514 vos: fix coverity issue (#15083)
NiuYawei Sep 6, 2024
6a59b26
DAOS-16467 rebuild: add DAOS_POOL_RF ENV for massive failure case (#1…
liuxuezhao Sep 6, 2024
1101699
DAOS-16508 csum: retry a few times on checksum mismatch on update (#1…
johannlombardi Sep 9, 2024
b95ef01
DAOS-10877 vos: gang allocation for huge SV (#14790)
NiuYawei Sep 9, 2024
8e20e80
DAOS-16304 tools: Adjust default RPC size for net-test (#15091)
mjmac Sep 9, 2024
d3c8cb5
SRE-2408 ci: Increase timeout (to 15 minutes) for system restore (#14…
grom72 Sep 9, 2024
226e283
DAOS-16251 object: Fix obj_ec_singv_split overflow (#15045)
liw Sep 10, 2024
bf7af85
Merge branch 'master' into mschaara/dfs_dcache_merge
mchaarawi Sep 10, 2024
2 changes: 1 addition & 1 deletion ci/gha_functions.sh
Expand Up @@ -199,7 +199,7 @@ provision_cluster() {
while [ $((SECONDS-START)) -lt $wait_seconds ]; do
if clush -B -S -l root -w "$nodestring" '[ -d /var/chef/reports ]'; then
# shellcheck disable=SC2016
- clush -B -S -l root -w "$nodestring" --connect_timeout 30 --command_timeout 600 "if [ -e /root/job_info ]; then
+ clush -B -S -l root -w "$nodestring" --connect_timeout 30 --command_timeout 900 "if [ -e /root/job_info ]; then
cat /root/job_info
fi
echo \"Last provisioning run info:
Expand Down
1 change: 1 addition & 0 deletions docs/admin/env_variables.md
Expand Up @@ -53,6 +53,7 @@ Environment variables in this section only apply to the server side.
|DAOS\_DTX\_RPC\_HELPER\_THD|DTX RPC helper threshold. The valid range is [18, unlimited). The default value is 513.|
|DAOS\_DTX\_BATCHED\_ULT\_MAX|The max count of DTX batched commit ULTs. The valid range is [0, unlimited). 0 means to commit DTX synchronously. The default value is 32.|
|DAOS\_FORWARD\_NEIGHBOR|Set to enable I/O forwarding on neighbor xstream in the absence of helper threads.|
|DAOS\_POOL\_RF|Redundancy factor for the pool. The valid range is [1, 4]. The default value is 2.|

## Server and Client environment variables

Expand Down
24 changes: 24 additions & 0 deletions docs/admin/pool_operations.md
Expand Up @@ -916,6 +916,30 @@ and possibly repair a pmemobj file. As discussed in the previous section, the
rebuild status can be consulted via the pool query and will be expanded
with more information.

## Pool Redundancy Factor

If the DAOS system experiences cascading failures, where the number of failed
fault domains exceeds a pool's redundancy factor, there could be unrecoverable
errors and applications could suffer from data loss. This can happen in cases
of power or network outages and would cause node/engine failures. In most cases
those failures can be recovered and DAOS engines can be restarted and the system
can function again.

Administrators can set the default pool redundancy factor with the environment
variable "DAOS_POOL_RF" in the server yaml file. If SWIM detects and reports that
an engine is dead and the number of failed fault domains exceeds, or is about to
exceed, the pool redundancy factor, the pool map is not changed immediately.
Instead, a critical message is logged:
intolerable unavailability: engine rank x
In this case, the system administrator should check and try to recover the
failed engines and bring them back one by one with:
dmg system start --ranks=x
A reintegrate call is not needed.

For truly unrecoverable failures, the administrator can still exclude engines.
However, data loss is expected as the number of unrecoverable failures exceeds
the pool redundancy factor.

## Recovering Container Ownership

Typically users are expected to manage their containers. However, in the event
Expand Down
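The rule described in the new "Pool Redundancy Factor" section can be sketched as a single comparison. This is only an illustration of the documented behaviour, not the engine's actual code: `rf_would_be_exceeded` and its parameters are hypothetical names, and it assumes the count passed in covers fault domains that have already failed before the newly reported one.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not DAOS code.
 * failed_domains: fault domains already failed in the pool.
 * pool_rf:        the pool redundancy factor.
 * Returns true when accepting one more failure would exceed pool_rf,
 * i.e. the case where the docs say the pool map is left unchanged and
 * "intolerable unavailability" is logged instead.
 */
static bool
rf_would_be_exceeded(unsigned int failed_domains, unsigned int pool_rf)
{
	/* failed_domains + 1 > pool_rf, written without risking overflow. */
	return failed_domains >= pool_rf;
}
```

With a redundancy factor of 2, a second failure is still tolerated (`rf_would_be_exceeded(1, 2)` is false), while a third is not (`rf_would_be_exceeded(2, 2)` is true).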
6 changes: 1 addition & 5 deletions src/bio/bio_buffer.c
Expand Up @@ -34,11 +34,6 @@ dma_alloc_chunk(unsigned int cnt)

D_ASSERT(bytes > 0);

- if (DAOS_FAIL_CHECK(DAOS_NVME_ALLOCBUF_ERR)) {
- D_ERROR("Injected DMA buffer allocation error.\n");
- return NULL;
- }

D_ALLOC_PTR(chunk);
if (chunk == NULL) {
return NULL;
Expand Down Expand Up @@ -848,6 +843,7 @@ dma_map_one(struct bio_desc *biod, struct bio_iov *biov, void *arg)
bio_iov_set_raw_buf(biov, NULL);
return 0;
}
+ D_ASSERT(!BIO_ADDR_IS_GANG(&biov->bi_addr));

if (direct_scm_access(biod, biov)) {
struct umem_instance *umem = biod->bd_umem;
Expand Down
3 changes: 2 additions & 1 deletion src/bio/bio_bulk.c
@@ -1,5 +1,5 @@
/**
- * (C) Copyright 2021-2022 Intel Corporation.
+ * (C) Copyright 2021-2024 Intel Corporation.
*
* SPDX-License-Identifier: BSD-2-Clause-Patent
*/
Expand Down Expand Up @@ -640,6 +640,7 @@ bulk_map_one(struct bio_desc *biod, struct bio_iov *biov, void *data)
goto done;
}
D_ASSERT(!BIO_ADDR_IS_DEDUP(&biov->bi_addr));
+ D_ASSERT(!BIO_ADDR_IS_GANG(&biov->bi_addr));

hdl = bulk_get_hdl(biod, biov, roundup_pgs(pg_cnt), pg_off, arg);
if (hdl == NULL) {
Expand Down
3 changes: 1 addition & 2 deletions src/bio/bio_xstream.c
Expand Up @@ -30,7 +30,6 @@
/* SPDK blob parameters */
#define DAOS_BS_CLUSTER_SZ (1ULL << 25) /* 32MB */
/* DMA buffer parameters */
- #define DAOS_DMA_CHUNK_MB 8 /* 8MB DMA chunks */
#define DAOS_DMA_CHUNK_CNT_INIT 24 /* Per-xstream init chunks, 192MB */
#define DAOS_DMA_CHUNK_CNT_MAX 128 /* Per-xstream max chunks, 1GB */
#define DAOS_DMA_CHUNK_CNT_MIN 32 /* Per-xstream min chunks, 256MB */
Expand Down Expand Up @@ -207,7 +206,7 @@ bio_nvme_init(const char *nvme_conf, int numa_node, unsigned int mem_size,
{
char *env;
int rc, fd;
- unsigned int size_mb = DAOS_DMA_CHUNK_MB;
+ unsigned int size_mb = BIO_DMA_CHUNK_MB;

if (tgt_nr <= 0) {
D_ERROR("tgt_nr: %u should be > 0\n", tgt_nr);
Expand Down
2 changes: 1 addition & 1 deletion src/cart/README.env
Expand Up @@ -155,7 +155,7 @@ This file lists the environment variables used in CaRT.

. CRT_CTX_NUM
If set, specifies the limit of number of allowed CaRT contexts to be created.
- Valid range is [1, 64], with default being 64 if unset.
+ Valid range is [1, 128], with default being 128 if unset.

. D_FI_CONFIG
Specifies the fault injection configuration file. If this variable is not set
Expand Down
2 changes: 1 addition & 1 deletion src/cart/crt_internal_types.h
Expand Up @@ -14,7 +14,7 @@
#define CRT_CONTEXT_NULL (NULL)

#ifndef CRT_SRV_CONTEXT_NUM
- #define CRT_SRV_CONTEXT_NUM (64) /* Maximum number of contexts */
+ #define CRT_SRV_CONTEXT_NUM (128) /* Maximum number of contexts */
#endif


Expand Down
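The README change above documents `CRT_CTX_NUM` as accepting values in [1, 128] with 128 as the default when unset. A minimal sketch of reading such a value is shown below. The function name `ctx_limit_from_env` is hypothetical, and falling back to the default for malformed or out-of-range input is an assumption made for this example; the diff does not show how CaRT treats invalid values.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Mirrors the new maximum from this diff (CRT_SRV_CONTEXT_NUM = 128). */
#define CTX_NUM_MAX 128

/*
 * Hypothetical parser, not CaRT code. `value` is the raw string of a
 * CRT_CTX_NUM-style variable, or NULL when the variable is unset.
 */
static unsigned int
ctx_limit_from_env(const char *value)
{
	char *end;
	long  n;

	if (value == NULL || *value == '\0')
		return CTX_NUM_MAX; /* unset: documented default */

	errno = 0;
	n = strtol(value, &end, 10);
	/* Assumption: malformed or out-of-range input uses the default. */
	if (errno != 0 || *end != '\0' || n < 1 || n > CTX_NUM_MAX)
		return CTX_NUM_MAX;

	return (unsigned int)n;
}
```

A caller would typically pass the result of `getenv("CRT_CTX_NUM")` directly, since `getenv` returns NULL for an unset variable.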
4 changes: 2 additions & 2 deletions src/chk/chk_engine.c
Expand Up @@ -668,7 +668,7 @@ chk_engine_pool_mbs_one(struct chk_pool_rec *cpr, struct pool_map *map, struct c
int rc = 0;
bool unknown;

- dom = pool_map_find_node_by_rank(map, mbs->cpm_rank);
+ dom = pool_map_find_dom_by_rank(map, mbs->cpm_rank);
if (dom == NULL) {
D_ASSERT(mbs->cpm_rank != dss_self_rank());

Expand Down Expand Up @@ -777,7 +777,7 @@ chk_engine_find_dangling_pm(struct chk_pool_rec *cpr, struct pool_map *map)
int j;
bool down;

- rank_nr = pool_map_find_nodes(map, PO_COMP_ID_ALL, &doms);
+ rank_nr = pool_map_find_ranks(map, PO_COMP_ID_ALL, &doms);
if (rank_nr <= 0)
D_GOTO(out, rc = rank_nr);

Expand Down
2 changes: 1 addition & 1 deletion src/client/api/SConscript
Expand Up @@ -17,7 +17,7 @@ def scons():

if prereqs.client_requested():
libdaos = env.d_library('daos', libdaos_tgts, SHLIBVERSION=API_VERSION,
- LIBS=['daos_common'])
+ LIBS=['daos_common', 'numa'])
if hasattr(env, 'InstallVersionedLib'):
env.InstallVersionedLib('$PREFIX/lib64/', libdaos, SHLIBVERSION=API_VERSION)
else:
Expand Down
39 changes: 34 additions & 5 deletions src/client/api/event.c
Expand Up @@ -83,11 +83,24 @@ daos_eq_lib_init(crt_init_options_t *crt_info)
D_GOTO(unlock, rc);
}

- /* use a global shared context for all eq for now */
- rc = crt_context_create(&daos_eq_ctx);
+ if (d_dynamic_ctx_g) {
+ char iface[DAOS_SYS_INFO_STRING_MAX];
+
+ rc = dc_mgmt_get_iface(&iface[0]);
+ if (rc && rc != -DER_NONEXIST) {
+ D_ERROR("failed to get iface: " DF_RC "\n", DP_RC(rc));
+ D_GOTO(crt, rc);
+ }
+ /** if no interface returned, use the default */
+ if (rc == -DER_NONEXIST)
+ rc = crt_context_create(&daos_eq_ctx);
+ else
+ rc = crt_context_create_on_iface(iface, &daos_eq_ctx);
+ } else {
+ rc = crt_context_create(&daos_eq_ctx);
+ }
if (rc != 0) {
- D_ERROR("failed to create client context: "DF_RC"\n",
- DP_RC(rc));
+ D_ERROR("failed to create client context: " DF_RC "\n", DP_RC(rc));
D_GOTO(crt, rc);
}

Expand Down Expand Up @@ -656,7 +669,23 @@ daos_eq_create(daos_handle_t *eqh)

eqx = daos_eq2eqx(eq);

- rc = crt_context_create(&eqx->eqx_ctx);
+ if (d_dynamic_ctx_g) {
+ char iface[DAOS_SYS_INFO_STRING_MAX];
+
+ rc = dc_mgmt_get_iface(&iface[0]);
+ if (rc && rc != -DER_NONEXIST) {
+ D_ERROR("failed to get iface: " DF_RC "\n", DP_RC(rc));
+ return rc;
+ }
+
+ /** if no interface returned, use the default */
+ if (rc == -DER_NONEXIST)
+ rc = crt_context_create(&eqx->eqx_ctx);
+ else
+ rc = crt_context_create_on_iface(iface, &eqx->eqx_ctx);
+ } else {
+ rc = crt_context_create(&eqx->eqx_ctx);
+ }
if (rc) {
D_WARN("Failed to create CART context; using the global one, "DF_RC"\n", DP_RC(rc));
eqx->eqx_ctx = daos_eq_ctx;
Expand Down
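Both hunks in `event.c` follow the same three-way decision: use the default context when dynamic contexts are off, use the default when no interface is reported, and bind to the interface otherwise. The standalone sketch below models only that control flow. Everything in it is a stand-in: `lookup_iface_stub`, `pick_ctx`, the `ctx_kind` enum and the error values are hypothetical and do not call or reproduce any DAOS or CaRT API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DAOS error codes; the numeric values are arbitrary. */
#define ERR_NONEXIST (-2) /* plays the role of -DER_NONEXIST */
#define ERR_OTHER    (-1) /* any other lookup failure */

enum ctx_kind { CTX_ERROR, CTX_DEFAULT, CTX_ON_IFACE };

/*
 * Hypothetical stub for the interface lookup. NULL means "no interface
 * configured"; an empty string simulates an unrelated failure.
 */
static int
lookup_iface_stub(const char *configured, const char **iface)
{
	if (configured == NULL)
		return ERR_NONEXIST;
	if (configured[0] == '\0')
		return ERR_OTHER;
	*iface = configured;
	return 0;
}

/* Mirrors the branch structure added in the diff, without real contexts. */
static enum ctx_kind
pick_ctx(int dynamic_ctx, const char *configured)
{
	const char *iface = NULL;
	int         rc;

	if (!dynamic_ctx)
		return CTX_DEFAULT; /* feature off: default context */

	rc = lookup_iface_stub(configured, &iface);
	if (rc != 0 && rc != ERR_NONEXIST)
		return CTX_ERROR; /* real failure: propagate */

	/* No interface reported: fall back to the default context. */
	return (rc == ERR_NONEXIST) ? CTX_DEFAULT : CTX_ON_IFACE;
}
```

Since the same branch appears in both `daos_eq_lib_init()` and `daos_eq_create()`, a shared helper along these lines might be worth considering in the real code.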
15 changes: 9 additions & 6 deletions src/client/dfs/dfs_internal.h
Expand Up @@ -99,9 +99,6 @@
/** Max recursion depth for symlinks */
#define DFS_MAX_RECURSION 40

- /** MAX value for the HI OID */
- #define MAX_OID_HI ((1UL << 32) - 1)

/* Default power2(bits) size of dir-cache */
#define DCACHE_SIZE_BITS 16
/** Size of the hash key prefix */
Expand Down Expand Up @@ -196,6 +193,8 @@ struct dfs {
daos_handle_t coh;
/** refcount on cont handle that through the DFS API */
uint32_t coh_refcount;
+ /** The last oid.hi in the sequence */
+ uint32_t last_hi;
/** Transaction handle epoch. DAOS_EPOCH_MAX for DAOS_TX_NONE */
daos_epoch_t th_epoch;
/** Transaction handle */
Expand Down Expand Up @@ -377,20 +376,24 @@ oid_gen(dfs_t *dfs, daos_oclass_id_t oclass, bool file, daos_obj_id_t *oid)

D_MUTEX_LOCK(&dfs->lock);
/** If we ran out of local OIDs, alloc one from the container */
- if (dfs->oid.hi >= MAX_OID_HI) {
+ if (dfs->oid.hi == dfs->last_hi) {
/** Allocate an OID for the namespace */
rc = daos_cont_alloc_oids(dfs->coh, 1, &dfs->oid.lo, NULL);
if (rc) {
D_ERROR("daos_cont_alloc_oids() Failed (%d)\n", rc);
D_MUTEX_UNLOCK(&dfs->lock);
return daos_der2errno(rc);
}
- dfs->oid.hi = 0;
+ /** Start such that dfs->last_hi will be final value */
+ dfs->oid.hi = dfs->last_hi;
}

/** set oid and lo, bump the current hi value */
oid->lo = dfs->oid.lo;
- oid->hi = dfs->oid.hi++;
+ daos_obj_oid_cycle(&dfs->oid);
+ if (unlikely(dfs->oid.lo == RESERVED_LO && dfs->oid.hi <= 1))
+ daos_obj_oid_cycle(&dfs->oid); /* Avoid reserved oids */
+ oid->hi = dfs->oid.hi;
D_MUTEX_UNLOCK(&dfs->lock);

/** if a regular file, use UINT64 typed dkeys for the array object */
Expand Down
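The `oid_gen()` change relies on `daos_obj_oid_cycle()` walking through `oid.hi` values non-sequentially and eventually returning to `last_hi`, at which point a new `oid.lo` is allocated. The diff does not show how the cycle is implemented, so the sketch below illustrates one way such a cycle can work: adding a fixed odd step modulo a power of two visits every value exactly once before repeating. The step constant, the `bits` parameter and the function names are assumptions chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed step, not taken from DAOS: any odd value is coprime with 2^bits. */
#define CYCLE_STEP 999999937u

/* Mask selecting the low `bits` bits (1 <= bits <= 32). */
static uint32_t
cycle_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (bits == 32) ? UINT32_MAX : ((1u << bits) - 1u);
}

/* Next value in the sequence within a 2^bits space. */
static uint32_t
cycle_next(uint32_t hi, unsigned int bits)
{
	return (hi + CYCLE_STEP) & cycle_mask(bits);
}

/* Number of steps until the sequence returns to `start`. */
static uint64_t
cycle_period(uint32_t start, unsigned int bits)
{
	uint32_t first = start & cycle_mask(bits);
	uint32_t hi    = first;
	uint64_t steps = 0;

	do {
		hi = cycle_next(hi, bits);
		steps++;
	} while (hi != first);

	return steps;
}
```

Checking the period on a reduced 16-bit space keeps the loop short; the same argument applies to the full 32-bit `oid.hi`, where a random starting point such as `last_hi` still yields a full-length cycle.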
18 changes: 9 additions & 9 deletions src/client/dfs/mnt.c
Expand Up @@ -685,20 +685,20 @@ dfs_mount(daos_handle_t poh, daos_handle_t coh, int flags, dfs_t **_dfs)

/** if RW, allocate an OID for the namespace */
if (amode == O_RDWR) {
+ dfs->last_hi = (unsigned int)d_rand();
+ /** Avoid potential conflict with SB or ROOT */
+ if (dfs->last_hi <= 1)
+ dfs->last_hi = 2;
+
rc = daos_cont_alloc_oids(coh, 1, &dfs->oid.lo, NULL);
if (rc) {
D_ERROR("daos_cont_alloc_oids() Failed, " DF_RC "\n", DP_RC(rc));
D_GOTO(err_root, rc = daos_der2errno(rc));
}

- /*
- * if this is the first time we allocate on this container,
- * account 0 for SB, 1 for root obj.
- */
- if (dfs->oid.lo == RESERVED_LO)
- dfs->oid.hi = ROOT_HI + 1;
- else
- dfs->oid.hi = 0;
+ dfs->oid.hi = dfs->last_hi;
+ /** Increment so that dfs->last_hi is the last value */
+ daos_obj_oid_cycle(&dfs->oid);
}

dfs->mounted = DFS_MOUNT;
Expand Down Expand Up @@ -1045,7 +1045,7 @@ dfs_global2local(daos_handle_t poh, daos_handle_t coh, int flags, d_iov_t glob,

/** allocate a new oid on the next file or dir creation */
dfs->oid.lo = 0;
- dfs->oid.hi = MAX_OID_HI;
+ dfs->oid.hi = dfs->last_hi;

rc = D_MUTEX_INIT(&dfs->lock, NULL);
if (rc != 0) {
Expand Down
5 changes: 2 additions & 3 deletions src/client/dfs/obj.c
Expand Up @@ -86,15 +86,14 @@ dfs_obj_get_info(dfs_t *dfs, dfs_obj_t *obj, dfs_obj_info_t *info)
if (dfs->attr.da_dir_oclass_id)
info->doi_dir_oclass_id = dfs->attr.da_dir_oclass_id;
else
- rc = daos_obj_get_oclass(dfs->coh, 0, 0, 0,
+ rc = daos_obj_get_oclass(dfs->coh, DAOS_OT_MULTI_HASHED, 0, 0,
&info->doi_dir_oclass_id);

if (dfs->attr.da_file_oclass_id)
info->doi_file_oclass_id = dfs->attr.da_file_oclass_id;
else
- rc = daos_obj_get_oclass(dfs->coh, 0, 0, 0,
+ rc = daos_obj_get_oclass(dfs->coh, DAOS_OT_ARRAY_BYTE, 0, 0,
&info->doi_file_oclass_id);

if (rc) {
D_ERROR("daos_obj_get_oclass() failed " DF_RC "\n", DP_RC(rc));
return daos_der2errno(rc);
Expand Down
4 changes: 4 additions & 0 deletions src/common/checksum.c
Expand Up @@ -278,6 +278,10 @@ daos_csummer_compare_csum_info(struct daos_csummer *obj,
match = daos_csummer_csum_compare(obj, ci_idx2csum(a, i),
ci_idx2csum(b, i),
a->cs_len);
+ if (unlikely(!match))
+ D_ERROR("Checksum mismatch at index %d/%d "DF_CI_BUF" != "DF_CI_BUF"\n", i,
+ a->cs_nr, DP_CI_BUF(ci_idx2csum(a, i), a->cs_len),
+ DP_CI_BUF(ci_idx2csum(b, i), b->cs_len));
}

return match;
Expand Down
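The `checksum.c` hunk adds a log line that reports which checksum in the array failed to match. A standalone sketch of that idea is shown below: compare two packed arrays of fixed-length checksums and report the first index that differs. The function `first_csum_mismatch` and its flat-buffer layout are hypothetical simplifications for this example, not the `daos_csummer` data structures, and stopping at the first mismatch is an assumption about the surrounding loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not DAOS code.
 * a, b: buffers holding `nr` checksums of `len` bytes each, back to back.
 * Returns the index of the first mismatching checksum, or -1 if all match.
 */
static int
first_csum_mismatch(const uint8_t *a, const uint8_t *b, size_t nr, size_t len)
{
	assert(a != NULL && b != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (memcmp(a + i * len, b + i * len, len) != 0) {
			/* Report the position, as the new D_ERROR does. */
			fprintf(stderr, "Checksum mismatch at index %zu/%zu\n", i, nr);
			return (int)i;
		}
	}
	return -1;
}
```

Logging the index alongside the total count makes it easier to tell whether a mismatch affects the start, middle or tail of a multi-chunk transfer.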