No longer copy voxels before scheduling threaded tasks. VoxelBuffers now have a lock.

master
Marc Gilleron 2020-08-29 22:09:54 +01:00
parent 92b10d9175
commit d4d2b6fd9e
14 changed files with 320 additions and 96 deletions

View File

@ -165,4 +165,38 @@ const Vector3i g_moore_neighboring_3d[MOORE_NEIGHBORING_3D_COUNT] = {
Vector3i(1, 1, 1),
};
// Order is IMPORTANT:
// This is used in multithread context, in which we may iterate blocks in XYZ order, to avoid deadlocks.
const Vector3i g_ordered_moore_area_3d[MOORE_AREA_3D_COUNT] = {
Vector3i(-1, -1, -1),
Vector3i(0, -1, -1),
Vector3i(1, -1, -1),
Vector3i(-1, 0, -1),
Vector3i(0, 0, -1),
Vector3i(1, 0, -1),
Vector3i(-1, 1, -1),
Vector3i(0, 1, -1),
Vector3i(1, 1, -1),
Vector3i(-1, -1, 0),
Vector3i(0, -1, 0),
Vector3i(1, -1, 0),
Vector3i(-1, 0, 0),
Vector3i(0, 0, 0),
Vector3i(1, 0, 0),
Vector3i(-1, 1, 0),
Vector3i(0, 1, 0),
Vector3i(1, 1, 0),
Vector3i(-1, -1, 1),
Vector3i(0, -1, 1),
Vector3i(1, -1, 1),
Vector3i(-1, 0, 1),
Vector3i(0, 0, 1),
Vector3i(1, 0, 1),
Vector3i(-1, 1, 1),
Vector3i(0, 1, 1),
Vector3i(1, 1, 1)
};
} // namespace Cube

View File

@ -81,6 +81,10 @@ extern const unsigned int g_edge_corners[EDGE_COUNT][2];
const unsigned int MOORE_NEIGHBORING_3D_COUNT = 26;
extern const Vector3i g_moore_neighboring_3d[MOORE_NEIGHBORING_3D_COUNT];
const unsigned int MOORE_AREA_3D_COUNT = 27;
const unsigned int MOORE_AREA_3D_CENTRAL_INDEX = 13;
extern const Vector3i g_ordered_moore_area_3d[MOORE_AREA_3D_COUNT];
} // namespace Cube
#endif // CUBE_TABLES_H

View File

@ -0,0 +1,41 @@
Access to voxels and multithreading
======================================
This section explains in more detail how multithreading is implemented around voxel storage, and what the implications are when you access and modify voxels.
The problem
-------------
Up to version `godot3.2.3` of the module, reading and writing voxels did not involve any multithreading concerns. They could be accessed without locking, because every threaded operation using them (saving and meshing) was given a copy of the voxels, made on the main thread.

This kept things simple, but it caused several issues:

- If threads cannot consume tasks as fast as they are issued, copies of voxel data keep accumulating rapidly and can make the game run out of memory.
- Copying blocks and their neighbors takes time, and is potentially wasteful because the copy is not guaranteed to be used.
- It assumes a threaded task only needs to access one specific block at a fixed LOD, which is not always the case in other voxel engines (such as the UE4 Voxel Plugin by Phyronnaz). For example, Transvoxel running on a big block may want to access higher-resolution blocks to better approximate the isosurface, which is not possible with this approach.
Internal changes
-----------------
The old design starts to change in the next version. Copies are no longer made preemptively on the main thread; they are made inside the threaded task itself. This means accessing voxels now requires locking the data during each transaction, to make sure every thread sees consistent data.

Locking is required **if you access voxels which are part of a multithreaded volume**, like a terrain present in the scene tree. You don't need to lock if you know the data is not used by any other thread, for example inside generators, custom streams, known copies, or other storage not owned by an active component of the voxel engine.

The locking strategy is implemented on each `VoxelBuffer`, using `RWLock`. Such locks are read-write locks, also known as shared mutexes. As described earlier, locking is optional, so *none of VoxelBuffer's methods actually use that lock*; it's up to you. If you only need to read voxels, lock for *read*. If you also need to modify voxels, lock for *write*. Multiple threads can then read the same block at once, but only one can modify it at a time. If a thread wants to modify the block while it is locked for *read*, that thread will block until all other threads have finished reading it. This can cause stutter if done on the main thread, so if it becomes a problem, a possible solution is to lock for *read*, copy the block, and then modify the copy (Copy-on-Write).

At the time of writing, no threaded task needs write access to voxels.

More changes may happen in the future, in particular with nodes supporting LOD.
Editing voxels efficiently
---------------------------
This matters for scripters.

If you use `VoxelTool`, all locking is handled for you automatically. However, you must be aware that it doesn't come for free: if you read and modify voxels at random positions, you will get close to the worst-case overhead. If you work on a well-defined region and know in advance where you will read and where you will write, then optimizing becomes possible.

For example, *on a terrain node*, `VoxelTool.get_voxel` and `set_voxel` are the simplest, yet slowest, ways to access voxels. This is not only because of locking, but also because the engine has to traverse several data structures to reach each voxel. They are perfectly fine for small isolated edits, like a player digging or building piece by piece.

If you want to excavate whole chunks or generate structures, use specialized bulk functions instead, such as `do_sphere()`, `do_box()`, `raycast` or `paste()`. These are more efficient because they can cache data structures along the way and perform locking in the best way they can.

If your changes depend on a lot of pre-existing voxels, you can use `copy()` to extract a chunk of voxels into a `VoxelBuffer`, so you can read them very fast without locking. You can even apply your changes to that same buffer, and finally use `paste()` when you're done.

View File

@ -21,6 +21,7 @@ bool VoxelToolTerrain::is_area_editable(const Rect3i &box) const {
Ref<VoxelRaycastResult> VoxelToolTerrain::raycast(Vector3 pos, Vector3 dir, float max_distance, uint32_t collision_mask) {
// TODO Transform input if the terrain is rotated (in the future it can be made a Spatial node)
// TODO Implement broad-phase on blocks to minimize locking and increase performance
struct RaycastPredicate {
const VoxelTerrain &terrain;
@ -107,6 +108,7 @@ void VoxelToolTerrain::set_voxel_metadata(Vector3i pos, Variant meta) {
ERR_FAIL_COND(_terrain == nullptr);
VoxelBlock *block = _map->get_block(_map->voxel_to_block(pos));
ERR_FAIL_COND_MSG(block == nullptr, "Area not editable");
RWLockWrite lock(block->voxels->get_lock());
block->voxels->set_voxel_metadata(_map->to_local(pos), meta);
}
@ -114,6 +116,7 @@ Variant VoxelToolTerrain::get_voxel_metadata(Vector3i pos) {
ERR_FAIL_COND_V(_terrain == nullptr, Variant());
const VoxelBlock *block = _map->get_block(_map->voxel_to_block(pos));
ERR_FAIL_COND_V_MSG(block == nullptr, Variant(), "Area not editable");
RWLockRead lock(block->voxels->get_lock());
return block->voxels->get_voxel_metadata(_map->to_local(pos));
}
@ -143,42 +146,64 @@ void VoxelToolTerrain::run_blocky_random_tick(AABB voxel_area, int voxel_count,
const int bs_mask = _map->get_block_size_mask();
const VoxelBuffer::ChannelId channel = VoxelBuffer::CHANNEL_TYPE;
struct Pick {
uint64_t value;
Vector3i rpos;
};
std::vector<Pick> picks;
picks.resize(batch_count);
// Choose blocks at random
for (int bi = 0; bi < block_count; ++bi) {
const Vector3i block_pos = min_block_pos + Vector3i(
Math::rand() % block_area_size.x,
Math::rand() % block_area_size.y,
Math::rand() % block_area_size.z);
const Vector3i block_origin = _map->block_to_voxel(block_pos);
const VoxelBlock *block = _map->get_block(block_pos);
if (block != nullptr) {
if (block->voxels->get_channel_compression(channel) == VoxelBuffer::COMPRESSION_UNIFORM) {
const uint64_t v = block->voxels->get_voxel(0, 0, 0, channel);
if (lib.has_voxel(v)) {
const Voxel &vt = lib.get_voxel_const(v);
if (!vt.is_random_tickable()) {
// Skip whole block
continue;
// Doing ONLY reads here.
{
RWLockRead lock(block->voxels->get_lock());
if (block->voxels->get_channel_compression(channel) == VoxelBuffer::COMPRESSION_UNIFORM) {
const uint64_t v = block->voxels->get_voxel(0, 0, 0, channel);
if (lib.has_voxel(v)) {
const Voxel &vt = lib.get_voxel_const(v);
if (!vt.is_random_tickable()) {
// Skip whole block
continue;
}
}
}
// Choose a bunch of voxels at random within the block.
// Batching this way improves performance a little by reducing block lookups.
for (int vi = 0; vi < batch_count; ++vi) {
const Vector3i rpos(
Math::rand() & bs_mask,
Math::rand() & bs_mask,
Math::rand() & bs_mask);
const uint64_t v = block->voxels->get_voxel(rpos, channel);
picks[vi] = Pick{ v, rpos };
}
}
// Choose a bunch of voxels at random within the block.
// Batching this way improves performance a little by reducing block lookups.
for (int vi = 0; vi < batch_count; ++vi) {
const Vector3i rpos(
Math::rand() & bs_mask,
Math::rand() & bs_mask,
Math::rand() & bs_mask);
const uint64_t v = block->voxels->get_voxel(rpos, channel);
// The following may or may not read AND write voxels randomly due to its exposition to scripts.
// However, we don't send the buffer directly, so it will go through an API taking care of locking.
// So we don't (and shouldn't) lock anything here.
for (size_t i = 0; i < picks.size(); ++i) {
const Pick pick = picks[i];
if (lib.has_voxel(v)) {
const Voxel &vt = lib.get_voxel_const(v);
if (lib.has_voxel(pick.value)) {
const Voxel &vt = lib.get_voxel_const(pick.value);
if (vt.is_random_tickable()) {
const Variant vpos = (rpos + block_origin).to_vec3();
const Variant vv = v;
const Variant vpos = (pick.rpos + block_origin).to_vec3();
const Variant vv = pick.value;
const Variant *args[2];
args[0] = &vpos;
args[1] = &vv;

View File

@ -16,6 +16,10 @@ class Voxel : public Resource {
GDCLASS(Voxel, Resource)
public:
// Convention to mean "nothing".
// Don't assign a non-empty model at this index.
static const uint16_t AIR_ID = 0;
// Plain data strictly used by the mesher.
// It becomes distinct because it's going to be used in a multithread environment,
// while the configuration that produced the data can be changed by the user at any time.

View File

@ -82,6 +82,9 @@ void register_voxel_types() {
// Reminder: how to create a singleton accessible from scripts:
// Engine::get_singleton()->add_singleton(Engine::Singleton("SingletonName",singleton_instance));
PRINT_VERBOSE(String("Size of VoxelBuffer: {0}").format(varray((int)sizeof(VoxelBuffer))));
PRINT_VERBOSE(String("Size of VoxelBlock: {0}").format(varray((int)sizeof(VoxelBlock))));
#ifdef TOOLS_ENABLED
VoxelDebug::create_debug_box_mesh();
@ -91,6 +94,10 @@ void register_voxel_types() {
}
void unregister_voxel_types() {
// TODO At this point, the GDScript module has nullified GDScriptLanguage::singleton!!
// That means it's impossible to free scripts still referenced by VoxelServer. And that can happen, because
// users can write custom generators, which run inside threads, and these threads are hosted in the server...
VoxelStringNames::destroy_singleton();
VoxelGraphNodeDB::destroy_singleton();
VoxelServer::destroy_singleton();

View File

@ -71,6 +71,9 @@ VoxelServer::VoxelServer() {
// Init world
// TODO How to make this use memnew and memdelete?
_world.viewers_for_priority = gd_make_shared<VoxelViewersArray>();
PRINT_VERBOSE(String("Size of BlockDataRequest: {0}").format(varray((int)sizeof(BlockDataRequest))));
PRINT_VERBOSE(String("Size of BlockMeshRequest: {0}").format(varray((int)sizeof(BlockMeshRequest))));
}
VoxelServer::~VoxelServer() {
@ -157,7 +160,7 @@ void VoxelServer::invalidate_volume_mesh_requests(uint32_t volume_id) {
volume.meshing_dependency->library = volume.voxel_library;
}
void VoxelServer::request_block_mesh(uint32_t volume_id, Ref<VoxelBuffer> voxels, Vector3i block_pos, int lod) {
void VoxelServer::request_block_mesh(uint32_t volume_id, BlockMeshInput &input) {
const Volume &volume = _world.volumes.get(volume_id);
ERR_FAIL_COND(volume.stream.is_null());
CRASH_COND(volume.stream_dependency == nullptr);
@ -167,25 +170,24 @@ void VoxelServer::request_block_mesh(uint32_t volume_id, Ref<VoxelBuffer> voxels
// It was previously done by remembering the request with a hashmap by position.
// But later we may want to solve it by not pre-emptively copying voxels, only do it on meshing using RWLock
BlockMeshRequest r;
r.voxels = voxels;
r.volume_id = volume_id;
r.position = block_pos;
r.lod = lod;
BlockMeshRequest *r = memnew(BlockMeshRequest);
r->volume_id = volume_id;
r->blocks = input.blocks;
r->position = input.position;
r->lod = input.lod;
r.smooth_enabled = volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_SDF);
r.blocky_enabled = volume.voxel_library.is_valid() &&
volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_TYPE);
r->smooth_enabled = volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_SDF);
r->blocky_enabled = volume.voxel_library.is_valid() &&
volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_TYPE);
r.meshing_dependency = volume.meshing_dependency;
r->meshing_dependency = volume.meshing_dependency;
const Vector3i voxel_pos = (block_pos << lod) * volume.block_size;
r.priority_dependency.world_position = volume.transform.xform(voxel_pos.to_vec3());
r.priority_dependency.viewers = _world.viewers_for_priority;
const Vector3i voxel_pos = (input.position << input.lod) * volume.block_size;
r->priority_dependency.world_position = volume.transform.xform(voxel_pos.to_vec3());
r->priority_dependency.viewers = _world.viewers_for_priority;
// We'll allocate this quite often. If it becomes a problem, it should be easy to pool.
BlockMeshRequest *rp = memnew(BlockMeshRequest(r));
_meshing_thread_pool.enqueue(rp);
_meshing_thread_pool.enqueue(r);
}
void VoxelServer::request_block_load(uint32_t volume_id, Vector3i block_pos, int lod) {
@ -356,13 +358,13 @@ void VoxelServer::process() {
}
void VoxelServer::get_min_max_block_padding(
uint32_t volume_id, unsigned int &out_min_padding, unsigned int &out_max_padding) const {
bool blocky_enabled, bool smooth_enabled, unsigned int &out_min_padding, unsigned int &out_max_padding) const {
const Volume &volume = _world.volumes.get(volume_id);
// const Volume &volume = _world.volumes.get(volume_id);
bool smooth_enabled = volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_SDF);
bool blocky_enabled = volume.voxel_library.is_valid() &&
volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_TYPE);
// bool smooth_enabled = volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_SDF);
// bool blocky_enabled = volume.voxel_library.is_valid() &&
// volume.stream->get_used_channels_mask() & (1 << VoxelBuffer::CHANNEL_TYPE);
out_min_padding = 0;
out_max_padding = 0;
@ -422,10 +424,15 @@ void VoxelServer::BlockDataRequest::run(VoxelTaskContext ctx) {
stream->emerge_block(voxels, position * block_size, lod);
break;
case TYPE_SAVE:
stream->immerge_block(voxels, position * block_size, lod);
case TYPE_SAVE: {
Ref<VoxelBuffer> voxels_copy;
{
RWLockRead lock(voxels->get_lock());
voxels_copy = voxels->duplicate(true);
}
voxels.unref();
break;
stream->immerge_block(voxels_copy, position * block_size, lod);
} break;
default:
CRASH_NOW_MSG("Invalid type");
@ -444,10 +451,57 @@ bool VoxelServer::BlockDataRequest::is_cancelled() {
//----------------------------------------------------------------------------------------------------------------------
static void copy_block_and_neighbors(const FixedArray<Ref<VoxelBuffer>, Cube::MOORE_AREA_3D_COUNT> &moore_blocks,
VoxelBuffer &dst, int min_padding, int max_padding) {
FixedArray<unsigned int, 2> channels;
channels[0] = VoxelBuffer::CHANNEL_TYPE;
channels[1] = VoxelBuffer::CHANNEL_SDF;
Ref<VoxelBuffer> central_buffer = moore_blocks[Cube::MOORE_AREA_3D_CENTRAL_INDEX];
const int block_size = central_buffer->get_size().x;
const unsigned int padded_block_size = block_size + min_padding + max_padding;
dst.create(padded_block_size, padded_block_size, padded_block_size);
for (unsigned int ci = 0; ci < channels.size(); ++ci) {
dst.set_channel_depth(ci, central_buffer->get_channel_depth(ci));
}
const Vector3i min_pos = -Vector3i(min_padding);
const Vector3i max_pos = Vector3i(block_size + max_padding);
for (unsigned int i = 0; i < Cube::MOORE_AREA_3D_COUNT; ++i) {
const Vector3i offset = block_size * Cube::g_ordered_moore_area_3d[i];
Ref<VoxelBuffer> src = moore_blocks[i];
CRASH_COND(src.is_null());
const Vector3i src_min = min_pos - offset;
const Vector3i src_max = max_pos - offset;
const Vector3i dst_min = offset - min_pos;
{
RWLockRead read(src->get_lock());
for (unsigned int ci = 0; ci < channels.size(); ++ci) {
dst.copy_from(**src, src_min, src_max, dst_min, ci);
}
}
}
}
void VoxelServer::BlockMeshRequest::run(VoxelTaskContext ctx) {
CRASH_COND(voxels.is_null());
CRASH_COND(meshing_dependency == nullptr);
unsigned int min_padding;
unsigned int max_padding;
VoxelServer::get_singleton()->get_min_max_block_padding(blocky_enabled, smooth_enabled, min_padding, max_padding);
// TODO Cache?
Ref<VoxelBuffer> voxels;
voxels.instance();
copy_block_and_neighbors(blocks, **voxels, min_padding, max_padding);
VoxelMesher::Input input = { **voxels, lod };
if (blocky_enabled) {

View File

@ -40,6 +40,13 @@ public:
bool dropped;
};
struct BlockMeshInput {
// Moore area ordered by forward XYZ iteration
FixedArray<Ref<VoxelBuffer>, Cube::MOORE_AREA_3D_COUNT> blocks;
Vector3i position;
uint8_t lod = 0;
};
struct ReceptionBuffers {
std::vector<BlockMeshOutput> mesh_output;
std::vector<BlockDataOutput> data_output;
@ -58,7 +65,7 @@ public:
void set_volume_stream(uint32_t volume_id, Ref<VoxelStream> stream);
void set_volume_voxel_library(uint32_t volume_id, Ref<VoxelLibrary> library);
void invalidate_volume_mesh_requests(uint32_t volume_id);
void request_block_mesh(uint32_t volume_id, Ref<VoxelBuffer> voxels, Vector3i block_pos, int lod);
void request_block_mesh(uint32_t volume_id, BlockMeshInput &input);
void request_block_load(uint32_t volume_id, Vector3i block_pos, int lod);
void request_block_save(uint32_t volume_id, Ref<VoxelBuffer> voxels, Vector3i block_pos, int lod);
void remove_volume(uint32_t volume_id);
@ -70,7 +77,8 @@ public:
// Gets by how much voxels must be padded with neighbors in order to be polygonized properly
void get_min_max_block_padding(
uint32_t volume_id, unsigned int &out_min_padding, unsigned int &out_max_padding) const;
bool blocky_enabled, bool smooth_enabled,
unsigned int &out_min_padding, unsigned int &out_max_padding) const;
void process();
@ -169,8 +177,7 @@ private:
int get_priority() override;
bool is_cancelled() override;
// TODO Reference blocks instead of doing preemptive copy
Ref<VoxelBuffer> voxels;
FixedArray<Ref<VoxelBuffer>, Cube::MOORE_AREA_3D_COUNT> blocks;
Vector3i position;
uint32_t volume_id;
uint8_t lod;

View File

@ -1280,7 +1280,11 @@ void VoxelLodTerrain::flush_pending_lod_edits() {
// Update lower LOD
// This must always be done after an edit before it gets saved, otherwise LODs won't match and it will look ugly.
// TODO Try to narrow to edited region instead of taking whole block
src_block->voxels->downscale_to(**dst_block->voxels, Vector3i(), src_block->voxels->get_size(), rel * half_bs);
{
RWLockWrite lock(src_block->voxels->get_lock());
src_block->voxels->downscale_to(
**dst_block->voxels, Vector3i(), src_block->voxels->get_size(), rel * half_bs);
}
}
src_lod.blocks_pending_lodding.clear();
@ -1305,10 +1309,18 @@ struct ScheduleSaveAction {
block->set_shader_material(Ref<ShaderMaterial>());
}
// TODO Don't ask for save if the stream doesn't support it!
if (block->is_modified()) {
//print_line(String("Scheduling save for block {0}").format(varray(block->position.to_vec3())));
VoxelDataLoader::InputBlock b;
b.data.voxels_to_save = with_copy ? block->voxels->duplicate() : block->voxels;
if (with_copy) {
RWLockRead lock(block->voxels->get_lock());
b.data.voxels_to_save = block->voxels->duplicate(true);
} else {
b.data.voxels_to_save = block->voxels;
}
b.position = block->position;
b.can_be_discarded = false;
b.lod = block->lod_index;
@ -1570,7 +1582,6 @@ Dictionary VoxelLodTerrain::debug_get_block_info(Vector3 fbpos, int lod_index) c
const VoxelBlock *block = lod.map->get_block(bpos);
if (block) {
meshed = !block->has_mesh() && block->get_mesh_state() != VoxelBlock::MESH_UP_TO_DATE;
visible = block->is_visible();
loading_state = 2;

View File

@ -50,6 +50,7 @@ int VoxelMap::get_voxel(Vector3i pos, unsigned int c) const {
if (block == nullptr) {
return _default_voxel[c];
}
RWLockRead lock(block->voxels->get_lock());
return block->voxels->get_voxel(to_local(pos), c);
}
@ -73,6 +74,8 @@ VoxelBlock *VoxelMap::get_or_create_block_at_voxel_pos(Vector3i pos) {
void VoxelMap::set_voxel(int value, Vector3i pos, unsigned int c) {
VoxelBlock *block = get_or_create_block_at_voxel_pos(pos);
// TODO If it turns out to be a problem, use CoW
RWLockWrite lock(block->voxels->get_lock());
block->voxels->set_voxel(value, to_local(pos), c);
}
@ -83,12 +86,14 @@ float VoxelMap::get_voxel_f(Vector3i pos, unsigned int c) const {
return _default_voxel[c];
}
Vector3i lpos = to_local(pos);
RWLockRead lock(block->voxels->get_lock());
return block->voxels->get_voxel_f(lpos.x, lpos.y, lpos.z, c);
}
void VoxelMap::set_voxel_f(real_t value, Vector3i pos, unsigned int c) {
VoxelBlock *block = get_or_create_block_at_voxel_pos(pos);
Vector3i lpos = to_local(pos);
RWLockWrite lock(block->voxels->get_lock());
block->voxels->set_voxel_f(value, lpos.x, lpos.y, lpos.z, c);
}
@ -190,14 +195,17 @@ void VoxelMap::get_buffer_copy(Vector3i min_pos, VoxelBuffer &dst_buffer, unsign
for (bpos.z = min_block_pos.z; bpos.z < max_block_pos.z; ++bpos.z) {
for (bpos.x = min_block_pos.x; bpos.x < max_block_pos.x; ++bpos.x) {
for (bpos.y = min_block_pos.y; bpos.y < max_block_pos.y; ++bpos.y) {
VoxelBlock *block = get_block(bpos);
const VoxelBlock *block = get_block(bpos);
if (block) {
VoxelBuffer &src_buffer = **block->voxels;
const VoxelBuffer &src_buffer = **block->voxels;
dst_buffer.set_channel_depth(channel, src_buffer.get_channel_depth(channel));
Vector3i offset = block_to_voxel(bpos);
RWLockRead lock(src_buffer.get_lock());
// Note: copy_from takes care of clamping the area if it's on an edge
dst_buffer.copy_from(src_buffer,
min_pos - offset,

View File

@ -7,7 +7,9 @@
#include <core/hash_map.h>
#include <scene/main/node.h>
// Infinite voxel storage by means of octants like Gridmap, within a constant LOD
// Infinite voxel storage by means of octants like Gridmap, within a constant LOD.
// Convenience functions to access VoxelBuffers internally will lock them to protect against multithreaded access.
// However, the map itself is not thread-safe.
class VoxelMap : public Reference {
GDCLASS(VoxelMap, Reference)
public:
@ -128,7 +130,6 @@ private:
// Voxel values that will be returned if access is out of map bounds
FixedArray<uint64_t, VoxelBuffer::MAX_CHANNELS> _default_voxel;
// TODO Consider using OAHashMap
// Blocks stored with a spatial hash in all 3D directions
HashMap<Vector3i, VoxelBlock *, Vector3iHasher> _blocks;

View File

@ -282,10 +282,16 @@ struct ScheduleSaveAction {
bool with_copy;
void operator()(VoxelBlock *block) {
// TODO Don't ask for save if the stream doesn't support it!
if (block->is_modified()) {
//print_line(String("Scheduling save for block {0}").format(varray(block->position.to_vec3())));
VoxelTerrain::BlockToSave b;
b.voxels = with_copy ? block->voxels->duplicate() : block->voxels;
if (with_copy) {
RWLockRead lock(block->voxels->get_lock());
b.voxels = block->voxels->duplicate(true);
} else {
b.voxels = block->voxels;
}
b.position = block->position;
blocks_to_save.push_back(b);
block->set_modified(false);
@ -876,10 +882,6 @@ void VoxelTerrain::_process() {
// Send mesh updates
{
unsigned int min_padding;
unsigned int max_padding;
VoxelServer::get_singleton()->get_min_max_block_padding(_volume_id, min_padding, max_padding);
for (int i = 0; i < _blocks_pending_update.size(); ++i) {
Vector3i block_pos = _blocks_pending_update[i];
@ -894,12 +896,15 @@ void VoxelTerrain::_process() {
} else {
CRASH_COND(block->voxels.is_null());
const uint64_t air_type = 0;
if (
block->voxels->is_uniform(VoxelBuffer::CHANNEL_TYPE) &&
block->voxels->is_uniform(VoxelBuffer::CHANNEL_SDF) &&
block->voxels->get_voxel(0, 0, 0, VoxelBuffer::CHANNEL_TYPE) == air_type) {
bool is_empty;
{
RWLockRead lock(block->voxels->get_lock());
is_empty = block->voxels->is_uniform(VoxelBuffer::CHANNEL_TYPE) &&
block->voxels->is_uniform(VoxelBuffer::CHANNEL_SDF) &&
block->voxels->get_voxel(0, 0, 0, VoxelBuffer::CHANNEL_TYPE) == Voxel::AIR_ID;
}
if (is_empty) {
// If we got here, it must have been because of scheduling an update
CRASH_COND(block->get_mesh_state() != VoxelBlock::MESH_UPDATE_NOT_SENT);
@ -907,8 +912,9 @@ void VoxelTerrain::_process() {
block->set_mesh(Ref<Mesh>(), this, _generate_collisions, Vector<Array>(), get_tree()->is_debugging_collisions_hint());
block->set_mesh_state(VoxelBlock::MESH_UP_TO_DATE);
// Optional, but I guess it might spare some memory
block->voxels->clear_channel(VoxelBuffer::CHANNEL_TYPE, air_type);
// Optional, but I guess it might spare some memory.
// Not doing it anymore cuz now we need to be more careful about multithreaded access.
//block->voxels->clear_channel(VoxelBuffer::CHANNEL_TYPE, air_type);
continue;
}
@ -921,17 +927,18 @@ void VoxelTerrain::_process() {
CRASH_COND(block == nullptr);
CRASH_COND(block->get_mesh_state() != VoxelBlock::MESH_UPDATE_NOT_SENT);
// Create buffer padded with neighbor voxels
Ref<VoxelBuffer> nbuffer;
nbuffer.instance();
// Get block and its neighbors
VoxelServer::BlockMeshInput mi;
mi.position = block_pos;
mi.lod = 0;
for (unsigned int i = 0; i < Cube::MOORE_AREA_3D_COUNT; ++i) {
const Vector3i npos = block_pos + Cube::g_ordered_moore_area_3d[i];
VoxelBlock *nblock = _map->get_block(npos);
CRASH_COND(nblock == nullptr);
mi.blocks[i] = nblock->voxels;
}
const unsigned int block_size = _map->get_block_size();
nbuffer->create(Vector3i(block_size + min_padding + max_padding));
const unsigned int channels_mask = (1 << VoxelBuffer::CHANNEL_TYPE) | (1 << VoxelBuffer::CHANNEL_SDF);
_map->get_buffer_copy(_map->block_to_voxel(block_pos) - Vector3i(min_padding), **nbuffer, channels_mask);
VoxelServer::get_singleton()->request_block_mesh(_volume_id, nbuffer, block_pos, 0);
VoxelServer::get_singleton()->request_block_mesh(_volume_id, mi);
block->set_mesh_state(VoxelBlock::MESH_UPDATE_SENT);
}
@ -1136,11 +1143,9 @@ void VoxelTerrain::_bind_methods() {
ADD_PROPERTY(PropertyInfo(Variant::BOOL, "run_stream_in_editor"),
"set_run_stream_in_editor", "is_stream_running_in_editor");
// TODO Add back access to block, but with an API securing multithreaded access
ADD_SIGNAL(MethodInfo(VoxelStringNames::get_singleton()->block_loaded,
PropertyInfo(Variant::VECTOR3, "position"),
PropertyInfo(Variant::OBJECT, "voxels", PROPERTY_HINT_RESOURCE_TYPE, "VoxelBuffer")));
PropertyInfo(Variant::VECTOR3, "position")));
ADD_SIGNAL(MethodInfo(VoxelStringNames::get_singleton()->block_unloaded,
PropertyInfo(Variant::VECTOR3, "position"),
PropertyInfo(Variant::OBJECT, "voxels", PROPERTY_HINT_RESOURCE_TYPE, "VoxelBuffer")));
PropertyInfo(Variant::VECTOR3, "position")));
}

View File

@ -113,10 +113,13 @@ const char *VoxelBuffer::CHANNEL_ID_HINT_STRING = "Type,Sdf,Data2,Data3,Data4,Da
VoxelBuffer::VoxelBuffer() {
_channels[CHANNEL_SDF].defval = 255;
// TODO How many of these can be created? Make it optional?
_rw_lock = RWLock::create();
}
VoxelBuffer::~VoxelBuffer() {
clear();
memdelete(_rw_lock);
}
void VoxelBuffer::create(int sx, int sy, int sz) {
@ -262,15 +265,6 @@ void VoxelBuffer::set_voxel_f(real_t value, int x, int y, int z, unsigned int ch
set_voxel(real_to_raw_voxel(value, _channels[channel_index].depth), x, y, z, channel_index);
}
// This version does not cause errors if out of bounds. Use only if it's okay to be outside.
void VoxelBuffer::try_set_voxel(int x, int y, int z, int value, unsigned int channel_index) {
ERR_FAIL_INDEX(channel_index, MAX_CHANNELS);
if (!is_position_valid(x, y, z)) {
return;
}
set_voxel(x, y, z, value, channel_index);
}
void VoxelBuffer::fill(uint64_t defval, unsigned int channel_index) {
ERR_FAIL_INDEX(channel_index, MAX_CHANNELS);
@ -474,14 +468,14 @@ void VoxelBuffer::copy_from(const VoxelBuffer &other, unsigned int channel_index
ERR_FAIL_COND(other_channel.depth != channel.depth);
if (other_channel.data) {
if (other_channel.data != nullptr) {
if (channel.data == nullptr) {
create_channel_noinit(channel_index, _size);
}
CRASH_COND(channel.size_in_bytes != other_channel.size_in_bytes);
memcpy(channel.data, other_channel.data, channel.size_in_bytes);
} else if (channel.data) {
} else if (channel.data != nullptr) {
delete_channel(channel_index);
}
@ -557,10 +551,16 @@ void VoxelBuffer::copy_from(const VoxelBuffer &other, Vector3i src_min, Vector3i
}
}
Ref<VoxelBuffer> VoxelBuffer::duplicate() const {
Ref<VoxelBuffer> VoxelBuffer::duplicate(bool include_metadata) const {
VoxelBuffer *d = memnew(VoxelBuffer);
d->create(_size);
for (unsigned int i = 0; i < _channels.size(); ++i) {
d->set_channel_depth(i, _channels[i].depth);
}
d->copy_from(*this);
if (include_metadata) {
d->copy_voxel_metadata(*this);
}
return Ref<VoxelBuffer>(d);
}
@ -821,6 +821,20 @@ void VoxelBuffer::copy_voxel_metadata_in_area(Ref<VoxelBuffer> src_buffer, Rect3
}
}
void VoxelBuffer::copy_voxel_metadata(const VoxelBuffer &src_buffer) {
ERR_FAIL_COND(src_buffer.get_size() != _size);
const Map<Vector3i, Variant>::Element *elem = src_buffer._voxel_metadata.front();
while (elem != nullptr) {
const Vector3i pos = elem->key();
_voxel_metadata[pos] = elem->value().duplicate();
elem = elem->next();
}
_block_metadata = src_buffer._block_metadata.duplicate();
}
Ref<Image> VoxelBuffer::debug_print_sdf_to_image_top_down() {
Image *im = memnew(Image);
im->create(_size.x, _size.z, false, Image::FORMAT_RGB8);

View File

@ -12,6 +12,7 @@
class VoxelTool;
class Image;
class FuncRef;
class RWLock;
// Dense voxels data storage.
// Organized in channels of configurable bit depth.
@ -70,8 +71,6 @@ public:
uint64_t get_voxel(int x, int y, int z, unsigned int channel_index = 0) const;
void set_voxel(uint64_t value, int x, int y, int z, unsigned int channel_index = 0);
void try_set_voxel(int x, int y, int z, int value, unsigned int channel_index = 0);
real_t get_voxel_f(int x, int y, int z, unsigned int channel_index = 0) const;
void set_voxel_f(real_t value, int x, int y, int z, unsigned int channel_index = 0);
@ -90,11 +89,13 @@ public:
static uint32_t get_size_in_bytes_for_volume(Vector3i size, Depth depth);
// Note: these functions don't include metadata on purpose.
// If you also want to copy metadata, use the specialized functions.
void copy_from(const VoxelBuffer &other);
void copy_from(const VoxelBuffer &other, unsigned int channel_index);
void copy_from(const VoxelBuffer &other, Vector3i src_min, Vector3i src_max, Vector3i dst_min, unsigned int channel_index);
Ref<VoxelBuffer> duplicate() const;
Ref<VoxelBuffer> duplicate(bool include_metadata) const;
_FORCE_INLINE_ bool is_position_valid(unsigned int x, unsigned int y, unsigned int z) const {
return x < (unsigned)_size.x && y < (unsigned)_size.y && z < (unsigned)_size.z;
@ -143,9 +144,15 @@ public:
void clear_voxel_metadata();
void clear_voxel_metadata_in_area(Rect3i box);
void copy_voxel_metadata_in_area(Ref<VoxelBuffer> src_buffer, Rect3i src_box, Vector3i dst_pos);
void copy_voxel_metadata(const VoxelBuffer &src_buffer);
const Map<Vector3i, Variant> &get_voxel_metadata() const { return _voxel_metadata; }
// Internal synchronization.
// This lock is optional, and used internally at the moment, only in multithreaded areas.
inline const RWLock *get_lock() const { return _rw_lock; }
inline RWLock *get_lock() { return _rw_lock; }
// TODO Make this work, would be awesome for perf
//
// template <typename F>
@ -223,6 +230,8 @@ private:
Variant _block_metadata;
Map<Vector3i, Variant> _voxel_metadata;
RWLock *_rw_lock;
};
VARIANT_ENUM_CAST(VoxelBuffer::ChannelId)