TorchMD-NET: An Open-Source Program for Training Neural Network Potentials

1. Software Introduction

The program and source code download links are provided at the end of this article.

TorchMD-NET provides state-of-the-art neural network potentials (NNPs) and the machinery to train them. It offers efficient and fast implementations of several NNPs and is integrated with GPU-accelerated molecular dynamics codes such as ACEMD, OpenMM and TorchMD. TorchMD-NET exposes its NNPs as PyTorch modules.
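
Because a trained NNP is exposed as a regular PyTorch module, a checkpoint written by torchmd-train can be loaded and evaluated directly from Python. The following is only a minimal sketch, not an official recipe: the checkpoint path and the toy inputs are placeholders, and it assumes the load_model helper in torchmdnet.models.model accepts derivative=True and returns the energy together with the negative gradient (forces).

import torch
from torchmdnet.models.model import load_model  # assumed import path

# Hypothetical checkpoint produced by a previous torchmd-train run
model = load_model("output/epoch=209-val_loss=0.1234.ckpt", derivative=True)

# Toy input: a single water-like molecule (3 atoms)
z = torch.tensor([8, 1, 1], dtype=torch.long)          # atomic numbers
pos = torch.tensor([[0.00, 0.00, 0.00],
                    [0.96, 0.00, 0.00],
                    [-0.24, 0.93, 0.00]])               # coordinates
batch = torch.zeros(3, dtype=torch.long)                # all atoms belong to molecule 0

energy, forces = model(z, pos, batch)                   # per-molecule energy, per-atom forces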

2. Available architectures

  • Equivariant Transformer (ET)
  • Transformer (T)
  • Graph Neural Network (GN)
  • TensorNet

3. Installation

TorchMD-Net is available as a pip-installable wheel and on conda-forge.

TorchMD-Net provides builds for CPU-only, CUDA 11.8 and CUDA 12.4. CPU versions are only provided as reference, as the performance will be extremely limited. Depending on which variant you wish to install, you can install it with one of the following commands:

# The following will install the CUDA 12.4 version by default
pip install torchmd-net 
# The following will install the CUDA 11.8 version
pip install torchmd-net --extra-index-url https://download.pytorch.org/whl/cu118 --extra-index-url https://us-central1-python.pkg.dev/pypi-packages-455608/cu118/simple
# The following will install the CUDA 12.4 version
pip install torchmd-net --extra-index-url https://download.pytorch.org/whl/cu124 --extra-index-url https://us-central1-python.pkg.dev/pypi-packages-455608/cu124/simple
# The following will install the CPU only version (not recommended)
pip install torchmd-net --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://us-central1-python.pkg.dev/pypi-packages-455608/cpu/simple   

Alternatively, it can be installed with conda or mamba using one of the following commands. We recommend using Miniforge instead of Anaconda.

mamba install torchmd-net cuda-version=11.8
mamba install torchmd-net cuda-version=12.4

Install from source

TorchMD-Net is installed using pip, but you will need to install some dependencies first. Check the documentation page.

4. Usage

Training arguments can be specified either via a YAML configuration file or directly as command-line arguments. Several examples of architectural and training specifications for some models and datasets can be found in examples/. Note that if a parameter is present both in the YAML file and on the command line, the command-line version takes precedence. GPUs can be selected by setting the CUDA_VISIBLE_DEVICES environment variable. Otherwise, the argument --ngpus can be used to select the number of GPUs to train on (the default of -1 uses all available GPUs, or those specified in CUDA_VISIBLE_DEVICES). Keep in mind that the GPU IDs reported by nvidia-smi might not match the ones CUDA_VISIBLE_DEVICES uses.
For example, to train the Equivariant Transformer on the QM9 dataset with the architectural and training hyperparameters described in the paper, one can run:

mkdir output
CUDA_VISIBLE_DEVICES=0 torchmd-train --conf torchmd-net/examples/ET-QM9.yaml --log-dir output/

Run torchmd-train --help to see all available options and their descriptions.

Creating a new dataset

If you want to train on custom data, first have a look at torchmdnet.datasets.Custom, which provides functionalities for loading a NumPy dataset consisting of atom types and coordinates, as well as energies, forces or both as the labels. Alternatively, you can implement a custom class according to the torch-geometric way of implementing a dataset. That is, derive the Dataset or InMemoryDataset class and implement the necessary functions (more info here). The dataset must return torch-geometric Data objects, containing at least the keys z (atom types) and pos (atomic coordinates), as well as y (label), neg_dy (negative derivative of the label w.r.t. atom coordinates) or both.
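
As an illustration only (this class is not part of the package), a custom dataset following the torch-geometric InMemoryDataset pattern could look roughly like the sketch below. my_raw_samples is a hypothetical loader over your own raw files, and the key names are the ones TorchMD-Net expects (z, pos, y, neg_dy):

import torch
from torch_geometric.data import Data, InMemoryDataset

class MyDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super().__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return []

    @property
    def processed_file_names(self):
        return ["data.pt"]

    def download(self):
        pass  # nothing to download; samples are generated locally

    def process(self):
        data_list = []
        # my_raw_samples() is a hypothetical generator over your own raw files,
        # yielding atom types, coordinates, an energy and (optionally) forces.
        for z, pos, energy, forces in my_raw_samples():
            data_list.append(
                Data(
                    z=torch.as_tensor(z, dtype=torch.long),            # atom types
                    pos=torch.as_tensor(pos, dtype=torch.float),       # coordinates, shape (n_atoms, 3)
                    y=torch.as_tensor([[energy]], dtype=torch.float),  # scalar label
                    neg_dy=torch.as_tensor(forces, dtype=torch.float), # negative derivative (forces)
                )
            )
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])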

Custom prior models

In addition to implementing a custom dataset class, it is also possible to add a custom prior model to the model. This can be done by implementing a new prior model class in torchmdnet.priors and adding the argument --prior-model <PriorModelName>. As an example, have a look at torchmdnet.priors.Atomref.
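
A rough, hypothetical sketch of what such a class could look like is shown below. The base-class import path, the pre_reduce hook name and its arguments are assumptions on my part; check torchmdnet.priors.base and the Atomref implementation for the actual interface before relying on this.

import torch
from torchmdnet.priors.base import BasePrior  # assumed import path

class LinearAtomicOffset(BasePrior):
    """Hypothetical prior that adds a learnable per-element energy offset."""

    def __init__(self, max_z=100, dataset=None):
        super().__init__()
        self.offset = torch.nn.Embedding(max_z, 1)
        torch.nn.init.zeros_(self.offset.weight)

    def get_init_args(self):
        # Arguments needed to re-instantiate the prior when loading a checkpoint
        return dict(max_z=self.offset.num_embeddings)

    def pre_reduce(self, x, z, pos, batch, extra_args=None):
        # Shift each atom's scalar output before the per-molecule reduction
        return x + self.offset(z)

With such a class placed in torchmdnet.priors, training would then select it with --prior-model LinearAtomicOffset.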

Multi-Node Training

In order to train models on multiple nodes some environment variables have to be set, which provide all necessary information to PyTorch Lightning. In the following we provide an example bash script to start training on two machines with two GPUs each. The script has to be started once on each node. Once torchmd-train is started on all nodes, a network connection between the nodes will be established using NCCL.

In addition to the environment variables, the argument --num-nodes has to be specified with the number of nodes involved during training.

export NODE_RANK=0
export MASTER_ADDR=hostname1
export MASTER_PORT=12910

mkdir -p output
CUDA_VISIBLE_DEVICES=0,1 torchmd-train --conf torchmd-net/examples/ET-QM9.yaml --num-nodes 2 --log-dir output/
  • NODE_RANK : Integer indicating the node index. Must be 0 for the main node and incremented by one for each additional node.
  • MASTER_ADDR : Hostname or IP address of the main node. The same for all involved nodes.
  • MASTER_PORT : A free network port for communication between nodes. PyTorch Lightning suggests port 12910 as a default.
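
For reference, the script for a second node would look the same except for NODE_RANK; a hedged sketch (hostname1 stands for whichever host runs the main node):

# On the second node only NODE_RANK changes; MASTER_ADDR and MASTER_PORT stay the same.
export NODE_RANK=1
export MASTER_ADDR=hostname1
export MASTER_PORT=12910

mkdir -p output
CUDA_VISIBLE_DEVICES=0,1 torchmd-train --conf torchmd-net/examples/ET-QM9.yaml --num-nodes 2 --log-dir output/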

Known Limitations

  • Due to the way PyTorch Lightning calculates the number of required DDP processes, all nodes must use the same number of GPUs. Otherwise, training will not start or will crash.
  • We observe a 50x decrease in performance when mixing nodes with different GPU architectures (tested with RTX 2080 Ti and RTX 3090).
  • Some CUDA systems might hang during multi-GPU parallel training. Try export NCCL_P2P_DISABLE=1, which disables direct peer-to-peer GPU communication.

Cite

If you use TorchMD-NET in your research, please cite the following papers:

Main reference
@misc{pelaez2024torchmdnet,
  title={TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations},
  author={Raul P. Pelaez and Guillem Simeon and Raimondas Galvelis and Antonio Mirarchi and Peter Eastman and Stefan Doerr and Philipp Thölke and Thomas E. Markland and Gianni De Fabritiis},
  year={2024},
  eprint={2402.17660},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
TensorNet
@inproceedings{simeon2023tensornet,
  title={TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials},
  author={Guillem Simeon and Gianni De Fabritiis},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=BEHlPdBZ2e}
}
Equivariant Transformer
@inproceedings{tholke2021equivariant,
  title={Equivariant Transformers for Neural Network based Molecular Potentials},
  author={Philipp Th{\"o}lke and Gianni De Fabritiis},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=zNHzqZ9wrRB}
}
Graph Network
@article{Majewski2023,
  title={Machine learning coarse-grained potentials of protein thermodynamics},
  volume={14},
  ISSN={2041-1723},
  url={http://dx.doi.org/10.1038/s41467-023-41343-1},
  DOI={10.1038/s41467-023-41343-1},
  number={1},
  journal={Nature Communications},
  publisher={Springer Science and Business Media LLC},
  author={Majewski, Maciej and Pérez, Adrià and Th\"{o}lke, Philipp and Doerr, Stefan and Charron, Nicholas E. and Giorgino, Toni and Husic, Brooke E. and Clementi, Cecilia and Noé, Frank and De Fabritiis, Gianni},
  year={2023},
  month=sep
}

Developer guide

Implementing a new architecture

To implement a new architecture, you need to follow these steps:
1. Create a new class in torchmdnet.models that inherits from torch.nn.Module. Follow TorchMD_ET as a template. This is a minimal implementation of a model:

from typing import Optional, Tuple

from torch import Tensor, nn


class MyModule(nn.Module):
    def __init__(self, parameter1, parameter2):
        super(MyModule, self).__init__()
        # Define your model here
        self.layer1 = nn.Linear(10, 10)
        ...
        # Initialize your model parameters here
        self.reset_parameters()

    def reset_parameters(self):
        # Initialize your model parameters here
        nn.init.xavier_uniform_(self.layer1.weight)
        ...

    def forward(
        self,
        z: Tensor,  # Atomic numbers, shape (n_atoms, 1)
        pos: Tensor,  # Atomic positions, shape (n_atoms, 3)
        batch: Tensor,  # Batch vector, shape (n_atoms, 1). All atoms in the same molecule have the same value and are contiguous.
        q: Optional[Tensor] = None,  # Atomic charges, shape (n_atoms, 1)
        s: Optional[Tensor] = None,  # Atomic spins, shape (n_atoms, 1)
    ) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]:
        # Define your forward pass here
        scalar_features = ...
        vector_features = ...
        # Return the scalar and vector features, as well as the atomic numbers, positions and batch vector
        return scalar_features, vector_features, z, pos, batch

2. Add the model to the __all__ list in torchmdnet.models.__init__.py. This will make the tests pick up your model.
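
For instance, the entry might look like the line below; the existing names shown here are only indicative of the built-in models, so check the actual file:

# torchmdnet/models/__init__.py (existing entries are indicative)
__all__ = ["TorchMD_GN", "TorchMD_T", "TorchMD_ET", "TensorNet", "MyModule"]
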
3. Tell models.model.create_model how to initialize your module by adding a new entry, for instance:

    elif args["model"] == "mymodule":
        from torchmdnet.models.torchmd_mymodule import MyModule
        is_equivariant = False  # Set to True if your model is equivariant
        representation_model = MyModule(
            parameter1=args["parameter1"],
            parameter2=args["parameter2"],
            **shared_args,  # Arguments typically shared by all models
        )

4. Add any new parameters required to initialize your module to scripts.train.get_args. For instance:

  parser.add_argument('--parameter1', type=int, default=32, help='Parameter1 required by MyModule')
  ...

5. Add an example configuration file to torchmd-net/examples that uses your model.
6. Make tests use your configuration file by adding a case to tests.utils.load_example_args. For instance:

if model_name == "mymodule":
    config_file = join(dirname(dirname(__file__)), "examples", "MyModule-QM9.yaml")

At this point, if your module is missing some feature the tests will let you know, and you can add it. If you add a new feature to the package, please add a test for it.

Code style

We use black. Please run black on your modified files before committing.

Testing

To run the tests, install the package and run pytest in the root directory of the repository. Tests are a good source of knowledge on how to use the different components of the package.

5. Software Download

Shared via 夸克网盘 (Quark Cloud Drive).

The information in this article comes from the author's GitHub repository: https://github.com/torchmd/torchmd-net

